Coding Standards

This document defines the Python coding standards for ScamShield AI. All code in functions/, dashboard/, and tests/ must follow these conventions. Standards are enforced by ruff (linting and formatting).

Python Version

Python 3.11 is the target runtime. The X | Y union syntax is available from 3.10 onward, but the project convention is to use Optional, List, and Dict from typing, so do not use X | Y in annotations.

Import Order

Imports follow three groups separated by blank lines: stdlib, third-party, local. Enforced by ruff's isort rules.

import logging
import os
from typing import Dict, List, Optional

import httpx
from pydantic import BaseModel, Field

from guvi.models import GuviRequest, GuviResponse
from utils.logging import setup_logging, timed

Rules:

Absolute imports only (no relative imports like from . import)
Names imported from the same module share one from X import Y, Z statement; if it exceeds the line length, ruff wraps it in parentheses with one name per line
from __future__ import annotations is not used in this project

Type Hints

Required on all public function signatures. Use Optional[X] for nullable parameters.

from typing import Any, Dict, List, Optional

def extract_upi_ids(text: str) -> List[str]:
    """Extract UPI IDs from text."""
    ...

def get_session(session_id: str) -> Optional[Dict[str, Any]]:
    """Get session from Firestore, or None if not found."""
    ...

def process_message(
    session: SessionState,
    message: str,
    history: List[Dict[str, Any]],
    metadata: GuviMetadata,
) -> ProcessingResult:
    """Process a scammer message and return the result."""
    ...

Type hints are not strictly required on private helper functions (prefixed with _), but they are encouraged.

Docstrings

Google style. Required on all public functions and classes.

def sanitize_message(text: str) -> str:
    """Sanitize a scammer message before embedding it in a Gemini prompt.

    Args:
        text: Raw scammer message text.

    Returns:
        Sanitized text with injection patterns replaced by [FILTERED].
    """
    ...

class Orchestrator:
    """Main orchestration engine.

    Coordinates classification, persona response, and evidence extraction.
    """
    ...

For simple one-line functions, a single-line docstring is acceptable:

def _sanitize_task_name(session_id: str) -> str:
    """Sanitize session ID for use in Cloud Tasks task name."""
    ...

Pydantic Models

GUVI-Facing Models (API boundary)

Use camelCase field names to match the GUVI API specification:

class ExtractedIntelligence(BaseModel):
    """Intelligence extracted during the honeypot conversation."""

    bankAccounts: List[str] = Field(
        default_factory=list,
        description="Extracted bank account numbers (9-18 digits)",
    )
    upiIds: List[str] = Field(
        default_factory=list,
        description="Extracted UPI IDs (format: name@bank)",
    )

Internal Models

Use snake_case field names:

class SessionState(BaseModel):
    """Internal session state stored in Firestore."""

    guvi_session_id: str
    scam_type: Optional[str] = None
    confidence: float = 0.0
    message_count: int = 0

Conventions

Use Field(...) for required fields with descriptions
Use Field(default_factory=list) for mutable defaults (never = [])
Use @field_validator for custom validation logic
Use model_dump() and model_validate() (not the deprecated .dict() / .parse_obj())
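
The four conventions above can be seen together in one small model. This is a minimal sketch assuming Pydantic v2; EvidenceReport and its validator are hypothetical names used only for illustration and are not part of the project.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, field_validator


class EvidenceReport(BaseModel):
    """Hypothetical internal model illustrating the conventions above."""

    # Required field: Field(...) with a description.
    session_id: str = Field(..., description="Session identifier")
    # Mutable default: default_factory gives each instance its own list.
    upi_ids: List[str] = Field(default_factory=list, description="Extracted UPI IDs")
    scam_type: Optional[str] = None

    @field_validator("upi_ids")
    @classmethod
    def _lowercase_upi_ids(cls, value: List[str]) -> List[str]:
        """Normalize UPI IDs to lowercase."""
        return [item.lower() for item in value]


# model_validate() replaces the deprecated parse_obj(); model_dump() replaces dict().
report = EvidenceReport.model_validate({"session_id": "abc", "upi_ids": ["Name@Bank"]})
payload = report.model_dump()
```

Because the default comes from default_factory, two instances created without upi_ids never share the same list object.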

HTTP Client

Use httpx for all HTTP calls in functions/. The requests library is not a project dependency.

import httpx

response = httpx.post(
    url,
    json=payload,
    headers={"Content-Type": "application/json", "x-api-key": api_key},
    timeout=10.0,
)
response.raise_for_status()

For the callback service, which makes multiple calls, use httpx.Client for connection pooling:

self.client = httpx.Client(timeout=30.0)
response = self.client.post(url, json=payload, headers=headers)

Logging

Use the structured JSON logger. Never use print().

import logging

logger = logging.getLogger(__name__)

# Info: session-level events
logger.info(f"Processing request: session={session_id}")

# Warning: recoverable issues
logger.warning(f"Rate limited: session={session_id}, reason={reason}")

# Error: failures
logger.error(f"Callback failed for session={session_id}")

# Exception: errors with stack traces (call inside an except block, where e is bound)
logger.exception(f"Handler error: {e}")

What to Log

Level	Log	Do Not Log
INFO	Session IDs, scam types, evidence type counts ("found 2 UPI IDs")	Full scammer messages, evidence values, API keys
WARNING	Rate limiting, recoverable failures, missing config	PII, secrets
ERROR	Unrecoverable failures, callback failures	Secret values
DEBUG	Evidence values (for local troubleshooting only)	Secrets
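
As an illustration of the table above, the sketch below logs an evidence count at INFO and keeps the values themselves at DEBUG. JsonFormatter is a hypothetical stand-in for whatever the project's setup_logging configures, and log_evidence_summary is likewise an invented helper.

```python
import json
import logging
from typing import List


class JsonFormatter(logging.Formatter):
    """Hypothetical stand-in for the project's structured JSON formatter."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def log_evidence_summary(logger: logging.Logger, session_id: str, upi_ids: List[str]) -> None:
    """Log how many UPI IDs were found without logging the IDs themselves."""
    logger.info(f"Evidence extracted: session={session_id}, upi_ids={len(upi_ids)}")
    # Evidence values are DEBUG-only, for local troubleshooting.
    logger.debug(f"UPI IDs: session={session_id}, values={upi_ids}")
```

With the logger level set to INFO, only the count line is emitted and the DEBUG line with the raw values is dropped.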

Performance Tracking

Use the @timed() decorator for performance-sensitive functions:

from utils.logging import timed

@timed("gemini.classify")
def classify_scam(text: str) -> str:
    """Classify the scam type using Gemini."""
    ...
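
For readers unfamiliar with parameterized decorators, here is one way a decorator with this call shape could be written. It is a hypothetical sketch using only the standard library; the real utils.logging.timed may log different fields or behave differently.

```python
import functools
import logging
import time
from typing import Any, Callable

logger = logging.getLogger(__name__)


def timed(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical sketch of a timing decorator; the real utils.logging.timed may differ."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failures are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info(f"timing: name={name}, elapsed_ms={elapsed_ms:.1f}")

        return wrapper

    return decorator
```

functools.wraps preserves the wrapped function's name and docstring, which keeps log output and documentation tools accurate.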

Error Handling

Use structured logging in except blocks, never bare print()
Re-raise exceptions after logging unless you have a specific recovery strategy
Use logger.exception() in except blocks (includes stack trace automatically)
Return typed error responses at API boundaries

try:
    result = gemini_client.classify(text)
except Exception:
    logger.exception("Classification failed")
    raise

At the API boundary (handler), errors are caught and converted to graceful fallback responses:

try:
    ...  # normal request processing
except Exception as e:
    logger.exception(f"Error processing request: {e}")
    return GuviResponse(
        status="success",
        reply="Ek minute, network slow hai...",
        agentNotes=f"Error fallback: {str(e)[:100]}",
    ).model_dump()
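
The two patterns can be combined so that inner layers log and re-raise while only the handler produces a fallback. The sketch below is hypothetical: classify and handle_request are invented names, and plain dicts stand in for GuviResponse(...).model_dump() so the example runs with the standard library alone.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

FALLBACK_REPLY = "Ek minute, network slow hai..."


def classify(text: str) -> str:
    """Hypothetical inner function: log and re-raise, no recovery here."""
    try:
        if not text:
            raise ValueError("empty message")
        return "unknown"
    except Exception:
        logger.exception("Classification failed")
        raise


def handle_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical handler: the only layer that converts errors to a fallback reply."""
    try:
        scam_type = classify(body.get("message", ""))
        return {"status": "success", "reply": "ok", "agentNotes": f"type={scam_type}"}
    except Exception as e:
        logger.exception(f"Error processing request: {e}")
        return {
            "status": "success",
            "reply": FALLBACK_REPLY,
            "agentNotes": f"Error fallback: {str(e)[:100]}",
        }
```

Keeping the fallback in one place means inner functions never need to know what a caller-facing reply looks like.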

Line Length and Formatting

Max line length: 100 characters
Formatter: ruff format
Linter: ruff check (rules: E, F, I)

# Check for issues
ruff check functions/ dashboard/ tests/

# Auto-fix
ruff check --fix functions/ dashboard/ tests/

# Format
ruff format functions/ dashboard/ tests/

Naming Conventions

Element	Convention	Example
Functions	`snake_case`	`extract_upi_ids()`
Variables	`snake_case`	`session_id`
Classes	`PascalCase`	`GuviRequest`
Constants	`UPPER_SNAKE_CASE`	`MAX_MESSAGES_PER_SESSION`
Private functions	`_snake_case`	`_extract_evidence_from_full_conversation()`
File names	`snake_case`	`regex_patterns.py`
Test files	`test_*.py`	`test_extractors.py`

Dashboard-Specific Conventions

st.set_page_config() must be the first Streamlit call in every page
require_auth() must be called immediately after st.set_page_config()
All Firestore reads go through dashboard/utils/firestore_client.py
All rendering uses shared functions from dashboard/utils/display.py
Use sys.path.insert for importing from utils/:

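import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
# The import must follow the sys.path change, so E402 is suppressed on this line.
from utils.auth import require_auth  # noqa: E402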

Linter Configuration

The project uses ruff.toml at the repository root:

line-length = 100
target-version = "py311"

[lint]
select = ["E", "F", "I"]

E: pycodestyle errors
F: pyflakes (unused imports, undefined names)
I: isort (import ordering)