Coding Standards

This document defines the Python coding standards for ScamShield AI. All code in functions/, dashboard/, and tests/ must follow these conventions. Standards are enforced by ruff (linting and formatting).

Python Version

Python 3.11 is the target runtime. Write type hints with the typing module (Optional, List, Dict). typing.Optional[X] is the project convention for nullable types, so do not use the X | None union syntax introduced in Python 3.10.
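As a small illustration of the nullable-type convention, the sketch below annotates a lookup that may find nothing. The function name find_scam_type and the lookup table are hypothetical and exist only for this example.

```python
from typing import Dict, Optional

# Hypothetical lookup table, used only for this illustration.
_KNOWN_TYPES: Dict[str, str] = {"kyc": "KYC_FRAUD", "lottery": "LOTTERY_SCAM"}


def find_scam_type(keyword: str) -> Optional[str]:
    """Return the scam type for a keyword, or None if it is unknown."""
    # Project convention: Optional[str], not the equivalent `str | None`.
    return _KNOWN_TYPES.get(keyword.lower())
```

Callers should check the result against None before using it.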

Import Order

Imports follow three groups separated by blank lines: stdlib, third-party, local. Enforced by ruff's isort rules.

import logging
import os
from typing import Dict, List, Optional

import httpx
from pydantic import BaseModel, Field

from guvi.models import GuviRequest, GuviResponse
from utils.logging import setup_logging, timed

Rules:

  • Absolute imports only (no relative imports like from . import)
  • One import per line for from X import Y when importing multiple names
  • from __future__ import annotations is not used in this project

Type Hints

Required on all public function signatures. Use Optional[X] for nullable parameters.

from typing import Any, Dict, List, Optional


def extract_upi_ids(text: str) -> List[str]:
    """Extract UPI IDs from text."""
    ...


def get_session(session_id: str) -> Optional[Dict[str, Any]]:
    """Get session from Firestore, or None if not found."""
    ...


def process_message(
    session: SessionState,
    message: str,
    history: List[Dict],
    metadata: GuviMetadata,
) -> ProcessingResult:
    """Process a scammer message and return the result."""
    ...

Type hints are encouraged, but not strictly required, on private helper functions (prefixed with _).

Docstrings

Google style. Required on all public functions and classes.

def sanitize_message(text: str) -> str:
    """Sanitize a scammer message before embedding it in a Gemini prompt.

    Args:
        text: Raw scammer message text.

    Returns:
        Sanitized text with injection patterns replaced by [FILTERED].
    """
    ...


class Orchestrator:
    """Main orchestration engine.

    Coordinates classification, persona response, and evidence extraction.
    """
    ...

For simple one-line functions, a single-line docstring is acceptable:

def _sanitize_task_name(session_id: str) -> str:
    """Sanitize session ID for use in Cloud Tasks task name."""
    ...

Pydantic Models

GUVI-Facing Models (API boundary)

Use camelCase field names to match the GUVI API specification:

class ExtractedIntelligence(BaseModel):
    """Intelligence extracted during the honeypot conversation."""

    bankAccounts: List[str] = Field(
        default_factory=list,
        description="Extracted bank account numbers (9-18 digits)",
    )
    upiIds: List[str] = Field(
        default_factory=list,
        description="Extracted UPI IDs (format: name@bank)",
    )

Internal Models

Use snake_case field names:

class SessionState(BaseModel):
    """Internal session state stored in Firestore."""

    guvi_session_id: str
    scam_type: Optional[str] = None
    confidence: float = 0.0
    message_count: int = 0

Conventions

  • Use Field(...) for required fields with descriptions
  • Use Field(default_factory=list) for mutable defaults (never = [])
  • Use @field_validator for custom validation logic
  • Use model_dump() and model_validate() (not the deprecated .dict() / .parse_obj())
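The conventions above can be combined in one small model. This is a sketch that assumes Pydantic v2, which the model_dump() / model_validate() convention implies. EvidenceReport and its validator are hypothetical names, not part of the project.

```python
from typing import List

from pydantic import BaseModel, Field, field_validator


class EvidenceReport(BaseModel):
    """Hypothetical internal model, used only to illustrate the conventions."""

    # Required field: Field(...) with a description.
    session_id: str = Field(..., description="Session identifier")
    # Mutable default: default_factory gives each instance its own list.
    upi_ids: List[str] = Field(default_factory=list, description="Extracted UPI IDs")

    @field_validator("upi_ids")
    @classmethod
    def _lowercase_upi_ids(cls, value: List[str]) -> List[str]:
        """Normalize UPI IDs to lowercase."""
        return [upi_id.lower() for upi_id in value]


report = EvidenceReport.model_validate({"session_id": "abc", "upi_ids": ["Name@Bank"]})
payload = report.model_dump()
```

Here payload is a plain dict with the UPI ID lowercased by the validator, ready to be serialized or stored.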

HTTP Client

Use httpx for all HTTP calls in functions/. The requests library is not a project dependency.

import httpx

response = httpx.post(
    url,
    json=payload,
    headers={"Content-Type": "application/json", "x-api-key": api_key},
    timeout=10.0,
)
response.raise_for_status()

For the callback service, which makes multiple calls, use httpx.Client for connection pooling:

self.client = httpx.Client(timeout=30.0)
response = self.client.post(url, json=payload, headers=headers)

Logging

Use the structured JSON logger. Never use print().

import logging

logger = logging.getLogger(__name__)

# Info: session-level events
logger.info(f"Processing request: session={session_id}")

# Warning: recoverable issues
logger.warning(f"Rate limited: session={session_id}, reason={reason}")

# Error: failures
logger.error(f"Callback failed for session={session_id}")

# Exception: errors with stack traces
logger.exception(f"Handler error: {e}")
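For readers unfamiliar with structured logging, the sketch below shows one way a JSON formatter could be wired into the standard logging module. It is not the project's setup_logging implementation. JsonFormatter, setup_json_logging, and the chosen field names are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # logger.exception() sets exc_info, so the traceback is included here.
            entry["traceback"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def setup_json_logging(level: int = logging.INFO) -> None:
    """Attach the JSON formatter to the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

With a formatter like this in place, module-level loggers obtained through logging.getLogger(__name__) emit one JSON object per line without further changes.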

What to Log

| Level   | Log | Do Not Log |
|---------|-----|------------|
| INFO    | Session IDs, scam types, evidence type counts ("found 2 UPI IDs") | Full scammer messages, evidence values, API keys |
| WARNING | Rate limiting, recoverable failures, missing config | PII, secrets |
| ERROR   | Unrecoverable failures, callback failures | Secret values |
| DEBUG   | Evidence values (for local troubleshooting only) | Secrets |
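One way to follow these rules is to log evidence counts at INFO and keep the raw values at DEBUG. The helper names summarize_evidence and log_evidence below are hypothetical, as is the shape of the evidence dictionary.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def summarize_evidence(evidence: Dict[str, List[str]]) -> Dict[str, int]:
    """Return per-type evidence counts that are safe to log at INFO."""
    return {kind: len(values) for kind, values in evidence.items()}


def log_evidence(session_id: str, evidence: Dict[str, List[str]]) -> None:
    """Log counts at INFO and raw values at DEBUG only."""
    counts = summarize_evidence(evidence)
    logger.info(f"Evidence extracted: session={session_id}, counts={counts}")
    # Raw values are for local troubleshooting only.
    logger.debug(f"Evidence values: session={session_id}, values={evidence}")
```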

Performance Tracking

Use the @timed() decorator for performance-sensitive functions:

from utils.logging import timed

@timed("gemini.classify")
def classify_scam(text: str) -> str:
    """Classify the scam type using Gemini."""
    ...
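For reference, a timing decorator of this shape could look roughly like the sketch below. This is not the code in utils/logging.py. It assumes the decorator takes a label and logs the elapsed wall-clock time at INFO.

```python
import functools
import logging
import time
from typing import Any, Callable

logger = logging.getLogger(__name__)


def timed(label: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Log the wall-clock duration of each call under the given label."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failures are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info(f"{label} took {elapsed_ms:.1f}ms")

        return wrapper

    return decorator


@timed("example.add")
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b
```

functools.wraps keeps the wrapped function's name and docstring, so decorated functions still look correct in logs and tracebacks.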

Error Handling

  • Use structured logging in except blocks, never bare print()
  • Re-raise exceptions after logging unless you have a specific recovery strategy
  • Use logger.exception() in except blocks (includes stack trace automatically)
  • Return typed error responses at API boundaries

try:
    result = gemini_client.classify(text)
except Exception:
    logger.exception("Classification failed")
    raise

At the API boundary (handler), errors are caught and converted to graceful fallback responses:

except Exception as e:
    logger.exception(f"Error processing request: {e}")
    return GuviResponse(
        status="success",
        reply="Ek minute, network slow hai...",
        agentNotes=f"Error fallback: {str(e)[:100]}",
    ).model_dump()
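The fragment above can be placed in context as follows. This is a self-contained sketch. FallbackResponse is a hypothetical stand-in for GuviResponse with only the three fields used here, and handle_request with its process argument is an assumed shape for the handler.

```python
import logging
from typing import Any, Callable, Dict

from pydantic import BaseModel

logger = logging.getLogger(__name__)


class FallbackResponse(BaseModel):
    """Hypothetical stand-in for GuviResponse with the same three fields."""

    status: str
    reply: str
    agentNotes: str = ""


def handle_request(process: Callable[[str], str], message: str) -> Dict[str, Any]:
    """Run `process` and convert any failure into a fallback response."""
    try:
        reply = process(message)
    except Exception as e:
        # Inner layers log and re-raise; only the boundary swallows the error.
        logger.exception(f"Error processing request: {e}")
        return FallbackResponse(
            status="success",
            reply="Ek minute, network slow hai...",
            agentNotes=f"Error fallback: {str(e)[:100]}",
        ).model_dump()
    return FallbackResponse(status="success", reply=reply).model_dump()
```

Truncating the exception text to 100 characters keeps agentNotes short even when the underlying error message is long.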

Line Length and Formatting

  • Max line length: 100 characters
  • Formatter: ruff format
  • Linter: ruff check (rules: E, F, I)

# Check for issues
ruff check functions/ dashboard/ tests/

# Auto-fix
ruff check --fix functions/ dashboard/ tests/

# Format
ruff format functions/ dashboard/ tests/

Naming Conventions

| Element           | Convention       | Example                                      |
|-------------------|------------------|----------------------------------------------|
| Functions         | snake_case       | extract_upi_ids()                            |
| Variables         | snake_case       | session_id                                   |
| Classes           | PascalCase       | GuviRequest                                  |
| Constants         | UPPER_SNAKE_CASE | MAX_MESSAGES_PER_SESSION                     |
| Private functions | _snake_case      | _extract_evidence_from_full_conversation()   |
| File names        | snake_case       | regex_patterns.py                            |
| Test files        | test_*.py        | test_extractors.py                           |

Dashboard-Specific Conventions

  • st.set_page_config() must be the first Streamlit call in every page
  • require_auth() must be called immediately after st.set_page_config()
  • All Firestore reads go through dashboard/utils/firestore_client.py
  • All rendering uses shared functions from dashboard/utils/display.py
  • Use sys.path.insert for importing from utils/:

import os
import sys

# Make the dashboard root importable before importing from utils/.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from utils.auth import require_auth

Linter Configuration

The project uses ruff.toml at the repository root:

line-length = 100
target-version = "py311"

[lint]
select = ["E", "F", "I"]

  • E: pycodestyle errors
  • F: pyflakes (unused imports, undefined names)
  • I: isort (import ordering)