Coding Standards
This document defines the Python coding standards for ScamShield AI. All code in functions/, dashboard/, and tests/ must follow these conventions. Standards are enforced by ruff (linting and formatting).
Python Version
Python 3.11 is the target runtime. Annotate with the typing module (Optional[X], List[X], Dict[K, V]) rather than the X | Y union syntax available since Python 3.10; typing.Optional is the project convention.
Import Order
Imports follow three groups separated by blank lines: stdlib, third-party, local. Enforced by ruff's isort rules.
import logging
import os
from typing import Dict, List, Optional

import httpx
from pydantic import BaseModel, Field

from guvi.models import GuviRequest, GuviResponse
from utils.logging import setup_logging, timed
Rules:
- Absolute imports only (no relative imports like from . import)
- One import per line for from X import Y when importing multiple names
- from __future__ import annotations is not used in this project
Type Hints
Required on all public function signatures. Use Optional[X] for nullable parameters and return values.
from typing import Any, Dict, List, Optional


def extract_upi_ids(text: str) -> List[str]:
    """Extract UPI IDs from text."""
    ...


def get_session(session_id: str) -> Optional[Dict[str, Any]]:
    """Get session from Firestore, or None if not found."""
    ...


def process_message(
    session: SessionState,
    message: str,
    history: List[Dict],
    metadata: GuviMetadata,
) -> ProcessingResult:
    """Process a scammer message and return the result."""
    ...
Type hints are not strictly required on private helper functions (prefixed with _), but they are encouraged.
Docstrings
Google style. Required on all public functions and classes.
def sanitize_message(text: str) -> str:
    """Sanitize a scammer message before embedding it in a Gemini prompt.

    Args:
        text: Raw scammer message text.

    Returns:
        Sanitized text with injection patterns replaced by [FILTERED].
    """
    ...


class Orchestrator:
    """Main orchestration engine.

    Coordinates classification, persona response, and evidence extraction.
    """

    ...
For simple one-line functions, a single-line docstring is acceptable:
def _sanitize_task_name(session_id: str) -> str:
    """Sanitize session ID for use in Cloud Tasks task name."""
    ...
Pydantic Models
GUVI-Facing Models (API boundary)
Use camelCase field names to match the GUVI API specification:
class ExtractedIntelligence(BaseModel):
    """Intelligence extracted during the honeypot conversation."""

    bankAccounts: List[str] = Field(
        default_factory=list,
        description="Extracted bank account numbers (9-18 digits)",
    )
    upiIds: List[str] = Field(
        default_factory=list,
        description="Extracted UPI IDs (format: name@bank)",
    )
Internal Models
Use snake_case field names:
class SessionState(BaseModel):
    """Internal session state stored in Firestore."""

    guvi_session_id: str
    scam_type: Optional[str] = None
    confidence: float = 0.0
    message_count: int = 0
Conventions
- Use Field(...) for required fields with descriptions
- Use Field(default_factory=list) for mutable defaults (never = [])
- Use @field_validator for custom validation logic
- Use model_dump() and model_validate() (not the deprecated .dict() / .parse_obj())
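The four conventions can be seen working together in a single model. The following is a minimal sketch, assuming Pydantic v2; EvidenceExample and its normalization rule are hypothetical and exist only for illustration, not as part of the ScamShield codebase.

```python
from typing import List

from pydantic import BaseModel, Field, field_validator


class EvidenceExample(BaseModel):
    """Hypothetical model used only to illustrate the conventions."""

    # Required field: Field(...) with a description.
    sessionId: str = Field(..., description="Session identifier")
    # Mutable default: default_factory, never `= []`.
    upiIds: List[str] = Field(
        default_factory=list,
        description="Extracted UPI IDs (format: name@bank)",
    )

    @field_validator("upiIds")
    @classmethod
    def normalize_upi_ids(cls, value: List[str]) -> List[str]:
        """Strip whitespace and lowercase each ID (illustrative rule only)."""
        return [item.strip().lower() for item in value]


# model_validate() builds a model from raw data; model_dump() serializes it.
example = EvidenceExample.model_validate(
    {"sessionId": "abc-123", "upiIds": [" Name@Bank "]}
)
payload = example.model_dump()
```

Here payload is a plain dict with the validator already applied, which is the form a handler would return at the API boundary.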
HTTP Client
Use httpx for all HTTP calls in functions/. The requests library is not a project dependency.
import httpx

response = httpx.post(
    url,
    json=payload,
    headers={"Content-Type": "application/json", "x-api-key": api_key},
    timeout=10.0,
)
response.raise_for_status()
For the callback service, which makes multiple calls, use httpx.Client for connection pooling:
self.client = httpx.Client(timeout=30.0)
response = self.client.post(url, json=payload, headers=headers)
Logging
Use the structured JSON logger. Never use print().
import logging

logger = logging.getLogger(__name__)

# Info: session-level events
logger.info(f"Processing request: session={session_id}")

# Warning: recoverable issues
logger.warning(f"Rate limited: session={session_id}, reason={reason}")

# Error: failures
logger.error(f"Callback failed for session={session_id}")

# Exception: errors with stack traces (call inside an except block, where e is bound)
logger.exception(f"Handler error: {e}")
What to Log
| Level | Log | Do Not Log |
|---|---|---|
| INFO | Session IDs, scam types, evidence type counts ("found 2 UPI IDs") | Full scammer messages, evidence values, API keys |
| WARNING | Rate limiting, recoverable failures, missing config | PII, secrets |
| ERROR | Unrecoverable failures, callback failures | Secret values |
| DEBUG | Evidence values (for local troubleshooting only) | Secrets |
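One way to follow the INFO and DEBUG rows is to log only per-type counts at INFO and keep the values themselves at DEBUG. The sketch below assumes evidence arrives as a dict of lists; summarize_evidence and log_evidence are hypothetical helpers, not existing project functions.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def summarize_evidence(evidence: Dict[str, List[str]]) -> str:
    """Return per-type counts only, never the evidence values themselves."""
    return ", ".join(f"{kind}={len(values)}" for kind, values in sorted(evidence.items()))


def log_evidence(session_id: str, evidence: Dict[str, List[str]]) -> None:
    """Log counts at INFO; log values only at DEBUG for local troubleshooting."""
    summary = summarize_evidence(evidence)
    logger.info(f"Evidence extracted: session={session_id}, {summary}")
    logger.debug(f"Evidence values: session={session_id}, values={evidence}")
```

With the log level at INFO or above, the DEBUG line is dropped and no evidence value reaches the logs.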
Performance Tracking
Use the @timed() decorator for performance-sensitive functions:
from utils.logging import timed


@timed("gemini.classify")
def classify_scam(text: str) -> str:
    """Classify the scam type using Gemini."""
    ...
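The implementation of timed lives in utils/logging.py and is not reproduced in this document. As a rough sketch of how a decorator with this call shape could work (an assumption, not the project's actual code), the following measures wall-clock time with time.perf_counter and logs it under the given label.

```python
import functools
import logging
import time
from typing import Any, Callable, TypeVar

logger = logging.getLogger(__name__)
F = TypeVar("F", bound=Callable[..., Any])


def timed(label: str) -> Callable[[F], F]:
    """Hypothetical stand-in for utils.logging.timed: log elapsed time per call."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failures are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info(f"timing: label={label}, elapsed_ms={elapsed_ms:.1f}")

        return wrapper  # type: ignore[return-value]

    return decorator


@timed("example.add")
def add(a: int, b: int) -> int:
    """Toy function used only to exercise the decorator."""
    return a + b
```

functools.wraps keeps the wrapped function's name and docstring intact, so the docstring requirement above still holds for decorated functions.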
Error Handling
- Use structured logging in except blocks, never bare print()
- Re-raise exceptions after logging unless you have a specific recovery strategy
- Use logger.exception() in except blocks (includes the stack trace automatically)
- Return typed error responses at API boundaries
try:
    result = gemini_client.classify(text)
except Exception:
    logger.exception("Classification failed")
    raise
At the API boundary (handler), errors are caught and converted to graceful fallback responses:
except Exception as e:
    logger.exception(f"Error processing request: {e}")
    return GuviResponse(
        status="success",
        reply="Ek minute, network slow hai...",
        agentNotes=f"Error fallback: {str(e)[:100]}",
    ).model_dump()
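Seen end to end, the boundary pattern looks roughly like the sketch below. FallbackResponse is a stdlib dataclass standing in for GuviResponse, and handle_request and process are hypothetical names; only the shape of the except clause is taken from the fragment above.

```python
import logging
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


@dataclass
class FallbackResponse:
    """Stand-in for GuviResponse so the sketch runs without Pydantic."""

    status: str
    reply: str
    agentNotes: str


def handle_request(
    payload: Dict[str, Any],
    process: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run `process`; on any failure, log it and return a fallback reply."""
    try:
        return process(payload)
    except Exception as e:
        logger.exception(f"Error processing request: {e}")
        return asdict(
            FallbackResponse(
                status="success",
                reply="Ek minute, network slow hai...",
                # Truncate so a long error message cannot bloat the response.
                agentNotes=f"Error fallback: {str(e)[:100]}",
            )
        )
```

Unlike the inner layers, which log and re-raise, the handler is the one place where the exception is swallowed, because the caller still has to receive a well-formed response.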
Line Length and Formatting
- Max line length: 100 characters
- Formatter: ruff format
- Linter: ruff check (rules: E, F, I)
# Check for issues
ruff check functions/ dashboard/ tests/
# Auto-fix
ruff check --fix functions/ dashboard/ tests/
# Format
ruff format functions/ dashboard/ tests/
Naming Conventions
| Element | Convention | Example |
|---|---|---|
| Functions | snake_case | extract_upi_ids() |
| Variables | snake_case | session_id |
| Classes | PascalCase | GuviRequest |
| Constants | UPPER_SNAKE_CASE | MAX_MESSAGES_PER_SESSION |
| Private functions | _snake_case | _extract_evidence_from_full_conversation() |
| File names | snake_case | regex_patterns.py |
| Test files | test_*.py | test_extractors.py |
Dashboard-Specific Conventions
- st.set_page_config() must be the first Streamlit call in every page
- require_auth() must be called immediately after st.set_page_config()
- All Firestore reads go through dashboard/utils/firestore_client.py
- All rendering uses shared functions from dashboard/utils/display.py
- Use sys.path.insert for importing from utils/:
import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from utils.auth import require_auth
Linter Configuration
The project uses ruff.toml at the repository root, with these rule sets selected:
- E: pycodestyle errors
- F: pyflakes (unused imports, undefined names)
- I: isort (import ordering)