# System Architecture Overview
ScamShield AI is an AI-powered honeypot system that engages phone/SMS scammers in realistic conversations, wastes their time, and extracts actionable intelligence (UPI IDs, bank accounts, phone numbers, etc.). It runs as a pair of Firebase Cloud Functions (2nd gen) in asia-south1, backed by Firestore for session persistence and Google Gemini Flash for LLM-powered classification and response generation.
## Component Diagram
```mermaid
graph TB
    subgraph "External Systems"
        GUVI["GUVI Evaluator<br/>(sends scam messages)"]
        GEMINI["Google Gemini Flash<br/>(3-flash-preview / 2.0-flash)"]
        CT["Cloud Tasks<br/>(delayed callbacks)"]
    end

    subgraph "Firebase Cloud Functions (asia-south1)"
        MAIN["main.py<br/>guvi_honeypot"]
        DELAYED["main.py<br/>send_delayed_callback"]

        subgraph "Request Layer"
            HANDLER["guvi/handler.py<br/>Auth + Parse + Orchestrate"]
            MODELS["guvi/models.py<br/>Pydantic Request/Response"]
            CALLBACK["guvi/callback.py<br/>GUVI Callback Service"]
        end

        subgraph "Processing Engine"
            ORCH["engine/orchestrator.py<br/>Classify - Persona - Extract"]
            CTX["engine/context.py<br/>PipelineContext"]
        end

        subgraph "LLM Layer"
            CLIENT["gemini/client.py<br/>Gemini Client + Fallback"]
            CLASSIFIER["gemini/prompts/classifier.py<br/>Scam Classifier Prompt"]
            PERSONAS["gemini/prompts/personas/<br/>3 Indian Personas"]
        end

        subgraph "Evidence Extraction"
            REGEX["extractors/regex_patterns.py<br/>14 Regex Extractors"]
            KEYWORDS["extractors/keywords.py<br/>11 Keyword Categories"]
        end

        subgraph "Storage"
            SESSIONS["firestore/sessions.py<br/>Session CRUD + Evidence Index"]
        end

        subgraph "Security & Utils"
            SANITIZER["utils/sanitizer.py<br/>Prompt Injection Filter"]
            OIDC["utils/oidc.py<br/>Cloud Tasks OIDC Verify"]
            RATE["utils/rate_limiter.py<br/>Per-Session Rate Limits"]
            LOGGING["utils/logging.py<br/>Structured JSON Logging"]
        end

        subgraph "Task Scheduling"
            SCHEDULER["tasks/callback_scheduler.py<br/>Cloud Tasks Scheduler"]
        end
    end

    subgraph "Data Stores"
        FS_SESSIONS[("Firestore<br/>honeypot_sessions")]
        FS_EVIDENCE[("Firestore<br/>evidence_index")]
        FS_RATE[("Firestore<br/>rate_limits")]
    end

    GUVI -->|"POST /guvi_honeypot<br/>x-api-key header"| MAIN
    MAIN --> HANDLER
    HANDLER --> RATE
    HANDLER --> MODELS
    HANDLER --> ORCH
    HANDLER --> CALLBACK
    ORCH --> CTX
    ORCH --> CLIENT
    ORCH --> REGEX
    ORCH --> KEYWORDS
    CLIENT --> GEMINI
    CLIENT --> CLASSIFIER
    CLIENT --> PERSONAS
    CLIENT --> SANITIZER
    HANDLER --> SESSIONS
    SESSIONS --> FS_SESSIONS
    SESSIONS --> FS_EVIDENCE
    RATE --> FS_RATE
    CALLBACK -->|"POST /updateHoneyPotFinalResult"| GUVI
    SCHEDULER --> CT
    CT -->|"POST /send_delayed_callback<br/>OIDC Bearer token"| DELAYED
    DELAYED --> OIDC
    DELAYED --> SESSIONS
    DELAYED --> CALLBACK
```
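The diagram shows `gemini/client.py` sitting between the orchestrator and Gemini with a model fallback. The component table adds that it also uses a circuit breaker and a keyword fallback. The following is a minimal sketch of how those three layers can compose; it is not the client's actual code. `CircuitBreaker`, `generate_with_fallback`, the injected `call_model` callable (standing in for the real Gemini SDK call), and the thresholds (3 consecutive failures, 30 s cool-down) are all hypothetical.

```python
import time
from typing import Callable, Optional, Sequence

# Model order taken from the runtime configuration: primary first, then fallback.
MODELS = ("gemini-3-flash-preview", "gemini-2.0-flash")


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures and
    permits a trial call again once `reset_after` seconds have passed (half-open)."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


def generate_with_fallback(prompt: str,
                           call_model: Callable[[str, str], str],
                           models: Sequence[str],
                           breaker: CircuitBreaker,
                           keyword_fallback: Callable[[str], str]) -> str:
    """Try each model in order; use the keyword fallback if every model
    fails or the breaker is open."""
    if breaker.allow():
        for model in models:
            try:
                result = call_model(model, prompt)
            except Exception:
                breaker.record_failure()
                continue
            breaker.record_success()
            return result
    return keyword_fallback(prompt)
```

Putting the breaker in front of both models means that once Gemini as a whole is failing, requests skip straight to the keyword path instead of paying two timeouts per turn; whether the real client shares one breaker across models is not stated here.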
## Key Components
| Component | File | Purpose |
|---|---|---|
| Entry Point | `functions/main.py` | Exports the `guvi_honeypot` and `send_delayed_callback` Cloud Functions |
| Handler | `functions/guvi/handler.py` | API key auth, request parsing, orchestration, callback dispatch, response assembly |
| Models | `functions/guvi/models.py` | Pydantic models: `GuviRequest`, `GuviResponse`, `SessionState`, `ExtractedIntelligence`, `GuviCallbackPayload` |
| Orchestrator | `functions/engine/orchestrator.py` | Pipeline: classify → select persona → extract evidence → self-correct strategy → generate response |
| Pipeline Context | `functions/engine/context.py` | Priority-ordered prompt sections for language, edge cases, and turn-aware quality directives |
| Gemini Client | `functions/gemini/client.py` | LLM calls with automatic model fallback (primary: `gemini-3-flash-preview`, fallback: `gemini-2.0-flash`), a circuit breaker, and a keyword fallback |
| Classifier Prompt | `functions/gemini/prompts/classifier.py` | Few-shot scam classification prompt covering 12 scam types |
| Personas | `functions/gemini/prompts/personas/` | 3 culturally authentic Indian personas with two-tier selection |
| Regex Extractors | `functions/extractors/regex_patterns.py` | 14 regex-based extractors for UPI IDs, bank accounts, phone numbers, emails, URLs, amounts, IFSC codes, Aadhaar numbers (Verhoeff-validated), PAN, crypto wallets, case IDs, and policy/order numbers |
| Keyword Extractor | `functions/extractors/keywords.py` | 11 weighted keyword categories with pre-compiled regex patterns |
| Session Storage | `functions/firestore/sessions.py` | Firestore CRUD with in-memory fallback, atomic `ArrayUnion` evidence accumulation, and a cross-session evidence index |
| Callback Service | `functions/guvi/callback.py` | Sends `GuviCallbackPayload` to GUVI's `updateHoneyPotFinalResult` endpoint with retry and a circuit breaker |
| Callback Scheduler | `functions/tasks/callback_scheduler.py` | Schedules and cancels Cloud Tasks that send a delayed callback after 10 s of inactivity |
| Sanitizer | `functions/utils/sanitizer.py` | Strips 12 prompt-injection patterns and truncates input to 2,000 characters |
| OIDC Verifier | `functions/utils/oidc.py` | Verifies Cloud Tasks OIDC Bearer tokens against Google's public keys |
| Rate Limiter | `functions/utils/rate_limiter.py` | Per-session Firestore transactional counters: 100 messages total, 10 per minute |
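The regex extractor row notes that Aadhaar matches are validated with the Verhoeff checksum. Below is a self-contained sketch of that check using the standard Verhoeff dihedral-group tables, followed by a hypothetical extractor that only reports candidates passing the checksum. The candidate pattern (12 digits, first digit 2 to 9, optional space or hyphen between groups of four) and the helper names are assumptions for illustration, not the contents of `regex_patterns.py`.

```python
import re

# Standard Verhoeff tables: D is multiplication in the dihedral group D5,
# P is the position-dependent permutation, INV the group inverse.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_valid(number: str) -> bool:
    """True if the last digit of `number` is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(number: str) -> str:
    """Check digit to append to `number`."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


# Hypothetical candidate pattern: 12 digits, first digit 2-9, optional separators.
AADHAAR_CANDIDATE = re.compile(r"\b[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}\b")


def extract_aadhaar(text: str) -> list[str]:
    """Return unique, checksum-valid 12-digit candidates found in `text`."""
    found: list[str] = []
    for match in AADHAAR_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if verhoeff_valid(digits) and digits not in found:
            found.append(digits)
    return found
```

Running the checksum after the regex match keeps arbitrary 12-digit strings (order numbers, amounts) from being reported as Aadhaar numbers, because Verhoeff detects every single-digit error and every adjacent transposition.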
## Runtime Configuration
| Setting | Value | Source |
|---|---|---|
| Region | `asia-south1` (Mumbai) | Function decorator |
| Runtime | Python 3.11 | Firebase config |
| Memory | 512 MB (`guvi_honeypot`), 256 MB (`send_delayed_callback`) | Function decorator |
| Timeout | 60 s (`guvi_honeypot`), 30 s (`send_delayed_callback`) | Function decorator |
| Primary LLM | `gemini-3-flash-preview` | `gemini/client.py` |
| Fallback LLM | `gemini-2.0-flash` | `gemini/client.py` |
| Callback delay | 10 seconds | `tasks/callback_scheduler.py` |
| Rate limit (total) | 100 messages/session | `utils/rate_limiter.py` |
| Rate limit (burst) | 10 messages/minute | `utils/rate_limiter.py` |
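The two rate limits above can be sketched in memory as follows. This is an illustration under stated assumptions, not the code in `utils/rate_limiter.py`: the burst limit is modeled as a fixed one-minute window (the source does not say whether the window is fixed or sliding), rejected messages count toward neither limit, and a `threading.Lock` stands in for the Firestore transaction that makes the real read-modify-write atomic. `SessionRateLimiter` is a hypothetical name.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class _Counter:
    total: int = 0
    window_start: float = 0.0
    window_count: int = 0


class SessionRateLimiter:
    """In-memory stand-in for the per-session Firestore counters
    (assumed fixed 60-second window for the burst limit)."""

    def __init__(self, max_total: int = 100, max_per_minute: int = 10,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_total = max_total
        self.max_per_minute = max_per_minute
        self.clock = clock
        self._lock = threading.Lock()  # plays the role of the Firestore transaction
        self._counters: dict[str, _Counter] = {}

    def allow(self, session_id: str) -> bool:
        """Count one message for `session_id`; return False if either limit is hit."""
        now = self.clock()
        with self._lock:
            c = self._counters.setdefault(session_id, _Counter(window_start=now))
            if now - c.window_start >= 60:
                # Start a new one-minute window.
                c.window_start, c.window_count = now, 0
            if c.total >= self.max_total or c.window_count >= self.max_per_minute:
                return False  # rejected messages are not counted
            c.total += 1
            c.window_count += 1
            return True
```

Keeping both counters in one record means a single transaction can check and increment them together, so two concurrent requests cannot both slip past the tenth slot in a minute.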
## Secrets
All secrets are stored in GCP Secret Manager and injected at function deploy time:
| Secret | Used By | Purpose |
|---|---|---|
| `GEMINI_API_KEY` | `gemini/client.py` | Authenticates Gemini API calls |
| `SCAMSHIELD_API_KEY` | `guvi/handler.py`, `guvi/callback.py` | Inbound auth (the key GUVI sends to us) and outbound auth (the key we send to GUVI) |
**Production behavior:** When `K_SERVICE` is set (i.e., the code is running on the Cloud Functions runtime) and `SCAMSHIELD_API_KEY` is missing, all requests are denied. In local development (`K_SERVICE` unset), a missing key does not block anything: all requests are allowed.
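The production/local rule can be expressed as a small predicate. This is a sketch, not the handler's code: the function name, the plain-mapping arguments, and the use of `hmac.compare_digest` for a constant-time comparison are assumptions. It also assumes that when a key *is* configured locally, it is still enforced.

```python
import hmac
import os
from typing import Mapping


def is_authorized(headers: Mapping[str, str],
                  env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper mirroring the documented API-key rules."""
    expected = env.get("SCAMSHIELD_API_KEY")
    if not expected:
        # No key configured: fail closed on Cloud Functions (K_SERVICE is set),
        # allow everything in local development.
        return "K_SERVICE" not in env
    # Assumes `headers` is case-insensitive in production (e.g. Flask's request.headers).
    provided = headers.get("x-api-key", "")
    # Constant-time comparison so response timing does not reveal matching prefixes.
    return hmac.compare_digest(provided.encode(), expected.encode())
```

Failing closed when the key is missing in production means a misconfigured deploy rejects traffic instead of silently exposing the honeypot endpoint.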
## Data Flow Summary
- **Inbound:** GUVI sends a `POST /guvi_honeypot` with the scammer message, conversation history, and metadata.
- **Auth:** The API key is validated via the `x-api-key` header.
- **Rate check:** A per-session Firestore transactional counter enforces 100 messages total and 10 per minute.
- **Session:** State is loaded from Firestore (or created fresh), so it survives cold starts.
- **Evidence:** Regex extractors run on all scammer messages in the full conversation, not just the current turn.
- **Cross-session:** The evidence index is queried for known scammer fingerprints (UPI ID, bank account, phone, email).
- **Classification:** Gemini classifies the scam type with a confidence score; the keyword fallback is used if the LLM call fails.
- **Persona:** Selected in two tiers: a base mapping by scam type, then a language override (with exemptions).
- **Response:** Gemini generates an in-character reply using turn-aware quality directives and a self-correction strategy.
- **Persist:** A single batched Firestore write stores session state, evidence, and conversation history.
- **Callback:** From turn 1 onward, the accumulated intelligence is sent to GUVI's `updateHoneyPotFinalResult` endpoint (overwrite semantics).
- **Return:** A `GuviResponse` with `reply`, `scamDetected`, `extractedIntelligence`, `engagementMetrics`, and `agentNotes`.
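The evidence and persist steps imply that extraction reruns over every scammer message each turn and that new findings are merged into what earlier turns already stored. A minimal sketch of that idea follows, with two deliberately simplified patterns. The message shape (`sender`/`text` keys), both regexes, and the `Intelligence` dataclass are hypothetical; `_union` only imitates, in memory, the behavior of Firestore's `ArrayUnion`, which adds only elements not already present.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified patterns (the real service has 14 extractors).
# UPI: local part, "@", then a letters-only handle that is not followed by
# ".something", so "name@gmail.com" is left to an email extractor.
UPI_RE = re.compile(r"\b[\w.-]{2,256}@[a-zA-Z]{2,64}(?!\.?\w)")
# Indian mobile: optional +91, then 10 digits starting with 6-9.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")


@dataclass
class Intelligence:
    upi_ids: list[str] = field(default_factory=list)
    phone_numbers: list[str] = field(default_factory=list)


def _union(existing: list[str], new: list[str]) -> list[str]:
    """Append only values not already present, keeping order (like ArrayUnion)."""
    merged = list(existing)
    for value in new:
        if value not in merged:
            merged.append(value)
    return merged


def extract_from_conversation(messages: list[dict], prior: Intelligence) -> Intelligence:
    """Re-scan every scammer message and merge the results into `prior`."""
    scammer_text = "\n".join(m["text"] for m in messages if m.get("sender") == "scammer")
    upi_ids = [u.lower() for u in UPI_RE.findall(scammer_text)]
    # Normalize "+91 9876543210"-style matches to the bare 10-digit number.
    phones = [re.sub(r"\D", "", p)[-10:] for p in PHONE_RE.findall(scammer_text)]
    return Intelligence(
        upi_ids=_union(prior.upi_ids, upi_ids),
        phone_numbers=_union(prior.phone_numbers, phones),
    )
```

Because the merge is idempotent, rescanning the whole conversation every turn is safe: identifiers seen before are not duplicated, and the callback can resend the full accumulated set with overwrite semantics.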