Session Management

ScamShield AI uses Firestore for persistent session storage across Firebase cold starts. Sessions accumulate evidence, track conversation state, and enable cross-session intelligence linking. When Firestore is unavailable, the system falls back to in-memory storage transparently.

Firestore Collections

erDiagram
    honeypot_sessions {
        string doc_id PK "sessionId from GUVI"
        string persona "Active persona name"
        string scam_type "Classified scam type"
        float confidence "Detection confidence 0.0-1.0"
        string state "Session state machine value"
        int message_count "Total messages exchanged"
        bool callback_sent "Whether callback was sent"
        map extracted_evidence "ExtractedIntelligence (14 fields)"
        array conversation_history "Full message history"
        string strategy_state "Self-correction state"
        int messages_since_evidence "Counter for stalled convos"
        bool high_value_extracted "UPI or bank obtained"
        string source "guvi or testing"
        string started_at "ISO 8601 timestamp"
        string created_at "ISO 8601 timestamp"
        string updated_at "ISO 8601 timestamp"
    }

    evidence_index {
        string doc_id PK "type:normalized_value"
        string type "upi, bank, phone, email"
        string value "Original evidence value"
        array sessions "Session IDs where found"
        array scam_types "Scam types from sessions"
        string first_seen "ISO 8601 timestamp"
        string last_seen "ISO 8601 timestamp"
        int total_occurrences "Number of unique sessions"
        string source "guvi or testing"
    }

    rate_limits {
        string doc_id PK "sessionId"
        int total_messages "Lifetime message count"
        int minute_key "Current minute epoch bucket"
        int minute_count "Messages in current minute"
        float first_request "Unix timestamp"
        float last_request "Unix timestamp"
    }

    honeypot_sessions ||--o{ evidence_index : "evidence linked via session IDs"

Collection Schemas

`honeypot_sessions`

The primary session collection. Document ID is the GUVI sessionId.

Field	Type	Default	Description
`persona`	string	`"sharma_uncle"`	Active persona name
`scam_type`	string	`null`	Classified scam type (e.g., `KYC_BANKING`)
`confidence`	float	`0.0`	Scam detection confidence
`state`	string	`"INITIAL"`	Session state machine value
`message_count`	int	`0`	Total messages exchanged
`callback_sent`	bool	`false`	Whether callback has been sent
`extracted_evidence`	map	`{}`	Nested map with 14 evidence arrays (see below)
`conversation_history`	array	`[]`	Full conversation: `{sender, text, timestamp}` per message
`strategy_state`	string	`"BUILDING_TRUST"`	Self-correction strategy state
`messages_since_evidence`	int	`0`	Turns since last new high-value evidence
`high_value_extracted`	bool	`false`	True if UPI or bank account found
`source`	string	`"guvi"`	Request source: `guvi` or `testing`
`started_at`	string	(auto)	Session start time (ISO 8601)
`created_at`	string	(auto)	Document creation time
`updated_at`	string	(auto)	Last modification time

extracted_evidence sub-fields (all List[str]):

bankAccounts, upiIds, phishingLinks, phoneNumbers, emailAddresses, suspiciousKeywords (max 15), ifscCodes, cryptoWallets, aadhaarNumbers, panNumbers, amounts, caseIds, policyNumbers, orderNumbers

`evidence_index`

Cross-session evidence index. Document ID is {type}:{normalized_value} (e.g., upi:fraud@oksbi).

Field	Type	Description
`type`	string	Evidence type: `upi`, `bank`, `phone`, `email`
`value`	string	Original evidence value
`sessions`	array	List of session IDs where this evidence appeared
`scam_types`	array	Scam types associated with sessions
`first_seen`	string	ISO 8601 timestamp of first occurrence
`last_seen`	string	ISO 8601 timestamp of most recent occurrence
`total_occurrences`	int	Count of unique sessions
`source`	string	Request source

Only high-value evidence types are indexed: UPI IDs, bank accounts, phone numbers, and email addresses.

`rate_limits`

Per-session rate limiting counters. Document ID is the sessionId.

Field	Type	Description
`total_messages`	int	Lifetime message count for this session
`minute_key`	int	Current minute bucket (`int(now / 60)`)
`minute_count`	int	Messages in the current minute bucket
`first_request`	float	Unix timestamp of first request
`last_request`	float	Unix timestamp of most recent request
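The per-minute bucketing can be illustrated with a small pure function. This is only a sketch under stated assumptions: `record_request` is a hypothetical helper, not the project's API, and it works on a plain dict shaped like a `rate_limits` document instead of calling Firestore.

```python
import time
from typing import Optional


def record_request(counters: Optional[dict], now: Optional[float] = None) -> dict:
    """Return updated rate-limit counters for one incoming message.

    Hypothetical helper: `counters` mirrors a `rate_limits` document,
    or is None when the session has no document yet.
    """
    now = time.time() if now is None else now
    minute_key = int(now / 60)  # same bucketing as the `minute_key` field
    if counters is None:
        # First request for this session: start every counter fresh.
        return {
            "total_messages": 1,
            "minute_key": minute_key,
            "minute_count": 1,
            "first_request": now,
            "last_request": now,
        }
    updated = dict(counters)
    updated["total_messages"] = counters["total_messages"] + 1
    if counters["minute_key"] == minute_key:
        # Still inside the same minute bucket.
        updated["minute_count"] = counters["minute_count"] + 1
    else:
        # A new minute started, so the per-minute counter resets.
        updated["minute_key"] = minute_key
        updated["minute_count"] = 1
    updated["last_request"] = now
    return updated
```

A caller would compare `minute_count` and `total_messages` against its own limits; those limit values are not specified here.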

Session State Machine

stateDiagram-v2
    [*] --> INITIAL: New session created

    INITIAL --> INITIAL: confidence <= 0.7
    INITIAL --> ENGAGING: confidence > 0.7

    ENGAGING --> ENGAGING: msg_count <= 5, no high-value evidence
    ENGAGING --> COMPLIANT: msg_count > 5, no high-value evidence
    ENGAGING --> EXTRACTION_SUCCESS: UPI or bank account extracted

    COMPLIANT --> COMPLIANT: msg_count <= 10, no high-value evidence
    COMPLIANT --> EXTRACTING: msg_count > 10
    COMPLIANT --> EXTRACTION_SUCCESS: UPI or bank account extracted

    EXTRACTING --> EXTRACTING: continued engagement
    EXTRACTING --> EXTRACTION_SUCCESS: UPI or bank account extracted

    EXTRACTION_SUCCESS --> EXTRACTION_SUCCESS: [terminal state]

State transitions are computed by Orchestrator._determine_state():

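def _determine_state(self, current_state, scam_type, confidence, message_count, evidence):
    """Return the next session state for the current turn."""
    # A new session leaves INITIAL only once the scam confidence exceeds 0.7.
    if current_state == "INITIAL":
        return "ENGAGING" if confidence > 0.7 else "INITIAL"
    # High-value evidence (UPI ID or bank account) takes priority over message counts.
    if evidence.upiIds or evidence.bankAccounts:
        return "EXTRACTION_SUCCESS"
    # Otherwise the state depends only on how long the conversation has run.
    if message_count > 10:
        return "EXTRACTING"
    if message_count > 5:
        return "COMPLIANT"
    return "ENGAGING"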

State	Meaning
`INITIAL`	New session, classification in progress
`ENGAGING`	Scam detected, building trust with scammer
`COMPLIANT`	Good engagement (more than 5 messages), continuing extraction
`EXTRACTING`	Long engagement (more than 10 messages), actively extracting
`EXTRACTION_SUCCESS`	High-value evidence (UPI/bank) successfully obtained
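To make the thresholds concrete, the following sketch walks a standalone copy of the transition logic through a few turns. The function name `determine_state` and the `SimpleNamespace` stand-ins for the evidence object are assumptions made for illustration; only the branching mirrors the method shown above.

```python
from types import SimpleNamespace


def determine_state(current_state, confidence, message_count, evidence):
    # Standalone copy of the branching in _determine_state (illustration only).
    if current_state == "INITIAL":
        return "ENGAGING" if confidence > 0.7 else "INITIAL"
    if evidence.upiIds or evidence.bankAccounts:
        return "EXTRACTION_SUCCESS"
    if message_count > 10:
        return "EXTRACTING"
    if message_count > 5:
        return "COMPLIANT"
    return "ENGAGING"


# Stand-ins for the evidence object: only the two high-value fields matter here.
empty = SimpleNamespace(upiIds=[], bankAccounts=[])
found = SimpleNamespace(upiIds=["fraud@oksbi"], bankAccounts=[])

trace = [
    determine_state("INITIAL", 0.5, 1, empty),      # stays INITIAL
    determine_state("INITIAL", 0.9, 2, found),      # ENGAGING: INITIAL is checked first
    determine_state("ENGAGING", 0.9, 5, empty),     # ENGAGING: 5 is not > 5
    determine_state("ENGAGING", 0.9, 6, empty),     # COMPLIANT
    determine_state("COMPLIANT", 0.9, 10, empty),   # COMPLIANT: 10 is not > 10
    determine_state("COMPLIANT", 0.9, 11, empty),   # EXTRACTING
    determine_state("EXTRACTING", 0.9, 12, found),  # EXTRACTION_SUCCESS
]
```

Note that a session in `INITIAL` moves to `ENGAGING` even when high-value evidence is already present, because the `INITIAL` check runs before the evidence check; `EXTRACTION_SUCCESS` is reached on a later turn.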

Strategy State Machine

Separate from the session state, the strategy state drives self-correction of the extraction approach:

stateDiagram-v2
    [*] --> BUILDING_TRUST

    BUILDING_TRUST --> EXTRACTING: msg_count >= 3 AND confidence > 0.6
    BUILDING_TRUST --> BUILDING_TRUST: still building rapport

    EXTRACTING --> DIRECT_PROBE: 4+ msgs without evidence, no high-value
    EXTRACTING --> PIVOTING: high-value evidence obtained

    DIRECT_PROBE --> BUILDING_TRUST: scammer disengaging (short responses)
    DIRECT_PROBE --> DIRECT_PROBE: 3+ msgs, vary tactics

    PIVOTING --> PIVOTING: continue extracting scammer identity

Strategy State	Behavior
`BUILDING_TRUST`	Cooperative, confused victim. Ask basic questions.
`EXTRACTING`	Start requesting payment details. Express willingness but demand verification.
`DIRECT_PROBE`	Direct approach: "Send your UPI ID so I can pay." Express urgency.
`PIVOTING`	Payment details obtained. Now extract scammer's personal info: name, ID, address, email.

NOT_SCAM bypass

When scam_type == "NOT_SCAM", the strategy stays at BUILDING_TRUST with natural conversation guidance. No extraction tactics are applied.
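The strategy transitions, including the NOT_SCAM bypass, could be sketched as a single pure function. This is a reading of the diagram and not the project's actual code: `next_strategy_state` and its parameters are hypothetical, and how "scammer disengaging" is detected is left to the caller as a boolean.

```python
def next_strategy_state(
    current: str,
    scam_type: str,
    message_count: int,
    confidence: float,
    messages_since_evidence: int,
    high_value_extracted: bool,
    scammer_disengaging: bool = False,
) -> str:
    """Hypothetical sketch of the strategy transitions in the diagram."""
    # NOT_SCAM bypass: never leave rapport building.
    if scam_type == "NOT_SCAM":
        return "BUILDING_TRUST"
    if current == "BUILDING_TRUST":
        if message_count >= 3 and confidence > 0.6:
            return "EXTRACTING"
        return "BUILDING_TRUST"
    if current == "EXTRACTING":
        if high_value_extracted:
            return "PIVOTING"
        if messages_since_evidence >= 4:
            return "DIRECT_PROBE"
        return "EXTRACTING"
    if current == "DIRECT_PROBE":
        # Back off to rapport building when the scammer starts disengaging.
        return "BUILDING_TRUST" if scammer_disengaging else "DIRECT_PROBE"
    # PIVOTING has no outgoing transitions in the diagram.
    return current
```

Keeping the function free of I/O makes each transition easy to unit-test in isolation.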

Session Lifecycle

sequenceDiagram
    participant H as Handler
    participant FS as Firestore
    participant EI as Evidence Index

    Note over H,EI: Turn 1

    H->>FS: get_session(id) → None
    H->>FS: save_session(id, initial_state)

    H->>H: Extract evidence, classify, generate response

    H->>FS: batch_update_session(id, {<br/>evidence, scam_type, confidence,<br/>state, persona, strategy, msg_count})
    H->>EI: store_evidence_index(id, evidence)

    Note over H,EI: Turn 2+

    H->>FS: get_session(id) → SessionState
    H->>H: Extract evidence (full conversation)
    H->>EI: find_matching_evidence(evidence)
    EI-->>H: cross_session_match

    H->>H: Classify, select persona, generate response

    H->>FS: batch_update_session(id, merged_updates)
    H->>EI: store_evidence_index(id, merged_evidence)

Key Implementation Details

Lazy initialization: The Firestore client is not created at import time. It is initialized on first use so that importing the module stays fast and does not time out while Firebase loads the function code:

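def _get_db():
    """Return the shared Firestore client, or None if Firestore is unavailable."""
    global _db, _firestore_available
    if _firestore_available is False:
        return None
    if _db is None:
        with _db_lock:
            # Double-checked locking: re-test inside the lock.
            if _db is None and _firestore_available is not False:
                try:
                    _db = firestore.client()
                except Exception:
                    # Assumed failure handling: mark Firestore unavailable so
                    # callers fall back to in-memory storage.
                    _firestore_available = False
    return _db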

In-memory fallback: If Firestore initialization fails, all operations fall back to a module-level _memory_sessions dict. The fallback is logged as a warning, and the system keeps operating. Sessions held only in memory do not survive a cold start.
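A minimal sketch of how a read and a write might route between Firestore and the in-memory dict follows. The signatures are assumptions for illustration: the `get_db` callable is passed in explicitly so the example is self-contained, whereas the real module presumably calls `_get_db()` directly. The Firestore calls used (`collection().document().set(..., merge=True)`, `.get()`, `.exists`, `.to_dict()`) are the standard client API.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Module-level fallback store, keyed by session ID.
_memory_sessions: Dict[str, Dict[str, Any]] = {}


def save_session(session_id: str, data: Dict[str, Any],
                 get_db: Callable[[], Optional[Any]]) -> str:
    """Write session fields; return which backend handled the write."""
    db = get_db()
    if db is None:
        logger.warning("Firestore unavailable; storing session %s in memory", session_id)
        # Shallow merge, mimicking set(merge=True) for top-level fields.
        _memory_sessions.setdefault(session_id, {}).update(data)
        return "memory"
    db.collection("honeypot_sessions").document(session_id).set(data, merge=True)
    return "firestore"


def get_session(session_id: str,
                get_db: Callable[[], Optional[Any]]) -> Optional[Dict[str, Any]]:
    """Read a session from Firestore, or from memory when Firestore is down."""
    db = get_db()
    if db is None:
        stored = _memory_sessions.get(session_id)
        return dict(stored) if stored is not None else None
    snapshot = db.collection("honeypot_sessions").document(session_id).get()
    return snapshot.to_dict() if snapshot.exists else None
```

Passing `lambda: None` as `get_db` exercises the fallback path without any Firestore dependency, which is convenient in tests.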

Batch writes: All session updates are combined into a single set(merge=True) call via batch_update_session(), replacing what was previously 4 separate writes (conversation, evidence, message count, session state).

Evidence accumulation: merge_evidence_locally() performs set union on all 14 evidence fields. The only field with a cap is suspiciousKeywords (limited to 15). For atomic server-side operations, accumulate_evidence() uses Firestore's ArrayUnion.
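The local union-with-cap behaviour could look roughly like the sketch below. `merge_evidence` is a hypothetical stand-in for `merge_evidence_locally()`, operating on plain dicts of string lists; preserving first-seen order and truncating to the first 15 keywords are assumptions, since the text only states that the result is a set union with a cap.

```python
from typing import Dict, List

KEYWORD_CAP = 15  # cap on suspiciousKeywords, as stated in the text


def merge_evidence(existing: Dict[str, List[str]],
                   new: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Union every evidence field, keeping first-seen order (assumed)."""
    merged: Dict[str, List[str]] = {}
    for field in sorted(set(existing) | set(new)):
        combined = existing.get(field, []) + new.get(field, [])
        # dict.fromkeys drops duplicates while preserving insertion order.
        values = list(dict.fromkeys(combined))
        if field == "suspiciousKeywords":
            values = values[:KEYWORD_CAP]
        merged[field] = values
    return merged
```

Because the merge is idempotent, re-extracting evidence from the full conversation on every turn does not create duplicates.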

Callback Trigger Conditions

Callbacks are sent to GUVI from turn 1 onward (CALLBACK_MIN_TURN = 1). The rationale:

GUVI's evaluator may stop at any turn (up to 10).
The callback endpoint uses updateHoneyPotFinalResult with overwrite semantics.
Sending every turn ensures the latest intelligence is always submitted.
A failed callback on one turn will be retried on the next turn with fresher data.

The original should_send_callback() function in orchestrator.py defines the legacy conditions (used for delayed callbacks):

Condition	Threshold
High-value evidence (UPI or bank) + engagement	`message_count >= 10`
High confidence + keywords	`confidence > 0.85` AND `keywords >= 3` AND `messages >= 8`
Long engagement	`message_count >= 10`
Already sent	Skip (idempotency)

Cross-Session Evidence Linking

flowchart TD
    subgraph "Session A (past)"
        A_EV["UPI: fraud@oksbi<br/>Phone: 9876543210"]
    end

    subgraph "Evidence Index"
        IDX_UPI["upi:fraud@oksbi<br/>sessions: [A]<br/>scam_types: [KYC_BANKING]"]
        IDX_PHONE["phone:9876543210<br/>sessions: [A]<br/>scam_types: [KYC_BANKING]"]
    end

    subgraph "Session B (current)"
        B_MSG["Scammer sends:<br/>'Send to fraud@oksbi'"]
        B_EXTRACT["Extract: fraud@oksbi"]
        B_LOOKUP["find_matching_evidence()"]
        B_RESULT["is_known_scammer: true<br/>total_matching_sessions: 1<br/>known_scam_types: [KYC_BANKING]"]
    end

    A_EV -->|"store_evidence_index()"| IDX_UPI
    A_EV -->|"store_evidence_index()"| IDX_PHONE

    B_MSG --> B_EXTRACT
    B_EXTRACT --> B_LOOKUP
    B_LOOKUP --> IDX_UPI
    IDX_UPI --> B_RESULT

Indexing

When evidence is stored via store_evidence_index(), each high-value item (UPI, bank, phone, email) gets its own document in evidence_index:

Document ID: {type}:{normalized_value} (lowercased, spaces removed)
Sessions array: Updated via ArrayUnion (race-safe)
Scam types array: Updated via ArrayUnion
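The document ID scheme can be sketched in a few lines. `evidence_doc_id` is a hypothetical helper name, and treating "spaces removed" as "all whitespace removed" is an assumption.

```python
def evidence_doc_id(evidence_type: str, value: str) -> str:
    """Build an evidence_index document ID of the form {type}:{normalized_value}."""
    # Lowercase, then drop all whitespace (assumed reading of "spaces removed").
    normalized = "".join(value.lower().split())
    return f"{evidence_type}:{normalized}"
```

Normalizing before building the ID means `Fraud@OKSBI` and `fraud@oksbi` resolve to the same index document, so their sessions are linked.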

Lookup

find_matching_evidence() takes the current session's evidence and queries the index for each UPI, bank, phone, and email. Results are aggregated:

matches: List of matching items with previous_sessions, occurrence_count, scam_types
total_matching_sessions: Count of unique previous sessions
known_scam_types: Union of all scam types from matching sessions
is_known_scammer: True if any UPI or bank account match has occurrence_count > 0
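The aggregation can be sketched against an in-memory stand-in for the index. Everything here is illustrative: `find_matches`, the `FIELD_TO_TYPE` mapping, and the choice to exclude the current session from `previous_sessions` are assumptions, and a plain dict replaces the Firestore lookups.

```python
from typing import Any, Dict, List

# Assumed mapping from evidence fields to index type prefixes.
FIELD_TO_TYPE = {
    "upiIds": "upi",
    "bankAccounts": "bank",
    "phoneNumbers": "phone",
    "emailAddresses": "email",
}


def find_matches(evidence: Dict[str, List[str]],
                 index: Dict[str, Dict[str, Any]],
                 current_session_id: str) -> Dict[str, Any]:
    """Aggregate index hits for the current session's evidence."""
    matches = []
    sessions = set()
    scam_types = set()
    known = False
    for field, ev_type in FIELD_TO_TYPE.items():
        for value in evidence.get(field, []):
            normalized = "".join(value.lower().split())
            doc = index.get(f"{ev_type}:{normalized}")
            if doc is None:
                continue
            # Assumption: the current session does not count as a previous one.
            previous = [s for s in doc["sessions"] if s != current_session_id]
            if not previous:
                continue
            matches.append({
                "type": ev_type,
                "value": value,
                "previous_sessions": previous,
                "occurrence_count": len(previous),
                "scam_types": list(doc["scam_types"]),
            })
            sessions.update(previous)
            scam_types.update(doc["scam_types"])
            # Only UPI and bank matches flag a known scammer.
            if ev_type in ("upi", "bank"):
                known = True
    return {
        "matches": matches,
        "total_matching_sessions": len(sessions),
        "known_scam_types": sorted(scam_types),
        "is_known_scammer": known,
    }
```

With an index holding `upi:fraud@oksbi` from session A, a lookup from session B for `fraud@oksbi` yields `is_known_scammer: True` and one matching session, as in the flowchart; a phone-only match is reported but does not set the flag.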

Impact on Processing

When is_known_scammer is True:

Confidence boost: +0.1 * min(match_count, 3), capped at 0.95
Aggressive prompt injection: Known scammer alert with match details, demanding employee ID, supervisor name, office address, email, callback number
Logging: Cross-session match details logged for audit
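The confidence boost is simple arithmetic and can be sketched as follows. `boosted_confidence` is a hypothetical name, and applying the 0.95 cap to the final value (rather than to the boost alone) is an assumption.

```python
def boosted_confidence(confidence: float, match_count: int) -> float:
    """Apply the known-scammer boost: +0.1 per match, at most 3 matches."""
    boost = 0.1 * min(match_count, 3)
    # Assumption: the 0.95 cap applies to the boosted result.
    return min(confidence + boost, 0.95)
```

For example, a confidence of 0.6 with two matches rises to about 0.8, while 0.9 with five matches is capped at 0.95.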