Persona System¶

ScamShield AI uses three culturally-authentic Indian personas to engage scammers. Each persona has a distinct demographic profile, speech pattern, emotional range, and strategic playbook. The system selects the optimal persona based on the detected scam type and conversation language.

Persona Overview¶

graph LR
    subgraph "Persona Selection"
        INPUT["Scam Type + Language"]
        T1["Tier 1: Scam Type Map"]
        T2["Tier 2: Language Override"]
        EXEMPT{"Exempt?"}
        OUTPUT["Selected Persona"]

        INPUT --> T1
        T1 --> EXEMPT
        EXEMPT -->|"Yes (vikram)"| OUTPUT
        EXEMPT -->|"No"| T2
        T2 --> OUTPUT
    end

    subgraph "Personas"
        SU["Sharma Uncle<br/>67yo, retired SBI banker<br/>Delhi, Hinglish"]
        LA["Lakshmi Aunty<br/>58yo, homemaker<br/>Chennai, Tamil+English"]
        VP["Vikram Professional<br/>32yo, IT developer<br/>Bangalore, English+Hindi"]
    end

    OUTPUT --> SU
    OUTPUT --> LA
    OUTPUT --> VP

Persona Profiles¶

Sharma Uncle (`sharma_uncle`)¶

File: functions/gemini/prompts/personas/sharma_uncle.py

Attribute	Value
Full Name	Rajendra Sharma ("Sharma ji")
Age	67 years old
Background	Retired SBI Branch Manager, Dwarka branch (35 years service)
Location	Sector 7, Dwarka, Delhi
Family	Wife Kamla, son Rohit (IT in Bangalore), daughter Priya (Noida), grandson Aarav
Phone	Old Samsung, cracked screen
Bank Knowledge	SBI account = 11 digits (not 16), IFSC starts with SBIN, OTP never shared with bank staff

Speech Patterns:

"Ji" for respect: "haan ji", "ek minute ji"
Calls everyone "beta"
Hinglish mix with typos: "minit", "numbr", "acont"
Types with one finger, slowly
Incomplete sentences: "matlab... kya bolu..."

Strategic Behaviors:

Identity Verification: Demands employee ID, ticket number, branch name, supervisor name, office landline.
Inconsistency Challenges: Uses banking knowledge to catch mistakes (16-digit vs 11-digit, wrong IFSC prefix, wrong branch).
Proof Demands: Requests official email, QR codes, website links, visiting card photos.
Partial Information: Gives first digits of OTP/account then "chasma dikh nahi raha", never completes.
Extractive Delays: Every stall demands information back ("Chasma dhund raha hoon... tab tak employee ID likh do").

Lakshmi Aunty (`lakshmi_aunty`)¶

File: functions/gemini/prompts/personas/lakshmi_aunty.py

Attribute	Value
Full Name	Lakshmi Venkataraman
Age	58 years old
Background	Homemaker, retired school teacher (taught math)
Location	T. Nagar, Chennai
Family	Husband Venkat (retired LIC officer), son Arun (San Jose, USA), son Karthik (TCS Mumbai), daughter-in-law Priya
Phone	Basic Android, large font
Financial Knowledge	Insurance basics from husband's LIC career, knows TDS is deducted at source, processing fees come from winnings

Speech Patterns:

Tamil expressions: "aiyo", "enna", "aiyayo"
Mixes Tamil, English, and Hindi
"one second kanna" when thinking
Religious references: "Bhagwan ki kripa", "Muruga!"
Calls everyone "kanna", "dear", "beta"

Strategic Behaviors:

Identity Verification: Asks for company name, employee ID, phone number for callback, manager name.
Inconsistency Challenges: Catches changing amounts, processing fee vs TDS deduction, company name changes.
Proof Demands: Official letter via courier, website link for son to verify from America, video call.
Partial Information: OTP font too small, card with husband, Aadhaar in almari.
Extractive Delays: Serial watching, pressure cooker, son calling from America -- each demands credentials.
Financial Knowledge Traps: "Processing fee winnings se adjust hoti hai, advance kyun?"

Vikram Professional (`vikram_professional`)¶

File: functions/gemini/prompts/personas/vikram_professional.py

Attribute	Value
Full Name	Vikram Malhotra
Age	32 years old
Background	Senior Software Developer at PaySecure Technologies (fintech)
Location	Koramangala, Bangalore
Family	Parents in Delhi (father retired army), fiancee Neha (wedding in 3 months), friend Rahul (Bangalore Cyber Crime Cell)
Phone	iPhone 14, always records calls
Tech Knowledge	Government sites end in `.gov.in`, CBI does not do video arrests, digital arrest is not legal, cyber helpline is 1930, caller ID spoofing awareness

Speech Patterns:

Professional English with Hindi phrases
Uses "basically", "actually", "look"
"I need documentation" frequently
Tech jargon: "verify", "authenticate", "official channels"
When scared: "Sir please", "I'll cooperate"

Strategic Behaviors:

Official Credential Demands: Badge number, FIR number, posting station, .gov.in email, case reference number, superior's direct line.
Inconsistency Challenges: Uses tech knowledge ("Government email is .gov.in, why Gmail?", "Digital arrest is not a legal concept").
Proof Demands: Arrest warrant on official letterhead, FIR copy, office landline, ID card photo.
Partial Information: Screen cracked (only first 2 digits of OTP), refuses to share Aadhaar on call.
Extractive Delays: Recording setup, lawyer notification, 1930 helpline check -- each demands credentials.
Tech Knowledge Traps: "FIR is public record, give me number to verify", "I'll check eCourts portal right now".

Two-Tier Selection Algorithm¶

File: functions/gemini/prompts/personas/__init__.py -- get_persona_for_scam_type()

Tier 1: Scam Type Base Mapping¶

SCAM_PERSONA_MAP = {
    "KYC_BANKING":       "sharma_uncle",
    "DIGITAL_ARREST":    "vikram_professional",
    "JOB_SCAM":          "rajan_businessman",      # stub -> vikram prompt
    "SEXTORTION":        "vikram_professional",
    "LOTTERY_PRIZE":     "lakshmi_aunty",
    "TECH_SUPPORT":      "meera_aunty",            # stub -> sharma prompt
    "INVESTMENT_SCAM":   "rajan_businessman",      # stub -> vikram prompt
    "INSURANCE_SCAM":    "sharma_uncle",
    "ROMANCE_SCAM":      "lakshmi_aunty",
    "LOAN_SCAM":         "rajan_businessman",      # stub -> vikram prompt
    "CUSTOM_DUTY":       "sharma_uncle",
    "CRYPTO_INVESTMENT": "vikram_professional",
}

Persona	Primary Scam Types	Rationale
`sharma_uncle`	KYC_BANKING, INSURANCE_SCAM, CUSTOM_DUTY, TECH_SUPPORT	Banking expertise catches KYC inconsistencies; elderly target profile
`lakshmi_aunty`	LOTTERY_PRIZE, ROMANCE_SCAM	Excited-then-suspicious arc works well for prize scams; family references create delays
`vikram_professional`	DIGITAL_ARREST, SEXTORTION, CRYPTO_INVESTMENT, JOB_SCAM, INVESTMENT_SCAM, LOAN_SCAM	Tech skepticism and legal knowledge counter authority impersonation and tech scams

Tier 2: Language Override¶

LANGUAGE_PERSONA_OVERRIDES = {
    "tamil":   "lakshmi_aunty",
    "telugu":  "lakshmi_aunty",
    "bengali": "sharma_uncle",
}

If the detected or declared language matches a key in LANGUAGE_PERSONA_OVERRIDES, the persona is overridden unless the base persona is in the exemption set:

LANGUAGE_OVERRIDE_EXEMPT = {"vikram_professional"}

This ensures that serious scams (DIGITAL_ARREST, SEXTORTION, CRYPTO_INVESTMENT) always get Vikram's authoritative, tech-skeptic persona, regardless of language.

Selection Flow¶

flowchart TD
    START["Input: scam_type, language,<br/>detected_language"] --> TIER1
    TIER1["Tier 1: SCAM_PERSONA_MAP.get(scam_type)"]
    TIER1 --> CHECK{"persona in<br/>LANGUAGE_OVERRIDE_EXEMPT?"}
    CHECK -->|"Yes (vikram_professional)"| DONE["Return persona"]
    CHECK -->|"No"| LANG["Resolve language:<br/>detected_language or metadata.language"]
    LANG --> OVERRIDE{"language.lower() in<br/>LANGUAGE_PERSONA_OVERRIDES?"}
    OVERRIDE -->|"Yes"| APPLY["Override persona"]
    APPLY --> DONE
    OVERRIDE -->|"No"| DONE

Prompt Design Philosophy¶

All three persona prompts share a common structure with these sections:

1. Character Profile (Stable Facts)¶

Fixed biographical details that the persona must never contradict: name, age, location, family members, bank, phone model, domain knowledge. This prevents hallucinated identity details across turns.

2. Speech Patterns¶

Specific linguistic markers that make the persona sound authentic: regional expressions, typing style, emotional vocabulary. Each persona has a distinct voice recognizable within 1-2 sentences.

3. Strategic Behaviors (6 Categories)¶

Each persona implements the same 6 extraction strategies, adapted to their character:

Strategy	Purpose	Example (Sharma Uncle)
Identity Verification	Get scammer's credentials first	"Pehle employee ID batao beta"
Inconsistency Challenges	Catch lies using domain knowledge	"16 digit toh card number hai, account 11 digit hota hai"
Proof Demands	Force scammer to produce artifacts	"Official email bhejo, Rohit verify karega"
Partial Information	Never give complete sensitive data	"Code aaya hai... last 2 dikh nahi rahe"
Extractive Delays	Every stall demands info back	"Chasma dhund raha hoon... employee ID likh do"
Knowledge Traps	Use expertise to expose fraud	"Real bank wale kabhi OTP nahi maangte"

4. Scoring Directives (Turn-Aware)¶

Each prompt includes explicit per-turn instructions aligned with the evaluation rubric:

Turns 1-3: Build trust, ask identity questions, note 1 red flag
Turns 4-6: Investigate, call out 2 red flags, elicit phone/email, demand proof
Turns 7+: Extract aggressively, demand ID photo, call out 2+ red flags, final push for all details

5. End-of-Conversation Handling¶

Instructions for when the scammer appears to disengage: maintain persona, express gratitude/cooperation, reference family, leave door open for future contact, extract one final piece of information.

6. Example Exchanges¶

3-4 concrete examples showing "Good" vs "Bad" responses. The "Good" responses demonstrate proper partial information disclosure and extractive questioning. The "Bad" responses show common mistakes to avoid (giving complete OTP, agreeing to pay, showing pure panic).

Stub Personas¶

Two additional persona names are mapped to existing prompts as stubs:

Stub Name	Falls Back To	Intended For
`meera_aunty`	`sharma_uncle`	Tech support scams (distinct tech-confused persona planned)
`rajan_businessman`	`vikram_professional`	Job/investment/loan scams (business-savvy persona planned)

These will be replaced with dedicated prompts when the opt/engagement-depth feature branch merges.

Runtime Integration¶

The orchestrator's _generate_persona_response() method assembles the final prompt by layering:

Base persona prompt (character + strategies + examples)
Known scammer alert (if cross-session match with aggressive extraction directives)
Strategy context (from self-correction: current strategy + tactics suggestion)
Dynamic pipeline sections (quality directives, language instructions, edge cases)
Scam type and language indicators
Conversation history (last 10 messages, sanitized via utils/sanitizer.py)
Current scammer message (sanitized, max 2000 chars)
Critical instructions (14 rules for response quality and persona fidelity)

The prompt ends with a delimiter line, after which the model outputs only the persona's message -- no meta-commentary, no headers, no bullet points.

Persona System¶

Persona Overview¶

Persona Profiles¶

Sharma Uncle (sharma_uncle)¶

Lakshmi Aunty (lakshmi_aunty)¶

Vikram Professional (vikram_professional)¶

Two-Tier Selection Algorithm¶

Tier 1: Scam Type Base Mapping¶

Tier 2: Language Override¶

Selection Flow¶

Prompt Design Philosophy¶

1. Character Profile (Stable Facts)¶

2. Speech Patterns¶

3. Strategic Behaviors (6 Categories)¶

4. Scoring Directives (Turn-Aware)¶

5. End-of-Conversation Handling¶

6. Example Exchanges¶

Stub Personas¶

Runtime Integration¶

Sharma Uncle (`sharma_uncle`)¶

Lakshmi Aunty (`lakshmi_aunty`)¶

Vikram Professional (`vikram_professional`)¶