Chapter 1: Foundations --- Why Firebase + Gemini

What We Built

This chapter covers the foundational choices that shaped every line of code in ScamShield AI: why Firebase Cloud Functions instead of a long-running server, why Google Gemini Flash instead of GPT-4 or Claude, why Python 3.11 with Pydantic v2 instead of Node.js or Go, and how the project directory is structured. By the end, you will understand the constraints we were designing against and why these choices compose well together.

Why This Approach

The Constraints

Before picking any technology, we listed what the system must do:

  1. Respond in real time. The GUVI evaluator (and real scammers) expect a reply within seconds. A cold start that takes 30 seconds is unacceptable.
  2. Scale to zero. Between hackathon evaluations, the system should cost nothing. We are not a funded startup with an always-on budget.
  3. Handle secrets securely. The system needs a Gemini API key and an authentication key. These must never appear in code or logs.
  4. Persist conversation state. A scam conversation spans multiple requests. State must survive across cold starts.
  5. Deploy in one command. The CI/CD pipeline must go from git push to live endpoint without manual intervention.
  6. Support Hindi, Hinglish, and Tamil. The LLM must handle code-mixed Indian languages fluently.

These constraints eliminated most options before we started evaluating them.

Why Firebase Cloud Functions

flowchart TD
    subgraph "What we considered"
        A[AWS Lambda + API Gateway]
        B[Google Cloud Run]
        C[Firebase Cloud Functions]
        D[Self-hosted FastAPI]
    end

    C --> E[Winner]

    subgraph "Why Firebase won"
        F[Zero-config Firestore integration]
        G[Secret Manager built in]
        H[Auto-scaling to zero]
        I[Single deploy command]
        J[Python 3.11 runtime]
    end

    E --> F
    E --> G
    E --> H
    E --> I
    E --> J

Firebase Cloud Functions (2nd generation) runs on Cloud Run under the hood, but wraps it with an opinionated deployment model that eliminates boilerplate. Here is what mattered:

Firestore integration. Session state needs to persist across requests. Firestore is a document database that is accessible from Cloud Functions with zero configuration --- no connection strings, no IAM role setup, no VPC peering. You import the Admin SDK and it works.
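
The zero-config claim in practice: a minimal sketch of reading and writing session state with the Admin SDK. The sessions collection name and helper signatures are illustrative assumptions, not the actual code in firestore/sessions.py, and the block performs live Firestore I/O, so treat it as a sketch rather than a runnable demo.

```python
import firebase_admin
from firebase_admin import firestore

# On Cloud Functions, initialize_app() needs no credentials argument:
# the runtime's default service account is picked up automatically.
firebase_admin.initialize_app()
db = firestore.client()

def load_session(session_id: str) -> dict:
    """Fetch a session document, or an empty dict if it doesn't exist yet."""
    doc = db.collection("sessions").document(session_id).get()
    return doc.to_dict() if doc.exists else {}

def save_session(session_id: str, state: dict) -> None:
    # merge=True updates only the provided fields, preserving the rest.
    db.collection("sessions").document(session_id).set(state, merge=True)
```

No connection string, no IAM wiring: the two lines at the top are the entire setup.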

Secret Manager. The @https_fn.on_request(secrets=["GEMINI_API_KEY", "SCAMSHIELD_API_KEY"]) decorator injects secrets as environment variables at runtime. No .env files in production, no secret fetching code, no risk of accidentally logging them.

Scale to zero. Firebase Cloud Functions bill per invocation. When no one is sending scam messages, the cost is literally zero. During a burst of evaluation requests, the platform spins up instances automatically. We configured our function with 512 MB memory and a 60-second timeout --- generous enough for Gemini API calls without paying for headroom we don't need.

One-command deploy. firebase deploy --only functions builds a virtual environment, packages dependencies, uploads to Cloud Run, and cuts over traffic. The CI/CD workflow is 15 lines of YAML.

The tradeoff is cold starts. The first request after a period of inactivity takes 3--8 seconds as the Python runtime initializes, imports are loaded, and the Gemini client connects. We mitigated this with a singleton pattern for the Gemini client and lazy imports for heavy modules.

Why Gemini Flash

The LLM choice was driven by three hard requirements:

| Requirement | Gemini Flash | GPT-4 | Claude |
| --- | --- | --- | --- |
| Response latency | ~1--2s | ~3--5s | ~2--4s |
| Hindi/Hinglish quality | Excellent | Good | Good |
| Tamil expressions | Good | Moderate | Moderate |
| Free tier | 15 RPM / 1M tokens | None | Limited |
| JSON mode (structured output) | Native | Native | Via prompting |
| Safety filter override | Configurable | Limited | Limited |

Speed is non-negotiable. A real person texting on WhatsApp responds in seconds, not tens of seconds. Gemini 3 Flash delivers sub-2-second responses for our prompt sizes. GPT-4 is more capable but noticeably slower for the kind of short, conversational outputs we need.

Multilingual fluency. Indian scammers operate in Hindi, Hinglish (Hindi-English mix), and regional languages. Our personas need to code-switch naturally --- "Haan ji, ek minute" in one sentence, "What is your employee ID?" in the next. Gemini's training data includes substantial Indian language content, and it handles this code-mixing more fluently than alternatives we tested.

Safety filter configuration. This is a system that processes scam messages --- content that triggers safety filters in every LLM. Gemini allows per-request safety setting overrides, so we can set HARM_CATEGORY_DANGEROUS_CONTENT to BLOCK_NONE for our analysis pipeline while keeping other categories at reasonable thresholds.
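
The per-request override can be sketched as a list of category/threshold pairs in the shape the Gemini API accepts. Which categories to relax and the BLOCK_ONLY_HIGH default are illustrative choices here, not the exact production values:

```python
# Categories relaxed for the scam-analysis pipeline (illustrative choice).
RELAXED_CATEGORIES = {"HARM_CATEGORY_DANGEROUS_CONTENT"}

ALL_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

def build_safety_settings() -> list[dict]:
    """Per-request safety overrides: BLOCK_NONE for the analysis
    categories, a moderate threshold everywhere else."""
    return [
        {
            "category": category,
            "threshold": "BLOCK_NONE" if category in RELAXED_CATEGORIES
                         else "BLOCK_ONLY_HIGH",
        }
        for category in ALL_CATEGORIES
    ]
```

The resulting list is passed with each generate-content request, so the relaxation applies only to the calls that need it.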

Fallback strategy. We implemented automatic model fallback: if gemini-3-flash-preview returns a 404 (it's a preview model that may be rotated), the client transparently retries with gemini-2.0-flash:

# Model IDs in preference order
GEMINI_PRIMARY_MODEL = "gemini-3-flash-preview"
GEMINI_FALLBACK_MODEL = "gemini-2.0-flash"

def _call_model(self, contents: str, config: types.GenerateContentConfig,
                use_breaker: bool = True):
    """Call generate_content with automatic fallback."""
    for attempt in range(2):
        try:
            if use_breaker:
                return gemini_breaker.call(
                    self.client.models.generate_content,
                    model=self._active_model,
                    contents=contents,
                    config=config,
                )
            else:
                return self.client.models.generate_content(
                    model=self._active_model,
                    contents=contents,
                    config=config,
                )
        except Exception as e:
            if attempt == 0 and self._is_model_not_found(e):
                logger.warning(
                    f"Model {self._active_model} unavailable, "
                    f"falling back to {GEMINI_FALLBACK_MODEL}"
                )
                self._active_model = GEMINI_FALLBACK_MODEL
                continue
            raise

We also wrapped the Gemini calls in a circuit breaker (using pybreaker). If calls fail 5 times in a row, the breaker opens: subsequent requests skip the API entirely and fall back to keyword-based classification instead of waiting out a timeout. The system degrades gracefully rather than hanging on repeated API failures.
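
The breaker's behavior can be approximated in a few lines. This SimpleBreaker is a dependency-free stand-in, not the pybreaker API, showing the closed/open/half-open cycle with the thresholds described in the text:

```python
import time

class SimpleBreaker:
    """Dependency-free stand-in for pybreaker.CircuitBreaker.

    After fail_max consecutive failures the breaker opens and every call
    fails fast; after reset_timeout seconds one trial call is let
    through (the half-open state)."""

    def __init__(self, fail_max: int = 5, reset_timeout: float = 30.0):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._failures = 0
        self._opened_at = None

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Half-open: allow one trial call through.
            self._opened_at = None
            self._failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.fail_max:
                self._opened_at = time.monotonic()
            raise
        self._failures = 0
        return result
```

In the pipeline, the caller catches the fail-fast error and routes the message to the keyword-based classifier instead of the LLM.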

Why Python 3.11

Python was chosen for pragmatic reasons, not ideological ones:

  • Pydantic v2 gives us fast, validated data models with automatic JSON serialization. The GUVI API spec requires camelCase field names; Pydantic handles the snake_case-to-camelCase translation with field aliases.
  • google-genai SDK is the official Python client for Gemini. It is well-maintained and supports structured output (JSON schema enforcement).
  • Regex ecosystem. Our evidence extraction is regex-heavy. Python's re module is battle-tested, and the patterns are readable.
  • Firebase Functions Python runtime reached general availability with Python 3.11 support, including the 2nd-gen Cloud Run backend.
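
In Pydantic v2 the camelCase translation is a one-line alias generator; the mapping itself can be sketched without the dependency. The field names below are illustrative, the real schema lives in guvi/models.py:

```python
def to_camel(snake: str) -> str:
    """snake_case -> camelCase: the translation Pydantic v2's
    alias_generator applies to every field name."""
    head, *rest = snake.split("_")
    return head + "".join(word.capitalize() for word in rest)

# Internal names stay snake_case; the wire format uses camelCase keys.
internal = {"session_id": "abc123", "reply_text": "Haan ji, ek minute", "scam_score": 0.92}
wire = {to_camel(k): v for k, v in internal.items()}
# wire now carries the keys the GUVI spec expects: sessionId, replyText, scamScore
```

With Pydantic, the same result comes from setting the model's alias generator and serializing with by_alias, so no hand-written translation layer is needed.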

The main tradeoff is cold start time. A Node.js function cold-starts in ~1 second; Python takes 3--8 seconds with our dependency set. We accepted this because the Gemini API call itself dominates the total response time, and subsequent requests (warm starts) are fast.

The Code

Project Structure

functions/
  main.py                          # Firebase entry point, exports cloud functions
  guvi/
    handler.py                     # Webhook handler: auth, parse, process, respond
    models.py                      # Pydantic models (GuviRequest, GuviResponse, etc.)
    callback.py                    # Sends intelligence reports to GUVI
  engine/
    orchestrator.py                # Pipeline: classify -> persona -> extract -> respond
    context.py                     # Pipeline context (enrichment data between stages)
  gemini/
    client.py                      # Gemini API client with fallback + circuit breaker
    prompts/
      classifier.py                # Scam classification prompt + few-shot examples
      extractor.py                 # LLM-based evidence extraction prompt
      personas/
        __init__.py                # Persona selection logic (two-tier lookup)
        sharma_uncle.py            # Retired SBI banker, Delhi
        lakshmi_aunty.py           # Homemaker, Chennai
        vikram_professional.py     # IT professional, Bangalore
  extractors/
    regex_patterns.py              # Regex extractors for 11 evidence types
    keywords.py                    # Suspicious keyword detection with scoring
  firestore/
    sessions.py                    # Session CRUD + cross-session evidence index
  tasks/
    callback_scheduler.py          # Cloud Tasks delayed callback scheduling
  utils/
    sanitizer.py                   # Prompt injection defense
    rate_limiter.py                # Per-session rate limiting
    logging.py                     # Structured JSON logging
    oidc.py                        # Cloud Tasks OIDC token verification

The structure mirrors the processing pipeline. A request enters through main.py, is handled by guvi/handler.py, flows through engine/orchestrator.py (which calls gemini/client.py and extractors/), and the result is stored in firestore/ and reported via guvi/callback.py.
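
That flow can be pictured as stages threading a shared context object. This is a deliberately toy sketch: the stage bodies, field names, and heuristics are illustrative placeholders, not the real orchestrator logic:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineContext:
    """Carries enrichment data between pipeline stages."""
    message: str
    scam_type: str = "unknown"
    persona: str = ""
    evidence: dict = field(default_factory=dict)
    reply: str = ""

def classify(ctx: PipelineContext) -> PipelineContext:
    # Placeholder heuristic; the real stage calls the Gemini classifier.
    ctx.scam_type = "bank_fraud" if "account" in ctx.message.lower() else "unknown"
    return ctx

def pick_persona(ctx: PipelineContext) -> PipelineContext:
    ctx.persona = "sharma_uncle" if ctx.scam_type == "bank_fraud" else "vikram_professional"
    return ctx

def extract(ctx: PipelineContext) -> PipelineContext:
    ctx.evidence = {"keywords": [w for w in ("account", "otp") if w in ctx.message.lower()]}
    return ctx

def respond(ctx: PipelineContext) -> PipelineContext:
    ctx.reply = f"[{ctx.persona}] Haan ji, which account?"
    return ctx

def run_pipeline(message: str) -> PipelineContext:
    ctx = PipelineContext(message=message)
    for stage in (classify, pick_persona, extract, respond):
        ctx = stage(ctx)
    return ctx
```

Each stage takes and returns the same context, which keeps the stages independently testable and makes the pipeline's order explicit in one place.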

The Entry Point: main.py

Firebase Cloud Functions requires a specific export pattern. Each decorated function becomes a separately deployed HTTP endpoint:

"""
ScamShield AI - Firebase Cloud Functions Entry Point
"""
import json
import logging

from firebase_functions import https_fn
from utils.logging import setup_logging

# Configure structured JSON logging
setup_logging()

logger = logging.getLogger(__name__)

# Import the cloud functions so Firebase's registry sees them at import time.
# (send_delayed_callback lives in tasks/callback_scheduler.py.)
from guvi.handler import guvi_honeypot
from tasks.callback_scheduler import send_delayed_callback

# Export for Firebase
__all__ = ["guvi_honeypot", "send_delayed_callback"]

Why imports happen at module level

Firebase discovers cloud functions by importing main.py and scanning for decorated functions. The guvi_honeypot function is defined in handler.py but must be importable from main.py. This is why we use a top-level import --- the decorator @https_fn.on_request(...) registers it with Firebase's function registry at import time.

The Function Decorator

The decorator on guvi_honeypot configures the Cloud Functions runtime:

@https_fn.on_request(
    timeout_sec=60,
    memory=512,
    region="asia-south1",  # Mumbai region
    secrets=["GEMINI_API_KEY", "SCAMSHIELD_API_KEY"],
)
def guvi_honeypot(request: https_fn.Request) -> https_fn.Response:
    ...

  • timeout_sec=60: Gemini API calls can take 2--5 seconds, and we may make two (classify + generate). With network overhead and Firestore writes, 60 seconds is comfortable.
  • memory=512: The Python runtime plus our dependencies (Pydantic, google-genai, httpx) need ~300 MB. 512 MB gives headroom without overpaying.
  • region="asia-south1": Mumbai. Our users and the GUVI evaluator are in India. Latency to Gemini's API is lowest from an Indian region.
  • secrets=["GEMINI_API_KEY", "SCAMSHIELD_API_KEY"]: These are injected as environment variables at runtime from Google Cloud Secret Manager.

Singleton Gemini Client

The Gemini client uses HTTPS connections that benefit from reuse. We use a module-level singleton with a lock to ensure thread safety:

import threading
from typing import Optional

_gemini_client: Optional["GeminiClient"] = None
_gemini_lock = threading.Lock()

def get_gemini_client() -> "GeminiClient":
    """Get or create a singleton GeminiClient."""
    global _gemini_client
    if _gemini_client is None:
        with _gemini_lock:
            if _gemini_client is None:
                _gemini_client = GeminiClient()
    return _gemini_client

This is the classic double-checked locking pattern. On a warm start (the function instance is already running), get_gemini_client() returns the existing client with no overhead. On a cold start, it creates one and reuses it for the lifetime of the instance.

Key Architectural Decision

Serverless vs. always-on.

We seriously considered deploying as a Cloud Run service with min_instances=1 to eliminate cold starts entirely. The always-on approach gives consistent sub-second response times because the container is pre-warmed.

We chose serverless (scale-to-zero) for three reasons:

  1. Cost. An always-on Cloud Run service in asia-south1 with 512 MB memory costs roughly $15--20/month even at zero traffic. Firebase Cloud Functions costs $0 at zero traffic. For a hackathon project that may sit idle for days between evaluations, this matters.

  2. Cold start is tolerable. The GUVI evaluator sends multiple messages per session. Only the first message of a session hits a cold start (3--8 seconds). Subsequent messages in the same session hit warm instances and respond in 1--3 seconds. The evaluator's scoring does not heavily penalize the first response's latency.

  3. Operational simplicity. Firebase deploy handles container building, dependency installation, traffic routing, and rollback. There is no Dockerfile to maintain, no health check to configure, no minimum instance count to tune.

The decision would be different in production at scale. If ScamShield were handling thousands of concurrent scam conversations, the cold start penalty on new instances would be unacceptable, and we would switch to Cloud Run with min_instances and autoscaling.

What We Learned

Lesson: The Firebase import chain

Firebase's function discovery mechanism requires that all cloud functions be importable from main.py. This creates an import chain: main.py -> handler.py -> orchestrator.py -> gemini/client.py -> etc. If any module in this chain has a side effect at import time (like initializing Firebase Admin SDK or connecting to Firestore), it will execute during firebase deploy --- even on machines without credentials. We learned this the hard way when unit tests started failing because importing the handler triggered Firebase initialization. The fix: lazy imports for Firebase Admin SDK, and extracting utilities (like OIDC verification) into standalone modules that do not import Firebase.
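
The fix looks like this in miniature. The helper name and initialization check are illustrative (assumed, not the actual firestore/sessions.py code), but the pattern is the point: nothing Firebase-related runs at import time, so the module is safe to import on machines without credentials:

```python
_db = None

def get_db():
    """Create the Firestore client on first use, not at import time."""
    global _db
    if _db is None:
        # Deferred imports: these only execute when a request needs Firestore.
        import firebase_admin
        from firebase_admin import firestore
        try:
            firebase_admin.get_app()          # already initialized?
        except ValueError:
            firebase_admin.initialize_app()   # initialize exactly once
        _db = firestore.client()
    return _db
```

Unit tests can now import the handler module freely; only code paths that actually call get_db() touch Firebase.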

Lesson: Circuit breakers are not optional for LLM APIs

During early testing, a Gemini outage caused every request to hang for 30 seconds (the timeout), then fail. With multiple concurrent requests, this cascaded into function timeouts and a backlog. Adding a circuit breaker (pybreaker with a 5-failure threshold and 30-second reset) transformed the failure mode: after 5 failures, subsequent requests immediately fall back to keyword-based classification instead of waiting for a timeout. The system degrades gracefully instead of falling over.

Lesson: Region selection has compounding effects

Placing the function in asia-south1 (Mumbai) reduced latency to both the Gemini API and Firestore by ~100ms compared to us-central1. Over a 10-message conversation with 2 API calls per message, that saves 2 full seconds of accumulated latency. For a system where perceived response speed is part of the deception (real people text slowly), this headroom lets us do more processing within a believable response time.


Previous: Chapter 0 -- The Problem | Next: Chapter 2 -- First Working Endpoint