Currently in the US — open to relocation globally

I build ML systems
that survive production.

My name is Pratiksha. I'm an AI/ML Engineer with 4+ years building production ML systems at Deutsche Bank and HCLTech. I specialize in RAG pipelines, MCP-based agentic workflows, LLM evaluation, and compliance-grade model governance in regulated financial environments.

I'm actively looking for ML Engineer roles in Germany, the Netherlands, Canada, and Australia. My Basel III, MiFID II, and HIPAA background isn't something I learned from a course — it shaped how I design and deploy real production systems. I'm EU Blue Card eligible and can relocate quickly.

Open to relocation — EU Blue Card eligible
RAG · MCP · Agentic AI · Python · AWS · Azure · PyTorch · LLM Evaluation · Fine-tuning · MS Data Science · SUNY Buffalo
RAG Pipeline · Financial Compliance · GenAI
Regulatory RAG System

Compliance analysts at Deutsche Bank were spending hours manually searching thousands of pages of Basel III and MiFID II documentation. I built a retrieval system that makes that search instant, cited, and reliable.

The key engineering decision was in retrieval strategy. Pure vector search scored 79% on our eval set — hybrid BM25 plus vector scored 91%. Regulatory text has dense, exact terminology like LCR and CET1 that semantic search alone misses. The 70/30 weighting was tuned on a 200-question eval set built with compliance analysts.
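The relevance@5 number above comes from that analyst-built eval set. A minimal sketch of how such a harness can score a retriever (names and toy data here are illustrative, not the actual eval set or pipeline):

```python
# Hypothetical relevance@5 harness: fraction of questions where at least
# one analyst-labeled relevant chunk appears in the top-k retrieved chunks.
def relevance_at_k(retrieve, eval_set, k=5):
    hits = 0
    for question, relevant_ids in eval_set:
        top_k = [chunk_id for chunk_id, _score in retrieve(question)[:k]]
        if any(cid in top_k for cid in relevant_ids):
            hits += 1
    return hits / len(eval_set)

# Toy stand-in retriever: returns (chunk_id, score) pairs, best first.
def toy_retrieve(question):
    return [("lcr-001", 0.92), ("cet1-004", 0.88), ("nsfr-002", 0.71)]

eval_set = [("What is the LCR floor?", {"lcr-001"}),
            ("Define CET1 capital.", {"cet1-004"}),
            ("NSFR reporting frequency?", {"nsfr-009"})]
print(relevance_at_k(toy_retrieve, eval_set))  # 2 of 3 questions hit
```

The same harness, run once per retrieval strategy, is what makes a claim like "91% vs 79%" checkable rather than anecdotal.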

LlamaIndex · ChromaDB · FAISS · OpenAI Embeddings · BM25 · FastAPI · Basel III · MiFID II
pipeline.py
# Hybrid retrieval: 70% vector + 30% BM25
def _build_retriever(self):
    return QueryFusionRetriever(
        retrievers=[
            VectorIndexRetriever(index=self.index, similarity_top_k=10),
            BM25Retriever.from_defaults(nodes=self.nodes, similarity_top_k=10),
        ],
        retriever_weights=[0.7, 0.3],
        similarity_top_k=5,
    )
# Result: 91% relevance@5 vs 79% pure vector
91%
relevance@5 on internal compliance eval set
60%
reduction in analyst research time
10K+
regulatory documents indexed
View on GitHub →

Agentic AI · MCP · Financial Services
MCP Financial Data Assistant

Risk analysts needed to query live trading and liquidity data for ad-hoc questions — but every query meant writing SQL or waiting for a data engineer. I built an MCP server that lets Claude do it in natural language.

The server exposes 6 structured financial tools. Claude decides which tools to call and in what order, executes them, and synthesizes the answer. The hardest part was tool design: too granular and the model takes too many steps; too broad and it loses precision. Getting the boundaries right took several iterations with actual analyst workflows.
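What a right-sized tool boundary looks like can be sketched with plain dicts in the shape of MCP tool schemas (the tool names and fields here are illustrative, not the production server's):

```python
# Illustrative MCP-style tool boundaries: one tool per analyst
# question-shape, not one tool per table (too granular) and not a
# single "query_anything" tool (too broad).
RISK_TOOLS = [
    {
        "name": "get_risk_metrics",
        "description": "VaR and exposure metrics for one portfolio.",
        "inputSchema": {
            "type": "object",
            "properties": {"portfolio_id": {"type": "string"}},
            "required": ["portfolio_id"],
        },
    },
    {
        "name": "get_exposure_by_region",
        "description": "Credit exposure aggregated by region.",
        "inputSchema": {
            "type": "object",
            "properties": {"region": {"type": "string"}},
            "required": ["region"],
        },
    },
]

def validate_call(name: str, args: dict) -> bool:
    """Reject tool calls that don't match a declared schema."""
    tool = next((t for t in RISK_TOOLS if t["name"] == name), None)
    if tool is None:
        raise ValueError(f"unknown tool: {name}")
    missing = [k for k in tool["inputSchema"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing required args: {missing}")
    return True
```

Keeping the schemas small and required fields explicit is what lets the model chain three tools reliably instead of guessing at one overloaded endpoint.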

Anthropic MCP · Claude API · Python · Tool Use · Agentic Workflows · SQLite · FastAPI
server.py
# MCP tool: expose financial data to Claude
@server.call_tool()
async def call_tool(name: str, args: dict):
    if name == "get_risk_metrics":
        # db.row_factory = sqlite3.Row, so rows convert cleanly to dicts
        row = db.execute("SELECT * FROM risk_metrics WHERE portfolio_id=?",
                         (args["portfolio_id"],)).fetchone()
        return dict(row)
# Claude resolves: "What's our APAC credit exposure this week?"
# → get_portfolio → get_risk_metrics → get_exposure_by_region
90s
analyst time-to-insight, down from 15 minutes
6
financial data tools exposed via MCP
100%
audit-logged — every tool call traceable
View on GitHub →

LLM Evaluation · Financial Services
LLM Benchmarking Framework

At Deutsche Bank I needed to figure out which LLMs were actually worth deploying in a regulated financial environment. So I built an evaluation framework to find out.

The framework compares models on what actually matters for enterprise use: TTFT, throughput, hallucination rate, and cost per token. The finding that shaped the final architecture: GPT-4 had the best accuracy but cost 15x more than Gemini. The answer wasn't "use the best model" — it was "use the right model for the right risk level."
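In practice that rule collapses into a small routing table: pick the cheapest model rated for the task's risk tier. A sketch, with illustrative tiers and costs in the spirit of the benchmark's finding:

```python
# Route each task to the cheapest model that clears its risk tier.
# Tier assignments and per-token costs here are illustrative.
MODEL_TIERS = {
    "gpt-4":      {"max_risk": "high",   "cost_per_1k_tokens": 0.030},
    "gemini-pro": {"max_risk": "medium", "cost_per_1k_tokens": 0.002},
}
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

def route(task_risk: str) -> str:
    """Pick the cheapest model rated for at least this risk level."""
    eligible = [(spec["cost_per_1k_tokens"], name)
                for name, spec in MODEL_TIERS.items()
                if RISK_ORDER[spec["max_risk"]] >= RISK_ORDER[task_risk]]
    if not eligible:
        raise ValueError(f"no model rated for risk level {task_risk!r}")
    return min(eligible)[1]

print(route("high"))    # gpt-4: only model rated for high-stakes tasks
print(route("medium"))  # gemini-pro: cheaper, and rated for this tier
```

The 65% cost saving falls out of exactly this asymmetry: most synthesis traffic is medium-risk and never needs the expensive model.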

GPT-4 · Gemini Pro · Grok · TTFT · Hallucination Rate · Python · Langfuse
benchmark.py
# Compare models on what matters in production
results = LLMBenchmark(
    models=["gpt-4", "gemini-pro", "grok-1"],
    metrics=["ttft", "throughput", "hallucination_rate", "cost_per_token"],
    risk_level="high"
).run(dataset="regulatory_qa.jsonl")
# GPT-4 → high-stakes tasks
# Gemini → high-volume synthesis (65% cheaper)
65%
reduction in regulatory research time
3
LLMs evaluated head-to-head in production
15x
cost difference that changed the architecture

Time-Series Forecasting · Risk Models
Market Risk Forecasting Pipeline

At Deutsche Bank I built the forecasting layer for market risk and liquidity models. The pipeline runs daily across 25M+ records from multiple asset class systems.

The interesting part was model selection. ARIMA is interpretable and fast. Prophet handles seasonality. LSTM captures non-linear patterns. The choice isn't about which model is best in theory — it's about which one you can explain to a risk committee and audit under Basel III.
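Whatever the committee constraints, the candidates still have to be compared the same way: walk-forward backtests, refitting on history and scoring one step ahead. A stdlib-only sketch of that harness (the forecasters below are naive stand-ins, not the actual ARIMA/Prophet/LSTM candidates):

```python
# Walk-forward backtest: at each step, forecast one step ahead from
# history only, then score by mean absolute error. Any forecaster is
# a callable: history -> predicted next value.
def walk_forward_mae(forecast, series, warmup=3):
    errors = []
    for t in range(warmup, len(series)):
        pred = forecast(series[:t])
        errors.append(abs(pred - series[t]))
    return sum(errors) / len(errors)

# Naive stand-ins for the real model candidates:
persistence = lambda hist: hist[-1]           # carry last value forward
moving_avg = lambda hist: sum(hist[-3:]) / 3  # 3-step rolling mean

series = [100, 102, 101, 105, 107, 110, 108, 112]
scores = {"persistence": walk_forward_mae(persistence, series),
          "moving_avg": walk_forward_mae(moving_avg, series)}
print(min(scores, key=scores.get))  # persistence wins on this toy series
```

The same loop works unchanged whether `forecast` wraps a fitted ARIMA, Prophet, or LSTM, which is what makes the committee comparison apples-to-apples.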

ARIMA · Prophet · LSTM · Apache Airflow · MLflow · AWS SageMaker · Basel III
21%
improvement in VaR forecasting accuracy
50min
pipeline runtime, down from 5 hours
25M+
daily records processed

NLP · Healthcare · MLOps
Clinical NLP Anomaly Detection

At HCLTech I built a pipeline to catch ICD-10 coding errors in physician notes before they caused billing or compliance issues. Medical coders were spending hours on manual review — the goal was to make that review targeted.

The pipeline uses SpaCy for entity extraction and TF-IDF for anomaly scoring. The deployment constraint made this more interesting than the model — everything had to be HIPAA-compliant with strict audit logging and zero PII in model inputs or outputs.
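The TF-IDF anomaly-scoring idea can be sketched in pure stdlib Python (the production pipeline scores SpaCy-extracted entities; here plain word tokens stand in, the scoring rule is a simple nearest-neighbor distance rather than the deployed one, and no PII appears anywhere):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF: term frequency weighted by inverse document frequency."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    vecs = []
    for doc in docs:
        tf = Counter(doc.split())
        vecs.append({t: (c / len(doc.split())) * math.log(n / df[t])
                     for t, c in tf.items()})
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def anomaly_scores(docs):
    """1 - similarity to nearest neighbor; high score = review this note."""
    vecs = tfidf_vectors(docs)
    return [1.0 - max(cosine(v, w) for j, w in enumerate(vecs) if j != i)
            for i, v in enumerate(vecs)]

notes = ["chest pain icd I20", "chest pain icd I20",
         "chest pain icd I20", "fracture femur icd Z99"]
scores = anomaly_scores(notes)
print(scores.index(max(scores)))  # the odd note gets the highest score
```

That ranking is the whole point of "targeted review": coders start from the highest-scoring notes instead of reading everything.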

SpaCy · TF-IDF · Flask · Docker · AWS EC2 · HIPAA · MLflow
25%
improvement in ICD-10 coding accuracy
75%
faster model deployment time
90%+
model consistency maintained in production

I'm making a deliberate move, not a desperate one.

The kind of ML work I find most interesting — production systems in regulated industries, models that have to be explainable and auditable, AI that's held to a real standard — that work is happening seriously across Europe, Canada, and Australia right now.

My background in Basel III, MiFID II, and HIPAA isn't incidental. Working at Deutsche Bank in the US means I already understand how European financial regulation shapes ML system design. I'm not learning that on the job — I've been doing it. That same rigor maps directly to OSFI in Canada and APRA in Australia.

I'm eligible for the EU Blue Card and the Dutch Highly Skilled Migrant visa. I can relocate quickly. The countries I'm targeting aren't random — they're where the work I want to do is actually happening.

🇩🇪 Germany 🇳🇱 Netherlands 🇦🇺 Australia 🇨🇦 Canada 🇸🇪 Sweden 🇬🇧 UK
Let's talk.

If you're hiring ML engineers in Germany, the Netherlands, Canada, or Australia — or if you're curious about the RAG pipeline, the MCP assistant, or the LLM evaluation work — I'd genuinely like to hear from you. Email works best.