Harvard University
Data Scientist · ML Researcher
Harvard Data Science candidate building retrieval-augmented and multi-agent ML systems for voice assistants, clinical research, and responsible automation.
Agentic ML, RAG, and AI safety for health + voice products
Amazon Alexa AI · Harvard & BCH CHIP Lab · ChronicQue
Cut Alexa log runtimes 40% and boosted RareMind coverage 13.6%
Studying at the Institute for Applied Computational Science with a 4.0 GPA, specializing in retrieval-augmented generation, model evaluation, and multi-agent orchestration.
About
My current work spans Alexa's Semantic Enrichment Pipeline and ChronicQue's rare-disease navigator. I love translating messy, high-volume signals into human-friendly insights with rigorous evaluation and strong product instincts.
Education
Graduate and undergraduate training that shaped my approach to causal inference, large-scale systems, and responsible AI.
M.S. Data Science · Institute for Applied Computational Science
GPA 4.0/4.0. Coursework in ML, MLOps, Quantitative NLP, Multilevel Models, Big Data Systems. Research across RAG, AI safety, and interpretable health analytics.
B.A. Economics & Statistics · GPA 4.0/4.0
Focused on statistical learning, causal inference, advanced econometrics, and time series analysis—fueling my product analytics toolkit.
Experience
From Alexa’s semantic cache to public-health intelligence, here are the teams and systems I’ve helped build recently.
Data Engineer, Alexa Semantic Enrichment Pipeline
Processed 18 TB of utterance logs spanning 30 months to support RAG for Alexa's semantic cache, cutting join-key mismatches from 3% to 0.08%, lifting tail-index retrieval to ~10M, and reducing runtime by 40% with salted Spark keys and Adaptive Query Execution (AQE).
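For readers unfamiliar with key salting, here is a minimal sketch of the idea in plain Python rather than Spark. It is not the pipeline's actual code: the function name `salted_join`, the toy `facts` and `dims` rows, and the salt count are all assumptions for illustration. The skewed side gets a random salt appended to each key, the small side is replicated once per salt value, and the join runs on the (key, salt) pair, so one hot key is spread over several buckets while the join result stays the same.

```python
import random
from collections import Counter, defaultdict


def salted_join(facts, dims, n_salts, seed=0):
    """Join (key, value) facts to (key, attr) dims on a salted key.

    Returns the joined rows and the number of fact rows per salted bucket.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Skewed side: attach a random salt in [0, n_salts) to every row's key.
    salted_facts = [((k, rng.randrange(n_salts)), v) for k, v in facts]
    # Small side: replicate each row once per salt so every salted key can match.
    index = defaultdict(list)
    for k, attr in dims:
        for s in range(n_salts):
            index[(k, s)].append(attr)
    # Hash join on the (key, salt) pair; the salt is dropped from the output.
    joined = [(sk[0], v, attr) for sk, v in salted_facts for attr in index.get(sk, [])]
    bucket_sizes = Counter(sk for sk, _ in salted_facts)
    return joined, bucket_sizes


# Toy data: one "hot" key dominates the fact side.
facts = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
dims = [("hot", "A"), ("cold", "B")]

plain, plain_buckets = salted_join(facts, dims, n_salts=1)   # no salting
salted, salted_buckets = salted_join(facts, dims, n_salts=8)

print("largest bucket without salt:", max(plain_buckets.values()))
print("largest bucket with 8 salts:", max(salted_buckets.values()))
print("same join result:", sorted(plain) == sorted(salted))
```

In Spark the same pattern is usually expressed as a salt column on the skewed side plus an exploded salt range on the other side. AQE's built-in skew handling (the `spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled` settings) can complement it.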
Machine Learning Engineer · Agentic Unified Review of Unstructured Media
Designed a multi-agent workflow that validates static web scraping with an LLM-backed extraction pipeline and modular roles (8 agents). Benchmarked against human-labeled datasets to deliver +100 pp discovery recall, +13.6% extra coverage from new events, 12% more fresh discoveries, and 29% fewer false positives after review, trading a small cost increase for broader, cleaner coverage.
AI Researcher · Diagnostic Assistant for Rare & Common Diseases
Targeted the 4–5 year rare-disease diagnosis gap with a hybrid engine in which Phrank scores rare disorders and a medical LLM handles common cases, all routed by a lightweight meta-learner.
Researcher · Context-Debias
Extended the Context-Debias framework to new attributes, adding L2 stability terms to preserve original semantic information; retained GLUE benchmark performance while reducing SEAT (Age/Disability) bias scores from 0.51 → 0.04.
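To make the role of the L2 stability term concrete, here is a minimal sketch of a loss with two parts: a penalty on how much each target embedding projects onto an attribute direction, and an L2 term that keeps the debiased embedding close to the original. It is an illustration under assumptions, not the project's training code: the function names, the equal weights `alpha` and `beta`, and the two-dimensional toy vectors are all hypothetical, and a real setup would compute this over contextual embeddings in a deep learning framework.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def debias_loss(debiased, original, attribute_vecs, alpha=0.5, beta=0.5):
    """Bias penalty plus L2 stability term (weights are illustrative)."""
    # Bias term: squared projection of each debiased embedding on each attribute direction.
    bias = sum(dot(a, t) ** 2 for t in debiased for a in attribute_vecs)
    # Stability term: squared L2 distance between debiased and original embeddings.
    stability = sum(
        sum((x - y) ** 2 for x, y in zip(t, o))
        for t, o in zip(debiased, original)
    )
    return alpha * bias + beta * stability


attribute = [[1.0, 0.0]]   # toy attribute direction
original = [[1.0, 1.0]]    # toy target-word embedding

unchanged = debias_loss([[1.0, 1.0]], original, attribute)
projected = debias_loss([[0.0, 1.0]], original, attribute)
halfway = debias_loss([[0.5, 1.0]], original, attribute)
print(unchanged, projected, halfway)
```

With equal weights, leaving the embedding untouched and fully projecting out the attribute cost the same, and the halfway point costs less. That is the trade-off the stability term introduces between removing bias and preserving the original semantics.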
Data Scientist, Product Analytics
Shipped a Looker Studio + BigQuery dashboard monitoring 300k+ user sessions, unlocking a 15% lift in engagement, and trained logistic models to flag session-level bounce risk, improving UX changes by 10%.
Selected work
Recent systems I built or led—spanning RAG pipelines, health tech, and bias-aware NLP.
Built a spatial panel linking NIFC fire perimeters to SafeGraph visits, indexing concentric buffers (0–2/2–5/5–10/10–25 km) around each alarm. Estimated a PPML difference-in-differences/event-study to recover the ATT: close-in visits fall ~40% immediately, while mid-band zones show a substitution bump before re-stabilizing six months later.
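The band indexing can be sketched as follows. This is a simplified illustration, not the project's code: it measures great-circle distance to a single point, whereas the panel described above buffers around fire perimeters. The names `haversine_km` and `band_for_distance`, the half-open band convention, and the toy coordinates are all assumptions.

```python
import math
from bisect import bisect_right

# Band edges in km, matching the 0-2 / 2-5 / 5-10 / 10-25 buffers.
BAND_EDGES_KM = [0, 2, 5, 10, 25]
BAND_LABELS = ["0-2", "2-5", "5-10", "10-25"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0088  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def band_for_distance(d_km):
    """Return the band label for a distance, or None outside 0-25 km.

    Bands are treated as half-open intervals [lower, upper), an assumption.
    """
    if d_km < 0 or d_km >= BAND_EDGES_KM[-1]:
        return None
    return BAND_LABELS[bisect_right(BAND_EDGES_KM, d_km) - 1]


# Toy example: a place of interest 0.03 degrees of latitude north of a fire point.
fire = (38.50, -122.50)
poi = (38.53, -122.50)
d = haversine_km(*fire, *poi)
print(round(d, 2), band_for_distance(d))
```

Each visit record tagged this way can then be aggregated by band and period before estimating the event study.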
Built multilevel mixed-effects and Bayesian models linking watershed response to climate indicators, clustering basins to highlight shared risk patterns for water management.
Compared renewable energy technologies via Bayesian evidence synthesis and hierarchical modeling to rank investments and inform capital allocation.
Built a modular review stack for public-health surveillance with role-specific agents (Relevance Gate, Layout Parser, Fact Checker, Credibility Scorer, Arbiter) plus an LLM-validated extraction pipeline to transform static web scraping into structured alerts.
Hybrid matcher where Phrank scores rare diseases, a medical LLM handles common cases, and ClinPhen powers symptom extraction with negation handling, all orchestrated by a meta-learner for 20 ms responses at F1 0.80.
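The routing step can be sketched as a small logistic gate that decides which engine handles a case. Everything here is hypothetical and stands in for the real components: the `MetaRouter` class, the two hand-picked features, the weights, the set of hint terms, and the two stub engines are illustrative assumptions, not the project's meta-learner, Phrank, or the medical LLM.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class MetaRouter:
    """Toy logistic gate; weights would be learned in a real system."""
    weights: Sequence[float]
    bias: float
    threshold: float = 0.5

    def p_rare(self, features: Sequence[float]) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def route(self, features: Sequence[float]) -> str:
        return "rare" if self.p_rare(features) >= self.threshold else "common"


def featurize(terms: List[str], rare_hints: Set[str]) -> List[float]:
    """Hypothetical features: term count and count of rare-hint terms."""
    return [float(len(terms)), float(sum(t in rare_hints for t in terms))]


def diagnose(terms, router, rare_hints, rare_engine, common_engine):
    """Route extracted phenotype terms to one of two engines."""
    branch = router.route(featurize(terms, rare_hints))
    engine = rare_engine if branch == "rare" else common_engine
    return branch, engine(terms)


# Stub engines standing in for the rare-disease scorer and the common-case model.
rare_engine = lambda terms: f"rare-disease ranking over {len(terms)} terms"
common_engine = lambda terms: f"common-case answer over {len(terms)} terms"

router = MetaRouter(weights=[0.1, 2.0], bias=-1.5)  # illustrative weights
rare_hints = {"HP:0001263"}                         # illustrative term ID

case_a = diagnose(["HP:0001250", "HP:0001263"], router, rare_hints, rare_engine, common_engine)
case_b = diagnose(["HP:0012735"], router, rare_hints, rare_engine, common_engine)
print(case_a)
print(case_b)
```

Keeping the gate this small is what makes a low-latency routing decision plausible: the expensive work happens only in the engine that is actually selected.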
Extended the bias-mitigation framework with orthogonal loss terms, maintaining GLUE performance while shrinking SEAT Age/Disability scores 0.51 → 0.04.
Want to see prior geospatial + econometrics work? Browse the analytics archive for long-form notebooks and published analyses.
Hackathons
Favorite sprints where collaboration, research, and storytelling came together.
ConvNeXt + DANN backbone with captioning and saliency to explain why an image is flagged. Designed to keep platforms safe while staying transparent.
FastAPI + React platform that extracts HPO terms, triages symptoms via LLMs, and produces shareable diagnostic reports.
Blended BERT embeddings, TF-IDF, and SHAP to predict which pro bono legal questions get answered, achieving 87% accuracy.
Contact
I'm currently open to full-time MLE/DS/DE opportunities in areas related to agentic systems, responsible AI, or delightful data storytelling.