Projects
Side work I keep coming back to. Each one started as something I wanted to use myself, then turned into a place to think out loud about agents, infrastructure, and the boring parts of shipping.
langgraph-rag-agent
Multi-agent RAG over the LangChain Python docs. Five-node LangGraph with model routing: Haiku for verification, Sonnet for planning. The planner/verifier loop is cyclic: on rejection, the planner sees the verifier's reason, so the second attempt is informed instead of identical.
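A minimal sketch of that feedback loop in plain Python, not the LangGraph code from the repo. The names `run_loop`, `Verdict`, `fake_planner`, and `fake_verifier` are hypothetical, and the stub functions stand in for the Sonnet and Haiku calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    approved: bool
    reason: str = ""


def run_loop(
    question: str,
    plan: Callable[[str, Optional[str]], str],
    verify: Callable[[str, str], Verdict],
    max_cycles: int = 3,  # assumed budget; the real value is not stated
) -> Tuple[str, List[str]]:
    """Cycle planner -> verifier until approval or the budget is spent."""
    feedback: Optional[str] = None
    rejections: List[str] = []
    draft = ""
    for _ in range(max_cycles):
        # The planner receives the previous rejection reason (None on the first pass).
        draft = plan(question, feedback)
        verdict = verify(question, draft)
        if verdict.approved:
            break
        feedback = verdict.reason
        rejections.append(verdict.reason)
    return draft, rejections


# Stub "models" for illustration only.
def fake_planner(question: str, feedback: Optional[str]) -> str:
    return "answer with citation" if feedback else "answer"


def fake_verifier(question: str, draft: str) -> Verdict:
    if "citation" in draft:
        return Verdict(True)
    return Verdict(False, "missing citation")


draft, rejections = run_loop("How do I add memory?", fake_planner, fake_verifier)
```

The first draft is rejected, the reason is fed back, and the second draft passes. Without the `feedback` argument the planner would produce the same draft on every cycle.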
A human-in-the-loop interrupt on SqliteSaver pauses execution one cycle before the budget runs out, with three resume paths (approve, rewrite, reject). Pydantic structured outputs keep the verifier honest; Langfuse traces sit on the driver for observability.
Deployed on AWS Lambda behind a key-throttled API Gateway. CI runs mocked pytest plus a 10-case real-API eval gate that calls Sonnet on every PR before promoting to Lambda.
LLM Shield
Resilience proxy in front of a flaky upstream, built against the OpenAI API as the test target. The premise: I wanted to see how my own LLM integrations would behave when the provider has a bad day, and the only way to know was to put a layer between them.
Three patterns wired together from scratch. Idempotency keys in Redis so client retries don't duplicate upstream calls. Exponential backoff with jitter on transient errors. A circuit breaker that trips on an upstream error-rate threshold, fails fast while open, and lets a single probe request through before closing again.
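Two of those patterns can be sketched in a few lines of Python. This is an illustration under assumptions, not the proxy's actual code: `backoff_delay`, `CircuitBreaker`, and every default (threshold, window size, cooldown, base delay, cap) are hypothetical, the jitter variant shown is "full jitter", and the Redis idempotency layer is left out.

```python
import random
import time
from collections import deque


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


class CircuitBreaker:
    def __init__(self, threshold=0.5, window=10, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold            # error rate that trips the breaker
        self.results = deque(maxlen=window)   # recent outcomes, True = failure
        self.cooldown = cooldown              # seconds to stay open before probing
        self.clock = clock                    # injectable so tests can fake time
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a request may go upstream right now."""
        if self.state == "closed":
            return True
        if self.state == "open" and self.clock() - self.opened_at >= self.cooldown:
            self.state = "half_open"          # let exactly one probe through
            return True
        return False                          # open, or a probe is already in flight

    def record(self, failed: bool) -> None:
        """Report the outcome of a request that allow() let through."""
        if self.state == "open":
            return                            # ignore late results while failing fast
        if self.state == "half_open":
            if failed:
                self._trip()                  # probe failed: reopen
            else:
                self.state = "closed"         # probe succeeded: resume traffic
                self.results.clear()
            return
        self.results.append(failed)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) >= self.threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.results.clear()


# Fake clock so the example does not sleep.
now = [0.0]
breaker = CircuitBreaker(threshold=0.5, window=4, cooldown=10.0, clock=lambda: now[0])
for failed in (True, False, True, True):
    breaker.record(failed)
```

After the four recorded outcomes the window is full at a 75% error rate, so the breaker is open and `allow()` returns False until the fake clock passes the cooldown. Waiting for a full window before evaluating the rate is one design choice among several; it avoids tripping on a single early failure.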
The proxy speaks the OpenAI API verbatim, so existing clients point at it without code changes.
Holler
Push-to-talk voice dictation for Linux. Whisper transcribes through Groq; an LLM step (Groq and OpenAI interchangeable) cleans spoken punctuation, handles bilingual translate-back, and does light rephrasing.
Hard-learned guardrails: skip on empty input, reject cleaned output whose length balloons relative to the transcript (a sign the model hallucinated), fall back to raw transcription on API failure. Versioned prompts by (mode, language); evals on a fixed test set replace vibes.
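Those guardrails fit in one small wrapper. A rough sketch, assuming a hypothetical `guarded_cleanup` function, a `cleanup` callable that stands in for the LLM step, and a made-up `max_ratio` of 2.0; treating a rejected expansion the same as an API failure (return the raw text) is also an assumption.

```python
from typing import Callable


def guarded_cleanup(
    raw: str,
    cleanup: Callable[[str], str],
    max_ratio: float = 2.0,  # assumed limit on output length / input length
) -> str:
    """Run the LLM cleanup step, falling back to the raw transcript when unsafe."""
    text = raw.strip()
    if not text:
        return ""          # empty input: skip the API call entirely
    try:
        cleaned = cleanup(text).strip()
    except Exception:
        return text        # API failure: fall back to the raw transcription
    if len(cleaned) > max_ratio * len(text):
        return text        # output ballooned: likely hallucinated, keep raw text
    return cleaned
```

For example, `guarded_cleanup("hi", lambda t: "hi " * 50)` returns the raw `"hi"` because the output is far longer than the input allows.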
Runs about a dollar a month at daily use.
Roast My Repo
Roast My Repo is a live serverless API that analyzes any public GitHub repository and returns a brutally honest, LLM-generated code review, the kind a senior engineer would give if they had zero filter.
The backend is a TypeScript Lambda deployed on AWS behind API Gateway, with cold-start times under 300 ms. It fetches the repo tree, samples the most relevant files, and passes them to Groq's inference API for fast, structured critique.
The whole thing ships through a GitHub Actions pipeline I wrote from scratch — lint, test, build, deploy to Lambda — so every push to main hits production within two minutes.