March 10, 2025
LLM Eval Dashboard
A lightweight evaluation framework and dashboard for tracking LLM quality across prompt versions, models, and test suites.
Evals · Next.js · Python · OpenAI
View project · GitHub
Side projects and experiments at the intersection of AI and product. Most are open source.
An autonomous support agent that resolves Tier-1 tickets using RAG over internal knowledge bases and escalates edge cases to humans.
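The core of the agent is a resolve-or-escalate decision over retrieved knowledge-base context. A minimal sketch of that triage step, assuming a retriever that returns (snippet, score) pairs; the threshold, function names, and stub retriever are illustrative, not the project's actual API:

```python
# Hypothetical confidence cutoff below which the agent hands off to a human.
CONFIDENCE_THRESHOLD = 0.75

def triage(ticket: str, retrieve) -> dict:
    """Resolve a Tier-1 ticket from retrieved context, or escalate it."""
    hits = retrieve(ticket)  # list of (snippet, score) pairs
    if not hits or max(score for _, score in hits) < CONFIDENCE_THRESHOLD:
        return {"action": "escalate", "reason": "low retrieval confidence"}
    context = "\n".join(snippet for snippet, _ in hits)
    # In the real agent, this context would ground an LLM-generated reply.
    return {"action": "resolve", "context": context}

# Stub standing in for RAG over the internal knowledge base.
def fake_retrieve(query: str):
    kb = {"reset password": ("Visit Settings > Security to reset.", 0.92)}
    return [kb[k] for k in kb if k in query.lower()]

print(triage("How do I reset password?", fake_retrieve)["action"])   # resolve
print(triage("Obscure billing edge case", fake_retrieve)["action"])  # escalate
```

Keeping the escalation check ahead of generation means low-confidence tickets never reach the model, which is what bounds the agent to Tier-1 work.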