LLMs and evals

November 12, 2024

https://eugeneyan.com/writing/llm-patterns/#how-to-apply-evals
https://www.sh-reya.com/blog/ai-engineering-flywheel/
https://huyenchip.com/2024/07/25/genai-platform.html
https://eugeneyan.com/writing/evals/
https://x.com/karpathy/status/1599852921541128194

Pipeline (including evals):

https://www.sh-reya.com/blog/ai-engineering-flywheel/
https://huyenchip.com/2024/07/25/genai-platform.html
https://jxnl.github.io/blog/writing/2024/02/28/levels-of-complexity-rag-applications/

Evals (overview):

https://eugeneyan.com/writing/llm-patterns/#how-to-apply-evals
https://eugeneyan.com/writing/evals/
https://hamel.dev/blog/posts/evals/

Evals (practical):

https://docs.anthropic.com/en/docs/build-with-claude/develop-tests
https://github.com/anthropics/anthropic-cookbook/blob/main/misc/building%5Fevals.ipynb
https://cookbook.openai.com/examples/evaluation/getting_started_with_openai_evals
https://github.com/run-llama/ai-engineer-workshop/blob/main/notebooks/02_evaluation.ipynb
https://eugeneyan.com/writing/aligneval/

Misc:

https://github.com/openai/evals/tree/main
https://github.com/pltrdy/rouge

© 2025 Luke Miloszewski