
LLMs and evals

November 12, 2024

  • https://eugeneyan.com/writing/llm-patterns/#how-to-apply-evals
  • https://www.sh-reya.com/blog/ai-engineering-flywheel/
  • https://huyenchip.com/2024/07/25/genai-platform.html
  • https://eugeneyan.com/writing/evals/
  • https://x.com/karpathy/status/1599852921541128194

Pipeline (including evals):

  • https://www.sh-reya.com/blog/ai-engineering-flywheel/
  • https://huyenchip.com/2024/07/25/genai-platform.html
  • https://jxnl.github.io/blog/writing/2024/02/28/levels-of-complexity-rag-applications/

Evals (overview):

  • https://eugeneyan.com/writing/llm-patterns/#how-to-apply-evals
  • https://eugeneyan.com/writing/evals/
  • https://hamel.dev/blog/posts/evals/

Evals (practical):

  • https://docs.anthropic.com/en/docs/build-with-claude/develop-tests
  • https://github.com/anthropics/anthropic-cookbook/blob/main/misc/building%5Fevals.ipynb
  • https://cookbook.openai.com/examples/evaluation/getting_started_with_openai_evals
  • https://github.com/run-llama/ai-engineer-workshop/blob/main/notebooks/02_evaluation.ipynb
  • https://eugeneyan.com/writing/aligneval/

Misc:

  • https://github.com/openai/evals/tree/main
  • https://github.com/pltrdy/rouge

© 2025 Luke Miloszewski
