www.confident-ai.com

humanloop.com
An overview of evaluating LLM applications: the emerging evaluation framework, parallels to traditional software testing, and guidance on best practices.

hamel.dev
How to construct domain-specific LLM evaluation systems.

blog.context.ai
Large language models are incredibly impressive, and the number of products with LLM-based features is growing exponentially. But the excitement of launching an LLM product is often followed by important questions: how well is it working? Are my changes improving it? What follows are usually rudimentary, home-grown evaluations (evals).
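A rudimentary home-grown eval of the kind the post describes can be as simple as a fixed set of prompts with a pass/fail check on each output. The sketch below assumes nothing from the linked article; `fake_model` and the cases are hypothetical stand-ins for a real LLM call and a real test set.

```python
# Minimal home-grown eval harness: fixed prompts, one check per prompt,
# aggregate pass rate. `fake_model` is a hypothetical stand-in for an LLM.

def fake_model(prompt: str) -> str:
    canned = {
        "Capital of France?": "Paris",
        "2 + 2 = ?": "4",
        "Largest planet?": "Saturn",  # deliberately wrong answer
    }
    return canned.get(prompt, "")

# Each case pairs a prompt with a predicate over the model's output.
EVAL_CASES = [
    ("Capital of France?", lambda out: "paris" in out.lower()),
    ("2 + 2 = ?", lambda out: "4" in out),
    ("Largest planet?", lambda out: "jupiter" in out.lower()),
]

def run_evals(model) -> float:
    """Return the fraction of eval cases the model passes."""
    passed = sum(1 for prompt, check in EVAL_CASES if check(model(prompt)))
    return passed / len(EVAL_CASES)

if __name__ == "__main__":
    print(f"pass rate: {run_evals(fake_model):.0%}")  # 2 of 3 pass
```

Tracking this pass rate across model or prompt changes is exactly the "are my changes improving it?" question, answered crudely but repeatably.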

marcospereira.me
In this post we summarize the math behind deep learning and implement a simple network that achieves 85% accuracy classifying digits from the MNIST dataset.
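The core math such a post typically walks through can be sketched in a few lines of NumPy: a one-hidden-layer network with ReLU, softmax output, cross-entropy loss, and plain gradient descent. This is not the linked post's code; synthetic class clusters stand in for MNIST so the snippet is self-contained.

```python
# One-hidden-layer classifier trained by gradient descent.
# Synthetic "digit" clusters replace MNIST to keep this runnable offline.
import numpy as np

rng = np.random.default_rng(0)

# Three classes of 64-dimensional points scattered around class means.
n_per, dim, n_cls = 50, 64, 3
means = rng.normal(0, 2, (n_cls, dim))
X = np.vstack([means[c] + rng.normal(0, 1, (n_per, dim)) for c in range(n_cls)])
y = np.repeat(np.arange(n_cls), n_per)
Y = np.eye(n_cls)[y]  # one-hot labels

# Parameters: input -> 32 ReLU hidden units -> softmax over classes.
W1 = rng.normal(0, 0.1, (dim, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, n_cls)); b2 = np.zeros(n_cls)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for _ in range(200):
    H = np.maximum(0, X @ W1 + b1)   # hidden activations (ReLU)
    P = softmax(H @ W2 + b2)         # class probabilities
    G = (P - Y) / len(X)             # dLoss/dLogits for cross-entropy
    dW2 = H.T @ G; db2 = G.sum(0)
    dH = (G @ W2.T) * (H > 0)        # backprop through ReLU
    dW1 = X.T @ dH; db1 = dH.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

acc = (P.argmax(1) == y).mean()      # training accuracy on the synthetic set
```

The `(P - Y)` gradient is the well-known simplification of softmax composed with cross-entropy; everything else is the chain rule applied layer by layer.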