transformer-circuits.pub (you are here)

www.lesswrong.com
Text of post based on our blog post, as a linkpost for the full paper, which is considerably longer and more detailed. ...

thesephist.com
[AI summary] The text provides an in-depth overview of research on sparse autoencoders (SAEs) applied to embeddings for automated interpretability. It discusses methods for analyzing and manipulating embeddings, including feature extraction, gradient-based optimization, and visualization tools. The work emphasizes the importance of understanding model representations to improve human-computer interaction with information systems. Key components include: 1) automated interpretability prompts for generating feature labels, 2) a feature-gradients implementation for optimizing embeddings to match desired feature dictionaries, and 3) visualizations of feature spaces and embedding transformations. The text also includes FAQs addressing the use of embeddings over lan...

iclr-blogposts.github.io
Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

marcospereira.me
In this post we summarize the math behind deep learning and implement a simple network that achieves 85% accuracy classifying digits from the MNIST dataset.