This is a reference, not a guide. In a modern LLM, the "weights" consist of several distinct collections of matrices and tensors that serve different functions during inference:

Token Embeddings
- Large matrix mapping token IDs to vector representations
- Used at the very start of inference to convert input tokens to vectors
- Typical shape: [vocab_size, hidden_dim]

Attention Mechanism Weights
- Query/Key/Value Projection Matrices:
  - In standard attention: 3 separate matrices of shape [hidden_dim, hidden_dim]
  - In GQA: one Q matrix but fewer K/V matrices, of shape [hidden_dim, kv_dim]
  - Used to project hidden states into query, key, and value spaces
- Output Projection Matrix:
  - Maps attention outputs back to the hidden dimension: [hidden_dim, hidden_dim]
  - Used after the attention calculation to project back to the main representation

RoPE Parameters
- Not traditional weight matrices but positional embedding tensors
- Used to rotate query/key vectors to encode positional information
- Applied during attention computation via complex multiplication
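To make the shapes above concrete, here is a minimal NumPy sketch of how these weight collections relate at the projection step. All sizes, variable names (hidden_dim, n_kv_heads, w_q, and so on), and the random initialization are illustrative assumptions, not details taken from any particular model.

```python
import numpy as np

# Illustrative sizes only (assumptions, not from the reference above).
vocab_size, hidden_dim = 1000, 512
n_heads, n_kv_heads = 8, 2                 # GQA: fewer key/value heads than query heads
head_dim = hidden_dim // n_heads
kv_dim = n_kv_heads * head_dim             # 128 here, smaller than hidden_dim

rng = np.random.default_rng(0)

# Token embeddings: [vocab_size, hidden_dim], indexed directly by token ID.
w_embed = rng.standard_normal((vocab_size, hidden_dim)) * 0.02

# Attention projections. Standard multi-head attention would use three
# [hidden_dim, hidden_dim] matrices; under GQA the K/V projections shrink
# to [hidden_dim, kv_dim].
w_q = rng.standard_normal((hidden_dim, hidden_dim)) * 0.02
w_k = rng.standard_normal((hidden_dim, kv_dim)) * 0.02
w_v = rng.standard_normal((hidden_dim, kv_dim)) * 0.02

# Output projection: applied to the concatenated head outputs after
# attention (not shown here) to map back to the hidden dimension.
w_o = rng.standard_normal((hidden_dim, hidden_dim)) * 0.02

token_ids = np.array([17, 424, 98])        # hypothetical input token IDs
x = w_embed[token_ids]                     # [seq_len, hidden_dim]
q = x @ w_q                                # [seq_len, hidden_dim]
k = x @ w_k                                # [seq_len, kv_dim]
v = x @ w_v                                # [seq_len, kv_dim]
print(q.shape, k.shape, v.shape)           # (3, 512) (3, 128) (3, 128)
```

The reference's note that RoPE is applied via complex multiplication can also be sketched directly. The function below rotates one head's query (or key) vectors, assuming the common base-10000 frequency schedule; the name rope_rotate and the dimensions are placeholders, not from a specific implementation.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotate consecutive channel pairs of x by position-dependent angles (RoPE).
    x: [seq_len, head_dim] with head_dim even; positions: [seq_len]."""
    seq_len, head_dim = x.shape
    # One frequency per channel pair: theta_i = base^(-2i / head_dim).
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)   # [head_dim/2]
    angles = np.outer(positions, inv_freq)                       # [seq_len, head_dim/2]
    # View each channel pair as a complex number and multiply by e^{i*angle},
    # which rotates that pair in its 2D plane.
    x_complex = x[:, 0::2] + 1j * x[:, 1::2]
    rotated = x_complex * np.exp(1j * angles)
    out = np.empty_like(x)
    out[:, 0::2] = rotated.real
    out[:, 1::2] = rotated.imag
    return out

q_head = np.random.default_rng(1).standard_normal((3, 64))   # [seq_len, head_dim]
q_rot = rope_rotate(q_head, positions=np.arange(3))
print(q_rot.shape)   # (3, 64)
```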
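In a full attention stack this rotation would be applied per head to the q and k projections from the first sketch, before the dot-product attention scores are computed; v is left unrotated.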