This is a reference, not a guide. In a modern LLM, the "weights" consist of several distinct collections of matrices and tensors that serve different functions during inference:

Token Embeddings
- Large matrix mapping token IDs to vector representations
- Used at the very start of inference to convert input tokens to vectors
- Typical shape: [vocab_size, hidden_dim]

Attention Mechanism Weights
- Query/Key/Value projection matrices:
  - Standard attention: 3 separate matrices of shape [hidden_dim, hidden_dim]
  - GQA: one Q matrix but fewer K/V matrices of shape [hidden_dim, kv_dim]
  - Used to project hidden states into query, key, and value spaces
- Output projection matrix:
  - Maps attention outputs back to the hidden dimension, [hidden_dim, hidden_dim]
  - Used after the attention calculation to project back into the main representation

RoPE Parameters
- Not traditional weight matrices, but positional embedding tensors
- Used to rotate query/key vectors to encode positional information
- Applied during the attention computation by complex multiplication
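The shapes above are easiest to see laid out in code. Below is a minimal PyTorch sketch of the token embedding plus one layer's attention projections under GQA; the dimensions are hypothetical, chosen only to make the shapes concrete, and do not come from any particular model:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, for illustration only.
vocab_size = 32_000
hidden_dim = 4_096
n_heads = 32          # query heads
n_kv_heads = 8        # fewer key/value heads under GQA
head_dim = hidden_dim // n_heads
kv_dim = n_kv_heads * head_dim

# Token embeddings: one row of size hidden_dim per token ID.
tok_emb = nn.Embedding(vocab_size, hidden_dim)        # [vocab_size, hidden_dim]

# Attention projections for a single layer.
w_q = nn.Linear(hidden_dim, hidden_dim, bias=False)   # queries -> hidden_dim
w_k = nn.Linear(hidden_dim, kv_dim, bias=False)       # keys    -> kv_dim (shared across query heads)
w_v = nn.Linear(hidden_dim, kv_dim, bias=False)       # values  -> kv_dim
w_o = nn.Linear(hidden_dim, hidden_dim, bias=False)   # output projection back to hidden_dim

# Start of inference: token IDs -> vectors, then project into Q/K/V spaces.
ids = torch.tensor([[1, 2, 3]])                       # [batch, seq]
x = tok_emb(ids)                                      # [batch, seq, hidden_dim]
q, k, v = w_q(x), w_k(x), w_v(x)                      # q: [..., hidden_dim]; k, v: [..., kv_dim]
```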
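The "rotate query/key vectors by complex multiplication" step for RoPE can be sketched the same way. This version uses the interleaved-pair layout (each adjacent pair of features is treated as one complex number); other implementations split the head dimension into halves instead, so treat this as one possible layout rather than the canonical one:

```python
import torch

def rope_frequencies(head_dim: int, max_seq_len: int, base: float = 10_000.0) -> torch.Tensor:
    """Precompute unit-magnitude complex rotation factors (derived, not learned)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    angles = torch.outer(positions, inv_freq)             # [seq, head_dim // 2]
    return torch.polar(torch.ones_like(angles), angles)   # complex64

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Rotate query or key vectors; x has shape [batch, seq, n_heads, head_dim]."""
    # Pair up adjacent features and view them as complex numbers.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    # Multiply by the per-position rotation factors (broadcast over heads).
    rotated = x_complex * freqs[: x.shape[1], None, :]
    # Unpack back to real pairs and restore the original layout.
    return torch.view_as_real(rotated).flatten(-2).type_as(x)
```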