This is a reference, not a guide. In a modern LLM, the "weights" consist of several distinct collections of matrices and tensors that serve different functions during inference:

Token Embeddings
- Large matrix mapping token IDs to vector representations
- Used at the very start of inference to convert input tokens to vectors
- Typically shape: [vocab_size, hidden_dim]

Attention Mechanism Weights
- Query/Key/Value projection matrices:
  - In standard attention: 3 separate matrices of shape [hidden_dim, hidden_dim]
  - In GQA: one Q matrix but fewer K/V matrices of shape [hidden_dim, kv_dim]
  - Used to project hidden states into query, key, and value spaces
- Output projection matrix:
  - Maps attention outputs back to the hidden dimension, shape [hidden_dim, hidden_dim]
  - Used after the attention calculation to project back into the main representation

RoPE Parameters
- Not traditional weight matrices but positional embedding tensors
- Used to rotate query/key vectors to encode positional information
- Applied during attention computation by complex multiplication
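The shapes above can be sketched concretely. The following is a minimal numpy sketch, not any specific model's implementation: all dimensions (`vocab_size`, `hidden_dim`, the GQA head split) are made-up illustrative values, and the `rope` helper is a hypothetical name for the rotation step described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions, chosen only for illustration
vocab_size, hidden_dim = 1000, 64
n_heads, n_kv_heads = 8, 2            # GQA: fewer K/V heads than query heads
head_dim = hidden_dim // n_heads      # 8
kv_dim = n_kv_heads * head_dim        # 16, narrower than hidden_dim

# Token embeddings: [vocab_size, hidden_dim]
W_embed = rng.standard_normal((vocab_size, hidden_dim))

# Attention projections: full-width Q, narrower K/V under GQA
W_q = rng.standard_normal((hidden_dim, hidden_dim))
W_k = rng.standard_normal((hidden_dim, kv_dim))
W_v = rng.standard_normal((hidden_dim, kv_dim))
W_o = rng.standard_normal((hidden_dim, hidden_dim))  # output projection

# Embedding lookup: convert token IDs to vectors at the start of inference
tokens = np.array([1, 5, 42])
x = W_embed[tokens]                   # [seq_len, hidden_dim]

q = x @ W_q                           # [seq_len, hidden_dim]
k = x @ W_k                           # [seq_len, kv_dim]

def rope(v, base=10000.0):
    """Rotate vector pairs by position-dependent angles (complex multiplication)."""
    seq_len, dim = v.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)     # one frequency per dim pair
    angles = np.outer(np.arange(seq_len), freqs)  # [seq_len, half]
    rot = np.exp(1j * angles)                     # unit-magnitude rotations
    vc = v[:, :half] + 1j * v[:, half:]           # view dim pairs as complex numbers
    out = vc * rot                                # rotation encodes position
    return np.concatenate([out.real, out.imag], axis=-1)

q_rot = rope(q)
print(q.shape, k.shape, q_rot.shape)
```

Because each rotation factor has unit magnitude, applying RoPE leaves the norm of every query/key vector unchanged, which is one reason it composes cleanly with dot-product attention.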