|
You are here |
jaketae.github.io | ||
| | | | |
teddykoker.com
|
|
| | | | | Google AI recently released a paper, Rethinking Attention with Performers (Choromanski et al., 2020), which introduces Performer, a Transformer architecture which estimates the full-rank-attention mechanism using orthogonal random features to approximate the softmax kernel with linear space and time complexity. In this post we will investigate how this works, and how it is useful for the machine learning community. | |
| | | | |
sigmoidprime.com
|
|
| | | | | An exploration of Transformer-XL, a modified Transformer optimized for longer context length. | |
| | | | |
randorithms.com
|
|
| | | | | The Taylor series is a widely-used method to approximate a function, with many applications. Given a function \(y = f(x)\), we can express \(f(x)\) in terms ... | |
| | | | |
wtfleming.github.io
|
|
| | | |||