| Site | Excerpt |
| --- | --- |
| jaykmody.com | You are here |
| marcospereira.me | In this post we summarize the math behind deep learning and implement a simple network that achieves 85% accuracy classifying digits from the MNIST dataset. |
| sigmoidprime.com | An exploration of Transformer-XL, a modified Transformer optimized for longer context lengths. |
| swethatanamala.github.io | In this paper, the authors propose a new, simple network architecture, the Transformer, based solely on attention mechanisms, removing convolutions and recurrences entirely. The Transformer is the first transduction model relying entirely... |
| www.swyx.io | That one time we tried to emulate our brains with computer chips |