Explore >> Select a destination


You are here

teddykoker.com
| | harvardnlp.github.io
1.3 parsecs away

Travel
| | [AI summary] The provided code is a comprehensive implementation of the Transformer model, including data loading, model architecture, training, and visualization. It also includes functions for decoding and visualizing attention mechanisms across different layers of the model. The code is structured to support both training and inference, with examples provided for running the model and visualizing attention patterns.
| | sigmoidprime.com
3.0 parsecs away

Travel
| | An exploration of Transformer-XL, a modified Transformer optimized for longer context length.
| | vxlabs.com
3.0 parsecs away

Travel
| | I have recently become fascinated with (Variational) Autoencoders and with PyTorch. Kevin Frans has a beautiful blog post online explaining variational autoencoders, with examples in TensorFlow and, importantly, with cat pictures. Jaan Altosaar's blog post takes an even deeper look at VAEs from both the deep learning perspective and the perspective of graphical models. Both of these posts, as well as Diederik Kingma's original 2014 paper Auto-Encoding Variational Bayes, are more than worth your time.
| | www.paepper.com
9.9 parsecs away

Travel
| When you have a big data set and a complicated machine learning problem, chances are that training your model takes a couple of days even on a modern GPU. However, it is well-known that the cycle of having a new idea, implementing it and then verifying it should be as quick as possible. This is to ensure that you can efficiently test out new ideas. If you need to wait for a whole week for your training run, this becomes very inefficient.