Explore >> Select a destination


You are here: blog.research.google

teddykoker.com (1.8 parsecs away)
Gradient-descent-based optimizers have long been the optimization algorithm of choice for deep learning models. Over the years, various modifications to basic mini-batch gradient descent have been proposed, such as adding momentum or Nesterov's Accelerated Gradient (Sutskever et al., 2013), as well as the popular Adam optimizer (Kingma & Ba, 2014). The paper Learning to Learn by Gradient Descent by Gradient Descent (Andrychowicz et al., 2016) demonstrates how the optimizer itself can be replaced ...
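
For orientation, the classical updates this excerpt refers to fit in a few lines. Below is a minimal NumPy sketch of momentum and its Nesterov look-ahead variant; the function name, hyperparameters, and toy objective are illustrative assumptions, not taken from the linked post:

```python
import numpy as np

def sgd_momentum_step(w, v, grad_fn, lr=0.01, mu=0.9, nesterov=False):
    """One parameter update with (optionally Nesterov) momentum."""
    if nesterov:
        # Nesterov: evaluate the gradient at the look-ahead point w + mu * v
        g = grad_fn(w + mu * v)
    else:
        g = grad_fn(w)
    v = mu * v - lr * g  # accumulate a velocity term
    return w + v, v

# Toy example: minimize f(w) = ||w||^2, whose gradient is 2w
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad_fn=lambda x: 2 * x, nesterov=True)
print(w)  # converges toward zero
```

With mu = 0 this reduces to plain gradient descent; the velocity term is what lets momentum methods accelerate along consistently downhill directions.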

bdtechtalks.com (1.1 parsecs away)
Gradient descent is the main technique for training machine learning and deep learning models. Read all about it.

ai.googleblog.com (1.1 parsecs away)

comsci.blog (18.0 parsecs away)
In this blog post, we will learn about vision transformers (ViT) and implement an MNIST classifier with one. We will go step by step, understand every part of the vision transformer clearly, and see the original authors' motivations for several parts of the architecture.
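
The core move in a ViT is splitting the image into patches and treating each patch as a token. Here is a minimal PyTorch sketch of that patch-embedding step, with shapes chosen for MNIST; the class name and dimensions are assumptions for illustration, not code from the linked post:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and linearly project each one (ViT step 1)."""
    def __init__(self, img_size=28, patch_size=7, in_chans=1, dim=64):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv whose stride equals its kernel size is equivalent to
        # "flatten each patch, then apply a shared linear projection".
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x):                  # x: (B, 1, 28, 28)
        x = self.proj(x)                   # (B, dim, 4, 4)
        x = x.flatten(2).transpose(1, 2)   # (B, 16, dim): one token per patch
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)     # prepend the [CLS] token
        return x + self.pos_embed          # add learned position embeddings

tokens = PatchEmbedding()(torch.randn(8, 1, 28, 28))
print(tokens.shape)  # torch.Size([8, 17, 64])
```

The resulting token sequence is what a standard transformer encoder consumes; the [CLS] token's final representation is typically fed to the classification head.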