Explore >> Select a destination


You are here

lepisma.xyz
| | programmathically.com
4.9 parsecs away

Travel
| | Sharing is caringTweetIn this post, we develop an understanding of why gradients can vanish or explode when training deep neural networks. Furthermore, we look at some strategies for avoiding exploding and vanishing gradients. The vanishing gradient problem describes a situation encountered in the training of neural networks where the gradients used to update the weights []
| | www.paepper.com
5.3 parsecs away

Travel
| | [AI summary] This article explains how to train a simple neural network using Numpy in Python without relying on frameworks like TensorFlow or PyTorch, focusing on the implementation of ReLU activation, weight initialization, and gradient descent for optimization.
| | bdtechtalks.com
5.2 parsecs away

Travel
| | Gradient descent is the main technique for training machine learning and deep learning models. Read all about it.
| | www.lesswrong.com
22.5 parsecs away

Travel
| A new Anthropic interpretability paper-"Toy Models of Superpostion"-came out last week that I think is quite exciting and hasn't been discussed here...