You are here: windowsontheory.org

blog.ml.cmu.edu
3.3 parsecs away

The latest news and publications regarding machine learning, artificial intelligence, and related fields, brought to you by the Machine Learning Blog, a spinoff of the Machine Learning Department at Carnegie Mellon University.

francisbach.com
2.0 parsecs away

[AI summary] This text discusses the scaling laws of optimization in machine learning, focusing on asymptotic expansions for both strongly convex and non-strongly convex cases. It covers the derivation of performance bounds using techniques like Laplace's method and the behavior of random minimizers. The text also explains the 'weird' behavior observed in certain plots, where non-strongly convex bounds become tight under specific conditions. The analysis connects theoretical results to practical considerations in optimization algorithms.
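
For orientation, the two cases the summary contrasts map onto standard textbook rates for gradient descent (background only; these are not results quoted from the summarized post). With step size 1/L on an L-smooth objective f, strong convexity buys linear convergence, while plain convexity gives only an O(1/k) rate:

% Standard gradient-descent rates, step size 1/L, L-smooth f
% (textbook background, not taken from the summarized post).
\[
f(x_k) - f^\star \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k} \bigl(f(x_0) - f^\star\bigr)
\qquad \text{($\mu$-strongly convex),}
\]
\[
f(x_k) - f^\star \le \frac{L \,\lVert x_0 - x^\star \rVert^{2}}{2k}
\qquad \text{(convex, not strongly convex).}
\]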

hackmd.io
3.7 parsecs away

programmathically.com
28.4 parsecs away

In this post, we develop an understanding of why gradients can vanish or explode when training deep neural networks. Furthermore, we look at some strategies for avoiding exploding and vanishing gradients. The vanishing gradient problem describes a situation encountered in the training of neural networks where the gradients used to update the weights […]
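
As a rough illustration of the effect described above (a minimal NumPy sketch, not code from the linked post; the depth, width, and 1/sqrt(width) initialization are arbitrary choices made here for the demo):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth, width = 30, 64

# With sigmoid activations, backprop multiplies the gradient by
# W.T @ diag(sigmoid'(z)) at every layer, and sigmoid'(z) <= 0.25,
# so the gradient norm tends to shrink geometrically with depth.
weights = [rng.normal(0.0, 1.0, (width, width)) / np.sqrt(width)
           for _ in range(depth)]

# Forward pass, caching pre-activations for the backward pass.
a = rng.normal(size=width)
pre_acts = []
for W in weights:
    z = W @ a
    pre_acts.append(z)
    a = sigmoid(z)

# Backward pass: push an arbitrary output-side gradient down the
# stack and record its norm at every layer.
grad = np.ones(width)
norms = []
for W, z in zip(reversed(weights), reversed(pre_acts)):
    s = sigmoid(z)
    grad = W.T @ (grad * s * (1.0 - s))  # chain rule through sigmoid(W @ a)
    norms.append(float(np.linalg.norm(grad)))

print(f"grad norm just below the output: {norms[0]:.3e}")
print(f"grad norm at the input layer:    {norms[-1]:.3e}")

Scaling the same weights up (multiply each W by, say, 5) flips the loop into the exploding regime instead, which is where remedies such as careful initialization or gradient clipping come in.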