Explore >> Select a destination


You are here

cgad.ski
| | bdtechtalks.com
1.8 parsecs away

| | Gradient descent is the main technique for training machine learning and deep learning models. Read all about it.
| | windowsontheory.org
2.5 parsecs away

| | Previous post: ML theory with bad drawings. Next post: What do neural networks learn and when do they learn it. See also all seminar posts and course webpage. Lecture video (starts at slide 2, since I hit the record button 30 seconds too late; sorry!), slides (pdf), slides (PowerPoint with ink and animation)...
| | francisbach.com
2.6 parsecs away

| | [AI summary] The blog post discusses the spectral properties of kernel matrices, focusing on the analysis of eigenvalues and their estimation using tools like the matrix Bernstein inequality. It also covers the estimation of the number of integer vectors with a given L1 norm and the relationship between these counts and combinatorial structures. The post includes a detailed derivation of bounds for the difference between true and estimated eigenvalues, highlighting the role of the degrees of freedom and the impact of regularization in kernel methods. Additionally, it touches on the importance of spectral analysis in machine learning and its applications in various domains.
| | programmathically.com
17.9 parsecs away

| In this post, we develop an understanding of why gradients can vanish or explode when training deep neural networks. Furthermore, we look at some strategies for avoiding exploding and vanishing gradients. The vanishing gradient problem describes a situation encountered in the training of neural networks where the gradients used to update the weights [...]