Outer Web | Explore

Explore >> Select a destination

You are here		francisbach.com Rethinking SGD's noise - II: Implicit Bias - Machine Learning Research Blog
\|	\|	fa.bianp.net On the Convergence of the Unadjusted Langevin Algorithm	6.2 parsecs away Travel
\|	\|	The Langevin algorithm is a simple and powerful method to sample from a probability distribution. It's a key ingredient of some machine learning methods such as diffusion models and differentially private learning. In this post, I'll derive a simple convergence analysis of this method in the special case when the ...	6.2 parsecs away Travel
\|	\|	windowsontheory.org A blitz through classical statistical learning theory - Windows On Theory	5.3 parsecs away Travel
\|	\|	Previous post: ML theory with bad drawings Next post: What do neural networks learn and when do they learn it, see also all seminar posts and course webpage. Lecture video (starts in slide 2 since I hit record button 30 seconds too late - sorry!) - slides (pdf) - slides (Powerpoint with ink and animation)...	5.3 parsecs away Travel
\|	\|	iclr-blogposts.github.io How to compute Hessian-vector products? \| ICLR Blogposts 2024	8.4 parsecs away Travel
\|	\|	The product between the Hessian of a function and a vector, the Hessian-vector product (HVP), is a fundamental quantity to study the variation of a function. It is ubiquitous in traditional optimization and machine learning. However, the computation of HVPs is often considered prohibitive in the context of deep learning, driving practitioners to use proxy quantities to evaluate the loss geometry. Standard automatic differentiation theory predicts that the computational complexity of an HVP is of the same order of magnitude as the complexity of computing a gradient. The goal of this blog post is to provide a practical counterpart to this theoretical result, showing that modern automatic differentiation frameworks, JAX and PyTorch, allow for efficient computation of these HVPs in standard deep learning cost functions.	8.4 parsecs away Travel
\|	\|	serengil.wordpress.com Sigmoid Function as Neural Network Activation Function - Sefik Ilkin Serengil	50.9 parsecs away Travel
\|		The sigmoid function (aka logistic function) is mostly picked as the activation function in neural networks because its derivative is easy to demonstrate. It produces output in the range [0, 1], whereas the input is meaningful only between [-5, +5]. Inputs outside this range produce nearly identical outputs. In this post, we'll mention the proof of the derivative calculation. Sigmoid function is...	50.9 parsecs away Travel