Explore: select a destination (you are here: justindomke.wordpress.com)

francisbach.com (2.9 parsecs away): [AI summary] This text discusses the scaling laws of optimization in machine learning, focusing on asymptotic expansions for both strongly convex and non-strongly convex cases. It covers the derivation of performance bounds using techniques like Laplace's method and the behavior of random minimizers. The text also explains the 'weird' behavior observed in certain plots, where non-strongly convex bounds become tight under specific conditions. The analysis connects theoretical results to practical considerations in optimization algorithms.

blog.research.google (2.3 parsecs away): [AI summary] This blog post introduces Stochastic Re-weighted Gradient Descent (RGD), a novel optimization algorithm that improves deep neural network performance by re-weighting data points during training based on their difficulty, enhancing generalization and robustness against data distribution shifts.

fa.bianp.net (1.9 parsecs away): TL;DR: I describe a method for hyperparameter optimization by gradient descent. Most machine ...

marcospereira.me (12.9 parsecs away): In this post we summarize the math behind deep learning and implement a simple network that achieves 85% accuracy classifying digits from the MNIST dataset.