|
You are here |
jhui.github.io | ||
| | | | |
www.jeremykun.com
|
|
| | | | | This post is a sequel to Formulating the Support Vector Machine Optimization Problem. The Karush-Kuhn-Tucker theorem Generic optimization problems are hard to solve efficiently. However, optimization problems whose objective and constraints have special structure often succumb to analytic simplifications. For example, if you want to optimize a linear function subject to linear equality constraints, one can compute the Lagrangian of the system and find the zeros of its gradient. More generally, optimizing... | |
| | | | |
aria42.com
|
|
| | | | | Numerical optimization is at the core of much of machine learning. In this post, we derive the L-BFGS algorithm, commonly used in batch machine learning applications. | |
| | | | |
iclr-blogposts.github.io
|
|
| | | | | The product between the Hessian of a function and a vector, the Hessian-vector product (HVP), is a fundamental quantity to study the variation of a function. It is ubiquitous in traditional optimization and machine learning. However, the computation of HVPs is often considered prohibitive in the context of deep learning, driving practitioners to use proxy quantities to evaluate the loss geometry. Standard automatic differentiation theory predicts that the computational complexity of an HVP is of the same order of magnitude as the complexity of computing a gradient. The goal of this blog post is to provide a practical counterpart to this theoretical result, showing that modern automatic differentiation frameworks, JAX and PyTorch, allow for efficient computat... | |
| | | | |
coornail.net
|
|
| | | Neural networks are a powerful tool in machine learning that can be trained to perform a wide range of tasks, from image classification to natural language processing. In this blog post, well explore how to teach a neural network to add together two numbers. You can also think about this article as a tutorial for tensorflow. | ||