Today's paper: Rethinking 'Batch' in BatchNorm by Wu & Johnson

> BatchNorm is a critical building block in modern convolutional neural networks. Its unique property of operating on "batches" instead of individual samples introduces significantly different behaviors from most other operations in deep learning. As a result, it leads to many hidden caveats that can negatively impact model's performance in subtle ways.

This is a citation from the paper's abstract; the emphasis is mine, and it is what caught my attention. Let's explore these subtle ways in which BatchNorm can negatively impact your model's performance! The paper by Wu & Johnson can be found on arXiv.
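To see the batch dependence concretely, here is a minimal sketch (my own illustration, not from the paper) using PyTorch's `nn.BatchNorm1d`. In training mode the layer normalizes with the statistics of the current batch; in eval mode it uses its running estimates of the population statistics. Early in training these two disagree, so the very same input produces different outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(num_features=4)

# A batch whose statistics are far from N(0, 1).
x = torch.randn(8, 4) * 3 + 5

bn.train()
out_train = bn(x)  # normalized with *this batch's* mean and variance

bn.eval()
out_eval = bn(x)   # normalized with the running (population) estimates

# Training-mode output has roughly zero mean per feature...
print(out_train.mean().item())
# ...while the eval-mode output differs, because the running stats
# have only been updated once and still lag the batch statistics.
print((out_train - out_eval).abs().max().item())
```

This train/eval gap is exactly the kind of hidden caveat the paper dissects: the output for a given sample depends on which other samples it was batched with, and on how well the running statistics match the data actually seen at inference time.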