You are here: www.modular.com
ashvardanian.com
This blog post is a mirror of the original post on Modular.com. Modern CPUs have an incredible superpower: super-scalar operations, made available through single instruction, multiple data (SIMD) parallel processing. Instead of doing one operation at a time, a single core can do up to 4, 8, 16, or even 32 operations in parallel. In a way, a modern CPU is like a mini GPU, able to perform a lot of simultaneous calculations. Yet, because it's so tricky to write parallel operations, almost all that potential remains untapped, resulting in code that only does one operation at a time.
mcyoung.xyz
[AI summary] The text provides an in-depth exploration of SIMD (Single Instruction, Multiple Data) programming, focusing on its application in optimizing algorithms like base64 decoding. It outlines the challenges of writing portable SIMD code across different architectures, the role of compilers and instruction sets, and the importance of avoiding branches in performance-critical code. The article transitions into a practical example of implementing a SIMD version of the base64 decoding algorithm, emphasizing the use of shuffles and data reordering to efficiently process data in parallel. It also touches on the trade-offs between using intrinsics, portable SIMD libraries, and compiler optimizations, while highlighting the complexities of cross-platform deve...
cprimozic.net
A detailed summary of the techniques I used to optimize my Advent of Code 2024 solution for Day 9 Part 2. Employs a variety of techniques including algorithmic shortcuts, bespoke data structures, and low-level optimizations + SIMD.
shellsharks.com
An introduction to x86 Intel assembly.