Explore >> Select a destination


You are here

www.bitsnbites.eu
www.nayuki.io
3.2 parsecs away

[AI summary] A comprehensive overview of the x86 architecture, covering basic arithmetic operations, control flow with jumps and conditionals, memory addressing modes, the stack and calling conventions, advanced instructions such as SSE, virtual memory, and the differences between x86-32 and x86-64.
ashvardanian.com
3.8 parsecs away

David Patterson recently mentioned (paraphrasing) that programmers may benefit from using complex instruction sets directly, but it is increasingly challenging for compilers to generate those instructions automatically in the right spots. Over the last 3-4 years I have given a number of talks on the intricacies of SIMD programming, highlighting the divergence between hardware and software design over the past ten years. Chips are becoming bigger and more complicated as they add functionality, but general-purpose compilers such as GCC, LLVM, MSVC, and ICC cannot keep up with the pace. Hardly any developers write assembly today; most hope that the compiler will do the heavy lifting.
cprimozic.net
5.3 parsecs away

A detailed summary of the techniques I used to optimize my Advent of Code 2024 solution for Day 9 Part 2. It employs a variety of techniques, including algorithmic shortcuts, bespoke data structures, low-level optimizations, and SIMD.
ashvardanian.com
12.9 parsecs away

This blog post is a mirror of the original post on Modular.com. Modern CPUs have an incredible superpower: super-scalar operations, made available through single instruction, multiple data (SIMD) parallel processing. Instead of doing one operation at a time, a single core can do up to 4, 8, 16, or even 32 operations in parallel. In a way, a modern CPU is like a mini GPU, able to perform many calculations simultaneously. Yet, because parallel operations are so tricky to write, almost all of that potential remains untapped, resulting in code that does only one operation at a time.