easyperf.net

ashvardanian.com
A bit of history. Not so long ago, we tried to use GPU acceleration from Python. We benchmarked NumPy vs CuPy on the most common number-crunching tasks. We took the highest-end desktop CPU and the highest-end desktop GPU and put them to the test. The GPU, expectedly, won, and not just in matrix multiplications. Sorting arrays, finding medians, and even simple accumulation were vastly faster. So we implemented multiple algorithms for parallel reductions in C++ and CUDA, just to compare efficiency. CUDA was obviously harder than using std::accumulate, but there is a shortcut: thrust::reduce.
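
As an aside, here is a minimal sketch of that comparison: summing a large array with std::accumulate on the CPU and with thrust::reduce on the GPU. The array size and element type are assumptions for illustration, not the original benchmark's configuration; it compiles with nvcc.

// Minimal sketch: sequential CPU reduction vs. Thrust's parallel GPU reduction.
// The 16M-element float array is an assumed, illustrative size.
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

#include <thrust/device_vector.h>
#include <thrust/reduce.h>

int main() {
    const std::size_t n = 1 << 24;            // ~16M elements (assumed)
    std::vector<float> host(n, 1.0f);

    // CPU baseline: single-threaded reduction over host memory.
    float cpu_sum = std::accumulate(host.begin(), host.end(), 0.0f);

    // GPU version: copy to device memory, then let Thrust run a parallel reduction.
    thrust::device_vector<float> device(host.begin(), host.end());
    float gpu_sum = thrust::reduce(device.begin(), device.end(), 0.0f);

    std::printf("cpu: %f, gpu: %f\n", cpu_sum, gpu_sum);
    return 0;
}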

blog.jessriedel.com
[Other posts in this series: 2, 3, 4.] I had the chance to have dinner tonight with Paul Ginsparg of arXiv fame, and he graciously gave me some feedback on a very speculative idea that I've been kicking around: augmenting - or even replacing - the current academic article model with collaborative documents. Even after years of mulling it over, my thoughts on this aren't fully formed. But I thought I'd share my thinking, however incomplete, after incorporating Paul's commentary while it is still fresh in my...

blog.stackblitz.com
Deep dive into WebAssembly performance issues on Apple Silicon and browser engines.

manybutfinite.com
Earlier, we explored the anatomy of a program in memory, the landscape of how our programs run in a computer. Now we turn to the call stack, the workhorse in most programming languages and virtual machines.