Outer Web | Explore

You are here		dev-discuss.pytorch.org TorchDynamo Update 9: Making DDP Work with TorchDynamo - compiler - PyTorch Developer Mailing List
\|	\|	saturncloud.io Speeding up Neural Network Training With Multiple GPUs and Dask \| Saturn Cloud Blog	5.3 parsecs away
\|	\|	By combining Dask and PyTorch you can easily speed up training a model across a cluster of GPUs. But how much of a benefit does that bring? This blog post finds out!
\|	\|	www.paepper.com PyTorch multi-GPU training for faster machine learning results :: Päpper's Machine Learning Blog - This blog features state o...	5.9 parsecs away
\|	\|	When you have a big data set and a complicated machine learning problem, chances are that training your model takes a couple of days even on a modern GPU. However, it is well-known that the cycle of having a new idea, implementing it and then verifying it should be as quick as possible. This is to ensure that you can efficiently test out new ideas. If you need to wait for a whole week for your training run, this becomes very inefficient.
\|	\|	pytorch.org Introducing PyTorch Fully Sharded Data Parallel (FSDP) API \| PyTorch	5.2 parsecs away
\|	\|	Recent studies have shown that large model training will be beneficial for improving model quality. During the last 3 years, model size grew 10,000 times from BERT with 110M parameters to Megatron-2 with one trillion. However, training large AI models is not easy: aside from the need for large amounts of computing resources, software engineering complexity is also challenging. PyTorch has been working on building tools and infrastructure to make it easier.
\|	\|	justinhj.github.io Optimizing training a GPT style Tokenizer with C++	29.4 parsecs away
\|		[AI summary] The user has provided a detailed explanation of implementing the BPE (Byte Pair Encoding) algorithm for tokenization, focusing on the challenges and considerations involved in the process. They describe the use of different conflict resolution strategies, such as first occurrence and lexicographical ordering, and discuss the optimization techniques applied to improve performance, including incremental frequency counting and efficient data structures. The user also outlines future directions for the project, such as porting to Zig, exploring other tokenization algorithms, and optimizing encoding/decoding steps. The response highlights the complexity of working with C++ and the benefits of using modern C++ practices while emphasizing the importanc...