Outer Web | Explore

You are here: spacy.io Linguistic Features · spaCy Usage Documentation

tonybaloney.github.io Working with Chinese, Japanese, and Korean text in Generative AI pipelines (3.5 parsecs away)
[AI summary] A technical guide to handling Chinese, Japanese, and Korean text in Large Language Model pipelines by optimizing tokenization and text splitting strategies for specific language characteristics.

explosion.ai End-to-end Neural Coreference Resolution in spaCy · Explosion (2.7 parsecs away)
Coreference resolution is the problem of resolving entities in texts to references such as pronouns. Even if you've never heard of it, it's something we all do constantly every day, and is a key to understanding natural language. We recently added an experimental implementation of an end-to-end neural coreference component to spaCy. This post explains the architecture of our model in detail.

justinhj.github.io Optimizing training a GPT style Tokenizer with C++ (4.0 parsecs away)
[AI summary] The user has provided a detailed explanation of implementing the BPE (Byte Pair Encoding) algorithm for tokenization, focusing on the challenges and considerations involved in the process. They describe the use of different conflict resolution strategies, such as first occurrence and lexicographical ordering, and discuss the optimization techniques applied to improve performance, including incremental frequency counting and efficient data structures. The user also outlines future directions for the project, such as porting to Zig, exploring other tokenization algorithms, and optimizing encoding/decoding steps. The response highlights the complexity of working with C++ and the benefits of using modern C++ practices while emphasizing the importanc...

cupano.com Home · joecupano/blog Wiki · GitHub (14.2 parsecs away)
My Blog Posts. Contribute to joecupano/blog development by creating an account on GitHub.