[Updated on 2019-02-14: add ULMFiT and GPT-2.]
[Updated on 2020-02-29: add ALBERT.]
[Updated on 2020-10-25: add RoBERTa.]
[Updated on 2020-12-13: add T5.]
[Updated on 2020-12-30: add GPT-3.]
[Updated on 2021-11-13: add XLNet, BART and ELECTRA; also updated the Summary section.]

I guess they are Elmo & Bert? (Image source: here)

We have seen amazing progress in NLP in 2018. Large-scale pre-trained language models like OpenAI GPT and BERT have achieved great performance on a variety of language tasks using generic model architectures. The idea is similar to how ImageNet classification pre-training helps many vision tasks (*). Even better than vision classification pre-training, this simple and powerful approach in NLP does not require labeled data for pre-training, allowing us to experiment with increased training scale, up to our very limit.
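
To make the "no labeled data" point concrete, here is a minimal sketch of the self-supervised language-modeling objective, assuming the Hugging Face `transformers` library; the `"gpt2"` checkpoint is used purely for illustration. The targets are simply the input tokens shifted by one position, so raw text supplies its own supervision.

```python
# Minimal sketch: self-supervised (causal) language-model pre-training.
# The "labels" are just the input ids themselves; the model shifts them
# internally, so no human-labeled data is needed.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "We have seen amazing progress in NLP."
inputs = tokenizer(text, return_tensors="pt")

# Passing the input ids as labels makes the model compute the standard
# next-token cross-entropy loss over the raw text.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)
```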