I was always wondering how much text a 128k context length in a Large Language Model (LLM) actually holds.
That's why I took an available version of The Lord of the Rings and counted the tokens.
You can check out the token counts here.
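If you want to reproduce the count, here is a minimal sketch using the tiktoken library with its o200k_base encoding; lotr.txt is just a placeholder path for whatever text file you have at hand.

```python
# Minimal sketch: count tokens in a text file with tiktoken's o200k_base encoding.
# "lotr.txt" is a placeholder for your own local copy of the text.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

with open("lotr.txt", encoding="utf-8") as f:
    text = f.read()

tokens = enc.encode(text)
print(f"{len(tokens):,} tokens")
print(f"that is {len(tokens) / 131_072:.2f} x a 128k context window")
```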
I assume they call it 128k for marketing reasons. Context lengths have usually been powers of 2.
GPT-2 had a context length of up to 1024 tokens.
2^9 = 512 tokens
2^10 = 1024 tokens
GPT-3 had a context length of up to 2048 tokens.
2^11 = 2048 tokens
GPT-3.5-Turbo had a context length of up to 4096 tokens.
2^12 = 4096 tokens
GPT-4 Turbo and GPT-4o go up to 128k tokens.
2^13 = 8192 tokens
2^14 = 16384 tokens
2^15 = 32768 tokens
2^16 = 65536 tokens
2^17 = 131072 tokens (the "128k")
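A quick way to double-check the arithmetic: "128k" means 128 × 1024 = 131,072 tokens, which is exactly 2^17.

```python
# Print the power-of-2 ladder from 512 up to "128k".
for exp in range(9, 18):
    print(f"2^{exp} = {2**exp:,} tokens")

# "128k" is 128 * 1024 tokens, i.e. exactly 2^17.
print(128 * 1024 == 2**17)  # True
```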
Tokenizer o200k_base
What's also interesting is the tokenizer used to train the model. The one used for GPT-4o is called o200k_base
and is a byte pair encoding (BPE) tokenizer.
As we know, LLMs work as next-token predictors: the model predicts the next token based on the previous tokens.
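Here is a minimal sketch of that idea using the Hugging Face transformers library with GPT-2 (chosen only because it is small and freely available, not because GPT-4 runs on exactly this code): the model scores every token in its vocabulary and we pick the most likely one.

```python
# Sketch: greedy next-token prediction with a small causal LM (GPT-2 for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Lord of the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits       # shape: (1, seq_len, vocab_size)

next_id = int(logits[0, -1].argmax())     # most likely next token id
print(repr(tokenizer.decode([next_id])))  # the predicted continuation
```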
In the o200k_base file you can see the tokens used for this procedure.
Be aware that every model has a different tokenizer.
All the token counts mentioned above refer to the o200k_base tokenizer.
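To see that in practice, here is a small sketch that encodes the same sentence with two different encodings, cl100k_base (used by GPT-3.5-Turbo and GPT-4) and o200k_base (used by GPT-4o), and prints how each one splits it.

```python
# Compare how two tokenizers split the same sentence into tokens.
import tiktoken

sample = "The Road goes ever on and on, down from the door where it began."

for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(sample)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{name}: {len(ids)} tokens -> {pieces}")
```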
Enjoy! ❤️