How much is 128k tokens in an LLM context?

October 11, 2024

I was always wondering how much text a 128k context length in a Large Language Model (LLM) actually holds.

That's why I took an available version of The Lord of the Rings and counted its tokens.

You can check out the token counts here.
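If you want to reproduce the count yourself, here is a minimal sketch using OpenAI's tiktoken library. The filename lotr.txt is a placeholder for whatever plain-text edition you have at hand.

```python
import tiktoken

# Load the o200k_base encoding (more on this tokenizer below).
enc = tiktoken.get_encoding("o200k_base")

# "lotr.txt" is a placeholder for a plain-text edition of the book.
with open("lotr.txt", encoding="utf-8") as f:
    text = f.read()

tokens = enc.encode(text)
print(f"{len(tokens)} tokens")  # compare this against a 128k context window
```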

I assume they branded it as 128k for marketing reasons. Usually, context lengths have been powers of 2.


GPT-2 had a context length of up to 1024 tokens.

2^9 = 512 tokens

2^10 = 1024 tokens


GPT-3 had a context length of up to 2048 tokens.

2^11 = 2048 tokens


GPT-3.5-Turbo had a context length of up to 4096 tokens.

2^12 = 4096 tokens


GPT-4 (in its Turbo variant) has a context length of up to 128k tokens.

2^13 = 8192 tokens

2^14 = 16384 tokens

2^15 = 32768 tokens

2^16 = 65536 tokens

2^17 = 131072 tokens

And 131072 = 128 × 1024, so "128k" tokens is still a power of 2 after all.
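A quick sanity check of that arithmetic in Python:

```python
# "128k" in the binary sense is exactly a power of 2:
assert 2**17 == 128 * 1024 == 131072
print(f"{2**17:,} tokens")  # 131,072 tokens
```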


Tokenizer o200k_base

What's also interesting is the tokenizer used for training the model. The one from GPT-4o is called o200k_base and is a byte pair encoding (BPE) tokenizer (GPT-4 itself used the older cl100k_base).

As we know, LLMs work as next-token predictors: the model predicts the next token based on the previous tokens.
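As a small illustration of that mechanism (not how GPT-4o is actually served, just the general idea), here is a greedy next-token loop using the openly available GPT-2 from Hugging Face:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used here only because it is openly available; the mechanism is the same.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The Lord of the Rings is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):
        logits = model(ids).logits          # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()    # greedily pick the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```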

In the o200k_base file you can see the tokens used for this procedure.

Be aware that every model has a different tokenizer.
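You can see this directly with tiktoken: the same sentence produces different token IDs, and often a different token count, under GPT-4's cl100k_base and GPT-4o's o200k_base encodings.

```python
import tiktoken

sentence = "One Ring to rule them all, One Ring to find them."

for name in ("cl100k_base", "o200k_base"):  # GPT-4 vs. GPT-4o encodings
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(sentence)
    print(name, len(ids), ids[:5])
```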

All the token counts mentioned above refer to the o200k_base tokenizer.

Enjoy! ❤️