A token is the basic unit of text that AI language models process. Rather than reading individual characters or whole words, LLMs break text into tokens, which can be words, parts of words, or punctuation. For English, one token is roughly 3/4 of a word, so 100 tokens correspond to approximately 75 words. Tokenization is performed by algorithms such as BPE (Byte Pair Encoding), often through tokenizer libraries such as SentencePiece, which implements BPE alongside a unigram language model method. Token counts matter because they determine API costs (priced per input and output token), whether a prompt fits within the context window, and processing speed, since more tokens take longer to process. Different models use different tokenizers, so the same text may produce different token counts across models.
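To make the idea of subword tokens concrete, here is a minimal sketch of the merge-learning loop behind BPE. It is a toy illustration, not the tokenizer of any particular model: the function names (most_frequent_pair, merge_pair, learn_merges) and the tiny corpus are assumptions made up for this example, and production tokenizers typically work on bytes, apply pre-tokenization rules, and learn tens of thousands of merges.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_merges(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated toy corpus."""
    # Start with every word split into single characters.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break  # Every word is already a single symbol.
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_merges("low low low lower lowest", num_merges=2)
    print(merges)
    print(words)
```

After two merges on this corpus, the frequent word "low" has become a single symbol, while "lower" and "lowest" are represented as "low" followed by leftover characters. This is the sense in which a token can be a whole word or only part of one.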
Frequently Asked Questions
What is a token in AI?
A token is the basic unit of text an AI model processes. It can be a word, part of a word, or punctuation. One token is roughly 3/4 of an English word.
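As a rough illustration of the 3/4 rule, the sketch below estimates a token count from a word count. The helper name estimate_tokens is hypothetical, and the result is only a ballpark figure for English text; an exact count requires running the tokenizer of the specific model.

```python
WORDS_PER_TOKEN = 0.75  # Rule of thumb for English: one token is about 3/4 of a word.


def estimate_tokens(text):
    """Estimate the token count of English text from its whitespace-separated word count."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    sample = " ".join(["word"] * 75)
    print(estimate_tokens(sample))  # 75 words map to an estimate of 100 tokens
```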
Why do tokens matter?
Tokens determine API pricing, context window limits, and processing speed. Understanding token counts helps optimize cost and performance when using AI models.
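The sketch below shows how token counts could feed into a cost estimate and a context window check. The prices and the window size are placeholder values, not the rates or limits of any real provider, and the function names are hypothetical. It also assumes that input tokens and the reserved output tokens share one context window, which is common but not universal.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate request cost when input and output tokens are priced separately."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


def fits_context(input_tokens, max_output_tokens, context_window):
    """Check whether the prompt plus the reserved output budget fits in the window.

    Assumes input and output tokens count against the same window.
    """
    return input_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    # Placeholder prices (currency units per million tokens) and window size.
    cost = estimate_cost(1_000_000, 500_000,
                         input_price_per_million=2.0,
                         output_price_per_million=4.0)
    print(cost)
    print(fits_context(7_000, 1_000, context_window=8_000))
    print(fits_context(7_500, 1_000, context_window=8_000))
```

Keeping the price and window values as parameters makes it easy to substitute the actual figures for whichever model is in use.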