In the context of large language models, a token is a unit of text that the model processes. A token may be a single character, a punctuation mark, a fragment of a word, or an entire word; the exact segmentation depends on the tokenization algorithm the model uses. For example:
The word “computer” is one token.
The sentence “Hello, how are you?” consists of 6 tokens: “Hello”, “,”, “how”, “are”, “you”, “?”
Typically, the model splits longer texts into smaller components (tokens) for efficient processing, making it easier to understand, generate, and manipulate text at a granular level.
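To see exactly how a given tokenizer splits text, you can count tokens programmatically. Below is a minimal sketch using tiktoken, OpenAI's open-source tokenizer library; the cl100k_base encoding is an assumption here, and the counts you get depend on which encoding your model actually uses.

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by several OpenAI chat models;
# choose the encoding that matches your model.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["computer", "Hello, how are you?"]:
    token_ids = enc.encode(text)
    print(f"{text!r} -> {len(token_ids)} tokens: {token_ids}")
```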
For many LLMs, including OpenAI’s GPT models, usage costs are determined by the number of tokens processed: both input tokens (the text prompt given to the model) and output tokens (the text the model generates). Because the computational cost of serving these models scales with the amount of text processed, token-based pricing provides a fair and scalable way to charge for usage.
Calculating Tokens in a Request
Before diving into cost calculation, let’s break down how tokens are accounted for in a request:
Input Tokens:
The text or query sent to the model is split into tokens. For example, if you send a prompt like “What is the capital of France?”, the prompt is tokenized, and every word and punctuation mark contributes to the token count.
Output Tokens:
The response generated by the model also consists of tokens. For example, if the model responds with “The capital of France is Paris.”, the words in this sentence are tokenized as well.
For instance:
Input: “What is the capital of France?” (7 tokens)
Output: “The capital of France is Paris.” (7 tokens)
Total tokens used in the request: 14 tokens
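A request’s billable total is simply the sum of the two counts. A short sketch of that accounting, again assuming tiktoken with the cl100k_base encoding:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "What is the capital of France?"
response = "The capital of France is Paris."

input_tokens = len(enc.encode(prompt))
output_tokens = len(enc.encode(response))

# With cl100k_base this typically prints 7 + 7 = 14 tokens,
# matching the example above.
print(f"{input_tokens} + {output_tokens} = {input_tokens + output_tokens} tokens")
```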
Step-by-Step Guide to Calculating the Cost
1. Tokenize the Input and Output
First, determine the number of tokens in your input text and the model’s output.
Example:
Input Prompt: “What is the weather like in New York today?” (assume 8 tokens)
Output: “The weather in New York today is sunny with a high of 75 degrees.” (assume 14 tokens)
Total Tokens: 8 + 14 = 22 tokens
2. Identify the Pricing for the Model
Pricing will vary depending on the model provider. For this example, let’s assume the pricing is:
$0.02 per 1,000 tokens
3. Calculate Total Cost Based on Tokens
Multiply the total number of tokens by the rate per 1,000 tokens:
Total Cost = (22 ÷ 1,000) × $0.02 = $0.00044
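The same arithmetic as a small helper function; a minimal sketch in Python, where the $0.02-per-1,000-tokens rate is the assumed example price above, not a real published rate:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  rate_per_1k: float = 0.02) -> float:
    """Estimate request cost as (total tokens / 1,000) * rate per 1,000 tokens."""
    total_tokens = input_tokens + output_tokens
    return (total_tokens / 1000) * rate_per_1k

print(f"${estimate_cost(8, 14):.5f}")  # $0.00044 -- matches the worked example
```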
Factors Influencing Token Costs
Several factors can influence the number of tokens generated and therefore the overall cost:
Length of Input Prompts:
Longer prompts result in more input tokens, increasing the overall token count.
Length of Output Responses:
If the model generates lengthy responses, more tokens are used, leading to higher costs.
Complexity of the Task:
More complex queries that require detailed explanations or multiple steps will result in more tokens, both in the input and output.
Model Used:
Different models (e.g., GPT-3, GPT-4) may have different token limits and pricing structures. More advanced models typically charge higher rates per 1,000 tokens.
Token Limits Per Request:
Many LLM providers impose token limits on each request. For instance, a single request might be capped at 2,048 or 4,096 tokens, including both input and output tokens.
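Rather than estimating, most providers report the exact token counts on every response, which is the most reliable way to track consumption against these limits and against your budget. A sketch assuming the official OpenAI Python SDK (the model name is only an example):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; substitute your own
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# The usage object reports exactly what the request will be billed for.
print(resp.usage.prompt_tokens)      # input tokens
print(resp.usage.completion_tokens)  # output tokens
print(resp.usage.total_tokens)       # input + output
```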
Reducing Costs When Using LLMs
Optimize Prompts:
Keep prompts concise but clear to minimize the number of input tokens. Avoid unnecessary verbosity.
Limit Response Length:
Control the length of the model’s output with the max tokens parameter (see the sketch after this list). This prevents the model from generating overly long responses, saving on output tokens.
Batch Processing:
If possible, group related queries together to reduce the number of individual requests.
Choose the Right Model:
Use smaller models when applicable, as they are often cheaper per token compared to larger, more advanced models.
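For the “Limit Response Length” tip above, here is a minimal sketch of capping output with the max_tokens parameter, again assuming the OpenAI Python SDK and an example model name:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; substitute your own
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_tokens=100,  # hard cap on output tokens; the reply is cut off at the limit
)

print(resp.choices[0].message.content)
print(resp.usage.completion_tokens)  # will not exceed 100
```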