Monday, October 7, 2024

What's the maximum token limit or context length for various LLMs?

The maximum context length (or token limit) for various LLMs depends on the specific model you are using. Here’s a general breakdown for common LLMs and their context lengths:


1. OpenAI GPT Models:

GPT-3.5 Turbo: 4,096 tokens (the 16k variant supports 16,384 tokens)

GPT-4 (8k variant): 8,192 tokens

GPT-4 (32k variant): 32,768 tokens

2. Anthropic Claude:

Claude 2: 100,000 tokens; Claude 2.1 and the Claude 3 family support 200,000 tokens

3. LLaMA (Meta):

LLaMA 2 (7B, 13B, 70B): 4,096 tokens

LLaMA 3 (8B, 70B): 8,192 tokens

4. Cohere:

Cohere Command: 4,096 tokens (Command R supports 128,000 tokens)

5. Mistral:

Mistral 7B: 8,192 tokens; Mixtral 8x7B: 32,768 tokens (later releases support longer contexts)

Understanding Token Limits:

Tokens are units of text; a token can be as short as a single character or as long as a word, and the exact split depends on the model's tokenizer. For example, OpenAI's older GPT-3 tokenizer splits "chatGPT is great!" into 6 tokens (["chat", "G", "PT", " is", " great", "!"]).
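To see how a given model tokenizes text, you can count tokens locally. Below is a minimal sketch using OpenAI's tiktoken library (this assumes you are targeting an OpenAI model; other providers ship their own tokenizers):

```python
# Minimal token-counting sketch with tiktoken (pip install tiktoken).
# Counts vary by tokenizer, so treat the output as illustrative.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "chatGPT is great!"
token_ids = enc.encode(text)

print(len(token_ids))                          # number of tokens
print([enc.decode([t]) for t in token_ids])    # the individual token strings
```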

When you provide context (like cli_retriever) together with a prompt (runcli_prompt), the combined length of the context and the user question must stay within the token limit, and the model's generated answer also counts against the same window. If the input exceeds the limit, the request will typically be rejected or truncated.
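As a rough illustration (the limit and headroom values below are assumptions, not part of the original question), you can check whether the retrieved context plus the question still leaves room for the answer before calling the model:

```python
# Hypothetical pre-flight budget check before calling the model.
import tiktoken

MODEL_LIMIT = 4_096      # e.g., GPT-3.5 Turbo's context window (assumed target)
ANSWER_HEADROOM = 512    # tokens reserved for the generated answer (assumed)

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def fits_in_context(context: str, question: str) -> bool:
    # True if context + question + reserved answer space fits the window.
    used = len(enc.encode(context)) + len(enc.encode(question))
    return used + ANSWER_HEADROOM <= MODEL_LIMIT
```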

Determining Token Length in LangChain:

To ensure that your context (cli_retriever) and any additional inputs (e.g., runcli_prompt) fit within the LLM's context window, you can estimate token length (for example with tiktoken, as above) or use LangChain utilities such as RecursiveCharacterTextSplitter to split your input text if necessary.
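For example, here is a sketch of splitting an oversized document into token-sized chunks with RecursiveCharacterTextSplitter; the chunk sizes are assumptions you would tune to your model's limit:

```python
# Sketch: split long text into token-sized chunks
# (pip install langchain-text-splitters tiktoken).
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",  # use the tokenizer of your target model
    chunk_size=1_000,            # tokens per chunk (assumed value)
    chunk_overlap=100,           # overlap preserves context across chunks
)

long_document_text = "..."       # placeholder for your raw or retrieved text
chunks = splitter.split_text(long_document_text)
```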

So, for your runcli_chain, the maximum combined size of {"context": cli_retriever, "question": RunnablePassthrough()} depends on the specific LLM you are querying. You would typically size the retrieved context and prompt based on the LLM's token capacity listed above.
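As a sketch only (cli_retriever is stubbed out below, since its real definition is not shown in the question), the chain might be wired like this, with the chosen model setting the available token budget:

```python
# Hypothetical reconstruction of runcli_chain; the stub retriever stands in
# for your real cli_retriever (e.g., vectorstore.as_retriever()).
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

cli_retriever = RunnableLambda(lambda q: "...retrieved CLI docs...")  # stand-in

runcli_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4")  # 8,192-token context window

runcli_chain = (
    {"context": cli_retriever, "question": RunnablePassthrough()}
    | runcli_prompt
    | llm
    | StrOutputParser()
)

answer = runcli_chain.invoke("How do I list running containers?")
```

If you swap gpt-4 for a model with a smaller window, reduce the retriever's chunk size or the number of retrieved documents accordingly so the assembled prompt stays under the limit.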


