The maximum context length (or token limit) for various LLMs depends on the specific model you are using. Here’s a general breakdown for common LLMs and their context lengths:
1. OpenAI GPT Models:
GPT-3.5 (gpt-3.5-turbo, text-davinci-003): 4,096 tokens (the 16k variant supports 16,384 tokens)
GPT-4 (8k variant): 8,192 tokens
GPT-4 (32k variant): 32,768 tokens
2. Anthropic Claude:
Claude 2: 100,000 tokens (earlier Claude 1.x releases supported smaller windows, with newer versions supporting larger contexts)
3. LLaMA (Meta):
LLaMA-2 (7B, 13B, 70B): 4,096 tokens (extended-context fine-tunes may support more)
4. Cohere:
Cohere Command: 4,096 tokens
5. Mistral:
Mistral models: typically 8,192 tokens or more, depending on the variant and any fine-tuning.
Understanding Token Limits:
Tokens are units of text; a token can be as short as a single character or as long as a whole word. For example, "chatGPT is great!" might be split into 6 tokens such as ["chat", "G", "PT", " is", " great", "!"], though the exact split depends on the model's tokenizer.
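If you want to see the actual split, here is a minimal sketch using OpenAI's tiktoken library; other model families (Claude, LLaMA, Mistral) use their own tokenizers, so their counts will differ:

```python
# Sketch: inspect how a string is tokenized with tiktoken.
# cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("chatGPT is great!")
print(len(token_ids))                        # number of tokens
print([enc.decode([t]) for t in token_ids])  # the individual token strings
```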
When providing context (like cli_retriever) or a prompt (runcli_prompt), the entire input (context + user question) must stay within the token limit. If the combined size exceeds it, the request will either be rejected with an error or the input will be truncated, depending on the API.
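A hedged sketch of that check: count the context and the question together and leave headroom for the model's answer, since the completion has to fit in the same window. The 4,096 limit here is just the GPT-3.5 figure from the list above; substitute your model's limit:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

MODEL_CONTEXT_WINDOW = 4096   # e.g. gpt-3.5-turbo; use your model's limit
ANSWER_HEADROOM = 512         # tokens reserved for the model's completion

def fits_in_window(context: str, question: str) -> bool:
    """Return True if context + question still leave room for the answer."""
    used = len(enc.encode(context)) + len(enc.encode(question))
    return used + ANSWER_HEADROOM <= MODEL_CONTEXT_WINDOW
```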
Determining Token Length in LangChain:
To ensure that your context (cli_retriever) and any additional inputs (e.g., runcli_prompt) fit within the LLM's context window, you can estimate token length or use LangChain utilities to split your input text if necessary (e.g., RecursiveCharacterTextSplitter).
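For example, a sketch of token-aware splitting with RecursiveCharacterTextSplitter; the from_tiktoken_encoder constructor measures chunk_size in tokens rather than characters, and the 1,000/100 figures are placeholder values you would tune to your model:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

long_document_text = "..."  # replace with the raw text you want to index

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # tokenizer used by gpt-3.5-turbo / gpt-4
    chunk_size=1000,              # maximum tokens per chunk
    chunk_overlap=100,            # tokens of overlap between adjacent chunks
)
chunks = splitter.split_text(long_document_text)
```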
So, for your runcli_chain, the maximum size of {"context": cli_retriever, "question": RunnablePassthrough()} depends on the specific LLM you are querying. You would typically set the chain’s limits based on the LLM’s token capacity mentioned above.
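One way to enforce that in practice (a sketch, not the only option) is to post-process the retriever's output so the formatted context never exceeds a token budget before it reaches the prompt. Here cli_retriever, runcli_prompt, and llm are assumed to be the objects already defined in your own code, and MAX_CONTEXT_TOKENS is an assumed budget you would pick from the model's limit:

```python
import tiktoken
from langchain_core.runnables import RunnablePassthrough

enc = tiktoken.get_encoding("cl100k_base")
MAX_CONTEXT_TOKENS = 3000  # leave room for the question and the answer

def format_and_trim(docs):
    """Join retrieved documents and cut off anything past the token budget."""
    text = "\n\n".join(d.page_content for d in docs)
    token_ids = enc.encode(text)
    return enc.decode(token_ids[:MAX_CONTEXT_TOKENS])

runcli_chain = (
    {"context": cli_retriever | format_and_trim, "question": RunnablePassthrough()}
    | runcli_prompt
    | llm
)
```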