ADK primarily uses two mechanisms for model integration:
Direct String / Registry: For models tightly integrated with Google Cloud (like Gemini models accessed via Google AI Studio or Vertex AI) or models hosted on Vertex AI endpoints. You typically provide the model name or endpoint resource string directly to the LlmAgent. ADK's internal registry resolves this string to the appropriate backend client, often utilizing the google-genai library.
Wrapper Classes: For broader compatibility, especially with models outside the Google ecosystem or those requiring specific client configurations (like models accessed via LiteLLM). You instantiate a specific wrapper class (e.g., LiteLlm) and pass this object as the model parameter to your LlmAgent.
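For orientation, the two mechanisms look like this in code. This is a minimal sketch; the model identifiers shown are the same ones used in the examples later on this page:

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# Direct string: ADK's registry resolves the identifier to a Gemini backend client.
gemini_agent = LlmAgent(model="gemini-2.0-flash", name="direct_string_agent")

# Wrapper class: the LiteLlm object carries the provider-specific configuration.
openai_agent = LlmAgent(model=LiteLlm(model="openai/gpt-4o"), name="wrapper_class_agent")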
Using Google Gemini Models
This section covers authenticating with Google's Gemini models, either through Google AI Studio for rapid development or Google Cloud Vertex AI for enterprise applications. This is the most direct way to use Google's flagship models within ADK.
Integration Method: Once you have authenticated using one of the methods below, you can pass the model's identifier string directly to the model parameter of LlmAgent.
The google-genai library, used internally by ADK for Gemini models, can connect through either Google AI Studio or Vertex AI.
Model support for voice/video streaming
To use voice/video streaming in ADK, you need a Gemini model that supports the Live API. You can find the supported model IDs in the documentation:
Google AI Studio: Gemini Live API
Vertex AI: Gemini Live API
Google AI Studio
This is the simplest method and is recommended for getting started quickly.
Authentication Method: API Key
Setup:
Get an API key: Obtain your key from Google AI Studio.
Set environment variables: Create a .env file (Python) or .properties (Java) in your project's root directory and add the following lines. ADK will automatically load this file.
export GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"
export GOOGLE_GENAI_USE_VERTEXAI=FALSE
(or)
Pass these variables during the model initialization via the Client (see example below).
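If you prefer to configure these values in code rather than a .env file, one option (a minimal sketch using only the standard library, distinct from the Client-based approach mentioned above) is to set the environment variables before constructing the agent:

import os

# Set the same variables programmatically; avoid hard-coding real keys in source control.
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "FALSE"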
Google Cloud Vertex AI
For scalable and production-oriented use cases, Vertex AI is the recommended platform. Gemini on Vertex AI supports enterprise-grade features, security, and compliance controls.
Based on your development environment and use case, choose one of the methods below to authenticate. Prerequisite: a Google Cloud project with Vertex AI enabled.
Method A: User Credentials (for Local Development)
Install the gcloud CLI: Follow the official installation instructions.
Log in using ADC: This command opens a browser to authenticate your user account for local development.
gcloud auth application-default login
Set environment variables:
export GOOGLE_CLOUD_PROJECT="YOUR_PROJECT_ID"
export GOOGLE_CLOUD_LOCATION="YOUR_VERTEX_AI_LOCATION" # e.g., us-central1
Explicitly tell the library to use Vertex AI:
export GOOGLE_GENAI_USE_VERTEXAI=TRUE
Models: Find available model IDs in the Vertex AI documentation.
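As an optional sanity check after completing the steps above, you can confirm that Application Default Credentials resolve before wiring up an agent. This sketch uses the google-auth library directly and is not part of ADK:

import google.auth

# Resolves credentials via ADC (set up by `gcloud auth application-default login`).
# Raises google.auth.exceptions.DefaultCredentialsError if no credentials are found.
credentials, project_id = google.auth.default()
print(f"ADC resolved for project: {project_id}")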
Method B: Vertex AI Express Mode
Vertex AI Express Mode offers a simplified, API-key-based setup for rapid prototyping.
Sign up for Express Mode to get your API key.
Set environment variables:
export GOOGLE_API_KEY="PASTE_YOUR_EXPRESS_MODE_API_KEY_HERE"
export GOOGLE_GENAI_USE_VERTEXAI=TRUE
Method C: Service Account (for Production & Automation)
For deployed applications, a service account is the standard method.
Create a Service Account and grant it the Vertex AI User role.
Provide credentials to your application:
On Google Cloud: If you are running the agent on Cloud Run, GKE, a VM, or another Google Cloud service, the environment can provide the service account credentials automatically; you don't have to create a key file.
Elsewhere: Create a service account key file and point to it with an environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"
Instead of a key file, you can also authenticate the service account using Workload Identity, but that is outside the scope of this guide.
from google.adk.agents import LlmAgent
# --- Example using a stable Gemini Flash model ---
agent_gemini_flash = LlmAgent(
# Use the latest stable Flash model identifier
model="gemini-2.0-flash",
name="gemini_flash_agent",
instruction="You are a fast and helpful Gemini assistant.",
# ... other agent parameters
)
# --- Example using a powerful Gemini Pro model ---
# Note: Always check the official Gemini documentation for the latest model names,
# including specific preview versions if needed. Preview models might have
# different availability or quota limitations.
agent_gemini_pro = LlmAgent(
# Use a powerful Pro model identifier (a preview version is shown here)
model="gemini-2.5-pro-preview-03-25",
name="gemini_pro_agent",
instruction="You are a powerful and knowledgeable Gemini assistant.",
# ... other agent parameters
)
Using Anthropic models
You can integrate Anthropic's Claude models into your Java ADK applications using the ADK's Claude wrapper class, either directly with an Anthropic API key or through a Vertex AI backend.
For Vertex AI backend, see the Third-Party Models on Vertex AI section.
Prerequisites:
Dependencies:
Anthropic SDK Classes (Transitive): The Java ADK's com.google.adk.models.Claude wrapper relies on classes from Anthropic's official Java SDK. These are typically included as transitive dependencies.
Anthropic API Key:
Obtain an API key from Anthropic. Securely manage this key using a secret manager.
Using Cloud & Proprietary Models via LiteLLM
To access a vast range of LLMs from providers like OpenAI, Anthropic (non-Vertex AI), Cohere, and many others, ADK offers integration through the LiteLLM library.
Integration Method: Instantiate the LiteLlm wrapper class and pass it to the model parameter of LlmAgent.
LiteLLM Overview: LiteLLM acts as a translation layer, providing a standardized, OpenAI-compatible interface to more than 100 LLMs.
Install LiteLLM
pip install litellm
Example for OpenAI:
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
Example for Anthropic (non-Vertex AI):
export ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY"
Consult the LiteLLM Providers Documentation for the correct environment variable names for other providers.
Example:
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
# --- Example Agent using OpenAI's GPT-4o ---
# (Requires OPENAI_API_KEY)
agent_openai = LlmAgent(
model=LiteLlm(model="openai/gpt-4o"), # LiteLLM model string format
name="openai_agent",
instruction="You are a helpful assistant powered by GPT-4o.",
# ... other agent parameters
)
# --- Example Agent using Anthropic's Claude Haiku (non-Vertex) ---
# (Requires ANTHROPIC_API_KEY)
agent_claude_direct = LlmAgent(
model=LiteLlm(model="anthropic/claude-3-haiku-20240307"),
name="claude_direct_agent",
instruction="You are an assistant powered by Claude Haiku.",
# ... other agent parameters
)
Using Open & Local Models via LiteLLM
For maximum control, cost savings, privacy, or offline use cases, you can run open-source models locally or self-host them and integrate them using LiteLLM.
Integration Method: Instantiate the LiteLlm wrapper class, configured to point to your local model server.
Ollama Integration
Ollama allows you to easily run open-source models locally.
Model choice
If your agent relies on tools, make sure you select a model with tool support from the Ollama website.
For reliable results, we recommend using a decent-sized model with tool support.
You can check a model's tool support with the following command:
ollama show mistral-small3.1
Model
architecture mistral3
parameters 24.0B
context length 131072
embedding length 5120
quantization Q4_K_M
Capabilities
completion
vision
tools
You should see tools listed under Capabilities.
You can also look at the template the model is using and tweak it based on your needs.
ollama show --modelfile llama3.2 > model_file_to_modify
For instance, the default template for the above model effectively instructs it to call a function every time, which can result in an infinite loop of function calls:
Given the following functions, please respond with a JSON for a function call
with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of
argument name and its value}. Do not use variables.
You can replace such a prompt with a more descriptive one to prevent infinite tool-call loops, for example:
Review the user's prompt and the available functions listed below.
First, determine if calling one of these functions is the most appropriate way to respond. A function call is likely needed if the prompt asks for a specific action, requires external data lookup, or involves calculations handled by the functions. If the prompt is a general question or can be answered directly, a function call is likely NOT needed.
If you determine a function call IS required: Respond ONLY with a JSON object in the format {"name": "function_name", "parameters": {"argument_name": "value"}}. Ensure parameter values are concrete, not variables.
If you determine a function call IS NOT required: Respond directly to the user's prompt in plain text, providing the answer or information requested. Do not output any JSON.
Using ollama_chat provider
Our LiteLLM wrapper can be used to create agents with Ollama models.
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# roll_die and check_prime are plain Python functions used as tools
# (an illustrative sketch of them appears just below this example).
root_agent = Agent(
model=LiteLlm(model="ollama_chat/mistral-small3.1"),
name="dice_agent",
description=(
"hello world agent that can roll a dice of 8 sides and check prime"
" numbers."
),
instruction="""
You roll dice and answer questions about the outcome of the dice rolls.
""",
tools=[
roll_die,
check_prime,
],
)
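The example above assumes roll_die and check_prime are plain Python functions registered as tools. A minimal illustrative sketch of what they might look like (the exact names and signatures are taken from the example's description, not prescribed by ADK):

import random

def roll_die(sides: int = 8) -> int:
    """Rolls a die with the given number of sides and returns the result."""
    return random.randint(1, sides)

def check_prime(number: int) -> bool:
    """Returns True if the given number is prime."""
    if number < 2:
        return False
    for i in range(2, int(number ** 0.5) + 1):
        if number % i == 0:
            return False
    return True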
Using openai provider
Alternatively, openai can be used as the provider name, but this requires setting the OPENAI_API_BASE=http://localhost:11434/v1 and OPENAI_API_KEY=anything environment variables instead of OLLAMA_API_BASE. Note that the API base now ends with /v1.
root_agent = Agent(
model=LiteLlm(model="openai/mistral-small3.1"),
name="dice_agent",
description=(
"hello world agent that can roll a dice of 8 sides and check prime"
" numbers."
),
instruction="""
You roll dice and answer questions about the outcome of the dice rolls.
""",
tools=[
roll_die,
check_prime,
],
)
Set the environment variables and then launch the agent, for example with adk web:
export OPENAI_API_BASE=http://localhost:11434/v1
export OPENAI_API_KEY=anything
adk web
You can see the request sent to the Ollama server by adding the following to your agent code just after the imports.
import litellm
litellm._turn_on_debug()
Request Sent from LiteLLM:
curl -X POST \
http://localhost:11434/api/chat \
-d '{'model': 'mistral-small3.1', 'messages': [{'role': 'system', 'content': ...
Self-Hosted Endpoint (e.g., vLLM)
Tools such as vLLM allow you to host models efficiently and often expose an OpenAI-compatible API endpoint.
Setup:
Deploy Model: Deploy your chosen model using vLLM (or a similar tool). Note the API base URL (e.g., https://your-vllm-endpoint.run.app/v1).
Important for ADK Tools: When deploying, ensure the serving tool supports and enables OpenAI-compatible tool/function calling. For vLLM, this might involve flags like --enable-auto-tool-choice and potentially a specific --tool-call-parser, depending on the model. Refer to the vLLM documentation on Tool Use.
Authentication: Determine how your endpoint handles authentication (e.g., API key, bearer token).
Integration Example:
import subprocess
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
# --- Example Agent using a model hosted on a vLLM endpoint ---
# Endpoint URL provided by your vLLM deployment
api_base_url = "https://your-vllm-endpoint.run.app/v1"
# Model name as recognized by *your* vLLM endpoint configuration
model_name_at_endpoint = "hosted_vllm/google/gemma-3-4b-it" # Example from vllm_test.py
# Authentication (Example: using gcloud identity token for a Cloud Run deployment)
# Adapt this based on your endpoint's security
try:
gcloud_token = subprocess.check_output(
["gcloud", "auth", "print-identity-token", "-q"]
).decode().strip()
auth_headers = {"Authorization": f"Bearer {gcloud_token}"}
except Exception as e:
print(f"Warning: Could not get gcloud token - {e}. Endpoint might be unsecured or require different auth.")
auth_headers = None # Or handle error appropriately
agent_vllm = LlmAgent(
model=LiteLlm(
model=model_name_at_endpoint,
api_base=api_base_url,
# Pass authentication headers if needed
extra_headers=auth_headers
# Alternatively, if endpoint uses an API key:
# api_key="YOUR_ENDPOINT_API_KEY"
),
name="vllm_agent",
instruction="You are a helpful assistant running on a self-hosted vLLM endpoint.",
# ... other agent parameters
)