Thursday, October 2, 2025

What is Google AgentSpace?

AgentSpace is a dedicated, enterprise-grade platform designed by Google (often integrated within Vertex AI) for the complete lifecycle management of complex, autonomous AI Agents.


It moves AI Agents—which are programs built on Large Language Models (LLMs) like Gemini that can reason, plan, and use external tools/APIs—from research prototypes into reliable, scalable, and governed business solutions.


Think of AgentSpace as the operating system or orchestration layer for your organization's fleet of AI assistants. It provides the tooling necessary to manage the complexity that comes from agents making decisions and taking actions autonomously.


What Does AgentSpace Do?

AgentSpace provides a centralized environment for four core functions related to AI Agents:


Building and Iteration: It offers frameworks and templates to define an agent's reasoning capabilities, its permitted external tools (APIs, databases), and its core mission (e.g., "The Customer Service Agent").


Deployment: It handles the transition from a development environment to a production environment, ensuring the agent is containerized, secure, and ready to handle high traffic.


Governance and Safety: It allows developers to define guardrails and constraints to ensure the agent's actions are safe, ethical, and comply with corporate policy.


Monitoring and Evaluation: It continuously tracks the agent's performance, latency, failure rates, and reasoning paths, allowing for rapid debugging and improvement.


How AgentSpace Benefits Enterprises

The value of AgentSpace lies in solving the specific challenges that arise when autonomous AI agents are integrated into critical business operations:


1. Robust Governance and Auditability

In an enterprise, every system action must be traceable. Since an AI agent makes its own decisions (e.g., calling an internal API or creating a ticket), strict control is necessary.


Benefit: AgentSpace provides detailed logging and audit trails for every action an agent takes, every tool it calls, and every internal reasoning step. This ensures regulatory compliance and provides a clear chain of accountability.


Safety Guards: It allows the enterprise to define security parameters—what APIs the agent is allowed to call, what data tables it is prohibited from accessing—thereby mitigating security and compliance risks.


2. Scalability and Reliability (Observability)

An agent that works well in testing must scale to handle thousands or millions of user interactions.


Benefit: AgentSpace is built on cloud infrastructure designed for massive scale. It handles load balancing and resource allocation automatically. More importantly, it provides deep observability tools (dashboards, metrics) that track agent performance in real-time. This helps enterprises quickly identify and fix issues like agents getting stuck in loops, using outdated information, or generating high-latency responses.


3. Accelerated Time-to-Value

Building a complex, custom agent often involves stitching together multiple tools, models, and data sources.


Benefit: The platform provides pre-integrated tools and frameworks that simplify the creation of complex agents. By managing the underlying infrastructure, versioning, and deployment logic, AgentSpace dramatically reduces the time required for developers to move an agent from a concept to a reliable production service. This means faster delivery of capabilities like automated triage, complex data analysis assistants, and autonomous execution of workflows.

What are Gemini Gems?

A "Gem" is essentially a dedicated, personalized workspace powered by the Gemini model. You can think of it as your own private, tailored AI assistant created for a specific purpose or project.


The core idea behind Gems is to give users control over the scope and focus of their conversations, offering a middle ground between a general public chat and a highly customized application.


Key Characteristics of Gems:

Specialization: You can create a Gem with a specific persona and instructions. For example:


A "Coding Coach" Gem focused only on Python and Docker.


A "Travel Planner" Gem focused only on itinerary creation and logistics.


A "Creative Writer" Gem focused on fiction and storytelling.


Isolated Context: A Gem maintains its own history and context, separate from your main Gemini chat history. This isolation helps keep conversations focused and prevents context from bleeding across unrelated topics.


Efficiency: Because the Gem has a defined role, it is often more efficient and accurate in responding to specialized prompts within that domain.


What is "Saved Info in Gems"?

"Saved Info" is the feature that allows you to provide a Gem with long-term, persistent context and preference data that it uses across all your future interactions with that specific Gem.


This is fundamentally different from standard chat history, where the model only remembers what was discussed in the current thread.


The Purpose of Saved Info:

Personalized Grounding: You can input explicit, private data that the Gem should always reference.


Consistent Persona: The Gem can use this information to maintain consistency and relevance over time.



In short, Gems are the personalized chat environments, and Saved Info is the specific, long-term memory that makes each Gem uniquely useful to you by eliminating the need to repeat your preferences in every new conversation.



Wednesday, October 1, 2025

Google Cloud Learning - GenMedia MCP server

You can use the Firebase MCP server to give AI-powered development tools the ability to work with your Firebase projects. The Firebase MCP server works with any tool that can act as an MCP client, including Claude Desktop, Cline, Cursor, Visual Studio Code Copilot, Windsurf Editor, and more.

An editor configured to use the Firebase MCP server can use its AI capabilities to help you:


Create and manage Firebase projects

Manage your Firebase Authentication users

Work with data in Cloud Firestore and Firebase Data Connect

Retrieve Firebase Data Connect schemas

Understand your security rules for Firestore and Cloud Storage for Firebase

Send messages with Firebase Cloud Messaging
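
To use the Firebase MCP server from Gemini CLI, you register it in settings.json like any other MCP server. Here is a minimal sketch, assuming you have Node.js installed; the server ships as part of the firebase-tools CLI and is launched via npx (the exact arguments may vary across firebase-tools versions, so check the Firebase docs):

"firebase": {
  "command": "npx",
  "args": ["-y", "firebase-tools", "experimental:mcp"]
}

This uses the Stdio transport: Gemini CLI launches the process locally and communicates with it over standard input/output.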



MCP Servers for Genmedia x Gemini CLI


What is the "Genmedia x Gemini CLI" Context?

Before defining MCP, let's look at the components:


Gemini CLI: The command-line interface used to interact with the Gemini model family, allowing developers and users to trigger GenAI tasks, invoke tools, and manage input/output data directly from the terminal.


Genmedia: This is the term for Google's generative media models and services, focused on handling, processing, and generating video, audio, music, and high-resolution images. These workloads are extremely resource-intensive.


The MCP servers are the bridge that exposes the "Genmedia" part of the equation to the Gemini CLI.


The Role of MCP Servers

Here, MCP stands for the Model Context Protocol, the same protocol behind the other MCP servers covered in this blog. The Genmedia MCP servers expose generative media capabilities as tools that MCP clients such as the Gemini CLI can discover and invoke, while the heavy computation runs on specialized backend infrastructure.


That backend infrastructure is designed to address the unique challenges of generative media:


1. High-Performance Hardware

These are not general-purpose virtual machines. The services behind the Genmedia tools are provisioned with specialized hardware necessary to run state-of-the-art media and AI models efficiently:


GPUs/TPUs: They are powered by massive arrays of Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs), which are essential for the parallel computations required by large transformer models like Gemini.


Large Memory and VRAM: Generative media tasks (especially video) require large amounts of Video RAM (VRAM) and system memory to hold both the large models and the massive input/output files.


2. High Throughput & Low Latency

Processing a 4K video or generating several minutes of complex animation requires moving terabytes of data quickly.


High-Speed Networking: These systems are equipped with extremely high-bandwidth networking (often 100 Gbps or higher) to minimize the latency involved in reading media from storage, running it through the model, and writing the result back.


Optimized Storage: They often interface directly with low-latency, high-throughput storage systems tailored for media workloads.


3. Dedicated Workloads for Genmedia

When you use the Gemini CLI to initiate a video generation task (a Genmedia workload), the MCP server transparently routes that request to this specialized backend infrastructure, which can complete the task economically and quickly.
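
To try this from Gemini CLI, you register the corresponding Genmedia MCP server in settings.json. Here is a minimal sketch for an image-generation server; the binary name (mcp-imagen-go) and the environment variable names are assumptions based on Google's open-source Genmedia MCP tooling, so check the project's README for the exact values:

"imagen": {
  "command": "mcp-imagen-go",
  "env": {
    "PROJECT_ID": "your-google-cloud-project-id",
    "LOCATION": "us-central1"
  }
}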


Tuesday, September 30, 2025

Google Cloud Learning - Context7 MCP Server

Context7 provides up-to-date documentation for LLMs and AI code editors. If you want to supply the LLM with the latest documentation for the framework of your choice, the Context7 MCP server is a good one to configure.

Make sure that your library is listed on the Context7 home page.

Here is the MCP server entry that you need to add to the settings.json file:

"context7": {

      "httpUrl": "https://mcp.context7.com/mcp"

    }
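
Note that this entry goes inside the mcpServers block of settings.json (described in detail in the "Configuring MCP Servers" post below), so the full block looks like:

"mcpServers": {
  "context7": {
    "httpUrl": "https://mcp.context7.com/mcp"
  }
}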

Once the MCP server is configured and Gemini CLI has loaded it successfully, you should be able to view the Context7 tools (for example, by running the /mcp command).

You can now be specific in your prompt and ask Gemini CLI to use Context7 for the latest documentation while generating your application or code snippet with a specific framework.

Here is an example prompt where I want to write an Agent using the Agent Development Kit (ADK) from Google. In the prompt, I specify that the latest ADK documentation should be looked up via the Context7 MCP server.

I am working on coding an Agent using the Agent Development Kit (ADK) from Google. I would like to know how to create the LLMAgent in Python. Use Context7 for the latest documentation on ADK and specifically use /google/adk-python, /google/adk-docs and adk.wiki 

Google Slides MCP Server

The GitHub project at https://github.com/matteoantoci/google-slides-mcp provides an MCP server for interacting with the Google Slides API. It allows you to create, read, and modify Google Slides presentations programmatically.

The steps to configure the MCP server are given in the project. You will need a Node.js environment to build the server, a Google Cloud project with OAuth 2.0 credentials, and then an entry for the MCP server in the settings.json file, as sketched below.
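
For reference, the resulting settings.json entry typically looks like this sketch; the build path and environment variable names here are assumptions for illustration, so take the exact values from the project's README:

"google-slides-mcp": {
  "command": "node",
  "args": ["/path/to/google-slides-mcp/build/index.js"],
  "env": {
    "GOOGLE_CLIENT_ID": "YOUR_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET": "YOUR_CLIENT_SECRET",
    "GOOGLE_REFRESH_TOKEN": "YOUR_REFRESH_TOKEN"
  }
}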

Once set up, you can run prompts like:

Extract the latest information from "web_url", summarize it into key points and create a presentation named "my_presentation".

Give it a try!


Google Cloud Learning - GitHub MCP Server

The official GitHub MCP Server provides thorough documentation on the tools that it exposes, along with how to configure them. You can choose between running it locally or remotely, since Gemini CLI supports remote MCP servers too.


Once you have a GitHub Personal Access Token (PAT), you will need to add the MCP server object to the settings.json file. The complete settings.json file on my system is shown below. You might have additional settings, but the mcpServers object should be as given below:


{
  "theme": "Default",
  "selectedAuthType": "oauth-personal",
  "mcpServers": {
    "github": {
      "httpUrl": "https://api.githubcopilot.com/mcp/",
      "headers": {
        "Authorization": "Bearer GITHUB_PAT"
      },
      "timeout": 5000
    }
  }
}

Replace GITHUB_PAT with your actual token, keeping the Bearer prefix in the Authorization header.



Once you have updated the settings.json with the GitHub MCP server configuration, you can either restart Gemini CLI or run the /mcp refresh command. The screenshot below highlights the GitHub MCP server configured on my machine and the various tools that are now available to Gemini CLI.


Try an example question like the one below:


"Who am on GitHub?"


Notice that it will pick the correct tool from the GitHub MCP server, but as with the other built-in tools, it will also require your explicit permission before invoking the tool. Go ahead and see what output you get.


You can now work with one of your GitHub projects. Give your queries in natural language, for example:


Describe the <repo-name> repository to me.

Clone the <repo-name> to my local machine.

Describe @<file-name> or @<directory-name>/

What are the different components of this repository?

I have made the necessary changes. Can you push them to GitHub using the GitHub MCP Server tools?



Google Cloud Learning - Configuring MCP Servers

An MCP server is an application that exposes tools and resources to the Gemini CLI through the Model Context Protocol, allowing it to interact with external systems and data sources. MCP servers act as a bridge between the Gemini model and your local environment or other services like APIs.


An MCP server enables the Gemini CLI to discover and execute tools, thereby extending Gemini CLI's capabilities to perform actions beyond its built-in features, such as interacting with databases, APIs, custom scripts, or specialized workflows.


You can configure MCP servers at the global level in the ~/.gemini/settings.json file, or per project in the .gemini/settings.json file within your project's root directory. Create or open that file, then add the mcpServers configuration block, as shown below:


"mcpServers": {

    "server_name_1": {},

    "server_name_2": {},

    "server_name_n": {}

 }


Each server configuration supports the following properties:


Required (one of the following)


command (string): Path to the executable for Stdio transport

url (string): SSE endpoint URL (e.g., "http://localhost:8080/sse")

httpUrl (string): HTTP streaming endpoint URL



Optional


args (string[]): Command-line arguments for Stdio transport

headers (object): Custom HTTP headers when using url or httpUrl

env (object): Environment variables for the server process. Values can reference environment variables using $VAR_NAME or ${VAR_NAME} syntax

cwd (string): Working directory for Stdio transport

timeout (number): Request timeout in milliseconds (default: 600,000ms = 10 minutes)

trust (boolean): When true, bypasses all tool call confirmations for this server (default: false)

includeTools (string[]): List of tool names to include from this MCP server. When specified, only the tools listed here will be available from this server (whitelist behavior). If not specified, all tools from the server are enabled by default.

excludeTools (string[]): List of tool names to exclude from this MCP server. Tools listed here will not be available to the model, even if they are exposed by the server. Note: excludeTools takes precedence over includeTools - if a tool is in both lists, it will be excluded.
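
Putting several of these properties together, here is a sketch of a hypothetical Stdio-transport server entry (the server name, script, and tool names are invented for illustration):

"mcpServers": {
  "myPythonServer": {
    "command": "python",
    "args": ["mcp_server.py", "--port", "8080"],
    "cwd": "./mcp_tools/python",
    "env": {
      "API_KEY": "$MY_API_TOKEN"
    },
    "timeout": 15000,
    "includeTools": ["safe_tool", "file_reader"]
  }
}

Here Gemini CLI would launch the script with the given arguments and working directory, resolve $MY_API_TOKEN from your environment, and expose only the two whitelisted tools.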



Google Cloud Learning - Gemini CLI Shell mode

Gemini Shell mode (sometimes referred to simply as "Shell" or "Command Line Mode") is a specialized environment where Gemini acts as a highly intelligent terminal assistant. Instead of just generating explanatory text about how to perform a task, it can generate, execute, and debug commands directly in your command-line environment.


It's essentially a way to get help with scripting, debugging, configuration, and environment setup without leaving the command line.


You can enter shell mode by pressing ! in the message box. This toggles shell mode on; you can get back by pressing ! again or by hitting the ESC key.


When in shell mode, you will see the ! at the start of the prompt as shown below:


You can directly run standard commands like pwd and ls as shown below. Please note that these commands assume a Linux-like OS. If you are on another operating system, such as Windows, use the equivalent commands (cd, dir, and type). Do note that the output of these calls is included in the model's context window.
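
For example, a quick session in shell mode might look like this (the output is illustrative; yours will reflect your own machine):

! pwd
/home/user/my-project

! ls
README.md  src  tests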




How Gemini Shell Mode Works

The core idea is to bridge the gap between human language and complex command-line actions:


1. Context and Execution

When you enter Shell mode, you give Gemini a high-level task (e.g., "Find all Python files that haven't been modified in the last 30 days and list their sizes").


2. Command Generation

Gemini translates your request into the appropriate command-line syntax (e.g., a complex find command with arguments).


3. Verification and Execution

Crucially, Gemini typically asks you to verify the command before executing it in your environment. Once executed, it displays the output, just as a real terminal would.


4. Iteration and Debugging

If the command fails or if the output isn't what you expected, you can tell Gemini: "That output is too verbose; just show me the file names." Gemini will then generate a new, refined command, often involving piping the output to other tools like grep or awk.


Main Benefits of Using Shell Mode

Learning and Exploration: It helps users quickly learn complex shell commands (bash, zsh, etc.) by showing the exact syntax and explaining the function of each flag.


Time Savings: It eliminates the need to look up documentation for tricky commands (like awk or complex sed expressions).


Debugging Assistance: It helps diagnose issues with scripts, environment variables, and missing libraries quickly, especially in environments like Docker or remote servers.


Complex Task Automation: It can combine multiple steps (e.g., compress a directory, encrypt it, and upload it to a remote server) into a sequence of executable, verified commands.


In short, it transforms the command line from a strictly manual interface into a collaborative workspace with an AI expert.