-- Living Mobile --: 2026

Monday, July 13, 2026

What is AgentCore Harness

Every harness session is stateful by default and runs in a secure, isolated microVM per session (backed by AgentCore runtime). The agent has its own filesystem and shell, so it can write code, execute it, and persist short-term and long-term memories and files across sessions, even when the underlying microVM session has expired and been replaced by a new one. Agents can use any model provided by Amazon Bedrock, OpenAI, Google Gemini, or any LiteLLM-compatible provider, and can switch providers mid-session without losing context, so you can plan with one model and execute with another, or swap providers for a price-performance test without rebuilding the conversation. Agents can connect to tools through AgentCore gateway or MCP servers, or use the built-in browser or code interpreter. You can attach AWS skills from Git, S3, or the curated AWS skills catalog with a single toggle, so the agent picks up domain expertise on demand instead of improvising. When you need a custom environment with your own dependencies, you can bring your own container. You can also mount S3 Files or EFS to share data across sessions and harnesses with full S3 durability and history. Every action is traced automatically through AgentCore observability, with a unified view that surfaces what the agent did across every capability in one place, so you stop hopping between log groups to piece together what happened.

You can iterate on real traffic with AgentCore evaluations and optimization to score behavior, get prompt and tool-description recommendations, and run A/B tests with statistical significance reporting per session. Then, roll out changes safely with immutable versions and named endpoints, and roll back instantly by pointing an endpoint at an earlier version. You can drop a harness into a larger pipeline through the AgentCore InvokeHarness state in AWS Step Functions, or export to Strands code (Claude Agent SDK coming soon) and run it on AgentCore runtime when configuration isn’t enough. Everything you need to build, run, and operate production agents, without managing infrastructure. The harness is powered by Strands Agents, the open-source agent framework from AWS.

There is no separate harness charge. You pay only for the underlying AgentCore capabilities you use. For details, see the AgentCore pricing page.

The main benefits of AgentCore Runtime

Amazon Bedrock AgentCore Runtime provides a secure, serverless and purpose-built hosting environment for deploying and running AI agents or tools. It offers the following benefits:

Framework agnostic

AgentCore Runtime lets you turn local agent code into a cloud-native deployment with a few lines of code, no matter the underlying framework. It works with popular frameworks like LangGraph, Strands, and CrewAI, and also with custom agents that don’t use a specific framework.

Model flexibility

AgentCore Runtime works with any Large Language Model, such as models offered by Amazon Bedrock, Anthropic Claude, Google Gemini, and OpenAI.

Protocol support

AgentCore Runtime lets agents communicate with other agents and tools via Model Context Protocol (MCP) or Agent to Agent (A2A).

Session isolation

In AgentCore Runtime, each user session runs in a dedicated microVM with isolated CPU, memory, and filesystem resources. This creates complete separation between user sessions, safeguards stateful agent reasoning processes, and helps prevent cross-session data contamination. After session completion, the entire microVM is terminated and memory is sanitized, delivering deterministic security even when working with non-deterministic AI processes.

Extended execution time

AgentCore Runtime supports both real-time interactions and long-running workloads up to 8 hours, enabling complex agent reasoning and asynchronous workloads that may involve multi-agent collaboration or extended problem-solving sessions.

Persistent filesystems

Runtime supports persisting filesystem state across session stop/resume cycles. The agent’s files, installed packages, and build artifacts can survive session stops without external storage.

Consumption-based pricing model

Runtime implements consumption-based pricing that charges only for resources actually consumed. Unlike allocation-based models that require pre-selecting resources, Runtime dynamically provisions what’s needed without requiring right-sizing. The service aligns CPU billing with actual active processing - typically eliminating charges during I/O wait periods when agents are primarily waiting for LLM responses - while continuously maintaining your session state.

Built-in authentication

AgentCore Runtime, powered by AgentCore Identity, assigns distinct identities to AI agents and integrates with your corporate identity provider such as Okta, Microsoft Entra ID, or Amazon Cognito, so your end users can authenticate into only the agents they have access to. In addition, Runtime supports outbound authentication flows to securely access third-party services like Slack, Zoom, and GitHub - whether operating on behalf of users or autonomously (using either OAuth or API keys).

Agent-specific observability

AgentCore Runtime provides specialized built-in tracing that captures agent reasoning steps, tool invocations, and model interactions, providing clear visibility into agent decision-making processes, a critical capability for debugging and auditing AI agent behaviors.

Enhanced payload handling

AgentCore Runtime can process 100 MB payloads, which makes it possible to handle multiple modalities (text, images, audio, video), rich media content, or large datasets in a single request.

Bidirectional streaming

AgentCore Runtime supports both HTTP API calls and persistent WebSocket connections for real-time bidirectional streaming, enabling interactive applications with immediate response feedback and maintained conversation context.

Unified set of agent-specific capabilities

AgentCore Runtime is delivered through a single, comprehensive SDK that provides streamlined access to the complete AgentCore capabilities including Memory, Tools, and Gateway. This integrated approach eliminates the integration work typically required when building equivalent agent infrastructure from disparate components.

AgentCore harness and AgentCore Runtime

AgentCore harness and AgentCore Runtime solve different parts of the same problem. This page explains the conceptual difference and provides a feature-by-feature comparison to help you choose between them.

Conceptual difference

AgentCore Runtime is a serverless hosting environment. You bring agent code - written in any framework or no framework - wrap it with the AgentCore SDK’s BedrockAgentCoreApp entrypoint, package it into an ARM64 container, push it to Amazon ECR, and deploy. The orchestration loop is yours. To use any other AgentCore primitive (Memory, Gateway, Browser, Code Interpreter, outbound Identity), you call it from your code, typically through the AgentCore SDK. Runtime provides the infrastructure - isolation, scaling, sessions, auth gating, and observability plumbing - while the agent logic is code you write.
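The Runtime side of this split can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the `bedrock_agentcore` Python package exposes `BedrockAgentCoreApp` with an `entrypoint` decorator and a `run()` method, as the AgentCore SDK documentation describes. The handler name `handle`, the `prompt` payload key, and the echo logic are placeholders invented for this example.

```python
from typing import Any, Dict


def handle(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical agent logic: with Runtime, the orchestration loop lives here."""
    prompt = str(payload.get("prompt", ""))
    # A real agent would call a model and tools at this point; this stub echoes.
    return {"result": f"received: {prompt}"}


def build_app():
    """Wrap the handler with the SDK entrypoint (needs the AgentCore SDK installed)."""
    # Imported lazily so the handler stays testable without the SDK.
    from bedrock_agentcore.runtime import BedrockAgentCoreApp

    app = BedrockAgentCoreApp()
    # Assumed to be equivalent to decorating handle with @app.entrypoint.
    app.entrypoint(handle)
    return app


# Inside the container, the server would be started with: build_app().run()
```

Keeping the agent logic in a plain function makes it easy to unit test locally before packaging it into a container.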

AgentCore harness is a managed agent harness - the orchestration loop itself is provided, powered by Strands Agents. You declare what the agent is (model, system prompt, tools, memory, limits) as configuration, and AgentCore runs the loop. Most features are a single config field: switching a model or adding a tool is a config change, not a redeploy. The harness is a managed abstraction that runs inside Runtime - CloudTrail records harness operations under AWS::BedrockAgentCore::Runtime.

For nearly every feature, the pattern is the same:

Harness - configuration, no code.

Runtime - you write code, usually with the AgentCore SDK plus your framework.

Sunday, July 12, 2026

OpenSearch Configurations

Enhance document retrieval in OpenSearch for Generative AI applications by utilizing Hybrid Search, RRF, and metadata filtering. These techniques improve recall and precision. They are executed using OpenSearch's k-NN plugin, Search Pipelines, and Machine Learning (ML) connectors.

1. Metadata Filtering & Enrichment

Instead of a blind vector search, constrain your semantic results with exact structural attributes.

How it helps: Reduces the search space and ensures the LLM only receives context relevant to specific dates, regions, or document categories.
OpenSearch Components: Utilizes Metadata Fields in your mappings along with Post-Filtering or Boolean Query Clauses to combine structured and unstructured data.

2. Hybrid Search (Semantic + Keyword)

Vector embeddings capture context but sometimes miss exact product codes or names. Hybrid search brings the best of both worlds.

How it helps: Blends context-aware semantic search with exact-match keyword search (BM25) to catch both synonyms and specific identifiers.
OpenSearch Components: Executed via Hybrid Queries inside the query DSL, which trigger parallel scoring for different search clauses.

3. Reciprocal Rank Fusion (RRF)

Combining scores from vectors and keyword searches is difficult because they operate on different scales. RRF bypasses score calibration by merging results based solely on their rank.

How it helps: Provides robust, out-of-the-box relevance by prioritizing documents that rank highly across multiple search types.
OpenSearch Components: Handled by the Search Pipeline and configured using the score-ranker-processor (available via the Neural Search plugin).

4. Reranking

First-stage retrievals (like vectors) may pull a wide net of documents, but aren't always precise. A reranker re-orders these results.

How it helps: Uses specialized cross-encoder ML models to deeply evaluate query-to-document context, bringing the most precise answers to the top before sending them to the LLM.
OpenSearch Components: Executed via the Rerank Processor within a Search Pipeline, which can run models locally via OpenSearch’s ML Commons or via external integrations (like Amazon Bedrock or Cohere).
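The rank-only merging behind Reciprocal Rank Fusion can be sketched in a few lines of plain Python. This illustrates the formula (each document scores the sum of `1 / (rank_constant + rank)` over the lists it appears in). It is not the code OpenSearch runs internally. The document IDs are made up, and the rank constant of 60 is the commonly used value, taken here as an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], rank_constant: int = 60
) -> List[str]:
    """Fuse ranked lists of document IDs using only their ranks, not their scores."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by ID to keep output deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 order
vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical k-NN order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear near the top of both lists (`doc_a`, `doc_c`) outrank documents that appear in only one, without ever comparing a BM25 score to a vector distance.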

What is Data Prepper in AWS Ingestion Pipeline?

Data Prepper is an open-source, server-side data collector and streaming ETL engine that powers the OpenSearch Ingestion pipeline. It accepts, filters, transforms, enriches, and routes large-scale observability data (logs, traces, and metrics) before it gets indexed into your cluster. [1, 2]

Core Components

Data Prepper pipelines are configured as customizable streaming graphs. Every pipeline is built with three main elements: [1, 2, 3]

Sources: The input interfaces that receive or pull data. Common sources include OpenTelemetry (OTel) collectors, Apache Kafka, or Amazon S3. [1]
Processors: The data manipulation engines. They are used to filter out noise, mask sensitive data for compliance, parse formats (like using the grok parser), or enrich events with geolocation and metadata. [1, 2, 3, 4, 5]
Sinks: The destinations for your data. Typically, the sink will be an Amazon OpenSearch Service domain or an Amazon OpenSearch Serverless collection, though it can also route to standard output for debugging. [1, 2]
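The source, processor, and sink flow can be mimicked with a toy Python pipeline. To be clear, this is not Data Prepper's API (real pipelines are declared in configuration files). Every function name, both regular expressions, and the sample log lines are invented purely to show how events move through the three stages.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]
Processor = Callable[[Iterable[Event]], Iterator[Event]]

LOG_PATTERN = re.compile(r"(?P<level>[A-Z]+) (?P<message>.*)")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turn raw input lines into events."""
    for line in lines:
        yield {"raw": line}


def parse(events: Iterable[Event]) -> Iterator[Event]:
    """Processor: extract structured fields (a stand-in for grok-style parsing)."""
    for event in events:
        match = LOG_PATTERN.match(event["raw"])
        if match:  # lines that do not match are dropped as noise
            yield match.groupdict()


def drop_debug(events: Iterable[Event]) -> Iterator[Event]:
    """Processor: filter out low-value events."""
    return (event for event in events if event["level"] != "DEBUG")


def mask_emails(events: Iterable[Event]) -> Iterator[Event]:
    """Processor: redact email addresses before anything is stored."""
    for event in events:
        yield {**event, "message": EMAIL_PATTERN.sub("<redacted>", event["message"])}


def run_pipeline(lines: Iterable[str], processors: List[Processor], sink: List[Event]) -> None:
    """Chain source -> processors -> sink; the sink here is just a list."""
    events: Iterable[Event] = source(lines)
    for processor in processors:
        events = processor(events)
    sink.extend(events)


indexed: List[Event] = []
run_pipeline(
    ["INFO login ok for ann@example.com", "DEBUG cache warm", "not a log line"],
    [parse, drop_debug, mask_emails],
    indexed,
)
print(indexed)  # [{'level': 'INFO', 'message': 'login ok for <redacted>'}]
```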

Primary Use Cases

Trace Analytics: It handles high-volume distributed tracing data, allowing developers to visualize event flows and pinpoint performance issues. [1, 2]
Log Ingestion: It normalizes messy, unstructured application and system logs into structured, actionable insights for querying in OpenSearch Dashboards. [1, 2]
Cost Optimization & Security: It applies smart sampling, deduplication, and PII redaction to reduce overall storage costs and ensure data compliance before indexing. [1]

Algorithms in OpenSearch

OpenSearch supports a variety of algorithms for document embedding (the process of converting text to vectors) and search (the process of finding the nearest neighbors). These are typically categorized into **Vector Engines** (the storage and retrieval backends) and **Algorithm Types** (the logic used to traverse the data).

### 1. Vector Search Engines (Backends)

OpenSearch uses two primary "engines" to handle vector data. The choice between them dictates what algorithms you can use:

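* **Lucene:** The engine that uses the vector support built into Apache Lucene, the library underneath OpenSearch. It is tightly integrated, meaning it excels at **filtered vector search** (filtering by metadata like `date` or `category` *while* performing a vector search). Whether it is the default engine depends on your OpenSearch version, so set `engine` explicitly in the mapping.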

* **Faiss (Facebook AI Similarity Search):** A high-performance library built for scale. It offers more sophisticated compression algorithms and is generally faster for very large datasets, though it can be slightly less flexible with complex metadata filtering compared to Lucene.

---

### 2. Approximate Nearest Neighbor (ANN) Algorithms

Since searching through millions of vectors exactly (Exact k-NN) is computationally expensive, OpenSearch uses ANN algorithms to provide fast, near-accurate results.

* **HNSW (Hierarchical Navigable Small World):**

    * **How it works:** It builds a multi-layered graph where the top layers are "expressways" for long-distance jumps and the bottom layer contains all data points for local connections.

    * **Best for:** General use cases requiring high performance and low latency. It is the most common default.

* **IVF (Inverted File Index):**

    * **How it works:** It partitions the vector space into clusters (Voronoi cells). At search time, it only scans the clusters closest to the query vector.

    * **Best for:** Extremely large datasets where you need to balance memory consumption and search speed by only searching a subset of the data.
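The IVF idea (assign vectors to cells, then scan only the nearest cells) is small enough to sketch in plain Python. This is a toy illustration under stated assumptions, not how Faiss or OpenSearch implement it. The centroids are hard-coded rather than learned, the data is four made-up 2-D points, and the `nprobe` parameter name simply mirrors the "number of cells to scan" knob that IVF implementations expose.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors: Dict[str, Vector], centroids: List[Vector]) -> Dict[int, List[str]]:
    """Assign every vector to the cell of its nearest centroid."""
    cells: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
    for doc_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        cells[nearest].append(doc_id)
    return cells


def ivf_search(
    query: Vector,
    vectors: Dict[str, Vector],
    centroids: List[Vector],
    cells: Dict[int, List[str]],
    k: int = 1,
    nprobe: int = 1,
) -> List[Tuple[str, float]]:
    """Scan only the nprobe cells whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = [doc_id for i in probe for doc_id in cells[i]]
    scored = sorted((l2(query, vectors[doc_id]), doc_id) for doc_id in candidates)
    return [(doc_id, dist) for dist, doc_id in scored[:k]]


vectors = {"a": (0.0, 0.1), "b": (0.2, 0.0), "c": (5.0, 5.1), "d": (5.2, 4.9)}
centroids = [(0.0, 0.0), (5.0, 5.0)]  # fixed for the example; normally learned by clustering
cells = build_ivf(vectors, centroids)
hits = ivf_search((4.9, 5.0), vectors, centroids, cells, k=2, nprobe=1)
print([doc_id for doc_id, _ in hits])  # ['c', 'd']
```

With `nprobe=1` only half of the vectors are compared against the query. Raising `nprobe` scans more cells, trading speed for recall, which is the balance the IVF bullet describes.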

---

### 3. Compression and Optimization Algorithms (Faiss Specific)

If you are using the **Faiss** engine, you can apply compression to fit massive vector indexes into RAM:

* **PQ (Product Quantization):** Breaks vectors into small chunks and compresses each chunk. This dramatically reduces memory footprint at the cost of a slight drop in search precision.

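* **SQ (Scalar Quantization):** Stores each vector component at lower precision than a 32-bit float, for example as an 8-bit integer or a 16-bit float (which encodings are available depends on the engine and OpenSearch version). This is highly effective for reducing memory usage while maintaining a high level of search accuracy.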
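To make the memory trade-off concrete, here is a minimal sketch of 8-bit scalar quantization together with the raw storage arithmetic. It illustrates the general technique rather than the encoder any particular engine ships. The value range `[-1.0, 1.0]`, the sample vector, and the one-million-by-768 index size are assumptions chosen for the example.

```python
from typing import List, Sequence


def sq_encode(vector: Sequence[float], lo: float, hi: float) -> List[int]:
    """Map each float in [lo, hi] to an integer code in 0..255."""
    scale = 255.0 / (hi - lo)
    return [max(0, min(255, round((x - lo) * scale))) for x in vector]


def sq_decode(codes: Sequence[int], lo: float, hi: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    step = (hi - lo) / 255.0
    return [lo + code * step for code in codes]


def raw_vector_bytes(num_vectors: int, dimension: int, bytes_per_component: int) -> int:
    """Raw vector storage only; graph and metadata overhead are ignored."""
    return num_vectors * dimension * bytes_per_component


vec = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes = sq_encode(vec, lo=-1.0, hi=1.0)
restored = sq_decode(codes, lo=-1.0, hi=1.0)
max_error = max(abs(a - b) for a, b in zip(vec, restored))

float32_bytes = raw_vector_bytes(1_000_000, 768, 4)  # 3,072,000,000 bytes
int8_bytes = raw_vector_bytes(1_000_000, 768, 1)     # 768,000,000 bytes
```

The round-trip error per component is bounded by half a quantization step, which is why accuracy usually degrades only slightly while raw vector storage shrinks by a factor of four.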

---

### 4. Similarity Metrics (Distance Functions)

The "algorithm" for calculating how similar two vectors are depends on your distance metric, specified via `space_type`:

| Metric | Best For |
| --- | --- |
| **L2 (Euclidean)** | General-purpose spatial similarity. |
| **Cosine Similarity** | Semantic text similarity (especially when document length varies). |
| **Inner Product** | Embeddings that are already normalized to unit length (where it ranks results the same way as Cosine similarity). |
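The three metrics in the table are short formulas, sketched here in plain Python with two made-up vectors. The example also checks the claim in the last row: once vectors are scaled to unit length, inner product and cosine similarity give the same value.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; sensitive to vector length as well as direction."""
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(inner_product(a, a))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the directions only; ignores vector length."""
    return inner_product(a, b) / (norm(a) * norm(b))


def normalize(a: Sequence[float]) -> List[float]:
    length = norm(a)
    return [x / length for x in a]


a = [3.0, 4.0]
b = [4.0, 3.0]
cosine = cosine_similarity(a, b)                     # 24 / 25
raw_ip = inner_product(a, b)                         # 24.0, depends on vector length
unit_ip = inner_product(normalize(a), normalize(b))  # matches the cosine value
```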

---

### 5. Integration Strategies for Embeddings

To get your data *into* the vector space, you must choose an embedding model strategy:

* **External Embedding (Client-side):** You use a service like OpenAI, Cohere, or an internal Python script to convert text to vectors *before* sending it to OpenSearch.

* **ML Commons (In-cluster):** OpenSearch hosts the model directly or reaches it through a connector. This is often the most convenient option because you create an **Ingest Pipeline**, send raw text to it, and OpenSearch automatically uses the configured model (e.g., BERT, RoBERTa) to generate the embedding upon arrival.

---

### Summary Checklist for Choosing

1. **Need metadata filtering?** Use the **Lucene** engine with **HNSW**.

2. **Need maximum scale/memory efficiency?** Use the **Faiss** engine with **IVF + PQ/SQ**.

3. **Accuracy vs. Latency?** Adjust `ef_search` (the number of candidates explored). Higher values increase accuracy but increase search time.
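The first and third checklist items meet in the query body. The helper below builds a k-NN query with a metadata filter as a plain Python dictionary. The field names `embedding` and `category` are hypothetical. Passing `ef_search` per query through `method_parameters` is an assumption that holds only on recent OpenSearch versions, so treat that branch as optional and check your version's documentation.

```python
from typing import Any, Dict, List, Optional


def filtered_knn_query(
    field: str,
    vector: List[float],
    k: int,
    filters: Dict[str, str],
    ef_search: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a k-NN search body that only considers documents matching the filters."""
    knn_clause: Dict[str, Any] = {
        "vector": vector,
        "k": k,
        "filter": {
            "bool": {"must": [{"term": {name: value}} for name, value in filters.items()]}
        },
    }
    if ef_search is not None:
        # Assumption: per-query ef_search is accepted here on recent versions only.
        knn_clause["method_parameters"] = {"ef_search": ef_search}
    return {"size": k, "query": {"knn": {field: knn_clause}}}


body = filtered_knn_query(
    field="embedding",                # hypothetical vector field name
    vector=[0.1, 0.2, 0.3],
    k=5,
    filters={"category": "manual"},   # hypothetical metadata field
    ef_search=100,
)
```

The resulting dictionary can be sent as the JSON body of a `_search` request with whichever client you already use.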


How to Ingest PDFs into OpenSearch for Semantic Search?

Ingesting PDF documents for embedding-based search (often called Semantic Search or Vector Search) in Amazon OpenSearch is a fundamentally different process than standard log or text ingestion.

While standard ingestion involves sending raw text or JSON fields to be indexed by keyword, embedding-based ingestion requires transforming unstructured PDF content into high-dimensional vector embeddings before storing them.

How Embedding-Based Ingestion Differs

Standard Ingestion: You send text directly. OpenSearch uses inverted indexes (like a book's index) to perform keyword matching.
Embedding-Based Ingestion: You must first extract text from the PDF, send that text to an AI model (like Amazon Bedrock or a local model) to generate numerical vectors, and then store those vectors in an OpenSearch knn (k-nearest neighbors) index.

Steps for Embedding-Based Ingestion in OpenSearch

1. Preprocessing and Text Extraction

PDFs are binary files. OpenSearch cannot read them directly. You must first extract the text:

Use a library (like PyPDF2, LangChain, or Amazon Textract) to convert the PDF content into clean, readable text strings.
Chunking: Since AI models have input limits, you must split long PDF text into smaller, overlapping "chunks" (e.g., 500 tokens each).
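A minimal sketch of overlapping chunking, assuming the text has already been extracted from the PDF. It splits on whitespace and counts words as a rough stand-in for tokens, since real token counts depend on the embedding model's tokenizer. The function name and the default sizes are illustrative choices, not values from any library.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of chunk_size words, each sharing overlap words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which tends to help retrieval quality.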

2. Vectorization (The Embedding Step)

You transform your text chunks into vectors.

Send your chunks to an embedding model (e.g., amazon.titan-embed-text-v1 via Amazon Bedrock).
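The model returns a list of floating-point numbers representing the semantic meaning of that chunk. The length of the list is fixed by the model (e.g., 1,536 dimensions for amazon.titan-embed-text-v1), and the index's vector field must be configured with the same dimension.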
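The embedding call can be sketched as a small function that takes a Bedrock runtime client as an argument. The request and response shapes (`inputText` in, `embedding` out) follow the Titan text embedding format as I understand it, so verify them against the Bedrock documentation for the model you actually use. The function name `embed_text` is made up for this example.

```python
import json
from typing import Any, List


def embed_text(
    client: Any, text: str, model_id: str = "amazon.titan-embed-text-v1"
) -> List[float]:
    """Return the embedding vector for one text chunk."""
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; read it and parse the JSON payload.
    payload = json.loads(response["body"].read())
    return payload["embedding"]


# With AWS credentials and model access configured, the client would be created as:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   vector = embed_text(client, "first chunk of the PDF")
```

Passing the client in, rather than creating it inside the function, keeps the function easy to test with a stub and lets the caller control region and credentials.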

3. Configuring an OpenSearch `knn` Index

Before you store the vectors, your index must be configured to support Vector Search.

You create an index with a knn_vector field type.
You can define an engine (e.g., faiss or lucene; nmslib appears in older examples but is deprecated in recent releases) and a method (e.g., hnsw for high performance) to allow for efficient similarity searching.
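An index definition along these lines can be built as a Python dictionary and sent as the body of the index-creation request. The field names `text` and `embedding` and the HNSW parameter values are illustrative assumptions. The `dimension` must match the output size of your embedding model.

```python
from typing import Any, Dict


def knn_index_body(
    dimension: int,
    field: str = "embedding",
    engine: str = "faiss",
    space_type: str = "l2",
    m: int = 16,
    ef_construction: int = 128,
) -> Dict[str, Any]:
    """Build settings and mappings for an index with one knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},  # enables k-NN search on this index
        "mappings": {
            "properties": {
                "text": {"type": "text"},  # the raw chunk, kept for keyword search and display
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": engine,
                        "space_type": space_type,
                        "parameters": {"m": m, "ef_construction": ef_construction},
                    },
                },
            }
        },
    }


index_body = knn_index_body(dimension=1536)
```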

4. Ingestion Pipeline / Integration

You have two primary ways to connect this:

The Ingest Pipeline (Recommended): You can use an OpenSearch ingest pipeline to automatically call your embedding model (like Bedrock) while the document is being indexed. This keeps the vectorization logic inside the OpenSearch ecosystem.
Application-Side Processing: Your application extracts the text, calls the Bedrock API to get the embedding, and then sends the final JSON document (containing both the raw text and the vector) to the OpenSearch Bulk API.
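For the application-side route, the final step is formatting documents for the Bulk API, which expects newline-delimited JSON: one action line followed by one document line, with a trailing newline. The sketch below builds that payload. The index name, document IDs, and field names are hypothetical, and the vectors are shortened for readability.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def bulk_body(index: str, docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build a newline-delimited Bulk API payload of index actions."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the Bulk API requires a final newline


payload = bulk_body(
    "pdf-chunks",  # hypothetical index name
    [
        ("report-0", {"text": "first chunk", "embedding": [0.1, 0.2]}),
        ("report-1", {"text": "second chunk", "embedding": [0.3, 0.4]}),
    ],
)
```

The payload would then be posted to the `_bulk` endpoint with a content type of `application/x-ndjson`.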

Summary: Why is it different?

| Feature | Standard Document Ingestion | Embedding-Based Ingestion |
| --- | --- | --- |
| Data Format | JSON / Raw Logs | JSON + Vector Array |
| Processing | Tokenization (Keyword splitting) | ML Inference (Model generation) |
| Index Type | Inverted Index | KNN Index (`knn_vector` field) |
| Search Method | BM25 / Keyword Match | Vector Similarity (Cosine/Euclidean) |

Key Architectural Note: The "In-Cluster" Approach

Amazon OpenSearch Service now supports "ML Commons," which allows you to integrate your OpenSearch domain directly with Amazon Bedrock. This is highly efficient because:

You define an ingest pipeline inside OpenSearch.
You send the raw PDF text chunks to that pipeline.
OpenSearch automatically calls Bedrock to generate the embeddings.
OpenSearch indexes the vectors for you, abstracting away much of the manual vector handling in your application code.
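The pipeline in these steps can be sketched as the body of an ingest pipeline definition that uses a `text_embedding` processor. The `model_id` is a placeholder for the ID OpenSearch assigns when you register a model or connector. The field names `text` and `embedding` are assumptions that must match your index mapping.

```python
from typing import Any, Dict


def embedding_ingest_pipeline(
    model_id: str, source_field: str = "text", vector_field: str = "embedding"
) -> Dict[str, Any]:
    """Build an ingest pipeline body that embeds source_field into vector_field."""
    return {
        "description": "Generate embeddings for incoming text chunks",
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    # Maps the input text field to the output vector field.
                    "field_map": {source_field: vector_field},
                }
            }
        ],
    }


pipeline_body = embedding_ingest_pipeline(model_id="<your-model-id>")  # placeholder ID
```

Documents indexed through this pipeline need only the `text` field. The processor adds the vector before the document is stored.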
