Monday, December 30, 2024

How to run MongoDB Atlas vector DB locally?

It's pretty easy:

docker pull mongodb/mongodb-atlas-local:latest

docker run -p 27017:27017 mongodb/mongodb-atlas-local


Below are the connection strings without and with authentication:

mongosh "mongodb://localhost:27017/?directConnection=true"

mongosh "mongodb://user:pass@localhost:27017/?directConnection=true"
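Once the container is up, you can also talk to it from Python. Below is a minimal, illustrative sketch assuming pymongo 4.6+ is installed; the database/collection/index names and the tiny 3-dimensional embeddings are placeholders, and the search-index helpers may differ slightly across pymongo versions.

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb://localhost:27017/?directConnection=true")
client.admin.command("ping")  # verify the local deployment is reachable

coll = client["test"]["items"]
coll.insert_many([
    {"text": "red shoes", "embedding": [0.1, 0.2, 0.3]},
    {"text": "blue shirt", "embedding": [0.9, 0.1, 0.4]},
])

# Create a vector search index (assumes the pymongo >= 4.6 search index API)
coll.create_search_index(SearchIndexModel(
    name="vector_index",
    type="vectorSearch",
    definition={"fields": [
        {"type": "vector", "path": "embedding", "numDimensions": 3, "similarity": "cosine"}
    ]},
))

# Query with $vectorSearch once the index has finished building
results = coll.aggregate([{"$vectorSearch": {
    "index": "vector_index",
    "path": "embedding",
    "queryVector": [0.1, 0.2, 0.25],
    "numCandidates": 10,
    "limit": 2,
}}])
print(list(results))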


references:

https://www.mongodb.com/docs/atlas/cli/current/atlas-cli-deploy-docker/

How to create a Docker image and push it to Docker Hub

Create a Dockerfile like the one below:

================================

# Use Python 3.12 as the base image

FROM python:3.12-slim

# Set the working directory

WORKDIR /app

# Copy the requirements and install dependencies

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code

COPY ./app ./app

# Expose the application port

EXPOSE 8000

# Start the FastAPI app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]


Create a requirements.txt file like below:

=====================================

fastapi

uvicorn


Create app/main.py like below (the Dockerfile copies the ./app directory and runs app.main:app):

===========================

from fastapi import FastAPI

app = FastAPI()

@app.get("/")

def read_root():

    return {"message": "Hello, Dockerized FastAPI!"}
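For reference, the project layout this Dockerfile assumes looks roughly like this (main.py lives inside the app/ package; an empty app/__init__.py is conventional, though not strictly required):

.
├── Dockerfile
├── requirements.txt
└── app
    ├── __init__.py
    └── main.py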


Now the build process is as below.

The steps below should be executed after creating an account on Docker Hub. Instead of typing your password on the terminal, you can create an access token and use that instead.


docker build -t mrrathish/crashing-docker-app:latest .

docker login

docker push mrrathish/crashing-docker-app:latest


docker run -p 8000:8000 mrrathish/crashing-docker-app:latest

That's pretty much it.


Friday, December 27, 2024

What is a Container, a Container Image?

A container is run from a container image.

A container image is a static version of all the files, environment variables, and the default command/program that should be present in a container. Static here means that the container image is not running, it's not being executed, it's only the packaged files and metadata.

In contrast to a "container image" that is the stored static contents, a "container" normally refers to the running instance, the thing that is being executed.

When the container is started and running (started from a container image) it could create or change files, environment variables, etc. Those changes will exist only in that container, but would not persist in the underlying container image (would not be saved to disk).

A container image is comparable to the program file and contents, e.g. python and some file main.py.

And the container itself (in contrast to the container image) is the actual running instance of the image, comparable to a process. In fact, a container is running only when it has a process running (and normally it's only a single process). The container stops when there's no process running in it.

A container image normally includes in its metadata the default program or command that should be run when the container is started, and the parameters to be passed to that program. It is very similar to what you would type on the command line.

When a container is started, it will run that command/program (although you can override it and make it run a different command/program).

A container is running as long as the main process (command or program) is running.

A container normally has a single process, but it's also possible to start subprocesses from the main process, and that way you will have multiple processes in the same container.

But it's not possible to have a running container without at least one running process. If the main process stops, the container stops.

references:

https://fastapi.tiangolo.com/deployment/docker/#what-is-a-container-image

Sunday, December 22, 2024

How to use Pydantic to declare JSON data models (data shapes)

 First, you need to import BaseModel from pydantic and then use it to create subclasses defining the schema, or data shapes, you want to receive.

Next, you declare your data model as a class that inherits from BaseModel, using standard Python types for all the attributes:

# main.py

from typing import Optional

from fastapi import FastAPI

from pydantic import BaseModel


class Item(BaseModel):

    name: str

    description: Optional[str] = None

    price: float

    tax: Optional[float] = None


app = FastAPI()


@app.post("/items/")

async def create_item(item: Item):

    return item


When a model attribute has a default value, it is not required; otherwise, it is required. To make an attribute optional, give it a default value of None.


For example, the model above declares a JSON object (or Python dict) like this:


{

    "name": "Foo",

    "description": None,

    "price": 45.2,

    "tax": None

}


In this case, since description and tax are optional because they have a default value of None, this JSON object would also be valid:


{

    "name": "Foo",

    "price": 45.2

}


A JSON object that omits the default values is also valid.

Next, add the new pydantic model to your path operation as a parameter. You declare it the same way you declared path parameters:



# main.py


from typing import Optional


from fastapi import FastAPI

from pydantic import BaseModel


class Item(BaseModel):

    name: str

    description: Optional[str] = None

    price: float

    tax: Optional[float] = None


app = FastAPI()


@app.post("/items/")

async def create_item(item: Item):

    return item

The parameter item has a type hint of Item, which means that item is declared as an instance of the class Item.

With that Python type declaration, FastAPI will:

Read the body of the request as JSON

Convert the corresponding types if needed

Validate the data and return a clear error if it is invalid

Give you the received data in the parameter item—since you declared it to be of type Item, you will also have all the editor support, with completion and type checks for all the attributes and their types

By using standard type hints with pydantic, FastAPI helps you build APIs that have all these best practices by default, with little effort.
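To see the validation in action, here is a quick test sketch. It assumes the model and app above live in main.py and that FastAPI's test dependencies (httpx) are installed:

from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

# Optional fields may be omitted; they fall back to their defaults
response = client.post("/items/", json={"name": "Foo", "price": 45.2})
print(response.status_code)  # 200
print(response.json())       # {"name": "Foo", "description": None, "price": 45.2, "tax": None}

# A missing required field (price) produces a clear validation error
response = client.post("/items/", json={"name": "Foo"})
print(response.status_code)  # 422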


References:

https://realpython.com/fastapi-python-web-apis/#create-a-first-api


Uvicorn and alternatives

What is Uvicorn?

Uvicorn is a lightning-fast ASGI (Asynchronous Server Gateway Interface) server designed to run Python web applications. It supports asynchronous frameworks like FastAPI, Starlette, and others. Uvicorn is built on top of uvloop and httptools, providing excellent performance for handling concurrent requests in modern web applications.

Why is Uvicorn Required for FastAPI?

FastAPI is an ASGI framework, meaning it requires an ASGI server to handle HTTP requests and serve the application. Uvicorn is a popular choice because:

Asynchronous Support: It natively supports async features, which are central to FastAPI’s high-performance capabilities.

Performance: Uvicorn is optimized for speed and can efficiently handle a large number of concurrent requests.

Compatibility: Uvicorn is fully compatible with FastAPI and provides seamless integration.

Ease of Use: It's simple to install and use, with minimal configuration required.

Without a server like Uvicorn, FastAPI can't process incoming HTTP requests or serve responses.
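For reference, a minimal programmatic launcher looks like the sketch below; it assumes your FastAPI app is defined as "app" in main.py (the module path is an assumption about your project layout):

# run.py
import uvicorn

if __name__ == "__main__":
    # Equivalent to running "uvicorn main:app" from the command line
    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)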


Alternatives to Uvicorn

There are other ASGI servers available that can be used instead of Uvicorn. Here are some common alternatives:

Daphne

Developed by the Django Channels team.

Suitable for applications that require WebSocket support or compatibility with Django Channels.

Less performant than Uvicorn in general cases.

Command:


daphne myapp:app

Hypercorn


A highly configurable ASGI server.

Supports multiple protocols, including HTTP/1, HTTP/2, WebSocket, and QUIC.

A good alternative if fine-grained control over server behavior is needed.

Command:


hypercorn myapp:app

ASGI Built-in Development Server

Some ASGI frameworks come with built-in development servers for local testing.

Not recommended for production.




Saturday, December 21, 2024

What is dependency Injection in Python

Dependency Injection (DI) in Python is a design pattern where the dependencies of a class or function are provided (injected) from the outside, rather than being created or managed by the class or function itself. This approach makes the code more modular, testable, and easier to maintain.


Key Concepts

Dependency: Any external object or resource that a class or function needs to operate (e.g., a database connection, an API client, a logger).

Injection: Supplying the dependency from outside the class or function, typically as an argument.

Why Use Dependency Injection?

Decoupling: Reduces tight coupling between components.

Testability: Makes it easier to test classes or functions by providing mock or stub dependencies.

Flexibility: Allows swapping out dependencies without modifying the dependent class or function.

Example Without Dependency Injection

Here, the dependency (Logger) is created inside the Service class, which tightly couples the two.


class Logger:

    def log(self, message):

        print(f"LOG: {message}")


class Service:

    def __init__(self):

        self.logger = Logger()  # Dependency is created inside the class


    def perform_task(self):

        self.logger.log("Task performed")


Example With Dependency Injection

In this example, the Logger is injected into the Service class, making the Service independent of how the Logger is implemented.






class Logger:

    def log(self, message):

        print(f"LOG: {message}")


class Service:

    def __init__(self, logger):  # Dependency is injected

        self.logger = logger


    def perform_task(self):

        self.logger.log("Task performed")


# Injecting the dependency

logger = Logger()

service = Service(logger)

service.perform_task()


Benefits

The Service class does not need to know how the Logger is implemented.

You can easily swap out Logger for another implementation (e.g., FileLogger, DatabaseLogger) without modifying Service.
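For example, swapping in a different logger requires no change to Service at all. FileLogger below is an illustrative stand-in, not part of the original example:

class FileLogger:
    def __init__(self, path):
        self.path = path

    def log(self, message):
        # Append to a file instead of printing
        with open(self.path, "a") as f:
            f.write(f"LOG: {message}\n")


# Service is unchanged; only the injected dependency differs
service = Service(FileLogger("service.log"))
service.perform_task()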

Dependency Injection with Frameworks

In larger applications, dependency injection frameworks (like Dependency Injector or pytest fixtures in testing) can help manage dependencies systematically.


from dependency_injector import containers, providers


class Logger:

    def log(self, message):

        print(f"LOG: {message}")


class Service:

    def __init__(self, logger):

        self.logger = logger


    def perform_task(self):

        self.logger.log("Task performed")


# DI container

class Container(containers.DeclarativeContainer):

    logger = providers.Factory(Logger)

    service = providers.Factory(Service, logger=logger)


# Using the container

container = Container()

service = container.service()

service.perform_task()
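A practical benefit of the container is that providers can be overridden, for example to inject a fake logger in tests. Here is a minimal sketch assuming the Container above; FakeLogger and the use of providers.Object are illustrative choices:

class FakeLogger:
    def __init__(self):
        self.messages = []

    def log(self, message):
        # Capture messages instead of printing, handy for assertions in tests
        self.messages.append(message)


fake = FakeLogger()
container = Container()
container.logger.override(providers.Object(fake))  # swap the real Logger for the fake

service = container.service()
service.perform_task()
print(fake.messages)  # ['Task performed']

container.logger.reset_override()  # restore the real Logger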



1. Providers

Providers are objects responsible for creating and managing dependencies. They can define how objects (dependencies) are created and supply these objects to other parts of the application.


Providers can:


Instantiate objects.

Return pre-configured instances.

Manage singletons (single instances reused across the application).

Provide factories for creating new instances on demand.

Types of Providers

Here are some commonly used provider types in Dependency Injector:


Factory: Creates a new instance of an object every time it's called.

Singleton: Creates and returns the same instance for every call.

Callable: Calls a specified function or callable.

Configuration: Provides values from an external configuration source (e.g., environment variables, files).

Delegate: Delegates provisioning to another provider.

Resource: Manages external resources with lifecycle hooks like initialization and cleanup.


from dependency_injector import providers


# Factory provider (creates a new instance each time)

class Logger:

    def log(self, message):

        print(f"LOG: {message}")


logger_provider = providers.Factory(Logger)

logger_instance_1 = logger_provider()

logger_instance_2 = logger_provider()


print(logger_instance_1 is logger_instance_2)  # False (new instance each time)


# Singleton provider (creates one instance only)

singleton_logger = providers.Singleton(Logger)

logger_instance_3 = singleton_logger()

logger_instance_4 = singleton_logger()


print(logger_instance_3 is logger_instance_4)  # True (same instance)


Sunday, December 15, 2024

Create Embedding Model that adds additional context

 1. Conceptual Understanding

Core Idea: You want to create embeddings that not only represent the input text but also incorporate external knowledge or context. This can significantly improve the quality of similarity searches and downstream tasks.

Example: Imagine you're embedding product descriptions. Adding context like brand, category, or even user purchase history can lead to more relevant recommendations.

2. Methods

Concatenation:

Approach:

Obtain Context Embeddings: Generate embeddings for the context information (e.g., brand, category) using the same or a different embedding model.

Concatenate: Concatenate the context embeddings with the input text embeddings.

Example:

Input: "Comfortable shoes"

Context: "Brand: Nike, Category: Running"

Embedding: embed("Comfortable shoes") + embed("Brand: Nike") + embed("Category: Running")

Weighted Sum:

Approach:

Obtain embeddings as in concatenation.

Assign weights to each embedding based on its importance.

Calculate a weighted sum of the embeddings.

Example:

weighted_embedding = 0.7 * embed("Comfortable shoes") + 0.2 * embed("Brand: Nike") + 0.1 * embed("Category: Running")

Contextualized Embeddings:

Approach:

Use a language model (like BERT or GPT) to generate embeddings.

Feed the input text and context to the model simultaneously.

The model will generate embeddings that capture the interaction between the text and the context.

Implementation: Utilize Hugging Face Transformers library for easy access to pre-trained models.
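A minimal sketch of this contextualized approach, assuming Hugging Face Transformers and a generic BERT checkpoint; encoding text and context as a sentence pair and mean-pooling the output are illustrative choices, not the only option:

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; any encoder model works similarly
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")


def contextual_embed(text, context):
    # Encode text and context as a sentence pair so attention spans both
    inputs = tokenizer(text, context, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings into a single vector
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)


embedding = contextual_embed("Comfortable shoes", "Brand: Nike, Category: Running")
print(embedding.shape)  # torch.Size([768])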

3. Implementation Example (Concatenation with Sentence-Transformers)


import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')


def embed_with_context(text, context):

    """

    Generates embeddings for the input text with added context.


    Args:

        text: The input text.

        context: A dictionary containing context information.


    Returns:

        The concatenated embedding.

    """

    text_embedding = model.encode(text)

    context_embeddings = [model.encode(f"{key}: {value}") for key, value in context.items()]

    return np.concatenate([text_embedding] + context_embeddings)


# Example Usage

input_text = "Comfortable shoes"

context = {"Brand": "Nike", "Category": "Running"}

embedding = embed_with_context(input_text, context)

4. Key Considerations


Context Representation: Choose a suitable format for representing context (dictionaries, lists, etc.).

Embedding Model: Select an embedding model that aligns with your context and task.

Weighting: Experiment with different weighting schemes for optimal results.

Evaluation: Thoroughly evaluate the performance of your custom embeddings on your specific task.

Remember: The effectiveness of your custom embeddings will depend on the quality and relevance of the context information you provide. Experiment with different approaches and carefully evaluate the results to find the best solution for your use case.


What are various datasets for testing similarity search performance?

1. SQuAD (Stanford Question Answering Dataset)

Use Case: Test vector search accuracy for question-answer retrieval.

How to Use:

Preprocess the dataset to create embeddings for questions and answers.

Use questions as queries and measure the retrieval accuracy against their respective answers.


2. MS MARCO

Use Case: Benchmark performance for document or passage ranking tasks.

How to Use:

Use passages and queries provided in the dataset.

Generate embeddings and use queries to retrieve relevant passages.


3. Quora Question Pairs

Use Case: Test duplicate or paraphrased question detection.

How to Use:

Generate embeddings for questions.

Use one question as a query and check if its paraphrase is retrieved as the most similar result.


4. TREC Datasets

Use Case: Test retrieval systems for a variety of topics, including QA, passage retrieval, and entity matching.

How to Use:

Choose a task, create embeddings, and evaluate search performance.


5. Synthetic Dataset for Quick Testing

If you want something lightweight and simple to start:



from sklearn.datasets import make_blobs

import numpy as np


# Generate synthetic data

data, _ = make_blobs(n_samples=100, centers=5, n_features=50, random_state=42)


# Convert to strings for a mock dataset

docs = [" ".join(map(str, row)) for row in data]


# Example: Simulate a search query

query = np.mean(data[:10], axis=0)  # Take an average vector as query
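To turn this mock setup into an actual search, you can rank the vectors by cosine similarity against the query. A small illustrative continuation of the snippet above, assuming scikit-learn is installed:

from sklearn.metrics.pairwise import cosine_similarity

# Rank all vectors by similarity to the query vector
scores = cosine_similarity(query.reshape(1, -1), data)[0]
top_k = np.argsort(scores)[::-1][:5]
print("Top 5 most similar rows:", top_k)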


1. Sentence Similarity Datasets:

STSBenchmark: A collection of sentence pairs with human-rated similarity scores. This dataset is widely used for evaluating sentence embedding models and similarity search.   

SemEval datasets: SemEval has hosted several tasks related to semantic similarity, including paraphrase identification and textual entailment, which can provide valuable data for evaluating vector similarity search.   

Quora Question Pairs: A dataset of question pairs from Quora, where the task is to determine whether a pair of questions are duplicates.   

2. Text Retrieval Datasets:


MS MARCO: A large-scale dataset for document ranking, containing passages from Wikipedia and a set of queries.   

TREC datasets: A collection of datasets for information retrieval tasks, including question answering and document retrieval. Some TREC datasets can be adapted for vector similarity search evaluation.   

News2Dataset: A dataset of news articles and their corresponding summaries, which can be used to evaluate the retrieval of relevant documents based on query vectors.

3. Image Retrieval Datasets:


ImageNet: A large image dataset with millions of images and associated labels. It can be used to evaluate image similarity search by comparing query images to images in the dataset.   

Places365: A dataset of images categorized into 365 scene categories, which can be used to evaluate place recognition and image retrieval based on visual similarity.   

4. Code Search Datasets:


GitHub CodeSearchNet: A large dataset of code snippets and their natural language descriptions, which can be used to evaluate code search based on textual queries.   

Key Considerations When Choosing a Dataset:

Relevance to your use case: Select a dataset that is relevant to the specific application of vector similarity search you are interested in (e.g., question answering, product recommendation, image search).

Dataset size: Choose a dataset that is large enough to provide a meaningful evaluation of your system's performance.

Data quality: Ensure that the dataset is of high quality and free from errors or biases.

Availability and licensing: Make sure that the dataset is readily available and that you have the necessary rights to use it for your evaluation.

By using these datasets, you can effectively test the performance of your vector store and compare different approaches to vector similarity search.


Friday, December 13, 2024

What are some of Advanced RAG techniques

Here are some advanced RAG techniques

Input/output validation: Ensuring groundedness. This technique verifies that the input query and generated output align with specific use cases and company policies. It helps maintain control over the LLM’s responses, preventing unintended or harmful outputs.


Guardrails: Compliance and auditability. Guardrails ensure that queries and responses adhere to relevant regulations and ethical guidelines. They also make it possible to track and review interactions with the LLM for accountability and transparency.


Explainable responses: This aspect involves providing clear explanations for how the LLM arrived at its conclusions. This is crucial for building trust and understanding the reasoning behind the model’s outputs.


Caching: Efficient handling of similar queries. Semantic caching optimizes the LLM’s performance by storing and reusing the results of similar queries. This reduces latency and improves the overall efficiency of the system.


Hybrid search: Combining semantic and keyword matching. This technique leverages both semantic understanding and exact keyword matching to retrieve the most relevant information from the knowledge base. This approach enhances the accuracy and breadth of the LLM’s responses (a small scoring sketch follows this list).


Re-ranking: Improving relevance and accuracy. Re-ranking involves retrieving a set of relevant data points and reordering them based on their relevance to the specific query. This helps ensure the most pertinent information is presented to the user.


Evals: Continuous self-learning. Evals use techniques like Reinforcement Learning from Human Feedback (RLHF) to continuously improve the LLM’s performance. This involves collecting human feedback on the model’s responses and using that feedback to refine its future outputs.
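To make the hybrid search idea above concrete, here is a minimal, illustrative scoring sketch that blends a semantic (cosine) score with a keyword score such as BM25. The 0.7/0.3 weighting and the min-max normalisation are assumptions, not a prescribed recipe:

import numpy as np

def hybrid_scores(query_vec, doc_vecs, keyword_scores, alpha=0.7):
    # Cosine similarity between the query and every document embedding
    sem = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )

    # Min-max normalise both signals so they can be blended on the same scale
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    return alpha * norm(sem) + (1 - alpha) * norm(np.asarray(keyword_scores, dtype=float))

# Example: 3 documents, 4-dimensional embeddings, made-up keyword scores
docs = np.random.rand(3, 4)
query = np.random.rand(4)
print(hybrid_scores(query, docs, keyword_scores=[2.1, 0.0, 5.3]))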

references:
OpenAI 
https://levelup.gitconnected.com/building-enterprise-ai-apps-with-multi-agent-rag-06356b35ba1a

LangChain working with Google Gemini

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(

    model="gemini-1.5-pro",

    temperature=0,

    max_tokens=None,

    timeout=None,

    max_retries=2,

    # other params...

)

messages = [

    (

        "system",

        "You are a helpful assistant that translates English to French. Translate the user sentence.",

    ),

    ("human", "I love programming."),

]

ai_msg = llm.invoke(messages)

ai_msg


print(ai_msg.content)



Chaining works like below:


from langchain_core.prompts import ChatPromptTemplate


prompt = ChatPromptTemplate.from_messages(

    [

        (

            "system",

            "You are a helpful assistant that translates {input_language} to {output_language}.",

        ),

        ("human", "{input}"),

    ]

)

chain = prompt | llm

chain.invoke(

    {

        "input_language": "English",

        "output_language": "German",

        "input": "I love programming.",

    }

)

references:

https://python.langchain.com/docs/integrations/chat/google_generative_ai/


Thursday, December 12, 2024

What is Gemini 2.0

 Gemini 2.0 Flash is available in Gemini API and Google AI Studio. Building on the success of Gemini 1.5 Flash, the 2.0 release introduces performance improvements, new Multimodal Live API, expanded output modalities, and native tool use.

Here's a breakdown of what's new:

Improved Performance: Gemini 2.0 Flash Experimental provides double the speed of Gemini 1.5 Pro, with enhanced capabilities across multimodal understanding, text, code, video, and spatial reasoning.

Multimodal Live API: Develop real-time applications with streaming audio and video, natural conversational patterns, and tool integration via the new Multimodal Live API.

Native Tool Use: Build intelligent agentic features with integrated tools such as Google Search (including parallel search), code execution, and custom function calling.

Sample code is as below (note that this snippet uses a Gemini 1.5 model name; the same google-generativeai API applies to Gemini 2.0 models):


import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content("Explain how AI works")

print(response.text)


Gemini 2.0 is a multimodal model, with capabilities spanning multimodal understanding, text, code, video, and spatial reasoning.



What is Command in Langgraph

Command can be useful to combine control flow (edges) and state updates (nodes). For example, you might want to BOTH perform state updates AND decide which node to go to next in the SAME node. LangGraph provides a way to do so by returning a Command object from node functions:

from typing import Literal, TypedDict

from langgraph.types import Command


class State(TypedDict):
    foo: str


def my_node(state: State) -> Command[Literal["my_other_node"]]:

    return Command(

        # state update

        update={"foo": "bar"},

        # control flow

        goto="my_other_node"

    )

For example, a node that both updates the state and picks the next node (this builds on the definitions above and the standard library's random module):

import random


def node_a(state: State) -> Command[Literal["node_b", "node_c"]]:

    print("Called A")

    value = random.choice(["a", "b"])

    # this is a replacement for a conditional edge function

    if value == "a":

        goto = "node_b"

    else:

        goto = "node_c"

    # note how Command allows you to BOTH update the graph state AND route to the next node

    return Command(

        # this is the state update

        update={"foo": value},

        # this is a replacement for an edge

        goto=goto,

    )
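A minimal sketch of how such a node might be wired into a graph; node_b and node_c here are assumed placeholder nodes, and note that with Command-based routing no edges between node_a and node_b/node_c are declared:

from langgraph.graph import StateGraph, START


def node_b(state: State):
    print("Called B")
    return {"foo": state["foo"] + "!"}


def node_c(state: State):
    print("Called C")
    return {"foo": state["foo"] + "?"}


builder = StateGraph(State)
builder.add_node(node_a)
builder.add_node(node_b)
builder.add_node(node_c)
builder.add_edge(START, "node_a")

graph = builder.compile()
print(graph.invoke({"foo": ""}))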

References:

https://langchain-ai.github.io/langgraph/how-tos/command/

Does LangGraph Studio support a Python app created outside of the LangGraph CLI?

LangGraph Studio primarily works with applications created using the LangGraph CLI. This is because LangGraph Studio is designed to work within the LangGraph ecosystem, which involves defining workflows, agents, and actions using LangGraph's specific structure and configuration format.

To clarify:

LangGraph CLI: This tool is used to create, configure, and initialize applications that are compatible with LangGraph Studio. It allows you to define the graph, agents, tasks, and workflows in a way that LangGraph Studio can interpret and visualize.

Regular Python Apps: If you have a regular Python app that doesn't adhere to LangGraph's predefined structure (i.e., it's not set up using the LangGraph CLI), you can't directly load it into LangGraph Studio without adapting it to the LangGraph framework.

Integration Options:

If you have a regular Python app that you want to integrate with LangGraph Studio, you would likely need to refactor it into the LangGraph framework, defining tasks, agents, and workflows in a way LangGraph Studio can handle.

Alternatively, you could wrap your existing Python logic in a way that it becomes part of the LangGraph structure by using LangGraph’s APIs and conventions for agents and workflows.

In summary, LangGraph Studio is tailored to apps created with LangGraph CLI, but with some effort, it is possible to adapt a regular Python application to fit into the LangGraph model for integration.


References:

OpenAI 

What is Langgraph Server?

The LangGraph Server is a backend server component of the LangGraph framework, designed to facilitate complex workflows by leveraging multi-agent collaboration. It acts as the execution environment for the agents and their plans, providing a centralized mechanism to coordinate, manage, and monitor the execution of tasks across different agents.


Key Features

Execution Environment:


Executes plans generated by the Planner agent.

Supervises agents as they perform tasks, ensuring tasks are completed sequentially or concurrently as defined.

Agent Coordination:


Allows agents to communicate and share data through a Global State.

Ensures agents adhere to the task plan by managing dependencies and transitions.

State Management:


Maintains the global state of the workflow, enabling agents to retrieve or update task-related data dynamically.

Error Handling:


Handles failures during agent execution, allowing retries or fallback strategies as defined in the workflow.

Monitoring and Logging:


Provides logs for each agent's actions and state transitions.

Enables real-time tracking of task progress.

API Interface:


Exposes endpoints for interacting with the LangGraph ecosystem, such as submitting plans, retrieving execution results, or querying the global state.

Scalability:


Supports distributed execution for scaling complex workflows involving multiple agents.

How LangGraph Server Works

Input: Receives a high-level plan from the Planner agent or a user-defined workflow.

Task Assignment:

Breaks down the plan into steps.

Assigns each step to the appropriate agent.

Execution:

Executes tasks via agents, tracking their progress.

Updates the global state based on task outcomes.

Output:

Returns the final result once all tasks are complete.

Provides detailed logs or partial results during execution.

Example Use Case

Suppose you have a GenAI application where:


A Planner agent creates a workflow to diagnose network issues.

The workflow includes fetching device details, running diagnostics, and generating a report.

The LangGraph Server will:


Receive the workflow plan.

Assign steps to specific agents (e.g., a FetchAgent for device details, a DiagnosticAgent for running diagnostics).

Monitor the progress and ensure data flows correctly between agents.

Return the consolidated results to the user or a downstream system.

Benefits

Centralized orchestration for distributed multi-agent workflows.

Enhanced fault tolerance and retry mechanisms.

Streamlined integration of various agents and tools.


What is Langgraph Studio

LangGraph Studio is a specialized IDE for visualizing, interacting with, and debugging LangGraph applications.

Key Features of LangGraph Studio:

Graph Visualization:


Displays the flow of tasks and decisions made by agents in a graph format.

Shows the hierarchy and relationships between agents.

Execution Monitoring:


Allows real-time tracking of agent activity, including:

Input and output data at each stage.

Agent states and decisions.

Logs execution results, making it easier to identify bottlenecks.

Debugging Tools:


Provides insights into why an agent made a specific decision.

Highlights errors or issues in execution steps.

Facilitates rollback or re-execution of specific steps.

State Management:


Visualizes the global state used by LangGraph to share data between agents.

Tracks updates to the state as agents perform their tasks.

Support for Planner and Executor:


Helps visualize the steps generated by the Planner agent.

Shows how the Executor agent delegates tasks to specialized agents.

Interactive Workflow Editing:


Users can modify workflows or test scenarios directly within the studio.

Provides an interface to simulate different agent interactions without rewriting code.


references:

OpenAI 

https://medium.com/@kbdhunga/langgraph-studio-b3f16e51d437


Wednesday, December 11, 2024

What are multi modal RAG systems?

Multimodal RAG systems are AI systems capable of processing text and non-text data.

Multimodal RAG enables more sophisticated inferences beyond what is conveyed by text alone. For example, it could analyze someone’s facial expressions and speech tonality to give a richer context to a meeting’s transcription

There are three basic strategies, at increasing levels of sophistication:


Translate modalities to text

Text-only retrieval + MLLM

Multimodal retrieval + MLLM


A simple way to make a RAG system multimodal is by translating new modalities to text before storing them in the knowledge base. This could be as simple as converting meeting recordings into text transcripts, using an existing multimodal LLM (MLLM) to generate image captions, or converting tables to a readable text format (e.g., .csv or .json).


In text-only retrieval + MLLM, you generate text representations of all items in the knowledge base, e.g., descriptions and meta-tags, for retrieval, but pass the original modality to a multimodal LLM (MLLM).


In level 3, we can use multimodal embeddings to perform multimodal retrieval. This works the same way as text-based vector search, but now the embedding space co-locates similar concepts independent of its original modality. The results of such a retrieval strategy can then be passed directly to a MLLM.
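A minimal sketch of level 3 (multimodal retrieval), assuming the sentence-transformers CLIP checkpoint clip-ViT-B-32; the image paths are placeholders for items in your knowledge base:

from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP-based checkpoint that embeds both images and text into one space
model = SentenceTransformer("clip-ViT-B-32")

# Placeholder image files standing in for non-text items in the knowledge base
image_embeddings = model.encode([Image.open("slide_1.png"), Image.open("chart_2.png")])
query_embedding = model.encode("quarterly revenue growth chart")

# Retrieve the closest item regardless of its original modality
hits = util.semantic_search(query_embedding, image_embeddings, top_k=1)
print(hits)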


references:

https://towardsdatascience.com/multimodal-rag-process-any-file-type-with-ai-e6921342c903