Sunday, January 28, 2024

What is LlamaIndex

LlamaIndex is a data framework for connecting large language models (LLMs) to your own data. It offers several benefits for building LLM applications:


Data Management and Ingestion:


Simplified data ingestion: It connects to various data sources like APIs, PDFs, databases, and documents, streamlining the process of bringing data into your LLM system.

Unified data access: It provides a common interface over both structured and unstructured data and can plug into external vector stores and databases, so you do not need a separate pipeline for each source.

Native data privacy: It supports private data handling, allowing you to securely store and use sensitive information for your LLM applications.

LLM Efficiency and Performance:


Faster query responses: By indexing your data, LlamaIndex enables LLMs to retrieve relevant information quickly and efficiently.

Reduced computational resources: It optimizes data access, minimizing the computational power required for LLMs to process information.

Enhanced accuracy: With efficient data retrieval, LLMs can generate more accurate and relevant outputs based on the available information.
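
To make the indexing idea above concrete, here is a deliberately tiny sketch of retrieval over an index. It is a plain-Python toy and not LlamaIndex's API: the names tokenize, build_index and retrieve, the sample documents, and the word-overlap scoring are all assumptions made for illustration (real systems typically rank by embedding similarity instead).

```python
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on whitespace, stripping basic punctuation
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def retrieve(index, query, top_k=1):
    """Rank documents by how many distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_k]

# Made-up documents for illustration
docs = {
    "invoice": "Invoice 42 is due on Friday",
    "manual": "The pump manual explains how to replace the filter",
}
index = build_index(docs)
print(retrieve(index, "how do I replace the pump filter?"))
```

Only the documents returned by retrieve would then be handed to the LLM as context, which is why a lookup in the index is cheaper than sending every document with every query.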

Development and Integration:


Easy integration: LlamaIndex offers a high-level API, making it simple to integrate it with existing LLM frameworks and applications.

Multiple use cases: It supports various LLM applications, including question answering over documents, chatbots, and agents.

Flexibility: It adapts to diverse data formats and LLM needs, providing a versatile solution for various projects.

Additional benefits:


Open-source availability: LlamaIndex is open-source, allowing for community contributions and customization.

Scalability: It can be scaled to handle large datasets and complex LLM applications.

Active development: The project is continuously updated with new features and improvements.

Overall, LlamaIndex offers a valuable tool for developers and researchers working with LLMs by simplifying data management, enhancing performance, and facilitating integration. Its versatility and open-source nature make it an attractive option for various LLM projects.




References:

BARD

PCA Introduction

PCA (Principal Component Analysis) is a method that finds and ranks the main directions along which our data vary.

The data are charted in an X/Y graph, which means we have chosen two (conventional) directions along which to describe our data. The two axes (more precisely, their directions) are, loosely speaking, the basis we use to calculate the numerical values of the coordinates.

The direction of the axes is somewhat arbitrary, which means we can always choose other directions. For instance, we could find the direction along which our data vary the most and define that as the x-axis. It turns out that the direction along which the data vary the most is the first principal component.

Below is how we can compute this in Python:


import numpy as np

import matplotlib.pyplot as plt

from pandas import read_csv

 

# Read the data

data = read_csv('data.csv')

cs = data["X"].values

temp = data["Y"].values

 

# Take away the mean

x = cs - cs.mean()

y = temp - temp.mean()

 

# Group the data into a single matrix

datamatrix = np.array([x,y])

 

# Calculate the covariance matrix

covmat = np.cov(datamatrix)

 

# Find eigenvalues and eigenvectors of the covariance matrix

w,v = np.linalg.eig(covmat)

 

# Get the index of the largest eigenvalue

maxeig = np.argmax(w)

 

# Get the slope of the line through the origin along the eigenvector of the largest eigenvalue

# (np.linalg.eig returns the eigenvectors as the columns of v)

m = v[1, maxeig]/v[0, maxeig]

line = m*x



plt.scatter(x, y)

plt.xlabel('x')

plt.ylabel('y')

 

plt.quiver(0,0, x[0], line[0], units = 'xy', scale = 1, color='r', width = 0.2)

plt.axis('equal')

plt.ylim((-18,18))

 

plt.show()




Step 1, read the data and assign it to numpy arrays.

Step 2, for PCA to work we need to subtract the mean from both coordinates, that is, we want the data to be centred at the origin of the x-y coordinates.

Step 3, group the data into a single array.

Step 4, calculate the covariance matrix of this array. Since we are dealing with a 2D dataset (bivariate data), the covariance matrix will be 2×2.

Step 5, calculate the eigenvalues and eigenvectors of the covariance matrix.

Step 6, get the index of the largest eigenvalue. The first principal component we are looking for is the eigenvector corresponding to the largest eigenvalue.

Step 7, this one is only needed for plotting. We get the slope of the line that is parallel to the principal component.

Step 8, plot the first principal component on top of the data.
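
Because the covariance matrix here is only 2×2, steps 2 to 6 can also be worked through by hand. The sketch below does that in plain Python using the closed-form eigenvalue of a symmetric 2×2 matrix. The function name first_principal_component and the sample points are assumptions for illustration; the covariance is divided by n - 1, which is also the np.cov default.

```python
import math

def first_principal_component(xs, ys):
    """Return (largest eigenvalue, unit eigenvector) of the 2x2 covariance matrix."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 2: centre the data at the origin
    x = [a - mean_x for a in xs]
    y = [b - mean_y for b in ys]
    # Step 4: entries of the covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(a * a for a in x) / (n - 1)
    syy = sum(b * b for b in y) / (n - 1)
    sxy = sum(a * b for a, b in zip(x, y)) / (n - 1)
    # Steps 5 and 6: largest eigenvalue of a symmetric 2x2 matrix
    lam = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    # Matching eigenvector; the axis-aligned cases cover sxy == 0
    if sxy != 0:
        vx, vy = sxy, lam - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)

# Made-up points lying exactly on the line y = 2x
lam, (vx, vy) = first_principal_component([0, 1, 2, 3], [0, 2, 4, 6])
print(vy / vx)  # slope of the first principal component
```

For points that lie exactly on a line, the first principal component has the same slope as that line, and the second eigenvalue is zero because there is no variation left in the perpendicular direction.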



scikit-learn provides a quick way of doing this with its PCA class:


from sklearn.decomposition import PCA

 

datazip = list(zip(x,y))

pca = PCA(n_components=2)

pca.fit(datazip)

 

# Print the eigenvectors (principal axes), one per row

print(pca.components_)


Saturday, January 27, 2024

Langchain with Gemini

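Using Gemini from LangChain is as simple as this (ChatGoogleGenerativeAI reads the key from the GOOGLE_API_KEY environment variable):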

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")

response = llm.invoke("Explain Quantum Computing in 50 words?")

print(response.content)

batch_responses = llm.batch(

    [

        "Who is the Prime Minister of India?",

        "What is the capital of India?",

    ]

)

for response in batch_responses:

    print(response.content)


To analyse images, use the gemini-pro-vision model as shown below:


from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")


message = HumanMessage(

    content=[

        {

            "type": "text",

            "text": "Describe the image",

        },

        {

            "type": "image_url",

            "image_url": "https://picsum.photos/id/237/200/300"

        },

    ]

)


response = llm.invoke([message])

print(response.content)

To find the differences between two images, pass both images in the same message:

from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")


message = HumanMessage(

    content=[

        {

            "type": "text",

            "text": "Find the differences between the given images",

        },

        {

            "type": "image_url",

            "image_url": "https://picsum.photos/id/237/200/300"

        },

        {

            "type": "image_url",

            "image_url": "https://picsum.photos/id/219/5000/3333"

        }

    ]

)


response = llm.invoke([message])

print(response.content)
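
The two message examples above differ only in the prompt text and the number of image parts, so the content list can be built by a small helper. This is a sketch with a hypothetical function name, build_image_prompt; it only builds plain dictionaries in the same shape as the examples above and makes no API call.

```python
def build_image_prompt(text, image_urls):
    """Build a content list: one text part followed by one part per image URL."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": url})
    return content

content = build_image_prompt(
    "Find the differences between the given images",
    [
        "https://picsum.photos/id/237/200/300",
        "https://picsum.photos/id/219/5000/3333",
    ],
)
print(len(content))
```

The resulting list can be passed as HumanMessage(content=content) in place of the hand-written list.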


References:

https://codemaker2016.medium.com/build-your-own-chatgpt-using-google-gemini-api-1b079f6a8415#5f9f

Gemini - Simple first app

To get an API key, go to MakerSuite:

https://makersuite.google.com/app/apikey

The code sample below can get you started:

import os

import google.generativeai as genai

os.environ['GOOGLE_API_KEY'] = "YOUR_API_KEY"  # placeholder: use your own key and never publish a real one

genai.configure(api_key = os.environ['GOOGLE_API_KEY'])

model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("What are top 5 superfast cars in the world?")

print(response.text)
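
A key written directly into the source is easy to leak when the code is shared. A minimal sketch of reading it from the environment instead is shown here; the helper name load_api_key is hypothetical, and the key is assumed to have been exported in the shell beforehand.

```python
import os

def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return key

# Intended use with the sample above: genai.configure(api_key=load_api_key())
```

Failing early with a clear message is friendlier than letting the first API call fail with an authentication error.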

references:

https://codemaker2016.medium.com/build-your-own-chatgpt-using-google-gemini-api-1b079f6a8415#5f9f 


What is Google Gemini in 2024

Gemini is designed to handle text, images, audio, and video seamlessly.

Gemini is a family of large language models (LLMs) created by Google. It is known for its advances in multimodal understanding and processing.

It can handle tasks that involve text, images, audio, video, and code, and can thereby deal with the complex inputs that arise in real-world scenarios.

Gemini runs on Google's TPUs (Tensor Processing Units) for faster performance.

Gemini comes in three flavours:

Gemini Ultra:

Strengths:

Cutting-edge: Reported by Google to surpass earlier models such as GPT-4 on most of its published benchmarks.

Multimodal understanding: Processes and analyzes complex data including text, images, audio, and video.

Advanced reasoning: Capable of intricate decision-making and problem-solving.

Focus: Pushing the boundaries of AI capabilities.

Target users: Advanced researchers, developers working on cutting-edge AI projects.


Gemini Pro:

Strengths:

Versatility across tasks: Handles text, images, and more.

Scalability: Efficient deployment on diverse hardware.

Powerful: Excellent language understanding, generation, and translation.

Focus: Scaling performance across various applications.

Target users: Developers, researchers, businesses aiming for versatile AI solutions.


Gemini Nano:

Strengths:

Lightweight and efficient: Ideal for on-device tasks on mobile devices.

Fast performance: Delivers results quickly.

User-friendly: Accessible for individuals with less technical expertise.

Focus: Making AI accessible and efficient for everyday use.

Target users: Individuals, developers wanting lightweight AI solutions for mobile applications.



References:

OpenAI 



Monday, January 22, 2024

Langchain PydanticOutputParser

PydanticOutputParser is a LangChain output parser that bridges the gap between raw LLM text and organized, JSON-like structures. It adds format instructions to the prompt and then parses the model's reply into a Pydantic object, so the output is structured, validated data rather than a plain string.


In the code that follows, Pydantic is used to define data models that represent the structure of the competitive intelligence information. Pydantic is a data validation and parsing library for Python that lets you define simple or complex data structures using Python type annotations. In this case, we use two Pydantic models (Competitor and Company) to define the structure of the competitive intelligence data.


import pandas as pd

from typing import Optional, Sequence

from langchain.llms import OpenAI

from langchain.output_parsers import PydanticOutputParser

from langchain.prompts import PromptTemplate

from pydantic import BaseModel


# Load data from CSV

df = pd.read_csv("data.csv", sep=';')


# Pydantic models for competitive intelligence

class Competitor(BaseModel):

    company: str

    offering: str

    advantage: str

    products_and_services: str

    additional_details: str


class Company(BaseModel):

    """Identifying information about all competitive intelligence in a text."""

    company: Sequence[Competitor]


# Set up a Pydantic parser and prompt template

parser = PydanticOutputParser(pydantic_object=Company)

prompt = PromptTemplate(

    template="Answer the user query.\n{format_instructions}\n{query}\n",

    input_variables=["query"],

    partial_variables={"format_instructions": parser.get_format_instructions()},

)


# Function to process each row and extract information

def process_row(row):

    _input = prompt.format_prompt(query=row['INTEL'])

    model = OpenAI(temperature=0)

    output = model(_input.to_string())

    result = parser.parse(output)

    

    # Convert Pydantic result to a dictionary

    competitor_data = result.model_dump()


    # Flatten the nested structure for DataFrame creation

    flat_data = {'INTEL': [], 'company': [], 'offering': [], 'advantage': [], 'products_and_services': [], 'additional_details': []}


    for entry in competitor_data['company']:

        flat_data['INTEL'].append(row['INTEL'])

        flat_data['company'].append(entry['company'])

        flat_data['offering'].append(entry['offering'])

        flat_data['advantage'].append(entry['advantage'])

        flat_data['products_and_services'].append(entry['products_and_services'])

        flat_data['additional_details'].append(entry['additional_details'])


    # Create a DataFrame from the flattened data

    df_cake = pd.DataFrame(flat_data)


    return df_cake


# Apply the function to each row and concatenate the results

intel_df = pd.concat(df.apply(process_row, axis=1).tolist(), ignore_index=True)


# Display the resulting DataFrame

intel_df.head()
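
To see what the parse and flatten steps do without calling a model, here is a standard-library sketch of the same idea. It uses dataclasses and json instead of Pydantic and LangChain, so it is an analogy and not the parser's actual implementation; the function names parse_competitors and flatten and the sample JSON are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass, fields

@dataclass
class Competitor:
    company: str
    offering: str
    advantage: str
    products_and_services: str
    additional_details: str

def parse_competitors(llm_output):
    """Parse JSON text shaped like {"company": [...]} into Competitor records."""
    payload = json.loads(llm_output)
    required = [f.name for f in fields(Competitor)]
    records = []
    for entry in payload["company"]:
        missing = [name for name in required if name not in entry]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        records.append(Competitor(**{name: str(entry[name]) for name in required}))
    return records

def flatten(intel, records):
    """Return one flat dict per competitor, ready to become DataFrame rows."""
    return [{"INTEL": intel, **asdict(record)} for record in records]

# Made-up model output for illustration
sample = json.dumps({"company": [{
    "company": "Acme", "offering": "Widgets", "advantage": "Price",
    "products_and_services": "Widget X", "additional_details": "None",
}]})
rows = flatten("Acme sells cheap widgets", parse_competitors(sample))
print(rows[0]["company"])
```

A reply that omits a required field raises an error here instead of silently producing a partial row, which is the main benefit of validating model output against a schema.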



references:

https://medium.com/@shubham.shardul2019/output-parsers-in-langchain-pydantic-json-parsing-31be48ce6cfe

https://medium.com/@ingridwickstevens/extract-structured-data-from-unstructured-text-using-llms-71502addf52b



Monday, January 1, 2024

AWSCertCP: AWS Serverless Options

Amazon Web Services (AWS) offers a variety of serverless computing options, allowing you to build and run applications without managing the underlying infrastructure. Here are some key AWS serverless options:

AWS Lambda:

Type: Function-as-a-Service (FaaS)

Description: AWS Lambda allows you to run code without provisioning or managing servers. You can upload your code, and Lambda automatically takes care of the infrastructure, scaling, and availability. It is event-driven and supports various trigger sources.

Amazon API Gateway:

Type: Managed API Gateway

Description: Amazon API Gateway enables you to create, publish, and manage APIs at any scale. It can be used to build RESTful APIs, WebSocket APIs, and to connect APIs to Lambda functions.

Amazon DynamoDB (with Streams):

Type: NoSQL Database Service with Streams

Description: DynamoDB is a serverless, fully managed NoSQL database. DynamoDB Streams allows you to capture changes to your data and trigger serverless functions in response to those changes.

Amazon S3 (with Event Notifications):

Type: Object Storage Service with Event Notifications

Description: Amazon S3 is a serverless object storage service. You can configure event notifications on S3 buckets to trigger Lambda functions in response to object creation, deletion, or other events.

AWS Step Functions:

Type: Serverless Orchestration Service

Description: AWS Step Functions allows you to coordinate the components of distributed applications using visual workflows. It is used for building serverless workflows that integrate with Lambda functions, services, and more.

AWS App Runner:

Type: Fully Managed Container Service

Description: AWS App Runner is a fully managed service that makes it easy to build, deploy, and scale containerized applications quickly. It abstracts away the underlying infrastructure, allowing you to focus on your code.

Amazon EventBridge:

Type: Serverless Event Bus

Description: Amazon EventBridge is a serverless event bus service that makes it easy to connect different applications using events. It allows you to build event-driven architectures by integrating with various AWS services.

Amazon Aurora Serverless:

Type: Relational Database Service

Description: Amazon Aurora Serverless is a fully managed relational database service that automatically adjusts capacity based on your application's needs. It is suitable for workloads with unpredictable or variable usage patterns.

AWS Glue:

Type: Serverless Data Integration Service

Description: AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and transform data for analysis. It supports data processing using Apache Spark.

Amazon Cognito:

Type: Identity and User Management Service

Description: Amazon Cognito is a serverless service for user identity and access management. It provides authentication, authorization, and user management for applications.

AWS Amplify:

Type: Serverless Framework for Web and Mobile Apps

Description: AWS Amplify is a serverless framework for building scalable and secure web and mobile applications. It provides a set of tools and services for frontend and backend development.

These serverless options provide a range of services for building and running applications without the need to manage servers. Depending on your use case and application requirements, you can choose the appropriate AWS serverless services to meet your needs.
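
As an illustration of how two of these services fit together, below is a minimal sketch of a Python Lambda handler that reacts to an S3 event notification. The bucket name, object key, and trimmed sample event are made up for local testing; a real notification carries more fields than the ones read here.

```python
import json
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """Collect the bucket and key of every object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 events
            objects.append({"bucket": bucket, "key": unquote_plus(key)})
    return {"statusCode": 200, "body": json.dumps({"objects": objects})}

# Trimmed, made-up event for local testing
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/2024+01.csv"},
            },
        }
    ]
}
result = lambda_handler(sample_event, None)
print(result["body"])
```

Because the handler is an ordinary function, it can be exercised locally with a sample event before it is deployed and wired to a bucket.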

references:

OpenAPI