Sunday, August 13, 2023

How to Stop Running Workloads in Kubernetes

In the Kubernetes API, there is no verb “stop”. Technically speaking, it is not possible to “stop” something in Kubernetes. Instead, we can set the number of replicas to zero. This instructs the deployment controller to delete all existing pods of a given deployment. After that, no new pods will be created unless the replica count is increased back above zero. Applying this setting is effectively equivalent to stopping the deployment.

kubectl --namespace default scale deployment my-deployment --replicas 0

Now, to get the deployment name, the following command can be used:

kubectl get deploy -o wide

NAME           READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS                                             IMAGES                  SELECTOR

nginx-webapp   5/5     5            5           8h    sidecar-container1,sidecar-container2,main-container   busybox,busybox,nginx   app=nginx-webapp

So the command to stop the deployment in this scenario is 

kubectl --namespace default scale deployment nginx-webapp --replicas 0

Stop Multiple Deployments

Kubectl allows you to perform the same operation on multiple objects at once. With a bit of shell scripting we can obtain a list of all deployments and scale them all down with a single command.

To stop all Kubernetes deployments, run the following kubectl command:

kubectl --namespace default scale deployment $(kubectl --namespace default get deployment --no-headers | awk '{print $1}') --replicas 0

Deployments are not the only resource that manages Kubernetes workloads; there are also StatefulSets, which can be scaled down in the same way:

kubectl --namespace default scale statefulset --replicas 0 $(kubectl --namespace default get statefulset --no-headers | awk '{print $1}')

If you want to perform a complete cleanup of your Kubernetes cluster, you can delete all your resources at once:

kubectl delete all --all --namespace default


references:

https://yourdevopsmentor.com/blog/how-to-stop-all-kubernetes-deployments/#:~:text=In%20Kubernetes%20API%2C%20there%20is,pods%20of%20a%20given%20deployment.


What is Kind in Kubernetes

In Kubernetes, the kind field is used to specify the type of Kubernetes resource being described in a YAML or JSON configuration file. The kind field determines how the Kubernetes API server should interpret and handle the resource. Here are some common values for the kind field along with brief explanations of each:

Pod:

Represents a single instance of a running process in a cluster. Pods are the smallest deployable units in Kubernetes and can contain one or more containers.

Service:

Provides network connectivity to a set of pods. Services enable load balancing and DNS-based discovery for pods.

ReplicationController:

Ensures a specified number of pod replicas are running at all times. If a pod fails or is deleted, the ReplicationController replaces it.

Deployment:

Provides declarative updates to applications. It allows you to define desired state and manages the deployment and scaling of pods.

StatefulSet:

Manages the deployment and scaling of a set of pods with unique identities. It is useful for applications that require stable network identities and persistent storage.

DaemonSet:

Ensures that a copy of a specified pod is running on each node in the cluster. Used for running background tasks or agents.

Job:

Represents a single task or batch job. Jobs create one or more pods and run the specified command to completion.

CronJob:

Creates jobs on a schedule defined using the Cron format. Useful for running jobs periodically.

Namespace:

Provides a way to logically partition resources within a cluster. Namespaces help in organizing and managing resources.

ConfigMap:

Stores configuration data as key-value pairs that can be used by pods or other resources.

Secret:

Stores sensitive information, such as passwords, tokens, or API keys, securely.

PersistentVolume:

Represents a storage resource in the cluster that can be used by pods. PersistentVolumes decouple storage from individual pods.

PersistentVolumeClaim:

Requests a specific amount of storage from a PersistentVolume. Pods use PersistentVolumeClaims to access storage resources.

ServiceAccount:

Represents an identity for pods and allows fine-grained access control to Kubernetes API resources.

Ingress:

Manages external access to services within a cluster, typically for HTTP-based applications.

ClusterRole:

Defines a set of permissions for accessing cluster-level resources. ClusterRoles are used with RoleBindings and ClusterRoleBindings.

CustomResourceDefinition (CRD):

Extends Kubernetes API with custom resources. It allows you to define your own API objects.

ServiceMonitor:

Custom resource used by monitoring tools like Prometheus Operator to discover and monitor services.

These are just a few examples of the possible values for the kind field in Kubernetes resource definitions. Each value corresponds to a specific type of resource that Kubernetes manages. The kind field is crucial for proper interpretation and functioning of the resource within the Kubernetes cluster.
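
As a simple illustration, the manifest below (the name and data values are placeholders) uses the kind field to tell the API server to create a ConfigMap:

apiVersion: v1

kind: ConfigMap

metadata:

  name: demo-config

data:

  LOG_LEVEL: info

Changing kind to another value, such as Secret or Deployment, changes which controller and API schema Kubernetes applies to the rest of the manifest.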


References:

ChatGPT 


What is NodePort in Kubernetes

In Kubernetes, a NodePort is a type of Service that exposes an application running inside the cluster so that it can be reached from outside the cluster. It exposes the application on a specific port on each node in the cluster, making it accessible from the public internet or other networks (see the example manifest after the list below).

Here's how the NodePort type works:

Internal Cluster Communication: Your application runs as a set of pods inside the Kubernetes cluster. These pods communicate with each other using their own IP addresses and ports within the cluster's internal network.

NodePort Service: When you create a NodePort service, Kubernetes allocates a port (the NodePort) on each node in the cluster. Any traffic that arrives at this port on any node is forwarded to the corresponding port on the selected pods.

External Access: This means that the service can be accessed from outside the cluster by connecting to any node's IP address on the specified NodePort.

Port Range: The NodePort is usually in the range 30000-32767. You can specify the NodePort explicitly when creating the service, or let Kubernetes choose an available port within this range.
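
As a rough sketch (the names, labels, and port numbers are illustrative), a NodePort Service for the nginx-webapp deployment from earlier could look like this:

apiVersion: v1

kind: Service

metadata:

  name: nginx-nodeport

spec:

  type: NodePort

  selector:

    app: nginx-webapp

  ports:

  - port: 80

    targetPort: 80

    nodePort: 30080

Traffic sent to any node's IP on port 30080 is forwarded to port 80 on the pods selected by app: nginx-webapp; if nodePort is omitted, Kubernetes picks a free port from the 30000-32767 range.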

References:

ChatGPT 

What is Kubernetes Deployment YAML

A Kubernetes user or administrator specifies data in a YAML file, typically to define a Kubernetes object. The YAML configuration is called a “manifest”, and when it is “applied” to a Kubernetes cluster, Kubernetes creates an object based on the configuration.


A Kubernetes Deployment YAML specifies the configuration for a Deployment object—this is a Kubernetes object that can create and update a set of identical pods. Each pod runs specific containers, which are defined in the spec.template field of the YAML configuration. 


The Deployment object not only creates the pods but also ensures the correct number of pods is always running in the cluster, handles scalability, and takes care of updates to the pods on an ongoing basis. All these activities can be configured through fields in the Deployment YAML. 


The following YAML configuration creates a Deployment object that runs 5 replicas of an NGINX container.


apiVersion: apps/v1

kind: Deployment

metadata:

  name: nginx-deployment

  labels:

    app: web

spec:

  selector:

    matchLabels:

      app: web

  replicas: 5

  strategy:

    type: RollingUpdate

  template:

    metadata:

      labels:

        app: web

    spec:

      containers:

      - name: nginx

        image: nginx

        ports:

        - containerPort: 80



spec.replicas—specifies how many pods to run

strategy.type—specifies which deployment strategy should be used. In this case and in the following examples we select RollingUpdate, which means new versions are rolled out gradually to pods to avoid downtime.

spec.template.spec.containers—specifies which container image to run in each of the pods and ports to expose.



Below is how to specify resource requests and limits:


    spec:

      containers:

      - name: nginx

        image: nginx

        resources:

          limits:

            memory: 200Mi

          requests:

            cpu: 100m

            memory: 200Mi

        ports:

        - containerPort: 80



limits—each container should not be allowed to consume more than 200Mi of memory.

requests—each container requires 100m of CPU resources and 200Mi of memory on the node



references:

https://codefresh.io/learn/kubernetes-deployment/kubernetes-deployment-yaml/#:~:text=The%20Deployment%20object%20not%20only,fields%20in%20the%20Deployment%20YAML.


Kubernetes Sidecar Pattern

A pod is the basic building block of a Kubernetes application. Kubernetes manages pods instead of containers, and pods encapsulate containers. A pod may contain one or more containers, storage, IP addresses, and options that govern how containers should run inside the pod.


A pod that contains one container is referred to as a single-container pod, and it is the most common Kubernetes use case. A pod that contains multiple closely related containers is referred to as a multi-container pod. There are a few patterns for multi-container pods; one of them is the sidecar container pattern.



What are Sidecar Containers

Sidecar containers are containers that run alongside the main container in the pod. The sidecar pattern extends and enhances the functionality of the main container without changing it.



All the containers run in parallel, and the whole setup works only if both types of containers are running successfully. Most of the time these sidecar containers are simple and small and consume fewer resources than the main container.


Below is a sample pod YAML file:


apiVersion: v1

kind: Pod

metadata:

  name: sidecar-container-demo

spec:

  containers:

  - image: busybox

    command: ["/bin/sh"]

    args: ["-c", "while true; do echo echo $(date -u) 'Hi I am from Sidecar container' >> /var/log/index.html; sleep 5;done"]

    name: sidecar-container

    resources: {}

    volumeMounts:

    - name: var-logs

      mountPath: /var/log

  - image: nginx

    name: main-container

    resources: {}

    ports:

      - containerPort: 80

    volumeMounts:

    - name: var-logs

      mountPath: /usr/share/nginx/html

  dnsPolicy: Default

  volumes:

  - name: var-logs

    emptyDir: {}




// create the pod

kubectl create -f pod.yml

// list the pods

kubectl get po

// exec into pod

kubectl exec -it sidecar-container-demo -c main-container -- /bin/sh

# apt-get update && apt-get install -y curl

# curl localhost




references:

https://medium.com/bb-tutorials-and-thoughts/kubernetes-learn-sidecar-container-pattern-6d8c21f873d

Saturday, August 12, 2023

Installing and running Minikube on Mac

Step 1:

To check if virtualization is supported on macOS, run the following command in your terminal.

sysctl -a | grep -E --color 'machdep.cpu.features|VMX'

If you see VMX in the output (should be colored), the VT-x feature is enabled in your machine.


Make sure you have kubectl installed. You can install kubectl using the command below:


brew install kubectl

Verify kubectl version

kubectl version


Install a Hypervisor

If you do not already have a hypervisor installed, install one of these now:


• HyperKit

• VirtualBox

• VMware Fusion


We will install HyperKit to run our Minikube

brew install hyperkit

Verify that you installed kubectl & HyperKit successfully on your Mac using

brew list


Install Minikube

The easiest way to install Minikube on macOS is using Homebrew


brew install minikube


minikube version

We have successfully set up Minikube on our Mac; now we are ready to start it.

Start Minikube with the command below:

minikube start

If you observe the output of minikube start, it chooses hyperkit as the default driver, which is the hypervisor we installed.


Once Minikube has started successfully, we can verify its status:

minikube status

After you have confirmed whether Minikube is working with your chosen hypervisor, you can continue to use Minikube or you can stop your cluster. To stop your cluster, run:

minikube stop

Delete minikube

minikube delete


references:

https://medium.com/@javatechie/kubernetes-tutorial-install-run-minikube-in-mac-os-k8s-cluster-369b25b0c3f0

Friday, August 4, 2023

What is Sidecar container

 A Kubernetes sidecar container is an additional container that is deployed alongside the main container within the same Kubernetes Pod. The term "sidecar" is derived from the sidecar attached to a motorcycle, which provides additional support and functionality. Similarly, a sidecar container in Kubernetes enhances the capabilities of the main container by providing complementary functionality or services.

The primary purpose of sidecar containers is to support the main application container by sharing the same network namespace, storage, and other resources within the Pod. This allows sidecar containers to closely interact with the main container and work together seamlessly.

Some common use cases for Kubernetes sidecar containers include:

Logging and Monitoring: A sidecar container can be used to collect logs from the main container or forward them to a centralized logging system. It can also handle metrics and send them to monitoring solutions.

Security and Encryption: A sidecar container can handle tasks related to security, such as managing SSL certificates, handling encryption/decryption, or authenticating requests.

Data Synchronization: A sidecar container can perform data synchronization or caching tasks, making data readily available to the main container.

Adapters and Proxies: A sidecar container can act as an adapter or proxy, modifying requests and responses before they reach the main container.

Backup and Restore: A sidecar container can handle backup and restore operations for the main container's data or configurations.

Using sidecar containers has several advantages, including:

Separation of Concerns: Sidecar containers allow you to keep specific functionalities or services separate from the main application, promoting a modular and maintainable architecture.

Reuse and Scalability: Sidecar containers can be easily reused across different applications, promoting code reuse and reducing duplication.

Easy Integration: Sidecar containers can integrate seamlessly with the main container within the same Pod, simplifying communication and coordination.

When defining sidecar containers in a Kubernetes Pod, ensure that the sidecar and main containers have clearly defined roles and responsibilities. Keep in mind that each container within a Pod shares the same network namespace, so they can communicate using localhost and ports without the need for exposing them externally.

Overall, Kubernetes sidecar containers are a powerful pattern to extend and enhance the functionality of your applications, enabling you to build more robust and feature-rich containerized solutions.

references:
OpenAI 

Can an OpenShift pod contain multiple containers

 Yes, OpenShift, which is built on top of Kubernetes, supports running multiple containers within a single pod. Kubernetes introduced the concept of multi-container pods, and OpenShift inherits and extends this capability.


A pod is the smallest deployable unit in Kubernetes and OpenShift. It represents a single instance of a running process in a cluster and can contain one or more containers that share the same network namespace, storage, and other resources.


The concept of having multiple containers within a single pod is particularly useful when those containers need to work together closely, share data, or perform related tasks. They can communicate with each other via localhost, which simplifies inter-container communication.


Here's an example of how you might define a multi-container pod in an OpenShift/Kubernetes YAML manifest:



apiVersion: v1

kind: Pod

metadata:

  name: multi-container-pod

spec:

  containers:

  - name: container-1

    image: container-1-image:latest

    # Container 1 configuration goes here

  - name: container-2

    image: container-2-image:latest

    # Container 2 configuration goes here


In this example, the pod named multi-container-pod contains two containers, container-1 and container-2. Both containers share the same network namespace, which means they can communicate with each other using localhost on specific ports.


Some common use cases for multi-container pods include:


Sidecar Containers: A sidecar container runs alongside the main application container and provides supporting services such as logging, monitoring, or data synchronization.


Data Preprocessing: One container can perform data preprocessing before passing the processed data to the main container.


Proxy or Adapter Containers: A proxy or adapter container can modify or transform the data before it reaches the main container.


Debugging and Troubleshooting: A debugging container can be used to inspect and troubleshoot issues in the main application container.


It's important to note that while multi-container pods offer benefits in terms of shared resources and communication, you should use them judiciously and consider the complexity they might introduce. Each container in a pod should be related to the same application and provide a distinct service or functionality that supports the main application's operation.


references:

OpenAI 

What is bitsandbytes

Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions.

Features

8-bit Optimizers: Adam, AdamW, RMSProp, LARS, LAMB (saves 75% memory)

Stable Embedding Layer: Improved stability through better initialization, and normalization

8-bit quantization: Quantile, Linear, and Dynamic quantization

Fast quantile estimation: Up to 100x faster than other algorithms


Using the 8-bit Optimizers

With bitsandbytes 8-bit optimizers can be used by changing a single line of code in your codebase. For NLP models we recommend also to use the StableEmbedding layers (see below) which improves results and helps with stable 8-bit optimization. To get started with 8-bit optimizers, it is sufficient to replace your old optimizer with the 8-bit optimizer in the following way:

import bitsandbytes as bnb

# adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # comment out old optimizer

adam = bnb.optim.Adam8bit(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # add bnb optimizer

adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995), optim_bits=8) # equivalent


torch.nn.Embedding(...) ->  bnb.nn.StableEmbedding(...) # recommended for NLP models


Note that by default all parameter tensors with less than 4096 elements are kept at 32-bit even if you initialize those parameters with 8-bit optimizers. This is done since such small tensors do not save much memory and often contain highly variable parameters (biases) or parameters that require high precision (batch norm, layer norm). You can change this behavior like so:


# parameter tensors with less than 16384 values are optimized in 32-bit

# it is recommended to use multiplies of 4096

adam = bnb.optim.Adam8bit(model.parameters(), min_8bit_size=16384) 


References:

https://pypi.org/project/bitsandbytes-cuda113/#:~:text=Bitsandbytes%20is%20a%20lightweight%20wrapper,bit%20optimizers%20and%20quantization%20functions.

What is meant by a decoder-only AI model

A "decoder-only" AI model refers to a specific type of neural network architecture where the model is designed to perform decoding tasks without an encoder component. In the context of neural networks, an encoder is responsible for extracting useful representations or features from the input data, while the decoder takes those representations and generates the desired output.


Typically, in many AI models, such as autoencoders or sequence-to-sequence models, there is both an encoder and a decoder. For example:


Autoencoder: An autoencoder is a type of neural network used for unsupervised learning. It consists of an encoder network that maps the input data to a lower-dimensional latent space representation, and a decoder network that reconstructs the input data from the latent representation.


Sequence-to-Sequence (Seq2Seq) Model: Seq2Seq models are used in tasks like machine translation or chatbot generation. They have an encoder that processes the input sequence and a decoder that generates the output sequence.


In contrast, a decoder-only AI model omits the encoder and focuses solely on the decoding aspect. The input to the model is typically a fixed-size representation or context vector, and the model's objective is to generate a desired output based on that context.


Decoder-only models can be used in various scenarios, such as:


Language Generation: In natural language processing, a decoder-only model can be used to generate sentences or paragraphs based on a given context or initial input.


Image Generation: In computer vision, a decoder-only model can be employed to generate images based on a latent representation or context vector.


Recommender Systems: In recommender systems, a decoder-only model can be used to generate personalized recommendations based on user preferences or historical data.


One advantage of decoder-only models is their efficiency, as they can be smaller and require fewer computations compared to models with both an encoder and decoder. However, they heavily rely on the quality of the context or latent representation provided as input.


Overall, the decision to use a decoder-only AI model depends on the specific task, data, and requirements of the application. It is a design choice in neural network architecture that can be beneficial in certain situations where only the decoding aspect is relevant.
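
As a minimal sketch of the idea (assuming the transformers library is installed and the model weights can be downloaded), the snippet below feeds a prompt as context to GPT-2, a decoder-only model, and lets it generate a continuation:

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is decoder-only: there is no separate encoder,
# the prompt itself is the context the decoder conditions on.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("A decoder-only model generates text by", return_tensors="pt")

with torch.no_grad():

    output_ids = model.generate(**inputs, max_new_tokens=20)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))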


References

OpenAI 

What is Falcon-7B and Falcon-7B-Instruct

Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.


Why use Falcon-7B?

It outperforms comparable open-source models (e.g., MPT-7B, StableLM, RedPajama etc.), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. See the OpenLLM Leaderboard.

It features an architecture optimized for inference, with FlashAttention (Dao et al., 2022) and multiquery (Shazeer et al., 2019).

It is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions.

Falcon-7B-Instruct is a 7B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.


Why use Falcon-7B-Instruct?

You are looking for a ready-to-use chat/instruct model based on Falcon-7B.

Falcon-7B is a strong base model, outperforming comparable open-source models (e.g., MPT-7B, StableLM, RedPajama etc.), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. See the OpenLLM Leaderboard.

It features an architecture optimized for inference, with FlashAttention (Dao et al., 2022) and multiquery (Shazeer et al., 2019).

 This is an instruct model, which may not be ideal for further finetuning. 
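
As a hedged usage sketch following the common transformers pipeline pattern (the prompt is made up, the download is large, and a GPU with enough memory plus the accelerate package are assumed):

import torch

import transformers

from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" spreads the weights across available devices (needs accelerate);
# trust_remote_code=True may be required on older transformers releases.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipeline(
    "Write a haiku about Kubernetes.",
    max_new_tokens=60,
    do_sample=True,
    top_k=10,
    eos_token_id=tokenizer.eos_token_id,
)

print(sequences[0]["generated_text"])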


References

https://huggingface.co/tiiuae/falcon-7b

https://huggingface.co/tiiuae/falcon-7b-instruct

Wednesday, August 2, 2023

What is Private GPT

The main objective of Private GPT is to Interact privately with your documents using the power of GPT, 100% privately, with no data leaks. This is one of the most popular repos, with 34k+ stars.


PrivateGPT is a tool that allows you to train and use large language models (LLMs) on your own data. LLMs are powerful AI models that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

There are many reasons why you might want to use privateGPT. For example, you might want to use it to:


Generate text that is tailored to your specific needs

Translate languages more accurately

Write creative content that is more original

Answer your questions in a more informative way


PrivateGPT gives you these benefits:


Privacy: PrivateGPT allows you to train LLMs on your own data, without having to worry about your data being shared with others.

Control: PrivateGPT gives you full control over the training process, so you can ensure that your LLM is trained on the data that you want it to be trained on.

Cost: LLMs can be expensive to train and require a lot of computing resources. PrivateGPT solves these problems by allowing you to train LLMs on your own data, without having to worry about the cost or resources.


Below are a few easy steps to get it running:


python -m venv venv 

source venv/bin/activate


git clone https://github.com/imartinez/privateGPT.git

cd privateGPT

pip3 install -r requirements.txt 


mkdir models

cd models

wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

cd ..


mv example.env .env

vi .env 


Add the below 


PERSIST_DIRECTORY=db

MODEL_TYPE=GPT4All

MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin

EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2

MODEL_N_CTX=1000


python ingest.py

python privateGPT.py


That's all!

References:

https://generativeai.pub/unlocking-data-privacy-how-to-build-your-private-enterprise-data-app-with-private-gpt-and-llama-2-eb50d032d145

pip install for specific architecture

 On a Mac M1, I was getting this error message when attempting to run ingest.py

ImportError: dlopen(/Users/.../lib/python3.10/site-packages/hnswlib.cpython-310-darwin.so, 0x0002): tried: '/Users/.../lib/python3.10/site-packages/hnswlib.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/.../lib/python3.10/site-packages/hnswlib.cpython-310-darwin.so' (no such file), '/Users/.../lib/python3.10/site-packages/hnswlib.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

The below steps were taken to resolve the issue:

# 1. Uninstall hnswlib

> pip uninstall hnswlib


# 2. Clear the pip cache

> pip cache purge


# 3. Reinstall with the arm64 architecture

> ARCHFLAGS="-arch arm64" pip install hnswlib


What is Python Poetry

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. Poetry offers a lockfile to ensure repeatable installs, and can build your project for distribution.


The installer script is available directly at install.python-poetry.org, and is developed in its own repository. The script can be executed directly (i.e. ‘curl python’) or downloaded and then executed from disk (e.g. in a CI environment).

Linux, macOS, Windows (WSL)

curl -sSL https://install.python-poetry.org | python3 -

This was a mostly straightforward installation.

export PATH="/Users/retheesh/.local/bin:$PATH"

poetry --version

Poetry (version 1.5.1)
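
A few typical commands illustrate the day-to-day workflow (the project and package names here are just examples):

# scaffold a new project with a pyproject.toml and package skeleton

poetry new demo-project

cd demo-project

# declare and install a dependency; pyproject.toml and poetry.lock are updated

poetry add requests

# install all declared dependencies into a managed virtual environment

poetry install

# run a command inside that environment

poetry run python -c "import requests; print(requests.__version__)"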

references:

https://python-poetry.org/docs/#installing-with-the-official-installer


What is Llama-2 and Llama 2-Chat

Meta has this week released an open-source LLM, Llama 2, for public use. The large language model (LLM) can be used to create a ChatGPT-like chatbot.

Many believe that Llama 2 is the industry’s most important release since ChatGPT in November 2022.

Llama 2 is an updated version of Llama 1, trained on a new mix of publicly available data. Meta increased the size of the pretraining corpus by 40%, doubled the context length of the model, and adopted grouped-query attention. Llama 2 was released with 7B, 13B, and 70B parameters.

Llama 2-Chat is a fine-tuned version of Llama 2 that is optimized for dialogue use cases. The variants of this model have 7B, 13B, and 70B parameters as well.

Pretraining data: The Llama 2 training corpus includes a new mix of data from publicly available sources that does not include data from Meta's products or services. Meta also removed data from certain sites known to contain a high volume of personal information about private individuals. The model was trained on 2 trillion tokens of data, as this provides a good performance–cost trade-off, up-sampling the most factual sources in an effort to increase knowledge and dampen hallucinations.

Fine-tuning: Llama 2-Chat is the result of several months of research and iterative application of alignment techniques, including both instruction tuning and RLHF, requiring significant computational and annotation resources. RLHF is a model training procedure applied to a fine-tuned language model to further align model behavior with human preferences and instruction following.

references:

https://generativeai.pub/unlocking-data-privacy-how-to-build-your-private-enterprise-data-app-with-private-gpt-and-llama-2-eb50d032d145


PoC with LaMini-Flan LLM model

python -m venv venv

source venv/bin/activate


pip install torch torchvision torchaudio

pip install transformers langchain streamlit==1.24.0

pip install accelerate


All the files can be downloaded from https://huggingface.co/MBZUAI/LaMini-Flan-T5-248M by clicking on “Files and Versions”.

You need to download all 11 files.

Below is sample code demoing this.


from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

from langchain.llms import HuggingFacePipeline

import torch


checkpoint = "./model/"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

base_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint,

                                                    device_map='auto',

                                                    torch_dtype=torch.float32)

llm = HuggingFacePipeline.from_model_id(model_id=checkpoint,

                                        task = 'text2text-generation',

                                        model_kwargs={"temperature":0.60,"min_length":30, "max_length":600, "repetition_penalty": 5.0})

                                        

from langchain import PromptTemplate, LLMChain

template = """{text}"""

prompt = PromptTemplate(template=template, input_variables=["text"])

chat = LLMChain(prompt=prompt, llm=llm)


yourprompt = input("Enter your prompt: ")


reply = chat.run(yourprompt)

print(reply) 



references:

https://levelup.gitconnected.com/building-a-local-chatbot-on-your-local-pc-100-offline-100-privacy-b617cc29558b

What is LLMChain, PromptTemplate

A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model.


Language models take text as input - that text is commonly referred to as a prompt. Typically this is not simply a hardcoded string but rather a combination of a template, some examples, and user input. LangChain provides several classes and functions to make constructing and working with prompts easy.


A prompt template refers to a reproducible way to generate a prompt. It contains a text string ("the template"), that can take in a set of parameters from the end user and generates a prompt.


A prompt template can contain:

instructions to the language model,

a set of few shot examples to help the language model generate a better response,

a question to the language model.

Here's the simplest example:


from langchain import PromptTemplate

template = """\

You are a naming consultant for new companies.

What is a good name for a company that makes {product}?

"""


prompt = PromptTemplate.from_template(template)

prompt.format(product="colorful socks")


References:

https://docs.langchain.com/docs/components/chains/llm-chain#:~:text=A%20LLMChain%20is%20the%20most,passes%20that%20to%20the%20model.



Tuesday, August 1, 2023

What is TorchVision and TorchAudio

 torchvision and torchaudio are Python packages that are part of the PyTorch ecosystem. PyTorch is an open-source deep learning library developed by Facebook's AI Research lab (FAIR) that provides a flexible and efficient framework for building and training various types of deep neural networks.

torchvision:

torchvision is a package that provides image and video datasets, model architectures, and image transformation utilities for use with PyTorch. It is commonly used in computer vision tasks and helps researchers and practitioners to easily access and work with standard datasets and pre-trained models. Some key components of torchvision include:

Datasets: torchvision.datasets module provides popular image and video datasets such as CIFAR-10, CIFAR-100, MNIST, ImageNet, and more, allowing you to quickly load and use these datasets in your projects.

Transforms: torchvision.transforms module provides a set of common image transformations like resizing, cropping, flipping, normalization, and data augmentation, making it easy to preprocess and augment images before feeding them into a neural network.

Pre-trained Models: torchvision.models module provides pre-trained deep learning models such as ResNet, VGG, AlexNet, etc., which you can use directly or fine-tune on your own tasks.

torchaudio:

torchaudio is a package that provides audio processing functionalities for PyTorch. It is designed to work seamlessly with PyTorch tensors and allows you to work with audio data in the same way as image data in torchvision. Some key functionalities of torchaudio include:

Data I/O: torchaudio provides functions to load and save audio data in various formats, making it easy to work with audio datasets.

Audio Transformations: torchaudio.transforms module offers a range of audio transformations like resampling, time stretching, frequency masking, and spectrogram computation, enabling you to preprocess and augment audio data for deep learning models.

Audio Dataset: torchaudio.datasets module provides access to common audio datasets for tasks like speech recognition and audio classification.

Both torchvision and torchaudio are valuable extensions of PyTorch that streamline the process of working with image and audio data, respectively, and enable users to build and experiment with a wide range of deep learning models in computer vision and audio processing domains.
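
A small, hedged sketch of both packages (it downloads the MNIST dataset and pre-trained ResNet weights on first run, and the audio file path is just a placeholder):

import torchvision

import torchaudio

from torchvision import transforms

# torchvision: compose preprocessing transforms and load a standard dataset
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

train_set = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=preprocess)

image, label = train_set[0]

print(image.shape, label)  # torch.Size([1, 28, 28]) and the digit's class

# torchvision: load a pre-trained model
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

model.eval()

# torchaudio: load an audio file as a (waveform, sample_rate) pair (replace with a real .wav path)
waveform, sample_rate = torchaudio.load("example.wav")

print(waveform.shape, sample_rate)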


references:

ChatGPT 

What is all-MiniLM-L6-v2

This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers)

pip install -U sentence-transformers


from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]


model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

embeddings = model.encode(sentences)

print(embeddings)


Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.


from transformers import AutoTokenizer, AutoModel

import torch

import torch.nn.functional as F


#Mean Pooling - Take attention mask into account for correct averaging

def mean_pooling(model_output, attention_mask):

    token_embeddings = model_output[0] #First element of model_output contains all token embeddings

    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()

    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)



# Sentences we want sentence embeddings for

sentences = ['This is an example sentence', 'Each sentence is converted']


# Load model from HuggingFace Hub

tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')


# Tokenize sentences

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')


# Compute token embeddings

with torch.no_grad():

    model_output = model(**encoded_input)


# Perform pooling

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])


# Normalize embeddings

sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)


print("Sentence embeddings:")

print(sentence_embeddings)


References:

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

What is DistilGPT2

DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the smallest version of Generative Pre-trained Transformer 2 (GPT-2). Like GPT-2, DistilGPT2 can be used to generate text. Users of this model card should also consider information about the design, training, and limitations of GPT-2. 

Model Description: DistilGPT2 is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using knowledge distillation and was designed to be a faster, lighter version of GPT-2.
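
A minimal usage sketch with the transformers text-generation pipeline (the prompt is arbitrary, and the model is downloaded from the Hugging Face Hub on first use):

from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

result = generator("Hello, I am a language model,", max_new_tokens=25, num_return_sequences=1)

print(result[0]["generated_text"])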

references:

https://huggingface.co/distilgpt2