Sunday, July 30, 2023

What is Teachable Machine?

Teachable Machine is a web-based tool that makes creating machine learning models fast, easy, and accessible to everyone.

Students use the camera or microphone on their device to train a machine, through artificial intelligence (AI), to see or hear something and predict what it is.


Teachable Machine is a web tool that makes it fast and easy to create machine learning models for your projects, no coding required. Train a computer to recognize your images, sounds, & poses, then export your model for your sites, apps, and more.


Steps to use

Gathering Data: Gather and group your examples into classes, or categories, that you want the computer to learn. Upload your own image files, or capture them live with a mic or webcam.

Train the model: Click "Train Model", and TensorFlow.js starts training a neural network right in your browser.

Test the model: Preview the model's output on new examples, then export the model for use in your sites, apps, and more.
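
If you export the trained image model in Keras format, you can run it locally. The sketch below assumes the usual export contents (keras_model.h5 and labels.txt) and the 224x224, [-1, 1] preprocessing used by Teachable Machine's sample code; treat the file names, test image, and scaling as assumptions to verify against your own export.

# Minimal sketch: run a Teachable Machine image model exported in Keras format.
# Assumes the export produced keras_model.h5 and labels.txt (typical filenames).
import numpy as np
import tensorflow as tf
from PIL import Image

model = tf.keras.models.load_model("keras_model.h5", compile=False)
labels = [line.strip() for line in open("labels.txt")]

img = Image.open("test.jpg").resize((224, 224))                   # Teachable Machine uses 224x224 inputs
x = np.asarray(img, dtype=np.float32)[None, ...] / 127.5 - 1.0    # scale pixels to [-1, 1]

probs = model.predict(x)[0]
print(labels[int(np.argmax(probs))], float(np.max(probs)))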


references:

https://teachablemachine.withgoogle.com/train

Saturday, July 22, 2023

Benefits of Log file analysis using Deep Learning

 Log file analysis using deep learning offers several advantages compared to traditional methods. Here are some key benefits:


Non-linearity: Deep learning models, such as neural networks with multiple hidden layers, can capture complex non-linear patterns in log data. This allows them to detect subtle anomalies and correlations that may be difficult to identify with linear or rule-based approaches.


Feature Learning: Deep learning models can automatically learn relevant features from raw log data, reducing the need for manual feature engineering. This is especially beneficial when dealing with unstructured or high-dimensional log data.


Flexibility: Deep learning models can handle various log data formats, including text logs, numerical logs, and even log images or sequences. This flexibility makes them suitable for analyzing diverse types of log data from different sources.


Scalability: Deep learning models can scale to handle large volumes of log data, making them suitable for real-time or big data log analysis.


Adaptability: Deep learning models can adapt to changing log data patterns over time. They can be updated with new data to continuously improve their performance in detecting anomalies or identifying patterns.


End-to-End Learning: Deep learning models can perform end-to-end learning, meaning they can take raw log data as input and produce actionable insights or anomaly detection results as output, without relying on manual intermediate steps.


Unsupervised Learning: Deep learning-based log analysis often involves unsupervised learning, where the models do not require labeled data for training. This is beneficial when labeled anomalous data is scarce or expensive to obtain.


Contextual Understanding: Deep learning models can capture the temporal dependencies and context in log data, enabling them to consider the sequential nature of log entries and better understand the overall behavior of the system.


Generalization: Once trained on a large dataset of log data, deep learning models can generalize well to new and unseen log data, making them applicable to various log sources and domains.


Real-Time Anomaly Detection: Deep learning models can be deployed for real-time anomaly detection, providing quick insights into potential issues or abnormalities in the system.
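
As a concrete illustration of the unsupervised-learning benefit above, here is a minimal sketch (not a production pipeline) that flags anomalous log lines by the reconstruction error of a small dense autoencoder trained on TF-IDF vectors; the sample log_lines list and the 3-sigma threshold are illustrative assumptions.

# Minimal sketch: unsupervised log anomaly detection with a dense autoencoder.
# High reconstruction error marks a log line as anomalous.
import numpy as np
import tensorflow as tf
from sklearn.feature_extraction.text import TfidfVectorizer

log_lines = ["user login ok", "user login ok", "disk failure on /dev/sda"]  # placeholder data

X = TfidfVectorizer(max_features=512).fit_transform(log_lines).toarray().astype("float32")

inp = tf.keras.Input(shape=(X.shape[1],))
z = tf.keras.layers.Dense(32, activation="relu")(inp)                 # encoder
out = tf.keras.layers.Dense(X.shape[1], activation="linear")(z)       # decoder
autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)

errors = np.mean((autoencoder.predict(X) - X) ** 2, axis=1)
threshold = errors.mean() + 3 * errors.std()                          # simple 3-sigma cutoff
print("anomalous lines:", np.where(errors > threshold)[0])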


While deep learning offers numerous advantages, it's essential to consider the challenges as well, such as the need for substantial amounts of labeled data for supervised learning tasks, computational resources, and model interpretability. Depending on the specific use case and available resources, deep learning can significantly enhance the effectiveness and efficiency of log file analysis, leading to improved system monitoring, security, and operational performance.

references:

openai 

Monday, July 17, 2023

What is Nautobot

Nautobot is an open source Network Source of Truth and Network Automation Platform. Nautobot was initially developed as a fork of NetBox (v2.10.4), which was originally created by Jeremy Stretch at DigitalOcean and the NetBox open source community.


Flexible Source of Truth for Networking - Nautobot core data models are used to define the intended state of network infrastructure, enabling its use as a Source of Truth. While a baseline set of models is provided (such as IP networks and addresses, devices and racks, circuits and cables, etc.), it is Nautobot's goal to offer maximum data model flexibility. This is enabled through features such as user-defined relationships, custom fields on any model, and data validation that permits users to codify everything from naming standards to automated tests that run before data can be populated into Nautobot.


Extensible Data Platform for Automation - Nautobot has a rich feature set to seamlessly integrate with network automation solutions. Nautobot offers GraphQL and native Git integration along with REST APIs and webhooks. Git integration dynamically loads YAML data files as Nautobot config contexts. Nautobot also has an evolving plugin system that enables users to create custom models, APIs, and UI elements. The plugin system is also used to unify and aggregate disparate data sources creating a Single Source of Truth to streamline data management for network automation.
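
As a rough illustration of the REST API mentioned above, the sketch below lists devices from a Nautobot instance; the URL, token, and status filter are placeholder assumptions rather than values from any real deployment.

# Minimal sketch: pulling device data from Nautobot's REST API.
# The base URL and token are placeholders; /api/dcim/devices/ follows Nautobot's documented API.
import requests

NAUTOBOT_URL = "https://nautobot.example.com"   # placeholder
TOKEN = "0123456789abcdef"                       # placeholder API token

resp = requests.get(
    f"{NAUTOBOT_URL}/api/dcim/devices/",
    headers={"Authorization": f"Token {TOKEN}", "Accept": "application/json"},
    params={"status": "active"},                 # assumed filter for illustration
)
resp.raise_for_status()
for device in resp.json()["results"]:
    print(device["name"], device["device_type"]["model"])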


Platform for Network Automation Apps - The Nautobot plugin system enables users to create Network Automation Apps. Apps can be as lightweight or as robust as required. Using Nautobot for creating custom applications saves up to 70% of development time by reusing features such as authentication, permissions, webhooks, GraphQL, change logging, etc., all while having access to the data already stored in Nautobot. Some production-ready applications include:


Golden Configuration

Device Lifecycle

Firewall Models

SSoT

ChatOps

Circuit Maintenance

Capacity Metrics

Device Onboarding


references:

https://docs.nautobot.com/projects/core/en/stable/#application-stack

Sunday, July 16, 2023

What are Syntagmatic taggers

Syntagmatic taggers, also known as sequential taggers or sequential labeling models, are NLP models that assign labels or tags to each word or token in a sequence based on the surrounding context and syntactic relationships. These tags capture information such as part-of-speech (POS) tags, named entities, syntactic dependencies, or other linguistic features.

Some popular syntagmatic taggers include:

Part-of-Speech (POS) Taggers: These taggers assign grammatical categories (e.g., noun, verb, adjective) to each word in a sentence. They capture the syntactic role of words in a sentence and are commonly used in various NLP tasks.


Named Entity Recognition (NER) Taggers: NER taggers identify and classify named entities in text, such as person names, locations, organizations, or dates. They help in extracting specific entities from unstructured text.


Syntactic Dependency Taggers: These taggers assign syntactic dependency labels to words, indicating their grammatical relationships in a sentence. Examples of dependency tags include subject, object, modifier, or conjunction.


Chunking or Shallow Parsing Taggers: Chunking taggers group words into chunks based on syntactic structures such as noun phrases, verb phrases, or prepositional phrases. They provide higher-level syntactic information beyond POS tags.
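
For a quick feel of these taggers in practice, here is a small illustration using NLTK (it assumes the punkt, averaged_perceptron_tagger, maxent_ne_chunker and words data packages have been downloaded):

# Illustration: POS tagging and named-entity chunking with NLTK.
import nltk

sentence = "Apple opened a new office in Berlin in 2023."
tokens = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)          # e.g. [('Apple', 'NNP'), ('opened', 'VBD'), ...]
tree = nltk.ne_chunk(pos_tags)           # groups tokens into named-entity chunks
print(pos_tags)
print(tree)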


Syntagmatic taggers are typically trained using supervised machine learning approaches, such as Hidden Markov Models (HMMs), Conditional Random Fields (CRFs), or deep learning models like Recurrent Neural Networks (RNNs) or Transformer-based models. These models learn to make predictions based on the observed context and the relationships between neighboring words in a sequence.


Syntagmatic taggers play a crucial role in various NLP applications, including information extraction, text classification, machine translation, sentiment analysis, and more. They provide essential syntactic and semantic annotations that enable higher-level understanding and analysis of text data.

Which is the best algorithm for NER: Naive Bayes, Hidden Markov Models, or Decision Trees?

The choice of which algorithm is better for Named Entity Recognition (NER) depends on various factors such as the specific requirements of your NER task, the characteristics of your data, and the trade-offs you are willing to make. Each algorithm has its strengths and weaknesses. Here's an overview of Naive Bayes, Decision Trees, and Hidden Markov Models (HMMs) for NER:


Naive Bayes:

Naive Bayes is a probabilistic classification algorithm that works well with text classification tasks like NER.

It assumes that the features are conditionally independent, which simplifies the modeling process.

Naive Bayes is computationally efficient and can handle large feature spaces.

However, it may struggle with capturing complex dependencies between words and their labels.

Decision Trees:


Decision Trees are versatile and interpretable algorithms for classification tasks, including NER.

They learn hierarchical decision rules based on the features to classify entities.

Decision Trees can handle both categorical and numerical features and can capture complex relationships.

However, they may be prone to overfitting, especially if the tree becomes too deep or the data is noisy.

Hidden Markov Models (HMMs):


HMMs are sequential models commonly used for NER tasks where the sequence of words matters.

They model the transition probabilities between states (labels) and the emission probabilities of observations (words).

HMMs can capture the sequential dependencies between labels and incorporate context information.

However, HMMs make the simplifying assumption of the Markov property, which limits their ability to capture long-range dependencies.
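
To make the HMM option concrete, here is a minimal sketch of training a supervised HMM tagger with NLTK on a toy, hand-labelled IOB dataset; a real NER system would be trained on a proper corpus such as CoNLL-2003, so this only shows the shape of the API.

# Minimal sketch: a supervised HMM tagger trained on toy IOB-labelled sentences.
from nltk.tag import hmm

train_data = [
    [("Alice", "B-PER"), ("lives", "O"), ("in", "O"), ("Paris", "B-LOC")],
    [("Bob", "B-PER"), ("works", "O"), ("at", "O"), ("Google", "B-ORG")],
]

trainer = hmm.HiddenMarkovModelTrainer()
tagger = trainer.train_supervised(train_data)
print(tagger.tag(["Alice", "works", "in", "Paris"]))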

In practice, the performance of these algorithms may vary depending on the specific dataset and task. It is recommended to experiment with different algorithms and evaluate their performance using appropriate metrics like precision, recall, and F1-score on a validation or test set. Additionally, more advanced techniques like Conditional Random Fields (CRF) or deep learning models such as Recurrent Neural Networks (RNNs) and Transformer-based models have shown promising results for NER tasks and could be considered as well.


Ultimately, the best algorithm for NER depends on the specific requirements, data characteristics, and trade-offs you are willing to make in terms of performance, interpretability, and computational efficiency.


References 

chatGPT 


Latent semantic indexing, Bag of Words & Word2Vec


Latent semantic indexing (LSI) can be used for feature extraction in natural language processing (NLP) tasks. LSI is a technique that aims to capture the latent semantic relationships between words in a corpus by analyzing the co-occurrence patterns of words in documents.


In the context of feature extraction, LSI can help uncover the underlying semantic structure in a collection of documents and represent them in a lower-dimensional space. It reduces the dimensionality of the document-term matrix by identifying the latent semantic factors that contribute to the similarity and relatedness of documents.


Here's how LSI can aid in feature extraction:


Dimensionality Reduction: LSI reduces the high-dimensional document-term matrix to a lower-dimensional space while preserving the important semantic relationships. It identifies the key latent factors or concepts in the data and represents documents in terms of these factors.


Semantic Similarity: LSI captures the semantic similarity between words and documents. It identifies the common latent factors that contribute to the similarity of words or documents and represents them in a more compact and meaningful way. This can be useful for tasks such as document clustering, information retrieval, or recommendation systems.


Noise Reduction: LSI helps in reducing the noise or irrelevant information in the document-term matrix. It focuses on the most significant latent factors while downplaying the less relevant ones. This can improve the quality of extracted features by filtering out noise and capturing the essence of the data.


Generalization: LSI can help in generalizing the representation of documents. It captures the underlying semantic concepts that go beyond the specific terms used in the documents. This allows for a more generalized and abstract representation of the documents, which can be beneficial in tasks like text classification or topic modeling.


Overall, LSI can be a useful technique for feature extraction in NLP tasks as it uncovers the latent semantic structure in text data and provides a more meaningful and compact representation. It allows for capturing the important aspects of the data while reducing noise and dimensionality, which can lead to improved performance in downstream tasks.
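
A common way to apply LSI in practice is TF-IDF followed by truncated SVD; the short sketch below (using scikit-learn and a toy corpus) shows documents being reduced to two latent concept dimensions.

# Minimal sketch: Latent Semantic Indexing as TF-IDF followed by truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

docs = [
    "The cat sat on the mat",
    "Dogs and cats are popular pets",
    "Stock markets fell sharply today",
    "Investors worried about market volatility",
]

lsi = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
doc_vectors = lsi.fit_transform(docs)   # each document reduced to 2 latent "concept" dimensions
print(doc_vectors)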


The Bag-of-Words (BoW) model is a common technique used for feature extraction in natural language processing (NLP). It represents text documents as numerical feature vectors, where each feature represents the presence or frequency of a particular word or term in the document corpus.




Here's how the Bag-of-Words model helps in feature extraction:


Simple Representation: The BoW model provides a simple and straightforward representation of text data. It treats each document as an unordered collection of words and disregards the grammar, word order, and context. This simplification allows for efficient feature extraction and comparison.


Vocabulary Creation: The BoW model creates a vocabulary or dictionary of unique words or terms present in the document corpus. Each word or term in the vocabulary becomes a feature or dimension in the feature vector representation.


Term Frequency: The BoW model captures the frequency of each word or term in a document. The number of times a word appears in a document is often used as the value for that word's feature in the feature vector.


Occurrence or Presence: The BoW model can represent the presence or absence of a word in a document. Instead of using term frequency, a binary value (1 or 0) is assigned to each feature depending on whether the word is present or absent in the document.


Vector Space Representation: The BoW model transforms each document into a high-dimensional feature vector, where each dimension corresponds to a word or term in the vocabulary. These feature vectors can then be used as input for various machine learning algorithms for tasks such as text classification, clustering, sentiment analysis, and more.


While the BoW model is a simple and effective technique for feature extraction, it has limitations. It does not consider the semantic meaning or context of words and can lead to high-dimensional and sparse representations. However, with appropriate preprocessing steps, such as stop word removal, stemming, and tf-idf weighting, the BoW model can still provide useful features for many NLP tasks.
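
A minimal Bag-of-Words example with scikit-learn's CountVectorizer, on a toy corpus, looks like this:

# Minimal sketch: Bag-of-Words features with CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]

vectorizer = CountVectorizer()          # builds the vocabulary from the corpus
X = vectorizer.fit_transform(docs)      # sparse document-term count matrix
print(vectorizer.get_feature_names_out())
print(X.toarray())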



Word2Vec is commonly used for feature extraction in natural language processing (NLP) tasks. It is a popular algorithm that learns distributed vector representations, or word embeddings, from large text corpora. These word embeddings capture the semantic and syntactic relationships between words, allowing for efficient and meaningful representation of words in a continuous vector space.


Word Embeddings: Word2Vec generates dense vector representations for words in such a way that words with similar meanings or contexts are located close to each other in the vector space. These embeddings capture semantic relationships and represent the meaning of words in a more nuanced manner than simple one-hot encoding or frequency-based representations.


Semantic Similarity: Word2Vec allows for measuring semantic similarity between words based on the proximity of their word embeddings in the vector space. This can be useful in various NLP tasks, such as information retrieval, question answering, or recommendation systems, where understanding the semantic relatedness between words or documents is crucial.


Feature Vectors: Word2Vec can be used to transform individual words into fixed-length feature vectors. These word embeddings can serve as feature representations for words in a document or text corpus. Aggregating the word embeddings of words in a document can yield a feature vector representation of the document itself. This enables the use of machine learning algorithms on top of these feature vectors for tasks like text classification, sentiment analysis, or clustering.


Transfer Learning: Word2Vec embeddings can be pre-trained on large, generic text corpora, such as Wikipedia or news articles. These pre-trained embeddings can then be used as feature representations in downstream NLP tasks with smaller labeled datasets. This transfer learning approach helps leverage the general language knowledge captured by Word2Vec in specific NLP applications.


Word2Vec has been widely adopted and proven effective in various NLP tasks, providing rich and meaningful feature representations. It allows for capturing the semantic relationships between words and provides a foundation for building more advanced NLP models and applications.
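
As a small sketch, Word2Vec embeddings can be trained with gensim; the toy corpus below is only meant to show the API, and real embeddings need far more text.

# Minimal sketch: training Word2Vec embeddings with gensim on a tiny toy corpus.
from gensim.models import Word2Vec

sentences = [
    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "uses", "neural", "networks"],
    ["word", "embeddings", "capture", "meaning"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20)
print(model.wv["learning"][:5])                 # 50-dimensional vector (first 5 values shown)
print(model.wv.most_similar("learning", topn=2))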


References 

OpenAI 


What is Deep Parsing in NLP

Deep parsing aims to uncover the underlying syntactic structure of a sentence beyond shallow parsing, which typically involves part-of-speech tagging and basic phrase identification. Chunking plays a crucial role in deep parsing as it helps in understanding the grammatical relationships between words and their constituents.


Chunking is a technique commonly used in deep parsing. Chunking is the process of grouping words together into meaningful syntactic units called "chunks." It involves identifying and extracting phrases such as noun phrases (NP), verb phrases (VP), prepositional phrases (PP), etc., based on the grammatical structure of the sentence.


Chunking can be performed using various techniques, such as rule-based approaches, regular expressions, or machine learning methods. Machine learning-based approaches, particularly supervised learning algorithms such as Conditional Random Fields (CRFs) or Recurrent Neural Networks (RNNs), have been widely used for chunking tasks.
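
As a simple rule-based illustration, NLTK's RegexpParser can chunk noun phrases from POS tags; the grammar below is a toy pattern, not a complete chunking grammar.

# Illustration: rule-based NP chunking with NLTK's RegexpParser over POS tags.
import nltk

sentence = "The quick brown fox jumped over the lazy dog"
pos_tags = nltk.pos_tag(nltk.word_tokenize(sentence))

grammar = "NP: {<DT>?<JJ>*<NN.*>}"        # a noun phrase: optional determiner, adjectives, a noun
chunker = nltk.RegexpParser(grammar)
print(chunker.parse(pos_tags))            # prints a tree with NP chunks grouped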


Context-Free Grammars (CFGs) are commonly used in deep parsing. CFGs provide a formal representation of the syntax or grammar of a language by defining a set of production rules that specify how different constituents or phrases can be combined to form valid sentences.


Various parsing algorithms, such as CYK (Cocke-Younger-Kasami) parsing, Earley parsing, or chart parsing, are based on CFGs and used in deep parsing. These algorithms recursively apply grammar rules to parse sentences and build parse trees that represent the sentence's syntactic structure.


Additionally, many deep parsing models use probabilistic CFGs or their extensions, such as Lexicalized CFGs (LCFGs) or Probabilistic Context-Free Grammars (PCFGs), to capture statistical patterns and improve parsing accuracy. These models incorporate statistical information, such as word probabilities or transition probabilities, to guide the parsing process.


In summary, Context-Free Grammars serve as the basis for formulating grammar rules and parsing algorithms used in deep parsing. They provide a formal framework for modeling the syntactic structure of sentences and enabling the analysis of complex grammatical relationships in natural language.
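
To make the CFG idea concrete, here is a toy grammar parsed with NLTK's chart parser; the grammar and sentence are illustrative only.

# Minimal sketch: a toy Context-Free Grammar parsed with NLTK's chart parser.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)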



What is Constant error carousel?

The constant error carousel (CEC) is the central mechanism of the LSTM cell: a memory cell with a fixed, self-recurrent connection of weight 1.0 through which the error signal can flow backwards in time essentially unchanged. It was introduced to counter the vanishing gradient problem in standard RNNs, where the gradients calculated during backpropagation diminish (or explode) as they propagate back through time, making it difficult to learn long-term dependencies in sequential data.


LSTM cells were specifically designed to mitigate the vanishing gradient problem and allow for the effective learning of long-term dependencies. They achieve this through the use of gating mechanisms that control the flow of information inside the cell.


The gating mechanisms in LSTM cells, such as the forget gate and input gate, enable the cells to selectively retain or discard information from previous time steps. This selective information flow helps prevent the gradients from vanishing or exploding during training, allowing the network to effectively capture and utilize long-term dependencies in the data.


By utilizing memory cells, input and forget gates, and carefully designed update equations, LSTM cells can maintain a constant error signal throughout the entire sequence. This allows them to capture and propagate gradients over long time horizons, effectively addressing the vanishing gradient problem.


So, to summarize, LSTM cells are specifically designed to overcome the vanishing gradient problem associated with traditional RNNs. The constant error carousel is the mechanism that makes this possible, allowing LSTM cells to effectively capture long-term dependencies in sequential data.
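
In practice you rarely implement the gates by hand; a minimal Keras sketch of an LSTM-based sequence classifier looks like this (the vocabulary size and layer widths are arbitrary choices):

# Minimal sketch: an LSTM sequence classifier in Keras (toy shapes).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None,), dtype="int32"),        # variable-length sequences of token ids
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),  # token ids -> 64-d vectors
    tf.keras.layers.LSTM(128),                                   # gated memory cells carry long-range context
    tf.keras.layers.Dense(1, activation="sigmoid"),              # binary prediction per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()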





What are Pooling Operations?

Pooling operations, such as max pooling or average pooling, generally reduce the width and height of the output feature maps in convolutional neural networks (CNNs).


Pooling operations are typically applied after convolutional layers in CNN architectures. The purpose of pooling is to downsample the feature maps, reducing their spatial dimensions while retaining important information.


In max pooling, for example, a pooling kernel (typically of size 2x2) slides over the input feature map, and the maximum value within each kernel region is selected as the output value. This effectively reduces the spatial resolution by half, as the output feature map will have half the width and half the height of the input feature map.


Similarly, average pooling computes the average value within each kernel region and replaces the input values with the computed averages. This also reduces the spatial resolution of the feature maps.


The downsampling effect of pooling helps to reduce the computational complexity of subsequent layers, provide translational invariance, and extract higher-level abstract features. By reducing the spatial dimensions, pooling helps to capture the most important features while discarding some fine-grained details.


However, it's important to note that pooling operations can result in some loss of spatial information. In recent years, there has been a trend towards using architectures with smaller or no pooling layers, such as the fully convolutional networks (FCNs), to better preserve spatial information in tasks like semantic segmentation.


Overall, pooling operations are commonly used in CNNs to downsample feature maps and reduce their spatial dimensions, which is beneficial for subsequent layers' efficiency and capturing important features.
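
The shape arithmetic is easy to verify; the short sketch below runs a 2x2 max pooling layer over a random 32x32 feature map and prints the halved spatial dimensions.

# Illustration: 2x2 max pooling halves the spatial dimensions of a feature map.
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 16))                  # one 32x32 feature map with 16 channels
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
print(x.shape, "->", pooled.shape)                     # (1, 32, 32, 16) -> (1, 16, 16, 16)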

Different Deep learning architectures

Siamese

Siamese is an architecture commonly used in deep learning for tasks such as similarity learning and metric learning. The Siamese architecture consists of two or more identical subnetworks that share the same weights and are trained simultaneously.


In the Siamese architecture, each subnetwork takes in a separate input (e.g., two images, two sentences, or two audio clips) and processes them independently through the shared layers. The outputs from the subnetworks are then compared to measure the similarity or dissimilarity between the inputs.


The Siamese architecture has been successfully applied to various tasks, including face recognition, signature verification, text similarity, and image retrieval. It is particularly useful when there is a limited amount of labeled training data or when pairwise similarity information is available.
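
A minimal Keras sketch of the idea: one shared encoder processes both inputs, and the absolute difference of the two embeddings feeds a similarity score. The input size and layer widths are arbitrary placeholders.

# Minimal sketch: a Siamese network with a shared encoder in Keras.
import tensorflow as tf

def make_encoder(input_dim=784, embed_dim=64):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(input_dim,)),
        tf.keras.layers.Dense(embed_dim),
    ])

encoder = make_encoder()                       # shared weights: the same encoder processes both inputs
a = tf.keras.Input(shape=(784,))
b = tf.keras.Input(shape=(784,))
diff = tf.keras.layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([encoder(a), encoder(b)])
score = tf.keras.layers.Dense(1, activation="sigmoid")(diff)   # 1 = similar, 0 = dissimilar
siamese = tf.keras.Model([a, b], score)
siamese.compile(optimizer="adam", loss="binary_crossentropy")
siamese.summary()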


AlexNet


AlexNet is a convolutional neural network (CNN) architecture that played a pivotal role in advancing the field of computer vision and deep learning. It was introduced by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012, and it achieved significant breakthroughs in image classification tasks.


AlexNet consists of multiple convolutional layers, pooling layers, and fully connected layers. It was designed to handle large-scale image classification tasks, particularly on the ImageNet dataset, which contains millions of labeled images across various categories.


VGGNet/ GoogleNet 

VGGNet (also known as VGG) and GoogleNet (also known as Inception) are both popular convolutional neural network (CNN) architectures used for image classification and other computer vision tasks.


VGGNet:

VGGNet was introduced by the Visual Geometry Group (VGG) at the University of Oxford. It is known for its simplicity and uniform architecture. VGGNet consists of a series of convolutional layers followed by max pooling layers, and it can have varying depths. The most commonly used variant, VGG16, has 16 weight layers (13 convolutional and 3 fully connected), while VGG19 has 19 weight layers. VGGNet uses small 3x3 filters with stride 1 throughout the network, which allows for more detailed feature extraction.


GoogleNet:

GoogleNet, also referred to as Inception, was developed by researchers at Google. It is known for its innovative and complex architecture aimed at improving both accuracy and computational efficiency. GoogleNet introduced the concept of inception modules, which consist of multiple parallel convolutional layers of different sizes. These parallel layers capture features at different scales and are then concatenated to form the output of the module. This allows the network to learn both local and global features effectively.


ResNet

ResNet (Residual Neural Network) is a deep convolutional neural network architecture that was introduced by researchers at Microsoft Research in 2015. It was designed to address the challenge of training very deep neural networks by mitigating the vanishing gradient problem.


The key idea behind ResNet is the introduction of skip connections or "identity shortcuts" that allow for the direct flow of information from earlier layers to later layers. These skip connections create shortcut paths that bypass a few convolutional layers, allowing the gradients to flow more easily during backpropagation. This helps alleviate the vanishing gradient problem and enables the training of much deeper networks.




SSD/ Yolo

SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once) are both popular object detection architectures used in computer vision tasks.


SSD:

SSD is an object detection algorithm that combines the benefits of deep learning and the concept of anchor boxes to detect objects in images. It is a single-shot detector, meaning it performs object detection in a single pass of the network. SSD is designed to achieve high detection accuracy while maintaining real-time processing speeds.


The key features of SSD include:


Utilization of multiple feature maps at different scales to detect objects of varying sizes.

Prediction of object bounding boxes and class probabilities at each location in the feature maps.

Employing anchor boxes with different aspect ratios to handle objects with different shapes and aspect ratios.

SSD has demonstrated excellent performance in object detection tasks and is widely used for real-time applications such as pedestrian detection, face detection, and object tracking.


YOLO:

YOLO is another popular object detection architecture that follows a different approach compared to traditional region-based methods. YOLO takes a unified approach where it divides the input image into a grid and makes predictions for bounding boxes and class probabilities directly from this grid. YOLO is known for its real-time performance and efficiency.

What are the various optimisation algorithms?

Momentum takes into account the history of gradients during optimization. It introduces a momentum term that accumulates the past gradients and uses them to guide the parameter updates. The momentum term helps to smooth out the variations in gradient updates and allows the optimizer to navigate through regions with high curvature more efficiently.


In the Momentum optimization algorithm, the parameter update at each iteration is a combination of the current gradient and the accumulated past gradients. The update is influenced by both the current gradient and the momentum term, which is a fraction of the previous parameter update.


RMSprop


RMSprop stands for "Root Mean Square Propagation." It is an adaptive learning rate optimization algorithm that adjusts the learning rate for each parameter based on the average of the squared gradients of that parameter. The main idea behind RMSprop is to divide the learning rate by the square root of the exponentially decaying average of squared gradients.


In RMSprop, the learning rate for each parameter is individually computed and updated during the optimization process. The learning rate is scaled by the square root of the moving average of the squared gradients. This scaling allows the algorithm to reduce the learning rate for parameters with large and frequent updates, while increasing the learning rate for parameters with small and infrequent updates.




Adagrad:

Adagrad stands for "Adaptive Gradient Algorithm." It addresses the challenge of choosing an appropriate learning rate by automatically scaling the learning rates for each parameter based on the frequency and magnitude of their past gradients.


In Adagrad, the learning rate for each parameter is individually computed and updated during the optimization process. The learning rate is inversely proportional to the square root of the sum of squared gradients accumulated for that parameter. This means that parameters with smaller gradients receive larger learning rates, while parameters with larger gradients receive smaller learning rates.




Nadam: 

Nadam stands for "Nesterov-accelerated Adaptive Moment Estimation." It combines the benefits of Nesterov Momentum and the adaptive learning rate scheme of Adam to improve the convergence speed and optimization performance.


In Nadam, the parameter updates are based on a combination of the current gradient and the momentum term. It incorporates the Nesterov Momentum technique, which calculates the gradient based on the lookahead position using the momentum term. This lookahead computation helps to make more accurate updates and improves convergence.


Gradient Descent is a fundamental optimization algorithm commonly used in machine learning and deep learning. It is widely employed to minimize the loss or cost function during the training of models.


Gradient Descent is an iterative optimization algorithm that aims to find the minimum of a function by iteratively adjusting the model parameters in the direction of steepest descent. It utilizes the gradients of the function with respect to the parameters to guide the parameter updates.


The basic idea behind Gradient Descent is as follows:


Initialize the model parameters with some initial values.

Compute the gradients of the loss or cost function with respect to the parameters.

Update the parameters by taking steps proportional to the negative gradients.

Repeat steps 2 and 3 until convergence or a specified number of iterations.
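
The steps above can be written out directly; the sketch below runs plain gradient descent on a least-squares problem in NumPy, and the last few lines show how the Momentum, RMSprop, Adagrad and Nadam variants discussed above are selected in Keras.

# Minimal sketch: plain gradient descent on least-squares linear regression, following
# the four steps above, plus the Keras optimizer variants discussed earlier.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)                                  # 1. initialise parameters
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)        # 2. gradient of the MSE loss
    w -= lr * grad                               # 3. step against the gradient
print(w)                                         # 4. after enough iterations, w approaches [2, -1, 0.5]

# The same idea with momentum/adaptive variants in Keras:
sgd_momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
rmsprop = tf.keras.optimizers.RMSprop(learning_rate=0.001)
adagrad = tf.keras.optimizers.Adagrad(learning_rate=0.01)
nadam = tf.keras.optimizers.Nadam(learning_rate=0.001)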





Saturday, July 15, 2023

What is faker Library

Faker is a popular library that generates fake (but reasonable) data that can be used for things such as:

Unit Testing

Performance Testing

Building Demos

Working without a completed backend

Faker was originally written in Perl and this is the JavaScript port. Language bindings also exist for Ruby, Java, and Python.


Installation

Install it as a Dev Dependency using your favorite package manager.


npm install @faker-js/faker --save-dev


references:

https://fakerjs.dev/guide/


What is OpenBSD

The OpenBSD project produces a FREE, multi-platform 4.4BSD-based UNIX-like operating system. Our efforts emphasize portability, standardization, correctness, proactive security and integrated cryptography. As an example of the effect OpenBSD has, the popular OpenSSH software comes from OpenBSD.

The current release is OpenBSD 7.3, released April 10, 2023. This is the 54th release.

OpenBSD is developed entirely by volunteers. The project's development environment and developer events are funded through contributions collected by The OpenBSD Foundation. Contributions ensure that OpenBSD will remain a vibrant and free operating system.


references:

https://www.openbsd.org/


What is Azure Time Series Anomaly Detector

Easily embed time-series anomaly detection capabilities into your apps to help users identify problems quickly. Anomaly Detector ingests time-series data of all types and selects the best anomaly detection algorithm for your data to ensure high accuracy. Detect spikes, dips, deviations from cyclic patterns, and trend changes through both univariate and multivariate APIs. Customize the service to detect any level of anomaly. Deploy the anomaly detection service where you need it—in the cloud or at the intelligent edge.


Key capabilities include:



Automatic detection eliminates the need for labeled training data to help you save time and stay focused on fixing problems as soon as they surface.


Customizable settings let you fine-tune sensitivity to potential anomalies based on the risk profile of your business.


Identify multivariate anomalies

Use multivariate anomaly detection to evaluate multiple signals and the correlations between them to find sudden changes in data patterns before they affect your business.



References:

https://azure.microsoft.com/en-in/products/cognitive-services/anomaly-detector


What is Microsoft Time Series Algorithm

The management team at Adventure Works Cycles wants to predict monthly bicycle sales for the coming year. The company is especially interested in whether the sale of one bike model can be used to predict the sale of another model. By using the Microsoft Time Series algorithm on historical data from the past three years, the company can produce a data mining model that forecasts future bike sales. Additionally, the company can perform cross predictions to see whether the sales trends of individual bike models are related.


Each quarter, the company plans to update the model with recent sales data and update their predictions to model recent trends. To correct for stores that do not accurately or consistently update sales data, they will create a general prediction model, and use that to create predictions for all regions.


In SQL Server 2005 (9.x), the Microsoft Time Series algorithm used a single auto-regressive time series method, named ARTXP. The ARTXP algorithm was optimized for short-term predictions, and therefore excelled at predicting the next likely value in a series. Beginning in SQL Server 2008, the Microsoft Time Series algorithm added a second algorithm, ARIMA, which was optimized for long-term prediction. For a detailed explanation of the implementation of the ARTXP and ARIMA algorithms, see the documentation referenced below.


By default, the Microsoft Time Series algorithm uses a mix of the two algorithms when analyzing patterns and making predictions. The algorithm trains two separate models on the same data: one model uses the ARTXP algorithm, and one model uses the ARIMA algorithm. The algorithm then blends the results of the two models to yield the best prediction over a variable number of time slices. Because ARTXP is best for short-term predictions, it is weighted more heavily at the beginning of a series of predictions. However, as the time slices that you are predicting move further into the future, ARIMA is weighted more heavily.

References:

https://learn.microsoft.com/en-us/analysis-services/data-mining/microsoft-time-series-algorithm?view=asallproducts-allversions#example


What is OpenTSDB

OpenTSDB consists of a Time Series Daemon (TSD) as well as a set of command line utilities. Interaction with OpenTSDB is primarily achieved by running one or more of the TSDs. Each TSD is independent: there is no master and no shared state, so you can run as many TSDs as required to handle any load you throw at it. Each TSD uses the open source database HBase or the hosted Google Bigtable service to store and retrieve time-series data. The data schema is highly optimized for fast aggregations of similar time series to minimize storage space. Users of the TSD never need to access the underlying store directly. You can communicate with the TSD via a simple telnet-style protocol, an HTTP API or a simple built-in GUI. All communications happen on the same port (the TSD figures out the protocol of the client by looking at the first few bytes it receives).


Writing

The first step in using OpenTSDB is to send time series data to the TSDs. A number of tools exist to pull data from various sources into OpenTSDB. If you can't find a tool for your needs, you may need to write scripts that collect data from your systems (e.g. by reading interesting metrics from /proc on Linux, collecting counters from your network gear via SNMP, or gathering other interesting data from your applications, for instance via JMX for Java applications) and push data points to one of the TSDs periodically.



StumbleUpon wrote a Python framework called tcollector that is used to collect thousands of metrics from Linux 2.6, Apache's HTTPd, MySQL, HBase, memcached, Varnish and more. This low-impact framework includes a number of useful collectors, and the community is constantly providing more. Alternative frameworks with OpenTSDB support include Collectd, Statsd and the Coda Hale metrics emitter.


In OpenTSDB, a time series data point consists of:

A metric name.

A UNIX timestamp (seconds or milliseconds since Epoch).

A value (64 bit integer or single-precision floating point value), a JSON formatted event or a histogram/digest.

A set of tags (key-value pairs) that describe the time series the point belongs to.
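
As a minimal sketch of the write path, the snippet below posts one such data point to a TSD's HTTP API at /api/put; the host name and port are placeholders for your own TSD.

# Minimal sketch: pushing one data point to OpenTSDB's HTTP API (/api/put).
import time
import requests

point = {
    "metric": "sys.cpu.user",
    "timestamp": int(time.time()),          # seconds since Epoch
    "value": 42.5,
    "tags": {"host": "web01", "dc": "lga"},
}

resp = requests.post("http://tsd.example.com:4242/api/put", json=point)
print(resp.status_code)                     # 204 means the point was accepted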


references:

http://opentsdb.net/overview.html


Friday, July 14, 2023

Why Neural Networks are preferred for images, videos, and large corpora of data

Deep learning is preferred for text, image, and video data because it has demonstrated remarkable success in handling complex and high-dimensional data, capturing intricate patterns, and achieving state-of-the-art performance in various tasks. Here are some reasons why deep learning is well-suited for these types of data:


Representation Learning: Deep learning models are capable of automatically learning hierarchical representations from raw data. This is particularly advantageous for text, image, and video data, which often have high-dimensional and unstructured formats. Deep learning models can learn meaningful features and representations at different levels of abstraction, capturing intricate details and patterns.


Complex Relationships: Text, image, and video data often involve complex relationships and dependencies. Deep learning models, such as convolutional neural networks (CNNs) for images and recurrent neural networks (RNNs) for sequential data like text and video, can capture these complex relationships effectively. They have the ability to model long-term dependencies and capture spatial and temporal patterns, allowing them to learn from context and capture the nuances present in the data.


Scalability: Deep learning models can scale to handle large datasets with millions or billions of samples. They are designed to learn from vast amounts of data and leverage parallel processing capabilities of modern hardware, such as GPUs and TPUs. This scalability makes deep learning models suitable for training on massive text corpora, image datasets, and video collections.


End-to-End Learning: Deep learning enables end-to-end learning, where the entire model is trained to optimize the desired objective function directly from raw input data to output predictions. This eliminates the need for handcrafted feature engineering or explicit preprocessing steps, making the modeling process more streamlined and efficient.


State-of-the-Art Performance: Deep learning has achieved remarkable performance in various text, image, and video analysis tasks, surpassing traditional machine learning approaches. Applications such as natural language processing, image recognition, object detection, image captioning, video analysis, and many others have witnessed significant advancements and breakthroughs with deep learning methods.


However, it's important to note that deep learning models often require large amounts of labeled data and substantial computational resources for training. They can be computationally intensive and may require specialized hardware for efficient training and inference. Additionally, model interpretability and explainability can be challenging with deep learning approaches compared to traditional machine learning methods. Nevertheless, the power and flexibility of deep learning make it the preferred choice for many text, image, and video-related tasks.

Monday, July 10, 2023

What is Uvicorn

Uvicorn is an ASGI web server implementation for Python.

Until recently Python has lacked a minimal low-level server/application interface for async frameworks. The ASGI specification fills this gap, and means we're now able to start building a common set of tooling usable across all async frameworks.

Uvicorn currently supports HTTP/1.1 and WebSockets.

$ pip install uvicorn

$ pip install 'uvicorn[standard]'


This will install uvicorn with "Cython-based" dependencies (where possible) and other "optional extras".


In this context, "Cython-based" means the following:


the event loop uvloop will be installed and used if possible.

uvloop is a fast, drop-in replacement of the built-in asyncio event loop. It is implemented in Cython. Read more here.

The built-in asyncio event loop serves as an easy-to-read reference implementation and is there for easy debugging as it's pure-python based.

the http protocol will be handled by httptools if possible.

Read more about comparison with h11 here.

Moreover, "optional extras" means that:


the websocket protocol will be handled by websockets (should you want to use wsproto you'd need to install it manually) if possible.

the --reload flag in development mode will use watchfiles.

windows users will have colorama installed for the colored logs.

python-dotenv will be installed should you want to use the --env-file option.

PyYAML will be installed to allow you to provide a .yaml file to --log-config, if desired.


async def app(scope, receive, send):

    assert scope['type'] == 'http'


    await send({

        'type': 'http.response.start',

        'status': 200,

        'headers': [

            [b'content-type', b'text/plain'],

        ],

    })

    await send({

        'type': 'http.response.body',

        'body': b'Hello, world!',

    })


To run the server, you can do the following:


$ uvicorn example:app
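
Alternatively, the same app can be started programmatically (assuming the ASGI app above is saved as example.py):

# Start the ASGI app from Python instead of the CLI.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("example:app", host="127.0.0.1", port=8000, reload=True)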



References

https://www.uvicorn.org/

Context-Aware Question-Answering Systems With LLM

The application processes user input and generates appropriate responses based on the document’s content. It uses the LangChain library for document loading, text splitting, embeddings, vector storage, and question-answering, with GPT-3.5-turbo under the hood, delivering the bot responses via JSON to our UI.


The blog referenced below gives good details on this; the steps attempted are as follows:

git clone https://github.com/Ricoledan/llm-gpt-demo


cd backend/ 

pip install -r requirements.txt


cd frontend/

npm i


A bit of pre-processing is needed, as shown below.


First, we leverage LangChain’s document_loaders.unstructured package with the import below:


from langchain.document_loaders.unstructured import UnstructuredFileLoader



Then load the unstructured data like this:


loader = UnstructuredFileLoader('./docs/document.txt')

documents = loader.load()


text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

split_texts = text_splitter.split_documents(documents)


CharacterTextSplitter in LangChain takes two arguments: chunk_size and chunk_overlap. The chunk_size parameter determines the size of each text chunk, while chunk_overlap specifies the number of overlapping characters between two adjacent chunks. By setting these parameters, you can control the granularity of the text splitting and tailor it to your specific application’s requirements.


Embedding Generation

Representing text numerically


For our model to leverage what’s in the text, we must first convert the textual data into numerical representations called embeddings to make sense of it. These embeddings capture the semantic meaning of the text and allow for efficient and meaningful comparisons between text segments.


embeddings = OpenAIEmbeddings()


In LangChain, embeddings = OpenAIEmbeddings() creates an instance of the OpenAIEmbeddings class, which generates vector embeddings of the text data. Vector embeddings are the numerical representations of the text that capture its semantic meaning. These embeddings are used in various stages of the NLP pipeline, such as similarity search and response generation.


Vector Database Storage

Efficient organization of embeddings


A vector database (also called a vector store or vector search engine) is a data storage and retrieval system designed to handle high-dimensional vector data. In the context of natural language processing (NLP) and machine learning, vector databases are used to store and efficiently query embeddings or other vector representations of data.

from langchain.vectorstores import Chroma

Similarity Search

Finding relevant matches


With the query embedding generated, a similarity search is performed in the vector database to identify the most relevant matches. The search compares the query embedding to the stored embeddings, ranking them based on similarity metrics like cosine similarity or Euclidean distance. The top matches are the most relevant text passages to the user’s query.


vector_db = Chroma.from_documents(documents=split_texts, embedding=embeddings, persist_directory=persist_directory)


This line of code creates an instance of the Chroma vector database using the from_documents() method. By default, Chroma uses an in-memory database, which gets persisted on exit and loaded on start, but for our project, we are persisting the database locally using the persist_directory option and passing in the name with a variable of the same name.

Response Generation

Producing informative and contextual answers

Finally, the crafted prompt is fed to ChatGPT, which generates an answer based on the input. The generated response is then returned to the user, completing the process of retrieving and delivering relevant information based on the user’s query. The language model produces a coherent, informative, and contextually appropriate response by leveraging its deep understanding of language patterns and the provided context.
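
The blog wires these pieces together inside its backend; as a hedged sketch (using the LangChain API as of mid-2023), the persisted Chroma store and GPT-3.5-turbo can be combined into a question-answering chain roughly like this. The example question is a placeholder.

# Hedged sketch: a retrieval-augmented QA chain over the Chroma store built above.
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",                                          # stuff retrieved chunks into one prompt
    retriever=vector_db.as_retriever(search_kwargs={"k": 4}),
)
answer = qa_chain.run("What does the document say about pricing?")  # placeholder question
print(answer)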


references:

https://betterprogramming.pub/building-context-aware-question-answering-systems-with-llms-b6f2b6e387ec

What is Chinchilla Model

A more compute-optimal 70B model, called Chinchilla, is trained on 1.4 trillion tokens. Not only does Chinchilla outperform its much larger counterpart, Gopher, but its reduced model size reduces inference cost considerably.



references:

https://sh-tsang.medium.com/brief-review-chinchilla-training-compute-optimal-large-language-models-7e4d00680142


Langchain OpenAI Functions Agent

The OpenAI Functions Agent in Langchain is designed to work with models like gpt-3.5-turbo-0613 and gpt-4–0613 to automatically complete the tasks from tool description to JSON generation to function calls

a) Tool Creation

Tool creation follows the normal Langchain process for custom tools. You need to define a class GetflightInPeriodTool that inherits from BaseTool, registering its name, description, _run() and _arun() methods, and performing a format and existence check on the input arguments via GetflightInPeriodCheckInput().


from langchain.tools import BaseTool

from pydantic import BaseModel, Field

from typing import Optional, Type


class GetflightInPeriodCheckInput(BaseModel):


    fly_from: str = Field(..., description="the 3-digit code for departure airport")

    fly_to: str = Field(..., description="the 3-digit code for arrival airport")

    date_from: str = Field(..., description="the dd/mm/yyyy format of start date for the range of search")

    date_to: str = Field(..., description="the dd/mm/yyyy format of end date for the range of search")

    sort: str = Field(..., description="the category for low-to-high sorting, only support 'price', 'duration', 'date'")


class GetflightInPeriodTool(BaseTool):

    name = "get_flight_in_period"

    description = """Useful when you need to search the flights info. You can sort the result by "sort" argument.

                    if there is no year, you need to use 2023 for search.

                    Try to understand the parameters of every flight


                  """

    def _run(self, fly_from: str, fly_to: str, date_from: str, date_to: str, sort: str):

        get_flight_in_period_response = get_flight_in_period(fly_from, fly_to, date_from, date_to, sort)


        return get_flight_in_period_response


    def _arun(self, fly_from: str, fly_to: str, date_from: str, date_to: str, sort: str):

        raise NotImplementedError("This tool does not support async")



    args_schema: Optional[Type[BaseModel]] = GetflightInPeriodCheckInput


b) Agent Creation


After the tool is created which can successfully request flight info from Kiwi API, the next step is to simply create a Langchain agent with it.


Import the Langchain methods and provide your OpenAI API Key.


from langchain.agents import initialize_agent, Tool

from langchain.agents import AgentType

from langchain.chat_models import ChatOpenAI

import os

os.environ["OPENAI_API_KEY"] = "{Your_API_Key}"



Push our GetflightInPeriodTool() into tools list and use ChatOpenAI() to create llm with gpt-3.5-turbo-0613 model. Please note that this time we should select OPENAI_FUNCTIONS as the agent type to activate the function callings features.


tools = [GetflightInPeriodTool()]

llm = ChatOpenAI(temperature=0.7, model="gpt-3.5-turbo-0613")    

open_ai_agent = initialize_agent(tools,

                            llm,

                            agent=AgentType.OPENAI_FUNCTIONS,

                            verbose=True)


@cl.langchain_factory(use_async=False)

def agent():

    tools = [GetflightInPeriodTool()]

    functions = [format_tool_to_openai_function(t) for t in tools]

    print(functions[0])


    llm = ChatOpenAI(temperature=0.7, model="gpt-3.5-turbo-0613")

    

    open_ai_agent = initialize_agent(tools,

                            llm,

                            agent=AgentType.OPENAI_FUNCTIONS,

                            verbose=True)

    return open_ai_agent


Now running it with Chainlit is pretty easy, as below:

chainlit run flight_chatbot.py


references:

https://levelup.gitconnected.com/monetize-your-chatbot-by-using-openais-function-calling-and-langchain-157d0d97cd79

What is Chainlit?

Chainlit lets you create ChatGPT-like UIs on top of any Python code in minutes! Some of the key features include intermediary steps visualisation, element management & display (images, text, carousel, etc.) as well as cloud deployment.


Installation


$ pip install chainlit

$ chainlit hello


Sample 


import chainlit as cl

@cl.on_message  # this function will be called every time a user inputs a message in the UI

async def main(message: str):

    # this is an intermediate step

    await cl.Message(author="Tool 1", content=f"Response from tool1", indent=1).send()


    # send back the final answer

    await cl.Message(content=f"This is the final answer").send()



Running just the two commands below helped to get the sample chat page up and running:


$ pip install chainlit

$ chainlit hello


References:

https://github.com/Chainlit/chainlit

Sunday, July 9, 2023

Function Calls in OpenAI

Now our chatbot application can utilize the output of “functions” to execute them with Internet access, enabling the retrieval of up-to-date information to enhance its responses to user queries.

If the functions or APIs offer both valuable and referral functionality, there is a possibility of earning commissions by releasing and monetizing chatbot applications that interact with users more naturally than normal Internet searching.

With OpenAI’s recent cool feature, their models have been directly fine-tuned to internally understand the format and meaning of external tools and functions. This enables them to generate JSON-based function instructions, eliminating the need for external prompt engineering works and facilitating a seamless run of proper tools during runtime.

Below is a function that returns some recent information:

def get_flight_info(fly_from, fly_to, date):

    flight = {"flight_no": "AA1234", "price": 100, "depart_time": date+" 17:23:00"}

    return flight


Next task is to construct the prompt for gpt-3.5-turbo-0613 model which includes the new features we need.


prompt = [{"role": "system", "content": "You are a helpful assistant. Answer as complete as possible."}]

prompt.append({"role": "user", "content": "Show me the cheapest flight from Berlin to New York in 23 Jul"})

    

functions = [

    {

      "name": "get_flight_info",

      "description": "Get the info of the cheapest flight for a given date",

      "parameters": {

        "type": "object",

        "properties": {

          "fly_from": {

            "type": "string",

            "description": "the 3-digit code for departure airport"

          },

          "fly_to": {

            "type": "string",

            "description": "the 3-digit code for arrival airport"

          },

          "date": {

            "type": "string",

            "description": "the dd/mm/yyyy format date for flight search"

          },

        },

        "required": ["fly_from", "fly_to", "date"]

      }

    }

  ]

completion=openai.ChatCompletion.create(

        model="gpt-3.5-turbo-0613",

        messages = prompt,

        functions = functions

)


Compared to the previous usage of OpenAI API, in this new update, we add one new section to the completion API called functions which contains the function’s name, description, and parameters that inform the model of proper function usage.


message=completion.choices[0].message.function_call

print(message)


{'name': 'get_flight_info', 'arguments': '{\n  "fly_from": "BER",\n  "fly_to": "JFK",\n  "date": "23/07/2023"\n}'}


Now with the below, we can easily execute the function and give the result to the user 


import ast


data_dict = message.to_dict()

# Extract the function name and arguments from the dictionary.

function_name = data_dict['name']

arguments = ast.literal_eval(data_dict['arguments'])


# Get a reference to the function object.

function = globals()[function_name]


# Call the function with the arguments.

result = function(**arguments)


print(result)




References:

https://levelup.gitconnected.com/monetize-your-chatbot-by-using-openais-function-calling-and-langchain-157d0d97cd79

Training a pre-trained LLM using OpenAI

Pre-training is expensive, but fine-tuning is comparatively cheap.


training_data = """

Your training data goes here.

This can be a collection of articles, books, or any other relevant text.

"""


Fine-tuning the model: To fine-tune the GPT-3.5 model with your training data, use the FineTune API from the OpenAI library. Specify the training data, the model name, and any additional parameters you wish to include.



fine_tuning_job = openai.FineTune.create(

    model_engine=model_engine,

    n_epochs=n_epochs,

    batch_size=batch_size,

    learning_rate=learning_rate,

    max_tokens=max_tokens,

    training_file=os.path.abspath(training_file),

    validation_file=os.path.abspath(validation_file),

)


job_id = fine_tuning_job["id"]

print(f"Fine-tuning job created with ID: {job_id}")



You can use the OpenAI API to monitor the progress of your fine-tuning job. The following code snippet shows how to fetch the status of the fine-tuning job:


import time


while True:

    fine_tuning_status = openai.FineTune.retrieve(id=job_id)

    status = fine_tuning_status["status"]

    print(f"Fine-tuning job status: {status}")


    if status in ["completed", "failed"]:

        break


    time.sleep(60)




fine_tuned_model_id = fine_tuning_status["fine_tuned_model"]


# Use the fine-tuned model for text generation

def generate_text(prompt, model_id, max_tokens=50):

    response = openai.Completion.create(

        engine=model_id,

        prompt=prompt,

        max_tokens=max_tokens,

        n=1,

        stop=None,

        temperature=0.5,

    )

    return response.choices[0].text.strip()


prompt = "Your example prompt goes here."

generated_text = generate_text(prompt, fine_tuned_model_id)

print(f"Generated text: {generated_text}")



Training Data


{"prompt": "What is the capital of France?", "completion": "Paris"}

{"prompt": "Which gas do plants absorb from the atmosphere?", "completion": "Carbon dioxide"}

{"prompt": "What is the largest mammal on Earth?", "completion": "Blue whale"}

{"prompt": "Which element has the atomic number 1?", "completion": "Hydrogen"}

Validation Data


{"prompt": "What is the chemical formula for water?", "completion": "H2O"}

{"prompt": "What is the square root of 81?", "completion": "9"}

{"prompt": "Who wrote the play 'Romeo and Juliet'?", "completion": "William Shakespeare"}

{"prompt": "What is the freezing point of water in Celsius?", "completion": "0 degrees Celsius"}



References:

https://medium.com/@smitkumbhani080/how-to-train-a-pre-trained-large-language-model-llm-in-python-using-openai-easy-27680c92fc3d

What are the various OpenAI models?

 The OpenAI API is powered by a diverse set of models with different capabilities and price points. You can also make limited customizations to our original base models for your specific use case with fine-tuning.


MODELS AND DESCRIPTIONS

GPT-4: A set of models that improve on GPT-3.5 and can understand as well as generate natural language or code

GPT-3.5: A set of models that improve on GPT-3 and can understand as well as generate natural language or code

DALL·E: A model that can generate and edit images given a natural language prompt

Whisper: A model that can convert audio into text

Embeddings: A set of models that can convert text into a numerical form

Moderation: A fine-tuned model that can detect whether text may be sensitive or unsafe

GPT-3 (Legacy): A set of models that can understand and generate natural language

Deprecated: A full list of models that have been deprecated

We have also published open source models including Point-E, Whisper, Jukebox, and CLIP.


references:

https://platform.openai.com/docs/models

How to get started with GPT-35-Turbo and GPT-4 with Azure OpenAI Service

Prerequisites

An Azure subscription - Create one for free.


Access granted to Azure OpenAI in the desired Azure subscription.


Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. Open an issue on this repo to contact us if you have an issue.


An Azure OpenAI Service resource with either the gpt-35-turbo or the gpt-4 models deployed (see note 1 below). For more information about model deployment, see the resource deployment guide.


Note 1: GPT-4 models are currently only available by request. To access these models, existing Azure OpenAI customers can apply for access by filling out this form.



Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign in with credentials that have access to your OpenAI resource. During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.


From the Azure OpenAI Studio landing page, select Chat playground.


Playground

Start exploring OpenAI capabilities with a no-code approach through the Azure OpenAI Studio Chat playground. From this page, you can quickly iterate and experiment with the capabilities.




Below are the practical steps I tried.


1. Sign up for an account.

I entered the email details; I already seemed to have credentials.

It asked for the Subscription ID associated with the company, which I did not have.


Without the subscription, the site shows the message below:






Thank you for your interest in Azure OpenAI Service. Please submit this form to register for approval to access and use Azure OpenAI’s Limited Access text and code and/or DALL·E 2 text to image models (as indicated in the form). All use cases must be registered.  Azure OpenAI Service requires registration and is currently only available to approved enterprise customers and partners. Learn more about limited access to Azure OpenAI Service here. 


Limited access scenarios: When evaluating which scenarios to onboard, we consider who will directly interact with the application, who will see the output of the application, whether the application will be used in a high-stakes domain (e.g., medical), and the extent to which the application’s capabilities are tightly scoped. In general, applications in high stakes domains will require additional mitigations and are more likely to be approved for applications with internal-only users and internal-only audiences. Applications with broad possible uses, including content generation capabilities, are more likely to be approved if 1) the domain is not high stakes and users are authenticated or 2) in the case of high stakes domains, anyone who views or interacts with the content is internal to your company.  



There is a high chance of the form being rejected for either of the reasons below:


Possible causes for a denied application: 

1) You are not an approved enterprise customer. Learn more here

2) Application submitted with personal email (Example: @gmail.com, @yahoo.com, @hotmail.com, etc.)  




references:

https://learn.microsoft.com/en-us/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-studio&tabs=command-line


Saturday, July 8, 2023

Python Wrappers and decorators

Python wrappers are functions that wrap another function, adding functionality or modifying its behavior without directly changing its source code.

They are typically implemented as decorators: special functions that take another function as input and apply some changes to its behavior.

Wrapper functions can be useful in various scenarios:

Functionality Extension: We can add features like logging, performance measurement, or caching by wrapping our functions with a decorator.

Code Reusability: By applying a wrapper function (or even a class) to multiple entities, we can avoid code duplication and ensure consistent behavior across different components.

Behavior Modification: We can intercept the input arguments, for example to validate inputs without needing many assert lines.

import time

def timer(func):

    def wrapper(*args, **kwargs):

        # start the timer

        start_time = time.time()

        # call the decorated function

        result = func(*args, **kwargs)

        # record the end time

        end_time = time.time()

        # compute the elapsed time and print it

        execution_time = end_time - start_time

        print(f"Execution time: {execution_time} seconds")

        # return the result of the decorated function execution

        return result

    # return reference to the wrapper function

    return wrapper


@timer

def train_model():

    print("Starting the model training function...")

    # simulate a function execution by pausing the program for 5 seconds

    time.sleep(5) 

    print("Model training completed!")


train_model() 
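
One detail the article does not mention: wrapping a function this way replaces its metadata, so train_model.__name__ becomes "wrapper". The standard remedy is functools.wraps; a minimal variant of the timer decorator above:

import functools
import time

def timer(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        print(f"Execution time: {time.time() - start_time} seconds")
        return result
    return wrapper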



Below is a retry decorator that re-runs a function when it raises an exception:


import time


def retry(max_attempts, delay=1):

    def decorator(func):

        def wrapper(*args, **kwargs):

            attempts = 0

            while attempts < max_attempts:

                try:

                    return func(*args, **kwargs)

                except Exception as e:

                    attempts += 1

                    print(f"Attempt {attempts} failed: {e}")

                    time.sleep(delay)

            print(f"Function failed after {max_attempts} attempts")

        return wrapper

    return decorator



@retry(max_attempts=3, delay=2)

def fetch_data(url):

    print("Fetching the data..")

    # raise timeout error to simulate a server not responding..

    raise TimeoutError("Server is not responding.")

fetch_data("https://example.com/data")  # Retries 3 times with a 2-second delay between attempts
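
As written, the decorator swallows the final failure and returns None. If callers should see the error, one variation (my own sketch, not from the referenced article) is to re-raise the last exception once the attempts are exhausted:

import time

def retry(max_attempts, delay=1):
    def decorator(func):
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    print(f"Attempt {attempt} failed: {e}")
                    time.sleep(delay)
            raise last_error  # surface the final failure instead of returning None
        return wrapper
    return decorator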


References

https://python.plainenglish.io/five-python-wrappers-that-can-reduce-your-code-by-half-af775feb1d5

Why OpenAI's token vocabulary is so large

 import tiktoken


# Get the encoding for the davinci GPT3 model, which is the "r50k_base" encoding.

encoding = tiktoken.encoding_for_model("davinci")


text = "We need to stop anthropomorphizing ChatGPT."

print(f"text: {text}")


token_integers = encoding.encode(text)

print(f"total number of tokens: {encoding.n_vocab}")


print(f"token integers: {token_integers}")

token_strings = [encoding.decode_single_token_bytes(token) for token in token_integers]

print(f"token strings: {token_strings}")

print(f"number of tokens in text: {len(token_integers)}")


encoded_decoded_text = encoding.decode(token_integers)

print(f"encoded-decoded text: {encoded_decoded_text}")


The vocabulary size printed here is around 50K: that is the total number of distinct tokens in the davinci encoding, not the number of tokens in our string (which is only 11, as printed above). OpenAI deliberately chose such a large vocabulary!


With letter-based tokens, however, we can't encode nearly as much information per token as in OpenAI's approach. If we used letter-based tokens in the example above, 11 tokens could only encode "We need to", while 11 of OpenAI's tokens can encode the entire sentence. It turns out that current language models have a limit on the maximum number of tokens that they can receive, so we want to pack as much information as possible into each token.



Now let's consider the scenario where each word is a token. Compared to OpenAI's approach, we would only need seven tokens to represent the same sentence, which seems more efficient, and splitting by word is also straightforward to implement. However, language models need to have a complete list of tokens that they might encounter, and that's not feasible for whole words: not only are there many words in the dictionary, but it would also be difficult to keep up with domain-specific terminology and any new words that are invented.
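
To make the trade-off concrete, here is a quick count of the same sentence at character level, word level, and with OpenAI's BPE encoding (reusing the encoding object created earlier in this post):

text = "We need to stop anthropomorphizing ChatGPT."

print(f"character-level tokens: {len(text)}")
print(f"word-level tokens: {len(text.split())}")   # naive whitespace split; a real word tokenizer would also separate punctuation
print(f"BPE tokens: {len(encoding.encode(text))}")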


If you've played with OpenAI's ChatGPT, you know that it produces many tokens, not just a single token. That's because this basic idea is applied in an expanding-window pattern: you give it n tokens in, it produces one token out, then it incorporates that output token as part of the input of the next iteration, produces a new token out, and so on. This pattern keeps repeating until a stopping condition is reached, indicating that it has finished generating all the text you need.


While playing with ChatGPT, you may also have noticed that the model is not deterministic: if you ask it the exact same question twice, you’ll likely get two different answers. That’s because the model doesn’t actually produce a single predicted token; instead it returns a probability distribution over all the possible tokens. In other words, it returns a vector in which each entry expresses the probability of a particular token being chosen. The model then samples from that distribution to generate the output token.
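
As a rough illustration of that last step, the sketch below samples a token from a probability distribution derived from per-token scores (logits). The scores here are made up; the temperature parameter controls how far samples stray from the most likely token:

import numpy as np

def sample_token(logits, temperature=1.0):
    # Softmax with temperature: convert scores into a probability distribution.
    scaled = np.array(logits) / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Sample one token index from that distribution.
    return np.random.choice(len(probs), p=probs)

# Hypothetical scores for five candidate tokens.
print(sample_token([2.0, 1.0, 0.5, 0.1, -1.0], temperature=0.7))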



References:

https://towardsdatascience.com/how-gpt-models-work-b5f4517d5b5

Why is a GPU preferable over a CPU for Machine Learning?

When buying a GPU for machine learning, there are several factors to consider. Here are some key aspects to look into:


GPU Architecture: The architecture of the GPU is crucial as it determines its computational capabilities and performance for machine learning tasks. Look for modern architectures, such as NVIDIA's Turing or Ampere, which offer dedicated hardware for machine learning workloads.


CUDA Cores: CUDA cores are parallel processors within the GPU that perform the heavy lifting for machine learning computations. More CUDA cores generally lead to faster training and inference times. Consider GPUs with a higher number of CUDA cores for improved performance.


Memory (VRAM): The amount of video RAM (VRAM) on the GPU is critical for deep learning models, especially those with larger datasets or complex architectures. Choose a GPU with sufficient VRAM to accommodate your training data and model requirements. Aim for at least 8GB or more of VRAM for most machine learning tasks.


Memory Bandwidth: The memory bandwidth of the GPU affects how quickly data can be read from and written to the VRAM. Higher memory bandwidth allows for faster data transfers, which can improve overall training performance.


Tensor Cores (for AI-specific workloads): Tensor cores are specialized hardware components found in some GPUs, such as NVIDIA's RTX series. They accelerate matrix operations commonly used in deep learning, offering significant performance gains. If you'll be working with AI-specific workloads, consider GPUs with tensor cores.


Compatibility and Software Support: Ensure that the GPU you choose is compatible with the deep learning frameworks and libraries you plan to use, such as TensorFlow or PyTorch. Also, check for reliable driver support and compatibility with your operating system.


Power and Cooling: Consider the power requirements of the GPU and ensure that your system's power supply can handle it. Additionally, check if your system has adequate cooling to handle the GPU's thermal requirements, as machine learning workloads can generate substantial heat.


Budget: Finally, consider your budget and strike a balance between performance and cost. Higher-end GPUs tend to offer better performance but come at a higher price. Evaluate your specific needs and choose a GPU that meets your requirements without exceeding your budget.


It's worth noting that GPU selection depends on the specific machine learning tasks you'll be performing. For more complex models or larger datasets, a higher-end GPU with more resources is generally recommended. However, for simpler models or smaller datasets, a mid-range GPU may suffice.


A CPU (Central Processing Unit) is the workhorse of your computer, and importantly is very flexible. It can deal with instructions from a wide range of programs and hardware, and it can process them very quickly. To excel in this multitasking environment a CPU has a small number of flexible and fast processing units (also called cores).


A GPU (Graphics Processing Unit) is a little bit more specialised, and not as flexible when it comes to multitasking. It is designed to perform lots of complex mathematical calculations in parallel, which increases throughput. This is achieved by having a higher number of simpler cores, sometimes thousands, so that many calculations can be processed all at once.


This requirement of multiple calculations being carried out in parallel is a perfect fit for:


graphics rendering — moving graphical objects need their trajectories calculated constantly, which requires a large number of repeated parallel mathematical calculations.

machine and deep learning — large amounts of matrix/tensor calculations, which with a GPU can be processed in parallel.

any type of mathematical calculation that can be split to run in parallel.



Tensor Processing Unit (TPU)

With the boom in AI and machine/deep learning there are now even more specialised processing cores called Tensor cores. These are faster and more efficient when performing tensor/matrix calculations. Exactly what you need for the type of mathematics involved in machine/deep learning.


Although there are dedicated TPUs, some of the latest GPUs also include a number of Tensor cores, as you will see later in this article.



Nvidia vs AMD

Nvidia’s GPUs have much higher compatibility, and are just generally better integrated into tools like TensorFlow and PyTorch.


Trying to use an AMD GPU with TensorFlow requires additional tools (ROCm), which tend to be a bit fiddly, and can leave you with a not-quite-up-to-date version of TensorFlow/PyTorch just so you can get the card working.


CUDA Cores and Tensor Cores

This is fairly simple really. The more CUDA (Compute Unified Device Architecture) cores / Tensor cores the better.


RAM and chip architecture should probably be considered first; then look at cards with the highest number of CUDA/Tensor cores from your narrowed-down selection.

For machine/deep learning Tensor cores are better (faster and more efficient) than CUDA cores. This is due to them being designed precisely for the calculations that are required in the machine/deep learning domain.


The reality is that it doesn't matter a great deal; CUDA cores are plenty fast enough. If you can get a card which includes Tensor cores too, that is a good plus point to have, just don't get too hung up on it.


CUDA cores — these are the physical processors on the graphics cards, typically in their thousands.

CUDA 11 — The number may change, but this is referring to the software/drivers that are installed to allow the graphics card to work. New releases are made regularly, and it can be installed like any other software.

CUDA generation (or compute capability) — this describes the capability of the graphics card in terms of its generational features. This is fixed in hardware, and so can only be changed by upgrading to a new card. It is distinguished by a number and a code name. Examples: 3.x [Kepler], 5.x [Maxwell], 6.x [Pascal], 7.x [Turing] and 8.x [Ampere]. A quick way to check these properties in code is shown below.
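
If a card is already installed, PyTorch can report most of these properties. A quick check, assuming PyTorch was installed with CUDA support:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(torch.cuda.current_device())
    print(f"GPU: {props.name}")
    print(f"Compute capability: {props.major}.{props.minor}")   # e.g. 8.x corresponds to Ampere
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
    print(f"CUDA version used by PyTorch: {torch.version.cuda}")
else:
    print("No CUDA-capable GPU detected")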


references:

https://towardsdatascience.com/how-to-pick-the-best-graphics-card-for-machine-learning-32ce9679e23b

Thursday, July 6, 2023

What is ELI5 a long-form question answering dataset?

ELI5 is a dataset for long-form question answering. It contains 270K complex, diverse questions that require explanatory multi-sentence answers. Web search results are used as evidence documents to answer each question. ELI5 is also a task in Dodecadialogue.


The dataset comprises 270K threads from the Reddit forum "Explain Like I'm Five" (ELI5), where an online community provides answers to questions that are comprehensible by five-year-olds. Compared to existing datasets, ELI5 comprises diverse questions requiring multi-sentence answers. We provide a large set of web documents to help answer the question. Automatic and human evaluations show that an abstractive model trained with a multi-task objective outperforms conventional Seq2Seq, language modeling, as well as a strong extractive baseline. However, our best model is still far from human performance since raters prefer gold responses in over 86% of cases, leaving ample opportunity for future improvement.
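
For experimentation, the dataset has also been distributed through the Hugging Face datasets hub. A minimal loading sketch, assuming the hub ID "eli5" and the split name shown (both from memory; availability of the dataset may have changed since):

from datasets import load_dataset

# Load the ELI5 long-form question answering split.
eli5 = load_dataset("eli5", split="train_eli5")
print(eli5[0])  # inspect one question/answer record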


References:

https://paperswithcode.com/paper/eli5-long-form-question-answering

Wednesday, July 5, 2023

What is Streamlit app

Streamlit is an open-source Python library used for building interactive web applications and data visualizations. It simplifies the process of creating web-based interfaces for data analysis and machine learning tasks.

With Streamlit, you can write Python scripts that allow you to create custom web applications quickly and easily. You can incorporate charts, tables, interactive widgets, and other visualizations to present and explore your data.

Here's a simple example of a Streamlit application that displays a plot:


import streamlit as st

import pandas as pd

import matplotlib.pyplot as plt


# Load data

data = pd.read_csv("data.csv")


# Display plot

st.line_chart(data)


# Add a title (Streamlit scripts are executed top to bottom by the
# `streamlit run` command, so no explicit entry point is needed)

st.title("My Streamlit App")



To accept text input using Streamlit, you can use the text_input function provided by the Streamlit library. Here's an example of how to accept text input from the user:


import streamlit as st


# Accept text input

user_input = st.text_input("Enter your name", "John Doe")


# Display the input

st.write("Hello,", user_input)
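
To try either snippet, save it as a script, for example app.py (a placeholder name), and launch it from the terminal with `streamlit run app.py`; Streamlit will serve the app and open it in the browser.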


Sunday, July 2, 2023

After installing Xcode 14.3 in order to run my app on my iOS 16.3 iPhone XS, I get a "missing file libarclite_iphoneos.a" error.

Adding the code below to the Podfile works for me on version 14.3 beta 2 (14E5207e):


post_install do |installer|

    installer.generated_projects.each do |project|

          project.targets.each do |target|

              target.build_configurations.each do |config|

                  config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '13.0'

               end

          end

   end

end


Also, I had to remove all the pods and then reinstall them to make it work.

references:

https://stackoverflow.com/questions/75574268/missing-file-libarclite-iphoneos-a-xcode-14-3

What is Mel-Spectrogram

A signal is a variation in a certain quantity over time. For audio, the quantity that varies is air pressure. How do we capture this information digitally? We can take samples of the air pressure over time. The rate at which we sample the data can vary, but is most commonly 44.1kHz, or 44,100 samples per second. What we have captured is a waveform for the signal, and this can be interpreted, modified, and analyzed with computer software.


import librosa

import librosa.display

import matplotlib.pyplot as plt

y, sr = librosa.load('./example_data/blues.00000.wav')

plt.plot(y);

plt.title('Signal');

plt.xlabel('Time (samples)');

plt.ylabel('Amplitude');


This is great! We have a digital representation of an audio signal that we can work with. Welcome to the field of signal processing! You may be wondering though, how do we extract useful information from this? It looks like a jumbled mess. This is where our friend Fourier comes in.


The Fourier Transform

An audio signal is comprised of several single-frequency sound waves. When taking samples of the signal over time, we only capture the resulting amplitudes. The Fourier transform is a mathematical formula that allows us to decompose a signal into its individual frequencies and each frequency's amplitude. In other words, it converts the signal from the time domain into the frequency domain. The result is called a spectrum.



This is possible because every signal can be decomposed into a set of sine and cosine waves that add up to the original signal. This is a remarkable theorem known as Fourier's theorem. Click here if you want a good intuition for why this theorem is true. There is also a phenomenal video by 3Blue1Brown on the Fourier Transform if you would like to learn more here.


The fast Fourier transform (FFT) is an algorithm that can efficiently compute the Fourier transform. It is widely used in signal processing. I will use this algorithm on a windowed segment of our example audio.



import numpy as np

n_fft = 2048

ft = np.abs(librosa.stft(y[:n_fft], hop_length = n_fft+1))

plt.plot(ft);

plt.title('Spectrum');

plt.xlabel('Frequency Bin');

plt.ylabel('Amplitude');


The Spectrogram

The fast Fourier transform is a powerful tool that allows us to analyze the frequency content of a signal, but what if our signal’s frequency content varies over time? Such is the case with most audio signals such as music and speech. These signals are known as non periodic signals. We need a way to represent the spectrum of these signals as they vary over time. You may be thinking, “hey, can’t we compute several spectrums by performing FFT on several windowed segments of the signal?” Yes! This is exactly what is done, and it is called the short-time Fourier transform. The FFT is computed on overlapping windowed segments of the signal, and we get what is called the spectrogram. Wow! That’s a lot to take in. There’s a lot going on here. A good visual is in order.



You can think of a spectrogram as a bunch of FFTs stacked on top of each other. It is a way to visually represent a signal’s loudness, or amplitude, as it varies over time at different frequencies. There are some additional details going on behind the scenes when computing the spectrogram. The y-axis is converted to a log scale, and the color dimension is converted to decibels (you can think of this as the log scale of the amplitude). This is because humans can only perceive a very small and concentrated range of frequencies and amplitudes.



spec = np.abs(librosa.stft(y, hop_length=512))

spec = librosa.amplitude_to_db(spec, ref=np.max)

librosa.display.specshow(spec, sr=sr, x_axis='time', y_axis='log');

plt.colorbar(format='%+2.0f dB');

plt.title('Spectrogram');


The Mel Scale

Studies have shown that humans do not perceive frequencies on a linear scale. We are better at detecting differences in lower frequencies than higher frequencies. For example, we can easily tell the difference between 500 and 1000 Hz, but we will hardly be able to tell a difference between 10,000 and 10,500 Hz, even though the distance between the two pairs are the same.


In 1937, Stevens, Volkmann, and Newman proposed a unit of pitch such that equal distances in pitch sounded equally distant to the listener. This is called the mel scale. We perform a mathematical operation on frequencies to convert them to the mel scale.
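
The post stops just short of computing an actual mel spectrogram, so here is a short sketch that continues from the y and sr values loaded earlier; the commonly used conversion formula is m = 2595 * log10(1 + f / 700):

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# y, sr come from the earlier librosa.load() call.
# A mel spectrogram is an STFT whose frequency axis is mapped onto mel bands.
mel_spec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)  # convert power to decibels

librosa.display.specshow(mel_spec_db, sr=sr, hop_length=512, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel Spectrogram')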



References

https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53

What is OpenAI Whisper

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.


The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.


Other existing approaches frequently use smaller, more closely paired audio-text training datasets, or use broad but unsupervised audio pretraining. Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper's zero-shot performance across many diverse datasets, we find it is much more robust and makes 50% fewer errors than those models.
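
Since the models and inference code are open-sourced, transcription can be tried locally with the openai-whisper Python package (pip install openai-whisper); the audio file name below is a placeholder:

import whisper

# Load one of the released checkpoints ("tiny", "base", "small", "medium", "large").
model = whisper.load_model("base")

# Whisper internally splits the audio into 30-second log-Mel spectrogram chunks,
# as described above, and decodes text with the Transformer decoder.
result = model.transcribe("audio.mp3")
print(result["text"])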





references:

https://openai.com/research/whisper


What is TikToken Library

Tiktoken is an open-source tool developed by OpenAI that is utilized for tokenizing text.

Tokenization is when you split a text string into a list of tokens. Tokens can be letters, words, or groupings of words (depending on the text language).



For example, “I’m playing with AI models” can be transformed to this list [“I”,”’m”,” playing”,” with”,” AI”,” models”].


Then these tokens can be encoded as integers.


OpenAI uses a technique called byte pair encoding (BPE) for tokenization. BPE is a data compression algorithm that replaces the most frequent pairs of bytes in a text with a single byte. This reduces the size of the text and makes it easier to process.




You can use tiktoken to count tokens, because:


You need to know whether the text you are using is too long to be processed by the model

You need to have an idea about OpenAI API call costs (The price is applied by token).

For example, if you are using GPT-3.5-turbo model you will be charged: $0.002 / 1K tokens


How to count the number of tokens using tiktoken?


pip install tiktoken


import tiktoken



Encoding

Different encodings are used by OpenAI: cl100k_base, p50k_base, gpt2.


These encodings depend on the model you are using:


For gpt-4, gpt-3.5-turbo, text-embedding-ada-002, you need to use cl100k_base.


All this information is already included in the OpenAI API, so you don't need to remember it. You can get the encoding using two methods:

If you know the exact encoding name:

encoding = tiktoken.get_encoding("cl100k_base")



Alternatively, you can allow the OpenAI API to provide a suitable tokenization method based on the model you are using:


encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

print(encoding)



Tokenization

Let’s tokenize this text:



text = "I'm playing with AI models"


This will return a list of token integers:


tokens_integer=encoding.encode(text)

tokens_integer


[40, 2846, 5737, 449, 15592, 4211]


print(f"{len(tokens_integer)} is the number of tokens in my text")

6 is the number of tokens in my text



It’s worth mentioning that we can obtain the corresponding token string for each integer token by utilizing the ‘encoding.decode_single_token_bytes()’ function (each string will be a bytes ‘b’ string)




tokens_string = [encoding.decode_single_token_bytes(token) for token in tokens_integer]

tokens_string


[b'I', b"'m", b' playing', b' with', b' AI', b' models']


Notice the space before each word? This is how tokenization works in OpenAI with tiktoken.



Count the number of tokens in the message to be sent using the API:


message =[{

   "role": "user",

   "content": "Explain to me how tolenization is working in OpenAi models?",

   }]


tokens_per_message = 4 

# every message follows <|start|>{role/name}\n{content}<|end|>\n


num_tokens = 0

num_tokens += tokens_per_message


for key, value in message[0].items():

   text=value

   num_tokens+=len(encoding.encode(value))

   print(f"{len(encoding.encode(value))} is the number of token included in {key}")


num_tokens += 3

# every reply is primed with <|start|>assistant<|message|>


print(f"{num_tokens} number of tokens to be sent in our request")



1 is the number of token included in role

15 is the number of token included in content

23 number of tokens to be sent in our request




import openai


openai.api_key='YOUR_API_KEY'


response = openai.ChatCompletion.create(

     model='gpt-3.5-turbo-0301',

     messages=message,

     temperature=0,

     max_tokens=200 

 )


num_tokens_api = response["usage"]["prompt_tokens"]


print(f"{num_tokens_api} number of tokens used by the API")



23 number of tokens used by the API


The number of tokens is the same as what we calculated using ‘tiktoken’.



Furthermore, let's count the number of tokens in ChatGPT's answer:


resp=response["choices"][0]["message"].content

len(encoding.encode(resp))


200


The answer contains exactly 200 tokens, which matches the max_tokens=200 limit set in the request, so the completion was most likely cut off at that cap.