Saturday, December 31, 2022

What is P99 Latency?

P99 is the 99th percentile. It means that 99% of the requests should complete at or below the given latency. In other words, only 1% of the requests are allowed to be slower.


Imagine that you are collecting performance data for your service and the table below shows the results (the latency values are fictional, to illustrate the idea):


Latency    Number of requests

1s         5

2s         5

3s         10

4s         40

5s         20

6s         15

7s         4

8s         1


The P99 latency of your service is 7s. Only 1% of the requests take longer than that. So, if you can decrease the P99 latency of your service, you increase its performance.
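
As a rough check of the number above, here is a minimal Python sketch that computes a percentile from the table using the nearest-rank method. The function name and the choice of nearest-rank are my own assumptions; monitoring tools may interpolate differently.

```python
import math

# Histogram from the table above: latency in seconds -> number of requests.
histogram = {1: 5, 2: 5, 3: 10, 4: 40, 5: 20, 6: 15, 7: 4, 8: 1}


def percentile_from_histogram(hist, pct):
    """Return the smallest latency that at least pct% of requests do not exceed."""
    total = sum(hist.values())
    # Nearest-rank method: position of the request that marks the percentile.
    rank = math.ceil(pct * total / 100)
    seen = 0
    for latency in sorted(hist):
        seen += hist[latency]
        if seen >= rank:
            return latency
    raise ValueError("empty histogram")


print(percentile_from_histogram(histogram, 99))  # prints 7
```

With 100 requests in total, 99 of them finish within 7s, so the 99th-ranked request falls in the 7s bucket.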


references:

https://stackoverflow.com/questions/12808934/what-is-p99-latency

What is crond in Linux

The crond daemon (cron) executes cron jobs in the background. It is the daemon that runs the scheduled commands in accordance with the specified schedule. The schedules and their corresponding commands are stored in crontab files; the system-wide one is /etc/crontab. In other words, crond is a long-running server daemon that executes commands at the specified date and time as per the assigned cron job. On systems that use SysV-style init scripts, it is started during system startup from the /etc/rc.d/init.d/crond file. The cron program itself is located at /usr/sbin/crond.

reference:

https://www.servercake.blog/crond-linux/

What does /dev/tcp do?

Bash supports read/write operations on a pseudo-device file /dev/tcp/[host]/[port] [1].

Redirecting to or from this special path makes bash open a TCP connection to host:port. This feature can be used for several purposes, for example:

Query an NTP server

cat </dev/tcp/time.nist.gov/13

reads the time in Daytime Protocol from the NIST Internet Time Service server.

Fetch a web page

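This script fetches the page over a raw TCP connection (note that the Host header takes the bare host name, without http://):

exec 3<>/dev/tcp/www.google.com/80

echo -e "GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n" >&3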

cat <&3

Check whether a port is open

(echo > /dev/tcp/$HOST/$PORT) >/dev/null 2>&1

result=$?

Here result is 0 if the connection succeeded and non-zero otherwise.
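
For comparison, the same kind of port check can be sketched in Python with the standard socket module. The function name and the default timeout are my own choices, not something from the referenced article.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling is_port_open with a host name and port number plays the same role as checking $? after the bash redirect.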

references:

https://andreafortuna.org/2021/03/06/some-useful-tips-about-dev-tcp/

Difference between the Kong API gateway and a service mesh

API Gateways facilitate API communications between a client and an application, and across microservices within an application.  Operating at layer 7 (HTTP), an API gateway provides both internal and external communication services, along with value-added services such as authentication, rate limiting, transformations, logging and more. 


Service Mesh is an emerging technology focused on routing internal communications. Operating primarily at layer 4 (TCP), a service mesh provides internal communication along with health checks, circuit breakers and other services. 


Because API Gateways and Service Meshes operate at different layers in the network stack, each technology has different strengths. 


At Kong, we are focused on both API Gateway and Service Mesh solutions. We believe that developers should have a unified and trusted interface to address the full range of internal and external communications, and value added services. Today, however, API Gateways and Service Mesh appear as distinct solutions requiring different architectural and implementation choices. Very soon, that will change. 


references:

https://konghq.com/faqs#:~:text=The%20Kong%20Server%2C%20built%20on,before%20proxying%20the%20request%20upstream.&text=for%20proxying.,Kong%20listens%20for%20HTTP%20traffic.

What are Kong Plugins

Plugins are one of the most important features of Kong. Many Kong API gateway features are provided by plugins. Authentication, rate-limiting, transformation, logging and more are all implemented independently as plugins. Plugins can be installed and configured via the Admin API running alongside Kong.


Almost all plugins can be customized not only to target a specific proxied service, but also to target specific Consumers.


From a technical perspective, a plugin is Lua code that’s being executed during the life-cycle of a proxied request and response. Through plugins, Kong can be extended to fit any custom need or integration challenge. For example, if you need to integrate the API’s user authentication with a third-party enterprise security system, that would be implemented in a dedicated plugin that is run on every request targeting that given API.
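
To make the Admin API part concrete, here is a hedged Python sketch that only builds (and does not send) a request enabling a plugin. It assumes a recent Kong Gateway where plugins are attached to a service through /services/{service}/plugins, the default Admin API port 8001, and a made-up service name example-service; the rate-limiting plugin and its minute setting are used purely as an illustration.

```python
import urllib.parse
import urllib.request

ADMIN_API = "http://localhost:8001"  # assumed default Admin API address


def build_enable_plugin_request(service, plugin, config):
    """Build (but do not send) a POST that enables a plugin on a service."""
    fields = {"name": plugin}
    # Plugin settings are sent as form fields prefixed with "config.".
    fields.update({f"config.{key}": value for key, value in config.items()})
    body = urllib.parse.urlencode(fields).encode("ascii")
    url = f"{ADMIN_API}/services/{urllib.parse.quote(service)}/plugins"
    return urllib.request.Request(url, data=body, method="POST")


# "example-service" is a hypothetical service name.
request = build_enable_plugin_request("example-service", "rate-limiting", {"minute": 5})
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Against a running Kong node the request could then be sent with urllib.request.urlopen(request).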


The available plugins are listed on the Kong Plugin Hub:


https://docs.konghq.com/hub/


references

https://konghq.com/faqs#:~:text=The%20Kong%20Server%2C%20built%20on,before%20proxying%20the%20request%20upstream.&text=for%20proxying.,Kong%20listens%20for%20HTTP%20traffic.

What is Kong Datastore

Kong uses an external datastore to store its configuration such as registered APIs, Consumers and Plugins. Plugins themselves can store every bit of information they need to be persisted, for example rate-limiting data or Consumer credentials.


Kong maintains a cache of this data so that there is no need for a database roundtrip while proxying requests, which would critically impact performance. This cache is invalidated by the inter-node communication when calls to the Admin API are made. As such, it is discouraged to manipulate Kong’s datastore directly, since your nodes’ caches won’t be properly invalidated.


This architecture allows Kong to scale horizontally by simply adding new nodes that will connect to the same datastore and maintain their own cache.


The supported datastores are 


Apache Cassandra

PostgreSQL


Scaling of Kong Server 


Scaling the Kong Server up or down is fairly easy. Each server is stateless meaning you can add or remove as many nodes under the load balancer as you want as long as they point to the same datastore.


Be aware that terminating a node might interrupt any ongoing HTTP requests on that server, so you want to make sure that before terminating the node, all HTTP requests have been processed.


Scaling of Kong Datastore

Scaling the datastore should not be your main concern, mostly because as mentioned before, Kong maintains its own cache, so expect your datastore’s traffic to be relatively quiet.


However, keep in mind that it is always a good practice to ensure your infrastructure does not contain single points of failure (SPOF). As such, closely monitor your datastore, and ensure replication of your data.


If you use Cassandra, one of its main advantages is its easy-to-use replication capabilities due to its distributed nature.


references:

https://konghq.com/faqs#:~:text=The%20Kong%20Server%2C%20built%20on,before%20proxying%20the%20request%20upstream.&text=for%20proxying.,Kong%20listens%20for%20HTTP%20traffic.

Advantages of using Kong

Compared to other API gateways and platforms, Kong has many important advantages that are not found in the market today. Choose Kong to ensure your API gateway platform is:


Radically Extensible

Blazingly Fast

Open Source

Platform Agnostic

Manages the full API lifecycle

Cloud Native

RESTful


What are Kong's main components

Kong server & Kong datastore


Below are some details on the Kong server


The Kong Server, built on top of NGINX, is the server that will actually process the API requests and execute the configured plugins to provide additional functionalities to the underlying APIs before proxying the request upstream.


Kong listens on several ports that must allow external traffic and are by default:

8000 - for proxying. This is where Kong listens for HTTP traffic. See proxy_listen.

8443 - for proxying HTTPS traffic. See proxy_listen_ssl.

Additionally, these ports are used internally and should be firewalled in production usage:

8001 - provides Kong’s Admin API that you can use to operate Kong. See admin_api_listen.

8444 - provides Kong’s Admin API over HTTPS. See admin_api_ssl_listen.

references:

https://konghq.com/faqs#:~:text=The%20Kong%20Server%2C%20built%20on,before%20proxying%20the%20request%20upstream.&text=for%20proxying.,Kong%20listens%20for%20HTTP%20traffic.


What is Apache ZooKeeper

Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.


ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.


references:

https://zookeeper.apache.org/

What is LuaRocks

LuaRocks is the package manager for Lua modules.


It allows you to create and install Lua modules as self-contained packages called rocks. You can download and install LuaRocks on Unix and Windows.


LuaRocks is free software and uses the same license as Lua.


references:

https://luarocks.org

What are Kong Plugins

Kong Gateway is a Lua application designed to load and execute Lua or Go modules, which we commonly refer to as plugins. Kong provides a set of standard Lua plugins that get bundled with Kong Gateway. The set of plugins you have access to depends on your installation: open-source, enterprise, or either of these Kong Gateway options running on Kubernetes.


Custom plugins can also be developed by the Kong Community and are supported and maintained by the plugin creators. If they are published on the Kong Plugin Hub, they are called Community or Third-Party plugins.


Why use plugins?

Plugins provide advanced functionality and extend the use of the Kong Gateway, which allows you to add new features to your implementation. Plugins can be configured to run in a variety of contexts, ranging from a specific route to all upstreams, and can execute actions inside Kong before or after a request has been proxied to the upstream API, as well as on any incoming responses.


references:

https://docs.konghq.com/hub/plugins/overview/

Friday, December 30, 2022

Bash check if file exists

FILE=/etc/resolv.conf

if test -f "$FILE"; then

    echo "$FILE exists."

fi

references

https://linuxize.com/post/bash-check-if-file-exists/

bash script print with tab

echo -e with the '\t' character can do this, but printf is more useful because it pads each value to a fixed column width:

printf "%30s %10s\n" "${Name}" "${percentage}"

This gives nice output 
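
The same column alignment can be sketched in Python with format specifiers. The function name and the sample rows are made up for illustration.

```python
def format_row(name, percentage):
    """Right-align name in 30 columns and percentage in 10 columns."""
    return f"{name:>30} {percentage:>10}"


rows = [("disk", "42%"), ("memory", "7%")]  # made-up sample values
for name, percentage in rows:
    print(format_row(name, percentage))
```

Using < instead of > in the format specifier left-aligns a column, which is often easier to read for names.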

Thursday, December 29, 2022

Linux xargs command

 

xargs is a UNIX and Linux command for building and executing command lines from standard input.



The xargs command in UNIX is a command line utility for building an execution pipeline from standard input. Whilst tools like grep can accept standard input as a parameter, many other tools cannot. Using xargs allows tools like echo and rm and mkdir to accept standard input as arguments.



By default xargs reads items from standard input, separated by blanks or newlines, and appends them as arguments to the command, running it as few times as possible. In the following example standard input is piped to xargs and mkdir is run once with the three items as arguments, creating three folders.


echo 'one two three' | xargs mkdir

ls

one two three
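
To illustrate what xargs does with its input, here is a simplified Python sketch. It assumes all items fit on one command line and uses shlex for quoting, which only approximates the quoting rules of xargs; the function names are my own.

```python
import shlex
import subprocess


def build_command(command, stdin_text):
    """Append the whitespace-separated items from stdin_text to the command."""
    # shlex.split honours quotes, roughly as xargs does for quoted items.
    return list(command) + shlex.split(stdin_text)


def run_like_xargs(command, stdin_text):
    """Run the command once with all items as arguments."""
    return subprocess.run(build_command(command, stdin_text), check=True)


print(build_command(["mkdir"], "one two three"))
# ['mkdir', 'one', 'two', 'three']
```

The point of the sketch is that the three items end up on a single mkdir command line rather than in three separate invocations.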


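xargs vs exec

The find command supports the -exec option that allows arbitrary commands to be performed on found files. The following are equivalent for file names without whitespace. The xargs form is faster, as the timings below show, because it passes many files to a single rm invocation, while -exec ... \; starts rm once per file.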



find ./foo -type f -name "*.txt" -exec rm {} \; 

find ./foo -type f -name "*.txt" | xargs rm


time find . -type f -name "*.txt" -exec rm {} \;

0.35s user 0.11s system 99% cpu 0.467 total


time find ./foo -type f -name "*.txt" | xargs rm

0.00s user 0.01s system 75% cpu 0.016 total



references:

https://shapeshed.com/unix-xargs/


How to run shell scripts with cron tasks

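Create a shell script as below, saved for this example as /home/lucky/myfile.sh:

#!/bin/sh

# -p avoids an error when the directory already exists on later runs
mkdir -p /home/lucky/jh

cd /home/lucky/jh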


Make the script executable:

chmod +x /home/lucky/myfile.sh


Then open the crontab for editing and add the entry below, which runs the script at minute 20 of every hour:

crontab -e

20 * * * * /home/lucky/myfile.sh
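
To show how the five fields of an entry such as 20 * * * * are read (minute, hour, day of month, month, day of week), here is a simplified Python matcher. It is only a sketch: it supports *, */n and comma-separated numbers, and it ignores ranges, names and cron's special handling when both day fields are restricted.

```python
from datetime import datetime


def field_matches(field, value, low=0):
    """Match one cron field; supports '*', '*/n' and comma-separated numbers."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # Steps are counted from the lowest value the field can take.
        return (value - low) % int(field[2:]) == 0
    return value in {int(part) for part in field.split(",")}


def cron_matches(expression, moment):
    """Return True if the five-field cron expression fires at the given minute."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = (moment.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day, low=1)
        and field_matches(month, moment.month, low=1)
        and field_matches(weekday, cron_weekday)
    )


print(cron_matches("20 * * * *", datetime(2022, 12, 29, 10, 20)))  # True
print(cron_matches("20 * * * *", datetime(2022, 12, 29, 10, 21)))  # False
```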


references:

https://askubuntu.com/questions/350861/how-to-set-a-cron-job-to-run-a-shell-script

What is Cron and Crontab in Linux

Cron is the system's main scheduler for running jobs or tasks unattended. A command called crontab allows the user to submit, edit or delete entries to cron. A crontab file is a user file that holds the scheduling information.


crontab -e


And enter the line below, which runs the job every minute:


*/1 * * * * cd /Users/rk/Documents/RR/projects/tools/cron_tests && python crontest.py >> /Users/rk/Documents/RR/projects/tools/cron_tests/cron.txt 2>&1


The crontest.py script can be as below



#!/usr/bin/python3

from datetime import datetime

import os

# build the folder name from the current date, hour and minute,
# for example 2022-12-29_10_20

folderName = datetime.now().strftime("%Y-%m-%d_%H_%M")

# make a directory; exist_ok avoids an error if the folder already exists

os.makedirs(folderName, exist_ok=True)


In practice, there may be permission problems on the Mac. If so, make the script executable:


chmod +x crontest.py

chmod 777 crontest.py also works, but it grants write access to everyone, so prefer chmod +x.


references:

https://www.geeksforgeeks.org/crontab-in-linux-with-examples/


Wednesday, December 28, 2022

What is The /proc Filesystem

The /proc directory contains virtual files that are windows into the current state of the running Linux kernel. This allows the user to peer into a vast array of information, effectively providing them with the kernel's point-of-view within the system. In addition, the user can use the /proc directory to communicate particular configuration changes to the kernel.


A Virtual Filesystem


In Linux, everything is stored in files. Most users are familiar with the two primary types of files, text and binary. However, the /proc directory contains files that are not part of any filesystem associated with your hard disks, CD-ROM, or any other physical storage device connected to your system (except, arguably, your RAM). Rather, these files are part of a virtual filesystem, enabled or disabled in the Linux kernel when it is compiled.


By default, when a Red Hat Linux system starts up, a line in /etc/fstab is responsible for mounting the /proc filesystem.


none         /proc       proc       defaults       0 0


The best way to understand /proc as a virtual filesystem is to list the files in the directory, for example with ls /proc.


Viewing Virtual Files

By using cat, more, or less commands in combination with the files within /proc, you can immediately access an enormous amount of information about the system. As an example, if you want to see how the memory registers are currently assigned on your computer:


cat /proc/iomem

00000000-0009fbff : System RAM

0009fc00-0009ffff : reserved

000a0000-000bffff : Video RAM area

000c0000-000c7fff : Video ROM

000f0000-000fffff : System ROM

00100000-03ffcfff : System RAM

  00100000-002557df : Kernel code

  002557e0-0026c80b : Kernel data

03ffd000-03ffefff : ACPI Tables

03fff000-03ffffff : ACPI Non-volatile Storage

dc000000-dfffffff : S3 Inc. ViRGE/DX or /GX

e3000000-e30000ff : Lite-On Communications Inc LNE100TX

  e3000000-e30000ff : eth0

e4000000-e7ffffff : Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge

ffff0000-ffffffff : reserved

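
Because these virtual files are plain text, they are easy to process from a script. Below is a small Python sketch that parses /proc/iomem-style lines into address ranges. The function name is my own, and the sample text is taken from the listing above; on a real Linux machine the input would come from reading /proc/iomem (usually as root to see real addresses).

```python
def parse_iomem(text):
    """Parse /proc/iomem-style lines into (start, end, name, depth) tuples."""
    regions = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or unrelated lines
        address_range, name = line.split(" : ", 1)
        # Nested regions are indented by two spaces per level.
        depth = (len(address_range) - len(address_range.lstrip())) // 2
        start, end = (int(part, 16) for part in address_range.strip().split("-"))
        regions.append((start, end, name.strip(), depth))
    return regions


sample = """00000000-0009fbff : System RAM
00100000-03ffcfff : System RAM
  00100000-002557df : Kernel code
"""
for start, end, name, depth in parse_iomem(sample):
    size_kib = (end - start + 1) // 1024
    print(f"{'  ' * depth}{name}: {size_kib} KiB")
```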

references:

https://mirror.apps.cam.ac.uk/pub/doc/redhat/ES2.1/rhl-rg-en-7.2/ch-proc.html

Monday, December 26, 2022

How to access array in bash

myArray=("cat" "dog" "mouse" "frog")


for str in "${myArray[@]}"; do

  echo "$str"

done


references 

https://www.freecodecamp.org/news/bash-array-how-to-declare-an-array-of-strings-in-a-bash-script/


What is grpcfuse filesystem

The latest Edge release of Docker Desktop for Windows 2.1.7.0 has a completely new filesharing system using FUSE instead of Samba. 

Instead of Samba running over a Hyper-V virtual network, the new system uses a Filesystem in Userspace (FUSE) server running over gRPC over Hypervisor sockets.

FUSE Server

The request to open/close/read/write etc is received by the FUSE server, which is running as a regular Windows process. The FUSE server then uses the Windows APIs to perform the read or write and returns the result to the caller.


The FUSE server runs as the user who is running the Docker app, so it only has access to the user’s files and folders. There is no possibility of the VM gaining access to any other files, as could happen in the previous design if a local admin account is used to mount the drive in the VM.





What is Apache Airflow?

 Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology. A web interface helps manage the state of your workflows. Airflow is deployable in many ways, varying from a single process on your laptop to a distributed setup to support even the biggest workflows.

The main characteristic of Airflow workflows is that all workflows are defined in Python code. “Workflows as code” serves several purposes:

Dynamic: Airflow pipelines are configured as Python code, allowing for dynamic pipeline generation.

Extensible: The Airflow framework contains operators to connect with numerous technologies. All Airflow components are extensible to easily adjust to your environment.

Flexible: Workflow parameterization is built-in leveraging the Jinja templating engine.

Airflow is a batch workflow orchestration platform. The Airflow framework contains operators to connect with many technologies and is easily extensible to connect with a new technology. If your workflows have a clear start and end, and run at regular intervals, they can be programmed as an Airflow DAG.

If you prefer coding over clicking, Airflow is the tool for you. Workflows are defined as Python code which means:

Workflows can be stored in version control so that you can roll back to previous versions

Workflows can be developed by multiple people simultaneously

Tests can be written to validate functionality

Components are extensible and you can build on a wide collection of existing components
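
To illustrate the idea of a workflow defined as code, here is a small sketch in plain Python. It deliberately does not use Airflow's own API; it only uses the standard library's graphlib (Python 3.9+) to run hypothetical tasks in dependency order, which is the core idea behind a DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline: each task maps to the tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_workflow(deps, tasks):
    """Run every task after its dependencies and return the execution order."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order


# Each task here just prints its name; real tasks would do the actual work.
tasks = {name: (lambda name=name: print("running", name)) for name in dependencies}
print(run_workflow(dependencies, tasks))
```

Since the workflow is ordinary code, it can be versioned, reviewed and tested like any other module, which is the benefit the list above describes.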

https://airflow.apache.org/docs/apache-airflow/stable/

What is Elementary OS

The thoughtful, capable, and ethical replacement for Windows and macOS

If you're new to Linux and looking for a desktop operating system that is elegant and incredibly user-friendly, but doesn't exactly follow the rules that most Debian/Ubuntu-based distributions follow (especially with regards to software installation), Elementary OS is the perfect solution.

references

https://elementary.io/

Sunday, December 25, 2022

Detailed steps to check Docker disk usage

Doing a Quick Check

docker system df

This command shows the space used by static images, containers that have made changes to their filesystem (e.g., log files), and volumes bound to the containers.

sudo du -sh /var/lib/docker/

This shows the size of Docker's entire data directory.
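
The directory check with du can be approximated in Python by walking the tree and summing file sizes. This is only a sketch with a function name of my own: it adds up apparent file sizes, whereas du reports allocated disk blocks, so the numbers will differ somewhat.

```python
import os


def directory_size(path):
    """Sum the sizes of all regular files under path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

Running it on /var/lib/docker typically requires root, just like the du command.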


Each version of an image is separate, but it’s stored in layers, so multiple new versions won’t take up twice as much storage space. You can view all images with image ls:

docker image ls



Cleaning these is easy; you don’t want to remove images of running containers, of course, but removing old images is fine—they’ll simply be re-downloaded when needed.



docker image prune -a

docker image rm 3a8d8f76e7f8f



Containers are a bit trickier to track down, since they can use data in many different ways:


1) Underlying image: each container will need to store its image, but this is reused across containers.

2) Modification layer: if a container writes to its filesystem, such as log files, it will be saved in a new layer on top of the underlying image. This is unique to each container.

3) Volumes: containers can have virtual drives mounted to them, which store data directly on disk outside the Docker storage system.

4) Bind Mounts: containers can optionally access directories on the host directly.


docker ps --size


Here, this shows the size on disk, as well as the virtual size (which includes the shared underlying image). Since these containers aren’t using any storage outside their bind mounts, the size is zero bytes.


For the most detailed view, inspect the container from the inside.


If you have direct access to the server running Docker, you can pop open a shell in the container:


sudo docker exec -it containerID /bin/bash



references:

https://www.howtogeek.com/devops/how-to-check-disk-space-usage-for-docker-images-containers/#:~:text=Doing%20a%20Quick%20Check,size%20of%20the%20entire%20directory. 

What is AWK?

AWK (awk) is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems.

The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline – for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. 

references:

https://en.wikipedia.org/wiki/AWK

Shell script Switch case statements

option1(){

    echo "Option 1 executing "

}


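# promptInput reads the user's choice into $action; it is defined in the
# "Bash script functions and return statements" note below

promptInput

echo "User opted for ${action}"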


case $action in


  1)

    option1

    ;;


  2)

    echo -n "Selected option 2"

    ;;


  3 | 4)

    echo -n "Selected option 3 or 4"

    ;;


  *)

    echo -n "unknown option selected"

    ;;

esac


Bash script functions and return statements

Bash functions do not return a value to the caller the way functions in most languages do. The return statement can only set a numeric exit status (0 to 255), where 0 conventionally means success and any other value means failure. The status value is stored in the $? variable.

One way to return a value from a bash function is to set a global variable to the result. All variables in bash are global by default.

promptInput(){

    echo "Please enter action number to perform"

    echo "1. Option 1"

    echo "2. Option 2"

    echo "3. Option 3"

    echo "4. Option 4"

    read action

}

promptInput

echo "User opted for " ${action}

In this, the action variable is global and the read value is stored inside action. This can be accessed outside like any other global variable using $action 

How to accept command line arguments from a shell script

#!/bin/bash

# A simple copy script

cp "$1" "$2"

# Let's verify the copy worked

echo "Details for $2"

ls -lh "$2"


$0 - The name of the Bash script.

$1 - $9 - The first 9 arguments to the Bash script.

$# - How many arguments were passed to the Bash script.

$@ - All the arguments supplied to the Bash script.

$? - The exit status of the most recently run process.

$$ - The process ID of the current script.

$USER - The username of the user running the script.

$HOSTNAME - The hostname of the machine the script is running on.

$SECONDS - The number of seconds since the script was started.

$RANDOM - Returns a different random number each time it is referred to.

$LINENO - Returns the current line number in the Bash script.
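
For comparison, here is a rough Python counterpart of the copy script. It is a sketch, and the function name is my own: sys.argv[0] plays the role of $0 and sys.argv[1:] the role of $@.

```python
import shutil
import sys
from pathlib import Path


def copy_and_report(source, destination):
    """Copy source to destination and return a short description of the copy."""
    shutil.copy(source, destination)
    size = Path(destination).stat().st_size
    return f"Details for {destination}: {size} bytes"


if __name__ == "__main__" and len(sys.argv) == 3:
    # sys.argv[0] is the script name; the two arguments follow it.
    print(copy_and_report(sys.argv[1], sys.argv[2]))
```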


references:

https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#arguments

What is Rasta and RSpec

 Rasta is a keyword-driven test framework using spreadsheets to drive testing.

It’s loosely based on FIT, where data tables define parameters and expected results. The spreadsheet can then be parsed using your test fixtures.


For the underlying test harness, Rasta uses RSpec so in addition to reporting results back to the spreadsheet you can take advantage of RSpec’s output formatters and simultaneously export into other formats such as HTML and plain text.


RSpec is a testing tool for Ruby, created for behavior-driven development (BDD). It is the most frequently used testing library for Ruby in production applications. Even though it has a very rich and powerful DSL (domain-specific language), at its core it is a simple tool which you can start using rather quickly.


What is Sonar-Scanner

SonarScanner is a separate client application that, in connection with the SonarQube server, runs the project analysis and then sends the results to the SonarQube server for processing. SonarScanner can handle most programming languages supported by SonarQube except C# and VB.

references:

https://setapp.pl/how-to-use-sonarscanner

What is Clair Scan

Clair scans each container layer and provides a notification of vulnerabilities that may be a threat, based on the Common Vulnerabilities and Exposures database (CVE) and similar databases from Red Hat ®, Ubuntu, and Debian. Since layers can be shared between many containers, introspection is vital to build an inventory of packages and match that against known CVEs.


Clair has also introduced support for programming language package managers, starting with Python, and a new image-oriented API.


Automatic detection of vulnerabilities will help increase awareness and best security practices across development and operations teams, and encourage action to patch and address the vulnerabilities. When new vulnerabilities are announced, Clair knows right away, without rescanning, which existing layers are vulnerable and notifications are sent.


For example, CVE-2014-0160, aka "Heartbleed" has been known for some time, yet Red Hat Quay security scanning found it is still a potential threat to a high percent of the container images users have stored on Quay. 


Take note that vulnerabilities often rely on particular conditions in order to be exploited. For example, Heartbleed only matters as a threat if the vulnerable OpenSSL package is installed and being used. Clair isn’t suited for that level of analysis and teams should still undertake deeper analysis as required.


references:

https://www.redhat.com/en/topics/containers/what-is-clair#:~:text=Clair%20scans%20each%20container%20layer,%C2%AE%2C%20Ubuntu%2C%20and%20Debian.

What is /var/run/docker.sock

docker.sock is the UNIX socket that the Docker daemon listens on. It is the main entry point for the Docker API. The daemon can also listen on a TCP socket, but for security reasons Docker defaults to the UNIX socket.


The Docker CLI client uses this socket to execute docker commands by default. You can override these settings as well.


There can be different reasons to mount the Docker socket inside a container, such as launching new containers from within another container, or automatic service discovery and logging. This increases the attack surface, so only mount the Docker socket into a container if the code running inside it is trusted. Otherwise the host running the Docker daemon can easily be compromised, since Docker by default launches all containers as root.


In most installations the Docker socket belongs to a docker group, so users in that group can run docker commands against the socket without root permission. The containers themselves still get root permission, since the Docker daemon effectively runs as root (it needs root permission to access namespaces and cgroups).


Saturday, December 24, 2022

Customizing a docker image

This example customizes the Jenkins image.


Run Jenkins in Docker

In this tutorial, you’ll be running Jenkins as a Docker container from the jenkins/jenkins Docker image.


To run Jenkins in Docker, follow the relevant instructions below for either macOS and Linux or Windows.


On macOS and Linux

Open up a terminal window.


Create a bridge network in Docker using the following docker network create command:


docker network create jenkins


In order to execute Docker commands inside Jenkins nodes, download and run the docker:dind Docker image using the following docker run command:


docker run \

  --name jenkins-docker \

  --rm \

  --detach \

  --privileged \

  --network jenkins \

  --network-alias docker \

  --env DOCKER_TLS_CERTDIR=/certs \

  --volume jenkins-docker-certs:/certs/client \

  --volume jenkins-data:/var/jenkins_home \

  --publish 2376:2376 \

  --publish 3000:3000 --publish 5000:5000 \

  docker:dind \

  --storage-driver overlay2 



Now customize the official Jenkins Docker image by executing the two steps below:


Create Dockerfile with the following content:


FROM jenkins/jenkins:2.375.1

USER root

RUN apt-get update && apt-get install -y lsb-release

RUN curl -fsSLo /usr/share/keyrings/docker-archive-keyring.asc \

  https://download.docker.com/linux/debian/gpg

RUN echo "deb [arch=$(dpkg --print-architecture) \

  signed-by=/usr/share/keyrings/docker-archive-keyring.asc] \

  https://download.docker.com/linux/debian \

  $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list

RUN apt-get update && apt-get install -y docker-ce-cli

USER jenkins

RUN jenkins-plugin-cli --plugins "blueocean:1.26.0 docker-workflow:563.vd5d2e5c4007f"

Build a new docker image from this Dockerfile and assign the image a meaningful name, e.g. "myjenkins-blueocean:2.375.1-1":


docker build -t myjenkins-blueocean:2.375.1-1 .


Keep in mind that the process described above will automatically download the official Jenkins Docker image if this hasn’t been done before.


Run your own myjenkins-blueocean:2.375.1-1 image as a container in Docker using the following docker run command:


docker run \

  --name jenkins-blueocean \

  --detach \

  --network jenkins \

  --env DOCKER_HOST=tcp://docker:2376 \

  --env DOCKER_CERT_PATH=/certs/client \

  --env DOCKER_TLS_VERIFY=1 \

  --publish 8080:8080 \

  --publish 50000:50000 \

  --volume jenkins-data:/var/jenkins_home \

  --volume jenkins-docker-certs:/certs/client:ro \

  --volume "$HOME":/home \

  --restart=on-failure \

  --env JAVA_OPTS="-Dhudson.plugins.git.GitSCM.ALLOW_LOCAL_CHECKOUT=true" \

  myjenkins-blueocean:2.375.1-1 





 Jenkins Distributed Builds Architecture


A Jenkins controller can operate by itself both managing the build environment and executing the builds with its own executors and resources. If you stick with this "standalone" configuration you will most likely run out of resources when the number or the load of your projects increase.


To get your Jenkins infrastructure up and running again you will need to enhance the controller (increasing memory, number of CPUs, etc). During the time it takes to maintain and upgrade the machine, the controller together with all the build environment will be down, the jobs will be stopped and the whole Jenkins infrastructure will be unusable.


Scaling Jenkins in such a scenario would be extremely painful and would introduce many "idle" periods where all the resources assigned to your build environment are useless.


Moreover, executing jobs on the controller introduces a "security" issue: the "jenkins" user that Jenkins uses to run the jobs would have full permissions on all Jenkins resources on the controller. This means that, with a simple script, a malicious user can have direct access to private information whose integrity and privacy could not be, thus, guaranteed. 


For all these reasons Jenkins supports agents, where the workload of building projects is delegated to multiple agents.


An agent is a machine set up to offload projects from the controller. The method with which builds are scheduled depends on the configuration given to each project. For example, some projects may be configured to "restrict where this project is run" which ties the project to a specific agent or set of labeled agents. Other projects which omit this configuration will select an agent from the available pool in Jenkins.


In a distributed builds environment, the Jenkins controller will use its resources to only handle HTTP requests and manage the build environment. Actual execution of builds will be delegated to the agents. With this configuration it is possible to horizontally scale an architecture, which allows a single Jenkins installation to host a large number of projects and build environments.


In order for a machine to be recognized as an agent, it needs to run a specific agent program to establish bi-directional communication with the controller.


reference:

https://www.jenkins.io/doc/book/scaling/architecting-for-scale/

Jenkins Scaling Considerations

As an organization matures from a continuous delivery standpoint, its Jenkins requirements will similarly grow. This growth is often reflected in the Jenkins architecture, whether that be "vertical" or "horizontal" growth.


Vertical growth is when the load on a Jenkins controller is increased by having more configured jobs or orchestrating more frequent builds. This may also mean that more teams are depending on that one controller.


Horizontal growth is the creation of additional Jenkins controllers to accommodate new teams or projects, rather than adding new teams or projects to an existing controller.


There are potential pitfalls associated with each approach to scaling Jenkins, but with careful planning, many of them can be avoided or managed. Here are some things to consider when choosing a strategy for scaling your organization’s Jenkins instances:


Do you have the resources to run a distributed build system? If possible, it is recommended to set up dedicated build nodes that run separately from the Jenkins controller. This frees up resources for the controller to improve its scheduling performance and prevents builds from being able to modify any potentially sensitive data in the $JENKINS_HOME. This also allows a single controller to scale far more vertically than if that controller were both the job builder and scheduler.


Do you have the resources to maintain multiple controllers? Jenkins controllers require regular plugin updates, semi-monthly core upgrades, and regular backups of configurations and build histories. Security settings and roles will have to be manually configured for each controller. Downed controllers will require manual restart of the Jenkins controller and any jobs that were killed by the outage.


How mission critical are each team’s projects? Consider segregating the most vital projects to separate controllers to minimize the impact of a single downed controller. Also consider converting any mission-critical project pipelines to Pipeline jobs, which continue executing even when the agent connection to the controller is lost.


How important is a fast start-up time for your Jenkins instance? The more jobs a controller has configured, the longer it takes to load Jenkins after an upgrade or a crash. The use of folders and views to organize jobs can limit the number of jobs that need to be rendered on start-up.


references:

https://www.jenkins.io/doc/book/scaling/architecting-for-scale/

What is Pandas Profiling

The pandas_profiling library in Python includes a class named ProfileReport, which generates a basic report on the input DataFrame.


The report consists of the following:


DataFrame overview,

Details of each attribute (column) of the DataFrame,

Correlations between attributes (Pearson correlation and Spearman correlation), and

A sample of the DataFrame.


pandas_profiling.ProfileReport(df, **kwargs)


bins (int): Number of bins in histogram. The default is 10.

check_correlation (boolean): Whether or not to check correlation. It’s `True` by default.

correlation_threshold (float): Threshold to determine if the variable pair is correlated. The default is 0.9.

correlation_overrides (list): Variable names not to be rejected because they are correlated. There is no variable in the list (`None`) by default.

check_recoded (boolean): Whether or not to check recoded correlation (memory heavy feature). Since it’s an expensive computation it can be activated for small datasets. `check_correlation` must be true to disable this check. It’s `False` by default.

pool_size (int): Number of workers in thread pool. The default is equal to the number of CPUs.



References:

https://www.geeksforgeeks.org/pandas-profiling-in-python/

WHAT IS THE DIFFERENCE BETWEEN GOLDEN CLIENT AND SANDBOX CLIENT

The sandbox server is for trial and error. Anything done here does not affect the other servers. Implementers use this server at the very initial stage of realization. Here they create a demo structure showing how the actual configuration will be done, and it can be erased later. The word sandbox evokes drawing in the sand: if the figure does not match, rub it out and try again until it is correct.


The golden server is a kind of development server. Here implementers do the real configuration as per the requirement. If they are satisfied with the configuration, they transport it to the quality server for testing; if not, they redo the configuration to meet the requirement.

(remember, this is a very neat and clean client and you cannot use it for rough usage)


Master data is not configured in Golden server

What is Kubernetes plugin

The Kubernetes plugin allocates Jenkins agents in Kubernetes pods. Within these pods, there is always one special container jnlp that is running the Jenkins agent. Other containers can run arbitrary processes of your choosing, and it is possible to run commands dynamically in any container in the agent pod.


Pod templates defined using the user interface declare a label. When a freestyle job or a pipeline job using node('some-label') uses a label declared by a pod template, the Kubernetes Cloud allocates a new pod to run the Jenkins agent.


It should be noted that the main reason to use the global pod template definition is to migrate a huge corpus of existing projects (including freestyle) to run on Kubernetes without changing job definitions. New users setting up new Kubernetes builds should use the podTemplate step.


The podTemplate step defines an ephemeral pod template. It is created while the pipeline execution is within the podTemplate block. It is immediately deleted afterwards. Such pod templates are not intended to be shared with other builds or projects in the Jenkins instance.



The following idiom creates a pod template with a generated unique label (available as POD_LABEL) and runs commands inside it.

podTemplate {

    node(POD_LABEL) {

        // pipeline steps...

    }

}


podTemplate {

    node(POD_LABEL) {

        stage('Run shell') {

            sh 'echo hello world'

        }

    }

}



references:

https://plugins.jenkins.io/kubernetes/

What Is SMOTE?

SMOTE stands for Synthetic Minority Oversampling Technique. Just like the name suggests, the technique generates synthetic data for the minority class.

SMOTE proceeds by joining the points of the minority class with line segments and then places artificial points on these lines.


Under the hood, the SMOTE algorithm works in 4 simple steps:


1. Choose a minority class input vector

2. Find its k nearest neighbors (k_neighbors is specified as an argument in the SMOTE() function)

3. Choose one of these neighbors and place a synthetic point anywhere on the line joining the point under consideration and its chosen neighbor

4. Repeat the steps until data is balanced
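
The four steps above can be sketched in plain Python. This is only an illustration of the interpolation idea, not the imblearn implementation: the function name smote_sketch, the sample points and the fixed seed are all made up for the example, and n_new stands in for "repeat until the data is balanced".

```python
import math
import random


def smote_sketch(minority, n_new, k_neighbors=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):  # step 4: repeat until enough points exist
        # Step 1: choose a minority-class input vector.
        x = rng.choice(minority)
        # Step 2: find its k nearest neighbours among the other minority points.
        others = [p for p in minority if p is not x]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k_neighbors]
        # Step 3: pick one neighbour and place a point on the joining segment.
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # 0 keeps x itself, values near 1 approach the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


# Hypothetical minority-class samples in two dimensions.
minority_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote_sketch(minority_points, n_new=4)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the new data stays inside the region the minority class already occupies.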


SMOTE is implemented in Python using the imblearn library.


references:

https://medium.com/analytics-vidhya/balance-your-data-using-smote-98e4d79fcddb

AI/ML what are various correlation coefficients

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.


To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.


To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
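
As a rough illustration of the two definitions so far, the sketch below computes r and ρ by hand with the standard library only. The function names and the sample data are invented for the example; in practice a library routine would normally be used, and the code assumes neither variable is constant (otherwise the denominator is zero).

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables."""
    return pearson_r(ranks(xs), ranks(ys))


# y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

For this cubic relationship ρ comes out as 1 because the ordering is perfectly preserved, while r stays below 1 because the relationship is not a straight line.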


Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.


To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
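
A minimal sketch of that pair-counting recipe is shown below. It implements exactly the formula in the paragraph above (often called tau-a), so tied pairs simply count as neither concordant nor discordant; the function name and the sample data are made up for the example.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Only the middle two observations are out of order relative to each other.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving (5 - 1) / 6.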


Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that was proposed by Bergsma in 2013.

Friday, December 23, 2022

AI/ML what is pycaret

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that exponentially speeds up the experiment cycle and makes you more productive.

Compared with the other open-source machine learning libraries, PyCaret is an alternate low-code library that can replace hundreds of lines of code with a few lines only. This makes experiments exponentially faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more.

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise. 


references:

https://www.datacamp.com/tutorial/guide-for-automating-ml-workflows-using-pycaret 

Docker logs - how to write own logging driver

At a high level, Docker consults a logging driver for the container, which returns the “captured” STDOUT of your program. Each Docker daemon has a default logging driver, which each container uses unless you configure it to use a different logging driver.

For example, the Splunk driver sends the program's STDOUT to a Splunk server.


The Contract

A Log Driver plugin needs to handle the following HTTP requests:


Start Logging Request

Called when the container is started with the configured logging driver. There are a couple of key things to understand about this request. As part of this HTTP request, Docker includes the ID of the container and also a handle to a named pipe (FIFO) for that container. STDOUT from the programs in the container is available on this FIFO, so the driver should open the FIFO as a reader to continuously ingest the program output.


Stop Logging Request

Called when the container is stopped. Gives the driver a chance to clean up the resources allocated for reading the FIFO stream.


Get Supported Capabilities

Drivers can indicate whether they support reading logs (called when docker logs is invoked). For example, it may not make sense for a logging driver that ships logs to a remote location to support this functionality.


Read Logs

docker logs invokes this call on the driver, to retrieve logs.


Request bodies are in JSON format. I will not go into the details of every request format. Please check out http.go in the repository listed in the references for the request/response details.


references:

https://github.com/monmohan/logdriver

https://software-factotum.medium.com/writing-a-docker-log-driver-plugin-7275d99d07be

Docker log techniques

To view the log file paths of all containers, the below will be helpful:

sudo docker ps -qa | sudo xargs docker inspect --format='{{.LogPath}}'

To view the log files with their sizes, the below will be helpful:

sudo docker ps -qa | sudo xargs docker inspect --format='{{.LogPath}}' | sudo xargs ls -hl

The general format for finding the log file of a single container is:

docker inspect -f '{{.LogPath}}' $HERE_IS_YOUR_CONTAINER


To get the full Docker container ID instead of the truncated one, use the below:

docker ps --no-trunc




references:

https://forums.docker.com/t/docker-logging-driver-json-file-not-able-to-locate-json-log-file-where-the-logs-are-written/32256


How to decode JWT Token

 function parseJwt (token) {

    var base64Url = token.split('.')[1];

    var base64 = base64Url.replace(/-/g, '+').replace(/_/g, '/');

    var jsonPayload = decodeURIComponent(window.atob(base64).split('').map(function(c) {

        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);

    }).join(''));


    return JSON.parse(jsonPayload);

}


This is usually done by a JWT library, but with the function above it can be done without one. Note that it only decodes the payload; it does not verify the token's signature.
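
For comparison, here is a rough Python equivalent of the same decoding step using only the standard library. The helper names and the demo token are made up for the example, and, like the JavaScript version, this sketch reads the payload without checking the signature, so it must not be used to trust a token.

```python
import base64
import json


def parse_jwt_payload(token):
    """Return the claims in the middle (payload) segment of a JWT as a dict."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore the '=' padding first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload_bytes = base64.urlsafe_b64decode(payload_b64)
    return json.loads(payload_bytes.decode("utf-8"))


def _b64url(obj):
    """Hypothetical helper: encode a dict the way a JWT segment is encoded."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# A throwaway, unsigned token built only for this demonstration.
demo_token = ".".join([
    _b64url({"alg": "none", "typ": "JWT"}),
    _b64url({"sub": "1234", "name": "Example User"}),
    "",
])
claims = parse_jwt_payload(demo_token)
print(claims)
```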


references:

https://stackoverflow.com/questions/38552003/how-to-decode-jwt-token-in-javascript-without-using-a-library

what is tmpfs in Linux

 A temporary file system (TMPFS) uses local memory for file system reads and writes, which is typically much faster than reads and writes in a UFS file system. TMPFS file systems can improve system performance by saving the cost of reading and writing temporary files to a local disk or across the network.

Thursday, December 22, 2022

git remove files from staged area

 git restore --staged .

This command unstages everything under the current directory while leaving the changes in the working tree untouched.

https://stackoverflow.com/questions/19730565/how-to-remove-files-from-git-staging-area 

Sunday, December 18, 2022

AI/ML Pandas timedelta function

Date and time calculations using NumPy timedelta64.

Different units can be used with timedelta64 for calculations (for example D for days, h for hours and m for minutes).

Let us create a DataFrame with two datetime columns to calculate the difference.


import pandas as pd 

my_dict={'NAME':['Ravi','Raju','Alex'],

         'dt_start':['1/1/2020','2/1/2020','5/1/2020'],

         'dt_end':['6/15/2022','7/22/2022','11/15/2023']

}

my_data = pd.DataFrame(data=my_dict)

my_data['dt_start'] = pd.to_datetime(my_data['dt_start'])

my_data['dt_end'] = pd.to_datetime(my_data['dt_end'])

print(my_data)

Output

   NAME   dt_start     dt_end

0  Ravi 2020-01-01 2022-06-15

1  Raju 2020-02-01 2022-07-22

2  Alex 2020-05-01 2023-11-15


my_data['diff_days']=my_data['dt_end']-my_data['dt_start']

print(my_data)


  NAME   dt_start     dt_end diff_days

0  Ravi 2020-01-01 2022-06-15  896 days

1  Raju 2020-02-01 2022-07-22  902 days

2  Alex 2020-05-01 2023-11-15 1293 days
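
As a cross-check of the day counts above, the same subtraction can be sketched with the standard library's datetime module. This is not the pandas code path, just the same arithmetic; the helper name days_between is made up for the example. In pandas, the integer day count of a timedelta column is available through the .dt.days accessor.

```python
from datetime import datetime

# Same rows as in the DataFrame above: name, start date, end date (month/day/year).
rows = [
    ("Ravi", "1/1/2020", "6/15/2022"),
    ("Raju", "2/1/2020", "7/22/2022"),
    ("Alex", "5/1/2020", "11/15/2023"),
]


def days_between(start, end, fmt="%m/%d/%Y"):
    """Parse both dates and return the whole number of days between them."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


diff_days = {name: days_between(start, end) for name, start, end in rows}
print(diff_days)
```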


references:

https://www.plus2net.com/python/pandas-dt-timedelta64.php

Express js The simplest of simplest app

const express = require('express')

const app = express()

const port = 3000


app.get('/', (req, res) => {

  res.send('Hello World!')

})


app.listen(port, () => {

  console.log(`Example app listening on port ${port}`)

})




What is Grizzly Framework

Writing scalable server applications in the Java™ programming language has always been difficult. Before the advent of the Java New I/O API (NIO), thread management issues made it impossible for a server to scale to thousands of users. The Grizzly NIO framework has been designed to help developers to take advantage of the Java™ NIO API. Grizzly’s goal is to help developers to build scalable and robust servers using NIO as well as offering extended framework components: Web Framework (HTTP/S), WebSocket, Comet, and more!




references

https://javaee.github.io/grizzly/

jarvis lab token authentication error


 

This error appeared only because the Jarvis Labs instance had been shut down. Refreshing the page and checking again got it working.

What is ZeroMQ

ZeroMQ (also known as ØMQ, 0MQ, or zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fan-out, pub-sub, task distribution, and request-reply. It's fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems.

This is actually used in Jupyter Notebook as well.

The philosophy of ZeroMQ starts with the zero. The zero is for zero broker (ZeroMQ is brokerless), zero latency, zero cost (it’s free), and zero administration.

More generally, “zero” refers to the culture of minimalism that permeates the project. We add power by removing complexity rather than by exposing new functionality.

references

https://zeromq.org/


How to setup jarvislabs

1. Register using email and password

2. Enter the verification code sent by email

3. Log in using the credentials

4. The dashboard shows the allocations; press Launch

5. It takes some time to finish launching 







What is jarvislab.ai ?

This is what they say 

We set up all the infra/compute and software (Cuda,Frameworks) required for you to train and deploy your favourite Deep learning model. You can spin GPU/CPU powered instances directly from your browser or automate it through our python API.


It is a 1-click GPU cloud platform for AI engineers and researchers.



Tuesday, December 13, 2022

What is the use of init containers?

Init containers can contain utilities or custom code for setup that are not present in an app image. For example, there is no need to make an image FROM another image just to use a tool like sed , awk , python , or dig during setup.

OpenShift Container Platform provides init containers, which are specialized containers that run before application containers and can contain utilities or setup scripts not present in an app image.



Monday, December 12, 2022

Javascript Joi, to validate possible values

Joi.string().valid('STUDENT', 'TEACHER').uppercase().required()

Joi supports multiple date formats.

If we want to accept several formats, say YYYY-MM-DDTHH:mm:ss.sssZ and YYYY-MM-DD, it is simpler to just validate with Joi.date().

If we want to validate against one specific string format instead, the below can be done (in recent Joi versions, format() is provided by the @joi/date extension).

Joi.date().format("YYYY-MM-DDTHH:mm:ss.sssZ").optional(),

references:

https://www.digitalocean.com/community/tutorials/how-to-use-joi-for-node-api-schema-validation

Sunday, December 11, 2022

AI/ML What is MLFlow

MLflow is an open source platform for managing the end-to-end machine learning lifecycle.


Its main components are:


The tracking component allows you to record machine learning model training sessions (called runs) and run queries using Java, Python, R, and REST APIs.

The model component provides a standard unit for packaging and reusing machine learning models

The model registry component lets you centrally manage models and their lifecycle.

The project component packages code used in data science projects, ensuring it can easily be reused and experiments can be reproduced.


There are two other key concepts in MLflow:


A run is a collection of parameters, metrics, labels, and artifacts related to the training process of a machine learning model.


An experiment is the basic unit of MLflow organization. All MLflow runs belong to an experiment. For each experiment, you can analyze and compare the results of different runs, and easily retrieve metadata artifacts for analysis using downstream tools. Experiments are maintained on an MLflow tracking server, which can be self-hosted or managed (for example, hosted on Azure Databricks).



References:

https://www.run.ai/guides/machine-learning-operations/mlflow

Thursday, December 8, 2022

VS Code remote development

This is a pretty useful feature in VS Code.

- First, set up an SSH session in VS Code (the Remote - SSH extension handles this).

- Then open the remote folder in that session and you are ready to go!

references:

https://code.visualstudio.com/docs/remote/ssh 

Wednesday, December 7, 2022

What are Bogon IP addresses

Some IP addresses and IP ranges are reserved for special use, such as for local or private networks, and should not appear on the public internet. These reserved ranges, along with other IP ranges that haven’t yet been allocated and therefore also shouldn’t appear on the public internet are sometimes known as bogons.


references:

https://ipinfo.io/10.254.4.185

Tuesday, December 6, 2022

How to view the docker image details

The command below does this.

docker image inspect [OPTIONS] IMAGE [IMAGE...]

references:

https://docs.docker.com/engine/reference/commandline/image_inspect/

Monday, December 5, 2022

How to write swagger documentation

The Swagger Editor provides a write-and-preview UI for quickly verifying Swagger documentation:

https://editor.swagger.io/

references:

https://editor.swagger.io/

Sunday, December 4, 2022

Really good docker compose commands

To list the containers of the services defined in the Docker Compose file:


docker compose ps 


To get logs from all containers 


docker compose logs -f --tail 100 



To get events such as start, kill, etc. from all containers:


docker compose events


To list the images used by the created containers:


docker compose images


The below gives an overall status of the running Compose projects:


docker compose ls


my-test-proj    restarting(2), running(26)



Below commands give filtered view 


docker compose ps | grep restarting

docker compose ps | grep exited

docker compose ps | grep running 

Saturday, December 3, 2022

What is Rosetta

Rosetta is not an app that you open or interact with. Rosetta works automatically in the background whenever you use an app that was built only for Mac computers with an Intel processor. It translates the app for use with Apple silicon.

In most cases, you won't notice any difference in the performance of an app that needs Rosetta. 


Which apps need Rosetta?

To identify apps that need Rosetta or can use Rosetta:


Select an app in the Finder.

From the File menu in the menu bar, choose Get Info.

See the information labeled Kind:

Application (Intel) means the app supports only Intel processors and needs Rosetta to work on a Mac with Apple silicon.

Application (Universal) means the app supports both Apple silicon and Intel processors, and uses Apple silicon by default. Universal apps don't need Rosetta.


For apps labeled Application (Universal), the Info window includes the setting “Open using Rosetta.” This setting enables a universal app such as a web browser to use plug-ins, extensions, or other add-ons that haven't been updated to support Apple silicon. If a universal app doesn't recognize an add-on that you installed for the app, you can quit the app, select this setting, and try again.


References:

https://support.apple.com/en-in/HT211861#:~:text=Rosetta%20works%20automatically%20in%20the,an%20app%20that%20needs%20Rosetta.


Why iOS Simulator shows Test mode - Google Ad

 Enable test devices (Test mode)

If you want to do more rigorous testing with production-looking ads, you can now configure your device as a test device and use your own ad unit IDs that you've created in the AdMob UI. Test devices can either be added in the AdMob UI or programmatically using the Google Mobile Ads SDK.

Follow the steps below to add your device as a test device.

Key Point: iOS simulators are automatically configured as test devices.

References:

developer.cisco.com/learning/lab/intro-netconf/step/2

`docker-compose up` times out with UnixHTTPConnectionPool


Some ways to work around this:


1. Restart Docker and raise the DOCKER_CLIENT_TIMEOUT and COMPOSE_HTTP_TIMEOUT environment variables. The timeout seems to be caused by a machine resource limitation.

2. Something else my team did was to make a script that started a batch of containers, waited for them to be healthy, then started another batch of containers, and so on. That way the computer was not overwhelmed by all the operations that ran during startup.


Sunday, November 27, 2022

How to use Dive to inspect Docker image contents

Install Dive 

brew install dive


Once installed, run it with the image ID (or tag) as the argument:

dive 7d35abc40782

references:

https://github.com/wagoodman/dive


Docker steps involved in Building Image

If the Dockerfile content is like this 

#from base image

FROM ubuntu:14.04

#author name

MAINTAINER RAGHU

#commands to run in the container

RUN echo "hello Raghu"

RUN sleep 10

RUN echo "TASK COMPLETED"



Command used to build the image: docker build -t raghavendar/hands-on:2.0 .


Sending build context to Docker daemon 20.04 MB

Step 1 : FROM ubuntu:14.04

---> b1719e1db756

Step 2 : MAINTAINER RAGHU

---> Running in 532ed79e6d55

---> ea6184bb8ef5

Removing intermediate container 532ed79e6d55

Step 3 : RUN echo "hello Raghu"

---> Running in da327c9b871a

hello Raghu

---> f02ff92252e2

Removing intermediate container da327c9b871a

Step 4 : RUN sleep 10

---> Running in aa58dea59595

---> fe9e9648e969

Removing intermediate container aa58dea59595

Step 5 : RUN echo "TASK COMPLETED"

---> Running in 612adda45c52

TASK COMPLETED

---> 86c73954ea96

Removing intermediate container 612adda45c52

Successfully built 86c73954ea96



Some explanation of the build process is as below 

Docker images are layered. When you build a new image, Docker does this for each instruction (RUN, COPY etc.) in your Dockerfile:


create a temporary container from the previous image layer (or the base FROM image for the first command);

run the Dockerfile instruction in the temporary "intermediate" container;

save the temporary container as a new image layer.


The final image layer is tagged with whatever you name the image - this will be clear if you run docker history raghavendar/hands-on:2.0, you'll see each layer and an abbreviation of the instruction that created it.


Applied to the build output above:

1) 532ed79e6d55 is a temporary container created from image b1719e1db756, which is the FROM image, ubuntu:14.04.

2) ea6184bb8ef5 is the image layer created as the output of the instruction, i.e. by saving intermediate container 532ed79e6d55.

3) The layers are stacked by what Docker calls the Union File System, and this is the main reason why images are so efficient.

references:

https://stackoverflow.com/questions/39705085/how-are-intermediate-containers-formed

Sunday, November 20, 2022

Javascript for vs foreach method

 


Javascript map vs foreach

map() creates a new array; forEach() does not. This is the main difference.

In the benchmark from the referenced article, forEach() was more than 70% slower than map() on some machines. Your browser is probably different.

 references:

https://codeburst.io/javascript-map-vs-foreach-f38111822c0f

Thursday, November 17, 2022

npm registry commands useful

npm config get registry

npm config list

npm config edit

npm config get

npm config set <name> <url>

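npm cache clean --force (npm 5 and later refuse to clean the cache without --force)

npm config delete registry => removes the custom registry setting, so npm falls back to the default registry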

Install from specific registry 

npm install @cisco-bpa-platform/ui-template-manager


Wednesday, November 16, 2022

Camunda Cron Job Expressions

Below are a few good resources for working out cron expressions. Note that, depending on the version of the Camunda engine, support for these cron expressions is limited.

It might throw errors like the ones below.

ENGINE-09026 Exception while parsing cron expression '0 0 * * * MON': Support for specifying both a day-of-week AND a day-of-month parameter is not implemented. [ deploy-error ]
ENGINE-09026 Exception while parsing cron expression '0 0 * ? ? MON': '?' can only be specfied for Day-of-Month or Day-of-Week. [ deploy-error ]
ENGINE-09026 Exception while parsing cron expression '* * * * * 1': Support for specifying both a day-of-week AND a day-of-month parameter is not implemented. [ deploy-error ]
ENGINE-09026 Exception while parsing cron expression '* * * * * MON': Support for specifying both a day-of-week AND a day-of-month parameter is not implemented. [ deploy-error ]
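
The error messages above point at two rules of Quartz-style cron expressions, which is the format the engine appears to expect: the fields are seconds, minutes, hours, day-of-month, month, day-of-week and an optional year, and one of day-of-month or day-of-week has to be '?'. The small pre-check below is only a sketch of those two rules, assuming that format. The function name is made up and it is not a full cron parser.

```python
def check_quartz_day_fields(expression):
    """Return a problem description, or None if the two day rules are satisfied."""
    fields = expression.split()
    if len(fields) not in (6, 7):
        return "expected 6 or 7 fields: sec min hour day-of-month month day-of-week [year]"
    # '?' is only meaningful in the day-of-month (index 3) or day-of-week (index 5) slot.
    for position, field in enumerate(fields):
        if field == "?" and position not in (3, 5):
            return "'?' can only be used for day-of-month or day-of-week"
    day_of_month, day_of_week = fields[3], fields[5]
    if day_of_month != "?" and day_of_week != "?":
        return "day-of-month and day-of-week are both specified; set one of them to '?'"
    return None


# The first two expressions come from the error log above; the third is a corrected form.
for candidate in ["0 0 * * * MON", "0 0 * ? ? MON", "0 0 0 ? * MON"]:
    print(candidate, "->", check_quartz_day_fields(candidate) or "looks fine")
```

Under these assumptions, '0 0 0 ? * MON' (midnight every Monday) passes the check because day-of-month is left unspecified with '?'.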


https://www.freeformatter.com/cron-expression-generator-quartz.html

https://crontab.guru/every-week


https://github.com/camunda/zeebe/issues/9673



Tuesday, November 15, 2022

AI/ML How to print Dataframes in style

If using IPython (for example in a Jupyter notebook), use display:

from IPython.display import display

display(df)


It is also possible to apply a style through the Styler object exposed as df.style:


def highlight_above_90(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: blue'` for values
    above 90, black otherwise.
    """
    color = 'blue' if val > 90 else 'black'
    return f'color: {color}'


# applymap applies the function to every cell; pandas 2.1 and later call this Styler.map
df.style.applymap(highlight_above_90)


If using print, then tabulate is a good option 


from tabulate import tabulate


print (tabulate(df, headers = 'keys', tablefmt = 'psql'))

Many table formats other than psql are available.


references:

https://www.geeksforgeeks.org/display-the-pandas-dataframe-in-table-style/ 


AI/ML What is Document Term Matrix

The text data is represented in the form of a matrix. The rows of the matrix represent the sentences from the data that needs to be analyzed, and the columns of the matrix represent the words. The cells of the matrix hold the number of occurrences of each word. Let’s understand it with an example.


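import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative sentences; the original post did not show its own
sentence1 = "the cat sat on the mat"
sentence2 = "the dog sat on the log"
sentence3 = "the cat chased the dog"

docs = [sentence1, sentence2, sentence3]
print(docs)

vec = CountVectorizer()
X = vec.fit_transform(docs)

# The sparse matrix can now be converted to a DataFrame: one row per sentence, one column per word.
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2.
df = pd.DataFrame(X.toarray(), columns=vec.get_feature_names_out())
df.head()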
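
To make the structure of the matrix concrete, here is a sketch that builds a small document-term matrix by hand with the standard library. The three sentences are made up, and the whitespace tokenisation is deliberately simpler than what CountVectorizer does.

```python
from collections import Counter

# Hypothetical corpus: each sentence becomes one row of the matrix.
docs = ["the cat sat", "the dog sat", "the cat chased the dog"]

# Tokenise naively on whitespace and collect the sorted vocabulary (the columns).
tokenized = [doc.lower().split() for doc in docs]
vocabulary = sorted({word for doc in tokenized for word in doc})

# Each cell holds how often the column's word occurs in the row's sentence.
matrix = []
for doc in tokenized:
    counts = Counter(doc)
    matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```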




References:

https://analyticsindiamag.com/a-guide-to-term-document-matrix-with-its-implementation-in-r-and-python/

AI/ML Logistic regression - Accuracy Value

Higher accuracy is an indication that the model is performing better.

Accuracy = (TP + TN) / (TP + FP + FN + TN)

TP = True positives

TN = True negatives

FP = False positives

FN = False negatives


F1-score = 2 * (Recall * Precision) / (Recall + Precision), where


Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
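
The formulas can be checked by hand with a few lines of Python. The confusion-matrix counts below are invented purely for illustration, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (recall * precision) / (recall + precision)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 predictions.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 85 / 100, the precision is 40 / 45 and the recall is 40 / 50.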


The scikit-learn library provides ready-made functions for this:

from sklearn.metrics import accuracy_score

accuracy_score(y_true, y_pred)


References:

https://stackoverflow.com/questions/47437893/how-to-calculate-logistic-regression-accuracy

AI/ML What is TF , IDF, and TFIDF ?

The TF-IDF of a term is calculated by multiplying its TF and IDF scores. Basically, the importance of a term is high when it occurs a lot in a given document and rarely in others. In short, commonality within a document, measured by TF, is balanced by rarity between documents, measured by IDF.


Term frequency-inverse document frequency (TF-IDF) is a statistical formula for converting text documents into vectors based on the relevance of each word. It builds on the bag-of-words model to create a matrix containing information about the less relevant and more relevant words in the documents.


Term Frequency (TF)


It is the ratio of the number of occurrences of the word (w) in document (d) to the total number of words in that document. This simple formulation measures how frequent a word is within the document.

For example, if a sentence has 6 words and contains “the” twice, the TF of this word is 2/6.


Inverse Document Frequency (IDF)

 

IDF measures the importance of a word across a corpus D. Very frequently used words such as “of”, “we” and “are” carry little to no significance. IDF is calculated by dividing the total number of documents in the corpus by the number of documents containing the word, and usually taking the logarithm of that ratio.
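
The two quantities can be sketched in a few lines of pure Python. The mini-corpus below is made up for illustration, the function names are hypothetical, and the sketch uses the common logarithmic form of IDF. Library implementations such as scikit-learn's TfidfVectorizer apply smoothing and normalization by default, so their numbers will differ from these.

```python
import math

# Hypothetical mini-corpus, made up for illustration
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]
tokenized = [doc.split() for doc in corpus]


def tf(word, tokens):
    # Occurrences of the word divided by the total words in the document
    return tokens.count(word) / len(tokens)


def idf(word, docs):
    # log(total documents / documents containing the word);
    # assumes the word occurs in at least one document
    containing = sum(1 for tokens in docs if word in tokens)
    return math.log(len(docs) / containing)


def tfidf(word, tokens, docs):
    return tf(word, tokens) * idf(word, docs)


print(tf("the", tokenized[0]))                # 2 of the 6 words are "the"
print(tfidf("the", tokenized[0], tokenized))  # frequent here, but common elsewhere
print(tfidf("mat", tokenized[0], tokenized))  # rarer across the corpus
```

In the first document "the" occurs more often than "mat", yet "mat" gets the higher TF-IDF score because it appears in only one of the three documents.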


References:

https://www.kdnuggets.com/2022/09/convert-text-documents-tfidf-matrix-tfidfvectorizer.html

AI/ML What is CountVectorizer and n-gram analysis


CountVectorizer is a class in the scikit-learn package. It is mainly used for analysing commonly occurring words or phrases in a given set of documents, such as web pages.


The usage is similar to the following:


import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer


pd.set_option('display.max_columns', 10)

pd.set_option('display.max_rows', 10)


df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/gonutrition.csv')

df.head()


# This works like the fit step in machine learning: the vectorizer is fitted to the text we want to analyse


text = df['product_description']

model = CountVectorizer(ngram_range = (1, 1))

matrix = model.fit_transform(text).toarray()

df_output = pd.DataFrame(data = matrix, columns = model.get_feature_names_out())

df_output.T.tail(5)


df_output.shape


We set ngram_range to (1, 1) so that CountVectorizer returns unigrams, or single words. Increasing the ngram_range expands the vocabulary from single words to short phrases of the desired lengths. For example, setting ngram_range to (2, 2) returns bigrams (2-grams), or two-word phrases.


text = df['product_description']

model = CountVectorizer(ngram_range = (2, 2), stop_words='english')

matrix = model.fit_transform(text).toarray()

df_output = pd.DataFrame(data = matrix, columns = model.get_feature_names_out())

df_output.T.tail(5)
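
To show what unigrams and bigrams look like, here is a minimal pure-Python sketch that counts them with collections.Counter. The product descriptions are made up, the ngrams helper is hypothetical, and whitespace splitting is a simplification. This illustrates the idea of n-gram counting only and is not how CountVectorizer is implemented.

```python
from collections import Counter


def ngrams(text, n):
    """Return the list of n-word phrases in a text (whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical product descriptions, made up for illustration
descriptions = [
    "whey protein powder with added vitamins",
    "vegan protein powder",
    "whey protein isolate",
]

unigram_counts = Counter()
bigram_counts = Counter()
for description in descriptions:
    unigram_counts.update(ngrams(description, 1))
    bigram_counts.update(ngrams(description, 2))

print(unigram_counts.most_common(3))
print(bigram_counts.most_common(3))
```

The bigram counts surface phrases such as "whey protein" that are lost when only single words are counted.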


References:

https://practicaldatascience.co.uk/machine-learning/how-to-use-count-vectorization-for-n-gram-analysis#:~:text=CountVectorizer%20will%20tokenize%20the%20data,such%20as%20%E2%80%9Cwhey%20protein%E2%80%9D.


Monday, November 14, 2022

AI/ML What is Swifter package

The swifter package works with pandas data frames and applies functions to them through apply() in an efficient manner, which can reduce computation time considerably. Its documentation claims it can apply functions 100 times faster than regular pandas.

Normally, apply() is used like this:

%time df['square'] = df['num'].apply(lambda x: x ** 2)

This takes around 42 ms.

With swifter, it is applied like this:

%time df['square'] = df['num'].swifter.apply(lambda x: x ** 2)

To import and use this package, do the following:

import pandas as pd

import swifter

pip install -U pandas

The command above updates pandas to the latest version.

References:

https://morioh.com/p/26c8b6f1a4a1

AI/ML What is POS Tagging?

POS (part-of-speech) tagging is a mechanism for marking up the words in a text with their part of speech, based on each word's definition and context.

Some example tags are:

JJS adjective, superlative (largest)

LS list marker

MD modal (could, will)

NN noun, singular (cat, tree)

NNS noun plural (desks)

NNP proper noun, singular (Sarah)

NNPS proper noun, plural (Indians or Americans)


To count how many tokens carry each tag:


from collections import Counter

import nltk

text = "Shiv is one of the best sites to learn WEB, SAP, Ethical Hacking and much more online."

lower_case = text.lower()

tokens = nltk.word_tokenize(lower_case)

tags = nltk.pos_tag(tokens)

counts = Counter(tag for word, tag in tags)

print(counts)



references:

https://www.guru99.com/pos-tagging-chunking-nltk.html

AI/ML Google Colab error: Please use NLTK Downloader to obtain the resources

How to overcome this error?

This happens with most NLTK features that rely on downloadable resources. Download the required resources first:

import nltk

nltk.download('punkt')

nltk.download('wordnet')

nltk.download('omw-1.4')

AI/ML perfplot for performance Plotting

perfplot extends Python's timeit by testing snippets with input parameters (e.g., the size of an array) and plotting the results. It also has an option for live updates of the plot.

The code looks like this:


import string

import numpy as np

import pandas as pd

import perfplot

np.random.seed(123)



def shape(df):

    return df[df.education == 'a'].shape[0]


def len_df(df):

    return len(df[df['education'] == 'a'])


def query_count(df):

    return df.query('education == "a"').education.count()


def sum_mask(df):

    return (df.education == 'a').sum()


def sum_mask_numpy(df):

    return (df.education.values == 'a').sum()


def make_df(n):

    L = list(string.ascii_letters)

    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])

    return df


perfplot.show(

    setup=make_df,

    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],

    n_range=[2**k for k in range(2, 25)],

    logx=True,

    logy=True,

    equality_check=False, 

    xlabel='len(df)')
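
For comparison, here is a rough sketch of the same idea using only the standard-library timeit module: time several functions ("kernels") over a range of input sizes. The two counting functions and the make_input helper are hypothetical stand-ins, and the sketch only prints raw timings where perfplot would plot them.

```python
import timeit


def count_with_loop(values):
    return sum(1 for v in values if v == "a")


def count_with_method(values):
    return values.count("a")


def make_input(n):
    # Hypothetical setup: a list of n letters cycling through "abc"
    return ["abc"[i % 3] for i in range(n)]


kernels = (count_with_loop, count_with_method)
results = {}
for n in [2**k for k in range(4, 12, 2)]:
    data = make_input(n)
    # Both kernels must agree before their timings are worth comparing
    assert count_with_loop(data) == count_with_method(data)
    timings = {}
    for kernel in kernels:
        # Best of 3 repeats, each running the kernel 100 times
        runs = timeit.repeat(lambda: kernel(data), number=100, repeat=3)
        timings[kernel.__name__] = min(runs)
    results[n] = timings

for n, timings in results.items():
    print(n, timings)
```

The loop over input sizes, the agreement check between kernels and the timing table are roughly what the setup, kernels and n_range arguments ask perfplot to handle.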



References:

https://stackoverflow.com/questions/35277075/python-pandas-counting-the-occurrences-of-a-specific-value

https://pypi.org/project/perfplot/