Saturday, April 29, 2023

How to set proxy for subscription manager

# yum clean all

Loaded plugins: product-id, subscription-manager

timed out


Proxy information can be added manually to the subscription-manager configuration file, /etc/rhsm/rhsm.conf, with the following options:


# an http proxy server to use

proxy_hostname =


# port for http proxy server

proxy_port =
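
For example, with a proxy at proxy.example.com on port 3128 (both are placeholder values; substitute your own), the entries would look like:

proxy_hostname = proxy.example.com

proxy_port = 3128

After saving the file, re-run yum clean all to confirm the timeout is gone.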



references:

https://access.redhat.com/solutions/57669

Redhat Attaching subscription manager

I'm getting the following error when trying to install a package: There is no repository enabled in /etc/yum.repos.d.


I realized that to have access to the Red Hat repositories I need an account on the site. I created one and then linked it to my terminal with the commands below.


Below is what helped to get past this issue:


Un-register the system:


sudo subscription-manager remove --all

sudo subscription-manager unregister

sudo subscription-manager clean


Re-register the system:


sudo subscription-manager register

sudo subscription-manager refresh


Search for the Pool ID:


sudo subscription-manager list --available
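
The listing can be long; to pull out just the pool IDs, a simple grep over the output works (the field is labelled "Pool ID"):

sudo subscription-manager list --available | grep "Pool ID"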


Attach the subscription:


sudo subscription-manager attach --pool=<Pool-ID>



https://access.redhat.com/discussions/6394941 

Using kubectl to Create a Deployment

Once you have a running Kubernetes cluster, you can deploy your containerized applications on top of it. To do so, you create a Kubernetes Deployment. The Deployment instructs Kubernetes how to create and update instances of your application. Once you've created a Deployment, the Kubernetes control plane schedules the application instances included in that Deployment to run on individual Nodes in the cluster.


Once the application instances are created, a Kubernetes Deployment controller continuously monitors those instances. If the Node hosting an instance goes down or is deleted, the Deployment controller replaces the instance with an instance on another Node in the cluster. This provides a self-healing mechanism to address machine failure or maintenance.


You can create and manage a Deployment by using the Kubernetes command line interface, kubectl. Kubectl uses the Kubernetes API to interact with the cluster. 



When you create a Deployment, you'll need to specify the container image for your application and the number of replicas that you want to run. You can change that information later by updating your Deployment.
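
For instance, scaling and image updates are both one-liners (shown here against the kubernetes-bootcamp deployment created below; these commands are covered in detail in later modules of the same tutorial):

kubectl scale deployments/kubernetes-bootcamp --replicas=4

kubectl set image deployments/kubernetes-bootcamp kubernetes-bootcamp=jocatalin/kubernetes-bootcamp:v2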


The common format of a kubectl command is: kubectl action resource



This performs the specified action (like create, describe or delete) on the specified resource (like node or deployment). You can use --help after the subcommand to get additional info about possible parameters (for example: kubectl get nodes --help).


Check that kubectl is configured to talk to your cluster, by running the kubectl version command.



Check that kubectl is installed and you can see both the client and the server versions.



To view the nodes in the cluster, run the kubectl get nodes command.


You see the available nodes. Later, Kubernetes will choose where to deploy our application based on the available Node resources.


Let’s deploy our first app on Kubernetes with the kubectl create deployment command. We need to provide the deployment name and app image location (include the full repository URL for images hosted outside Docker Hub).



kubectl create deployment kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1


Great! You just deployed your first application by creating a deployment. This performed a few things for you:


searched for a suitable node where an instance of the application could be run (we have only 1 available node)

scheduled the application to run on that Node

configured the cluster to reschedule the instance on a new Node when needed


To list your deployments use the kubectl get deployments command:


kubectl get deployments


We see that there is 1 deployment running a single instance of your app. The instance is running inside a container on your node.
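
To see the Pod that this deployment created, list the pods as well:

kubectl get pods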


View the app

Pods that are running inside Kubernetes are running on a private, isolated network. By default they are visible from other pods and services within the same kubernetes cluster, but not outside that network. When we use kubectl, we're interacting through an API endpoint to communicate with our application.



The kubectl command can create a proxy that will forward communications into the cluster-wide, private network. The proxy can be terminated by pressing Control-C and won't show any output while it's running.


You need to open a second terminal window to run the proxy.


kubectl proxy


We now have a connection between our host (the online terminal) and the Kubernetes cluster. The proxy enables direct access to the API from these terminals.


You can see all those APIs hosted through the proxy endpoint. For example, we can query the version directly through the API using the curl command:


curl http://localhost:8001/version


The API server will automatically create an endpoint for each pod, based on the pod name, that is also accessible through the proxy.


First we need to get the Pod name, and we'll store it in the environment variable POD_NAME:


export POD_NAME=$(kubectl get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')

echo Name of the Pod: $POD_NAME


You can access the Pod through the proxied API, by running:


curl http://localhost:8001/api/v1/namespaces/default/pods/$POD_NAME/


In order for the new Deployment to be accessible without using the proxy, a Service is required which will be explained in the next modules.
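
As a preview, exposing the deployment as a NodePort Service looks like this (the port matches the bootcamp app; Services are explained properly in the next module):

kubectl expose deployment/kubernetes-bootcamp --type=NodePort --port=8080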


references:

https://kubernetes.io/docs/tutorials/kubernetes-basics/deploy-app/deploy-intro/

Kubernetes is a cluster orchestration system.

Main tasks are


Deploy a containerized application on a cluster.

Scale the deployment.

Update the containerized application with a new software version.

Debug the containerized application.


With modern web services, users expect applications to be available 24/7, and developers expect to deploy new versions of those applications several times a day. Containerization helps package software to serve these goals, enabling applications to be released and updated without downtime. Kubernetes helps you make sure those containerized applications run where and when you want, and helps them find the resources and tools they need to work. Kubernetes is a production-ready, open source platform designed with Google's accumulated experience in container orchestration, combined with best-of-breed ideas from the community.


Creating a cluster 


Kubernetes coordinates a highly available cluster of computers that are connected to work as a single unit. The abstractions in Kubernetes allow you to deploy containerized applications to a cluster without tying them specifically to individual machines. To make use of this new model of deployment, applications need to be packaged in a way that decouples them from individual hosts: they need to be containerized. Containerized applications are more flexible and available than in past deployment models, where applications were installed directly onto specific machines as packages deeply integrated into the host. Kubernetes automates the distribution and scheduling of application containers across a cluster in a more efficient way. Kubernetes is an open-source platform and is production-ready.



A Kubernetes cluster consists of two types of resources:


The Control Plane coordinates the cluster

Nodes are the workers that run applications


The Control Plane is responsible for managing the cluster. The Control Plane coordinates all activities in your cluster, such as scheduling applications, maintaining applications' desired state, scaling applications, and rolling out new updates.


A node is a VM or a physical computer that serves as a worker machine in a Kubernetes cluster. Each node has a Kubelet, which is an agent for managing the node and communicating with the Kubernetes control plane. The node should also have tools for handling container operations, such as containerd or Docker. A Kubernetes cluster that handles production traffic should have a minimum of three nodes because if one node goes down, both an etcd member and a control plane instance are lost, and redundancy is compromised. You can mitigate this risk by adding more control plane nodes.

Control Planes manage the cluster and the nodes that are used to host the running applications.

When you deploy applications on Kubernetes, you tell the control plane to start the application containers. The control plane schedules the containers to run on the cluster's nodes. The nodes communicate with the control plane using the Kubernetes API, which the control plane exposes. End users can also use the Kubernetes API directly to interact with the cluster.


A Kubernetes cluster can be deployed on either physical or virtual machines. To get started with Kubernetes development, you can use Minikube. Minikube is a lightweight Kubernetes implementation that creates a VM on your local machine and deploys a simple cluster containing only one node. Minikube is available for Linux, macOS, and Windows systems. The Minikube CLI provides basic bootstrapping operations for working with your cluster, including start, stop, status, and delete.
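
Those basic operations map directly to Minikube subcommands:

minikube start

minikube status

minikube stop

minikube delete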


references:

https://kubernetes.io/docs/tutorials/kubernetes-basics/


What is kubectl

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs. For more information, including a complete list of kubectl operations, see the kubectl reference documentation.


Friday, April 28, 2023

What is Calico?

 





Calico is a third-party solution developed to provide flexibility and simplify configuring Kubernetes network connectivity. It is available on all the major cloud platforms and can be installed on bare metal servers. Managing networks in Kubernetes is a complex job that requires experienced administrators.


Calico is an open-source CNI (Container Network Interface) plugin for network management developed by Tigera. The plugin aims to simplify Kubernetes networking while making it more scalable and secure.


The NetworkPolicy API, the out-of-the-box network policy management solution for Kubernetes, has a restricted set of features. It is limited to a single environment, and users can apply network policies created using this API only to labeled pods. Network rules deal only with protocols and ports and can be applied to pods, environments, and subnets.


Calico improves the default Kubernetes networking experience in the following ways:


Rules can use actions like logging, restricting, or permitting. This feature provides administrators with greater flexibility in network configuration.

Aside from ports and protocols, rules can specify port ranges, IPs, node selectors, etc., allowing for a more granular approach to networking.

It extends the list of Kubernetes objects to which users can apply network policies with containers, interfaces, and virtual machines.

It enables the use of DNAT settings and traffic flow management policies.

Interoperability between Kubernetes and non-Kubernetes workloads is possible.
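
As an illustration of the richer rule actions, a policy written against Calico's own projectcalico.org/v3 API (the names and selectors below are hypothetical) could look roughly like this:

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
  namespace: production
spec:
  selector: role == 'database'
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: role == 'frontend'
      destination:
        ports:
          - 6379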



Wednesday, April 26, 2023

Alma Linux set the ip address permanently

To temporarily set the IP address, below are the commands.

To install ifconfig if not already available:

yum install net-tools

ifconfig ens192 <ip_address>/<subnet_mask>

ifconfig enp0s3 192.168.178.32/24

ifconfig enp0s3 192.168.178.32 netmask 255.255.255.0


To set it permanently, edit the interface configuration files under /etc/sysconfig/network-scripts:

ls -l /etc/sysconfig/network-scripts
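
For example, a static configuration in the matching ifcfg file (the filename and values below are illustrative; newer releases may manage interfaces via NetworkManager/nmcli instead) would contain entries like:

/etc/sysconfig/network-scripts/ifcfg-ens192:

BOOTPROTO=static

IPADDR=192.168.178.32

NETMASK=255.255.255.0

ONBOOT=yes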


references:

https://devconnected.com/how-to-change-ip-address-on-linux/

Linux how to change hostname


Type the following command to edit /etc/hostname using nano or vi text editor:

sudo nano /etc/hostname

Delete the old name and set the new name.

Next Edit the /etc/hosts file:

sudo nano /etc/hosts

Replace any occurrence of the existing computer name with your new one.

Reboot the system for the changes to take effect:

sudo reboot
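
On systemd-based distributions, hostnamectl can do the same in one step (replace newname with your hostname):

sudo hostnamectl set-hostname newname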

references:

https://www.cyberciti.biz/faq/ubuntu-change-hostname-command/

pip set proxy

pip install --proxy http://<usr_name>@<proxyserver_name>:<port#> <pkg_name> 
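
For example, to install the requests package through a hypothetical proxy (the user name, host, and port are placeholders):

pip install --proxy http://alice@proxy.example.com:8080 requests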

references:

https://www.activestate.com/resources/quick-reads/pip-install-proxy/

yum how to set proxy

1. Enter management mode by typing your password and pressing Enter twice. Select Exit to terminal using the arrow keys and then press Enter.

2. Type:

nano /etc/yum.conf


3. Add a line with information about your proxy. For example:

proxy=http://proxysvr.yourdom.com:3128


4. If the proxy requires a username and password, add these settings. For example:

proxy=http://proxysvr.yourdom.com:3128

proxy_username=YourProxyUsername

proxy_password=YourProxyPassword


5. Press Ctrl+X and y to save the changes.

references:

https://www.activestate.com/resources/quick-reads/pip-install-proxy/

curl how to set http-proxy

curl -x http://proxy.myproxy.com:8080 -I https://ip-check.net
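
Alternatively, curl also honors the standard proxy environment variables, so this is equivalent for the rest of the shell session:

export https_proxy=http://proxy.myproxy.com:8080

curl -I https://ip-check.net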

references:

https://help.limeproxies.com/en/articles/5275112-how-to-test-proxies-in-the-linux-command-line-interface

What is vMotion

A hot migration -- also known as a live migration -- is the process of migrating an active workload, such as a VM, from one physical machine to another. A vMotion-enabled service initiates a hot migration with zero downtime. This lets you replace or upgrade parts of the VM without a machine or system shutdown


vMotion can transfer the active memory and precise execution state of the virtual machine over a high-speed network, allowing it to switch from running on the source vSphere host to the destination vSphere host. vMotion keeps the transfer period imperceptible to users by tracking ongoing memory transactions in a bitmap. Once the entire memory and system state have been copied to the target vSphere host, vMotion suspends the source virtual machine, copies the bitmap to the target vSphere host and resumes the virtual machine on the target vSphere host. Transaction integrity is ensured.


references:

https://www.vmware.com/in/products/vsphere/vmotion.html#:~:text=vMotion%20can%20transfer%20the%20active,to%20the%20destination%20vSphere%20host.

Tuesday, April 18, 2023

Rocky Linux how to install python 3.6

 sudo dnf update -y


Executing the command below should display the list of available Python versions.

sudo dnf install python

Now we can select one of the versions:

sudo dnf install python36 -y

python3 -V


references: 

https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-programming-environment-on-rocky-linux-9

Monday, April 17, 2023

What is SELinux

Security-Enhanced Linux (SELinux) is a security architecture for Linux® systems that allows administrators to have more control over who can access the system. It was originally developed by the United States National Security Agency (NSA) as a series of patches to the Linux kernel using Linux Security Modules (LSM).  


SELinux was released to the open source community in 2000, and was integrated into the upstream Linux kernel in 2003.


How does SELinux work?

SELinux defines access controls for the applications, processes, and files on a system. It uses security policies, which are a set of rules that tell SELinux what can or can’t be accessed, to enforce the access allowed by a policy. 


When an application or process, known as a subject, makes a request to access an object, like a file, SELinux checks with an access vector cache (AVC), where permissions are cached for subjects and objects.


If SELinux is unable to make a decision about access based on the cached permissions, it sends the request to the security server. The security server checks for the security context of the app or process and the file. Security context is applied from the SELinux policy database. Permission is then granted or denied. 


If permission is denied, an "avc: denied" message will be available in /var/log/messages.


references:

https://www.redhat.com/en/topics/linux/what-is-selinux


libncurses.so.5 not found

Finally I had to install with a wildcard, like this:

sudo yum install libncurses*

references:

 https://stackoverflow.com/questions/17005654/error-while-loading-shared-libraries-libncurses-so-5

Python File Logging

import logging

# logname is the path of the log file to append to (the value here is illustrative)
logname = "app.log"

logging.basicConfig(filename=logname,
                    filemode='a',
                    format='%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s',
                    datefmt='%H:%M:%S',
                    level=logging.DEBUG)

logging.info("Running Urban Planning")

logger = logging.getLogger('urbanGUI')

references:

https://stackoverflow.com/questions/6386698/how-to-write-to-a-file-using-the-logging-python-module

Sunday, April 16, 2023

What is unirest for NodeJS

Unirest is a set of lightweight HTTP libraries available in multiple languages, built and maintained by Kong, who also maintain the open-source API Gateway Kong.


npm install unirest


unirest

  .post('http://mockbin.com/request')

  .headers({'Accept': 'application/json', 'Content-Type': 'application/json'})

  .send({ "parameter": 23, "foo": "bar" })

  .then((response) => {

    console.log(response.body)

  })



Uploading Files


unirest

  .post('http://mockbin.com/request')

  .headers({'Content-Type': 'multipart/form-data'})

  .field('parameter', 'value') // Form field

  .attach('file', '/tmp/file') // Attachment

  .then(function (response) {

    console.log(response.body)

  })



Custom Entity Body


unirest

  .post('http://mockbin.com/request')

  .headers({'Accept': 'application/json'})

  .send(Buffer.from([1,2,3]))

  .then(function (response) {

    console.log(response.body)

  })


references:

https://github.com/Kong/unirest-nodejs


Kubernetes Concepts ReplicaSets, Deployments, DaemonSets, StatefulSets, Services & EndPoints, Ingress, Jobs and CronJobs

ReplicaSets are the wrappers around Pods that simply let you manage the number of Pods you want running at a given time. Kubernetes claims high availability and fault tolerance, and having multiple pods running helps achieve this. The job of the ReplicaSet is to ensure Kubernetes is running the configured number of Pods.


It’s good to know that ReplicaSets exist and what they do, but you will rarely need to create or manage ReplicaSets directly. Deployments will manage ReplicaSets for you. You normally only need to deal with ReplicaSets when troubleshooting unexpected problems.


Deployments are one of the most frequently used and managed components in Kubernetes. The Deployment manages your ReplicaSet, which in turn manages the number of Pods that should be running in your desired state. However, it also manages a few more things for you such as updates, rollbacks, and scaling.


After you have deployed your Pod for the first time, there’s a good chance you are going to want to update it when you have a code fix or enhancement. Deployments can help manage this update and can even do a gradual rollout so that all of your Pods are not down at the same time. It will keep track of each update which will allow you to perform rollbacks. This can be very useful if something goes wrong with an update and you need to revert quickly.


You can also use Deployments to scale up or scale down the number of Pods you want to have running. Kubernetes also has a Horizontal Pod Autoscaler that can even do this automatically based on various criteria.


DaemonSets ensure that one Pod of a set is running on each node in your cluster. Generally you want Kubernetes Deployments to manage where and how many of your Pods run in your cluster, but there are some good use cases for DaemonSets. Some of these common use cases include monitoring, logging, virus scanning, and intrusion detection.


For example, you may have a Pod that collects system logs and ships them off to an aggregator. This is a great candidate for a DaemonSet if you want to collect the logs from all of the nodes in your cluster.


StatefulSets

StatefulSets look quite a bit like Deployments, but are specialized for Pods that require persistent storage. They provide a more predictable naming convention for your Pods to help with service discovery and ultimately allow you to do clustering or replication between your Pods in a single StatefulSet. If you are running MySQL, PostgreSQL, MongoDB, Cassandra, or Etcd in your cluster, StatefulSets may be a good solution for managing these workloads.



Service and Endpoints

With your Pods running and being managed by Deployments, you'll need a way for your Pods to talk with each other. You may have an e-commerce catalog that needs to talk to shopping cart, order, and billing APIs. That's where Services come to the rescue.


A Service allows you to have a listening port open on your cluster or nodes that will send traffic to your Pod. This also supports high availability and fault tolerance by distributing the traffic to your healthy Pods. Note, the word “healthy”. If your Pod is down and deemed as unhealthy, the Service will automatically stop sending it traffic.


Services manage another Kubernetes component called EndPoints to achieve this. As Pods are created, scaled up, scaled down, crash, and recover, the Service will add and remove EndPoints to these Pods. EndPoints are another component you don’t generally manipulate directly, but may have to inspect to troubleshoot a problem.


Ingress

Ingress can be thought of as the glue between the outside world and your Services. It requires an ingress controller such as Nginx or Traefik to be deployed in your cluster. Ingress can provide you HTTP layer 7 traffic control, including load balancing and fan out, as well as SSL termination/offloading.


In terms of traffic control, you can fan out your requests based on host name or path. For example, requests coming to “/v1/catalog/” can go to your catalog Service, which ultimately routes to your catalog Pods, and requests for “/v1/orders/” can be sent to your orders Service. While you don't need Ingress for Pods to talk to each other, it's key for allowing other users or services on the Internet to reach your Pods.


Jobs and CronJobs

While the majority of workloads are long-lived microservices that you always want running, there is a significant use case for short-lived workloads. Kubernetes has a Job component that allows you to run a Pod with the expectation that it will terminate normally and gracefully in the near future. In the long-lived use case, Kubernetes will automatically restart a Pod that has terminated for any reason. Job-managed Pods may also get retried if they terminate abnormally, but will be marked as completed when they terminate normally.


CronJobs manage Jobs on a scheduled basis, following the same model as a Linux crontab. Using CronJobs, you can schedule a job to run once a day, a week, or a month. More advanced scheduling is possible and very similar to Linux crontab. Running database and file backups, as well as other off-hours system maintenance, are common use cases for CronJobs.
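
For example, a nightly job can be created straight from the command line (kubectl create cronjob is a real subcommand; the name, image, and command here are illustrative). The schedule string follows standard crontab syntax, so "0 2 * * *" means 02:00 every day:

kubectl create cronjob nightly-backup --image=busybox --schedule="0 2 * * *" -- /bin/sh -c "echo running backup"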


references:

https://www.suse.com/c/rancher_blog/introduction-to-kubernetes-workloads/

What is Kubernetes Workloads API

Kubernetes is all about managing your container infrastructure.

In Kubernetes, there is no object, component, or any other kind of construct called a “workload”. However, the term is often used as a general category for tasks and services you want running on your cluster. It might be synonymous with microservices, applications, containers, or processes. Workloads are often long-lived processes, but can also be short-lived on-demand or batch jobs.

Workload Building Blocks

Pods are the most basic building block in Kubernetes when it comes to workloads. A Pod consists of one or more containers. The containers within a pod share networking and host storage, but are isolated in terms of cgroups and kernel namespaces (not to be confused with Kubernetes namespaces).

How many containers should you have in your Pod? There is no limit or restriction, but it’s generally a good practice to only have a single container in your pod when you are first getting started. As you’ll see later on, many Kubernetes constructs allow you to scale Pods, so the fewer containers you put in your Pods, the more granular control you have over scaling your infrastructure. 

As you advance your Kubernetes expertise, you can experiment with init and sidecar containers

references:

https://www.suse.com/c/rancher_blog/introduction-to-kubernetes-workloads/


Thursday, April 13, 2023

What is air-gapped and how does it improve network security

An air gap is a network security measure that implies a physical separation between a secure network and any other computer or network. An air-gapped computer is not directly connected to the Internet, nor is it connected to any other system.

Air gaps have been a common security measure in the critical infrastructure sector, where a cyber attack can disrupt or halt major operations. The systems that deploy gapping normally include:

Military computer systems and networks;

Governmental computer systems and networks;

Financial computer systems and networks;

Industrial control systems;

Nuclear power plants;

Aviation computers;

Medical equipment.

Gapped computers are typically located in secure places, such as in a separate server facility with tight security. As a precaution, air-gapped systems have restricted access, so only a few trusted users can access them.

Types of air gaps

There are three main types of the air gap concept. Let’s see each type in more detail.

Total physical air gaps: this type assumes complete physical separation of a system/device from the network. That means there are no network connections to the device, and if you need to get or load data onto it, you need to go to the storage place directly. You may also need to pass through security, since physical access to the environment where the device is stored is usually restricted.

Isolated air-gapped systems: this type implies that systems/devices are not connected to a common network, but are in the same place (i.e. in one room).

Logical air gaps: are not separated physically from the rest of the system but are isolated from it through encryption and hashing.

references:

https://softteco.com/blog/what-is-air-gap

Wednesday, April 12, 2023

What is syslog?


Syslog is a protocol that computer systems use to send event data logs to a central location for storage. Logs can then be accessed by analysis and reporting software to perform audits, monitoring, troubleshooting, and other essential IT operational tasks.


The go-to logging method since the 1980s, the syslog protocol has maintained its popularity through its ease of use, making it simple and straightforward to transport event log messages.


Perhaps the most convenient feature supporting this simplicity is the layered architecture, which enables users to put across messages using a number of different protocols. Additionally, when users need to provide vendor-specific extensions, the syslog message format allows them to do so within a structured framework.



How does syslog work?

Although it has been popular for decades, syslog hasn’t always been easy to define, due to lack of standardization. In 2009, the IETF standardized syslog, making it possible to sum up the protocol.


There are three layers to syslog: content, application, and transport.


The transport layer sends the message over a network.

The application layer enables the message to be routed around, interpreted, and stored.

The content layer is the actual data contained within the message, which contains several standardized informational elements, including facility codes and severity levels.



Understanding syslog messages

Syslog event messages are generated by individual applications or other components of a system. All syslog messages follow a standard format, which is required for sharing messages between applications. This format includes the following components:


A header that includes specific fields for priority, version, timestamp, hostname, application, process ID and message ID.

Structured data, with data blocks in the key-value format.

A message, UTF-8 encoded, which includes a tag identifying the process that triggered the message, along with the content of the message.



Syslog facility codes

To identify the source of a message, syslog uses a numeric facility code, or simply a “facility,” generated by the originator of the message. These codes originated in Unix systems, and aren’t obvious based on their values. The list below correlates the message code with its facility.


0: kernel messages

1: user-level messages

2: mail system

3: system daemons

4: security/authorization messages

5: messages generated internally by syslog

6: line printer subsystem

7: network news subsystem

8: UUCP subsystem

9: clock daemon

10: security/authorization messages

11: FTP daemon

12: NTP subsystem

13: log audit

14: log alert

15: clock daemon

There are also facility codes 16 through 23, which are designated local use. This means they are used in differing capacities depending on the unique applications or software generating data in your specific system.


Syslog message levels

The syslog message is also tagged with a numeric severity indicator, with 0 being a full-on emergency and 7 used for debug purposes.


0 – Emergency System is Unusable

1 – Alert: Action must be taken immediately

2 – Critical: Critical Conditions

3 – Error: Error Conditions

4 – Warning: Warning Conditions

5 – Notice: Normal but Significant Condition

6 – Informational: Informational messages

7 – Debug: Debug-Level messages
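
The priority value at the start of a syslog header packs both numbers together: priority = facility × 8 + severity. For example, the sample message from RFC 5424 begins with <34>, i.e. facility 4 (security/authorization) and severity 2 (critical); shown here with the optional BOM omitted:

<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed for lonvick on /dev/pts/8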

The communication path of a syslog message includes a message originator, which creates and sends the message, and a collector, which takes in and stores the message (i.e., logging server). It can also include relay points in between, which can involve some data processing as the message is sent on. Syslog messages can also be sent to multiple destinations, based on the originating application’s settings.




Syslog data collection

On the log server side, there are also some concepts to help define the process of collecting syslog data:


Listener: Gathers the syslog data over a UDP port. Because UDP does not acknowledge delivery, a TCP port may be used instead. The listener cannot request data, differentiating it from other collector types.

Database: Syslog can generate large amounts of data, and servers need to be configured to handle the volume.

Software for data handling: Running on top of the server data, software can help automate tasks that are not built in to the syslog process, making the data more usable.


references:
https://www.sumologic.com/syslog/#:~:text=Syslog%20is%20a%20protocol%20that,other%20essential%20IT%20operational%20tasks.

Monday, April 10, 2023

How to install GNOME GUI on top of minimal installation

The tecmint reference is a good one for installing a GUI on top of a minimal installation. Essentially, below are the commands:


dnf group list --installed

dnf group list --available

dnf group install "Workstation"

dnf group install "Server with GUI"


systemctl get-default

systemctl set-default graphical.target

systemctl get-default

reboot

references:

https://www.tecmint.com/install-gui-desktop-rocky-linux-9/

Sunday, April 9, 2023

What is a Hypervisor?

A hypervisor, also known as a virtual machine monitor or VMM, is a type of virtualization software that supports the creation and management of virtual machines (VMs) by separating a computer’s software from its hardware. Hypervisors translate requests between the physical and virtual resources, making virtualization possible. When a hypervisor is installed directly on the hardware of a physical machine, between the hardware and the operating system (OS), it is called a bare metal hypervisor. Some bare metal hypervisors are embedded into the firmware at the same level as the motherboard basic input/output system (BIOS). This is necessary for some systems to enable the operating system on a computer to access and use virtualization software.


Because the bare metal hypervisor separates the OS from the underlying hardware, the software no longer relies on or is limited to specific hardware devices or drivers.  This means bare metal hypervisors allow operating systems and their associated applications to run on a variety of types of hardware. They also allow multiple operating systems and virtual machines (guest machines) to reside on the same physical server (host machine). Because the virtual machines are independent of the physical machine, they can move from machine to machine or platform to platform, shifting workloads and allocating networking, memory, storage, and processing resources across multiple servers according to needs. For example, when an application needs more processing power, it can seamlessly access additional machines through the virtualization software. This results in greater cost and energy efficiency and better performance, using fewer physical machines. 


References:

https://www.vmware.com/in/topics/glossary/content/bare-metal-hypervisor.html

What is Witness VM

A "Witness" is a special VM that monitors the Metro Availability configuration health. The Witness resides in a separate failure domain to provide an outside view that can distinguish a site failure from a network interruption between the Metro Availability sites. It can only be configured on AHV and ESXi hypervisors.


The main functions of a Witness include:


· Making a failover decision in the event of a site or inter-site network failure.

· Avoiding a split-brain condition where the same storage container is active on both sites due to (for example) a WAN failure.

· Handling situations where a single storage or network domain fails.



references:

https://next.nutanix.com/how-it-works-22/witness-vm-and-why-you-might-need-it-38343#:~:text=A%20%22Witness%22%20is%20a%20special,on%20AHV%20and%20ESXi%20hypervisors.

What is Cisco HyperFlex Stretch Cluster deployment

A Hyperflex stretched cluster is a single cluster with geographically distributed nodes. Both sides of the cluster act as primary for certain user VMs. The data for these VMs is replicated synchronously on the other site. Stretched clusters enable you to access the entire cluster even if one of the sites were to completely go down. Typically these sites are connected with a low latency, dedicated, high-speed link between them.


HyperFlex Stretched Cluster enables you to deploy an Active-Active disaster avoidance solution for mission critical workloads requiring high uptime (near zero Recovery Time Objective) and no data loss (zero Recovery Point Objective).


 Prerequisites

Requirements

All the nodes in the cluster should be of the same M5 model (all HX220 M5 or all HX240 M5)

Only M5 nodes are supported in stretch clusters

Stretch clusters are only supported on ESXi HX platforms

Each site should have a minimum of 2 nodes

All the VLANs used on both clusters have to be the same

Stretch cluster configuration requires a Witness VM

Stretch clusters require the same number of IP addresses as a six-node cluster

Only one instance of vCenter is used for a stretch cluster

vCenter with DRS and HA is required for the stretch cluster to work properly


References:

https://www.cisco.com/c/en/us/support/docs/hyperconverged-infrastructure/hyperflex-hx-data-platform/214489-hyperflex-stretch-clusters-deployment-gu.html

What is Optuna

Optuna is a software framework for automating the optimization process of hyperparameters. It automatically searches for and finds optimal hyperparameter values by trial and error for excellent performance. Currently, the software can be used in Python.


Optuna uses a history record of trials to determine which hyperparameter values to try next. Using this data, it estimates a promising area and tries values in that area. Optuna then estimates an even more promising region based on the new result. It repeats this process using the history data of trials completed thus far. Specifically, it employs a Bayesian optimization algorithm called Tree-structured Parzen Estimator.
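
A minimal sketch of this trial loop (the Optuna API as documented; the quadratic objective is a toy stand-in for real model training):

import optuna

# Optuna calls this once per trial; the trial object suggests
# hyperparameter values drawn from the given search space.
def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2  # value to minimize; the optimum is x = 2

study = optuna.create_study()            # TPE sampler is the default
study.optimize(objective, n_trials=100)  # run 100 trials
print(study.best_params)                 # e.g. {'x': 2.001}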



A hyperparameter is a parameter to control how a machine learning algorithm behaves. In deep learning, the learning rate, batch size, and number of training iterations are hyperparameters. Hyperparameters also include the numbers of neural network layers and channels. They are not, however, just numerical values. Things like whether to use Momentum SGD or Adam in training are also regarded as hyperparameters.


It is almost impossible to make a machine learning algorithm do the job without tuning hyperparameters. The number of hyperparameters tends to be high, especially in deep learning, and it is believed that performance largely depends on how we tune them. Most researchers and engineers that use deep learning technology manually tune these hyperparameters and spend a significant amount of their time doing so.




References:

https://odsc.com/blog/optuna-an-automatic-hyperparameter-optimization-framework/#:~:text=Optuna%20is%20a%20software%20framework,can%20be%20used%20in%20Python.

Saturday, April 8, 2023

AI/ML What is Apache Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.


Zeppelin SDK

Not only can you use Zeppelin as an interactive notebook, you can also use it as a JobServer via the Zeppelin SDK (client API & session API).


Spark Interpreter Improved

The Spark interpreter provides a Python & R user experience comparable to Jupyter Notebook.



Flink Interpreter Improved

Flink interpreter is refactored, supports Scala, Python & SQL. Flink 1.10 and afterwards (Scala 2.11 & 2.12) are all supported.


Yarn Interpreter Mode

You can run interpreter in yarn cluster, e.g. you can run Python interpreter in yarn and R interpreter in yarn.


Inline Configuration

The generic ConfInterpreter provides a way to configure interpreters inside each note.


Interpreter Lifecycle Management

The interpreter lifecycle manager automatically terminates interpreter processes on idle timeout, so resources are released when they're not in use.

Multi-purpose Notebook

The Notebook is the place for all your needs


 Data Ingestion

 Data Discovery

 Data Analytics

 Data Visualization & Collaboration


Multiple Language Backend

Apache Zeppelin interpreter concept allows any language/data-processing-backend to be plugged into Zeppelin. Currently Apache Zeppelin supports many interpreters such as Apache Spark, Apache Flink, Python, R, JDBC, Markdown and Shell.


References

https://zeppelin.apache.org/



what is X11 on Linux

X11 is very important for a sysadmin because some applications only install or run with a GUI. This often happens with third-party applications. So in this article, we are talking about how to use X11 forwarding.

This is an awesome tool for every sysadmin who is working remotely via PuTTY and needs to run or install GUI applications.

In most cases we use it for Java and KVM applications with the help of X11.

If you are using it on the remote machine, do not set any environment variables. Just enable X11 forwarding in the SSH settings in PuTTY and follow the steps in the reference.
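
From a Linux or macOS client, the equivalent is OpenSSH's -X flag (the host name is a placeholder); once connected, GUI apps started in that session display locally:

ssh -X user@remote-host

echo $DISPLAY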

references:

https://www.explinux.com/2020/07/how-to-enable-x11-forwarding-in-rhel-8-centos-8.html

Friday, April 7, 2023

Installing Rocky Linux on VM Fusion

The link in the reference is a good one. Mainly:

- Rocky Linux is not listed as an OS type in VMware Fusion

- Need to select the Linux type as Linux > Other Linux 4.x Kernel 64-bit

With this, the installation window comes up properly.

references:

 https://linux.how2shout.com/install-rocky-linux-on-vmware-player-virtual-machine/

Monday, April 3, 2023

what is virbr0

The virbr0, or "Virtual Bridge 0" interface is used for NAT (Network Address Translation). It is provided by the libvirt library, and virtual environments sometimes use it to connect to the outside network.

$ ifconfig

Sample outputs:


virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00  

          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0

          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:39 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0 

          RX bytes:0 (0.0 b)  TX bytes:7921 (7.7 KiB)

  

You can remove the virbr0 bridge; restarting the 'default' network will bring it back. To remove it:


$ sudo ifconfig virbr0 down

$ sudo brctl delbr virbr0

Now start the 'default' network using virsh command.


$ sudo virsh net-start default

This will automatically re-create the virbr0 bridge.



references:

https://gist.github.com/abelardojarab/e10ed30ab69bf9636929e17e3446bc2a

What is Cosine Similarity and Adjusted Cosine Similarity

Cosine Similarity

Cosine similarity is a measurement that quantifies the similarity between two vectors (which are rating vectors in this case).

Adjusted Cosine

Adjusted cosine similarity is a modified version of vector-based similarity that incorporates the fact that different users have different rating schemes. In other words, some users might rate items highly in general, while others might give items lower ratings as a preference. To handle this in the ratings given by users, we subtract each user's average rating from that user's ratings for the different movies.
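
Written out (the standard adjusted cosine formula, with $R_{u,i}$ the rating of user $u$ for item $i$, $\bar{R}_u$ user $u$'s average rating, and $U$ the set of users who rated both items):

$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$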



What is a Zero Vector

 A zero vector, denoted 0, is a vector of length 0, and thus has all components equal to zero. It is the additive identity of the additive group of vectors.

A non-zero vector in a vector space V is a vector that is not equal to the zero vector in V.

In mathematics and physics, a vector space (also called a linear space) is a set whose elements, often called vectors, may be added together and multiplied ("scaled") by numbers called scalars. Scalars are often real numbers, but can be complex numbers or, more generally, elements of any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called vector axioms. The terms real vector space and complex vector space are often used to specify the nature of the scalars: real coordinate space or complex coordinate space.

references

https://en.wikipedia.org/wiki/Vector_space

AI/ML a good method for printing model performance info

 


# Creating a common function which is usable to print the accuracy metrics of different models
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate_performance(actual, pred):
    # Accuracy Score
    acc_score = round(accuracy_score(actual, pred) * 100, 2)

    # Confusion matrix
    confusion = confusion_matrix(actual, pred)

    TP = confusion[1, 1]  # true positives
    TN = confusion[0, 0]  # true negatives
    FP = confusion[0, 1]  # false positives
    FN = confusion[1, 0]  # false negatives

    # Calculating Sensitivity/Recall: TP / (TP + FN)
    sensitivity_recall = round(TP / float(TP + FN), 2)

    # Calculating Specificity: TN / (TN + FP)
    specificity = round(TN / float(TN + FP), 2)

    # Calculating Precision: TP / (TP + FP)
    precision = round(TP / float(TP + FP), 2)

    # Calculating F1 score from the rounded precision and recall
    f1_score = 2 * ((precision * sensitivity_recall) / (precision + sensitivity_recall))
    f1_score = round(f1_score, 2)

    return pd.DataFrame([{"TP": TP, "TN": TN, "FP": FP, "FN": FN,
                          "Recall": sensitivity_recall, "Precision": precision,
                          "Specificity": specificity, "F1-Score": f1_score,
                          "Accuracy": acc_score}])

Sunday, April 2, 2023

How to use GridSearchCV?

sklearn.model_selection.GridSearchCV(estimator, param_grid,scoring=None,

          n_jobs=None, iid='deprecated', refit=True, cv=None, verbose=0, 

          pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False) 


1. estimator: Pass the model instance for which you want to check the hyperparameters.

2. param_grid: the dictionary object that holds the hyperparameters you want to try

3. scoring: the evaluation metric that you want to use; you can simply pass a valid string/object of the evaluation metric

4. cv: number of cross-validation folds to try for each selected set of hyperparameters

5. verbose: you can set it to 1 to get a detailed printout while you fit the data to GridSearchCV

6. n_jobs: number of processes you wish to run in parallel for this task; if it is -1 it will use all available processors.



#import all necessary libraries

import sklearn

from sklearn.datasets import load_breast_cancer

from sklearn.metrics import classification_report, confusion_matrix 

from sklearn.datasets import load_breast_cancer 

from sklearn.svm import SVC 

from sklearn.model_selection import GridSearchCV

from sklearn.model_selection import train_test_split 

 

#load the dataset and split it into training and testing sets

dataset = load_breast_cancer()

X=dataset.data

Y=dataset.target

X_train, X_test, y_train, y_test = train_test_split( 

                        X,Y,test_size = 0.30, random_state = 101) 

# train the model on train set without using GridSearchCV 

model = SVC() 

model.fit(X_train, y_train) 

   

# print prediction results 

predictions = model.predict(X_test) 

print(classification_report(y_test, predictions)) 




# defining parameter range
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': ['scale', 'auto', 1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['linear']}

   

grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3,n_jobs=-1) 

   

# fitting the model for grid search 

grid.fit(X_train, y_train) 

 

# print best parameter after tuning 

print(grid.best_params_) 

grid_predictions = grid.predict(X_test) 

   

# print classification report 

print(classification_report(y_test, grid_predictions)) 

How to do Hyperparameter tuning with GridSearchCV

In almost any Machine Learning project, we train different models on the dataset and select the one with the best performance. However, there is room for improvement, as we cannot say for sure that this particular model is best for the problem at hand. Hence, our aim is to improve the model in any way possible. One important factor in the performance of these models is their hyperparameters; once we set appropriate values for these hyperparameters, the performance of a model can improve significantly. In this article, we will find out how we can find optimal values for the hyperparameters of a model by using GridSearchCV.


GridSearchCV is the process of performing hyperparameter tuning in order to determine the optimal values for a given model. As mentioned above, the performance of a model significantly depends on the value of hyperparameters. Note that there is no way to know in advance the best values for hyperparameters so ideally, we need to try all possible values to know the optimal values. Doing this manually could take a considerable amount of time and resources and thus we use GridSearchCV to automate the tuning of hyperparameters.


GridSearchCV is a function that comes in Scikit-learn's (or SK-learn's) model_selection package. So an important point here to note is that we need to have the Scikit-learn library installed on the computer. This function helps to loop through predefined hyperparameters and fit your estimator (model) on your training set. So, in the end, we can select the best parameters from the listed hyperparameters.


 { 'C': [0.1, 1, 10, 100, 1000],
   'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
   'kernel': ['rbf', 'linear', 'sigmoid'] }

Here C, gamma and kernel are some of the hyperparameters of an SVM model. Note that the rest of the hyperparameters will be set to their default values.


GridSearchCV tries all the combinations of the values passed in the dictionary and evaluates the model for each combination using the Cross-Validation method. Hence after using this function we get accuracy/loss for every combination of hyperparameters and we can choose the one with the best performance.

references:

https://www.mygreatlearning.com/blog/gridsearchcv/#:~:text=GridSearchCV%20is%20a%20technique%20for,parameter%20values%2C%20predictions%20are%20made.