Sunday, June 19, 2022

String concat() vs concat operators

MDN has the following to say about String.prototype.concat():

It is strongly recommended to use the string concatenation operators (+, +=) instead of this method for performance reasons


Also see the linked Stack Overflow answer by @Bergi.


references:

https://stackoverflow.com/questions/16124032/js-strings-vs-concat-method


Pod Install on Apple M1 Chip

# Uninstall the local cocoapods gem
sudo gem uninstall cocoapods

# Reinstall cocoapods via Homebrew
brew install cocoapods

 references:

https://stackoverflow.com/questions/64901180/how-to-run-cocoapods-on-apple-silicon-m1

Installing MongoDB on Apple Mac M1 Pro

Article on how to install MongoDB on an Apple M1 using Homebrew

Install homebrew from https://brew.sh/

Install the Xcode command line tools using

xcode-select --install

Now to install mongodb use

brew tap mongodb/brew

brew install mongodb-community@5.0

To check whether MongoDB has been installed, use

mongo --version

To start MongoDB as a macOS service, use

brew services start mongodb-community@5.0

and to stop the MongoDB background service, use

brew services stop mongodb-community@5.0

Or, if you don't want/need a background service you can just run:

mongod --config /opt/homebrew/etc/mongod.conf

To run MongoDB shell commands, open a new terminal tab and run mongo

To list your databases, run show dbs

Full documentation here: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/

references:

https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/

https://stackoverflow.com/questions/65357744/how-to-install-mongodb-on-apple-m1-chip

Sunday, June 12, 2022

AI/ML: Types of ensemble methods

BAGGing, or Bootstrap AGGregating, gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are pulled. A Decision Tree is formed on each of the bootstrapped subsamples. After each subsample Decision Tree has been formed, an algorithm is used to aggregate over the Decision Trees to form the most efficient predictor. The image below will help explain:


Random Forest models can be thought of as BAGGing with a slight tweak. When deciding where to split and how to make decisions, BAGGed Decision Trees have the full set of features at their disposal to choose from. Therefore, although the bootstrapped samples may be slightly different, the data is largely going to break off at the same features throughout each model. In contrast, Random Forest models decide where to split based on a random selection of features. Rather than splitting on similar features at each node, Random Forest models introduce a level of differentiation because each tree splits based on different features. This greater differentiation provides a larger ensemble to aggregate over, producing a more accurate predictor. Refer to the image for a better understanding.


Similar to BAGGing, bootstrapped subsamples are pulled from a larger dataset. A decision tree is formed on each subsample. HOWEVER, the decision tree is split on different features (in this diagram the features are represented by shapes).
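As a rough illustration of the difference, here is a minimal scikit-learn sketch on a synthetic dataset (names and parameters are illustrative only): plain bagging uses its default decision-tree base estimator with all features available at every split, while the random forest restricts each split to a random subset of features via max_features.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# synthetic classification problem (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# BAGGing: bootstrapped samples; the default base estimator is a decision tree,
# and every split may consider all features
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Random forest: bootstrapped samples, but each split considers only a
# random subset of the features (max_features)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

print("Bagged trees :", cross_val_score(bagging, X, y, cv=5).mean())
print("Random forest:", cross_val_score(forest, X, y, cv=5).mean())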


References:

https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f


AI/ML: Ensemble methods in machine learning

Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model.

A Decision Tree determines the predictive value based on a series of questions and conditions. For instance, consider a simple Decision Tree that determines whether an individual should play outside or not. The tree takes several weather factors into account and, given each factor, either makes a decision or asks another question.

When making Decision Trees, there are several factors we must take into consideration: on which features do we base our decisions? What is the threshold for classifying each question into a yes or no answer? In the first Decision Tree, what if we wanted to ask ourselves whether we have friends to play with or not? If we have friends, we will play every time; if not, we might continue to ask ourselves questions about the weather. By adding an additional question, we hope to define the Yes and No classes more precisely.

This is where Ensemble Methods come in handy! Rather than just relying on one Decision Tree and hoping we made the right decision at each split, Ensemble Methods allow us to take a sample of Decision Trees into account, calculate which features to use or questions to ask at each split, and make a final predictor based on the aggregated results of the sampled Decision Trees.
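To make this concrete, here is a minimal, hedged sketch on synthetic data (scikit-learn) comparing a single decision tree with an ensemble that combines several different base models through voting:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# synthetic classification problem (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

single_tree = DecisionTreeClassifier(max_depth=5, random_state=0)

# ensemble combining several base models; the final prediction aggregates their votes
ensemble = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
])

print("Single decision tree:", cross_val_score(single_tree, X, y, cv=5).mean())
print("Voting ensemble     :", cross_val_score(ensemble, X, y, cv=5).mean())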

References:

https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f


AI/ML: What is the Durbin-Watson Statistic

The Durbin Watson (DW) statistic is a test for autocorrelation in the residuals from a statistical model or regression analysis. 

The Durbin-Watson statistic always has a value between 0 and 4. A value of 2.0 indicates there is no autocorrelation detected in the sample. Values from 0 to less than 2 point to positive autocorrelation, and values from 2 to 4 indicate negative autocorrelation.

Autocorrelation, also known as serial correlation, can be a significant problem in analyzing historical data if one does not know to look out for it. For instance, since stock prices tend not to change too radically from one day to another, the prices from one day to the next could potentially be highly correlated, even though there is little useful information in this observation. In order to avoid autocorrelation issues, the easiest solution in finance is to simply convert a series of historical prices into a series of percentage-price changes from day to day.
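As a quick sketch of how this is computed in practice, statsmodels provides a durbin_watson helper that can be applied to regression residuals (the data below is synthetic and purely illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)

# build noise with deliberate serial correlation
noise = rng.normal(size=100)
noise = noise + 0.8 * np.roll(noise, 1)

y = 2.0 * x + 5.0 + noise

# fit an OLS model and test its residuals
model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))  # ~2 = no autocorrelation, <2 positive, >2 negative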

references:

https://www.investopedia.com/terms/d/durbin-watson-statistic.asp

Thursday, June 9, 2022

Angular making input variable Optional

function example(x: number | null, y?: string) {

 // …

}

“?” Suffix

The first parameter is nullable: it accepts either a number or null. The second is optional: it can be a string or undefined.

Undefined vs Null

Again, some may not be aware of the difference between "undefined" and "null". Undefined represents something that has not been assigned or may not exist, whereas null is an explicit value meaning "no value". So the two are entirely different.

@Optional? As the official document says, Optional is a constructor parameter decorator that marks a dependency as optional. This clearly states that @Optional is used in the context of DI (Dependency Injection). 

@Injectable()

class Car {

    constructor(@Optional() public engine: Engine) {}

}

When @Optional is used, no exception is thrown even when the injected dependency cannot be found; the injector treats the dependency as optional and supplies null. The same does not happen if we simply replace the @Optional annotation with "?": the injector still searches for the dependency and throws an exception when it cannot resolve it.

Even though swapping one for the other sometimes makes no difference in how your code executes, it is important to understand the distinction, and it is always good to follow the recommended best practices.

references:

https://medium.com/@angela.amarapala/understanding-the-usage-of-optional-and-nullable-in-typescript-826c1754df3




What is the difference between Promise.all() and Promise.allSettled()

Promise.all will reject as soon as one of the Promises in the array rejects.

Promise.allSettled will never reject - it will resolve once all Promises in the array have either rejected or resolved.

Their resolve values are different as well. Promise.all will resolve to an array of each of the values that the Promises resolve to - eg [Promise.resolve(1), Promise.resolve(2)] will turn into [1, 2]. Promise.allSettled will instead give you [{ status : 'fulfilled', value: 1 }, { status : 'fulfilled', value: 2 }].

Promise.all([Promise.resolve(1), Promise.resolve(2)])

  .then(console.log);

Promise.allSettled([Promise.resolve(1), Promise.resolve(2)])

  .then(console.log);

If one of the Promises rejects, the Promise.all will reject with a value of the rejection, but Promise.allSettled will resolve with an object of { status: 'rejected', reason: <error> } at that place in the array.

Promise.all([Promise.reject(1), Promise.resolve(2)])

  .catch((err) => {

    console.log('err', err);

  });

Promise.allSettled([Promise.reject(1), Promise.resolve(2)])

  .then(console.log);

References

https://stackoverflow.com/questions/59784175/differences-between-promise-all-and-promise-allsettled-in-js

Wednesday, June 8, 2022

Pandas Dataframe tips

=====

 df_appmtns.apply(filter_date, axis=1) 

The axis argument is very important here: with the default axis=0 the function is applied column by column (each column is passed as a Series), whereas with axis=1 it is applied row by row (each row is passed as a Series).
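A tiny self-contained illustration (synthetic columns, not the original df_appmtns):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (default): the function receives each COLUMN as a Series
print(df.apply(lambda col: col.sum()))                     # a -> 6, b -> 60

# axis=1: the function receives each ROW as a Series
print(df.apply(lambda row: row['a'] + row['b'], axis=1))   # 11, 22, 33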

===== 

When a Python datetime is stored in a DataFrame, it is stored using pandas' own Timestamp type.

dtVal = parser.parse(<date_str>)

then, after df['dateval'] = dtVal

the type of the date field becomes

<class 'pandas._libs.tslibs.timestamps.Timestamp'>

So, when trying to filter rows based on a datetime object, the comparison value has to be a datetime/Timestamp as well.

It may be easier to store a milliseconds value for that purpose.

A pandas Timestamp can be converted back to a plain datetime using
bookTs.to_pydatetime()
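A small illustrative sketch of the above (the column and variable names are made up):

import pandas as pd
from datetime import datetime
from dateutil import parser

df = pd.DataFrame({'name': ['a', 'b']})
df['dateval'] = parser.parse('2022-06-19 10:00:00')

# values come back as pandas Timestamps
print(type(df['dateval'].iloc[0]))

# filtering works once the comparison value is a datetime/Timestamp too
cutoff = datetime(2022, 6, 1)
print(df[df['dateval'] > cutoff])

# converting a single Timestamp back to a plain datetime
print(df['dateval'].iloc[0].to_pydatetime())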

===================

In order to iterate over the rows of a DataFrame and update one of its columns, the following can be done

for i, row in df_appmtns.iterrows():
    df_appmtns.at[i,'t'] = row['bookedAt']
    df_appmtns.at[i,'week_label'] = get_week_index(row['bookedAt'])

Just doing row['week_label'] = 'value' does not work, because row is a copy of the row data, not a view into the DataFrame.

=========================
To append two DataFrames one below the other, the following can be used

frames = [df_appmtns,df_data]
df_analyze = pd.concat(frames)


=================================

Dataframe dropping a column
df_data.drop('Id', inplace=True, axis=1)

===================================

DataFrame get all numerical columns 

def is_int_dtype(_type):
    return (_type == np.int64 or _type == np.float64) 

numeric_cols = [col for col in df_data.columns if is_int_dtype(df_data[col].dtype)]
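An alternative that should do the same job is pandas' built-in select_dtypes (shown here on a small made-up frame):

import pandas as pd

df_data = pd.DataFrame({'Id': ['x', 'y'], 'count': [1, 2], 'score': [0.5, 0.7]})

# pick every numeric column (ints and floats) in one call
numeric_cols = df_data.select_dtypes(include='number').columns.tolist()
print(numeric_cols)   # ['count', 'score']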

====================================

DataFrame: how to append a row to the end of a DataFrame (this assumes a default RangeIndex)
df.loc[len(df)] = [line, target]

AI/ML: Boston dataset Linear and Ridge regression


#Below is how to get the train and test set 

#Pick the first 11 columns as features; the last column is the target.


#preview

features = boston_df.columns[0:11]

target = boston_df.columns[-1]


#X and y values

X = boston_df[features].values

y = boston_df[target].values


#using test_train_split, get the train and test set. 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=17)


print(" X_train dimension is {}".format(X_train.shape))

print("X_test dimension is {}".format(X_test.shape))



#Scale features. Using standard scaler 

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)




#Now do the linear regression 

#Model

lr = LinearRegression()


#Fit model

lr.fit(X_train, y_train)


#predict

prediction = lr.predict(X_test)


#actual

actual = y_test


train_score_lr = lr.score(X_train, y_train)

test_score_lr = lr.score(X_test, y_test)


print("LR Model train score {}".format(train_score_lr))

print("LR model test score {}".format(test_score_lr))



#Ridge Regression Model, pick Alpha as 10 

ridgeReg = Ridge(alpha=10)


ridgeReg.fit(X_train,y_train)


#train and test scorefor ridge regression

train_score_ridge = ridgeReg.score(X_train, y_train)

test_score_ridge = ridgeReg.score(X_test, y_test)


print("\nRidge Model............................................\n")

print("Ridge model train score {}".format(train_score_ridge))

print("Ridge model test score {}".format(test_score_ridge))


Using an alpha value of 10, the train and test scores indicate better performance for the ridge model than for the linear regression model.


Instead of always picking alpha as 10, it is possible to choose it by cross-validation


#Ridge cross-validation

ridge_cv = RidgeCV(alphas = [0.0001, 0.001,0.01, 0.1, 1, 10]).fit(X_train, y_train)


#score

print("Ridge model train score {}".format(ridge_cv.score(X_train, y_train)))

print("Ridge Model test score {}".format(ridge_cv.score(X_test, y_test))) 



references:

https://www.datacamp.com/tutorial/tutorial-lasso-ridge-regression#data%20importation%20and%20eda

AI/ML: What is Boston dataset

This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts.

 It was obtained from the StatLib archive (http://lib.stat.cmu.edu/datasets/boston), and has been used extensively throughout the literature to benchmark algorithms. However, these comparisons were primarily done outside of Delve and are thus somewhat suspect. The dataset is small in size with only 506 cases.


The name for this dataset is simply boston. It has two prototasks: nox, in which the nitrous oxide level is to be predicted; and price, in which the median value of a home is to be predicted


There are 14 attributes in each case of the dataset. They are:

CRIM - per capita crime rate by town

ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS - proportion of non-retail business acres per town.

CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

NOX - nitric oxides concentration (parts per 10 million)

RM - average number of rooms per dwelling

AGE - proportion of owner-occupied units built prior to 1940

DIS - weighted distances to five Boston employment centres

RAD - index of accessibility to radial highways

TAX - full-value property-tax rate per $10,000

PTRATIO - pupil-teacher ratio by town

B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

LSTAT - % lower status of the population

MEDV - Median value of owner-occupied homes in $1000's
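For reference, a commonly used recipe for loading the raw data directly from the StatLib URL above (assuming the file layout is unchanged: 22 header lines, then each record split across two physical lines):

import numpy as np
import pandas as pd

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

# each record spans two lines, so stitch the two halves back together
features = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])   # 13 feature columns
target = raw_df.values[1::2, 2]                                          # MEDV
print(features.shape, target.shape)   # (506, 13) (506,)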




references:

https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

Tuesday, June 7, 2022

AI/ML: Ridge Regression

Like lasso regression, ridge regression constrains the coefficients by introducing a penalty factor. However, while the lasso penalty uses the magnitude (absolute value) of the coefficients, the ridge penalty uses their square.

Ridge regression is also referred to as L2 Regularization

Why Lasso can be Used for Model Selection, but not Ridge Regression



Considering the geometry of the lasso (left) and ridge (right) models, the elliptical contours (red circles) represent the cost function for each. Relaxing the constraint introduced by the penalty factor enlarges the constrained region (the diamond or the circle). Doing this continually, we eventually reach the centre of the ellipse, where the results of both the lasso and ridge models are similar to a linear regression model.

However, both methods determine coefficients by finding the first point where the elliptical contours hit the region of constraints. Since lasso regression takes a diamond shape in the plot for the constrained region, each time the elliptical regions intersect with these corners, at least one of the coefficients becomes zero. This is impossible in the ridge regression model as it forms a circular shape and therefore values can be shrunk close to zero, but never equal to zero.
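A quick scikit-learn sketch of that difference (synthetic data, same penalty strength for both models): lasso drives some coefficients exactly to zero, while ridge only shrinks them.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=10, random_state=0)

lasso = Lasso(alpha=1.0, max_iter=10000).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients equal to zero:", np.sum(lasso.coef_ == 0))   # typically several
print("Ridge coefficients equal to zero:", np.sum(ridge.coef_ == 0))   # typically none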

references:

https://online.stat.psu.edu/stat508/book/export/html/749

https://www.datacamp.com/tutorial/tutorial-lasso-ridge-regression#data%20importation%20and%20eda


AI/ML: What is regularization

When it comes to training models, there are two major problems one can encounter: overfitting and underfitting.

Overfitting happens when the model performs well on the training set but not so well on unseen (test) data.

Underfitting happens when it neither performs well on the train set nor on the test set.

Regularization is implemented to avoid overfitting the data, especially when there is a large gap between train and test set performance. With regularization, the number of features used in training is kept constant, yet the magnitude of the coefficients (m), as seen in the equation below, is reduced.

Equation is 

y(hat)  = m1 * x1  + m2 * x2 + ... + mn * xn  + b 

There are different ways of reducing model complexity and preventing overfitting in linear models. This includes ridge and lasso regression models.
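In equation form (standard textbook notation, reusing the m coefficients from the equation above), both models minimise the sum of squared errors plus a penalty on the coefficients:

% penalized least-squares objectives
\begin{aligned}
\text{Ridge (L2):} \quad & \sum_{i=1}^{N}\bigl(y_i - \hat{y}_i\bigr)^2 \;+\; \alpha \sum_{j=1}^{n} m_j^{2} \\
\text{Lasso (L1):} \quad & \sum_{i=1}^{N}\bigl(y_i - \hat{y}_i\bigr)^2 \;+\; \alpha \sum_{j=1}^{n} \lvert m_j \rvert
\end{aligned}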

references:

https://www.datacamp.com/tutorial/tutorial-lasso-ridge-regression#data%20importation%20and%20eda

AI/ML: Lasso regression basics

Lasso is a regularization technique used for feature selection via a shrinkage method, also referred to as the penalized regression method. Lasso is short for Least Absolute Shrinkage and Selection Operator, and it is used both for regularization and for model selection. If a model uses the L1 regularization technique, it is called lasso regression.


Lasso Regression for Regularization

In this shrinkage technique, the coefficients determined by the linear model are shrunk towards a central point (the mean) by introducing a penalization factor called alpha, α (sometimes lambda).


Alpha (α) is the penalty term that denotes the amount of shrinkage (or constraint) implemented in the equation. With alpha set to zero, the model is equivalent to ordinary linear regression; a larger value penalizes the optimization function more heavily. Lasso regression therefore shrinks the coefficients, which helps to reduce model complexity and multicollinearity.


Alpha (α) can be any real-valued number between zero and infinity; the larger the value, the more aggressive the penalization is


Lasso Regression for Model Selection


Because coefficients are shrunk towards a mean of zero, less important features in a dataset are eliminated when penalized. The shrinkage of these coefficients, based on the alpha value provided, leads to a form of automatic feature selection, as input variables are effectively removed from the model.
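A minimal sketch of that effect (synthetic data, scikit-learn): as alpha grows, more lasso coefficients become exactly zero, i.e. more input variables are dropped.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=10, random_state=0)

for alpha in [0.01, 1, 10, 100]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    print("alpha =", alpha, "-> non-zero coefficients:", np.sum(model.coef_ != 0))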

references:

https://www.datacamp.com/tutorial/tutorial-lasso-ridge-regression#data%20importation%20and%20eda

Friday, June 3, 2022

AI/ML: Dataframe aggregation and plotting bar and stacked bar charts

 test_data = [ 

    {'week_label' : 'W1', 'user' : 'user1', 'evt' : 'e1', 'dt' : 100},

    {'week_label' :  'W1', 'user' : 'user2', 'evt' : 'e2', 'dt' : 80},

    {'week_label' :  'W1', 'user' : 'user3', 'evt' : 'e3', 'dt' : 10},

    {'week_label' :  'W1', 'user' : 'user4', 'evt' : 'e4', 'dt' : 102},

    {'week_label' :  'W1', 'user' : 'user5', 'evt' : 'e5', 'dt' : 78},

    {'week_label' :  'W1', 'user' : 'user6', 'evt' : 'e6', 'dt' : 999},

    {'week_label' :  'W2', 'user' : 'user7', 'evt' : 'e7', 'dt' : 23},

    {'week_label' : 'W2', 'user' : 'user1', 'evt' : 'e1', 'dt' : 100},

    {'week_label' : 'W2', 'user' : 'user2', 'evt' : 'e2', 'dt' : 80},

    {'week_label' : 'W2', 'user' : 'user3', 'evt' : 'e3', 'dt' : 10},

    {'week_label' : 'W2', 'user' : 'user4', 'evt' : 'e4', 'dt' : 102},

    {'week_label' : 'W2', 'user' : 'user5', 'evt' : 'e5', 'dt' : 78},

    {'week_label' : 'W3', 'user' : 'user7', 'evt' : 'e7', 'dt' : 23},

    {'week_label' : 'W3', 'user' : 'user1', 'evt' : 'e1', 'dt' : 100},

    {'week_label' : 'W3', 'user' : 'user2', 'evt' : 'e2', 'dt' : 80},

    {'week_label' : 'W4', 'user' : 'user3', 'evt' : 'e3', 'dt' : 10},

    {'week_label' : 'W4', 'user' : 'user4', 'evt' : 'e4', 'dt' : 102},

    {'week_label' : 'W4', 'user' : 'user5', 'evt' : 'e5', 'dt' : 78}

]

test_df = pd.DataFrame(test_data)

test_df_an=test_df.groupby('week_label').agg(['count']).reset_index()

print('test_df_an agg \n',test_df_an)


# Below gives a bar chart

test_df_an.plot(x='week_label', y='evt',kind="bar")


# Below gives a stacked version of it

test_df_an[['evt','dt','user']].plot(kind='bar', stacked=True)


References

https://stackoverflow.com/questions/50594613/how-to-plot-aggregated-by-date-pandas-dataframe

https://stackoverflow.com/questions/23415500/pandas-plotting-a-stacked-bar-chart


AI/ML: Using dataframe to group and summarise

data.groupby(['month']).groups.keys()

Out[59]: ['2014-12', '2014-11', '2015-02', '2015-03', '2015-01']

len(data.groupby(['month']).groups['2014-11'])

Out[61]: 230


data.groupby('month').first()

==> This gives first row of each month 


data.groupby('month')['duration'].sum()

===> This gives sum by each month 


data.groupby('month')['date'].count() 

===> This gives entries in each month 



data.groupby('month')['duration'].sum() 

===> produces Pandas Series


data.groupby('month')[['duration']].sum()

===> Produces Pandas DataFrame



data.groupby('month', as_index=False).agg({"duration": "sum"})

===> The groupby output will have an index or multi-index on rows corresponding to your chosen grouping variables. To avoid setting this index, pass “as_index=False” to the groupby operation.


    

df_analyze.groupby(['week_label','evt']).agg({'evt':'count','t' : 'sum'})

===> This is powerful: using the agg function, it returns the evt field as a count and t as a sum



agg_procedure = {

    'evt':'count',

    't' : 'sum'

}

df_analyze.groupby(['week_label','evt']).agg(agg_procedure)


===> This is equivalent to the call above, with the aggregation defined up front as a dictionary



df_analyze.groupby(['week_label','evt']).agg({

    'dt' : ['min','max', 'sum'],

    'evt' : 'count',

    't' : ['min', 'first', 'nunique']

})




references:

https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/


AI/ML: seaborn heat map correlation between dependent and independent variables

What we often want to create is a colored map that shows the strength of the correlation between every independent variable that we want to include in our model and the dependent variable.

The following code returns the correlation of all features with ‘Sale Price’, a single, dependent variable, sorted by ‘Sale Price’ in a descending manner.

dataframe.corr()[['Sale Price']].sort_values(by='Sale Price', ascending=False)

plt.figure(figsize=(8, 12))

heatmap = sns.heatmap(dataframe.corr()[['Sale Price']].sort_values(by='Sale Price', ascending=False), vmin=-1, vmax=1, annot=True, cmap='BrBG')

heatmap.set_title('Features Correlating with Sales Price', fontdict={'fontsize':18}, pad=16);

references:

https://medium.com/@szabo.bibor/how-to-create-a-seaborn-correlation-heatmap-in-python-834c0686b88e

AI/ML: Seaborn correlation map - some useful options

sns.heatmap(df.corr(),annot=True);

When annot is set to True, the correlation value is displayed inside each cell.

plt.figure(figsize=(16, 6))

When figsize is specified, the plot is drawn within those dimensions.

# Store heatmap object in a variable to easily access it when you want to include more features (such as title).

# Set the range of values to be displayed on the colormap from -1 to 1, and set the annotation to True to display the correlation values on the heatmap.

heatmap = sns.heatmap(dataframe.corr(), vmin=-1, vmax=1, annot=True)

# Give a title to the heatmap. Pad defines the distance of the title from the top of the heatmap.

heatmap.set_title('Correlation Heatmap', fontdict={'fontsize':12}, pad=12);

A diverging color palette that has markedly different colors at the two ends of the value-range with a pale, almost colorless midpoint, works much better with correlation heatmaps than the default colormap. 

plt.figure(figsize=(16, 6))

heatmap = sns.heatmap(dataframe.corr(), vmin=-1, vmax=1, annot=True, cmap='BrBG')

heatmap.set_title('Correlation Heatmap', fontdict={'fontsize':18}, pad=12);

# save heatmap as .png file

# dpi - sets the resolution of the saved image in dots/inches

# bbox_inches - when set to 'tight' - does not allow the labels to be cropped

plt.savefig('heatmap.png', dpi=300, bbox_inches='tight')


Triangle Correlation Heatmap

Take a look at any of the correlation heatmaps above. If you cut away half of it along the diagonal line marked by 1-s, you would not lose any information. Let’s cut the heatmap in half, then, and keep only the lower triangle.



The Seaborn heatmap ‘mask’ argument comes in handy when we want to cover part of the heatmap.


Mask — takes a boolean array or a dataframe as an argument; when defined, cells become invisible for values where the mask is True



Let’s use the np.triu() numpy function to isolate the upper triangle of a matrix while turning all the values in the lower triangle into 0. (The np.tril() function would do the same, only for the lower triangle.) Using the np.ones_like() function will change all the isolated values into 1.

np.triu(np.ones_like(dataframe.corr()))


plt.figure(figsize=(16, 6))

# define the mask to set the values in the upper triangle to True

mask = np.triu(np.ones_like(dataframe.corr(), dtype=bool))

heatmap = sns.heatmap(dataframe.corr(), mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')

heatmap.set_title('Triangle Correlation Heatmap', fontdict={'fontsize':18}, pad=16);


references:

https://medium.com/@szabo.bibor/how-to-create-a-seaborn-correlation-heatmap-in-python-834c0686b88e




AI/ML: Various ways for converting dictionary to dataframe

np.random.seed(0)

data = pd.DataFrame(np.random.choice(10, (3, 4)), columns=list('ABCD')).to_dict('records')



# The following methods all produce the same output.

pd.DataFrame(data)

pd.DataFrame.from_dict(data)

pd.DataFrame.from_records(data)



Dictionary Orientations: orient='index'/'columns'

There are two primary types: "columns", and "index".


orient='columns'

Dictionaries with the "columns" orientation will have their keys correspond to columns in the equivalent DataFrame.


pd.DataFrame.from_dict(data, orient='columns')

  A  B  C  D

0  5  0  3  3

1  7  9  3  5

2  2  4  7  6


pd.DataFrame.from_records, the orientation is assumed to be "columns" (you cannot specify otherwise), and the dictionaries will be loaded accordingly.



orient='index'

With this orient, keys are assumed to correspond to index values. This kind of data is best suited for pd.DataFrame.from_dict.


data_i ={

 0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},

 1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},

 2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}



pd.DataFrame.from_dict(data_i, orient='index')


   A  B  C  D

0  5  0  3  3

1  7  9  3  5

2  2  4  7  6


Setting Custom Index

If you need a custom index on the resultant DataFrame, you can set it using the index=... argument.


pd.DataFrame(data, index=['a', 'b', 'c'])

# pd.DataFrame.from_records(data, index=['a', 'b', 'c'])


   A  B  C  D

a  5  0  3  3

b  7  9  3  5

c  2  4  7  6


This is not supported by pd.DataFrame.from_dict.


https://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-a-pandas-dataframe


AI/ML: Dataframe correlation analysis using heatmap

import pandas as pd

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

df = pd.read_csv('train.csv')

# print(df.columns)

# df.corr()

df = df[['OverallQual', 'TotalBsmtSF', 'GarageArea', 'GarageCars','SalePrice']]

# print(df['GarageCars'].value_counts())

print(df.dtypes)

sns.heatmap(df.corr());


This is a very small example. The dataset is from Kaggle: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data




Thursday, June 2, 2022

Angular: How to pass data from Child to parent component

Register the EventEmitter in your child component as the @Output:

@Output() onDatePicked = new EventEmitter<any>();

Emit value on click:


public pickDate(date: any): void {

    this.onDatePicked.emit(date);

}

Listen for the events in your parent component's template:

<div>

    <calendar (onDatePicked)="doSomething($event)"></calendar>

</div>

and in the parent component:

public doSomething(date: any):void {

    console.log('Picked date: ', date);

}

It's also well explained in the official docs

https://angular.io/guide/component-interaction

references:

https://angular.io/guide/component-interaction

https://stackoverflow.com/questions/42107167/how-to-pass-data-from-child-to-parent-component-angular

Electron How to clear Cookies

session.defaultSession.clearStorageData([], (data) => {})

Note that the session has to be the one attached to the BrowserWindow, so this data is normally accessible only at the point where the session is attached to the browser window.

references:

https://stackoverflow.com/questions/37520322/electronjs-how-to-clear-all-cookies-from-a-session

HIPAA security officer requirements

All Covered Entities and Business Associates are required by 45 CFR 164.308 – the Administrative Safeguards of the HIPAA Security Rule – to identify a HIPAA Security Officer who is responsible for the development and implementation of policies and procedures to ensure the integrity of electronic Protected Health Information (ePHI). The role of HIPAA Security Officer is often designated to an IT Manager due to the perception that the integrity of ePHI is an IT issue. However, this is not necessarily the case.

Although the Technical Safeguards of the HIPAA Security Rule relate to restricting access to systems on which ePHI is maintained and to transmission security, only about 30% of a HIPAA Security Officer's responsibilities are IT-related. The remainder of his or her responsibilities relate to training, auditing, incident management and overseeing Business Associate compliance. A HIPAA Security Officer is also responsible for facility security and the preparation of a Disaster Recovery Plan.

The Responsibilities of a HIPAA Security Officer

The HIPAA Security Rule stipulates that the person designated the role of HIPAA Security Officer must implement policies and procedures to prevent, detect, contain, and correct breaches of ePHI. Before developing the policies and procedures, the HIPAA Security Officer has to conduct and chronicle risk assessments covering every element of the Security Rule's Technical, Physical and Administrative Safeguards.

Once the risks to the integrity of ePHI have been identified, a HIPAA Security Officer must implement measures "to reduce risks and vulnerabilities to a reasonable and appropriate level to comply with 45 CFR 164.306(a)". Employees have to be trained on any new work practices that are introduced and be informed of the sanctions for failing to comply with the new policies and procedures. In order to enforce the sanctions policy, a system for reviewing information system activity also has to be implemented.

HIPAA Security Officer Job Description

A HIPAA Security Officer job description needs to outline the Officer's responsibilities with regard to establishing and maintaining HIPAA-compliant mechanisms for ensuring the confidentiality, integrity and accessibility of healthcare information systems. These responsibilities will vary according to the nature and size of the organization, but should include:

Responsibilities for establishing, managing and enforcing the Security Rule safeguards and any subsequent rules issued by OCR.
Responsibilities for integrating IT security and HIPAA compliance with the organization´s business strategies and requirements.
Responsibilities for addressing issues related to access controls, business continuity, disaster recovery, and incident response.
Responsibilities for organizational security awareness, including staff training in collaboration with the HIPAA Privacy Officer.
Responsibilities for conducting risk assessments and audits – especially with regard to Business Associates and other third parties.
Responsibilities for investigating data breaches and implementing measures for their future prevention and/or containment.
The HIPAA Security Officer will often work alongside the HIPAA Privacy Officer and, in larger organizations, the HIPAA Compliance Team. There are many areas of the Security and Privacy Rules that overlap, and resources can be pooled to conduct risk assessments, manage employee training, and accelerate HIPAA compliance. A partnership between Security and Privacy Officers can also better oversee Business Associate compliance.

The HIPAA Privacy Officer Requirement

HIPAA Privacy Officers have been mentioned periodically throughout this article as it is required that, in addition to a HIPAA Security Officer, Covered Entities appoint a HIPAA Privacy Officer. The HIPAA Privacy Officer requirement is mandated by HIPAA and, depending on the nature and size of the organization, it is possible for the two roles to be combined into one.

The role of a HIPAA Privacy Officer is similar in some respects to that of a Security Officer, as it involves conducting risk assessments, staff training, and managing Business Associate Agreements. However, a Privacy Officer will also be responsible for establishing, managing, and enforcing HIPAA-compliant policies and procedures to protect PHI in whatever format it is maintained.

references:
https://www.hipaajournal.com/hipaa-security-officer/

Angular how to pass in multiple arguments in emit()

emit() accepts only a single argument. To work around this, we can pass an object containing multiple keys and values.

For example:

this.onSelection.emit({'key1' : 'value1', 'key2' : 'value2'})


Wednesday, June 1, 2022

What is difference between Object.Freeze and Object.seal



Both freeze and seal are used to create non-extensible objects in JavaScript, but there are several differences between them. Object.seal() allows changes to the existing properties of an object, whereas Object.freeze() does not. Object.freeze() makes an object immutable: not even small changes can be made. Object.seal() prevents the deletion of existing properties but cannot prevent changes to their values.


  // creates an object

    var obj = {

        // assigns 10 to value

        value: 10

    };

    // creates a non-extensible object

    Object.freeze(obj);

    // attempt to update the value (silently ignored because the object is frozen)

    obj.value = 20;

    // the existing value cannot be changed, so this still logs 10

    console.log(obj.value);




 // creates an object

    var obj = {

        // assigns 10 to value

        value: 10

    };

    // creates a non-extensible object

    Object.seal(obj);

    // the value gets updated to 20

    obj.value = 20;

    console.log(obj.value);




references:

https://www.geeksforgeeks.org/what-is-the-difference-between-freeze-and-seal-in-javascript/

Linear and Non Linear equations

Linear means something related to a line: all linear equations describe a straight line. A non-linear equation is one that does not form a straight line; it appears as a curve on a graph and has a variable slope.

The major differences between linear and nonlinear equations are summarised below, with examples at the end.



Linear Equations

It forms a straight line or represents the equation for the straight line

It has degree one; that is, the maximum degree of any term is 1.

All such equations form a straight line in the XY plane; the line can be extended in either direction but remains straight.

The general representation of a linear equation is y = mx + c


Non-Linear Equations

It does not form a straight line but forms a curve.

A nonlinear equation has degree 2 or higher.

It forms a curve and if we increase the value of the degree, the curvature of the graph increases.

A general representation of a nonlinear equation is ax^2 + by^2 = c
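A few concrete examples of each, written in LaTeX notation:

% linear equations (degree 1, straight lines)
y = 2x + 3, \qquad 3x + 4y = 12

% non-linear equations (degree 2 or higher, curves)
y = x^{2} - 1, \qquad ax^{2} + by^{2} = c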