Wednesday, June 8, 2022

AI/ML: Boston dataset Linear and Ridge regression


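# Imports used below. boston_df is assumed to be a pandas DataFrame holding the Boston housing data, with the target in the last column.

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LinearRegression, Ridge, RidgeCV


# Build the train and test sets.

# Use the first 11 columns as features and the last column as the target.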

features = boston_df.columns[0:11]

target = boston_df.columns[-1]


#X and y values

X = boston_df[features].values

y = boston_df[target].values


# Use train_test_split to get the train and test sets (30% of the rows held out for testing).

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=17)


print("X_train dimension is {}".format(X_train.shape))

print("X_test dimension is {}".format(X_test.shape))
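To see what `test_size=0.3` does to the row counts, here is a minimal sketch on a synthetic stand-in for `boston_df` (random numbers, not the real data), assuming the 506 rows and 14 columns of the classic Boston table. scikit-learn rounds the test fraction up, so the test set gets ceil(0.3 × 506) = 152 rows and the training set gets the remaining 354.

```python
import math

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 506 rows x 14 columns of random values (NOT the Boston data).
rng = np.random.default_rng(0)
fake_df = pd.DataFrame(rng.normal(size=(506, 14)), columns=[f"c{i}" for i in range(14)])

features = fake_df.columns[0:11]  # first 11 columns as features
target = fake_df.columns[-1]      # last column as target

X = fake_df[features].values
y = fake_df[target].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=17)

# The test fraction is rounded up: ceil(0.3 * 506) = 152 rows.
expected_test_rows = math.ceil(0.3 * len(X))
print(X_train.shape, X_test.shape)  # (354, 11) (152, 11)
```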



# Scale the features with StandardScaler: fit on the training set only, then apply the same transform to the test set.

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)
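Fitting the scaler on the training set only means the test set is transformed with the training mean and standard deviation, so no information from the test rows leaks into preprocessing. A minimal sketch on random stand-in data (an assumption, not the Boston features) showing that `transform` is just (x − mean) / std with statistics learned from the training split:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random stand-in data (not the Boston data): 100 training rows, 30 test rows, 3 features.
rng = np.random.default_rng(1)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(30, 3))

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learns mean_ and scale_ from X_train only
X_test_s = scaler.transform(X_test)        # reuses the training statistics

# The same transform written out by hand (population std, ddof=0, as StandardScaler uses).
manual_test = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.allclose(X_test_s, manual_test))        # True
print(np.allclose(X_train_s.mean(axis=0), 0.0))  # True: training columns are centered
print(np.allclose(X_train_s.std(axis=0), 1.0))   # True: and have unit variance
```

The scaled test columns will only be approximately centered, which is expected: they were shifted by the training mean, not their own.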




#Now do the linear regression 

#Model

lr = LinearRegression()


#Fit model

lr.fit(X_train, y_train)


# Predict on the test set

prediction = lr.predict(X_test)


# Actual target values, for comparison with the predictions

actual = y_test


# score() returns R^2, the coefficient of determination

train_score_lr = lr.score(X_train, y_train)

test_score_lr = lr.score(X_test, y_test)


print("LR model train score {}".format(train_score_lr))

print("LR model test score {}".format(test_score_lr))
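The score printed above is R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. A small sketch on synthetic data (an assumption, not the Boston set) that recomputes it from `prediction` and `actual` and checks it against `lr.score` and `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic linear data with noise (stand-in for the scaled Boston features).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + 3.0 + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = X[:140], X[140:], y[:140], y[140:]

lr = LinearRegression().fit(X_train, y_train)
prediction = lr.predict(X_test)
actual = y_test

ss_res = np.sum((actual - prediction) ** 2)     # residual sum of squares
ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(r2_manual, lr.score(X_test, y_test)))      # True
print(np.isclose(r2_manual, r2_score(actual, prediction)))  # True
```

On held-out data R² can drop below zero when the model does worse than always predicting the mean.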



# Ridge regression model, with alpha set to 10

ridgeReg = Ridge(alpha=10)


ridgeReg.fit(X_train, y_train)


# Train and test scores for ridge regression

train_score_ridge = ridgeReg.score(X_train, y_train)

test_score_ridge = ridgeReg.score(X_test, y_test)


print("\nRidge Model............................................\n")

print("Ridge model train score {}".format(train_score_ridge))

print("Ridge model test score {}".format(test_score_ridge))
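Ridge adds the penalty α‖w‖² to the least-squares loss, which shrinks the coefficients toward zero. With `fit_intercept=True` (the default) the intercept is not penalized, so the coefficients solve w = (XᵀX + αI)⁻¹Xᵀy on mean-centered X and y. A minimal sketch on synthetic data (an assumption, not the Boston set) that solves this by hand, compares it with `Ridge(alpha=10)`, and shows the coefficient norm shrinking relative to `LinearRegression`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (stand-in for the scaled training features).
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + 1.0 + rng.normal(size=150)

alpha = 10.0
ridge = Ridge(alpha=alpha).fit(X, y)
ols = LinearRegression().fit(X, y)

# Closed form on centered data; the intercept is recovered afterwards, unpenalized.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w

print(np.allclose(w, ridge.coef_))      # True
print(np.isclose(b, ridge.intercept_))  # True
# The penalty shrinks the coefficient vector relative to OLS.
print(np.linalg.norm(ridge.coef_) < np.linalg.norm(ols.coef_))  # True
```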


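With alpha set to 10, the scores indicate better performance for the ridge model than for the linear regression model on the test data. On the training data, ridge cannot score higher than ordinary least squares, because least squares already minimizes the training error; the benefit of the penalty shows up on data the model has not seen.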
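Because ordinary least squares minimizes the training squared error, and R² on a fixed dataset is 1 − SS_res / SS_tot, no ridge fit on the same training features can reach a higher training R² than `LinearRegression`. A quick check over several alphas on synthetic data (an assumption, not the Boston results):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data standing in for X_train / y_train.
rng = np.random.default_rng(4)
X_train = rng.normal(size=(120, 6))
y_train = X_train @ rng.normal(size=6) + rng.normal(size=120)

ols_train = LinearRegression().fit(X_train, y_train).score(X_train, y_train)

ridge_train_scores = {}
for alpha in [0.01, 1, 10, 100]:
    ridge_train = Ridge(alpha=alpha).fit(X_train, y_train).score(X_train, y_train)
    ridge_train_scores[alpha] = ridge_train
    # Training R^2 of ridge never exceeds the OLS training R^2.
    print(f"alpha={alpha}: ridge train R^2 {ridge_train:.4f} vs OLS {ols_train:.4f}")
```

The gap grows with alpha; whether the larger penalty pays off has to be judged on the test score.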


Instead of always fixing alpha at 10, you can choose it by cross-validation: RidgeCV tries each candidate alpha and keeps the one with the best cross-validated score.


# Ridge cross-validation

ridge_cv = RidgeCV(alphas=[0.0001, 0.001, 0.01, 0.1, 1, 10]).fit(X_train, y_train)


# Alpha chosen by cross-validation

print("Best alpha {}".format(ridge_cv.alpha_))


# Scores

print("Ridge model train score {}".format(ridge_cv.score(X_train, y_train)))

print("Ridge model test score {}".format(ridge_cv.score(X_test, y_test)))
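`RidgeCV` with its default `cv=None` scores each alpha with an efficient form of leave-one-out cross-validation. The same selection idea with explicit k-fold cross-validation can be sketched by hand. This is a minimal illustration on synthetic data, with 5 shuffled folds chosen as an assumption, so it may not pick the same alpha as the default `RidgeCV` run:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data standing in for the scaled X_train / y_train.
rng = np.random.default_rng(5)
X_train = rng.normal(size=(200, 8))
y_train = X_train @ rng.normal(size=8) + rng.normal(scale=2.0, size=200)

alphas = [0.0001, 0.001, 0.01, 0.1, 1, 10]
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Mean R^2 over the 5 validation folds for each candidate alpha.
mean_scores = [cross_val_score(Ridge(alpha=a), X_train, y_train, cv=kf).mean() for a in alphas]
best_alpha = alphas[int(np.argmax(mean_scores))]

# Refit on the full training set with the chosen alpha.
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("Best alpha", best_alpha)
```

Passing `cv=5` (or a `KFold` object) to `RidgeCV` runs this kind of k-fold search internally instead of the leave-one-out shortcut.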



References:

https://www.datacamp.com/tutorial/tutorial-lasso-ridge-regression#data%20importation%20and%20eda
