Monday, April 11, 2022

Linear regression: how to use SciPy's linregress

Regression is used to examine the relationship between variables.

Linear regression uses the least squares method.

The concept is to draw a straight line through the plotted data points. The line is positioned so that it minimizes the overall distance to the data points (more precisely, the sum of the squared vertical distances between each point and the line).

The distance between a data point and the line is called a "residual" or "error".
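To make the idea of residuals concrete, here is a minimal sketch in plain Python. The five data points are made up for illustration (they are not taken from data.csv), and the variable names are my own. It computes the least squares line with the standard closed-form formulas, then compares its sum of squared residuals with that of a slightly shifted line.

```python
# Hypothetical sample data, made up for illustration (not from data.csv)
pulse = [80, 90, 100, 110, 120]
calories = [240, 270, 310, 330, 360]

n = len(pulse)
mean_x = sum(pulse) / n
mean_y = sum(calories) / n

# Closed-form least squares estimates for the line y = slope * x + intercept
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(pulse, calories))
sxx = sum((xi - mean_x) ** 2 for xi in pulse)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def sum_squared_residuals(a, b):
    # A residual is the observed y minus the y predicted by the line a * x + b
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(pulse, calories))


best = sum_squared_residuals(slope, intercept)
shifted = sum_squared_residuals(slope, intercept + 1)  # same slope, line moved up by 1

print(slope, intercept)  # 3.0 2.0 for this data
print(best, shifted)     # the fitted line gives the smaller value
```

Any other straight line through this data gives a larger sum of squared residuals than the fitted one, which is what "least squares" refers to.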

Below are the main steps involved:

Import the modules you need: pandas, Matplotlib and SciPy

Isolate Average_Pulse as x. Isolate Calorie_burnage as y

Get important key values with: slope, intercept, r, p, std_err = stats.linregress(x, y)

Create a function that uses the slope and intercept values to return a new value. This new value represents where on the y-axis the corresponding x value will be placed

Run each value of the x array through the function. This will result in a new array with new values for the y-axis: mymodel = list(map(myfunc, x))

Draw the original scatter plot: plt.scatter(x, y)

Draw the line of linear regression: plt.plot(x, mymodel)

Define the minimum and maximum values of the axes

Label the axes: "Average_Pulse" and "Calorie_Burnage"

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Load the data set; the first row holds the column names
full_health_data = pd.read_csv("data.csv", header=0, sep=",")

x = full_health_data["Average_Pulse"]
y = full_health_data["Calorie_Burnage"]

# Fit the least squares line
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # Return the y value on the regression line for a given x
    return slope * x + intercept

# Run every x value through the function to get the fitted y values
mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)

plt.ylim(ymin=0, ymax=2000)
plt.xlim(xmin=0, xmax=200)
plt.xlabel("Average_Pulse")
plt.ylabel("Calorie_Burnage")
plt.show()
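The code above unpacks five values from stats.linregress but only uses slope and intercept. As a rough guide: r is the Pearson correlation coefficient, p is the p-value for a test of whether the slope is zero, and std_err is the standard error of the estimated slope. The sketch below shows them on a tiny made-up data set so it runs without data.csv; the sample values and the predict helper are assumptions for illustration only.

```python
from scipy import stats

# Hypothetical sample data, made up for illustration (not from data.csv)
pulse = [80, 90, 100, 110, 120]
calories = [240, 270, 310, 330, 360]

slope, intercept, r, p, std_err = stats.linregress(pulse, calories)


def predict(value):
    # Hypothetical helper: y value on the fitted line for a given x
    return slope * value + intercept


print("slope:", slope, "intercept:", intercept)
print("r:", r, "p:", p, "std_err:", std_err)
print("predicted Calorie_Burnage at Average_Pulse 130:", predict(130))
```

A value of r close to 1 or -1 suggests that a straight line describes the data well, while a value near 0 suggests it does not.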




References:

https://www.w3schools.com/datascience/ds_linear_regression.asp

 
