Linear Regression Algorithm and Its Implementation in Python
Linear regression is a widely used algorithm in machine learning for solving regression problems. The objective of linear regression is to fit a line to a set of data points such that the line is able to predict the output value for a new input value. In this article, we will discuss the basics of linear regression and how it can be implemented using Python.
Linear regression is a statistical method that is used to model the relationship between a dependent variable and one or more independent variables. The dependent variable is usually denoted as Y, while the independent variables are denoted as X1, X2, X3, …, Xn. In simple linear regression, there is only one independent variable (y = ax + b), while in multiple linear regression, there is more than one independent variable.
The linear regression model can be represented as:
Y = b0 + b1*X1 + b2*X2 + … + bn*Xn + ε
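Where b0 is the intercept, b1, b2, …, bn are the coefficients of the independent variables X1, X2, …, Xn, respectively, and ε is the error term that represents the deviation of the actual value from the predicted value. The objective of linear regression is to estimate the values of b0, b1, b2, …, bn such that the overall error is minimized. The most common criterion is ordinary least squares, which minimizes the sum of the squared differences between the actual and predicted values.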
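To make the minimization concrete, here is a minimal sketch for the simple (one-variable) case, assuming the usual least-squares criterion. The arrays x and y are made-up toy values, not taken from any real dataset, and the variable names are only for illustration.

```python
import numpy as np

# Toy data (hypothetical): y is exactly 2*x + 1, so the fit should recover it
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Closed-form least-squares estimates for simple linear regression
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
b0 = y_mean - b1 * x_mean  # intercept

# Residuals: deviation of the actual values from the predicted values
residuals = y - (b0 + b1 * x)
sse = np.sum(residuals ** 2)  # the sum of squared errors that the fit minimizes

print(b0, b1, sse)
```

Because the toy data lies exactly on a line, the estimates come out as b0 = 1 and b1 = 2 with a squared error of zero. Real data is noisy, so the squared error will be positive and the line is the one that makes it as small as possible.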
Implementing Linear Regression in Python
To implement linear regression in Python, we need to first import the necessary libraries, such as NumPy, Pandas, and Matplotlib. We will use NumPy for mathematical calculations, Pandas for data manipulation, and Matplotlib for visualizations.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Next, we need to load the data into our program. For this example, we will use a dataset that contains information about the advertising expenditures and sales of a company.
data = pd.read_csv('Advertising.csv')
data.head()
The first step is to visualize the relationship between the independent variable and the dependent variable. We can use scatter plots to visualize the relationship.
plt.scatter(data['TV'], data['Sales'])
plt.xlabel('TV Advertising Expenditure')
plt.ylabel('Sales')
plt.show()
As we can see from the scatter plot, there is a positive linear relationship between the TV advertising expenditure and sales.
Next, we need to split the data into training and testing sets. We will use 70% of the data for training and 30% of the data for testing.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data['TV'], data['Sales'], test_size=0.3, random_state=42)
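To see what the split produces, here is a small sketch on a made-up table. The toy DataFrame and the names X_tr, X_te, y_tr, and y_te are hypothetical stand-ins so the example runs without the CSV file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the TV and Sales columns (10 made-up rows)
toy = pd.DataFrame({
    "TV": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "Sales": [3, 5, 6, 9, 11, 12, 15, 16, 19, 21],
})

X_tr, X_te, y_tr, y_te = train_test_split(
    toy["TV"], toy["Sales"], test_size=0.3, random_state=42
)

# With 10 rows and test_size=0.3, 3 rows go to the test set and 7 to training
print(len(X_tr), len(X_te))

# Features and targets stay aligned: both pieces keep the same row labels
print((X_tr.index == y_tr.index).all())
```

Fixing random_state makes the shuffle reproducible, so the same rows land in the training and testing sets on every run.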
After splitting the data, we can create the linear regression model and fit it to the training data.
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X_train = X_train.values.reshape(-1, 1)
y_train = y_train.values.reshape(-1, 1)
lin_reg.fit(X_train, y_train)
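After fitting, it can be useful to look at what the model actually learned. The sketch below uses made-up data that follows y = 0.5 * x + 2 exactly. The names X_demo, y_demo, and demo_reg are hypothetical and separate from the article's variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data following y = 0.5 * x + 2 exactly (not the real dataset)
X_demo = np.array([10.0, 20.0, 30.0, 40.0]).reshape(-1, 1)  # 2-D: (n_samples, n_features)
y_demo = 0.5 * X_demo.ravel() + 2.0  # 1-D target

demo_reg = LinearRegression()
demo_reg.fit(X_demo, y_demo)

# intercept_ corresponds to b0 and coef_ to b1 in the model equation
print(demo_reg.intercept_, demo_reg.coef_[0])

# predict also expects a 2-D array, even for a single new value
new_value = np.array([[50.0]])
print(demo_reg.predict(new_value)[0])
```

Note that when the target is reshaped into a column, as y_train is in this article, intercept_ and coef_ come back with an extra dimension, so the values would be read as lin_reg.intercept_[0] and lin_reg.coef_[0][0].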
Once the model is trained, we can use it to make predictions on the testing data.
X_test = X_test.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)
y_pred = lin_reg.predict(X_test)
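Predictions are easier to judge with a number than by eye. One common option, shown here as a sketch rather than the only approach, is to compute the mean squared error and the R² score with scikit-learn. The two arrays are made-up values standing in for y_test and y_pred.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values, standing in for y_test and y_pred
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.5, 5.5, 7.0, 9.0])

mse = mean_squared_error(y_true_demo, y_pred_demo)  # average squared error
rmse = np.sqrt(mse)                                 # same units as the target
r2 = r2_score(y_true_demo, y_pred_demo)             # 1.0 would be a perfect fit

print(mse, rmse, r2)
```

For these toy values the mean squared error is 0.125 and R² is 0.975. With the article's variables, the same calls would be mean_squared_error(y_test, y_pred) and r2_score(y_test, y_pred).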
Finally, we can visualize the results by plotting the regression line on the scatter plot.
plt.scatter(X_test, y_test)
plt.plot(X_test, y_pred, color='red')
plt.xlabel('TV Advertising Expenditure')
plt.ylabel('Sales')
plt.show()
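As we can see from the graph, the regression line follows the overall upward trend of the data points, and we can use it to make predictions on new data. A visual check is only a rough guide, though; metrics such as the mean squared error or the R² score on the testing data give a more reliable measure of how well the line fits.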