# What is Linear Regression in Machine Learning?

Artificial Intelligence is showing up in every walk of life. We have smart assistants to control our homes and routine activities, smart predictors to help our businesses flourish, and even smart cars to take us to work. With AI everywhere, it is only natural to develop an interest in how it works. If you are interested in machine learning and its algorithms, this is the right place to get your feet wet. In this blog, we’ll start with the basic algorithms of machine learning and build up our knowledge gradually yet effectively. One of the most rudimentary machine learning algorithms happens to be linear regression. That’s right!

**Linear Regression:**

So first, what is regression? Regression is a technique for modeling an output value with the help of independent predictors. Used extensively in statistics and finance, regression estimates the relationship between a dependent variable and one or more independent variables. Regression techniques differ mostly in the number of independent variables and in the kind of relationship they assume with the dependent variable.

Linear regression models a linear relationship between an independent (predictor) variable and a dependent (response) variable. The main idea is to find the best fit line for the given data points. The best fit line is the one with the minimum error, measured by the distance between the points and the line. Before fitting a linear model to the data, check that the variables have some linear relationship: not a deterministic one, but at least a statistical association. If the variables have no relationship, fitting a linear model will be of no use. Scatterplots are commonly used for this check.
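Alongside a scatterplot, one common numeric check for linear association is the Pearson correlation coefficient. This is a minimal sketch, not something the post itself prescribes: the helper name `pearson_r` and the toy data are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: x is the predictor, y the response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear association, while a value near 0 suggests that a straight line is unlikely to describe the data well.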

Now let’s say we have established an association between independent and dependent variables. The best fit line for linear regression modeling would be shown as:

Y = a + bX,

where X is the independent variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
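In code, the line equation is a one-line function. The sketch below only illustrates the formula; the function name `predict` and the coefficient values are made up for the example.

```python
def predict(x, a, b):
    """Return the point on the line Y = a + bX for a given x."""
    return a + b * x

# Hypothetical coefficients: intercept a = 1, slope b = 2.
a, b = 1.0, 2.0
predictions = [predict(x, a, b) for x in [0.0, 1.0, 2.5]]
print(predictions)  # [1.0, 3.0, 6.0]
```

Note that `predict(0.0, a, b)` returns the intercept a, matching the definition above.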

The task is to find the best values for a and b.

Now before diving in a bit deeper let’s have a look at two important concepts.

**Cost Function:**

A cost function is important because it helps us find the best values for a and b. Since we want the best fit line for the data points, we turn this search into a minimization problem: we want to minimize the error between the actual data points and the values predicted by our model.

The difference between a predicted value and the actual value is the error. We calculate this error for every data point, square each one, sum the squared errors, and divide by the total number of data points. This gives us the mean of the squared errors, which is why this cost function is also called the mean squared error (MSE) function.
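The recipe above translates almost word for word into code. This is a small sketch with a hypothetical function name and toy data, where the points lie exactly on Y = 1 + 2X:

```python
def mean_squared_error(xs, ys, a, b):
    """Mean of the squared differences between actual and predicted values."""
    n = len(xs)
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / n

# Hypothetical data lying exactly on the line Y = 1 + 2X.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(mean_squared_error(xs, ys, 1.0, 2.0))  # 0.0 (perfect fit)
print(mean_squared_error(xs, ys, 0.0, 2.0))  # 1.0 (every prediction is off by 1)
```

The closer a and b are to the line that generated the data, the smaller the cost.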

Now, we will adjust the values of a and b until the cost function reaches its minimum.

**Gradient Descent:**

This leads us to the second important concept in learning linear regression. How do we reduce the cost function to reach its minimum? Gradient descent helps us here. Gradient descent tells us to start with small values of a and b and change them iteratively until we reach an appropriate minimum of our cost function. In other words, it tells us how to play with the values of a and b.

Suppose we are standing at the top of a pit and want to reach the bottom. We can either run towards the bottom or take small, gradual steps. Gradient descent guides us in taking these steps. The size of each step is controlled by the learning rate, which determines how fast we reach the minimum. Small steps take a long time, while large steps (running) are faster but might overshoot the minimum.

To calculate these steps from the cost function, we take its gradient with respect to a and b. In other words, we take the partial derivatives of the cost function with respect to a and b. A little bit of calculus will do the trick here.
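For reference, here is that calculus worked out for the mean squared error cost. The symbols J (the cost), n (the number of data points), and the indexed points (x_i, y_i) are notation introduced here for the derivation:

$$J(a, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2$$

$$\frac{\partial J}{\partial a} = -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)$$

$$\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - (a + b x_i)\bigr)$$

Each derivative says how the cost changes when a or b is nudged, so stepping in the opposite direction lowers the cost.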

A hyperparameter alpha is declared as the learning rate. A higher learning rate means less time but a greater chance of overshooting the minimum; a lower learning rate is less likely to overshoot but takes longer.
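Putting the pieces together, a bare-bones gradient descent loop for the line Y = a + bX might look like the sketch below. It assumes the mean squared error cost described earlier; the function name, the default `alpha`, the number of steps, and the toy data are all illustrative choices, not values from this post.

```python
def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    """Fit Y = a + bX by repeatedly stepping against the MSE gradient."""
    a, b = 0.0, 0.0  # start from small values
    n = len(xs)
    for _ in range(steps):
        # Residuals: actual value minus predicted value for each point.
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. a and b.
        grad_a = -2.0 / n * sum(residuals)
        grad_b = -2.0 / n * sum(x * r for x, r in zip(xs, residuals))
        # Step against the gradient, scaled by the learning rate alpha.
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Hypothetical data lying exactly on the line Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = gradient_descent(xs, ys)
print(round(a, 4), round(b, 4))
```

On this toy data the loop should settle very close to a = 1 and b = 2. Making `alpha` much smaller slows convergence, while making it much larger (for example 0.5 on this data) causes the updates to overshoot and diverge, which is the trade-off described above.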

**Conclusion:**

We got started with machine learning algorithms in this blog. Conceptual knowledge of these algorithms is necessary if we want to grasp the idea of machine learning. Like any machine learning newbie, we started with the linear regression algorithm, with the intention to keep the ball rolling!
