Linear regression

Scott
Apr 9, 2020

Machine Learning terms

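Independent variable vs. dependent variable: The independent variable is the input we use to explain or predict something; the dependent variable is the outcome we want to predict. For example, let's say that if you have diabetes you must have insulin resistance, but if you have insulin resistance you might not have diabetes. If we want to predict diabetes from insulin resistance, then diabetes is the dependent variable and insulin resistance is the independent variable.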

  • Independent variable: A factor we treat as an input, not affected by the other variables in the model. For example, if the area is 100 the price might be 10k, and if the area is 50 the price might be 5k. If the area is greater but everything else is the same, the price will likely be higher, so price depends on area. The reverse also tends to hold (if the price is higher and location and style are the same, the area is likely larger), so the roles are set by what you want to predict: to predict price from area, area is the independent variable and price is the dependent variable.
  • Dependent variable: A value affected by some other feature. Marks are affected by study time: the more you study, the higher the marks tend to be. Here marks is the dependent variable and study time is the independent variable.
  • Continuous: Numeric values that can take any value within a range, with unlimited variations in between (price, area, marks). The values cannot be grouped together in any natural, fixed set of categories.
  • Categorical: You can group the values in a logical way. Think enumerations and booleans.
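  • Regression: A technique for estimating (predicting) a continuous value from one or more independent variables, based on the relationship found in existing data.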

Linear Regressions:

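  • We need data where the relationship between the independent and dependent variables is roughly a straight line. If the values are scattered with no consistent trend, the relationship is non-linear (or absent) and a straight line will not fit well. Your dependent column must be continuous.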

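Given a scatter plot, many different lines can be drawn, but the model chooses the line that is closest to the points overall. With ordinary least squares, that means the line with the smallest sum of squared vertical distances between the points and the line.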
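
To make "closest line" concrete, here is a minimal sketch that scores a few candidate lines by their sum of squared vertical distances to the points and keeps the smallest. The data points, the candidate lines, and the helper name `sum_squared_error` are all made up for illustration; a real model effectively searches over every possible line, not a short list.

```python
# Hypothetical scatter-plot points (x, y), roughly following y = 2x + 1.
points = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]

# Hypothetical candidate lines, each given as (slope, intercept).
candidates = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]


def sum_squared_error(slope, intercept, data):
    """Sum of squared vertical distances between each point and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in data)


# The best candidate is the one with the smallest total squared distance.
best_line = min(
    candidates,
    key=lambda line: sum_squared_error(line[0], line[1], points),
)

for slope, intercept in candidates:
    error = sum_squared_error(slope, intercept, points)
    print(f"y = {slope}x + {intercept}: squared error = {error:.2f}")
print("best (slope, intercept):", best_line)
```

Squaring the distances keeps positive and negative misses from cancelling out and penalises large misses more than small ones.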

Formula

y = mx + c

y = dependent
x = independent
m = slope
c = intercept
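
As a rough sketch of how m and c can be found from data, the snippet below uses the standard least-squares formulas for a single independent variable. The function name `fit_line` and the area/price numbers are assumptions chosen to match the earlier area and price example, not values from a real dataset.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    m = numerator / denominator
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Made-up data: area (independent) and price in thousands (dependent).
areas = [50, 100, 150, 200]
prices = [5, 10, 15, 20]

m, c = fit_line(areas, prices)
predicted = m * 120 + c  # apply y = mx + c to an unseen area of 120
print("slope m:", m)
print("intercept c:", c)
print("predicted price for area 120:", predicted)
```

Because this made-up data lies exactly on a line, the fitted slope is the price per unit of area and the intercept is approximately zero; real data would leave some error around the line.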

Types of regression

  1. Simple: Only one independent variable and one dependent variable. Two columns in total.
  2. Multiple: Several independent variables and one corresponding dependent variable. More than two columns.
  3. Logistic: When the dependent column holds categorical data, we apply logistic regression, which has a sigmoidal (S-shaped) graph.
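  • Train: The portion of the data used to fit the model.
  • Test: The remaining portion, held back to check how well the model predicts data it has not seen.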
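
The sigmoidal graph mentioned for logistic regression comes from the sigmoid function, which squashes any number into the range 0 to 1 so that it can be read as a probability. This is a minimal sketch: the coefficients `m` and `c` are made up rather than learned from data, and the 0.5 cutoff is a common default, not a rule.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients; a real logistic model would learn these from data.
m, c = 1.5, -6.0

for x in [0, 2, 4, 6, 8]:
    # The same mx + c as in linear regression, passed through the sigmoid.
    probability = sigmoid(m * x + c)
    # Turn the probability into a category using a 0.5 cutoff (an assumption).
    label = 1 if probability >= 0.5 else 0
    print(x, round(probability, 3), label)
```

Plotting the probabilities against x would give the S-shaped curve: close to 0 for small x, close to 1 for large x, and crossing 0.5 where mx + c equals zero.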

Steps for creating a model

  1. Import the required libraries.
  2. Read the data.
  3. Identify the independent and dependent variables.
  4. Find the relationships. If the independent variables are not correlated enough with the dependent variable, a linear model will not predict well.
  5. Store the independent and dependent variables separately, i.e. x = independent, y = dependent. The independent variables are usually stored as a 2-dimensional structure (double square brackets) and the dependent variable as a 1-dimensional structure (single square brackets). The model here has only one dependent variable.
  6. Do the train/test split.
  7. Call the model function to create the model.
  8. Fit the model (2 arguments: independent and dependent).
  9. Predict (one argument): pass the independent values as a 2-dimensional array to get the predicted dependent values.
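
The steps above can be sketched as follows. The post does not name a library, so the use of scikit-learn and NumPy here is an assumption, and the area/price numbers are made up in place of a real data file.

```python
# Step 1: import the required libraries (scikit-learn and NumPy are assumptions).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 2: "read" the data. Made-up values stand in for a file here.
areas = [50, 60, 80, 100, 120, 150, 180, 200, 220, 250]
prices = [5.2, 5.9, 8.1, 10.0, 12.3, 14.8, 18.1, 19.9, 22.2, 24.9]

# Step 3: area is the independent variable, price is the dependent variable.
# Step 4: check the relationship with the Pearson correlation coefficient.
correlation = np.corrcoef(areas, prices)[0, 1]
print("correlation:", correlation)

# Step 5: x is 2-dimensional (one row per sample), y is 1-dimensional.
x = [[a] for a in areas]
y = prices

# Step 6: hold back 30% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

# Step 7: create the model.
model = LinearRegression()

# Step 8: fit with two arguments, independent then dependent.
model.fit(x_train, y_train)

# Step 9: predict with one argument, a 2-dimensional array of independent values.
test_predictions = model.predict(x_test)
new_price = model.predict([[130]])

print("slope m:", model.coef_[0])
print("intercept c:", model.intercept_)
print("R^2 on test data:", model.score(x_test, y_test))
print("predicted price for area 130:", new_price[0])
```

Scoring on `x_test` rather than `x_train` is the point of the split: it shows how the model does on rows it never saw during fitting.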
