Linear Regression:

The model is described by a vector of input features x, a vector of weights w, and a bias b. The prediction is ŷ = w · x + b.
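A minimal sketch of the prediction, using made-up feature and weight values (NumPy assumed):

```python
import numpy as np

# Hypothetical example values: three input features, three weights, one bias.
x = np.array([1.0, 2.0, 3.0])   # input feature vector
w = np.array([0.5, -0.2, 0.1])  # weight vector
b = 4.0                         # bias

# Linear regression prediction: y_hat = w . x + b
y_hat = w @ x + b
print(y_hat)  # 0.5 - 0.4 + 0.3 + 4.0, i.e. about 4.4
```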

The error at a single point is the difference between the prediction and the true value, ŷ - y.
To aggregate errors over the training set, use Mean Absolute Error (MAE) or Mean Squared Error (MSE). MSE is more popular: it is differentiable everywhere and penalizes large errors more heavily.
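The two loss functions can be sketched as follows (the target and prediction vectors are made-up example values):

```python
import numpy as np

# Hypothetical true values and model predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))   # Mean Absolute Error: average |error|
mse = np.mean(errors ** 2)      # Mean Squared Error: average error^2
print(mae, mse)  # 0.5 0.375
```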

Find weights with Gradient Descent

  • Batch gradient descent: Uses all of the training instances to update the model parameters in each iteration.
  • Mini-batch Gradient Descent: Instead of using all examples, divides the training set into small batches of size ‘b’. One mini-batch of ‘b’ instances is used to update the model parameters in each iteration.
  • Stochastic Gradient Descent (SGD): Updates the parameters using only a single training instance in each iteration, usually selected at random. SGD is often preferred when there are hundreds of thousands of training instances or more, as it converges more quickly than batch gradient descent.
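The three variants differ only in how many instances feed each update. A sketch of the mini-batch case on synthetic data (the data, learning rate, and batch size are made-up; setting batch_size = len(X) gives batch GD, batch_size = 1 gives SGD):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known line y = 2x + 1, plus a little noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.01 * rng.standard_normal(200)

w, b = 0.0, 0.0
lr = 0.1          # learning rate (assumed value)
batch_size = 16   # mini-batch size 'b'

for epoch in range(200):
    idx = rng.permutation(len(X))              # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb                # prediction error on the mini-batch
        # MSE gradients w.r.t. w and b (the constant factor 2 is folded into lr)
        w -= lr * np.mean(err * xb)
        b -= lr * np.mean(err)

print(w, b)  # should approach w = 2, b = 1
```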

Logistic Regression

Used for problems where the response variable is not normally distributed, e.g. a binary outcome like a coin toss, which follows a Bernoulli distribution.

The linear output is passed through a sigmoid (logistic) function, σ(z) = 1 / (1 + e^(-z)), which squashes any real value into (0, 1) so it can be read as a probability.
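A minimal sketch of the sigmoid applied on top of the linear part (the weights, bias, and input here are hypothetical):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias for a two-feature binary classifier.
w = np.array([1.5, -2.0])
b = 0.3
x = np.array([0.8, 0.1])

z = w @ x + b          # linear part, exactly as in linear regression
p = sigmoid(z)         # probability that the label is 1
label = int(p >= 0.5)  # decision threshold at 0.5
print(p, label)        # z = 1.3, so p is roughly 0.79 and label = 1
```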