machine-learning

Gradient Descent is the most widely used Optimizer for training machine learning models.

One full-batch update involves the following steps (see the sketch after this list):

- Go over the entire dataset
- Compute the gradients of the Cost Function with respect to the weights & biases using Backpropagation
- Average the gradients over the entire dataset
- Subtract the gradients from the weights & biases, scaling them by a Learning Rate
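As a minimal sketch of those steps, here is full-batch Gradient Descent in NumPy, assuming a linear model with a mean-squared-error Cost Function; the names `X`, `y`, `w`, and `b` are illustrative, not from any particular library:

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, epochs=100):
    """Full-batch Gradient Descent for a linear model with MSE cost."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # weights
    b = 0.0                   # bias

    for _ in range(epochs):
        # Go over the entire dataset: predictions for every sample at once
        predictions = X @ w + b
        errors = predictions - y

        # Gradients of the cost (1/2n) * sum(errors^2) with respect to w and b,
        # already averaged over the whole dataset
        grad_w = (X.T @ errors) / n_samples
        grad_b = errors.mean()

        # Step against the gradient, scaled by the Learning Rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    return w, b
```

For a simple linear model the gradients have this closed form, so Backpropagation is not needed; in a deep network the two `grad_*` lines would be replaced by a backward pass through the layers.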

You can introduce batching to this process.

**Mini-batch Gradient Descent** lets you update the parameters after every batch of N samples instead of after processing the entire dataset. Each update is far less computationally expensive, so the parameters improve more often per pass over the data.

**Stochastic Gradient Descent** is **Mini-batch Gradient Descent** with N = 1: the parameters are tweaked after every single sample in the dataset.
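A sketch of the batched variant under the same assumptions as above; setting `batch_size=1` turns it into Stochastic Gradient Descent:

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=32, learning_rate=0.01, epochs=10):
    """Mini-batch Gradient Descent; batch_size=1 gives Stochastic Gradient Descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0

    for _ in range(epochs):
        # Shuffle so each epoch visits the batches in a different order
        order = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_batch, y_batch = X[idx], y[idx]

            errors = X_batch @ w + b - y_batch
            grad_w = (X_batch.T @ errors) / len(idx)  # averaged over the batch only
            grad_b = errors.mean()

            # Parameters are updated after every batch of N samples,
            # not once per epoch as in the full-batch version
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b

    return w, b
```

The noisier per-batch gradient estimates trade some update quality for many more updates per epoch, which usually speeds up training in practice.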