Gradient Descent is one of the most widely used Optimizers for training machine learning models.
How it works
It involves the following steps:
- Go over the entire dataset
- Compute the gradients of the Cost Function with respect to the weights & biases, using Backward-Propagation
- Average the gradients over the entire dataset
- Subtract the gradients from the weights & biases, scaling them by a Learning Rate (see the sketch after this list)
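
To make the steps concrete, here is a minimal sketch of full-batch Gradient Descent in NumPy. It assumes a simple linear model `X @ w + b` and a mean-squared-error Cost Function; the function name and parameters are illustrative, not taken from any particular library.

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, epochs=100):
    """Full-batch gradient descent for a linear model y ~ X @ w + b
    with a mean-squared-error cost (illustrative sketch)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)   # weights
    b = 0.0                    # bias

    for _ in range(epochs):
        # Go over the entire dataset: predictions for every sample
        errors = X @ w + b - y

        # Gradients of the MSE cost with respect to the weights and bias,
        # averaged over the entire dataset
        grad_w = (2.0 / n_samples) * (X.T @ errors)
        grad_b = (2.0 / n_samples) * errors.sum()

        # Update step: subtract the gradients scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    return w, b
```

Note that the parameters change only once per pass over the whole dataset, which is exactly what the batching variants below relax.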
Batching
You can introduce batching so that the parameters are updated more often than once per pass over the dataset:
- Mini-batch Gradient Descent allows you to update the parameters after every batch of N samples instead of after processing the entire dataset. Each update is less computationally expensive.
- Stochastic Gradient Descent is Mini-batch Gradient Descent with N = 1. This means that you tweak the parameters after every single sample in the dataset (see the sketch below).
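
Here is a minimal sketch of the batched variant, under the same illustrative linear-model and MSE assumptions as above. Setting `batch_size=1` turns it into Stochastic Gradient Descent.

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=32, learning_rate=0.01, epochs=10):
    """Mini-batch gradient descent for the same linear model and MSE cost.
    batch_size=1 corresponds to stochastic gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    rng = np.random.default_rng(0)

    for _ in range(epochs):
        # Shuffle the dataset each epoch so batches differ between passes
        order = rng.permutation(n_samples)

        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_batch, y_batch = X[idx], y[idx]

            errors = X_batch @ w + b - y_batch

            # Gradients averaged over the current batch only,
            # not over the whole dataset
            grad_w = (2.0 / len(idx)) * (X_batch.T @ errors)
            grad_b = (2.0 / len(idx)) * errors.sum()

            # Parameters are updated after every batch,
            # rather than once per epoch
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b

    return w, b
```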