Martin Takac
MBZUAI, UAE
Associate Professor in the Machine Learning Department

Training ML models without hyper-parameter tuning - adaptive step-size procedures

The training stage, in which a loss is minimized over the training dataset, is one of many steps in the machine learning pipeline, which also includes data collection, cleaning, and model selection. Training modern machine learning models involves finding optimal parameter values through iterative optimization algorithms such as stochastic gradient descent (SGD) or its variants. One of the key challenges in using such algorithms is determining appropriate step sizes (learning rates), which must strike a balance between convergence speed and stability. In recent years, adaptive step-size algorithms have gained popularity due to their ability to dynamically adjust learning rates based on the gradient magnitudes encountered during training. This talk explores various adaptive step-size algorithms, such as Adam and AdaGrad, and discusses their advantages and limitations for training machine learning models. We will then introduce two new adaptive step-size strategies: one based on the Polyak step size, and one based on the implicit step size in the variance-reduced method SARAH.
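To make the step-size question concrete, the following is a minimal sketch of SGD with a stochastic Polyak-type step size on a toy least-squares problem; it illustrates the general idea referenced in the abstract, not the speaker's specific method, and the function names, the constant c, and the cap gamma_max are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: f_i(w) = 0.5 * (a_i^T w - b_i)^2
n, d = 200, 10
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true                      # data is interpolable, so min_i f_i = 0

def loss_i(w, i):
    return 0.5 * (A[i] @ w - b[i]) ** 2

def grad_i(w, i):
    return (A[i] @ w - b[i]) * A[i]

def sps_sgd(w0, epochs=20, c=0.5, gamma_max=10.0):
    """SGD with a stochastic Polyak step size (sketch):
    gamma_t = (f_i(w) - f_i^*) / (c * ||grad f_i(w)||^2), capped at gamma_max,
    with f_i^* taken as 0 for this interpolating problem."""
    w = w0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = grad_i(w, i)
            gamma = loss_i(w, i) / (c * (g @ g) + 1e-12)
            w -= min(gamma, gamma_max) * g
    return w

w = sps_sgd(np.zeros(d))
print("final average loss:", 0.5 * np.mean((A @ w - b) ** 2))
```

The step size here adapts automatically: it is large when the sampled loss is far from its minimum and shrinks as the iterate approaches a solution, removing the need to hand-tune a fixed learning rate.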