# Optimizers

class statinf.ml.optimizers.AdaGrad(learning_rate=0.001, delta=10e-7)[source]

Bases: Optimizer

AdaGrad optimizer.

Parameters
• learning_rate (float) – Step size, defaults to 0.001.

• delta (float) – Constant for division stability, defaults to 10e-7.

Formula
$r = r + \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})$
$\theta_{t} = \theta_{t-1} - \frac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})$
Example

>>> AdaGrad(learning_rate=0.001, delta=10e-7).updates(params, grads)


Update loss using the AdaGrad formula.

Parameters
• params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

• grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
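As a sanity check, the two-step rule above can be sketched in plain Python. The helper below is illustrative only — the function name, the explicit `r` accumulator dict, and the scalar parameters are assumptions, not statinf internals:

```python
def adagrad_update(params, grads, r, learning_rate=0.001, delta=10e-7):
    """One AdaGrad step: accumulate squared gradients, then scale the step."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        r[name] = r.get(name, 0.0) + g * g  # r = r + g (.) g
        updated[name] = theta - learning_rate / ((delta + r[name]) ** 0.5) * g
    return updated

r = {}
new_params = adagrad_update({"w": 1.0}, {"w": 0.5}, r)
```

Because the accumulator `r` only grows, the effective step size shrinks monotonically over training.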

class statinf.ml.optimizers.AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999)[source]

Bases: Optimizer

AdaMax optimizer.

Parameters
• learning_rate (float) – Step size, defaults to 0.001.

• beta1 (float) – Exponential decay rate for first moment estimate, defaults to 0.9.

• beta2 (float) – Exponential decay rate for second moment estimate, defaults to 0.999.

Formula
$m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla f_{t}(\theta_{t-1})$
$u_{t} = \max(\beta_{2} \cdot u_{t-1}, |\nabla f_{t}(\theta_{t-1})|)$
$\theta_{t} = \theta_{t-1} - \dfrac{\alpha}{1 - \beta_{1}^{t}} \dfrac{m_{t}}{u_{t}}$
Example

>>> AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999).updates(params, grads)


Update loss using the AdaMax formula.

Parameters
• params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

• grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
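One AdaMax step can be sketched in plain Python as follows — a biased first-moment recursion plus an infinity-norm accumulator, with the bias correction folded into the step size. The helper name and the explicit `state` dict are assumptions for illustration, not statinf internals:

```python
def adamax_update(params, grads, state, learning_rate=0.001, beta1=0.9, beta2=0.999):
    """One AdaMax step: first moment plus an infinity-norm accumulator."""
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        m = beta1 * state.get(("m", name), 0.0) + (1 - beta1) * g
        u = max(beta2 * state.get(("u", name), 0.0), abs(g))  # infinity norm
        state[("m", name)], state[("u", name)] = m, u
        # Bias-correct the first moment via the step size; u needs no correction.
        updated[name] = theta - learning_rate / (1 - beta1 ** t) * m / u
    return updated

state = {}
new_params = adamax_update({"w": 1.0}, {"w": 0.5}, state)
```

Unlike Adam's second-moment average, the max-based accumulator `u` makes the denominator insensitive to small recent gradients, so no stability constant is needed.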

class statinf.ml.optimizers.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=10e-8)[source]

Bases: Optimizer

Adam optimizer.

Parameters
• learning_rate (float) – Step size, defaults to 0.001.

• beta1 (float) – Exponential decay rate for first moment estimate, defaults to 0.9.

• beta2 (float) – Exponential decay rate for second moment estimate, defaults to 0.999.

• delta (float) – Constant for division stability, defaults to 10e-8.

Formula
$m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})$
$v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) \left( \nabla_{\theta} f_{t}(\theta_{t-1}) \right)^{2}$
$\hat{m}_{t} = \dfrac{m_{t}}{1 - \beta_{1}^{t}}$
$\hat{v}_{t} = \dfrac{v_{t}}{1 - \beta_{2}^{t}}$
$\theta_{t} = \theta_{t-1} - \alpha \dfrac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \delta}$
Example

>>> Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=10e-8).updates(params, grads)


Update loss using the Adam formula.

Parameters
• params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

• grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
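The five equations above translate almost line for line into a plain-Python sketch. As before, the helper name and the explicit `state` dict are illustrative assumptions, not statinf internals:

```python
def adam_update(params, grads, state, learning_rate=0.001, beta1=0.9,
                beta2=0.999, delta=10e-8):
    """One Adam step: bias-corrected first and second moment estimates."""
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        m = beta1 * state.get(("m", name), 0.0) + (1 - beta1) * g
        v = beta2 * state.get(("v", name), 0.0) + (1 - beta2) * g * g
        state[("m", name)], state[("v", name)] = m, v
        m_hat = m / (1 - beta1 ** t)  # correct initialization bias toward zero
        v_hat = v / (1 - beta2 ** t)
        updated[name] = theta - learning_rate * m_hat / (v_hat ** 0.5 + delta)
    return updated

state = {}
new_params = adam_update({"w": 1.0}, {"w": 0.5}, state)
```

On the first step the bias corrections cancel the (1 - β) factors exactly, so the initial update has magnitude close to the learning rate regardless of the gradient scale.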

class statinf.ml.optimizers.Optimizer(learning_rate=0.01)[source]

Bases: object

Optimization updater

Parameters
• learning_rate (float) – Step size, defaults to 0.01.

class statinf.ml.optimizers.RMSprop(learning_rate=0.001, rho=0.9, delta=1e-05)[source]

Bases: Optimizer

RMSprop optimizer.

Parameters
• learning_rate (float) – Step size, defaults to 0.001.

• rho (float) – Decay rate, defaults to 0.9.

• delta (float) – Constant for division stability, defaults to 10e-6.

Formula
$r = \rho r + (1- \rho) \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})$
$\theta_{t} = \theta_{t-1} - \dfrac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})$
Example

>>> RMSprop(learning_rate=0.001, rho=0.9, delta=10e-6).updates(params, grads)


Update loss using the RMSprop formula.

Parameters
• params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

• grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
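The leaky-average rule above can be sketched in plain Python; the helper name and the explicit `r` accumulator dict are assumptions for illustration, not statinf internals:

```python
def rmsprop_update(params, grads, r, learning_rate=0.001, rho=0.9, delta=10e-6):
    """One RMSprop step: leaky average of squared gradients scales the step."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        r[name] = rho * r.get(name, 0.0) + (1 - rho) * g * g  # leaky average
        updated[name] = theta - learning_rate / ((delta + r[name]) ** 0.5) * g
    return updated

r = {}
new_params = rmsprop_update({"w": 1.0}, {"w": 0.5}, r)
```

In contrast to AdaGrad's ever-growing accumulator, the decay rate `rho` lets old squared gradients fade, so the effective step size can recover after a region of large gradients.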

class statinf.ml.optimizers.SGD(learning_rate=0.01, alpha=0.0)[source]

Bases: Optimizer

Parameters
• learning_rate (float) – Step size, defaults to 0.01.

• alpha (float) – Momentum parameter, defaults to 0.0.

Formula
$\theta_{t} = \theta_{t-1} + \alpha v - \epsilon \nabla f_{t}(\theta_{t-1})$
References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

Example

>>> SGD(learning_rate=0.01, alpha=0.).updates(params, grads)


Update loss using the SGD formula.

Parameters
• params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

• grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
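The momentum update above folds the velocity recursion into one line; unrolled into plain Python it reads as below. The helper name and the explicit `velocity` dict are illustrative assumptions, not statinf internals:

```python
def sgd_update(params, grads, velocity, learning_rate=0.01, alpha=0.0):
    """One SGD step with optional momentum (alpha > 0)."""
    updated = {}
    for name, theta in params.items():
        # v = alpha * v - epsilon * grad; theta = theta + v
        v = alpha * velocity.get(name, 0.0) - learning_rate * grads[name]
        velocity[name] = v
        updated[name] = theta + v
    return updated

velocity = {}
new_params = sgd_update({"w": 1.0}, {"w": 0.5}, velocity)
```

With alpha = 0 the velocity term vanishes and this reduces to vanilla gradient descent, theta - learning_rate * grad.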