Optimizers
- class statinf.ml.optimizers.AdaGrad(learning_rate=0.001, delta=1e-06)[source]
Bases:
Optimizer
Adaptive Gradient optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
delta (float) – Constant for division stability, defaults to 10e-7.
- Formula
- \[r = r + \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
  \[\theta_{t} = \theta_{t-1} - \frac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
- References
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12 (Jul), 2121-2159.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> AdaGrad(learning_rate=0.001, delta=10e-7).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the AdaGrad formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
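The accumulation-and-rescaling rule above can be illustrated with plain NumPy. This is a minimal sketch, not statinf's implementation: the helper adagrad_step and its external accum state dictionary are hypothetical, and params and grads are assumed to be dictionaries of NumPy arrays keyed by parameter name.

import numpy as np

def adagrad_step(params, grads, accum, learning_rate=0.001, delta=10e-7):
    """One AdaGrad update: accumulate squared gradients, then rescale the step."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        # r = r + grad (element-wise) grad: running sum of squared gradients
        accum[name] = accum.get(name, np.zeros_like(g)) + g * g
        # theta = theta - learning_rate / sqrt(delta + r) * grad
        updated[name] = theta - learning_rate / np.sqrt(delta + accum[name]) * g
    return updated

# Usage with toy parameters
params = {'W': np.ones((2, 2)), 'b': np.zeros(2)}
grads = {'W': 0.1 * np.ones((2, 2)), 'b': 0.2 * np.ones(2)}
accum = {}
params = adagrad_step(params, grads, accum)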
- class statinf.ml.optimizers.AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999)[source]
Bases:
Optimizer
AdaMax optimizer (Adam with infinite norm).
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
beta1 (float) – Exponential decay rate for the first moment estimate, defaults to 0.9.
beta2 (float) – Exponential decay rate for the second moment estimate, defaults to 0.999.
- Formula
- \[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
  \[u_{t} = \max(\beta_{2} \cdot u_{t-1}, |\nabla f_{t}(\theta_{t-1})|)\]
  \[\theta_{t} = \theta_{t-1} - \dfrac{\alpha}{1 - \beta_{1}^{t}} \dfrac{m_{t}}{u_{t}}\]
- References
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the AdaMax formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
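The infinity-norm variant of the moment update can be sketched as follows. Again this is illustrative only and not statinf's code: adamax_step and its state dictionary are hypothetical names, and params and grads are assumed to be dicts of NumPy arrays.

import numpy as np

def adamax_step(params, grads, state, learning_rate=0.001, beta1=0.9, beta2=0.999):
    """One AdaMax update: first-moment estimate plus an infinity-norm scale."""
    state['t'] = state.get('t', 0) + 1
    t = state['t']
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        m = state.get('m_' + name, np.zeros_like(g))
        u = state.get('u_' + name, np.zeros_like(g))
        # m_t = beta1 * m_{t-1} + (1 - beta1) * grad
        m = beta1 * m + (1 - beta1) * g
        # u_t = max(beta2 * u_{t-1}, |grad|)
        u = np.maximum(beta2 * u, np.abs(g))
        state['m_' + name], state['u_' + name] = m, u
        # theta = theta - (alpha / (1 - beta1^t)) * m_t / u_t
        updated[name] = theta - (learning_rate / (1 - beta1 ** t)) * m / u
    return updated

With all-zero gradients u_t stays zero and the division is undefined; a real implementation would guard against that case, which is omitted here for brevity.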
- class statinf.ml.optimizers.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=1e-07)[source]
Bases:
Optimizer
Adaptive Moments optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
beta1 (float) – Exponential decay rate for the first moment estimate, defaults to 0.9.
beta2 (float) – Exponential decay rate for the second moment estimate, defaults to 0.999.
delta (float) – Constant for division stability, defaults to 10e-8.
- Formula
- \[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
  \[v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) \nabla_{\theta}^{2} f_{t}(\theta_{t-1})\]
  \[\hat{m}_{t} = \dfrac{m_{t}}{1 - \beta_{1}^{t}}\]
  \[\hat{v}_{t} = \dfrac{v_{t}}{1 - \beta_{2}^{t}}\]
  \[\theta_{t} = \theta_{t-1} - \alpha \dfrac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \delta}\]
- References
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=10e-8).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the Adam formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
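The bias-corrected moment estimates above translate directly into a short NumPy sketch. As before, this is an assumption-laden illustration rather than statinf's implementation: adam_step and the state dictionary are hypothetical, and params and grads are assumed to be dicts of NumPy arrays.

import numpy as np

def adam_step(params, grads, state, learning_rate=0.001,
              beta1=0.9, beta2=0.999, delta=10e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    state['t'] = state.get('t', 0) + 1
    t = state['t']
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        m = state.get('m_' + name, np.zeros_like(g))
        v = state.get('v_' + name, np.zeros_like(g))
        m = beta1 * m + (1 - beta1) * g       # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second moment estimate
        state['m_' + name], state['v_' + name] = m, v
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        updated[name] = theta - learning_rate * m_hat / (np.sqrt(v_hat) + delta)
    return updated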
- class statinf.ml.optimizers.Optimizer(learning_rate=0.01)[source]
Bases:
object
Base optimization updater from which the optimizers in this module inherit.
- Parameters
learning_rate (float) – Step size, defaults to 0.01.
- class statinf.ml.optimizers.RMSprop(learning_rate=0.001, rho=0.9, delta=1e-05)[source]
Bases:
Optimizer
RMSprop optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
rho (float) – Decay rate, defaults to 0.9.
delta (float) – Constant for division stability, defaults to 10e-6.
- Formula
- \[r = \rho r + (1 - \rho) \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
  \[\theta_{t} = \theta_{t-1} - \dfrac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
- References
Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2), 26-31.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> RMSprop(learning_rate=0.001, rho=0.9, delta=10e-6).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the RMSprop formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
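The exponentially weighted accumulator above differs from AdaGrad only in the decay term, as the following sketch shows. It is not statinf's code: rmsprop_step and the external accum dictionary are hypothetical, and params and grads are assumed to be dicts of NumPy arrays.

import numpy as np

def rmsprop_step(params, grads, accum, learning_rate=0.001, rho=0.9, delta=10e-6):
    """One RMSprop update: exponentially weighted average of squared gradients."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        # r = rho * r + (1 - rho) * grad (element-wise) grad
        accum[name] = rho * accum.get(name, np.zeros_like(g)) + (1 - rho) * g * g
        # theta = theta - learning_rate / sqrt(delta + r) * grad
        updated[name] = theta - learning_rate / np.sqrt(delta + accum[name]) * g
    return updated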
- class statinf.ml.optimizers.SGD(learning_rate=0.01, alpha=0.0)[source]
Bases:
Optimizer
Stochastic Gradient Descent optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.01.
alpha (float) – Momentum parameter, defaults to 0.0.
- Formula
- \[\theta_{t} = \theta_{t-1} + \alpha v - \epsilon \nabla f_{t}(\theta_{t-1})\]
- References
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> SGD(learning_rate=0.01, alpha=0.).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the SGD formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
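The momentum-augmented gradient step above can be reproduced with a few lines of NumPy. This is a hedged sketch rather than statinf's implementation: sgd_step and the velocity dictionary are hypothetical names, and params and grads are assumed to be dicts of NumPy arrays. Setting alpha=0 recovers plain stochastic gradient descent.

import numpy as np

def sgd_step(params, grads, velocity, learning_rate=0.01, alpha=0.0):
    """One SGD update with optional momentum (alpha=0 gives plain SGD)."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        v = velocity.get(name, np.zeros_like(g))
        # theta = theta + alpha * v - learning_rate * grad
        updated[name] = theta + alpha * v - learning_rate * g
        # carry the velocity term alpha * v - learning_rate * grad to the next step
        velocity[name] = alpha * v - learning_rate * g
    return updated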