Optimizers

class statinf.ml.optimizers.AdaGrad(learning_rate=0.001, delta=1e-06)[source]

Bases: Optimizer

Adaptive Gradient optimizer.

Parameters
  • learning_rate (float) – Step size, defaults to 0.001.

  • delta (float) – Constant for division stability, defaults to 1e-06.

Formula
\[r = r + \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
\[\theta_{t} = \theta_{t-1} - \frac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
Example

>>> AdaGrad(learning_rate=0.001, delta=10e-7).updates(params, grads)
update(params=None, grads=None)[source]

Update the parameters using the AdaGrad formula.

Parameters
  • params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

  • grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
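
As an illustration of the update rule above, here is a minimal standalone sketch of one AdaGrad step in NumPy. The names lr, delta, theta, grad and r are assumptions made for the example and do not refer to statinf internals.

>>> import numpy as np
>>> lr, delta = 0.001, 1e-6                      # step size and stability constant
>>> theta = np.array([0.5, -0.3])                # current parameters
>>> grad = np.array([0.1, -0.2])                 # gradient of the loss at theta
>>> r = np.zeros_like(theta)                     # accumulated squared gradients
>>> r = r + grad * grad                          # accumulate the element-wise squared gradient
>>> theta = theta - lr / np.sqrt(delta + r) * grad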

class statinf.ml.optimizers.AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999)[source]

Bases: Optimizer

AdaMax optimizer (a variant of Adam based on the infinity norm).

Parameters
  • learning_rate (float) – Step size, defaults to 0.001.

  • beta1 (float) – Exponential decay rate for first moment estimate, defaults to 0.9.

  • beta2 (float) – Exponential decay rate for second moment estimate, defaults to 0.999.

Formula
\[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
\[u_{t} = \max(\beta_{2} \cdot u_{t-1}, |\nabla f_{t}(\theta_{t-1})|)\]
\[\theta_{t} = \theta_{t-1} - \dfrac{\alpha}{1 - \beta_{1}^{t}} \dfrac{m_{t}}{u_{t}}\]
Example

>>> AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999).updates(params, grads)
update(params=None, grads=None)[source]

Update the parameters using the AdaMax formula.

Parameters
  • params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

  • grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
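
For illustration, the following standalone NumPy sketch performs a single AdaMax step following the formulas above. The variable names (m, u, t, ...) are assumptions made for the example, not attributes of the statinf class.

>>> import numpy as np
>>> lr, beta1, beta2 = 0.001, 0.9, 0.999
>>> theta = np.array([0.5, -0.3])                # current parameters
>>> grad = np.array([0.1, -0.2])                 # gradient of the loss at theta
>>> m, u, t = np.zeros_like(theta), np.zeros_like(theta), 1
>>> m = beta1 * m + (1 - beta1) * grad           # first moment estimate
>>> u = np.maximum(beta2 * u, np.abs(grad))      # infinity-norm based scale
>>> theta = theta - lr / (1 - beta1 ** t) * m / u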

class statinf.ml.optimizers.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=1e-07)[source]

Bases: Optimizer

Adaptive Moments optimizer.

Parameters
  • learning_rate (float) – Step size, defaults to 0.001.

  • beta1 (float) – Exponential decay rate for first moment estimate, defaults to 0.9.

  • beta2 (float) – Exponential decay rate for second moment estimate, defaults to 0.999.

  • delta (float) – Constant for division stability, defaults to 1e-07.

Formula
\[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
\[v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) \nabla_{\theta}^{2} f_{t}(\theta_{t-1})\]
\[\hat{m}_{t} = \dfrac{m_{t}}{1 - \beta_{1}^{t}}\]
\[\hat{v}_{t} = \dfrac{v_{t}}{1 - \beta_{2}^{t}}\]
\[\theta_{t} = \theta_{t-1} - \alpha \dfrac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \delta}\]
Example

>>> Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=10e-8).updates(params, grads)
update(params=None, grads=None)[source]

Update the parameters using the Adam formula.

Parameters
  • params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

  • grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
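
To make the bias-corrected update concrete, here is a minimal standalone NumPy sketch of one Adam step. The variable names are illustrative assumptions, not statinf internals.

>>> import numpy as np
>>> lr, beta1, beta2, delta = 0.001, 0.9, 0.999, 1e-7
>>> theta = np.array([0.5, -0.3])                # current parameters
>>> grad = np.array([0.1, -0.2])                 # gradient of the loss at theta
>>> m, v, t = np.zeros_like(theta), np.zeros_like(theta), 1
>>> m = beta1 * m + (1 - beta1) * grad           # first moment estimate
>>> v = beta2 * v + (1 - beta2) * grad ** 2      # second moment estimate
>>> m_hat = m / (1 - beta1 ** t)                 # bias corrections
>>> v_hat = v / (1 - beta2 ** t)
>>> theta = theta - lr * m_hat / (np.sqrt(v_hat) + delta)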

class statinf.ml.optimizers.Optimizer(learning_rate=0.01)[source]

Bases: object

Base optimizer class from which the specific optimizers (AdaGrad, AdaMax, Adam, RMSprop, SGD) inherit.

Parameters

learning_rate (float) – Step size, defaults to 0.01.

updates(params=None, grads=None)[source]
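
A hedged sketch of how the shared interface may be used. The contents of params and grads below are illustrative placeholders (the real dictionaries are produced by a statinf model during backpropagation), and it is assumed that updates returns the updated parameter dictionary, as documented for update on the subclasses.

>>> import numpy as np
>>> from statinf.ml.optimizers import SGD
>>> # Placeholder dictionaries; real ones come from a statinf model.
>>> params = {'W1': np.array([0.5, -0.3]), 'b1': np.array([0.1])}
>>> grads = {'W1': np.array([0.01, -0.02]), 'b1': np.array([0.005])}
>>> opt = SGD(learning_rate=0.01)
>>> new_params = opt.updates(params=params, grads=grads)
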
class statinf.ml.optimizers.RMSprop(learning_rate=0.001, rho=0.9, delta=1e-05)[source]

Bases: Optimizer

RMSprop optimizer.

Parameters
  • learning_rate (float) – Step size, defaults to 0.001.

  • rho (float) – Decay rate, defaults to 0.9.

  • delta (float) – Constant for division stability, defaults to 1e-05.

Formula
\[r = \rho r + (1- \rho) \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
\[\theta_{t} = \theta_{t-1} - \dfrac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
Example

>>> RMSprop(learning_rate=0.001, rho=0.9, delta=10e-6).updates(params, grads)
update(params=None, grads=None)[source]

Update the parameters using the RMSprop formula.

Parameters
  • params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

  • grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
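
As with the other optimizers, a minimal standalone NumPy sketch of one RMSprop step is given below for illustration; the variable names are assumptions made for the example.

>>> import numpy as np
>>> lr, rho, delta = 0.001, 0.9, 1e-5
>>> theta = np.array([0.5, -0.3])                # current parameters
>>> grad = np.array([0.1, -0.2])                 # gradient of the loss at theta
>>> r = np.zeros_like(theta)
>>> r = rho * r + (1 - rho) * grad * grad        # leaky average of squared gradients
>>> theta = theta - lr / np.sqrt(delta + r) * grad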

class statinf.ml.optimizers.SGD(learning_rate=0.01, alpha=0.0)[source]

Bases: Optimizer

Stochastic Gradient Descent (SGD) optimizer with optional momentum.

Parameters
  • learning_rate (float) – Step size, defaults to 0.01.

  • alpha (float) – Momentum parameter, defaults to 0.0.

Formula
\[v_{t} = \alpha v_{t-1} - \epsilon \nabla f_{t}(\theta_{t-1})\]
\[\theta_{t} = \theta_{t-1} + v_{t}\]
References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.

Example

>>> SGD(learning_rate=0.01, alpha=0.).updates(params, grads)
update(params=None, grads=None)[source]

Update the parameters using the SGD formula.

Parameters
  • params (dict, optional) – Dictionary with parameters to be updated, defaults to None.

  • grads (dict, optional) – Dictionary with the computed gradients, defaults to None.

Returns

Updated parameters.

Return type

dict
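
The following minimal NumPy sketch illustrates one SGD step with momentum, following the formula above; the names lr, alpha, v, theta and grad are assumptions made for the example, not statinf internals.

>>> import numpy as np
>>> lr, alpha = 0.01, 0.9                        # step size and momentum
>>> theta = np.array([0.5, -0.3])                # current parameters
>>> grad = np.array([0.1, -0.2])                 # gradient of the loss at theta
>>> v = np.zeros_like(theta)
>>> v = alpha * v - lr * grad                    # velocity update
>>> theta = theta + v                            # i.e. theta + alpha*v - lr*grad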