Optimizers

class statinf.ml.optimizers.AdaGrad(params=None, learning_rate=0.01, delta=1e-07)[source]

Bases: statinf.ml.optimizers.Optimizer

Adaptive Gradient optimizer

Parameters
  • params (list) – List of parameters to update.

  • learning_rate (float) – Step size, defaults to 0.01.

  • delta (float) – Constant for division stability, defaults to 1e-7.

Formula
\[r = r + \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
\[\theta_{t} = \theta_{t-1} - \frac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

Example

>>> AdaGrad(params=[W, b], learning_rate=0.01, delta=1e-7).updates(cost)
updates(loss=None)[source]

Update loss using the AdaGrad formula

Parameters

loss (tensor, optional) – Vector of loss to be updated, defaults to None

Returns

Updated parameters and gradients

Return type

tensor
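
The doctest below is a minimal NumPy sketch of the AdaGrad rule above, for illustration only; the helper name adagrad_step and the NumPy arrays are assumptions for the example, not part of statinf, whose implementation relies on Theano tensors.

>>> import numpy as np
>>> def adagrad_step(theta, grad, r, learning_rate=0.01, delta=1e-7):
...     # Accumulate the element-wise square of the gradient.
...     r = r + grad * grad
...     # Scale the step by the inverse square root of the accumulated squares.
...     theta = theta - learning_rate / np.sqrt(delta + r) * grad
...     return theta, r
>>> theta, r = adagrad_step(np.array([0.5, -0.3]), np.array([0.1, -0.2]), np.zeros(2))

Because r only grows, the effective step size keeps shrinking for parameters that repeatedly receive large gradients.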

class statinf.ml.optimizers.AdaMax(params=None, learning_rate=0.001, beta1=0.9, beta2=0.999)[source]

Bases: statinf.ml.optimizers.Optimizer

AdaMax optimizer (Adam with the infinity norm)

Parameters
  • params (list) – List of parameters to update.

  • learning_rate (float) – Step size, defaults to 0.001.

  • beta1 (float) – Exponential decay rate for first moment estimate, defaults to 0.9.

  • beta2 (float) – Exponential decay rate for second moment estimate, defaults to 0.999.

Formula
\[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
\[u_{t} = \max(\beta_{2} \cdot u_{t-1}, |\nabla f_{t}(\theta_{t-1})|)\]
\[\theta_{t} = \theta_{t-1} - \dfrac{\alpha}{1 - \beta_{1}^{t}} \cdot \dfrac{m_{t}}{u_{t}}\]
References

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Example

>>> AdaMax(params=[W, b], learning_rate=0.001, beta1=0.9, beta2=0.999).updates(cost)
updates(loss=None)[source]

Update loss using the AdaMax formula

Parameters

loss (tensor, optional) – Vector of loss to be updated, defaults to None

Returns

Updated parameters and gradients

Return type

tensor
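
For illustration only (not the package's Theano code), the hypothetical NumPy helper adamax_step below applies the AdaMax recursions above; m and u start at zero and t counts updates from 1.

>>> import numpy as np
>>> def adamax_step(theta, grad, m, u, t, learning_rate=0.001, beta1=0.9, beta2=0.999):
...     # Exponential moving average of the gradient (first moment).
...     m = beta1 * m + (1 - beta1) * grad
...     # Exponentially weighted infinity norm of the gradient.
...     u = np.maximum(beta2 * u, np.abs(grad))
...     # Bias-corrected update; u must be non-zero, i.e. a non-zero gradient has been seen.
...     theta = theta - (learning_rate / (1 - beta1 ** t)) * m / u
...     return theta, m, u
>>> theta, m, u = adamax_step(np.array([0.5]), np.array([0.1]), np.zeros(1), np.zeros(1), t=1)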

class statinf.ml.optimizers.Adam(params=None, learning_rate=0.001, beta1=0.9, beta2=0.999, delta=1e-07)[source]

Bases: statinf.ml.optimizers.Optimizer

Adaptive Moments optimizer

Parameters
  • params (list) – List of parameters to update.

  • learning_rate (float) – Step size, defaults to 0.001.

  • beta1 (float) – Exponential decay rate for first moment estimate, defaults to 0.9.

  • beta2 (float) – Exponential decay rate for second moment estimate, defaults to 0.999.

  • delta (float) – Constant for division stability, defaults to 1e-7.

Formula
\[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
\[v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) \nabla_{\theta}^{2} f_{t}(\theta_{t-1})\]
\[\hat{m}_{t} = \dfrac{m_{t}}{1 - \beta_{1}^{t}}\]
\[\hat{v}_{t} = \dfrac{v_{t}}{1 - \beta_{2}^{t}}\]
\[\theta_{t} = \theta_{t-1} - \alpha \dfrac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \delta}\]
References

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Example

>>> Adam(params=[W, b], learning_rate=0.001, beta1=0.9, beta2=0.999, delta=1e-7).updates(cost)
updates(loss=None)[source]

Update loss using the Adam formula

Parameters

loss (tensor, optional) – Vector of loss to be updated, defaults to None

Returns

Updated parameters and gradients

Return type

tensor
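
The doctest below is a NumPy illustration of the five equations above; the function name adam_step is an assumption for the example, not part of the library.

>>> import numpy as np
>>> def adam_step(theta, grad, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, delta=1e-7):
...     # Biased first and second moment estimates.
...     m = beta1 * m + (1 - beta1) * grad
...     v = beta2 * v + (1 - beta2) * grad * grad
...     # Bias corrections (t counts updates from 1).
...     m_hat = m / (1 - beta1 ** t)
...     v_hat = v / (1 - beta2 ** t)
...     theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + delta)
...     return theta, m, v
>>> theta, m, v = adam_step(np.array([0.5]), np.array([0.1]), np.zeros(1), np.zeros(1), t=1)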

class statinf.ml.optimizers.MomentumSGD(params=None, learning_rate=0.01, alpha=0.9)[source]

Bases: statinf.ml.optimizers.Optimizer

Stochastic Gradient Descent with Momentum

Parameters
  • params (list) – List of parameters to update.

  • learning_rate (float) – Step size, defaults to 0.01.

  • alpha (float) – Momentum parameter, defaults to 0.9.

Formula
\[v = \alpha v - \epsilon \nabla f_{t}(\theta_{t-1})\]
\[\theta_{t} = \theta_{t-1} + v\]
References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

Example

>>> MomentumSGD(params=[W, b], learning_rate=0.01, alpha=0.9).updates(cost)
updates(loss=None)[source]

Update loss using the Momentum SGD formula

Parameters

loss (tensor, optional) – Vector of loss to be updated, defaults to None

Returns

Updated parameters and gradients

Return type

tensor
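
For illustration, the two-step recursion above can be written in plain NumPy as follows; momentum_sgd_step is a hypothetical helper, not the library's Theano implementation.

>>> import numpy as np
>>> def momentum_sgd_step(theta, grad, v, learning_rate=0.01, alpha=0.9):
...     # Velocity: exponentially decaying accumulation of past gradient steps.
...     v = alpha * v - learning_rate * grad
...     # Move the parameters along the velocity.
...     theta = theta + v
...     return theta, v
>>> theta, v = momentum_sgd_step(np.array([0.5]), np.array([0.1]), np.zeros(1))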

class statinf.ml.optimizers.Optimizer(params=None)[source]

Bases: object

Optimization updater

Parameters

params (list) – List of parameters to update, defaults to None.

updates(loss=None)[source]
class statinf.ml.optimizers.RMSprop(params=None, learning_rate=0.001, rho=0.9, delta=1e-05)[source]

Bases: statinf.ml.optimizers.Optimizer

RMSprop optimizer

Parameters
  • params (list) – List of parameters to update.

  • learning_rate (float) – Step size, defaults to 0.001.

  • rho (float) – Decay rate, defaults to 0.9.

  • delta (float) – Constant for division stability, defaults to 1e-5.

Formula
\[r = \rho r + (1- \rho) \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
\[\theta_{t} = \theta_{t-1} - \dfrac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

Example

>>> RMSprop(params=[W, b], learning_rate=0.001, rho=0.9, delta=1e-5).updates(cost)
updates(loss=None)[source]

Update loss using the RMSprop formula

Parameters

loss (tensor, optional) – Vector of loss to be updated, defaults to None

Returns

Updated parameters and gradients

Return type

tensor
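
Below is a NumPy sketch of the RMSprop update above, for illustration only (rmsprop_step is not a statinf function).

>>> import numpy as np
>>> def rmsprop_step(theta, grad, r, learning_rate=0.001, rho=0.9, delta=1e-5):
...     # Exponentially decaying average of squared gradients.
...     r = rho * r + (1 - rho) * grad * grad
...     theta = theta - learning_rate / np.sqrt(delta + r) * grad
...     return theta, r
>>> theta, r = rmsprop_step(np.array([0.5]), np.array([0.1]), np.zeros(1))

Unlike AdaGrad, the moving average lets old squared gradients decay, so the step size does not shrink monotonically.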

class statinf.ml.optimizers.SGD(learning_rate=0.01, params=None)[source]

Bases: statinf.ml.optimizers.Optimizer

Stochastic Gradient Descent optimizer

Parameters
  • params (list) – List of parameters to update.

  • learning_rate (float) – Step size, defaults to 0.01.

Formula
\[\theta_{t} = \theta_{t-1} - \epsilon \nabla f_{t}(\theta_{t-1})\]
References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

Example

>>> SGD(params=[W, b], learning_rate=0.01).updates(cost)
updates(loss=None)[source]

Update loss using the SGD formula

Parameters

loss (tensor, optional) – Vector of loss to be updated, defaults to None

Returns

Updated parameters and gradients

Return type

tensor
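
The update rule above amounts to a single NumPy line; the toy loop below (illustration only, sgd_step is not a statinf function) drives f(x) = x² towards its minimum.

>>> import numpy as np
>>> def sgd_step(theta, grad, learning_rate=0.01):
...     # Move against the gradient, scaled by the step size.
...     return theta - learning_rate * grad
>>> theta = np.array([1.0])
>>> for _ in range(200):
...     theta = sgd_step(theta, 2 * theta)   # the gradient of f(x) = x**2 is 2x

After the loop, theta has shrunk by a factor of 0.98 per step towards the minimizer 0.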

statinf.ml.optimizers.build_shared_zeros(shape, name='v')[source]

Builds a Theano shared variable initialized with a zero-filled NumPy array

Parameters
  • shape (tuple) – Shape of the vector to create.

  • name (str) – Name to give to the shared Theano variable, defaults to ‘v’.

Example

>>> build_shared_zeros(shape=(5, 10), name='r')
Returns

Theano shared matrix

Return type

theano.shared
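
Assuming Theano is available, the snippet below shows roughly what such a helper returns, namely a shared variable wrapping a zero-filled NumPy array; this is an illustration, not the function's actual source.

>>> import numpy as np
>>> import theano
>>> r = theano.shared(np.zeros((5, 10), dtype=theano.config.floatX), name='r')
>>> r.name
'r'
>>> r.get_value().shape
(5, 10)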