Optimizers
- class statinf.ml.optimizers.AdaGrad(learning_rate=0.001, delta=1e-06)[source]
Bases:
Optimizer
Adaptive Gradient optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
delta (float) – Constant for division stability, defaults to 10e-7.
- Formula
- \[r = r + \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
  \[\theta_{t} = \theta_{t-1} - \frac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
- References
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12 (Jul), 2121-2159.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> AdaGrad(learning_rate=0.001, delta=10e-7).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the AdaGrad formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
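The accumulation-and-rescaling rule above can be illustrated with plain NumPy. This is a minimal sketch, not statinf's implementation: the helper adagrad_step and its external accum state dictionary are hypothetical, and params and grads are assumed to be dictionaries of NumPy arrays keyed by parameter name.

import numpy as np

def adagrad_step(params, grads, accum, learning_rate=0.001, delta=10e-7):
    """One AdaGrad update: accumulate squared gradients, then rescale the step."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        # r = r + grad (element-wise) grad: running sum of squared gradients
        accum[name] = accum.get(name, np.zeros_like(g)) + g * g
        # theta = theta - learning_rate / sqrt(delta + r) * grad
        updated[name] = theta - learning_rate / np.sqrt(delta + accum[name]) * g
    return updated

# Usage with toy parameters
params = {'W': np.ones((2, 2)), 'b': np.zeros(2)}
grads = {'W': 0.1 * np.ones((2, 2)), 'b': 0.2 * np.ones(2)}
accum = {}
params = adagrad_step(params, grads, accum)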
- class statinf.ml.optimizers.AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999)[source]
Bases:
Optimizer
AdaMax optimizer (Adam with infinite norm).
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
beta1 (float) – Exponential decay rate for the first moment estimate, defaults to 0.9.
beta2 (float) – Exponential decay rate for the second moment estimate, defaults to 0.999.
- Formula
- \[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
  \[u_{t} = \max(\beta_{2} \cdot u_{t-1}, |\nabla f_{t}(\theta_{t-1})|)\]
  \[\theta_{t} = \theta_{t-1} - \dfrac{\alpha}{1 - \beta_{1}^{t}} \dfrac{m_{t}}{u_{t}}\]
- References
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> AdaMax(learning_rate=0.001, beta1=0.9, beta2=0.999).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the AdaMax formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
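The infinity-norm variant of the moment update can be sketched as follows. Again this is illustrative only and not statinf's code: adamax_step and its state dictionary are hypothetical names, and params and grads are assumed to be dicts of NumPy arrays.

import numpy as np

def adamax_step(params, grads, state, learning_rate=0.001, beta1=0.9, beta2=0.999):
    """One AdaMax update: first-moment estimate plus an infinity-norm scale."""
    state['t'] = state.get('t', 0) + 1
    t = state['t']
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        m = state.get('m_' + name, np.zeros_like(g))
        u = state.get('u_' + name, np.zeros_like(g))
        # m_t = beta1 * m_{t-1} + (1 - beta1) * grad
        m = beta1 * m + (1 - beta1) * g
        # u_t = max(beta2 * u_{t-1}, |grad|)
        u = np.maximum(beta2 * u, np.abs(g))
        state['m_' + name], state['u_' + name] = m, u
        # theta = theta - (alpha / (1 - beta1^t)) * m_t / u_t
        updated[name] = theta - (learning_rate / (1 - beta1 ** t)) * m / u
    return updated

With all-zero gradients u_t stays zero and the division is undefined; a real implementation would guard against that case, which is omitted here for brevity.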
- class statinf.ml.optimizers.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=1e-07)[source]
Bases:
Optimizer
Adaptive Moments optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
beta1 (float) – Exponential decay rate for the first moment estimate, defaults to 0.9.
beta2 (float) – Exponential decay rate for the second moment estimate, defaults to 0.999.
delta (float) – Constant for division stability, defaults to 10e-8.
- Formula
- \[m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla_{\theta} f_{t}(\theta_{t-1})\]
  \[v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) \nabla_{\theta}^{2} f_{t}(\theta_{t-1})\]
  \[\hat{m}_{t} = \dfrac{m_{t}}{1 - \beta_{1}^{t}}\]
  \[\hat{v}_{t} = \dfrac{v_{t}}{1 - \beta_{2}^{t}}\]
  \[\theta_{t} = \theta_{t-1} - \alpha \dfrac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \delta}\]
- References
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, delta=10e-8).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the Adam formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
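The bias-corrected moment estimates above translate directly into a short NumPy sketch. As before, this is an assumption-laden illustration rather than statinf's implementation: adam_step and the state dictionary are hypothetical, and params and grads are assumed to be dicts of NumPy arrays.

import numpy as np

def adam_step(params, grads, state, learning_rate=0.001,
              beta1=0.9, beta2=0.999, delta=10e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    state['t'] = state.get('t', 0) + 1
    t = state['t']
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        m = state.get('m_' + name, np.zeros_like(g))
        v = state.get('v_' + name, np.zeros_like(g))
        m = beta1 * m + (1 - beta1) * g       # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second moment estimate
        state['m_' + name], state['v_' + name] = m, v
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        updated[name] = theta - learning_rate * m_hat / (np.sqrt(v_hat) + delta)
    return updated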
- class statinf.ml.optimizers.Optimizer(learning_rate=0.01)[source]
Bases:
object
Base optimization updater from which the optimizers in this module inherit.
- Parameters
learning_rate (float) – Step size, defaults to 0.01.
- class statinf.ml.optimizers.RMSprop(learning_rate=0.001, rho=0.9, delta=1e-05)[source]
Bases:
Optimizer
RMSprop optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.001.
rho (float) – Decay rate, defaults to 0.9.
delta (float) – Constant for division stability, defaults to 10e-6.
- Formula
- \[r = \rho r + (1 - \rho) \nabla f_{t}(\theta_{t-1}) \odot \nabla f_{t}(\theta_{t-1})\]
  \[\theta_{t} = \theta_{t-1} - \dfrac{\epsilon}{\sqrt{\delta + r}} \odot \nabla f_{t}(\theta_{t-1})\]
- References
Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2), 26-31.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> RMSprop(learning_rate=0.001, rho=0.9, delta=10e-6).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the RMSprop formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
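The exponentially weighted accumulator above differs from AdaGrad only in the decay term, as the following sketch shows. It is not statinf's code: rmsprop_step and the external accum dictionary are hypothetical, and params and grads are assumed to be dicts of NumPy arrays.

import numpy as np

def rmsprop_step(params, grads, accum, learning_rate=0.001, rho=0.9, delta=10e-6):
    """One RMSprop update: exponentially weighted average of squared gradients."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        # r = rho * r + (1 - rho) * grad (element-wise) grad
        accum[name] = rho * accum.get(name, np.zeros_like(g)) + (1 - rho) * g * g
        # theta = theta - learning_rate / sqrt(delta + r) * grad
        updated[name] = theta - learning_rate / np.sqrt(delta + accum[name]) * g
    return updated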
- class statinf.ml.optimizers.SGD(learning_rate=0.01, alpha=0.0)[source]
Bases:
Optimizer
Stochastic Gradient Descent optimizer.
- Parameters
learning_rate (float) – Step size, defaults to 0.01.
alpha (float) – Momentum parameter, defaults to 0.0.
- Formula
- \[\theta_{t} = \theta_{t-1} + \alpha v - \epsilon \nabla f_{t}(\theta_{t-1})\]
- References
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
- Example
>>> SGD(learning_rate=0.01, alpha=0.).update(params, grads)
- update(params=None, grads=None)[source]
Update the parameters using the SGD formula.
- Parameters
params (dict, optional) – Dictionary with parameters to be updated, defaults to None.
grads (dict, optional) – Dictionary with the computed gradients, defaults to None.
- Returns
Updated parameters.
- Return type
dict
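The momentum-augmented gradient step above can be reproduced with a few lines of NumPy. This is a hedged sketch rather than statinf's implementation: sgd_step and the velocity dictionary are hypothetical names, and params and grads are assumed to be dicts of NumPy arrays. Setting alpha=0 recovers plain stochastic gradient descent.

import numpy as np

def sgd_step(params, grads, velocity, learning_rate=0.01, alpha=0.0):
    """One SGD update with optional momentum (alpha=0 gives plain SGD)."""
    updated = {}
    for name, theta in params.items():
        g = grads[name]
        v = velocity.get(name, np.zeros_like(g))
        # theta = theta + alpha * v - learning_rate * grad
        updated[name] = theta + alpha * v - learning_rate * g
        # carry the velocity term alpha * v - learning_rate * grad to the next step
        velocity[name] = alpha * v - learning_rate * g
    return updated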