optimizers.py

pycmtensor.optimizers

PyCMTensor optimizers module
Optimizer(name, epsilon=1e-08, **kwargs)

Base optimizer class

Parameters:

Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the optimizer. | required |
epsilon | float | Small value to avoid division by zero. Defaults to 1e-08. | 1e-08 |
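The optimizers below pass extra settings to this base class through **kwargs. A minimal sketch of overriding epsilon, under the assumption (suggested by the shared **kwargs in every signature, but not stated on this page) that subclasses forward it to Optimizer:

```python
from pycmtensor.optimizers import Adam

params = [...]  # list of model parameters (TensorSharedVariable objects)

# epsilon is assumed to be forwarded through **kwargs to the Optimizer base class
opt = Adam(params, b1=0.9, b2=0.999, epsilon=1e-07)
```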
Adam(params, b1=0.9, b2=0.999, **kwargs)

Bases: Optimizer

An optimizer that implements the Adam algorithm[^1]

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list | A list of parameters. | required |
b1 | float | The value of the b1 parameter. Defaults to 0.9. | 0.9 |
b2 | float | The value of the b2 parameter. Defaults to 0.999. | 0.999 |
**kwargs | | Additional keyword arguments. | {} |
Attributes:

Name | Type | Description |
---|---|---|
t | TensorSharedVariable | time step |
m_prev | list[TensorSharedVariable] | previous time step momentum |
v_prev | list[TensorSharedVariable] | previous time step velocity |

- Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
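For orientation, the textbook Adam update from the cited reference, which the attributes above (t, m_prev and v_prev) track; b1 and b2 play the roles of β₁ and β₂, g_t is the gradient and α the learning rate (neither appears in the constructor). Read this as the standard formulation rather than the code's literal behaviour, since details such as the placement of epsilon are not documented here:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\alpha\,\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$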
AdamW(params, b1=0.9, b2=0.999, **kwargs)

Bases: Adam

Initializes the AdamW class with the given parameters.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list | A list of parameters. | required |
b1 | float | The value of the b1 parameter. Defaults to 0.9. | 0.9 |
b2 | float | The value of the b2 parameter. Defaults to 0.999. | 0.999 |
**kwargs | | Additional keyword arguments. | {} |
Example:

```python
params = [...]  # list of parameters
adamw = AdamW(params, b1=0.9, b2=0.999)
```
Nadam(params, b1=0.99, b2=0.999, **kwargs)

Bases: Adam

An optimizer that implements the Nesterov Adam algorithm[^1]

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list | A list of parameters. | required |
b1 | float | The value of the b1 parameter. Defaults to 0.99. | 0.99 |
b2 | float | The value of the b2 parameter. Defaults to 0.999. | 0.999 |
**kwargs | | Additional keyword arguments. | {} |
Attributes:

Name | Type | Description |
---|---|---|
t | TensorSharedVariable | time step |
m_prev | list[TensorSharedVariable] | previous time step momentum |
v_prev | list[TensorSharedVariable] | previous time step velocity |

- Dozat, T., 2016. Incorporating Nesterov Momentum into Adam. Available from: http://cs229.stanford.edu/proj2015/054_report.pdf
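A minimal instantiation sketch in the style of the AdamW example above; note that Nadam's default b1 is 0.99, not Adam's 0.9:

```python
from pycmtensor.optimizers import Nadam

params = [...]  # list of model parameters
nadam = Nadam(params)  # defaults: b1=0.99, b2=0.999
```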
Adamax(params, b1=0.9, b2=0.999, **kwargs)

Bases: Adam

An optimizer that implements the Adamax algorithm[^1], a variant of the Adam algorithm based on the infinity norm.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list | A list of parameters. | required |
b1 | float | The value of the b1 parameter. Defaults to 0.9. | 0.9 |
b2 | float | The value of the b2 parameter. Defaults to 0.999. | 0.999 |
**kwargs | | Additional keyword arguments. | {} |
Attributes:

Name | Type | Description |
---|---|---|
t | TensorSharedVariable | time step |
m_prev | list[TensorSharedVariable] | previous time step momentum |
v_prev | list[TensorSharedVariable] | previous time step velocity |

- Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
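For orientation, the textbook Adamax update from the cited reference, shown only to illustrate how an infinity-norm term u_t replaces Adam's second-moment estimate; g_t is the gradient, α the learning rate, and implementation details such as epsilon handling are not documented on this page:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t \\
u_t &= \max(\beta_2 u_{t-1},\ |g_t|) \\
\theta_t &= \theta_{t-1} - \frac{\alpha}{1-\beta_1^t}\cdot\frac{m_t}{u_t}
\end{aligned}
$$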
Adadelta(params, rho=0.95, **kwargs)

Bases: Optimizer

An optimizer that implements the Adadelta algorithm[^1]

Adadelta is a stochastic gradient descent method that uses an adaptive learning rate per dimension to address two drawbacks:

- The continual decay of learning rates throughout training
- The need for a manually selected global learning rate

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | A list of shared variables representing the parameters of the model. | required |
rho | float | The decay rate used in the running averages of squared gradients and updates. Defaults to 0.95. | 0.95 |
Attributes:

Name | Type | Description |
---|---|---|
accumulator | list[TensorSharedVariable] | A list of gradient accumulators. |
delta | list[TensorSharedVariable] | A list of adaptive differences between gradients. |

- Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701
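A minimal instantiation sketch; since Adadelta adapts its own per-dimension step sizes, the decay rate rho is the only hyperparameter exposed here:

```python
from pycmtensor.optimizers import Adadelta

params = [...]  # list of model parameters (TensorSharedVariable)
adadelta = Adadelta(params, rho=0.95)
```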
RProp(params, inc=1.05, dec=0.5, bounds=[1e-06, 50.0], **kwargs)

Bases: Optimizer

An optimizer that implements the Rprop algorithm[^1]

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | A list of TensorSharedVariable objects representing the parameters of the model. | required |
inc | float | The factor by which the step size is increased when the gradient keeps the same sign. | 1.05 |
dec | float | The factor by which the step size is decreased when the gradient changes sign. | 0.5 |
bounds | list[float] | The minimum and maximum bounds for the step size factor. | [1e-06, 50.0] |
Attributes:

Name | Type | Description |
---|---|---|
factor | list[TensorVariable] | A list of learning rate factor multipliers (init=1.0). |
ghat | list[TensorVariable] | A list of previous step gradients. |

- Igel, C., & Hüsken, M. (2003). Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing, 50, 105-123.
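A minimal instantiation sketch using the documented defaults; the two entries of bounds clamp the step size factor from below and above:

```python
from pycmtensor.optimizers import RProp

params = [...]  # list of model parameters (TensorSharedVariable)
rprop = RProp(params, inc=1.05, dec=0.5, bounds=[1e-06, 50.0])
```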
RMSProp(params, rho=0.9, **kwargs)

Bases: Optimizer

An optimizer that implements the RMSprop algorithm[^1]

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | Parameters of the model. | required |
rho | float | Discounting factor for the running average of past gradients. Defaults to 0.9. | 0.9 |
Attributes:

Name | Type | Description |
---|---|---|
accumulator | TensorVariable | Gradient accumulator. |

- Hinton, G. E. (2012). rmsprop: Divide the gradient by a running average of its recent magnitude. Retrieved from http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
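For orientation, the standard RMSprop recursion from the cited lecture notes, which shows the role of rho as the discounting factor; g_t is the gradient, α the learning rate, and the exact placement of epsilon in this implementation is not documented here:

$$
\begin{aligned}
E[g^2]_t &= \rho\,E[g^2]_{t-1} + (1-\rho)\,g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\alpha\,g_t}{\sqrt{E[g^2]_t} + \epsilon}
\end{aligned}
$$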
Momentum(params, mu=0.9, **kwargs)

Bases: Optimizer

Initializes the Momentum optimizer[^1]

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | A list of parameters of the model. | required |
mu | float | The acceleration factor, which speeds up the update in the relevant direction and dampens oscillations. Defaults to 0.9. | 0.9 |
Attributes:

Name | Type | Description |
---|---|---|
velocity | list[TensorSharedVariable] | The momentum velocity. |

- Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
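For orientation, the classical momentum update; sign conventions and how the learning rate α enters vary between implementations and are not documented on this page:

$$
\begin{aligned}
v_t &= \mu\,v_{t-1} - \alpha\,g_t \\
\theta_t &= \theta_{t-1} + v_t
\end{aligned}
$$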
NAG(params, mu=0.99, **kwargs)

Bases: Momentum

An optimizer that implements the Nesterov Accelerated Gradient algorithm[^1]

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | A list of parameters of the model. | required |
mu | float | The acceleration factor in the relevant direction. Defaults to 0.99. | 0.99 |
Attributes:

Name | Type | Description |
---|---|---|
t | TensorSharedVariable | The momentum time step. |
velocity | list[TensorSharedVariable] | The momentum velocity. |

- Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
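A minimal instantiation sketch contrasting the two momentum-based optimizers; NAG shares the Momentum interface but defaults to a higher acceleration factor and uses the look-ahead gradient evaluation described in the cited paper:

```python
from pycmtensor.optimizers import Momentum, NAG

params = [...]  # list of model parameters (TensorSharedVariable)
momentum = Momentum(params, mu=0.9)
nag = NAG(params, mu=0.99)  # Nesterov variant, higher default mu
```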
AdaGrad(params, **kwargs)

Bases: Optimizer

An optimizer that implements the Adagrad algorithm[^1]

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | Parameters of the model. | required |
Attributes:

Name | Type | Description |
---|---|---|
accumulator | list[TensorSharedVariable] | Gradient accumulators. |

- Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
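For orientation, the standard Adagrad recursion from the cited reference, which makes the "more updates, smaller steps" behaviour explicit; G_t accumulates squared gradients, g_t is the gradient, α the learning rate, and the placement of epsilon in this implementation is not documented here:

$$
\begin{aligned}
G_t &= G_{t-1} + g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\alpha\,g_t}{\sqrt{G_t} + \epsilon}
\end{aligned}
$$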
SGD(params, **kwargs)

Bases: Optimizer

An optimizer that implements the stochastic gradient descent algorithm

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | Parameters of the model. | required |
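A minimal instantiation sketch; note that none of the documented signatures takes an explicit learning rate, so it is presumably supplied elsewhere (for example through the training configuration or **kwargs):

```python
from pycmtensor.optimizers import SGD

params = [...]  # list of model parameters (TensorSharedVariable)
sgd = SGD(params)  # plain stochastic gradient descent, no extra hyperparameters
```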
SQNBFGS(params, config=None, **kwargs)

Bases: Optimizer

Initializes the SQNBFGS optimizer object[^1]

Parameters:

Name | Type | Description | Default |
---|---|---|---|
params | list[TensorSharedVariable] | The parameters of the model. | required |
config | config | The pycmtensor config object. | None |
- Byrd, R. H., Hansen, S. L., Nocedal, J., & Singer, Y. (2016). A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2), 1008-1031.
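A minimal instantiation sketch; config is the pycmtensor config object described above and defaults to None, so it is shown here only as a placeholder:

```python
from pycmtensor.optimizers import SQNBFGS

params = [...]  # list of model parameters (TensorSharedVariable)
sqn = SQNBFGS(params, config=None)  # pass the pycmtensor config object if one is available
```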
clip(param, min, max)

Clips the value of a parameter within a specified range.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
param | float | The parameter value to be clipped. | required |
min | float | The minimum value that the parameter can take. | required |
max | float | The maximum value that the parameter can take. | required |
Returns:

Type | Description |
---|---|
float | The clipped value of the parameter. |
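A small illustration of the documented behaviour, assuming a plain numeric clip (the page does not specify whether tensor expressions are also accepted):

```python
from pycmtensor.optimizers import clip

clip(75.0, 1e-06, 50.0)  # -> 50.0, clipped to the maximum
clip(0.3, 1e-06, 50.0)   # -> 0.3, already within bounds
```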