
optimizers.py


pycmtensor.optimizers

PyCMTensor optimizers module

Optimizer(name, epsilon=1e-08, **kwargs)

Base optimizer class

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the optimizer | *required* |
| `epsilon` | `float` | Small value to avoid division by zero | `1e-08` |

__repr__()

Returns a string representation of the optimizer object.

Returns:

| Type | Description |
| --- | --- |
| `str` | A string representation of the optimizer object, including its name and parameters. |

update(**kwargs)

Update parameters for aesara function calls

Returns:

| Type | Description |
| --- | --- |
| `None` | |
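
The optimizer classes below share this interface. A minimal usage sketch follows, using Adam as a representative subclass; the shared-variable names are hypothetical:

```python
# Minimal sketch: build an optimizer from a list of aesara shared variables.
# The parameter names "b_cost" and "b_time" are hypothetical examples.
import aesara
import numpy as np

from pycmtensor.optimizers import Adam

b_cost = aesara.shared(np.float64(0.0), name="b_cost")
b_time = aesara.shared(np.float64(0.0), name="b_time")

opt = Adam([b_cost, b_time], b1=0.9, b2=0.999)
print(opt)  # __repr__() shows the optimizer name and its parameters
```

`update(**kwargs)` is then called during training to refresh the quantities used in the aesara function calls; its exact keyword arguments depend on the model being estimated.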

Adam(params, b1=0.9, b2=0.999, **kwargs)

Bases: Optimizer

An optimizer that implements the Adam algorithm[^1]

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list` | A list of parameters | *required* |
| `b1` | `float` | Exponential decay rate for the first moment estimates | `0.9` |
| `b2` | `float` | Exponential decay rate for the second moment estimates | `0.999` |
| `**kwargs` | | Additional keyword arguments | `{}` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `t` | `TensorSharedVariable` | Time step |
| `m_prev` | `list[TensorSharedVariable]` | Previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | Previous time step velocity |


  1. Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980 
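
For reference, the bias-corrected update rule from the cited paper is sketched below, where $g_t$ is the gradient at step $t$, $\theta$ a parameter, $\eta$ the learning rate, and $\epsilon$ the constant from the base class; the notation is ours and the implementation may differ in detail:

$$
\begin{aligned}
m_t &= b_1\, m_{t-1} + (1 - b_1)\, g_t \\
v_t &= b_2\, v_{t-1} + (1 - b_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - b_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - b_2^t} \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

The `m_prev` and `v_prev` attributes listed above presumably hold the running estimates $m_{t-1}$ and $v_{t-1}$.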

AdamW(params, b1=0.9, b2=0.999, **kwargs)

Bases: Adam

Initializes the AdamW class with the given parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list` | A list of parameters | *required* |
| `b1` | `float` | Exponential decay rate for the first moment estimates | `0.9` |
| `b2` | `float` | Exponential decay rate for the second moment estimates | `0.999` |
| `**kwargs` | | Additional keyword arguments | `{}` |

Example

    params = [...]  # list of model parameters
    adamw = AdamW(params, b1=0.9, b2=0.999)

Nadam(params, b1=0.99, b2=0.999, **kwargs)

Bases: Adam

An optimizer that implements the Nesterov Adam algorithm[^1]

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list` | A list of parameters | *required* |
| `b1` | `float` | Exponential decay rate for the first moment estimates | `0.99` |
| `b2` | `float` | Exponential decay rate for the second moment estimates | `0.999` |
| `**kwargs` | | Additional keyword arguments | `{}` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `t` | `TensorSharedVariable` | Time step |
| `m_prev` | `list[TensorSharedVariable]` | Previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | Previous time step velocity |


  1. Dozat, T., 2016. Incorporating Nesterov Momentum into Adam. http://cs229.stanford.edu/proj2015/054_report.pdf
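
For reference, Nadam applies a Nesterov-style look-ahead to Adam's first-moment estimate. Using the same notation as in the Adam section above, a sketch of the published rule (not necessarily the exact code path) is:

$$
\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}
\left( b_1\, \hat{m}_t + \frac{(1 - b_1)\, g_t}{1 - b_1^t} \right)
$$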

Adamax(params, b1=0.9, b2=0.999, **kwargs)

Bases: Adam

An optimizer that implements the Adamax algorithm[^1], a variant of the Adam algorithm based on the infinity norm

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list` | A list of parameters | *required* |
| `b1` | `float` | Exponential decay rate for the first moment estimates | `0.9` |
| `b2` | `float` | Decay rate for the exponentially weighted infinity norm | `0.999` |
| `**kwargs` | | Additional keyword arguments | `{}` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `t` | `TensorSharedVariable` | Time step |
| `m_prev` | `list[TensorSharedVariable]` | Previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | Previous time step velocity |


  1. Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980 
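
For reference, Adamax replaces Adam's second-moment estimate with an exponentially weighted infinity norm $u_t$, which avoids the need for an $\epsilon$ term in the denominator. A sketch of the rule from the cited paper, using the same notation as above:

$$
\begin{aligned}
m_t &= b_1\, m_{t-1} + (1 - b_1)\, g_t \\
u_t &= \max\!\big(b_2\, u_{t-1},\; |g_t|\big) \\
\theta_t &= \theta_{t-1} - \frac{\eta}{1 - b_1^t} \cdot \frac{m_t}{u_t}
\end{aligned}
$$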

Adadelta(params, rho=0.95, **kwargs)

Bases: Optimizer

An optimizer that implements the Adadelta algorithm[^1]

Adadelta is a stochastic gradient descent method that adapts the learning rate per dimension to address two drawbacks:

  • The continual decay of learning rates throughout training
  • The need for a manually selected global learning rate

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | A list of shared variables representing the parameters of the model | *required* |
| `rho` | `float` | Decay rate for the running averages of squared gradients and updates | `0.95` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `accumulator` | `list[TensorSharedVariable]` | A list of gradient accumulators |
| `delta` | `list[TensorSharedVariable]` | A list of adaptive differences between gradients |


  1. Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701 
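
For reference, the update rule from the cited paper keeps running averages of squared gradients and of squared updates (presumably backing the `accumulator` and `delta` attributes above), so no global learning rate is needed:

$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\Delta\theta_t &= -\,\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
E[\Delta\theta^2]_t &= \rho\, E[\Delta\theta^2]_{t-1} + (1 - \rho)\, \Delta\theta_t^2 \\
\theta_t &= \theta_{t-1} + \Delta\theta_t
\end{aligned}
$$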

RProp(params, inc=1.05, dec=0.5, bounds=[1e-06, 50.0], **kwargs)

Bases: Optimizer

An optimizer that implements the Rprop algorithm[^1]

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | A list of TensorSharedVariable objects representing the parameters of the model | *required* |
| `inc` | `float` | Factor by which the step size is increased when the gradient keeps the same sign | `1.05` |
| `dec` | `float` | Factor by which the step size is decreased when the gradient changes sign | `0.5` |
| `bounds` | `list[float]` | Minimum and maximum bounds on the step-size factor | `[1e-06, 50.0]` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `factor` | `list[TensorVariable]` | A list of learning rate factor multipliers (init=1.0) |
| `ghat` | `list[TensorVariable]` | A list of previous step gradients |


  1. Igel, C., & Hüsken, M. (2003). Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing, 50, 105-123. 
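
For reference, Rprop adapts a per-parameter step size from the sign of successive gradients and ignores the gradient magnitude. A sketch of the rule expressed with the `inc`, `dec` and `bounds` arguments above (the class appears to realise it through the `factor` multipliers):

$$
\Delta_i^{(t)} =
\begin{cases}
\min\!\big(\Delta_i^{(t-1)} \cdot \mathrm{inc},\; \mathrm{bounds}_{\max}\big) & \text{if } g_i^{(t)}\, g_i^{(t-1)} > 0 \\[4pt]
\max\!\big(\Delta_i^{(t-1)} \cdot \mathrm{dec},\; \mathrm{bounds}_{\min}\big) & \text{if } g_i^{(t)}\, g_i^{(t-1)} < 0 \\[4pt]
\Delta_i^{(t-1)} & \text{otherwise}
\end{cases}
\qquad
\theta_i^{(t)} = \theta_i^{(t-1)} - \operatorname{sign}\!\big(g_i^{(t)}\big)\, \Delta_i^{(t)}
$$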

RMSProp(params, rho=0.9, **kwargs)

Bases: Optimizer

An optimizer that implements the RMSprop algorithm[^1]

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | Parameters of the model | *required* |
| `rho` | `float` | Discounting factor for the running average of squared gradients | `0.9` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `accumulator` | `TensorVariable` | Gradient accumulator |


  1. Hinton, G. E. (2012). rmsprop: Divide the gradient by a running average of its recent magnitude. Retrieved from http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf 
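
For reference, RMSprop divides the gradient by a running average of its recent magnitude (the `accumulator` above), as described in the cited lecture notes; $g_t$, $\eta$ and $\epsilon$ are as in the Adam section:

$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\; g_t
\end{aligned}
$$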

Momentum(params, mu=0.9, **kwargs)

Bases: Optimizer

Initializes the Momentum optimizer[^1]

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | A list of parameters of the model | *required* |
| `mu` | `float` | Momentum factor that accelerates updates in the relevant direction and dampens oscillations | `0.9` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `velocity` | `list[TensorSharedVariable]` | The momentum velocity |


  1. Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf 
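
For reference, classical momentum accumulates a velocity vector (the `velocity` attribute above) that smooths successive gradient steps, with $g_t$ the gradient and $\eta$ the learning rate as before:

$$
\begin{aligned}
v_t &= \mu\, v_{t-1} - \eta\, g_t \\
\theta_t &= \theta_{t-1} + v_t
\end{aligned}
$$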

NAG(params, mu=0.99, **kwargs)

Bases: Momentum

An optimizer that implements the Nesterov Accelerated Gradient algorithm[^1]

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | A list of parameters of the model | *required* |
| `mu` | `float` | The acceleration factor in the relevant direction | `0.99` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `t` | `TensorSharedVariable` | The momentum time step |
| `velocity` | `list[TensorSharedVariable]` | The momentum velocity |


  1. Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf 
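
For reference, Nesterov Accelerated Gradient evaluates the gradient at the look-ahead point $\theta_{t-1} + \mu v_{t-1}$ rather than at $\theta_{t-1}$. A sketch of the formulation in the cited paper (implementations often rewrite it in terms of the gradient at the current parameters):

$$
\begin{aligned}
v_t &= \mu\, v_{t-1} - \eta\, \nabla f\!\big(\theta_{t-1} + \mu\, v_{t-1}\big) \\
\theta_t &= \theta_{t-1} + v_t
\end{aligned}
$$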

AdaGrad(params, **kwargs)

Bases: Optimizer

An optimizer that implements the Adagrad algorithm[^1]

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | Parameters of the model | *required* |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `accumulator` | `list[TensorSharedVariable]` | Gradient accumulators |


  1. Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf 
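
For reference, Adagrad accumulates the squared gradients of each parameter over the whole run (the `accumulator` attribute above), so frequently updated parameters receive progressively smaller steps; the exact placement of $\epsilon$ is an implementation detail:

$$
\begin{aligned}
G_t &= G_{t-1} + g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{G_t} + \epsilon}\; g_t
\end{aligned}
$$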

SGD(params, **kwargs)

Bases: Optimizer

An optimizer that implements the stochastic gradient descent algorithm

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | Parameters of the model | *required* |
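
For reference, plain stochastic gradient descent applies a fixed-size step along the negative gradient, with the learning rate $\eta$ supplied by the training loop:

$$
\theta_t = \theta_{t-1} - \eta\, g_t
$$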

SQNBFGS(params, config=None, **kwargs)

Bases: Optimizer

Initializes the SQNBFGS optimizer object[^1]

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `list[TensorSharedVariable]` | The parameters of the model | *required* |
| `config` | `config` | The pycmtensor config object | `None` |

  1. Byrd, R. H., Hansen, S. L., Nocedal, J., & Singer, Y. (2016). A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2), 1008-1031. 
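
A minimal instantiation sketch, assuming `params` is a list of aesara shared variables; the variable name below is hypothetical and the default configuration is used:

```python
# Hypothetical sketch: construct the SQNBFGS optimizer with the default config.
import aesara
import numpy as np

from pycmtensor.optimizers import SQNBFGS

params = [aesara.shared(np.float64(0.0), name="b_cost")]  # hypothetical parameter list
sqn = SQNBFGS(params)  # config defaults to None, per the signature above
```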

clip(param, min, max)

Clips the value of a parameter within a specified range.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `param` | `float` | The parameter value to be clipped | *required* |
| `min` | `float` | The minimum value that the parameter can take | *required* |
| `max` | `float` | The maximum value that the parameter can take | *required* |

Returns:

| Type | Description |
| --- | --- |
| `float` | The clipped value of the parameter. |
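
A minimal usage sketch, taking the documented float-in/float-out behaviour at face value; the bounds shown are the Rprop defaults above, used here only as an illustration:

```python
from pycmtensor.optimizers import clip

# keep a step-size factor inside the documented Rprop bounds
factor = clip(75.0, 1e-06, 50.0)  # expected to be capped at the upper bound, 50.0
```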