# optimizers.py

`pycmtensor.optimizers`

PyCMTensor optimizers module
## Optimizer(name, epsilon=1e-08, **kwargs)

Base optimizer class
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | name of the optimizer | *required* |
| `epsilon` | `float` | small value to avoid division by zero. Defaults to `1e-08`. | `1e-08` |
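For the adaptive optimizers below, `epsilon` sits in the denominator of the parameter update so the step stays finite when the accumulated second-moment term is close to zero. Schematically (a sketch of the common pattern, not the exact expression of every subclass):

$$
\theta \leftarrow \theta - \frac{\eta\, g}{\sqrt{v} + \epsilon}
$$

where $g$ is the gradient, $v$ an accumulated squared-gradient term, and $\eta$ the learning rate.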
        
## Adam(params, b1=0.9, b2=0.999, **kwargs)

Bases: `Optimizer`

An optimizer that implements the Adam algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.9. | `0.9` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | time step |
| `m_prev` | `list[TensorSharedVariable]` | previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | previous time step velocity |

- Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
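The attributes correspond to the standard Adam recursions from the cited paper; as a reference sketch (the implementation's exact symbolic expressions may differ slightly), with `t` the step counter and `m_prev`, `v_prev` holding $m_{t-1}$ and $v_{t-1}$:

$$
\begin{aligned}
m_t &= b_1\, m_{t-1} + (1 - b_1)\, g_t \\
v_t &= b_2\, v_{t-1} + (1 - b_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - b_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - b_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$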
 
## AdamW(params, b1=0.9, b2=0.999, **kwargs)

Bases: `Adam`

Initializes the AdamW class with the given parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.9. | `0.9` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Example:

    params = [...]  # list of parameters
    adamw = AdamW(params, b1=0.9, b2=0.999)
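In the usual AdamW formulation, the moment recursions are the same as in `Adam`, but weight decay is applied directly to the parameters instead of being folded into the gradient. A sketch of that idea, where $\lambda$ is a weight-decay coefficient (whether and how $\lambda$ is exposed here, for example through `**kwargs`, is not shown in the signature above):

$$
\theta_t = \theta_{t-1} - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_{t-1} \right)
$$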
## Nadam(params, b1=0.99, b2=0.999, **kwargs)

Bases: `Adam`

An optimizer that implements the Nesterov Adam algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.99. | `0.99` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | time step |
| `m_prev` | `list[TensorSharedVariable]` | previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | previous time step velocity |

- Dozat, T., 2016. Incorporating Nesterov Momentum into Adam. http://cs229.stanford.edu/proj2015/054_report.pdf
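Nadam applies a Nesterov-style look-ahead to the first-moment term of Adam. A common way to write the update, ignoring the momentum schedule used in the cited report (the implementation's bias-correction details may differ):

$$
\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \left( b_1\, \hat{m}_t + \frac{(1 - b_1)\, g_t}{1 - b_1^t} \right)
$$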
 
## Adamax(params, b1=0.9, b2=0.999, **kwargs)

Bases: `Adam`

An optimizer that implements the Adamax algorithm[^1]. It is a variant of the Adam algorithm based on the infinity norm.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.9. | `0.9` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | time step |
| `m_prev` | `list[TensorSharedVariable]` | previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | previous time step velocity |

- Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
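Adamax replaces Adam's second-moment estimate with an exponentially weighted infinity norm, so the denominator needs no bias correction. A reference sketch from the cited paper:

$$
\begin{aligned}
m_t &= b_1\, m_{t-1} + (1 - b_1)\, g_t \\
u_t &= \max\!\left(b_2\, u_{t-1},\, |g_t|\right) \\
\theta_t &= \theta_{t-1} - \frac{\eta}{1 - b_1^t} \cdot \frac{m_t}{u_t}
\end{aligned}
$$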
 
## Adadelta(params, rho=0.95, **kwargs)

Bases: `Optimizer`

An optimizer that implements the Adadelta algorithm[^1]

Adadelta is a stochastic gradient descent method that adapts the learning rate per dimension to address two drawbacks:

- the continual decay of learning rates throughout training
- the need for a manually selected global learning rate
 
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of shared variables representing the parameters of the model. | *required* |
| `rho` | `float` | A float representing the decay rate for the learning rate. Defaults to 0.95. | `0.95` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `accumulator` | `list[TensorSharedVariable]` | A list of gradient accumulators. |
| `delta` | `list[TensorSharedVariable]` | A list of adaptive differences between gradients. |

- Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701
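The two attributes track the two running averages of the cited paper: `accumulator` holds the decayed squared gradients and `delta` the decayed squared updates, which together replace the global learning rate. A reference sketch:

$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\Delta\theta_t &= -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
E[\Delta\theta^2]_t &= \rho\, E[\Delta\theta^2]_{t-1} + (1 - \rho)\, \Delta\theta_t^2
\end{aligned}
$$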
 
## RProp(params, inc=1.05, dec=0.5, bounds=[1e-06, 50.0], **kwargs)

Bases: `Optimizer`

An optimizer that implements the Rprop algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of TensorSharedVariable objects representing the parameters of the model. | *required* |
| `inc` | `float` | Factor by which the step size is increased when the gradient keeps the same sign. | `1.05` |
| `dec` | `float` | Factor by which the step size is decreased when the gradient changes sign. | `0.5` |
| `bounds` | `list[float]` | Minimum and maximum bounds for the step-size factor. | `[1e-06, 50.0]` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `factor` | `list[TensorVariable]` | A list of learning rate factor multipliers (init=1.0). |
| `ghat` | `list[TensorVariable]` | A list of previous step gradients. |

- Igel, C., & Hüsken, M. (2003). Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing, 50, 105-123.
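The `inc`, `dec`, and `bounds` arguments drive a sign-based step-size rule: each parameter's factor grows while its gradient keeps the same sign, shrinks when the sign flips, and is clipped to `bounds`. A minimal NumPy sketch of that rule (illustrative only; the class builds symbolic tensor updates, and the function and variable names here are hypothetical):

    import numpy as np

    def rprop_factor_update(factor, grad, prev_grad, inc=1.05, dec=0.5,
                            bounds=(1e-06, 50.0)):
        """Sign-based factor update; not the library's actual tensor graph."""
        same_sign = grad * prev_grad > 0               # gradient kept its direction
        factor = np.where(same_sign, factor * inc, factor * dec)
        return np.clip(factor, bounds[0], bounds[1])   # keep factor within bounds

    # toy usage: one gradient keeps its sign, the other flips
    factor = np.ones(2)
    grad, prev_grad = np.array([0.3, -0.2]), np.array([0.1, 0.4])
    print(rprop_factor_update(factor, grad, prev_grad))  # [1.05 0.5 ]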
 
## RMSProp(params, rho=0.9, **kwargs)

Bases: `Optimizer`

An optimizer that implements the RMSprop algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | Parameters of the model. | *required* |
| `rho` | `float` | Discounting factor for the moving average of squared gradients. Defaults to 0.9. | `0.9` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `accumulator` | `TensorVariable` | Gradient accumulator. |

- Hinton, G. E. (2012). rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
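RMSprop divides the gradient by the root of a single discounted average of squared gradients, which is what `accumulator` stores. A reference sketch (placement of $\epsilon$ may differ in the implementation):

$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\; g_t
\end{aligned}
$$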
 
## Momentum(params, mu=0.9, **kwargs)

Bases: `Optimizer`

Initializes the Momentum optimizer[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of parameters of the model. | *required* |
| `mu` | `float` | The acceleration factor in the relevant direction; dampens oscillations. Defaults to 0.9. | `0.9` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `velocity` | `list[TensorSharedVariable]` | The momentum velocity. |

- Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
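The `velocity` attribute accumulates a decaying sum of past gradient steps. A reference sketch of classical momentum as described in the cited paper:

$$
\begin{aligned}
v_t &= \mu\, v_{t-1} - \eta\, g_t \\
\theta_t &= \theta_{t-1} + v_t
\end{aligned}
$$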
 
## NAG(params, mu=0.99, **kwargs)

Bases: `Momentum`

An optimizer that implements the Nesterov Accelerated Gradient algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of parameters of the model. | *required* |
| `mu` | `float` | The acceleration factor in the relevant direction. Defaults to 0.99. | `0.99` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | The momentum time step. |
| `velocity` | `list[TensorSharedVariable]` | The momentum velocity. |

- Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
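NAG differs from classical momentum by evaluating the gradient at the look-ahead point $\theta_{t-1} + \mu\, v_{t-1}$. A reference sketch of the formulation in the cited paper:

$$
\begin{aligned}
v_t &= \mu\, v_{t-1} - \eta\, \nabla f(\theta_{t-1} + \mu\, v_{t-1}) \\
\theta_t &= \theta_{t-1} + v_t
\end{aligned}
$$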
 
## AdaGrad(params, **kwargs)

Bases: `Optimizer`

An optimizer that implements the Adagrad algorithm[^1]

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | parameters of the model | *required* |

Attributes:

| Name | Type | Description |
|---|---|---|
| `accumulator` | `list[TensorSharedVariable]` | gradient accumulators |

- Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
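Each parameter's effective learning rate shrinks with the running sum of its squared gradients, which is what the `accumulator` attribute stores. A reference sketch:

$$
\begin{aligned}
G_t &= G_{t-1} + g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{G_t} + \epsilon}\; g_t
\end{aligned}
$$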
 
## SGD(params, **kwargs)

Bases: `Optimizer`

An optimizer that implements the stochastic gradient descent algorithm
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | parameters of the model | *required* |
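The update is the plain gradient step $\theta_t = \theta_{t-1} - \eta\, g_t$. A minimal instantiation sketch following the `AdamW` example above (the contents of `params` come from the model being estimated):

    params = [...]  # list of model parameters
    sgd = SGD(params)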
## SQNBFGS(params, config=None, **kwargs)

Bases: `Optimizer`

Initializes the SQNBFGS optimizer object[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | The parameters of the model. | *required* |
| `config` | `config` | The pycmtensor config object. | `None` |

- Byrd, R. H., Hansen, S. L., Nocedal, J., & Singer, Y. (2016). A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2), 1008-1031.
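A minimal instantiation sketch, following the pattern of the `AdamW` example above; here `config` is assumed to be an existing pycmtensor config object, and how it is obtained depends on the rest of the model setup:

    params = [...]                        # list of model parameters
    sqn = SQNBFGS(params, config=config)  # config: assumed pycmtensor config object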
 
## clip(param, min, max)

Clips the value of a parameter within a specified range.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `param` | `float` | The parameter value to be clipped. | *required* |
| `min` | `float` | The minimum value that the parameter can take. | *required* |
| `max` | `float` | The maximum value that the parameter can take. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| | `float` | The clipped value of the parameter. |
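The effect is the usual two-sided clamp. A minimal illustration of the semantics only (the library function may operate on tensor expressions rather than plain floats):

    # Equivalent clamp semantics; bounds renamed to avoid shadowing Python's
    # built-in min/max. Illustrative, not the library implementation.
    def clamp(value, lo, hi):
        return max(lo, min(value, hi))

    print(clamp(75.0, 1e-06, 50.0))   # 50.0  -> clipped to the upper bound
    print(clamp(0.0, 1e-06, 50.0))    # 1e-06 -> clipped to the lower bound
    print(clamp(1.05, 1e-06, 50.0))   # 1.05  -> already within bounds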