# optimizers.py

`pycmtensor.optimizers`

PyCMTensor optimizers module
## Optimizer(name, epsilon=1e-08, **kwargs)

Base optimizer class
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | name of the optimizer | *required* |
| `epsilon` | `float` | small value to avoid division by zero. Defaults to `1e-08`. | `1e-08` |
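For the adaptive optimizers below, `epsilon` sits in the denominator of the parameter update so the step stays finite when the accumulated second-moment term is close to zero. Schematically (a sketch of the common pattern, not the exact expression of every subclass):

$$
\theta \leftarrow \theta - \frac{\eta\, g}{\sqrt{v} + \epsilon}
$$

where $g$ is the gradient, $v$ an accumulated squared-gradient term, and $\eta$ the learning rate.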
        
## Adam(params, b1=0.9, b2=0.999, **kwargs)

Bases: `Optimizer`

An optimizer that implements the Adam algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.9. | `0.9` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | time step |
| `m_prev` | `list[TensorSharedVariable]` | previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | previous time step velocity |

- Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
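The attributes correspond to the standard Adam recursions from the cited paper; as a reference sketch (the implementation's exact symbolic expressions may differ slightly), with `t` the step counter and `m_prev`, `v_prev` holding $m_{t-1}$ and $v_{t-1}$:

$$
\begin{aligned}
m_t &= b_1\, m_{t-1} + (1 - b_1)\, g_t \\
v_t &= b_2\, v_{t-1} + (1 - b_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - b_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - b_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$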
 
## AdamW(params, b1=0.9, b2=0.999, **kwargs)

Bases: `Adam`

Initializes the AdamW class with the given parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.9. | `0.9` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Example:

    params = [...]  # list of parameters
    adamw = AdamW(params, b1=0.9, b2=0.999)
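In the usual AdamW formulation, the moment recursions are the same as in `Adam`, but weight decay is applied directly to the parameters instead of being folded into the gradient. A sketch of that idea, where $\lambda$ is a weight-decay coefficient (whether and how $\lambda$ is exposed here, for example through `**kwargs`, is not shown in the signature above):

$$
\theta_t = \theta_{t-1} - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_{t-1} \right)
$$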
## Nadam(params, b1=0.99, b2=0.999, **kwargs)

Bases: `Adam`

An optimizer that implements the Nesterov Adam algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.99. | `0.99` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | time step |
| `m_prev` | `list[TensorSharedVariable]` | previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | previous time step velocity |

- Dozat, T., 2016. Incorporating Nesterov Momentum into Adam. http://cs229.stanford.edu/proj2015/054_report.pdf
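Nadam applies a Nesterov-style look-ahead to the first-moment term of Adam. A common way to write the update, ignoring the momentum schedule used in the cited report (the implementation's bias-correction details may differ):

$$
\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \left( b_1\, \hat{m}_t + \frac{(1 - b_1)\, g_t}{1 - b_1^t} \right)
$$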
 
## Adamax(params, b1=0.9, b2=0.999, **kwargs)

Bases: `Adam`

An optimizer that implements the Adamax algorithm[^1]. It is a variant of the Adam algorithm based on the infinity norm.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list` | A list of parameters. | *required* |
| `b1` | `float` | The value of the b1 parameter. Defaults to 0.9. | `0.9` |
| `b2` | `float` | The value of the b2 parameter. Defaults to 0.999. | `0.999` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | time step |
| `m_prev` | `list[TensorSharedVariable]` | previous time step momentum |
| `v_prev` | `list[TensorSharedVariable]` | previous time step velocity |

- Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
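Adamax replaces Adam's second-moment estimate with an exponentially weighted infinity norm, so the denominator needs no bias correction. A reference sketch from the cited paper:

$$
\begin{aligned}
m_t &= b_1\, m_{t-1} + (1 - b_1)\, g_t \\
u_t &= \max\!\left(b_2\, u_{t-1},\, |g_t|\right) \\
\theta_t &= \theta_{t-1} - \frac{\eta}{1 - b_1^t} \cdot \frac{m_t}{u_t}
\end{aligned}
$$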
 
## Adadelta(params, rho=0.95, **kwargs)

Bases: `Optimizer`

An optimizer that implements the Adadelta algorithm[^1]

Adadelta is a stochastic gradient descent method that adapts the learning rate per dimension to address two drawbacks:

- the continual decay of learning rates throughout training
- the need for a manually selected global learning rate
 
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of shared variables representing the parameters of the model. | *required* |
| `rho` | `float` | A float representing the decay rate for the learning rate. Defaults to 0.95. | `0.95` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `accumulator` | `list[TensorSharedVariable]` | A list of gradient accumulators. |
| `delta` | `list[TensorSharedVariable]` | A list of adaptive differences between gradients. |

- Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701
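The two attributes track the two running averages of the cited paper: `accumulator` holds the decayed squared gradients and `delta` the decayed squared updates, which together replace the global learning rate. A reference sketch:

$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\Delta\theta_t &= -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
E[\Delta\theta^2]_t &= \rho\, E[\Delta\theta^2]_{t-1} + (1 - \rho)\, \Delta\theta_t^2
\end{aligned}
$$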
 
## RProp(params, inc=1.05, dec=0.5, bounds=[1e-06, 50.0], **kwargs)

Bases: `Optimizer`

An optimizer that implements the Rprop algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of TensorSharedVariable objects representing the parameters of the model. | *required* |
| `inc` | `float` | Factor by which the step size is increased when the gradient keeps the same sign. | `1.05` |
| `dec` | `float` | Factor by which the step size is decreased when the gradient changes sign. | `0.5` |
| `bounds` | `list[float]` | Minimum and maximum bounds for the step-size factor. | `[1e-06, 50.0]` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `factor` | `list[TensorVariable]` | A list of learning rate factor multipliers (init=1.0). |
| `ghat` | `list[TensorVariable]` | A list of previous step gradients. |

- Igel, C., & Hüsken, M. (2003). Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing, 50, 105-123.
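The `inc`, `dec`, and `bounds` arguments drive a sign-based step-size rule: each parameter's factor grows while its gradient keeps the same sign, shrinks when the sign flips, and is clipped to `bounds`. A minimal NumPy sketch of that rule (illustrative only; the class builds symbolic tensor updates, and the function and variable names here are hypothetical):

    import numpy as np

    def rprop_factor_update(factor, grad, prev_grad, inc=1.05, dec=0.5,
                            bounds=(1e-06, 50.0)):
        """Sign-based factor update; not the library's actual tensor graph."""
        same_sign = grad * prev_grad > 0               # gradient kept its direction
        factor = np.where(same_sign, factor * inc, factor * dec)
        return np.clip(factor, bounds[0], bounds[1])   # keep factor within bounds

    # toy usage: one gradient keeps its sign, the other flips
    factor = np.ones(2)
    grad, prev_grad = np.array([0.3, -0.2]), np.array([0.1, 0.4])
    print(rprop_factor_update(factor, grad, prev_grad))  # [1.05 0.5 ]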
 
## RMSProp(params, rho=0.9, **kwargs)

Bases: `Optimizer`

An optimizer that implements the RMSprop algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | Parameters of the model. | *required* |
| `rho` | `float` | Discounting factor for the moving average of squared gradients. Defaults to 0.9. | `0.9` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `accumulator` | `TensorVariable` | Gradient accumulator. |

- Hinton, G. E. (2012). rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
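RMSprop divides the gradient by the root of a single discounted average of squared gradients, which is what `accumulator` stores. A reference sketch (placement of $\epsilon$ may differ in the implementation):

$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\; g_t
\end{aligned}
$$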
 
## Momentum(params, mu=0.9, **kwargs)

Bases: `Optimizer`

Initializes the Momentum optimizer[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of parameters of the model. | *required* |
| `mu` | `float` | The acceleration factor in the relevant direction; dampens oscillations. Defaults to 0.9. | `0.9` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `velocity` | `list[TensorSharedVariable]` | The momentum velocity. |

- Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
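The `velocity` attribute accumulates a decaying sum of past gradient steps. A reference sketch of classical momentum as described in the cited paper:

$$
\begin{aligned}
v_t &= \mu\, v_{t-1} - \eta\, g_t \\
\theta_t &= \theta_{t-1} + v_t
\end{aligned}
$$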
 
## NAG(params, mu=0.99, **kwargs)

Bases: `Momentum`

An optimizer that implements the Nesterov Accelerated Gradient algorithm[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | A list of parameters of the model. | *required* |
| `mu` | `float` | The acceleration factor in the relevant direction. Defaults to 0.99. | `0.99` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `t` | `TensorSharedVariable` | The momentum time step. |
| `velocity` | `list[TensorSharedVariable]` | The momentum velocity. |

- Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
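NAG differs from classical momentum by evaluating the gradient at the look-ahead point $\theta_{t-1} + \mu\, v_{t-1}$. A reference sketch of the formulation in the cited paper:

$$
\begin{aligned}
v_t &= \mu\, v_{t-1} - \eta\, \nabla f(\theta_{t-1} + \mu\, v_{t-1}) \\
\theta_t &= \theta_{t-1} + v_t
\end{aligned}
$$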
 
## AdaGrad(params, **kwargs)

Bases: `Optimizer`

An optimizer that implements the Adagrad algorithm[^1]

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | parameters of the model | *required* |

Attributes:

| Name | Type | Description |
|---|---|---|
| `accumulator` | `list[TensorSharedVariable]` | gradient accumulators |

- Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
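Each parameter's effective learning rate shrinks with the running sum of its squared gradients, which is what the `accumulator` attribute stores. A reference sketch:

$$
\begin{aligned}
G_t &= G_{t-1} + g_t^2 \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{G_t} + \epsilon}\; g_t
\end{aligned}
$$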
 
## SGD(params, **kwargs)

Bases: `Optimizer`

An optimizer that implements the stochastic gradient descent algorithm
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | parameters of the model | *required* |
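The update is the plain gradient step $\theta_t = \theta_{t-1} - \eta\, g_t$. A minimal instantiation sketch following the `AdamW` example above (the contents of `params` come from the model being estimated):

    params = [...]  # list of model parameters
    sgd = SGD(params)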
## SQNBFGS(params, config=None, **kwargs)

Bases: `Optimizer`

Initializes the SQNBFGS optimizer object[^1]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `list[TensorSharedVariable]` | The parameters of the model. | *required* |
| `config` | `config` | The pycmtensor config object. | `None` |

- Byrd, R. H., Hansen, S. L., Nocedal, J., & Singer, Y. (2016). A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2), 1008-1031.
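A minimal instantiation sketch, following the pattern of the `AdamW` example above; here `config` is assumed to be an existing pycmtensor config object, and how it is obtained depends on the rest of the model setup:

    params = [...]                        # list of model parameters
    sqn = SQNBFGS(params, config=config)  # config: assumed pycmtensor config object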
 
## clip(param, min, max)

Clips the value of a parameter within a specified range.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `param` | `float` | The parameter value to be clipped. | *required* |
| `min` | `float` | The minimum value that the parameter can take. | *required* |
| `max` | `float` | The maximum value that the parameter can take. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| | `float` | The clipped value of the parameter. |
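The effect is the usual two-sided clamp. A minimal illustration of the semantics only (the library function may operate on tensor expressions rather than plain floats):

    # Equivalent clamp semantics; bounds renamed to avoid shadowing Python's
    # built-in min/max. Illustrative, not the library implementation.
    def clamp(value, lo, hi):
        return max(lo, min(value, hi))

    print(clamp(75.0, 1e-06, 50.0))   # 50.0  -> clipped to the upper bound
    print(clamp(0.0, 1e-06, 50.0))    # 1e-06 -> clipped to the lower bound
    print(clamp(1.05, 1e-06, 50.0))   # 1.05  -> already within bounds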