lr-range-test’s documentation

This is a library for LR range tuning, implementing the method proposed in Cyclical Learning Rates for Training Neural Networks. It can be used with any combination of PyTorch models and optimizers and supports searching for good values of weight decay.

Usage

Although the library provides a lower-level interface through the lr_range_test.lr_range.InteractiveLRRangeTest and lr_range_test.lr_range.AutomaticLRRangeTest classes, a simpler and easier to use interface is provided via lr_range_test.lr_range_test().

Sample usage for LR values between 1e-7 and 1e1. The LR is varied over the course of 200 steps and the test is run twice, once for each of two weight decay values.

import matplotlib
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

from lr_range_test import lr_range_test


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

# training settings
batch_size = 64
use_cuda = torch.cuda.is_available()
torch.manual_seed(1)
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 16, 'pin_memory': True} if use_cuda else {}

# create the loader for MNIST data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)

# define model, optimizer and loss
model = Net().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.NLLLoss()

matplotlib.use('TkAgg')  # this is needed so we are able to interact with the plot
# run the test for 200 steps with a progress bar, once per weight decay value
result = lr_range_test(model=model, optimizer=optimizer, loss_fn=loss_fn,
                       lr_min=1e-7, lr_max=1e1, train_loader=train_loader,
                       num_steps=200, automatic=False, pbar=True, wd_values=[0.0, 1e-6])
print(result)

For automatic mode, simply change the automatic flag to True.

result = lr_range_test(model=model, optimizer=optimizer, loss_fn=loss_fn,
                       lr_min=1e-7, lr_max=1e1, train_loader=train_loader,
                       num_steps=200, automatic=True, pbar=True, wd_values=[0.0, 1e-6])

Interactive mode

In interactive mode, the loss is plotted against the learning rate. A vertical line is drawn to indicate the point of steepest improvement in the metric. The user can then drag to select an interval for the desired learning rate, which is entered in the text boxes above the plot.

If the user wants to redo the plot with different minimum/maximum LR values, or with a different value for weight decay, they can use the boxes in the top-left corner to input these values and click PLOT. After they have selected satisfactory values, they can return those values using the SAVE button.

Note

The matplotlib backend should be set to an interactive one (e.g. TkAgg) for interactive mode to work. To do this, call matplotlib.use('TkAgg') before calling the LR test function.

Automatic mode

In this mode, no plot is displayed and the lr_max value returned by the function is the value corresponding to the steepest improvement in the metric used. This is equivalent to the x coordinate of the red line displayed in interactive mode.
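As a rough illustration of what "steepest improvement" can mean, the sketch below picks the LR at which a recorded loss curve falls fastest per unit of log10(LR). It is not the library's implementation: the function name steepest_improvement_lr and the sample values are hypothetical, and a descending metric (such as a loss) is assumed.

```python
import math


def steepest_improvement_lr(lrs, losses):
    """Return the LR ending the interval where the loss drops fastest.

    Hypothetical helper: slopes are finite differences of the loss
    against log10(lr), and the most negative slope wins.
    Returns None if the loss never decreases.
    """
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        d_log_lr = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / d_log_lr
        if slope < best_slope:
            best_lr, best_slope = lrs[i], slope
    return best_lr


# made-up, already smoothed loss curve
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.30, 2.25, 1.90, 0.90, 0.70, 3.50]
print(steepest_improvement_lr(lrs, losses))
```

Here the largest drop happens between 1e-3 and 1e-2, so the sketch reports the upper end of that interval.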

If multiple weight decay values are used, the one for which the optimal LR value is greatest is returned.
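The weight decay choice can be sketched as follows, assuming the steepest-improvement LR has already been found for each weight decay value. The helper pick_weight_decay and the numbers are hypothetical and only illustrate the selection rule stated above.

```python
def pick_weight_decay(best_lr_by_wd):
    """Return (weight_decay, lr) for the entry with the largest LR.

    best_lr_by_wd maps each tested weight decay value to the LR of
    steepest improvement found for it (hypothetical input format).
    """
    wd = max(best_lr_by_wd, key=best_lr_by_wd.get)
    return wd, best_lr_by_wd[wd]


# made-up results for two weight decay values
best_lr_by_wd = {0.0: 3e-3, 1e-6: 1e-2}
wd, lr = pick_weight_decay(best_lr_by_wd)
print(wd, lr)
```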

API

Simple

lr_range_test.lr_range_test(optimizer, model, train_loader, loss_fn, eval_metric=None, test_loader=None, lr_min=1e-07, lr_max=10.0, num_steps=50, smooth_f=0.05, diverge_th=5.0, wd_values=None, pbar=False, automatic=False, descending=True, device='cuda')[source]

The function expects a model and optimizer for which to perform the test. This model is optimized with respect to a loss function loss_fn. The data is loaded from a given iterable (or a standard PyTorch DataLoader) called train_loader. If a test_loader is not specified, the loss is calculated as the batch loss after each step. If test_loader is specified, the loss is computed and averaged over the entirety of the test data.

The learning rate is increased exponentially from lr_min to lr_max over the course of num_steps iterations, and the recorded loss is smoothed with an exponential moving average with an alpha coefficient of smooth_f. The training is stopped early if the loss diverges by a factor of more than diverge_th from the best recorded loss.
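The schedule and smoothing can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the library's code: the helper names are hypothetical, smooth_f is assumed to weight the newest observation, and divergence is assumed to mean that the smoothed (positive) loss exceeds diverge_th times the best value seen so far.

```python
def exponential_lrs(lr_min, lr_max, num_steps):
    """Geometrically spaced LRs from lr_min to lr_max (num_steps >= 2)."""
    ratio = (lr_max / lr_min) ** (1.0 / (num_steps - 1))
    return [lr_min * ratio ** i for i in range(num_steps)]


def smooth_and_truncate(losses, smooth_f=0.05, diverge_th=5.0):
    """EMA-smooth the losses and stop at the first diverging value.

    Assumed convention: new = smooth_f * loss + (1 - smooth_f) * previous.
    The diverging value itself is not kept.
    """
    smoothed = []
    best = float("inf")
    for loss in losses:
        if smoothed:
            loss = smooth_f * loss + (1.0 - smooth_f) * smoothed[-1]
        if loss > diverge_th * best:
            break  # loss has diverged, stop early
        smoothed.append(loss)
        best = min(best, loss)
    return smoothed


schedule = exponential_lrs(1e-7, 1e1, 9)  # one LR per decade
print(schedule)
print(smooth_and_truncate([1.0, 1.0, 1.0, 200.0, 500.0], smooth_f=0.5))
```

With these made-up losses, the fourth smoothed value is far above five times the best one, so only the first three points are kept.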

A custom evaluation metric such as accuracy can be specified with an ignite metric. If the metric is expected to increase during training (e.g. accuracy), the descending parameter should be set to False.

The test can be run in either interactive or automatic mode, depending on the value of automatic.

The results are returned as a dictionary.

Parameters
  • automatic (bool) – whether to perform an automatic lr range test or an interactive one

  • model (torch.nn.Module) – A torch module receiving inputs and outputting predictions

  • eval_metric (Optional[ignite.metrics.Metric]) – An ignite metric to use when evaluating the test_loader.

  • optimizer (torch.optim.Optimizer) – The optimizer to use for the LR range test.

  • train_loader (DataLoaderType) – An iterable to load data from and feed to the trainer.

  • test_loader (Optional[DataLoaderType]) – An iterable to load data from and feed to the evaluator.

  • loss_fn (Callable[[torch.Tensor, torch.Tensor], torch.Tensor]) – An objective function taking predictions and targets and returning a metric.

  • device (str) – the device to do the training/evaluation on (default: cuda)

  • descending (bool) – whether the metric/loss chosen should descend or not (e.g. accuracy should not)

  • pbar (bool) – whether to print a progress bar during training

  • wd_values (Optional[List[float]]) – the weight decay values to test for

  • diverge_th (float) – the coefficient by which the current metric must differ from the best recorded value to consider that the metric has diverged

  • num_steps (int) – the number of steps to increase LR over

  • lr_max (float) – the lr to end on

  • lr_min (float) – the lr to start from

  • smooth_f (float) – the alpha coefficient for the exponential moving average

Return type

Dict[str, float]

Returns

a dictionary with the results

Low-level

class lr_range_test.lr_range.ModelOrEngineLRRangeTest(optimizer, train_loader, model, train_engine=None, test_engine=None, test_loader=None, loss_fn=None, eval_metric=None, descending=True, device='cuda')[source]

An LR range test base class that simplifies initialization of the engines. It provides several parameter setups in which it can be run. For some of these, the class automatically creates trainers and evaluators; for others, users can provide their own.

Allowed parameter combinations:

(model, optimizer, loss_fn, train_loader)

The data is taken from train_loader and model is trained with optimizer. The loss value is taken at the end of each iteration from the output of the trainer. The output should have a key called “loss”.

(model, optimizer, loss_fn, train_loader, test_loader)

The same as the previous one, but the loss is computed with data from the test_loader after each iteration of train_loader.

(model, optimizer, loss_fn, test_engine, train_loader, test_loader)

A default trainer will be created, but test_engine will be used as an evaluator.

(model, train_engine, optimizer, train_loader)

The optimizer should belong to the train_engine. The loss is taken from the output of the train_engine.

(model, train_engine, optimizer, loss_fn, train_loader, test_loader)

A default evaluator is built using model and loss_fn, and the loss is computed on the test set.

(model, train_engine, test_engine, train_loader, test_loader)

Computes the loss using test_engine on data from test_loader.

The model and the optimizer always have to be specified, even if they are trained by proxy using a train engine. This is because they have to be reset at the end of the run and when restarting the run for a new plot (in the interactive case).

The train engine should have a key called “loss” in its output. The test engine should have a metric called “loss” if used. (Note: The metric does not necessarily have to represent “the loss”. It can be accuracy or anything else the user wants to use.) If using the default test engine, loss_fn will be used as the metric. However, any loss will be overridden if eval_metric is provided.

Parameters
  • eval_metric (Optional[ignite.metrics.Metric]) – An ignite metric to use when evaluating the test_loader.

  • optimizer (torch.optim.Optimizer) – The optimizer to use for the LR range test.

  • train_loader (DataLoaderType) – An iterable to load data from and feed to the trainer.

  • model (torch.nn.Module) – A torch module receiving inputs and outputting predictions

  • train_engine (Optional[ignite.engine.Engine]) – An alternative to model. Used for training. Must output ‘loss’.

  • test_engine (Optional[ignite.engine.Engine]) – An alternative to the default evaluator. Must output a metric called ‘loss’

  • test_loader (Optional[DataLoaderType]) – An iterable to load data from and feed to the evaluator.

  • loss_fn (Optional[Callable[[torch.Tensor, torch.Tensor], torch.Tensor]]) – An objective function taking predictions and targets and returning a metric.

  • device (str) – the device to do the training/evaluation on (default: cuda)

  • descending (bool) – whether the metric/loss chosen should descend or not (e.g. accuracy should not)

class lr_range_test.lr_range.InteractiveLRRangeTest(optimizer, train_loader, model, train_engine=None, test_engine=None, test_loader=None, loss_fn=None, eval_metric=None, descending=True, device='cuda')[source]

Bases: lr_range_test.lr_range.ModelOrEngineLRRangeTest

run(lr_min=1e-07, lr_max=10.0, num_steps=50, smooth_f=0.05, diverge_th=5.0, wd_values=None, pbar=False)[source]

Perform an interactive LR range test. The method constructs the loss plots for the given weight decays in the interval delimited by lr_min and lr_max. The lr is increased exponentially over num_steps iterations. If no weight decay values are specified, the model will not use any.

The plots are smoothed with an exponential moving average with an alpha of smooth_f. The training will stop prematurely if the smoothed metric worsens by a factor of at least diverge_th compared to the best metric recorded so far.

The best interval is selected from the plot by dragging. On exit, the last interval selected is returned alongside the last entered weight decay value. The plot can be rerun with different values of lr_min, lr_max, wd and num_steps using the “PLOT” inputs.

Return type

Dict[str, float]

class lr_range_test.lr_range.AutomaticLRRangeTest(optimizer, train_loader, model, train_engine=None, test_engine=None, test_loader=None, loss_fn=None, eval_metric=None, descending=True, device='cuda')[source]

Bases: lr_range_test.lr_range.ModelOrEngineLRRangeTest

Range test class that automatically selects the best values for the minimum and maximum LR and weight decay values, based on the approximate gradient of the loss with respect to the learning rate.

run(lr_min=1e-07, lr_max=10.0, num_steps=50, smooth_f=0.05, diverge_th=5.0, wd_values=None, pbar=False)[source]

Similar to the interactive test, but the values for lr are selected automatically. The maximum lr is selected as the steepest improvement value of the smoothed metric plot. The best weight decay is selected as the weight decay value for which the steepest improvement occurs at the greatest LR value.

Parameters
  • pbar (bool) – whether to print a progress bar during training

  • wd_values (Optional[List[float]]) – the weight decay values to test for

  • diverge_th (float) – the coefficient by which the current metric must differ from the best recorded value to consider that the metric has diverged

  • num_steps (int) – the number of steps to increase LR over

  • lr_max (float) – the lr to end on

  • lr_min (float) – the lr to start from

  • smooth_f (float) – the alpha coefficient for the exponential moving average

Return type

Dict[str, float]