lr-range-test’s documentation
This is a library for LR range tuning, implementing the method proposed in Cyclical Learning Rates for Training Neural Networks. It can be used with any combination of PyTorch models and optimizers, and it supports searching for good values of weight decay.
Usage
Although the library provides a lower-level interface through the
lr_range_test.lr_range.InteractiveLRRangeTest and
lr_range_test.lr_range.AutomaticLRRangeTest classes, a simpler interface is provided
via lr_range_test.lr_range_test().
Sample usage for LR values between 1e-7 and 1e1. The LR is varied over the course of 200 steps and the test is run twice, once for each of two weight decay values.
import matplotlib
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from lr_range_test import lr_range_test
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
# training settings
batch_size = 64
use_cuda = torch.cuda.is_available()
torch.manual_seed(1)
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 16, 'pin_memory': True} if use_cuda else {}
# create the loader for MNIST data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)
# define model, optimizer and loss
model = Net().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.NLLLoss()
matplotlib.use('TkAgg') # this is needed so we are able to interact with the plot
# run the LR range test with a progress bar for 200 steps,
# once for each of the weight decay values in wd_values
result = lr_range_test(model=model, optimizer=optimizer, loss_fn=loss_fn,
                       lr_min=1e-7, lr_max=1e1, train_loader=train_loader,
                       num_steps=200, automatic=False, pbar=True, wd_values=[0.0, 1e-6])
print(result)
For automatic mode, simply change the automatic flag to True.
result = lr_range_test(model=model, optimizer=optimizer, loss_fn=loss_fn,
                       lr_min=1e-7, lr_max=1e1, train_loader=train_loader,
                       num_steps=200, automatic=True, pbar=True, wd_values=[0.0, 1e-6])
Interactive mode
In interactive mode, the loss is plotted against the learning rate. A vertical line is drawn to indicate the point of steepest improvement in the metric. The user can then drag to select an interval for the desired learning rate, and the selected bounds are filled into the text boxes above the plot.
If the user wants to redo the plot with different minimum/maximum LR values, or with a different value for weight decay, they can enter these values in the boxes in the top-left corner and click PLOT. After they have selected satisfactory values, they can return those values using the SAVE button.
Note
The matplotlib backend must be set to an interactive one (e.g. TkAgg) for interactive mode
to work. To do this, call matplotlib.use('TkAgg') before calling the LR range test function.
Automatic mode
In this mode, no plot is displayed and the lr_max value returned by the function is
the value corresponding to the steepest improvement in the metric used. This is equivalent
to the x coordinate of the red line displayed in interactive mode.
If multiple weight decay values are used, the one for which the optimal LR value is the greatest is returned.
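The selection rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names steepest_improvement_index and pick_weight_decay, the slope estimate (finite differences against log(lr)), and the keys of the returned dictionary are all hypothetical.

```python
import math


def steepest_improvement_index(lrs, metric, descending=True):
    """Return the index ending the interval where the metric improves fastest.

    The slope is a finite difference of the metric against log(lr).
    Assumption: reporting the right-hand end of the interval is a choice
    made for this sketch only.
    """
    best_idx, best_slope = None, None
    for i in range(1, len(lrs)):
        slope = (metric[i] - metric[i - 1]) / (math.log(lrs[i]) - math.log(lrs[i - 1]))
        if not descending:
            slope = -slope  # for ascending metrics, a rise counts as improvement
        if best_slope is None or slope < best_slope:
            best_idx, best_slope = i, slope
    return best_idx


def pick_weight_decay(curves, descending=True):
    """Pick the weight decay whose steepest-improvement LR is the greatest.

    curves maps a weight decay value to a (lrs, smoothed_metric) pair.
    The result keys are hypothetical names used for illustration.
    """
    best = None
    for wd, (lrs, metric) in curves.items():
        lr = lrs[steepest_improvement_index(lrs, metric, descending)]
        if best is None or lr > best["lr_max"]:
            best = {"lr_max": lr, "weight_decay": wd}
    return best


lrs = [1e-4, 1e-3, 1e-2, 1e-1]
curves = {
    0.0: (lrs, [2.3, 2.2, 1.0, 0.9]),    # steepest drop ends at lr=1e-2
    1e-6: (lrs, [2.3, 2.25, 2.0, 0.8]),  # steepest drop ends at lr=1e-1
}
result = pick_weight_decay(curves)
print(result)
```

With these made-up curves, the second weight decay value is chosen because its steepest improvement occurs at the larger learning rate.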
API
Simple
lr_range_test.lr_range_test(optimizer, model, train_loader, loss_fn, eval_metric=None, test_loader=None, lr_min=1e-07, lr_max=10.0, num_steps=50, smooth_f=0.05, diverge_th=5.0, wd_values=None, pbar=False, automatic=False, descending=True, device='cuda')
The function expects a model and optimizer for which to perform the test. The model is optimized with respect to a loss function loss_fn. The data is loaded from a given iterable (or a standard PyTorch DataLoader) called train_loader. If a test_loader is not specified, the loss is calculated as the batch loss after each step. If test_loader is specified, the loss is computed and averaged over the entire test data.
The learning rate of the model is varied exponentially from lr_min to lr_max over the course of num_steps iterations, and the recorded loss is smoothed with an exponential moving average with an alpha coefficient of smooth_f. The training is stopped early if the loss diverges by a factor of more than diverge_th from the best recorded loss.
A custom evaluation metric such as accuracy can be specified with an ignite metric. If the metric is expected to increase during training (e.g. accuracy), the descending parameter should be set to False.
The test can be run in either the interactive or the automatic way, depending on the value of automatic.
The results will be returned as a dictionary.
- Parameters
  - automatic (bool) – whether to perform an automatic LR range test or an interactive one
  - model (torch.nn.Module) – A torch module receiving inputs and outputting predictions
  - eval_metric (Optional[ignite.metrics.Metric]) – An ignite metric to use when evaluating the test_loader.
  - optimizer (torch.optim.Optimizer) – The optimizer to use for the LR range test.
  - train_loader (DataLoaderType) – An iterable to load data from and feed to the trainer.
  - test_loader (Optional[DataLoaderType]) – An iterable to load data from and feed to the evaluator.
  - loss_fn (Callable[[torch.Tensor, torch.Tensor], torch.Tensor]) – An objective function taking model outputs and targets and returning a metric.
  - device (str) – the device to do the training/evaluation on (default: cuda)
  - descending (bool) – whether the metric/loss chosen should descend or not (e.g. accuracy should not)
  - pbar (bool) – whether to print a progress bar during training
  - wd_values (Optional[List[float]]) – the weight decay values to test for
  - diverge_th (float) – the coefficient by which the current metric must differ from the best recorded value to consider that the metric has diverged
  - num_steps (int) – the number of steps to increase LR over
  - lr_max (float) – the LR to end on
  - lr_min (float) – the LR to start from
  - smooth_f (float) – the alpha coefficient for the exponential moving average
- Return type
  Dict[str, float]
- Returns
  a dictionary with the results
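The LR sweep and the smoothing described for lr_range_test() can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper names exponential_lr_schedule and ema_smooth are hypothetical, the schedule assumes that both lr_min and lr_max are included as endpoints, and the smoothing assumes the convention s_t = alpha * x_t + (1 - alpha) * s_(t-1), which the library may or may not follow.

```python
import math


def exponential_lr_schedule(lr_min, lr_max, num_steps):
    """Return num_steps learning rates spaced evenly on a log scale.

    Assumption: the first value is lr_min and the last is lr_max.
    """
    if num_steps < 2:
        return [lr_min]
    ratio = lr_max / lr_min
    return [lr_min * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]


def ema_smooth(values, alpha):
    """Smooth a sequence with an exponential moving average.

    Assumed convention: s_t = alpha * x_t + (1 - alpha) * s_(t-1),
    with the first smoothed value equal to the first raw value.
    """
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Same range and step count as the sample usage above.
lrs = exponential_lr_schedule(1e-7, 1e1, 200)
print(lrs[0], lrs[-1])

# Under this convention, a small alpha gives heavy smoothing.
print(ema_smooth([2.0, 1.0, 3.0], alpha=0.05))
```

Because consecutive learning rates differ by a constant factor, the sweep spends the same number of steps in every decade between lr_min and lr_max.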
Low-level
class lr_range_test.lr_range.ModelOrEngineLRRangeTest(optimizer, train_loader, model, train_engine=None, test_engine=None, test_loader=None, loss_fn=None, eval_metric=None, descending=True, device='cuda')
An LR range test base class that simplifies initialization of the engines. It provides several parameter setups in which it can be run. For some of these, the class automatically creates trainers and evaluators; for others, users can provide their own.
Allowed parameter combinations:
- (model, optimizer, loss_fn, train_loader)
  The data is taken from train_loader and model is trained with optimizer. The loss value is taken at the end of each iteration from the output of the trainer. The output should have a key called “loss”.
- (model, optimizer, loss_fn, train_loader, test_loader)
  The same as the previous one, but the loss is computed with data from the test_loader after each iteration of train_loader.
- (model, optimizer, loss_fn, test_engine, train_loader, test_loader)
  A default trainer will be created, but test_engine will be used as an evaluator.
- (model, train_engine, optimizer, train_loader)
  The optimizer should belong to the train_engine. The loss is taken from the output of the train_engine.
- (model, train_engine, loss_fn, optimizer, train_loader, test_loader)
  A default evaluator is built using model and loss_fn and the loss is computed on the test set.
- (model, train_engine, test_engine, train_loader, test_loader)
  Computes the loss using test_engine on data from test_loader.
The model and the optimizer always have to be specified, even if they are trained by proxy using a train engine. This is because they have to be reset at the end of the run and when restarting the run for a new plot (in the interactive case).
The train engine should have a key called “loss” in its output. The test engine should have a metric called “loss” if used. (Note: the metric does not necessarily have to represent “the loss”. It can be accuracy or anything else the user wants to use.) If using the default test engine, loss_fn will be used as a metric. However, any loss will be overridden if eval_metric is provided.
- Parameters
  - eval_metric (Optional[ignite.metrics.Metric]) – An ignite metric to use when evaluating the test_loader.
  - optimizer (torch.optim.Optimizer) – The optimizer to use for the LR range test.
  - train_loader (DataLoaderType) – An iterable to load data from and feed to the trainer.
  - model (torch.nn.Module) – A torch module receiving inputs and outputting predictions
  - train_engine (Optional[ignite.engine.Engine]) – An alternative to model. Used for training. Must output ‘loss’.
  - test_engine (Optional[ignite.engine.Engine]) – An alternative to the default evaluator. Must output a metric called ‘loss’.
  - test_loader (Optional[DataLoaderType]) – An iterable to load data from and feed to the evaluator.
  - loss_fn (Optional[Callable[[torch.Tensor, torch.Tensor], torch.Tensor]]) – An objective function taking model outputs and targets and returning a metric.
  - device (str) – the device to do the training/evaluation on (default: cuda)
  - descending (bool) – whether the metric/loss chosen should descend or not (e.g. accuracy should not)
class lr_range_test.lr_range.InteractiveLRRangeTest(optimizer, train_loader, model, train_engine=None, test_engine=None, test_loader=None, loss_fn=None, eval_metric=None, descending=True, device='cuda')
Bases: lr_range_test.lr_range.ModelOrEngineLRRangeTest
run(lr_min=1e-07, lr_max=10.0, num_steps=50, smooth_f=0.05, diverge_th=5.0, wd_values=None, pbar=False)
  Perform an interactive LR range test. The method constructs the loss plots for the given weight decays in the interval delimited by lr_min and lr_max. The LR is increased exponentially over num_steps iterations. If no weight decay values are specified, the model will not use any.
  The plots are smoothed with an exponential moving average with an alpha of smooth_f. The training will stop prematurely if the smoothed metric worsens by a factor of at least diverge_th compared to the best metric recorded so far.
  The best interval is selected from the plot by dragging. On exit, the last interval selected is returned alongside the last entered weight decay value. The plot can be rerun with different values of lr_min, lr_max, wd and num_steps using the “PLOT” inputs.
  - Return type
    Dict[str, float]
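The early-stopping rule described for run() can be illustrated with a short sketch. It assumes strictly positive metric values and a multiplicative threshold. The helper names has_diverged and run_until_divergence are hypothetical, and the treatment of ascending metrics (descending=False) is an assumption, since the documentation does not spell it out.

```python
def has_diverged(current, best, diverge_th=5.0, descending=True):
    """Check whether the metric is worse than the best by a factor of diverge_th.

    Assumption: metrics are strictly positive. For a descending metric
    (e.g. a loss) "worse" means larger; for an ascending metric
    (e.g. accuracy) it is taken here to mean smaller.
    """
    if descending:
        return current >= diverge_th * best
    return current <= best / diverge_th


def run_until_divergence(smoothed_metrics, diverge_th=5.0, descending=True):
    """Return the values recorded before the divergence check stops the run."""
    kept, best = [], None
    for value in smoothed_metrics:
        if best is not None and has_diverged(value, best, diverge_th, descending):
            break  # stop the sweep early
        kept.append(value)
        # Track the best value seen so far in the direction of improvement.
        if best is None or (value < best if descending else value > best):
            best = value
    return kept


# The best loss is 0.5, so any value of 2.5 or more ends the run.
print(run_until_divergence([2.0, 1.0, 0.5, 0.6, 3.0, 0.4]))
```

In this example the run stops when the value 3.0 is reached, so the trailing 0.4 is never recorded.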
class lr_range_test.lr_range.AutomaticLRRangeTest(optimizer, train_loader, model, train_engine=None, test_engine=None, test_loader=None, loss_fn=None, eval_metric=None, descending=True, device='cuda')
Bases: lr_range_test.lr_range.ModelOrEngineLRRangeTest
Range test class that automatically selects the best values for the minimum and maximum LR and weight decay values, based on the approximate gradient of the loss with respect to the learning rate.
run(lr_min=1e-07, lr_max=10.0, num_steps=50, smooth_f=0.05, diverge_th=5.0, wd_values=None, pbar=False)
  Similar to the interactive test, but the values for LR are selected automatically. The maximum LR is selected as the steepest improvement value of the smoothed metric plot. The best weight decay is selected as the weight decay value for which the steepest improvement occurs at the greatest LR value.
  - Parameters
    - pbar (bool) – whether to print a progress bar during training
    - wd_values (Optional[List[float]]) – the weight decay values to test for
    - diverge_th (float) – the coefficient by which the current metric must differ from the best recorded value to consider that the metric has diverged
    - num_steps (int) – the number of steps to increase LR over
    - lr_max (float) – the LR to end on
    - lr_min (float) – the LR to start from
    - smooth_f (float) – the alpha coefficient for the exponential moving average
  - Return type
    Dict[str, float]