=====
Usage
=====

Inferno is a utility library built around `PyTorch <http://pytorch.org/>`__, designed to help you train and even build complex PyTorch models. In this tutorial, we'll see how! If you're new to PyTorch, I highly recommend you work through the `PyTorch tutorials <http://pytorch.org/tutorials/>`__ first.

Building a PyTorch Model
~~~~~~~~~~~~~~~~~~~~~~~~~~

Inferno's training machinery works with just about any valid `PyTorch module <http://pytorch.org/docs/master/nn.html#torch.nn.Module>`__. However, to make things even easier, we also provide pre-configured layers that work out of the box. Let's use them to build a convolutional neural network for CIFAR-10.

.. code:: python

    import torch.nn as nn
    from inferno.extensions.layers.convolutional import ConvELU2D
    from inferno.extensions.layers.reshape import Flatten

`ConvELU2D` is a 2-dimensional convolutional layer with orthogonal weight initialization and `ELU <http://pytorch.org/docs/master/nn.html#torch.nn.ELU>`__ activation. `Flatten` reshapes the 4-dimensional activation tensor into a matrix. Let's use the `Sequential` container to chain together a stack of convolutional and pooling layers, followed by a linear layer and a softmax.

.. code:: python

    model = nn.Sequential(
        ConvELU2D(in_channels=3, out_channels=256, kernel_size=3),
        nn.MaxPool2d(kernel_size=2, stride=2),
        ConvELU2D(in_channels=256, out_channels=256, kernel_size=3),
        nn.MaxPool2d(kernel_size=2, stride=2),
        ConvELU2D(in_channels=256, out_channels=256, kernel_size=3),
        nn.MaxPool2d(kernel_size=2, stride=2),
        Flatten(),
        nn.Linear(in_features=(256 * 4 * 4), out_features=10),
        nn.Softmax()
    )

Models this size don't win competitions anymore, but it'll do for our purposes.

Data Logistics
**************************

With our model built, it's time to worry about the data generators. Or is it?

.. code:: python

    from inferno.io.box.cifar import get_cifar10_loaders

    train_loader, validate_loader = get_cifar10_loaders('path/to/cifar10',
                                                        download=True,
                                                        train_batch_size=128,
                                                        test_batch_size=100)

CIFAR-10 works out-of-the-`box` (pun very much intended), with all the fancy data augmentation and normalization included. Of course, it's perfectly fine if you have your own `DataLoader <http://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader>`__.

Preparing the Trainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With our model and data loaders good to go, it's finally time to build the trainer. To start, let's initialize one.

.. code:: python

    from inferno.trainers.basic import Trainer

    trainer = Trainer(model)
    # Tell the trainer about the data loaders
    trainer.bind_loader('train', train_loader).bind_loader('validate', validate_loader)

Now to the things we could do with it.

Setting up Checkpointing
***************************************

When training a model for days, it's usually a good idea to store the current training state to disk every once in a while. To set this up, we tell `trainer` where to store these *checkpoints* and how often.

.. code:: python

    trainer.save_to_directory('path/to/save/directory').save_every((25, 'epochs'))

So we're saving once every 25 epochs. But what if an epoch takes forever and you don't wish to wait that long?

.. code:: python

    trainer.save_every((1000, 'iterations'))

In this setting, you're saving once every 1000 iterations (= batches). But we might also want to create a checkpoint whenever the validation score is at its best. Easy as 1, 2,

.. code:: python

    trainer.save_at_best_validation_score()

Remember that a checkpoint contains the entire training state, and not just the model: the optimizer, the criterion and the callbacks are all included in the checkpoint file, but **not the data loaders**.
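If you later want to pick up training from one of these checkpoints, the trainer can be restored from the save directory. The snippet below is only a minimal sketch: the `load` call and its `from_directory` argument are an assumption about the `Trainer` API, so double-check the API reference for the exact method and signature. Since the data loaders are not part of the checkpoint, they have to be bound again after loading.

.. code:: python

    from inferno.trainers.basic import Trainer

    # Minimal resume sketch. NOTE: `load(from_directory=...)` is assumed here;
    # consult the Trainer API reference for the exact method and arguments.
    trainer = Trainer(model)
    trainer.load(from_directory='path/to/save/directory')
    # Data loaders are not stored in the checkpoint, so bind them again.
    trainer.bind_loader('train', train_loader).bind_loader('validate', validate_loader)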
Setting up Validation
**************************

Let's say you wish to validate once every 2 epochs:

.. code:: python

    trainer.validate_every((2, 'epochs'))

To be able to validate, you'll need to specify a validation metric:

.. code:: python

    trainer.build_metric('CategoricalError')

Inferno looks for a metric `'CategoricalError'` in `inferno.extensions.metrics`. To specify your own metric, subclass `inferno.extensions.metrics.base.Metric` and implement the `forward` method. With that done, you could:

.. code:: python

    trainer.build_metric(MyMetric)

or

.. code:: python

    trainer.build_metric(MyMetric, **my_metric_kwargs)

Note that the metric applies to `torch.Tensor`s, and not to `torch.autograd.Variable`s. Also, a metric might be way too expensive to evaluate at every training iteration without slowing down the training. If that is the case and you'd like to evaluate the metric only every (say) 10 *training* iterations:

.. code:: python

    trainer.evaluate_metric_every((10, 'iterations'))

However, while validating, the metric is evaluated once every iteration.

Setting up the Criterion and Optimizer
***************************************

With that out of the way, let's set up a training criterion and an optimizer.

.. code:: python

    # set up the criterion
    trainer.build_criterion('CrossEntropyLoss')

The `trainer` looks for a `'CrossEntropyLoss'` in `torch.nn`, which it finds. But any of the following would also have worked:

.. code:: python

    trainer.build_criterion(nn.CrossEntropyLoss)

or

.. code:: python

    trainer.build_criterion(nn.CrossEntropyLoss())

What this means is that if you have your own loss criterion with the same API as the criteria found in `torch.nn`, you should be fine just plugging it in. The same holds for the optimizer:

.. code:: python

    trainer.build_optimizer('Adam', weight_decay=0.0005)

As with the criterion, the `trainer` looks for an `'Adam'` in `torch.optim` (among other places) and initializes it with the `model`'s parameters. Any keyword arguments that work for `torch.optim.Adam` can also be passed to the `build_optimizer` method. Alternatively, you could use:

.. code:: python

    from torch.optim import Adam

    trainer.build_optimizer(Adam, weight_decay=0.0005)

If you have implemented your own optimizer (by subclassing `torch.optim.Optimizer`), you should be able to use it in place of `Adam`. And if you already have an optimizer *instance*, you could do:

.. code:: python

    optimizer = MyOptimizer(model.parameters(), **optimizer_kwargs)
    trainer.build_optimizer(optimizer)

Setting up Training Duration
********************************

You probably don't want to train forever, in which case you must specify:

.. code:: python

    trainer.set_max_num_epochs(100)

or

.. code:: python

    trainer.set_max_num_iterations(10000)

If you'd like to train indefinitely (or until you're happy with the results), use:

.. code:: python

    trainer.set_max_num_iterations('inf')

In this case, you'll need to interrupt the training manually with a `KeyboardInterrupt`.
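If you do train open-endedly like this, it helps to catch the interrupt yourself so the run ends cleanly instead of with a traceback. The snippet below is just a sketch of that pattern around `trainer.fit()` (introduced in the *One more thing* section further down); what you do in the `except` block is entirely up to you.

.. code:: python

    # Sketch: stopping an open-ended training run by hand.
    # `trainer.fit()` is covered in the 'One more thing' section below.
    try:
        trainer.fit()
    except KeyboardInterrupt:
        # Whatever was checkpointed so far is already on disk;
        # do any final bookkeeping here.
        print("Training interrupted by user.")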
Setting up Callbacks
*********************

Callbacks are pretty handy when it comes to interacting with the `Trainer`. More precisely, the `Trainer` defines a number of events that act as 'triggers' for callbacks. Currently, these are:

.. code:: python

    BEGIN_OF_FIT,
    END_OF_FIT,
    BEGIN_OF_TRAINING_RUN,
    END_OF_TRAINING_RUN,
    BEGIN_OF_EPOCH,
    END_OF_EPOCH,
    BEGIN_OF_TRAINING_ITERATION,
    END_OF_TRAINING_ITERATION,
    BEGIN_OF_VALIDATION_RUN,
    END_OF_VALIDATION_RUN,
    BEGIN_OF_VALIDATION_ITERATION,
    END_OF_VALIDATION_ITERATION,
    BEGIN_OF_SAVE,
    END_OF_SAVE

As an example, let's build a simple callback to interrupt the training on NaNs. We check at the end of every training iteration whether the training loss is NaN, and raise a `RuntimeError` accordingly.

.. code:: python

    import numpy as np
    from inferno.trainers.callbacks.base import Callback

    class NaNDetector(Callback):
        def end_of_training_iteration(self, **_):
            # The callback object has the trainer as an attribute.
            # The trainer populates its 'states' with torch tensors (NOT VARIABLES!)
            training_loss = self.trainer.get_state('training_loss')
            # Extract float from torch tensor
            training_loss = training_loss[0]
            if np.isnan(training_loss):
                raise RuntimeError("NaNs detected!")

With the callback defined, all we need to do is register it with the trainer:

.. code:: python

    trainer.register_callback(NaNDetector())

So the next time you get `RuntimeError: "NaNs detected!"`, you know the drill.

Using Tensorboard
**************************

Inferno supports logging scalars and images to Tensorboard out of the box, though this requires that you have at least `tensorflow-cpu <https://github.com/tensorflow/tensorflow>`__ installed. Let's say you want to log scalars every iteration and images every 20 iterations:

.. code:: python

    from inferno.trainers.callbacks.logging.tensorboard import TensorboardLogger

    trainer.build_logger(TensorboardLogger(log_scalars_every=(1, 'iteration'),
                                           log_images_every=(20, 'iterations')),
                         log_directory='/path/to/log/directory')

After you've started training, use a bash shell to fire up Tensorboard with:

.. code:: bash

    $ tensorboard --logdir=/path/to/log/directory --port=6007

and navigate to `localhost:6007` with your favorite browser.

Fine print: omitting the `log_images_every` keyword argument to `TensorboardLogger` will result in images being logged every iteration. If you don't have a fast hard drive, this might actually slow down the training. To not log images at all, just use `log_images_every='never'`.

Using GPUs
*************

To use just one GPU:

.. code:: python

    trainer.cuda()

For multi-GPU data-parallel training, simply pass `trainer.cuda` a list of devices:

.. code:: python

    trainer.cuda(devices=[0, 1, 2, 3])

**Pro-tip**: Say you only want to use GPUs 0, 3, 5 and 7 (your colleagues might love you for this). Before running your training script, simply:

.. code:: bash

    $ export CUDA_VISIBLE_DEVICES=0,3,5,7
    $ python train.py

This maps device 0 to 0, 3 to 1, 5 to 2 and 7 to 3.

One more thing
**************************

Once you have everything configured, use

.. code:: python

    trainer.fit()

to commence training! This last step is kinda important. :wink:

Cherries
~~~~~~~~~~~~~~~~~~~~~~

Building Complex Models with the Graph API
****************************************************

Work in progress.

Parameter Initialization
**************************

Work in progress.

Support
*************

Work in progress.