SOL v0.5.0 Documentation > Frameworks > PyTorch

PyTorch

This example requires the torchvision package: https://github.com/pytorch/vision/. Please note that SOL does not support the use of model.eval() or model.train(): it always assumes model.eval() when running inference and model.train() when running training.

import torch
import sol
import torchvision.models as models

''' Optimizing Model '''
py_model = models.__dict__["alexnet"]()
input    = torch.rand(32, 3, 224, 224)

# Use vdims=[True] if you plan to use changing batchsizes
sol_model = sol.optimize(py_model, [input], vdims=[True])

''' Run training '''
sol_model.train()
for batch in ...:
	input, target = ...
	output = sol_model(input)
	loss = loss_func(output, target)
	loss.backward()
	...

''' Run validation '''
sol_model.eval()
with torch.no_grad():
	for batch in ...:
		input = ...
		output = sol_model(input)
		...

IMPORTANT: SOL does not provide the num_batches_tracked value for BatchNormalization layers, so loading a state dict with load_state_dict(..., strict=True) will fail in these cases!

F.A.Q.

How do I store/load a PyTorch model?

For storing/loading a SOL PyTorch model, use model.state_dict() and model.load_state_dict(...) methods.

# Storing
sol_model = sol.optimize(pytorch_model, [...])
torch.save(sol_model.state_dict(), PATH)

# Loading
sol_model = sol.optimize(pytorch_model)
sol_model.load_state_dict(torch.load(PATH))

More information on loading/storing the weights can be found here

I get strange errors when running sol.optimize(model, ...), e.g., in Huggingface Transformers.

Huggingface Transformers are incompatible with PyTorch's torch.jit.script(...) parser and can only be used with torch.jit.trace(...) (see here). Because torch.jit.trace(...) is much more restrictive than torch.jit.script(...) regarding the inputs and outputs of a model, SOL uses torch.jit.script(...) as its default parser. If you encounter problems, you can try running

sol.optimize(model, ..., trace=True)

to use the torch.jit.trace(...) parser instead. Be aware that you might need to simplify your model's inputs and outputs accordingly. Please see the PyTorch documentation for more details.
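
As an illustration of what "simplifying the model output" can look like, the following sketch wraps a model that returns a dict so that it returns a plain tuple of tensors, which torch.jit.trace(...) handles well. It uses plain PyTorch only; DictOutputModel and TupleWrapper are hypothetical names invented for this example, and the final hand-off to SOL is shown only as a comment.

```python
import torch

class DictOutputModel(torch.nn.Module):
    """Hypothetical model that returns a dict, similar to many Transformers."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        hidden = self.linear(x)
        return {"hidden": hidden, "probs": torch.softmax(hidden, dim=-1)}

class TupleWrapper(torch.nn.Module):
    """Converts the dict output into a fixed-order tuple of tensors."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(x)
        return (out["hidden"], out["probs"])

wrapped = TupleWrapper(DictOutputModel())
example = torch.rand(3, 4)

# Tracing records the operations executed for this example input.
traced = torch.jit.trace(wrapped, example)
hidden, probs = traced(example)

# The wrapped model could then be handed to SOL as described above, e.g.:
# sol_model = sol.optimize(wrapped, ..., trace=True)
```

The same pattern applies to inputs: keyword or dict arguments can be unpacked inside a wrapper so that the traced forward() only sees positional tensors.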

The SOL model returns more outputs than the PyTorch model.

This error occurs, e.g., in TorchVision's Inception V3 or GoogLeNet. These models return one output in inference mode and two outputs in training mode. SOL relies on the TorchScript parser. Unfortunately, the TorchVision models are built in a way that hides this change of output behavior from TorchScript. However, you can implement it yourself as follows:

import torch
import sol
from torchvision import models

# Wrapper that makes the mode-dependent number of outputs visible to TorchScript
class Wrap(torch.nn.Module):
	def __init__(self, model):
		super().__init__()
		self.model = model

	def forward(self, x):
		out = self.model(x)
		# When scripted, the wrapped model returns both outputs as a tuple
		if torch.jit.is_scripting():
			return (out[0], out[1]) if self.training else (out[0], None)
		# In eager mode, inference returns a single tensor
		return (out[0], out[1]) if self.training else (out, None)

model = Wrap(models.inception_v3())

# use only one output
model.training = False
sol_model = sol.optimize(model, ...)

# use two outputs
model.training = True
sol_model = sol.optimize(model, ...)

SOL currently does not support switching dynamically between these two modes, so the model must be compiled separately for each mode.

Supported Layers

Please refer to https://pytorch.org/docs/stable/ for how these functions are used. This page only lists the layers, functions, and tensor functionality that are currently implemented in SOL.

Layers

aten::Float
aten::Int
aten::IntImplicit
aten::ScalarImplicit
aten::__and__
aten::__contains__
aten::__derive_index
aten::__getitem__
aten::__is__
aten::__isnot__
aten::__not__
aten::__or__
aten::__range_length
aten::_set_item
aten::abs
aten::absolute
aten::acos
aten::acosh
aten::adaptive_avg_pool1d
aten::adaptive_avg_pool2d
aten::adaptive_avg_pool3d
aten::adaptive_max_pool1d
aten::adaptive_max_pool2d
aten::adaptive_max_pool3d
aten::add
aten::addbmm
aten::addcdiv
aten::addcmul
aten::addmm
aten::alpha_dropout
aten::append
aten::arange
aten::arccos
aten::arccosh
aten::arcsin
aten::arcsinh
aten::arctan
aten::arctanh
aten::argmax
aten::argmin
aten::as_tensor
aten::asin
aten::asinh
aten::atan
aten::atanh
aten::avg_pool1d
aten::avg_pool2d
aten::avg_pool3d
aten::batch_norm
aten::broadcast_tensors
aten::cat
aten::ceil
aten::celu
aten::chunk
aten::clamp
aten::clamp_max
aten::clamp_min
aten::clone
aten::complex
aten::concat
aten::constant_pad_nd
aten::contiguous
aten::conv1d
aten::conv2d
aten::conv3d
aten::cos
aten::cosh
aten::cumsum
aten::dict
aten::dim
aten::div
aten::divide
aten::dropout
aten::elu
aten::embedding
aten::empty
aten::eq
aten::equal
aten::erf
aten::exp2
aten::exp
aten::expand
aten::expand_as
aten::expm1
aten::fft_fft2
aten::fft_fft
aten::fft_fftn
aten::fft_hfft
aten::fft_ifft2
aten::fft_ifft
aten::fft_ifftn
aten::fft_ihfft
aten::fft_irfft2
aten::fft_irfft
aten::fft_irfftn
aten::fft_rfft2
aten::fft_rfft
aten::fft_rfftn
aten::flatten
aten::floor
aten::floor_divide
aten::floordiv
aten::fmod
aten::format
aten::frobenius_norm
aten::ge
aten::gelu
aten::greater
aten::greater_equal
aten::group_norm
aten::gru
aten::gru_cell
aten::gt
aten::hardshrink
aten::hardsigmoid
aten::hardswish
aten::hardtanh
aten::imag
aten::instance_norm
aten::is_floating_point
aten::isfinite
aten::isinf
aten::isnan
aten::items
aten::l1_loss
aten::layer_norm
aten::le
aten::leaky_relu
aten::len
aten::linear
aten::list
aten::log10
aten::log1p
aten::log2
aten::log
aten::log_sigmoid
aten::log_softmax
aten::logaddexp2
aten::logaddexp
aten::logical_and
aten::logical_not
aten::logical_or
aten::logical_xor
aten::lstm
aten::lstm_cell
aten::lt
aten::matmul
aten::max
aten::max_pool1d
aten::max_pool1d_with_indices
aten::max_pool2d
aten::max_pool2d_with_indices
aten::max_pool3d
aten::max_pool3d_with_indices
aten::maximum
aten::mean
aten::meshgrid
aten::min
aten::minimum
aten::mse_loss
aten::mul
aten::multiply
aten::narrow
aten::narrow_copy
aten::ne
aten::neg
aten::negative
aten::new_full
aten::norm
aten::not_equal
aten::nuclear_norm
aten::numel
aten::ones
aten::ones_like
aten::percentFormat
aten::permute
aten::pow
aten::prelu
aten::prod
aten::rand
aten::rand_like
aten::randint
aten::randint_like
aten::randn
aten::randn_like
aten::real
aten::reciprocal
aten::relu6
aten::relu
aten::remainder
aten::repeat
aten::reshape
aten::rnn_relu
aten::rnn_relu_cell
aten::rnn_tanh
aten::rnn_tanh_cell
aten::rrelu
aten::rsqrt
aten::rsub
aten::select
aten::selu
aten::sigmoid
aten::sign
aten::silu
aten::sin
aten::sinh
aten::size
aten::slice
aten::smooth_l1_loss
aten::softmax
aten::softmin
aten::softplus
aten::softshrink
aten::split
aten::sqrt
aten::square
aten::squeeze
aten::stack
aten::str
aten::sub
aten::sum
aten::tan
aten::tanh
aten::tile
aten::to
aten::transpose
aten::unsqueeze
aten::upsample_bicubic2d
aten::upsample_bilinear2d
aten::upsample_linear1d
aten::upsample_nearest1d
aten::upsample_nearest2d
aten::upsample_nearest3d
aten::upsample_trilinear3d
aten::values
aten::var
aten::view
aten::warn
aten::where
aten::zeros
aten::zeros_like
prim::CallFunction
prim::Constant
prim::CreateObject
prim::DictConstruct
prim::GetAttr
prim::If
prim::ListConstruct
prim::ListIndex
prim::ListUnpack
prim::Loop
prim::NumToTensor
prim::Print
prim::RaiseException
prim::SetAttr
prim::TupleConstruct
prim::TupleIndex
prim::TupleUnpack
prim::Uninitialized
prim::device
prim::dtype
prim::isinstance
prim::max
prim::min
prim::unchecked_cast

Tested Models

TorchVision
- AlexNet
- SqueezeNet [1.0, 1.1]
- Inception [v3]
- GoogLeNet
- DenseNet [121, 161, 169, 201]
- MNasNet [0.5, 0.75, 1.0, 1.3]
- MobileNet [v2, v3 small, v3 large]
- ResNet [18, 34, 50, 101, 152]
- ResNeXt [50, 101]
- WideResNet [50, 101]
- ShuffleNet [0.5, 1.0, 1.5, 2.0]
- VGG [11, 13, 16, 19, w/ and w/o BN]
Huggingface
- BertModel
- BertForSequenceClassification