This section provides further information and examples on how SOL processes certain inputs, and presents options for controlling SOL manually in more detail.
In most cases you probably want to execute or train your model on an accelerator like a GPU instead of your CPU. To offload your model, SOL again follows its easy-to-use principle: you do not need to set any SOL-specific parameters. Instead, SOL is designed to read all necessary information from the model. To do so, move the model to the desired device in the way supported by your chosen framework, just as you would without SOL. If you move your model and your input data to another device, SOL will also compile for that device. The following example shows how to use an NVIDIA GPU. Use the standard interface of your framework to indicate which device to use, e.g. .cuda() or .to('cuda') in PyTorch, or with tf.device('/GPU:0'): in TensorFlow.
import torch
import torchvision.models as models
import sol

model = models.resnet18(pretrained=True).eval()
random_input = torch.randn(1, 3, 224, 224)

model.cuda()                            # modules are moved in place
random_input = random_input.cuda()      # tensors are not: .cuda() returns a new tensor

optimized_model = sol.optimize(model)
with torch.no_grad():
    out = optimized_model(random_input)
torch.cuda.synchronize()
SOL also offers an option to manually select the device that runs your model. Calling sol.device.set(device, device_idx) instructs SOL to use the given device for all following operations. In the following example the model is optimized and compiled for a NEC SX-Aurora (VE). If you want to run on an NVIDIA GPU instead, you only have to change the device parameter from “ve” to “nvidia”. For a list of supported devices, see device.
import torch
import torchvision.models as models
import sol

model = models.resnet18(pretrained=True).eval()
random_input = torch.randn(1, 3, 224, 224)

sol.device.set("ve", 0)                 # compile and run on the first NEC SX-Aurora VE
optimized_model = sol.optimize(model)
with torch.no_grad():
    out = optimized_model(random_input)
As you can see, you do not even need to move the input and output to the device explicitly; they are copied implicitly when the model is called. This feature comes in handy if your framework does not support your device directly (unlike the NVIDIA GPU in the first example, which PyTorch supports natively via .cuda()). In this case, you can use SOL to offload your model automatically to any hardware that is supported by SOL.
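For illustration, here is a small sketch that reuses optimized_model from the VE example above (the variables are assumed to still be in scope): the input stays an ordinary CPU tensor and is copied to the device when the model is called.

```python
cpu_input = torch.randn(1, 3, 224, 224)     # plain CPU tensor, no manual transfer
with torch.no_grad():
    out = optimized_model(cpu_input)        # input and output are copied implicitly
```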
To check your current installation for available devices, or to look up their names within SOL, you can also call sol.plugins():
[INFO ][ 3.80][SOL/core] static (87) Compiler Plugins:
[INFO ][ 3.80][SOL/core] static (87) Devices: [x86, nvidia, ve]
[INFO ][ 3.80][SOL/core] static (87) Frameworks: [pytorch, tensorflow, onnx, numpy]
Most parameters are read directly from the model, but there are some options you can pass to sol.optimize() to change the compilation.
# sol.optimize signature
def optimize(model, args=[], kwargs={}, *, framework=None, vdims=None, determinism=None, **fwargs)
If the shapes of your inputs cannot be inferred from the model directly, or if you want to compile for a specific shape, you need to provide an example input with the desired shape and datatype. Simply pass a list of tensors with these properties to sol.optimize as args.
input_tensor = torch.empty((3, 4), dtype=torch.float16)
sol.optimize(model, [input_tensor])
The entries of the fwargs dictionary are passed to the underlying framework. Note that this is an advanced option that usually requires knowledge of the inner workings of the corresponding parser.
Here is one example of how to use it. You can set the shape of an input to a TensorFlow model (in this case called “input_1”) to a specific value without passing an example input in args. This requires you to know the name of the tensor whose shape you want to define and to provide a valid shape; otherwise the compilation will fail.
sol.optimize(model, fwargs={"shapes":{"input_1": [int(batch_size), int(height), int(width), int(channels)]}})
Sometimes the input size cannot be read from the model directly, which makes this option necessary.
As described before, SOL reads the framework type from the model and creates an optimized model of the same type. By using the framework keyword you can define the output type yourself! This allows you to run models from one framework in another. If, for example, you have an old TensorFlow model lying around but want to use it in your current training script written in torch, you can use SOL to simply reuse the old model in your new script!
import torch
import torchvision.models as models
import tensorflow as tf
import numpy as np
import sol

model = models.resnet18(pretrained=True).eval()
random_input = np.random.rand(1, 3, 224, 224)   # note that this is not a torch tensor
random_input_t = torch.Tensor(random_input)     # example inputs are required in this case!

optimized_model = sol.optimize(model, [random_input_t], framework="keras")  # this creates a keras.Model
with tf.device('/CPU:0'):
    out = optimized_model(random_input)         # the compiled model behaves like a TensorFlow model!
Available options are:
| Framework | Description |
|---|---|
| keras | Returns the model as a keras.Model. |
| numpy | Returns the model as an object that stores the weights as NumPy arrays, expecting the inputs to be NumPy arrays. |
| pytorch | Returns the model as a torch.nn.Module. |
| tensorflow | Returns the model as an object that stores the weights as tf.Variable and runs the execution within TensorFlow, expecting NumPy or TensorFlow tensors as input. |
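As a further sketch, the numpy option can be used in the same way, reusing model, random_input and random_input_t from the listing above (that the returned object is simply called on NumPy arrays is an assumption based on the table):

```python
numpy_model = sol.optimize(model, [random_input_t], framework="numpy")
out = numpy_model(random_input)   # expects NumPy arrays as input
```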
The determinism option controls the numerical accuracy of SOL. It allows you to enable or disable several trade-offs between accuracy and performance of your model. As with all other options, the rules of the optimized model’s original framework are used by default. For example, torch.set_float32_matmul_precision("highest") defines how matrix multiplications are handled in torch; SOL follows these rules as well. Passing a different value to determinism allows you to change these rules manually for a single optimization run.
For more details and possible options in SOL, see Determinism in the official documentation.
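A minimal sketch of both variants, reusing model and random_input from the earlier examples (the concrete values accepted by determinism are not listed here, see the Determinism page):

```python
import torch
import sol

# By default SOL follows the framework's numerics rules, e.g. this torch setting:
torch.set_float32_matmul_precision("highest")
optimized_model = sol.optimize(model, [random_input])   # obeys the torch rule

# Override the rules for a single optimization run (placeholder value,
# see the Determinism documentation for the accepted options):
# optimized_model = sol.optimize(model, [random_input], determinism=...)
```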
When you compile a network you will see an output similar to this:
[INFO ][ 7.74][SOL/core] compiler (313) Parsing network AlexNet
[INFO ][ 7.76][SOL/core] Optimizer (42) Analyzing network AlexNet (0x1AE3B71C)
[INFO ][ 7.77][SOL/core] Wrapper (138) Inputs:
[INFO ][ 7.77][SOL/core] Wrapper (138) x: Tensor(dtype=[F32], shape=[#0, 3, 224, 224])
[INFO ][ 7.77][SOL/core] Wrapper (143) Outputs:
[INFO ][ 7.77][SOL/core] Wrapper (143) Tensor(dtype=[F32], shape=[#0, 1000])
[INFO ][ 7.77][SOL/core] Optimizer (84) Model Parameters: 233.08MB
[INFO ][ 7.77][SOL/core] Optimizer (88)
[INFO ][ 7.88][SOL/core] Compiler (73) Compiling network 1AE3B71C_9ACB57BC for x86
[INFO ][ 14.10][SOL/core] Progress (56) 100.0% [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■]
[INFO ][ 14.10][SOL/core] Compiler (145) Estimated Peak Memory Consumption (assuming: #0: 1):
[INFO ][ 14.10][SOL/core] Compiler (151) Inference: ~238.0MB
Note that in “shape” the first dimension is represented as “#0”. This is a variable dimension (VDim) with index 0. SOL automatically detects that this is the batch size, and the compilation does not depend on it being fixed. That means no recompilation is needed when you call the model with different batch sizes.
If you do want to fix the batch size to ensure that only the optimal implementation for one fixed size is created, you can disable VDims with sol.optimize(..., vdims=[False]).
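For example, to compile the model only for the exact shape of the example input (a short sketch, assuming model and random_input from the earlier listings); the output below then reports the fixed shape [1, 3, 224, 224]:

```python
# Disable variable dimensions: the compiled code is specialized for batch size 1.
optimized_model = sol.optimize(model, [random_input], vdims=[False])
```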
[INFO ][ 11.40][SOL/core] compiler (313) Parsing network AlexNet
[INFO ][ 11.42][SOL/core] Optimizer (42) Analyzing network AlexNet (0x1AE3B71C)
[INFO ][ 11.43][SOL/core] Wrapper (138) Inputs:
[INFO ][ 11.43][SOL/core] Wrapper (138) x: Tensor(dtype=[F32], shape=[1, 3, 224, 224])
[INFO ][ 11.43][SOL/core] Wrapper (143) Outputs:
[INFO ][ 11.43][SOL/core] Wrapper (143) Tensor(dtype=[F32], shape=[1, 1000])
[INFO ][ 11.43][SOL/core] Optimizer (84) Model Parameters: 233.08MB
[INFO ][ 11.43][SOL/core] Optimizer (88)
[INFO ][ 11.53][SOL/core] Compiler (73) Compiling network 1AE3B71C_CA29BD56 for x86
[INFO ][ 17.99][SOL/core] Progress (56) 100.0% [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■]
[INFO ][ 17.99][SOL/core] Compiler (145) Estimated Peak Memory Consumption:
[INFO ][ 17.99][SOL/core] Compiler (151) Inference: ~238.0MB
In this example the network was compiled with a batch size of 1. If the same model is called with a different value, it has to be recompiled for that value. This ensures that an implementation optimized for the specific batch size is always chosen.
There are a few advanced options to control SOL’s behavior outside of sol.optimize(). SOL’s settings can be changed by manipulating its config and environment variables. Here are a few examples of how to use these.
If you have used SOL a few times, you will have noticed that it prints a lot of information with the “[INFO]” tag by default. To disable any output of this kind, you can set the SOL_LOG environment variable to ERROR. This ensures that SOL will only print messages in case of an error.
SOL_LOG=ERROR python sol_script.py
Enjoy the silence!
By default, SOL creates a directory called .sol for its intermediate code generation and compilation within your current working directory. While this is desirable in most cases, there are sometimes reasons to want a fixed location for SOL’s intermediate files: for example, if you call your scripts from many different locations and want to prevent multiple .sol directories, or if you want to make sure a faster hard drive is used.
To change the location of this directory, use the SOL_CWD environment variable. Set SOL_CWD=/path/to/your/dir to write and read intermediate results from your desired location.
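As with the SOL_LOG example above, the variable can also be set on the command line; the path is only a placeholder:
SOL_CWD=/path/to/your/dir python sol_script.py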
You can also delete this directory to force SOL to generate all temporary files again.
SOL caches optimized networks to reduce the workload when they are called repeatedly. If you want to clear this cache to force a recompilation or to save disk space, you can do so via sol.cache.clear() in Python or by setting the environment variable SOL_CLEAR_CACHE=TRUE before executing your script.
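For example, a short sketch of the in-script variant (the command-line variant works analogously to the SOL_LOG example above: SOL_CLEAR_CACHE=TRUE python sol_script.py):

```python
import sol

sol.cache.clear()   # remove all cached, previously compiled networks
```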
By default, SOL uses autotuning to find an optimal implementation for each layer. In some cases you may want a deterministic choice of implementation, or you may want to save time during compilation. You can disable autotuning via sol.config["autotuning"] = False in your script to always use the same (then heuristically chosen) implementation.
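A minimal sketch, assuming the option takes effect for subsequent sol.optimize() calls and reusing model and random_input from the earlier examples:

```python
import sol

sol.config["autotuning"] = False                        # always use the heuristically chosen implementation
optimized_model = sol.optimize(model, [random_input])   # compiled without autotuning
```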
For a full list of all supported environment variables and config options, check out ENV and CONFIG in the official documentation.