Advanced

This section provides further information and examples of how SOL processes certain inputs. It also presents some options to manually control SOL to a greater degree.

Other Devices

In most cases you probably want to execute or train your model on an accelerator like a GPU instead of your CPU. To offload your model, SOL again follows its easy-to-use principle: you do not need to set any SOL-specific parameters. Instead, SOL is designed to read all necessary information from the model. Simply move the model to the desired device in the way supported by your chosen framework, just as you would without SOL. If you move your model and your input data to another device, SOL will also compile for that device. The following example shows how to use an NVIDIA GPU. Use the standard interface of your framework to indicate which device to use. This can be done via .cuda() or .to('cuda') in PyTorch, or with tf.device('/GPU:0'): in TensorFlow.

import torch
import torchvision.models as models
import sol

model = models.resnet18(pretrained=True).eval()

random_input = torch.randn(1, 3, 224, 224)

model.cuda()                        # nn.Module.cuda() moves the module in place
random_input = random_input.cuda()  # Tensor.cuda() returns a copy, so reassignment is required

optimized_model = sol.optimize(model)

with torch.no_grad():
    out = optimized_model(random_input)
    torch.cuda.synchronize()
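
The same principle applies to TensorFlow. The following is a minimal sketch, assuming an existing tf.keras model; the variable names are illustrative:

import tensorflow as tf
import sol

# model is assumed to be an existing tf.keras.Model,
# random_input a matching input tensor
with tf.device('/GPU:0'):
    optimized_model = sol.optimize(model)
    out = optimized_model(random_input)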

Manual Device Selection

SOL also offers an option to manually select the device that runs your model. Calling sol.device.set(device, device_idx) instructs SOL to use the given device for all following instructions. In this example the model is optimized and compiled for a NEC SX-Aurora (VE). To run on an NVIDIA GPU instead, simply change the device parameter from “ve” to “nvidia”. For a list of supported devices, see device.

import torch
import torchvision.models as models
import sol

model = models.resnet18(pretrained=True).eval()

random_input = torch.randn(1, 3, 224, 224)
sol.device.set("ve", 0)

optimized_model = sol.optimize(model)

with torch.no_grad():
    out = optimized_model(random_input)

As you can see, you do not even need to move inputs and outputs to the device explicitly; they are copied implicitly when the model is called.

This feature comes in handy if your framework does not support your device directly (unlike the first example, where torch supports CUDA devices natively via .cuda()). In this case, you can use SOL to offload your model automatically to any hardware that SOL supports.

For a list of supported devices, see device. To check your current installation for devices or to look up their names within SOL, you can also call sol.plugins(). For example:
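
import sol

sol.plugins()  # prints the compiler plugin overview, e.g.: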

[INFO ][  3.80][SOL/core] static (87) Compiler Plugins:
[INFO ][  3.80][SOL/core] static (87)     Devices:     [x86, nvidia, ve]
[INFO ][  3.80][SOL/core] static (87)     Frameworks:  [pytorch, tensorflow, onnx, numpy]

Optimization Options

Most parameters are read directly from the model, but there are some options you can pass to sol.optimize() to change the compilation.

# sol.optimize signature
def optimize(model, args=[], kwargs={}, *, framework=None, vdims=None, determinism=None, **fwargs)

args: Example Input

If the shapes of your inputs cannot be inferred from the model directly, or if you want to compile for a specific shape, you need to define an example input with the desired shape and datatype. Simply pass a list of tensors with the desired properties to sol.optimize as arguments.

input_tensor = torch.empty((3, 4), dtype=torch.float16)
sol.optimize(model, [input_tensor])

fwargs: Framework Arguments

The entries of this dictionary are passed to the underlying framework. Note that this is an advanced option that usually requires knowledge of the inner workings of the corresponding parser.

Here is one example of how to use it. You can set the shape of an input to a TensorFlow model (in this case called “input_1”) to a specific value without passing an example input in args. This requires you to know the name of the tensor whose shape you want to define and to provide a valid shape; otherwise the compilation will fail.

sol.optimize(model, fwargs={"shapes":{"input_1": [int(batch_size), int(height), int(width), int(channels)]}})

Sometimes the input size cannot be read from the model directly, which makes this option necessary.

framework: Cross-Framework Compilation

As described before, SOL reads the type of the framework from the model and creates a model of the same type. By using the framework keyword you can define the output type yourself! This allows you to run models from one framework in another. If you have, for example, an old TensorFlow model lying around but want to use it in your current training script written in torch, you can use SOL to simply reuse your old model in your new script!

import torch
import torchvision.models as models
import tensorflow as tf
import sol
import numpy as np

model = models.resnet18(pretrained=True).eval()

random_input = np.random.rand(1, 3, 224, 224).astype(np.float32) # note that this is a NumPy array, not a torch tensor
random_input_t = torch.from_numpy(random_input) # example inputs are required in this case!
optimized_model = sol.optimize(model, [random_input_t], framework="keras") # this creates a keras.Model

with tf.device('/CPU:0'):
    out = optimized_model(random_input) # the compiled model behaves like a tensorflow model!

Available options are:

Framework    Description
keras        Returns the model as a keras.Model.
numpy        Returns the model as an object that stores the weights as NumPy arrays, expecting the inputs to be NumPy arrays.
pytorch      Returns the model as a torch.nn.Module.
tensorflow   Returns the model as an object that stores the weights as tf.Variable; execution runs within TensorFlow, expecting NumPy or TensorFlow tensors as input.
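
For instance, a minimal sketch of the numpy option, mirroring the keras example above (model and input names are illustrative):

import numpy as np
import torch
import torchvision.models as models
import sol

model = models.resnet18(pretrained=True).eval()
example_input = torch.randn(1, 3, 224, 224)

# framework="numpy" yields an object that consumes and returns NumPy arrays
numpy_model = sol.optimize(model, [example_input], framework="numpy")
out = numpy_model(np.random.rand(1, 3, 224, 224).astype(np.float32))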

determinism: Numerical Accuracy

The determinism option controls the numerical accuracy of SOL. It allows you to enable or disable several trade-offs between the accuracy and the performance of your model. As with all other options, the rules of the optimized model's original framework are used by default. For example, torch.set_float32_matmul_precision("highest") defines how matrix multiplications are handled in torch, and SOL follows these rules as well. Passing a different value to determinism allows you to change these rules manually for a single optimization run.
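
As a minimal sketch, this is how the framework-level rule that SOL picks up by default is set in torch; the values accepted by the determinism parameter itself are listed in the documentation referenced below:

import torch
import sol

# model is an existing torch.nn.Module from the examples above.
# Set the torch-wide rule for float32 matrix multiplications;
# SOL follows this framework setting by default.
torch.set_float32_matmul_precision("highest")

optimized_model = sol.optimize(model)  # compiled under torch's current precision rules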

For more details and possible options in SOL see Determinism in the official documentation.

vdims: Variable Dimensions

When you compile a network you will see an output similar to this:

[INFO ][  7.74][SOL/core] compiler (313) Parsing network AlexNet
[INFO ][  7.76][SOL/core] Optimizer (42) Analyzing network AlexNet (0x1AE3B71C)
[INFO ][  7.77][SOL/core] Wrapper (138)  Inputs:
[INFO ][  7.77][SOL/core] Wrapper (138)      x: Tensor(dtype=[F32], shape=[#0, 3, 224, 224])
[INFO ][  7.77][SOL/core] Wrapper (143)  Outputs:
[INFO ][  7.77][SOL/core] Wrapper (143)      Tensor(dtype=[F32], shape=[#0, 1000])
[INFO ][  7.77][SOL/core] Optimizer (84) Model Parameters: 233.08MB
[INFO ][  7.77][SOL/core] Optimizer (88)  
[INFO ][  7.88][SOL/core] Compiler (73)  Compiling network 1AE3B71C_9ACB57BC for x86
[INFO ][ 14.10][SOL/core] Progress (56)  100.0% [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■]
[INFO ][ 14.10][SOL/core] Compiler (145) Estimated Peak Memory Consumption (assuming: #0: 1):
[INFO ][ 14.10][SOL/core] Compiler (151)      Inference: ~238.0MB

Note that in “shape” the first dimension is represented as “#0”. This is a variable dimension (VDim) with index 0. SOL automatically detects that this is the batch size, and the compilation does not depend on it having a fixed value. That means no recompilation is needed when you call the model with different batch sizes.
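
For example, a model compiled once with VDims enabled can be called with different batch sizes without triggering a recompilation (a minimal sketch, reusing the torch setup from above):

# optimized_model was created by sol.optimize with vdims enabled (the default)
with torch.no_grad():
    out1 = optimized_model(torch.randn(1, 3, 224, 224))
    out8 = optimized_model(torch.randn(8, 3, 224, 224))  # same compiled model, no recompilation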

If you do want to fix the batch size to ensure that only the optimal implementation for one fixed size is created, you can disable VDims with sol.optimize(..., vdims=[False]).

[INFO ][ 11.40][SOL/core] compiler (313) Parsing network AlexNet
[INFO ][ 11.42][SOL/core] Optimizer (42) Analyzing network AlexNet (0x1AE3B71C)
[INFO ][ 11.43][SOL/core] Wrapper (138)  Inputs:
[INFO ][ 11.43][SOL/core] Wrapper (138)      x: Tensor(dtype=[F32], shape=[1, 3, 224, 224])
[INFO ][ 11.43][SOL/core] Wrapper (143)  Outputs:
[INFO ][ 11.43][SOL/core] Wrapper (143)      Tensor(dtype=[F32], shape=[1, 1000])
[INFO ][ 11.43][SOL/core] Optimizer (84) Model Parameters: 233.08MB
[INFO ][ 11.43][SOL/core] Optimizer (88)  
[INFO ][ 11.53][SOL/core] Compiler (73)  Compiling network 1AE3B71C_CA29BD56 for x86
[INFO ][ 17.99][SOL/core] Progress (56)  100.0% [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■]
[INFO ][ 17.99][SOL/core] Compiler (145) Estimated Peak Memory Consumption:
[INFO ][ 17.99][SOL/core] Compiler (151)     Inference: ~238.0MB

In this example the network was compiled with a batch size of 1. If the same model is called with a different value, it has to be recompiled for that value. This ensures that an implementation customized to the specific batch size is always chosen.

Controlling SOL

There are a few advanced options to control SOL’s behavior outside of sol.optimize(). SOL’s settings can be changed by manipulating its config and environment variables. Here are a few examples of how to use these.

SOL Output

If you have used SOL a few times you will notice that it prints a lot of information with the “[INFO]” tag by default. To disable any output of this kind you can set the SOL_LOG environment variable to ERROR. This ensures that SOL will only print messages in case of an error.

SOL_LOG=ERROR python sol_script.py

Enjoy the silence!

Current Working Directory

By default, SOL creates a folder called .sol within your current directory for its intermediate code generation and compilation. While this is desired in most cases, sometimes you may want a fixed location for SOL’s intermediate files, for example if you call your scripts from many different locations and want to avoid multiple .sol directories, or if you want to make sure a faster hard drive is used.

To change the location of this directory you can do so via the $SOL_CWD variable. Set SOL_CWD=/path/to/your/dir to write and read intermediate results from your desired location.
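
For example:

SOL_CWD=/path/to/your/dir python sol_script.py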

You can also delete this directory to force SOL to generate all temporary files again.

SOL Cache

SOL caches optimized networks to reduce the workload when they are called repeatedly. If you want to clear this cache to force a recompilation or to save disk space, you can do so via sol.cache.clear() in Python or by setting the environment variable SOL_CLEAR_CACHE=TRUE before executing your script.
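
For example, from within Python:

import sol

sol.cache.clear()  # remove all cached networks; the next run recompiles them

or from the shell:

SOL_CLEAR_CACHE=TRUE python sol_script.py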

Autotuning

By default, SOL uses autotuning to find an optimal implementation for each layer. In some cases you may want a deterministic choice of implementation, or you may want to save time during compilation. You can disable autotuning via sol.config["autotuning"] = False in your script to always use the same (then heuristically chosen) implementation.
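
For example:

import sol

sol.config["autotuning"] = False  # always use the heuristically chosen implementation
optimized_model = sol.optimize(model)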

Further Options

For a full list of all supported environment variables and config options, see ENV and CONFIG in the official documentation.