Deployment is an advanced feature. This tutorial shows how to use it to produce a deployed model. Actually using the deployed model in another project requires knowledge about how to compile and link against the generated library.
Deployment is still in an experimental state and might change in the future!
SOL’s deployment functionality works in a similar way to sol.optimize(). But instead of creating an optimized model that is compiled when it is called, sol.deploy creates a directly compiled library of the given model at the given path. Both functions can be controlled by the same environment variables and sol.config. So, for example, autotuning can be enabled or disabled by setting sol.config["autotuning"], as sketched below. However, deploy does not read many of its options directly from the model. Instead, you have to define them manually. The function’s signature looks like this:
def deploy(model, args=[], kwargs={}, fwargs={}, *,
library_type:str = "shared_lib",
library_func:str = "predict",
library_name:str = "sol_deploy",
library_path:str = ".",
device:str,
device_idx:int = 0,
device_idx_runtime:int = 0,
weights_type:str = "linked",
weights_path:str = ".",
vdims:Optional[Union[List, Tuple]] = None,
compiler_args = {},
no_avx_512 = False
):
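Before looking at the individual options, here is a minimal sketch showing that sol.config settings such as autotuning also apply to deployment (model and example_input are placeholders, and treating sol.config["autotuning"] as a boolean switch is an assumption):
import sol

# Disable autotuning for this deployment run (same switch as for sol.optimize()).
sol.config["autotuning"] = False

# Deploy the model as a shared library for x86. "model" and "example_input"
# are assumed to be defined elsewhere, as in the full example below.
sol.deploy(model, [example_input], device="x86")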
These options can be separated into four categories:
These options define how SOL optimizes and compiles the network. Most of these are similar to sol.optimize().
Parameter | Valid Options/Type | Description |
---|---|---|
model | nn.module/tf.model/onnx_model | The Model, defined in any framework that is supported by SOL |
args | torch.Tensor/tf.Tensor | An example input, used to determine data types and shapes |
kwargs | dict | Keyword arguments |
fwargs | dict | Framework arguments, e.g. shape |
vdims | list | Variable dimensions, see VDim |
compiler_args | dict | Dictionary which is given directly to underlying JIT compilers |
These options define the properties of the generated library.
Parameter | Valid Options | Description |
---|---|---|
library_type | shared_lib, static_lib | Creates a shared (.so) or static (.a) library. |
library_name | str | Name of the library. “sol_” will be used as a prefix. |
library_func | str | Name of the main function of that library. |
library_path | str | Path to which the generated files will be written. |
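As an illustrative sketch (the values below are not the defaults), the library-related options could be combined like this:
import sol

# Build a static library instead of a shared one; the main entry point is
# named "infer" and all generated files go to ./deploy_out. "model" and
# "example_input" are placeholders.
sol.deploy(model, [example_input],
           device       = "x86",
           library_type = "static_lib",
           library_func = "infer",
           library_path = "./deploy_out")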
These options define how weights are stored and read by the generated library.
Parameter | Valid Options | Description |
---|---|---|
weights_type | linked, extern | Weights are either linked into the generated library or stored in an external location. |
weights_path | str | Path to store external weights. (Can be None for linked weights.) |
Weights that are linked into the library require less manual handling, because they are part of the same file. However, Linux limits the size of a shared library, so linked weights cannot be used for larger networks. Keeping the weights external also allows you to free them on the host if you have moved your model to another device during execution; the library itself, on the other hand, stays loaded on the host side as long as it is in use.
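For example, a sketch that stores the weights next to the library instead of linking them in (paths are illustrative):
import sol

# Store the weights in an external directory instead of linking them into
# the library; useful for larger networks. "model" and "example_input" are
# placeholders as before.
sol.deploy(model, [example_input],
           device       = "x86",
           weights_type = "extern",
           weights_path = "./deploy_out/data")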
These options define the device and the execution mode of the deployed model.
Parameter | Valid Options | Description |
---|---|---|
device | x86, nvidia, ve, veo | Target device of the deployed model. |
device_idx | int | Device id used during compilation (and autotuning) |
device_idx_runtime | int | Device id used by the generated library at runtime |
The difference between ve and veo is their respective execution mode. ve generates a .vso library that can be linked into an executable that runs directly on the Vector Engine. veo stands for Vector Engine offloading and generates a host library and a device library. You can use the host library in your x86 project and it will automatically offload the model and its inputs and outputs to the Vector Engine at runtime.
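For instance, a minimal sketch of an offloading deployment (values are illustrative; the resulting host library can then be linked into an x86 application):
import sol

# Deploy for Vector Engine offloading: SOL compiles (and autotunes) on VE 0,
# and the generated host library will also target VE 0 at runtime.
# "model" and "example_input" are placeholders.
sol.deploy(model, [example_input],
           device             = "veo",
           device_idx         = 0,
           device_idx_runtime = 0)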
Here is an example script:
import sol
import torch
import torchvision.models as models
model = models.get_model("convnext_small")
model.eval()
input = torch.rand(10,3,224,224, dtype=torch.float32)
sol.deploy(model, [input],
library_type = "shared_lib",
library_func = "predict",
library_name = "sol_convnext",
library_path = "/LOCAL/deployment",
device_idx = 0,
device = "nvidia",
weights_type = "extern",
weights_path = "data",
# Advanced features
vdims = [False],
compiler_args = {"ispc::vector":"avx2", "gcc:march":"native"},
)
This script creates a folder in /LOCAL/deployment with the following contents:
demouser:LOCAL/deployment$ ls
data libsol_convnext.so sol_convnext_example.c sol_convnext_example.py sol_convnext.h
To compile and run the generated example, run the following commands:
demouser:LOCAL/deployment$ g++ sol_convnext_example.c -lsol_convnext -o sol_convnext_exe
demouser:LOCAL/deployment$ ./sol_convnext_exe
Note that LIBRARY_PATH needs to include the location of libsol_convnext.so for the linker (ld) to be able to link against it. To run the executable, the loader (ld.so) needs to be able to locate the shared library as well. Set LD_LIBRARY_PATH accordingly or move libsol_convnext.so to a folder that is already included in these paths.
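For example, assuming the library stays in /LOCAL/deployment, the following environment setup makes both linking and running work:
demouser:LOCAL/deployment$ export LIBRARY_PATH=/LOCAL/deployment:$LIBRARY_PATH
demouser:LOCAL/deployment$ export LD_LIBRARY_PATH=/LOCAL/deployment:$LD_LIBRARY_PATH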
Alternatively, you can run the generated Python example:
demouser:LOCAL/deployment$ python sol_convnext_example.py
The generated examples are meant to be human readable and show you how to call the deployed model from their respective languages.
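As a rough illustration of what such an integration involves, the following assumption-laden sketch loads the deployed library from Python via ctypes (this is not the generated example itself; the exact argument layout of predict is declared in sol_convnext.h and demonstrated in the generated files):
import ctypes

# Load the deployed library; LD_LIBRARY_PATH must contain /LOCAL/deployment,
# or an absolute path to libsol_convnext.so can be passed instead.
lib = ctypes.CDLL("libsol_convnext.so")

# Look up the entry point chosen via library_func during deployment.
predict = lib.predict

# The argument and return types of predict are model-specific; consult
# sol_convnext.h and sol_convnext_example.py for the exact call signature.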