Deployment

This feature is still in beta and has not been exhaustively tested!

The command sol.deploy(model, target, dargs, *args, batch_size, autotuning, **kwargs) can be used to deploy trained networks. Currently we support:

  • X86
  • ARM64 native and cross-compilation
  • NVIDIA GPUs
  • NEC SX-Aurora

Note that currently only pure framework models can be deployed, not SOL models. If you have trained your model with SOL, you need to transfer the weights back to the framework model. This can be done as follows:

# PyTorch 
py_model = ...
sol_model = sol.optimize(...)
sol_model.load_state_dict(py_model.state_dict())

# ... do your training

py_model.load_state_dict(sol_model.state_dict())

# TensorFlow
py_model = ...
sol_model = sol.optimize(...)
sol_model.setWeights(py_model.weights())

# ... do your training

py_model.setWeights(sol_model.weights())

To deploy your model, use the sol.deploy(...) function, which in principle works like sol.optimize(...) but takes some additional parameters.

The sol.deploy(…) command

See this for details on model, size, dtype and layout.

Parameter | Default | Description
target | - | Any of shared_lib, static_lib.
dargs | - | A dict or OrderedDict containing the following values:
dargs["lib_name"] | "libsolpredict" | Name of the library to be generated, e.g., "libmynetwork" would generate "libmynetwork.a".
dargs["func_name"] | "sol_predict" | Name of the generated C function to call.
dargs["device_type"] | - | Either sol.device.[...] or a tuple (sol.device.[...], "arch"), where "arch" is a device-specific string. For details look at this
dargs["host_type"] | sol.device.auto | Same as device_type, but for the host system in case of offloading deployment, e.g., NVIDIA. This is ignored for host-only (e.g., X86) or pure-offloading (e.g., VE) deployment.
dargs["path"] | "." | Path into which the library and header files are generated.
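The defaults in the table can be collected in a small helper. This is only a sketch: make_dargs and the placeholder device value are hypothetical names that are not part of SOL, and with SOL installed you would pass a real value such as sol.device.x86 as device_type.

```python
def make_dargs(device_type, lib_name="libsolpredict", func_name="sol_predict",
               path=".", host_type=None, extra=None):
    """Collect the dargs entries, using the defaults listed in the table."""
    dargs = {
        "lib_name": lib_name,
        "func_name": func_name,
        "device_type": device_type,
        "path": path,
    }
    if host_type is not None:      # per the table, the default is sol.device.auto
        dargs["host_type"] = host_type
    dargs.update(extra or {})      # optional architecture-specific keys
    return dargs


# Placeholder only: with SOL installed, pass e.g. sol.device.x86 instead.
DEVICE_PLACEHOLDER = "x86-placeholder"
dargs = make_dargs(DEVICE_PLACEHOLDER, lib_name="libNN", func_name="predict")
print(dargs)
```

Keeping the defaults in one place makes it harder to forget device_type, the only entry in the table without a default.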

An example looks like this:

sol.deploy(alexnet, "lib::shared", {"lib_name": "libNN", "func_name": "predict", "path": ".", "device_type": sol.device.x86}, [1, 3, 224, 224])

This generates a library called libNN.so for X86 and the following header file in the current working directory:

#ifndef __libNN__
#define __libNN__
#ifdef __cplusplus
extern "C" {
#endif
        void predict_init(const int deviceIdx);
        int predict_seed(const int seed);
        void predict(void* ctx, const float* L0, float** L47);
#ifdef __cplusplus
}
#endif
#endif

The header declares three functions. predict_init(...) is called automatically the first time you run predict(...), so you only need to call it yourself if you want to initialize everything before the very first call. predict_seed(...) initializes all random number generators, in case your model uses random variables. predict(...) is the function that actually executes your model. It expects ctx to be 0; this parameter is reserved for expert features that are not available yet. L0 must point to your input data (allocated on the target device!), and L47 is a pointer through which the result is returned. The numbers in these names are autogenerated and can differ in your case.

The memory for L47 is allocated via sol_ext_malloc(...) and can be freed either with sol_ext_free(...) or with the device's own free method, e.g., free(...) (X86/ARM64/VE) or cudaFree(...) (NVIDIA). In the future, it will be possible to override the sol_ext_...(...) methods to provide your own memory allocator.
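As a rough sketch, the generated shared library could also be called from Python through ctypes. The snippet assumes the X86 example above (function prefix predict), host memory as device memory, and that the caller knows how many output values the model produces. bind_predict and run_once are hypothetical helper names, not part of SOL. The result buffer is released with the C library's free(...), which the text above lists as an option for X86.

```python
import ctypes

FloatPtr = ctypes.POINTER(ctypes.c_float)


def bind_predict(lib, func_name="predict"):
    """Attach the signatures from the generated header to a loaded library."""
    init = getattr(lib, func_name + "_init")
    init.argtypes = [ctypes.c_int]
    init.restype = None

    seed = getattr(lib, func_name + "_seed")
    seed.argtypes = [ctypes.c_int]
    seed.restype = ctypes.c_int

    predict = getattr(lib, func_name)
    # void predict(void* ctx, const float* input, float** output)
    predict.argtypes = [ctypes.c_void_p, FloatPtr, ctypes.POINTER(FloatPtr)]
    predict.restype = None
    return init, seed, predict


def run_once(lib, libc, inputs, num_outputs, func_name="predict"):
    """Run one inference on host memory and copy the result into a list."""
    _, _, predict = bind_predict(lib, func_name)
    libc.free.argtypes = [ctypes.c_void_p]
    libc.free.restype = None

    in_buf = (ctypes.c_float * len(inputs))(*inputs)
    out_ptr = FloatPtr()  # NULL pointer, filled in by predict
    predict(None, in_buf, ctypes.byref(out_ptr))  # ctx must be 0 (NULL)
    try:
        return [out_ptr[i] for i in range(num_outputs)]
    finally:
        # The library allocated the buffer, so the caller has to release it.
        libc.free(ctypes.cast(out_ptr, ctypes.c_void_p))
```

On Linux this might be used as run_once(ctypes.CDLL("./libNN.so"), ctypes.CDLL(None), data, n), where data holds the 1*3*224*224 input floats and n is the number of output values of your model. predict_init(...) is not called explicitly because it runs automatically on the first predict(...) call.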

For a static library, SOL tells you which libraries you need to link against. For a shared library, SOL already does all of this for you.

[INFO ][  8.79][core] Compiler (63):                                      Please link the following libraries and flags to your application.
[INFO ][  8.79][core] Compiler (63):                                          -L/path/to/sol/deploy
[INFO ][  8.79][core] Compiler (63):                                          -l:libMyAlexNet.a
[INFO ][  8.79][core] Compiler (63):                                          -l:libdnnl.a
[INFO ][  8.79][core] Compiler (63):                                          -lstdc++
[INFO ][  8.79][core] Compiler (63):                                          -fopenmp

Architectures

Depending on the architecture you are deploying to, SOL might need additional information about the target hardware. This information can be added to the dargs argument.

X86 or ARM64 CPUs

Parameter | Default | Values
gcc::march | native | Everything that GCC accepts for the -march parameter.
gcc::mtune | native | Everything that GCC accepts for the -mtune parameter.
gcc::elf | elf_x86_64 | Everything that LD accepts for the -m parameter.
gcc::oformat | elf64-x86-64 | Everything that LD accepts for the --oformat parameter.
ispc::vector | native | One of native, sse2, sse4, avx, avx2, avx512.
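The CPU options above are ordinary dargs entries. The following sketch adds them to an existing dict and checks ispc::vector against the values listed in the table. with_cpu_options is a hypothetical helper, not part of SOL, and haswell/avx2 are example values only. The device_type entry is omitted here and still has to be set where SOL is imported.

```python
ISPC_VECTOR_VALUES = ("native", "sse2", "sse4", "avx", "avx2", "avx512")


def with_cpu_options(dargs, march="native", mtune="native", vector="native"):
    """Return a copy of dargs extended with the CPU options from the table."""
    if vector not in ISPC_VECTOR_VALUES:
        raise ValueError(f"unsupported ispc::vector value: {vector!r}")
    out = dict(dargs)  # leave the caller's dict untouched
    out.update({
        "gcc::march": march,
        "gcc::mtune": mtune,
        "ispc::vector": vector,
    })
    return out


base = {"lib_name": "libNN", "func_name": "predict", "path": "."}
dargs = with_cpu_options(base, march="haswell", mtune="haswell", vector="avx2")
print(dargs)
```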

NVIDIA GPUs

Parameter | Default | Values
nvidia::cc | - | Two-digit compute capability, e.g., 65.

NEC SX-Aurora

The NEC SX-Aurora does not need any additional arguments.