This feature is still in beta and has not been exhaustively tested!
The command `sol.deploy(model, target, dargs, *args, batch_size, autotuning, **kwargs)` can be used to deploy trained networks. Currently we support:
Note that currently only pure framework models can be deployed, not SOL models. So if you have trained your model with SOL, you need to transfer the weights back into the framework model first. This can be done as follows:
```python
# PyTorch
py_model = ...
sol_model = sol.optimize(...)
sol_model.load_state_dict(py_model.state_dict())
# ... do your training
py_model.load_state_dict(sol_model.state_dict())
```
```python
# TensorFlow
py_model = ...
sol_model = sol.optimize(...)
sol_model.setWeights(py_model.weights())
# ... do your training
py_model.setWeights(sol_model.weights())
```
To deploy your model, use the `sol.deploy(...)` function, which in principle works like `sol.optimize(...)` but takes additional parameters. See this for details on model, size, dtype and layout.
Parameter | Default | Description |
---|---|---|
`target` | | Any of `shared_lib`, `static_lib`. |
`dargs` | | A `dict` or `OrderedDict` containing the following values: |
`dargs["lib_name"]` | `"libsolpredict"` | Name of the library to be generated, e.g., `"libmynetwork"` would generate `libmynetwork.a`. |
`dargs["func_name"]` | `"sol_predict"` | Name of the generated C function to call. |
`dargs["device_type"]` | | Either `sol.device.[...]` or a tuple `(sol.device.[...], "arch")` where `"arch"` is a device-specific string. For details look at this. |
`dargs["host_type"]` | `sol.device.auto` | Same as `device_type`, but for the host system in case of an offloading deployment, e.g., NVIDIA. In host-only (i.e., X86) or pure-offloading (i.e., VE) deployments this is ignored. |
`dargs["path"]` | `"."` | Path where the library and header files will be generated. |
An example looks like:

```python
sol.deploy(alexnet, "lib::shared", {"lib_name": "libNN", "func_name": "predict", "path": ".", "device_type": sol.device.x86}, [1, 3, 224, 224])
```
This generates a library called `libNN.so` for X86, together with a header file, in the current working directory:
```c
#ifndef __libNN__
#define __libNN__

#ifdef __cplusplus
extern "C" {
#endif

void predict_init(const int deviceIdx);
int  predict_seed(const int seed);
void predict(void* ctx, const float* L0, float** L47);

#ifdef __cplusplus
}
#endif

#endif
```
The header declares three functions. `predict_init(...)` is automatically called the first time you run `predict(...)`, so you only need to call it yourself if you want to initialize everything prior to the very first call. You can further use `predict_seed(...)` to initialize all random number generators, in case your model uses random variables. `predict(...)` is the actual function that executes your model. It expects `ctx` to be 0 (its purpose is for expert features that are not available yet), `L0` to point to your input data (allocated on the target device!), and a pointer `L47` through which the result is returned. These names are autogenerated and can differ in your case.
The buffer for `L47` gets allocated via `sol_ext_malloc(...)` and can be freed either with `sol_ext_free(...)` or with the device's own free method, e.g., `free(...)` (X86/ARM64/VE) or `cudaFree(...)` (NVIDIA). In the future we will make it possible to override the `sol_ext_...(...)` methods to provide your own memory allocator.
In case of a static library, SOL tells you which libraries and flags you need to link against; in case of a shared library, SOL already does all of this for you.
```
[INFO ][ 8.79][core] Compiler (63): Please link the following libraries and flags to your application.
[INFO ][ 8.79][core] Compiler (63): -L/path/to/sol/deploy
[INFO ][ 8.79][core] Compiler (63): -l:libMyAlexNet.a
[INFO ][ 8.79][core] Compiler (63): -l:libdnnl.a
[INFO ][ 8.79][core] Compiler (63): -lstdc++
[INFO ][ 8.79][core] Compiler (63): -fopenmp
```
Depending on the architecture you are deploying to, SOL might need additional information about the target hardware. This can be added to the `dargs` argument.
Parameter | Default | Values |
---|---|---|
`gcc::march` | `native` | Everything that GCC accepts for the `-march` parameter. |
`gcc::mtune` | `native` | Everything that GCC accepts for the `-mtune` parameter. |
`gcc::elf` | `elf_x86_64` | Everything that LD accepts for the `-m` parameter. |
`gcc::oformat` | `elf64-x86-64` | Everything that LD accepts for the `-oformat` parameter. |
`ispc::vector` | `native` | `[native, sse2, sse4, avx, avx2, avx512]` |
Parameter | Default | Values |
---|---|---|
`nvidia::cc` | | 2-digit compute capability, e.g., `65`. |
The NEC SX-Aurora does not need any additional arguments.