In v0.5.3 we added experimental Custom Layer support. In this chapter we describe how it can be used from top to bottom.
For the integration into the neural network we use custom layers that integrate seamlessly into the AI framework's computation mechanisms. However, not all features are available in all frameworks.
To create your own custom layer, you can use the `sol.[pytorch/tensorflow].nn.Custom` class. Its constructor expects the following arguments:
`function_names: Union[str, List[str], Tuple[str]]`
: List of C++ function names that get called from the custom layer. The 1st is used for the forward pass, the 2nd for the backward pass.

`include_file: str`
: Name of the C++ header file to be included by the custom layer. Gets included as `#include <[include_file]>`.

`compiler_flags: Dict[str, Union[Tuple[str], List[str]]]`
: Dictionary of compiler flags that get passed to the device's native compiler. Use `x86` (GCC), `nvidia` (NVCC), or `ve` (NCC) as device names.

`hyper_parameter: Union[Tuple[Union[int, str]], List[Union[int, str]]]`
: Hyper parameters that get passed as template parameters to the custom layer. Should be of C++ integer-like types, `'char'`, `constexpr` values or C-preprocessor macros.

`name: str`
: Name used in the SOL debug graph, or as the name for the layer within the framework (if supported).
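For illustration, a minimal constructor call might look roughly like this; the function names, header, paths and hyper parameters below are placeholders, and a complete, working example follows later in this chapter:

```python
# Sketch only: placeholder function names, header, flags and hyper parameters.
class MyLayer(sol.pytorch.nn.Custom):
    def __init__(self):
        super().__init__(
            ('my_forward', 'my_backward'),       # function_names (forward, backward)
            'my_layer.h',                        # include_file
            {'x86': ['-I/path/to/your/code']},   # compiler_flags per device
            [42, "'c'"],                         # hyper_parameter (template parameters)
            name="MyLayer",
        )
```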
By default, the custom layer accepts multiple inputs and returns identical outputs. If you need to recompute the shapes or dtypes of the custom layer, you can overwrite the `shape_inference` and `dtype_inference` methods. The latter should return dtypes in the framework's native dtype format.
```python
class Custom(sol.[pytorch/tensorflow].nn.Custom):
    def shape_inference(self, inputs: List[Tensor]) -> List[List[int]]:
        ...

    def dtype_inference(self, inputs: List[Tensor]) -> List[DType]:
        ...
```
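For example, a custom layer that flattens each input into a single dimension could implement the two methods roughly as follows (a sketch for PyTorch; the constructor arguments are omitted and `math.prod` assumes static integer shapes):

```python
import math

class FlattenCustom(sol.pytorch.nn.Custom):
    def shape_inference(self, inputs):
        # One output per input, flattened to a single dimension
        return [[math.prod(i.shape)] for i in inputs]

    def dtype_inference(self, inputs):
        # Return dtypes in the framework's native format, e.g. torch.float32
        return [i.dtype for i in inputs]
```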
When computing the derivative it might be necessary to have a copy of some of the inputs or outputs available. For this purpose you can overwrite the methods `input_requires_copies` and `output_requires_copies`. Each needs to return a list of boolean values, where `True` indicates that the value will be provided as a copy within the backward pass.
```python
class Custom(sol.[pytorch/tensorflow].nn.Custom):
    def input_requires_copies(self, inputs: List[Tensor]) -> List[bool]:
        ...

    def output_requires_copies(self, inputs: List[Tensor]) -> List[bool]:
        ...
```
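For instance, an operation whose derivative can be computed from its outputs alone might request copies like this (a sketch):

```python
class OutputBasedCustom(sol.pytorch.nn.Custom):
    def input_requires_copies(self, inputs):
        # No input copies are needed in the backward pass
        return [False for _ in inputs]

    def output_requires_copies(self, inputs):
        # Keep a copy of every output for the backward pass
        return [True for _ in inputs]
```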
Here is a full example written in PyTorch.
```python
class Custom(sol.pytorch.nn.Custom):
    def __init__(self):
        path = ['-I/path/to/your/code']
        include_file = 'custom.h'
        compiler_flags = {'x86': path, 've': path}
        super().__init__(
            ('custom_memcpy', 'custom_memcpy'),
            include_file,
            compiler_flags,
            [0, "'c'"],
            name="Memcpy"
        )

    def _pytorch(self, inputs: List[Tensor]) -> List[Tensor]:
        return [inputs[0].clone()]

    def forward(self, x):
        return self._forward([x])[0]

model = torch.nn.Sequential(Custom())
```
As you can see, we have overwritten the `forward` method, as the default requires a list or tuple as input, which would not work with `torch.nn.Sequential`. Further, within PyTorch you can provide a `_pytorch` method that performs your operation when running the neural network with PyTorch. Due to technical limitations this is not available for TensorFlow!
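To try the layer in plain PyTorch, you can run the module on a dummy input; the shape below is arbitrary and the call should go through the `_pytorch` implementation described above (a sketch):

```python
import torch

model = torch.nn.Sequential(Custom())
x = torch.rand(4, 8)   # arbitrary shape
y = model(x)           # executes the _pytorch implementation, i.e. a clone of x
assert torch.equal(x, y)
```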
The SOL custom layer integrates your layer into the SOL execution graph, performs all allocations/deallocations of input and output data, and calls into the functions you have provided with the `function_names` argument.
The raw scheme of the function call looks like this:

```cpp
#include <sol/[device_type]/api.h>
#include <[include_file]>

SOL_API void [autogenerated_unique_function_name](sol_ctx* ctx) {
    [function_name[derivative]]<[training], [hyper_parameters], [i.dtype for i in inputs], [o.dtype for o in outputs]>(ctx, *inputs, *outputs, *input_copies, *output_copies);
}
```
As described before, first we include your header file and then call `function_name[0]` for the forward pass and `function_name[1]` for the backward pass.

For the template, we provide:

- `training`, which is `false` in inference and `true` during training.
- `hyper_parameters`, i.e., `[0, 'c']`.

For the function arguments, we provide:

- `sol_ctx*`, which needs to be used to allocate temporary data.
- `sol_tensor*` of all inputs or gradient outputs
- `sol_tensor*` of all outputs or gradient inputs
- `sol_tensor*` of all input copies (in backward pass only)
- `sol_tensor*` of all output copies (in backward pass only)

Within your header file you can then provide a function similar to this:
```cpp
#include <cstring>      // for memcpy
#include <type_traits>

template<bool training, int hyper_0, char hyper_1, typename TI, typename TO>
inline void custom_memcpy(sol_ctx* ctx, const sol_tensor* input, const sol_tensor* output) {
    static_assert(std::is_same<TI, TO>::value);
    auto i = sol_ptr<TI>(input);
    auto o = sol_ptr<TO>(output);
    memcpy(o, i, sizeof(TI) * input->numel);
}
```
As you see, you need to use `sol_ptr<TYPE>(sol_tensor*)` to get the raw pointer from the `sol_tensor*`. Please use the following fields of the `sol_tensor` if necessary:

- `numel`: total number of elements within the tensor
- `dims`: number of dimensions
- `shape`: shape of the tensor as an array of `sol_dim` (using `sol_dim = int64_t`)

All other fields are subject to change and should not be used!
If you need to allocate temporary data, you need to do it like this (you can allocate 1- to 8-dimensional tensors):

```cpp
sol_tensor tmp = SOL_TENSOR_2D(ctx->device_handle, sol_dtype<T>(), DIM_0, DIM_1);
sol_malloc(&tmp);
T* value = sol_ptr<T>(&tmp);
...
sol_free(&tmp);
```
If your header file shall call into an external library, keep in mind to provide the correct `compiler_flags` for the device-specific compiler. For GCC/NCC this could be `-L/path/to/your/lib -llibraryname` to link the library `/path/to/your/lib/liblibraryname.so`.
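As a sketch, the corresponding `compiler_flags` argument could then look like this (the paths and library name are placeholders):

```python
# Placeholder paths and library name; adjust to your environment.
lib_flags = ['-I/path/to/your/code', '-L/path/to/your/lib', '-llibraryname']
compiler_flags = {
    'x86': lib_flags,  # GCC
    've':  lib_flags,  # NCC
}
```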