Custom Layers

In v0.5.3 we added experimental Custom Layer support. In this chapter we describe how it can be used, from top to bottom.

Integration into the Neural Network

For the integration into the neural network, we use custom layers that integrate seamlessly into the AI framework’s computation mechanisms. However, not all features are available in all frameworks.

To create your own custom layer, use the sol.[pytorch/tensorflow].nn.Custom class.

Its constructor expects the following arguments:

  1. function_names: Union[str, List[str], Tuple[str]]: List of C++ function names that get called by the custom layer. The first is used for the forward pass, the second for the backward pass.
  2. include_file: str: Name of the C++ header file to be included by the custom layer. It gets included as #include <[include_file]>.
  3. compiler_flags: Dict[str, Union[Tuple[str], List[str]]]: Dictionary of compiler flags that get passed to the device’s native compiler. Use x86 (GCC), nvidia (NVCC), or ve (NCC) as device names.
  4. hyper_parameter: Union[Tuple[Union[int, str]], List[Union[int, str]]]: Hyperparameters that get passed as template parameters to the custom layer. Each should be a C++ integer-like type, a 'char', a constexpr value, or a C preprocessor macro.
  5. name: str: Name used in the SOL debug graph, or as the name of the layer within the framework (if supported).
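
For illustration, a minimal instantiation could look like the sketch below, assuming the arguments can be passed by keyword under the names listed above (all values are placeholders; a complete, runnable example follows at the end of this chapter):

layer = sol.pytorch.nn.Custom(
	function_names  = ('my_forward', 'my_backward'),	# C++ functions for forward/backward pass
	include_file    = 'my_layer.h',						# header included by the generated code
	compiler_flags  = {'x86': ['-I/path/to/headers']},	# flags for the device's native compiler
	hyper_parameter = [3, "'x'"],						# passed as template parameters
	name            = 'MyLayer'							# name shown in the SOL debug graph
)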

By default, the custom layer accepts multiple inputs and returns outputs identical in shape and dtype to its inputs. If you need to change the shapes or dtypes of the custom layer’s outputs, you can override the shape_inference and dtype_inference methods. The latter should return the dtypes in the framework’s native dtype format.

class Custom(sol.[pytorch/tensorflow].nn.Custom):
	def shape_inference(self, inputs: List[Tensor]) -> List[List[int]]:
		...

	def dtype_inference(self, inputs: List[Tensor]) -> List[DType]:
		...
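
As a sketch, a PyTorch layer that sums over the last dimension of its single input would drop that dimension from the shape and keep the input’s dtype. The class name and behavior below are only illustrative; the example assumes the inputs passed to these methods behave like framework tensors (i.e., expose .shape and .dtype):

class SumLastDim(sol.pytorch.nn.Custom):
	# Output shape: the input shape without its last dimension.
	def shape_inference(self, inputs):
		return [list(inputs[0].shape[:-1])]

	# Output dtype: same as the input, in PyTorch's native dtype format.
	def dtype_inference(self, inputs):
		return [inputs[0].dtype]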

When computing the derivative, it might be required to have a copy of any of the inputs or outputs available. For this purpose you can override the methods input_requires_copies and output_requires_copies. Each needs to return a list of boolean values, where True indicates that the corresponding value will be provided as a copy within the backward pass.

class Custom(sol.[pytorch/tensorflow].nn.Custom):
	def input_requires_copies(self, inputs: List[Tensor]) -> List[bool]:
		...

	def output_requires_copies(self, inputs: List[Tensor]) -> List[bool]:
		...
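
As a sketch, a layer whose derivative is computed from its forward output (for example something sigmoid-like, where dy/dx = y * (1 - y)) would request a copy of the output but no copies of the inputs; the class name and behavior are only illustrative:

class SigmoidLike(sol.pytorch.nn.Custom):
	# The backward pass does not need the original inputs.
	def input_requires_copies(self, inputs):
		return [False] * len(inputs)

	# The backward pass needs the forward output, so request a copy of it.
	def output_requires_copies(self, inputs):
		return [True]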

Here is a full example written in PyTorch.

import torch
from torch import Tensor
from typing import List

import sol

class Custom(sol.pytorch.nn.Custom):
	def __init__(self):
		path			= ['-I/path/to/your/code']
		include_file	= 'custom.h'
		compiler_flags	= {'x86': path, 've': path}
		super().__init__(
			('custom_memcpy', 'custom_memcpy'),	# C++ functions for forward and backward pass
			include_file,
			compiler_flags,
			[0, "'c'"],							# hyper parameters
			name="Memcpy"
		)

	# Reference implementation used when the model runs with plain PyTorch.
	def _pytorch(self, inputs: List[Tensor]) -> List[Tensor]:
		return [inputs[0].clone()]

	# Unwrap the list interface so the layer works inside torch.nn.Sequential.
	def forward(self, x):
		return self._forward([x])[0]

model = torch.nn.Sequential(Custom())

As you can see, we have overridden the forward method, because the default implementation expects a list or tuple as input, which would not work with torch.nn.Sequential. Furthermore, within PyTorch you can provide a _pytorch method that performs your operation when running the neural network with plain PyTorch. Due to technical limitations this is not available for TensorFlow!
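
As a minimal usage sketch (the input shape and values are arbitrary; when the model is executed with plain PyTorch, the _pytorch fallback above performs the computation):

x = torch.randn(4, 8)
y = model(x)				# plain PyTorch execution uses the _pytorch fallback
assert torch.equal(x, y)	# the Memcpy layer simply copies its input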

C++ Code

The SOL custom layer integrates your layer into the SOL execution graph, performs all allocations/deallocations of input and output data, and calls into the functions you have provided with the function_names argument.

The raw scheme of the function call looks like this:

#include <sol/[device_type]/api.h>
#include <[include_file]>
SOL_API void [autogenerated_unique_function_name](sol_ctx* ctx) {
	[function_name[derivative]]<[training], [hyper_parameters], [i.dtype for i in inputs], [o.dtype for o in outputs]>(ctx, *inputs, *outputs, *input_copies, *output_copies);
}

As described before, we first include your header file and then call function_name[0] for the forward pass and function_name[1] for the backward pass.

For the template, we provide:

  1. The boolean training, which is false during inference and true during training.
  2. All of your provided hyper_parameter values, e.g., 0 and 'c' from the example above.
  3. The C++ dtypes of all inputs.
  4. The C++ dtypes of all outputs.

For the function arguments, we provide (a concrete sketch for the Memcpy layer follows after this list):

  1. A sol_ctx*, which needs to be used to allocate temporary data.
  2. A sol_tensor* for each input (forward pass) or each output gradient (backward pass).
  3. A sol_tensor* for each output (forward pass) or each input gradient (backward pass).
  4. A sol_tensor* for each requested input copy (backward pass only).
  5. A sol_tensor* for each requested output copy (backward pass only).
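
Putting this together for the Memcpy example above, the generated call would conceptually expand to the sketch below, assuming a single float32 input and output and inference mode (the actual wrapper name and tensor variables are generated by SOL):

// training -> false, hyper_parameter -> 0 and 'c',
// input/output dtypes -> float (assuming float32 tensors map to float)
custom_memcpy<false, 0, 'c', float, float>(ctx, input, output);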

Within your header file you can then provide a function similar to this:

#include <cstring>      // memcpy
#include <type_traits>  // std::is_same

template<bool training, int hyper_0, char hyper_1, typename TI, typename TO>
inline void custom_memcpy(sol_ctx* ctx, const sol_tensor* input, const sol_tensor* output) {
	static_assert(std::is_same<TI, TO>::value, "input and output dtype must match");
	auto i = sol_ptr<TI>(input);   // raw input pointer
	auto o = sol_ptr<TO>(output);  // raw output pointer
	memcpy(o, i, sizeof(TI) * input->numel);
}

As you can see, you need to use sol_ptr<TYPE>(sol_tensor*) to get the raw pointer from a sol_tensor*. Use the following fields of the sol_tensor if necessary:

  • numel: total number of elements within the tensor
  • dims: number of dimensions
  • shape: shape of the tensor as an array of sol_dim (using sol_dim = int64_t)

All other fields are subject to change and should not be used!
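
As a small sketch of how these fields can be used, the helper below (the name is only illustrative) recomputes the element count from dims and shape and compares it against numel:

// Illustrative sanity check: the product of all shape entries equals numel.
inline bool shape_matches_numel(const sol_tensor* t) {
	sol_dim n = 1;
	for(int d = 0; d < t->dims; d++)
		n *= t->shape[d];
	return n == t->numel;
}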

If you need to allocate temporary data, do it like this (you can allocate 1- to 8-dimensional tensors):

sol_tensor tmp = SOL_TENSOR_2D(ctx->device_handle, sol_dtype<T>(), DIM_0, DIM_1);	// describe a DIM_0 x DIM_1 tensor of type T
sol_malloc(&tmp);				// allocate the memory
T* value = sol_ptr<T>(&tmp);	// fetch the raw pointer
...
sol_free(&tmp);					// release the memory again

If your header file calls into an external library, make sure you provide the correct compiler_flags for the device-specific compiler. For GCC/NCC this could be: -L/path/to/your/lib -llibraryname to link the library /path/to/your/lib/liblibraryname.so.
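
As a sketch, building on the flags from the PyTorch example above, linking against such a library could look like this (paths and the library name are placeholders):

path			= ['-I/path/to/your/code', '-L/path/to/your/lib', '-llibraryname']
compiler_flags	= {'x86': path, 've': path}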