The NEC SX-Aurora supports two execution modes: transparent and native offloading. If you only care about inference, the transparent offloading method is the easiest to use. For training, native offloading should be used, if it is available for your framework.
To use the NEC SX-Aurora, it is necessary to call sol.device.set(sol.device.ve, deviceIdx), where deviceIdx is the index of the Aurora to run on, starting from 0. Further, the input data needs to be located on the host system. In case you are not using a default installation, you need to set VE_LD_LIBRARY_PATH to point to libnc++.so, libcblas.so, libblas_sequential.so and libasl_sequential.so.
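A minimal sketch of how transparent offloading for inference could be driven from Python is shown below. Only sol.device.set comes from the text above; sol.optimize and the chosen model are assumptions used for illustration and may differ in your SOL version.

import torch
import sol

model = torch.nn.Linear(128, 10)      # any PyTorch model, defined on the host
optimized = sol.optimize(model)       # assumed SOL call that returns an optimized module
sol.device.set(sol.device.ve, 0)      # execute on Aurora VE#0 (indices start at 0)
data = torch.rand(8, 128)             # input data stays on the host system
output = optimized(data)              # SOL transparently offloads execution to the VE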
Native offloading is currently only supported for PyTorch!
Within PyTorch we support the use of native tensors. For this, program PyTorch as if you were using a GPU, but replace all calls to cuda with hip, e.g.:
model.to('hip:0') # copy model to VE#0
input = input.to('hip:0') # copy data to VE#0
model(input) # gets executed on the device
torch.hip.synchronize() # wait for execution to complete
The following methods are supported (see https://pytorch.org/docs/stable/cuda.html for their description):
- torch.Tensor.to('hip')
- torch.Tensor.to('hip:X')
- torch.hip.synchronize(device=0)
- torch.hip.is_available()
- torch.hip.current_device()
- torch.hip.set_device(device)
- torch.hip.device_count()
- torch.hip.memory_allocated(device=None)
- class torch.hip.device(device)
- class torch.hip.device_of(device)
- torch.Tensor.hip()
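A short usage sketch of these functions, assuming that torch.hip mirrors the behavior of the corresponding torch.cuda calls once SOL is loaded (importing sol to register the hip backend is an assumption, not taken from the list above):

import torch
import sol                                  # assumed to register the 'hip' backend for the VE

if torch.hip.is_available():                # check whether a VE is available
    torch.hip.set_device(0)                 # select VE#0 as the current device
    print(torch.hip.device_count())         # number of visible VEs
    x = torch.rand(16, 128).to('hip:0')     # copy a tensor to VE#0
    print(torch.hip.memory_allocated(0))    # bytes currently allocated on VE#0
    torch.hip.synchronize(device=0)         # wait for all queued work on VE#0 to finish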
Common Problems
- Only a minimal set of functions is implemented for calling operations on VE tensors outside of models that have been optimized with SOL.
- PyTorch tends to initialize all devices during training, even if they don't get used. For the VE this means it will block all devices in the system! To prevent this, use the env var VEDA_VISIBLE_DEVICES to define which devices you want to use. Example: export VEDA_VISIBLE_DEVICES=0,2,5. This will enable VEs 0, 2 and 5 and map them onto the numbers 0 >> 0, 2 >> 1, 5 >> 2. The ordering of the devices in VEDA_VISIBLE_DEVICES does not affect the final ordering, which is always ascending. (See the sketch after this list.)
- If SOL gets stuck, it can be that dead processes are running on the VE. Run pkill ve_exec and try to rerun SOL. The same happens if you try to start multiple applications on the same VE at the same time.
- If you have GCC-VE and G++-VE installed, you need to make sure that they are not in the PATH, as SOL expects gcc and g++ to be the host compilers!
- If SOL reports that it cannot find nc++, run nc++ --version and check if you have execution privileges for it and that there is no license problem.
- If your nc++ is not installed in the default folder /opt/nec/ve/bin/, you need to specify the env var NCXX=../nc++. The same applies for NLD=../nld and NAR=../nar.
- You can use sol.config["ve::reporting"] = True to enable ncc/nc++ reporting.
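The following sketch combines some of these settings from Python; it assumes the environment variables are read when the VEDA runtime initializes, so they are set before torch and sol are imported (exporting them in the shell beforehand is the safer option):

import os

# must be set before importing torch/sol, otherwise the devices may already be initialized
os.environ["VEDA_VISIBLE_DEVICES"] = "0,2,5"   # only VEs 0, 2 and 5 become visible
os.environ["NCXX"] = "/custom/path/nc++"       # hypothetical non-default nc++ location

import torch
import sol

sol.config["ve::reporting"] = True             # enable ncc/nc++ reporting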