NEC SX-Aurora

The NEC SX-Aurora supports two execution modes: transparent offloading and native offloading. If you only care about inference, transparent offloading is the easiest to use. For training, native offloading should be used if it is available for your framework.

Native Offloading (only PyTorch)

Native offloading is currently only supported for PyTorch!

PyTorch

Within PyTorch we support the use of native VE tensors. For this, program PyTorch as if you were using a GPU, but replace all calls to cuda with hip. For example:

model.hip()               # copy model to VE#0
input = input.hip()       # copy data to VE#0
model(input)              # gets executed on the device
torch.hip.synchronize()   # wait for execution to complete

Available functions

(see https://pytorch.org/docs/stable/cuda.html for description)

torch.Tensor.hip()
torch.Tensor.to('hip')
torch.Tensor.to('hip:X')
torch.nn.Module.hip()
torch.hip.synchronize(device=0)
torch.hip.is_available()
torch.hip.current_device()
torch.hip.set_device(device)
torch.hip.device_count()
torch.hip.memory_allocated(device=None)
CLASS torch.hip.device(device)
CLASS torch.hip.device_of(device)
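As a sketch of how these functions can be combined, the helper below picks a device string and falls back to the CPU when no VE is present. select_device is an illustrative name, not part of SOL's API; only torch.hip.is_available() from the list above is assumed to exist in a SOL-enabled PyTorch.

```python
def select_device(hip_available: bool, device_idx: int = 0) -> str:
    """Return a device string such as 'hip:0', or 'cpu' as a fallback."""
    if hip_available:
        return f"hip:{device_idx}"
    return "cpu"
```

In a SOL-enabled PyTorch this could be used as model.to(select_device(torch.hip.is_available())), mirroring the usual CUDA fallback pattern.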

Transparent Offloading (all frameworks)

To use the NEC SX-Aurora, it is necessary to call sol.device.set("ve", deviceIdx), where deviceIdx is the index of the Aurora to run on, starting from 0. Further, the input data needs to be located on the host system. If you are not using a default installation, you need to set VE_LD_LIBRARY_PATH to point to libnc++.so, libcblas.so, libblas_sequential.so and libasl_sequential.so.
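The setup described above can be sketched as follows. The library path is a placeholder for a non-default installation and must be adapted to your system; the sol.device.set call requires an actual SOL installation and is shown as a comment only.

```python
import os

# For a non-default installation, VE_LD_LIBRARY_PATH must contain the
# directory holding libnc++.so, libcblas.so, libblas_sequential.so and
# libasl_sequential.so. The path below is a placeholder, not the real
# location on your system.
os.environ["VE_LD_LIBRARY_PATH"] = "/path/to/ve/libs"

# With SOL installed, selecting the first Aurora would then look like:
#   import sol
#   sol.device.set("ve", 0)   # deviceIdx 0 = first VE in the system
```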

Common Problems

  • Only a minimal set of functions is implemented for calling operations on VE tensors outside of models that have been optimized with SOL.
  • PyTorch tends to initialize all devices during training, even if they don’t get used. On the VE this blocks all devices in the system! To prevent this, use the environment variable VEDA_VISIBLE_DEVICES to define which devices you want to use, e.g. export VEDA_VISIBLE_DEVICES=0,2,5. This enables VEs 0, 2 and 5 and maps them onto the logical device numbers 0 >> 0, 2 >> 1, 5 >> 2. The ordering of the devices in VEDA_VISIBLE_DEVICES does not affect the final ordering; it is always ascending.
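The remapping rule for VEDA_VISIBLE_DEVICES can be illustrated with a small helper. map_visible_devices is purely illustrative and not part of SOL or VEDA; it only reproduces the documented behavior that physical IDs are sorted ascending before being assigned logical numbers.

```python
def map_visible_devices(veda_visible_devices: str) -> dict:
    """Map physical VE IDs from a VEDA_VISIBLE_DEVICES string to logical
    device indices. IDs are sorted ascending first, so the order inside
    the variable does not matter."""
    physical = sorted(int(d) for d in veda_visible_devices.split(","))
    return {phys: logical for logical, phys in enumerate(physical)}
```

For example, both "0,2,5" and "5,0,2" yield the mapping 0 >> 0, 2 >> 1, 5 >> 2.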
  • If SOL gets stuck, it may be that dead processes are still running on the VE. Run pkill ve_exec and try to rerun SOL. The same can happen if you try to start multiple applications on the same VE at the same time.
  • If you have GCC-VE and G++-VE installed, you need to make sure that they are not in the PATH, as SOL expects gcc and g++ to be the host compilers!
  • If SOL reports that it cannot find nc++, run nc++ --version and check whether you have execution privileges for it and whether there is a license problem.
  • If your nc++ is not installed in the default folder /opt/nec/ve/bin/, you need to specify the env var NCXX=../nc++. The same applies for NLD=../nld and NAR=../nar.
  • You can use sol.config["ve::reporting"] = True to enable ncc/nc++ reporting.