NEC SX-Aurora

Requirements

Requirement   Version
VEOS          ≥ 2.7
NCC           ≥ 5.0 (if using VE3)

Native Offloading (PyTorch)

Within PyTorch, SOL supports the use of native VE tensors. To use them, program PyTorch as if you were targeting a GPU, but replace all calls to cuda with ve. E.g.:

model.ve()               # copy model to VE#0
input = input.ve()       # copy data to VE#0
model(input)             # gets executed on the device
torch.ve.synchronize()   # wait for execution to complete

Available functions

(see https://pytorch.org/docs/stable/cuda.html for descriptions of the corresponding CUDA functions)

torch.Tensor.ve()
torch.Tensor.to('ve')
torch.Tensor.to('ve:X')
torch.nn.Module.ve()
torch.ve.synchronize(device=0)
torch.ve.is_available()
torch.ve.current_device()
torch.ve.set_device(device)
torch.ve.device_count()
torch.ve.memory_allocated(device=None)
CLASS torch.ve.device(device)
CLASS torch.ve.device_of(device)
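
For illustration, a minimal sketch of how these calls mirror their CUDA counterparts (the device index 0 is only an example; depending on your installation, the VEDA-PyTorch extension may need to be loaded before torch.ve is available):

import torch

if torch.ve.is_available():
	print("VE devices:", torch.ve.device_count())
	torch.ve.set_device(0)               # select VE#0
	x = torch.randn(4, 4).to('ve:0')     # copy a tensor to VE#0
	y = x * 2                            # gets executed on the device
	torch.ve.synchronize(device=0)       # wait for execution to complete
	print("bytes allocated:", torch.ve.memory_allocated(0))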

Training

Loss functions are not implemented natively for the VE. Instead, use a wrapper model that adds the loss function to the SOL-optimized model:

class TrainingModel(torch.nn.Module):
	def __init__(self, model, loss):
		super().__init__()
		self.model = model
		self.loss  = loss

	def forward(self, input, target):
		output = self.model(input)
		loss   = self.loss (output, target)
		return output, loss

And adjust your training loop:

device = 've:0'
model.to(device)

optimizer      = torch.optim.SGD(model.parameters(), lr=0.01)
training_model = TrainingModel(model, torch.nn.L1Loss())
training_model = sol.optimize(training_model)

for input, target in dataset:
	input, target = input.to(device), target.to(device)
	optimizer.zero_grad()
	output, loss  = training_model(input, target)
	loss.backward()
	optimizer.step()

training_model and model share the same weights, so you don’t need to further adjust your code.

For optimal performance, and if you don’t need PyTorch-identical pseudo-random numbers, use sol.optimize(..., determinism=sol.pytorch.determinism(sol.Determinism.Rand_Fastest)), which enables the use of much faster random number generators.
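
In the training setup above, this could look as follows (a sketch; only the determinism argument from the previous sentence is added):

training_model = sol.optimize(
	training_model,
	determinism=sol.pytorch.determinism(sol.Determinism.Rand_Fastest),
)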

Native Offloading (TensorFlow)

Due to the increasing number of unresolved issues in the TensorFlow PluggableDevice API (e.g., #55497, #57095, #60883, #60895), we decided to no longer maintain our veda-tensorflow extension. Therefore you can no longer use with tf.device("/VE:0"):. Instead, please use Transparent Offloading via sol.device.set('ve', 0) (see the example in the Transparent Offloading section below). We are sorry for the inconvenience, but we don’t see any commitment from the TensorFlow team to accept our bug fixes, nor to fix the issues themselves.

Transparent Offloading (all frameworks)

To use the NEC SX-Aurora, it is necessary to call sol.device.set("ve", deviceIdx), where deviceIdx is the index of the Aurora to run on, starting from 0. Further, the input data needs to be located on the host system.

As explained in our paper SOL: Effortless Device Support for AI Frameworks without Source Code Changes, running inference with Transparent Offloading has nearly zero impact on performance. Training performance, however, will be very poor!
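
A minimal sketch of transparent offloading for inference, assuming model and input are defined as in the PyTorch examples above (with transparent offloading, SOL handles the transfers between host and device):

import sol

sol.device.set("ve", 0)            # run on Aurora #0
opt_model = sol.optimize(model)    # optimize the model for the VE
output    = opt_model(input)       # input stays on the host system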

Config Options

Option       Type/Default   Description
ve::trace    bool / false   Enables the use of ftrace.
ve::packed   bool / false   Enables the use of packed vectors for float32.

Known Issues

  1. float16 or bfloat16 data types are not supported
  2. 3D Convolution or DeConvolution are not supported
  3. running SOL in a write-protected folder might cause compilation to fail #1325
  4. PyTorch’s Bernoulli and Dropout don’t produce identical pseudo random numbers, due to unavailability of MKL’s VSL Bernoulli algorithm for VE.

Env Vars

EnvVar                 Default                       Description
NAR                    "/opt/nec/ve/bin/nar"         Path to nar
NCXX                   "/opt/nec/ve/bin/nc++"        Path to nc++
NOBJCOPY               "/opt/nec/ve/bin/nobjcopy"    Path to nobjcopy
VEDA_VISIBLE_DEVICES                                 See VEDA for description
VE_NODE_NUMBER                                       See VEDA for description
VE_OMP_NUM_THREADS                                   See VEDA for description
_VENODELIST                                          See VEDA for description
VE_LD_LIBRARY_PATH                                   See VEDA for description
NCPATH                                               Used as include paths
NC_INCLUDE_PATH                                      Used as include paths
NCPLUS_INCLUDE_PATH                                  Used as include paths
NLIBRARY_PATH                                        Used as library paths

FAQ

The AI framework reports that an operation is not supported by device type "VE"

This is caused by the fact that only a minimal subset of operations (e.g., +, -, *, /, …) is supported to be executed “eagerly” on the VE within the framework. If you encounter this problem, please open an issue for VEDA-PyTorch.

SOL reports "not found" for NCC compiler.
Possible Cause 1

SOL is unable to find /opt/nec/ve/bin/nc++. If you don’t use a standard installation, please use the NCXX, NAR and NLD env vars to specify the paths to your NCC installation.
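
For illustration, a hedged sketch of overriding these paths from Python before SOL is used (the version in the paths below is only an example of a non-default installation; exporting the variables in your shell before starting Python works just as well):

import os

# example paths of a non-default NCC installation (adjust to your system)
os.environ["NCXX"] = "/opt/nec/ve/ncc/5.1.0/bin/nc++"
os.environ["NAR"]  = "/opt/nec/ve/ncc/5.1.0/bin/nar"

import sol  # assumed: SOL picks up these variables when it invokes the VE compiler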

Possible Cause 2

If there is a problem with your NCC license, SOL is unable to properly detect the compiler. Please run nc++ --version and check for any error messages.

SOL crashes with nc++: /opt/nec/ve/ncc/3.4.2/libexec/ccom is abnormally terminated by SIGSEGV.

On some systems, NCC v3.4.2 crashes when compiling code generated by SOL. If you encounter this problem, please switch to an older version of the compiler using the NCXX env var.

SOL reports VEDA_ERROR: VEDA_ERROR_CANNOT_CREATE_CONTEXT.

This error message is triggered when the VE is occupied by another process. SOL relies on AVEO which requires exclusive access to the device. To resolve this issue, terminate all other processes on the device. You can use VE_NODE_NUMBER=0 /opt/nec/ve/ve-top to identify running processes.

Singularity Containers

You can use the following scripts to build Singularity Containers that contain SOL.

Remote Setup

If you want to install SOL using the official repository, use this script.

BootStrap: docker
From: rockylinux/rockylinux:8.10

%post
	# setup OS
	dnf update -y
	dnf install -y gcc-toolset-10 python312					# SOL requirements
	dnf install -y epel-release								# VEOS requirements
	dnf install -y libquadmath libdhash protobuf-c log4c	# VEOS requirements
	
	# setup VENV
	python3 -m venv /venv

	. /venv/bin/activate
	python3 -m pip install --upgrade pip
	python3 -m pip install {{ PYTHON_FRAMEWORKS }}
	python3 -m pip install nec-sol
	python3 -m nec-sol install -u "{{ SOL_USERNAME }}" -p "{{ SOL_PASSWORD }}" --accept-license --devices ve
	deactivate

%environment
	# init VE paths
	export PATH=$PATH:/opt/nec/ve/bin
	export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nec/ve/veos/lib64

	# init GCC/10
	. scl_source enable gcc-toolset-10
	export CC=/opt/rh/gcc-toolset-10/root/usr/bin/gcc
	export CXX=/opt/rh/gcc-toolset-10/root/usr/bin/g++

	# init VENV
	. /venv/bin/activate

	# Configure Proxy here if needed
	# export https_proxy=192.168.0.1:1234
	# export http_proxy=192.168.0.1:1234

Create a file sol4ve.cfg and set the correct values for the variables; replace {USERNAME} and {PASSWORD} with your credentials:

SOL_USERNAME={USERNAME}
SOL_PASSWORD={PASSWORD}
PYTHON_FRAMEWORKS=torch torchvision

And then build it with the following command.

sudo -E singularity build --build-arg-file sol4ve.cfg sol4ve.sif sol4ve.def

Local Setup

In case you want to install SOL from a local folder, you can use the following script.

BootStrap: docker
From: rockylinux/rockylinux:8.10

%files
	{{ SOL_PATH }} /sol

%post
	# setup OS
	dnf update -y
	dnf install -y gcc-toolset-10 python312					# SOL requirements
	dnf install -y epel-release								# VEOS requirements
	dnf install -y libquadmath libdhash protobuf-c log4c	# VEOS requirements
	
	# setup VENV
	python3 -m venv /venv

	. /venv/bin/activate
	python3 -m pip install --upgrade pip
	python3 -m pip install {{ PYTHON_FRAMEWORKS }}
	python3 -m pip install --pre nec-sol-core[ve,torch] veda-pytorch -f /sol # add features if needed
	deactivate

	rm /sol/*.*
	rmdir /sol

%environment
	# init VE paths
	export PATH=$PATH:/opt/nec/ve/bin
	export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nec/ve/veos/lib64

	# init GCC/10
	. scl_source enable gcc-toolset-10
	export CC=/opt/rh/gcc-toolset-10/root/usr/bin/gcc
	export CXX=/opt/rh/gcc-toolset-10/root/usr/bin/g++

	# init VENV
	. /venv/bin/activate

	# Configure Proxy here if needed
	# export https_proxy=192.168.0.1:1234
	# export http_proxy=192.168.0.1:1234

Create a file sol4ve.cfg and set the correct values for the variables; replace {PATH} with the path to the SOL download folder:

SOL_PATH={PATH}
PYTHON_FRAMEWORKS=torch torchvision

And then build it with the following command.

sudo -E singularity build --build-arg-file sol4ve.cfg sol4ve.sif sol4ve.def

Execution

To run the container, execute the following command, which manually binds the required folders:

singularity shell --writable-tmpfs --bind /opt/nec:/opt/nec:ro --bind /usr/lib64/libaurlic.so.1:/usr/lib64/libaurlic.so.1:ro --bind /var/opt/nec/ve/veos/:/var/opt/nec/ve/veos/:rw sol4ve.sif

Alternatively, you can use the SINGULARITY_BIND env var instead of the --bind arguments:

export SINGULARITY_BIND=/opt/nec:/opt/nec:ro,/usr/lib64/libaurlic.so.1:/usr/lib64/libaurlic.so.1:ro,/var/opt/nec/ve/veos/:/var/opt/nec/ve/veos/:rw
singularity shell --writable-tmpfs sol4ve.sif