v0.2 Altair

  • fixed double-free pointer errors and other error messages when running models for thousands of iterations
  • added torch.Tensor.clone(), torch.min(dim) and torch.max(dim) to be used outside of sol.optimize.
  • added pip requirement for PyTorch, to ensure the correct version is installed
  • Fixed Segfault in native tensor mode #32
  • Fixed NaN results in backward pass #34
  • Other Bugfixes/Improvements: #28, #29, #33, #36
  • Updated to PyTorch 1.5.1 #19
  • You can get the current VE memory consumption with torch.hip.memory_allocated(device) #26
  • Fixed MemLeak in VE native tensor mode #4
  • Updated to AVEO 0.9.10 #24, which improves the handling of dead processes after SOL has died.
  • Updated NLC to 2.1.0 #22
  • Redesigned SOL's seed system, to match usage as in the ML frameworks #21
  • Other Bugfixes/Improvements: #12, #16, #17, #18, #20, #23, #27
  • Supports: PyTorch 1.5.0
  • Bugfixes:
    • [VEBLAS/VEDNN] fixed autotuning #6
    • [Core] fixed crash when using Anaconda #5
  • Supports: PyTorch 1.5.0
  • Known Bugs:
    • Calling torch.concat(...) on CPU outside can produce wrong results. This is caused by a bug in PyTorch 1.5 which is triggered when registering the SX-Aurora API. We submitted a pull request to PyTorch that will fix the problem in a future release.
  • New Features:
    • [Backends] the heuristics for the DNN backends can be modified with sol.config["heuristic::BACKEND::LAYER::PASS"]. Lower values are preferred. 0 disables the pass for the backend. Use sol.config.print() to get an overview of available options. These only apply if sol.config["autotuning"] = False.
    • [Core] SOL can now compile models for variable batch sizes! For this use sol.optimize(model, sol.input([0, 3, 224, 224]), batch_size=30), where batch_size is a hint for the batch size the code gets tuned for. SOL will try its best to provide good performance for all batch sizes, but performance can be lower than when compiling for a specific batch size.
    • [Core] added sol.versions() to print versions of used compilers + libraries. Please add this information whenever you report a bug!
  • Bugfixes:
    • [Core] improved internal layout + dimension handling
    • [DFP] fixed bugs in autograd engine (i.e. $`\frac{d\text{pow}(x, n)}{dx}`$)
    • [DFP] fixed race condition in Embedding layer gradient
    • [DNNL] fixed grouped convolutions in ResNeXt
    • [PyTorch] fixed problems with torch.addmm as reported by @malon
    • [VEDA] fixed only showing 1 VE when VEDA_VISIBLE_DEVICES is not set
    • [VEDA] fixed rare race condition, causing "VEDA_ERROR_NOT_INITIALIZED" errors when shutting down the application
    • [VEDA] improved error handling and messages
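The pow derivative named in the [DFP] autograd fix above is d/dx x^n = n * x^(n-1). The following is only a minimal sketch of how such a gradient can be sanity-checked against a finite difference. It uses plain Python and does not call SOL or PyTorch; the function names dpow_dx and central_difference are made up for this illustration.

```python
def dpow_dx(x, n):
    """Analytic derivative of pow(x, n) with respect to x."""
    return n * x ** (n - 1)

def central_difference(f, x, h=1e-6):
    """Numerical derivative of f at x using a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, n = 1.5, 3
analytic = dpow_dx(x, n)                            # 3 * 1.5**2 = 6.75
numeric = central_difference(lambda v: v ** n, x)   # should agree closely

# The two values should match up to the finite-difference error.
assert abs(analytic - numeric) < 1e-6
```

The same comparison can be applied to the gradient a framework returns for a given input, if a result looks suspicious.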
  • Breaking Changes:
    • [PyTorch] renamed camelcase aB arguments to a_b to match PyTorch conventions.
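For code that still uses the old spelling, the aB to a_b renaming above follows the usual camelCase to snake_case rule. The helper below is a hypothetical sketch of that rule for migration scripts, not part of SOL, and the argument names in the example are placeholders rather than a list of what was actually renamed.

```python
import re

def camel_to_snake(name: str) -> str:
    """Insert '_' before each capital that follows a lowercase letter or digit, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

# Placeholder argument names, for illustration only.
for old in ["aB", "batchSize", "requiresGrad"]:
    print(old, "->", camel_to_snake(old))
```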
  • Other Changes:
    • [Core] No-Op layers use names of Input, Param or Output nodes.
    • [Core] Removed sol.config["ve::profiling"].
    • [Core] sol.config["compiler::profile"] = True prints execution times of fused layers.
    • [Core] added memory consumption visualisation. See Usage/Debug for more details.
    • [DNNL] added option for varying memory layouts in filter-pass
    • [MKL] added sol-backend-mkl for X86
    • [PyTorch] added torch.Tensor.data_ptr(), torch.where(), torch.Tensor.where(), torch.Tensor.[__lt__, __le__, __gt__, __ge__, __eq__, __ne__]
    • [VE] SOL will warn if TensorFlow-VE is installed on the system, as it is known to cause serious problems in combination with SOL.
    • [VE] added sol-backend-veda-test to test minimal functionality
  • Deprecated:
    • [VE] Calling torch.nn.L1Loss or torch.nn.functional.l1_loss(...) outside of the sol.optimize(...) call is deprecated and will be removed in a future release. Please add these to the model, i.e.:
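      class Model(torch.nn.Module):
      	def __init__(self, layers):
      		super().__init__()
      		self.layers = layers # your existing network
      		self.loss = torch.nn.L1Loss()
      	def forward(self, x, target):
      		x = self.layers(x)
      		y = self.loss(x, target) # L1Loss compares the output against a target
      		return x, y # x is your inference output, y the loss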
  • Supports: PyTorch 1.5.0
  • Bugfixes:
    • added torch.addmm(), torch.addbmm() and torch.addmv() as requested by @malon
    • added transformations to remove simple arithmetic operations from the computation graph (e.g. A + 0.0, A * 1.0, ...)
    • AVEO gets compiled with nfort-3.0.4 to solve compatibility issues with installations that have nfort < 3.0.27 installed.
    • fixed rare problem in scheduling mechanism
    • fixed code generation problem for tensors with only a single element
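The graph transformations mentioned above (removing A + 0.0 and A * 1.0) can be pictured with a toy expression tree. This is only an illustrative sketch and does not reflect SOL's internal graph representation; the classes Var, Const and BinOp and the function simplify are invented for this example.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class BinOp:
    op: str          # "+" or "*"
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Var, Const, BinOp]

def simplify(e: Expr) -> Expr:
    """Drop additions of 0.0 and multiplications by 1.0, working bottom-up."""
    if not isinstance(e, BinOp):
        return e
    lhs, rhs = simplify(e.lhs), simplify(e.rhs)
    identity = {"+": 0.0, "*": 1.0}.get(e.op)   # neutral element of the operator
    if identity is not None:
        if rhs == Const(identity):
            return lhs
        if lhs == Const(identity):
            return rhs
    return BinOp(e.op, lhs, rhs)

A = Var("A")
expr = BinOp("*", BinOp("+", A, Const(0.0)), Const(1.0))   # (A + 0.0) * 1.0
print(simplify(expr))   # Var(name='A')
```

Removing such nodes before code generation means no kernel has to be emitted for an operation that leaves its input unchanged.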
  • Supports: PyTorch 1.5.0
  • Bugfixes:
    • fixed torch.sum(), torch.Tensor.sum() gradient
    • enabled trace log in release build (might be removed again in a future release)
    • added torch.nn.L1Loss, torch.nn.functional.l1_loss
    • fixed VE-PyTorch-API integration
    • improved VE-thread scheduler for small batch sizes
    • fixed AVEO integration
    • updated SOL-PyTorch documentation
  • Supports: PyTorch 1.5.0
  • !!! Breaking Changes !!!
    • sol.optimize and sol.deploy no longer have the parameters vartype, layout and requires_grad. Instead, you can pass tensors directly, e.g. sol.optimize(model, torch.rand(3, 5)), or use sol.optimize(model, sol.input([3, 5], torch.long, True, sol.layout.ncp)) to specify types other than torch.float! This was necessary to support inputs of different types, e.g. in Transformer networks!
    • Renamed "sol.vartype" to "sol.dtype" to be syntactically closer to the AI frameworks
    • Starting with v0.2.0, SOL releases are named after solar systems. This first release is called "Altair".
  • Added:
    • torch.squeeze, torch.Tensor.squeeze, torch.Tensor.squeeze_
    • torch.unsqueeze, torch.Tensor.unsqueeze, torch.Tensor.unsqueeze_
    • torch.erf, torch.sin, torch.cos, torch.pow, torch.tanh, ...
    • torch.nn.LayerNorm
    • ...
  • Bugfixes:
    • jit::dot failed with model names that contain spaces
  • New Features/Improvements:
    • Supports multiple VEs at the same time! We ported most torch.cuda calls to torch.hip. You can also use the A.to('hip:1') syntax now!
    • Added support for Transformer networks such as BERT (tested with https://github.com/dhlee347/pytorchic-bert and https://github.com/huggingface/transformers/)
    • Improved Performance for VE random number generation
    • SOL now reports if it recompiles or uses a cached version of the network.
  • Known Bugs/Limitations:
    • If SOL crashes, processes may be left running on the Aurora. Run pkill ve_exec if this happens.
    • Calling torch.concat(...) on CPU outside can produce wrong results. This is caused by a bug in PyTorch 1.5 which is triggered when registering the SX-Aurora API. We submitted a pull request to PyTorch that will fix the problem in a future release.