v0.5 Diadem

v0.5.0rc2 (23.05.2022, Docs)

This is the second release candidate of SOL v0.5.0 Diadem. It brings back support for ONNX, adds a NumPy runtime, and includes many bug fixes and improvements.

Highlights

  • ONNX frontend
  • NumPy runtime
  • Printing a stack trace when an exception occurs in SOL
  • Improved printing of model signature
  • The installer no longer shows packages you don't have access to
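
To illustrate the first two highlights, here is a minimal, hypothetical sketch of optimizing an ONNX model and running it through the NumPy runtime. The file name, input shape, and the way the ONNX frontend is selected are assumptions for illustration; only the sol.optimize(model, args, ...) signature itself is documented (see the breaking changes under v0.5.0rc1 below).

    import numpy as np
    import sol  # SOL's Python package; import name assumed

    # ONNX frontend: pass the path of an .onnx file to sol.optimize
    # ("model.onnx" and the input shape are placeholders).
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    opt = sol.optimize("model.onnx", [x])

    # NumPy runtime: calling the optimized model with NumPy arrays
    # executes it without any AI framework attached.
    out = opt(x)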

Closed Issues

  • #624 [FFT] duplicate symbol "vdims"
  • #623 [TF] tf.nn.max_pool_with_argmax indices differ again...
  • #619 [Hugo] Upgrade 0.99.0
  • #617 [ONNX] "Unable to fetch values for " in ShuffleNet
  • #616 [ONNX] "can't 384 / 13 because it's no divider" in X86 Igornet and Layers[Repeat]
  • #615 [ONNX] "Unable to find dimension PO1" in X86 MNasNet and EfficientNet
  • #614 [ONNX] Cumsum accuracy issues
  • #613 [Core] Improve calculation of memory consumption estimation
  • #612 [TF] TF casts shape of [1,1,1,1] to [] during training
  • #611 [NCC] Segfault in networks using VDims
  • #610 [ONNX] CumSum
  • #606 [ISPC] Wrong initialization of VDims
  • #605 [DB] Verify that we can open multiple instances of SOL using the same DB file
  • #604 [X86+FFT] Accuracy Error in PY_IRFFT2D
  • #603 [X86+PyTorch] Memory is already allocated
  • #602 [X86+FFT] [DNNL ERROR] The operation failed because of incorrect function arguments.
  • #599 [PyTorch] Missing Primitive "aten::empty"
  • #598 [HLIR] VDims cause promotion to F64 in BatchNorm
  • #595 [VE] RNN Segfault in Testcases
  • #594 [Tests] Report % of values that exceed threshold
  • #592 [PyTorch] Models without parameter can create training context if 'model.training = False' is not set
  • #591 [Docs] Update documentation for v0.5.0rc2
  • #590 [Core] "axis needs to be between 0 and 0 but found 1" in DeReduce
  • #589 [TF] Can't use Keras models with scalar parameters
  • #587 [TF] Ordering of Parameters in KerasLayer is not stable
  • #586 [VE] Investigate possible Memleak in PyTorch-VE Runtime
  • #584 [PyTorch] Add Error Handling when User provides too many/too few arguments to function
  • #583 [JsonCPP] Upgrade to 1.9.5
  • #581 [ISPC] Upgrade to 1.18.0
  • #580 [CMake] Change min GCC to 10.X
  • #579 [Hugo] upgrade 0.98.0
  • #578 [PyTorch] RNNCell parsing fails in PyTorch 1.11.0
  • #577 [PyTorch] "illegal view transformation" when parsing RNN
  • #576 [DB] automatically recover from "database image malformed" exceptions
  • #575 [VE] RNN does not compile with NCC 3.4.2
  • #574 [Runtime] SegFault in Context::Destroy when starting another training context.
  • #573 [Core] Print StackTrace when SOL Exceptions are thrown
  • #572 [VE] Value2VDim
  • #571 [X86] Value2VDim
  • #570 [VE] WhereTrue
  • #569 [X86] WhereTrue
  • #568 [VE] PrefixSum
  • #567 [X86] PrefixSum
  • #566 [HLIR] return NAN instead of throwing error when dividing by 0
  • #563 [HLIR] Remove Unnecessary Permute
  • #562 [Runtime] do not reinit Params if the model has been run before
  • #561 [PyTorch] add Mobilenet V3 testcases
  • #560 [PyTorch] add efficientnet testcases
  • #559 [PyTorch] add ConvNext Testcases
  • #558 [PyTorch] add RegNet testcases
  • #556 [HLIR] show actual input/output shapes, not the "possible"
  • #555 [Installer] Cache Password for PIP
  • #554 [Installer] hide packages that user does not have access to
  • #553 [Runtime] Context needs to distinguish between offloading and framework handle!
  • #552 [Numpy] Implement Lazy Allocations in Numpy executor
  • #551 [HLIR] Fix Memory Consumption to report Copies not as Outputs
  • #550 [NCC/OMP] cannot explicitly instantiate std::tuple in sol parallel simd with NCC 3.3 or newer
  • #549 [Compiler/Runtime] Encode/Check framework version, device compute capability, ... in compiled NN library name
  • #548 [SQLite] Upgrade to 3.38.2
  • #546 [DNNL] Upgrade 2.6
  • #545 [TF] SOL's MaxPooling makes other choices than TF's implementation in training
  • #544 [TF] implement save/load methods in Keras model
  • #543 [TF] fix BatchNorm Assignment
  • #539 [DNNL] Illegal Operation in Resnext BWD Pass
  • #538 [Python] Don't store size of Tensor in Python but always request from C++ space
  • #522 [DNNL] Performance Problem in ConvBwdData Param Reorder
  • #514 [PyTorch] Upgrade 1.11.0
  • #512 [HLIR] Copy Buffers during Training, not LayerOutputs
  • #507 [TF] Which momentum does TF use in BatchNorm?
  • #492 [TF/VE] Check if Optimizers and Loss functions are implemented
  • #438 [TF] Update 2.8.0
  • #429 [ONNX] Add Handler option
  • #425 [HLIR] Remove unnecessary Copies in RNN -> Copy -> Output i.e. for Workspace
  • #276 [ONNX] set that params require grad, so we can train ONNX models
  • #201 [DFP] Double check if we correctly report the scratchpad memory
  • #186 [ONNX] use sol.internal.Tensor operators instead of explicit calls
  • #166 [PyTorch] torch.nn.InstanceNorm2d missing
  • #165 [PyTorch] torch.nn.GroupNorm missing
  • #113 [ONNX] PRelu
  • #112 [ONNX] Gather
  • #108 [ONNX] ERROR getsym_handler
  • #74 [DFP] Performance: implement Input Loop Merging
v0.5.0rc1 (23.03.2022, Docs)

SOL v0.5.0 Diadem is the next major release of SOL, already containing over 160 closed issues! We have also switched to a rolling-release model using release candidates, so that fixes and changes can be shipped more often.

This first release candidate DOES NOT contain support for NVIDIA devices, ONNX, or deployment! These features will be re-enabled in later release candidates.

Highlights

Breaking Changes

  • We modified the sol.optimize(model, args, kwargs={}, framework=None, **fwargs) call. If your model takes more than one input, you now need to pass the inputs through the args argument as a list or tuple, or through kwargs as a dictionary, as the sketch after this list shows. This change was necessary to be more compliant with the AI frameworks' own APIs.
  • sol.optimize(..., batchsize=...) has been removed in favor of the new variable dimensions system. Please see the variable dimensions documentation for more details.
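
The following is a minimal sketch of the new calling convention; the toy module and tensor shapes are illustrative, and the commented-out old call follows the pre-v0.5 sol.optimize(model, *args) form:

    import torch
    import sol  # SOL's Python package; import name assumed

    class TwoInputs(torch.nn.Module):
        # Toy module taking two inputs, to show the new args handling.
        def __init__(self):
            super().__init__()
            self.fc = torch.nn.Linear(8, 4)

        def forward(self, x, y):
            return self.fc(x) + y

    model = TwoInputs()
    x, y = torch.rand(32, 8), torch.rand(32, 4)

    # pre-v0.5:  sol.optimize(model, x, y, batchsize=32)   # no longer valid
    # v0.5: multiple inputs go into `args` as a list or tuple ...
    opt = sol.optimize(model, [x, y])
    # ... or into `kwargs` as a dictionary:
    # opt = sol.optimize(model, kwargs={"x": x, "y": y})

    out = opt(x, y)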

Closed Issues

  • #538 [Python] Don't store size of Tensor in Python but always request from C++ space
  • #537 [OpenSSL] Upgrade to 1.1.1n
  • #535 [PyTorch] create OMP Symlink in sol-framework-pytorch-x86
  • #534 [DNNL] Upgrade 2.5.3
  • #533 [SQLITE] Upgrade 3.38.0
  • #531 [Core] Report Peak Memory in Output during Compilation Phase for each Pass
  • #530 [Python] increase min Python Version from 3.6 to 3.7 because of TF 2.8.0 requirement
  • #529 [PyTorch] Upgrade v1.10.2
  • #524 [PyTorch] Check why we get weird JIT accuracy errors in HuggingFace Bert training
  • #523 [Python] deprecated "copy_parameters" and just do it always.
  • #521 [DNNL] implement memory layout AutoTuning for Conv
  • #520 [CMake] Enable Release Candidates
  • #519 [VE] If NC++ is not found, we need to disable VE entirely, not just not setting the VE_LD_LIBRARY_PATH
  • #518 [VEDA] upgrade 1.2.0
  • #517 [Runtime] segfault when an inference Context is still open and get cleaned up during shutdown
  • #516 [PyTorch] Find a solution for conflicting GOMP implementation
  • #515 [OMP] Limit number of threads to match number of hardware cores, not number of hardware threads
  • #511 [Runtime] Store raw pointer in sol_tensor
  • #510 [DFP] Unroll Interpolation using TmpVars
  • #509 [Runtime] Evaluate if TBB is better than OMP
  • #508 [Runtime] Remove sol::runtime::Tensor
  • #505 [DFP] Remove Rand from DFP and implement in ISPC/CUDA and VEASL
  • #500 [DFP] input loop merge
  • #499 [Python API] Change sol.optimize(*args, **kwargs) to use one variable
  • #496 [DFP] Ensure that in ISPC each core uses its own random state
  • #494 [DFP] Remove GLoops
  • #493 [DFP] Don't allocate stack memory in CORES loops.
  • #491 [HLIR] Fix weird TorchVision structure of DenseNet
  • #490 [TF] ValueError: Data cardinality is ambiguous: x sizes: 1, 100, 100
  • #489 [TF] OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
  • #488 [HLIR] Add verification for DTypes
  • #487 [PyTorch] Testcase "MemberTensors" fails in Training
  • #486 [TF] can we pass the sol_ctx as attribute instead of scalar tensor?
  • #485 [TF] Enable models with more than 255 model parameter tensors
  • #483 [Core] Buffer Chaining fails if there is a view in between the concat operations
  • #480 [PyTorch] add strided slice support
  • #479 [SQLite] Upgrade to 3.37.2
  • #478 [TF] add missing StridedSlice features
  • #477 [ISPC] Upgrade to 1.17.0
  • #475 [DNNL] Upgrade to v2.5.2
  • #471 [X86] correctly detect vector instructions
  • #470 [TF] tf.math.cumsum
  • #469 [ISPC] migrate dnn_ispc to ispc, as it does not use any dnn components
  • #468 [Core] Verify that all layers have properly been assigned an algorithm before generating code
  • #466 [PyTorch] add torch.cumsum
  • #465 [TF] add tf.Where
  • #464 [Core] remove NUMEL through transform if VDIMS get removed
  • #463 [Core] add input to NUMEL
  • #462 [TF] backport new sol.hlir.reduce API
  • #461 [Core] Revise sol_tensor using real shape and separate numel field
  • #460 [DFP] BackendRenderer missing DTYPE
  • #459 [SQLite] Upgrade to 3.37.1
  • #458 [AVEO] Upgrade to 2.10.0
  • #457 [PyTorch] Support multivalued constants
  • #456 [HLIR] Bugfix GEMM batchHelper function
  • #455 [VDIMS] PyTorch AvgPool2D#3 with two different shapes after each other fail because of identical Hash
  • #454 [Pytorch] Upgrade 1.10.1
  • #453 [Docs] Add TensorFlow Native to VE Chapter
  • #452 [DNNL] Upgrade to 2.5.1
  • #451 [Core] Remove Tensor Class
  • #450 [Docs] Correct VDim case with #2 == 5
  • #449 [Deprecated] YaalUpdate and YaalSeed
  • #448 [VDIMS] option to enable/disable VDIM usage, disable by default.
  • #447 [DFP] Fix BatchNorm BWD Pass
  • #446 [DFP] Flag Loops where DataStride() == 1 for ALL data to be used for SIMD
  • #445 [PyTorch] torch.reshape missing
  • #444 [Core] Print NN Input/Output Signature
  • #443 [PyTorch] argmin + argmax missing
  • #442 [TF] wrong results in activation layers
  • #439 [PIP] Enable manylinux2014 compatible build process
  • #437 [DNNL] Upgrade 2.4.4
  • #436 nec-sol: download.pytorch.org may also need to be trusted
  • #435 nec-sol: codec can't display license agreement
  • #434 upon program exit, malloc(): unsorted double linked list corrupted
  • #433 [TF] Enable to parse tf.Module and tf.saved_models
  • #432 [TF] Upgrade to 2.6.1
  • #431 [PyTorch] torch.rand* and torch.randint* missing
  • #430 [Pytorch] Enable Models without inputs
  • #428 [PyTorch] Add Handler option
  • #427 [TensorFlow] Add Handler option
  • #426 [HLIR] Add Custom Layer Support
  • #423 [PyTorch] Possible Segfault in Py_InfNAN on X86
  • #422 [Core] Add Option to register Operations at runtime
  • #421 [Core] Missing library libcrypto.so.10 on Ubuntu 20.04
  • #420 [PyTorch] Debug v1.10.0 print problems with native-ve
  • #419 [PyTorch] Upgrade RNN API to new HLIR version
  • #418 [CUDNN] Upgrade to 8.2.4
  • #417 [CUDA] Upgrade to 11.3
  • #416 [PyTorch] Upgrade to v1.10.0
  • #415 [CORE] use SQLite Transactions to lock SOL Cache in multiple processes
  • #414 [DNNL] Upgrade to 2.4.2
  • #413 [VE] replace sol_ve_copy with VEDAMemset and VEDAMemcopy
  • #412 [VEDA] Add device side memset
  • #411 [VE] Add ComplexDTypes to Device API
  • #410 [TBB] CMake install script fails on first run.
  • #408 [VEDA] memset 128
  • #407 [VEDA] CMake ASL FFTW
  • #401 [DNNL] Upgrade to 2.4.1
  • #399 [NCC] Add Struct Functions
  • #398 [X86] Check Handle::memset performance!
  • #397 [VEDA] improve S8, S16 vedaMemset Performance
  • #396 [TBB] Upgrade to 2021.4.0
  • #395 [DNNL] Upgrade to 2.4
  • #394 [PyTorch] torch.complex missing
  • #393 [PyTorch] torch.imag missing
  • #391 [PyTorch] sol.optimize(model, torch.Tensor) stopped working?
  • #389 [NEC-SOL] automatically detect VENV
  • #388 [PyTorch] use torch.jit.trace instead of torch.jit.script to support Transformers
  • #387 [HLIR/DFP] Change Dropout to always use input rand
  • #386 [NEC-SOL] UTF-8 encoding problem
  • #385 [DFP] Bad Loop Merging in multi-path kernels
  • #384 [DFP] Wrong vectorization in Bias Sum Case
  • #383 [YAAL] Improve Scheduling
  • #382 [PyTorch] Upgrade to 1.9.1
  • #381 [Tests] Restructure TESTS package, so the PyTorch package does not load TF and vice versa.
  • #379 [NNPACK] Deprecate
  • #376 [ISPC] Check OpenMP implementation performance for small batchsizes
  • #375 [TensorFlow] Upgrade to 2.6.0
  • #364 [DFP] Fix Tile backward pass for X86
  • #363 [PyTorch] aten::repeat missing
  • #362 [PyTorch] aten::tile missing
  • #360 [PyTorch] aten::smooth_l1_loss
  • #355 [PyTorch] HuggingFace transformers broken
  • #332 [TF] Enabled Delayed Allocation in Native-VE
  • #329 [PyTorch] Add BatchNorm num_batches_tracked
  • #321 [API] change sol_external_malloc to use shapes instead of accumulated sizes
  • #307 [HLIR] Input > Dropout > Output produce wrong results in Inference
  • #306 [TF] tf.nn.max_pool_with_argmax returns wrong indices
  • #302 [PyTorch] Use VE instead of HIP device
  • #299 [PyTorch] Slicing with [0:0] should return empty Tensor
  • #293 [HLIR] Constant with more than 1 element
  • #289 [TF] Remove sol_model.convert("device")
  • #285 [DFP] Remove Nests
  • #284 [DFP] Implement Linear
  • #271 [TensorFlow] enable lazy allocations
  • #267 [Core] Remove OptimizerType
  • #259 [IgorNet] Can't run Inference (division by zero)
  • #239 [PyTorch] add torch.nn.functional.interpolate
  • #225 [DNNL] Upgrade to 2.3.2
  • #212 [HLIR] SIGN on Unsigned is 1
  • #211 [HLIR] Abs on Unsigned is Noop
  • #208 [PyTorch] some networks can't use variable BatchSize
  • #203 [DFP] Buffered Dropout >> Narrow allocates too little memory
  • #202 [Compiler] Show Progress Bar already during Code generation Phase
  • #199 [HLIR] Detect Permutations in front of GEMM and merge them into the GEMM by changing the layout.
  • #198 [HLIR] Initialize Gradients with 0 and make all multi-connections to REQURIESACC
  • #193 [PyTorch] "Only one dimension for View can use summation." in ShuffleNet when using Variadic BatchSizes
  • #187 [DFP] BatchNorm does not support NOT to trace runtime metrics
  • #184 [PyTorch] No Gradient for inputs which get disconnected
  • #183 [PyTorch] LPPool Backward Pass results do not match
  • #182 [DFP] Split BatchNorm into two operations to better fit the actual computations and to remove the workarounds in DFP module
  • #177 [VEBLAS] Memory Estimation: RNN
  • #163 [PyTorch] torch.Tensor.repeat missing
  • #161 [PyTorch] Interpolation Layers missing
  • #157 [Autotuning] Device reports out of memory, as SOL does not use the framework's memory allocator during auto-tuning
  • #139 [Core] Enable to save/load SOL models
  • #135 [HLIR] Pytorchic BERT can't use variable batchsize because of the view?
  • #131 [DFP] segfault generating RReLU layer
  • #120 [DType] Add Complex DTypes
  • #117 [PyTorch] FFT
  • #81 [Variable BatchSize] GPT-2 can't use variable batchsize
  • #80 [Clusters] Erroneous accumulation in GPT-2 BWD Pass
  • #71 [DFP] missing layer: MaxUnPooling
  • #70 [DFP] LeakyRelu: Remove IF in generated code
  • #35 [X86] Does not show correct used memory