v0.5 Diadem


v0.5.1
Docs

17.10.2022

Highlights

General
- The `sol.check_version()` command checks whether a new version is available.
- Form to apply for SOL4VE closed beta.
- Improved command-line based installer.
PyTorch
- Calls to `sol.optimize(...)` from PyTorch no longer require example inputs. Instead, the model is parsed the first time it is executed.
- Limited support for `torch.einsum(...)`.
- Support for GANs through the MaxUnPooling and TransposedConv layers.
- Automatic detection of when to use `torch.jit.script` and when to fall back to `torch.jit.trace`.
- Improved handling of inline operations.
TensorFlow
- RNN support
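
The PyTorch highlights on deferred parsing and automatic parser selection can be pictured with a small, framework-free sketch. This is not SOL's implementation: `LazyOptimized`, `parse_script` and `parse_trace` are hypothetical names used only for illustration, and the two parser functions are plain Python stand-ins for `torch.jit.script` and `torch.jit.trace`. The sketch only shows the control flow: nothing is parsed until the first call, the inputs of that call serve as example inputs, and a failure of the script-style parser triggers the trace-style fallback.

```python
from typing import Any, Callable, Optional


class LazyOptimized:
    """Defers parsing of a model until its first call (illustrative only)."""

    def __init__(self, model, parse_script, parse_trace) -> None:
        self._model = model
        self._parse_script = parse_script  # stand-in for a script-style parser
        self._parse_trace = parse_trace    # stand-in for a trace-style parser
        self._compiled: Optional[Callable[..., Any]] = None
        self.parser_used: Optional[str] = None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._compiled is None:
            # First execution: the real inputs act as the example inputs.
            try:
                self._compiled = self._parse_script(self._model)
                self.parser_used = "script"
            except Exception:
                # Script-style parsing failed, so fall back to tracing.
                self._compiled = self._parse_trace(self._model, args, kwargs)
                self.parser_used = "trace"
        return self._compiled(*args, **kwargs)


def failing_script_parser(model):
    # Hypothetical parser that rejects the model.
    raise RuntimeError("model uses constructs the script parser cannot handle")


def trace_parser(model, args, kwargs):
    # A real tracer would record the operations executed for these inputs.
    return model


double = LazyOptimized(lambda x: 2 * x, failing_script_parser, trace_parser)
result = double(21)  # parsing happens here, on the first call
```

After the first call, `double.parser_used` records which path was taken, and later calls reuse the already parsed model.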

Closed Issues

#775 [OpenSSL] Upgrade to 1.1.1r
#774 [PyTorch] v1.12.1 breaks VE complex number support
#772 [PyTorch] SqueezeNet training accuracy problems on VE
#771 [DFP] Investigate Unvectorizable DType in SqueezeNet Training on VE
#765 [PyTorch] automatic fallback to jit.trace if model parsing fails with jit.script
#764 [Docs] add Pytorch lazy optimizer documentation
#763 [PyTorch] enable kwargs for trace=True cases
#762 [PyTorch] add lazy optimizer
#761 [PyTorch] add torch.triu
#760 [Docs] Update supported layers
#759 [Core] Disable Progress bar if not running in interactive shell
#758 [Hugo] Upgrade to v0.103.0
#756 [VE] find alternative for "constexpr" in RNN implementation
#754 [DFP] Implement Interpolate autoSqueeze
#753 [Config] Remove unused config options
#752 [Docs] Update Docs for v0.5.1
#751 [NEC-SOL] Add "test access" option to nec-sol
#750 [NEC-SOL] Device Support Packages that users don't have access to prevent to install the package in the first place.
#748 [DNNL] Upgrade 2.6.2
#747 [NEC-SOL] add --verbose flag and report access to urls.
#745 [TF+RNN] verify RNNSimple results
#743 Add TF v2.10.0 Support
#739 [Hugo] upgrade v0.102.1
#735 [Boost] Upgrade to 1.80.0
#733 [OMP] Add Heuristics to sol_parallel_for and sol_parallel_simd
#732 [HLIR] add option to remove unused model parameters
#731 [HLIR] Make RNN sequence length variable
#730 [Keras] add tf.ensure_shape to KerasLayer
#729 [Keras] Support named inputs
#728 [HLIR] Repair Where 2 min/max transformation
#726 [DNN/RNN] evaluate not using BLAS
#725 [TF] Solve Threadblocking Issue on X86
#723 [HLIR] derive(Slice) == Slice, if it's a reverse-slice
#720 [TF2RNN] Handle cases where only OH is used without slicing
#719 [RNN] Move sol.hlir.rnn API to C++ space
#718 [DNN/RNN] Improve Handling of O and OH
#717 [RNN] Masking Support
#716 [RNN] Recurrent Dropout Support
#714 [HLIR] remove dropout layers from inference executions
#713 [TF] fix keras (alpha_)dropouts in inference mode
#712 [TF] enable to change input shapes using sol.optimize(..., shapes={...}, ...)
#711 [DNNL] Upgrade 2.6.1
#705 [OpenSSL] Upgrade 1.1.1q
#704 [NVIDIA] JIT compile 64 and 128 Bit memset functions
#703 [Numpy] Numpy Runtime can cause SegFault when a NDArray gets freed when SOL already destroyed the handlers during shutdown
#702 [ONNX] add MaxUnPool
#701 [PyTorch] add MaxUnPooling
#700 [HLIR] check if we MaxUnPooling always uses "max" instead of "+=" in fwd Pass
#699 [HLIR] Add DeSampling optimization to DeConv
#698 [Core] Fix Conv::Transform::Subsampling when applied in Bwd Pass
#697 [PyTorch] accuracy problem in RNN with S>1 in v1.12.0
#695 [PyTorch] add aten::pad
#694 [PyTorch] add aten::index
#688 [PyTorch] Upgrade to 1.12.0
#687 [PyTorch] Add Swin Transformer Tests
#686 [NEC-SOL] Pure command line mode?
#685 [DFP] fix unvectorized read loops
#684 [DFP-NCC] remove struct operator and add restricted keyword.
#683 [NCC] Improve DFP unrolling
#682 [PyTorch] Add PyTorch Lightning Support
#673 [OpenSSL] Upgrade 1.1.1p
#672 [JIT-NCC] In debug, warn about obstructive functions, and other vectorization problems.
#671 [VE] Add _Pragma("_NEC always_inline") into DFP-NCC headers
#670 [API] Add linker script
#669 [JIT] Add linker script
#668 [API] Replace all remaining extern "C" with SOL_API
#665 [CUDNN] Get Bundle from NVIDIA
#664 [MKL] Switch to PIP MKL Package instead of Bundling
#663 [Runtime] Add Mutex to runtime::device::Network to prevent parallel executions as in TF.
#662 [TF/X86] Warn user if tf.config.threading.set_inter_op_parallelism_threads is not set to 1, that it has negative impact on SOL performance
#661 [Python] Replace any remaining 'print' with 'tungl.info'
#660 [Web] Add SOL4VE Registration Form
#658 [PyTorch+TF] Use unified implementation of framework::Handle for all devices
#657 [TF] Add option to enable grad on inputs
#654 [OpenSSL] Upgrade 1.1.1o
#653 [Hugo] Upgrade 0.100.2
#652 [Installer] Upgrade fails with 'set' object has no attribute 'append'
#649 [TF] Unable to fetch values for...
#647 [Core] Lock DB when executing cache::clear
#646 [Docs] Update TF SDK docs
#645 [HLIR] Transformation to remove duplicate Permutes is not working correctly.
#644 [TF] RNN Seq2Seq testcase exposes #3 as vdim
#643 [Hugo] Upgrade v0.100.1
#642 [TF] GRU
#641 [TF] LSTM
#640 [TF] SimpleRNN
#638 [TF] Evaluate unified module instead of compiling new modules for each network
#636 [Profiler] Rework API
#635 [Installer] Not Showing PyTorch/TensorFlow properly
#634 [NCC] investigate why OpenMP is not working with -std=c++17
#633 [MKL] RNN
#631 [TF] Upgrade 2.9.1
#630 [Deployment] change binary2obj to objcopy
#629 [PyTorch] Sum of BOOL needs to be casted into INT DType
#626 [Python] Evaluate if using CFFI is less verbose compared to CTypes
#625 [PyTorch] Evaluate if replacing sol.runtime.set_tensor with a CPP function yields in less overhead.
#621 [HLIR] ZeroCopy Layer, that tries to not duplicate outputs in frameworks.
#609 [ONNX] GroupNorm: illegal view transformation
#608 [TF] Add tf.keras.applications testcases
#607 [TF] Upgrade 2.8.2, 2.7.4, 2.6.5
#601 [VEDA] VEDA_ERROR_UNKNOWN_CONTEXT thrown when calling vedaDevicePrimaryCtxRetain
#600 [PyTorch] Missing Primitive: aten::bernoulli
#597 [Core] Wrong Rendering of Output in Inception
#593 [X86+VE] IRFFT2D accuracy issues
#582 [Core] Show Update Message
#547 [PyTorch] Remove RNN Fix for when PyTorch v1.12.0 is released
#541 [DFP] improve size-1 write loop removal
#540 [DFP] Split BatchSize == 1 outer loops if there are multiple inner
#498 [Python API] Add option to override requires_grad
#283 [TensorFlow] automatically unload network, when all instances of the network have been destroyed
#214 [VEBLAS] Better parallelize RNN for small BatchSizes
#209 [PyTorch] automatically unload network, when all instances of the network have been destroyed
#123 [Layers] ConvTranspose/Deconvolution

v0.5.0rc2
Docs

23.05.2022

This is the second release candidate of SOL v0.5.0 Diadem. It brings back support for ONNX and Numpy (runtime), and adds many bug fixes and improvements.

Highlights

- ONNX frontend
- Numpy runtime
- Printing StackTrace when exception occurs in SOL
- Improved printing of model signature
- The installer no longer shows packages you don't have access to

Closed Issues

#624 [FFT] duplicate symbol "vdims"
#623 [TF] tf.nn.max_pool_with_argmax indices differ again...
#619 [Hugo] Upgrade 0.99.0
#617 [ONNX] "Unable to fetch values for " in ShuffleNet
#616 [ONNX] "can't 384 / 13 because it's no divider" in X86 Igornet and Layers[Repeat]
#615 [ONNX] "Unable to find dimension PO1" in X86 MNasNet and EfficientNet
#614 [ONNX] Cumsum accuracy issues
#613 [Core] Improve calculation of memory consumption estimation
#612 [TF] TF casts shape of [1,1,1,1] to [] during training
#611 [NCC] Segfault in networks using VDims
#610 [ONNX] CumSum
#606 [ISPC] Wrong initialization of VDims
#605 [DB] Verify that we can open multiple instances of SOL using the same DB file
#604 [X86+FFT] Accuracy Error in PY_IRFFT2D
#603 [X86+PyTorch] Memory is already allocated
#602 [X86+FFT] [DNNL ERROR] The operation failed because of incorrect function arguments.
#599 [PyTorch] Missing Primitive "aten::empty"
#598 [HLIR] VDims cause promotion to F64 in BatchNorm
#595 [VE] RNN Segfault in Testcases
#594 [Tests] Report % of values that exceed threshold
#592 [PyTorch] Models without parameter can create training context if 'model.training = False' is not set
#591 [Docs] Update documentation for v0.5.0rc2
#590 [Core] "axis needs to be between 0 and 0 but found 1" in DeReduce
#589 [TF] Can't use Keras models with scalar parameters
#587 [TF] Ordering of Parameters in KerasLayer is not stable
#586 [VE] Investigate possible Memleak in PyTorch-VE Runtime
#584 [PyTorch] Add Error Handling when User provides too many/too few arguments to function
#583 [JsonCPP] Upgrade to 1.9.5
#581 [ISPC] Upgrade to 1.18.0
#580 [CMake] Change min GCC to 10.X
#579 [Hugo] upgrade 0.98.0
#578 [PyTorch] RNNCell parsing fails in PyTorch 1.11.0
#577 [PyTorch] "illegal view transformation" when parsing RNN
#576 [DB] automatically recover from "database image malformed" exceptions
#575 [VE] RNN does not compile with NCC 3.4.2
#574 [Runtime] SegFault in Context::Destroy when starting another training context.
#573 [Core] Print StackTrace when SOL Exceptions are thrown
#572 [VE] Value2VDim
#571 [X86] Value2VDim
#570 [VE] WhereTrue
#569 [X86] WhereTrue
#568 [VE] PrefixSum
#567 [X86] PrefixSum
#566 [HLIR] return NAN instead of throwing error when dividing by 0
#563 [HLIR] Remove Unnecessary Permute
#562 [Runtime] do not reinit Params if the model has been run before
#561 [PyTorch] add Mobilenet V3 testcases
#560 [PyTorch] add efficientnet testcases
#559 [PyTorch] add ConvNext Testcases
#558 [PyTorch] add RegNet testcases
#556 [HLIR] show actual input/output shapes, not the "possible"
#555 [Installer] Cache Password for PIP
#554 [Installer] hide packages that user does not have access to
#553 [Runtime] Context needs to distinguish between offloading and framework handle!
#552 [Numpy] Implement Lazy Allocations in Numpy executor
#551 [HLIR] Fix Memory Consumption to report Copies not as Outputs
#550 [NCC/OMP] cannot explicitly instantiate std::tuple in sol parallel simd with NCC 3.3 or newer
#549 [Compiler/Runtime] Encode/Check framework version, device compute capability, ... in compiled NN library name
#548 [SQLite] Upgrade to 3.38.2
#546 [DNNL] Upgrade 2.6
#545 [TF] SOL's MaxPooling makes other choices than TF's implementation in training
#544 [TF] implement save/load methods in Keras model
#543 [TF] fix BatchNorm Assignment
#539 [DNNL] Illegal Operation in Resnext BWD Pass
#538 [Python] Don't store size of Tensor in Python but always request from C++ space
#522 [DNNL] Performance Problem in ConvBwdData Param Reorder
#514 [PyTorch] Upgrade 1.11.0
#512 [HLIR] Copy Buffers during Training, not LayerOutputs
#507 [TF] Which momentum does TF use in BatchNorm?
#492 [TF/VE] Check if Optimizers and Loss functions are implemented
#438 [TF] Update 2.8.0
#429 [ONNX] Add Handler option
#425 [HLIR] Remove unnecessary Copies in RNN -> Copy -> Output i.e. for Workspace
#276 [ONNX] set that params require grad, so we can train ONNX models
#201 [DFP] Double check if we correctly report the scratchpad memory
#186 [ONNX] use sol.internal.Tensor operators instead of explicit calls
#166 [PyTorch] torch.nn.InstanceNorm2d missing
#165 [PyTorch] torch.nn.GroupNorm missing
#113 [ONNX] PRelu
#112 [ONNX] Gather
#108 [ONNX] ERROR getsym_handler
#74 [DFP] Performance: implement Input Loop Merging

v0.5.0rc1 Docs	23.03.2022

SOL v0.5.0 Diadem is the next major release of SOL and already contains over 160 closed issues! We have also switched to a rolling release model based on release candidates, so that fixes and changes can be pushed out more often.

This first release candidate DOES NOT contain support for NVIDIA devices, ONNX, or deployment! These features will be re-enabled in later release candidates.

Highlights

New Variable Dimensions system
PyTorch FFT Support for X86 and VE
Fixed parsing for Huggingface Transformers in PyTorch
Outsourcing native device support for SX-Aurora into open source projects

Breaking Changes

We modified the sol.optimize(model, args, kwargs={}, framework=None, **fwargs) call. If you have more than one input, you now need to pass them via the args argument as a list or tuple, or via the kwargs argument as a dictionary. This change was necessary to align the call more closely with the conventions of the AI frameworks.
sol.optimize(..., batchsize=...) has been removed in favor of the new variable dimensions system. Please see the variable dimensions documentation for more details.
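
The new calling convention can be illustrated with a minimal sketch. The function `optimize_stub` below is a hypothetical stand-in that only mirrors the signature quoted above. It is not SOL's implementation and does not import SOL. The wrapping of a single non-sequence input into a tuple is an assumption made for this illustration, as are the names `toy_model`, `x`, `y` and `scale`.

```python
def optimize_stub(model, args=(), kwargs=None, framework=None, **fwargs):
    """Hypothetical stand-in mirroring the documented sol.optimize signature.

    It only normalizes the example inputs and returns them; the real call
    would compile the model instead.
    """
    # Assumption: a single non-sequence input is wrapped into a 1-tuple.
    if not isinstance(args, (list, tuple)):
        args = (args,)
    # Copy kwargs so the caller's dictionary is never modified.
    kwargs = dict(kwargs or {})
    return model, tuple(args), kwargs


def toy_model(x, y, scale=1):
    # Placeholder for a framework model that takes two inputs.
    return (x + y) * scale


# Multiple positional inputs are passed as ONE list or tuple via `args` ...
_, pos_args, pos_kwargs = optimize_stub(toy_model, [1, 2])

# ... or as named inputs via the `kwargs` dictionary.
_, kw_args, kw_kwargs = optimize_stub(toy_model, kwargs={"x": 1, "y": 2})

# Both forms describe the same inputs to the model.
result_positional = toy_model(*pos_args, **pos_kwargs)
result_named = toy_model(*kw_args, **kw_kwargs)
print(result_positional, result_named)
```

The point of the sketch is that all inputs travel in a single `args` or `kwargs` container instead of being spread over the call as separate positional arguments, which leaves the remaining keyword arguments free for framework-specific options.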

Closed Issues

#538 [Python] Don't store size of Tensor in Python but always request from C++ space
#537 [OpenSSL] Upgrade to 1.1.1n
#535 [PyTorch] create OMP Symlink in sol-framework-pytorch-x86
#534 [DNNL] Upgrade 2.5.3
#533 [SQLite] Upgrade 3.38.0
#531 [Core] Report Peak Memory in Output during Compilation Phase for each Pass
#530 [Python] increase min Python Version from 3.6 to 3.7 because of TF 2.8.0 requirement
#529 [PyTorch] Upgrade v1.10.2
#524 [PyTorch] Check why we get weird JIT accuracy errors in HuggingFace Bert training
#523 [Python] deprecated "copy_parameters" and just do it always.
#521 [DNNL] implement memory layout AutoTuning for Conv
#520 [CMake] Enable Release Candidates
#519 [VE] If NC++ is not found, we need to disable VE entirely, not just not setting the VE_LD_LIBRARY_PATH
#518 [VEDA] upgrade 1.2.0
#517 [Runtime] segfault when an inference Context is still open and get cleaned up during shutdown
#516 [PyTorch] Find a solution for conflicting GOMP implementation
#515 [OMP] Limit number of threads to match number of hardware cores, not number of hardware threads
#511 [Runtime] Store raw pointer in sol_tensor
#510 [DFP] Unroll Interpolation using TmpVars
#509 [Runtime] Evaluate if TBB is better than OMP
#508 [Runtime] Remove sol::runtime::Tensor
#505 [DFP] Remove Rand from DFP and implement in ISPC/CUDA and VEASL
#500 [DFP] input loop merge
#499 [Python API] Change sol.optimize(*args, **kwargs) to use one variable
#496 [DFP] Ensure that in ISPC each core uses its own random state
#494 [DFP] Remove GLoops
#493 [DFP] Don't allocate stack memory in CORES loops.
#491 [HLIR] Fix weird TorchVision structure of DenseNet
#490 [TF] ValueError: Data cardinality is ambiguous: x sizes: 1, 100, 100
#489 [TF] OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
#488 [HLIR] Add verification for DTypes
#487 [PyTorch] Testcase "MemberTensors" fails in Training
#486 [TF] can we pass the sol_ctx as attribute instead of scalar tensor?
#485 [TF] Enable models with more than 255 model parameter tensors
#483 [Core] Buffer Chaining fails if there is a view in between the concat operations
#480 [PyTorch] add strided slice support
#479 [SQLite] Upgrade to 3.37.2
#478 [TF] add missing StridedSlice features
#477 [ISPC] Upgrade to 1.17.0
#475 [DNNL] Upgrade to v2.5.2
#471 [X86] correctly detect vector instructions
#470 [TF] tf.math.cumsum
#469 [ISPC] migrate dnn_ispc to ispc, as it does not use any dnn components
#468 [Core] Verify that all layers have properly been assigned an algorithm before generating code
#466 [PyTorch] add torch.cumsum
#465 [TF] add tf.Where
#464 [Core] remove NUMEL through transform if VDIMS get removed
#463 [Core] add input to NUMEL
#462 [TF] backport new sol.hlir.reduce API
#461 [Core] Revise sol_tensor using real shape and separate numel field
#460 [DFP] BackendRenderer missing DTYPE
#459 [SQLite] Upgrade to 3.37.1
#458 [AVEO] Upgrade to 2.10.0
#457 [PyTorch] Support multivalued constants
#456 [HLIR] Bugfix GEMM batchHelper function
#455 [VDIMS] PyTorch AvgPool2D#3 with two different shapes after each other fail because of identical Hash
#454 [PyTorch] Upgrade 1.10.1
#453 [Docs] Add TensorFlow Native to VE Chapter
#452 [DNNL] Upgrade to 2.5.1
#451 [Core] Remove Tensor Class
#450 [Docs] Correct VDim case with #2 == 5
#449 [Deprecated] YaalUpdate and YaalSeed
#448 [VDIMS] option to enable/disable VDIM usage, disable by default.
#447 [DFP] Fix BatchNorm BWD Pass
#446 [DFP] Flag Loops where DataStride() == 1 for ALL data to be used for SIMD
#445 [PyTorch] torch.reshape missing
#444 [Core] Print NN Input/Output Signature
#443 [PyTorch] argmin + argmax missing
#442 [TF] wrong results in activation layers
#439 [PIP] Enable manylinux2014 compatible build process
#437 [DNNL] Upgrade 2.4.4
#436 nec-sol: download.pytorch.org may also need to be trusted
#435 nec-sol: codec can't display license agreement
#434 upon program exit, malloc(): unsorted double linked list corrupted
#433 [TF] Enable to parse tf.Module and tf.saved_models
#432 [TF] Upgrade to 2.6.1
#431 [PyTorch] torch.rand* and torch.randint* missing
#430 [PyTorch] Enable Models without inputs
#428 [PyTorch] Add Handler option
#427 [TensorFlow] Add Handler option
#426 [HLIR] Add Custom Layer Support
#423 [PyTorch] Possible Segfault in Py_InfNAN on X86
#422 [Core] Add Option to register Operations at runtime
#421 [Core] Missing library libcrypto.so.10 on Ubuntu 20.04
#420 [PyTorch] Debug v1.10.0 print problems with native-ve
#419 [PyTorch] Upgrade RNN API to new HLIR version
#418 [CUDNN] Upgrade to 8.2.4
#417 [CUDA] Upgrade to 11.3
#416 [PyTorch] Upgrade to v1.10.0
#415 [Core] use SQLite Transactions to lock SOL Cache in multiple processes
#414 [DNNL] Upgrade to 2.4.2
#413 [VE] replace sol_ve_copy with VEDAMemset and VEDAMemcopy
#412 [VEDA] Add device side memset
#411 [VE] Add ComplexDTypes to Device API
#410 [TBB] CMake install script fails on first run.
#408 [VEDA] memset 128
#407 [VEDA] CMake ASL FFTW
#401 [DNNL] Upgrade to 2.4.1
#399 [NCC] Add Struct Functions
#398 [X86] Check Handle::memset performance!
#397 [VEDA] improve S8, S16 vedaMemset Performance
#396 [TBB] Upgrade to 2021.4.0
#395 [DNNL] Upgrade to 2.4
#394 [PyTorch] torch.complex missing
#393 [PyTorch] torch.imag missing
#391 [PyTorch] sol.optimize(model, torch.Tensor) stopped working?
#389 [NEC-SOL] automatically detect VENV
#388 [PyTorch] use torch.jit.trace instead of torch.jit.script to support Transformers
#387 [HLIR/DFP] Change Dropout to always use input rand
#386 [NEC-SOL] UTF-8 encoding problem
#385 [DFP] Bad Loop Merging in multi-path kernels
#384 [DFP] Wrong vectorization in Bias Sum Case
#383 [YAAL] Improve Scheduling
#382 [PyTorch] Upgrade to 1.9.1
#381 [Tests] Restructure TESTS package, so the PyTorch package does not load TF and vice versa.
#379 [NNPACK] Deprecate
#376 [ISPC] Check OpenMP implementation performance for small batchsizes
#375 [TensorFlow] Upgrade to 2.6.0
#364 [DFP] Fix Tile backward pass for X86
#363 [PyTorch] aten::repeat missing
#362 [PyTorch] aten::tile missing
#360 [PyTorch] aten::smooth_l1_loss
#355 [PyTorch] HuggingFace transformers broken
#332 [TF] Enabled Delayed Allocation in Native-VE
#329 [PyTorch] Add BatchNorm num_batches_tracked
#321 [API] change sol_external_malloc to use shapes instead of accumulated sizes
#307 [HLIR] Input > Dropout > Output produce wrong results in Inference
#306 [TF] tf.nn.max_pool_with_argmax returns wrong indices
#302 [PyTorch] Use VE instead of HIP device
#299 [PyTorch] Slicing with [0:0] should return empty Tensor
#293 [HLIR] Constant with more than 1 element
#289 [TF] Remove sol_model.convert("device")
#285 [DFP] Remove Nests
#284 [DFP] Implement Linear
#271 [TensorFlow] enable lazy allocations
#267 [Core] Remove OptimizerType
#259 [IgorNet] Can't run Inference (division by zero)
#239 [PyTorch] add torch.nn.functional.interpolate
#225 [DNNL] Upgrade to 2.3.2
#212 [HLIR] SIGN on Unsigned is 1
#211 [HLIR] Abs on Unsigned is Noop
#208 [PyTorch] some networks can't use variable BatchSize
#203 [DFP] Buffered Dropout >> Narrow allocates too little memory
#202 [Compiler] Show Progress Bar already during Code generation Phase
#199 [HLIR] Detect Permutations in front of GEMM and merge them into the GEMM by changing the layout.
#198 [HLIR] Initialize Gradients with 0 and make all multi-connections to REQURIESACC
#193 [PyTorch] "Only one dimension for View can use summation." in ShuffleNet when using Variadic BatchSizes
#187 [DFP] BatchNorm does not support NOT to trace runtime metrics
#184 [PyTorch] No Gradient for inputs which get disconnected
#183 [PyTorch] LPPool Backward Pass results do not match
#182 [DFP] Split BatchNorm into two operations to better fit the actual computations and to remove the workarounds in DFP module
#177 [VEBLAS] Memory Estimation: RNN
#163 [PyTorch] torch.Tensor.repeat missing
#161 [PyTorch] Interpolation Layers missing
#157 [Autotuning] Device reports out of memory, as SOL does not use the framework's memory allocator during auto-tuning
#139 [Core] Enable to save/load SOL models
#135 [HLIR] Pytorchic BERT can't use variable batchsize because of the view?
#131 [DFP] segfault generating RReLU layer
#120 [DType] Add Complex DTypes
#117 [PyTorch] FFT
#81 [Variable BatchSize] GPT-2 can't use variable batchsize
#80 [Clusters] Erroneous accumulation in GPT-2 BWD Pass
#71 [DFP] missing layer: MaxUnPooling
#70 [DFP] LeakyRelu: Remove IF in generated code
#35 [X86] Does not show correct used memory