v0.5 Diadem

Version/Date    Changes
28.02.2023
v0.5.2
Docs

Highlights

  • Deprecated DL4J and Unikraft support.
  • Significantly improved compatibility of SOL integration into PyTorch and TensorFlow.
  • Experimental Custom Layer support for PyTorch and TensorFlow.
  • Lots of internal bugfixes and improvements, e.g., improved code generation of loop indices to reduce recomputation within compute kernels.
  • Added more compiler-specific env vars.
  • Added the SOL_CWD env var, which lets users change the directory that SOL uses as its working directory.
  • SOL now implements the same random number generators as PyTorch and TensorFlow.
  • New compiler::deterministic config option to trade accuracy of the model for more performance. See configs.
  • Preliminary support for TensorBoard profiler. Set SOL_PROFILE=TENSORBOARD:FILENAME. Results will be stored in FILENAME.
  • You can now use torch.compile(model, backend='sol') as an alternative to sol.optimize(...) in PyTorch > 2.0! See here for more details.
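The environment variables and the torch.compile entry point from the highlights above can be combined roughly as follows. This is a minimal sketch, not an official recipe: the helper names `configure_sol_env` and `compile_with_sol`, the example paths, and the assumptions that SOL reads the variables when it is imported and that importing `sol` registers the backend are all illustrative. Only `SOL_CWD`, `SOL_PROFILE=TENSORBOARD:FILENAME` and `torch.compile(model, backend='sol')` come from the notes.

```python
import os


def configure_sol_env(workdir, profile_file=None):
    """Set the SOL environment variables named in the release notes.

    Assumption: SOL reads these variables when it is imported,
    so this helper should run before `import sol`.
    """
    os.environ["SOL_CWD"] = os.fspath(workdir)
    if profile_file is not None:
        # Format taken from the notes: SOL_PROFILE=TENSORBOARD:FILENAME
        os.environ["SOL_PROFILE"] = f"TENSORBOARD:{profile_file}"


def compile_with_sol(model):
    """Compile a PyTorch model through the 'sol' torch.compile backend.

    The imports are local so this file still loads on machines
    that have neither PyTorch nor SOL installed.
    """
    import torch
    import sol  # noqa: F401  (assumption: importing SOL registers the backend)

    return torch.compile(model, backend="sol")


# Hypothetical paths, chosen only for illustration.
configure_sol_env("/tmp/sol-work", profile_file="sol_profile.out")
print(os.environ["SOL_CWD"], os.environ["SOL_PROFILE"])
```

With SOL and a suitable PyTorch version installed, `compile_with_sol(model)` would then stand in for a call to `sol.optimize(...)`.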

Known Issues

  • Since TensorFlow v2.10.0, issue #57095 causes problems within the PluggableDeviceAPI: tensors that need to be placed in host memory randomly appear on the executing device. The problem got worse in v2.12.0 and can cause random segfaults when running TensorFlow workloads on NEC SX-Aurora. This is a problem within TensorFlow that we cannot fix. Unfortunately, it doesn’t seem that the TensorFlow team is going to fix it any time soon, although other vendors face the same problem. If you encounter random segfaults using TF + VE, please downgrade to TensorFlow v2.9.0.

  • torch.dropout(...) and torch.bernoulli(scalar) return different random numbers than SOL. This is caused by PyTorch issue #94388: PyTorch uses a different random number generator for Bernoulli/Dropout. In PyTorch v1.* this even causes different random numbers on Intel and AMD CPUs. So far, SOL only produces identical random numbers for torch.rand(...) and torch.bernoulli(Tensor).
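For the TensorFlow issue above, a script can warn early when the installed version falls into the affected range. The sketch below only encodes the version range stated in the notes (v2.10.0 and newer); the function names are hypothetical, and it does not detect the bug itself.

```python
def parse_version(version):
    """Turn '2.12.0' into (2, 12, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def affected_by_tf_issue_57095(tf_version):
    """True for TensorFlow v2.10.0 and newer, the range named in the notes."""
    # Compare numeric tuples: as plain strings, "2.9.0" would sort after "2.10.0".
    return parse_version(tf_version) >= (2, 10, 0)


for candidate in ("2.9.0", "2.10.0", "2.12.0"):
    print(candidate, affected_by_tf_issue_57095(candidate))
```

In a real script the argument would typically be `tf.__version__`.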

Closed Issues

  • #1034 [DFP-NCC] Error in PyTorch Narrow testcase
  • #1032 [DNNL] Upgrade 2.7.4
  • #1030 [PyTorch] Could not cast attribute 'num_batches_tracked' to type Tensor: Unable to cast to Tensor
  • #1029 [DimMapper] Assertion 'A.size() == B.size()' failed! in ShuffleNet
  • #1028 [PyTorch] aten::tensor
  • #1027 [SegFault] Investigate random SegFault
  • #1026 [DFP] ASSERT(p->groups() == 1 || p->groups() == p->inChannels()) failed
  • #1025 [HLIR] unable to find gradient in many TIMM networks
  • #1024 [HLIR] Conv: Assertion '!wdims.hasVDims()' failed!
  • #1023 [HLIR] Incompatible shapes within Arithmetic in GCResNet
  • #1022 [DNNL] The operation failed because of ...
  • #1021 [Parser] Can't initialize sol.hlir.Dim with [64.]/
  • #1020 [Python] Make calls to sol.optimize(...) recoverable when exception occurred
  • #1019 [PyTorch] Add support for "same" and "valid" in ConvXD
  • #1018 [PyTorch] add aten::Bool
  • #1017 [PyTorch] add aten::unbind
  • #1014 [RNN/GCC] Investigate performance regression for RNN with GCC 9.X
  • #1012 [TF] Fix TF executing Model
  • #1011 [Python] add check for OMP problems to `python3 -m sol`
  • #1010 [Python] `python3 -m sol` stopped working
  • #1009 [GCC] Remove GLIBCXX11ABI == 0
  • #1007 [Boost] Upgrade 1.82.0
  • #1006 [DNN/RNN] Fix NAN if Softmax + Linear is used as activation functions
  • #1005 [HLIR] remove duplicates not working in some situations
  • #1004 [HLIR] Transform GEMM with in==1 or out==1 to basic operations
  • #1003 [DNN/GCC] Move dnn::mkl::RNN* to dnn::gcc::RNN*
  • #1002 [DNN/RNN] Fix Memory Consumption Reporting
  • #1001 [RNN] Split (De)RNN into (De)RNNActivation and GEMM/(De)RNNDropout
  • #1000 [RNN] Fix #nan for large values of pow, tanh, ...
  • #998 [Sleef] CMake install libsleefgnuabi and add to Illyrian
  • #997 [MKL/RNN] Fix GCC auto-vectorization problems
  • #996 [OpenMP] Investigate Performance regression when sklearn is loaded
  • #994 [DFP] Remove "Buffer" from list of supported layers
  • #992 [WHL] Add backend-noop to nec-sol-core package
  • #989 [TF] Test v2.12.0
  • #988 [Python] private variables such as _DType__determine will be normal attributes in 3.11
  • #987 [Python] Replace `imp` with `importlib` in HLIR parser
  • #986 [SQLite] Upgrade v3.41.2
  • #983 [TF] Test v2.11.1
  • #977 [PyTorch] Upgrade v2.0.0
  • #976 [RNN] Remove unnecessary layer inputs + template params from RNN impl.
  • #975 [TF] SimpleRNN mismatch of Bias gradient with dropout != 0 and activation == ReLU
  • #973 [DNN-RNN] Report correct temporary memory allocations
  • #970 [SQLite] Upgrade v3.41.1
  • #969 [HUGO] Upgrade v0.111.3
  • #968 [RNN] Evaluate if we can move the (I @ WI) * DI part outside of the RNN cell
  • #967 [RNN] Compilation fails for Channelsize == 1
  • #966 [Python] Port sol.tests to termcolor
  • #964 [HLIR] YAAL does not allocate memory for Buffer
  • #963 [C-API] Encode requires_grad as int64_t and set specific bits, instead of wasting an entire array with that
  • #962 [HUGO] Upgrade v0.111.2
  • #961 [HLIR] Circular Cluster construction in SimpleRNN testcase
  • #960 [TF] Get RNN Dropouts correct
  • #959 [HLIR] add axis attribute to Rand
  • #957 [ISPC] Upgrade to v1.19.0
  • #956 [PyTorch] SOL's model parameters don't get updated through the optimizer
  • #953 [Keras] RNN recurrent_dropout needs to use separate Rand objects for each sequence
  • #951 [Profiler] Allow to specify output file
  • #950 [Keras] Investigate influence of MASKING onto loss computation
  • #949 [Keras] Problem handling input_mask/output_mask propagation
  • #948 [HLIR] Transform where to pure arithmetic operation
  • #947 [RNN] doublecheck recurrent_dropout implementation
  • #946 [SQLite] Upgrade to v3.41.0
  • #945 [VE] Resnet101 does not converge
  • #944 [Python] upgrade np dtypes to new standard
  • #942 [RNN] Fix return_sequences + masking training
  • #941 [DFP-ISPC] Error: Ambiguous use of overloaded function "sol_dfp_ispc_max".
  • #940 [TF] Investigate differences in bwd pass for Average and MaxPooling
  • #939 [Keras] Dropouts are not disabled in inference
  • #936 [Keras] Debug Dropout behavior in basic CNN
  • #935 [Keras] Fix automatic setting of vdims=[True]
  • #934 [GCC] Add vdims support
  • #933 [VE] Fix TensorList synchronization
  • #932 [TF] Fix performance problem when using evaluate, predict or fit
  • #931 [TF] Investigate "Optimization loop failed" warning
  • #927 [PyTorch] MaskedFill, LogicalNot: found expected boolean, found int8_t
  • #926 [HLIR] Cast::copy
  • #925 [OpenSSL] Upgrade to v1.1.1t
  • #924 [Backends] Implement MT19937 and Philox Random number generators
  • #923 [Devices] Implement deterministic RAND mode
  • #922 [PyTorch] Fix TorchScript errors in PyTorchic BERT testcase
  • #916 [HLIR] Remove duplicates illegally merges Prefixsum with different shapes
  • #915 [DFP] Reorders can prevent vectorization in DFP
  • #911 [Profiler] Add TensorBoard integration
  • #909 [PyTorch] Disable torch.bernoulli within accuracy test runs
  • #907 [HLIR] Add Constant -> Reduction optimizations
  • #906 [ISPC] investigate why some random numbers are always zero
  • #905 [PyTorch] Debug Efficientnet BatchNorm buffers in training
  • #904 [PyTorch] Debug ConvNext wrong gradients in `layer_scale`
  • #903 [TF] Implement testcases for predict and evaluate
  • #902 [PyTorch] enable PyTorch to make clone of model, if the original is PyTorch model itself
  • #901 [TF] Find workaround for TF giving identical names to RNN states in LSTM
  • #900 [CAPI] Evaluate if we still need CAPI lazy init
  • #899 [PyTorch] Debug Testcases
  • #898 [HLIR] merge duplicates where dims are identical when squeezed
  • #897 [PyTorch] list index out of range, when SOL does not use all parameters
  • #896 [PyTorch] "Missing Parameter" when SOL does not use all model parameters
  • #895 [TF] RNN inaccuracies
  • #894 [TF] LSTM.stateful = True uses identical name `lstm/Variable:0` as name for H and C
  • #893 [RNN] Fix RNN zero_output_for_mask
  • #890 [DL4J] Remove code as it is abandoned
  • #889 [Unikraft] Remove code as it is abandoned
  • #888 [DNNL] Pure Permutation Reorders can result in Segfault on X86
  • #885 [HLIR] don't Derivative::copy Reorders if their src is a Param
  • #883 [DNNL] Upgrade v2.7.3
  • #882 [RNN] Remove OM, seems we do not need it.
  • #881 [TF] analyze TF-TRT integration and check if we can do that with SOL
  • #878 [PyTorch] add requires_grad to module.cpp
  • #877 [PyTorch] Unroll Module/Sequence structure within the Renderer
  • #876 [ONNX] "SET training TO EVAL ONCE WE FIX #527"
  • #875 [DFP] Optimize placement of LookupCheck
  • #871 [DNNL] AutoTuning for inference sometimes chooses different layouts than training causing unnecessary reorders
  • #870 [DFP] Optimized Broadcast
  • #868 [DNNL] Add permute support
  • #867 [DFP] Loop Merging merges loop that should be flagged as unmergeable
  • #866 [Keras] change keras.evaluate to execute the training forward pass, not the inference pass
  • #865 [SQLITE] Upgrade v3.40.1
  • #864 [HUGO] Upgrade v0.109.0
  • #862 [PyTorch] Split handle.cpp into handle and module
  • #861 [VE] Backport VE to new runtime API
  • #860 [PyTorch] Move device check to set_tensors
  • #858 [YAAL] Move Shape and DType Checks to YAAL
  • #856 [TensorFlow] Setting KerasView.training needs to set the weight's training value
  • #855 [HLIR] Investigate to set Grads dynamically, similar to VDIMS
  • #854 [HLIR] Consider to not move weight dims from Reorder, if numel differ
  • #853 [CAPI] Disable LazyInit. No longer needed after we do lazy compilation of framework modules.
  • #852 [DFP] complete new indexing
  • #850 [PyTorch] Upgrade 1.13.1
  • #849 [Boost] Upgrade 1.81.0
  • #848 [RNN] implement Softmax Activations
  • #847 [Hugo] Upgrade 0.108.0
  • #846 Make Enums final
  • #845 Determine Absolute SOL_PATH once at startup, to prevent errors when users change CWD at runtime
  • #843 [TF] similar layer names 'lstm_8/concat:0' and 'lstm_8/transpose:0' cause 'duplicate layer' within KerasModel
  • #842 [PyTorch] Evaluate to replace Python Wrapper with torch.jit.compiled version, that uses torch.ops.sol.call instead
  • #841 [Report] Refactor reporting API to not require IDs and instead use str labels
  • #835 Do not compile all frameworks handlers at startup
  • #832 Add Custom Layer Support
  • #831 [PyTorch] integrate sol into torch.compile(...)
  • #830 [PyTorch] Add TIMM models to test suite
  • #828 [Python] add option to reset handlers
  • #821 [PyTorch] fix RNN with Hidden inputs
  • #820 [DNNL] Upgrade to 2.7.2
  • #818 [Hugo] Upgrade 1.106.0
  • #816 [SQLite] Upgrade 3.40.0
  • #813 [JIT] Add [N/NV]CPATH, CPLUS_INCLUDE, C_INCLUDE, ... to respective compilers
  • #812 [HLIR] Remove old scheduling algorithm
  • #811 [HLIR] Move Cluster::srcs, etc. to Device::initSchedule
  • #810 [TF] Enable keras.supports_masking?
  • #809 [PyTorch] Missing layer: MaskedFill
  • #807 [TF] Enable model.layers
  • #806 [Installer] Error installing veda-pytorch due to '==' instead of '~=' version matching
  • #805 [TF] Keep Keras Model name
  • #804 [TF] Can we mimic the Keras output behavior using nn.Identity layers?
  • #803 [TF] Enable Keras users to get a "view" using "get_layer(name=...)" and then do "set_weights(...)"
  • #802 [Installer] Debug passwords with "!"
  • #800 [OpenSSL] upgrade to 1.1.1s
  • #799 [HUGO] upgrade to 1.105.0
  • #796 [PyTorch] Upgrade 1.13.0
  • #793 [WHL] Device Meta Packages `nec-sol-meta-x86` that installs all dependencies for the given device.
  • #792 [WHL] Add nec-sol-omp as dependency to device-x86 and device-ve
  • #791 [WHL] Add jit-dot and jit-python to nec-sol-core requires-dist
  • #789 [DNNL] Upgrade v2.7.1
  • #788 [ISPC] Upgrade v1.18.1
  • #779 [Dependencies] Automatically install symlinks for VEDA and TUNGL to CMAKE_INSTALL_PREFIX
  • #777 [AutoTuner] Store results of previous results and reinit Algo objects directly from DB
  • #773 [DFP] Check why sometimes linear loops don't get stored in IDX
  • #744 [TF+RNN] some RNN hyperparameter constellations produce wrong gradients
  • #715 [HLIR] Transform TF RNN cells to HLIR RNN cells
  • #639 [TF] RNN stateful=True results in "Unable to fetch values for ..."
  • #637 [PyTorch] Build own SOL C-style AutoGrad Function Wrapper
  • #527 [HLIR] GPT-2 Backward causes Stack Overflow in Cluster initialization
  • #230 [PyTorch] Can we clone methods from the original model into the sol_model, i.e. model.doSomething()
  • #179 [PyTorch] Evaluate if using a CPP instead of Python function yields in less overhead
17.10.2022
v0.5.1
Docs

Highlights

  • General
    • The sol.check_version() command can be used to check for a new version.
    • Form to apply for SOL4VE closed beta.
    • Improved command-line based installer.
  • PyTorch
    • Calls to sol.optimize(...) from PyTorch no longer require example inputs. Instead, the model gets parsed the very first time it is executed.
    • Limited support for torch.einsum(...)
    • Support for GANs through the new MaxUnPooling and TransposedConv layers.
    • Automatic detection of when to use torch.jit.script and when to fall back to torch.jit.trace
    • Improved handling of inline-operations
  • TensorFlow
    • RNN support
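The PyTorch highlight about sol.optimize(...) no longer needing example inputs describes a lazy pattern: parsing is deferred until the first real call, when the actual inputs are available. The sketch below illustrates that idea in plain Python. It is a conceptual stand-in and not SOL's implementation; `LazyOptimized` and `toy_parse` are hypothetical names.

```python
class LazyOptimized:
    """Defers an expensive 'parse' step until the first call.

    Conceptual stand-in only; this is not SOL's implementation.
    """

    def __init__(self, model, parse):
        self._model = model
        self._parse = parse  # hypothetical: returns an optimized callable
        self._compiled = None
        self.parse_count = 0

    def __call__(self, *args, **kwargs):
        if self._compiled is None:
            # First execution: the real inputs are known now, so parse here.
            self._compiled = self._parse(self._model, args, kwargs)
            self.parse_count += 1
        return self._compiled(*args, **kwargs)


def toy_parse(model, args, kwargs):
    # Hypothetical parser: a real one would inspect input shapes and dtypes.
    return model


lazy = LazyOptimized(lambda x: x * 2, toy_parse)
first = lazy(3)   # triggers the one-time parse
second = lazy(4)  # reuses the parsed result
print(first, second, lazy.parse_count)
```

The trade-off of this pattern is that the first call pays the full parsing cost, while construction stays cheap and needs no example inputs.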

Closed Issues

  • #775 [OpenSSL] Upgrade to 1.1.1r
  • #774 [PyTorch] v1.12.1 breaks VE complex number support
  • #772 [PyTorch] SqueezeNet training accuracy problems on VE
  • #771 [DFP] Investigate Unvectorizable DType in SqueezeNet Training on VE
  • #765 [PyTorch] automatic fallback to jit.trace if model parsing fails with jit.script
  • #764 [Docs] add PyTorch lazy optimizer documentation
  • #763 [PyTorch] enable kwargs for trace=True cases
  • #762 [PyTorch] add lazy optimizer
  • #761 [PyTorch] add torch.triu
  • #760 [Docs] Update supported layers
  • #759 [Core] Disable Progress bar if not running in interactive shell
  • #758 [Hugo] Upgrade to v0.103.0
  • #756 [VE] find alternative for "constexpr" in RNN implementation
  • #754 [DFP] Implement Interpolate autoSqueeze
  • #753 [Config] Remove unused config options
  • #752 [Docs] Update Docs for v0.5.1
  • #751 [NEC-SOL] Add "test access" option to nec-sol
  • #750 [NEC-SOL] Device Support Packages that users don't have access to prevent to install the package in the first place.
  • #748 [DNNL] Upgrade 2.6.2
  • #747 [NEC-SOL] add --verbose flag and report access to urls.
  • #745 [TF+RNN] verify RNNSimple results
  • #743 Add TF v2.10.0 Support
  • #739 [Hugo] upgrade v0.102.1
  • #735 [Boost] Upgrade to 1.80.0
  • #733 [OMP] Add Heuristics to sol_parallel_for and sol_parallel_simd
  • #732 [HLIR] add option to remove unused model parameters
  • #731 [HLIR] Make RNN sequence length variable
  • #730 [Keras] add tf.ensure_shape to KerasLayer
  • #729 [Keras] Support named inputs
  • #728 [HLIR] Repair Where 2 min/max transformation
  • #726 [DNN/RNN] evaluate not using BLAS
  • #725 [TF] Solve Threadblocking Issue on X86
  • #723 [HLIR] derive(Slice) == Slice, if it's a reverse-slice
  • #720 [TF2RNN] Handle cases where only OH is used without slicing
  • #719 [RNN] Move sol.hlir.rnn API to C++ space
  • #718 [DNN/RNN] Improve Handling of O and OH
  • #717 [RNN] Masking Support
  • #716 [RNN] Recurrent Dropout Support
  • #714 [HLIR] remove dropout layers from inference executions
  • #713 [TF] fix keras (alpha_)dropouts in inference mode
  • #712 [TF] enable to change input shapes using sol.optimize(..., shapes={...}, ...)
  • #711 [DNNL] Upgrade 2.6.1
  • #705 [OpenSSL] Upgrade 1.1.1q
  • #704 [NVIDIA] JIT compile 64 and 128 Bit memset functions
  • #703 [Numpy] Numpy Runtime can cause SegFault when a NDArray gets freed when SOL already destroyed the handlers during shutdown
  • #702 [ONNX] add MaxUnPool
  • #701 [PyTorch] add MaxUnPooling
  • #700 [HLIR] check if we MaxUnPooling always uses "max" instead of "+=" in fwd Pass
  • #699 [HLIR] Add DeSampling optimization to DeConv
  • #698 [Core] Fix Conv::Transform::Subsampling when applied in Bwd Pass
  • #697 [PyTorch] accuracy problem in RNN with S>1 in v1.12.0
  • #695 [PyTorch] add aten::pad
  • #694 [PyTorch] add aten::index
  • #688 [PyTorch] Upgrade to 1.12.0
  • #687 [PyTorch] Add Swin Transformer Tests
  • #686 [NEC-SOL] Pure command line mode?
  • #685 [DFP] fix unvectorized read loops
  • #684 [DFP-NCC] remove struct operator and add restricted keyword.
  • #683 [NCC] Improve DFP unrolling
  • #682 [PyTorch] Add PyTorch Lightning Support
  • #673 [OpenSSL] Upgrade 1.1.1p
  • #672 [JIT-NCC] In debug, warn about obstructive functions, and other vectorization problems.
  • #671 [VE] Add _Pragma("_NEC always_inline") into DFP-NCC headers
  • #670 [API] Add linker script
  • #669 [JIT] Add linker script
  • #668 [API] Replace all remaining extern "C" with SOL_API
  • #665 [CUDNN] Get Bundle from NVIDIA
  • #664 [MKL] Switch to PIP MKL Package instead of Bundling
  • #663 [Runtime] Add Mutex to runtime::device::Network to prevent parallel executions as in TF.
  • #662 [TF/X86] Warn user if tf.config.threading.set_inter_op_parallelism_threads is not set to 1, that it has negative impact on SOL performance
  • #661 [Python] Replace any remaining 'print' with 'tungl.info'
  • #660 [Web] Add SOL4VE Registration Form
  • #658 [PyTorch+TF] Use unified implementation of framework::Handle for all devices
  • #657 [TF] Add option to enable grad on inputs
  • #654 [OpenSSL] Upgrade 1.1.1o
  • #653 [Hugo] Upgrade 0.100.2
  • #652 [Installer] Upgrade fails with 'set' object has no attribute 'append'
  • #649 [TF] Unable to fetch values for...
  • #647 [Core] Lock DB when executing cache::clear
  • #646 [Docs] Update TF SDK docs
  • #645 [HLIR] Transformation to remove duplicate Permutes is not working correctly.
  • #644 [TF] RNN Seq2Seq testcase exposes #3 as vdim
  • #643 [Hugo] Upgrade v0.100.1
  • #642 [TF] GRU
  • #641 [TF] LSTM
  • #640 [TF] SimpleRNN
  • #638 [TF] Evaluate unified module instead of compiling new modules for each network
  • #636 [Profiler] Rework API
  • #635 [Installer] Not Showing PyTorch/TensorFlow properly
  • #634 [NCC] investigate why OpenMP is not working with -std=c++17
  • #633 [MKL] RNN
  • #631 [TF] Upgrade 2.9.1
  • #630 [Deployment] change binary2obj to objcopy
  • #629 [PyTorch] Sum of BOOL needs to be casted into INT DType
  • #626 [Python] Evaluate if using CFFI is less verbose compared to CTypes
  • #625 [PyTorch] Evaluate if replacing sol.runtime.set_tensor with a CPP function yields in less overhead.
  • #621 [HLIR] ZeroCopy Layer, that tries to not duplicate outputs in frameworks.
  • #609 [ONNX] GroupNorm: illegal view transformation
  • #608 [TF] Add tf.keras.applications testcases
  • #607 [TF] Upgrade 2.8.2, 2.7.4, 2.6.5
  • #601 [VEDA] VEDA_ERROR_UNKNOWN_CONTEXT thrown when calling vedaDevicePrimaryCtxRetain
  • #600 [PyTorch] Missing Primitive: aten::bernoulli
  • #597 [Core] Wrong Rendering of Output in Inception
  • #593 [X86+VE] IRFFT2D accuracy issues
  • #582 [Core] Show Update Message
  • #547 [PyTorch] Remove RNN Fix for when PyTorch v1.12.0 is released
  • #541 [DFP] improve size-1 write loop removal
  • #540 [DFP] Split BatchSize == 1 outer loops if there are multiple inner
  • #498 [Python API] Add option to override requires_grad
  • #283 [TensorFlow] automatically unload network, when all instances of the network have been destroyed
  • #214 [VEBLAS] Better parallelize RNN for small BatchSizes
  • #209 [PyTorch] automatically unload network, when all instances of the network have been destroyed
  • #123 [Layers] ConvTranspose/Deconvolution
23.05.2022
v0.5.0rc2
Docs

This is the 2nd release candidate of SOL v0.5.0 Diadem. It brings back support for ONNX and the Numpy runtime, and adds lots of bugfixes and improvements.

Highlights

  • ONNX frontend
  • Numpy runtime
  • Printing StackTrace when exception occurs in SOL
  • Improved printing of model signature
  • The installer no longer shows packages you don't have access to

Closed Issues

  • #624 [FFT] duplicate symbol "vdims"
  • #623 [TF] tf.nn.max_pool_with_argmax indices differ again...
  • #619 [Hugo] Upgrade 0.99.0
  • #617 [ONNX] "Unable to fetch values for " in ShuffleNet
  • #616 [ONNX] "can't 384 / 13 because it's no divider" in X86 Igornet and Layers[Repeat]
  • #615 [ONNX] "Unable to find dimension PO1" in X86 MNasNet and EfficientNet
  • #614 [ONNX] Cumsum accuracy issues
  • #613 [Core] Improve calculation of memory consumption estimation
  • #612 [TF] TF casts shape of [1,1,1,1] to [] during training
  • #611 [NCC] Segfault in networks using VDims
  • #610 [ONNX] CumSum
  • #606 [ISPC] Wrong initialization of VDims
  • #605 [DB] Verify that we can open multiple instances of SOL using the same DB file
  • #604 [X86+FFT] Accuracy Error in PY_IRFFT2D
  • #603 [X86+PyTorch] Memory is already allocated
  • #602 [X86+FFT] [DNNL ERROR] The operation failed because of incorrect function arguments.
  • #599 [PyTorch] Missing Primitive "aten::empty"
  • #598 [HLIR] VDims cause promotion to F64 in BatchNorm
  • #595 [VE] RNN Segfault in Testcases
  • #594 [Tests] Report % of values that exceed threshold
  • #592 [PyTorch] Models without parameter can create training context if 'model.training = False' is not set
  • #591 [Docs] Update documentation for v0.5.0rc2
  • #590 [Core] "axis needs to be between 0 and 0 but found 1" in DeReduce
  • #589 [TF] Can't use Keras models with scalar parameters
  • #587 [TF] Ordering of Parameters in KerasLayer is not stable
  • #586 [VE] Investigate possible Memleak in PyTorch-VE Runtime
  • #584 [PyTorch] Add Error Handling when User provides too many/too few arguments to function
  • #583 [JsonCPP] Upgrade to 1.9.5
  • #581 [ISPC] Upgrade to 1.18.0
  • #580 [CMake] Change min GCC to 10.X
  • #579 [Hugo] upgrade 0.98.0
  • #578 [PyTorch] RNNCell parsing fails in PyTorch 1.11.0
  • #577 [PyTorch] "illegal view transformation" when parsing RNN
  • #576 [DB] automatically recover from "database image malformed" exceptions
  • #575 [VE] RNN does not compile with NCC 3.4.2
  • #574 [Runtime] SegFault in Context::Destroy when starting another training context.
  • #573 [Core] Print StackTrace when SOL Exceptions are thrown
  • #572 [VE] Value2VDim
  • #571 [X86] Value2VDim
  • #570 [VE] WhereTrue
  • #569 [X86] WhereTrue
  • #568 [VE] PrefixSum
  • #567 [X86] PrefixSum
  • #566 [HLIR] return NAN instead of throwing error when dividing by 0
  • #563 [HLIR] Remove Unnecessary Permute
  • #562 [Runtime] do not reinit Params if the model has been run before
  • #561 [PyTorch] add Mobilenet V3 testcases
  • #560 [PyTorch] add efficientnet testcases
  • #559 [PyTorch] add ConvNext Testcases
  • #558 [PyTorch] add RegNet testcases
  • #556 [HLIR] show actual input/output shapes, not the "possible"
  • #555 [Installer] Cache Password for PIP
  • #554 [Installer] hide packages that user does not have access to
  • #553 [Runtime] Context needs to distinguish between offloading and framework handle!
  • #552 [Numpy] Implement Lazy Allocations in Numpy executor
  • #551 [HLIR] Fix Memory Consumption to report Copies not as Outputs
  • #550 [NCC/OMP] cannot explicitly instantiate std::tuple in sol parallel simd with NCC 3.3 or newer
  • #549 [Compiler/Runtime] Encode/Check framework version, device compute capability, ... in compiled NN library name
  • #548 [SQLite] Upgrade to 3.38.2
  • #546 [DNNL] Upgrade 2.6
  • #545 [TF] SOL's MaxPooling makes other choices than TF's implementation in training
  • #544 [TF] implement save/load methods in Keras model
  • #543 [TF] fix BatchNorm Assignment
  • #539 [DNNL] Illegal Operation in Resnext BWD Pass
  • #538 [Python] Don't store size of Tensor in Python but always request from C++ space
  • #522 [DNNL] Performance Problem in ConvBwdData Param Reorder
  • #514 [PyTorch] Upgrade 1.11.0
  • #512 [HLIR] Copy Buffers during Training, not LayerOutputs
  • #507 [TF] Which momentum does TF use in BatchNorm?
  • #492 [TF/VE] Check if Optimizers and Loss functions are implemented
  • #438 [TF] Update 2.8.0
  • #429 [ONNX] Add Handler option
  • #425 [HLIR] Remove unnecessary Copies in RNN -> Copy -> Output i.e. for Workspace
  • #276 [ONNX] set that params require grad, so we can train ONNX models
  • #201 [DFP] Double check if we correctly report the scratchpad memory
  • #186 [ONNX] use sol.internal.Tensor operators instead of explicit calls
  • #166 [PyTorch] torch.nn.InstanceNorm2d missing
  • #165 [PyTorch] torch.nn.GroupNorm missing
  • #113 [ONNX] PRelu
  • #112 [ONNX] Gather
  • #108 [ONNX] ERROR getsym_handler
  • #74 [DFP] Performance: implement Input Loop Merging
23.03.2022
v0.5.0rc1
Docs

SOL v0.5.0 Diadem is the next major release of SOL, already containing over 160 closed issues! We also switched to a rolling release model using release candidates, to push out fixes and changes more often.

This first release candidate DOES NOT contain support for: NVIDIA devices, ONNX or Deployment! These features will be reenabled in later release candidates.

Highlights

Breaking Changes

  • We modified the sol.optimize(model, args, kwargs={}, framework=None, **fwargs) call. If you have more than one input, you now need to pass them using the args argument as a list or tuple, or using the kwargs argument as a dictionary. This was necessary to be more compliant with the AI frameworks.
  • sol.optimize(..., batchsize=...) has been removed in favor of the new variable dimensions system. Please look here for more details.
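The new calling convention for multiple inputs can be illustrated without SOL installed. In the sketch below, `optimize` is a stand-in that only mirrors the signature quoted above; it is not the real sol.optimize, and its handling of a single bare input is an assumption. The toy `model` function is likewise hypothetical.

```python
def optimize(model, args=(), kwargs=None, framework=None, **fwargs):
    """Stand-in mirroring the signature from the notes; NOT the real sol.optimize.

    `kwargs` defaults to None here to avoid a shared mutable default.
    """
    if not isinstance(args, (list, tuple)):
        args = (args,)  # assumption: a single input may be passed directly
    kwargs = dict(kwargs or {})
    # Return a callable that feeds the packed inputs to the model.
    return lambda: model(*args, **kwargs)


def model(a, b, scale=1.0):
    # Toy model with two inputs and one optional named input.
    return (a + b) * scale


# Several positional inputs go into `args` as a list or tuple.
run_positional = optimize(model, [1.0, 2.0])
# Named inputs go into `kwargs` as a dictionary.
run_named = optimize(model, [1.0], {"b": 2.0, "scale": 3.0})

print(run_positional(), run_named())
```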

Closed Issues

  • #538 [Python] Don't store size of Tensor in Python but always request from C++ space
  • #537 [OpenSSL] Upgrade to 1.1.1n
  • #535 [PyTorch] create OMP Symlink in sol-framework-pytorch-x86
  • #534 [DNNL] Upgrade 2.5.3
  • #533 [SQLITE] Upgrade 3.38.0
  • #531 [Core] Report Peak Memory in Output during Compilation Phase for each Pass
  • #530 [Python] increase min Python Version from 3.6 to 3.7 because of TF 2.8.0 requirement
  • #529 [PyTorch] Upgrade v1.10.2
  • #524 [PyTorch] Check why we get weird JIT accuracy errors in HuggingFace Bert training
  • #523 [Python] deprecated "copy_parameters" and just do it always.
  • #521 [DNNL] implement memory layout AutoTuning for Conv
  • #520 [CMake] Enable Release Candidates
  • #519 [VE] If NC++ is not found, we need to disable VE entirely, not just not setting the VE_LD_LIBRARY_PATH
  • #518 [VEDA] upgrade 1.2.0
  • #517 [Runtime] segfault when an inference Context is still open and get cleaned up during shutdown
  • #516 [PyTorch] Find a solution for conflicting GOMP implementation
  • #515 [OMP] Limit number of threads to match number of hardware cores, not number of hardware threads
  • #511 [Runtime] Store raw pointer in sol_tensor
  • #510 [DFP] Unroll Interpolation using TmpVars
  • #509 [Runtime] Evaluate if TBB is better than OMP
  • #508 [Runtime] Remove sol::runtime::Tensor
  • #505 [DFP] Remove Rand from DFP and implement in ISPC/CUDA and VEASL
  • #500 [DFP] input loop merge
  • #499 [Python API] Change sol.optimize(*args, **kwargs) to use one variable
  • #496 [DFP] Ensure that in ISPC each core uses its own random state
  • #494 [DFP] Remove GLoops
  • #493 [DFP] Don't allocate stack memory in CORES loops.
  • #491 [HLIR] Fix weird TorchVision structure of DenseNet
  • #490 [TF] ValueError: Data cardinality is ambiguous: x sizes: 1, 100, 100
  • #489 [TF] OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
  • #488 [HLIR] Add verification for DTypes
  • #487 [PyTorch] Testcase "MemberTensors" fails in Training
  • #486 [TF] can we pass the sol_ctx as attribute instead of scalar tensor?
  • #485 [TF] Enable models with more than 255 model parameter tensors
  • #483 [Core] Buffer Chaining fails if there is a view in between the concat operations
  • #480 [PyTorch] add strided slice support
  • #479 [SQLite] Upgrade to 3.37.2
  • #478 [TF] add missing StridedSlice features
  • #477 [ISPC] Upgrade to 1.17.0
  • #475 [DNNL] Upgrade to v2.5.2
  • #471 [X86] correctly detect vector instructions
  • #470 [TF] tf.math.cumsum
  • #469 [ISPC] migrate dnn_ispc to ispc, as it does not use any dnn components
  • #468 [Core] Verify that all layers have properly been assigned an algorithm before generating code
  • #466 [PyTorch] add torch.cumsum
  • #465 [TF] add tf.Where
  • #464 [Core] remove NUMEL through transform if VDIMS get removed
  • #463 [Core] add input to NUMEL
  • #462 [TF] backport new sol.hlir.reduce API
  • #461 [Core] Revise sol_tensor using real shape and separate numel field
  • #460 [DFP] BackendRenderer missing DTYPE
  • #459 [SQLite] Upgrade to 3.37.1
  • #458 [AVEO] Upgrade to 2.10.0
  • #457 [PyTorch] Support multivalued constants
  • #456 [HLIR] Bugfix GEMM batchHelper function
  • #455 [VDIMS] PyTorch AvgPool2D#3 with two different shapes after each other fail because of identical Hash
  • #454 [PyTorch] Upgrade 1.10.1
  • #453 [Docs] Add TensorFlow Native to VE Chapter
  • #452 [DNNL] Upgrade to 2.5.1
  • #451 [Core] Remove Tensor Class
  • #450 [Docs] Correct VDim case with #2 == 5
  • #449 [Deprecated] YaalUpdate and YaalSeed
  • #448 [VDIMS] option to enable/disable VDIM usage, disable by default.
  • #447 [DFP] Fix BatchNorm BWD Pass
  • #446 [DFP] Flag Loops where DataStride() == 1 for ALL data to be used for SIMD
  • #445 [PyTorch] torch.reshape missing
  • #444 [Core] Print NN Input/Output Signature
  • #443 [PyTorch] argmin + argmax missing
  • #442 [TF] wrong results in activation layers
  • #439 [PIP] Enable manylinux2014 compatible build process
  • #437 [DNNL] Upgrade 2.4.4
  • #436 nec-sol: download.pytorch.org may also need to be trusted
  • #435 nec-sol: codec can't display license agreement
  • #434 upon program exit, malloc(): unsorted double linked list corrupted
  • #433 [TF] Enable to parse tf.Module and tf.saved_models
  • #432 [TF] Upgrade to 2.6.1
  • #431 [PyTorch] torch.rand* and torch.randint* missing
  • #430 [PyTorch] Enable Models without inputs
  • #428 [PyTorch] Add Handler option
  • #427 [TensorFlow] Add Handler option
  • #426 [HLIR] Add Custom Layer Support
  • #423 [PyTorch] Possible Segfault in Py_InfNAN on X86
  • #422 [Core] Add Option to register Operations at runtime
  • #421 [Core] Missing library libcrypto.so.10 on Ubuntu 20.04
  • #420 [PyTorch] Debug v1.10.0 print problems with native-ve
  • #419 [PyTorch] Upgrade RNN API to new HLIR version
  • #418 [CUDNN] Upgrade to 8.2.4
  • #417 [CUDA] Upgrade to 11.3
  • #416 [PyTorch] Upgrade to v1.10.0
  • #415 [CORE] use SQLite Transactions to lock SOL Cache in multiple processes
  • #414 [DNNL] Upgrade to 2.4.2
  • #413 [VE] replace sol_ve_copy with VEDAMemset and VEDAMemcopy
  • #412 [VEDA] Add device side memset
  • #411 [VE] Add ComplexDTypes to Device API
  • #410 [TBB] CMake install script fails on first run.
  • #408 [VEDA] memset 128
  • #407 [VEDA] CMake ASL FFTW
  • #401 [DNNL] Upgrade to 2.4.1
  • #399 [NCC] Add Struct Functions
  • #398 [X86] Check Handle::memset performance!
  • #397 [VEDA] improve S8, S16 vedaMemset Performance
  • #396 [TBB] Upgrade to 2021.4.0
  • #395 [DNNL] Upgrade to 2.4
  • #394 [PyTorch] torch.complex missing
  • #393 [PyTorch] torch.imag missing
  • #391 [PyTorch] sol.optimize(model, torch.Tensor) stopped working?
  • #389 [NEC-SOL] automatically detect VENV
  • #388 [PyTorch] use torch.jit.trace instead of torch.jit.script to support Transformers
  • #387 [HLIR/DFP] Change Dropout to always use input rand
  • #386 [NEC-SOL] UTF-8 encoding problem
  • #385 [DFP] Bad Loop Merging in multi-path kernels
  • #384 [DFP] Wrong vectorization in Bias Sum Case
  • #383 [YAAL] Improve Scheduling
  • #382 [PyTorch] Upgrade to 1.9.1
  • #381 [Tests] Restructure TESTS package, so the PyTorch package does not load TF and vice versa.
  • #379 [NNPACK] Deprecate
  • #376 [ISPC] Check OpenMP implementation performance for small batchsizes
  • #375 [TensorFlow] Upgrade to 2.6.0
  • #364 [DFP] Fix Tile backward pass for X86
  • #363 [PyTorch] aten::repeat missing
  • #362 [PyTorch] aten::tile missing
  • #360 [PyTorch] aten::smooth_l1_loss
  • #355 [PyTorch] HuggingFace transformers broken
  • #332 [TF] Enabled Delayed Allocation in Native-VE
  • #329 [PyTorch] Add BatchNorm num_batches_tracked
  • #321 [API] change sol_external_malloc to use shapes instead of accumulated sizes
  • #307 [HLIR] Input > Dropout > Output produce wrong results in Inference
  • #306 [TF] tf.nn.max_pool_with_argmax returns wrong indices
  • #302 [PyTorch] Use VE instead of HIP device
  • #299 [PyTorch] Slicing with [0:0] should return empty Tensor
  • #293 [HLIR] Constant with more than 1 element
  • #289 [TF] Remove sol_model.convert("device")
  • #285 [DFP] Remove Nests
  • #284 [DFP] Implement Linear
  • #271 [TensorFlow] enable lazy allocations
  • #267 [Core] Remove OptimizerType
  • #259 [IgorNet] Can't run Inference (division by zero)
  • #239 [PyTorch] add torch.nn.functional.interpolate
  • #225 [DNNL] Upgrade to 2.3.2
  • #212 [HLIR] SIGN on Unsigned is 1
  • #211 [HLIR] Abs on Unsigned is Noop
  • #208 [PyTorch] some networks can't use variable BatchSize
  • #203 [DFP] Buffered Dropout >> Narrow allocates too little memory
  • #202 [Compiler] Show Progress Bar already during Code generation Phase
  • #199 [HLIR] Detect Permutations in front of GEMM and merge them into the GEMM by changing the layout.
  • #198 [HLIR] Initialize Gradients with 0 and make all multi-connections to REQURIESACC
  • #193 [PyTorch] "Only one dimension for View can use summation." in ShuffleNet when using Variadic BatchSizes
  • #187 [DFP] BatchNorm does not support NOT to trace runtime metrics
  • #184 [PyTorch] No Gradient for inputs which get disconnected
  • #183 [PyTorch] LPPool Backward Pass results do not match
  • #182 [DFP] Split BatchNorm into two operations to better fit the actual computations and to remove the workarounds in DFP module
  • #177 [VEBLAS] Memory Estimation: RNN
  • #163 [PyTorch] torch.Tensor.repeat missing
  • #161 [PyTorch] Interpolation Layers missing
  • #157 [Autotuning] Device reports out of memory, as SOL does not use the framework's memory allocator during auto-tuning
  • #139 [Core] Enable to save/load SOL models
  • #135 [HLIR] Pytorchic BERT can't use variable batchsize because of the view?
  • #131 [DFP] segfault generating RReLU layer
  • #120 [DType] Add Complex DTypes
  • #117 [PyTorch] FFT
  • #81 [Variable BatchSize] GPT-2 can't use variable batchsize
  • #80 [Clusters] Erroneous accumulation in GPT-2 BWD Pass
  • #71 [DFP] missing layer: MaxUnPooling
  • #70 [DFP] LeakyRelu: Remove IF in generated code
  • #35 [X86] Does not show correct used memory