v0.5 Diadem



  • General
    • The sol.check_version() command can be used to check for a new version.
    • Form to apply for SOL4VE closed beta.
    • Improved command-line based installer.
  • PyTorch
    • Calls to sol.optimize(...) from PyTorch no longer require example inputs. Instead, the model is parsed the first time it is executed.
    • Limited support for torch.einsum(...)
    • Support for GANs through MaxUnPooling and TransposedConv layer support.
    • Automatic detection of when to use torch.jit.script and when to fall back to torch.jit.trace
    • Improved handling of inline-operations
  • TensorFlow
    • RNN support
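
The lazy parsing and the script-to-trace fallback listed above can be pictured with the following minimal sketch. It is not SOL's implementation: LazyOptimized, failing_script_parser and simple_trace_parser are hypothetical stand-ins, written in plain Python so the pattern can be run without SOL or PyTorch installed.

```python
class LazyOptimized:
    """Defers parsing until the first call, when real inputs are available."""

    def __init__(self, model, script_parser, trace_parser):
        self.model = model
        self.script_parser = script_parser  # hypothetical: parses without inputs
        self.trace_parser = trace_parser    # hypothetical: parses using inputs
        self.compiled = None                # nothing is parsed at construction
        self.parser_used = None

    def __call__(self, *args, **kwargs):
        if self.compiled is None:
            try:
                # Preferred path: input-independent parsing.
                self.compiled = self.script_parser(self.model)
                self.parser_used = "script"
            except Exception:
                # Fallback path: parse using the inputs of this first call.
                self.compiled = self.trace_parser(self.model, args, kwargs)
                self.parser_used = "trace"
        return self.compiled(*args, **kwargs)


def failing_script_parser(model):
    # Stand-in for a script-style parser that rejects the model.
    raise NotImplementedError("unsupported construct")


def simple_trace_parser(model, args, kwargs):
    # Stand-in for a trace-style parser; here it just returns the model.
    return model


def double(x):
    return 2 * x


lazy = LazyOptimized(double, failing_script_parser, simple_trace_parser)
result = lazy(21)  # parsing happens here, on the first call
```

In this sketch no example inputs are needed when the wrapper is created, and the choice between the two parsers is made once, on the first execution.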

Closed Issues

  • #775 [OpenSSL] Upgrade to 1.1.1r
  • #774 [PyTorch] v1.12.1 breaks VE complex number support
  • #772 [PyTorch] SqueezeNet training accuracy problems on VE
  • #771 [DFP] Investigate Unvectorizable DType in SqueezeNet Training on VE
  • #765 [PyTorch] automatic fallback to jit.trace if model parsing fails with jit.script
  • #764 [Docs] add Pytorch lazy optimizer documentation
  • #763 [PyTorch] enable kwargs for trace=True cases
  • #762 [PyTorch] add lazy optimizer
  • #761 [PyTorch] add torch.triu
  • #760 [Docs] Update supported layers
  • #759 [Core] Disable Progress bar if not running in interactive shell
  • #758 [Hugo] Upgrade to v0.103.0
  • #756 [VE] find alternative for "constexpr" in RNN implementation
  • #754 [DFP] Implement Interpolate autoSqueeze
  • #753 [Config] Remove unused config options
  • #752 [Docs] Update Docs for v0.5.1
  • #751 [NEC-SOL] Add "test access" option to nec-sol
  • #750 [NEC-SOL] Device Support Packages that users don't have access to prevent the package from being installed in the first place.
  • #748 [DNNL] Upgrade 2.6.2
  • #747 [NEC-SOL] add --verbose flag and report access to urls.
  • #745 [TF+RNN] verify RNNSimple results
  • #743 Add TF v2.10.0 Support
  • #739 [Hugo] upgrade v0.102.1
  • #735 [Boost] Upgrade to 1.80.0
  • #733 [OMP] Add Heuristics to sol_parallel_for and sol_parallel_simd
  • #732 [HLIR] add option to remove unused model parameters
  • #731 [HLIR] Make RNN sequence length variable
  • #730 [Keras] add tf.ensure_shape to KerasLayer
  • #729 [Keras] Support named inputs
  • #728 [HLIR] Repair Where 2 min/max transformation
  • #726 [DNN/RNN] evaluate not using BLAS
  • #725 [TF] Solve Threadblocking Issue on X86
  • #723 [HLIR] derive(Slice) == Slice, if it's a reverse-slice
  • #720 [TF2RNN] Handle cases where only OH is used without slicing
  • #719 [RNN] Move sol.hlir.rnn API to C++ space
  • #718 [DNN/RNN] Improve Handling of O and OH
  • #717 [RNN] Masking Support
  • #716 [RNN] Recurrent Dropout Support
  • #714 [HLIR] remove dropout layers from inference executions
  • #713 [TF] fix keras (alpha_)dropouts in inference mode
  • #712 [TF] enable to change input shapes using sol.optimize(..., shapes={...}, ...)
  • #711 [DNNL] Upgrade 2.6.1
  • #705 [OpenSSL] Upgrade 1.1.1q
  • #704 [NVIDIA] JIT compile 64 and 128 Bit memset functions
  • #703 [Numpy] Numpy Runtime can cause SegFault when an NDArray gets freed after SOL has already destroyed the handlers during shutdown
  • #702 [ONNX] add MaxUnPool
  • #701 [PyTorch] add MaxUnPooling
  • #700 [HLIR] check if MaxUnPooling always uses "max" instead of "+=" in fwd Pass
  • #699 [HLIR] Add DeSampling optimization to DeConv
  • #698 [Core] Fix Conv::Transform::Subsampling when applied in Bwd Pass
  • #697 [PyTorch] accuracy problem in RNN with S>1 in v1.12.0
  • #695 [PyTorch] add aten::pad
  • #694 [PyTorch] add aten::index
  • #688 [PyTorch] Upgrade to 1.12.0
  • #687 [PyTorch] Add Swin Transformer Tests
  • #686 [NEC-SOL] Pure command line mode?
  • #685 [DFP] fix unvectorized read loops
  • #684 [DFP-NCC] remove struct operator and add restricted keyword.
  • #683 [NCC] Improve DFP unrolling
  • #682 [PyTorch] Add PyTorch Lightning Support
  • #673 [OpenSSL] Upgrade 1.1.1p
  • #672 [JIT-NCC] In debug, warn about obstructive functions, and other vectorization problems.
  • #671 [VE] Add _Pragma("_NEC always_inline") into DFP-NCC headers
  • #670 [API] Add linker script
  • #669 [JIT] Add linker script
  • #668 [API] Replace all remaining extern "C" with SOL_API
  • #665 [CUDNN] Get Bundle from NVIDIA
  • #664 [MKL] Switch to PIP MKL Package instead of Bundling
  • #663 [Runtime] Add Mutex to runtime::device::Network to prevent parallel executions as in TF.
  • #662 [TF/X86] Warn user that not setting tf.config.threading.set_inter_op_parallelism_threads to 1 has a negative impact on SOL performance
  • #661 [Python] Replace any remaining 'print' with 'tungl.info'
  • #660 [Web] Add SOL4VE Registration Form
  • #658 [PyTorch+TF] Use unified implementation of framework::Handle for all devices
  • #657 [TF] Add option to enable grad on inputs
  • #654 [OpenSSL] Upgrade 1.1.1o
  • #653 [Hugo] Upgrade 0.100.2
  • #652 [Installer] Upgrade fails with 'set' object has no attribute 'append'
  • #649 [TF] Unable to fetch values for...
  • #647 [Core] Lock DB when executing cache::clear
  • #646 [Docs] Update TF SDK docs
  • #645 [HLIR] Transformation to remove duplicate Permutes is not working correctly.
  • #644 [TF] RNN Seq2Seq testcase exposes #3 as vdim
  • #643 [Hugo] Upgrade v0.100.1
  • #642 [TF] GRU
  • #641 [TF] LSTM
  • #640 [TF] SimpleRNN
  • #638 [TF] Evaluate unified module instead of compiling new modules for each network
  • #636 [Profiler] Rework API
  • #635 [Installer] Not Showing PyTorch/TensorFlow properly
  • #634 [NCC] investigate why OpenMP is not working with -std=c++17
  • #633 [MKL] RNN
  • #631 [TF] Upgrade 2.9.1
  • #630 [Deployment] change binary2obj to objcopy
  • #629 [PyTorch] Sum of BOOL needs to be cast into INT DType
  • #626 [Python] Evaluate if using CFFI is less verbose compared to CTypes
  • #625 [PyTorch] Evaluate if replacing sol.runtime.set_tensor with a CPP function yields less overhead.
  • #621 [HLIR] ZeroCopy Layer, that tries to not duplicate outputs in frameworks.
  • #609 [ONNX] GroupNorm: illegal view transformation
  • #608 [TF] Add tf.keras.applications testcases
  • #607 [TF] Upgrade 2.8.2, 2.7.4, 2.6.5
  • #601 [VEDA] VEDA_ERROR_UNKNOWN_CONTEXT thrown when calling vedaDevicePrimaryCtxRetain
  • #600 [PyTorch] Missing Primitive: aten::bernoulli
  • #597 [Core] Wrong Rendering of Output in Inception
  • #593 [X86+VE] IRFFT2D accuracy issues
  • #582 [Core] Show Update Message
  • #547 [PyTorch] Remove RNN Fix for when PyTorch v1.12.0 is released
  • #541 [DFP] improve size-1 write loop removal
  • #540 [DFP] Split BatchSize == 1 outer loops if there are multiple inner
  • #498 [Python API] Add option to override requires_grad
  • #283 [TensorFlow] automatically unload network, when all instances of the network have been destroyed
  • #214 [VEBLAS] Better parallelize RNN for small BatchSizes
  • #209 [PyTorch] automatically unload network, when all instances of the network have been destroyed
  • #123 [Layers] ConvTranspose/Deconvolution

This is the 2nd release candidate of SOL v0.5.0 Diadem. It brings back support for ONNX and Numpy (runtime), and includes many bugfixes and improvements.


  • ONNX frontend
  • Numpy runtime
  • Printing StackTrace when exception occurs in SOL
  • Improved printing of model signature
  • The installer no longer shows packages you don't have access to

Closed Issues

  • #624 [FFT] duplicate symbol "vdims"
  • #623 [TF] tf.nn.max_pool_with_argmax indices differ again...
  • #619 [Hugo] Upgrade 0.99.0
  • #617 [ONNX] "Unable to fetch values for " in ShuffleNet
  • #616 [ONNX] "can't 384 / 13 because it's no divider" in X86 Igornet and Layers[Repeat]
  • #615 [ONNX] "Unable to find dimension PO1" in X86 MNasNet and EfficientNet
  • #614 [ONNX] Cumsum accuracy issues
  • #613 [Core] Improve calculation of memory consumption estimation
  • #612 [TF] TF casts shape of [1,1,1,1] to [] during training
  • #611 [NCC] Segfault in networks using VDims
  • #610 [ONNX] CumSum
  • #606 [ISPC] Wrong initialization of VDims
  • #605 [DB] Verify that we can open multiple instances of SOL using the same DB file
  • #604 [X86+FFT] Accuracy Error in PY_IRFFT2D
  • #603 [X86+PyTorch] Memory is already allocated
  • #602 [X86+FFT] [DNNL ERROR] The operation failed because of incorrect function arguments.
  • #599 [PyTorch] Missing Primitive "aten::empty"
  • #598 [HLIR] VDims cause promotion to F64 in BatchNorm
  • #595 [VE] RNN Segfault in Testcases
  • #594 [Tests] Report % of values that exceed threshold
  • #592 [PyTorch] Models without parameter can create training context if 'model.training = False' is not set
  • #591 [Docs] Update documentation for v0.5.0rc2
  • #590 [Core] "axis needs to be between 0 and 0 but found 1" in DeReduce
  • #589 [TF] Can't use Keras models with scalar parameters
  • #587 [TF] Ordering of Parameters in KerasLayer is not stable
  • #586 [VE] Investigate possible Memleak in PyTorch-VE Runtime
  • #584 [PyTorch] Add Error Handling when User provides too many/too few arguments to function
  • #583 [JsonCPP] Upgrade to 1.9.5
  • #581 [ISPC] Upgrade to 1.18.0
  • #580 [CMake] Change min GCC to 10.X
  • #579 [Hugo] upgrade 0.98.0
  • #578 [PyTorch] RNNCell parsing fails in PyTorch 1.11.0
  • #577 [PyTorch] "illegal view transformation" when parsing RNN
  • #576 [DB] automatically recover from "database image malformed" exceptions
  • #575 [VE] RNN does not compile with NCC 3.4.2
  • #574 [Runtime] SegFault in Context::Destroy when starting another training context.
  • #573 [Core] Print StackTrace when SOL Exceptions are thrown
  • #572 [VE] Value2VDim
  • #571 [X86] Value2VDim
  • #570 [VE] WhereTrue
  • #569 [X86] WhereTrue
  • #568 [VE] PrefixSum
  • #567 [X86] PrefixSum
  • #566 [HLIR] return NAN instead of throwing error when dividing by 0
  • #563 [HLIR] Remove Unnecessary Permute
  • #562 [Runtime] do not reinit Params if the model has been run before
  • #561 [PyTorch] add Mobilenet V3 testcases
  • #560 [PyTorch] add efficientnet testcases
  • #559 [PyTorch] add ConvNext Testcases
  • #558 [PyTorch] add RegNet testcases
  • #556 [HLIR] show actual input/output shapes, not the "possible"
  • #555 [Installer] Cache Password for PIP
  • #554 [Installer] hide packages that user does not have access to
  • #553 [Runtime] Context needs to distinguish between offloading and framework handle!
  • #552 [Numpy] Implement Lazy Allocations in Numpy executor
  • #551 [HLIR] Fix Memory Consumption to report Copies not as Outputs
  • #550 [NCC/OMP] cannot explicitly instantiate std::tuple in sol parallel simd with NCC 3.3 or newer
  • #549 [Compiler/Runtime] Encode/Check framework version, device compute capability, ... in compiled NN library name
  • #548 [SQLite] Upgrade to 3.38.2
  • #546 [DNNL] Upgrade 2.6
  • #545 [TF] SOL's MaxPooling makes other choices than TF's implementation in training
  • #544 [TF] implement save/load methods in Keras model
  • #543 [TF] fix BatchNorm Assignment
  • #539 [DNNL] Illegal Operation in Resnext BWD Pass
  • #538 [Python] Don't store size of Tensor in Python but always request from C++ space
  • #522 [DNNL] Performance Problem in ConvBwdData Param Reorder
  • #514 [PyTorch] Upgrade 1.11.0
  • #512 [HLIR] Copy Buffers during Training, not LayerOutputs
  • #507 [TF] Which momentum does TF use in BatchNorm?
  • #492 [TF/VE] Check if Optimizers and Loss functions are implemented
  • #438 [TF] Update 2.8.0
  • #429 [ONNX] Add Handler option
  • #425 [HLIR] Remove unnecessary Copies in RNN -> Copy -> Output i.e. for Workspace
  • #276 [ONNX] set that params require grad, so we can train ONNX models
  • #201 [DFP] Double check if we correctly report the scratchpad memory
  • #186 [ONNX] use sol.internal.Tensor operators instead of explicit calls
  • #166 [PyTorch] torch.nn.InstanceNorm2d missing
  • #165 [PyTorch] torch.nn.GroupNorm missing
  • #113 [ONNX] PRelu
  • #112 [ONNX] Gather
  • #108 [ONNX] ERROR getsym_handler
  • #74 [DFP] Performance: implement Input Loop Merging

SOL v0.5.0 Diadem is the next major release of SOL and already contains over 160 closed issues. We have also switched to a rolling release model using release candidates, so that fixes and changes can be pushed out more often.

This first release candidate DOES NOT contain support for: NVIDIA devices, ONNX or Deployment! These features will be reenabled in later release candidates.


Breaking Changes

  • We modified the sol.optimize(model, args, kwargs={}, framework=None, **fwargs) call. If you have more than one input, you now need to pass them through the args argument as a list or tuple, or through kwargs as a dictionary. This was necessary to be more compliant with the AI frameworks.
  • sol.optimize(..., batchsize=...) has been removed in favor of the new variable dimensions system. See the variable dimensions documentation for more details.
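
To illustrate the new calling convention, here is a minimal sketch. The optimize_stub function is a hypothetical stand-in that only mirrors the signature quoted above so the example runs without SOL installed; it is not SOL's implementation, and accepting a single bare input is an assumption read from the wording "more than one input".

```python
def optimize_stub(model, args=(), kwargs=None, framework=None, **fwargs):
    """Hypothetical stand-in for sol.optimize; only normalizes the inputs."""
    # Assumption: a single input may still be passed directly.
    if not isinstance(args, (list, tuple)):
        args = (args,)
    # Use None instead of a mutable {} default, then copy into a fresh dict.
    kwargs = dict(kwargs or {})
    return tuple(args), kwargs


def add_scaled(a, b, scale=1.0):
    # Toy "model" with two positional inputs and one named input.
    return (a + b) * scale


# Several positional inputs: pass them as one list or tuple via args.
pos_args, pos_kwargs = optimize_stub(add_scaled, [1.0, 2.0])

# Named inputs: pass them as a dictionary via kwargs.
named_args, named_kwargs = optimize_stub(
    add_scaled, (1.0,), kwargs={"b": 2.0, "scale": 10.0}
)

# Single input passed directly (assumed to remain valid).
single_args, single_kwargs = optimize_stub(abs, -3)

# The normalized inputs can be applied to the model as usual.
positional_result = add_scaled(*pos_args, **pos_kwargs)
named_result = add_scaled(*named_args, **named_kwargs)
```

Code that previously passed several inputs as separate positional arguments would need to wrap them in a list or tuple, as in the first call.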

Closed Issues

  • #538 [Python] Don't store size of Tensor in Python but always request from C++ space
  • #537 [OpenSSL] Upgrade to 1.1.1n
  • #535 [PyTorch] create OMP Symlink in sol-framework-pytorch-x86
  • #534 [DNNL] Upgrade 2.5.3
  • #533 [SQLITE] Upgrade 3.38.0
  • #531 [Core] Report Peak Memory in Output during Compilation Phase for each Pass
  • #530 [Python] increase min Python Version from 3.6 to 3.7 because of TF 2.8.0 requirement
  • #529 [PyTorch] Upgrade v1.10.2
  • #524 [PyTorch] Check why we get weird JIT accuracy errors in HuggingFace Bert training
  • #523 [Python] deprecated "copy_parameters" and just do it always.
  • #521 [DNNL] implement memory layout AutoTuning for Conv
  • #520 [CMake] Enable Release Candidates
  • #519 [VE] If NC++ is not found, we need to disable VE entirely, not just not setting the VE_LD_LIBRARY_PATH
  • #518 [VEDA] upgrade 1.2.0
  • #517 [Runtime] segfault when an inference Context is still open and get cleaned up during shutdown
  • #516 [PyTorch] Find a solution for conflicting GOMP implementation
  • #515 [OMP] Limit number of threads to match number of hardware cores, not number of hardware threads
  • #511 [Runtime] Store raw pointer in sol_tensor
  • #510 [DFP] Unroll Interpolation using TmpVars
  • #509 [Runtime] Evaluate if TBB is better than OMP
  • #508 [Runtime] Remove sol::runtime::Tensor
  • #505 [DFP] Remove Rand from DFP and implement in ISPC/CUDA and VEASL
  • #500 [DFP] input loop merge
  • #499 [Python API] Change sol.optimize(*args, **kwargs) to use one variable
  • #496 [DFP] Ensure that in ISPC each core uses its own random state
  • #494 [DFP] Remove GLoops
  • #493 [DFP] Don't allocate stack memory in CORES loops.
  • #491 [HLIR] Fix weird TorchVision structure of DenseNet
  • #490 [TF] ValueError: Data cardinality is ambiguous: x sizes: 1, 100, 100
  • #489 [TF] OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
  • #488 [HLIR] Add verification for DTypes
  • #487 [PyTorch] Testcase "MemberTensors" fails in Training
  • #486 [TF] can we pass the sol_ctx as attribute instead of scalar tensor?
  • #485 [TF] Enable models with more than 255 model parameter tensors
  • #483 [Core] Buffer Chaining fails if there is a view in between the concat operations
  • #480 [PyTorch] add strided slice support
  • #479 [SQLite] Upgrade to 3.37.2
  • #478 [TF] add missing StridedSlice features
  • #477 [ISPC] Upgrade to 1.17.0
  • #475 [DNNL] Upgrade to v2.5.2
  • #471 [X86] correctly detect vector instructions
  • #470 [TF] tf.math.cumsum
  • #469 [ISPC] migrate dnn_ispc to ispc, as it does not use any dnn components
  • #468 [Core] Verify that all layers have properly been assigned an algorithm before generating code
  • #466 [PyTorch] add torch.cumsum
  • #465 [TF] add tf.Where
  • #464 [Core] remove NUMEL through transform if VDIMS get removed
  • #463 [Core] add input to NUMEL
  • #462 [TF] backport new sol.hlir.reduce API
  • #461 [Core] Revise sol_tensor using real shape and separate numel field
  • #460 [DFP] BackendRenderer missing DTYPE
  • #459 [SQLite] Upgrade to 3.37.1
  • #458 [AVEO] Upgrade to 2.10.0
  • #457 [PyTorch] Support multivalued constants
  • #456 [HLIR] Bugfix GEMM batchHelper function
  • #455 [VDIMS] PyTorch AvgPool2D#3 with two different shapes after each other fail because of identical Hash
  • #454 [Pytorch] Upgrade 1.10.1
  • #453 [Docs] Add TensorFlow Native to VE Chapter
  • #452 [DNNL] Upgrade to 2.5.1
  • #451 [Core] Remove Tensor Class
  • #450 [Docs] Correct VDim case with #2 == 5
  • #449 [Deprecated] YaalUpdate and YaalSeed
  • #448 [VDIMS] option to enable/disable VDIM usage, disable by default.
  • #447 [DFP] Fix BatchNorm BWD Pass
  • #446 [DFP] Flag Loops where DataStride() == 1 for ALL data to be used for SIMD
  • #445 [PyTorch] torch.reshape missing
  • #444 [Core] Print NN Input/Output Signature
  • #443 [PyTorch] argmin + argmax missing
  • #442 [TF] wrong results in activation layers
  • #439 [PIP] Enable manylinux2014 compatible build process
  • #437 [DNNL] Upgrade 2.4.4
  • #436 nec-sol: download.pytorch.org may also need to be trusted
  • #435 nec-sol: codec can't display license agreement
  • #434 upon program exit, malloc(): unsorted double linked list corrupted
  • #433 [TF] Enable to parse tf.Module and tf.saved_models
  • #432 [TF] Upgrade to 2.6.1
  • #431 [PyTorch] torch.rand* and torch.randint* missing
  • #430 [Pytorch] Enable Models without inputs
  • #428 [PyTorch] Add Handler option
  • #427 [TensorFlow] Add Handler option
  • #426 [HLIR] Add Custom Layer Support
  • #423 [PyTorch] Possible Segfault in Py_InfNAN on X86
  • #422 [Core] Add Option to register Operations at runtime
  • #421 [Core] Missing library libcrypto.so.10 on Ubuntu 20.04
  • #420 [PyTorch] Debug v1.10.0 print problems with native-ve
  • #419 [PyTorch] Upgrade RNN API to new HLIR version
  • #418 [CUDNN] Upgrade to 8.2.4
  • #417 [CUDA] Upgrade to 11.3
  • #416 [PyTorch] Upgrade to v1.10.0
  • #415 [CORE] use SQLite Transactions to lock SOL Cache in multiple processes
  • #414 [DNNL] Upgrade to 2.4.2
  • #413 [VE] replace sol_ve_copy with VEDAMemset and VEDAMemcopy
  • #412 [VEDA] Add device side memset
  • #411 [VE] Add ComplexDTypes to Device API
  • #410 [TBB] CMake install script fails on first run.
  • #408 [VEDA] memset 128
  • #407 [VEDA] CMake ASL FFTW
  • #401 [DNNL] Upgrade to 2.4.1
  • #399 [NCC] Add Struct Functions
  • #398 [X86] Check Handle::memset performance!
  • #397 [VEDA] improve S8, S16 vedaMemset Performance
  • #396 [TBB] Upgrade to 2021.4.0
  • #395 [DNNL] Upgrade to 2.4
  • #394 [PyTorch] torch.complex missing
  • #393 [PyTorch] torch.imag missing
  • #391 [PyTorch] sol.optimize(model, torch.Tensor) stopped working?
  • #389 [NEC-SOL] automatically detect VENV
  • #388 [PyTorch] use torch.jit.trace instead of torch.jit.script to support Transformers
  • #387 [HLIR/DFP] Change Dropout to always use input rand
  • #386 [NEC-SOL] UTF-8 encoding problem
  • #385 [DFP] Bad Loop Merging in multi-path kernels
  • #384 [DFP] Wrong vectorization in Bias Sum Case
  • #383 [YAAL] Improve Scheduling
  • #382 [PyTorch] Upgrade to 1.9.1
  • #381 [Tests] Restructure TESTS package, so the PyTorch package does not load TF and vice versa.
  • #379 [NNPACK] Deprecate
  • #376 [ISPC] Check OpenMP implementation performance for small batchsizes
  • #375 [TensorFlow] Upgrade to 2.6.0
  • #364 [DFP] Fix Tile backward pass for X86
  • #363 [PyTorch] aten::repeat missing
  • #362 [PyTorch] aten::tile missing
  • #360 [PyTorch] aten::smooth_l1_loss
  • #355 [PyTorch] HuggingFace transformers broken
  • #332 [TF] Enabled Delayed Allocation in Native-VE
  • #329 [PyTorch] Add BatchNorm num_batches_tracked
  • #321 [API] change sol_external_malloc to use shapes instead of accumulated sizes
  • #307 [HLIR] Input > Dropout > Output produce wrong results in Inference
  • #306 [TF] tf.nn.max_pool_with_argmax returns wrong indices
  • #302 [PyTorch] Use VE instead of HIP device
  • #299 [PyTorch] Slicing with [0:0] should return empty Tensor
  • #293 [HLIR] Constant with more than 1 element
  • #289 [TF] Remove sol_model.convert("device")
  • #285 [DFP] Remove Nests
  • #284 [DFP] Implement Linear
  • #271 [TensorFlow] enable lazy allocations
  • #267 [Core] Remove OptimizerType
  • #259 [IgorNet] Can't run Inference (division by zero)
  • #239 [PyTorch] add torch.nn.functional.interpolate
  • #225 [DNNL] Upgrade to 2.3.2
  • #212 [HLIR] SIGN on Unsigned is 1
  • #211 [HLIR] Abs on Unsigned is Noop
  • #208 [PyTorch] some networks can't use variable BatchSize
  • #203 [DFP] Buffered Dropout >> Narrow allocates too little memory
  • #202 [Compiler] Show Progress Bar already during Code generation Phase
  • #199 [HLIR] Detect Permutations in front of GEMM and merge them into the GEMM by changing the layout.
  • #198 [HLIR] Initialize Gradients with 0 and make all multi-connections to REQURIESACC
  • #193 [PyTorch] "Only one dimension for View can use summation." in ShuffleNet when using Variadic BatchSizes
  • #187 [DFP] BatchNorm does not support NOT to trace runtime metrics
  • #184 [PyTorch] No Gradient for inputs which get disconnected
  • #183 [PyTorch] LPPool Backward Pass results do not match
  • #182 [DFP] Split BatchNorm into two operations to better fit the actual computations and to remove the workarounds in DFP module
  • #177 [VEBLAS] Memory Estimation: RNN
  • #163 [PyTorch] torch.Tensor.repeat missing
  • #161 [PyTorch] Interpolation Layers missing
  • #157 [Autotuning] Device reports out of memory, as SOL does not use the framework's memory allocator during auto-tuning
  • #139 [Core] Enable to save/load SOL models
  • #135 [HLIR] Pytorchic BERT can't use variable batchsize because of the view?
  • #131 [DFP] segfault generating RReLU layer
  • #120 [DType] Add Complex DTypes
  • #117 [PyTorch] FFT
  • #81 [Variable BatchSize] GPT-2 can't use variable batchsize
  • #80 [Clusters] Erroneous accumulation in GPT-2 BWD Pass
  • #71 [DFP] missing layer: MaxUnPooling
  • #70 [DFP] LeakyRelu: Remove IF in generated code
  • #35 [X86] Does not show correct used memory