v0.7 Fafnir

v0.7.2 (24.04.2025)

Bugfix release adopting the compile flags required by PyTorch v2.7.0, which now uses the CXX11 ABI.
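
For reference, PyTorch itself reports which C++ ABI it was built with; a quick local check:

```python
# Check which C++ ABI the installed PyTorch wheel was built with;
# the official v2.7.0 wheels report True (CXX11 ABI).
import torch

print(torch.compiled_with_cxx11_abi())
```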

Closed Issues

  • #1785 [PyTorch] Fix module for v2.7.0, which requires CXX11 ABI

v0.7.1 (07.03.2025)

Bugfix release fixing library loading issues on some systems and improving compatibility with PyTorch v2.6.0.

Closed Issues

  • #1776 [PyTorch] Unable to find shape in tensor
  • #1775 [Tungl] Prevent SOL from loading /opt/nec/veos/lib64/libtungl.so.0
  • #1768 [PyTorch] Missing torch.ops.higher_order.autograd_function_apply

v0.7.0 (28.02.2025)

Highlights

  • NEC SX-Aurora (VE) training support. Please read the NEC SX-Aurora section for device-specific options; a usage sketch follows this list.
  • Better control over model determinism and performance.
  • Performance optimizations for X86, NVIDIA and NEC SX-Aurora devices.
  • Improved Gather/Scatter implementations that now support advanced slicing modes, e.g., torch.index_put.
  • Improved automatic VDim detection.
  • Improved low precision handling (on supported devices).
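
To illustrate the new training support together with the extended Gather/Scatter handling, here is a minimal sketch, not an official example. It assumes only the public sol.optimize entry point (referenced in the issues below); the Model class and tensor shapes are illustrative, and device selection is omitted, as the actual device-specific options are described in the NEC SX-Aurora section.

```python
# Minimal sketch: compile a small PyTorch model with SOL and run one
# training step. VE device selection is omitted; see the NEC SX-Aurora
# section of the docs for the device-specific options.
import torch
import sol

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x, idx, values):
        x = self.linear(x)
        # advanced slicing mode handled by the improved Gather/Scatter
        # implementations, e.g. torch.index_put
        return x.index_put((idx,), values)

model = Model()
opt = sol.optimize(model)        # compile the model with SOL

x      = torch.randn(8, 16)
idx    = torch.tensor([0, 3, 5])
values = torch.zeros(3, 16)

out  = opt(x, idx, values)       # forward pass
loss = out.sum()
loss.backward()                  # backward pass: training is now supported
```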

Closed Issues

  • #1754 [Wrapper] Enable Non-Output Models
  • #1746 [DFP] Unable to subtract (802816 * #0 ~ 802816) from (0 + (1024 * #0 ~ 1024)) in debugMemory
  • #1744 [DFP] Faulty Scheduling
  • #1742 [DFP] Performance
  • #1741 [C-API] Remove Tungl dependency
  • #1740 [PyTorch] Wrong result torch.Tensor.scatter_add_
  • #1739 [PyTorch] Enable sol.optimize(model.member_function)
  • #1737 [TF] Fix cloning of Keras Legacy layers
  • #1736 [HLIR] Fix Pooling::mergePadding for MaxPooling
  • #1735 [NVIDIA] Add API to increase/fetch/free a Handle maintained Workspace.
  • #1733 [Keras] Add _keras_logits to SoftMax and Sigmoid
  • #1732 [CUDNN] Performance Regression in DeConv/AlexNet
  • #1731 [CUBLAS] Autotune if using TF32 is better or not and let SOL adjust the determinism accordingly!
  • #1730 [JSON] Compress DType
  • #1729 [DFP] Faulty broadcasting in Gather case
  • #1728 [CUDNN] add v9 bias/postops
  • #1726 [Compiler/DFP] add `gen::Func post` to all renderSize/Shape methods to allow easier `uniform` dtypes
  • #1723 [PyTorch] Set sol constants as register_buffer(..., persistent=False) if available
  • #1719 [DFP] Slow Gather/Scatter
  • #1717 [Toolchain] Downgrade GCC to 10 to resolve libstdc++ issues
  • #1716 [PyTorch] torch.nanmean and torch.nansum
  • #1715 [HLIR] Remove Buffer -> Buffer without any other layers in between
  • #1714 [HLIR] Remove Gather with 0-sizes indices
  • #1713 [DNNL] Manylinux2.28 compiled DNNL requires libsupc++
  • #1711 [PyTorch] wrong scatter input/indices dimensions
  • #1709 [HLIR] SegFault BatchNormUpdate when Momentum == 1.0
  • #1707 [DNN] Make the Conv+BatchNorm Inference also work on VE and NVIDIA
  • #1706 [TF] Accuracy RegNet
  • #1705 [TF] Accuracy ResNetRS
  • #1704 [Pooling] Precompute values for Pooling::initOutputDims, so we can pass them to MergePooling
  • #1703 [TF] Unable to broadcast in sol_hlir_batch_norm_update
  • #1702 [TF] Expected shape (None, 112, 112, 64) but found (Dim(#0) ~= 32, 109, 109, 64) in DenseNet121
  • #1700 [HLIR] Prevent Outputs from being used for Persistent Copies
  • #1699 [Numpy] AttributeError: `np.mat` was removed in the NumPy 2.0 release. Use `np.asmatrix` instead.
  • #1698 [ToolChain] Check why some Python packages do not get installed in Docker
  • #1697 [ToolChain] Upgrade to newer ManyLinux as LLVM-VE fails on manylinux2014 container
  • #1696 [TF] BCE nan in training bs=10
  • #1695 [TF] keras.applications.ResNet50 error while parsing
  • #1694 [TF] TFRenderer broken
  • #1693 [TF] SOL does not return identical Keras Config
  • #1691 [DFP] Wrong SIMD Loop Collapsing
  • #1689 [DFP] InitAccessor Unmergeable not correctly working
  • #1688 [PyTorch] Initializing zero strided parameters from Numpy fail
  • #1686 [HLIR] Remove AxisOp and instead encode in Dims
  • #1685 [DFP] Incomplete loop fusion in ConvNext INF BS32
  • #1682 [JsonCPP] Disable Exceptions
  • #1681 [CURAND] torch.randint changed in 2.5?
  • #1679 [VEASL] Implement MT19937
  • #1678 [DNN] Evaluate if checking for min or avg is better
  • #1677 [DFP] Split LoopStacks if temporary data cannot be kept in on-chip memory
  • #1676 [YAAL] Simplify Cluster Calling Convention
  • #1675 [YAAL] Reduce spiking memory consumption
  • #1673 [HLIR] SDPAttention add EnableGQA
  • #1672 [DFP] Error in MaxPool2d training in PyTorch
  • #1671 [DFP] Wrong Clustering
  • #1670 [DFP] WeightedPooling DeConv seems to be broken e.g. in ShuffleNet training
  • #1668 [API] Enable user specified options to be passed on from sol.optimize to sol::compiler::Compiler
  • #1667 [VE] Can't install v0.6.1 with VEDA 2.2
  • #1666 [CMake] Fix make clean that deletes output folders for docs target
  • #1665 [PyTorch] Use fx.symbolic_trace instead of JIT script if possible
  • #1664 [Profiler/Runtime] Add Profiler Annotations to Runtime Hash
  • #1663 [TF] Issue when Masking(Masking(...))
  • #1659 [TF] Wrong Gradients for ReduceMax/Min
  • #1658 [TF] Error having View before Output in training
  • #1657 [License] Fix Date Format
  • #1656 [HLIR] Consider new DType encoding
  • #1655 [ISPC] Evaluate if `bool ? 1 : 0` or `!!bool` is faster when casting bool to float
  • #1654 [HLIR] Incomplete Cluster Fusion
  • #1649 [HLIR] Add 'isRematerializable' to all deviceCopy Operations
  • #1648 [PyTorch] Reduce memory consumption in graph broken training
  • #1647 [YAAL] Fix Memory Leak of Persistent data not being freed in bwd pass
  • #1646 [PyTorch] Allow A.fwd, B.fwd, B.bwd, A.bwd execution sequences
  • #1644 [PyTorch] Test v2.5.0
  • #1642 [Sleef] Evaluate if erf(x) is faster than 1-erfc(x) and vice versa
  • #1641 [RNN] Store workspace data in inputLayout format
  • #1640 [TF] Missing 0-th output from ...
  • #1639 [RNN] Compilation error on X86 using softmax activation
  • #1638 [TF] 'KeyError' when parsing LSTM network
  • #1637 [Wrapper] "Can't find Tensor XXX in CTX" error when using models with _keras_mask attribute.
  • #1636 [Profiler] Performance callbacks added without SOL Profiler being activated
  • #1635 [TF] Remove :0 from input argument names
  • #1634 [Optimizer] add wrapper attributes to printSignature
  • #1633 [VE] generate offloading wrapper library
  • #1632 [PyTorch] Support Dict, List and Tuple inputs in Script Parser
  • #1631 [NVIDIA] Enable to compile for multiple architectures
  • #1630 [VE] Improve Training
  • #1629 [YAAL] Find a way to properly free persistent data
  • #1628 [VE] Update rematerialization of sol_ctx on device
  • #1627 [VE] try catch does not catch exception correctly
  • #1626 [DNN/GEMM] Deprecated BNI*BNO and similar calls
  • #1625 [DNN/RNN] Backport to new GEMM derive API
  • #1624 [PyTorch] register SOL automatically in PyTorch using entry points
  • #1622 [Sleef] Upgrade 3.7
  • #1617 [DFP] Make LoopStack::leafStacks a view
  • #1616 [DFP] Fix unparallelizable outer loops
  • #1613 [HLIR] Deprecate Layer::remove
  • #1612 [DFP] Improve detection for "requires64Bit" in LoopAccessor
  • #1611 [HLIR/DNN] Remove GEMM::offsetB and GEMM::offsetC
  • #1610 [DFP] Race Condition in Scatter add axis==0
  • #1609 [HLIR] wrong Gradient "as_strided"
  • #1608 [HLIR] Gradient of some Gather is again a Gather, but with reversed indices
  • #1607 [DNN] Allow BatchCount to be broadcastable
  • #1606 [HLIR] Illegal View transformation of ... in LEDModel
  • #1605 [PyTorch] Gradients of torch.scatter_reduce and scatter_add are wrong
  • #1604 [HLIR] Gradients of Gathers are incorrect
  • #1603 [HLIR] Gradients of tensor assignment are incorrect
  • #1602 [HLIR] Transform Buffer->Copy->Buffer to Buffer->Buffer
  • #1601 [Docs] Add new features to documentation
  • #1600 [Algo] Don't store algos with VDims?
  • #1599 [Profiler] Output filename gets uppercasted
  • #1598 [VDims] Testcase gemm_perf compiles every single case, although VDims is activated
  • #1595 [HLIR] Remove Axes from IndicesBase
  • #1594 [PyTorch] Linspace
  • #1593 [HLIR] Remove offset, step and loopSize. Rename data* to *
  • #1590 [NCC] Check why NCC always enables profiler
  • #1589 [FFT] Make input copy a FFTW specific transformation
  • #1588 [ProgressBar] Progress Bar is still broken
  • #1587 [DNN] Some GEMM C kernels are wrong
  • #1586 [PyTorch] Improve handling of torch.SymInt
  • #1585 [PyTorch] FX Parser can duplicate parameters as it's not checking names properly
  • #1584 [HLIR] Encode Constant value in LayerOutput::defaultValue
  • #1583 [DFP] Allow constant LoopTypes
  • #1582 [HLIR] fix Issue_1208
  • #1581 [DFP] tensor[1, 2, 3] creates no-loop stack in bwd pass
  • #1580 [DFP] Avoid double broadcast e.g. in PY_Repeat
  • #1579 [DFP] AutoBroadcast
  • #1578 [DFP] Fix Lookup Check in WhereSelect
  • #1576 [DFP] WhereSelect
  • #1575 [DNN] Enable WhereSelect to be applied to specific dimension
  • #1573 [HLIR] Why are VDims removed in transformer 'sequences'
  • #1571 [Wrapper] Handle VDims initialization in Wrapper
  • #1570 [HLIR] Deprecate Dim::setDataSize-like methods and replace with editDataSize-like methods
  • #1569 [Numpy] Upgrade to 2.0 API
  • #1568 [PyTorch] Can we use Dynamo also for static models?
  • #1567 [HLIR] Unify Gather, Scatter, Roll, BufferOffset, BufferSlice, Reverse, Tile, ...
  • #1563 [DFP] PY_Roll computes wrong gradients when multiple rolls write to the same input
  • #1562 [DFP] roll(axis=None) causes no write loops in derive
  • #1559 [Docs] Make grey separator standard on all doc pages
  • #1558 [Installer] Use Version Padding for ~=
  • #1556 [OpenSSL] Upgrade to 3.x
  • #1553 [HLIR] Merge Gather
  • #1539 [VE] remove Handle deviceLists and allocate them instead in Module
  • #1521 [DFP] T2520_D0_Output is a register but shall use accessor in ...
  • #1515 [DFP] Correctly implement Grouped Conv
  • #1513 [NVIDIA] CUDA Graphs
  • #1509 [DFP] Don't collapse gather loops
  • #1506 [DFP] Bloom accuracy on AVX512
  • #1495 [Parser] Add numpy-Advanced Indexing style
  • #1479 [AutoTuner] Cache Algo's per session if they have constraints
  • #1460 [VDims] Enable GEMM vdims for channels
  • #1449 [PyTorch] Add Loss Functions to Parser/HLIR
  • #1444 [Profiler] Add D2H and H2D memcopies
  • #1434 [CUDNN] Graph API
  • #1414 [CUBLAS] Investigate cublasLT tuning options
  • #1401 [VE] Performance
  • #1390 [HLIR] Improve VDims for Views
  • #1352 [YAAL] Improve Error Handling
  • #1237 [HLIR] Faulty Clustering
  • #1173 [DFP] remove DFP::schedule and instead use the LoopFusion structure of DFP::optimizeLoops to determine execution schedule
  • #1141 [HLIR] GEMM optimization for i == 1 || o == 1 not working for backward pass weight-style GEMM
  • #1139 [HLIR] Remove Immediate Layer Fusion in HLIR
  • #1108 [Distributed] Changes required for multi-node distributed computing
  • #928 [DFP] Narrow, Repeat, Tile: unvectorized LoopStack found
  • #808 [API] Improve Error Messages
  • #786 [HLIR] allow tensor to be casted into complex tensor
  • #767 [DFP] Transform Cores to CoresSIMD, if the sub-SIMD don't share data through a Cache
  • #298 [PyTorch] Can't use sliced Tensor assignment
  • #288 [VEDNN] add static lib for deployment
  • #234 [Jupyter] Can we signal Jupyter when SOL has crashed?
  • #168 [HLIR] Enable Replay in HLIR