pagesIndex = [
{
	"uri": "/releases/v0.8.html",
	"title": "v0.8 Gacrux",
	"tags": [],
	"description": "",
	"content": " Version/DateChanges 17.04.2026v0.8.0Docs This is a feature update. It is possible that performance decreased compared to v0.7 release. We are focusing on performance improvements in upcoming releases.\nHighlights All sol.config[...] values can now also be set via env vars. The env name always starts with SOL_CONFIG_ followed by the config key, in upper case and :: replaced with _, e.g., sol.config[\"dfp::debug\"]=True becomes SOL_CONFIG_DFP_DEBUG=TRUE. For boolean values TRUE, ON or 1 can be used. Switched to intrinsics-like code generation for compute kernels. Added support for variable pixel dimensions of conv and pooling layers. Breaking Changes The default location for the SOL cache (.sol folder) is now at $HOME/.cache/sol. Please use the env var SOL_CWD=/path/to/cache to move the SOL cache if required. python3 -m sol fix-omp has been deprecated. Use nec-sol fix-omp instead! python3 -m sol-fix will be removed in v0.9! We decided to make no variable dimensions enabled by default for performance reasons. Use vdims=[...] to manually enable variable dimensions. Changing backend heuristics via sol.config[...] has been removed. Closed Issues #1943\t[PyTorch] Test v2.11.0 #1938\t[TF] Test v2.21.0 #1934\t[CUBLAS] Set RPATH for libcublas.so in CUBLAS-handle #1933\t[LicenseServer] Change .postVERSION to .postEXPIRATIONDATE #1931\t[DFP] Improve Constant Setting #1930\t[DFP] Improve mask sorting #1928\t[HLIR] Performance problem when using BatchNormConvFuse, if the weights are significantly larger than the input data #1927\t[DFP] atomicAdd(0.0) does not need to be executed! 
#1926\t[DFP] Wrong gradients for complex mul and div #1925\t[DFP] VE Intrinsics performance regression #1923\t[DFP] Inline constant computations #1921\t[HLIR] Add topk #1920\t[DFP] Add code path for CUDA atomics for float16/32 dtypes #1918\t[CUDNNv9] Returns CUDNN_STATUS_NOT_INITIALIZED with float16/bfloat16 but works with float32 #1916\t[DFP] Lookup boundary checks #1914\t[Keras] Keras.predict does not throw exception when wrong input dtype is detected #1913\t[License] Add .postX for every time a new license is issued, to prevent caching effects of expired license files. #1912\t[DFP] move autoSqueeze from lowerIR to Planner #1911\t[DFP] Unnecessary select for select(not(X), scalar(1), scalar(0)) #1910\t[DFP] Minimize register usage #1909\t[SQLite] Improve stability when SOL cache is on a network drive #1906\t[DFP] Investigate why PyTorch is faster on PY_Issue_1740 #1901\t[DFP] AllReduce performance optimization #1899\t[Cache] Add XDG_CACHE_HOME to determine ~/.cache #1896\t[PyTorch] Test v2.9.1 #1895\t[Core] Consider to move .sol to $HOME/.cache/sol as default #1890\t[Installer] Add support for credential files #1867\t[HLIR] ViewDType breaks sol_tensor_ptr_XXX(...) dtype check! 
#1854\t[PyTorch] ANEMOI parsing: torch.compile reports different DTypes than SOL #1851\t[Debug] Estimated Peak Memory differs from Memory Dump #1845\t[Installer] Keyring #1834\t[CUDNNv8] Remove SymPad and use explicit padding if leftPadding != rightPadding #1831\t[Installer] add --fix-omp option #1819\t[VE] VEDA_ERROR_CANT_FREE_NON_DELAYED_VPTR in TF_Dropout using VEO mode #1812\t[HLIR] Enable Conv with vdims in pixel dimensions #1797\t[JIT] Use SQL lock to mutex compilation of handle/modules in parallel processes #1795\t[VE] VEDA_ERROR_UNKNOWN_VEDA_ARCHITECTURE thrown in initVEDA if no VE and no VEOS is present in system #1782\t[DFP] Split nested loops that underutilize the vectors #1781\t[Python] SOL might use wrong Python #1780\t[Installer] Project-URL is wrong, should be v0.X not v0.X.Y #1778\t[HLIR] Remove returnCarry from PrefixSum, instead handle like in MaxPool #1777\t[Utils] cleanup file paths #1774\t[DNN] Rework Handlers #1773\t[DNNL] Report DNNL Workspace Size #1771\t[YAAL] Workspaces #1770\t[CUDNN] Fix CUDA Graphs when using CUDNN #1769\t[TF] sol.optimize example inputs support #1767\t[Python] \"There is a newer version of SOL available\" detects post installers as NEW SOL version #1764\t[DFP] Deprecate LoopStack::sort #1763\t[DFP] Improve implementation of CoresSIMD -\u003e SIMD #1757\t[DNN] Can we remove sol_ctx from the constructor of DeviceMap? #1734\t[OMP] Remove Cost Model #1710\t[Macros] Unify SOL_HAS_FUNC and SOL_HAS_FUNC2 #1662\t[VLLM] Remove support #1615\t[HLIR] Implement Interpolate as Gather? #1438\t[DFP] AllReduce #919\t[HLIR] Merge two Reorder's with different shape sizes #472\t[X86, VE, NVIDIA] Reverse PrefixSum "
},
{
	"uri": "/releases/v0.7.html",
	"title": "v0.7 Fafnir",
	"tags": [],
	"description": "",
	"content": " Version/DateChanges 14.11.2025v0.7.13Docs Closed Issues #1907\t[HLIR] Error in torch.scatter_reduce_ using VDIMS 12.11.2025v0.7.12Docs Closed Issues #1905\t[SQLite] Add env vars to control SQLITE config #1904\tToo restrictive assertion in scatter mode::Reduce #1900\t[PyTorch] double registration to torch.compile 27.10.2025v0.7.11Docs Highlights [Experimental] Significantly improved VDims system, to enable more dynamic models. [Experimental] Added support for torch.compile(..., dynamic=True) and torch.fx symbolic operators. [Experimental] Added PyTorch graph breaks, to enable using torch.distributed calls within the model. Closed Issues #1888\t[PyTorch] Test v2.9.0 #1887\t[DNN] Stop GEMM autotuning if other solution was found that is significantly faster #1886\t[HLIR] Don't use generator_device for hash if there are Input or Param #1885\t[PyTorch] set torch._dynamo.config.allow_rnn=True #1884\t[VE] Reduce malloc overhead for \u003e4GB allocations #1883\t[DNN] Don't autotune GEMM if dims are unknown #1882\t[HLIR] Add VDim support to arange #1881\t[PyTorch] Evaluate torch.compile(..., dynamic=True) #1880\t[HLIR] Improve graph breaking and detection of identical sub-graphs #1879\t[HLIR] Transform Split(Indices, 2) into 2x Indices #1878\t[PyTorch] Investigate why multiple runs often lead to unstable hash generation #1877\t[Runtime] Share offloaded tensors across multiple instances of runtime::Network #1876\t[PyTorch] Evaluate torch._dynamo.config.capture_scalar_outputs = True #1873\t[Keras] efficientnetb0 \"Cannot modify VDims at this point anymore!\" #1872\t[TransparentOffload] Store exactly which tensors need to be copied back #1870\t[DFP] WeightConv broken in VE #1869\t[HLIR] Chunk can remove DimSym if the inputs get changed to static dims #1868\t[VEDNN] Segfault vednnConvolutionBackwardFilter with 1x1 kernel, 3in, 3out and stride==2 #1866\t[VE] PyTorch accuracy #1861\t[PyTorch] Gradient Accuracy #1860\t[Runtime] Set host_handle/device_handle for output 
in NetworkRenderer and remove from module.cpp #1859\t[TensorFlow] Test v2.20.0 #1857\t[Runtime] Simplyfy sol_ctx #1856\t[CUDA] libcudart.so requires to link to -ldl in NVIDIA HPC CUDA Toolkit #1855\t[CUDA] Problems finding CUDA toolkit with NVIDIA HPC package #1849\t[PyTorch] einops.einops.reduce #1839\t[PyTorch] Split torch.compile models #1765\t[VE] ANEMOI using no-autotuning is faster on VE than with autotuning #1701\t[HLIR/YAAL/Wrapper] Add \"clone\" operator. #1292\t[PyTorch] torch.compile(...) does not pass models containing RNN layers to custom compilers #90\t[VEDNN] possible error in conv2d_bwd_filter in SqueezeNet 1.0 01.08.2025v0.7.10Docs Closed Issues #1853\t[DFP] allocates too much data for intermediate tensors #1852\t[VE] Report out-of-memory #1850\t[DEBUG] Add stack traces to original code if available 30.07.2025v0.7.9Docs Closed Issues #1846\t[PyTorch] add \"amin\" and \"amax\" to torch.scatter_reduce 28.07.2025v0.7.8Docs Highlights Added more PyTorch operators Enabled usage of VDims within Concat and Split Experimental VDims recommendation system that automatically sets the batch dimension to be a vdim in CNNs, and the batch and sequence dimensions in RNNs. 
Closed Issues #1843\t[PyTorch] Investigate Transformer VMap cases that don't work properly #1842\t[PyTorch] torch.where(condition) #1840\t[PyTorch] Improve VDims in Split #1837\t[PyTorch] torch.nonzero #1836\t[PyTorch] torch.unique[_consecutive] #1835\t[PyTorch] torch.sort #1833\t[ISPC] Issue with integer postfix #1832\t[HLIR] Add new VDim Recommendation system #1809\t[HLIR] Enable Concat with VDim inputs #1500\t[PyTorch] torch.amax 10.07.2025v0.7.7Docs Closed Issues #1830\t[Keras v2] mixed_float16 issue #1827\t[HLIR] \"Implementation Error\" in sol::compiler::hlir::functional(sol::compiler::LayerOutput*, sol::compiler::Functional::Type) #1825\t[TF/CUDA] out of memory during training #1824\t[CUDA] Error nvcuda::wmma::precision::tf32 on CC == 7.5 #1817\t[CUDNN] Add CUDNN v8 backend, to enable Conv in older TF versions 09.07.2025v0.7.6Docs Highlights Added support for Keras v3, enabling TensorFlow ≥ 2.16. Improved performance when using sol.device.set(...) and torch.compile(...) by reducing memcopy overhead. 
Closed Issues #1823 [VE] NCC vectorized DFP for loop that is marked as \"non vectorizable\" #1821 [PyTorch] Wrong gradients for MaxUnPooling #1818 [HLIR] fix wrong integer postfixes #1816 [Python] Deprecated sol.backends API #1815 [TF] Problem using transparent offloading in predict or fit #1814 [Keras] stateless random layers #1813 [PyTorch] Changed random operators in v2.7 #1811 [Keras v3] Keras Applications DenseNet wrong moving mean/variance in training #1810 [HLIR/DNN] Mixed Precision RNN #1808 [WHL] Broken nec-sol-dist-ve-vednn dependency #1806 [CUDNN] Training on ShuffleNet can't be executed on NVIDIA GPUs #1804 [TF] Wrong gradients in RNN #1803 [TF] Can't execute model as static vdims can't be multiplied using None * None #1802 [TF] New Keras Parser produces wrong random numbers with seed=None #1799 [Docs] Add CUDA/CUDNN requirements #1798 [SQL] TRANSACTIONS cause significant waiting times in other processes #1796 [CUBLAS] SOL DNN CUBLASlt requires at least CUDA \u003e= 11.8 #1789 [PyTorch] store torch.nn.Parameter in ctx.params if using torch.compile to reduce memcpy overhead in transparent offload #1755 [TF] TF_Activation fails because of _keras_logits #1392 [TF] Keras v3 support (TF \u003e v2.15) 05.06.2025v0.7.5Docs Bugfix release for Keras RNN parsing; also adds TF Einsum.\nClosed Issues #1794 [TF] add Einsum #1793 [TF] Parsing error when Dropout -\u003e RNN #1792 [Docs] Add iWAPT presentation 23.05.2025v0.7.4Docs Bugfix release to relax dist dependency requirements.\nClosed Issues #1787 [WHL] Relax dist dependencies to allow 1.0.post1 like bugfixes 19.05.2025v0.7.3Docs Bugfix release adding PadV2 handler to TensorFlow.\nClosed Issues #1786 [TF] Missing PadV2 handler 24.04.2025v0.7.2Docs Bugfix release updating compile flags for the PyTorch v2.7.0 integration.\nClosed Issues #1785 [PyTorch] Fix module for v2.7.0, which requires CXX11 ABI 07.03.2025v0.7.1Docs Bugfix release fixing library loading issues on some systems and improving compatibility with 
PyTorch v2.6.0.\nClosed Issues #1776 [PyTorch] Unable to find shape in tensor #1775 [Tungl] Prevent SOL from loading /opt/nec/veos/lib64/libtungl.so.0 #1768 [PyTorch] Missing torch.ops.higher_order.autograd_function_apply 28.02.2025v0.7.0Docs Highlights NEC SX-Aurora (VE) training support. Please read the NEC SX-Aurora section for device specific options. Better control over model determinism and performance. Performance optimizations for X86, NVIDIA and NEC SX-Aurora devices. Improved Gather/Scatter implementations that now support advanced slicing modes, e.g., torch.index_put. Improved automatic VDim detection. Improved low precision handling (on supported devices). Closed Issues #1754 [Wrapper] Enable Non-Output Models #1746 [DFP] Unable to subtract (802816 * #0 ~ 802816) from (0 + (1024 * #0 ~ 1024)) in debugMemory #1744 [DFP] Faulty Scheduling #1742 [DFP] Performance #1741 [C-API] Remove Tungl dependency #1740 [PyTorch] Wrong result torch.Tensor.scatter_add_ #1739 [PyTorch] Enable sol.optimize(model.member_function) #1737 [TF] Fix cloning of Keras Legacy layers #1736 [HLIR] Fix Pooling::mergePadding for MaxPooling #1735 [NVIDIA] Add API to increase/fetch/free a Handle maintained Workspace. #1733 [Keras] Add _keras_logits to SoftMax and Sigmoid #1732 [CUDNN] Performance Regression in DeConv/AlexNet #1731 [CUBLAS] Autotune if using TF32 is better or not and let SOL adjust the determinism accordingly! 
#1730 [JSON] Compress DType #1729 [DFP] Faulty broadcasting in Gather case #1728 [CUDNN] add v9 bias/postops #1726 [Compiler/DFP] add `gen::Func post` to all renderSize/Shape methods to allow easier `uniform` dtypes #1723 [PyTorch] Set sol constants as register_buffer(..., persistent=False) if available #1719 [DFP] Slow Gather/Scatter #1717 [Toolchain] Downgrade GCC to 10 to resolve libstdc++ issues #1716 [PyTorch] torch.nanmean and torch.nansum #1715 [HLIR] Remove Buffer -\u003e Buffer without any other layers in between #1714 [HLIR] Remove Gather with 0-sizes indices #1713 [DNNL] Manylinux2.28 compiled DNNL requires libsupc++ #1711 [PyTorch] wrong scatter input/indices dimensions #1709 [HLIR] SegFault BatchNormUpdate when Momentum == 1.0 #1707 [DNN] Make the Conv+BatchNorm Inference also work on VE and NVIDIA #1706 [TF] Accuracy RegNet #1705 [TF] Accuracy ResNetRS #1704 [Pooling] Precompute values for Pooling::initOutputDims, so we can pass them to MergePooling #1703 [TF] Unable to broadcast in sol_hlir_batch_norm_update #1702 [TF] Expected shape (None, 112, 112, 64) but found (Dim(#0) ~= 32, 109, 109, 64) in DenseNet121 #1700 [HLIR] Prevent Outputs to be used for Persistent Copies #1699 [Numpy] AttributeError: `np.mat` was removed in the NumPy 2.0 release. Use `np.asmatrix` instead. 
#1698 [ToolChain] Check why some Python packages do not get installed in Docker #1697 [ToolChain] Upgrade to newer ManyLinux as LLVM-VE fails on manylinux2014 container #1696 [TF] BCE nan in training bs=10 #1695 [TF] keras.applications.ResNet50 error while parsing #1694 [TF] TFRenderer broken #1693 [TF] SOL does not return identical Keras Config #1691 [DFP] Wrong SIMD Loop Collapsing #1689 [DFP] InitAccessor Unmergeable not correctly working #1688 [PyTorch] Initializing zero strided parameters from Numpy fail #1686 [HLIR] Remove AxisOp and instead encode in Dims #1685 [DFP] Incomplete loop fusion in ConvNext INF BS32 #1682 [JsonCPP] Disable Exceptions #1681 [CURAND] torch.randint changed in 2.5? #1679 [VEASL] Implement MT19937 #1678 [DNN] Evaluate if checking for min or avg is better #1677 [DFP] Split LoopStacks if temporary data cannot be kept in on-chip memory #1676 [YAAL] Simplify Cluster Calling Convention #1675 [YAAL] Reduce spiking memory consumption #1673 [HLIR] SDPAttention add EnableGQA #1672 [DFP] Error in MaxPool2d training in PyTorch #1671 [DFP] Wrong Clustering #1670 [DFP] WeightedPooling DeConv seems to be broken e.g. in ShuffleNet training #1668 [API] Enable user specified options to be passed on from sol.optimize to sol::compiler::Compiler #1667 [VE] Can't install v0.6.1 with VEDA 2.2 #1666 [CMake] Fix make clean that deletes output folders for docs target #1665 [PyTorch] Use fx.symbolic_trace instead of JIT script if possible #1664 [Profiler/Runtime] Add Profiler Annotations to Runtime Hash #1663 [TF] Issue when Masking(Masking(...)) #1659 [TF] Wrong Gradients for ReduceMax/Min #1658 [TF] Error having View before Output in training #1657 [License] Fix Date Format #1656 [HLIR] Consider new DType encoding #1655 [ISPC] Evaluate if `bool ? 
1 : 0` or `!!bool` is faster when casting bool to float #1654\t[HLIR] Incomplete Cluster Fusion #1649 [HLIR] Add 'isRematerializable' to all deviceCopy Operations #1648 [PyTorch] Reduce memory consumption in graph broken training #1647 [YAAL] Fix Memory Leak of Persistent data not being freed in bwd pass #1646 [PyTorch] Allow A.fwd, B.fwd, B.bwd, A.bwd execution sequences #1644 [PyTorch] Test v2.5.0 #1642 [Sleef] Evaluate if erf(x) is faster than 1-erfc(x) and vice versa #1641 [RNN] Store workspace data in inputLayout format #1640 [TF] Missing 0-th output from ... #1639 [RNN] Compilation error on X86 using softmax activation #1638 [TF] 'KeyError' when parsing LSTM network #1637 [Wrapper] \"Can't find Tensor XXX in CTX\" error when using models with _keras_mask attribute. #1636 [Profiler] Performance callbacks added without SOL Profiler being activated #1635 [TF] Remove :0 from input argument names #1634 [Optimizer] add wrapper attributes to printSingnature #1633 [VE] generate offloading wrapper library #1632 [PyTorch] Support Dict, List and Tuple inputs in Script Parser #1631 [NVIDIA] Enable to compile for multiple architectures #1630 [VE] Improve Training #1629 [YAAL] Find a way to properly free persistent data #1628 [VE] Update rematerialization of sol_ctx on device #1627 [VE] try catch does not catch exception correctly #1626 [DNN/GEMM] Deprecated BNI*BNO and similiar calls #1625 [DNN/RNN] Backport to new GEMM derive API #1624 [PyTorch] register SOL automatically in PyTorch using entry points #1622 [Sleef] Upgrade 3.7 #1617 [DFP] Make LoopStack::leafStacks a view #1616 [DFP] Fix unparallizable outer loops #1613 [HLIR] Deprecate Layer::remove #1612 [DFP] Improve detection for \"requires64Bit\" in LoopAccessor #1611 [HLIR/DNN] Remove GEMM::offsetB and GEMM::offsetC #1610 [DFP] Race Condition in Scatter add axis==0 #1609 [HLIR] wrong Gradient \"as_strided\" #1608\t[HLIR] Gradient of some Gather is again a Gather, but with reversed indices #1607 [DNN] Allow 
BatchCount to be broadcastable #1606 [HLIR] Illegal View transformation of ... in LEDModel #1605\t[PyTorch] Gradients of torch.scatter_reduce and scatter_add are wrong #1604 [HLIR] Gradients of Gathers are incorrect #1603 [HLIR] Gradients of tensor assignment are incorrect #1602 [HLIR] Transform Buffer-\u003eCopy-\u003eBuffer to Buffer-\u003eBuffer #1601 [Docs] Add new features to documentation #1600 [Algo] Don't store algos with VDims? #1599 [Profiler] Output filename gets uppercasted #1598 [VDims] Testcase gemm_perf compiles every single case, although VDims is activated #1595 [HLIR] Remove Axes from IncdicesBase #1594 [PyTorch] Linspace #1593 [HLIR] Remove offset, step and loopSize. Rename data* to * #1590 [NCC] Check why NCC always enables profiler #1589\t[FFT] Make input copy a FFTW specific transformation #1588 [ProgressBar] Progress Bar is still broken #1587 [DNN] Some GEMM C kernels are wrong #1586 [PyTorch] Improve handling of torch.SymInt #1585 [PyTorch] FX Parser can duplicate parameters as it's not checking names properly #1584 [HLIR] Encode Constant value in LayerOutput::defaultValue #1583 [DFP] Allow constant LoopTypes #1582 [HLIR] fix Issue_1208 #1581 [DFP] tensor[1, 2, 3] creates no-loop stack in bwd pass #1580 [DFP] Avoid double broadcast e.g. in PY_Repeat #1579 [DFP] AutoBroadcast #1578\t[DFP] Fix Lookup Check in WhereSelect #1576 [DFP] WhereSelect #1575 [DNN] Enable WhereSelect to be applied to specific dimension #1573 [HLIR] Why do VDims removed in transformer 'sequences' #1571 [Wrapper] Handle VDims initialization in Wrapper #1570 [HLIR] Deprecate Dim::setDataSize-like methods and replace with editDataSize-like methods #1569\t[Numpy] Upgrade to 2.0 API #1568 [PyTorch] Can we use Dynamo also for static models? #1567 [HLIR] Unify Gather, Scatter, Roll, BufferOffset, BufferSlice, Reverse, Tile, ... 
#1563\t[DFP] PY_Roll computes wrong gradients, when multiple roll write to the same input #1562\t[DFP] roll(axis=None) causes no write loops in derive #1559 [Docs] Make grey separator standard on all doc pages #1558 [Installer] Use Version Padding for ~= #1556\t[OpenSSL] Upgrade to 3.x #1553 [HLIR] Merge Gather #1539\t[VE] remove Handle deviceLists and allocate them instead in Module #1521 [DFP] T2520_D0_Output is a register but shall use accessor in ... #1515 [DFP] Correctly implement Grouped Conv #1513 [NVIDIA] CUDA Graphs #1509 [DFP] Don't collapse gather loops #1506 [DFP] Bloom accuracy on AVX512 #1495 [Parser] Add numpy-Advanced Indexing style #1479 [AutoTuner] Cache Algo's per session if they have constraints #1460 [VDims] Enable GEMM vdims for channels #1449 [PyTorch] Add Loss Functions to Parser/HLIR #1444 [Profiler] Add D2H and H2D memcopies #1434\t[CUDNN] Graph API #1414 [CUBLAS] Investigate cublasLT tuning options #1401 [VE] Performance #1390 [HLIR] Improve VDims for Views #1352 [YAAL] Improve Error Handling #1237 [HLIR] Faulty Clustering #1173 [DFP] remove DFP::schedule and instead use the LoopFusion structure of DFP::optimizeLoops to determine execution schedule #1141 [HLIR] GEMM optimization for i == 1 || o == 1 not working for backward pass weight-style GEMM #1139 [HLIR] Remove Immediate Layer Fusion in HLIR #1108 [Distributed] Changes required for multi-node distributed computing #928 [DFP] Narrow, Repeat, Tile: unvectorized LoopStack found #808 [API] Improve Error Messages #786 [HLIR] allow tensor to be casted into complex tensor #767 [DFP] Transform Cores to CoresSIMD, if the sub-SIMD don't share data through a Cache #298 [PyTorch] Can't use sliced Tensor assignment #288\t[VEDNN] add static lib for deployment #234 [Jupyter] Can we signal Jupyter when SOL has crashed? #168 [HLIR] Enable Replay in HLIR "
},
{
	"uri": "/releases/v0.6.html",
	"title": "v0.6 Electra",
	"tags": [],
	"description": "",
	"content": " Version/DateChanges 29.07.2024v0.6.1Docs Closed Issues #1566 [Installer] add nec-sol --version #1565 [Installer] Handle conflicting framework dependencies #1564 [Installer] Handle local version strings for veda-pytorch #1560 [VE] Numerical instability in Sigmoid 27.07.2024v0.6.0Docs Highlights Added experimental support for vLLM. Added experimental support for CUDAGraphs in PyTorch. Added BFloat16 and Float16 support for X86 and NVIDIA. Added FlashAttn-like kernel fusion for X86, NVIDIA and SX-Aurora. Improved torch.compile(...) integration. SOL no longer aborts execution but properly throws Python exceptions that can be catched using try: ... except sol.Exception as e: ... Breaking Changes sol.config['compiler::profile'] has been deprecated. Use env var SOL_PROFILER instead Known Issues No BFloat16 and Float16 support for SX-Aurora. Performance regressions on SX-Aurora (e.g., ConvNext). No gradient computations yet for Interpolation layers. Closed Issues #1555 [NVIDIA] NAN in Albert model #1554 [PyTorch] Scriptparser uses deprecated imp package #1550 [PyTorch] torch.cross #1548 [PyTorch] Unknown at::device #1547 [VDims] Can't use the same vidx twice! 
(128 * #2 * #2) #1546 [VE] User specified an unsupported autocast device_type 've' #1545 [BuildChain] CentOS7 mirrors are deprecated, we might need to upgrade to manylinux_2_28 #1544 [Python] Destroy sol_optimizer* when error is thrown #1543 [VE] Invalid results when large values get passed to exp #1542 [Core] Uncaught sol::Exception when autotuning triggers an assertion #1540 [VE] PY_Issue_1410 fails #1538 [Python] Allow determinism to be overwritten by user #1537 [PyTorch] Finalize Determinism #1536 [ISPC] Make Prefixsum as template #1535 [DNN] AutoTuner Cross GEMM performance #1534 [PyTorch] torch.rand/rand_like F16/BF16 on X86 #1532 [Wrapper] Don't call sol_call if graph does not contain any nodes #1530 [DNNL] PostOp Bias #1529 [Core] Improve Exception Handling within TF #1528 [PyTorch] aten::extend #1527 [ONNX] change s_tensors to scope, and store s_opset in scope #1526 [ONNX] LRN #1525 [ONNX] Hub Tests #1520 [DFP] No Write/Read loops in 376:628 #1519 [PyTorch] Can't expand [1, 3, 80, 80, 2] to [80, 80, 2] in YOLO perf run #1514 [VE] Finalize VE3 support #1512 [DNN] Add GEMM upcasting API to support e.g. 
FP16/BF16 on CPU/VE #1508 [Transformers] Persimmon/Qwen2/XLMRobertaXL Accuracy #1507 [TIMM] Upgrade to v1.0.7 #1505 [VE] Set NCC -stdlib=compat #1504 [Parser] Implement lazy Numpy eval #1503 [DFP] Conv Accuracy #1502 [DNNL] Enable F16 and BF16 #1501 [DFP] Tanh(large number) == nan #1498 [NVIDIA] Limit usage of TensorCores to suitable shapes #1497 [ISPC] Unable to store varying in uniform var #1496 [TIMM] fix models with lit != idims.end() error #1494 [HLIR] Add Constraint Registry that allows to serialize them #1493 [DNNL] Upgrade 3.5 #1492 [PyTorch] Unify Script and FX parser #1490 [SLEEF] Upgrade v3.6.1 #1489 [PyTorch/Lightning] Update Testcase to new API #1487 [PyTorch] Test v2.3.1 #1483 [DFP] AccessorPinning fails in PY_REDUCE training #1481 [PyTorch] HuggingFace LLama: This model uses variable number of arguments, falling back to torch.jit.trace #1480 [DFP] CUDA pow(-0.5, 2.0) results in NaN #1477 [DNN] Add timeouts for GEMM and Transpose AutoTune #1476 [DFP] remove stack_alloc calls #1475 [AutoTuner] Reneable caching with new AT scheme #1474 [HLIR] SDPAttention: Cannot modify VDims at this point anymore! #1472 [DFP] Revise CPU+VE cores caching #1471 [CUDA] Don't abort if libcuda.so is not found (e.g. in Docker build) #1470 [Torchvision] vgg11: T18_D0 violates DNNL post op requirements as src is of type Add #1469 [CUDNN] Could not load library libcudnn_cnn_infer.so.8. 
Error: /usr/lib64/libcudnn_cnn_infer.so.8: undefined symbol #1468 [Runtime] Free Persistent Data in case user reruns training fwd pass without bwd pass #1466 [DFP] DFPBackend::lowerIR(Layer* l, Cast* p) causes infinite loop when using VE #1464 [DFP/Nvidia] LLama using -dt tf32 results in illegal view transformation #1462 [PyTorch] Test v2.3.0 #1459 [NCC] add -march ARCH to NCC command #1457 [DFP] Unvectorized OnlineSoftMax in SDP #1456 [PyTorch] If FlashAttn enabled for F32 SDP, then also set GEMM::TF32 #1454 [VE] VEDA_ERROR_VEO_COMMAND_EXCEPTION when running VEDNN GEMM AutoTune #1452 [Wrapper] Add options to wrapper::Attributes to either override or accumulate #1451 [VE] ve::trace=True does not compile #1450 [DFP] Transform Numel-\u003eCast-\u003eBroadcast to Numel-\u003eCast with correct dims #1446 [DFP] Test TensorCore-like GEMM implementations #1443 [DNN] Implement GEMM autotune swapping inputs #1442 [PyTorch] add Tensor.uniform_ #1441 [PyTorch] add one_hot #1440 [DFP] HMerge identical Pooling/Conv and Reduce layers #1439 [DFP] Reduce Cores LoopAccessors #1437 [DFP] Rework Combine Input #1435 [Accuracy] Investigate SoftMax Gradient problem #1432 [Runtime] pass Model.training parameter as runtime parameter #1431 [DFP/ISPC] add iterations to sol_dfp_reduce_x(..., count_iterations) #1430 [HLIR] move Bernoulli::transform(Device\u0026) to respective backends #1429 [DNNL] Upgrade 3.4.1 #1427 [PyTorch] VLLM support #1425 [Docs] Update max TF version #1424 [HLIR] SDPAttention layer #1422 [DFP] Rework Reduce::transform as it underutilizes #1421 [ISPC] Add PyTorch_DETERMINISM compile flag, as they don't use FP64 sleef functions! 
#1419 [PyTorch] Test v2.2.2 #1417 [DFP/X86] Accuracy GELU #1416 [DFP] Loop+AccLoop fusion #1415 [PyTorch] NaN when training MNist example on CUDA #1413 [PyTorch] Pass on fwargs and vdims to torch.compile Backend #1412 [TestBench] Add Default DType to perf Output #1411 [PyTorch] add tensordot #1410 [CUDA] Wrong result in CUDA Transpose #1409 [PyTorch] evaluate FP64 GEMM uses tensor cores #1407 [RNN] Add Determinism to API #1406 [Profiler] MemTrace reports wrong total #1405 [NVIDIA] LayerNorm accuracy #1403 [NVIDIA] consider to remove cross-warp-reduction support #1402 [DFP] OnlineSoftMax::derive #1399 [Runtime/HLIR] Include Model Inputs in runtime::RequiresGrad #1398 [DNNL] Use PostOp Activations in Inference #1397 [Numpy] Adopt new VDims system #1396 [Compuler/Runtime] Remove special \"INF\" case of Derivative #1395 [TF] Missing TF handler for DivNoNan #1394 [PyTorch] YOLO accuracy #1393 [TF] Test v2.15.1 #1391 [Runtime] Consider separating INF and Training #1388 [DFP] Still unused Loop Unpacks, e.g. in AlexNet #1387 [DFP] Check Merged FOR-FOR loops, that cause unpacking of Loops (e.g. AlexNet) #1386 [TIMM] fix vit_base_patch16_384 #1385 [Profiler] Reporting to file not working #1384 [NVIDIA] Evaluate new __reduce_xxx_sync function for CC \u003e=8 #1383\t[DFP] Improve cache planning #1382 [DFP] Unnecessary cast in FP16 Mul-\u003eMul #1381 [DFP] Expected [S64] for sol::compiler::backend::dfp::Indices but found [S32] #1378 [HLIR] Inherit Cast in MaxPooling.Indices -\u003e Cast -\u003e ... #1376 [Runtime] Enable to use sol.device.set(...) from different Threads to run multiple devices in parallel #1373 [CUDA] SOL crashes with \"invalid device context\" when using Streamlit #1372 [HLIR] Evaluate if using determinism instead of sol.hlir.RandType is sufficient #1371 [DFP] Upcast pure intermediate results in FP16/BF16 to FP32 #1370 [PyTorch] Test v2.2.1 #1369 [DFP] Add transformation to upcast internal data types from f16 to f32. 
#1367 [VE] SegFault Norms(64) #1366 [CUDNN] CUDNN_STATUS_BAD_PARAM in TF RegNet #1365 [NVIDIA] PY_Reduce(6/12) argmin/argmax fails with F64 #1364 [NVIDIA] PY_Issue_1316 fails with F64 #1363 [Tests] Enable non-fp32 dtypes in testsuites #1360 [DNN] use sol_sync instead of sol::dnn:XXX::sync #1359\t[CUDA] performance of cublasGEAM not optimial, e.g. in TransposeSwap testcase #1358 [DFP] Fix elementwise cases that don't get CoresSIMD assigned #1357 [Runtime] Implement sol_tensor_swap to skip Noop Reorders #1356 [Sleef] Upgrade v3.6 #1355 [PyTorch] Performance Issues in CNNs with BS=1 on X86 #1351 [PyTorch] Fix 'PY_Padding' on VE #1350 [PyTorch] Fix 'PY_Norms(3)' on VE #1349 [PyTorch] Fix 'PY_Addbmm' on VE #1348 [PyTorch] Fix 'PY_Matmul#T#Batched' on VE #1346 [PyTorch] Fix 'PY_CatRewire' -\u003e nan on VE #1344 [YAAL] Return if any of the checks fail with error code #1343 [TF] Test v2.15.0 #1342 [Runtime] Trigger recompilation if non-dynamic dimension changes instead of crashing #1341 [NVIDIA] Fix library detection if cu11 and cu12 packages are installed #1338 [Rand] Numpy Random Number Generator #1337 [HLIR] Add tensor[condition] operator #1336 [HLIR] Remove (Layer*) arguments from Operation, as they know their layer via m_layer! 
#1335 [BLAS] Fix decision making on AMD EPYC for new Autotuning #1334 [BLAS] Unify OpenBLAS, MKL, DNNL and AOCLBLAS BLAS Interface #1333 [OpenBLAS] Add Backend #1332 [AutoTuner] Consider Backend Specific \"number of runs\" and \"not improved\" #1331 [AutoTuner] Add option to poll performance for a layer from within another layer's tuning cycle #1330 [PyTorch] Capture ::c10::Error errors in handle and rethrow as sol::Exception #1329 [AutoTuner] AutoTuner Cache does not allow rerunning Reorder -\u003e GEMM, profiling when identical GEMM layer but without previous Reorder was executed #1328 [PyTorch] Test v2.2.0 #1327 [Numpy] Executor overrides input #1326 [MKL/VEBLAS] Evaluate if using GEMV when bs==1 is better #1323 [Profiler] Fix Total Bytes #1322 [DNN] Evaluate other GEMM tuning strategies #1320 [DNN] repair GEMM autotuning #1319 [VE] Attach to host profiler #1318 [PyTorch] Set executing device in model without inputs and parameters #1317 [PyTorch] Adjust executor to also copy MemberTensors that are no Buffers to device #1316 [PyTorch] aten::scaled_dot_product_attention #1315 [TF] Read accuracy modifying values, e.g. tf32 execution #1314 [PyTorch+VE] might cause segfault when exiting #1313 [PyTorch] Using shape of not used tensor #1312 [CAPI] Unify SOL dtypes for generated code #1311 [NCC] Don't expect nc++, ... to be installed in /opt/nec/ve/bin/ #1310 [Installer] Add option to renew license #1309 [Installer] Download option not working, as PIP downloads only Wheels, not Source packages #1308 [HLIR] Tensors should only evaluate value_op if values and value_op are not None #1307 [HLIR] Parser implemented non existing np functions, e.g. np.erf, np.acos, ... 
#1306 [TF] Check Resnet50 CPU performance #1304 [Core] Progress bar breaks, when Backend Handles get compiled during optimization process #1298 [DNNL] \"Unsupported dnnl_format_tag_t: POI/2/true\" in tf/regnet #1296 [Python] Remove SOL_CONSTANT Params #1289 [DFP] Store ReAlloc not as instruction but directly within LoopStack #1285 [DFP] Group WriteBacks for better performance in case of MasterStacks #1284 [PyTorch] Read Torch accuracy + determinism values and attach them to the layers #1282 [DFP] Minimize Loop Index Calculations #1275\t[ISPC] Investigate impact of setting different target gang-sizes in ISPC compilation #1269 [PyTorch] Implement FlashAttention #1268 [CUBLAS] FP16 + using advanced flags #1264 [DFP] Implement Online normalizer calculation for softmax #1250 [AOCL] Add new Backend #1240\t[PyTorch] enable Torch Compile style fwd+bwd within one pass #1238 [PyTorch] Bloom Support and Optimizations #1236 [HLIR] Reorder -\u003e GEMM transform #1223\t[DNNL] Enable NVIDIA GPUs #1193 [PyTorch/JIT] Add torch.jit.freeze to test-suite #1191 [VDims] Autodetect VDims #1190 [Profiler] Trace Memory Allocations #1186 [VE] Fix AutoCast to 32Bit/64Bit vars to enable vectorization #1185\t[NCC] v5.0.2 fails in PY_Norms, PY_Reduce and PY_TorchNorm #1180\t[HLIR/DFP] Enable to store SOL_CONSTANT in model #1153 [PyTorch] Investigate torch.fx for improving the parser #1060\t[CUDA] Implement transpose as series of calls to cublasgeam #1043\t[TF] \"Unable to find SOL_CONSTANT\" #999 [ISPC] Improve SLEEF integration #991 [Python] Improve performance of sol.optimize #920 [PyTorch] Add more Einsum testcases #913 [DFP] Rework stack memory caches #798\t[NVIDIA] Change dependencies to use NVIDIA PIP packages #787\t[AutoTuner] Think about choosing algorithms not solely based on the performance, but also about it's neighborhood, to increase chances of fusion. 
#766\t[TF] Accuracy for BatchNorm in Efficientnet #696\t[PyTorch] add aten::roll #651 [TF] tf.keras.applications.efficientnet.EfficientNet + V2 accuracy problems #622 [ISPC] ISPC casts wrongly casts double to uint8 #504 [DFP] fix removal of unnecessary accumulators #503 [DFP] Memory Pinning #502\t[DFP] Operation Inlining #497 [HLIR] Add PassThrough Node #495 [VEBLAS] Evaluate SOL's BatchedGEMM versus new NLC BatchedGEMM #390 [VEDNN] readd GEMM #367 [PyTorch] Add YOLO Test Case #366\t[ONNX] Add YOLO TestCase #319\t[PyTorch] Add option to parse \"if self.training:\" diverging paths #197 [DFP] Reorder: Fill #196 [DFP] Reorder: Narrow #104 [All] FP16, BFloat16 #69 [DFP] Performance: BatchNorm Welford Algorithm "
},
{
	"uri": "/releases/v0.5.html",
	"title": "v0.5 Diadem",
	"tags": [],
	"description": "",
	"content": " Version/DateChanges 12.01.2024v0.5.3Docs Highlights Added experimental keras.Model.compile(..., sol_compile=True, sol_vdims=[...]) support Bringing back CUDA support! 🎉 Fixed random numbers using MKL/VSL within PyTorch \u003e2.0 on X86. Significantly improved TensorFlow RNN performance and numerical stability. Added sol.config['vdims'] = ... to enable usage of SOL's vdims system also with torch.compile(...), which cannot pass such information to the backend compiler. You can now limit the number of parallel JIT processes using the SOL_JIT_THREADS env var, in case you encounter out-of-memory crashes while compiling. SOL no longer segfaults during compilation when more than 50% of system memory is in use. Breaking Changes Due to the increasing number of unresolved issues in the TensorFlow PluggableDevice API (e.g., #55497, #57095, #60883 or #60895) we decided to no longer maintain our veda-tensorflow extension. Therefore you can no longer use with tf.device(\"/VE:0\"):. Instead, please use Transparent Offloading via sol.device.set('ve', 0). We are sorry for the inconvenience, but we don't see any commitment from the TensorFlow team to accept our bugfixes or to fix the issues themselves. Closed Issues #1305 [PyTorch] Can't optimize TorchScriptModules #1302 [TF] Unable to broadcast (1 * #0) and 2 in TF_Issue#1201 #1301 [Compiler] Investigate why NCC or ISPC sometimes stop when compiling many jobs #1300 [TF] BCE: O3_D0_Output is not stored in ctx #1299 [TF] Segfault when running RNN test suite #1297 [TF] Prefixsum: 'T4_D0_Input (Reverse) has no src!' 
#1294 [TF] TF_Issue#985 returns wrong results #1293 [TF] tt.nn.max_pool_with_argmax returns wrong indices #1291 [DFP] New Cache Planning fails in simple expansion example #1288 [Runtime] Unify Shutdown/Unload methods in device runtimes #1287 [CMake] Prevent SOL to run out of memory when compiling dependencies #1286 [DFP] CUDA uses too much shared memory if we use too large group_cnt #1283 [DFP] Deprecate renderRootCache #1280 [PyTorch] Expected a value of type 'Optional[float]' for argument 'value' but instead found type 'int'. in 'dm_nfnet_f0' #1279 [PyTorch] Check v2.1.2 compatibility #1278 [PyTorch] PY_Issue_964 fails with `Buffer T25_D0_Output is already chained with C1_D0_Output` #1277 [PyTorch] Investigate \"object has no attribute or method '__ne__'\" when compiling multiple models #1274 [Runtime] deprecated sol::runtime::Tensor and use sol::Tensor instead #1273 [CAPI] move sol/autotune.h to sol/capi/autotune.h #1272 [SDK] Update GEMM example with new autotune routine #1271 [DNNL] Deprecate current Autotune impl and move to handle #1270 [NVIDIA] Fix NVIDIA profiling #1267 [Tests] Warn always about OMP issues #1266 [HLIR] Arithmetic DType detection fails for S64/F32, should be F32 and not F64 #1265 [HLIR] Cast -\u003e Cast optimization #1262 [DFP] transform `X = Y / broadcast(Z)` to `ZZ = 1/Y; X = Y * broadcast(ZZ)` #1260 [CUBLAS] Fix SegFault in autotune #1259 [Tests] Mark if it's an IDENTICAL match (==) #1258 [MKL] Deprecate current Autotune impl and move to handle #1257 [DNNL] Deprecate current autotune implementation and move into DNNL-handle #1256 [CUBLAS] move autotune into handle, similar to VEBLAS and share impl. #1254 [PyTorch] PY_Rand test case runs always on CPU? #1253 posix_spawn fails when cmd contains \"\" #1252 [JIT] Refactor API #1249 [CUDA] Compile backend APIs on target machine, to link to correct CUDA version #1248 [JIT] Out of memory when calling any JIT compiler, when already \u003e50% of system memory is filled. 
#1247 [PyTorch] Can't trace Inception and GoogleNet suing torch.compile(..., backend='sol') #1246 SoftMax can produce NAN in Bloom #1245 [HLIR] hlir::cast({...}) casts away float of constants e.g. in pow(float, int) #1244 [ISPC] Cast from float to bool results always in False #1243 [Python] implement sol.set_vdims(...) as global default. #1242 [PyTorch] Don't duplicate parameters while optimizing (to prevent crashes) #1241 [PyTorch] Check if we can automatically interleave SOL and non-SOL function using torch.compile #1239 [PyTorch] enable parsing of `torch.autograd.Function` constructs #1235 [HLIR] Constant -\u003e PrefixSum -\u003e Arange #1233 [PyTorch] Add bitwise_x #1232 [SDK] Add FindSOL.cmake script #1231 [SDK] Fix GCC9 compatibility #1229 [SDK] Compiled libraries fail during loading due to missing CXX11 abi symbols #1225 [SDK] Add Examples #1222 [DNNL] Upgrade v3.3.1 #1220 [HLIR/DFP] Split AvgPooling in SumPooling + AvgPoolingNorm #1219 [PyTorch] PyTorch Module tries to be compiled with WITH_CUDA=1 even no CUDA is available. #1218 [PyTorch] Test v2.1.1 #1217 [TensorFlow] Test v2.15.0 #1216 [TensorFlow] Test v2.14.1 #1215 [DFP] unvectorized LoopStack in Reduction #1213 [UI] Add AutoTuner to ProgressBar #1212 [HLIR] Remove Unpooling::requiresZero as it's obsolete #1210 [HLIR] Missed cluster fusion opportunity in Alexnet BWD pass #1209\t[DFP] Split Total and Bias Reductions #1208 [DFP] Input Loop Merging #1207 [Refactoring] Refactor Network* in HLIR to Network\u0026 #1205 [DFP] Split Full-Reduction Layers #1203 [Keras] Does call to sol_fwd require the output shapes really provided? 
#1202 [Keras] Simple example disables ALL vdims #1201 [API] Don't duplicate outputs anymore, but instead do the duplication in Wrapper #1200 [API] Store sol_network* net, sol_wrapper_container* inputs, sol_wrapper_container* outputs in Optimizer #1199 [Runtime] Enable to use \"DefaultStreams\" if framework does not provide any #1198 [PyTorch] No need to pass dtypes to `module.cpp` as they get initialized in YAAL #1197 [Runtime] Remove fetching of all tensor data in `module.cpp` and instead implement as lazy call in `handle.cpp` that gets executed in `sol_ptr` if needed. #1196 [Runtime] Move handling of malloc/free/... to C-API that directly is accessible in sol_handle without going through `runtime::Handle`. And instead use these direct callbacks also in `runtime::Handle` #1195 [PyTorch] huge overhead when model has many parameters/buffers #1194 [CUDA] add `__launch_bounds__` in generated code #1192 [SDK] Add DNN and DFP headers to adv-sdk #1187 [AT_MT19937] Vectorize 64Bit #1184 [DFP] Too much shared memory in PY_Norms #1183 [PyTorch] Wrong gradient dtype in torch.where(cond, float32, int64) #1182 [Tests] Fix Remaining Errors #1181 [HLIR] Move Rand, Bernoulli and Dropout algorithm selection to respective backends #1179 [TF] wrong indices for tf.nn.max_pool_with_argmax in some situations #1178 [TF] AvgPooling produces deviating gradients #1177 [HLIR] In PY_Norms LayerNorm we have CONSTANTs that get stored as COPY in fwd pass for training #1176 [PyTorch] Fix LayerNorm Weight Gradient #1175 [HLIR] Fix BatchNorm training inaccuracies #1174 [HLIR] Don't create copies of Broadcast! #1172 [HLIR] Implement Functional Sigmoid and Tanh using high level operators, to make benefit of HLIR optimization passes. 
#1171 [TensorFlow] Performance Regression in DNNL Conv #1170 [PyTorch] Fix AlphaDropout #1169\t[PyTorch] Fix U8 dtype in PyTorch Handle #1168 [CUDNN] Fix NHWC Conv execution #1167 [DNNL] Revert using PYPI packages, as these do not keep up with Github releases #1164 [PyTorch] Test v2.1.0 #1161 [TF] Test v2.14.0 #1160 [TF] Test v2.13.1 #1159 [PyTorch] Always use torch.compile(...) for parsing models for better compatibility #1158 [Python] Upgrade importlib usage #1156 [PyTorch] X86 Dropouts fail #1154 [DFP] Move Cache to beginning of parent Cores (can this be stored within the LoopStack?) #1152 [Keras] Can we get better estimates than `1` when Input-Size is variable? #1151 [TESTS] Add option to force comparison with host system #1150 [CUDA] Enable sub-warp grouping with sizes 1, 2, 4, 8 and 16 #1148 [VEDNN] Port to new SOL-namespace API #1147 [DNN-RNN] Evaluate if instead of copying the input of the activations, that we can copy the output. #1146 [OMP] TF causes MKL to run very slow when using more threads than cores #1145 [CUBLAS/CUDNN] Store handle as thread_local. #1143 [HLIR] Mask does not need gradient, although it is indicated as having gradient in debug output #1142 [PyTorch] Use at::Tensor in `module.cpp` and `handle.cpp` instead of doing the allocations on our own. 
#1140 [DNN-RNN] Verify bwd SimpleRNN with masking=True for normal and normed activations #1138 [DNN-RNN] Implement new Special Layers using GEMM #1137 [CUBLAS] Investigate CUBLAS overlap error in TF #1136 [DNN-RNN] Remove dB0 + dB1 entirely from RNNDeActivation #1135 [DNN] use a Macro to manage `using` centralized #1134 [DNN-GCC] Investigate why LSTM_Bwd activationX is taking most of the execution time #1133 [CUBLAS] use 64bit API #1132 [DNN-RNN] Evaluate if permuting the WH matrix to a more vector pleasent format improved performance #1131 [RNN-NCC] Improve performance of VE RNN kernels #1130 [VE] Test if NCC v5 has solved the constexpr problem #1117 [TF] `unsupported operand type(s) for *: 'int' and 'NoneType'` in TF_Tile #1116 [CUDA DFP] Wrong results in TF_Reverse #1114 [Compiler] Verify that LayerOutputs have dests and are not dangling #1113 [PyTorch] PY_Norms fails using `-bs 5 -vbs -d nvidia` #1112 [CURAND] Store random states per GPU #1111 [CUFFT] Cache Plans #1107 [TF] Can we use `op.skip_input_indices` to replace requires_grad? #1105 [TF] ValueError: The inequality of unknown TensorShapes is undefined. in TF v2.6.5 when running BWD pass #1103 [CUDNN] Catch invalid padding modes for Conv, e.g., first layer of AlexNet #1101 [DNN-RNN] Optimize TF specific RNN Bwd Kernels #1100 [Runtime] Warn is User does not specify sol.set_seed #1099 [DNN-RNN] Race Condition in accumulating bias #1097 [HLIR+DFP] Chaining multiple Padding Layers fails #1094 [PyTorch] Module does not compile when CUDA is not installed. 
#1093 [TF] RNN fails with `Layer T13_D0 is already registered to this network!` #1091 [DNN-RNN] Enable RecurrentDropout in SimpleRNN #1090 [HLIR] UnPooling throws `Assertion 'inSize(kd) \u003c= outSize(kd)' failed!` #1089 [RNN] Performance #1087 [CUDA] Make CUDA Handler Impl ThreadSafe #1086 [TF] Check Compatibility v2.13.0 #1083 [PyTorch] Investigate overhead when executing with CUDA #1082 [HLIR] Don't do DFP Bernoulli or BatchNormInference-style fake Algos #1081 [PyTorch] CUDA specific Dropout impl #1080 [PyTorch] torch.mm #1079 [PyTorch] GPU Bernoulli #1078 [DFP] Reenable Grouped Cores #1077 [DFP] identifier \"L20m5_L21\" is undefined #1076 [PyTorch] aten::full #1075 [ISPC] Evaluate if we really need a ISPC version of AT_MT19937 #1072 [PyTorch] File BugReport that DeviceType::CUDA does not register to c10::GetAllocator(...) #1071 [DNNL] store temporary memory consumption in Algo #1070 [DFP] Optimize initUsesVDims by using traverseBool #1069 [DFP] Fix Operation Type::Condition #1068 [PyTorch] Wrong results in PY_Rand on x86 #1067 [DFP] Remove double cast #1066 [C++] gen::string::join #1064 [Tests] Report correct GPU name #1063 [DFP] If same non-cluster value is read, create a copy node to enforce loop merging within DFP? #1061 [HLIR] PrefixSum of len 1 == memcpy #1056 [DNNL] Upgrade 3.1.1 #1055 [TF] modify keras.Model.compile to enable direct SOL compilation #1054 [CUBLAS] port to new API #1053 [CUDNN] port to new API #1052 [CUDA] complete all open issues #1051 [CUDA] Port to new DFP API #1050 [CUDA] add TF_PHILOX support #1049 [CURAND] add AT_PHILOX support #1047 [OpenSSL] Upgrade 1.1.1u #1046 [YAAL] Switch `sol::dnn::X::rand::create` to `sol_random_seed`-callback #1045 [DNNL] '[DNNL ERROR] The operation failed because of incorrect function arguments.' 
in Reorder with VDIMS after API upgrade #1044 [TF] DenseNet: Segfault #1042 [VEBLAS] Performance regression with BatchSize 2-7 on AlexNet #1041 [Compiler] prevent compilation to be overlapping autotuning phase #1040 [HLIR] Move BatchNormMean/Var to DFP module #1039 [SQLITE] Upgrade v3.42.0 #1038 [HLIR] Broadcasting using VDims, where one is 1 and other is != 1 causes VDims merge conflict #1037 [PyTorch] Reduce Testcase, all var(False) fail #1036 [PyTorch] Test v2.0.1 compatibility #1035 [ISPC] Upgrade v1.20.0 #1033 [DFP] Circular Schedule in tf.layers.ReduceBool testcases #1031 [DNNL] Bugfix Dilated Conv #1016 [PyTorch] add aten::eye #1015 [RNN] Investigate NAN in non \"linear\" cases #1013 [PyTorch] Bernoulli(scalar) produces different results on X86 and VE #1008 [DFP] Error: Ignoring redeclaration of symbol \"L212IN0__11_10L\" in Interpolation TestCase #993 [TF] Check if we handle local seeds correctly when we use multiple NNs and if they match with the seed generated into the train methods, etc. #985 [HLIR] We can use `transform::duplicates` on Rand if they have same local seeds, and there is no dependency between them. 
#982 [OMP] Parallelize tf_philox across groups * sequences #979 [HLIR] Refactor RAND system #974 [DNN-RNN] Support NSC input format without permutations #972 [DNN-RNN] Remove WH input if H == 0 and seq == 1 #965 [HLIR] Improve Duplicates detection #958 [ISPC] Evaluate if we can use new ISPC Template feature #955 [DFP] PY_Issue_945 creates non vectorized implementation #943 [RNN] Make Masks INT8, because they anyway don't need to be vectorized #921 [DFP] Loop Lookup \"Chicken/Egg\"-Problem preventing GPT2 #917 [HPTT] Add ENABLE_AVX flag #910 [PyTorch] Investigate Performance regression Resnet18 vs Resnet50 #891 [TF] Investigate using tf.function for inner_call #886 [CUDA] Add Custom Layer Support #884 [TF] RNN with stateful=True sometimes produce deviations in \"recurrent_kernel\" within 2nd training iteration #874 [HLIR] Merge into atomic Slice -\u003e Broadcast operation #872 [DNNL] Add Reorder time to Conv Autotuning measurements #869 [YAAL] Experiment to perform sol_shape_checks using OMP #857 [DNNL] Upgrade v3.1 #819 [TF] add keras.layers.GroupNormalization (requires v2.11.0) #785 [VEBLAS] Analyze performance of different parallelization strategies and small/big batchsizes #769 [PyTorch] RegNet Inference Accuracy Problems on AVX512 #768 [DFP] DFP sometimes allocates too much stack memory #655 [TF] check accuracy issues on AlexNet running on X86 #650 [TF] tf.keras.applications.mobilenet.Mobilenet accuracy problem #588 [TF] LayerNormalization #476 [PyTorch] Einsum #474 [CUDA] WhereTrue #473 [CUDA] Value2VDim #467 [CUDA] PrefixSum #405 [CUFFT] FFT #400 [CUDA] add Struct functions #213 [VE] Add checks to jit-ncc to verify that necessary loops really get vectorized #176 [CUDA] RNN #85 [HLIR] H-Merge identical parallel layers #79 [DFP/Performance] H-Merge (i.e. BERT BT) 28.04.2023v0.5.2Docs Highlights Deprecated DL4J and Unikraft support. Significantly improved compatibility of SOL integration into PyTorch and TensorFlow. 
 Experimental Custom Layer support for PyTorch and TensorFlow. Lots of internal bugfixes and improvements, e.g., improved code generation of loop indices to reduce recomputation within compute kernels. Added more compiler-specific env vars. Added the SOL_CWD env var, to enable users to change the directory that SOL uses as its working directory. SOL now implements the same random number generators as PyTorch and TensorFlow. New compiler::deterministic config option to trade accuracy of the model for more performance. See configs. Preliminary support for the TensorBoard profiler. Set SOL_PROFILE=TENSORBOARD:FILENAME. Results will be stored in FILENAME. You can now use torch.compile(model, backend='sol') as an alternative to sol.optimize(...) in PyTorch \u0026gt; 2.0! See here for more details. Known Issues Since TensorFlow v2.10.0, issue #57095 causes problems within the PluggableDevice API. Tensors that need to be placed within host memory randomly appear on the executing device. This problem worsened in v2.12.0 and can cause random segfaults when running TensorFlow workloads on NEC SX-Aurora. This is a problem within TensorFlow that we cannot fix. Unfortunately, it doesn\u0026rsquo;t seem that the TensorFlow team is going to fix this problem any time soon, although other vendors face the same problem. If you encounter random segfaults using TF + VE, please downgrade to TensorFlow v2.9.0.\ntorch.dropout(...) and torch.bernoulli(scalar) return different random numbers than SOL. This is caused by PyTorch issue #94388: PyTorch uses a different random number generator for Bernoulli/Dropout. In PyTorch v1.* this even causes different random numbers on Intel and AMD CPUs. SOL only supports identical random numbers for torch.rand(...) 
and torch.bernoulli(Tensor) yet.\nClosed Issues #1034\t[DFP-NCC] Error in PyTorch Narrow testcase #1032\t[DNNL] Upgrade 2.7.4 #1030\t[PyTorch] Could not cast attribute 'num_batches_tracked' to type Tensor: Unable to cast to Tensor #1029\t[DimMapper] Assertion 'A.size() == B.size()' failed! in ShuffleNet #1028\t[PyTorch] aten::tensor #1027\t[SegFault] Investigate random SegFault #1026\t[DFP] ASSERT(p-\u003egroups() == 1 || p-\u003egroups() == p-\u003einChannels()) failed #1025\t[HLIR] unable to find gradient in many TIMM networks #1024\t[HLIR] Conv: Assertion '!wdims.hasVDims()' failed! #1023\t[HLIR] Incompatbile shapes within Arithmetic in GCResNet #1022\t[DNNL] The operation failed because of ... #1021\t[Parser] Can't initialize sol.hlir.Dim with [64.]/"
},
{
	"uri": "/releases/v0.4.html",
	"title": "v0.4 Citadelle",
	"tags": [],
	"description": "",
	"content": " VersionDateChanges v0.4.2.1Docs20.09.2021 Breaking Changes Starting with v0.4.2.1, SOL requires every user to accept the license agreements. This is handled during installation of SOL and should be fully automatic. The SOL repository server will issue a digitally signed license file that is installed into the SOL installation directory. You can find a copy of the license agreement in the /path/to/sol folder AND you'll receive a copy via e-mail. This license is valid for one year and needs to be renewed using the installer once expired. We completely overhauled the SOL installer to be a menu-based implementation, to make it easier to shape the SOL installation to user needs. Closed Issues #380	[Installer] User License Agreements #373	[Deploy] Fix Python API #372	[License] Add SOL license agreement feature #371	[ISPC] Update to 1.16.1 #370	[Licenses] Add licences of used software to SOL distribution and docs #369	[TensorFlow] \"Dimensions must be equal, but ...\" error during training #365	[DFP] max/min reduction stores wrong index for gradient #361	[PyTorch] aten::mse_loss is missing #359	[Installer] installer always tries to install module x86 #358	[PyTorch] Missing aten::l1_loss #357	[PyTorch] Missing aten::broadcast_tensors v0.4.2Docs12.07.2021 Highlights Added TensorFlow v2 support Added experimental Cross-Framework support Breaking Changes In PyTorch we now use the TorchScript parser to trace the neural networks. Unfortunately, TorchScript is still VERY experimental and buggy, but we have already submitted two pull requests to fix some of the issues: github.com/pytorch/pytorch/pull/61274, github.com/pytorch/pytorch/pull/61274. Switching to TorchScript was required, but HuggingFace BERT is currently incompatible with TorchScript. We\u0026rsquo;re looking for a solution. 
Please use v0.4.1 if you need BERT.\nClosed Issues #356\t[PyTorch] Include bugfixes for PyTorch RNN, LP Pool and Padding #354\t[PyTorch] fix gradient of SIGN function and fix SIGN(0) case #353\t[HLIR] Why do we store Constant copies? #352\t[Runtime] [Input Error grad_outputs:0] Expected [1, 0, 0, 0, 0, 0, 0, 0] but found [0, 0, 0, 0, 0, 0, 0, 0] #351\t[PyTorch] allow parsing of custom model function #350\t[DNNL] Memory Descriptor error in ResNext #349\t[PyTorch] use unique() instead of debugName() in TorchScript #348\t[PyTorch] Enable KWARGS in TorchScript parser #347\t[DFP] nn.Conv2d(cin, cout, 1, stride=2, bias=True, groups=groups) fails in BWD #346\t[DFP] Test PY_AvgPool2d#2 fails in Backward Pass #345\t[DFP] enable struct conditionals for new pooling implementation #344\t[DFP] deprecate scoped conditionals #343\t[SQLITE] Upgrade to 3.36.0 #342\t[HLIR] replace internal LPPool with more high level implementation #341\t[DFP] AvgPool(3,3) on tensor(1, 1, 15, 15) seems to produce wrong gradient. #340\t[VEBLAS] RNNCells do not compile #339\t[TF] Race condition in VE-Native execution during Training #338\t[TF] Training on X86 fails with \"Assertion `t.dtype() == DType_::value' failed\" #337\t[TF] frees inputs between fwd and bwd pass #336\t[TF] VE Native integration allocates unaligned CPU pointers? #335\t[DFP] SqueezeNet BWD Pass can create invalid offsets #334\t[Transparent Offloading] memory corruption during training #333\t[Plugin] SOL tries to load non existing libsol-framework-te.so because of wrong regex expression #330\t[PyTorch/Numpy] Remove framework dependent input shape checks #328\t[Runtime] Move Shape checks from frontends to core-runtime #325\t[PyTorch] TorchScript RNN Networks #324\t[TF] \".local/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/refcount.h:90] Check failed: ref_.load() == 0 (1 vs. 
0)\" #323\t[VE] native-ve extension functions produce wrong results with large tensor sizes #322\t[API] remove updateSize callbacks #320\t[TF] Hook up to TF's Memory Allocator #318\t[PyTorch] TorchScript parse problems for torch.nn.LPPool*D, torch.nn.functional.lp_pool*d #317\t[PyTorch] TorchScript Parser can't return varying model outputs... #316\t[PyTorch] Replace parser with TorchScript based implementation #314\t[PyTorch] Update to v1.9.0 #313\t[TF] SOL_LOG=TRACE fails in PluggableDevice TF implementation #312\t[CMake] Add Python dependencies to Dependencies CMake #311\t[ISPC] Update to 1.16.0 #310\t[DFP] Fix Padding #309\t[DFP] Can't execute BatchNorm without Weight+Bias #305\t[DFP] Fix Boundary checks for Max_Pooling in Kernel: 1, Stride: 2, Padding: Valid cases in TF #304\t[PyTorch] add is_inf and is_nan #303\t[Docs] Add description how to use GDB Debugging #301\t[PyTorch] Move X86 and CUDA-API from TH(C) to ATen Interface #297\t[AVEO] URPC_SHM_SEGID on RHEL8 #296\t[TF] argmax + argmin return indicies in different format #295\t[TF] HardSigmoid fails #294\t[TF] tf.keras.layer.Permute(1, 3, 2) fails #291\t[TF] tf.math.reduce_prod #290\t[Python] Change HLIR implementation from class based to function based implementation #287\t[PyTorch] demote version exception to warning and throw instead only if the VE is used #286\t[DFP] Move Reorder Transformation into DFP Backend and don't do it in the Algo postprocessing #282\t[VEDNN] Enable NHWC Convolutions #281\t[HLIR] Enable Convs with NCP input and PIO Parameters #280\t[DNNL] Problem executing NHWC Convolutions #279\t[TF] Conv uses always PIO weights, even with channels_first? #277\t[TF] Why does TF return tf.EagerTensor and SOL returns np.ndarray? #274\t[Python] Implement cross framework execution, i.e. 
to run TensorFlow model within PyTorch #273\t[TensorFlow] add VE device plugin #272\t[TensorFlow] implement param preprocessing #270\t[NEC-SOL] add option to use \"pip\" instead of \"pip3\" #269\t[NEC-SOL] add --trust option for trusting dav.neclab.eu repository #268\t[TF] Enable multi-input models #231\t[HLIR/DFP] Move HLIR-Loops to DFP-IR #216\t[Python] rename sol.internal to sol.hlir #200\t[Python] Make RNNCells to 1-layer 1-seq RNNs! #171\t[Runtime] store ptr and sizes in sol_ref #115\t[DFP] Bwd Accumulation sometimes not working properly #3\tTensorFlow v2 Support v0.4.1Docs06.05.2021 Added new shortcut URLs: Documentation: sol.neclab.eu/docs Issue Tracker: issue.sol-ai.org/[issue-id] Breaking Changes Installation routine has changed, please refer the installation guide. Behavior of sol.backends.X has changed and does no longer take attributes of sol.backends.X but strings. Behavior of sol.devices.X has changed and does no longer take attributes of sol.devices.X but strings. I.e. sol.device.set(sol.device.ve, 0) is now sol.device.set('ve', 0) Closed Issues #266\t[CUDNN] CUDA_STATUS_ARCH_MISMATCH in pytorch.layers.conv2d testcase #265\t[SQLite] Upgrade 3.35.5 #264\t[HLIR] Reductions get sometimes removed if the dimensions input size is unknown at compile time #263\t[VEBLAS] Wrong results with LSTM w/ bias using NCC 3.2.0 #262\t[Docs] Add docs.sol-project.org subdomain #261\t[Pytorch] Can't parse PyTorchic BERT #260\t[PyTorch] Addbmm problem with varidic batch size #258\t[AutoTuning] GEMMBackend fails when using variable batchsize #256\t[VEBLAS] Autotuning crashes because of missing Handle? 
#255\t[CUDNN] Bundle Libs to sol-backend-cudnn #254\t[SDK] Generate SOL SDK #253\t[CMake] Prevent DL4J from constant rebuilding #252\t[WHL] enable to install all packages directly from PyPI AND locally #251\t[PyTorch] add torch.Tensor.hip() #250\t[PyTorch] torch.tensor.masked_scatter missing #249\t[DFP] Inception fails in newest build #247\t[DNNL] Compile shared library if WITH_DEPLOYMENT is activated #246\t[Deployment] update to new plugin based API #245\t[ISPC] fix detection of AVX512 extensions #244\t[Plugins] Add Version Check to all SOL Plugins! #242\t[VE] Check if DeviceHandle::reduce still requires the AVEO workaround. #241\t[Tests] Add --help option #240\t[Core] Add Option to add custom backends to SOL #238\t[Jit] Move device specific library paths into the corresponding compilers! #237\t[Core] Refactor Device/Backend System to sideload new backends without rebuilding SOL core components #236\t[CUDNN] bundle libcudnn with SOL! #235\t[SQLite] Update to v3.35.4 #233\t[PyTorch] Update to 1.8.1 #232\t[Cache] sol.cache.clear should delete .sol folder #229\t[PyTorch] torch.tensor.masked_fill_ missing #228\t[PyTorch] torch.tensor.cumsum missing #227\t[PyTorch] Binding for pow() is incorrect #226\t[SQLite] Update to 3.35.2 #224\t[CUDA] Add Struct Reductions #219\t[PyTorch] Kill application from Python API if different version has detected, as the C library cannot be loaded if the API has changed #206\t[Linking] Densenet Linking sometimes dies with GCC/NCC, possibly out of memory? #195\t[HLIR] Match ALGO API to Layer API #162\t[Python] Move Tensor Operators from PyTorch to Python API v0.4.0.2Docs10.03.2021 Minor bugfix release. #223\t[PyTorch] Problem parsing Huggingface BERT #222\t[CUDNN] Upgrade to CUDNN 8 #221\t[CUDA] Add support for SM80 and SM86 architectures #220\t[CUDA] Allow linking against different versions of CUDA v0.4.0.1Docs08.03.2021 Minor bugfix release. 
#217\t[Transparent Offload] Crashes in 2nd run of model v0.4.0Docs08.03.2021 This is a major release for SOL coming with a series of new features, i.e. ONNX support, RNNs (for SX-Aurora only for now), AdaptivePooling, improved performance, better accuracy for BatchNorms and MeanReductions and many more.\nBreaking Changes We no longer distribute the SOL images via GitLab. Please follow the installation steps described here. SOL API no longer requires to explicitly include the correct interface, i.e. import sol.pytorch as sol. Instead just import sol and SOL will automatically detect the type of your model. The API for sol.deploy(...) has been simplified, please checkout the Deployment documentation. Closed Issues #2\tRecurrent Neural Networks (RNNs) #9\tAdaptive[Avg/Max]Pooling only works if it can be transformed into a normal Pooling #49\t[DFP] Can't use reduction for MaxPooling #53\tONNX Support #66\t[Docs] add ONNX docs #68\t[DFP] missing gradient: MOD #86\t[PIP] Solve name clash with public PYPI repo #92\t[DFP] IDX not used in code generation #95\t[PyTorch] Upgrade to 1.7.0 #98\t[DFP] SoftMax might produce uncompileable code #103\t[DFP] Multi-Value Reductions #107\t[GCC] Error compiling with GCC v4.8.5 #109\t[DFP] wrong gradient for SoftPlus #116\t[PyTorch] Arange producing wrong results if non-integer values used #118\t[Python] using dict for input/output causes randomized SOL hashes #119\t[PyTorch] Double check API calls, if they have changed in last upgrade #121\t[PyTorch] Missing Tests #122\t[PyTorch] HugginFace BERT stopped working #124\t[HuggingFace] Bert dimension mismatch #125\t[Core] Check for Memleaks #126\t[DL4J] Upgrade to new JSON format #127\t[DL4J] Upgrade to new DType System #128\t[Frontends] Make ```autotuning``` an additional parameter of the sol.optimize call! 
#129\t[DFP] Optimize IDX usage #130\t[DFP] wrong initial value for reduction accumulators #132\t[PyTorch] Testcase Arange fails #133\t[PyTorch] AddCDIV AddCMul missing #134\t[DFP] Wrong gradient for PReLU-Weight #136\t[PyTorch] Min/Max returned indicies do not match the PyTorch indicies format. #137\t[PyTorch] can't use named tuples in output #140\t[VE] Min/Max Reduce or Pooling, that need to use reduction within the kernel, produce wrong results during backward pass #141\t[Performance] MaxPooling Backward Pass #142\t[DFP] Improve Pooling Backward Performance #143\t[DFP] Improve Pooling Fwd Performance #144\t[PyTorch] Upgrade to 1.7.1 #145\t[DFP] Remove Inner #146\t[VE] Fix Updating of VBS #147\t[CUDNN] report Version and warn if version is \u003c 7.6.0 #149\t[CUDA] Exclude Half Precision API from GPUs below Maxwell #150\t[ISPC] Upgrade to 1.15.0 #151\t[DFP] Segfault when optimizing BERT #152\t[VEDNN] Evaluate new LLVM-VE #153\t[DNNL] upgrade to 1.8 #154\t[SQLITE] Update to 3.34.0 #155\t[Python] Add debug option to run ```python -m sol``` to check if SOL works correctly #156\t[PyTorch] return_indices of MaxPooling does not match PyTorch value range #158\t[DNNL] Update to v1.8.1 #159\t[PyTorch] SOL's behavior of inplace methods, i.e. neg_ is not identical #160\t[PyTorch] SOL segfaults during execution when model uses the same tensor for multiple outputs #164\t[PyTorch] verify that torch.nn.Conv1d is working #170\t[SQLITE] Update to 3.34.1 #172\t[VEBLAS] RNN #178\t[DFP] does not zero initialize gradient in backwardpass for narrows #180\t[Python] Single Python Wrapper for all frameworks #185\t[DFP] Missing LXXX_idx in BWD Filter pass for Conv #189\t[DFP] Problem in Embedding BWD #190\t[Deploy] Fix Deployment #192\t[HLIR] add sol_layer_input(network, layer, IType, LayerOutput) #194\t[HLIR] Make all Cat inputs to IType::Input/X instead of IType::None/0 #215\t[Debug] Increase font size in memory consumption graphs "
},
{
	"uri": "/releases/v0.3.html",
	"title": "v0.3 Betelgeuse",
	"tags": [],
	"description": "",
	"content": " VersionDateChanges v0.3.110.11.2020 #51 DL4J Support #82 [CMake] building with PCH seems to be broken #83 [Python] SOL does not support named model arguments, i.e. sol_model(key=value) #84 [Core] Inputs that directly get passed through as output, are not handled correctly #87 [Core] Missing DType: Bool #88 [VE] PyTorch does always print in scientific notation #89 [DFP] Possible race condition in Conv2D BWD_DATA #91 [DFP] Wrongly initialized value for MaxPooling #93 [Pytorch] Can't use torch.Tensor as input for sol.optimize #94 [PyTorch] Switching devices might yield in gradients of different shapes #96 [Docs] add DL4J docs #97 [PyTorch] LocalResponseNorm missing #100 [PyTorch] can't convert buffers prior execution #105 [DFP] Error when using variable batchsizes #106 [AVEO] Errornous transfer of function parameters v0.3.013.10.2020 The SOL v0.3.0 release contains a huge amount of changes. The highlights are listed below: PIP dependency installation: As SOL will support more frameworks starting from v0.3.1, we no longer install all dependencies by default. Instead, you can select the installation of dependencies when issuing pip3 install sol-image.whl[torch, onnx]. PyTorch Training No-Grad: During training, PyTorch only computes the gradients starting from the tensor.backward() call, setting all other gradients to zero. SOL, however, assumes all outputs contribute to the gradient. To achieve the same behavior, we introduced sol.no_grad(tensor). You can use it as follows to integrate it into your model, without changing the model itself. Without this, it is not guaranteed that the gradient is identical!	class TrainingModel(torch.nn.Module): def __init__(self, model): super().__init__() self.model = model def forward(self, *args): A, B, C = self.model(*args) return A, sol.no_grad(B), sol.no_grad(C) model = TrainingModel(model) sol_model = sol.optimize(model, ...) 
for batch in ...: A, B, C = sol_model(batch) A.backward() PyTorch parameter auto-loading: Previously, it was always necessary to copy parameters from the PyTorch model to the SOL model. We added the parameter copy_parameters=True to sol.optimize(...), which does this automatically. Huggingface GPT-2 support: Support for Huggingface GPT-2 has been added. However, there is an accuracy problem in the backward pass that we are currently investigating #80. Variable batch size does not work in GPT-2, as it uses the wildcard in the second and not the first dimension, which is currently not supported by SOL #81. Using GPT-2 without a variable batch size works! ONNX Support: Postponed to the v0.3.1 release to perform more tests before releasing. Internal changes to graph representation for faster processing. Progress bar: Don't be alarmed if it jumps back. This is caused by the fact that SOL does not know how many files need to be compiled prior to generating the computation graph, so it can happen that more files get added, moving the bar \"backwards\". "
},
{
	"uri": "/frameworks/compatibility.html",
	"title": "Compatibility",
	"tags": [],
	"description": "",
	"content": " FrameworkSOL VersionFramework Version Numpy\t\u0026ge; v0.7.0\t\u0026ge; v1.17.0 and \u0026ge; v2.0.0\tv0.4.2 - v0.7.0\t\u0026ge; v1.17.0, \u0026lt; v2.0.0\tONNX\t\u0026ge; v0.3.1\t\u0026ge; v1.7.0\tPyTorch\t\u0026ge; v0.7.11\t\u0026ge; v1.10.2\tv0.7.2 - v0.7.10\tv1.10.2 - v2.8.0\tv0.7.0 - v0.7.2\tv1.10.2 - v2.5.0\tv0.5.1 - v0.5.3\tv1.10.2 - v2.2.1\tv0.5.0\tv1.10.2 - v1.12.0\tv0.4.2.*\tv1.9.0\tv0.4.1\tv1.8.1\tv0.4.0.*\tv1.8.0\tv0.4.0\tv1.7.1\tv0.3.*\tv1.6.0\tv0.2.6\tv1.5.1\tv0.2.5\tv1.5.0\tTensorFlow\t\u0026ge; v0.7.11\t\u0026ge; v2.6\tv0.7.11\tv2.6-v2.20.0\tv0.7.6 - v0.7.10\tv2.6-v2.19.1\tv0.5.1 - v0.7.5\tv2.6-v2.15.1\tv0.5.0\tv2.6, v2.7, v2.8 and v2.9\tv0.4.2.*\tv2.5\t"
},
{
	"uri": "/releases/v0.2.html",
	"title": "v0.2 Altair",
	"tags": [],
	"description": "",
	"content": " VersionDateChanges v0.2.7.228.07.2020 fixed double free pointer and other error messages when running models for thousands of iterations v0.2.7.115.07.2020 added torch.Tensor.clone(), torch.min(dim) and torch.max(dim) to be used outside of sol.optimize. added PIP requirement for PyTorch, to ensure the correct version is installed v0.2.706.07.2020 Fixed Segfault in native tensor mode #32 Fixed NAN results in backward pass #34 Other Bugfixes/Improvements: #28, #29, #33, #36 v0.2.630.06.2020 Updated to PyTorch 1.5.1 #19 You can get the current VE memory consumption with torch.hip.memory_allocated(device) #26 Fixed MemLeak in VE native tensor mode #4 Updated to AVEO 0.9.10 #24, which improves the handling of dead processes when SOL died. Updated NLC to 2.1.0 #22 Redesigned SOL's seed system, to match usage as in the ML frameworks #21 Other Bugfixes/Improvements: #12, #16, #17, #18, #20, #23, #27 v0.2.518.06.2020 Supports: PyTorch: 1.5.0 Bugfixes: #11, #13 Features: #7, #10, #14, #15 v0.2.417.06.2020 Supports: PyTorch 1.5.0 Bugfixes: [VEBLAS/VEDNN] fixed autotuning #6 [Core] fixed crash when using Anaconda #5 v0.2.316.06.2020 Supports: PyTorch 1.5.0 Known Bugs: Calling torch.concat(...) on CPU outside can result in wrong results. This is caused by a bug in PyTorch 1.5 which is triggered when registering the SX-Aurora API. - We submitted a pull request to PyTorch that will fix the problem in a future release. New Features: [Backends] the heuristics for the DNN backends can be modified with sol.config[\"heuristic::BACKEND::LAYER::PASS\"]. Lower values get prefered. 0 disables the pass for the backend. Use sol.config.print() to get an overview of available options. These only apply if sol.config[\"autotuning\"] = False. [Core] SOL now can compile models for variable batch sizes! For this use sol.optimize(model, sol.input([0, 3, 224, 224]), batch_size=30), where batch_size is used as heuristic to tune the code for. 
SOL will try its best to provide good performance for all batch sizes, but the performance can be lower than when compiling for a specific batch size. [Core] added sol.versions() to print versions of used compilers + libraries. Please add this information whenever you report a bug! Bugfixes: [Core] improved internal layout + dimension handling [DFP] fixed bugs in autograd engine (e.g. $`\\frac{d\\text{pow}(x, n)}{dx}`$) [DFP] fixed race condition in Embedding layer gradient [DNNL] fixed grouped convolutions in ResNext [PyTorch] fixed problems with torch.addmm as reported by @malon [VEDA] fixed only showing 1 VE when VEDA_VISIBLE_DEVICES is not set [VEDA] fixed a rare race condition causing \"VEDA_ERROR_NOT_INITIALIZED\" errors when shutting down the application [VEDA] improved error handling and messages Breaking Changes: [PyTorch] renamed camelcase aB arguments to a_b to match PyTorch conventions. Other Changes: [Core] No-Op layers use names of Input, Param or Output nodes. [Core] Removed sol.config[\"ve::profiling\"]. [Core] sol.config[\"compiler::profile\"] = True prints execution times of fused layers. [Core] added memory consumption visualisation. See Usage/Debug for more details. [DNNL] added option for varying memory layouts in filter-pass [MKL] added sol-backend-mkl for X86 [PyTorch] added torch.Tensor.data_ptr(), torch.where(), torch.Tensor.where(), torch.Tensor.[__lt__, __le__, __gt__, __ge__, __eq__, __ne__] [VE] SOL will warn if TensorFlow-VE is installed on the system, as it is known to cause serious problems in combination with SOL. [VE] added sol-backend-veda-test to test minimal functionality Deprecated: [VE] Calling torch.nn.L1Loss or torch.nn.functional.l1_loss(...) outside of the sol.optimize(...) call is deprecated and will be removed in a future release. 
Please add these to the model, i.e.: class Model(torch.nn.Module): def __init__(...): self.loss = torch.nn.L1Loss() def forward(self, x): x = self.layers(x) y = self.loss(x) return x, y # x is your inference output, y the loss v0.2.220.05.2020 Supports: PyTorch 1.5.0 Bugfixes: added torch.addmm(), torch.addbmm() and torch.addmv() as requested by @malon added transformations to remove simple arithmetic operations from the computation graph (e.g. A + 0.0, A * 1.0, ...) AVEO gets compiled with nfort-3.0.4 to solve compatibility issues with installations that have nfort \u0026lt; 3.0.27 installed. fixed rare problem in scheduling mechanism fixed code generation problem for tensors with only a single element v0.2.119.05.2020 Supports: PyTorch 1.5.0 Bugfixes: fixed torch.sum(), torch.Tensor.sum() gradient enabled trace log in release build (might be removed again in a future release) added torch.nn.L1Loss, torch.nn.functional.l1_loss fixed VE-PyTorch-API integration improved VE-thread scheduler for small batch sizes fixed AVEO integration updated SOL-PyTorch documentation v0.2.015.05.2020 Supports: PyTorch 1.5.0 !!! Breaking Changes !!! sol.optimize and sol.deploy no longer have the parameters vartype, layout and requires_grad. Instead, you can directly pass tensors, i.e. sol.optimize(model, torch.rand(3, 5)) or use sol.optimize(model, sol.input([3, 5], torch.long, True, sol.layout.ncp)) to specify types that are not torch.float! This was necessary to support inputs of different types, e.g., in Transformer networks! Renamed \"sol.vartype\" to \"sol.dtype\" to be syntactically closer to the AI frameworks. Starting with v0.2.0, SOL will have release names of solar systems. This first release is called \"Altair\". Added: torch.squeeze, torch.Tensor.squeeze, torch.Tensor.squeeze_ torch.unsqueeze, torch.Tensor.unsqueeze, torch.Tensor.unsqueeze_ torch.erf, torch.sin, torch.cos, torch.pow, torch.tanh, ... torch.nn.LayerNorm ... 
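The v0.2.0 input-specification change above can be sketched as follows. This is an illustrative example only (the model is a placeholder), and the SOL calls are guarded because the sol package may not be installed.

```python
# Hedged sketch of the v0.2.0 input API: pass example tensors directly,
# or describe shape/dtype/requires_grad/layout via sol.input(...).
# Guarded: sol and torch may be unavailable.
try:
    import torch
    import sol
    HAVE_SOL = True
except ImportError:
    HAVE_SOL = False

if HAVE_SOL:
    model = torch.nn.Linear(5, 2)  # placeholder model
    # variant 1: infer shape and dtype from an example tensor
    sol_model = sol.optimize(model, torch.rand(3, 5))
    # variant 2: describe the input explicitly, replacing the removed
    # vartype/layout/requires_grad parameters
    sol_model = sol.optimize(
        model, sol.input([3, 5], torch.float, True, sol.layout.ncp))
```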
Bugfixes: jit::dot failed with model names that contain spaces New Features/Improvements Supports multiple VEs at the same time! We ported most torch.cuda calls to torch.hip. You can also use A.to('hip:1') syntax now! Added support for Transformer networks such as BERT (tested with https://github.com/dhlee347/pytorchic-bert and https://github.com/huggingface/transformers/) Improved performance for VE random number generation SOL now reports if it recompiles or uses a cached version of the network. Known Bugs/Limitations If SOL crashes, processes may be left alive on the Aurora. Run pkill ve_exec if this happens. Calling torch.concat(...) on the CPU outside of SOL can return wrong results. This is caused by a bug in PyTorch 1.5 which is triggered when registering the SX-Aurora API. We submitted a pull request to PyTorch that will fix the problem in a future release. "
},
{
	"uri": "/frameworks/numpy.html",
	"title": "Numpy",
	"tags": [],
	"description": "",
	"content": " The Numpy frontend does not have a parser and can therefore only be used as execution target!\n"
},
{
	"uri": "/frameworks/onnx.html",
	"title": "ONNX",
	"tags": [],
	"description": "",
	"content": "import sol import numpy as np model = sol.optimize(\u0026#34;myModel.onnx\u0026#34;) # no input description needed, as provided by model itself! # or if you want to override the shape of the model model = sol.optimize(\u0026#34;myModel.onnx\u0026#34;, [np.rand(1, 3, 224, 224), ...], {\u0026#39;named_tensor\u0026#39;: np.rand(3, 2, 1)}) input = np.random.rand(1, 3, 224, 224) output = model(input) F.A.Q. How can I execute an ONNX model on an accelerator device? By default the ONNX frontend returns a Numpy executable model. You can either set sol.optimize(\u0026quot;model.onnx\u0026quot;, framework='pytorch') to a framework that supports accelerator devices or use the sol.device.set(\u0026lsquo;device_type\u0026rsquo;, device_idx) API for transparent offloading.\nHow can I train an ONNX model? The ONNX format does not store information about trainable parameters. However, you can set sol.optimize(\u0026ldquo;model.onnx\u0026rdquo;, framework=\u0026ldquo;pytorch\u0026rdquo;) to load the ONNX model into PyTorch. 
Then iterate over model.parameters() and set param.requires_grad = True for all parameters that shall be trained.\nTested Models ONNX Hub v1.10.0 Format: ModelName (OpSet(s))\nAlexNet (7, 8, 9, 12) CaffeNet (7, 8, 9, 12) DenseNet-121 (6, 7, 8, 9) DenseNet-121-12 (12) EfficientNet-Lite4 (11) Emotion FERPlus (7, 8) FCN ResNet-101 (11) FCN ResNet-50 (11, 12) GoogleNet (12) GoogleNet (3, 6, 7, 8, 9) Inception-2 (7, 8, 9) LResNet100E-IR (8) MNIST (7,8) MNIST-12 (12) MobileNet v2-1.0 (10) MobileNet v2-1.0-fp32 (12) MobileNet v2-7 (7) R-CNN ILSVRC13 (7, 8, 9) ResNet101 (7) ResNet101-v2 (7) ResNet152 (7) ResNet152-v2 (7) ResNet18 (7) ResNet18-v2 (7) ResNet34 (7) ResNet34-v2 (7) ResNet50 (7) ResNet50-caffe2 (7, 8, 9) ResNet50-fp32 (12) ResNet50-v2 (7) ShuffleNet-v1 (7, 8, 9) ShuffleNet-v2 (10) ShuffleNet-v2-fp32 (12) SqueezeNet 1.0 (6, 7, 8, 9, 12) SqueezeNet 1.1 (7) Super_Resolution (10) Tiny YOLOv2 (7,8) VGG 16 (7) VGG 16-bn (7) VGG 16-fp32 (12) VGG 19 (7) VGG 19-bn (7) VGG 19-caffe2 (7, 8, 9) YOLOv2 (9) YOLOv4 (11) ZFNet-512 (7, 8, 9, 12) version-RFB-320 (9) version-RFB-640 (9) Supported Layers Please refer to https://github.com/onnx/onnx/blob/master/docs/Operators.md for how these functions are used. 
This documentation only contains which layers, functions and tensor functionality are currently implemented within SOL.\nAbs Acos Acosh Add And ArgMax ArgMin Asin Asinh Atan Atanh AveragePool BatchNormalization Bernoulli Cast CastLike Ceil Celu Clip Concat Constant ConstantOfShape Conv ConvTranspose Cos Cosh CumSum DepthToSpace Div Dropout Elu Equal Erf Exp Expand Flatten Floor Gather Gemm GlobalAveragePool GlobalMaxPool Greater GreaterOrEqual HardSigmoid HardSwish Identity InstanceNormalization IsInf IsNaN LRN LeakyRelu Less LessOrEqual Log LogSoftmax MatMul Max MaxPool MaxUnPool Min Mod Mul Neg NonZero Not OneHot Or PRelu Pad Pow RandomUniform RandomUniformLike Range Reciprocal ReduceL1 ReduceL2 ReduceMax ReduceMean ReduceMin ReduceProd ReduceSum Relu Reshape Resize Round Selu Shape Shrink Sigmoid Sign Sin Sinh Slice Softmax Softmin Softplus Softsign Split Sqrt Squeeze Sub Sum Tan Tanh Tile Transpose Trilu Unsqueeze Where Xor "
},
{
	"uri": "/frameworks/pytorch.html",
	"title": "PyTorch",
	"tags": [],
	"description": "",
	"content": "To use SOL in PyTorch just import the package and optimize the model using sol.optimize(model).\nimport torch import sol import torchvision.models as models \u0026#39;\u0026#39;\u0026#39; Optimizing Model \u0026#39;\u0026#39;\u0026#39; py_model = models.__dict__[\u0026#34;alexnet\u0026#34;]() input\t= torch.rand(32, 3, 224, 224) sol_model = sol.optimize(py_model) \u0026#39;\u0026#39;\u0026#39; Run training \u0026#39;\u0026#39;\u0026#39; sol_model.train() # You cannot initialize the optimizer at this point. You need to wait until # you have executed the model at least once, so SOL has compiled it. optimizer = None for batch in ...: input, target = ... output = sol_model(input) loss = loss_func(output, target) # After running the model once, you can safely initialize the optimizer if optimizer is None: optimizer = torch.optim.Adam(sol_model.parameters(), ...) optimizer.zero_grad() loss.backward() optimizer.step() ... \u0026#39;\u0026#39;\u0026#39; Run validation \u0026#39;\u0026#39;\u0026#39; sol_model.eval() with torch.no_grad(): for batch in ...: input = ... output = sol_model(input) ... This example requires the torchvision package: https://github.com/pytorch/vision/ .\nIn v0.5.1 we added an lazy evaluation of sol.optimize(...) which removes the necessity to provide an example input. The model instead gets created the first time it gets executed.\nBy default SOL only optimized the primary execution of the model model(input). If you want to optimize helper functions such as model.predict(...) you need optimize these separately using:\nmodel.predict = torch.compile(model.predict, backend=\u0026#39;sol\u0026#39;) Optimizing helper functions requires PyTorch v2. You can also not use sol.optimize(...) 
as model.predict is a function, not a model, and sol.optimize will not be able to identify the correct parser automatically.\nPyTorch-specific options for sol.optimize Option Default Effect parser None compile: always uses torch.compile, jit: always uses torch.jit.script or torch.jit.trace, symbolic: always uses torch.fx.symbolic_trace, None: automatically determines which to use. Using torch.compile can add additional overhead to the execution time. trace False When not using torch.compile, enforces using torch.jit.trace. By default uses torch.jit.script. strict False When using torch.jit.trace, sets the strict argument. check_trace False When using torch.jit.trace, sets the check_trace argument. mode None When set to reduce-overhead, uses CUDA Graphs on NVIDIA GPUs. F.A.Q. How can I use CUDA Graphs with SOL and PyTorch? CUDA Graphs support is still experimental in SOL!\nCUDA Graphs can significantly reduce the overhead of running GPU workloads but come with some restrictions, e.g., tensor dimensions must not change between executions. For more information on CUDA Graphs, please refer to this blog post. To use CUDA Graphs in SOL, just use: sol_model = sol.optimize(pytorch_model, mode=\u0026#39;reduce-overhead\u0026#39;) # or sol_model = torch.compile(pytorch_model, mode=\u0026#39;reduce-overhead\u0026#39;, backend=\u0026#39;sol\u0026#39;) I get the error `Unsupported at::Device: cuda:0`? This error is raised when SOL was unable to find the CUDA toolkit and you try to execute a model on an NVIDIA GPU. Please verify that SOL is able to find the CUDA toolkit by setting the CUDA_HOME env var.\nHow do I store/load a PyTorch model? For storing/loading a SOL PyTorch model, use the model.state_dict() and model.load_state_dict(...) methods. 
# Storing sol_model = sol.optimize(pytorch_model, [...]) torch.save(sol_model.state_dict(), PATH) # Loading sol_model = sol.optimize(pytorch_model) sol_model.load_state_dict(torch.load(PATH)) More information on loading/storing the weights can be found here Can I use torch.compile(...) with SOL? Yes, with SOL ≥ v0.5.2 and PyTorch ≥ 2.0 you can use torch.compile(model, backend=\u0026#39;sol\u0026#39;) with SOL! For PyTorch \u0026lt; 2.6 you also need to manually run import sol.pytorch to ensure that SOL is correctly registered as a backend in PyTorch.\nI get strange errors when running sol.optimize(model, ...), e.g., in Huggingface Transformers. Please read this guide on how to convert the transformer into a TorchScript-compatible model.\nHow can I update/downgrade to another PyTorch version? Before switching versions, please check the compatibility list to see if your PyTorch version is supported by SOL. If yes, and if you are using SOL with the NEC SX-Aurora TSUBASA, you can switch PyTorch using pip3 install veda-pytorch~={VERSION}. If you are using SOL with any other device, then you can just use pip3 install torch~={VERSION}.\nThe SOL model returns more outputs than the PyTorch model. This error occurs, e.g., in TorchVision's Inception V3 or GoogleNet. These models return 1 output in inference mode and 2 outputs in training mode. SOL relies on the TorchScript parser. Unfortunately the TorchVision models are built in a way that hides the change of output behavior from TorchScript. However, you can implement this yourself as follows: from torchvision import models class Wrap(torch.nn.Module): def __init__(self, model): super().__init__() self.model = model def forward(self, x): out = self.model(x) if torch.jit.is_scripting(): return (out[0], out[1]) if self.training else (out[0], None) return (out[0], out[1]) if self.training else (out, None) model = Wrap(models.inception_v3()) # use only one output model.training = False sol_model = sol.optimize(model, ...) 
# use two outputs model.training = True sol_model = sol.optimize(model, ...) SOL currently does not support dynamically switching between these two modes and requires compiling the model for each mode separately.\nHow can I use PyTorch Lightning with SOL? You can just pass your PyTorch Lightning model to SOL\u0026rsquo;s sol.optimize(...) method.\nclass ResNet50(L.LightningModule): def __init__(self): super().__init__() self.model = sol.optimize(torchvision.models.resnet50()) def training_step(self, batch, batch_idx): ... def configure_optimizers(self): ... Can I implement custom layers using SOL? Please refer to Custom Layers.\nI get NotImplementedError: Could not run 'aten::addmm.out' with arguments from the 'VE' backend. when trying to run an RNN on VE. Most likely you have used torch.compile(model, backend=\u0026#39;sol\u0026#39;) on an RNN model. Due to this open issue it\u0026rsquo;s impossible to compile RNNs using the torch.compile API. Please use sol.optimize(model, parser=\u0026#39;jit\u0026#39;) instead.\nTested Models TorchVision v0.26.0 alexnet convnext_base convnext_large convnext_small convnext_tiny deeplabv3_mobilenet_v3_large deeplabv3_resnet101 deeplabv3_resnet50 densenet121 densenet161 densenet169 densenet201 efficientnet_b0 efficientnet_b1 efficientnet_b2 efficientnet_b3 efficientnet_b4 efficientnet_b5 efficientnet_b6 efficientnet_b7 efficientnet_v2_l efficientnet_v2_m efficientnet_v2_s fcn_resnet101 fcn_resnet50 googlenet inception_v3 lraspp_mobilenet_v3_large maxvit_t mc3_18 mnasnet0_5 mnasnet0_75 mnasnet1_0 mnasnet1_3 mobilenet_v2 mobilenet_v3_large mobilenet_v3_small mvit_v1_b mvit_v2_s quantized_googlenet quantized_inception_v3 quantized_mobilenet_v2 quantized_mobilenet_v3_large quantized_resnet18 quantized_resnet50 quantized_resnext101_32x8d quantized_resnext101_64x4d quantized_shufflenet_v2_x0_5 quantized_shufflenet_v2_x1_0 quantized_shufflenet_v2_x1_5 quantized_shufflenet_v2_x2_0 r2plus1d_18 r3d_18 regnet_x_16gf regnet_x_1_6gf
regnet_x_32gf regnet_x_3_2gf regnet_x_400mf regnet_x_800mf regnet_x_8gf regnet_y_128gf regnet_y_16gf regnet_y_1_6gf regnet_y_32gf regnet_y_3_2gf regnet_y_400mf regnet_y_800mf regnet_y_8gf resnet101 resnet152 resnet18 resnet34 resnet50 resnext101_32x8d resnext101_64x4d resnext50_32x4d s3d shufflenet_v2_x0_5 shufflenet_v2_x1_0 shufflenet_v2_x1_5 shufflenet_v2_x2_0 squeezenet1_0 squeezenet1_1 swin3d_b swin3d_s swin3d_t swin_b swin_s swin_t swin_v2_b swin_v2_s swin_v2_t vgg11 vgg11_bn vgg13 vgg13_bn vgg16 vgg16_bn vgg19 vgg19_bn vit_b_16 vit_b_32 vit_h_14 vit_l_16 vit_l_32 wide_resnet101_2 wide_resnet50_2 Transformers v5.5.4 Aimv2 Albert Align AltCLIP AltRoberta Apertus Arcee BarkCausal BarkCoarse BarkSemantic Bart Beit Bert BertLMHead BigBird BigBirdPegasus BioGpt BitNet Blenderbot BlenderbotSmall Blip2 Blip BlipForImageTextRetrieval BlipTextLMHead Bloom BridgeTower CLIP CLIPSeg CLIPSegForImageSegmentation CTRL CTRLLMHead Camembert ChineseCLIP Clvp CodeGen ConvNext ConvNextV2 CpmAnt CpmAntForCausalLM' Csm CsmDepthDecoder DINOv3ConvNext DINOv3ViT DPT DPTForDepthEstimation DPTForSemanticSegmentation Data2Vec Deberta DebertaV2 DecisionTransformerGPT2 DeiT DiffLlama Dinov2 DistilBert Doge DonutSwin EfficientNet Electra Emu3 Eomt Ernie4_5 Ernie Evolla FNet FSMT FalconH1 FastSpeech2Conformer Flaubert Flava FlavaImage FocalNet Fuyu GPT2 GPT2DoubleHeads GPT2LMHead GPTBigCode GPTNeo GPTNeoXJapanese Gemma2 Gemma3 Gemma Git Glm4 Glm GroupViT Helium IBert IJepa ImageGPT InstructBlip Janus Kosmos2 Kosmos2_5 LayoutLM LayoutLMv3 Levit Lfm2 Lilt Llama Luke M2M100 MBart MLCD MPNet MT5 MT5Encoder Mamba2 Marian MarkupLM MaskFormerSwin MegatronBert MetaClip2 Mistral MobileBert MobileNetV1 MobileNetV2 MobileViT ModernBert ModernBertDecoder Mra Mvp NllbMoe Nystromformer Olmo2 Olmo3 Olmo OpenAIGPT Ovis2 OwlViT Owlv2 PLBart Pegasus Perceiver Persimmon Phi3 Phi Pixtral Pvt Qwen2 RemBert RoCBert RoFormer Roberta Sam SamHQ SeamlessM4T SeamlessM4Tv2 SmolLM3 Splinter SqueezeBert StableLm 
Starcoder2 SwiftFormer Swin Swinv2 SwitchTransformers T5 T5Encoder TrOCR UMT5 UMT5Encoder VaultGemma ViT ViTMSN VideoLlava VisualBert VitDet Voxtral Whisper XCLIP XGLM XLM XLMRoberta XLMRobertaXL Yoso Zamba2 xLSTM TIMM v1.0.26 aimv2_1b_patch14_224 aimv2_1b_patch14_336 aimv2_1b_patch14_448 aimv2_3b_patch14_224 aimv2_3b_patch14_336 aimv2_3b_patch14_448 aimv2_huge_patch14_224 aimv2_huge_patch14_336 aimv2_huge_patch14_448 aimv2_large_patch14_224 aimv2_large_patch14_336 aimv2_large_patch14_448 beit3_base_patch16_224 beit3_giant_patch14_224 beit3_giant_patch14_336 beit3_large_patch16_224 beit_base_patch16_224 beit_base_patch16_384 beit_large_patch16_224 beit_large_patch16_384 beit_large_patch16_512 beitv2_base_patch16_224 beitv2_large_patch16_224 botnet26t_256 botnet50ts_256 caformer_b36 caformer_m36 caformer_s18 caformer_s36 cait_m36_384 cait_m48_448 cait_s24_224 cait_s24_384 cait_s36_384 cait_xs24_384 cait_xxs24_224 cait_xxs24_384 cait_xxs36_224 cait_xxs36_384 coat_lite_medium coat_lite_medium_384 coat_lite_mini coat_lite_small coat_lite_tiny coat_mini coat_small coat_tiny coatnet_0_224 coatnet_0_rw_224 coatnet_1_224 coatnet_1_rw_224 coatnet_2_224 coatnet_2_rw_224 coatnet_3_224 coatnet_3_rw_224 coatnet_4_224 coatnet_5_224 coatnet_bn_0_rw_224 coatnet_nano_cc_224 coatnet_nano_rw_224 coatnet_pico_rw_224 coatnet_rmlp_0_rw_224 coatnet_rmlp_1_rw2_224 coatnet_rmlp_1_rw_224 coatnet_rmlp_2_rw_224 coatnet_rmlp_2_rw_384 coatnet_rmlp_3_rw_224 coatnet_rmlp_nano_rw_224 coatnext_nano_rw_224 convformer_b36 convformer_m36 convformer_s18 convformer_s36 convit_base convit_small convit_tiny convmixer_1024_20_ks9_p14 convmixer_1536_20 convmixer_768_32 convnext_atto convnext_atto_ols convnext_atto_rms convnext_base convnext_femto convnext_femto_ols convnext_large convnext_large_mlp convnext_nano convnext_nano_ols convnext_pico convnext_pico_ols convnext_small convnext_tiny convnext_tiny_hnf convnext_xlarge convnext_xxlarge convnext_zepto_rms convnext_zepto_rms_ols convnextv2_atto 
convnextv2_base convnextv2_femto convnextv2_huge convnextv2_large convnextv2_nano convnextv2_pico convnextv2_small convnextv2_tiny cs3darknet_focus_l cs3darknet_focus_m cs3darknet_focus_s cs3darknet_focus_x cs3darknet_l cs3darknet_m cs3darknet_s cs3darknet_x cs3edgenet_x cs3se_edgenet_x cs3sedarknet_l cs3sedarknet_x cs3sedarknet_xdw csatv2 csatv2_21m cspdarknet53 cspresnet50 cspresnet50d cspresnet50w cspresnext50 darknet17 darknet21 darknet53 darknetaa53 davit_base davit_base_fl davit_giant davit_huge davit_huge_fl davit_large davit_small davit_tiny deit3_base_patch16_224 deit3_base_patch16_384 deit3_huge_patch14_224 deit3_large_patch16_224 deit3_large_patch16_384 deit3_medium_patch16_224 deit3_small_patch16_224 deit3_small_patch16_384 deit_base_distilled_patch16_224 deit_base_distilled_patch16_384 deit_base_patch16_224 deit_base_patch16_384 deit_small_distilled_patch16_224 deit_small_patch16_224 deit_tiny_distilled_patch16_224 deit_tiny_patch16_224 densenet121 densenet161 densenet169 densenet201 densenet264d dla102 dla102x2 dla102x dla169 dla34 dla46_c dla46x_c dla60 dla60_res2net dla60_res2next dla60x dla60x_c dm_nfnet_f0 dm_nfnet_f1 dm_nfnet_f2 dm_nfnet_f3 dm_nfnet_f4 dm_nfnet_f5 dm_nfnet_f6 dpn107 dpn131 dpn48b dpn68 dpn68b dpn92 dpn98 eca_botnext26ts_256 eca_nfnet_l0 eca_nfnet_l1 eca_nfnet_l2 eca_nfnet_l3 eca_resnet33ts eca_resnext26ts eca_vovnet39b ecaresnet101d ecaresnet101d_pruned ecaresnet200d ecaresnet269d ecaresnet26t ecaresnet50d ecaresnet50d_pruned ecaresnet50t ecaresnetlight ecaresnext26t_32x4d ecaresnext50t_32x4d edgenext_base edgenext_small edgenext_small_rw edgenext_x_small edgenext_xx_small efficientformer_l1 efficientformer_l3 efficientformer_l7 efficientformerv2_l efficientformerv2_s0 efficientformerv2_s1 efficientformerv2_s2 efficientnet_b0 efficientnet_b0_g16_evos efficientnet_b0_g8_gn efficientnet_b0_gn efficientnet_b1 efficientnet_b1_pruned efficientnet_b2 efficientnet_b2_pruned efficientnet_b3 efficientnet_b3_g8_gn efficientnet_b3_gn 
efficientnet_b3_pruned efficientnet_b4 efficientnet_b5 efficientnet_b6 efficientnet_b7 efficientnet_b8 efficientnet_blur_b0 efficientnet_cc_b0_4e efficientnet_cc_b0_8e efficientnet_cc_b1_8e efficientnet_el efficientnet_el_pruned efficientnet_em efficientnet_es efficientnet_es_pruned efficientnet_h_b5 efficientnet_l2 efficientnet_lite0 efficientnet_lite1 efficientnet_lite2 efficientnet_lite3 efficientnet_lite4 efficientnet_x_b3 efficientnet_x_b5 efficientnetv2_m efficientnetv2_rw_s efficientnetv2_rw_t efficientnetv2_s ese_vovnet19b_dw ese_vovnet19b_slim ese_vovnet19b_slim_dw ese_vovnet39b ese_vovnet39b_evos ese_vovnet57b ese_vovnet99b eva02_base_patch14_224 eva02_base_patch14_448 eva02_base_patch16_clip_224 eva02_large_patch14_224 eva02_large_patch14_448 eva02_large_patch14_clip_224 eva02_large_patch14_clip_336 eva02_small_patch14_224 eva02_small_patch14_336 eva02_tiny_patch14_224 eva02_tiny_patch14_336 eva_giant_patch14_224 eva_giant_patch14_336 eva_giant_patch14_560 eva_giant_patch14_clip_224 eva_large_patch14_196 eva_large_patch14_336 fasternet_l fasternet_m fasternet_s fasternet_t0 fasternet_t1 fasternet_t2 fastvit_ma36 fastvit_mci0 fastvit_mci1 fastvit_mci2 fastvit_mci3 fastvit_mci4 fastvit_s12 fastvit_sa12 fastvit_sa24 fastvit_sa36 fastvit_t12 fastvit_t8 fbnetc_100 fbnetv3_b fbnetv3_d fbnetv3_g flexivit_base flexivit_large flexivit_small focalnet_base_lrf focalnet_base_srf focalnet_huge_fl3 focalnet_huge_fl4 focalnet_large_fl3 focalnet_large_fl4 focalnet_small_lrf focalnet_small_srf focalnet_tiny_lrf focalnet_tiny_srf focalnet_xlarge_fl3 focalnet_xlarge_fl4 gc_efficientnetv2_rw_t gcresnet33ts gcresnet50t gcresnext26ts gcresnext50ts gcvit_base gcvit_small gcvit_tiny gcvit_xtiny gcvit_xxtiny gernet_l gernet_m gernet_s ghostnet_050 ghostnet_100 ghostnet_130 ghostnetv2_100 ghostnetv2_130 ghostnetv2_160 ghostnetv3_050 ghostnetv3_100 ghostnetv3_130 ghostnetv3_160 gmixer_12_224 gmixer_24_224 gmlp_b16_224 gmlp_s16_224 gmlp_ti16_224 hardcorenas_a hardcorenas_b 
hardcorenas_c hardcorenas_d hardcorenas_e hardcorenas_f hgnet_base hgnet_small hgnet_tiny hgnetv2_b0 hgnetv2_b1 hgnetv2_b2 hgnetv2_b3 hgnetv2_b4 hgnetv2_b5 hgnetv2_b6 hiera_base_224 hiera_base_plus_224 hiera_huge_224 hiera_large_224 hiera_small_224 hiera_tiny_224 hrnet_w18 hrnet_w18_small hrnet_w18_small_v2 hrnet_w18_ssld hrnet_w30 hrnet_w32 hrnet_w40 hrnet_w44 hrnet_w48 hrnet_w48_ssld hrnet_w64 inception_next_atto inception_next_base inception_next_small inception_next_tiny inception_resnet_v2 inception_v3 inception_v4 lcnet_035 lcnet_050 lcnet_075 lcnet_100 lcnet_150 legacy_senet154 legacy_seresnet101 legacy_seresnet18 legacy_seresnet34 legacy_seresnet50 legacy_seresnext101_32x4d legacy_seresnext26_32x4d legacy_seresnext50_32x4d legacy_xception levit_128 levit_128s levit_192 levit_256 levit_256d levit_384 levit_384_s8 levit_512 levit_512_s8 levit_512d levit_conv_128 levit_conv_128s levit_conv_192 levit_conv_256 levit_conv_256d levit_conv_384 levit_conv_384_s8 levit_conv_512 levit_conv_512_s8 levit_conv_512d mambaout_base mambaout_base_plus_rw mambaout_base_short_rw mambaout_base_tall_rw mambaout_base_wide_rw mambaout_femto mambaout_kobe mambaout_small mambaout_small_rw mambaout_tiny maxvit_nano_rw_256 maxvit_pico_rw_256 maxvit_rmlp_nano_rw_256 maxvit_rmlp_pico_rw_256 maxvit_rmlp_small_rw_224 maxvit_rmlp_small_rw_256 maxvit_rmlp_tiny_rw_256 maxvit_small_tf_224 maxvit_small_tf_384 maxvit_small_tf_512 maxvit_tiny_pm_256 maxvit_tiny_rw_224 maxvit_tiny_rw_256 maxvit_tiny_tf_224 maxvit_tiny_tf_384 maxvit_tiny_tf_512 maxxvit_rmlp_nano_rw_256 maxxvit_rmlp_small_rw_256 maxxvit_rmlp_tiny_rw_256 maxxvitv2_nano_rw_256 maxxvitv2_rmlp_base_rw_224 maxxvitv2_rmlp_base_rw_384 maxxvitv2_rmlp_large_rw_224 mixer_b16_224 mixer_b32_224 mixer_l16_224 mixer_l32_224 mixer_s16_224 mixer_s32_224 mixnet_l mixnet_m mixnet_s mixnet_xl mixnet_xxl mnasnet_050 mnasnet_075 mnasnet_100 mnasnet_140 mnasnet_small mobilenet_edgetpu_100 mobilenet_edgetpu_v2_l mobilenet_edgetpu_v2_m 
mobilenet_edgetpu_v2_s mobilenet_edgetpu_v2_xs mobilenetv1_100 mobilenetv1_100h mobilenetv1_125 mobilenetv2_035 mobilenetv2_050 mobilenetv2_075 mobilenetv2_100 mobilenetv2_110d mobilenetv2_120d mobilenetv2_140 mobilenetv3_large_075 mobilenetv3_large_100 mobilenetv3_large_150d mobilenetv3_rw mobilenetv3_small_050 mobilenetv3_small_075 mobilenetv3_small_100 mobilenetv4_conv_aa_large mobilenetv4_conv_aa_medium mobilenetv4_conv_blur_medium mobilenetv4_conv_large mobilenetv4_conv_medium mobilenetv4_conv_small mobilenetv4_conv_small_035 mobilenetv4_conv_small_050 mobilenetv4_hybrid_large mobilenetv4_hybrid_large_075 mobilenetv4_hybrid_medium mobilenetv4_hybrid_medium_075 mobilenetv5_300m mobilenetv5_300m_enc mobilenetv5_base mobileone_s0 mobileone_s1 mobileone_s2 mobileone_s3 mobileone_s4 mobilevit_s mobilevit_xs mobilevit_xxs mobilevitv2_050 mobilevitv2_075 mobilevitv2_100 mobilevitv2_125 mobilevitv2_150 mobilevitv2_175 mobilevitv2_200 mvitv2_base_cls mvitv2_huge_cls mvitv2_large_cls mvitv2_small_cls naflexvit_base_patch16_parfac_gap nasnetalarge nest_base nest_base_jx nest_small nest_small_jx nest_tiny nest_tiny_jx nextvit_base nextvit_large nextvit_small nf_ecaresnet101 nf_ecaresnet26 nf_ecaresnet50 nf_regnet_b0 nf_regnet_b1 nf_regnet_b2 nf_regnet_b3 nf_regnet_b4 nf_regnet_b5 nf_resnet101 nf_resnet26 nf_resnet50 nf_seresnet101 nf_seresnet26 nf_seresnet50 nfnet_f0 nfnet_f1 nfnet_f2 nfnet_f3 nfnet_f4 nfnet_f5 nfnet_f6 nfnet_f7 nfnet_l0 pit_b_224 pit_b_distilled_224 pit_s_224 pit_s_distilled_224 pit_ti_224 pit_ti_distilled_224 pit_xs_224 pit_xs_distilled_224 pnasnet5large poolformer_m36 poolformer_m48 poolformer_s12 poolformer_s24 poolformer_s36 poolformerv2_m36 poolformerv2_m48 poolformerv2_s12 poolformerv2_s24 poolformerv2_s36 pvt_v2_b0 pvt_v2_b1 pvt_v2_b2 pvt_v2_b2_li pvt_v2_b3 pvt_v2_b4 pvt_v2_b5 rdnet_base rdnet_large rdnet_small rdnet_tiny regnetv_040 regnetv_064 regnetx_002 regnetx_004 regnetx_004_tv regnetx_006 regnetx_008 regnetx_016 regnetx_032 regnetx_040 
regnetx_064 regnetx_080 regnetx_120 regnetx_160 regnetx_320 regnety_002 regnety_004 regnety_006 regnety_008 regnety_008_tv regnety_016 regnety_032 regnety_040 regnety_040_sgn regnety_064 regnety_080 regnety_080_tv regnety_120 regnety_1280 regnety_160 regnety_2560 regnety_320 regnety_640 regnetz_005 regnetz_040 regnetz_040_h regnetz_b16 regnetz_b16_evos regnetz_c16 regnetz_c16_evos regnetz_d32 regnetz_d8 regnetz_d8_evos regnetz_e8 repghostnet_050 repghostnet_058 repghostnet_080 repghostnet_100 repghostnet_111 repghostnet_130 repghostnet_150 repghostnet_200 repvgg_a0 repvgg_a1 repvgg_a2 repvgg_b0 repvgg_b1 repvgg_b1g4 repvgg_b2 repvgg_b2g4 repvgg_b3 repvgg_b3g4 repvgg_d2se repvit_m0_9 repvit_m1 repvit_m1_0 repvit_m1_1 repvit_m1_5 repvit_m2 repvit_m2_3 repvit_m3 res2net101_26w_4s res2net101d res2net50_14w_8s res2net50_26w_4s res2net50_26w_6s res2net50_26w_8s res2net50_48w_2s res2net50d res2next50 resmlp_12_224 resmlp_24_224 resmlp_36_224 resmlp_big_24_224 resnest101e resnest14d resnest200e resnest269e resnest26d resnest50d resnest50d_1s4x24d resnest50d_4s2x40d resnet101 resnet101_clip resnet101_clip_gap resnet101c resnet101d resnet101s resnet10t resnet14t resnet152 resnet152c resnet152d resnet152s resnet18 resnet18d resnet200 resnet200d resnet26 resnet26d resnet26t resnet32ts resnet33ts resnet34 resnet34d resnet50 resnet50_clip resnet50_clip_gap resnet50_gn resnet50_mlp resnet50c resnet50d resnet50s resnet50t resnet50x16_clip_gap resnet50x4_clip_gap resnet50x64_clip_gap resnet51q resnet61q resnetaa101d resnetaa34d resnetaa50 resnetaa50d resnetrs101 resnetrs152 resnetrs200 resnetrs270 resnetrs350 resnetrs420 resnetrs50 resnetv2_101 resnetv2_101d resnetv2_101x1_bit resnetv2_101x3_bit resnetv2_152 resnetv2_152d resnetv2_152x2_bit resnetv2_152x4_bit resnetv2_18 resnetv2_18d resnetv2_34 resnetv2_34d resnetv2_50 resnetv2_50d resnetv2_50d_evos resnetv2_50d_frn resnetv2_50d_gn resnetv2_50t resnetv2_50x1_bit resnetv2_50x3_bit resnext101_32x16d resnext101_32x32d 
resnext101_32x4d resnext101_32x8d resnext101_64x4d resnext26ts resnext50_32x4d resnext50d_32x4d rexnet_100 rexnet_130 rexnet_150 rexnet_200 rexnet_300 rexnetr_100 rexnetr_130 rexnetr_150 rexnetr_200 rexnetr_300 samvit_base_patch16_224 sebotnet33ts_256 sedarknet21 selecsls42 selecsls42b selecsls60 selecsls60b selecsls84 semnasnet_050 semnasnet_075 semnasnet_100 semnasnet_140 senet154 sequencer2d_l sequencer2d_m sequencer2d_s seresnet101 seresnet152 seresnet152d seresnet18 seresnet200d seresnet269d seresnet33ts seresnet34 seresnet50 seresnet50t seresnetaa50d seresnext101_32x4d seresnext101_32x8d seresnext101_64x4d seresnext101d_32x8d seresnext26d_32x4d seresnext26t_32x4d seresnext26ts seresnext50_32x4d seresnextaa101d_32x8d seresnextaa201d_32x8d shvit_s1 shvit_s2 shvit_s3 shvit_s4 skresnet18 skresnet34 skresnet50 skresnet50d skresnext50_32x4d spnasnet_100 starnet_s050 starnet_s100 starnet_s150 starnet_s1 starnet_s2 starnet_s3 starnet_s4 swiftformer_l1 swiftformer_l3 swiftformer_s swiftformer_xs swin_base_patch4_window12_384 swin_base_patch4_window7_224 swin_large_patch4_window12_384 swin_large_patch4_window7_224 swin_s3_base_224 swin_s3_small_224 swin_s3_tiny_224 swin_small_patch4_window7_224 swin_tiny_patch4_window7_224 swinv2_base_window12_192 swinv2_base_window12to16_192to256 swinv2_base_window12to24_192to384 swinv2_base_window16_256 swinv2_base_window8_256 swinv2_cr_base_224 swinv2_cr_base_384 swinv2_cr_base_ns_224 swinv2_cr_giant_224 swinv2_cr_giant_384 swinv2_cr_huge_224 swinv2_cr_huge_384 swinv2_cr_large_224 swinv2_cr_large_384 swinv2_cr_small_224 swinv2_cr_small_384 swinv2_cr_small_ns_224 swinv2_cr_small_ns_256 swinv2_cr_tiny_224 swinv2_cr_tiny_384 swinv2_cr_tiny_ns_224 swinv2_large_window12_192 swinv2_large_window12to16_192to256 swinv2_large_window12to24_192to384 swinv2_small_window16_256 swinv2_small_window8_256 swinv2_tiny_window16_256 swinv2_tiny_window8_256 test_byobnet test_convnext2 test_convnext3 test_convnext test_efficientnet test_efficientnet_evos 
test_efficientnet_gn test_efficientnet_ln test_mambaout test_nfnet test_resnet test_vit3 test_vit tf_efficientnet_b0 tf_efficientnet_b1 tf_efficientnet_b2 tf_efficientnet_b3 tf_efficientnet_b4 tf_efficientnet_b5 tf_efficientnet_b6 tf_efficientnet_b7 tf_efficientnet_b8 tf_efficientnet_cc_b0_4e tf_efficientnet_cc_b0_8e tf_efficientnet_cc_b1_8e tf_efficientnet_el tf_efficientnet_em tf_efficientnet_es tf_efficientnet_l2 tf_efficientnet_lite0 tf_efficientnet_lite1 tf_efficientnet_lite2 tf_efficientnet_lite3 tf_efficientnet_lite4 tf_efficientnetv2_b0 tf_efficientnetv2_b1 tf_efficientnetv2_b2 tf_efficientnetv2_b3 tf_efficientnetv2_m tf_efficientnetv2_s tf_mixnet_l tf_mixnet_m tf_mixnet_s tf_mobilenetv3_large_075 tf_mobilenetv3_large_100 tf_mobilenetv3_large_minimal_100 tf_mobilenetv3_small_075 tf_mobilenetv3_small_100 tf_mobilenetv3_small_minimal_100 tiny_vit_11m_224 tiny_vit_21m_224 tiny_vit_21m_384 tiny_vit_21m_512 tiny_vit_5m_224 tinynet_a tinynet_b tinynet_c tinynet_d tinynet_e twins_pcpvt_base twins_pcpvt_large twins_pcpvt_small twins_svt_base twins_svt_large twins_svt_small vgg11 vgg11_bn vgg13 vgg13_bn vgg16 vgg16_bn vgg19 vgg19_bn vit_7b_patch16_dinov3 vit_base_mci_224 vit_base_patch14_dinov2 vit_base_patch14_reg4_dinov2 vit_base_patch16_18x2_224 vit_base_patch16_224 vit_base_patch16_224_miil vit_base_patch16_384 vit_base_patch16_clip_224 vit_base_patch16_clip_384 vit_base_patch16_clip_quickgelu_224 vit_base_patch16_dinov3 vit_base_patch16_dinov3_qkvb vit_base_patch16_gap_224 vit_base_patch16_plus_240 vit_base_patch16_plus_clip_240 vit_base_patch16_reg4_gap_256 vit_base_patch16_rope_224 vit_base_patch16_rope_ape_224 vit_base_patch16_rope_reg1_gap_256 vit_base_patch16_rpn_224 vit_base_patch16_siglip_224 vit_base_patch16_siglip_256 vit_base_patch16_siglip_384 vit_base_patch16_siglip_512 vit_base_patch16_siglip_gap_224 vit_base_patch16_siglip_gap_256 vit_base_patch16_siglip_gap_384 vit_base_patch16_siglip_gap_512 vit_base_patch16_xp_224 vit_base_patch32_224 
vit_base_patch32_384 vit_base_patch32_clip_224 vit_base_patch32_clip_256 vit_base_patch32_clip_384 vit_base_patch32_clip_448 vit_base_patch32_clip_quickgelu_224 vit_base_patch32_plus_256 vit_base_patch32_siglip_256 vit_base_patch32_siglip_gap_256 vit_base_patch8_224 vit_base_r26_s32_224 vit_base_r50_s16_224 vit_base_r50_s16_384 vit_base_resnet26d_224 vit_base_resnet50d_224 vit_betwixt_patch16_gap_256 vit_betwixt_patch16_reg1_gap_256 vit_betwixt_patch16_reg4_gap_256 vit_betwixt_patch16_reg4_gap_384 vit_betwixt_patch16_rope_reg4_gap_256 vit_betwixt_patch32_clip_224 vit_dlittle_patch16_reg1_gap_256 vit_dpwee_patch16_reg1_gap_256 vit_dwee_patch16_reg1_gap_256 vit_giant_patch14_224 vit_giant_patch14_clip_224 vit_giant_patch14_dinov2 vit_giant_patch14_reg4_dinov2 vit_giant_patch16_gap_224 vit_giantopt_patch16_siglip_256 vit_giantopt_patch16_siglip_384 vit_giantopt_patch16_siglip_gap_256 vit_giantopt_patch16_siglip_gap_384 vit_gigantic_patch14_224 vit_gigantic_patch14_clip_224 vit_gigantic_patch14_clip_378 vit_gigantic_patch14_clip_quickgelu_224 vit_huge_patch14_224 vit_huge_patch14_clip_224 vit_huge_patch14_clip_336 vit_huge_patch14_clip_378 vit_huge_patch14_clip_quickgelu_224 vit_huge_patch14_clip_quickgelu_378 vit_huge_patch14_gap_224 vit_huge_patch14_xp_224 vit_huge_patch16_gap_448 vit_huge_plus_patch16_dinov3 vit_huge_plus_patch16_dinov3_qkvb vit_intern300m_patch14_448 vit_large_patch14_224 vit_large_patch14_clip_224 vit_large_patch14_clip_336 vit_large_patch14_clip_quickgelu_224 vit_large_patch14_clip_quickgelu_336 vit_large_patch14_dinov2 vit_large_patch14_reg4_dinov2 vit_large_patch14_xp_224 vit_large_patch16_224 vit_large_patch16_384 vit_large_patch16_dinov3 vit_large_patch16_dinov3_qkvb vit_large_patch16_rope_224 vit_large_patch16_rope_ape_224 vit_large_patch16_siglip_256 vit_large_patch16_siglip_384 vit_large_patch16_siglip_512 vit_large_patch16_siglip_gap_256 vit_large_patch16_siglip_gap_384 vit_large_patch16_siglip_gap_512 vit_large_patch32_224 
vit_large_patch32_384 vit_large_r50_s32_224 vit_large_r50_s32_384 vit_little_patch16_reg1_gap_256 vit_little_patch16_reg4_gap_256 vit_medium_patch16_clip_224 vit_medium_patch16_gap_240 vit_medium_patch16_gap_256 vit_medium_patch16_gap_384 vit_medium_patch16_reg1_gap_256 vit_medium_patch16_reg4_gap_256 vit_medium_patch16_rope_reg1_gap_256 vit_medium_patch32_clip_224 vit_mediumd_patch16_reg4_gap_256 vit_mediumd_patch16_reg4_gap_384 vit_mediumd_patch16_rope_reg1_gap_256 vit_pe_core_base_patch16_224 vit_pe_core_large_patch14_336 vit_pe_core_small_patch16_384 vit_pe_core_tiny_patch16_384 vit_pe_lang_large_patch14_448 vit_pe_spatial_base_patch16_512 vit_pe_spatial_large_patch14_448 vit_pe_spatial_small_patch16_512 vit_pe_spatial_tiny_patch16_512 vit_pwee_patch16_reg1_gap_256 vit_relpos_base_patch16_224 vit_relpos_base_patch16_cls_224 vit_relpos_base_patch16_clsgap_224 vit_relpos_base_patch16_plus_240 vit_relpos_base_patch16_rpn_224 vit_relpos_base_patch32_plus_rpn_256 vit_relpos_medium_patch16_224 vit_relpos_medium_patch16_cls_224 vit_relpos_medium_patch16_rpn_224 vit_relpos_small_patch16_224 vit_relpos_small_patch16_rpn_224 vit_small_patch14_dinov2 vit_small_patch14_reg4_dinov2 vit_small_patch16_18x2_224 vit_small_patch16_224 vit_small_patch16_36x1_224 vit_small_patch16_384 vit_small_patch16_dinov3 vit_small_patch16_dinov3_qkvb vit_small_patch16_rope_224 vit_small_patch16_rope_ape_224 vit_small_patch32_224 vit_small_patch32_384 vit_small_patch8_224 vit_small_plus_patch16_dinov3 vit_small_plus_patch16_dinov3_qkvb vit_small_r26_s32_224 vit_small_r26_s32_384 vit_small_resnet26d_224 vit_small_resnet50d_s16_224 vit_so150m2_patch16_reg1_gap_256 vit_so150m2_patch16_reg1_gap_384 vit_so150m2_patch16_reg1_gap_448 vit_so150m_patch16_reg4_gap_256 vit_so150m_patch16_reg4_gap_384 vit_so150m_patch16_reg4_map_256 vit_so400m_patch14_siglip_224 vit_so400m_patch14_siglip_378 vit_so400m_patch14_siglip_384 
vit_so400m_patch14_siglip_gap_224 vit_so400m_patch14_siglip_gap_378 vit_so400m_patch14_siglip_gap_384 vit_so400m_patch14_siglip_gap_448 vit_so400m_patch14_siglip_gap_896 vit_so400m_patch16_siglip_256 vit_so400m_patch16_siglip_384 vit_so400m_patch16_siglip_512 vit_so400m_patch16_siglip_gap_256 vit_so400m_patch16_siglip_gap_384 vit_so400m_patch16_siglip_gap_512 vit_srelpos_medium_patch16_224 vit_srelpos_small_patch16_224 vit_tiny_patch16_224 vit_tiny_patch16_384 vit_tiny_r_s16_p8_224 vit_tiny_r_s16_p8_384 vit_wee_patch16_reg1_gap_256 vit_xsmall_patch16_clip_224 vitamin_base_224 vitamin_large2_224 vitamin_large2_256 vitamin_large2_336 vitamin_large2_384 vitamin_large_224 vitamin_large_256 vitamin_large_336 vitamin_large_384 vitamin_small_224 vitamin_xlarge_256 vitamin_xlarge_336 vitamin_xlarge_384 vovnet39a vovnet57a wide_resnet101_2 wide_resnet50_2 xception41 xception41p xception65 xception65p xception71 xcit_large_24_p16_224 xcit_large_24_p16_384 xcit_large_24_p8_224 xcit_large_24_p8_384 xcit_medium_24_p16_224 xcit_medium_24_p16_384 xcit_medium_24_p8_224 xcit_medium_24_p8_384 xcit_nano_12_p16_224 xcit_nano_12_p16_384 xcit_nano_12_p8_224 xcit_nano_12_p8_384 xcit_small_12_p16_224 xcit_small_12_p16_384 xcit_small_12_p8_224 xcit_small_12_p8_384 xcit_small_24_p16_224 xcit_small_24_p16_384 xcit_small_24_p8_224 xcit_small_24_p8_384 xcit_tiny_12_p16_224 xcit_tiny_12_p16_384 xcit_tiny_12_p8_224 xcit_tiny_12_p8_384 xcit_tiny_24_p16_224 xcit_tiny_24_p16_384 xcit_tiny_24_p8_224 xcit_tiny_24_p8_384 torch.hub ultralytics/yolov5 yolov5n yolov5s yolov5m yolov5l yolov5x yolov5n6 yolov5s6 yolov5m6 yolov5l6 yolov5x6 mateuszbuda/brain-segmentation-pytorch unet Supported Layers Please refer to https://pytorch.org/docs/stable/ for how these functions are used. 
This documentation only lists which layers, functions, and tensor functionality are currently implemented within SOL.\nTorchScript (torch.jit.trace/torch.jit.script) aten::Bool aten::Float aten::Int aten::IntImplicit aten::ScalarImplicit aten::__and__ aten::__contains__ aten::__derive_index aten::__getitem__ aten::__is__ aten::__isnot__ aten::__not__ aten::__or__ aten::__range_length aten::_convolution aten::_set_item aten::_unique2 aten::abs aten::absolute aten::acos aten::acosh aten::adaptive_avg_pool1d aten::adaptive_avg_pool2d aten::adaptive_avg_pool3d aten::adaptive_max_pool1d aten::adaptive_max_pool2d aten::adaptive_max_pool3d aten::add aten::addbmm aten::addcdiv aten::addcmul aten::addmm aten::all aten::alpha_dropout aten::amax aten::amin aten::any aten::append aten::arange aten::arccos aten::arccosh aten::arcsin aten::arcsinh aten::arctan aten::arctanh aten::argmax aten::argmin aten::as_strided aten::as_tensor aten::asin aten::asinh aten::atan2 aten::atan aten::atanh aten::avg_pool1d aten::avg_pool2d aten::avg_pool3d aten::backward aten::baddbmm aten::batch_norm aten::bernoulli aten::binary_cross_entropy aten::binary_cross_entropy_with_logits aten::bitwise_and aten::bitwise_left_shift aten::bitwise_not aten::bitwise_or aten::bitwise_right_shift aten::bitwise_xor aten::bmm aten::broadcast_tensors aten::broadcast_to aten::bucketize aten::cat aten::ceil aten::celu aten::chunk aten::clamp aten::clamp_max aten::clamp_min aten::clone aten::complex aten::concat aten::constant_pad_nd aten::contiguous aten::conv1d aten::conv2d aten::conv3d aten::conv_transpose1d aten::conv_transpose2d aten::conv_transpose3d aten::copy aten::cos aten::cosh aten::cosine_embedding_loss aten::cross aten::cross_entropy_loss aten::cumsum aten::dequantize aten::detach aten::device aten::dict aten::dim aten::div aten::divide aten::dot aten::dropout aten::einsum aten::elu aten::embedding aten::empty aten::empty_like aten::eq aten::equal aten::erf aten::erfc aten::exp2 aten::exp aten::expand 
aten::expand_as aten::expm1 aten::extend aten::eye aten::fft_fft2 aten::fft_fft aten::fft_fftn aten::fft_hfft aten::fft_ifft2 aten::fft_ifft aten::fft_ifftn aten::fft_ihfft aten::fft_irfft2 aten::fft_irfft aten::fft_irfftn aten::fft_rfft2 aten::fft_rfft aten::fft_rfftn aten::fill aten::flatten aten::flip aten::floor aten::floor_divide aten::floordiv aten::fmod aten::format aten::frobenius_norm aten::full aten::full_like aten::gather aten::ge aten::gelu aten::grad aten::greater aten::greater_equal aten::group_norm aten::gru aten::gru_cell aten::gt aten::hardshrink aten::hardsigmoid aten::hardswish aten::hardtanh aten::hinge_embedding_loss aten::huber_loss aten::imag aten::index aten::index_add aten::index_put aten::index_put_ aten::index_select aten::instance_norm aten::is_autocast_enabled aten::is_floating_point aten::isfinite aten::isinf aten::isnan aten::items aten::kl_div aten::l1_loss aten::layer_norm aten::le aten::leaky_relu aten::len aten::lift_fresh aten::linalg_cross aten::linalg_matrix_norm aten::linalg_norm aten::linalg_vector_norm aten::linear aten::linspace aten::list aten::log10 aten::log1p aten::log2 aten::log aten::log_sigmoid aten::log_softmax aten::logaddexp2 aten::logaddexp aten::logical_and aten::logical_not aten::logical_or aten::logical_xor aten::lstm aten::lstm_cell aten::lt aten::mT aten::margin_ranking_loss aten::masked_fill aten::matmul aten::max aten::max_pool1d aten::max_pool1d_with_indices aten::max_pool2d aten::max_pool2d_with_indices aten::max_pool3d aten::max_pool3d_with_indices aten::max_unpool1d aten::max_unpool2d aten::max_unpool3d aten::maximum aten::mean aten::meshgrid aten::min aten::minimum aten::mm aten::mse_loss aten::mul aten::multilabel_margin_loss aten::multiply aten::nanmean aten::nansum aten::narrow aten::narrow_copy aten::ne aten::neg aten::negative aten::new_full aten::new_zeros aten::nll_loss_nd aten::norm aten::not_equal aten::nuclear_norm aten::numel aten::one_hot aten::ones aten::ones_like aten::pad 
aten::percentFormat aten::permute aten::pixel_shuffle aten::poisson_nll_loss aten::pow aten::prelu aten::prod aten::quantize_per_tensor aten::rand aten::rand_like aten::randint aten::randint_like aten::randn aten::randn_like aten::real aten::reciprocal aten::relu6 aten::relu aten::remainder aten::repeat aten::repeat_interleave aten::requires_grad_ aten::reshape aten::reshape_as aten::rms_norm aten::rnn_relu aten::rnn_relu_cell aten::rnn_tanh aten::rnn_tanh_cell aten::roll aten::round aten::rrelu aten::rsqrt aten::rsub aten::scaled_dot_product_attention aten::scatter aten::scatter_add aten::scatter_reduce aten::select aten::select_scatter aten::selu aten::sigmoid aten::sign aten::silu aten::sin aten::sinh aten::size aten::slice aten::smooth_l1_loss aten::soft_margin_loss aten::softmax aten::softmin aten::softplus aten::softshrink aten::split aten::split_with_sizes aten::sqrt aten::square aten::squeeze aten::stack aten::std aten::str aten::sub aten::sum aten::t aten::tan aten::tanh aten::tensor aten::tensor_split aten::tensordot aten::tile aten::to aten::to_mkldnn aten::topk aten::transpose aten::tril aten::triplet_margin_loss aten::triu aten::type_as aten::unbind aten::unflatten aten::uniform aten::unsqueeze aten::upsample_bicubic2d aten::upsample_bilinear2d aten::upsample_linear1d aten::upsample_nearest1d aten::upsample_nearest2d aten::upsample_nearest3d aten::upsample_trilinear3d aten::values aten::var aten::view aten::view_as_complex aten::view_as_real aten::warn aten::where aten::zero aten::zeros aten::zeros_like prim::CallFunction prim::CallMethod prim::Constant prim::ConstantMKLDNNTensor prim::CreateObject prim::DictConstruct prim::Enter prim::Exit prim::GetAttr prim::If prim::ListConstruct prim::ListIndex prim::ListUnpack prim::Loop prim::ModuleContainerIndex prim::NumToTensor prim::Print prim::PythonOp prim::RaiseException prim::SetAttr prim::TupleConstruct prim::TupleIndex prim::TupleUnpack prim::Uninitialized prim::device prim::dtype prim::grad 
prim::is_nested prim::isinstance prim::layout prim::max prim::min prim::type prim::unchecked_cast quantized::conv2d quantized::conv2d_relu torch.fx (torch.compile) _operator.add _operator.and_ _operator.div _operator.eq _operator.floordiv _operator.ge _operator.getitem _operator.gt _operator.iadd _operator.imul _operator.invert _operator.isub _operator.itruediv _operator.le _operator.lt _operator.matmul _operator.mod _operator.mul _operator.ne _operator.neg _operator.not_ _operator.or_ _operator.pow _operator.setitem _operator.sub _operator.truediv builtins.getattr einops.einops.rearrange einops.einops.reduce einops.einops.repeat math.ceil torch.Size torch.Tensor.__abs__ torch.Tensor.__and__ torch.Tensor.__eq__ torch.Tensor.__or__ torch.Tensor.abs torch.Tensor.absolute torch.Tensor.acos torch.Tensor.acosh torch.Tensor.add torch.Tensor.addbmm torch.Tensor.addcdiv torch.Tensor.addcmul torch.Tensor.addmm torch.Tensor.all torch.Tensor.amax torch.Tensor.amin torch.Tensor.any torch.Tensor.arccos torch.Tensor.arccosh torch.Tensor.arcsin torch.Tensor.arcsinh torch.Tensor.arctan torch.Tensor.arctanh torch.Tensor.argmax torch.Tensor.argmin torch.Tensor.as_strided torch.Tensor.asin torch.Tensor.asinh torch.Tensor.atan2 torch.Tensor.atan torch.Tensor.atanh torch.Tensor.baddbmm torch.Tensor.bernoulli torch.Tensor.bernoulli_ torch.Tensor.bfloat16 torch.Tensor.bitwise_and torch.Tensor.bitwise_left_shift torch.Tensor.bitwise_not torch.Tensor.bitwise_or torch.Tensor.bitwise_right_shift torch.Tensor.bitwise_xor torch.Tensor.bmm torch.Tensor.bool torch.Tensor.broadcast_to torch.Tensor.byte torch.Tensor.cdouble torch.Tensor.ceil torch.Tensor.cfloat torch.Tensor.char torch.Tensor.chunk torch.Tensor.clamp torch.Tensor.clamp_max torch.Tensor.clamp_min torch.Tensor.clip torch.Tensor.clone torch.Tensor.contiguous torch.Tensor.copy_ torch.Tensor.cos torch.Tensor.cosh torch.Tensor.cpu torch.Tensor.cuda torch.Tensor.cumsum torch.Tensor.detach torch.Tensor.div torch.Tensor.divide 
torch.Tensor.double torch.Tensor.eq torch.Tensor.equal torch.Tensor.erf torch.Tensor.erfc torch.Tensor.exp torch.Tensor.expand torch.Tensor.expand_as torch.Tensor.expm1 torch.Tensor.fill_ torch.Tensor.fill_diagonal_ torch.Tensor.flatten torch.Tensor.flip torch.Tensor.float torch.Tensor.floor torch.Tensor.fmax torch.Tensor.fmin torch.Tensor.fmod torch.Tensor.gather torch.Tensor.ge torch.Tensor.greater torch.Tensor.greater_equal torch.Tensor.gt torch.Tensor.half torch.Tensor.hardshrink torch.Tensor.imag torch.Tensor.index_add torch.Tensor.index_put torch.Tensor.index_put_ torch.Tensor.index_select torch.Tensor.int torch.Tensor.isfinite torch.Tensor.isinf torch.Tensor.isnan torch.Tensor.item torch.Tensor.le torch.Tensor.less torch.Tensor.less_equal torch.Tensor.log10 torch.Tensor.log1p torch.Tensor.log2 torch.Tensor.log torch.Tensor.logaddexp2 torch.Tensor.logaddexp torch.Tensor.logical_and torch.Tensor.logical_not torch.Tensor.logical_or torch.Tensor.logical_xor torch.Tensor.long torch.Tensor.lt torch.Tensor.masked_fill torch.Tensor.matmul torch.Tensor.max torch.Tensor.maximum torch.Tensor.mean torch.Tensor.min torch.Tensor.minimum torch.Tensor.mm torch.Tensor.mul torch.Tensor.multiply torch.Tensor.nanmean torch.Tensor.nansum torch.Tensor.narrow torch.Tensor.ne torch.Tensor.neg torch.Tensor.negative torch.Tensor.new_empty torch.Tensor.new_full torch.Tensor.new_ones torch.Tensor.new_tensor torch.Tensor.new_zeros torch.Tensor.nonzero torch.Tensor.norm torch.Tensor.not_equal torch.Tensor.numel torch.Tensor.permute torch.Tensor.pow torch.Tensor.prod torch.Tensor.real torch.Tensor.reciprocal torch.Tensor.repeat torch.Tensor.repeat_interleave torch.Tensor.reshape torch.Tensor.reshape_as torch.Tensor.roll torch.Tensor.round torch.Tensor.rsqrt torch.Tensor.scatter torch.Tensor.scatter_add torch.Tensor.scatter_reduce torch.Tensor.select_scatter torch.Tensor.short torch.Tensor.sigmoid torch.Tensor.sign torch.Tensor.sin torch.Tensor.sinh torch.Tensor.size torch.Tensor.softmax 
torch.Tensor.softmin torch.Tensor.sort torch.Tensor.split torch.Tensor.sqrt torch.Tensor.square torch.Tensor.squeeze torch.Tensor.std torch.Tensor.sub torch.Tensor.subtract torch.Tensor.sum torch.Tensor.t torch.Tensor.tan torch.Tensor.tanh torch.Tensor.tensor_split torch.Tensor.tile torch.Tensor.to torch.Tensor.topk torch.Tensor.transpose torch.Tensor.tril torch.Tensor.triu torch.Tensor.true_divide torch.Tensor.type torch.Tensor.type_as torch.Tensor.unbind torch.Tensor.unflatten torch.Tensor.uniform torch.Tensor.unique torch.Tensor.unique_consecutive torch.Tensor.unsqueeze torch.Tensor.var torch.Tensor.view torch.Tensor.view_as torch.Tensor.where torch.Tensor.zero_ torch._C._autograd._get_data_attr torch._C._autograd._saved_tensors_hooks_disable torch._C._autograd._saved_tensors_hooks_enable torch._C._fft.fft_fft2 torch._C._fft.fft_fft torch._C._fft.fft_fftn torch._C._fft.fft_hfft2 torch._C._fft.fft_hfft torch._C._fft.fft_hfftn torch._C._fft.fft_ifft2 torch._C._fft.fft_ifft torch._C._fft.fft_ifftn torch._C._fft.fft_ihfft2 torch._C._fft.fft_ihfft torch._C._fft.fft_ihfftn torch._C._fft.fft_irfft2 torch._C._fft.fft_irfft torch._C._fft.fft_irfftn torch._C._fft.fft_rfft2 torch._C._fft.fft_rfft torch._C._fft.fft_rfftn torch._C._functorch._add_batch_dim torch._C._functorch._remove_batch_dim torch._C._functorch._vmap_decrement_nesting torch._C._functorch._vmap_increment_nesting torch._C._linalg.linalg_cross torch._C._linalg.linalg_matrix_norm torch._C._linalg.linalg_norm torch._C._linalg.linalg_vector_norm torch._C._log_api_usage_once torch._C._nn.avg_pool2d torch._C._nn.avg_pool3d torch._C._nn.gelu torch._C._nn.linear torch._C._nn.log_sigmoid torch._C._nn.one_hot torch._C._nn.pad torch._C._nn.scaled_dot_product_attention torch._C._nn.softplus torch._C._nn.softshrink torch._C._set_grad_enabled torch._assert torch._check_is_size torch._dynamo.utils.wrapped_sqrt torch._functorch.autograd_function.autograd_function_apply torch._functorch.predispatch._add_batch_dim 
torch._functorch.predispatch._remove_batch_dim torch._functorch.predispatch._vmap_decrement_nesting torch._functorch.predispatch._vmap_increment_nesting torch._functorch.predispatch.lazy_load_decompositions torch._functorch.vmap.lazy_load_decompositions torch._ops._C.fused_add_rms_norm torch._ops._C.rms_norm torch._ops._C.rotary_embedding torch._ops._C.silu_and_mul torch._ops._c10d_functional.all_gather_into_tensor torch._ops._c10d_functional.all_reduce torch._ops._c10d_functional.wait_tensor torch._ops.aten._assert_scalar.default torch._ops.aten.sym_size.int torch._ops.sol.custom torch._refs.tensor torch.abs torch.acos torch.acosh torch.adaptive_avg_pool1d torch.adaptive_avg_pool2d torch.adaptive_avg_pool3d torch.add torch.addbmm torch.addcdiv torch.addcmul torch.addmm torch.amax torch.amin torch.amp.autocast_mode._enter_autocast torch.amp.autocast_mode._exit_autocast torch.any torch.arange torch.argmax torch.argmin torch.as_strided torch.as_tensor torch.asin torch.asinh torch.atan2 torch.atan torch.atanh torch.autograd.function.FunctionCtx torch.avg_pool1d torch.avg_pool2d torch.avg_pool3d torch.baddbmm torch.bernoulli torch.bitwise_and torch.bitwise_left_shift torch.bitwise_not torch.bitwise_or torch.bitwise_right_shift torch.bitwise_xor torch.bmm torch.broadcast_to torch.bucketize torch.cat torch.ceil torch.chunk torch.clamp torch.clamp_max torch.clamp_min torch.clip torch.clone torch.complex torch.concat torch.concatenate torch.conv1d torch.conv2d torch.conv3d torch.conv_transpose1d torch.conv_transpose2d torch.conv_transpose3d torch.cos torch.cosh torch.cross torch.cumsum torch.div torch.divide torch.dot torch.empty torch.empty_like torch.eq torch.equal torch.erf torch.erfc torch.exp torch.expm1 torch.eye torch.flatten torch.flip torch.floor torch.floor_divide torch.full torch.full_like torch.functional.einsum torch.functional.meshgrid torch.functional.norm torch.functional.split torch.functional.tensordot torch.functional.unique 
torch.functional.unique_consecutive torch.gather torch.ge torch.getitem torch.greater torch.greater_equal torch.gt torch.hardshrink torch.imag torch.index_put torch.index_select torch.isfinite torch.isinf torch.isnan torch.le torch.less torch.less_equal torch.linspace torch.log10 torch.log1p torch.log2 torch.log torch.log_softmax torch.logaddexp2 torch.logaddexp torch.logical_and torch.logical_not torch.logical_or torch.logical_xor torch.lstm torch.lstm_cell torch.lt torch.masked_fill torch.matmul torch.max torch.maximum torch.mean torch.min torch.minimum torch.mm torch.mul torch.multiply torch.nanmean torch.nansum torch.ne torch.neg torch.negative torch.nn.functional.adaptive_avg_pool1d torch.nn.functional.adaptive_avg_pool2d torch.nn.functional.adaptive_avg_pool3d torch.nn.functional.adaptive_max_pool1d torch.nn.functional.adaptive_max_pool2d torch.nn.functional.adaptive_max_pool3d torch.nn.functional.alpha_dropout torch.nn.functional.batch_norm torch.nn.functional.binary_cross_entropy torch.nn.functional.binary_cross_entropy_with_logits torch.nn.functional.celu torch.nn.functional.cosine_embedding_loss torch.nn.functional.cross_entropy torch.nn.functional.dropout torch.nn.functional.elu torch.nn.functional.embedding torch.nn.functional.gaussian_nll_loss torch.nn.functional.glu torch.nn.functional.group_norm torch.nn.functional.hardsigmoid torch.nn.functional.hardswish torch.nn.functional.hardtanh torch.nn.functional.hinge_embedding_loss torch.nn.functional.huber_loss torch.nn.functional.instance_norm torch.nn.functional.interpolate torch.nn.functional.kl_div torch.nn.functional.l1_loss torch.nn.functional.layer_norm torch.nn.functional.leaky_relu torch.nn.functional.local_response_norm torch.nn.functional.log_softmax torch.nn.functional.lp_pool1d torch.nn.functional.lp_pool2d torch.nn.functional.lp_pool3d torch.nn.functional.margin_ranking_loss torch.nn.functional.max_pool1d torch.nn.functional.max_pool2d torch.nn.functional.max_pool2d_with_indices 
torch.nn.functional.max_pool3d torch.nn.functional.max_pool3d_with_indices torch.nn.functional.max_unpool1d torch.nn.functional.max_unpool2d torch.nn.functional.max_unpool3d torch.nn.functional.mse_loss torch.nn.functional.multilabel_margin_loss torch.nn.functional.multilabel_soft_margin_loss torch.nn.functional.nll_loss torch.nn.functional.normalize torch.nn.functional.pad torch.nn.functional.poisson_nll_loss torch.nn.functional.relu6 torch.nn.functional.relu torch.nn.functional.rrelu torch.nn.functional.selu torch.nn.functional.silu torch.nn.functional.smooth_l1_loss torch.nn.functional.soft_margin_loss torch.nn.functional.softmax torch.nn.functional.softmin torch.nn.functional.softsign torch.nn.functional.tanh torch.nn.functional.tanhshrink torch.nn.functional.triplet_margin_loss torch.nn.functional.triplet_margin_with_distance_loss torch.nonzero torch.norm torch.numel torch.ones torch.ones_like torch.ops.higher_order.autograd_function_apply torch.ops.higher_order.tag_activation_checkpoint torch.outer torch.permute torch.pixel_shuffle torch.pow torch.prelu torch.prod torch.quantize_per_tensor torch.rand torch.rand_like torch.randint torch.randint_like torch.real torch.relu torch.repeat_interleave torch.reshape torch.rms_norm torch.rnn_relu torch.rnn_relu_cell torch.rnn_tanh torch.rnn_tanh_cell torch.roll torch.round torch.rsqrt torch.scalar_tensor torch.scatter torch.scatter_add torch.scatter_reduce torch.select_scatter torch.sigmoid torch.sign torch.sin torch.sinh torch.softmax torch.sort torch.sqrt torch.square torch.squeeze torch.stack torch.std torch.sub torch.sum torch.swapaxes torch.sym_int torch.sym_max torch.sym_min torch.sym_sum torch.tan torch.tanh torch.tensor torch.tensor_split torch.tile torch.topk torch.transpose torch.tril torch.triu torch.true_divide torch.truediv torch.unbind torch.unflatten torch.unsqueeze torch.var torch.view_as_complex torch.view_as_real torch.where torch.zeros torch.zeros_like "
},
{
	"uri": "/frameworks/tensorflow.html",
	"title": "TensorFlow",
	"tags": [],
	"description": "",
	"content": "SOL\u0026rsquo;s TensorFlow integration supports to translate tf.Function, tf.Module, Keras and tf.saved_model models into SOL models. If your tf.saved_model has multiple signatures, you need to select the preferred one using sol.optimize(my_saved_model.signatures['my_signature']). By default SOL uses the tf.saved_model.__call__ function.\nimport tensorflow as tf import sol import tensorflow.keras as keras def AlexNet(input_shape=(224, 224, 3), format=\u0026#34;channels_last\u0026#34;): inputs = keras.Input(shape=(input_shape)) x = inputs x = keras.layers.Conv2D\t(input_shape=input_shape, filters=64, kernel_size=(11,11), strides=(4,4), padding=\u0026#39;same\u0026#39;, activation=\u0026#39;relu\u0026#39;, data_format=format)(x) x = keras.layers.MaxPooling2D\t(pool_size=3, strides=2, padding=\u0026#39;valid\u0026#39;, data_format=format)(x) x = keras.layers.Conv2D\t(filters=192, kernel_size=5, strides=1, padding=\u0026#39;same\u0026#39;, activation=\u0026#39;relu\u0026#39;, data_format=format)(x) x = keras.layers.MaxPooling2D\t(pool_size=3, strides=2, padding=\u0026#34;valid\u0026#34;, data_format=format)(x) x = keras.layers.Conv2D\t(filters=384, kernel_size=3, strides=1, padding=\u0026#34;same\u0026#34;, activation=\u0026#39;relu\u0026#39;, data_format=format)(x) x = keras.layers.Conv2D\t(filters=256, kernel_size=3, strides=1, padding=\u0026#34;same\u0026#34;, activation=\u0026#39;relu\u0026#39;, data_format=format)(x) x = keras.layers.Conv2D\t(filters=256, kernel_size=3, strides=1, padding=\u0026#34;same\u0026#34;, activation=\u0026#39;relu\u0026#39;, data_format=format)(x) x = keras.layers.MaxPooling2D\t(pool_size=3, strides=2, padding=\u0026#34;valid\u0026#34;, data_format=format)(x) x = keras.layers.Flatten\t(data_format=format)(x) x = keras.layers.Dropout\t(rate=0.5)(x) x = keras.layers.Dense\t(4096, input_shape=(256*6*6,), activation=\u0026#39;relu\u0026#39;)(x) x = keras.layers.Dropout\t(rate=0.5)(x) x = keras.layers.Dense\t(4096, 
activation=\u0026#34;relu\u0026#34;)(x) x = keras.layers.Dense\t(1000)(x) return keras.models.Model\t(inputs=inputs, outputs=x) @tf.function(input_signature=[tf.TensorSpec([None, 224, 224, 3], dtype=tf.float32)]) def tf_function(input): return ... class TFModule(tf.Module): def __init__(self): super().__init__() self.var = tf.Variable(...) @tf.function(input_signature=[tf.TensorSpec([None, 224, 224, 3], dtype=tf.float32)]) def __call__(self, input): return ... with tf.device(\u0026#39;/CPU:0\u0026#39;): sol_model = sol.optimize(AlexNet(), batch_size=1) # or sol_model = sol.optimize(tf_function) # or sol_model = sol.optimize(TFModule()) # or sol_model = sol.optimize(tf.saved_model.load(\u0026#34;/path/to/saved/model\u0026#34;)) # Inference output = sol_model(inputs) # Training for Keras Models sol_model.compile(...) sol_model.fit(inputs, targets) # Training for tf.Function and tf.Module # TODO: Since SOL v0.5.3 we have integrated SOL more tightly into Keras\u0026rsquo;s Model.compile(...) function. It\u0026rsquo;s still experimental and might not work in all situations.\nimport sol.tensorflow # required to enable the modifications to Keras model = init_your_model() model.compile(optimizer, loss, sol_compile=True) # use sol_vdims=[...] to set the vdims argument of sol.optimize(..., vdims=sol_vdims). model(input_data) # runs using SOL F.A.Q. What are the best configurations for using SOL with TensorFlow? If you are using SOL with TensorFlow on x86, you should set the following env vars:\nOMP_NUM_THREADS=$(lscpu -b -p=Core,Socket | grep -v \u0026#39;^#\u0026#39; | sort -u | wc -l) OMP_PROC_BIND=TRUE TF_NUM_INTEROP_THREADS=1 How can I specify that a model input should return a gradient? By default, inputs do not receive gradients. 
If you want to override this behavior, use\nsol_model = sol.optimize(model, requires_grad={\u0026#34;input_1\u0026#34;, \u0026#34;whatever\u0026#34;}) All inputs whose names are within the set will return a gradient.\nHow can I override the model input shapes? By default, all inputs use the input shapes defined in the model. If you want to override this behavior, use\nsol_model = sol.optimize(model, shapes={\u0026#34;input_1\u0026#34;: [1, 2, 3], \u0026#34;whatever\u0026#34;: [77, 3, 5]}) Be aware that the overridden shapes need to be valid for the model; otherwise compilation will fail.\nHow can I update/downgrade to another TensorFlow version? Before switching versions, please check the compatibility list to see whether your TensorFlow version is supported by SOL. If it is, you can just use pip3 install tensorflow~={VERSION}.\nHow do I store/load a TensorFlow Keras model? SOL models cannot be stored directly. To store/load a SOL Keras model, use the model.save_weights(...) and model.load_weights(...) methods. # Storing sol_model = sol.optimize(keras_model) sol_model.save_weights(checkpoint_path) # Loading sol_model = sol.optimize(keras_model) sol_model.load_weights(checkpoint_path) More information on loading/storing the weights can be found here Which activations/recurrent_activations are supported by RNN layers? SOL currently supports [None, \u0026lsquo;linear\u0026rsquo;, \u0026lsquo;tanh\u0026rsquo;, \u0026lsquo;sigmoid\u0026rsquo;, \u0026lsquo;relu\u0026rsquo;]. 
If you need another RNN activation function, please get in contact with us.\nTested Models keras.applications (v3.11.3) ConvNeXtBase ConvNeXtLarge ConvNeXtSmall ConvNeXtTiny ConvNeXtXLarge DenseNet121 DenseNet169 DenseNet201 EfficientNetB0 EfficientNetB1 EfficientNetB2 EfficientNetB3 EfficientNetB4 EfficientNetB5 EfficientNetB6 EfficientNetB7 EfficientNetV2B0 EfficientNetV2B1 EfficientNetV2B2 EfficientNetV2B3 EfficientNetV2L EfficientNetV2M EfficientNetV2S InceptionResNetV2 InceptionV3 MobileNet MobileNetV2 MobileNetV3Large MobileNetV3Small NASNetLarge NASNetMobile RegNetX002 RegNetX004 RegNetX006 RegNetX008 RegNetX016 RegNetX032 RegNetX040 RegNetX064 RegNetX080 RegNetX120 RegNetX160 RegNetX320 RegNetY002 RegNetY004 RegNetY006 RegNetY008 RegNetY016 RegNetY032 RegNetY040 RegNetY064 RegNetY080 RegNetY120 RegNetY160 RegNetY320 ResNet101 ResNet101V2 ResNet152 ResNet152V2 ResNet50 ResNet50V2 ResNetRS101 ResNetRS152 ResNetRS200 ResNetRS270 ResNetRS350 ResNetRS420 ResNetRS50 VGG16 VGG19 Xception Supported Layers Please refer to https://www.tensorflow.org/api/stable for how these functions are used. 
This documentation only contains which layers, functions and tensor functionality is currently implemented within SOL.\nKeras Layers Keras layers not listed are parsed using the TensorFlow parser.\nActivation AlphaDropout BatchNormalization Bidirectional Conv1D Conv1DTranspose Conv2D Conv2DTranspose Conv3D Conv3DTranspose Dense DepthwiseConv1D DepthwiseConv2D Dropout GRU InputLayer LSTM SeparableConv1D SeparableConv2D SimpleRNN Softmax Concatenate TensorFlow Layers Abs Acos Acosh AddN AddV2 All Any ArgMax ArgMin Asin Asinh AssignSubVariableOp AssignVariableOp Atan2 Atan Atanh AvgPool3D AvgPool BatchMatMulV2 BiasAdd Cast Ceil ConcatV2 Const Conv1D Conv2D Conv2DBackpropInput Conv3D Cos Cosh Cumsum DepthwiseConv2dNative DivNoNan Einsum Elu Equal Erf Erfc Exp ExpandDims Expm1 Fill Floor FloorDiv FloorMod FusedBatchNormV3 GatherV2 Greater GreaterEqual Identity IdentityN IsFinite IsInf IsNan LeakyRelu Less LessEqual Log1p Log LogSoftmax LogicalAnd LogicalNot LogicalOr MatMul Max MaxPool3D MaxPool MaxPoolWithArgmax Maximum Mean Min Minimum Mul Neg NoOp NotEqual Pack Pad PadV2 PartitionedCall Placeholder Pow Prod RandomUniform RandomUniformInt Range ReadVariableOp RealDiv Reciprocal Relu6 Relu Reshape ResizeArea ResizeBicubic ResizeBilinear ResizeNearestNeighbor ResourceGather ReverseV2 Round Rsqrt Select SelectV2 Selu Shape Sigmoid Sign Sin Sinh Softmax Softplus Softsign Split SplitV Sqrt Square SquaredDifference Squeeze StatefulPartitionedCall StatelessRandomGetKeyCounter StatelessRandomUniformIntV2 StatelessRandomUniformV2 StatelessWhile StopGradient StridedSlice Sub Sum Tan Tanh TensorListFromTensor TensorListGetItem TensorListReserve TensorListSetItem TensorListStack Tile Transpose Unpack Where While Xdivy Xlog1py Xlogy ZerosLike "
},
{
	"uri": "/releases/v0.1.html",
	"title": "v0.1 SOL",
	"tags": [],
	"description": "",
	"content": " VersionDateChanges v0.1.8.228.04.2020 Minor maintenance release. Fixes linking problem with libnfort_m.so.2. v0.1.8.123.04.2020 Supports: PyTorch 1.4.0 This is a maintenance release, linked against newer VEOS libraries (2.4.2). You only need to update if you encounter Abort (core dump), Illegal Instruction (core dump) or similar errors when running SOL with newer VEOS versions. We will have a new release soon, with support to run SOL on multiple VEs in parallel and PyTorch v1.5.0 support. v0.1.827.01.2020 Supports: PyTorch 1.4.0 Fixed \"X86 requires sol.backends.ispc!\" as reported by @malon Fixed ## WARNING ##: This version of SOL has been linked against PyTorch v1.4.0+cpu but you are using v1.4.0. It's not recommended to use varying versions! as reported by @malon. Fixed limitation to VE#0 in Native Tensors mode as proposed by @efocht. Use the VE_NODE_NUMBER env var to set the VE you want to run on. Minor performance improvements for Inference mode. v0.1.724.01.2020 Supports: PyTorch 1.4.0 Lots of performance improvements, especially for inference (BatchSize \u003c 8) Native Tensor Support for PyTorch: This allows you to use Aurora Tensors within PyTorch! I didn't have time to update the documentation yet, but here is an example: import torch import sol.pytorch input = torch.rand(1, 3, 224, 224) py_model = ... sol_model = sol.optimize(py_model, input.size()) sol_model.load_state_dict(py_model) # sol.device.set(sol.device.ve, 0) # no longer needed sol_model.to(\u0026#34;hip\u0026#34;) # copy model to device input = input.to(\u0026#34;hip\u0026#34;) # copy input to device sol_model(input) torch.hip.synchronize() So in principle it works as with CUDA, but you need to use \u0026ldquo;hip\u0026rdquo; instead of cuda. 
The other method with only using sol.device.set(sol.device.ve, 0) still works and will be further supported, but it has performance drawbacks for training compared to the native tensor implementation.\nLimitation: you can only use VE#0 with this method only L1Loss is implemented yet. Please let me know if you use other loss functions. you can use print() and some other basic functions on the Aurora tensors, but most functionality is not implemented. If you want to do computations on the data outside of the SOL optimized model, you need to copy the tensor back to the CPU via output = output.cpu() v0.1.609.12.2019 Supports: PyTorch 1.3.1 SOL will warn you if you are trying to use an unsupported framework version (e.g. trying to run PyTorch 1.0) Added cache, to not recompile the network every time SOL is run. In case you want to explicitly recompile, delete the folder \".sol\" or use \"sol.cache.clear()\" before calling \"sol.optimize(...)\" or \"sol.deploy(...)\" Bugfixes for NHWC input data format. Updated docs to explain more details about \"sol.deploy(...)\" v0.1.502.12.2019 Supports: PyTorch 1.3.1 Preliminary deployment support, look at \"sol.deploy(...)\" in the documentation "
},
{
	"uri": "/advanced/performance.html",
	"title": "Performance/Determinism",
	"tags": [],
	"description": "",
	"content": "Since v0.6 SOL obeys framework-specific determinism specifiers. Please refer to the framework-specific options (e.g. PyTorch)\ntl;dr; For deterministic results use:\nsol.config[\u0026#39;autotune\u0026#39;] = False For best performance use:\nsol.config[\u0026#39;autotune\u0026#39;] = True SOL In v0.6 SOL introduced Determinism flags. These allow you to change the numerical behavior of SOL. By default, SOL obeys the numerical behavior of the AI framework. See below for information about PyTorch and TensorFlow.\nIf you want to modify the determinism yourself, you can pass a set, list or tuple consisting of sol.Determinism to sol.optimize(..., determinism=...) or torch.compile(..., backend='sol', determinism=...).\nIt\u0026rsquo;s important that you ALWAYS select one Framework type! This can influence whether certain options are obeyed or not. Not all device types support all options. Unavailable options get ignored.\nCurrently supported options:\nOption Effect Framework_PyTorch Sets PyTorch mode. Framework_TensorFlow Sets TensorFlow mode. Framework_Numpy Sets Numpy mode. Framework_ONNX Sets ONNX mode. GEMM_TF32 Enables the use of TF32 as a replacement for FP32 in GEMMs. GEMM_FP16 Enables the use of FP16 accumulators in FP16 GEMMs. GEMM_BF16 Enables the use of BF16 accumulators in BF16 GEMMs. GEMM_BF16_2x RESERVED GEMM_NoReorder Prevents SOL from changing the transposition of GEMM inputs/outputs. Conv_Benchmark Enables Conv benchmarking. Conv_TF32 Enables TF32 as a replacement for FP32 in Conv. Conv_Nondeterministic Enables non-deterministic Conv implementations. Rand_Fastest Allows backends to choose faster rand algorithms. 
PyTorch For deterministic results use:\ntorch.backends.cudnn.benchmark = False torch.backends.cudnn.deterministic = True torch.backends.cudnn.allow_tf32 = False torch.set_float32_matmul_precision(\u0026#34;highest\u0026#34;) torch.backends.cuda.matmul.allow_tf32 = False torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False For best performance use:\ntorch.backends.cudnn.benchmark = True torch.backends.cudnn.deterministic = False torch.backends.cudnn.allow_tf32 = True torch.set_float32_matmul_precision(\u0026#34;medium\u0026#34;) torch.backends.cuda.matmul.allow_tf32 = True torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = True If you want to use PyTorch\u0026rsquo;s default determinism flags, but add your own, you can use: sol.optimize(..., determinism=sol.pytorch.determinism(sol.Determinism.Rand_Fastest, sol.Determinism.Conv_TF32, ...)).\nMore information about PyTorch\u0026rsquo;s numerical accuracy options can be found here.\nTensorFlow SOL obeys the tf.config.experimental.enable_tensor_float32_execution option.\n"
},
{
	"uri": "/tutorials/preparation.html",
	"title": "Preparation",
	"tags": [],
	"description": "",
	"content": "Dependencies This tutorial uses torch, numpy and torchvision models as examples to demonstrate how to use SOL. These can be installed via:\npip install torch torchvision numpy If you want to use the TensorFlow or ONNX examples instead, they can be installed accordingly:\npip install tensorflow pip install onnx Creating a Model SOL does not provide an interface to create models from scratch. So you have to create them in an existing framework before you import them into SOL. There are multiple ways to do so. You can create your models by hand, load them from a model zoo or load your own saved models. The way in which a model was created is not important to SOL. It can read any valid model regardless of its creation process. Here are a few examples to get you started:\nPyTorch In PyTorch you can create a model by hand with torch.nn.Module, you can load a pretrained model from a model zoo or one you saved earlier.\nHere is a simple example of creating or loading a model in torch:\nimport torch import torch.nn as nn import torchvision.models as models class SimpleNN(nn.Module): def __init__(self): super(SimpleNN, self).__init__() self.fc1 = nn.Linear(10, 50) self.fc2 = nn.Linear(50, 1) self.relu = nn.ReLU() def forward(self, x): x = self.relu(self.fc1(x)) x = self.fc2(x) return x # Create an instance of the model model = SimpleNN() Alternatively you can load a pretrained model, e.g. 
from torchvision:\nimport torch import torchvision.models as models model = models.resnet18(pretrained=True) TensorFlow The model can also be created in TensorFlow in a similar way.\nimport tensorflow as tf from tensorflow import keras from tensorflow.keras import layers model = keras.Sequential([ layers.Dense(64, activation=\u0026#39;relu\u0026#39;, input_shape=(10,)), layers.Dense(32, activation=\u0026#39;relu\u0026#39;), layers.Dense(1, activation=\u0026#39;sigmoid\u0026#39;) ]) # Compile the model model.compile(optimizer=\u0026#39;adam\u0026#39;, loss=\u0026#39;binary_crossentropy\u0026#39;, metrics=[\u0026#39;accuracy\u0026#39;]) Alternatively you can load a pretrained model, e.g. from keras.applications:\nimport tensorflow as tf model = tf.keras.applications.MobileNetV2(weights=\u0026#39;imagenet\u0026#39;) ONNX In ONNX you can load a pretrained model, e.g. from onnx.hub:\nfrom onnx import hub model = hub.load(\u0026#34;resnet50\u0026#34;) "
},
{
	"uri": "/advanced/profiler.html",
	"title": "Profiler",
	"tags": [],
	"description": "",
	"content": "SOL comes with its own profiler to record performance and memory information.\ntl;dr; SOL_PROFILE=TRUE SOL_CLEAR_CACHE=TRUE python3 ... Output Modes SOL supports four output modes.\nTRUE: Prints a formatted table directly in the console. CSV/TSV: Prints a CSV or TSV formatted table in the console. TENSORBOARD: Outputs the profiling to TensorBoard (experimental). Performance and/or Memory Profiling SOL can measure the performance and memory consumption throughout the execution. By default, both get measured. You can set SOL_PROFILE={MODE}:{METRICS}, where {METRICS} can be ALL, PERFORMANCE or MEMORY.\nRedirect to file To redirect the output to a file, or to specify the TensorBoard output file, you can set SOL_PROFILE={MODE}:{METRICS}:{FILENAME}.\nAnnotating your own code with SOL profiler API calls In Python you can use:\nwith sol.profiler(sol.profiler.{TYPE}): ... In C/C++ you can use:\nsol_profiler_push(SOL_PROFILER_{TYPE}); ... sol_profiler_pop(); SOL distinguishes the following profiling types:\nParser: any code related to parsing models Compiler: any code related to compiling Runtime: any code related to SOL\u0026rsquo;s runtime system Control: code that executes the control plane of SOL Device: code that executes computations on the devices (e.g. GPU kernels) H2D: time spent copying data from host to device D2H: time spent copying data from device to host Extern: any external profiling API calls Profiler Sections If you want to group your code into sections, e.g., \u0026ldquo;preprocessing\u0026rdquo;, \u0026ldquo;training\u0026rdquo;, \u0026hellip; you can use sol.profiler.section(\u0026quot;SECTION_NAME\u0026quot;, sol.profiler.{TYPE}, sol.profiler.{TYPE}) to define a section name, and to limit the profiler types that shall be recorded for this section. 
If you call the function a second time, it will create a new section.\nFramework Specific Profiler Integration PyTorch (Experimental) For PyTorch, SOL supports extracting PyTorch profiler calls into SOL\u0026rsquo;s profiler output. You can use:\nwith sol.pytorch.profile(*args, **kwargs): ... args and kwargs get redirected to torch.profiler.profile(*args, **kwargs). When the with block terminates, SOL will catch all profiling events from PyTorch and add them to SOL\u0026rsquo;s profiler API.\n"
},
{
	"uri": "/releases.html",
	"title": "Releases",
	"tags": [],
	"description": "",
	"content": "Redirecting to newest release. Click here if redirection does not work.\n"
},
{
	"uri": "/index.html",
	"title": "SOL",
	"tags": [],
	"description": "",
	"content": "SOL: AI Compiler Unlock the potential of AI - solve hardware and software bottlenecks with SOL! What is SOL? SOL is an AI compiler platform that accelerates and optimizes AI workloads. It serves as a replacement for other compilers, boosting performance and reducing memory and compute overhead without requiring hardware or software changes. Why SOL? SOL bridges the gap between advanced hardware and AI frameworks, offering seamless integration with existing AI infrastructure, high optimization and excellent hardware and software compatibility, helping transform your hardware into AI-ready systems. Why is it better than other compilers? SOL has a faster, more efficient auto-tuning phase than Apache TVM and can outperform other commonly used compilers. It maintains high accuracy without using quantization and delivers mathematically equivalent results through smart optimization. With which AI frameworks is SOL compatible? SOL supports major AI frameworks such as NumPy, TensorFlow and PyTorch, thanks to its framework-agnostic design. Further, SOL supports loading ONNX models and retraining them in TensorFlow or PyTorch. It also allows cross-framework execution and supports training, inference and deployment without bloated dependencies. Which hardware does SOL support? SOL works with a variety of hardware platforms, offering native, hybrid and offload AI execution modes. Its hardware-agnostic design helps ensure broad compatibility and easy extensibility. New hardware can be supported by adding hardware plugins for SOL. How do I get started using SOL? If you're an AI engineer, you can start by adding SOL as a package in your Python code. If you’re a hardware producer, you can start using SOL by substituting it for your current compiler. To find out more or start using SOL, reach out to us via info@neclab.eu to schedule a SOL introduction call.\n"
},
{
	"uri": "/devices/x86.html",
	"title": "X86 CPUs",
	"tags": [],
	"description": "",
	"content": "When using SOL on X86 CPUs we strongly recommend setting the following OMP env var.\nexport OMP_PROC_BIND=TRUE Env Vars EnvVar Default Description CC \u0026ldquo;gcc\u0026rdquo; Path to gcc CXX \u0026ldquo;g++\u0026rdquo; Path to g++ COBJCOPY \u0026ldquo;objcopy\u0026rdquo; Path to objcopy CAR \u0026ldquo;ar\u0026rdquo; Path to ar CPATH Used as include paths C_INCLUDE_PATH Used as include paths CPLUS_INCLUDE_PATH Used as include paths LIBRARY_PATH Used as library paths Further see OpenMP Environment Variables.\nFAQ SOL performance using X86 is really really bad, what can I do?\rtl;dr; run once: nec-sol fix-omp -y\nUnfortunately some Python packages (e.g., torch or sklearn) ship their own version of libgomp.so. This causes the application to use multiple OpenMP runtimes, which can cause a variety of problems. If you don\u0026rsquo;t define OMP_NUM_THREADS, then only the very first loaded OpenMP runtime runs multi-threaded. Even worse, if you define OMP_NUM_THREADS, all runtimes run in parallel, but use different threadpools, so whenever you switch from one to the other, you will have significant overhead due to swapping the threads. The nec-sol fix-omp -y command detects conflicting versions of libgomp.so and replaces them with a symlink to the system\u0026rsquo;s own version of libgomp.so, so that all libraries use the same threadpool.\nIf you need to run this fix within your script, because you are using a fully automated process, then you can use:\nimport sol.bugfixes sol.bugfixes.omp(True)\r"
},
{
	"uri": "/tutorials/basics.html",
	"title": "Basics",
	"tags": [],
	"description": "",
	"content": "Domain experts frequently use machine learning in their fields without needing in-depth knowledge of a computer’s inner workings. Rewriting their scripts for performance gains is challenging and typically requires expertise different from that of the original author. SOL is designed for these domain experts and aims to optimize machine learning models without the need to understand the underlying hardware. To this end SOL is designed with two main principles in mind:\nEase of use: You do not need to select any SOL specific parameters. All information regarding execution parameters is read directly from the model.\nDrop-in replacement: You do not have to rewrite any of your code. Optimize your model with SOL and use the generated model in place of your old one.\nTo optimize a model with SOL you just need to add\nimport sol to your imports and call\noptimized_model = sol.optimize(model) on your framework model.\nAnd that\u0026rsquo;s it, you now have an optimized version of your previous model!\nAccording to the design goal of drop-in replacement, sol.optimize creates a model of the same type as its input. This means that type(optimized_model) will be equal to type(model), for example an nn.Module for torch or a tf.keras.Model for TensorFlow. Any custom functions you have defined for your model, e.g., model.do_what_I_want(...), are also preserved (but not optimized by SOL). As a result, you do not need to change anything else in your codebase and can just replace the old model with the optimized one.\nSOL uses JIT (just-in-time) compilation for its final result. This means that compilation is not triggered in sol.optimize(), but when the model is actually called for the first time.\nIf you already use torch.compile in your project, adding SOL is even easier. 
You just have to add backend=\u0026quot;sol\u0026quot; to your torch.compile() call.\nimport torch import torchvision.models as models import sol.pytorch # not needed for torch \u0026gt;= 2.6.0 model = models.resnet18(pretrained=False) optimized_model = torch.compile(model, backend=\u0026#34;sol\u0026#34;) If you are using a PyTorch version older than 2.6, you also need to add import sol.pytorch. For version 2.6 and later, SOL is automatically detected and added to the valid backends during installation.\nRunning Inference The optimized model behaves exactly the same as the framework model beforehand. To use a SOL model you just replace your old torch model with the optimized one like this:\nimport torch import torchvision.models as models import sol model = models.resnet18() model.eval() optimized_model = sol.optimize(model) # Generate a random input tensor random_input = torch.randn(1, 3, 224, 224) # Run Inference with torch.no_grad(): # out = model(random_input) out = optimized_model(random_input) Training the Model Now, let’s train the model. For this we will use a simple example training a small MLP on FashionMNIST. 
Training requires some more setup, so we define helpers like a dataloader and a definition of the network:\nimport torch from torch import nn from torch.utils.data import DataLoader from torchvision import datasets from torchvision.transforms import ToTensor # Define dataloader training_data = datasets.FashionMNIST( root=\u0026#34;data\u0026#34;, train=True, download=True, transform=ToTensor() ) dataloader = DataLoader(training_data, batch_size=64) # Define model class NeuralNetwork(nn.Module): def __init__(self): super().__init__() self.flatten = nn.Flatten() self.linear_relu_stack = nn.Sequential( nn.Linear(28*28, 512), nn.ReLU(), nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10) ) def forward(self, x): x = self.flatten(x) logits = self.linear_relu_stack(x) return logits model = NeuralNetwork().train() # Define the Loss Function loss_fn = nn.CrossEntropyLoss() # Optimize model import sol model = sol.optimize(model) optimizer = torch.optim.SGD(model.parameters(), lr=1e-3) # Training loop num_epochs = 5 for epoch in range(num_epochs): for batch, (X, y) in enumerate(dataloader): # Compute prediction error pred = model(X) loss = loss_fn(pred, y) # Backpropagation loss.backward() optimizer.step() optimizer.zero_grad() print(f\u0026#34;Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}\u0026#34;) # Save your model to disk torch.save(model.state_dict(), \u0026#34;sol_model\u0026#34;) As before with inference, the only change needed to compile the model with SOL is the call to sol.optimize on the model before it is used in the training process. This example showcases again how the optimized model can be used as a drop-in replacement for the original one. Not only is it used to compute the prediction error, model.parameters() is also used to initialize the optimizer and torch.save saves the model\u0026rsquo;s parameters without any changes compared to equivalent pure PyTorch code.\nThis feature also allows you to write your scripts with SOL as an optional component. 
You can simply wrap the appropriate code in a try block or make it dependent on some parameter of your script. This way you can reuse your script on any machine even if it does not have SOL installed. It also makes sure that any problems that may occur during SOL\u0026rsquo;s optimization will never be a point of critical failure for your script.\nif use_sol: try: import sol model = sol.optimize(model) except: pass "
},
{
	"uri": "/advanced/configs.html",
	"title": "Configs",
	"tags": [],
	"description": "",
	"content": " Option Type/Default Description autotuning bool/true Enables autotuning. autotuning::max_runs int/100 Max number of runs per library and layer. autotuning::not_improved int/5 Number of runs without improvement after which autotuning will be stopped. compiler::debug bool/false Adds assertions in the code to check correctness at runtime (requires jit::debug). compiler::debug_graph bool/false Generates SVGs in .sol/debug with the network structure. compiler::debug_memory_consumption bool/false Generates memory consumption estimations in .sol/debug. compiler::debug_text bool/false Textual network output in .sol/debug. compiler::performance_warning bool/false Shows a warning if chosen hyper-parameters have a negative impact on performance. compiler::remove_unused_params bool/false Removes unused model parameters. compiler::reporting bool/false Enables additional compiler output for compilers. compiler::spawn int/0 0 = posix_spawn(vfork), 1 = posix_spawn(fork), 2 = popen. conv::sampling bool/true dfp::debug bool/false Generates SVGs for the DFP computation graph. framework str/None Used as default if framework is not set in sol.optimize(...) jit::debug bool/false Enables debugging in the generated code. onnx::debug bool/false Generates SVGs for the ONNX computation graph. parser::debug bool/false Enables parser debugging if available. parser::debug_stack_traces bool/false Displays stack traces in Graph and Text debug output if available. reporting bool/false Enables GCC/NCC reporting on the generated code. Since v0.8 config values can also be set via env vars. The option key needs to be upper case, the namespace separator :: replaced with _, and the whole preceded by SOL_CONFIG_, e.g., sol.config['dfp::debug'] = True becomes SOL_CONFIG_DFP_DEBUG=TRUE. Boolean values can be TRUE, ON or 1. All other values will be treated as FALSE.\n"
},
{
	"uri": "/frameworks/cross.html",
	"title": "Cross-Framework",
	"tags": [],
	"description": "",
	"content": " This feature is only available for the Python based frameworks.\nSince SOL v0.4.2 you can cross-execute SOL models. For this just use sol.optimize(..., framework='?'). For example, you can load an ONNX model and run it within PyTorch using:\nimport sol # Run Model with Numpy (def func(...)) import numpy as np np_model = sol.optimize(\u0026#39;mymodel.onnx\u0026#39;, framework=\u0026#39;numpy\u0026#39;) np_input = np.random.rand(1, 3, 224, 224).astype(np.float32) np_output = np_model(np_input) # Run Model with PyTorch (torch.nn.Module) import torch py_model = sol.optimize(\u0026#39;mymodel.onnx\u0026#39;, framework=\u0026#39;pytorch\u0026#39;) py_model.eval() py_input = torch.from_numpy(np_input) with torch.no_grad(): py_output = py_model(py_input) # Run Model with TensorFlow (tf.Module) import tensorflow as tf tf_model = sol.optimize(\u0026#39;mymodel.onnx\u0026#39;, framework=\u0026#39;tensorflow\u0026#39;) tf_output = tf_model(np_input) # Run Model with Keras (tf.keras.model.Model) import tensorflow as tf keras_model = sol.optimize(\u0026#39;mymodel.onnx\u0026#39;, framework=\u0026#39;keras\u0026#39;) keras_output = keras_model.predict(np_input) If you are using a PyTorch model as the source, you need to pass example inputs, as PyTorch models don\u0026rsquo;t know their input shapes or dtypes.\npytorch_model = ... keras_model = sol.optimize(pytorch_model, [torch.tensor(np_input)], framework=\u0026#39;keras\u0026#39;) keras_output = keras_model.predict(np_input) "
},
{
	"uri": "/devices/nvidia.html",
	"title": "NVIDIA GPUs",
	"tags": [],
	"description": "",
	"content": " Most of the frameworks have been built with a specific CUDA version. Please refer to PyTorch or TensorFlow to find the correct CUDA version.\nRequirements Requirement Version Comment CUDA Toolkit ≥ 11.0 CUDNN ≥ 8.0 see FAQ for how to install file command any Env Vars EnvVar Default Description CUDA_HOME \u0026ldquo;/usr/local/cuda\u0026rdquo; Path to CUDA home dir NVCPATH Used as include paths NVC_INCLUDE_PATH Used as include paths NVCPLUS_INCLUDE_PATH Used as include paths NVLIBRARY_PATH Used as library paths FAQ SOL tells me that support for NVIDIA is not available? This is usually caused by a version of SOL without NVIDIA support. Please check if pip3 list installed | grep nec-sol-device-nvidia shows that it is installed, and has the same version as pip3 list installed | grep nec-sol\nWhich GPUs are supported? In general all CUDA capable GPUs starting from the Kepler architecture (e.g. Tesla K40) are supported.\nSOL is unable to load CUDA/CUBLAS/CUDNN The most likely reason is a version mismatch. Please run the following commands:\nnvcc --version python3 -m pip freeze | grep nvidia The CUDA toolkit and the Python packages need to have the same major version (e.g., \u0026ldquo;cu12\u0026rdquo; or \u0026ldquo;release 12.*\u0026rdquo;). Further, these also need to match the version your AI framework was compiled for. Please check the homepage of the AI framework for further details.\nSOL does not load CUDNN We do not bundle CUDNN with SOL. PyTorch installs it automatically using PyPI. Please run the tests described in the previous FAQ entry. For TensorFlow or other frameworks you need to install these manually. Either download it from https://developer.nvidia.com/cudnn (requires a free CUDA developer account) or install it by running pip3 install nvidia-cudnn-cuXX (replace XX with 11 or 12 depending on your CUDA version).\n"
},
{
	"uri": "/tutorials.html",
	"title": "Tutorials",
	"tags": [],
	"description": "",
	"content": "SOL is a machine learning compiler for deep learning applications. It optimizes deep learning models from popular frameworks for inference and training. This tutorial will guide you through the basics of using SOL first and then present some advanced options for a finer-grained control of its behavior. This tutorial consists of four sections:\nPreparation gives you some simple examples on how to create a model in a framework of your choice. If you are familiar with your chosen framework you can skip this section.\nCreating a simple neural network model (or loading a pre-trained one) Basics will guide you through the basics of SOL usage, including:\nOptimizing a model with SOL Running inference with the optimized model Training the optimized model Advanced shows further possibilities to control SOL\u0026rsquo;s settings and behavior, as well as showcasing some features that are enabled by SOL:\nUnderstanding and controlling console output Running on a separate device Using an unsupported device Cross Framework Execution Deployment is an advanced feature of SOL that allows you to compile neural networks into standalone binaries. This section explains:\nSOL deployment\u0026rsquo;s python API and its options How to compile a simple example How to package the output Installing SOL To use SOL you need a user account. (For details, see SOL Installation) If you have an account, you can install it simply using pip:\npip install --upgrade nec-sol\nnec-sol install Quick Start SOL is meant to be easily used without extensive changes to existing code and has no complicated parameters to learn. You only need two lines of code to optimize a model. Add SOL to your imports in your Python script and call sol.optimize() on your model.\nimport sol ... model = sol.optimize(model) No further changes to your script are needed.\nIf you are looking for examples or a more detailed understanding of SOL you can explore the rest of this tutorial.\n"
},
{
	"uri": "/tutorials/advanced.html",
	"title": "Advanced",
	"tags": [],
	"description": "",
	"content": "This section provides some further information and examples on how SOL processes certain inputs. It also presents some options to control SOL to a greater degree manually.\nOther Devices In most cases you probably want to execute or train your model on an accelerator like a GPU instead of your CPU. To offload your model, SOL again follows its easy-to-use principle that dictates that you do not need to set any SOL specific parameters to do so. Instead, SOL is designed to read all necessary information from the model. To do so you have to move the model to the desired device in a way that is supported by your chosen framework, just as you would do without using SOL. So, if you move your model and your input data to another device, SOL will also compile for that device. The following example shows how to use an NVIDIA GPU. Use the standard interface of your framework to indicate which device to use. This can be done via .cuda() or .to('cuda') in PyTorch or with tf.device('/GPU:0'): in TensorFlow.\nimport torch import torchvision.models as models import sol model = models.resnet18(pretrained=True).eval() random_input = torch.randn(1, 3, 224, 224) model.cuda() random_input = random_input.cuda() optimized_model = sol.optimize(model) with torch.no_grad(): out = optimized_model(random_input) torch.cuda.synchronize() Manual Device Selection SOL also offers an option to manually select the device that runs your model. Calling sol.device.set(device, device_idx) instructs SOL to use the given device for all following instructions. In this example the model is optimized and compiled for a NEC SX-Aurora (VE). If you want to run on an NVIDIA GPU you just have to change the device parameter from \u0026ldquo;ve\u0026rdquo; to \u0026ldquo;nvidia\u0026rdquo;. 
For a list of supported devices, see device.\nimport torch import torchvision.models as models import sol model = models.resnet18(pretrained=True).eval() random_input = torch.randn(1, 3, 224, 224) sol.device.set(\u0026#34;ve\u0026#34;, 0) optimized_model = sol.optimize(model) with torch.no_grad(): out = optimized_model(random_input) As you can see, you do not even need to move input and output to the device explicitly; they are copied implicitly when the model is called.\nThis feature comes in handy if your framework does not support your device directly (unlike .cuda() for NVIDIA GPUs in the first example). In this case, you can use SOL to offload your model automatically to any hardware that is supported by SOL.\nTo check your current installation for devices or look up their name within SOL you can also call sol.plugins():\n[INFO ][ 3.80][SOL/core] static (87) Compiler Plugins: [INFO ][ 3.80][SOL/core] static (87) Devices: [x86, nvidia, ve] [INFO ][ 3.80][SOL/core] static (87) Frameworks: [pytorch, tensorflow, onnx, numpy] Optimization options Most parameters are read from the model directly. But there are some options you can add to sol.optimize() directly to change the compilation.\n# sol.optimize signature def optimize(model, args=[], kwargs={}, *, framework=None, vdims=None, determinism=None, **fwargs) args: Example Input If the shapes of your inputs cannot be inferred from the model directly or if you want to compile for a specific shape, you need to define an example input with your desired shape and datatype. Simply define shape and datatype of your inputs by passing a list of tensors with the desired properties to sol.optimize as arguments.\ninput_tensor = torch.empty((3, 4), dtype=torch.float16) sol.optimize(model, [input_tensor]) fwargs: Framework Arguments The entries of this dictionary are passed to the underlying framework. 
Note that this is an advanced option that usually requires knowledge of the inner workings of the corresponding parser.\nHere is one example of how to use it. You can set the shape of an input to a TensorFlow model (in this case called \u0026ldquo;input_1\u0026rdquo;) to a specific value without passing an example input in args. This requires you to know the name of the tensor whose shape you want to define and to provide a valid shape, otherwise the compilation will fail.\nsol.optimize(model, fwargs={\u0026#34;shapes\u0026#34;:{\u0026#34;input_1\u0026#34;: [int(batch_size), int(height), int(width), int(channels)]}}) Sometimes the input size cannot be read from the model directly, making this necessary.\nframework: Cross Framework Compilation As described before, SOL reads the type of the framework from the model and creates a model of an equal type. By using the framework keyword you can define the output type yourself! This allows you to run your models from one framework in another. If you have, for example, an old TensorFlow model lying around but want to use it in your current training script that is written in torch, you can use SOL to simply reuse your old model in your new script!\nimport torch import torchvision.models as models import tensorflow as tf import sol import numpy as np model = models.resnet18(pretrained=True).eval() random_input = np.random.rand(1, 3, 224, 224) # note that this is not a torch tensor random_input_t = torch.Tensor(random_input) # Example inputs are required in this case! optimized_model = sol.optimize(model, [random_input_t], framework=\u0026#34;keras\u0026#34;) # this creates a keras.Model with tf.device(\u0026#39;/CPU:0\u0026#39;): out = optimized_model(random_input) # the compiled model behaves like a tensorflow model! Available options are:\nFramework Description keras Returns the model as a keras.Model. numpy Returns the model as an object that stores the weights as Numpy arrays, expecting the inputs to be Numpy arrays. 
pytorch Returns the model as a torch.nn.Module. tensorflow Returns the model as an object that stores the weights as tf.Variable and the execution is run within TensorFlow, expecting Numpy or TensorFlow tensors as input. determinism: Numerical Accuracy The determinism option controls the numerical accuracy of SOL. This allows you to enable or disable several trade-offs between accuracy and performance of your model. As with all other options, by default the rules of the original framework of the optimized model are used. So, setting for example torch.set_float32_matmul_precision(\u0026quot;highest\u0026quot;) will define rules for how matrix multiplication in torch is handled. SOL follows these rules as well. Passing a different value to determinism allows you to change these rules manually for a single optimization run.\nFor more details and possible options in SOL see Determinism in the official documentation.\nvdims: Variable Dimensions When you compile a network you will see an output similar to this:\n[INFO ][ 7.74][SOL/core] compiler (313) Parsing network AlexNet [INFO ][ 7.76][SOL/core] Optimizer (42) Analyzing network AlexNet (0x1AE3B71C) [INFO ][ 7.77][SOL/core] Wrapper (138) Inputs: [INFO ][ 7.77][SOL/core] Wrapper (138) x: Tensor(dtype=[F32], shape=[#0, 3, 224, 224]) [INFO ][ 7.77][SOL/core] Wrapper (143) Outputs: [INFO ][ 7.77][SOL/core] Wrapper (143) Tensor(dtype=[F32], shape=[#0, 1000]) [INFO ][ 7.77][SOL/core] Optimizer (84) Model Parameters: 233.08MB [INFO ][ 7.77][SOL/core] Optimizer (88) [INFO ][ 7.88][SOL/core] Compiler (73) Compiling network 1AE3B71C_9ACB57BC for x86 [INFO ][ 14.10][SOL/core] Progress (56) 100.0% [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] [INFO ][ 14.10][SOL/core] Compiler (145) Estimated Peak Memory Consumption (assuming: #0: 1): [INFO ][ 14.10][SOL/core] Compiler (151) Inference: ~238.0MB Note that in \u0026ldquo;shape\u0026rdquo; the first dimension is represented as \u0026ldquo;#0\u0026rdquo;. 
This is a variable dimension (VDim) with index 0. SOL automatically detects that this is the batchsize and the compilation does not depend on this being a fixed size. That means that no recompilation is needed when you call the model with different batchsizes.\nIf you do want to fix the batchsize to ensure that only the optimal implementation for one fixed size is created, you can disable VDims with sol.optimize(..., vdims=[False]).\n[INFO ][ 11.40][SOL/core] compiler (313) Parsing network AlexNet [INFO ][ 11.42][SOL/core] Optimizer (42) Analyzing network AlexNet (0x1AE3B71C) [INFO ][ 11.43][SOL/core] Wrapper (138) Inputs: [INFO ][ 11.43][SOL/core] Wrapper (138) x: Tensor(dtype=[F32], shape=[1, 3, 224, 224]) [INFO ][ 11.43][SOL/core] Wrapper (143) Outputs: [INFO ][ 11.43][SOL/core] Wrapper (143) Tensor(dtype=[F32], shape=[1, 1000]) [INFO ][ 11.43][SOL/core] Optimizer (84) Model Parameters: 233.08MB [INFO ][ 11.43][SOL/core] Optimizer (88) [INFO ][ 11.53][SOL/core] Compiler (73) Compiling network 1AE3B71C_CA29BD56 for x86 [INFO ][ 17.99][SOL/core] Progress (56) 100.0% [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] [INFO ][ 17.99][SOL/core] Compiler (145) Estimated Peak Memory Consumption: [INFO ][ 17.99][SOL/core] Compiler (151) Inference: ~238.0MB In this example the network was compiled with a batchsize of 1. If the same model is called with a different value it has to be recompiled for that value. This ensures that an implementation customized to this specific batchsize is always chosen.\nControlling SOL There are a few advanced options to control SOL\u0026rsquo;s behavior outside of sol.optimize(). SOL\u0026rsquo;s settings can be changed by manipulating its config and environment variables. Here are a few examples of how to use them.\nSOL Output If you have used SOL a few times you will notice that it prints a lot of information with the \u0026ldquo;[INFO]\u0026rdquo; tag by default. 
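This [INFO] output is governed by SOL's log level (TRACE, DEBUG, INFO, WARN, ERROR, as listed on the Env Vars page). A minimal stdlib sketch of how such a threshold works is shown here; should_print is a hypothetical illustration of the documented ordering, not SOL code (SOL reads SOL_LOG internally).

```python
import os

# Hypothetical sketch (NOT SOL code): a message is printed only if its level
# is at or above the threshold taken from the SOL_LOG environment variable.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]

def should_print(message_level: str, env=os.environ) -> bool:
    threshold = env.get("SOL_LOG", "INFO")  # documented default level is INFO
    return LEVELS.index(message_level) >= LEVELS.index(threshold)

print(should_print("INFO", {"SOL_LOG": "ERROR"}))   # False: INFO is suppressed
print(should_print("ERROR", {"SOL_LOG": "ERROR"}))  # True: errors still shown
```

With this ordering, SOL_LOG=ERROR keeps only error messages, which is exactly the silencing technique described next.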
To disable any output of this kind you can set the SOL_LOG environment variable to ERROR. This ensures that SOL will only print messages in case of an error.\nSOL_LOG=ERROR python sol_script.py Enjoy the silence!\nCurrent Working Directory By default, SOL creates a directory for its intermediate code generation and compilation in $HOME/.cache/sol. While this is desired in most cases, sometimes there are reasons for you to want a different location for SOL\u0026rsquo;s intermediate generations, for example, to make sure it uses a faster hard drive.\nTo change the location of this directory you can do so via the $SOL_CWD variable. Set SOL_CWD=/path/to/your/dir to write and read intermediate results from your desired location.\nYou can also delete this directory to force SOL to generate all temporary files again.\nSOL Cache SOL caches optimized networks to reduce the workload when they are called repeatedly. If you want to clear this cache to force a recompilation or to save disk space you can do so via sol.cache.clear() in Python or by setting the environment variable SOL_CLEAR_CACHE=TRUE before executing your script.\nAutotuning By default SOL uses autotuning to find an optimal implementation for each layer. In some cases you may want a deterministic choice of implementation or to save time during compilation. You can disable autotuning via sol.config[\u0026quot;autotuning\u0026quot;] = False in your script to always use the same (then heuristically chosen) implementation.\nFurther Options For a full list of all supported environment variables and config options you can check out the official documentation ENV and CONFIG.\n"
},
{
	"uri": "/advanced/envvars.html",
	"title": "Env Vars",
	"tags": [],
	"description": "",
	"content": "General Env Vars EnvVar Default Description DOT \u0026ldquo;dot\u0026rdquo; Path to GraphViz dot SOL_CLEAR_CACHE Similar to sol.cache.clear(). Available values: [\u0026lsquo;TRUE\u0026rsquo;] SOL_CWD \u0026ldquo;~/.cache/sol\u0026rdquo; Overrides the folder used to store the SOL cache. By default SOL stores it in the user\u0026rsquo;s cache folder. HOME and XDG_CACHE_HOME are obeyed to determine the location of the user\u0026rsquo;s cache folder. When SOL_CWD is set, the cache is generated in $SOL_CWD/.sol. SOL_DEBUG Comma separated list of debug flags. Available values [\u0026lsquo;ALL\u0026rsquo;, \u0026lsquo;COMPILER\u0026rsquo;, \u0026lsquo;JIT\u0026rsquo;, \u0026lsquo;GRAPH\u0026rsquo;, \u0026lsquo;TEXT\u0026rsquo;, \u0026lsquo;MEMORY\u0026rsquo;, \u0026lsquo;PARSER\u0026rsquo;, \u0026lsquo;STACKTRACE\u0026rsquo;] SOL_JIT_THREADS Sets the max number of parallel JIT compilation processes. Default: (number of physical cores - 1) SOL_LOG \u0026ldquo;INFO\u0026rdquo; Defines SOL log level. Available values: [\u0026lsquo;TRACE\u0026rsquo;, \u0026lsquo;DEBUG\u0026rsquo;, \u0026lsquo;INFO\u0026rsquo;, \u0026lsquo;WARN\u0026rsquo;, \u0026lsquo;ERROR\u0026rsquo;] SOL_PROFILE \u0026ldquo;FALSE:ALL:\u0026rdquo; Enables SOL profiler. Accepted format is MODE:DATA:FILENAME with MODE: [\u0026lsquo;FALSE\u0026rsquo;, \u0026lsquo;TRUE\u0026rsquo;, \u0026lsquo;TSV\u0026rsquo;, \u0026lsquo;CSV\u0026rsquo;, \u0026lsquo;TENSORBOARD\u0026rsquo;], DATA: [\u0026lsquo;ALL\u0026rsquo;, \u0026lsquo;PERFORMANCE\u0026rsquo;, \u0026lsquo;MEMORY\u0026rsquo;] and FILENAME as valid filename to write the performance data to. DATA and FILENAME are optional. SOL_SQLITE_JOURNAL_MODE \u0026ldquo;WAL\u0026rdquo; Sets the SQLite pragma \u0026ldquo;journal_mode\u0026rdquo; SOL_SQLITE_LOCKING_MODE \u0026ldquo;NORMAL\u0026rdquo; Sets the SQLite pragma \u0026ldquo;locking_mode\u0026rdquo; SOL_SQLITE_READONLY Opens the SOL cache in readonly mode. No new models can be compiled! 
Should only be used with an already initialized SOL cache. SOL_SQLITE_SYNCHRONOUS \u0026ldquo;FULL\u0026rdquo; Sets the SQLite pragma \u0026ldquo;synchronous\u0026rdquo; SOL_DEBUG Options Option Effect COMPILER Adds additional checks and device synchronization instructions. GRAPH Generates the computation graphs and stores them in {SOL_CWD}/debug. JIT Compiles the code with debug flags. MEMORY Generates a memory consumption estimation and stores it in {SOL_CWD}/debug. PARSER Prints additional debug information while parsing. TEXT Generates a textual representation of the computation graphs and stores them in {SOL_CWD}/debug. Device specific env vars are listed in the respective section ( X86, NVIDIA, NEC SX-Aurora )\n"
},
{
	"uri": "/install.html",
	"title": "Installation",
	"tags": [],
	"description": "",
	"content": " If you are upgrading from a pre v0.4 release you need to first run pip3 uninstall sol as the package name has changed!\nRequirements SOL has a rather small number of requirements:\nRequirement Version Comment Linux any needs to be manylinux_2_28 compatible Python ≥ 3.9 GCC ≥ 8.3 ≥ 9.0 when using PyTorch ≥ 2.2 GraphViz any if using debug features CUDA Toolkit ≥ 11.8 if using NVIDIA GPUs, lower versions disable CUBLASLT support. CUDNN ≥ 8.9 if using NVIDIA GPUs, lower versions disable CUDNN support. file if using NVIDIA GPUs which if using NVIDIA GPUs Further requirements are listed in the respective framework or device sections.\nTo verify if your system is Manylinux compatible, run this command: python3 -c \u0026quot;import packaging.tags; print(*list(packaging.tags.sys_tags()),sep='\\n')\u0026quot; | grep \u0026quot;py3-none-manylinux_2_28\u0026quot; | wc -l If the output is 1, then your system is compatible. If 0, you need to upgrade your system.\nSOL Closed Beta User Account Before you can install SOL, you need to have a SOL Closed Beta User Account. If you don\u0026rsquo;t have one, please contact your NEC sales representative or use our application form.\nHow can I change my password? Login to portal.neclab.eu and follow the instructions for password change.\nMy account got disabled/deleted. What can I do? In this case you need to contact us, and we need to re-enable your account manually.\nWhy does my account not work for the Bug Tracker? For technical reasons, the login to the Bug Tracker uses _ instead of @ in your username. So something@your-domain.eu needs to be something_your-domain.eu.\nInstalling SOL installer First you need to install the SOL installer using:\npip3 install --upgrade nec-sol The SOL installer allows you to manage your installation. The installer automatically detects which frameworks you have installed, and only installs the necessary SOL extensions. 
This means you need to install all frameworks you want to use prior to installing SOL.\nInstalling SOL For installing SOL modules, just run:\nnec-sol install The installer will prompt you to enter your login credentials and accept the user license agreement. Since v0.5.1 the installer automatically installs all SOL extensions that are supported on your system and that you have access to.\nStoring username and password Since v0.8 the SOL installer supports Python keyrings. For this, install the keyring package via pip3 install keyring and set up a backend (follow the instructions in the keyring documentation).\nNext, you need to execute the following two commands to set your username and password accordingly.\npython3 -m keyring set nec-sol username python3 -m keyring set nec-sol password Afterwards you can run the SOL installer without providing any username or password!\nInstalling SOL using UV To install SOL using UV, you can use the following script.\nBe aware that UV automatically accepts the SOL license agreement on your behalf and you will not be explicitly prompted to accept it! By using this method you accept all terms of the SOL license agreement!\n[project] ... dependencies = [ \u0026#34;nec-sol-core[{FEATURES}]=={VERSION}\u0026#34;, ] [tool.uv] find-links = [ \u0026#34;https://sol.neclab.eu/core/dist/\u0026#34;, \u0026#34;https://sol.neclab.eu/core/v{VERSION}/\u0026#34;, \u0026#34;https://sol.neclab.eu/nvidia/dist/\u0026#34;, \u0026#34;https://sol.neclab.eu/nvidia/v{VERSION}/\u0026#34;, \u0026#34;https://sol.neclab.eu/ve/dist/\u0026#34;, \u0026#34;https://sol.neclab.eu/ve/v{VERSION}/\u0026#34;, \u0026#34;https://sol.neclab.eu/x86/dist/\u0026#34;, \u0026#34;https://sol.neclab.eu/x86/v{VERSION}/\u0026#34;, \u0026#34;https://sol.neclab.eu/license/index.php/pip-license-index\u0026#34;, ] You need to replace {FEATURES} with a list of features you want to use and {VERSION} with the version you want to use, e.g., nec-sol-core[torch,x86]==0.7.3. 
Ensure that you also replace {VERSION} in the URLs!\nFor authentication please follow these instructions. tl;dr: you can alternatively create the file ~/.netrc, with the following content:\nmachine sol.neclab.eu user your_email@address.com password your_password Be aware that the .netrc stores your password in plain text!\nHow can I automate the SOL installation? Since v0.5.1 the installer has a non-interactive mode. Please check out nec-sol \u0026ndash;help to get all available options.\nHow can I install/download specific devices/frameworks? By default, the installer installs devices and frameworks detected in your OS. If you want to specify them manually, just use the \u0026ndash;devices x86 ve nvidia and \u0026ndash;frameworks torch numpy onnx tensorflow attributes for nec-sol.\nI get -bash: !...: event not found or __main__.py: error: unrecognized arguments: ... If your password contains special characters (such as !), you need to put your entire password into single quotes \u0026ndash;password \u0026lsquo;my!password\u0026rsquo;, otherwise your shell will try to interpret these.\nI get 'latin-1' codec can't encode character '\\u2019' in position 50: ordinal not in range(256) This is caused by non-utf-8 encoding in your Python installation. Set this env var export PYTHONIOENCODING=utf-8. More information is available here\nUninstalling SOL Just run:\nnec-sol uninstall Alternatively you can run:\npip3 freeze | grep nec-sol | xargs pip3 uninstall -y Installing SOL on a remote system without direct internet connectivity To install SOL on a server without internet connectivity run the following commands on a machine with internet access:\npip3 install nec-sol pip3 download nec-sol nec-sol download scp *.whl target_machine:/some/path Then switch to your target machine and execute:\ncd /some/path pip3 install nec_sol-*.whl nec-sol install -f . How can I install SOL on the remote system without using the installer? 
Since v0.5.2 you can use pip3 install nec-sol-core[FEATURES] -f . where FEATURES can be: torch, tensorflow, onnx, numpy, x86, nvidia, ve, tests and sdk.\nIn setuptools you can similarly use nec-sol-core[FEATURES] as a dependency.\nF.A.Q: How can I upgrade/change the version of SOL? For upgrading run:\nnec-sol uninstall pip3 install nec-sol==YOUR_VERSION nec-sol install Be aware that previous versions of SOL used a different installer and might need different steps to be installed.\nI totally messed up everything, how can I reset SOL? Don't Panic! run pip3 install --upgrade nec-sol. This should upgrade the SOL package manager to the newest version. run nec-sol uninstall run nec-sol install Done! PIP does not trust sol.neclab.eu OR I get the following error: SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),) Solution #1 (not recommended) Add \u0026ndash;trust when running nec-sol.\nSolution #2 On CentOS pip does not trust the system\u0026rsquo;s certificates. As user you can fix this problem with: pip3 config set cert /etc/pki/tls/certs/ca-bundle.crt. As root you can fix this for all users using:\nsudo sh -c \u0026#39;echo [global] \u0026gt; /etc/pip.conf\u0026#39; sudo sh -c \u0026#39;echo cert = /etc/pki/tls/certs/ca-bundle.crt \u0026gt;\u0026gt; /etc/pip.conf\u0026#39; How can I decide to install into user or global site-packages? By default the installer uses pip\u0026rsquo;s default. Usually this means installing globally when run as root, and into the user folder otherwise. Using nec-sol install \u0026ndash;user you can enforce installation into the user folder.\nMy license expired. How can I renew my SOL license? Just run nec-sol renew-license.\n"
},
{
	"uri": "/devices/ve.html",
	"title": "NEC SX-Aurora",
	"tags": [],
	"description": "",
	"content": "Requirements Requirement Version VEOS ≥ 2.7 NCC ≥ 5.0 if using VE3 Native Offloading (PyTorch) Within PyTorch we support the use of native tensors. For this, program PyTorch as if you were using a GPU, but replace all calls to cuda with ve. E.g.:\nmodel.ve() # copy model to VE#0 input = input.ve() # copy data to VE#0 model(input) # gets executed on the device torch.ve.synchronize() # wait for execution to complete Available functions (see https://pytorch.org/docs/stable/cuda.html for description)\ntorch.Tensor.ve() torch.Tensor.to(\u0026#39;ve\u0026#39;) torch.Tensor.to(\u0026#39;ve:X\u0026#39;) torch.nn.Module.ve() torch.ve.synchronize(device=0) torch.ve.is_available() torch.ve.current_device() torch.ve.set_device(device) torch.ve.device_count() torch.ve.memory_allocated(device=None) CLASS torch.ve.device(device) CLASS torch.ve.device_of(device) Training Loss functions are not implemented natively for VE. Instead use a wrapper model to add the loss function to the SOL optimized model.\nclass TrainingModel(torch.nn.Module): def __init__(self, model, loss): super().__init__() self.model = model self.loss = loss def forward(self, input, target): output = self.model(input) loss = self.loss(output, target) return output, loss And adjust your training loop:\ndevice = \u0026#39;ve:0\u0026#39; model.to(device) optimizer = torch.optim.SGD(model.parameters()) training_model = TrainingModel(model, torch.nn.L1Loss()) training_model = sol.optimize(training_model) for input, target in dataset: input, target = input.to(device), target.to(device) output, loss = training_model(input, target) loss.backward() optimizer.step() training_model and model share the same weights, so you don\u0026rsquo;t need to further adjust your code.\nFor optimal performance, and if you don\u0026rsquo;t need PyTorch-identical pseudo random numbers, use sol.optimize(...,\rdeterminism=sol.pytorch.determinism(sol.Determinism.Rand_Fastest)), which enables the use of much faster random number 
generators.\nNative Offloading (TensorFlow) Due to an increasing number of unresolved issues in the TensorFlow PluggableDevice API (e.g., #55497, #57095, #60883, #60895) we decided to no longer maintain our veda-tensorflow extension. Therefore you can no longer use with tf.device(\u0026quot;/VE:0\u0026quot;):. Instead please use Transparent Offloading using sol.device.set('ve',\r0). We are sorry for the inconvenience, but we don\u0026rsquo;t see any commitment from the TensorFlow team to accept our bugfixes, nor to fix the issues themselves.\nTransparent Offloading (all frameworks) To use the NEC SX-Aurora, it is necessary to call sol.device.set(\u0026quot;ve\u0026quot;,\rdeviceIdx) (deviceIdx is the index of the Aurora to run on, starting from 0). Further it is necessary that the input data is located on the host system.\nAs explained in our paper SOL: Effortless Device Support for AI Frameworks without Source Code Changes running inference with Transparent Offloading has nearly zero impact on the performance. However, training performance will be significantly lower!\nConfig Options Option Type/Default Description ve::trace bool/false Enables use of ftrace. ve::packed bool/false Enables use of packed vector for float32. dfp::ncc::intrinsics bool/false Enables experimental NCC intrinsics backend. (not recommended for production!) Known Issues float16 or bfloat16 data types are not supported 3D Convolution or DeConvolution are not supported PyTorch\u0026rsquo;s Bernoulli and Dropout might not produce identical pseudo random numbers, due to unavailability of MKL\u0026rsquo;s VSL Bernoulli algorithm for VE. 
Env Vars EnvVar Default Description NAR \u0026ldquo;/opt/nec/ve/bin/nar\u0026rdquo; Path to nar NCXX \u0026ldquo;/opt/nec/ve/bin/nc++\u0026rdquo; Path to nc++ NOBJCOPY \u0026ldquo;/opt/nec/ve/bin/nobjcopy\u0026rdquo; Path to nobjcopy VEDA_VISIBLE_DEVICES see VEDA for description VE_NODE_NUMBER see VEDA for description VE_OMP_NUM_THREADS see VEDA for description _VENODELIST see VEDA for description VE_LD_LIBRARY_PATH see VEDA for description NCPATH Used as include paths NC_INCLUDE_PATH Used as include paths NCPLUS_INCLUDE_PATH Used as include paths NLIBRARY_PATH Used as library paths FAQ The AI framework reports that an operation is not supported by device type \"VE\"\rThis is caused by the fact that only a minimal subset of VE function calls can be executed \u0026ldquo;eagerly\u0026rdquo; within the framework, i.e., +, -, *, /, \u0026hellip; If you encounter this problem, please open an issue for VEDA-PyTorch.\nSOL reports \"not found\" for the NCC compiler.\rPossible Cause 1 SOL is unable to find /opt/nec/ve/bin/nc++. If you don\u0026rsquo;t use a standard installation, please use the NCXX, NAR and NLD env vars to specify the paths to your NCC installation.\nPossible Cause 2 If there is a problem with your NCC license, SOL is unable to properly detect the compiler. Please run nc++ \u0026ndash;version and check for any error messages.\nSOL crashes with nc++: /opt/nec/ve/ncc/3.4.2/libexec/ccom is abnormally terminated by SIGSEGV.\rOn some systems NCC v3.4.2 crashes when compiling code generated by SOL. If you encounter this problem, please switch to an older version of the compiler using the NCXX env var.\nSOL reports VEDA_ERROR:\rVEDA_ERROR_CANNOT_CREATE_CONTEXT. This error message is triggered when the VE is occupied by another process. SOL relies on AVEO, which requires exclusive access to the device. To resolve this issue, terminate all other processes on the device. 
You can use VE_NODE_NUMBER=0 /opt/nec/ve/ve-top to identify running processes.\nDocker Containers You can use the following scripts to build Docker Containers that contain SOL.\n# check=error=true FROM rockylinux/rockylinux:8.10 AS sol-base RUN dnf update -y RUN dnf install -y python312 python3.12-pip #------------------------------------------------------------------------------- FROM sol-base AS sol-installer ARG username ARG password ARG frameworks=torch ARG devices=ve RUN python3 -m pip install nec-sol RUN python3 -m nec-sol install -u $username -p $password --accept-license --frameworks $frameworks --devices $devices #------------------------------------------------------------------------------- FROM sol-base # SOL requirements RUN dnf install -y gcc-toolset-11 graphviz # VEOS requirements RUN dnf install -y epel-release RUN dnf install -y libquadmath libdhash protobuf-c log4c hwloc # VEDA-PyTorch requirements RUN dnf install -y python3.12-devel RUN python3 -m pip install --no-cache-dir termcolor packaging matplotlib\\ requests threadpoolctl opt_einsum ninja COPY --from=sol-installer\t\\ /usr/local/lib/python3.12/site-packages/tungl\t\\ /usr/local/lib/python3.12/site-packages/tungl COPY --from=sol-installer\t\\ /usr/local/lib/python3.12/site-packages/sol\t\\ /usr/local/lib/python3.12/site-packages/sol COPY --from=sol-installer\t\\ /usr/local/lib/python3.12/site-packages/veda\\ /usr/local/lib/python3.12/site-packages/veda COPY --from=sol-installer\t\\ /usr/local/lib/python3.12/site-packages/veda_pytorch-*.dist-info\t\\ /usr/local/lib/python3.12/site-packages/veda_pytorch-14.0.0.dist-info ENV PATH=\u0026#34;/usr/local/lib/python3.12/site-packages/veda/bin/:$PATH:/opt/nec/ve/bin\u0026#34; ENV LD_LIBRARY_PATH=\u0026#34;/opt/nec/ve/veos/lib64\u0026#34; ENV CC=\u0026#34;/opt/rh/gcc-toolset-11/root/usr/bin/gcc\u0026#34; ENV CXX=\u0026#34;/opt/rh/gcc-toolset-11/root/usr/bin/g++\u0026#34; ENV 
PYTHONPATH=\u0026#34;/usr/local/lib64/python3.12/site-packages:/usr/local/lib/python3.12/site-packages\u0026#34; To build just run:\ndocker build . -t nec/sol4ve:latest --build-arg username=USERNAME --build-arg password=PASSWORD To run use the following command. You\u0026rsquo;ll need to duplicate the line --device=/dev/veslot* for each VE device you want to use!\ndocker run	\\ --device=/dev/veslot0 \\ --device=/dev/veslot1 \\ -v /dev:/dev:z \\ -v /var/opt/nec/ve/veos:/var/opt/nec/ve/veos:z \\ -v /usr/lib64/libaurlic.so.1:/usr/lib64/libaurlic.so.1:ro \\ -v /opt/nec:/opt/nec:ro \\ -v $HOME:$HOME:rw \\ -it nec/sol4ve:latest bash Then create a virtual env and install all required libraries.\npython3 -m venv venv . ./venv/bin/activate pip3 install --upgrade pip pip3 install ... Further, VEDA-PyTorch \u0026gt;= v14 requires compiling a PyTorch C++ extension. To prevent recompilation you need to move the PyTorch extension folder to a place where it\u0026rsquo;s persistent between different runs, e.g.:\nexport TORCH_EXTENSIONS_DIR=/some/persistent/folder/.cache/torch_extensions Singularity Containers You can use the following scripts to build Singularity Containers that contain SOL.\nRemote Setup If you want to install SOL using the official repository use this script.\nBootStrap: docker From: rockylinux/rockylinux:8.10 %post # setup OS dnf update -y dnf install -y gcc-toolset-11 python312	# SOL requirements dnf install -y epel-release	# VEOS requirements dnf install -y libquadmath libdhash protobuf-c log4c	# VEOS requirements # setup VENV python3 -m venv /venv . 
/venv/bin/activate python3 -m pip install --upgrade pip python3 -m pip install {{ PYTHON_FRAMEWORKS }} python3 -m pip install nec-sol python3 -m nec-sol install -u \u0026#34;{{ SOL_USERNAME }}\u0026#34; -p \u0026#34;{{ SOL_PASSWORD }}\u0026#34; --accept-license --devices ve deactivate %environment # init VE paths export PATH=$PATH:/opt/nec/ve/bin export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nec/ve/veos/lib64 # init GCC/11 . scl_source enable gcc-toolset-11 export CC=/opt/rh/gcc-toolset-11/root/usr/bin/gcc export CXX=/opt/rh/gcc-toolset-11/root/usr/bin/g++ # init VENV . /venv/bin/activate # Configure Proxy here if needed # export https_proxy=192.168.0.1:1234 # export http_proxy=192.168.0.1:1234 Create a file sol4ve.cfg and set the correct values for the variables, replace {USERNAME} and {PASSWORD} with your credentials:\nSOL_USERNAME={USERNAME}\rSOL_PASSWORD={PASSWORD}\rPYTHON_FRAMEWORKS=torch torchvision And then build it with the following command.\nsudo -E singularity build --build-arg-file sol4ve.cfg sol4ve.sif sol4ve.def Local Setup In case you want to install SOL from a local folder, you can use the following script.\nBootStrap: docker From: rockylinux/rockylinux:8.10 %files {{ SOL_PATH }} /sol %post # setup OS dnf update -y dnf install -y gcc-toolset-11 python312\t# SOL requirements dnf install -y epel-release\t# VEOS requirements dnf install -y libquadmath libdhash protobuf-c log4c\t# VEOS requirements # setup VENV python3 -m venv /venv . /venv/bin/activate python3 -m pip install --upgrade pip python3 -m pip install {{ PYTHON_FRAMEWORKS }} python3 -m pip install --pre nec-sol-core[ve,torch] veda-pytorch -f /sol # add features if needed deactivate rm /sol/*.* rmdir /sol %environment # init VE paths export PATH=$PATH:/opt/nec/ve/bin export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nec/ve/veos/lib64 # init GCC/11 . scl_source enable gcc-toolset-11 export CC=/opt/rh/gcc-toolset-11/root/usr/bin/gcc export CXX=/opt/rh/gcc-toolset-11/root/usr/bin/g++ # init VENV . 
/venv/bin/activate # Configure Proxy here if needed # export https_proxy=192.168.0.1:1234 # export http_proxy=192.168.0.1:1234 Create a file sol4ve.cfg and set the correct values for the variables, replacing {PATH} with the path to the SOL download folder.\nSOL_PATH={PATH}\rSOL_PASSWORD={PASSWORD}\rPYTHON_FRAMEWORKS=torch torchvision And then build it with the following command.\nsudo -E singularity build --build-arg-file sol4ve.cfg sol4ve.sif sol4ve.def Execution To run the container, just execute the following command, which manually binds the required folders:\nsingularity shell --writable-tmpfs --bind /opt/nec:/opt/nec:ro --bind /usr/lib64/libaurlic.so.1:/usr/lib64/libaurlic.so.1:ro --bind /var/opt/nec/ve/veos/:/var/opt/nec/ve/veos/:rw sol4ve.sif Alternatively, instead of binding the folders manually, you can also use the SINGULARITY_BIND env var.\nexport SINGULARITY_BIND=/opt/nec:/opt/nec:ro,/usr/lib64/libaurlic.so.1:/usr/lib64/libaurlic.so.1:ro,/var/opt/nec/ve/veos/:/var/opt/nec/ve/veos/:rw singularity shell --writable-tmpfs sol4ve.sif "
},
{
	"uri": "/frameworks/tests.html",
	"title": "Test Coverage",
	"tags": [],
	"description": "",
	"content": " PyTorch General Tests TestResult kwargs\t\u0026#x2713; seed\t\u0026#x2713; Native VE\t\u0026#x2713; Transparent VE\t\u0026#x2713; Neural Networks NetworkX86VENVIDIA InferenceTrainingVBS TrainingInferenceTrainingVBS TrainingInferenceTrainingVBS Training AlexNet\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? SqueezeNet 1.0\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? SqueezeNet 1.1\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? Inception V3\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? GoogleNet\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? MNasNet 0.5 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? MNasNet 0.75 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? MNasNet 1.0 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? MNasNet 1.3 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? DenseNet 121 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? DenseNet 169 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? DenseNet 201 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? DenseNet 161 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? Resnet 18 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? Resnet 34 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? Resnet 50 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? 
Resnet 101 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? Resnet 152 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? Resnext 50 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? Resnext 101 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? WideResnet 50 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? WideResnet 101 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? RNN (LSTM, GRU, ReLU, TANH) #175 #175 #175 \u0026#x2713; \u0026#x2713; #208 #176 #176 #176 ShuffleNet V2 0.5 \u0026#x2713; \u0026#x2713; #208 \u0026#x2713; \u0026#x2713; #208 ? ? #208 ShuffleNet V2 1.0 \u0026#x2713; \u0026#x2713; #208 \u0026#x2713; \u0026#x2713; #208 ? ? #208 ShuffleNet V2 1.5 \u0026#x2713; \u0026#x2713; #208 \u0026#x2713; \u0026#x2713; #208 ? ? #208 ShuffleNet V2 2.0 \u0026#x2713; \u0026#x2713; #208 \u0026#x2713; \u0026#x2713; #208 ? ? #208 VGG 11 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? VGG 13 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? VGG 16 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? VGG 19 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? VGG 11 BatchNorm \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? VGG 13 BatchNorm \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? VGG 16 BatchNorm \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? VGG 19 BatchNorm \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? MobileNet \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? 
MobileNet V2 \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? ? IgorNet #259 \u0026#x2713; #208 #259 \u0026#x2713; #208 #259 ? #208 PyTorchic BERT #355 #355 #355 #355 #355 #355 #355 #355 #355 Huggingface BERT #355 #355 #355 #355 #355 #355 #355 #355 #355 Huggingface SeqBERT #355 #355 #355 #355 #355 #355 #355 #355 #355 Huggingface GPT-2 #355 #80 #80 #355 #80 #80 #355 #355 #355 ONNX General Tests TestResult run in Numpy\t\u0026#x2713; run in PyTorch\t\u0026#x2713; run in TensorFlow\t? Tensorflow General Tests TestResult Native VE\t\u0026#x2713; Transparent VE\t\u0026#x2713; Neural Networks NetworkX86VENVIDIA InferenceVBS TrainingInferenceVBS TrainingInferenceVBS Training AlexNet\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? SqueezeNet 1.0\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? SqueezeNet 1.1\t\u0026#x2713; \u0026#x2713; \u0026#x2713; \u0026#x2713; ? ? Layers\t\u0026#x2713; #N/A \u0026#x2713; #N/A ? ? "
},
{
	"uri": "/advanced/deployment.html",
	"title": "Deployment",
	"tags": [],
	"description": "",
	"content": " This feature is still beta and has not been exhaustively been tested!\nThe command sol.deploy(model, target, dargs, *args, batch_size, autotuning, **kwargs) can be used to deploy trained networks. Currently we support:\nX86 ARM64 native and cross-compilation NVIDIA GPUs NEC SX-Aurora It is important that only pure framework models can be deployed yet, no SOL-models. So in case you have trained your model with SOL, you need to transfer the weights back to the framework model. This can be done as follows:\n# PyTorch py_model = ... sol_model = sol.optimize(...) sol_model.load_state_dict(py_model.state_dict()) # ... do your training py_model.load_state_dict(sol_model.state_dict()) # TensorFlow py_model = ... sol_model = sol.optimize(...) sol_model.setWeights(py_model.weights()) # ... do your training py_model.setWeights(sol_model.weights()) In order to deploy your model, you can use the sol.deploy(...) function which in principle works like the sol.optimize(...) but has some more parameters.\nThe sol.deploy(\u0026hellip;) command See this for details on model, size, dtype and layout.\nParameter Default Description target Any of shared_lib, static_lib. dargs A dict or OrderedDict containing the following values dargs[\u0026quot;lib_name\u0026quot;] \u0026quot;libsolpredict\u0026quot; Name of the library to be generated, e.g., \u0026rsquo;libmynetwork\u0026rsquo; would generate \u0026rsquo;libmynetwork.a' dargs[\u0026quot;func_name\u0026quot;] \u0026quot;sol_predict\u0026quot; Name of the generated C function to call. dargs[\u0026quot;device_type\u0026quot;] Either sol.device.[...] or a tuple with (sol.device.[...], \u0026quot;arch\u0026quot;) where \u0026quot;arch\u0026quot; is a device specific string. For details look at this dargs[\u0026quot;host_type\u0026quot;] sol.device.auto Same as device, but for the host system, in case of offloading deployment, e.g., NVIDIA. In host-only (i.e. X86) or pure-offloading (i.e. VE) this gets ignored. 
dargs[\u0026quot;path\u0026quot;] \u0026quot;.\u0026quot; Path where the library and header files will be generated into. An example looks like:\nsol.deploy(alexnet, target=\u0026#34;shared_lib\u0026#34;, {\u0026#34;lib_name\u0026#34;: \u0026#34;libNN\u0026#34;, \u0026#34;func_name\u0026#34;: \u0026#34;predict\u0026#34;, \u0026#34;path\u0026#34;: \u0026#34;.\u0026#34;, \u0026#34;device_type\u0026#34;: sol.device.x86}, [1, 3, 224, 224]) This will generate a library called libNN.so for X86 and a header file in the current working directory.\n#ifndef __libNN__ #define __libNN__ #ifdef __cplusplus extern \u0026#34;C\u0026#34; { #endif void predict_init(const int deviceIdx); int predict_seed(const int64_t seed); void predict(void* ctx, const float* L0, float** L47); #ifdef __cplusplus } #endif #endif There are three methods. predict_init(...) is automatically called the first time you run predict(...), so you only need to call it yourself if you want to initialize everything prior to the very first call. You can further use predict_seed(...) to initialize all random number generators, in case your model uses random variables. predict(...) is the actual method that executes your model. It expects ctx to be 0; it is reserved for expert features that are not available yet. L0 has to point to your input data (allocated on the target device!) and L47 is a pointer used to return the result. The numbers are autogenerated and can be different in your case.\nThe value for L47 will get allocated via sol_ext_malloc(...) and can either be freed with sol_ext_free(...) or with the device\u0026rsquo;s own free method, e.g., free(...) (X86/ARM64/VE) or cudaFree(...) (NVIDIA). In the future we will allow you to override the sol_ext_...(...) 
methods to provide your own memory allocator.\nIn case of a static lib, SOL will tell you which libraries you need to link against; in case of a shared library, SOL already does all of this for you.\n[INFO ][ 8.79][core] Compiler (63): Please link the following libraries and flags to your application. [INFO ][ 8.79][core] Compiler (63): -L/path/to/sol/deploy [INFO ][ 8.79][core] Compiler (63): -l:libMyAlexNet.a [INFO ][ 8.79][core] Compiler (63): -l:libdnnl.a [INFO ][ 8.79][core] Compiler (63): -lstdc++ [INFO ][ 8.79][core] Compiler (63): -fopenmp Architectures Depending on the architecture you are deploying to, SOL might need additional information about the target hardware. These can be added to the dargs argument.\nX86 or ARM64 CPUs Parameter Default Values gcc::march native Everything that GCC accepts for the -march parameter. gcc::mtune native Everything that GCC accepts for the -mtune parameter. gcc::elf elf_x86_64 Everything that LD accepts for the -m parameter. gcc::oformat elf64-x86-64 Everything that LD accepts for the -oformat parameter. ispc::vector native [native, sse2, sse4, avx, avx2, avx512] NVIDIA GPUs Parameter Default Values nvidia::cc 2 digit compute capability, e.g., 65 NEC SX-Aurora The NEC SX-Aurora does not need any additional arguments.\n"
},
{
	"uri": "/tutorials/deployment.html",
	"title": "Deployment",
	"tags": [],
	"description": "",
	"content": " Deployment is an advanced feature. This tutorial shows how to use it to produce a deployed model. Actually using the deployed model in another project requires knowledge about.\nDeployment is still in experimental state and might change in future!\nHow to use SOL\u0026rsquo;s deployment functionality works in a similar way to sol.optimize(). Just instead of creating and optimized model that is compiled when it is called, sol.deploy creates a directly compiled library of the given model at the given path. Both functions can be controlled by the same environment variables and sol.config. So, for example autotuning can also be enabled or disabled by setting sol.config[\u0026quot;autotuning\u0026quot;]. But deploy does not read many of its options directly from the model. Instead you have to define them manually. The function\u0026rsquo;s signature looks like this:\ndef deploy(model, args=[], kwargs={}, fwargs={}, *, library_type:str\t= \u0026#34;shared_lib\u0026#34;, library_func:str\t= \u0026#34;predict\u0026#34;, library_name:str\t= \u0026#34;sol_deploy\u0026#34;, library_path:str\t= \u0026#34;.\u0026#34;, device:str, device_idx:int\t= 0, device_idx_runtime:int\t= 0, weights_type:str\t= \u0026#34;linked\u0026#34;, weights_path:str\t= \u0026#34;.\u0026#34;, vdims:Optional[Union[List, Tuple]]\t= None, compiler_args\t= {}, no_avx_512\t= False ): These options can be separated into four categories:\nSOL-Options These options define how SOL optimizes and compiles the network. Most of these are similar to sol.optimize().\nParameter Valid Options/Type Description model nn.module/tf.model/onnx_model The Model, defined in any framework that is supported by SOL args torch.Tensor/tf.Tensor An Example input, used to get datatype and shapes kwargs dict Keyword arguments kwargs dict Framework arguments, e.g. 
shape vdims list Variable dimensions, see VDim compiler_args dict Dictionary which is given directly to the underlying JIT compilers Library Options These options define the properties of the generated library.\nParameter Valid Options Description library_type shared_lib, static_lib Creates a shared (.so) or static (.a) library. library_name str Name of the library. \u0026ldquo;sol_\u0026rdquo; will be used as prefix. library_func str Name of the main function of that library. library_path str Path to which the generated files will be written. Weight Options These options define how weights are stored and read by the generated library.\nParameter Valid Options Description weights_type linked, extern Weights are either linked into the generated library or stored in an external location. weights_path str Path to store external weights. (Can be None for linked weights.) Weights that are linked into the library require less manual handling, because they are part of the same file. But the size of a shared library is limited by Linux, so you cannot use this function for larger networks. Keeping the weights external also allows you to free them on the host if you have moved your model to another device during execution. The library on the other hand stays loaded on the host side as long as it is in use.\nDevice Options These options define the device and the execution mode of the deployed model.\nParameter Valid Options Description device x86, nvidia, ve, veo device_idx int Device id during compilation (+ autotuning) device_idx_runtime int Device id of the generated library The difference between ve and veo is in their respective execution modes. ve generates a .vso library that can be linked into an executable that runs directly on the Vector Engine. veo stands for Vector Engine offloading and generates a host library and a device library. 
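Calling into such a generated host library from Python can be sketched with ctypes. This is a hedged stand-in: libm is loaded here instead of the SOL-generated library, since the actual library name and exported function depend on your deploy call, but the loading and calling pattern is the same:

```python
import ctypes
import ctypes.util

# Stand-in: in a real project you would load the SOL-generated host library,
# e.g. ctypes.CDLL('./libsol_convnext.so'), and declare its predict function.
lib = ctypes.CDLL(ctypes.util.find_library('m') or 'libm.so.6')
lib.sqrt.restype = ctypes.c_double      # declare the return type of the symbol
lib.sqrt.argtypes = [ctypes.c_double]   # declare the argument types of the symbol
result = lib.sqrt(9.0)
```

Declaring restype and argtypes is essential; otherwise ctypes assumes int for every symbol.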
You can use the host library in your x86 project and it will automatically offload the model and its in- and outputs to the Vector Engine during runtime.\nExample Here is an example script:\nimport sol import torch import torchvision.models as models model = models.get_model(\u0026#34;convnext_small\u0026#34;) model.eval() input = torch.rand(10,3,224,224, dtype=torch.float32) sol.deploy(model, [input], library_type\t= \u0026#34;shared_lib\u0026#34;, library_func\t= \u0026#34;predict\u0026#34;, library_name\t= \u0026#34;sol_convnext\u0026#34;, library_path\t= \u0026#34;/LOCAL/deployment\u0026#34;, device_idx\t= 0, device\t= \u0026#34;nvidia\u0026#34;, weights_type\t= \u0026#34;extern\u0026#34;, weights_path\t= \u0026#34;data\u0026#34;, # Advanced features vdims\t= [False], compiler_args\t= {\u0026#34;ispc::vector\u0026#34;:\u0026#34;avx2\u0026#34;, \u0026#34;gcc::march\u0026#34;:\u0026#34;native\u0026#34;}, ) Output Files This script creates a folder in /LOCAL/deployment with the following contents:\ndemouser:LOCAL/deployment$ ls data libsol_convnext.so sol_convnext_example.c sol_convnext_example.py sol_convnext.h To compile and run the generated example, run the following commands.\ndemouser:LOCAL/deployment$ g++ sol_convnext_example.c -lsol_convnext -o sol_convnext_exe demouser:LOCAL/deployment$ ./sol_convnext_exe Note that LIBRARY_PATH needs to include the location of libsol_convnext.so for the linker (ld) to be able to link against it. To run the executable, the loader (ld.so) needs to be able to locate the shared library as well. Set LD_LIBRARY_PATH accordingly or move libsol_convnext.so to a folder included in these paths.\nAlternatively you can run the generated Python example:\ndemouser:LOCAL/deployment$ python sol_convnext_example.py The generated examples are meant to be human readable and show you how to call the deployed model from their respective languages.\n"
},
{
	"uri": "/frameworks.html",
	"title": "Frameworks",
	"tags": [],
	"description": "",
	"content": "Running SOL in a framework is the easiest operation mode, as SOL takes care of most parameters, options, etc. automatically. In principle you just need to load SOL and run the sol.optimize(model, args, kwargs={}, *, framwork=None, vdims=None, **fwargs) function on your target model.\nSee subchapters for details on the different frameworks and example codes.\nParameter Description model Your model you want to optimize using SOL. args Either a list or tuple of framework tensors or other inputs. kwargs A dictionary of named arguments. framework A framework in which the returned model shall be executed. By default the same as the input model. vdims A list or tuple containing the value you want to assign to the variable dimension. Valid values are positive integers or None for variable values. determinism A set, list or tuple sol.Determinism enum items. Click here for more details. fwargs A dictionary containing framework specific flags. See corresponding framework for available flags. Generic Model Functions SOL models are implemented using the framework\u0026rsquo;s own model structure, so they provide the same functionality as the framework\u0026rsquo;s models, except that SOL models always assume, that training = False for inference and training = True for training runs. Additionally they support following functions:\nCommand Description model.__sol__.network Unique hash of the network. model.__sol__.free_ctxs() Forces to free all SOL contexts of this network on all devices. Command Description sol.__version__ SOL version string sol.cache.clear() Clears SOL\u0026rsquo;s build cache, to enforce rebuild of models. sol.check_version() Checks if a new version of SOL is available sol.config.print() Prints all config options and their current values sol.config[\u0026quot;...\u0026quot;] = ... Sets config options sol.deploy(...) 
Details here sol.device.disable(device) See sol.device.enable(device) sol.device.enable(device) Enables code generation for the specified device. By default all available devices will be built. Device needs to be sol.device.[x86, nvidia, ve] sol.device.set(device, deviceIdx, *, bind_thread=False) Forces SOL to run everything on the given device. If the data is not located on the target device, it will be explicitly copied between the host and the device. By default the device is bound to all threads of the process. Using bind_thread=True you can bind the device only to the current thread. The thread\u0026rsquo;s setting always takes precedence over the process setting. sol.devices() Prints overview of available devices. Green: device is initialized (has been used for computations). Star: default device. sol.env() Prints env vars + values used by SOL. sol.optimize(...) Details here sol.plugins() Prints overview of loaded plugins. sol.seed(deviceType=None, deviceIdx=None) Fetches the global seed (both == None), the device type\u0026rsquo;s seed, or the seed of a specific device. sol.seeds() Prints seed overview. sol.set_seed(seed, deviceType=None, deviceIdx=None) Sets the seed. sol.versions() Prints versions of used compilers and libraries. For offloading, the data needs to be on the host system, otherwise implicit copying is not possible!\n"
},
{
	"uri": "/advanced.html",
	"title": "Advanced",
	"tags": [],
	"description": "",
	"content": ""
},
{
	"uri": "/advanced/debug.html",
	"title": "Debug",
	"tags": [],
	"description": "",
	"content": "SOL provides a series of debug and visualization features.\nCode Debugging SOL\u0026rsquo;s generated code can be debugged by setting:\nsol.config[\u0026#34;jit::debug\u0026#34;] = True\t# compiles with debug symbols sol.config[\u0026#34;compiler::debug\u0026#34;] = True\t# adds assertions into the code (requires jit::debug) Python based frameworks For Python based frameworks you can then execute the following series of commands:\ngdb python3 run myscript.py or run interactively\ngdb python3 run # starting from here the Python console will be active. # use CTRL+C to switch back to GDB import framework import sol model = init_model() sol_model = sol.optimize(...) Computation Graph Option Description sol.config[\u0026quot;compiler::debug_graph\u0026quot;]=True Plots the input CG, and a separate CG for every device and execution pass. sol.config[\u0026quot;compiler::debug_text\u0026quot;]=True Generates a textual representation of your computation graph, in case your graph is too complex. Memory Consumption sol.config[\u0026#34;compiler::debug_memory_consumption\u0026#34;]=True plots the memory consumption for all devices and execution passes in $CWD/.sol/debug/*_memory.svg.\nColors:\nBlue: Fully managed by framework Red: Allocated by SOL, freed by framework Green: Fully managed by SOL Labels:\ninputs: Model inputs parameters: Model parameters outputs: Model outputs gradients: Model gradients (only backward pass) copies: Data shared between forward and backward pass in training intermediate: Data shared between SOL fused layers device: Temporary data stored in device main memory during execution of fused layer core: Temporary data stored per core during execution of fused layer "
},
{
	"uri": "/advanced/vdims.html",
	"title": "Variable Dimensions",
	"tags": [],
	"description": "",
	"content": "Since SOL v0.5 we support variable dimensions (VDims). When you run sol.optimize, SOL will automatically detect variable dimensions. The acceptable input shapes get printed by SOL after analyzing the model.\nInputs: in_0 [#0, 5, #1, #2, #3] Outputs: out_0 { \u0026#34;A\u0026#34;: [#0, 5, #1, #2, #3], \u0026#34;B\u0026#34;: [#0, 5, #1, 3, 3], \u0026#34;C\u0026#34;: [#0, 5, #1, 5, 7], } Here we see that we have 4 VDims (#0 to #3). Depending on the structure of your neural network, SOL will restrict certain dimensions to be of fixed size. For performance reasons, SOL disables all VDims, so you need to enable them by hand using:\nsol_model = sol.optimize(model, [torch.rand(5, 5, 5, 5, 5)], vdims=[True, False, 1]) This enables the #0 to accept any size. #1 will only accept the size that was used when parsing the model (the default behavior). #2 will only accept the size to be 1. As #3 does not get set, it will use the default behavior. So the compatible shape for this example is [*, 5, 5, 1, 5].\nSetting the VDims needs to happen BEFORE you execute the model for the first time, otherwise SOL will not obey your settings! If you need to change your VDims call sol.cache.clear() at the beginning of your script.\n"
},
{
	"uri": "/devices.html",
	"title": "Devices",
	"tags": [],
	"description": "",
	"content": ""
},
{
	"uri": "/advanced/unsupported.html",
	"title": "Unsupported Layers",
	"tags": [],
	"description": "",
	"content": "In case you need to run layers that are not supported by SOL you can partially implement the model using SOL:\nclass MyModel(framework.Model): def __init__(self, ...): self.A = sol.optimize(PartThatCanBeOptimized, ...) self.B = PartThatCannotBeOptimized self.C = sol.optimize(OtherPartThatCanBeOptimized, ...) # don\u0026#39;t forget to set the inputs of this submodel to requiresGrad=True! def forward(self, X): X = self.A(X) # executed by SOL X = self.B(X) # executed by Framework X = self.C(X) # executed by SOL return X It is important that except the input part of the network is called with sol.optimize(..., sol.input(..., requires_grad=True), ...) to ensure correct gradient calculations!\n"
},
{
	"uri": "/advanced/custom.html",
	"title": "Custom Layers",
	"tags": [],
	"description": "",
	"content": "In v0.5.3 we added an experimental Custom Layer support. In this chapter we describe how it can be used from top to bottom:\nIntegration into Neural Network For the integration into the neural network, we use custom layers, that integrate seamlessly into the AI framework\u0026rsquo;s computation mechanisms. However, not all features are available in all frameworks.\nTo create your own Custom layer, you can use the sol.[pytorch/tensorflow].nn.Custom class.\nIt\u0026rsquo;s constructor expects some arguments:\nfunction_names: Union[str, List[str], Tuple[str]]: List of C++ function names, that get called from the custom layer. 1st is used for forward pass, 2nd for backward pass. include_file: str: Name of C++ header file to be included by the custom layer. Gets included as #include\u0026lt;[include_file]\u0026gt;. compiler_flags: Dict[str, Union[Tuple[str], List[str]]]: Dictionary of compiler flags that get passed to he device\u0026rsquo;s native compiler. Use x86 (GCC), nvidia (NVCC), or ve (NCC) for device names. hyper_parameter: Union[Tuple[Union[int, str]], List[Union[int, str]]]: Hyper parameters that get passed as template parameters to the custom layer. Should be of C++ integer-like types, 'char', constexpr value or C-preprocessor macro. name: str: Name used in the SOL debug graph, or as name for the layer within the framework (if supported). By default, the custom layer accepts multiple inputs, and returns identical outputs. If you need to recompute the shape or dtypes of the custom layer, you can overwrite the shape_inference and dtype_inference methods. 2nd should return dtypes in the framework\u0026rsquo;s native dtype format.\nclass Custom(sol.[pytorch/tensorflow].nn.Custom): def shape_inference(self, inputs: List[Tensor]) -\u0026gt; List[List[int]]: ... def dtype_inference(self, inputs: List[Tensor]) -\u0026gt; List[DType]: ... When computing the derivative it might be required to have a copy available of any of the inputs or outputs. 
For this purpose you can overwrite the methods input_requires_copies and output_requires_copies. Each needs to return a list of boolean values, where True indicates that the value will be provided as a copy within the backward pass.\nclass Custom(sol.[pytorch/tensorflow].nn.Custom): def input_requires_copies(self, inputs: List[Tensor]) -\u0026gt; List[bool]: ... def output_requires_copies(self, inputs: List[Tensor]) -\u0026gt; List[bool]: ... Here is a full example written in PyTorch.\nclass Custom(sol.pytorch.nn.Custom): def __init__(self): path\t= [f\u0026#39;-I/path/to/your/code\u0026#39;] include_file\t= \u0026#39;custom.h\u0026#39; compiler_flags\t= {\u0026#39;x86\u0026#39;: path, \u0026#39;ve\u0026#39;: path} super().__init__( (\u0026#39;custom_memcpy\u0026#39;, \u0026#39;custom_memcpy\u0026#39;), include_file, compiler_flags, [0, \u0026#34;\u0026#39;c\u0026#39;\u0026#34;], name=\u0026#34;Memcpy\u0026#34; ) def _eager(self, inputs: List[Tensor]) -\u0026gt; List[Tensor]: return [inputs[0].clone()] def forward(self, x): return self._forward([x])[0] model = torch.nn.Sequential(Custom()) As you can see, we have overwritten the forward method, as the default requires a list or tuple as input, which would not work with torch.nn.Sequential. Further, within PyTorch you can provide an _eager method that performs your operation when running the neural network with PyTorch. 
Due to technical limitations this is not available for TensorFlow!\nC++ Code The SOL custom layer integrates your layer into the SOL execution graph, performs all allocations/deallocations of input and output data, and calls into the functions you have provided with the function_names argument.\nThe raw scheme of the function call looks like this:\n#include \u0026lt;sol/[device_type]/api.h\u0026gt; #include \u0026lt;[include_file]\u0026gt; SOL_API void [autogenerated_unique_function_name](sol_ctx* ctx) { [function_name[derivative]]\u0026lt;[training], [hyper_parameters], [i.dtype for i in inputs], [o.dtype for o in outputs]\u0026gt;(ctx, *inputs, *outputs, *input_copies, *output_copies); } As described before, first we include your header file and then call function_name[0] for the forward pass and function_name[1] for the backward pass.\nFor the template, we provide:\nA boolean training, that is false during inference and true during training. All your provided hyper_parameters, i.e., [0, 'c']. C++ dtypes of all inputs C++ dtypes of all outputs For the function arguments, we provide:\nsol_ctx* which needs to be used to allocate temporary data. sol_tensor* of all inputs or gradient outputs sol_tensor* of all outputs or gradient inputs sol_tensor* of all input copies (in backward pass only) sol_tensor* of all output copies (in backward pass only) Within your header file you can then provide a function similar to this:\n#include \u0026lt;type_traits\u0026gt; template\u0026lt;bool training, int hyper_0, char hyper_1, typename TI, typename TO\u0026gt; inline void custom_memcpy(sol_ctx* ctx, const sol_tensor* input, const sol_tensor* output) { static_assert(std::is_same\u0026lt;TI, TO\u0026gt;::value); auto i = sol::ptr\u0026lt;TI\u0026gt;(input); auto o = sol::ptr\u0026lt;TO\u0026gt;(output); memcpy(o, i, sizeof(TI) * input-\u0026gt;numel); } As you see, you need to use sol::ptr\u0026lt;TYPE\u0026gt;(sol_tensor*) to get the raw pointer from the sol_tensor*. 
Please use the following fields of the sol_tensor if necessary:\nnumel: total number of elements within the tensor dims: number of dimensions shape: shape of the tensor as array of sol_dim (using sol_dim = int64_t) All other fields are subject to change and should not be used!\nIf you need to allocate temporary data, you need to do it like this (you can allocate 1 to 8-dimensional tensors):\n// in C sol_tensor tmp = SOL_TENSOR_2D(ctx-\u0026gt;device_handle, sol_dtype_f32(), DIM_0, DIM_1); sol_tensor_malloc(\u0026amp;tmp); T* value = sol_tensor_ptr_f32(\u0026amp;tmp); ... sol_tensor_free(\u0026amp;tmp); // in C++ sol::Tensor\u0026lt;float\u0026gt; tmp(ctx-\u0026gt;device_handle, DIM_0, DIM_1); T* value = tmp; If your header file calls into an external library, make sure you provide the correct compiler_flags for the device-specific compiler. For GCC/NCC this could be: -L/path/to/your/lib -llibraryname to link the library /path/to/your/lib/liblibraryname.so.\n"
},
{
	"uri": "/talks.html",
	"title": "Talks/Papers",
	"tags": [],
	"description": "",
	"content": " 2025 Facilitate high-performance hardware integration into AI Frameworks with the NEC SOL AI compiler Slides, 20th International Workshop on Automatic Performance Tuning, 2025\n2022 SOL: Single middleware for optimized multi-architecture AI training and deployment Slides, 33nd NEC User Group Meeting, 2022\nSOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks Article, NEC SX-Aurora Articles, 2022\n2020 SOL: Effortless Device Support for AI Frameworks without Source Code Changes Paper, HPML, 2020\nSOL4VE: Running Deep Neural Networks on the NEC SX-Aurora Tsubasa Slides, SuperComputing NEC Aurora Forum, 2020\nSOL4VE: Bringing Deep Neural Networks to the NEC SX-Aurora TSUBASA Slides, 32nd NEC User Group Meeting, 2020\nSOL: Transparent Neural Network Acceleration on NEC SX-Aurora TSUBASA Slides, Video, Virtual ICM Seminars, 2020\n2019 Integration of NEC SX-Aurora into AI Frameworks Slides, 2nd Aurora Deep Dive Workshop @ RWTH Aachen, 2019\nIntegration of NEC SX-Aurora into AI Frameworks Slides, SuperComputing NEC Aurora Forum and SuperComputing NEC Booth, 2019\nIntegration of NEC SX-Aurora into AI Frameworks Slides, Workshop on Sustained Scalability, 2019\nSol: Transparent Neural Network Acceleration Slides, Supercomputing Frontiers Europe, 2019\n2018 Sol: Transparent Neural Network Acceleration Platform Slides, SuperComputing NEC Booth, 2018\nSol: Transparent Neural Network Acceleration Platform Short Paper, Poster, SuperComputing, 2018\nBrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism Technical Report, ArXiV, 2018 Short Paper, DeepMobile, 2018\n"
},
{
	"uri": "/info.html",
	"title": "Info",
	"tags": [],
	"description": "",
	"content": "Open Source Dependencies Project License URL Bundled AOCL/BLAS BSD-3/MIT https://github.com/amd/blis/ yes AVEO LGPL 2.1 https://github.com/SX-Aurora/aveo no Boost BSL 1.0 https://github.com/boostorg/boost/ yes CMake BSD-3 https://github.com/Kitware/CMake/ no CUB BSD-3 https://github.com/NVIDIA/cub no Einops MIT https://github.com/arogozhnikov/einops no GCC GPL-3 https://gcc.gnu.org/ no Hugo Apache 2.0 https://github.com/gohugoio/hugo no Hugo Learn Theme MIT https://github.com/matcornic/hugo-theme-learn/ no ISPC BSD-3 https://github.com/ispc/ispc/ yes Illyrian BSD-3 https://github.com/nec-research/illyrian no JsonCPP MIT https://github.com/open-source-parsers/jsoncpp yes LLVM-VE Apache 2.0 https://github.com/sx-aurora-dev/llvm no Matplotlib Matplotlib License https://github.com/matplotlib/matplotlib/ no NEC VEOS LGPL 2.1 https://github.com/veos-sxarr-NEC/veos/blob/master/COPYING no Numpy BSD-3 https://github.com/numpy/numpy/ no ONNX Apache 2.0 https://github.com/onnx/onnx/ no OneDNN Apache 2.0 https://github.com/oneapi-src/oneDNN yes OpenBLAS BSD-3 https://github.com/OpenMathLib/OpenBLAS/ yes OpenSSL Apache 2.0 https://www.openssl.org/ yes Opt. 
Einsum MIT https://github.com/dgasmith/opt_einsum/ no PIP MIT https://github.com/pypa/pip/ no PyTorch BSD-3 https://github.com/pytorch/pytorch no Python Python License https://www.python.org/ no Requests Apache 2.0 https://github.com/psf/requests no SQLite Public Domain https://www.sqlite.org/ yes SimpleTermMenu MIT https://github.com/IngoMeyer441/simple-term-menu/ no Sleef BSL-1 https://github.com/shibatch/sleef yes Tensorflow Apache 2.0 https://github.com/tensorflow/tensorflow/ no TermColor MIT https://github.com/termcolor/termcolor no ThreadPoolCTL BSD-3 https://github.com/joblib/threadpoolctl/ no TorchVision BSD-3 https://github.com/pytorch/vision no Transformers Apache 2.0 https://github.com/huggingface/transformers/ no Tungl BSD-3 https://github.com/nec-research/tungl/ no VEDA BSD-3 https://github.com/SX-Aurora/veda no VEDA-PyTorch BSD-3 https://github.com/SX-Aurora/veda-pytorch/ no VEDA-Tensors BSD-3 https://github.com/SX-Aurora/veda-tensors/ no VEDNN BSD-2 https://github.com/mergian/vednn/ yes VEURPC LGPL 2.1 https://github.com/SX-Aurora/ve-urpc/ no VSTL BSD-2 https://github.com/takuya-araki/vstl/ yes X86 SIMD Sort BSD-3 https://github.com/intel/x86-simd-sort/ yes Closed Source Dependencies Project License URL Bundled CUBLAS CUDA EULA https://developer.nvidia.com/cublas no CUDA CUDA EULA https://developer.nvidia.com/cuda no CUDNN CUDNN SLA https://developer.nvidia.com/cudnn no Intel Math Kernel Library Intel Simplified Software License https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html no NEC Compiler Collection no NEC Numeric Library Collection NLC License https://www.hpc.nec/documents/sdk/SDK_NLC/UsersGuide/main/en/index.html no "
},
{
	"uri": "/beta.html",
	"title": "SOL4VE Closed Beta",
	"tags": [],
	"description": "",
	"content": " NEC Laboratories Europe runs a closed beta program for NEC SX-Aurora TSUBASA users. If you are interested, please submit the following form or send an e-mail.\nFirst Name: Last Name: E-Mail: Organization: Organization Type: Please Select Academic Commercial Who is your NEC your sales representative? How did you hear about us? Please Select NEC Sales NEC Marketing Material NEC SX-Aurora Articles SuperComputing International SuperComputing SuperComputing Frontiers Other Submit "
},
{
	"uri": "/releases/rss.html",
	"title": "Automatic Announcements",
	"tags": [],
	"description": "",
	"content": "To be automatically alerted of new SOL releases, you can subscribe to the PyPI RSS Feed: https://pypi.org/rss/project/nec-sol/releases.xml\n"
},
{
	"uri": "/categories.html",
	"title": "Categories",
	"tags": [],
	"description": "",
	"content": ""
},
{
	"uri": "/tags.html",
	"title": "Tags",
	"tags": [],
	"description": "",
	"content": ""
}]