<h2>Updating ISPC Programs For Changes In ISPC 1.24.0</h2>
<p>This release extends the standard library with new functions that perform dot
product operations. These functions use specific hardware instructions from
AVX-VNNI and AVX512-VNNI. The ISPC targets that support native VNNI
instructions are <tt class="docutils literal"><span class="pre">avx2vnni-i32x*</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-*</span></tt> and <tt class="docutils literal"><span class="pre">avx512spr-*</span></tt>. The
first two target families (<tt class="docutils literal"><span class="pre">avx2vnni-*</span></tt> and <tt class="docutils literal"><span class="pre">avx512icl-*</span></tt>) were introduced in this
release. Please refer to <a class="reference internal" href="#dot-product">Dot product</a> for more details.</p>
<p>Uniform integers and enums can now be used as non-type template parameters.
Please refer to <a class="reference internal" href="#function-templates">Function Templates</a> for more details.</p>
<p>The release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>The <tt class="docutils literal"><span class="pre">--pic</span></tt> command-line flag now corresponds to the <tt class="docutils literal"><span class="pre">-fpic</span></tt> flag of Clang
and GCC, whereas the newly introduced <tt class="docutils literal"><span class="pre">--PIC</span></tt> flag corresponds to <tt class="docutils literal"><span class="pre">-fPIC</span></tt>.
Previously, <tt class="docutils literal"><span class="pre">--pic</span></tt> behaved like <tt class="docutils literal"><span class="pre">-fPIC</span></tt>, so in
some cases users may need to switch to <tt class="docutils literal"><span class="pre">--PIC</span></tt> to preserve the previous behavior.</li>
<li>Newly introduced macro definitions for numeric limits can conflict
with user-defined macros that have the same names. When this happens, ISPC emits
macro redefinition warnings. Please refer to <a class="reference internal" href="#the-preprocessor">The Preprocessor</a> for
the full list of macro definitions.</li>
<li>The implementation of the <tt class="docutils literal">round</tt> standard library function has been aligned across
all targets. This may change the results of code that uses this
function on the following targets: <tt class="docutils literal"><span class="pre">avx2-i16x16</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i8x32</span></tt> and all
<tt class="docutils literal">avx512</tt> targets. Please refer to <a class="reference internal" href="#basic-math-functions">Basic Math Functions</a> for more details.</li>
silenced with the <tt class="docutils literal"><span class="pre">--wno-perf</span></tt> flag (or by using <tt class="docutils literal"><span class="pre">--woff</span></tt>, which turns
off all compiler warnings). Furthermore, <tt class="docutils literal"><span class="pre">--werror</span></tt> can be provided to
direct the compiler to treat any warnings as errors.</p>
<p>The <tt class="docutils literal"><span class="pre">--pic</span></tt> flag generates position-independent code suitable
for use in a shared library. The <tt class="docutils literal"><span class="pre">--PIC</span></tt> flag generates
position-independent code suitable for dynamic linking that avoids any limit on
the size of the global offset table. When neither <tt class="docutils literal"><span class="pre">--pic</span></tt> nor <tt class="docutils literal"><span class="pre">--PIC</span></tt> is
provided, the compiler uses the target-specific default behavior.</p>
unsigned int64 signbits(double x)
uniform unsigned int64 signbits(uniform double x)
</pre>
<p>The standard library provides four rounding functions, <tt class="docutils literal">round</tt>, <tt class="docutils literal">floor</tt>,
<tt class="docutils literal">ceil</tt> and <tt class="docutils literal">trunc</tt>, for the <tt class="docutils literal">float16</tt>, <tt class="docutils literal">float</tt> and <tt class="docutils literal">double</tt> data types. On
machines that support Intel® SSE or Intel® AVX, each of these functions maps to a
single instruction, a variant of the <tt class="docutils literal">roundss</tt> and <tt class="docutils literal">roundps</tt>
instructions. This improves performance, at the cost of a minor semantic
difference from the <tt class="docutils literal">C</tt> math library: the <tt class="docutils literal">ispc</tt> <tt class="docutils literal">round</tt> function computes the nearest integer value, rounding halfway
cases to the nearest even integer (for example, 2.5 rounds to 2 and 3.5 rounds to 4). It therefore corresponds to the <tt class="docutils literal">C</tt> math library
<tt class="docutils literal">roundeven</tt> function rather than to the <tt class="docutils literal">C</tt> <tt class="docutils literal">round</tt> function, which rounds halfway cases away from zero. These functions operate regardless of the current
rounding mode and do not signal precision exceptions.</p>
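<p>The halfway-case behavior can be checked against a scalar model. The following C sketch implements round-half-to-even for <tt class="docutils literal">double</tt> values without calling the C library <tt class="docutils literal">roundeven</tt>, which older C libraries may not provide. The name <tt class="docutils literal">round_half_even</tt> is hypothetical, and the sketch only models the semantics described above; it is not how the <tt class="docutils literal">ispc</tt> standard library implements <tt class="docutils literal">round</tt>.</p>

```c
#include <stdint.h>

/* Hypothetical scalar model of round-half-to-even (C roundeven semantics)
 * for double, written without <math.h> so no libm is needed. */
double round_half_even(double x) {
    double ax = x < 0 ? -x : x;
    /* |x| >= 2^52 is already integral; NaN also fails the comparison. */
    if (!(ax < 4503599627370496.0)) return x;
    int64_t n = (int64_t)ax;            /* truncate toward zero */
    double frac = ax - (double)n;       /* exact fractional part */
    if (frac == 0.0) return x;          /* already integral, keeps -0.0 */
    if (frac > 0.5 || (frac == 0.5 && (n & 1))) {
        n += 1;                         /* round up, or break the tie to even */
    }
    double r = (double)n;
    return x < 0 ? -r : r;              /* restore the sign */
}
```

<p>For example, <tt class="docutils literal">round_half_even(2.5)</tt> returns 2.0 and <tt class="docutils literal">round_half_even(-3.5)</tt> returns -4.0, whereas the C library <tt class="docutils literal">round</tt> returns 3.0 and -4.0 for the same inputs.</p>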
above, there are versions that support <tt class="docutils literal">int16</tt>, <tt class="docutils literal">int32</tt> and <tt class="docutils literal">int64</tt>
values as well.</p>
</div>
<div class="section" id="dot-product">
<h2>Dot product</h2>
<p>ISPC supports dot product operations on 8-bit integers (unsigned values multiplied by signed values) and on signed 16-bit integers,
leveraging the AVX-VNNI and AVX512-VNNI instruction sets. The ISPC targets that support
these instruction sets natively are <tt class="docutils literal"><span class="pre">avx2vnni-i32x*</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-i32x*</span></tt>, and <tt class="docutils literal"><span class="pre">avx512spr-i32x*</span></tt>.
On other targets, these operations are emulated.
These dot product operations are designed to operate on <em>packed</em> input vectors:
each 32-bit lane of <tt class="docutils literal">a</tt> and <tt class="docutils literal">b</tt> holds four 8-bit or two 16-bit values, so the programmer must pack the inputs accordingly before calling these functions.</p>
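<p>As an illustration of the packing requirement, the following C sketch packs four 8-bit values or two 16-bit values into one 32-bit word. The helper names are hypothetical and not part of the ISPC standard library, and the layout (element 0 in the least significant bits) is an assumption. Because each element of <tt class="docutils literal">a</tt> is multiplied by the element of <tt class="docutils literal">b</tt> in the same position and the products are summed, what matters is that both operands use the same layout. The same shifts and ORs can be written in ISPC on <tt class="docutils literal">varying</tt> values.</p>

```c
#include <stdint.h>

/* Hypothetical helper: element 0 goes into the least significant byte. */
uint32_t pack_u8x4(uint8_t e0, uint8_t e1, uint8_t e2, uint8_t e3) {
    return (uint32_t)e0 | ((uint32_t)e1 << 8) |
           ((uint32_t)e2 << 16) | ((uint32_t)e3 << 24);
}

/* Signed bytes are stored as their two's-complement bit patterns. */
uint32_t pack_i8x4(int8_t e0, int8_t e1, int8_t e2, int8_t e3) {
    return pack_u8x4((uint8_t)e0, (uint8_t)e1, (uint8_t)e2, (uint8_t)e3);
}

/* Two signed 16-bit values per 32-bit word, element 0 in the low half. */
uint32_t pack_i16x2(int16_t e0, int16_t e1) {
    return (uint32_t)(uint16_t)e0 | ((uint32_t)(uint16_t)e1 << 16);
}
```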
<p>For 8-bit integer vectors:</p>
<p>The functions multiply each of the four unsigned 8-bit integers packed in <tt class="docutils literal">a</tt> by the corresponding
signed 8-bit integer packed in <tt class="docutils literal">b</tt>, producing four intermediate signed 16-bit products.
These four products are added to the <tt class="docutils literal">acc</tt> accumulator, and the sum is returned as the final result.
The <tt class="docutils literal">_sat</tt> variant saturates the result to the signed 32-bit range.</p>
<pre class="literal-block">
varying int32 dot4add_u8i8packed(varying uint32 a, varying uint32 b,
                                 varying int32 acc)
varying int32 dot4add_u8i8packed_sat(varying uint32 a, varying uint32 b,
                                     varying int32 acc) // saturate the result
</pre>
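<p>The following C sketch models the arithmetic described above for a single program instance, which may help when checking results on targets where the operation is emulated. It is a hypothetical reference model, not the ISPC implementation: the names <tt class="docutils literal">ref_dot4add_u8i8</tt> and <tt class="docutils literal">ref_dot4add_u8i8_sat</tt> are made up, the byte layout (element 0 in the least significant byte) is assumed, and wrap-around on overflow in the non-saturating variant is an assumption based on the behavior of the VNNI <tt class="docutils literal">VPDPBUSD</tt> instruction.</p>

```c
#include <stdint.h>

/* Hypothetical model: acc plus the four u8 x s8 products, in 64 bits. */
static int64_t dot4_u8i8_sum(uint32_t a, uint32_t b, int32_t acc) {
    int64_t sum = acc;
    for (int i = 0; i < 4; ++i) {
        int ua = (int)((a >> (8 * i)) & 0xFFu);  /* unsigned byte, 0..255 */
        int sb = (int)((b >> (8 * i)) & 0xFFu);
        if (sb > 127) sb -= 256;                 /* signed byte, -128..127 */
        sum += ua * sb;                          /* |product| <= 32640, fits int16 */
    }
    return sum;
}

/* Non-saturating variant: wrap to 32 bits (assumed, as in VPDPBUSD). */
int32_t ref_dot4add_u8i8(uint32_t a, uint32_t b, int32_t acc) {
    uint32_t bits = (uint32_t)dot4_u8i8_sum(a, b, acc);  /* modulo 2^32 */
    return (int32_t)bits;  /* two's-complement reinterpretation on mainstream compilers */
}

/* Saturating variant: clamp to the signed 32-bit range. */
int32_t ref_dot4add_u8i8_sat(uint32_t a, uint32_t b, int32_t acc) {
    int64_t sum = dot4_u8i8_sum(a, b, acc);
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

<p>For example, with <tt class="docutils literal">a = 0x04030201</tt> (bytes 1, 2, 3, 4), <tt class="docutils literal">b = 0x01010101</tt> and <tt class="docutils literal">acc = 10</tt>, both variants return 1 + 2 + 3 + 4 + 10 = 20.</p>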
<p>For 16-bit integer vectors:</p>
<p>The functions multiply each of the two signed 16-bit integers packed in <tt class="docutils literal">a</tt> by the corresponding
signed 16-bit integer packed in <tt class="docutils literal">b</tt>, yielding two intermediate signed 32-bit products.
These two products are added to the <tt class="docutils literal">acc</tt> accumulator, and the sum is returned as the final result.
The <tt class="docutils literal">_sat</tt> variant saturates the result to the signed 32-bit range.</p>
<pre class="literal-block">
varying int32 dot2add_i16packed(varying uint32 a, varying uint32 b,
                                varying int32 acc)
varying int32 dot2add_i16packed_sat(varying uint32 a, varying uint32 b,
                                    varying int32 acc) // saturate the result
</pre>
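<p>A matching C sketch for the 16-bit case is shown below. As before, it is a hypothetical single-instance reference model rather than the ISPC implementation: the names <tt class="docutils literal">ref_dot2add_i16</tt> and <tt class="docutils literal">ref_dot2add_i16_sat</tt> are made up, element 0 is assumed to occupy the low 16 bits, and wrap-around in the non-saturating variant is assumed to follow the VNNI <tt class="docutils literal">VPDPWSSD</tt> instruction. Note that the sum of two products can reach 2<sup>31</sup> (for example, (-32768)·(-32768) twice), which is why the sum is accumulated in 64 bits before wrapping or saturating.</p>

```c
#include <stdint.h>

/* Hypothetical model: acc plus the two s16 x s16 products, in 64 bits. */
static int64_t dot2_i16_sum(uint32_t a, uint32_t b, int32_t acc) {
    int64_t sum = acc;
    for (int i = 0; i < 2; ++i) {
        int32_t sa = (int32_t)((a >> (16 * i)) & 0xFFFFu);
        int32_t sb = (int32_t)((b >> (16 * i)) & 0xFFFFu);
        if (sa > 32767) sa -= 65536;   /* reinterpret as signed 16-bit */
        if (sb > 32767) sb -= 65536;
        sum += (int64_t)sa * sb;       /* each product fits in 32 bits */
    }
    return sum;
}

/* Non-saturating variant: wrap to 32 bits (assumed, as in VPDPWSSD). */
int32_t ref_dot2add_i16(uint32_t a, uint32_t b, int32_t acc) {
    uint32_t bits = (uint32_t)dot2_i16_sum(a, b, acc);  /* modulo 2^32 */
    return (int32_t)bits;  /* two's-complement reinterpretation on mainstream compilers */
}

/* Saturating variant: clamp to the signed 32-bit range. */
int32_t ref_dot2add_i16_sat(uint32_t a, uint32_t b, int32_t acc) {
    int64_t sum = dot2_i16_sum(a, b, acc);
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```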
</div>
<div class="section" id="pseudo-random-numbers">
<h2>Pseudo-Random Numbers</h2>
<p>A simple random number generator is provided by the <tt class="docutils literal">ispc</tt> standard