Commit a8fe758

committed
Update docs for v1.24.0
1 parent afec15a commit a8fe758

1 file changed

Lines changed: 130 additions & 7 deletions

File tree

ispc.html

@@ -107,6 +107,7 @@ <h1 class="title">Intel® ISPC User's Guide</h1>
107107
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-21-0">Updating ISPC Programs For Changes In ISPC 1.21.0</a></li>
108108
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-22-0">Updating ISPC Programs For Changes In ISPC 1.22.0</a></li>
109109
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-23-0">Updating ISPC Programs For Changes In ISPC 1.23.0</a></li>
110+
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-24-0">Updating ISPC Programs For Changes In ISPC 1.24.0</a></li>
110111
</ul>
111112
</li>
112113
<li><a class="reference internal" href="#getting-started-with-ispc">Getting Started with ISPC</a><ul>
@@ -210,6 +211,8 @@ <h1 class="title">Intel® ISPC User's Guide</h1>
210211
<li><a class="reference internal" href="#math-functions">Math Functions</a><ul>
211212
<li><a class="reference internal" href="#basic-math-functions">Basic Math Functions</a></li>
212213
<li><a class="reference internal" href="#transcendental-functions">Transcendental Functions</a></li>
214+
<li><a class="reference internal" href="#saturating-arithmetic">Saturating Arithmetic</a></li>
215+
<li><a class="reference internal" href="#dot-product">Dot product</a></li>
213216
<li><a class="reference internal" href="#pseudo-random-numbers">Pseudo-Random Numbers</a></li>
214217
<li><a class="reference internal" href="#random-numbers">Random Numbers</a></li>
215218
</ul>
@@ -575,6 +578,34 @@ <h2>Updating ISPC Programs For Changes In ISPC 1.23.0</h2>
575578
<p>The result of selection operator can now be used as lvalue if it has suitable
576579
type.</p>
577580
</div>
581+
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-24-0">
582+
<h2>Updating ISPC Programs For Changes In ISPC 1.24.0</h2>
583+
<p>This release extends the standard library with new functions performing dot
584+
product operations. These functions utilize specific hardware instructions from
585+
AVX-VNNI and AVX512-VNNI. The ISPC targets that support native VNNI
586+
instructions are <tt class="docutils literal"><span class="pre">avx2vnni-i32x*</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-*</span></tt> and <tt class="docutils literal"><span class="pre">avx512spr-*</span></tt>. The
587+
first two targets (<tt class="docutils literal"><span class="pre">avx2vnni-*</span></tt> and <tt class="docutils literal"><span class="pre">avx512icl-*</span></tt>) were introduced in this
588+
release. Please refer to <a class="reference internal" href="#dot-product">Dot product</a> for more details.</p>
589+
<p>Uniform integers and enums can now be used as non-type template parameters.
590+
Please refer to <a class="reference internal" href="#function-templates">Function Templates</a> for more details.</p>
591+
<p>The release contains the following changes that may affect compatibility with
592+
older versions:</p>
593+
<ul class="simple">
594+
<li>The <tt class="docutils literal"><span class="pre">--pic</span></tt> command-line flag now corresponds to the <tt class="docutils literal"><span class="pre">-fpic</span></tt> flag of Clang
595+
and GCC, whereas the newly introduced <tt class="docutils literal"><span class="pre">--PIC</span></tt> corresponds to <tt class="docutils literal"><span class="pre">-fPIC</span></tt>.
596+
Previously, the <tt class="docutils literal"><span class="pre">--pic</span></tt> flag corresponded to the <tt class="docutils literal"><span class="pre">-fPIC</span></tt> flag. In
597+
some cases, to preserve previous behavior, users may need to switch to
598+
<tt class="docutils literal"><span class="pre">--PIC</span></tt>.</li>
599+
<li>Newly introduced macro definitions for numeric limits can cause conflicts
600+
with user-defined macros that have the same names. When this happens, ISPC emits
601+
warnings about macro redefinition. Please refer to <a class="reference internal" href="#the-preprocessor">The Preprocessor</a> for
602+
the full list of macro definitions.</li>
603+
<li>The implementation of the <tt class="docutils literal">round</tt> standard library function was aligned across
604+
all targets. This may affect the results of code that uses this
605+
function for the following targets: <tt class="docutils literal"><span class="pre">avx2-i16x16</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i8x32</span></tt> and all
606+
<tt class="docutils literal">avx512</tt> targets. Please refer to <a class="reference internal" href="#basic-math-functions">Basic Math Functions</a> for more details.</li>
607+
</ul>
608+
</div>
578609
</div>
579610
<div class="section" id="getting-started-with-ispc">
580611
<h1>Getting Started with ISPC</h1>
@@ -735,8 +766,11 @@ <h2>Basic Command-line Options</h2>
735766
silenced with the <tt class="docutils literal"><span class="pre">--wno-perf</span></tt> flag (or by using <tt class="docutils literal"><span class="pre">--woff</span></tt>, which turns
736767
off all compiler warnings.) Furthermore, <tt class="docutils literal"><span class="pre">--werror</span></tt> can be provided to
737768
direct the compiler to treat any warnings as errors.</p>
738-
<p>Position-independent code (for use in shared libraries) is generated if the
739-
<tt class="docutils literal"><span class="pre">--pic</span></tt> command-line argument is provided.</p>
769+
<p>The <tt class="docutils literal"><span class="pre">--pic</span></tt> flag can be used to generate position-independent code suitable
770+
for use in a shared library. The <tt class="docutils literal"><span class="pre">--PIC</span></tt> flag can be used to generate
771+
position-independent code suitable for dynamic linking, avoiding any limit on
772+
the size of the global offset table. When neither <tt class="docutils literal"><span class="pre">--pic</span></tt> nor <tt class="docutils literal"><span class="pre">--PIC</span></tt> is
773+
provided, the compiler uses the target-specific default behavior.</p>
740774
</div>
741775
<div class="section" id="selecting-the-compilation-target">
742776
<h2>Selecting The Compilation Target</h2>
@@ -901,8 +935,10 @@ <h2>Selecting The Compilation Target</h2>
901935
<tt class="docutils literal"><span class="pre">sse4.1-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i8x16</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i16x8</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i32x8</span></tt>,
902936
<tt class="docutils literal"><span class="pre">avx1-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">avx1-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">avx1-i32x16</span></tt>, <tt class="docutils literal"><span class="pre">avx1-i64x4</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i8x32</span></tt>,
903937
<tt class="docutils literal"><span class="pre">avx2-i16x16</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i32x16</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i64x4</span></tt>,
938+
<tt class="docutils literal"><span class="pre">avx2vnni-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">avx2vnni-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">avx2vnni-i32x16</span></tt>,
904939
<tt class="docutils literal"><span class="pre">avx512knl-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x32</span></tt>,
905-
<tt class="docutils literal"><span class="pre">avx512skx-x64</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x32</span></tt>,
940+
<tt class="docutils literal"><span class="pre">avx512skx-x64</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-x32</span></tt>,
941+
<tt class="docutils literal"><span class="pre">avx512icl-x64</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x32</span></tt>,
906942
<tt class="docutils literal"><span class="pre">avx512spr-x64</span></tt>.</p>
907943
<p>Neon targets:</p>
908944
<p><tt class="docutils literal"><span class="pre">neon-i8x16</span></tt>, <tt class="docutils literal"><span class="pre">neon-i16x8</span></tt>, <tt class="docutils literal"><span class="pre">neon-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">neon-i32x8</span></tt>.</p>
@@ -1008,6 +1044,26 @@ <h2>The Preprocessor</h2>
10081044
<td>1</td>
10091045
<td>The macro is defined if LLVM intrinsics support is enabled</td>
10101046
</tr>
1047+
<tr><td>INT8_MIN, INT16_MIN, INT32_MIN, INT64_MIN</td>
1048+
<td>&nbsp;</td>
1049+
<td>Minimum value of signed integer types of the corresponding size</td>
1050+
</tr>
1051+
<tr><td>INT8_MAX, INT16_MAX, INT32_MAX, INT64_MAX</td>
1052+
<td>&nbsp;</td>
1053+
<td>Maximum value of signed integer types of the corresponding size</td>
1054+
</tr>
1055+
<tr><td>UINT8_MAX, UINT16_MAX, UINT32_MAX, UINT64_MAX</td>
1056+
<td>&nbsp;</td>
1057+
<td>Maximum value of unsigned integer types of the corresponding size</td>
1058+
</tr>
1059+
<tr><td>FLT16_MIN, FLT_MIN, DBL_MIN</td>
1060+
<td>&nbsp;</td>
1061+
<td>Smallest positive normal number of the corresponding floating-point type</td>
1062+
</tr>
1063+
<tr><td>FLT16_MAX, FLT_MAX, DBL_MAX</td>
1064+
<td>&nbsp;</td>
1065+
<td>Largest normal number of the corresponding floating-point type</td>
1066+
</tr>
10111067
</tbody>
10121068
</table>
10131069
<p><tt class="docutils literal">ispc</tt> supports the following <tt class="docutils literal">#pragma</tt> directives.</p>
@@ -3426,10 +3482,10 @@ <h2>Function Templates</h2>
34263482
<tt class="docutils literal">template int <span class="pre">add&lt;int&gt;(int</span> a, int b);</tt>).</li>
34273483
<li>Explicit template function specializations (i.e.
34283484
<tt class="docutils literal">template&lt;&gt; int <span class="pre">add&lt;int&gt;(int</span> a, int b) { return a - b;}</tt>).</li>
3485+
<li>Non-type template parameters (integral and enumeration types).</li>
34293486
</ul>
34303487
<p>What is currently not supported, but is planned to be supported:</p>
34313488
<ul class="simple">
3432-
<li>Non-type template parameters.</li>
34333489
<li>Default values for template parameters.</li>
34343490
<li>Template arguments deduction in template function specializations.</li>
34353491
</ul>
@@ -3517,6 +3573,37 @@ <h2>Function Templates</h2>
35173573
return a1 * a2;
35183574
}
35193575
</pre>
3576+
<p>For non-type template parameters, the following rules apply:</p>
3577+
<ul>
3578+
<li><p class="first">Uniform integral types and enum types can be used as non-type template parameters. Unbound types are treated as uniform.
3579+
For example:</p>
3580+
<pre class="literal-block">
3581+
template &lt;int N&gt; int foo(int a) { // N is uniform int
3582+
return a * N;
3583+
}
3584+
3585+
int bar() {
3586+
return foo&lt;2&gt;(3); // returns 6
3587+
}
3588+
3589+
enum AB { A = 1, B = 2 };
3590+
template &lt;AB ab&gt; int baz(int a) {
3591+
return a * ab;
3592+
}
3593+
3594+
int qux() {
3595+
return baz&lt;B&gt;(3); // returns 6
3596+
}
3597+
</pre>
3598+
</li>
3599+
<li><p class="first">Varying types are not allowed.</p>
3600+
</li>
3601+
<li><p class="first">Integral constants, enumeration constants and template parameters (in the context of the nested templates)
3602+
can be used as non-type template arguments. Constant expressions are not allowed.</p>
3603+
</li>
3604+
<li><p class="first">Partial specialization of function templates with non-type template parameters is not allowed.</p>
3605+
</li>
3606+
</ul>
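<p>The rule about forwarding a template parameter to a nested template can be illustrated with a short sketch. It is written in C++, whose declaration syntax mirrors the ISPC examples above; the names <tt class="docutils literal">scale</tt>, <tt class="docutils literal">scale_twice</tt> and <tt class="docutils literal">scale_by</tt> are hypothetical, and whether ISPC accepts the code unchanged is an assumption. C++ additionally accepts constant expressions such as <tt class="docutils literal">scale&lt;1 + 2&gt;</tt>, which the rules above exclude for ISPC.</p>

```cpp
// C++ analogue of the ISPC rules above (not ISPC code).
enum AB { A = 1, B = 2 };

// Integral non-type template parameter.
template <int N> int scale(int a) {
    return a * N;
}

// N is itself a template parameter and is forwarded as the
// non-type argument of the nested template instantiation.
template <int N> int scale_twice(int a) {
    return scale<N>(scale<N>(a));
}

// Enumeration non-type template parameter.
template <AB ab> int scale_by(int a) {
    return a * ab;
}

int demo() {
    // scale_twice<3>(2) = 2 * 3 * 3 = 18, scale_by<B>(5) = 5 * 2 = 10
    return scale_twice<3>(2) + scale_by<B>(5);
}
```

<p>In this sketch <tt class="docutils literal">demo()</tt> evaluates to 28. Every template argument is either an integral constant, an enumeration constant or an enclosing template parameter, which are the three forms the list above permits.</p>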
35203607
<p>You can use a limited number of function specifiers with function templates:</p>
35213608
<ul class="simple">
35223609
<li>The keywords <tt class="docutils literal">export</tt>, <tt class="docutils literal">task</tt>, <tt class="docutils literal">typedef</tt>, <tt class="docutils literal">extern &quot;C&quot;</tt> and <tt class="docutils literal">extern &quot;SYCL&quot;</tt>
@@ -3704,9 +3791,16 @@ <h2>Basic Math Functions</h2>
37043791
unsigned int64 signbits(double x)
37053792
uniform unsigned int64 signbits(uniform double x)
37063793
</pre>
3707-
<p>Standard rounding functions are provided for <tt class="docutils literal">float16</tt>, <tt class="docutils literal">float</tt> and <tt class="docutils literal">double</tt>
3708-
types. (On machines that support Intel®SSE or Intel® AVX, these functions all
3709-
map to variants of the <tt class="docutils literal">roundss</tt> and <tt class="docutils literal">roundps</tt> instructions, respectively.)</p>
3794+
<p>The standard library provides four rounding functions: <tt class="docutils literal">round</tt>, <tt class="docutils literal">floor</tt>,
3795+
<tt class="docutils literal">ceil</tt> and <tt class="docutils literal">trunc</tt> for <tt class="docutils literal">float16</tt>, <tt class="docutils literal">float</tt> and <tt class="docutils literal">double</tt> data types. On
3796+
machines that support Intel® SSE or Intel® AVX, these functions all map to a
3797+
single instruction, specifically a variant of the <tt class="docutils literal">roundss</tt> and <tt class="docutils literal">roundps</tt>
3798+
instructions. This offers enhanced performance, despite a minor semantic
3799+
difference in the <tt class="docutils literal">round</tt> function when compared to the <tt class="docutils literal">C</tt> math library
3800+
<tt class="docutils literal">round</tt> function. The ISPC <tt class="docutils literal">round</tt> computes the nearest integer value, rounding halfway
3801+
cases to the nearest even integer, i.e., it corresponds to the <tt class="docutils literal">C</tt> math library
3802+
<tt class="docutils literal">roundeven</tt> function. These functions operate regardless of the current
3803+
rounding mode and do not signal precision exceptions.</p>
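<p>The difference between the two rounding rules only shows up on exact halfway cases. The following scalar C++ sketch models the round-half-to-even behavior described above and compares it with <tt class="docutils literal">std::round</tt>, which rounds halfway cases away from zero. The function name <tt class="docutils literal">round_half_even</tt> is hypothetical; this is a reference model for reasoning about results, not the ISPC implementation.</p>

```cpp
#include <cmath>

// Hypothetical scalar model of round-half-to-even.
// It does not depend on the current floating-point rounding mode,
// because std::round, std::trunc and the halving below are all exact here.
float round_half_even(float x) {
    float r = std::round(x);                        // halfway cases go away from zero
    if (std::fabs(x - std::trunc(x)) == 0.5f) {     // exact halfway case
        r = 2.0f * std::round(x / 2.0f);            // pick the even neighbour instead
    }
    return r;
}

// Halfway inputs where the two rules disagree or agree:
//   x      std::round(x)   round_half_even(x)
//   0.5         1                0
//   1.5         2                2
//   2.5         3                2
//  -2.5        -3               -2
```

<p>Code that relied on 2.5 rounding to 3 on the affected targets would see 2 under the half-to-even rule, while non-halfway inputs such as 2.4 or 2.6 round identically under both rules.</p>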
37103804
<pre class="literal-block">
37113805
float round(float x)
37123806
uniform float round(uniform float x)
@@ -3886,6 +3980,35 @@ <h2>Saturating Arithmetic</h2>
38863980
above, there are versions that support <tt class="docutils literal">int16</tt>, <tt class="docutils literal">int32</tt> and <tt class="docutils literal">int64</tt>
38873981
values as well.</p>
38883982
</div>
3983+
<div class="section" id="dot-product">
3984+
<h2>Dot product</h2>
3985+
<p>ISPC supports dot product operations for unsigned and signed <tt class="docutils literal">int8</tt> and <tt class="docutils literal">int16</tt> data types,
3986+
leveraging the AVX-VNNI and AVX512-VNNI instruction sets. The ISPC targets that support
3987+
native VNNI instruction sets are <tt class="docutils literal"><span class="pre">avx2vnni-i32x*</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-x*</span></tt>, and <tt class="docutils literal"><span class="pre">avx512spr-x*</span></tt>.
3988+
For other targets, these operations are emulated.
3989+
These dot product operations are specifically designed to operate on <em>packed</em> input vectors,
3990+
necessitating proper packing of input vectors by the programmer before use.</p>
3991+
<p>For 8-bit Integer Vectors:</p>
3992+
<p>The functions multiply groups of four unsigned 8-bit integers packed in <tt class="docutils literal">a</tt> with the corresponding
3993+
four signed 8-bit integers packed in <tt class="docutils literal">b</tt>, resulting in four intermediate signed 16-bit values.
3994+
The sum of these values, in combination with the <tt class="docutils literal">acc</tt> accumulator, is then returned as the final result.</p>
3995+
<pre class="literal-block">
3996+
varying int32 dot4add_u8i8packed(varying uint32 a, varying uint32 b,
3997+
varying int32 acc)
3998+
varying int32 dot4add_u8i8packed_sat(varying uint32 a, varying uint32 b,
3999+
varying int32 acc) // saturate the result
4000+
</pre>
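<p>To make the packing requirement concrete, here is a scalar C++ reference sketch of what one program instance computes for the 8-bit functions. The helper <tt class="docutils literal">pack4</tt> and the <tt class="docutils literal">_ref</tt> function names are hypothetical. Two points are assumptions rather than statements from this guide: lane 0 is placed in the low byte, and the non-saturating variant wraps around on overflow.</p>

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: pack four byte values into one 32-bit word,
// lane 0 in the low byte (assumed layout). Negative values are stored
// in two's-complement form.
inline std::uint32_t pack4(int b0, int b1, int b2, int b3) {
    return (std::uint32_t(b0) & 0xFFu) | ((std::uint32_t(b1) & 0xFFu) << 8) |
           ((std::uint32_t(b2) & 0xFFu) << 16) | ((std::uint32_t(b3) & 0xFFu) << 24);
}

// Exact 64-bit sum of acc and the four (unsigned byte of a) * (signed byte of b) products.
inline std::int64_t dot4_u8i8_exact(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    std::int64_t sum = acc;
    for (int lane = 0; lane < 4; ++lane) {
        const std::int32_t ua = (a >> (8 * lane)) & 0xFFu;  // unsigned operand: 0..255
        std::int32_t sb = (b >> (8 * lane)) & 0xFFu;        // raw byte from b
        if (sb >= 128) sb -= 256;                           // reinterpret as signed: -128..127
        sum += ua * sb;
    }
    return sum;
}

// Model of the non-saturating variant (assumed to wrap on overflow).
inline std::int32_t dot4add_u8i8packed_ref(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(dot4_u8i8_exact(a, b, acc)));
}

// Model of the saturating variant: clamp the exact sum to the int32 range.
inline std::int32_t dot4add_u8i8packed_sat_ref(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    const std::int64_t sum = dot4_u8i8_exact(a, b, acc);
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    return static_cast<std::int32_t>(sum < lo ? lo : (sum > hi ? hi : sum));
}
```

<p>For example, with <tt class="docutils literal">a = pack4(1, 2, 3, 4)</tt>, <tt class="docutils literal">b = pack4(5, -6, 7, -8)</tt> and <tt class="docutils literal">acc = 100</tt>, the model yields 100 + (5 - 12 + 21 - 32) = 82. Because the result is a sum over lanes, the byte order inside the word does not matter as long as <tt class="docutils literal">a</tt> and <tt class="docutils literal">b</tt> are packed the same way.</p>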
4001+
<p>For 16-bit Integer Vectors:</p>
4002+
<p>The functions multiply groups of two signed 16-bit integers packed in <tt class="docutils literal">a</tt> with the corresponding
4003+
two signed 16-bit integers packed in <tt class="docutils literal">b</tt>, yielding two intermediate signed 32-bit results.
4004+
The sum of these results, combined with the <tt class="docutils literal">acc</tt> accumulator, is then returned as the final result.</p>
4005+
<pre class="literal-block">
4006+
varying int32 dot2add_i16packed(varying uint32 a, varying uint32 b,
4007+
varying int32 acc)
4008+
varying int32 dot2add_i16packed_sat(varying uint32 a, varying uint32 b,
4009+
varying int32 acc) // saturate the result
4010+
</pre>
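<p>The 16-bit functions can be modeled the same way. The scalar C++ sketch below shows what one program instance computes; the helper <tt class="docutils literal">pack2</tt> and the <tt class="docutils literal">_ref</tt> names are hypothetical, lane 0 in the low half-word is an assumed layout, and wraparound in the non-saturating variant is likewise an assumption.</p>

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: pack two signed 16-bit values into one 32-bit word,
// lane 0 in the low half-word (assumed layout).
inline std::uint32_t pack2(std::int16_t lo, std::int16_t hi) {
    return std::uint32_t(std::uint16_t(lo)) | (std::uint32_t(std::uint16_t(hi)) << 16);
}

// Exact 64-bit sum of acc and the two signed 16-bit products.
inline std::int64_t dot2_i16_exact(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    std::int64_t sum = acc;
    for (int lane = 0; lane < 2; ++lane) {
        std::int32_t sa = (a >> (16 * lane)) & 0xFFFFu;  // raw half-word from a
        std::int32_t sb = (b >> (16 * lane)) & 0xFFFFu;  // raw half-word from b
        if (sa >= 0x8000) sa -= 0x10000;                 // reinterpret as signed 16-bit
        if (sb >= 0x8000) sb -= 0x10000;
        sum += std::int64_t(sa) * sb;                    // product needs up to 32 bits
    }
    return sum;
}

// Model of the non-saturating variant (assumed to wrap on overflow).
inline std::int32_t dot2add_i16packed_ref(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(dot2_i16_exact(a, b, acc)));
}

// Model of the saturating variant: clamp the exact sum to the int32 range.
inline std::int32_t dot2add_i16packed_sat_ref(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    const std::int64_t sum = dot2_i16_exact(a, b, acc);
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    return static_cast<std::int32_t>(sum < lo ? lo : (sum > hi ? hi : sum));
}
```

<p>The case where saturation matters is easy to construct: with both lanes of <tt class="docutils literal">a</tt> and <tt class="docutils literal">b</tt> set to -32768 and <tt class="docutils literal">acc = 0</tt>, the exact sum is 2 * 2^30 = 2^31, one more than the largest <tt class="docutils literal">int32</tt>. The saturating model returns the maximum <tt class="docutils literal">int32</tt> value, whereas the wrapping model returns the minimum.</p>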
4011+
</div>
38894012
<div class="section" id="pseudo-random-numbers">
38904013
<h2>Pseudo-Random Numbers</h2>
38914014
<p>A simple random number generator is provided by the <tt class="docutils literal">ispc</tt> standard
