Commit e6f4eed

Docs update for v1.20.0
1 parent f092157 commit e6f4eed

2 files changed

Lines changed: 66 additions & 18 deletions

ispc.html

Lines changed: 45 additions & 11 deletions
@@ -116,6 +116,7 @@ <h1 class="title">Intel® ISPC User's Guide</h1>
116116
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-17-0">Updating ISPC Programs For Changes In ISPC 1.17.0</a></li>
117117
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-18-0">Updating ISPC Programs For Changes In ISPC 1.18.0</a></li>
118118
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-19-0">Updating ISPC Programs For Changes In ISPC 1.19.0</a></li>
119+
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-20-0">Updating ISPC Programs For Changes In ISPC 1.20.0</a></li>
119120
</ul>
120121
</li>
121122
<li><a class="reference internal" href="#getting-started-with-ispc">Getting Started with ISPC</a><ul>
@@ -534,6 +535,15 @@ <h2>Updating ISPC Programs For Changes In ISPC 1.19.0</h2>
534535
and <tt class="docutils literal">typename</tt>.</p>
535536
<p><tt class="docutils literal">ISPC_FP16_SUPPORTED</tt> macro was introduced for the targets supporting FP16.</p>
536537
</div>
538+
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-20-0">
539+
<h2>Updating ISPC Programs For Changes In ISPC 1.20.0</h2>
540+
<p>New versions of the <cite>sse4</cite> targets were added: you can now specify either <cite>sse4.1</cite>
541+
or <cite>sse4.2</cite>, for example <cite>sse4.2-i32x4</cite>. The changes are fully backward
542+
compatible: <cite>sse4</cite> target names are still accepted and are aliased to
543+
<cite>sse4.2</cite>. Multi-target compilation accepts only one of the <cite>sse4</cite>/<cite>sse4.1</cite>/<cite>sse4.2</cite>
544+
targets, because each of them produces an object file with the <cite>sse4</cite> suffix in
545+
multi-target compilation.</p>
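The aliasing rule described above can be modeled in a few lines. The following is a minimal Python sketch, not ispc's implementation: `canonical_target` is a hypothetical helper name, and the sketch assumes that target names have the form `<isa>-<width>`.

```python
# Hypothetical helper, not part of ispc: models the backward-compatible
# aliasing of legacy "sse4" target names to "sse4.2".
def canonical_target(target: str) -> str:
    """Return the target name with a legacy sse4 ISA rewritten to sse4.2."""
    # Split "sse4-i32x4" into ("sse4", "-", "i32x4"); names without a
    # dash are left with an empty separator and width.
    isa, sep, width = target.partition("-")
    if isa == "sse4":
        isa = "sse4.2"
    return isa + sep + width


if __name__ == "__main__":
    for name in ("sse4-i32x4", "sse4.1-i32x4", "sse4.2-i8x16", "avx2-i32x8"):
        print(name, "->", canonical_target(name))
```

Only the legacy `sse4` spelling is rewritten; `sse4.1`, `sse4.2` and all other target names pass through unchanged.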
546+
</div>
537547
</div>
538548
<div class="section" id="getting-started-with-ispc">
539549
<h1>Getting Started with ISPC</h1>
@@ -753,7 +763,7 @@ <h2>Selecting The Compilation Target</h2>
753763
<td>AVX (2010-2011 era Intel CPUs)</td>
754764
</tr>
755765
<tr><td>avx2</td>
756-
<td>AVX 2 target (2013- Intel &quot;Haswell&quot; CPUs)</td>
766+
<td>AVX 2 target (2013- Intel codename Haswell CPUs)</td>
757767
</tr>
758768
<tr><td>avx512knl</td>
759769
<td>AVX 512 target (Xeon Phi chips codename Knights Landing)</td>
@@ -770,18 +780,24 @@ <h2>Selecting The Compilation Target</h2>
770780
<tr><td>sse2</td>
771781
<td>SSE2 (early 2000s era x86 CPUs)</td>
772782
</tr>
773-
<tr><td>sse4</td>
774-
<td>SSE4 (generally 2008-2010 Intel CPUs)</td>
783+
<tr><td>sse4.1</td>
784+
<td>SSE4.1 (2007 Intel codename Penryn CPUs)</td>
785+
</tr>
786+
<tr><td>sse4.2</td>
787+
<td>SSE4.2 (2008-2010 Intel codename Nehalem CPUs)</td>
775788
</tr>
776789
<tr><td>gen9</td>
777790
<td>Intel Gen9 GPU</td>
778791
</tr>
779-
<tr><td>xehpg</td>
780-
<td>Intel XeHPG GPU</td>
781-
</tr>
782792
<tr><td>xelp</td>
783793
<td>Intel XeLP GPU</td>
784794
</tr>
795+
<tr><td>xehpg</td>
796+
<td>Intel Arc GPU</td>
797+
</tr>
798+
<tr><td>xehpc</td>
799+
<td>Intel Ponte Vecchio GPU</td>
800+
</tr>
785801
</tbody>
786802
</table>
787803
<p>Consult your CPU's manual for specifics on which vector instruction set it
@@ -834,20 +850,38 @@ <h2>Selecting The Compilation Target</h2>
834850
<tr><td>sse2-i32x8</td>
835851
<td>sse2-x2</td>
836852
</tr>
837-
<tr><td>sse4-i32x4</td>
853+
<tr><td>sse4.2-i32x4</td>
838854
<td>sse4</td>
839855
</tr>
840-
<tr><td>sse4-i32x8</td>
856+
<tr><td>sse4.2-i32x8</td>
841857
<td>sse4-x2</td>
842858
</tr>
843-
<tr><td>sse4-i8x16</td>
859+
<tr><td>sse4.2-i8x16</td>
844860
<td>n/a</td>
845861
</tr>
846-
<tr><td>sse4-i16x8</td>
862+
<tr><td>sse4.2-i16x8</td>
847863
<td>n/a</td>
848864
</tr>
849865
</tbody>
850866
</table>
867+
<p>The full list of supported targets is below.</p>
868+
<p>x86 targets:</p>
869+
<p><tt class="docutils literal"><span class="pre">sse2-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">sse2-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">sse4.1-i8x16</span></tt>, <tt class="docutils literal"><span class="pre">sse4.1-i16x8</span></tt>, <tt class="docutils literal"><span class="pre">sse4.1-i32x4</span></tt>,
870+
<tt class="docutils literal"><span class="pre">sse4.1-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i8x16</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i16x8</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">sse4.2-i32x8</span></tt>,
871+
<tt class="docutils literal"><span class="pre">avx1-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">avx1-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">avx1-i32x16</span></tt>, <tt class="docutils literal"><span class="pre">avx1-i64x4</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i8x32</span></tt>,
872+
<tt class="docutils literal"><span class="pre">avx2-i16x16</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i32x8</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i32x16</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i64x4</span></tt>,
873+
<tt class="docutils literal"><span class="pre">avx512knl-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512skx-x32</span></tt>,
874+
<tt class="docutils literal"><span class="pre">avx512skx-x64</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512spr-x32</span></tt>,
875+
<tt class="docutils literal"><span class="pre">avx512spr-x64</span></tt>.</p>
876+
<p>Neon targets:</p>
877+
<p><tt class="docutils literal"><span class="pre">neon-i8x16</span></tt>, <tt class="docutils literal"><span class="pre">neon-i16x8</span></tt>, <tt class="docutils literal"><span class="pre">neon-i32x4</span></tt>, <tt class="docutils literal"><span class="pre">neon-i32x8</span></tt>.</p>
878+
<p>Xe targets:</p>
879+
<p><tt class="docutils literal"><span class="pre">gen9-x8</span></tt>, <tt class="docutils literal"><span class="pre">gen9-x16</span></tt>, <tt class="docutils literal"><span class="pre">xelp-x8</span></tt>, <tt class="docutils literal"><span class="pre">xelp-x16</span></tt>, <tt class="docutils literal"><span class="pre">xehpg-x8</span></tt>, <tt class="docutils literal"><span class="pre">xehpg-x16</span></tt>, <tt class="docutils literal"><span class="pre">xehpc-x16</span></tt>, <tt class="docutils literal"><span class="pre">xehpc-x32</span></tt>.</p>
880+
<p>Note that <tt class="docutils literal">sse4.1</tt> and <tt class="docutils literal">sse4.2</tt> targets may not be used together in
881+
multi-target compilation. While the auto-dispatch code will correctly detect
882+
the difference between these two ISAs, they both yield a binary with the <tt class="docutils literal">sse4</tt>
883+
suffix. This limitation maintains backward compatibility with build
884+
systems that expect the <tt class="docutils literal">sse4</tt> suffix.</p>
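A build script could check for this restriction before invoking the compiler. Below is a minimal Python sketch under stated assumptions: `SSE4_ISAS` and `check_multi_target` are hypothetical names, not part of ispc, and the sketch models only the sse4-family rule from the note above, not any other validation that ispc performs.

```python
# Hypothetical pre-flight check for a build script; not part of ispc.
# All three ISA spellings share the "sse4" object-file suffix, so at
# most one of them may appear in a multi-target list.
SSE4_ISAS = {"sse4", "sse4.1", "sse4.2"}


def check_multi_target(targets):
    """Raise ValueError if more than one sse4-family target is requested."""
    # The ISA is the part of the target name before the first dash.
    clash = [t for t in targets if t.partition("-")[0] in SSE4_ISAS]
    if len(clash) > 1:
        raise ValueError(
            "only one sse4-family target is allowed, got: " + ", ".join(clash)
        )


if __name__ == "__main__":
    check_multi_target(["sse4.2-i32x4", "avx2-i32x8"])  # accepted
    try:
        check_multi_target(["sse4.1-i32x4", "sse4.2-i32x4", "avx2-i32x8"])
    except ValueError as err:
        print("rejected:", err)
```

Rejecting the list up front keeps the failure in the build script, where the offending target names are easy to report.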
851885
<p>Finally, <tt class="docutils literal"><span class="pre">--target-os</span></tt> selects the target operating system. Depending on
852886
your host <tt class="docutils literal">ispc</tt> may support Windows, Linux, macOS, Android, iOS and PS4/PS5
853887
targets. Running <tt class="docutils literal">ispc <span class="pre">--help</span></tt> and looking at the output for the <tt class="docutils literal"><span class="pre">--target-os</span></tt>
@@ -3073,7 +3107,7 @@ <h2>Task Parallel Execution</h2>
30733107
<h2>Task Parallelism: &quot;launch&quot; and &quot;sync&quot; Statements</h2>
30743108
<p>One option for combining task-parallelism with <tt class="docutils literal">ispc</tt> is to just use
30753109
regular task parallelism in the C/C++ application code (be it through
3076-
Intel® Thread Building Blocks, OpenMP or another task system), and
3110+
Intel® oneAPI Threading Building Blocks, OpenMP or another task system), and
30773111
for tasks to use <tt class="docutils literal">ispc</tt> for SPMD parallelism across the vector lanes as
30783112
appropriate. Alternatively, <tt class="docutils literal">ispc</tt> also has support for launching tasks
30793113
from <tt class="docutils literal">ispc</tt> code. (Check the <tt class="docutils literal">examples/mandelbrot_tasks</tt> example to

ispc_for_xe.html

Lines changed: 21 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -119,9 +119,9 @@ <h2>Environment</h2>
119119
Compute Runtime</a> on Linux
120120
or <a class="reference external" href="https://www.intel.com/content/www/us/en/download-center/home.html">Intel(R) Graphics Driver for Windows</a> on
121121
Windows. Additionally you need <a class="reference external" href="https://github.com/oneapi-src/level-zero/releases">Level Zero Loader</a>.</p>
122-
<p>To use ISPC Run Time for CPU on Linux you need to have <tt class="docutils literal">OpenMP runtime</tt>
122+
<p>To use ISPC Run Time for CPU on Linux you need to have <tt class="docutils literal">Intel(R) oneAPI Threading Building Blocks</tt>
123123
installed on your system. Consult your Linux distribution documentation for the
124-
installation of OpenMP runtime instructions.</p>
124+
TBB runtime installation instructions.</p>
125125
</div>
126126
<div class="section" id="basic-command-line-options">
127127
<h2>Basic Command-line Options</h2>
@@ -192,11 +192,25 @@ <h2>ISPCRT Objects</h2>
192192
data allocated in the USM are valid both on the host and on the device. Also,
193193
there is no need to explicitly handle data movement between the CPU and the
194194
GPU. This is handled automatically by the <tt class="docutils literal">oneAPI Level Zero</tt> runtime.</li>
195-
<li><tt class="docutils literal">Task queue</tt> - Each <tt class="docutils literal">device</tt> has a task (command) queue and executes
196-
commands from it. The execution may be asynchronous, which means that
197-
subsequent commands can begin executing before the previous ones complete.
198-
There are synchronization primitives available to make the execution
199-
synchronous.</li>
195+
<li><tt class="docutils literal">Task queue</tt> - each <tt class="docutils literal">device</tt> has a task (command) queue and executes
196+
commands from it. Commands may be executed simultaneously; to prevent that,
197+
explicitly insert barriers in places where synchronization is
198+
required. The <tt class="docutils literal">Task queue</tt> <tt class="docutils literal">sync</tt> method blocks the host thread until the GPU
199+
computation has completed. For asynchronous computation, use the
200+
<tt class="docutils literal">CommandQueue</tt> and <tt class="docutils literal">CommandList</tt> objects.</li>
201+
<li><tt class="docutils literal">CommandQueue</tt> - represents a logical input stream to the device and
202+
maps directly to oneAPI Level Zero (L0) command queues.</li>
203+
<li><tt class="docutils literal">CommandList</tt> - represents commands to be executed on a command queue. It
204+
can be created by calling <tt class="docutils literal">createCommandList</tt> method of <tt class="docutils literal">CommandQueue</tt>
205+
object. Synchronization between commands in the list has to be done
206+
explicitly by inserting barriers where needed. Fine-grained synchronization via
207+
<tt class="docutils literal">Events</tt> is not supported yet.</li>
208+
<li><tt class="docutils literal">Fence</tt> - a synchronization primitive that signals to the host that
209+
command list execution has completed. A <tt class="docutils literal">Fence</tt> is created upon command list
210+
submission. It can be waited on synchronously (<tt class="docutils literal">sync</tt>) or asynchronously
211+
(by periodically checking <tt class="docutils literal">status</tt>). A fence has two states,
212+
<tt class="docutils literal">ISPCRT_FENCE_UNSIGNALED</tt> and <tt class="docutils literal">ISPCRT_FENCE_SIGNALED</tt>, which are returned by
213+
the <tt class="docutils literal">status</tt> method.</li>
200214
<li><tt class="docutils literal">Barrier</tt> - synchronization primitive that can be inserted into a <tt class="docutils literal">task
201215
queue</tt> to make sure that all tasks previously inserted into this queue have
202216
completed execution. It is not needed to include <tt class="docutils literal">barrier</tt> between memory
