@@ -116,6 +116,7 @@ <h1 class="title">Intel® ISPC User's Guide</h1>
116116< li > < a class ="reference internal " href ="#updating-ispc-programs-for-changes-in-ispc-1-17-0 "> Updating ISPC Programs For Changes In ISPC 1.17.0</ a > </ li >
117117< li > < a class ="reference internal " href ="#updating-ispc-programs-for-changes-in-ispc-1-18-0 "> Updating ISPC Programs For Changes In ISPC 1.18.0</ a > </ li >
118118< li > < a class ="reference internal " href ="#updating-ispc-programs-for-changes-in-ispc-1-19-0 "> Updating ISPC Programs For Changes In ISPC 1.19.0</ a > </ li >
119+ < li > < a class ="reference internal " href ="#updating-ispc-programs-for-changes-in-ispc-1-20-0 "> Updating ISPC Programs For Changes In ISPC 1.20.0</ a > </ li >
119120</ ul >
120121</ li >
121122< li > < a class ="reference internal " href ="#getting-started-with-ispc "> Getting Started with ISPC</ a > < ul >
@@ -534,6 +535,15 @@ <h2>Updating ISPC Programs For Changes In ISPC 1.19.0</h2>
534535and < tt class ="docutils literal "> typename</ tt > .</ p >
535536< p > < tt class ="docutils literal "> ISPC_FP16_SUPPORTED</ tt > macro was introduced for the targets supporting FP16.</ p >
536537</ div >
538+ < div class ="section " id ="updating-ispc-programs-for-changes-in-ispc-1-20-0 ">
539+ < h2 > Updating ISPC Programs For Changes In ISPC 1.20.0</ h2 >
540+ < p > New versions of the < cite > sse4</ cite > targets were added; you can now specify either < cite > sse4.1</ cite >
541+ or < cite > sse4.2</ cite > , for example < cite > sse4.2-i32x4</ cite > . The changes are fully backward
542+ compatible: the < cite > sse4</ cite > targets are still accepted and are aliased to
543+ < cite > sse4.2</ cite > . Multi-target compilation accepts only one of the < cite > sse4</ cite > /< cite > sse4.1</ cite > /< cite > sse4.2</ cite >
544+ targets. Each of these targets produces an object file with the < cite > sse4</ cite > suffix in
545+ multi-target compilation.</ p >
546+ </ div >
537547</ div >
538548< div class ="section " id ="getting-started-with-ispc ">
539549< h1 > Getting Started with ISPC</ h1 >
@@ -753,7 +763,7 @@ <h2>Selecting The Compilation Target</h2>
753763< td > AVX (2010-2011 era Intel CPUs)</ td >
754764</ tr >
755765< tr > < td > avx2</ td >
756- < td > AVX 2 target (2013- Intel " Haswell" CPUs)</ td >
766+ < td > AVX 2 target (2013- Intel codename Haswell CPUs)</ td >
757767</ tr >
758768< tr > < td > avx512knl</ td >
759769< td > AVX 512 target (Xeon Phi chips codename Knights Landing)</ td >
@@ -770,18 +780,24 @@ <h2>Selecting The Compilation Target</h2>
770780< tr > < td > sse2</ td >
771781< td > SSE2 (early 2000s era x86 CPUs)</ td >
772782</ tr >
773- < tr > < td > sse4</ td >
774- < td > SSE4 (generally 2008-2010 Intel CPUs)</ td >
783+ < tr > < td > sse4.1</ td >
784+ < td > SSE4.1 (2007 Intel codename Penryn CPUs)</ td >
785+ </ tr >
786+ < tr > < td > sse4.2</ td >
787+ < td > SSE4.2 (2008-2010 Intel codename Nehalem CPUs)</ td >
775788</ tr >
776789< tr > < td > gen9</ td >
777790< td > Intel Gen9 GPU</ td >
778791</ tr >
779- < tr > < td > xehpg</ td >
780- < td > Intel XeHPG GPU</ td >
781- </ tr >
782792< tr > < td > xelp</ td >
783793< td > Intel XeLP GPU</ td >
784794</ tr >
795+ < tr > < td > xehpg</ td >
796+ < td > Intel Arc GPU</ td >
797+ </ tr >
798+ < tr > < td > xehpc</ td >
799+ < td > Intel Ponte Vecchio GPU</ td >
800+ </ tr >
785801</ tbody >
786802</ table >
787803< p > Consult your CPU's manual for specifics on which vector instruction set it
@@ -834,20 +850,38 @@ <h2>Selecting The Compilation Target</h2>
834850< tr > < td > sse2-i32x8</ td >
835851< td > sse2-x2</ td >
836852</ tr >
837- < tr > < td > sse4-i32x4</ td >
853+ < tr > < td > sse4.2-i32x4</ td >
838854< td > sse4</ td >
839855</ tr >
840- < tr > < td > sse4-i32x8</ td >
856+ < tr > < td > sse4.2-i32x8</ td >
841857< td > sse4-x2</ td >
842858</ tr >
843- < tr > < td > sse4-i8x16</ td >
859+ < tr > < td > sse4.2-i8x16</ td >
844860< td > n/a</ td >
845861</ tr >
846- < tr > < td > sse4-i16x8</ td >
862+ < tr > < td > sse4.2-i16x8</ td >
847863< td > n/a</ td >
848864</ tr >
849865</ tbody >
850866</ table >
867+ < p > The full list of supported targets is below.</ p >
868+ < p > x86 targets:</ p >
869+ < p > < tt class ="docutils literal "> < span class ="pre "> sse2-i32x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse2-i32x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse4.1-i8x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse4.1-i16x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse4.1-i32x4</ span > </ tt > ,
870+ < tt class ="docutils literal "> < span class ="pre "> sse4.1-i32x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse4.2-i8x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse4.2-i16x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse4.2-i32x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> sse4.2-i32x8</ span > </ tt > ,
871+ < tt class ="docutils literal "> < span class ="pre "> avx1-i32x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx1-i32x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx1-i32x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx1-i64x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx2-i8x32</ span > </ tt > ,
872+ < tt class ="docutils literal "> < span class ="pre "> avx2-i16x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx2-i32x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx2-i32x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx2-i32x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx2-i64x4</ span > </ tt > ,
873+ < tt class ="docutils literal "> < span class ="pre "> avx512knl-x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512skx-x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512skx-x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512skx-x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512skx-x32</ span > </ tt > ,
874+ < tt class ="docutils literal "> < span class ="pre "> avx512skx-x64</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512spr-x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512spr-x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512spr-x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> avx512spr-x32</ span > </ tt > ,
875+ < tt class ="docutils literal "> < span class ="pre "> avx512spr-x64</ span > </ tt > .</ p >
876+ < p > Neon targets:</ p >
877+ < p > < tt class ="docutils literal "> < span class ="pre "> neon-i8x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> neon-i16x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> neon-i32x4</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> neon-i32x8</ span > </ tt > .</ p >
878+ < p > Xe targets:</ p >
879+ < p > < tt class ="docutils literal "> < span class ="pre "> gen9-x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> gen9-x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> xelp-x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> xelp-x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> xehpg-x8</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> xehpg-x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> xehpc-x16</ span > </ tt > , < tt class ="docutils literal "> < span class ="pre "> xehpc-x32</ span > </ tt > .</ p >
880+ < p > Note that the < tt class ="docutils literal "> sse4.1</ tt > and < tt class ="docutils literal "> sse4.2</ tt > targets may not be used together in
881+ multi-target compilation. While the auto-dispatch code will correctly detect
882+ the difference between these two ISAs, they both yield a binary with the < tt class ="docutils literal "> sse4</ tt >
883+ suffix. This limitation maintains backward compatibility with build
884+ systems expecting the < tt class ="docutils literal "> sse4</ tt > suffix.</ p >
851885< p > Finally, < tt class ="docutils literal "> < span class ="pre "> --target-os</ span > </ tt > selects the target operating system. Depending on
852886your host, < tt class ="docutils literal "> ispc</ tt > may support Windows, Linux, macOS, Android, iOS and PS4/PS5
853887targets. Running < tt class ="docutils literal "> ispc < span class ="pre "> --help</ span > </ tt > and looking at the output for the < tt class ="docutils literal "> < span class ="pre "> --target-os</ span > </ tt >
@@ -3073,7 +3107,7 @@ <h2>Task Parallel Execution</h2>
30733107< h2 > Task Parallelism: "launch" and "sync" Statements</ h2 >
30743108< p > One option for combining task-parallelism with < tt class ="docutils literal "> ispc</ tt > is to just use
30753109regular task parallelism in the C/C++ application code (be it through
3076- Intel® Thread Building Blocks, OpenMP or another task system), and
3110+ Intel® oneAPI Threading Building Blocks, OpenMP or another task system), and
30773111for tasks to use < tt class ="docutils literal "> ispc</ tt > for SPMD parallelism across the vector lanes as
30783112appropriate. Alternatively, < tt class ="docutils literal "> ispc</ tt > also has support for launching tasks
30793113from < tt class ="docutils literal "> ispc</ tt > code. (Check the < tt class ="docutils literal "> examples/mandelbrot_tasks</ tt > example to