content/learning-paths/cross-platform/simd-on-rust/_index.md
Lines changed: 7 additions & 4 deletions
@@ -1,17 +1,20 @@
 ---
-title: Learn how to write SIMD code on Arm using Rust
+title: Write SIMD code on Arm using Rust

 minutes_to_complete: 30

 description: Learn how to write SIMD code in Rust on Arm platforms using Neon intrinsics, portable SIMD abstractions, and optimize performance with architecture-specific instructions.

-who_is_this_for: This is an advanced topic for software developers who want take advantage of SIMD code on Arm systems using Rust.
+who_is_this_for: This is an advanced topic for software developers who want to take advantage of SIMD code on Arm systems using Rust.

 learning_objectives:
-- Learn how to write SIMD code with Rust on Arm.
+- Write SIMD code with Rust using std::arch and Neon intrinsics on Arm
+- Use portable SIMD abstractions with std::simd for cross-platform code
+- Apply feature detection and target attributes for architecture-specific optimizations
+- Compare C and Rust SIMD implementations and disassembly output

 prerequisites:
-- An Arm-based computer with recent versions of a C compiler (Clang or GCC) and a Rust compiler installed.
+- An Arm-based computer with recent versions of a C compiler (Clang or GCC) and a Rust compiler installed

content/learning-paths/cross-platform/simd-on-rust/conclusion.md
Lines changed: 1 addition & 1 deletion
@@ -8,5 +8,5 @@ layout: learningpathall

 You have now seen a few examples of writing SIMD code on Arm with Rust.

-Performance-wise, there is little difference between C and Rust as Rust is perfectly capable of generating the same assembly code as C in most cases. That said, if you want to program optimal SIMD code using the Arm ASIMD/Neon intrinsics, `std::arch` is the most obvious choice. If, however, your approach needs to be as portable as possible and you don't want to spend time providing multiple implementations for each architecture then `std::simd` is a very viable alternative (even though it's not part of the stable compiler yet).
+Performance-wise, there's little difference between C and Rust as Rust is perfectly capable of generating the same assembly code as C in most cases. That said, if you want to program optimal SIMD code using the Arm ASIMD/Neon intrinsics, `std::arch` is the most obvious choice. If, however, your approach needs to be as portable as possible and you don't want to spend time providing multiple implementations for each architecture then `std::simd` is a very viable alternative (even though it's not part of the stable compiler yet).

content/learning-paths/cross-platform/simd-on-rust/intro-to-rust.md
Lines changed: 11 additions & 11 deletions
@@ -12,10 +12,10 @@ In this Learning Path, you will learn the basics of how to program SIMD code on

 Rust is a safe programming language with some key advantages:

-* It is a modern, strong-typed language.
-* Rust is memory safe by design: it is very difficult to introduce a bug like buffer overflow with Rust.
-* Strict language: the Rust compiler is very strict and does not let you make easy mistakes as you might with C.
-* The usage and support for Rust is expanding to many architectures and operating systems.
+* It's a modern, strong-typed language
+* Rust is memory safe by design: it's very difficult to introduce a bug like buffer overflow with Rust
+* Strict language: the Rust compiler is very strict and doesn't let you make easy mistakes as you might with C
+* The usage and support for Rust is expanding to many architectures and operating systems

 ## SIMD with Rust

@@ -24,19 +24,19 @@ Support for intrinsics in languages such as C and C++ is generally added by the
 Rust is a little different in that regard. While vendors are still very involved in providing the support for SIMD intrinsics in the compiler, there are other alternatives and approaches used to provide SIMD abstraction.

 Currently there are 2 SIMD programming interfaces in Rust:
-* One under `std::arch` which follows the C intrinsics as much as possible.
-* Another, `std::simd`, which provides a portable abstraction to SIMD programming so that code can just be recompiled across different architectures with more or less the same results. While there are similar libraries for C and C++, this is different in that the intent is for it to be merged as an official extension to the Rust standard library under `std::simd`.
+* One under `std::arch` which follows the C intrinsics as much as possible
+* Another, `std::simd`, which provides a portable abstraction to SIMD programming so that code can be recompiled across different architectures with more or less the same results. While there are similar libraries for C and C++, this is different in that the intent is for it to be merged as an official extension to the Rust standard library under `std::simd`

 You will learn how to use both of these interfaces to write code that uses Advanced SIMD/Neon instructions on an Arm CPU.

 Before you start, make sure you have the [Rust compiler installed](/install-guides/rust).

-You can check if you have a working `rustc` compiler installed by running the following command:
+To check if you have a working `rustc` compiler installed, run the following command:

 ```bash
 rustc --version
 ```
-Your output should look similar to the following:
+The output should look similar to:

 ```bash
 rustc 1.79.0 (129f3b996 2024-06-10)
@@ -50,15 +50,15 @@ Switch to the `nightly` version to `rustc` by running the following:
 rustup default nightly
 ```

-Now run the version command again to check if you have the right version:
+To check the version again, run:

 ```bash
 rustc --version
 ```
-Your output should now look similar to the following:
+The output should now look similar to:

 ```bash
 rustc 1.82.0-nightly (92c6c0380 2024-07-21)
 ```

-Now that you have a working Rust compiler with the features supported in the nightly version, you can continue with building and running the examples included in this learning path. Please note that the code examples in this learning path are not optimally written for Rust (to do that you would have to use `cargo`, find the proper `crates` to do specific tasks, for example for 2D arrays, which would increase the complexity of this learning path significantly).
+Now that you have a working Rust compiler with the features supported in the nightly version, you can continue with building and running the examples included in this Learning Path. The code examples in this Learning Path aren't optimally written for Rust (to do that you would have to use `cargo`, find the proper `crates` to do specific tasks, for example for 2D arrays, which would increase the complexity of this Learning Path significantly).

content/learning-paths/cross-platform/simd-on-rust/simd-on-rust-part1.md
Lines changed: 15 additions & 17 deletions
@@ -8,9 +8,9 @@ layout: learningpathall

 ## Differences with programming with intrinsics in C and Rust

-As per the Arm Community blog post about [Neon Intrinsics in Rust](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/rust-neon-intrinsics), there are some differences between C and Rust when programming with intrinsics which are listed in the blog and which will be expanded on in this Learning Path with code examples.
+As per the Arm Community blog post about [Neon Intrinsics in Rust](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/rust-neon-intrinsics), there are some differences between C and Rust when programming with intrinsics which are listed in the blog and which you'll expand on in this Learning Path with code examples.

-We start with an example that uses Arm Advanced SIMD (Neon) intrinsics in C. Create a file named `average_neon.c` with the contents shown below. This program computes the average value of every pair of elements in 2 arrays:
+You'll start with an example that uses Arm Advanced SIMD (Neon) intrinsics in C. Create a file named `average_neon.c` with the contents shown below. This program computes the average value of every pair of elements in 2 arrays:
@@ ... @@
-Note that the `-fno-inline` option was passed to the compiler. Use this option to prevent the C compiler from inlining the `average_vec` function. This is needed to compare the disassembly output of the `average_vec` function from the C version against the disassembly output from the Rust version.
+Note that the `-fno-inline` option was passed to the compiler. Use this option to prevent the C compiler from inlining the `average_vec` function. You'll need this to compare the disassembly output of the `average_vec` function from the C version against the disassembly output from the Rust version.

 Generate the disassembly output from the C version as shown below:
@@ ... @@
 The outputs shown from these 2 versions are the same apart from the formatting.

-This particular example is not very complicated but you will notice some key differences between C and Rust already:
+This particular example isn't very complicated but you'll notice some key differences between C and Rust already:

-* Uninitialized variables - mutable/immutable arguments passed to the functions are not a concern for a C developer creating a proof of concept program. This is not the case with Rust programming, which forces the developer to think about these things right from the start. This usually means that it takes longer to write a simple program in Rust but you can be certain that this program will not suffer from trivial bugs such as buffer overflows, out of bounds, illegal conversions etc.
-* Conversions/Castings need to be explicit, e.g., `2.0_f32 * ((i+1) as f32)`.
-* There is no need to pass size as a parameter as Rust includes size information in its arrays.
+* Uninitialized variables - mutable/immutable arguments passed to the functions aren't a concern for a C developer creating a proof of concept program. This isn't the case with Rust programming, which forces the developer to think about these things right from the start. This usually means that it takes longer to write a program in Rust but you can be certain that this program won't suffer from trivial bugs such as buffer overflows, out of bounds, illegal conversions etc.
+* Conversions/Castings need to be explicit, for example `2.0_f32 * ((i+1) as f32)`
+* There's no need to pass size as a parameter as Rust includes size information in its arrays
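
The points in the list above can be illustrated with a minimal scalar sketch. This is not the listing from the Learning Path, which the diff does not show. The function signature, the array length of 8, the input values, and the helper `run_average` are assumptions chosen for illustration only.

```rust
/// Element-wise average of two slices, written into `c`.
/// Slices carry their own length, so no separate size parameter is needed.
#[inline(never)]
fn average_vec(c: &mut [f32], a: &[f32], b: &[f32]) {
    assert!(a.len() == c.len() && b.len() == c.len());
    for i in 0..c.len() {
        c[i] = 0.5 * (a[i] + b[i]);
    }
}

/// Hypothetical helper: fills two inputs and returns their average.
fn run_average() -> [f32; 8] {
    // Every array must be initialized before use, and only the ones
    // that are written to are declared `mut`.
    let mut a = [0.0_f32; 8];
    let mut b = [0.0_f32; 8];
    let mut c = [0.0_f32; 8];
    for i in 0..8 {
        // The integer-to-float conversion has to be spelled out with `as`.
        a[i] = 2.0_f32 * ((i + 1) as f32);
        b[i] = 4.0_f32 * ((i + 1) as f32);
    }
    average_vec(&mut c, &a, &b);
    c
}

fn main() {
    println!("{:?}", run_average());
}
```

Leaving out the `as f32` cast or the `mut` on `c` is rejected at compile time, which is the strictness the list describes.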

-Note that this program is not written in the most optimal way for Rust. It is just a 'port' of the C program into Rust with the minimal changes needed to compile and run.
+Note that this program isn't written in the most optimal way for Rust. It's a 'port' of the C program into Rust with the minimal changes needed to compile and run.

-The next step is to use SIMD intrinsics in your Rust program for the averaging loop. Replace the previous `average_vec` function with the function shown below and save the updated contents in a file named `average2.rs` as shown below:
+The next step is to use SIMD intrinsics in your Rust program for the averaging loop. Replace the previous `average_vec` function with the function shown below and save the updated contents in a file named `average2.rs`:
@@ ... @@
 The results are the same but let's look at some of the differences:

-* You need to use `target_arch` and `target_feature` to use specific hardware extensions. This is Rust's feature detection which is explained in more detail in the next section.
-* All definitions and functions need to be enabled with `use`, either selectively, for example `use std::arch::aarch64::float32x4_t` or with a wildcard `use std::arch::aarch64::*`. If in doubt, use the latter.
-* You will notice `#[inline(never)]` in the definition of `average_vec`. This is to let the compiler know that it should not inline this function because you will compare the disassembly against the C version.
+* You need to use `target_arch` and `target_feature` to use specific hardware extensions. This is Rust's feature detection which is explained in more detail in the next section
+* All definitions and functions need to be enabled with `use`, either selectively, for example `use std::arch::aarch64::float32x4_t` or with a wildcard `use std::arch::aarch64::*`. If in doubt, use the latter
+* You'll notice `#[inline(never)]` in the definition of `average_vec`. This is to let the compiler know that it shouldn't inline this function because you'll compare the disassembly against the C version
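
As a sketch of how those three pieces might fit together (the `average2.rs` listing itself is not part of this diff, so the body below is an assumption and not the Learning Path's code): the `cfg` attributes select the Neon path only on AArch64 with Neon enabled, the wildcard `use` brings the intrinsics into scope, and `#[inline(never)]` keeps the function out of line. The scalar fallback is added here so that the example also builds on other architectures.

```rust
// Bring the Neon types and intrinsics into scope only when they exist.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
use std::arch::aarch64::*;

/// Neon path: averages four `f32` lanes per iteration.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
#[inline(never)] // keep the function out of line so its disassembly is easy to find
fn average_vec(c: &mut [f32], a: &[f32], b: &[f32]) {
    assert!(a.len() == c.len() && b.len() == c.len());
    let n = c.len();
    let vec_len = n - n % 4;
    // SAFETY: Neon is enabled for this build (see the cfg above), and every
    // load and store stays below `vec_len`, which is in bounds for all slices.
    unsafe {
        let half = vdupq_n_f32(0.5);
        let mut i = 0;
        while i < vec_len {
            let va = vld1q_f32(a.as_ptr().add(i));
            let vb = vld1q_f32(b.as_ptr().add(i));
            let avg = vmulq_f32(vaddq_f32(va, vb), half);
            vst1q_f32(c.as_mut_ptr().add(i), avg);
            i += 4;
        }
    }
    // Scalar tail for lengths that are not a multiple of 4.
    for i in vec_len..n {
        c[i] = 0.5 * (a[i] + b[i]);
    }
}

/// Scalar fallback (an addition for portability, not from the source).
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
#[inline(never)]
fn average_vec(c: &mut [f32], a: &[f32], b: &[f32]) {
    assert!(a.len() == c.len() && b.len() == c.len());
    for i in 0..c.len() {
        c[i] = 0.5 * (a[i] + b[i]);
    }
}

fn main() {
    let a = [2.0_f32, 4.0, 6.0, 8.0, 10.0];
    let b = [4.0_f32, 8.0, 12.0, 16.0, 20.0];
    let mut c = [0.0_f32; 5];
    average_vec(&mut c, &a, &b);
    println!("{:?}", c);
}
```

The intrinsics are `unsafe` functions in `std::arch`, so the raw pointer loads and stores sit inside an `unsafe` block whose bounds are checked by the assertion at the top.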

 Now generate the disassembly output for `average2`as follows:

content/learning-paths/cross-platform/simd-on-rust/simd-on-rust-part2.md
Lines changed: 11 additions & 11 deletions
@@ -8,7 +8,7 @@ layout: learningpathall

 ## An example with dot product instructions

-You can now continue with an example around `dotprod` intrinsics. Shown below is a program that calculates the sum of absolute differences (SAD) of a 32x32 array of 8-bit unsigned integers (`uint8_t`) using the `vdotq_u32` intrinsic. Save the contents in a file named `dotprod1.c` as shown below:
+You can now continue with an example around `dotprod` intrinsics. Shown below is a program that calculates the sum of absolute differences (SAD) of a 32x32 array of 8-bit unsigned integers (`uint8_t`) using the `vdotq_u32` intrinsic. Save the contents in a file named `dotprod1.c`:
@@ -275,11 +275,11 @@ The output should look like the following:
 6724: d65f03c0 ret
 ```

-Note that where you might expect to see a `udot` instruction, there is a `bl` instruction which indicates a branch. The `udot` instruction is instead called in another function, which carries out the loads again.
+Note that where you might expect to see a `udot` instruction, there's a `bl` instruction which indicates a branch. The `udot` instruction is instead called in another function, which carries out the loads again.

 This seems counter-intuitive but the reason is that, unlike C, Rust treats the intrinsics like normal functions.

-Like functions, inlining them is not always guaranteed. If it is possible to inline the intrinsic, code generation and performance would be almost as that with C. If it is not possible, you might find that the same code in Rust performs worse than in C.
+Like functions, inlining them isn't always guaranteed. If it's possible to inline the intrinsic, code generation and performance would be almost as that with C. If it's not possible, you might find that the same code in Rust performs worse than in C.

 Because of this, you have to look carefully at the disassembly generated from your SIMD Rust code. So, how can you fix this behavior and get the expected generated code?
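
The diff does not show the fix the Learning Path applies, so the following is only a sketch of one general Rust mechanism that bears on this. A function carrying a `#[target_feature]` attribute, as the `std::arch` intrinsics do, is generally only inlined into callers that have the same feature enabled, so marking the calling function with `#[target_feature(enable = "...")]` gives the compiler that option. The example uses `neon` with the `vabdq_u8` and `vaddlvq_u8` intrinsics instead of `vdotq_u32`; for the dot product case the feature name is assumed to be `dotprod`. The names `sad` and `sad_neon` are hypothetical.

```rust
#[cfg(target_arch = "aarch64")]
use std::arch::aarch64::*;

/// Sum of absolute differences over 16-byte blocks.
/// The attribute enables `neon` for this function, so the intrinsics it
/// calls are candidates for inlining into it.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn sad_neon(a: &[u8], b: &[u8]) -> u32 {
    let mut total = 0u32;
    for (ca, cb) in a.chunks_exact(16).zip(b.chunks_exact(16)) {
        let va = vld1q_u8(ca.as_ptr());
        let vb = vld1q_u8(cb.as_ptr());
        // |a - b| per byte, then a widening add across all 16 lanes.
        total += vaddlvq_u8(vabdq_u8(va, vb)) as u32;
    }
    total
}

/// Safe wrapper: uses the Neon path when it applies, else a scalar loop.
fn sad(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len());
    #[cfg(target_arch = "aarch64")]
    {
        if a.len() % 16 == 0 && std::arch::is_aarch64_feature_detected!("neon") {
            // SAFETY: the feature was detected at run time and both slices
            // have the same length, a multiple of 16.
            return unsafe { sad_neon(a, b) };
        }
    }
    a.iter().zip(b).map(|(&x, &y)| x.abs_diff(y) as u32).sum()
}

fn main() {
    let a = [10u8; 32];
    let mut b = [7u8; 32];
    b[0] = 20;
    println!("SAD = {}", sad(&a, &b));
}
```

Whether the intrinsics were actually inlined still has to be confirmed in the disassembly, as the text above advises.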

@@ -337,5 +337,5 @@ Now look at the changed disassembly output as follows:
 66bc: d65f03c0 ret
 ```

-This disassembly output is now as you would expect it to be as well as being better performant. You will see that the compiler automatically unrolled the loop twice because it was able to figure out that the number of iterations was small. Increasing the iterations will probably disable aggressive unrolling but it will at least inline the intrinsics properly.
+This disassembly output is now as you would expect it to be as well as being better performant. You'll see that the compiler automatically unrolled the loop twice because it was able to figure out that the number of iterations was small. Increasing the iterations will probably disable aggressive unrolling but it will at least inline the intrinsics properly.