Commit 60b2759

Merge pull request #3159 from madeline-underwood/ray

Ray

2 parents 092e63b + 1ad6e8c

5 files changed

Lines changed: 35 additions & 40 deletions

content/learning-paths/servers-and-cloud-computing/ray-on-axion/_index.md

Lines changed: 0 additions & 4 deletions
````diff
@@ -1,10 +1,6 @@
 ---
 title: Scale AI workloads with Ray on Google Cloud C4A Axion VM
 description: Deploy and run distributed AI workloads using Ray on Google Cloud Axion C4A Arm-based VMs, covering parallel tasks, hyperparameter tuning, and model serving with Ray Core, Train, Tune, and Serve.
-
-draft: true
-cascade:
-  draft: true
 
 minutes_to_complete: 30
 
````

content/learning-paths/servers-and-cloud-computing/ray-on-axion/distributed_workloads.md

Lines changed: 9 additions & 9 deletions
````diff
@@ -6,11 +6,11 @@ weight: 6
 layout: learningpathall
 ---
 
-## Run Distributed Workloads with Ray
+## Run distributed workloads with Ray
 
 This section demonstrates how to execute parallel tasks and distributed training workloads using Ray on Arm.
 
-You will run simple distributed functions and then scale to multi-worker training using Ray.
+You'll run distributed functions and then scale to multi-worker training using Ray.
 
 ## Run distributed tasks
 
@@ -32,7 +32,7 @@ results = ray.get([square.remote(i) for i in range(10)])
 print("Results:", results)
 ```
 
-### Explanation
+### Code explanation
 
 * `ray.init()` → connects to the running Ray cluster
 * `@ray.remote` → converts a function into a distributed task
````
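For context beyond the diff: the `results = ray.get(...)` context line and the bullets above imply a `ray_test.py` of roughly this shape. A minimal sketch, not part of this commit:

```python
import ray

# Connect to the Ray cluster started earlier with `ray start --head`
ray.init()

# @ray.remote converts an ordinary function into a distributed task
@ray.remote
def square(x):
    return x * x

# Launch 10 tasks in parallel; ray.get() blocks until all complete
results = ray.get([square.remote(i) for i in range(10)])
print("Results:", results)
```

Running it prints the squares of 0 through 9, computed as parallel Ray tasks.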
````diff
@@ -92,7 +92,7 @@ trainer = TorchTrainer(
 trainer.fit()
 ```
 
-### Execute training
+### Run the training script
 
 ```bash
 python3 ray_train.py
@@ -115,14 +115,14 @@ The output is similar to:
 
 This confirms distributed training across multiple workers.
 
-## Explanation
+## Training code explanation
 
 * `TorchTrainer` → handles distributed training execution
 * `ScalingConfig(num_workers=2)` → runs training on 2 workers
 * Each worker executes training in parallel
-* Logs may appear from multiple processes
+* Logs can appear from multiple processes
 
-## Ray Jobs View (Tasks & Training)
+## Ray Jobs view (tasks and training)
 
 ![Ray Dashboard Jobs tab showing successful execution of ray_test.py and ray_train.py#center](images/ray-jobs.png "Ray Jobs tab showing distributed tasks and training execution status")
 
````
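Likewise, the `TorchTrainer` context line and the bullets above suggest the shape of `ray_train.py`. A hedged sketch, with a placeholder standing in for the Learning Path's real training loop:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker():
    # Placeholder: the Learning Path's ray_train.py runs a real
    # PyTorch training loop here; each worker executes it in parallel.
    print("training step on one worker")

# ScalingConfig(num_workers=2) distributes the loop across 2 workers
trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
)
trainer.fit()
```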

````diff
@@ -137,6 +137,6 @@ You have successfully:
 * Executed parallel tasks using Ray Core
 * Converted functions into distributed workloads
 * Performed distributed training using multiple workers
-* Observed execution in the Ray dashboard
+* Observed execution in the Ray Dashboard
 
-Next, you will perform hyperparameter tuning, deploy models, and benchmark performance.
+Next, you'll perform hyperparameter tuning, deploy models, and benchmark performance.
````

content/learning-paths/servers-and-cloud-computing/ray-on-axion/firewall-setup.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -6,7 +6,7 @@ weight: 3
 layout: learningpathall
 ---
 
-Create a firewall rule in Google Cloud Console to expose required ports for the Ray dashboard and Ray Serve API.
+Create a firewall rule in Google Cloud Console to expose required ports for the Ray Dashboard and Ray Serve API.
 
 {{% notice Note %}}
 For help with GCP setup, see the Learning Path [Getting started with Google Cloud Platform](/learning-paths/servers-and-cloud-computing/csp/google/).
@@ -38,7 +38,7 @@ Finally, select **Specified protocols and ports** under the **Protocols and port
 
 Then select **Create**.
 
-![Google Cloud Console Protocols and ports section with TCP ports configured alt-txt#center](images/network-port.png "Setting Ray ports in the firewall rule")
+![Google Cloud Console Protocols and ports section showing TCP checkbox selected with ports 8265, 8000, and 6379 configured for Ray Dashboard, Serve API, and Head Node#center](images/network-port.png "Setting Ray ports in the firewall rule")
 
 ## What you've accomplished and what's next
 
````
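Aside: the ports named in the new alt text (8265 for the Ray Dashboard, 8000 for the Serve API, 6379 for the head node) could also be opened from the CLI. A sketch using the standard gcloud command, with the rule name and network defaults assumed:

```bash
# Assumed rule name; add --network/--source-ranges as appropriate
gcloud compute firewall-rules create allow-ray-ports \
  --direction=INGRESS \
  --allow=tcp:8265,tcp:8000,tcp:6379
```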

content/learning-paths/servers-and-cloud-computing/ray-on-axion/setup_and_cluster.md

Lines changed: 11 additions & 11 deletions
````diff
@@ -10,7 +10,7 @@ layout: learningpathall
 
 This section guides you through installing Ray on a GCP Arm64 (Axion) virtual machine and setting up a single-node distributed computing cluster.
 
-You will configure the environment, install dependencies, and initialize a Ray cluster optimized for Arm-based infrastructure.
+You'll configure the environment, install dependencies, and initialize a Ray cluster optimized for Arm-based infrastructure.
 
 ## Update your system
 
@@ -96,7 +96,7 @@ ray start --head --dashboard-host=0.0.0.0 --num-cpus=4
 ```
 
 * `--head` → starts the main node (scheduler)
-* `--dashboard-host=0.0.0.0` → allows external dashboard access
+* `--dashboard-host=0.0.0.0` → allows external Ray Dashboard access
 * `--num-cpus=4` → allocates 4 CPU cores
 
 The output is similar to:
@@ -165,30 +165,30 @@ Pending Demands:
 (no resource demands)
 ```
 
-## Access the dashboard
+## Access the Ray Dashboard
 
-Open in browser:
+Open the following URL in your browser:
 
 ```
 http://<VM-IP>:8265
 ```
 
-This dashboard provides visibility into jobs, tasks, and resource utilization.
+The Ray Dashboard provides visibility into jobs, tasks, and resource utilization.
 
-## Ray Dashboard Overview
+## Ray Dashboard overview
 
-![Ray Dashboard showing cluster overview, utilization, and navigation tabs#center](images/ray-dashboard.png "Ray Dashboard Overview showing cluster status and metrics")
+![Ray Dashboard showing cluster overview, utilization, and navigation tabs#center](images/ray-dashboard.png "Ray Dashboard overview showing cluster status and metrics")
 
-This dashboard helps monitor distributed execution and debug workloads in real time.
+The Ray Dashboard helps monitor distributed execution and debug workloads in real time.
 
 ## What you've learned and what's next
 
 You have successfully:
 
-* Installed Ray on Arm-based SUSE VM
+* Installed Ray on an Arm-based SUSE VM
 * Created an isolated Python environment
 * Installed required dependencies
 * Initialized a Ray cluster
-* Verified cluster status and dashboard
+* Verified cluster status and Ray Dashboard
 
-Next, you will run distributed workloads using Ray.
+Next, you'll run distributed workloads using Ray.
````
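The `Pending Demands: (no resource demands)` context at the top of the last hunk is the tail of Ray's cluster status report; assuming the single-node cluster is running, it can be reproduced with:

```bash
# Prints node, resource-usage, and demand sections for the running cluster
ray status
```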

content/learning-paths/servers-and-cloud-computing/ray-on-axion/tuning_serving_benchmark.md

Lines changed: 13 additions & 14 deletions
````diff
@@ -6,7 +6,7 @@ weight: 7
 layout: learningpathall
 ---
 
-## Ray Tune, Serve, and Benchmarking
+## Hyperparameter tuning, serving, and benchmarking
 
 This section demonstrates hyperparameter tuning, model serving, and performance benchmarking using Ray.
 
@@ -42,20 +42,19 @@ results = tuner.fit()
 print("Best result:", results.get_best_result(metric="score", mode="max"))
 ```
 
-### Explanation
+### Code explanation
 
 * `tune.grid_search()` → tries multiple hyperparameter values
-* Each value runs as a **separate parallel trial**
+* Each value runs as a separate parallel trial
 * `session.report()` → sends metrics back to Ray
 * `Tuner.fit()` → executes all trials
 
-### Execute tuning
+### Run hyperparameter tuning
 
 ```bash
 python3 ray_tune.py
 ```
 
-### Output
 The output is similar to:
 
 ```output
````
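For context: the `tuner.fit()` and `get_best_result(metric="score", mode="max")` context lines plus the bullets imply a `ray_tune.py` along these lines. In this sketch the objective body and the two losing grid values are assumptions; only the winning learning rate 0.1 is visible in the diff:

```python
from ray import tune
from ray.air import session

def objective(config):
    # Toy metric so the sketch runs; the real script scores a model
    score = config["lr"] * 10
    session.report({"score": score})  # send the metric back to Ray Tune

tuner = tune.Tuner(
    objective,
    # grid_search runs one parallel trial per value; 0.001 and 0.01 are
    # assumed, 0.1 is the "Best configuration" named in the diff output
    param_space={"lr": tune.grid_search([0.001, 0.01, 0.1])},
)
results = tuner.fit()
print("Best result:", results.get_best_result(metric="score", mode="max"))
```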
````diff
@@ -84,7 +83,7 @@ Best result: Result(
 
 ### Understanding the output
 
-* Ray created **3 parallel trials** using different learning rates
+* Ray created 3 parallel trials using different learning rates
 * Each trial executed independently on available CPU cores
 * Scores represent the performance of each configuration
 
@@ -96,7 +95,7 @@ Best result: Result(
 
 **Best configuration = learning rate 0.1**
 
-* Total runtime ≈ **1 second** (parallel execution)
+* Total runtime ≈ 1 second (parallel execution)
 * Results stored in:
 
 ```bash
@@ -127,7 +126,7 @@ app = Model.bind()
 serve.run(app)
 ```
 
-### Explanation
+### Code explanation
 
 * `serve.start()` → initializes serving system
 * `@serve.deployment` → defines deployable service
````
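The `app = Model.bind()` and `serve.run(app)` context lines, together with these bullets, suggest a Serve script like the sketch below; the handler body is inferred from the curl response in the next hunk:

```python
from ray import serve

serve.start()  # initialize the serving system on the running cluster

@serve.deployment
class Model:
    async def __call__(self, request):
        # Dict return values are serialized to JSON; this mirrors the
        # curl output shown in the next hunk
        return {"message": "Hello from Ray Serve on Arm VM!"}

app = Model.bind()
serve.run(app)  # serves at http://127.0.0.1:8000/ by default
```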
````diff
@@ -148,14 +147,14 @@ curl http://127.0.0.1:8000/
 The output is similar to:
 
 ```output
-{"message":"Hello from Ray Serve on ARM VM!"}
+{"message":"Hello from Ray Serve on Arm VM!"}
 ```
 
-## Ray Tune Execution in Dashboard
+## Ray Tune execution in Ray Dashboard
 
 ![Ray Dashboard Jobs tab showing ray_tune.py trials with SUCCEEDED status#center](images/ray-jobs-status.png "Ray Tune trials executed successfully with different configurations")
 
-The dashboard shows all jobs executed successfully, confirming correct Ray cluster operation.
+The Ray Dashboard shows all jobs executed successfully, confirming correct Ray cluster operation.
 
 ## Benchmark distributed execution
 
@@ -184,7 +183,7 @@ print("Execution Time:", end - start)
 ```
 
 
-### Execute benchmark
+### Run the benchmark
 
 ```bash
 ray stop
@@ -202,10 +201,10 @@ Execution Time: 5.171869277954102
 ## Understanding the benchmark
 
 * 20 tasks executed in parallel
-* Each task takes ~1 second
+* Each task takes approximately 1 second
 * With 4 CPUs → total time ≈ 5 seconds
 
-**Sequential execution would take ~20 seconds**
+**Sequential execution would take approximately 20 seconds**
 
 * Confirms Ray parallel execution
````
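The benchmark arithmetic above (20 one-second tasks across 4 CPUs run in about 20 / 4 = 5 waves of roughly 1 second each, matching the measured 5.17 s) pairs with the `print("Execution Time", ...)` context line. A sketch with the task name and sleep body assumed:

```python
import time
import ray

ray.init()

@ray.remote
def slow_task(i):
    time.sleep(1)  # each task takes approximately 1 second
    return i

start = time.time()
# 20 tasks over 4 CPUs finish in about 5 waves of ~1 second each
results = ray.get([slow_task.remote(i) for i in range(20)])
end = time.time()
print("Execution Time:", end - start)
```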
