
Commit 753c400

Enhance distributed workloads documentation with output examples
Added output examples for distributed training confirmation.
1 parent 5698f13

1 file changed: 11 additions & 6 deletions

content/learning-paths/servers-and-cloud-computing/ray-on-axion/distributed_workloads.md
@@ -100,12 +100,17 @@ python3 ray_train.py
 
 The output is similar to:
 ```output
-(RayTrainWorker pid=5335) Loss: 1.1982450485229492
-(RayTrainWorker pid=5335) Loss: 1.158831000328064
-(RayTrainWorker pid=5335) Loss: 1.1220906972885132
-(RayTrainWorker pid=5335) Loss: 1.088060736656189
-(RayTrainWorker pid=5335) Loss: 1.0567599534988403
-(RayTrainWorker pid=5336) Loss: 1.4622551202774048 [repeated 5x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
+(TrainController pid=5522) Attempting to start training worker group of size 2 with the following resources: [{'CPU': 1}] * 2
+(TrainController pid=5522) Started training worker group of size 2:
+(TrainController pid=5522) - (ip=10.0.0.19, pid=5563) world_rank=0, local_rank=0, node_rank=0
+(TrainController pid=5522) - (ip=10.0.0.19, pid=5564) world_rank=1, local_rank=1, node_rank=0
+(RayTrainWorker pid=5563) Setting up process group for: env:// [rank=0, world_size=2]
+(RayTrainWorker pid=5563) Loss: 0.9711737036705017
+(RayTrainWorker pid=5563) Loss: 0.9491967558860779
+(RayTrainWorker pid=5563) Loss: 0.9295402765274048
+(RayTrainWorker pid=5563) Loss: 0.911673903465271
+(RayTrainWorker pid=5563) Loss: 0.895072340965271
+(RayTrainWorker pid=5564) Loss: 1.635019063949585 [repeated 5x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
 ```
 
 This confirms distributed training across multiple workers.
