- [Online Inference with MaxText on v5e Cloud TPU VM](https://cloud.google.com/tpu/docs/tutorials/LLM/jetstream) [[README](https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md)]
- [Online Inference with Pytorch on v5e Cloud TPU VM](https://cloud.google.com/tpu/docs/tutorials/LLM/jetstream-pytorch) [[README](https://github.com/google/jetstream-pytorch/tree/main?tab=readme-ov-file#jetstream-pytorch)]
- [Serve Gemma using TPUs on GKE with JetStream](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-tpu-jetstream)
- [Observability in JetStream Server](https://github.com/google/JetStream/blob/main/docs/observability-prometheus-metrics-in-jetstream-server.md)
- [Profiling in JetStream Server](https://github.com/google/JetStream/blob/main/docs/profiling-with-jax-profiler-and-tensorboard.md)
- [JetStream Standalone Local Setup](#jetstream-standalone-local-setup)
# Observability in JetStream Server

In JetStream Server, we use [Prometheus](https://prometheus.io/docs/introduction/overview/) to collect key metrics within the JetStream orchestrator and engines. We implemented a [Prometheus client server](https://prometheus.github.io/client_python/exporting/http/) in JetStream `server_lib.py` and use `MetricsServerConfig` (by passing `prometheus_port` in the server entrypoint) to guard the metrics observability feature.
## Enable Prometheus server to observe Jetstream metrics

Metrics are not exported by default; here is an example of running the JetStream MaxText server with metrics observability enabled:
```bash
# Refer to JetStream MaxText User Guide for the following server config.
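# The flags below are illustrative placeholders rather than the full set
# from the user guide; substitute the tokenizer, checkpoint, and model
# values from your own MaxText setup.
export TOKENIZER_PATH=assets/tokenizer.gemma
export LOAD_PARAMETERS_PATH=${UNSCANNED_CKPT_PATH}
export MODEL_NAME=gemma-7b

# Passing prometheus_port makes the server entrypoint build a
# MetricsServerConfig and start the Prometheus client server on that port.
cd ~/maxtext
python MaxText/maxengine_server.py \
  MaxText/configs/base.yml \
  tokenizer_path=${TOKENIZER_PATH} \
  load_parameters_path=${LOAD_PARAMETERS_PATH} \
  model_name=${MODEL_NAME} \
  prometheus_port=9090
```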
Now that we have configured `prometheus_port=9090` above, we can observe various Jetstream metrics via HTTP requests to `0.0.0.0:9090`.
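A quick way to verify the endpoint is a plain HTTP GET; a minimal sketch, assuming you run it on the same host as the server:

```bash
# Fetch the metrics page in Prometheus exposition format.
curl http://0.0.0.0:9090
```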
Towards the end, the response should have content similar to the following:

```
# HELP jetstream_prefill_backlog_size Size of prefill queue
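# TYPE jetstream_prefill_backlog_size gauge
jetstream_prefill_backlog_size 0.0
# ...further metric families follow; the TYPE and value lines above are
# illustrative of the exposition format rather than captured output.
```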
From `docs/online-inference-with-maxtext-engine.md`:

```
Prompt: Today is a good day
Response: to be a fan
```
### (optional) Observe Jetstream metrics

Metrics are not exported by default. To configure Jetstream to emit metrics, start this guide again from step four and replace the `Run the following command to start the JetStream MaxText server` step with the metrics-enabled server command shown in the observability section above.
## Step 6: Run benchmarks with JetStream MaxText server
Note: The JetStream MaxText server is not running with quantization optimization in Step 3. To get the best benchmark results, enable quantization for both the weights and the KV cache (please use AQT-trained or fine-tuned checkpoints to ensure accuracy), then add the quantization flags and restart the server as follows:
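The flag block itself is elided in this excerpt; a minimal sketch of the restart, assuming MaxText's `quantization` and `quantize_kvcache` options and reusing the placeholder variables from the metrics example above:

```bash
# quantization=int8 quantizes the weights; quantize_kvcache=True quantizes
# the KV cache. Both flag names are assumptions based on MaxText configs.
cd ~/maxtext
python MaxText/maxengine_server.py \
  MaxText/configs/base.yml \
  tokenizer_path=${TOKENIZER_PATH} \
  load_parameters_path=${LOAD_PARAMETERS_PATH} \
  model_name=${MODEL_NAME} \
  quantization=int8 \
  quantize_kvcache=True
```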
# Profiling in JetStream Server

In the JetStream server, we have implemented a JAX profiler server to support profiling JAX programs with TensorBoard.
## Profiling with JAX profiler server and TensorBoard server
Following the [JAX official manual profiling approach](https://jax.readthedocs.io/en/latest/profiling.html#manual-capture-via-tensorboard), here is an example of JetStream MaxText server profiling with TensorBoard:
1. Start a TensorBoard server:

   ```bash
   tensorboard --logdir /tmp/tensorboard/
   ```

   You should be able to load TensorBoard at http://localhost:6006/. You can specify a different port with the `--port` flag.
2. Start the JetStream MaxText server:

   ```bash
   # Refer to JetStream MaxText User Guide for the following server config.
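   # Illustrative values reusing the placeholders from the metrics example;
   # enable_jax_profiler is assumed (based on MaxText configs) to start the
   # JAX profiler server on port 9999 alongside the JetStream server.
   cd ~/maxtext
   python MaxText/maxengine_server.py \
     MaxText/configs/base.yml \
     tokenizer_path=${TOKENIZER_PATH} \
     load_parameters_path=${LOAD_PARAMETERS_PATH} \
     model_name=${MODEL_NAME} \
     enable_jax_profiler=True
   ```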
3. Open http://localhost:6006/#profile, and click the “CAPTURE PROFILE” button in the upper left. Enter “localhost:9999” as the profile service URL (this is the address of the profiler server you started in the previous step). Enter the number of milliseconds you’d like to profile for, and click “CAPTURE”.
4. After the capture finishes, TensorBoard should automatically refresh. (Not all of the TensorBoard profiling features are hooked up with JAX, so it may initially look like nothing was captured.) On the left under “Tools”, select `trace_viewer`.