`docs/online-inference-with-maxtext-engine.md`: 36 additions, 1 deletion
@@ -205,6 +205,41 @@

```
Prompt: Today is a good day
Response: to be a fan
```
### (optional) Observe Jetstream metrics
Metrics are not exported by default. To configure JetStream to emit metrics, start this guide again from step 4 and replace the `Run the following command to start the JetStream MaxText server` step with the following:
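The exact command depends on the flags you used in step 4; the sketch below only illustrates the idea, assuming the same `MaxText/maxengine_server.py` entry point, with the placeholder standing in for your original flags and only `prometheus_port=9090` added:

```bash
# Sketch (not literal): rerun the server-start command from step 4 and append the
# metrics flag so the server exposes Prometheus metrics.
cd ~/maxtext
python MaxText/maxengine_server.py \
  MaxText/configs/base.yml \
  <all of the flags you used in step 4> \
  prometheus_port=9090
```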
Now that we have configured `prometheus_port=9090` above, we can observe various JetStream metrics via HTTP requests to `0.0.0.0:9090`. Towards the end, the response should have content similar to the following:
```
# HELP jetstream_prefill_backlog_size Size of prefill queue
...
```
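As a quick check, the metrics endpoint can be queried with a plain HTTP GET; this is a minimal sketch, assuming the server is reachable locally on the port configured above:

```bash
# Sketch: fetch the Prometheus-format metrics exposed on the prometheus_port set above.
# Adjust the host/port if you configured a different value.
curl http://0.0.0.0:9090
```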
## Step 6: Run benchmarks with JetStream MaxText server
Note: The JetStream MaxText Server is not running with quantization optimization in Step 3. To get the best benchmark results, enable quantization for both the weights and the KV cache (please use AQT-trained or fine-tuned checkpoints to ensure accuracy), add the quantization flags, and restart the server as follows:
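A sketch of that restart, assuming the MaxText config options `quantization=int8` and `quantize_kvcache=True` control weight and KV-cache quantization (verify the exact flag names against your MaxText version); all other flags stay the same as the command you used in step 4:

```bash
# Sketch (assumed flag names): restart the server with weight and KV-cache
# quantization enabled; every other flag is unchanged from step 4.
cd ~/maxtext
python MaxText/maxengine_server.py \
  MaxText/configs/base.yml \
  <all of the flags you used in step 4> \
  quantization=int8 \
  quantize_kvcache=True
```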