# Cloud Spanner C++ Client Library Benchmarks

This directory contains end-to-end benchmarks for the Cloud Spanner C++ client
library. The benchmarks execute experiments against the production environment.
You need a working Google Cloud Platform project and a Cloud Spanner instance
to run these benchmarks. We recommend that you use an isolated instance (one
without any other workloads) to run each experiment, and that you do not run
more than one experiment at a time on that instance.

## Creating an instance

Assuming you have an existing Google Cloud Platform project, you can create a
Cloud Spanner instance using the console or:

```console
GOOGLE_CLOUD_PROJECT=...                     # Your project ID
GOOGLE_CLOUD_CPP_SPANNER_INSTANCE=benchmarks # Choose your instance ID
INSTANCE_CONFIG=regional-us-central1         # Choose the instance location(s)
gcloud spanner instances create ${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \
    --config=${INSTANCE_CONFIG} --description="An instance to run Benchmarks" \
    --nodes=3
```

## CPU Overhead Experiment

This experiment measures the CPU overhead of the client library vs. raw gRPC.
This overhead is never expected to be zero, as the library is performing useful
work, but we want it to be low, unsurprising, and (once a baseline is
established) to remain stable unless there is a good reason to add overhead.

We recommend that you compile and run these experiments on a VM in the same
region as the Cloud Spanner instance you will use for the tests. Create and
configure the VM instance, and then install the development tools for whatever
platform you chose. See the [INSTALL](../../../../INSTALL.md#table-of-contents)
instructions for your distribution.

### Compiling the library

You must compile both the library and its dependencies with optimizations
enabled. Using CMake:

```bash
git clone https://github.com/googleapis/google-cloud-cpp-spanner.git
cd google-cloud-cpp-spanner
cmake -Hsuper -Bcmake-out/si -DCMAKE_BUILD_TYPE=Release \
    -DGOOGLE_CLOUD_CPP_EXTERNAL_PREFIX=$HOME/local-spanner
cmake --build cmake-out/si --target project-dependencies
cmake -H. -B.build -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_PREFIX_PATH=$HOME/local-spanner
cmake --build .build
```

### Configuring Authentication

You need to configure the Cloud Spanner client library so it can authenticate
with Cloud Spanner. Covering authentication in detail is beyond the scope of
this README; we assume the reader is familiar with the topic, and refer them
to the [Authentication Overview][authentication-quickstart] for a more
in-depth discussion.

Save the credentials you want to use to a file, then set the
`GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of that file.

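For example (the key file path below is only an illustration, not a file this
repository provides):

```shell
# Point the client library at the saved credentials file.
# The path here is just an example; use wherever you saved the file.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/benchmarks-key.json"
```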
If you are running the benchmarks in a virtual machine, the library can
automatically use the GCE instance service account (when you do **not** set
`GOOGLE_APPLICATION_CREDENTIALS`). You may need to grant this service account
permissions to work with Cloud Spanner. Examine the
[spanner roles][spanner-roles-link] to choose a role for this account; the
principal used to run these benchmarks should have (at least) the permissions
granted by the `roles/spanner.databaseAdmin` role.

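One way to grant such a role is with `gcloud projects add-iam-policy-binding`.
The project and service-account values below are placeholders, and the command
is wrapped in `echo` so you can inspect it before running it for real:

```shell
# Placeholder values; substitute your project and the VM's service account.
GOOGLE_CLOUD_PROJECT=my-project
SERVICE_ACCOUNT=123456789-compute@developer.gserviceaccount.com
# Remove the leading 'echo' to actually apply the binding.
echo gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="roles/spanner.databaseAdmin"
```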
[spanner-roles-link]: https://cloud.google.com/spanner/docs/iam#roles
[authentication-quickstart]: https://cloud.google.com/docs/authentication/getting-started 'Authentication Getting Started'

### Running the benchmark

By default the benchmarks run simple smoke tests; the intention is for these
benchmarks to run as part of the CI build, where we want them to finish
quickly. You must specify some options to control how long the experiments
run. For example, to perform the experiment measuring the CPU overhead of
reading columns of type `STRING` you would run:

```bash
.build/google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark \
    --project=${GOOGLE_CLOUD_PROJECT} \
    --instance=${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \
    --table-size=1000000 \
    --maximum-clients=8 \
    --maximum-threads=16 \
    --iteration-duration=5 \
    --samples=60 --experiment=read-string | tee mrcb-read-string.csv
```

The program can run the same experiment with other data types:

```bash
for exp in read-bool read-bytes read-date read-float64 \
           read-int64 read-string read-timestamp; do \
  .build/google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark \
      --project=${GOOGLE_CLOUD_PROJECT} \
      --instance=${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \
      --table-size=1000000 \
      --maximum-clients=8 \
      --maximum-threads=16 \
      --iteration-duration=5 \
      --samples=60 --experiment=${exp} | tee mrcb-${exp}.csv; \
done
```

### Inspecting the results

At this time we have not developed scripts to analyze the benchmark results,
but some simple R commands can help. Start R on your command line and then
issue the following commands:

```R
require(ggplot2) # may require install.packages("ggplot2") the first time
df <- data.frame()
for (file in Sys.glob("mrcb-read-*.csv")) {
  t <- read.csv(file, comment.char='#')
  name <- gsub('mrcb-([a-z-]+).*', '\\1', file)
  t$experiment <- factor(name)
  df <- rbind(df, t)
}

df$CpuTimePerRow <- df$CpuTime / df$RowCount

aggregate(CpuTimePerRow ~ UsingStub + experiment, data=df, FUN=mean)

ggplot(data=df, aes(color=UsingStub, x=experiment, y=CpuTime)) + geom_boxplot()
ggsave('read-data-types.png')
```

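If R is not available, standard shell tools can produce a similar per-row
summary. The snippet below is only a sketch against a tiny synthetic file: it
assumes the `UsingStub`, `RowCount`, and `CpuTime` columns appear in that
order, and against the real `mrcb-read-*.csv` files you would also need to
skip their `#` comment lines:

```shell
# Synthetic stand-in for one of the mrcb-read-*.csv files (column order assumed).
cat > sample.csv <<'EOF'
UsingStub,RowCount,CpuTime
TRUE,1000,2.0
FALSE,1000,3.0
EOF
# Mean CPU time per row, grouped by the UsingStub column.
awk -F, 'NR > 1 { sum[$1] += $3 / $2; n[$1]++ }
         END { for (k in sum) printf "%s %.6f\n", k, sum[k] / n[k] }' sample.csv
```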
## Single Row Throughput Experiment

This experiment measures the throughput of either single-row inserts or
single-row reads using different numbers of clients and threads. The objective
is to verify that the client library scales well with more threads and does
not introduce bottlenecks.

To run the experiment reading data for approximately 5 minutes, use 20 samples
of 15 seconds each:

```bash
.build/google/cloud/spanner/benchmarks/single_row_throughput_benchmark \
    --project=${GOOGLE_CLOUD_PROJECT} \
    --instance=${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \
    --iteration-duration=15 \
    --table-size=10000000 \
    --maximum-clients=32 \
    --maximum-threads=1024 \
    --samples=20 \
    --experiment=read 2>&1 | tee srtp-read.csv
```
