|
| 1 | +# Cloud Spanner C++ Client Library Benchmarks |
| 2 | + |
| 3 | +This directory contains end-to-end benchmarks for the Cloud Spanner C++ client |
| 4 | +library. The benchmarks execute experiments against the production environment. |
| 5 | +You need a working Google Cloud Platform project and Cloud Spanner instance |
| 6 | +to run these benchmarks. We recommend that you run each experiment on an |
| 7 | +isolated instance (one without any other workloads), and that you do not run |
| 8 | +more than one experiment at a time on that instance. |
| 9 | + |
| 10 | +## Creating an instance |
| 11 | + |
| 12 | +Assuming you have an existing Google Cloud Project, you can create a Cloud |
| 13 | +Spanner instance using the console or the `gcloud` command-line tool: |
| 14 | + |
| 15 | +```console |
| 16 | +GOOGLE_CLOUD_PROJECT=... # Your project ID |
| 17 | +GOOGLE_CLOUD_CPP_SPANNER_INSTANCE=benchmarks # Choose your instance ID |
| 18 | +INSTANCE_CONFIG=regional-us-central1 # Choose the instance location(s) |
| 19 | +gcloud spanner instances create ${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \ |
| 20 | + --config=${INSTANCE_CONFIG} --description="An instance to run Benchmarks" \ |
| 21 | + --nodes=3 |
| 22 | +``` |
| 23 | + |
| 24 | +## CPU Overhead Experiment |
| 25 | + |
| 26 | +This experiment measures the CPU overhead of the client library vs. raw gRPC. |
| 27 | +This overhead is never expected to be zero because the library performs useful |
| 28 | +work, but we want it to be low, unsurprising, and (once a baseline is |
| 29 | +established) stable unless there is a good reason to add overhead. |
| 30 | + |
| 31 | +We recommend that you compile and run these experiments on a VM running in the |
| 32 | +same region as the Cloud Spanner instance you will use for the tests. Create and |
| 33 | +configure the VM instance, and then install the development tools for whatever |
| 34 | +platform you chose. See the [INSTALL](../../../../INSTALL.md#table-of-contents) |
| 35 | +instructions for your distribution. |
| 36 | + |
| 37 | +### Compiling the library |
| 38 | + |
| 39 | +You must compile both the library and its dependencies with optimization |
| 40 | +enabled. With CMake, the commands are: |
| 41 | + |
| 42 | +```bash |
| 43 | +git clone https://github.com/googleapis/google-cloud-cpp-spanner.git |
| 44 | +cd google-cloud-cpp-spanner |
| 45 | +cmake -Hsuper -Bcmake-out/si -DCMAKE_BUILD_TYPE=Release \ |
| 46 | + -DGOOGLE_CLOUD_CPP_EXTERNAL_PREFIX=$HOME/local-spanner |
| 47 | +cmake --build cmake-out/si --target project-dependencies |
| 48 | +cmake -H. -B.build -DCMAKE_BUILD_TYPE=Release \ |
| 49 | + -DCMAKE_PREFIX_PATH=$HOME/local-spanner |
| 50 | +cmake --build .build |
| 51 | +``` |
| 52 | + |
| 53 | +### Configuring Authentication |
| 54 | + |
| 55 | +You need to configure the Cloud Spanner client library so it can authenticate |
| 56 | +with Cloud Spanner. Covering authentication in detail is beyond the scope |
| 57 | +of this README; we assume you are familiar with the topic, and refer you |
| 58 | +to the [Authentication Overview][authentication-quickstart] if you need a more |
| 59 | +in-depth discussion. |
| 60 | + |
| 61 | +Save the credentials you want to use to a file. Then set the |
| 62 | +`GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of this file. |
| 63 | + |
| 64 | +If you are running the benchmarks in a virtual machine, the library can |
| 65 | +automatically use the GCE instance service account (when you do **not** set |
| 66 | +`GOOGLE_APPLICATION_CREDENTIALS`). You may need to grant this service account |
| 67 | +permissions to work with Cloud Spanner. Examine the |
| 68 | +[spanner roles][spanner-roles-link] to choose a role for this account. The |
| 69 | +principal used to run these benchmarks should have (at least) the permissions |
| 70 | +granted by the `roles/spanner.databaseAdmin` role. |
| 71 | + |
| 72 | +[spanner-roles-link]: https://cloud.google.com/spanner/docs/iam#roles |
| 73 | +[authentication-quickstart]: https://cloud.google.com/docs/authentication/getting-started 'Authentication Getting Started' |
| 74 | + |
| 75 | +### Running the benchmark |
| 76 | + |
| 77 | +By default the benchmarks run simple smoke tests because they are intended to |
| 78 | +run as part of the CI build, where we want them to finish quickly. |
| 79 | +You must specify some options to control how long the experiments run. For |
| 80 | +example, to perform the experiment measuring the CPU overhead of reading columns |
| 81 | +of type `STRING`, you would run: |
| 82 | + |
| 83 | +```bash |
| 84 | +.build/google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark \ |
| 85 | + --project=${GOOGLE_CLOUD_PROJECT} \ |
| 86 | + --instance=${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \ |
| 87 | + --table-size=1000000 \ |
| 88 | + --maximum-clients=8 \ |
| 89 | + --maximum-threads=16 \ |
| 90 | + --iteration-duration=5 \ |
| 91 | + --samples=60 --experiment=read-string | tee mrcb-read-string.csv |
| 92 | +``` |
| 93 | + |
| 94 | +The program can run the same experiment with other data types: |
| 95 | + |
| 96 | +```bash |
| 97 | +for exp in read-bool read-bytes read-date read-float64 \ |
| 98 | + read-int64 read-string read-timestamp; do \ |
| 99 | + .build/google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark \ |
| 100 | + --project=${GOOGLE_CLOUD_PROJECT} \ |
| 101 | + --instance=${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \ |
| 102 | + --table-size=1000000 \ |
| 103 | + --maximum-clients=8 \ |
| 104 | + --maximum-threads=16 \ |
| 105 | + --iteration-duration=5 \ |
| 106 | + --samples=60 --experiment=${exp} | tee mrcb-${exp}.csv; \ |
| 107 | +done |
| 108 | +``` |
| 109 | + |
| 110 | +### Inspecting the results |
| 111 | + |
| 112 | +At this time we have not developed scripts to analyze the benchmark results, |
| 113 | +but some simple R commands can help. Start R from your command line and then |
| 114 | +issue the following commands: |
| 115 | + |
| 116 | +```R |
| 117 | +require(ggplot2) # may require install.packages("ggplot2") the first time |
| 118 | +df <- data.frame() |
| 119 | +for(file in Sys.glob("mrcb-read-*.csv")) { |
| 120 | + t <- read.csv(file, comment.char='#'); |
| 121 | + name <- gsub('mrcb-([a-z0-9-]+)\\.csv', '\\1', file); |
| 122 | + t$experiment = factor(name); |
| 123 | + df <- rbind(df, t); |
| 124 | +} |
| 125 | + |
| 126 | +df$CpuTimePerRow <- df$CpuTime / df$RowCount |
| 127 | + |
| 128 | +aggregate(CpuTimePerRow ~ UsingStub + experiment, data=df, FUN=mean) |
| 129 | + |
| 130 | +ggplot(data=df, aes(color=UsingStub, x=experiment, y=CpuTime)) + geom_boxplot() |
| 131 | +ggsave('read-data-types.png') |
| 132 | +``` |
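If R is not available, the same per-row aggregation can be approximated with a short Python script. This is a minimal sketch rather than part of the benchmark tooling. It assumes that the first non-comment line of each CSV file is a header naming the `UsingStub`, `RowCount`, and `CpuTime` columns used in the R commands above, and that comment lines start with `#`. The helper names `load_rows` and `mean_cpu_time_per_row` are hypothetical.

```python
import csv
import glob
import re
from collections import defaultdict


def load_rows(pattern="mrcb-read-*.csv"):
    """Yield (experiment, row) pairs from every CSV file matching pattern."""
    for path in sorted(glob.glob(pattern)):
        # Derive the experiment name from the file name, e.g. "read-int64".
        match = re.search(r"mrcb-([a-z0-9-]+)\.csv$", path)
        experiment = match.group(1) if match else path
        with open(path, newline="") as handle:
            # Assumption: comment lines start with '#'; skip them.
            lines = (line for line in handle if not line.startswith("#"))
            for row in csv.DictReader(lines):
                yield experiment, row


def mean_cpu_time_per_row(rows):
    """Return {(UsingStub, experiment): mean of CpuTime / RowCount}."""
    samples = defaultdict(list)
    for experiment, row in rows:
        row_count = float(row["RowCount"])
        if row_count == 0:
            continue  # skip samples that read no rows to avoid dividing by zero
        key = (row["UsingStub"], experiment)
        samples[key].append(float(row["CpuTime"]) / row_count)
    return {key: sum(values) / len(values) for key, values in samples.items()}


if __name__ == "__main__":
    means = mean_cpu_time_per_row(load_rows())
    for (using_stub, experiment), mean in sorted(means.items()):
        print(f"{experiment:16} UsingStub={using_stub} {mean:.6g}")
```

Run it from the directory that holds the `mrcb-read-*.csv` files. It prints one line per experiment and `UsingStub` value, in the same units as the `CpuTime` column.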
| 133 | + |
| 134 | +## Single Row Throughput Experiment |
| 135 | + |
| 136 | +This experiment measures the throughput of either single-row inserts or |
| 137 | +single-row reads using different numbers of clients and threads. The objective |
| 138 | +is to verify the client library scales well with more threads and does not |
| 139 | +introduce bottlenecks. |
| 140 | + |
| 141 | +To run the read experiment for approximately 5 minutes, use 20 samples |
| 142 | +of 15 seconds each: |
| 143 | + |
| 144 | +```bash |
| 145 | +.build/google/cloud/spanner/benchmarks/single_row_throughput_benchmark \ |
| 146 | + --project=${GOOGLE_CLOUD_PROJECT} \ |
| 147 | + --instance=${GOOGLE_CLOUD_CPP_SPANNER_INSTANCE} \ |
| 148 | + --iteration-duration=15 \ |
| 149 | + --table-size=10000000 \ |
| 150 | + --maximum-clients=32 \ |
| 151 | + --maximum-threads=1024 \ |
| 152 | + --samples=20 \ |
| 153 | + --experiment=read 2>&1 | tee srtp-read.csv |
| 154 | +``` |
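To target a different run length, the relationship between `--samples` and `--iteration-duration` can be worked out with a few lines of Python. This sketch assumes the measured time is roughly the number of samples multiplied by the iteration duration in seconds, and it ignores any setup time such as populating the table. The helper name `samples_for` is hypothetical.

```python
import math


def samples_for(target_seconds, iteration_seconds):
    """Smallest sample count whose combined iteration time covers the target."""
    if iteration_seconds <= 0:
        raise ValueError("iteration_seconds must be positive")
    return math.ceil(target_seconds / iteration_seconds)


# 5 minutes of 15-second iterations, as in the command above.
print(samples_for(5 * 60, 15))  # prints 20
```

The same arithmetic applies to the CPU overhead experiment, where 60 samples of 5 seconds each also add up to about 5 minutes.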