Tests aren't a separate phase that follows coding; they're a property of the design. A well-tested system has different shapes of tests, each catching a different class of bug, and each running at a different cost.
- 1. The Testing Pyramid
- 2. Unit Tests
- 3. Integration Tests
- 4. End-to-End Tests
- 5. Mocking — Used Correctly
- 6. Property-Based Testing
- 7. Fuzzing
- 8. Contract Testing
- 9. Sanitizers as Tests
- 10. CI Strategy
/\
/E2E\ few, slow, expensive, full system
/------\
/ int. \ more, mid-speed, multiple components together
/----------\
/ unit \ many, fast, single function/class
--------------
| Unit | Integration | E2E | |
|---|---|---|---|
| Per test, time | <1 ms | 10 ms – 1 s | 1–60 s |
| Number of tests | 1000s | 100s | 10s |
| Diagnoses | "this function is wrong" | "two components disagree" | "the whole stack is wrong" |
| Cost to maintain | Low | Medium | High (flaky, slow, brittle) |
Inverted pyramid (lots of E2E, few units) is the typical anti-pattern: slow CI, painful failures, hard to localize bugs.
Test one function or class in isolation. Should be fast (<1 ms), deterministic, no external state.
C++ unit-test frameworks:
- GoogleTest — most common in industry. Mature; integrates with everything.
- Catch2 — header-only, expressive
REQUIRE(x == y)syntax, excellent for small projects. - doctest — fastest compile times; lowest ceremony.
- Boost.Test — if you already use Boost.
#include <gtest/gtest.h>
#include <optional>
#include <string>
struct Credential { bool expired; };
class TokenIssuer {
public:
enum class Error { None, Expired };
std::optional<std::string> issueFor(const std::string& user, Credential c) {
if (c.expired) { lastError = Error::Expired; return std::nullopt; }
lastError = Error::None;
return "token-for-" + user;
}
Error lastError = Error::None;
};
TEST(TokenIssuer, RefusesExpiredCredentials) {
TokenIssuer issuer;
Credential expired{true};
auto token = issuer.issueFor("user42", expired);
EXPECT_FALSE(token.has_value());
EXPECT_EQ(issuer.lastError, TokenIssuer::Error::Expired);
}What makes a unit test good:
- One concept per test. If it fails, you know which behavior broke.
- Arrange / Act / Assert structure visible at a glance.
- No coupling to other tests. Order shouldn't matter.
- Tests behavior, not implementation. Refactoring shouldn't break tests if behavior is the same.
Test how components fit together. Hit a real (or in-memory) database, spin up a real HTTP server, exercise the wire protocol.
- Don't mock the boundary you're integrating across — the bug you're trying to find is likely in the integration.
- Use containers: PostgreSQL in Docker, Redis, Kafka. Testcontainers is the standard.
- For databases: spin up a real one, run schema migrations, run the test, drop. ~1s per test if optimized.
- Keep integration tests narrow — exercise one integration at a time, not the whole stack.
The full system, exercised through its real entry points. UI-driven for web apps; full-stack for services. Slow, brittle, expensive — but they catch a class of bug nothing else does (deployment misconfiguration, real-world latency, third-party API changes).
Discipline:
- Have few of them. A handful that exercise critical user journeys.
- Treat flake as a defect. A flaky E2E that everyone ignores becomes worse than no test.
- Run on a schedule, not on every commit. Or as a "smoke" subset on PR, full suite nightly.
Mocks substitute a dependency in a unit test. Most tests don't need mocks — a real std::vector works in tests, and so does an in-memory hash map for "the cache."
| Term | Meaning |
|---|---|
| Dummy | Object passed but not used. |
| Stub | Returns canned data. "If asked for user 42, return Alice." |
| Fake | Lightweight working impl. In-memory DB; not for prod, real for tests. |
| Mock | Verifies it was called in a particular way. |
| Spy | Records calls; you check after. |
Prefer fakes over mocks. A FakeDatabase that's a real working implementation lets you write tests in terms of behavior ("then the user is saved") instead of choreography ("then save() was called once with these args"). Mock tests tend to break on every refactor because they assert on the implementation.
GoogleMock for when you do want mocks:
#include <gtest/gtest.h>
#include <gmock/gmock.h>
struct Order { int id; };
class Database {
public:
virtual ~Database() = default;
virtual void save(const Order& o) = 0;
};
class MockDatabase : public Database {
public:
MOCK_METHOD(void, save, (const Order&), (override));
};
class OrderService {
public:
OrderService(Database& db) : db(db) {}
void placeOrder(const Order& o) { db.save(o); }
private:
Database& db;
};
using ::testing::_;
TEST(OrderService, SavesAcceptedOrder) {
MockDatabase db;
EXPECT_CALL(db, save(_)).Times(1);
OrderService svc(db);
svc.placeOrder(Order{1});
}Reach for it sparingly.
Instead of "for input X, output should be Y," state a property the code should satisfy for all inputs, and let the framework generate inputs:
#include <rapidcheck.h>
#include <algorithm>
#include <vector>
int main() {
rc::check("reverse twice yields original", [](std::vector<int> v) {
auto r = v;
std::reverse(r.begin(), r.end());
std::reverse(r.begin(), r.end());
RC_ASSERT(r == v);
});
}Properties to test:
- Round-trip:
decode(encode(x)) == x - Idempotence:
f(f(x)) == f(x) - Invariants: "after sort, every adjacent pair is in order"
- Equivalence: "the optimized impl agrees with the slow reference"
C++ frameworks: rapidcheck, proptest-cxx. Great for parsers, serializers, math.
Generate random/structured inputs targeting code paths, run with sanitizers, watch for crashes or assertion failures.
- libFuzzer (built into Clang) — in-process coverage-guided fuzzer.
- AFL++ — out-of-process; better for big binaries.
- Honggfuzz — alternative with strong instrumentation.
- OSS-Fuzz — Google's continuous fuzzing for OSS projects; free if you qualify.
#include <cstddef>
#include <cstdint>
void parse_my_format(const uint8_t* data, size_t size);
// libFuzzer entry point
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
parse_my_format(data, size); // crash here = bug
return 0;
}clang++ -fsanitize=fuzzer,address,undefined fuzz.cpp parse.cpp -o fuzz
./fuzz corpus/What fuzzes well:
- Parsers (JSON, custom binary formats).
- Deserializers (protobuf, ASN.1).
- State machines (random sequences of events).
- File-format readers.
Run fuzzers on a schedule or on PR. Fuzzing finds bugs nothing else does — particularly UB, memory safety, and adversarial input bugs.
Two services communicate over a contract (REST schema, protobuf). Each side tests against a recorded contract instead of mocking the other.
- Provider records: "given this request, I produce this response."
- Consumer asserts: "given that contract, my code parses correctly."
- Tooling: Pact, Postman/Insomnia for snapshot tests.
Cheaper than full integration; catches the "we changed our schema and didn't tell anyone" failure mode. Useful in microservice-heavy environments.
Compiler-level dynamic analyzers that turn UB into reproducible test failures. Run them in CI, not just locally.
| Sanitizer | Catches |
|---|---|
| ASan | Heap/stack OOB, use-after-free, double-free, leaks |
| UBSan | Undefined behavior — int overflow, null deref, invalid casts |
| TSan | Data races |
| MSan | Reads of uninitialized memory (Clang only) |
| LSan | Memory leaks (often bundled with ASan) |
clang++ -fsanitize=address,undefined -O1 -g foo.cpp -o foo
./foo # crashes loudly on bugsA green test suite under sanitizers is far stronger evidence of correctness than a green plain suite.
See Memory Error Detection With Sanitizers.
The shape of a healthy CI:
| Stage | What | Time budget |
|---|---|---|
| Pre-commit hook | Format, lint, fast unit tests | <10 s |
| PR check | Full unit suite, build matrix, ASan/UBSan | <10 min |
| Merge to main | + Integration tests, container build | <30 min |
| Nightly | + E2E, fuzzing run, perf tests, TSan | <2 h |
Discipline:
- No flaky tests. Quarantine, then fix or delete.
- Test in the matrix you ship. Test on every compiler/OS you claim to support.
- Run a sanitized build in CI. If sanitizers fire, the build fails.
- Test the release build, not just debug. Many bugs only show under
-O2. - Reproducible builds. Same commit → same artifact. Required for trustworthy debugging.
- Designing for Testability
- Memory Error Detection With Sanitizers
- GoogleTest, Catch2, doctest
- libFuzzer documentation, AFL++
- Testcontainers
- Working Effectively with Legacy Code, Michael Feathers — on adding tests to untested code.
- Software Engineering at Google, chapters 11–14 — testing at scale.