Internal Development
Intended for the development of WTX, although some tips might be useful for your own projects.
Size constraints
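A large enum used aggressively in several places can hurt runtime performance, because every value occupies as much space as the largest variant and has to be moved around at that size. In fact, this is so common that the community created several lints to prevent such a scenario, for example Clippy's large_enum_variant and result_large_err.
Some real-world use-cases and associated benchmarks: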
- https://ziglang.org/download/0.8.0/release-notes.html#Reworked-Memory-Layout
- https://github.com/rust-lang/rust/pull/100441
- https://github.com/rust-lang/rust/pull/95715
That is why WTX has an enforced Error enum size of 16 bytes and that is also the reason why WTX has so many bare error variants.
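A size limit like this can be enforced at compile time. The following is only a minimal sketch: the `Error` enum and its variants are hypothetical and are not the actual WTX definition. It shows how bare variants and boxed payloads keep an enum small, and how a constant assertion turns an accidental size regression into a build failure.

```rust
use core::mem::size_of;

/// Hypothetical error type used for illustration, NOT the real WTX enum.
#[derive(Debug)]
pub enum Error {
    // Bare variants carry no payload, so they only cost the discriminant.
    ClosedConnection,
    InvalidHeader,
    // Small payloads are fine as long as the whole enum stays within the limit.
    UnexpectedStatus(u16),
    // Large payloads are boxed so that only a pointer is stored inline.
    Io(Box<std::io::Error>),
}

// Compilation fails if the enum ever grows beyond 16 bytes.
const _: () = assert!(size_of::<Error>() <= 16);
```

Adding a variant with a large inline payload, such as a `[u8; 64]`, would make the assertion fail during compilation, which is the point of the check.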
Performance
Many things that generally improve performance are used in the project, to name a few:
- Manual Vectorization: When an algorithm is known for processing large amounts of data, several experiments are performed to analyze the best way to split loops in order to allow the compiler to take advantage of SIMD instructions.
- Memory Allocation: Whenever possible, all structures related to heap allocations are created only once, at the instantiation level.
- Fewer Dependencies: No third-party dependency is injected by default. In other words, additional dependencies are up to the user through the selection of Cargo features, which decreases the compilation time of full builds. For example, you can see the mere 7 dependencies required by the PostgreSQL client using cargo tree -e normal --features crypto-ring,postgres.
- Vectored and Buffered IO: Instead of writing a single chunk of data and waiting for it to be sent, multiple chunks are gathered and transmitted in a single operation whenever possible.
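To illustrate the loop-splitting idea behind manual vectorization, here is a minimal sketch. The function names and the lane count of 16 are assumptions chosen for the example and do not come from the WTX code base. The slice is processed in fixed-size chunks with independent accumulators, which gives the compiler a shape it can often map onto SIMD instructions; whether it actually does depends on the compiler version and the target CPU.

```rust
/// Sums all bytes by processing fixed-size chunks with independent accumulators.
/// `LANES` is an arbitrary illustrative value, not a tuned constant.
pub fn sum_chunked(data: &[u8]) -> u64 {
    const LANES: usize = 16;
    let mut acc = [0u64; LANES];
    let mut chunks = data.chunks_exact(LANES);
    // Each chunk has exactly `LANES` elements, so the inner loop has a fixed
    // trip count and no cross-iteration dependency between accumulators.
    for chunk in chunks.by_ref() {
        for (a, b) in acc.iter_mut().zip(chunk) {
            *a += u64::from(*b);
        }
    }
    // Elements that did not fill a whole chunk are handled separately.
    let tail: u64 = chunks.remainder().iter().map(|b| u64::from(*b)).sum();
    acc.iter().sum::<u64>() + tail
}

/// Straightforward reference implementation used to check the result.
pub fn sum_scalar(data: &[u8]) -> u64 {
    data.iter().map(|b| u64::from(*b)).sum()
}
```

Both functions return the same value for any input, so the chunked version can be benchmarked against the scalar one to check whether the split is worth keeping.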
Profiling
The examples below use the h2load benchmarking tool (https://nghttp2.org/documentation/h2load-howto.html) and the h2load internal binary (https://github.com/c410-f3r/wtx/blob/main/wtx-internal/src/bin/h2load.rs) for illustration purposes.
Compilation time / Size
cargo-bloat: Finds out what takes most of the space in executables.
cargo bloat --bin h2load --features h2load | head -20
cargo-llvm-lines: Measures the number and size of instantiations of each generic function in a program.
CARGO_PROFILE_RELEASE_LTO=fat cargo llvm-lines --bin h2load --features h2load --package wtx-internal --release | head -20
Performance
Prepare the executables in different terminals.
h2load -c100 --log-file=/tmp/h2load.txt -m10 -n10000 --no-tls-proto=h2c http://localhost:9000
cargo build --bin h2load --features h2load --profile profiling --target x86_64-unknown-linux-gnu
samply: Command line CPU profiler.
samply record ./target/x86_64-unknown-linux-gnu/profiling/h2load
callgrind: Gives global, per-function, and per-source-line instruction counts and simulated cache and branch prediction data.
valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --simulate-cache=yes ./target/x86_64-unknown-linux-gnu/profiling/h2load
Compiler flags
The following non-standard options influence the final binary. Only use them if you know what you are doing.
Size
- -C force-frame-pointers=no
- -C force-unwind-tables=no
More size-related parameters can be found at https://github.com/johnthagen/min-sized-rust.
Runtime
- -C llvm-args=-inline-threshold=9999
- -C llvm-args=-enable-dfa-jump-thread
- -C llvm-args=-vectorize-loops
- -C llvm-args=-vectorize-slp
- -C target-cpu=x86-64-v3
Security
- -C control-flow-guard=yes
- -C relocation-model=pie
- -C relro-level=full
- -Z stack-protector=strong