This post measures the performance of wtx and other projects to figure out which one is faster. If any metrics or procedures described here are flawed, feel free to point them out.
Unlike autobahn, the standard automated test suite that verifies client and server implementations, there isn't an easy and comprehensive benchmark suite for the WebSocket protocol (at least I couldn't find any), so let's create one.
Enter ws-bench! The program applies three parameters to listening servers, combined in ways intended to resemble what could plausibly happen in a production environment.
|Parameter|Low|Mid|High|
|---|---|---|---|
|Number of connections|1|128|256|
|Number of messages|1|64|128|
|Transfer memory (KiB)|1|64|128|
Number of connections
Tells how well a server can handle multiple connections concurrently. Implementations may be, for example, single-threaded, concurrent single-threaded, or multi-threaded.
In some cases this metric is also influenced by the underlying mechanism responsible for scheduling the execution of workers/tasks.
Number of messages
When a payload is very large, it can be sent as several sequential frames, each holding a portion of the original payload. A unit formed by these smaller frames is called a "message" here, and the number of messages measures the implementation's ability to handle their encoding and decoding as well as the network latency (round-trip time).
It is not rare to hear that the cost of a round trip is higher than the cost of allocating memory, which is generally true. Unfortunately, based on this idea some individuals prefer to indiscriminately call the heap allocator without investigating whether doing so incurs a negative performance impact.
Frames tend to be small, but some applications use WebSocket to transfer various types of real-time blobs. That said, let's investigate the impact of larger payload sizes.
To try to ensure some level of fairness, all six projects had their files modified to remove writes to stdout, enforce optimized builds where applicable, and remove SSL and compression configurations.
The benchmark procedure is quite simple: servers listen for incoming requests on different ports, the ws-bench binary is called with all URIs, and the resulting chart is generated. In fact, everything is declared in this bash script.
Tested on a notebook with an i5-1135G7 CPU, a 256 GB SSD and 32 GB of RAM. Combinations involving the mid values were discarded because they showed near-zero values in all instances.
ws-tools was initially tested but eventually abandoned due to frequent shutdowns. I didn't dive into the root causes, but it can be brought back once the underlying problems are fixed by the authors.
wtx as a whole scored an average of 6350.31 ms, followed by
tokio-tungstenite with 7602.94 ms,
uWebSockets with 8393.94 ms,
fastwebsockets with 10140.58 ms,
gorilla/websockets with 10900.23 ms and finally
websockets with 17042.41 ms.
websockets performed the worst in several tests, but it is unknown whether such behavior could be improved. Perhaps some modification to the
_websockets.py file? Let me know if that is the case.
Among the three metrics, the number of messages was the most impactful because the client always verifies the content sent back by the server, leading to sequential-like behavior. Perhaps the number of messages is not a good parameter for benchmarking purposes.
wtx was faster in all tests and can indeed be labeled the fastest WebSocket implementation, at least according to the presented projects and methodology.