Also, the readme has slightly incorrect logic I think:
> According to Special Relativity, information cannot travel faster than the speed of light. Therefore, if the round trip time (RTT) is 4ms, it's physically impossible for them to be farther than 2 light milliseconds away, which is approximately 600 kilometers.
It calls out the 33% for fiber but ignores that there’s no straight-line path between two points on the network, and there could be wireless, cable, and DSL links somewhere along the route.
Also, the controlled variable here is latency, not distance. Thus you can always increase latency through buffering, and therefore you could be made to appear farther away than you are. And that buffering need not even be intentional - your perceived distance estimate will vary based upon queuing delays in intermediaries depending on the time of day (itself a fingerprint if you incorporate time-aware measurements, but a source of error if you don’t).
Fingerprinting is hard and I dislike the framing that it’s absolutely impossible to mask or that the fingerprint has no false positive and false negative error rates.
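To make the arithmetic in the quoted README passage concrete, here is a minimal sketch of the bound. The function name and the `velocity_factor` parameter are made up for illustration, and the 2/3 figure for fiber is the usual rough approximation. The result is only an upper bound: routing detours, slower links, and queuing all add latency, so the true distance can be far smaller.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, in km per millisecond


def max_distance_km(rtt_ms: float, velocity_factor: float = 1.0) -> float:
    """Upper bound on the one-way distance implied by a round-trip time.

    velocity_factor is the signal speed as a fraction of c
    (1.0 for vacuum, roughly 2/3 for optical fiber).
    """
    if rtt_ms < 0:
        raise ValueError("rtt_ms must be non-negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    one_way_ms = rtt_ms / 2  # half the round trip is spent in each direction
    return one_way_ms * C_KM_PER_MS * velocity_factor


if __name__ == "__main__":
    print(round(max_distance_km(4.0)))          # vacuum bound for a 4 ms RTT
    print(round(max_distance_km(4.0, 2 / 3)))   # tighter bound assuming fiber
```

Nothing here gives a lower bound on distance, which is the point above: added delay only ever loosens the estimate.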
Countermeasure: pick some min-RTT >= the actual client RTT (you can do this as a TCP proxy by measuring client ping). Measure server RTT and artificially delay responses to be >= min-RTT. This will require an added delay during the handshake and ACKs, but no added delay for the response payloads.
Counter-countermeasure: the above may lead to TCP message types that don't make sense given a traditional TCP client state machine (e.g., delayed ACK would bundle ACK and PUSH, but the system shows separate/simultaneous ACK and PUSH packets). Counter-counter-countermeasure is left to the reader.
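The padding step of the countermeasure boils down to one subtraction. A rough sketch, with hypothetical names, assuming the proxy already has a measurement of its own RTT to the server and has picked a target min-RTT at or above the real client RTT:

```python
def padding_delay_ms(target_min_rtt_ms: float, proxy_server_rtt_ms: float) -> float:
    """Extra delay the proxy adds so the server never sees an RTT below the target.

    target_min_rtt_ms:   chosen floor, assumed >= the actual client RTT
    proxy_server_rtt_ms: RTT measured between the proxy and the server
    """
    if target_min_rtt_ms < 0 or proxy_server_rtt_ms < 0:
        raise ValueError("RTT values must be non-negative")
    # If the proxy-to-server leg is already slower than the target, add nothing.
    return max(0.0, target_min_rtt_ms - proxy_server_rtt_ms)


if __name__ == "__main__":
    print(padding_delay_ms(40.0, 12.5))  # delay to apply to handshake packets and ACKs
```

This only computes how long to hold a packet. Actually holding handshake packets and ACKs back without producing the odd packet patterns described above is the hard part and is not shown.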
The difference between the min TCP RTT and the min RTT to respond to a websocket payload is a dead giveaway that there's a middlebox terminating TCP somewhere along the path. You can bypass this by sourcing your request within 30ms of wherever TCP is being terminated; anything under that threshold could be caused by regular noise and isn't a reliable fingerprint. Because of how many gateways there are between you and a residential proxy exit node, this makes fingerprinting them extremely easy.
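As a sketch of that comparison (the function name and sample values are invented; the 30ms default is just the noise threshold mentioned above, not a measured constant):

```python
from typing import Sequence


def tcp_termination_suspected(
    tcp_rtts_ms: Sequence[float],
    app_rtts_ms: Sequence[float],
    threshold_ms: float = 30.0,
) -> bool:
    """Flag a connection whose application-level RTT far exceeds its TCP RTT.

    tcp_rtts_ms: RTT samples taken at the TCP layer (e.g. handshake or ACK timing)
    app_rtts_ms: RTT samples for a websocket request/response round trip
    """
    if not tcp_rtts_ms or not app_rtts_ms:
        raise ValueError("need at least one sample of each kind")
    # Use the minimum of each series to discard queuing noise.
    gap_ms = min(app_rtts_ms) - min(tcp_rtts_ms)
    return gap_ms > threshold_ms


if __name__ == "__main__":
    # TCP is acknowledged by something close by, but the payload answer takes much longer.
    print(tcp_termination_suspected([5.1, 6.0, 5.4], [92.0, 88.5, 101.2]))
    # Both layers agree to within noise.
    print(tcp_termination_suspected([20.3, 21.0], [24.9, 31.0]))
```

Taking minimums rather than averages matters here: queuing inflates individual samples, but it cannot make a round trip faster than the path allows.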
I expect it won't be long until someone deploys the first proxy service that handles the initial CONNECT payload in the kernel before offloading packet forwarding to an eBPF script that will proxy packets between hosts at layer 3, making this fingerprinting technique obsolete. The cat and mouse game continues.
I suppose it's possible botnets ("residential proxies") may get detected this way if they're using SOCKS to forward requests?
Still, this looks like an interesting signal to add to a system like Anubis to increase the difficulty for suspicious traffic sources.
This does very reliably detect Tor traffic, though you can just download a list of exit nodes if that's what you want.
<html><body><h1>You don't seem to be using a TCP Proxy!</h1><p>(If you are using a VPN or any other kind of proxy that is not a TCP Proxy, this will not detect it)</p></body></html>
When deployed on a popular server, one bit of "IP intelligence" this detector itself can gather is to keep a database of the lowest-seen RTT per source IP, maybe with some filtering to cut out "faster-than-light" datapoints, gracefully update when the actual network topology changes, etc.
That would establish a baseline, and from there, additional end-to-end RTT should become much more visible.
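A bare-bones sketch of such a baseline, kept in memory. The class name, the `floor_ms` cutoff for implausible samples, and the example address are all hypothetical, and the aging needed to follow topology changes is left out:

```python
from typing import Dict, Optional


class RttBaseline:
    """Tracks the lowest plausible RTT seen per source IP."""

    def __init__(self, floor_ms: float = 0.1) -> None:
        self.floor_ms = floor_ms  # samples below this are treated as measurement errors
        self._min_rtt: Dict[str, float] = {}

    def observe(self, ip: str, rtt_ms: float) -> Optional[float]:
        """Record a sample; return its excess over the baseline in ms.

        Returns None when the sample is discarded as implausibly fast.
        """
        if rtt_ms < self.floor_ms:
            return None
        best = self._min_rtt.get(ip)
        if best is None or rtt_ms < best:
            self._min_rtt[ip] = rtt_ms  # new lowest-seen value becomes the baseline
            return 0.0
        return rtt_ms - best


if __name__ == "__main__":
    baseline = RttBaseline(floor_ms=1.0)
    baseline.observe("203.0.113.7", 20.0)         # first sample sets the baseline
    print(baseline.observe("203.0.113.7", 65.0))  # extra latency relative to the baseline
```

A real deployment would need persistence and some way to let the stored minimum expire, since a pure minimum never rises when a route gets longer.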
Also available as an audiobook, and as a documentary ("The KGB, The computer and Me"). https://www.youtube.com/watch?v=Xe5AE-qYan8