(discussed previously on HN 5 years ago – https://news.ycombinator.com/item?id=23775404 – and 10 years ago – https://news.ycombinator.com/item?id=9338708)
"10 years ago we couldn't send an email 500 miles, but these days we can't send it 500 miles because it just routes internally."
Too bad; I think that would have been more interesting to read.
Obviously? I think I've had this phone call myself a few times. In my experience it was never from a statistician, and they didn't give me as much data, but I'm pretty sure the story is mostly accurate.
> I think this is nonsense... why would an invalid or incomplete sendmail configuration default to three milliseconds?
This is a wonderful question, perhaps much more interesting than anything else on the page, but first, let's reproduce the timing:
My desktop, a 2017 Xeon E7-8880 (144 cores at 2.3 GHz; 1 TB RAM), with a load of 2.26 at this moment:
$ time sleep 0.001
real 0m0.004s
user 0m0.001s
sys 0m0.003s
On my i9-10900K (3.7 GHz), with a current load of 3.31:
$ time sleep 0.001
real 0m0,002s
user 0m0,000s
sys 0m0,001s
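The `time sleep 0.001` figures include starting the sleep binary. A minimal sketch that makes the same 1 ms request from inside a single process, assuming a POSIX system with CLOCK_MONOTONIC (now_ns and timed_sleep_ns are hypothetical helper names):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Ask nanosleep() for req_ns nanoseconds and return how long the call
   really took, measured inside this process, so no fork/exec is counted. */
static int64_t timed_sleep_ns(long req_ns) {
    struct timespec req = { .tv_sec = req_ns / 1000000000L,
                            .tv_nsec = req_ns % 1000000000L };
    struct timespec rem;
    int64_t start = now_ns();
    /* If a signal interrupts the sleep, nanosleep() reports the remaining
       time in rem, so sleep again for just that much. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
    return now_ns() - start;
}
```

nanosleep() promises to sleep at least as long as asked, so timed_sleep_ns(1000000) returns at least 1,000,000; how much more it returns depends on things like timer slack (on Linux), scheduling and load, which is likely the part that differs between the two machines above.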
(In case you think I'm measuring exec: time /bin/echo returns 0s on both machines.)
Now, as to why this is: to understand that, you need to understand how connect() actually works and how to create a timeout for connect(). Those skilled in the art know you've got a number of choices for how to do it, but they all involve multiple steps, because connect() does not take a timeout as an argument. Here's one way (not too different from what sendmail does/did):
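int fl = fcntl(f, F_GETFL, 0);
fcntl(f, F_SETFL, fl | O_NONBLOCK);           /* keep existing flags */
if (-1 == connect(f, ...) && errno == EINPROGRESS) {
  fd_set a; FD_ZERO(&a); FD_SET(f, &a);
  struct timeval tv = { .tv_sec = 0, .tv_usec = 0 };  /* zero timeout */
  if (!select(f + 1, NULL, &a, NULL, &tv)) {  /* 0: not connected yet */
    close(f); return error;
  }
}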
If you read this carefully, you only need to ask yourself how much time can pass between the top of connect() and the bottom of select(). If you think it is zero, like tedu does, you will probably have the same surprise: computers are not abstract machines but are made out of matter, powered by energy, and thus subject to the laws of physics, so everything takes time.
For others, the surprise might be that it's still 3 ms over twenty years later, and I think that is a much more interesting subject to explore than whether the speed of light exists.
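The connect()/select() fragment above elides the address and most error handling. As a self-contained sketch, assuming a POSIX system, here is the same non-blocking connect() plus select() pattern with a real timeout and a getsockopt(SO_ERROR) check. connect_with_timeout is a hypothetical name; this is not sendmail's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: connect fd to addr, giving up after timeout_ms.
   Returns 0 on success, -1 with errno set on failure or timeout.
   EINTR from select() is not retried in this sketch. */
static int connect_with_timeout(int fd, const struct sockaddr *addr,
                                socklen_t len, long timeout_ms) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    if (connect(fd, addr, len) == 0)
        return 0;                     /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;                    /* hard failure, e.g. ECONNREFUSED */

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    /* With timeout_ms == 0 this is one non-waiting check: the handshake
       only counts as done if it finished in the time already spent
       between connect() and here. */
    int n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n == -1)
        return -1;
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    /* Writable: ask the socket whether the handshake succeeded. */
    int err = 0;
    socklen_t errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

Passing timeout_ms = 0 turns the select() into a single instantaneous check, which matches the failure mode in the story: only peers whose handshake completes in the time between connect() and select() look reachable. For descriptors at or above FD_SETSIZE, poll() would be the safer choice.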
It has a 500ms timeout to load some settings from a server in the UK via TLS. If it takes more than that 500ms (or thereabouts; the exact cause of the timeout is unclear), the app just vapourises.
That's fine in the UK, but TLS needs about 3 RTTs to complete, so with an RTT above about 160ms it's screwed.
Almost all our users are in the UK, Europe, the Middle East or the US East Coast, all within that 160ms RTT range.
We ran into issues when a dozen people tried to use it in Australia, so the principle still holds with some badly written code.
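The arithmetic behind the 160ms ceiling, as a tiny C sketch. It assumes, like the comment above, that the settings fetch costs about 3 sequential round trips and nothing else (no DNS, no server processing time), so it is an optimistic bound; max_rtt_ms and fits_in_timeout are hypothetical names.

```c
/* Largest round-trip time (ms) that still fits in a fixed timeout when
   the exchange needs `round_trips` sequential round trips. */
static double max_rtt_ms(double timeout_ms, int round_trips) {
    return timeout_ms / (double)round_trips;
}

/* 1 if an exchange of `round_trips` trips at `rtt_ms` each finishes
   within `timeout_ms`, else 0. */
static int fits_in_timeout(double rtt_ms, int round_trips, double timeout_ms) {
    return rtt_ms * round_trips <= timeout_ms;
}
```

With the numbers from the comment, max_rtt_ms(500, 3) is about 166.7 ms, which the comment rounds down to "about 160ms".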
NO. NO NO NO.
How can you get SO MANY facts wrong when the freaking story is googlable?
Here's the original email: https://web.mit.edu/jemorris/humor/500-miles
Here's the FAQ that covers the ambiguous parts: https://www.ibiblio.org/harris/500milemail-faq.html
This annoys me because I know the original author and I remember when this happened (he told the story a few times).
Let's recap:
> there was a university president
NO! It was the chairman of the statistics department.
> who couldn’t send an email more than 500 miles,
True. Being in the statistics department, he had the tools to make actual maps.
> and the wise sysadmin said that’s not possible, so the president said come to my office
Kind of true. There was an office involved.
> and lo and behold, the emails stopped before going 500 miles.
True.
> There’s a lot to the story that’s obviously made up,
NO! Zero of this story was made up.
ALL the people that were involved in the story are still alive. You can literally get them on the phone and talk to them. We're not debating whether or not Han Solo ever used a lightsaber. THIS SHIT REALLY HAPPENED.
Sheesh.
The answer is that per the original story, it was not defaulting to three milliseconds. It was defaulting to 0, and the 3ms was just how long it took the system to check for a response with a 0 timeout:
> Some experimentation established that on this particular machine with its typical load, a zero timeout would abort a connect call in slightly over three milliseconds.
This is a very different scenario. It's not clear there should be a poll() there at all (or, more likely given the age of the story, a select()) to match the original; but if there were, the select would have a timeout of 0, not 3ms, and would just happen to be unable to distinguish between 0 and up to 3ms.
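To see that a zero timeout means "check once, right now" rather than "wait a little", here is a minimal sketch that times a single zero-timeout select() call. It uses a pipe instead of a socket (an assumption, so no network is needed), and mono_ns and zero_timeout_check are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>   /* pipe(), write() for trying it out */

/* Monotonic clock reading in nanoseconds. */
static int64_t mono_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Check fd for readability with a zero timeout: select() looks once and
   returns immediately. Stores select()'s return value in *ready
   (0 = nothing ready, 1 = fd readable, -1 = error) and returns how many
   nanoseconds of wall-clock time that single check took. */
static int64_t zero_timeout_check(int fd, int *ready) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { .tv_sec = 0, .tv_usec = 0 };
    int64_t start = mono_ns();
    *ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    return mono_ns() - start;
}
```

On an empty pipe *ready comes back 0; after one byte is written to the other end it comes back 1. The returned duration is whatever one trip through select() costs on that machine under that load: nobody configured it, yet it is the only "timeout" actually in effect.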
Today the websites are hosted on third-party cloud servers (my school's main website is run by some company that hosts your WordPress or Drupal site so you don't have to), and the email is handled by Microsoft or Google. It seems to be the same for every school. I guess the IT department that used to run all the infrastructure is now probably just a few people in charge of ordering new laptops for faculty and staff when they break, and replacing Wi-Fi access points every 5 years.
Apparently not.
I mean, I want reliability. But I also want Europeans to be able to taste that authentic latency they'd expect from a fledgling startup running out of a garage in San Jose.