Hacker News

Lies we tell ourselves about email addresses

165 points by theanonymousone

by gerdesj

4 subcomments

Email is just like physical mail and thankfully just as endearingly human (sometimes).
Once upon a time (1970/80s) I lived on and off in a mystic land called West Germany. Our postal addresses ended with incantations such as BFPO 40.
Around 1985ish my granny sent a Christmas card to us. I should note that she was at this time nearly seventy and sadly suffering from Parkinson's. She addressed the card, in rather crabbed but legible handwriting, to:
Graham and Heath BFPO 40
My mum's name is abbreviated - her daughter. At that time Rheindahlen (nr Moenchengladbach) had a pretty large contingent of Brits in it - it was HQ (BAOR).
The card arrived well before Chrimbo and it took about a week judging by the post mark, which was pretty normal in those days. She shoved it into a post box in Ipplepen, nr Newton Abbot, Devon and it found its way to an obscure address in another country. I seem to recall she also forgot the stamp but it still got through.
I'm sure mail like that becomes a point of honour to deliver and HM PO and BFPO did the job admirably.
That attitude is how email MTAs are generally designed to work. They cling on to the good old days and sadly the world is a bit shit. Case sensitivity ... lol!

by frereubu

4 subcomments

We have a UK client in the healthcare industry who registered the domain clientname.healthcare, and they rapidly found that the NHS imposed regexes which rejected name@clientname.healthcare emails.
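For illustration only, here is a sketch of the kind of pattern that causes this. The NHS regexes themselves are not shown in this thread, so `LEGACY` is a hypothetical example of a validator that assumes a TLD is 2 to 4 letters long, and `LOOSER` is an equally hypothetical variant that drops that upper bound.

```python
import re

# Hypothetical legacy validator: assumes every TLD is 2-4 ASCII letters.
LEGACY = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")

# Hypothetical looser variant: any TLD of two or more letters.
LOOSER = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def accepts(pattern: re.Pattern, address: str) -> bool:
    """Return True if the pattern matches the whole address."""
    return pattern.match(address) is not None


if __name__ == "__main__":
    for addr in ("name@clientname.com", "name@clientname.healthcare"):
        print(addr, accepts(LEGACY, addr), accepts(LOOSER, addr))
```

The ten-letter TLD fails the `{2,4}` bound even though nothing else about the address is unusual.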
Aside from regexes though, I also think the new TLDs confuse quite a lot of people. name@clientname.healthcare just doesn't click as an email address as quickly as name@clientname.com, and I'm in tech so I'm sure it's much more confusing for people outside that space.
In fact, that reminds me that we built a site for another client for use inside an exhibition space which was spacename.house and against our advice they put that - without www or https:// - on exhibition panels for use on mobile phones. I am absolutely convinced that most people didn't realise it was a web address.

by mmh0000

0 subcomment

https://fightingforalostcause.net/content/misc/2006/compare-...
This is one of my favorite articles on validating emails using RegEx, I fondly remember reading it over 15 years ago. It's stuck with me ever since.

by greengreengrass

4 subcomments

I, too, get so frustrated by + addresses not working that I’ve configured my MDA to rewrite -- (double hyphen) to plus, and use this out of spite on sites that dislike the + variant. I’ve made it impossible to /not/ host my own mail delivery infrastructure now if I want every address I’ve ever given out to still work.
Although more recently I’ve moved to a catch all domain for throwaway, which is even better. It confuses agents on the phone though when I give my email address as {their company name}@mydomain.com
Yeah, most people don’t understand how the ownership and control varies before and after the @ symbol.
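A rough sketch of the double-hyphen rewrite described above. The actual MDA configuration is not given in the comment, so the function name and the choice to rewrite only the first double hyphen in the local part are assumptions.

```python
def rewrite_double_hyphen(address: str) -> str:
    """Map 'user--tag@domain' to 'user+tag@domain' (hypothetical helper).

    Only the local part is touched, and only the first '--' is rewritten.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        # No '@' at all: leave the input alone rather than guess.
        return address
    return local.replace("--", "+", 1) + at + domain


if __name__ == "__main__":
    print(rewrite_double_hyphen("me--shop@example.org"))
```

Leaving the domain untouched matters here, because a double hyphen is legitimate in domain names (punycode labels start with `xn--`).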

by riddley

8 subcomments

I have a gmail address that at least three other people think is their address. I constantly get emails for the dumb stuff they sign up for. NONE of them ever have an "I didn't request this" link. I mean, I get it. That won't make them money, but oh man is it annoying.

by farfatched

5 subcomments

> It’s likely that more people out there are being filtered by badly-implemented form validation than there are being filtered by their own need of hand-holding.
I wish this was asserted with evidence. The author might suggest this because they have unrealistic views of some users.
> In the year of our lord 2026, you can reasonably expect your users to know how to type their own email address - or even better, auto-input from their OS, browser, keyboard app, or password manager.
This really depends on who your users are.
I have multiple family members who have healthy memory, but can't accurately remember their email address every time: the localpart, the domain, the syntax, everything.
Sending an email verification isn't sufficient, because if the user has typo'd ".com", they might never receive that email, and the user might never be back, or then have to escalate to support.
Meanwhile, if a site is opinionated on TLDs, they might prevent those users facing issues.
I'm sure there are many sites where users have a large variety of odd email addresses, but also there are sites that cater to mostly non-technical users within 1-2 locales, and so may find the friendliest UX is having opinionated validation.
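One middle ground between rejecting and accepting is to suggest a correction instead. The sketch below uses Python's `difflib` for that; the `COMMON_DOMAINS` list and the 0.8 cutoff are assumptions picked for illustration, and a real site would tune both to its own user base.

```python
import difflib

# Hypothetical list of domains this site's users mostly have.
COMMON_DOMAINS = ["gmail.com", "outlook.com", "yahoo.com"]


def suggest_domain(address, known=COMMON_DOMAINS):
    """Return a 'did you mean' address, or None if no suggestion applies."""
    local, at, domain = address.rpartition("@")
    domain = domain.lower()
    if not at or domain in known:
        return None
    # Closest known domain by similarity ratio, if it is close enough.
    close = difflib.get_close_matches(domain, known, n=1, cutoff=0.8)
    return f"{local}@{close[0]}" if close else None


if __name__ == "__main__":
    print(suggest_domain("ann@gmial.com"))
    print(suggest_domain("ann@example.org"))
```

Because this only suggests and never blocks, a user with an unusual but correct domain can still sign up.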

by davidw

1 subcomments

[Old man voice] Back in my day these kinds of articles loved pointing out that, well, the email address could be a UUCP address and that's a whole different parsing situation.
Of course, even then in the mid-90s, UUCP was not something one really encountered outside of "so you think you're going to parse an email address with regexp?!" articles.
https://en.wikipedia.org/wiki/UUCP#Mail_routing

by Freak_NL

2 subcomments

This is all old hat, unfortunately, and also a thing which will be gotten wrong by developers for years to come. Just shouting 'give me a regex for validating email addresses' will make an LLM like ChatGPT happily output bullshit suggesting some overlong regex which is flawed precisely as outlined by the linked article, even though no one is arguing for those long unmaintainable regexes once they've seen the light.
Ah well.
Where there is still room for improvement is in how email addresses are often made a little bit anonymous by a lot of websites. Did you ever see something like 'j*h@gmail.com'? Oh wow, that neatly leaves out John Smith's full name! Like showing only the last four numbers of an IBAN or credit card.
Except for us edge cases with a personal domain, where I then get 'm*l@myfullname.nl'. So stop that. Store it next to the bit of knowledge about validating email addresses — the bits of knowledge you use to correct junior developers and senior idiots.
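To make the leak concrete, here is a sketch of the usual masking approach. The exact rule websites use is not given above, so keeping the first and last character of the local part is an assumption.

```python
def mask_local(address: str) -> str:
    """Mask only the local part, as many sites do (hypothetical rule)."""
    local, at, domain = address.rpartition("@")
    if len(local) <= 2:
        masked = "*" * len(local)
    else:
        masked = local[0] + "*" + local[-1]
    # The domain is returned untouched, which is exactly the problem
    # when the domain itself is the person's name.
    return masked + at + domain


if __name__ == "__main__":
    print(mask_local("mail@myfullname.nl"))
```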

by rock_artist

3 subcomments

> In that sense, it’s actually pretty surprising that so much of the world’s population wasn’t able to put their own name, in its native written form, in an email address until just 14 years ago.
Maybe for some internal usages. But imagine someone from a country using a different language and characters gives me a card with their email. It's now far less portable for me to use. These days, I could surely take a picture of it and most likely get the email right.
But email as a means of international communication, like a passport, should be as readable as possible or it kills its purpose.
Even with ASCII emails I have, I already sometimes struggle to pass them over phone or other methods :)

by SeanLuke

2 subcomments

These are waaay too complicated. Web developers can't even handle the easy stuff. My email address is of the form sean@foo.bar.baz, and email address validators on websites reject my address about 30% of the time because it has two periods.

by sylware

0 subcomment

Many email servers do forget about email addresses with IP literals, which are for people who self-host without paying for DNS.
mailbox@[x.x.x.x] and mailbox@[ipv6:...] (and probably without "ipv6" prefix once ipv4 is gone).
This is stronger than SPF since the second the IP of the sending SMTP server does not match the IP in the "from" headers and the envelope, the email is dropped, not even going into spam.
For instance, currently, if I send an email to a gmail slave, their parsers will ask for... a DNS PTR record, Oo "Geniuses" at work, or conveniently breaking all interop with small tech?
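As a sketch of recognising the two literal forms mentioned above, the standard `ipaddress` module can parse the part between the brackets. The function name is made up for this example, and it only covers the IPv4 form and the `IPv6:`-tagged form from RFC 5321, nothing else.

```python
import ipaddress


def parse_address_literal(domain: str):
    """Return an IPv4Address/IPv6Address for '[...]' literals, else None."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return None
    inner = domain[1:-1]
    try:
        if inner.lower().startswith("ipv6:"):
            return ipaddress.IPv6Address(inner[5:])
        return ipaddress.IPv4Address(inner)
    except ipaddress.AddressValueError:
        # Brackets present, but the content is not a valid IP address.
        return None


if __name__ == "__main__":
    print(parse_address_literal("[192.0.2.1]"))
    print(parse_address_literal("[IPv6:2001:db8::1]"))
    print(parse_address_literal("example.com"))
```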

by atoav

1 subcomments

I think most of these issues are easy to resolve by being more permissive and supporting what the technical standard allows for.
The Big Problem™ however is case sensitivity in the local-part, because there multiple incompatible things collide:
1. Users are not universally aware of case (in)sensitivity in one direction or the other
2. Existing systems may or may not interpret case at all
My preferred solution would be to adjust the standard to ignore case in the local part by forcing it to lowercase. That aligns with most of the systems and mental model of technically proficient users anyways. It makes much more sense from a UX standpoint since the goal is to be unambiguous.
If we were to enforce the opposite: case sensitivity in the local part this would have multiple downsides:
1. It is inconsistent with itself by making the local part case sensitive but the host part not, that is harder to explain
2. You have to train users to be precise about case on entry. As someone who worked in IT-support, this is a very bad idea. This includes second-order issues like phishing attacks by sibling emails where just the case differs
3. If your service stores email addresses it will need to know whether that specific Mailserver/client/etc treats the email as case-sensitive or not
In my eyes email servers that allow case sensitive local-parts are functionally broken, even if they don't break any rules.
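The two policies being weighed here differ by one line of code. A minimal sketch, with both function names invented for the example: the first lowercases only the host part (which is case-insensitive anyway), the second applies the "force everything to lowercase" preference described above.

```python
def normalize_domain_only(address: str) -> str:
    """Lowercase the host part, keep the local part byte-for-byte."""
    local, at, domain = address.rpartition("@")
    return local + at + domain.lower()


def normalize_all(address: str) -> str:
    """Lowercase the whole address, treating the local part as case-insensitive."""
    return address.lower()


if __name__ == "__main__":
    addr = "John.Doe@Example.COM"
    print(normalize_domain_only(addr))
    print(normalize_all(addr))
```

A service that stores addresses has to pick one of these before comparing or deduplicating, which is point 3 in the list above.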

by amiga386

9 subcomments

Add the lie "emails are delivered instantly, so the user can click a link I email them within 1 minute"
And the lie "users always read emails on the same device they're logging into a website with"
And the lie "users can always view HTML email so no need to send a plaintext equivalent, especially if I have a long complex URL I want them to click"
And the lie "Clickable links sent in email are more secure than passwords so I'll stop supporting passwords and instead rely on email delivery of a link for all logins. Whoever clicks that link first is definitely the user who wanted to log in"

by julian_t

4 subcomments

"Email addresses always have a 'normal' TLD"
I registered a ".consulting" domain for my little company when they became available, and it has proved highly problematic ever since. Strangely (or perhaps not) it seems to be the larger players that have the most problems. I would at least have expected ISPs and comms companies to keep up with this (looking at you, Three)

by jeffbee

2 subcomments

This article says that Gmail can't handle address literals. I personally wrote the IPv6 address literal support for Gmail, so this annoys me. I just tested it and it shortened "[IPv6:2001:etc:etc::192.etc.etc]" down to "@2001" then generated an extremely terse mail delivery subsystem notification that I've never seen before. Which is why you should never just rewrite software without understanding why all the test cases are in the test suite!

by ale42

0 subcomment

An e-mail address can have multiple @ also for... source routing. Of course it doesn't make any sense nowadays, but it's technically allowed. RFC 5321 gives an example:
```
  @hosta.int,@jkl.org:userc@d.bar.org
```
This is a valid e-mail address, with source-routing along two intermediate servers. I guess no sane server on the Internet will accept this, but you never know... (this said, I remember attempting this around 1996, when many servers were open relays, and the message was happily delivered after passing through 3-4 servers).

by smelendez

0 subcomment

Another one is that you can tell “professional” from “personal” email addresses or that every address even cleanly fits into just one category.
A lot of small business owners use gmail or a longstanding ISP account. A lot of people have personal email addresses you can’t easily distinguish from professional ones, between college alumni addresses, personal domains, and obscure ISP and email providers that aren’t in your database.

by KurSix

0 subcomment

Email addresses are a great example of boring infrastructure hiding decades of edge cases

by account42

1 subcomments

> It is relatively expensive to run
Compared to sending a mail or to a customer not getting a mail they wanted?
> Try to keep it as non-restrictive as possible. Something like ^[^@]+@[^@\s]+$, which only makes sure your user has input “something@something”
Requiring a dot in the domain part is perfectly valid. It makes no sense to not validate that the address is in a format that you can actually send something to, which includes a domain that you can look up and isn't specifically rejected by your MTA.
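A quick way to see the difference is to run both checks side by side. `PERMISSIVE` is the pattern quoted from the article; `WITH_DOT` is a hypothetical stricter variant written for this example, requiring at least one dot between non-empty labels in the domain.

```python
import re

# Pattern quoted from the article.
PERMISSIVE = re.compile(r"^[^@]+@[^@\s]+$")

# Hypothetical variant: domain must be two or more dot-separated labels.
WITH_DOT = re.compile(r"^[^@]+@[^@\s.]+(\.[^@\s.]+)+$")


def check(address: str):
    """Return (permissive_ok, with_dot_ok) for one address."""
    return (
        PERMISSIVE.match(address) is not None,
        WITH_DOT.match(address) is not None,
    )


if __name__ == "__main__":
    for addr in ("ben@net", "ben@example.net", "sean@foo.bar.baz"):
        print(addr, check(addr))
```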
> This belief will probably be more commonly held in the English-speaking world, but I’m curious: If you’re not in the Anglosphere, do you still expect emails to require ASCII latin characters?
Yes, I do not trust Unicode with all its ambiguities and alternate forms to resolve to the same identifier on your end as the one I intended. ASCII-only email addresses are the norm everywhere I have seen.

by atoav

2 subcomments

One thing I have learned about verification is:
Don't just put a link into your mail that directly verifies an email when visited. At least put some button or code input field there.
Why? There are mail clients that will automatically open links for users and if that link is now invalid the user is confused about being able to click them.

by dathinab

1 subcomments

> Note: I have struggled to verify this one, and it’s possible I’m actually misreading the RFC.
Is correct, you can have quoted local parts and (I guess?) theoretically "foo"@mail and foo@mail should even be treated the same.
But practically this is a dead feature and probably should be treated as non existing.
AFAIK `[<ip-address>]` mails are used by some old data centers for delivering automatically generated "error" mails from Unix servers in a way which doesn't break when DNS is down.
Also interestingly the `[..]` syntax has a generic extension hook, and that hook allows usage of @ characters. So technically a `foo@[custom:@@@@@@@@]` is a valid mail address, just no one knows how to deliver it ;). (And `custom` must be registered with IANA, theoretically).

by dathinab

0 subcomment

> Punycode [...] and the local-part was still limited to ASCII.
the funny part is this is only half true
The true part: Punycode has never been standardized for the localpart and as such taking an email address with non us-ascii characters in the local part and punycode encoding it is fundamentally wrong.
But: Nothing prevents you from having a local part which "happens" to look like punycode and especially in the early SMTPUTF8 days many providers which did allow non-us-ascii email local parts automatically created an "alias" email address where the local part was punycode encoded. Nothing in the standard prevents this and as consequence punycode encoding a local part _might_ just happen to work for some subset of non-us-ascii emails.
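A small sketch of the "true part": encode only the domain and leave the local part alone. It uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules, so treat it as an illustration rather than a complete IDNA implementation; the function name is invented here.

```python
def to_ascii_domain(address: str) -> str:
    """Punycode-encode the domain only; the local part is never touched."""
    local, at, domain = address.rpartition("@")
    if not at:
        return address
    # Built-in codec (IDNA 2003); each non-ASCII label becomes 'xn--...'.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + at + ascii_domain


if __name__ == "__main__":
    print(to_ascii_domain("info@b\u00fccher.example"))
    print(to_ascii_domain("j\u00f6rg@b\u00fccher.example"))
```

In the second call the local part stays non-ASCII, so delivering to it still needs SMTPUTF8 support along the way.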

by tracker1

0 subcomment

I have a relatively good email address, and more than a handful of people who don't seem to understand email, just use my address... I've had payment confirmations from mlb.com orders, to tractor supply receipts and junk mail, to student loan paperwork. It's amazing how much garbage I see all because nobody actually confirms email address ownership before signing people up for crap.
The worst is some foreign gambling site, I can't even log into to change the preferences and cancel the account.
Though, I did deface then delete someone's dating profile once, who signed up on an app with my email...

by alkonaut

0 subcomment

Validation to avoid mistakes is, as they point out, good. I'd even go so far as to extend it so that I reject those without any tld (without any dots) just because it's 99.999% a mistake and I don't care about the person who has ben@net. I'd also reject ip numbers.
Next is the spicy take: I need to consider WHY I am gathering this email?
If I'm gathering it for "marketing purposes" or any such cross correlation to other systems, then I'd also reject bob.smith+dontspamme@gmail.com. Or I'd keep both so you can do cross referencing on both the + address and the "raw" one.
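For the "keep both" option, a sketch of deriving the "raw" address from a + address. The helper name is made up, and it assumes the provider treats `+` as a subaddress separator, which holds for Gmail but is not universal.

```python
def strip_plus_tag(address: str) -> str:
    """Drop everything from the first '+' in the local part (assumed subaddress)."""
    local, at, domain = address.rpartition("@")
    base, _, _tag = local.partition("+")
    return base + at + domain


if __name__ == "__main__":
    given = "bob.smith+dontspamme@gmail.com"
    # Store both: what the user typed, and the key used for cross-referencing.
    print(given, strip_plus_tag(given))
```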

by dvh

0 subcomment

It's not lies. And it's not about me either. If I collect email address, it will be used somewhere, someday, in god knows what app. If I'm the one collecting the email, I will make it as restrictive as possible so that it doesn't cause issues down the line. If it's too different than John.Doe_123@example.com, it's best to reject it.
For robust systems the goal was never to allow the user to type any technically valid email. It is to allow only emails that will not cause issues in the future.

by forgetfreeman

1 subcomments

"Regex is hard, regex wizardry is rare, and regex engine implementations are inconsistent. It’s very, very easy to accidentally get it wrong without realizing it."
The what now? I'm struggling to take this seriously because a decade ago regexes were common knowledge, like if you don't have a handle on this you should probably go get a job in marketing levels of common knowledge. Has the profession fallen off this far in ten years?

by adamzwasserman

0 subcomment

I enjoyed the deep dive. A lot of sensible advice. A lot of articles do not get as much of that right as this article does.
Anyone who also enjoyed it would probably get a kick out of my article on the same subject that goes into the regex (which has some valid use cases): https://hackernoon.com/on-the-practicality-of-regex-for-emai...

by p0w3n3d

1 subcomments

You really did -- in your domain name, didn't you?

by croes

0 subcomment

And even if you know that an email address is perfectly valid it still could be simply wrong because of a tpyo.

by sohex

0 subcomment

IIRC in terms of clients mutt (&co) will actually handle “@” in the local part correctly.
> But the real reason I do that is just because I just like to sit in anger whenever this breaks the user experience because of programming errors or inconsistencies.
Genuinely delighted by the fact that I’m not alone in that.

by UltraSane

0 subcomment

I own a domain I use for email and I have it configured to deliver ANY address that ends with @mydomain. This works like + addressing on steroids. I can have website@mydomain or recipient@mydomain and it makes filtering much easier.

by Const-me

0 subcomment

Good article. Worth noting C# standard library handles most of that complexity, no regular expressions required. Call System.Net.Mail.MailAddress.TryCreate, if successful read Address property to find the normalised address.

by miningtcup

2 subcomments

I would like to point out that the "suggested" validation pattern, ^[^@]+@[^@\s]+$, can filter out valid addresses. "user@something"@example.com is a valid address, and excluding @'s in the user part rejects it.
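This is easy to check. The sketch below runs the quoted pattern against that address and, for contrast, splits on the last `@` instead of forbidding `@` in the local part. The splitting helper is a naive illustration written for this example, not a full RFC 5321 parser.

```python
import re

# Pattern quoted from the article.
PERMISSIVE = re.compile(r"^[^@]+@[^@\s]+$")


def split_on_last_at(address: str):
    """Split into (local, domain) at the last '@' (naive, hypothetical helper)."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not of the form local@domain: {address!r}")
    return local, domain


if __name__ == "__main__":
    addr = '"user@something"@example.com'
    print(PERMISSIVE.match(addr))
    print(split_on_last_at(addr))
```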

by mesrik

0 subcomment

There is one more 'lie' missing from that article, which only looks at the limits of valid destination addresses.
But when used as a sender's source address there are even fewer limits.
For example, you can use a null address <> when sending. It is used a bit less these days than it used to be. It has been used for ages for SMTP delivery status notifications, mail loop prevention and so on, where it intentionally makes little sense to expect anyone to reply. All the well-known MTAs forward it, and email clients handle it very well by disabling reply for that message.
There is however a catch for anyone who thinks they will now start using it when they don't want any reply. Ever since IT Service Management (ITSM) and Service Desk software appeared, they have had issues with email coming from the <> sender, because they like to always add the email addresses of received messages to a database, so that whoever handles the message can reply. I've used only a few: Service Now (SN) more lately, and Issue Tracker (IT) before that. As of about a year and a half ago, neither knew how to handle null sender addresses. Both seemed to just discard those emails or sort them into some trash bin. Our SN sysadmin couldn't find where they went in that system.
But otherwise <> as a sender works great. And sure, it would be great if the folks making ITSM software would get this fixed, because role aliases such as postmaster are quite often handled by ITSM software, so there is a good chance you won't get some important notifications from systems that rely on the null sender address.
ps. Search Google for: smtp and sender address as "<>" for more info in case needed.

by TZubiri

1 subcomments

Soooo, let's just send a validation email and if they confirm the code, then it's a valid email?
Functionally there's no false positives or false negatives

by jrrv

0 subcomment

> In the year of our lord 2026, you can reasonably expect your users to know how to type their own email address
Lies we tell ourselves about users.

by echoangle

0 subcomment

Maybe I'm taking this too lightly but honestly, if you're playing games with your email address and then don't get my verification mail, it's kind of a you problem. If your email address contains non-printable unicode characters or an IP address as the domain part, I don't really care enough to add support just for you. And surely everyone who does this has a "normal" email as a fallback anyways.

by teo_zero

0 subcomment

The plus sign is a pet peeve of mine, too. But I stopped keeping a list of bad sites when their number has become double digit!

by chrisandchris

0 subcomment

> It turns out that allowing senders to omit dots is common but by no means universal!
I think this is mostly common with Gmail-heavy countries and does not apply to Europe? At least I do not know of anyone that thinks so.

by jiveturkey

3 subcomments

> TL;DR: Don't overthink it, just send a verification email.
pretty bad advice, if taken only as written, without adding more flavor on top.
the major email providers will penalize you if you generate too many undeliverable emails. thus, if you just send a verification email without any pre-validation, it's pretty easy to get into a DoS situation where current/valid users don't get important email sent to them, or that email is significantly delayed, plus incur huge operating cost to resolve the problem.
some form of rate limiting is needed, plus IMHO it's better to use a verifier service or your own heuristic or ML model to test for email validity including valid but fake/spammy/disposable addresses.
sorry, but we are way past the point of being able to have nice things, esp. when we're talking about email.
the "lies" part of the content is great. people do assume all those wrong things. however the TLDR is just wrong, and potentially harmful.
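as a rough sketch of the rate limiting part only (not the verifier service or the ML model), a token bucket in front of the "send verification email" call could look like this. the class, its parameters and the idea of one bucket per client are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether sending is allowed."""
        now = self.clock()
        refill = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=0.1, capacity=3)
    print([bucket.allow() for _ in range(5)])
```

keyed per IP or per session, this caps how many undeliverable verification mails one abusive client can make you send.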

by ashley95

2 subcomments

This is cute and all. But for anyone coming here for real-world advice: just use a regex, normalize to lowercase, and surface any errors to users so they know if their email got rejected. This will avoid 99.9% of issues and work for 100% of real human users. This is what everyone else does, and if you have a user with an esoteric email, they will still be able to furnish another one that passes this validation.