> The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.
The "possibly preface" (sic!) to me is obviously to be understood as "if there are any CNAME RRs, the answer to the query is to be prefaced by those CNAME RRs" and not "you can preface the query with the CNAME RRs or you can place them wherever you want".
https://blog.cloudflare.com/zone-apex-naked-domain-root-doma... , and I quote directly ... "Never one to let a RFC stand in the way of a solution to a real problem, we're happy to announce that CloudFlare allows you to set your zone apex to a CNAME."
The problem? CNAMEs are name level aliases, not record level, so this "feature" would break the caching of NS, MX, and SOA records that exist at domain apexes. Many of us warned them at the time that this would result in a non-deterministic issue. At EC2 and Route 53 we weren't supporting this just to be mean! If a user's DNS resolver got an MX query before an A query, things might work ... but the other way around, they might not. An absolute nightmare to deal with. But move fast and break things, so hey :)
In earnest though ... it's great to see how Cloudflare are now handling CNAME chains and A record ordering issues in this kind of detail. I never would have thought of this implicit contract they've discovered, and it makes sense!
"With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."
combined with failure to follow Postel's Law:
"Be conservative in what you send, be liberal in what you accept."
That seems like some doubling-down BS to me, since they earlier say "It's ambiguous because it doesn't use MUST or SHOULD, which was introduced a decade after the DNS RFC." The RFC says:
> The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.
How do you get to interpreting that, in the face of "MUST" being defined a decade later, as "I guess I can append the CNAME to the answer"?
Holding onto "we still think the RFC allows it" is a problem. The world is a lot better if you can just admit to your mistakes and move on. I try to model this at home and at work, because trying to "language lawyer" your way out of being wrong makes the world a worse place.
$ echo "A AAAA CAA CNAME DS HTTPS LOC MX NS TXT" | sed -r 's/ /\n/g' | sed -r 's/^/rfc1034.wlbd.nl /g' | xargs dig +norec +noall +question +answer +authority @coco.ns.cloudflare.com
;rfc1034.wlbd.nl. IN A
rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
;rfc1034.wlbd.nl. IN AAAA
rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
;rfc1034.wlbd.nl. IN CAA
rfc1034.wlbd.nl. 300 IN CAA 0 issue "really"
;rfc1034.wlbd.nl. IN CNAME
rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
;rfc1034.wlbd.nl. IN DS
rfc1034.wlbd.nl. 300 IN DS 0 13 2 21A21D53B97D44AD49676B9476F312BA3CEDB11DDC3EC8D9C7AC6BAC A84271AE
;rfc1034.wlbd.nl. IN HTTPS
rfc1034.wlbd.nl. 300 IN HTTPS 1 . alpn="h3"
;rfc1034.wlbd.nl. IN LOC
rfc1034.wlbd.nl. 300 IN LOC 0 0 0.000 N 0 0 0.000 E 0.00m 0.00m 0.00m 0.00m
;rfc1034.wlbd.nl. IN MX
rfc1034.wlbd.nl. 300 IN MX 0 .
;rfc1034.wlbd.nl. IN NS
rfc1034.wlbd.nl. 300 IN NS rfc1034.wlbd.nl.
;rfc1034.wlbd.nl. IN TXT
rfc1034.wlbd.nl. 300 IN TXT "Check my cool label serving TXT and a CNAME, in violation with RFC1034"
The result is that DNS resolvers (including CloudFlare Public DNS) will give a cache-dependent result if you query e.g. a TXT record (depending on whether the resolver has the CNAME cached).
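To make the cache dependence concrete, here is a rough toy model in Python. It is only a sketch: `ToyResolver` and the `AUTHORITATIVE` table are made-up names, the TXT value is a placeholder, and real resolvers are far more involved. It assumes the authoritative side answers the exact type when it has one and falls back to the CNAME otherwise (as in the dig output above), and that the resolver follows a cached CNAME for every type at that name.

```python
# Toy model only: hypothetical names, placeholder data.
# The name has both a CNAME and a TXT record, which RFC 1034 forbids.
AUTHORITATIVE = {
    ("rfc1034.wlbd.nl.", "CNAME"): "www.example.org.",
    ("rfc1034.wlbd.nl.", "TXT"): "placeholder text",
}


class ToyResolver:
    def __init__(self):
        self.cache = {}

    def query(self, name, rtype):
        # A cached CNAME aliases the whole name, so it wins for every type.
        if (name, "CNAME") in self.cache:
            return ("CNAME", self.cache[(name, "CNAME")])
        # Otherwise use a cached exact answer if there is one.
        if (name, rtype) in self.cache:
            return (rtype, self.cache[(name, rtype)])
        # Cache miss: the "authoritative server" returns the exact type
        # if present, else the CNAME; cache whatever came back.
        for t in (rtype, "CNAME"):
            if (name, t) in AUTHORITATIVE:
                self.cache[(name, t)] = AUTHORITATIVE[(name, t)]
                return (t, AUTHORITATIVE[(name, t)])
        return None


name = "rfc1034.wlbd.nl."

# Cold cache: the TXT query sees the TXT record.
cold = ToyResolver()
txt_first = cold.query(name, "TXT")

# Same resolver code, but an A query ran first and cached the CNAME.
warm = ToyResolver()
warm.query(name, "A")
txt_after_a = warm.query(name, "TXT")
```

The two TXT lookups go through identical code and differ only in what was asked earlier, which is the kind of non-determinism described here.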
At internet.nl (https://github.com/internetstandards/) we found out because some people claimed to have some TXT DMARC record, while also CNAMEing this record (which results in cache-dependent results, and since internet.nl uses RFC 9156 QName Minimisation, it first resolves A, and therefore caches the CNAME and will never see the TXT). People configure things similar to https://mxtoolbox.com/dmarc/dmarc-setup-cname instructions (which I find in conflict with RFC1034).
> If recursive service is requested and available, the recursive response to a query will be one of the following:
> - The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.
> While "possibly preface" can be interpreted as a requirement for CNAME records to appear before everything else, it does not use normative key words, such as MUST and SHOULD that modern RFCs use to express requirements. This isn’t a flaw in RFC 1034, but simply a result of its age. RFC 2119, which standardized these key words, was published in 1997, 10 years after RFC 1034.
It's pretty clear that CNAME is at the beginning.
The "possibly" does not refer to the order but rather to the presence.
If they are present, they are first.
That's the only reasonable conclusion, really.
It’s always DNS.
Please order the answer in the order the resolutions were performed to arrive at the final answer (regardless of cache timings). Anything else makes little sense, especially not in the name of some micro-optimization (which could likely be approached in other ways that don’t alter behaviour).
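As a sketch of what "order of resolution" could mean on the sending side (Python, with a hypothetical `order_answer` helper and made-up example names, not anything Cloudflare has published): walk the CNAME chain from the query name, emit each alias as it is followed, then emit the records at the final name. It assumes records are `(owner, type, rdata)` tuples with already-normalized names.

```python
def order_answer(qname, records):
    """Return records with CNAMEs in chain order, then the final RRset.

    records: list of (owner, rtype, rdata) tuples (assumed shape).
    Records not reachable from qname are dropped in this sketch.
    """
    cname_by_owner = {
        owner: (owner, rtype, rdata)
        for owner, rtype, rdata in records
        if rtype == "CNAME"
    }

    ordered = []
    current = qname
    seen = set()  # guards against CNAME loops
    while current in cname_by_owner and current not in seen:
        seen.add(current)
        rec = cname_by_owner[current]
        ordered.append(rec)
        current = rec[2]  # follow the alias to its target

    # Finally, the non-CNAME records owned by the end of the chain.
    ordered.extend(r for r in records if r[1] != "CNAME" and r[0] == current)
    return ordered


# Example: records arrive with the address first and the chain reversed.
shuffled = [
    ("cdn.example.net.", "A", "192.0.2.1"),
    ("edge.example.net.", "CNAME", "cdn.example.net."),
    ("www.example.com.", "CNAME", "edge.example.net."),
]
in_order = order_answer("www.example.com.", shuffled)
```

The output order depends only on the chain itself, not on which records happened to be cached or fetched first.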
There's also so much of it, and it mostly works, most of the time. This creates a hysteresis loop in human judgement of efficacy: even a blind chicken gets corn if it's standing in it. Cisco bought cisco., but (a decade ago, when I had access to the firehose) on any given day belkin. would be in the top 10 TLDs if you looked at the NXDOMAIN traffic. Clients don't opportunistically try TCP (which they shouldn't, according to the specification...), but we have DoT (...but should in practice). My ISP's reverse DNS implementation is so bad that qname minimization breaks... but "nobody should be using qname minimization for reverse DNS", and "Spamhaus is breaking the law by casting shades at qname minimization".
"4096 ought to be enough for anybody" (no, frags are bad. see TCP above). There is only ever one request in a TCP connection... hey, what are these two bytes which are in front of the payload in my TCP connection? People who want to believe that their proprietary headers will be preserved if they forward an application protocol through an arbitrary number of intermediate proxy / forwarders (because that's way easier than running real DNS at the segment edge and logging client information at the application level).
Tangential, but: "But there's more to it, because people doing these things typically describe how it works for them (not how it doesn't work) and onlookers who don't pay close attention conclude "it works"." http://consulting.m3047.net/dubai-letters/dnstap-vs-pcap.htm...
> To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF
Of course.
> Another notable affected implementation was the DNSC process in three models of Cisco ethernet switches. In the case where switches had been configured to use 1.1.1.1 these switches experienced spontaneous reboot loops when they received a response containing the reordered CNAMEs.
... but I am surprised by this:
> One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution.
Not that glibc did anything wrong -- I'm just surprised that anyone is implementing an internet-scale caching resolver without a comprehensive test suite that includes one of the most common client implementations on the planet.
As an aside, I am super annoyed at Cloudflare for calling their proxy records "CNAME" in their UI. Those are nothing like CNAMEs and have caused endless confusion.
nitpicking at the RFCs when everyone knows DNS is a big old thing with lots going on
how do they not have basic integration tests to check how clients resolve
it seems very unlike cloudflare of old that was much more up front - there is no talk of the need to improve process, just blaming other people
Maybe I'm being overly-cynical but I have a hard time believing that they deliberately omitted a test specifically because they reviewed the RFC and found the ambiguous language. I would've expected to see some dialog with IETF beforehand if that were the case. Or some review of the behavior of common DNS clients.
It seems like an oversight, and that's totally fine.
And I am also shocked that a Cisco switch goes into a reboot loop over this DNS ordering issue.
Also, what's the right mental framework behind deciding when to release a patch RFC vs obsoleting the old standard for a comprehensive update?
Each resolved record would be asserted as a fact, and a tiny search implementation would run after all assertions have been made to resolve the IP address irrespective of the order in which the RRsets have arrived.
A micro Prolog implementation could be rolled into glibc's resolver (or a DNS resolver in general) to solve the problem once and for all.
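The same "assert facts first, search afterwards" idea can be sketched without Prolog. The following is Python rather than Prolog, uses a hypothetical `resolve_addresses` function, and is not how glibc actually parses responses; it assumes records are `(owner, type, rdata)` tuples with already-normalized names.

```python
def resolve_addresses(qname, records, max_hops=16):
    """Find addresses for qname regardless of record order.

    records: list of (owner, rtype, rdata) tuples (assumed shape).
    max_hops bounds the walk so CNAME loops terminate.
    """
    # Phase 1: record every RR as a "fact", ignoring arrival order.
    cnames = {}
    addresses = {}
    for owner, rtype, rdata in records:
        if rtype == "CNAME":
            cnames[owner] = rdata
        elif rtype in ("A", "AAAA"):
            addresses.setdefault(owner, []).append(rdata)

    # Phase 2: search the facts, starting from the query name.
    name = qname
    for _ in range(max_hops):
        if name in addresses:
            return addresses[name]
        if name not in cnames:
            return []
        name = cnames[name]
    return []  # chain too long or looping


cname_first = [
    ("www.example.com.", "CNAME", "cdn.example.net."),
    ("cdn.example.net.", "A", "192.0.2.1"),
]
cname_last = list(reversed(cname_first))

result_first = resolve_addresses("www.example.com.", cname_first)
result_last = resolve_addresses("www.example.com.", cname_last)
```

Both orderings resolve to the same address list, because nothing is interpreted until every record has been collected.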
Reminds me of https://news.ycombinator.com/item?id=37962674 or see https://tech.tiq.cc/2016/01/why-you-shouldnt-use-cloudflare/
Sounds low key selfish / inconsiderate to me
... to push such a change without adequate thought or informed buy in by consumers of that service.
> One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution.
> Most DNS clients don’t have this issue.
The most widespread implementation on the most widespread server operating system has the issue. I'm skeptical of what the author means by "Most DNS clients."
Also, what is the point of deploying to test if you aren't going to test against extremely common scenarios (like getaddrinfo)?
> To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF. If consensus is reached...
Pretty sure both Hyrum's Law and Postel's Law have reached the point of consensus.
Being conservative in what you emit means following the spec's most conservative interpretation, even if you think the way it's worded gives you some wiggle room. And the fact that your previous implementation did it that way for a decade means people have come to rely on it.
Wherever possible I compile with gethostbyname instead of getaddrinfo. I use musl instead of glibc
Nothing against IPv6 but I do not use it on the computers and networks I control
It's surprising how something so simple can be so broken.