This is the kind of thing you read in a post-mortem and wonder how they designed something so fiendishly wonderful.
At 2:00am our MySQL master failed and failed over successfully to our secondary server. As part of post-failover ops, the Ansible playbook proceeded to log in to 1000 instances to update the hosts file for the new master. This caused traffic amplification, which made our etcd nodes believe they were down. As the etcd nodes failed over, our Ansible playbook proceeded to log in to 1000 instances to update the hosts file...
Honestly, whatever system you built is just doing the same exact function as DNS, with extra steps. If you squint really hard, /etc/hosts is your local DNS cache and Ansible is your resolver. I think this kind of "simplification fetishization" is dangerously attractive to people who have only managed relatively simple setups. I don't think anyone who has ever had to deal with high-availability failover would consider Ansible a good solution.
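To make the "Ansible is your resolver" point concrete, here is a minimal sketch of what that setup reduces to: a play renders a static name-to-address table into /etc/hosts, and the libc "files" lookup returns the address on the first matching line. The inventory names and addresses are hypothetical, and real hosts-file handling has more edge cases (aliases, IPv6, nsswitch ordering).

```python
from typing import Optional

# Hypothetical inventory, standing in for what an Ansible play templates into /etc/hosts.
INVENTORY = {
    "db-primary.internal": "10.0.0.10",
    "db-replica.internal": "10.0.0.11",
    "etcd-1.internal": "10.0.1.1",
}


def render_hosts(inventory: dict[str, str]) -> str:
    """Render an /etc/hosts-style file with one 'address name' entry per line."""
    lines = ["127.0.0.1 localhost"]
    for name, address in sorted(inventory.items()):
        lines.append(f"{address} {name}")
    return "\n".join(lines) + "\n"


def lookup(hosts_text: str, name: str) -> Optional[str]:
    """Return the address on the first line that lists `name`, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            return address
    return None


hosts = render_hosts(INVENTORY)
print(lookup(hosts, "db-primary.internal"))  # 10.0.0.10
```

Note what is missing: there is no TTL. After a failover, each machine keeps answering with the old address until the play reaches it again, which is the fan-out described in the post-mortem above.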
The problem that so many people hit with DNS isn't specific to DNS the protocol - it's the problem of service discovery. This architecture doesn't eliminate service discovery, it just moves it to a far more brittle configuration.
DNS is used to make entire protocols work (MX records for email, and SRV records for much more).
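As an example of the "much more": SRV records (RFC 2782) let a client look up _service._proto.name and get back targets with ports, priorities, and weights, which is service discovery built into DNS. The sketch below shows only the client-side ordering rule from RFC 2782 (lowest priority first, weighted random order within a priority); the records are hypothetical and would normally come from a resolver.

```python
import random
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class SRV:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records: list[SRV], rng: Optional[random.Random] = None) -> list[SRV]:
    """Order SRV records for connection attempts as RFC 2782 describes:
    lower priority first, weighted random order within each priority."""
    rng = rng or random.Random()
    ordered: list[SRV] = []
    by_priority = sorted(records, key=lambda r: r.priority)
    for _, group in groupby(by_priority, key=lambda r: r.priority):
        # Zero-weight records go to the front so they keep a small chance of being picked.
        pool = sorted(group, key=lambda r: r.weight != 0)
        while pool:
            total = sum(r.weight for r in pool)
            pick = rng.randint(0, total)  # uniform over 0..total, inclusive
            running = 0
            for i, rec in enumerate(pool):
                running += rec.weight
                if running >= pick:
                    ordered.append(pool.pop(i))
                    break
    return ordered


# Hypothetical answer for _sip._udp.example.com: two weighted primaries, one backup.
answer = [
    SRV(20, 0, 5060, "backup.example.com."),
    SRV(10, 60, 5060, "a.example.com."),
    SRV(10, 40, 5060, "b.example.com."),
]
for rec in order_srv(answer):
    print(rec.priority, rec.target, rec.port)
```

Over many lookups a.example.com. comes first about 60% of the time, and the backup is only tried after both priority-10 targets.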
Now, if we do look at the most basic of basic DNS roles, mapping a human-readable name to an arbitrary set of numbers identifying a machine on the network, we should consider how to avoid some of the issues while keeping all of the benefits of DNS.
E.g., if we indeed "materialize" machine identifiers, we lose the ability to do virtual hosting (the domain name is no longer passed in) or to fix a problem with just a DNS update (e.g., when treating load-balancing machines like cattle).
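To illustrate the virtual hosting point: name-based virtual hosting relies on the client sending the name it wanted (the HTTP Host header; TLS SNI plays the same role). A minimal sketch, with hypothetical site names, of a server routing on that header; if configs carry only a raw IP address, the server receives an IP and has no way to pick the right site.

```python
# Hypothetical name-based virtual hosts sharing one IP address.
VHOSTS = {
    "shop.example.com": "shop backend",
    "blog.example.com": "blog backend",
}
DEFAULT = "default site (no vhost matched)"


def route(host_header: str) -> str:
    """Choose a backend from the HTTP Host header, ignoring an optional :port.
    (Handles hostnames and IPv4 literals only in this sketch.)"""
    host = host_header.rsplit(":", 1)[0].lower()
    return VHOSTS.get(host, DEFAULT)


print(route("shop.example.com:443"))  # shop backend
print(route("203.0.113.7"))           # default site (no vhost matched)
```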
The author jumps immediately to arguably ill-advised materialization techniques like /etc/hosts, without considering all that DNS does for a complex, real-world system and what goes missing without it.
I gather from some of the other comments that he uses Ansible to manage /etc/hosts on each host for name resolution. That sounds convoluted, since /etc/resolv.conf and the libc resolver already work. He goes for the lowest fallback and dumps files with Ansible: a homelab with extra steps, when setting up a DNS server is easy. Consider CoreDNS or dnsmasq if BIND is too much.
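For reference, a sketch of what the libc stub resolver takes from /etc/resolv.conf: the nameserver list, the search list, and the ndots option, which decides whether a short name like db is tried as-is first or expanded with the search domains first (per resolv.conf(5)). This parses only that subset, and the addresses and domains are hypothetical.

```python
def parse_resolv_conf(text: str) -> dict:
    """Parse the nameserver, search/domain and ndots settings from resolv.conf text."""
    conf = {"nameservers": [], "search": [], "ndots": 1}  # ndots defaults to 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # comment lines start with '#' or ';'
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            conf["nameservers"].append(args[0])
        elif keyword in ("search", "domain"):
            conf["search"] = args  # mutually exclusive; the last one wins
        elif keyword == "options":
            for opt in args:
                if opt.startswith("ndots:"):
                    conf["ndots"] = int(opt.split(":", 1)[1])
    return conf


def candidates(name: str, conf: dict) -> list[str]:
    """Names the resolver would try, in order, following the ndots rule."""
    if name.endswith("."):
        return [name]  # fully qualified: the search list is not applied
    expanded = [f"{name}.{domain}" for domain in conf["search"]]
    if name.count(".") >= conf["ndots"]:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]


sample = """\
# hypothetical internal resolver setup
nameserver 10.0.0.53
nameserver 10.0.0.54
search corp.internal internal
options ndots:2 timeout:1
"""
conf = parse_resolv_conf(sample)
print(candidates("db", conf))  # ['db.corp.internal', 'db.internal', 'db']
```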
Thanks for the laugh...
This is classic "easy vs. simple" folly: witness how someone too lazy to [learn how to] set up proper DNS for their infrastructure will do 10x the work hacking together something "easy".
Because now you've replaced one single-point-of-failure configuration system that has caching and TTLs (DNS) with a higher-maintenance and much less widely supported one.
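A minimal sketch of the caching and TTL behavior this comment refers to, with a hypothetical lookup function and an injectable clock so it runs without a network: a failover is one record change, and each client picks it up once its cached answer expires. A hosts file has no equivalent; it changes only when something pushes a new copy.

```python
import time
from typing import Callable


class TTLCache:
    """Reuse each answer until its TTL expires, then ask the lookup again,
    roughly what a caching DNS resolver does for positive answers."""

    def __init__(self, lookup: Callable[[str], tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup  # hypothetical: returns (address, ttl_seconds)
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # name -> (address, expires_at)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: no query
        address, ttl = self._lookup(name)
        self._entries[name] = (address, now + ttl)
        return address


# Simulated zone and clock: one record with a 30-second TTL.
zone = {"db.internal": "10.0.0.10"}
now = [0.0]
cache = TTLCache(lambda name: (zone[name], 30.0), clock=lambda: now[0])

print(cache.resolve("db.internal"))  # 10.0.0.10
zone["db.internal"] = "10.0.0.11"    # failover: a single record update
now[0] = 10.0
print(cache.resolve("db.internal"))  # 10.0.0.10 (cached answer still valid)
now[0] = 31.0
print(cache.resolve("db.internal"))  # 10.0.0.11 (TTL expired, re-queried)
```

Real resolvers also cache negative answers and cap TTLs; this sketch ignores both.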
People, services, machines, etc. need to "dial" somewhere canonical. Whatever does the canonical management is the piece that, when it breaks, breaks everything.
Doesn't matter if it's DNS, EIP rotation, some HA proxy, whatever. It'll break.
It's actually that DNS is so well understood that it doesn't fail more often.
So no, DNS is for IT Infra.
And for some very specific nitpicks (I can’t believe I’m entertaining this idea enough to ask): tell me how the new device on the network bootstraps without DNS. And the guest device. And the printer without Ansible support. And the NDI receiver that needs to resolve its host. And how do you handle split-horizon resolution for roaming devices? Are you going to publicly address all internal resources now so my laptop keeps working outside the office?
DNS was not created as a random solution looking for a problem…
This isn't how any of this was really supposed to work. Back in the day, the application identifier was the _port number_, according to a big list maintained by IANA. The idea was that you could go to a machine (identified by IP or, more conveniently, by DNS) and see if it was running an instance of the ‘Facebook’ application, i.e. you'd find Facebook not at facebook.com:https but at meta.com:facebook. The end goal was to eliminate the need for the former part at some point, and to come up with a better way of looking up applications than distributing a list by email. Instead, the port number now effectively identifies the transport (https), and the host name encodes the application ID, which it was never meant to do. That's why we can't have nice things (like device mobility).
But to address the article: in a simple environment, DNS _just works_. I’ve never once had an issue with BIND. It’s incredibly simple, stable, and easy to understand when working within a small environment without much churn, and it enables other technologies to operate in an expected way because it’s the standard. ACME, Kerberos, SSHFP, and many more are enabled by DNS. Sure, maybe you can kludge some of that back together with hosts files, but I’d rather not, just to replace one of the most stable services that exist.
DNS does start to get more complicated in massive environments, but that’s just a reflection of the environment. Using Ansible to manage /etc/hosts across hundreds or thousands of machines with churn will not be less complicated to manage than DNS.
> https://www.rfc-editor.org/info/rfc9364/#name-dnssec-core-do...
DNS is for Infrastructure, people use infrastructure.
>That got me thinking, why would we use DNS for infrastructure services? It isn't necessary for machine-to-machine communication. Instead of configuring domain names that may not resolve, we can just directly inject the appropriate IP address(es) into configuration files. It's easy to configure systems with tools like Ansible or pyinfra at scale.
No no no no god no.
"What if we set up a convoluted higher level application solution"
This is going to go wrong more frequently and contain more errors than DNS.
>Fortunately, we still have /etc/hosts, which we can easily provision. Still no DNS service required! This way, we can configure domain names and pretend to use DNS. I also suspect that DNS queries against /etc/hosts are quite responsive.
No, that's a horrible idea. Userspace should never be updating your hosts file; users will fall behind on changes and be placed at extreme security risk. Fully half the benefit of UAC on Windows is preventing persistence by preventing malicious entities from updating hosts.
>As of today, most network traffic is encrypted by default, or tunneled through an encrypted channel. DNS is - by default - the exception.
DNS is mostly secure now, to the point where it's a problem. But that's a vendor issue, not a you issue; please don't attempt to solve it. If you go full encrypted DNS, you generally also get dragged into HTTPS proxying and things of that nature. This does not get better by removing a dynamic protocol for querying names.
>Due to this risk, there is a case to be made, to - at least - not allow systems to query public DNS records. As servers may need to interact with services on the internet (update servers, APIs, and so on), such access can be facilitated by a proxy server using allow-listed domains.
Attackers use DNS because it's versatile and resistant to the very issues you keep confidently presenting. A protocol is not a risk just because hackers use it. Hackers also use HTTPS and other protocols, but we aren't burning them at the stake.
>That said, I think it's reasonable to explore if DNS can be avoided altogether within the IT infrastructure to increase reliability and robustness.
It's reasonable for people with a much better understanding of the infrastructure and protocol to examine these things. This reads like an end user suggesting "what if we deliver websites hand-printed on paper".
Tell me that you've never used Ansible at scale without telling me that you've never used Ansible at scale.
In SOHO settings I might actually agree, but this is where I think site-administered, distributed multicast DNS was a missed opportunity.
Now it’s for machines