Hacker News

XML is a cheap DSL

273 points by y1n0

by necovek

14 subcomments

XML is notoriously expensive to properly parse in many languages. Basically, the entire world centers around 3 open source implementations (libxml2, expat and Xerces), if you want to get anywhere close to actual compliance. Even with them, you might hit challenges (libxml2 was largely unmaintained recently, yet it is the basis for many bindings in other languages).
The main property of SGML-derived languages is that they make "list" a first class object, and nesting second class (by requiring "end" tags), and have two axes for adding metadata: one being the tag name, another being attributes.
So while it is a suitable DSL for many things (it is also seeing new life in web components definition), we are mostly only talking about XML-lookalike language, and not XML proper. If you go XML proper, you need to throw "cheap" out the window.
Another comment to make here is that you can have an imperative looking DSL that is interpreted as a declarative one: nothing really stops you from saying that
```
  totalOwed = totalTax - totalPayments
  totalTax = tentativeTaxNetNonRefundableCredits + totalOtherTaxes
  totalPayments = totalEstimatedTaxesPaid +
                      totalTaxesPaidOnSocialSecurityIncome +
                      totalRefundableCredits
```
means exactly the same as the XML-alike DSL you've got.
One declarative language looking like an imperative language but really using "equations" which I know about is METAFONT. See eg. https://en.wikipedia.org/wiki/Metafont#Example (the example might not demonstrate it well, but you can reorder all equations and it should produce exactly the same result).
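To make the "you can reorder all equations" point concrete, here is a minimal Python sketch, not anything from the article. It treats each assignment as an equation with named dependencies and lets the standard-library `graphlib` module pick the evaluation order. The `solve` helper and the input amounts are assumptions made up for illustration.

```python
from graphlib import TopologicalSorter

# Each fact maps to (names it depends on, function computing it).
# The entries are deliberately listed with totalOwed first, before its inputs exist.
equations = {
    "totalOwed": (("totalTax", "totalPayments"), lambda tax, paid: tax - paid),
    "totalTax": (
        ("tentativeTaxNetNonRefundableCredits", "totalOtherTaxes"),
        lambda net, other: net + other,
    ),
    "totalPayments": (
        (
            "totalEstimatedTaxesPaid",
            "totalTaxesPaidOnSocialSecurityIncome",
            "totalRefundableCredits",
        ),
        lambda estimated, social_security, credits: estimated + social_security + credits,
    ),
}


def solve(equations, inputs):
    """Evaluate every equation once all of its dependencies are known."""
    graph = {name: set(deps) for name, (deps, _) in equations.items()}
    values = dict(inputs)
    # static_order() yields each name after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if name in equations:
            deps, fn = equations[name]
            values[name] = fn(*(values[d] for d in deps))
    return values


# Hypothetical input amounts, purely for illustration.
inputs = {
    "tentativeTaxNetNonRefundableCredits": 900,
    "totalOtherTaxes": 100,
    "totalEstimatedTaxesPaid": 300,
    "totalTaxesPaidOnSocialSecurityIncome": 50,
    "totalRefundableCredits": 150,
}
result = solve(equations, inputs)
print(result["totalOwed"])
```

Shuffling the entries of `equations` should not change the result, since the order of evaluation comes from the dependency graph rather than from the order in which the equations were written.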

by jaen

6 subcomments

Or... you could just use a programming language that looks good and has great support for embedded domain-specific languages (eDSL), like Haskell, OCaml or Scala.
Or, y'know, use the language you have (JavaScript) properly, eg. add a `sum` abstraction instead of `.reduce((acc, val) => { return acc+val }, 0)`.
In particular, the problem of "all the calculations are blocked for a single user input" is solved by eg. applicatives or arrows (these are fairly trivial abstract algebraic concepts, but foreign to most programmers), which have syntactic support in the abovementioned languages.
(Of course, avoid the temptation to overcomplicate it with too abstract functional programming concepts.)
If you write an XML DSL:
1. You have to solve the problem of "what parts can I parallelize and evaluate independently" anyway. Except in this case, that problem has been solved a long time ago by functional programming / abstract algebra / category-theoretic concepts.
2. It looks ugly (IMHO).
3. You are inventing an entirely new vocabulary unreadable to fellow programmers.
4. You will very likely run into Greenspun's tenth rule if the domain is non-trivial.
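As a rough illustration of the applicative idea (in Python rather than the languages named above, and only loosely in the spirit of lifting over optional values), the sketch below lets each fact be computed as soon as its own inputs are present, instead of blocking every calculation on one missing user input. The `lift` helper and the sample amounts are assumptions made up for this sketch.

```python
from typing import Callable, Optional


def lift(fn: Callable[..., float]) -> Callable[..., Optional[float]]:
    """Wrap a plain function so it tolerates missing (None) inputs."""
    def lifted(*args: Optional[float]) -> Optional[float]:
        # If any input is missing, the result is missing too;
        # otherwise apply the plain function unchanged.
        if any(arg is None for arg in args):
            return None
        return fn(*args)
    return lifted


def total(*values: float) -> float:
    """A named `sum` abstraction instead of an inline reduce."""
    return sum(values)


add = lift(total)
subtract = lift(lambda a, b: a - b)

# Hypothetical amounts: one payment input has not been entered yet.
total_tax = add(900.0, 100.0)             # all inputs known
total_payments = add(300.0, None, 150.0)  # blocked on the missing input
total_owed = subtract(total_tax, total_payments)
```

Only the facts that actually depend on the missing input come out as `None`; `total_tax` is still available.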

by twic

3 subcomments

FWIW you can do a better job with the JSON structure than in the article:
```
    {"GreaterOf": [
        {"Value": [0, "Dollar"]},
        {"Subtract": [
            {"Dependency": ["/totalTentativeTax"]},
            {"Dependency": ["/totalNonRefundableCredits"]}
        ]}
    ]}
```
Basically, a node is an object with one entry, whose key is the type and whose value is an array. It's a rather S-expressiony approach. If you really don't like using arrays for all the contents, you could always use more normal values at the leaves:
```
    {"GreaterOf": [
        {"Value": {"value": 0, "kind": "Dollar"}},
        {"Subtract": {
            "minuend": {"Dependency": "/totalTentativeTax"},
            "subtrahend": {"Dependency": "/totalNonRefundableCredits"}
        }}
    ]}
```
It has the nice property that you're always guaranteed to see the type before any of the contents, even if object keys get reordered, so you can do streaming decoding without having to buffer arbitrary amounts of JSON. Probably not important when parsing a tax code, but can be useful for big datasets.
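A minimal sketch of how the first (all-arrays) shape could be consumed, in Python. The `evaluate` function and the sample fact values are assumptions for illustration. The point is that the single key tells you the node type before you look at any of the contents.

```python
import json


def evaluate(node, facts):
    """Evaluate a node of the form {"Type": [args...]}."""
    # Exactly one entry per node: the key is the type, the value the contents.
    ((kind, args),) = node.items()
    if kind == "Value":
        amount, _unit = args
        return amount
    if kind == "Dependency":
        (path,) = args
        return facts[path]
    if kind == "Subtract":
        minuend, subtrahend = (evaluate(arg, facts) for arg in args)
        return minuend - subtrahend
    if kind == "GreaterOf":
        return max(evaluate(arg, facts) for arg in args)
    raise ValueError(f"unknown node type: {kind}")


doc = json.loads("""
{"GreaterOf": [
    {"Value": [0, "Dollar"]},
    {"Subtract": [
        {"Dependency": ["/totalTentativeTax"]},
        {"Dependency": ["/totalNonRefundableCredits"]}
    ]}
]}
""")

# Hypothetical fact values: credits exceed the tentative tax.
facts = {"/totalTentativeTax": 1200, "/totalNonRefundableCredits": 1500}
result = evaluate(doc, facts)
```

This sketch loads the whole document with `json.loads` for brevity; a streaming decoder would dispatch on the key the same way as each node starts.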

by sgarland

3 subcomments

While a great article, I actually found this linked post [0] to be even better, in which the author lays out how so much modern tooling for web dev exists simply because XML lost the browser war.
EDIT: obviously, JSON tooling sprang up because JSON became the lingua franca. I meant that it became necessary to address the shortcomings of JSON, which XML had solved.
0: https://marcosmagueta.com/blog/the-lost-art-of-xml/

by Decabytes

1 subcomments

S-expressions are a cheap DSL too. I use them as the “HTML”^1 and CSS^2 in the wasm-powered desktop browser runtime I’m developing. In fact, it works so well that I also reused it to do the styling for HTML exports in my markup language designed to fight documentation drift^3.
1. https://gitlab.com/canvasui/canvasui-engine/-/blame/main/exa...
2. https://gitlab.com/canvasui/canvasui-engine/-/blob/main/exam...
3. https://gitlab.com/sablelang/libcuidoc

by Kwpolska

0 subcomment

XML is beloved by tax authorities. The Polish tax authorities really love their e-documents and online filing. Except their XML documents are completely human-unreadable, since the schemas are based on field numbers in paper forms. Even in the brand new National e-Invoicing System, designed from scratch, with no paper forms, most fields have names like <P_19N>1</P_19N>. You read the XML schema to find out it is a "Marker of lack of delivery of goods or provision of services exempt from tax under Article 43 paragraph 1 of the [VAT] Act, Article 113 paragraphs 1 and 9 of the Act or regulations issued under Article 82 paragraph 3 of the Act or under other provisions" (Google Translated, because of course everything is in Polish). So my invoice is saying "yes [1], I am not [N] exempt from tax under $allThatNonsense [P_19]".
In unrelated news, the main author of the VAT Act is offering tax consulting services, as Registered Tax Advisor #00001.

by jfengel

0 subcomment

It's not a DSL. It's a generic lexer and parser. It takes the text and gives you an abstract syntax tree. The actual DSL is your spec, and the syntax you apply.
It's one of many equivalent such parser tools, a particularly verbose one. As such it's best for stuff not written by hand, but it's ok for generated text.
It has some advantages mostly stemming from its ubiquity, so it has a big tool kit. It has a lot of (somewhat redundant) features, making it complex compared to other options, but sometimes one of those features really fits your use case.

by 1a527dd5

2 subcomments

The trouble with XML has never been XML itself.
It was also about how easy it was to generate great XML.
Because it is complicated and not everyone agrees on how to properly represent an idea or concept, you have to deal with varying output between producers.
I personally love well-formed XML, but the std dev is huge.
Things like JSON have a much tighter std dev.
The best XML I've seen is generated by hashdeep/md5deep. That's how XML should be.
Financial institutions are basically run on XML, but we do a tonne of work with them and my god their "XML" makes you pray and weep for a swift end.

by exabrial

3 subcomments

Given that it has strong XSD schema verification built in, where you can tell in an instant whether or not the document is correct, it’s the right tool for a majority of jobs.
My experience has been that the people complaining about it were simply not using automated tools to handle it. It’d be like people complaining that “binaries/assembly are too hard to handle” and never using a disassembler.

by somat

0 subcomment

XML makes for a pretty good markup language and an OK data interchange format (not a great fit, but the tooling is pretty good). But every single time I have seen it used as a programming language I found it deeply regrettable.
For comparison, JSON is a terrible markup language, a pretty good data interchange format, and again, a deeply regrettable programming language. I don't know if anyone has put a programming language in straight JSON (I suspect they have, shudders) but Ansible has quite a few programming structures and is in YAML, which is JSON dressed in a config language's clothes.
However, as a counterpoint to my JSON indictment, it may be possible to make a decent language out of it. Look to Lisp: its S-expressions are a sort of data interchange format (roughly equivalent to JSON) and it is a pretty good language.

by Sharlin

1 subcomments

> It evokes memories of SOAP configs and J2EE (it’s fine, even good, if those acronyms don’t mean anything to you).
Heh, a couple of years ago I walked past a cart of free-to-take discards at the uni, full of thousand-page tomes about exciting subjects like SOAP, J2EE and CORBA. I wonder how many of the current students even recognized any of those terms.

by thatwasunusual

2 subcomments

It's completely unbelievable that so-called developed countries are struggling with this in 2026.
In Norway, we've had a more or less automated tax system for many years; every year you get a notification that the tax settlement is complete, you log in and check if everything is correct (and edit if desired) and click OK.
It shouldn't be more difficult than this.

by panzi

0 subcomment

XML (and those Prolog and KDL expressions) have one big advantage over JSON: the type (in XML the tag name) comes before the rest of the object. In JSON it's usually a type field. That means in JSON it could come at any point, and thus you have to load the whole sub-structure as a dynamic hash map before you can evaluate the type field and instantiate the correct type in your programming language. With XML and a SAX parser you are guaranteed to get the type first and thus can immediately instantiate the correct type and load the properties into that, skipping any dynamic hash map. Depending on your application this can mean a big performance difference.
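A rough illustration of the type-first point using Python's standard `xml.sax` module. The handler class and the sample element and attribute names are made up for the sketch.

```python
import xml.sax


class TypeFirstHandler(xml.sax.ContentHandler):
    """Records SAX events in the order the parser delivers them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # The tag name (the "type") arrives here, before any child content,
        # so a typed object could be instantiated right away.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


# Hypothetical document shape, loosely modeled on the thread's examples.
document = (
    b"<Subtract>"
    b'<Dependency path="/totalTentativeTax"/>'
    b'<Dependency path="/totalNonRefundableCredits"/>'
    b"</Subtract>"
)

handler = TypeFirstHandler()
xml.sax.parseString(document, handler)
for event in handler.events:
    print(event)
```

The first event is always the start of the outermost element, so the consumer knows which type to build before it has seen any of the children.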

by phlakaton

1 subcomments

I like this post, but I gotta tell you, it just makes me want to dust off and write a bunch of s-expr tools to make that ecosystem equally or more attractive for DSLs.
If I do, the IRS will be the first to know about it! I'll staple an announcement to my 1040. ;-)

by kwon-young

1 subcomments

The article mentions Prolog but doesn't mention you can use constraints to fully express its computation graph. My preferred library is clpBNR, which has powerful constraints over booleans, integers and floats:
```
  Welcome to SWI-Prolog (threaded, 64 bits, version 9.2.9)

  ?- use_module(library(clpBNR)).
  % *** clpBNR v0.12.2 ***.
  true.
  
  ?- {TotalOwed == TotalTax - TotalPayments}.
  TotalOwed::real(-1.0Inf, 1.0Inf),
  TotalTax::real(-1.0Inf, 1.0Inf),
  TotalPayments::real(-1.0Inf, 1.0Inf).
  
  ?- {TotalOwed == TotalTax - TotalPayments}, TotalTax = 10, TotalPayments = 5.
  TotalOwed = TotalPayments, TotalPayments = 5,
  TotalTax = 10.
```
If you restrict yourself to the pure subset of Prolog, you can even express complicated computations involving conditions or recursion. However, this means that your graph is now encoded into the Prolog code itself, which is harder to manipulate, but still fully manipulable in Prolog itself.
But the author talks about xml as an interchange format which is indeed better than prolog code...

by dale_glass

4 subcomments

It kinda blows my mind that after XML we've managed to make a whole bunch of stuff that's significantly worse for any serious usage.
JSON: No comments, no datatypes, no good system for validation.
YAML: Arcane nonsense like sexagesimal number literals, footguns with anchors, Norway problem, non-string keys, accidental conversion to a number, CODE INJECTION!
I don't know why, but XML's verbosity seems to cause such a visceral aversion in a lot of people that they'd rather write a bunch of boring code to make sure a JSON parses to something sensible, or spend a day scratching their head about why a minor change in YAML caused everything to explode.
Actually my own problem with XML was annoyance that back when I had the thought of doing a complex config format in XML, the idea of modifying it programmatically while retaining comments turned out to be absolutely non-trivial. In comparison with the mess one can make with YAML that's just a trivial thing.
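For what it's worth, a hedged sketch of the comment-preserving round trip in Python's standard library, assuming Python 3.8 or later, where `TreeBuilder` accepts `insert_comments=True`. The config element names are made up, and comments outside the root element are not covered by this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical config file with a hand-written comment.
source = """<config>
  <!-- retry budget, tuned by hand -->
  <retries>3</retries>
</config>"""

# By default ElementTree drops comments while parsing; asking the
# TreeBuilder to insert them keeps them in the tree as Comment nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(source, parser=parser)

# Modify a value programmatically, then serialize again.
root.find("retries").text = "5"
updated = ET.tostring(root, encoding="unicode")
print(updated)
```

Other formatting details (attribute quoting, declaration, whitespace outside the root) may still not survive the round trip, which is part of why this tends to be harder than it looks.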

by miki123211

0 subcomment

What I would do here is something like:
1. standardize on JSON as the internal representation, and
2. write a simple (<1kloc) Python-based compiler that takes human-friendly, Pythonic syntax and transforms it into that JSON, based on operator overloading.
So you would write something like:
```
    from factgraph import Max, Dollar # or just import *
    tentative_tax_net_nonrefundable_credits = Max(Dollar(0), total_tentative_tax - total_nonrefundable_credits)
```
and then in class Node (in the compiler):
```
    def __sub__(self, other):
        # Subtracting two nodes builds a new graph node instead of a number
        return SubtractNode(minuend=self, subtrahends=[other])
```
Values like total_nonrefundable_credits would be objects of class Node that "know where they come from", not imperatively-calculated numbers. The __sub__ method (which is Python's way of operator overloading) would return a new node when two nodes are subtracted.
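A self-contained sketch of what such a compiler core might look like. Everything here is an assumption for illustration: the class names other than `Node`, `SubtractNode`, `Max` and `Dollar`, the `to_json` method, and the output shape, which borrows the single-key JSON layout suggested elsewhere in this thread.

```python
import json


class Node:
    """Base class: a value that knows how it is derived, not a number."""

    def __sub__(self, other):
        # Operator overloading: `a - b` returns a graph node.
        return SubtractNode(minuend=self, subtrahends=[other])

    def to_json(self):
        raise NotImplementedError


class Dependency(Node):
    def __init__(self, path):
        self.path = path

    def to_json(self):
        return {"Dependency": [self.path]}


class Dollar(Node):
    def __init__(self, amount):
        self.amount = amount

    def to_json(self):
        return {"Value": [self.amount, "Dollar"]}


class SubtractNode(Node):
    def __init__(self, minuend, subtrahends):
        self.minuend = minuend
        self.subtrahends = subtrahends

    def to_json(self):
        operands = [self.minuend, *self.subtrahends]
        return {"Subtract": [operand.to_json() for operand in operands]}


class Max(Node):
    def __init__(self, *operands):
        self.operands = operands

    def to_json(self):
        return {"GreaterOf": [operand.to_json() for operand in self.operands]}


total_tentative_tax = Dependency("/totalTentativeTax")
total_nonrefundable_credits = Dependency("/totalNonRefundableCredits")
tentative_tax_net_nonrefundable_credits = Max(
    Dollar(0), total_tentative_tax - total_nonrefundable_credits
)
compiled = tentative_tax_net_nonrefundable_credits.to_json()
print(json.dumps(compiled, indent=2))
```

Nothing is calculated when the expression is written; the Python code only builds the graph, and the JSON it emits stays the internal representation.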

by SoftTalker

0 subcomment

Somehow I got to be nearly 60 years old and never heard the words "minuend" and "subtrahend" so I did learn something today.

by MarkSweep

2 subcomments

There is a middle ground between using XML and imperative code for representing tax forms. Robert Sesek’s ustaxlib [0] uses JavaScript to encode the forms in a way that is reasonably statically analyzable. See the visualizer [1]. My approach uses XML to represent the forms with an embedded DSL to represent most expressions tersely. See for example Form 8960 in ustaxlib [2] and my TaxStuff program [3]. The main thing that the XML format from the article has going for it is that it is easy to write a parser for. But it is a bit verbose for my taste.
[0]: https://github.com/rsesek/ustaxlib
[1]: https://github.com/rsesek/ustaxviewer
[2]: https://github.com/rsesek/ustaxlib/blob/master/src/fed2019/F...
[3]: https://github.com/AustinWise/TaxStuff/blob/master/TaxStuff/...

by eternauta3k

0 subcomment

Interesting, Germany has been publishing its payroll tax algorithm in XML for a while: https://www.bundesfinanzministerium.de/Datenportal/Daten/fre...

by omoikane

1 subcomments

I like how this article lists various alternatives. Like I was thinking "well, JSON is more compact", and they covered JSON. And then "well, s-expressions support nesting too", and then they covered s-expressions as well. The best documentation always includes the things that weren't done.

by thelastgallon

0 subcomment

It is an ironic truth that those who seek to create systems which most assume the perfectibility of humans end up building the systems which are most soul destroying and most rigid, systems that rot from within until like great creaking rotten oak trees they collapse on top of themselves leaving a sour smell and decay. We saw it happen in 1989 with the astonishing fall of the USSR. Conversely, those systems which best take into account the complex, frail, brilliance of human nature and build in flexibility, checks and balances, and tolerance tend to survive beyond all hopes. -- Adam Bosworth, https://adambosworth.net/2004/11/18/iscoc04-talk/

by n_e

2 subcomments

After thinking a bit about the problem, and assuming the project's language is javascript, I'd write the fact graph directly in javascript:
```
  const totalEstimatedTaxesPaid = writable("totalEstimatedTaxesPaid", {
    type: "dollar",
  });
  
  const totalPayments = fact(
    "totalPayments",
    sum([
      totalEstimatedTaxesPaid,
      totalTaxesPaidOnSocialSecurityIncome,
      totalRefundableCredits,
    ]),
  );
  
  const totalOwed = fact("totalOwed", diff(totalTax, totalPayments));
```
This way it's a lot terser, you have auto-completion and real-time type-checking.
The code that processes the graph will also be simpler as you don't have to parse the XML graph and turn it into something that can be executed.
And if you still need XML, you can generate it easily.

by cwbrandsma

0 subcomment

So I was part of a company that did this. They used XML as a programming language, then built apps to manage the "code". This was all done to create mobile apps for old Windows Phone devices back in the Windows Phone 5 and 6 days (before iPhone).
Because of the tooling, you weren't actually writing the XML either; you used a custom-built editor (a tree view with a property panel). It all sucked. I was looking at the thing trying to figure out if I could create an intermediate language with my own "compiler" to get around the XML editors they built.
Anyway, every developer hated it. All of them. Well, everyone but the guy that created the monstrosity anyway.

by butterisgood

0 subcomment

XML was once like violence... if you're not getting the results you wanted you should just use more of it. We do not need to go back to that. XML is a step backwards to what was already a step backwards.

by librasteve

2 subcomments

I have been playing with DSLs a little, here is the kind of syntax that I would choose:
```
  invoice "INV-001" for "ACME Corp"
    item "Hosting" 100 x 3
    item "Support" 50 x 2
    tax 20%
  invoice "INV-002" for "Globex"
    item "Consulting" 200 x 5
    discount 10%
    tax 21%
```
In contrast to XML (even with authoring tools), my feeling is that XML (or any angle-bracket language tbh) is just too hard to write correctly (i.e. XML syntax and XML schema parsing is very unforgiving) and has a lot of noise when you read it that obscures the main intent of the DSL code.
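To give a sense of how little parsing a line-oriented syntax like this needs, here is a minimal Python sketch. The `parse_invoices` function and the output dictionary layout are assumptions for illustration. It ignores indentation and simply attaches each line to the most recent `invoice`.

```python
import shlex


def parse_invoices(text):
    """Parse the invoice DSL into a list of plain dictionaries."""
    invoices = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        tokens = shlex.split(raw)  # handles the double-quoted strings
        keyword = tokens[0]
        if keyword == "invoice":
            # invoice "INV-001" for "ACME Corp"
            invoices.append(
                {"id": tokens[1], "customer": tokens[3],
                 "items": [], "tax": 0, "discount": 0}
            )
        elif keyword == "item":
            # item "Hosting" 100 x 3
            invoices[-1]["items"].append(
                {"name": tokens[1], "price": int(tokens[2]),
                 "quantity": int(tokens[4])}
            )
        elif keyword in ("tax", "discount"):
            # tax 20%  /  discount 10%
            invoices[-1][keyword] = int(tokens[1].rstrip("%"))
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return invoices


sample = '''
invoice "INV-001" for "ACME Corp"
  item "Hosting" 100 x 3
  item "Support" 50 x 2
  tax 20%
invoice "INV-002" for "Globex"
  item "Consulting" 200 x 5
  discount 10%
  tax 21%
'''
invoices = parse_invoices(sample)
```

A real implementation would also want error messages with line numbers and some validation of the `for` and `x` keywords, which this sketch skips.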

by jrm4

2 subcomments

To go up one level of abstraction: any thoughts on whether or not we might actually be able to "solve" the age-old problem of "which data format" thanks to ubiquitous AI tooling?
Just kind of spitballing here, but we're in a world where we can point AI at some well-formed or badly formed XML, JSON, TOML, whatever, and just kind of say "hey, what's going on here, fix it?"

by Hfuffzehn

0 subcomment

Right. And as one of the people who has helped the downfall of the German economy by writing DSLs in the last decades: Our DSLs compiled to XML for transportability.
But please don't write DSLs anymore. If you have to, probably even just using Opus to write something for you is better. And AI doesn't like DSLs that can't be in its training base.

by sdovan1

1 subcomments

Sometimes I wonder why we need to invent another DSL. (or when should we?)
At work, we have an XML DSL that bridges two services. It's actually a series of API calls with JSONPath mappings. It has if-else and goto, but no real math (you can only add 1 to a variable though) and no arrays. Debugging is such a pain, makes me wonder why we don't just write Java.

by oaiey

0 subcomment

XML, JSON, plain text, whatever: none of it matters. What matters is that you speak the domain language. Speak the language of your domain, model your config or data in the language of the domain and users.
That is so powerful and the reason domain driven design is still a powerful concept.

by randomNumber7

0 subcomment

I worked at a place where we had a custom written code generator that used XML as input. It is usable and especially XSD is nice to specify what a valid input file looks like.
On the other hand it is horrible to read and write for humans. Nowadays I would rather use JSON with JSON Schema.

by elijahlucian

0 subcomment

I have also found this. started a project with the idea, but never finished, maybe someone can find the approach useful: https://github.com/ELI7VH/enzyme/

by ddtaylor

0 subcomment

What a day when people are praising XML because it can be used to help calculate a bloated tax code.

by cluckindan

0 subcomment

”With a declarative graph representation, we get auditability and introspection for free, for every single calculation.”
No, you don’t. Those are dependent on the actual implementation.
The XML layer is a neat looking storefront hiding the crimes being committed in the back room.

by 4star3star

0 subcomment

In case it helps anyone tinkering with XML and C#, Visual Studio has a feature in the menu to "paste xml as classes". That can be quite handy if you're going to be deserializing it.

by raverbashing

1 subcomments

Honestly let's leave XML in that 90s drawer from where it should have never left

by Hackbraten

1 subcomments

At the cost of a slightly more complex schema, the JSON representation can be made much more readable:

    {
      "path": "/tentativeTaxNetNonRefundableCredits",
      "description": "Total tentative tax after applying non-refundable credits, but before applying refundable credits.",
      "maxOf": [
        {
          "const": {
            "value": 0,
            "currency": "Dollar"
          }
        },
        {
          "subtract": {
            "from": "/totalTentativeTax",
            "amount": "/totalNonRefundableCredits"
          }
        }
      ]
    }

by imglorp

1 subcomments

The cheaper DSL is lisp. Cheap to parse, extend, transform. And you can have real macros and of course it's all executable.
Oh and the universe is written in lisp (but mostly perl).

by bluebxrry

1 subcomments

As someone who knows exactly as much COBOL as everyone else here, XML is what comes out of Java tooling as the handicap for the office that demands Windows tools; of course, it's barbaric. The real crime is sending me this article in HTML, AKA the Super Weenie Hut Jr of generalized markup languages. Adobe FrameMaker is the real text editor used to forge your generalized markup language. Rumor has it that when FrameMaker dropped Mac support, Jobs cut out Flash games in the next Super Weenie Hut Junior.

by matchagaucho

0 subcomment

Not to mention LLMs love XML.
The markup includes self-describing metadata and constantly reminds the GPT model of explicit context.

by scotty79

2 subcomments

How awesome would XML be if it didn't have attributes, namespaces and could close elements with </>

by PunchyHamster

0 subcomment

It's expensive to read, write, and parse. This is a perfect example of "if you only have a hammer..."

by SilentM68

0 subcomment

It's worth learning but hard to learn without the right material and a good teacher.

by LastTrain

0 subcomment

This looks fun but I’d rather have the free direct filing service they discontinued.

by alexfromapex

0 subcomment

One could argue JSON is even cheaper, along the same lines.

by wild_pointer

0 subcomment

It's not so cheap, in terms of maintenance and mental load

by tremon

0 subcomment

I don't get it. What about XML is domain-specific?

by akssri

0 subcomment

Obligatory,
https://www.schnada.de/grapt/eriknaggum-xmlrant.html

by conorcleary

0 subcomment

again with the DSL use more title

by klysm

0 subcomment

I hate everything about XML. I have lost weeks of my life to fixing whitespace sensitive bugs

by lerp-io

0 subcomment

why can't you people just use json?

by baq

1 subcomments

XML is better than yaml.
…note this doesn’t really say much. Both are terrible.

by dndn2

0 subcomment

[dead]

by elophanto_agent

0 subcomment

[dead]

by mikkupikku

0 subcomment

Yeah, but you get what you pay for.

by cl0ckt0wer

0 subcomment

The subtext here is that XML is a powerful tool when generating code with LLMs

by jgalt212

0 subcomment

> Tax logic needs a declarative specification
preach. I'm convinced there are cycles in the tax code that can be exploited for either infinite taxes or zero taxes. Can Claude find them?
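Once the logic is a declarative dependency graph, checking for cycles is mechanical. A minimal Python sketch using the standard-library `graphlib` module follows; the `find_cycle` helper and the toy fact names are made up, and nothing here says anything about the real tax code.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(dependencies):
    """Return one dependency cycle as a list of names, or None if acyclic."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError as err:
        # graphlib reports the detected cycle as the second element of args,
        # with the first and last node being the same.
        return err.args[1]
    return None


# Hypothetical facts: each name maps to the names it depends on.
acyclic = {"totalOwed": {"totalTax", "totalPayments"}, "totalTax": {"wages"}}
cyclic = {"deduction": {"taxableIncome"}, "taxableIncome": {"deduction"}}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```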