- The threading story here is what grabbed my attention. Pass-by-value with copy-on-write means you get data-race immunity without any locks or channels. You just pass data to a thread and mutations stay local. That's a genuinely useful property.
I've worked on systems where we spent more time reasoning about shared state than writing actual logic. The typical answer is "just make everything immutable" but then you lose convenient imperative syntax. This sits in an interesting middle ground.
Curious about performance in practice. Copy-on-write is great until you hit a hot path that triggers lots of copies. Have you benchmarked any real workloads?
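To make the "mutations stay local" point concrete, here is a minimal Python sketch of a copy-on-write handle. The names `CowList`, `_Shared`, `share`, and `set` are hypothetical and this is not Herd's implementation. The lock only stands in for the atomic reference count a real runtime would likely use.

```python
import threading


class _Shared:
    """Buffer plus a count of the handles that can see it."""

    def __init__(self, buf):
        self.buf = buf
        self.owners = 1
        self.lock = threading.Lock()  # stand-in for an atomic reference count


class CowList:
    """Hypothetical copy-on-write list handle (illustrative only)."""

    def __init__(self, items):
        self._box = _Shared(list(items))

    def share(self):
        # "Pass by value" in O(1): the new handle points at the same buffer.
        with self._box.lock:
            self._box.owners += 1
        other = CowList.__new__(CowList)
        other._box = self._box
        return other

    def set(self, index, value):
        box = self._box
        with box.lock:
            if box.owners > 1:
                # Another handle can still see this buffer: copy before writing.
                box.owners -= 1
                self._box = _Shared(list(box.buf))
        self._box.buf[index] = value

    def to_list(self):
        return list(self._box.buf)


def worker(value, results):
    value.set(0, 99)  # the write lands in the worker's private copy
    results.append(value.to_list())


original = CowList([1, 2, 3])
results = []
thread = threading.Thread(target=worker, args=(original.share(), results))
thread.start()
thread.join()
print(original.to_list(), results)  # [1, 2, 3] [[99, 2, 3]]
```

The hot-path worry shows up in `set`: every write to a still-shared buffer pays for a full copy, while writes to a uniquely owned buffer happen in place.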
- (Edit: in the old post title:) "everything is a value" is not very informative. That's true of most languages nowadays. Maybe "exclusively call-by-value" or "without reference types."
I've only read the first couple paragraphs so far but the idea reminds me of a shareware language I tinkered with years ago in my youth, though I never wrote anything of substance: Euphoria (though nowadays it looks like there's an OpenEuphoria). It had only two fundamental types. (1) The atom: a possibly floating point number, and (2) the sequence: a list of zero or more atoms and sequences. Strings in particular are just sequences of codepoint atoms.
It had a notion of "type"s which were functions that returned a boolean 1 only if given a valid value for the type being defined. I presume it used byte packing and copy-on-write or whatever for its speed boasts.
https://openeuphoria.org/ - https://rapideuphoria.com/
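The two-type model and the "type as a predicate" idea described above can be sketched roughly as follows. This is Python, not Euphoria syntax, and the function names are made up for illustration.

```python
def is_atom(x):
    # An atom is a single number (bool is excluded although Python treats it as int).
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def is_sequence(x):
    # A sequence is a list of zero or more atoms and sequences.
    return isinstance(x, list) and all(is_atom(e) or is_sequence(e) for e in x)


def is_string(x):
    # A user-defined "type": a predicate that accepts only valid values,
    # here a flat sequence of integer codepoints.
    return isinstance(x, list) and all(
        isinstance(e, int) and not isinstance(e, bool) and 0 <= e <= 0x10FFFF
        for e in x
    )


hello = [ord(c) for c in "hello"]  # a string is just a sequence of codepoints
print(is_sequence(hello), is_string(hello), is_string([1.5]))  # True True False
```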
- I have implemented similar behavior in some of my projects. In one, I also implemented 'cursors' that point to some part of a value bound to a variable and allow you to change that part of the variable's value. I have used this to implement program transformations on abstract parse (syntax) trees [1]. I have also implemented a tree-based dictionary where only the part of the tree that needs to change is modified [2]. I have also started working on a language that is based on this, but also attempts to add references with defined behavior [3].
[1] https://github.com/FransFaase/IParse/?tab=readme-ov-file#mar...
[2] https://www.iwriteiam.nl/D1801.html#7
[3] https://github.com/FransFaase/DataLang
by discarded1023
4 subcomments
- At the risk of telling you what you already know and/or did not mean to say: not everything can be a value. If everything is a value then no computation (reduction) is possible. Why? Because computation stops at values. This is traditional programming language/lambda calculus nomenclature and dogma. See Plotkin's classic work on PCF (~ 1975) for instance; Winskel's semantics text (~ 1990) is more approachable.
Things of course become a lot more fun with concurrency.
Now if you want a language where all the data thingies are immutable values and effects are somewhat tamed but types aren't too fancy etc. try looking at Milner's classic Standard ML (late 1970s, effectively frozen in 1997). It has all you dream of and more.
In any case keep having fun and don't get too bogged down in syntax.
- This sounds quite similar to pure functional languages like Haskell, where a function call cannot have any side effect.
But those go further in that they don't even have any mutable data. Instead of
var foo = { a: 1 };
var bar = foo; // make a copy of foo
set bar.a = 2; // modify bar (makes a copy)
Haskell has
foo = Baz { a = 1 }
bar = foo { a = 2 } -- make a modified copy of foo
by netbioserror
1 subcomments
- Nim has a similar, strong preference for value semantics. However, its dynamic heap types (strings, seqs, tables) are all implemented as wrappers that hide the internal references and behave with value semantics by default, unless explicitly escape hatched. It makes it incredibly easy to manipulate almost any data in a functional, expression-oriented manner, while preserving the speed and efficiency of being backed by a doubling array-list.
- My hobby language[1] also has no reference semantics, very similar to Herd. I think this is a really interesting point in the design space. A lot of complexity goes away when it's only values, and there are real languages like classic APL that work this way. But there are some serious downsides.
In practice I have found that it's very painful to thread state through your program. I ended up offering global variables, which provide something similar to but worse than generalized reference semantics. My language aims for simplicity so I think this may still be a good tradeoff, but it's tricky to imagine this working well in a larger user codebase.
I like that having only value semantics allows us, internally, to use reference counted immutable objects to cut down on copying; we both pass-by-reference internally and present it as pass-by-value to the programmer. No cycle detection needed because it's not possible to construct cycles. I use an immutable data structures library[2] so that modifications are reasonably efficient. I recommend trying that in Herd; it's almost always better than copy-on-write. Think about the Big-O of modifying a single element in an array, or building up a list by repeatedly appending to it. With pure COW it's hard to have a large array at all--it takes too long to do anything with it!
For the programmer, missing reference semantics can be a negative. Sometimes people want circular linked lists, or to implement custom data structures. It's tough to build new data structures in a language without reference semantics. For the most part, the programmer has to simulate them with arrays. This works for APL because it's an array language, but my BASIC has less of an excuse.
I was able to avoid nearly all reference counting overhead by being single threaded only. My reference counts aren't atomic so I don't pay anything but the inc/dec. For a simple language like TMBASIC this was sensible, but in a language with multithreading that has to pay for atomic refcounts, it's a tough performance pill to swallow. You may want to consider a tracing GC for Herd.
[1] https://tmbasic.com
[2] https://github.com/arximboldi/immer
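A rough way to see the Big-O point about building a list by repeated appends. This is a Python sketch under a stated assumption: it models the worst case for copy-on-write, where every append has to copy because the previous version is still referenced. The cons list is only the simplest persistent structure and stands in for the tree-based ones a library would provide. Function names are hypothetical.

```python
def build_by_copying(n):
    """Append n items, copying the whole array on every append."""
    items, copied = (), 0
    for i in range(n):
        copied += len(items)  # every existing element is copied again
        items = items + (i,)
    return items, copied


def build_by_sharing(n):
    """Prepend n items to a persistent cons list, reusing the old list as the tail."""
    node, allocated = None, 0
    for i in range(n):
        node = (i, node)  # one new cell; the previous list is shared untouched
        allocated += 1
    return node, allocated


_, copied = build_by_copying(1000)
_, allocated = build_by_sharing(1000)
print(copied, allocated)  # 499500 1000
```

The copying version does n(n-1)/2 element copies in total, while the sharing version allocates one cell per item.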
- Syntax comment: in your control structures you use a keyword ("do", "then") to start a block as well as wrapping the block in parentheses. This feels superfluous. I suggest sticking with either keywords or parens to delineate blocks, not both.
- You should check out Perceus! https://www.microsoft.com/en-us/research/wp-content/uploads/...
by Panzerschrek
1 subcomments
- But what if mutation is intended? How to pass a mutable reference into a function, so that it can change the underlying value and the caller can observe these changes? What about concurrent mutable containers?
- The article mentions shallow copy, but does this create a persistent immutable data structure? Does it modify all nodes up the tree to the root?
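For reference, this is what "copying the nodes up to the root" (path copying) looks like in a small Python sketch. Whether Herd does this is exactly the open question here. `set_in` is a hypothetical helper, and nested dicts stand in for tree nodes.

```python
def set_in(tree, path, value):
    """Return a new tree with `value` at `path`, copying only the nodes along that path."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    new_node = dict(tree)  # shallow copy of this one node
    new_node[key] = set_in(tree[key], rest, value)
    return new_node


old = {"a": {"x": 1, "y": 2}, "b": {"z": 3}}
new = set_in(old, ["a", "x"], 10)

print(old["a"]["x"], new["a"]["x"])  # 1 10
print(new["b"] is old["b"])          # True: the untouched subtree is shared
```

Only the root and the `"a"` node are copied. Everything off the path is shared between the old and new versions, which is what makes the old version persistent.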
by travisgriggs
1 subcomments
- Curious if Erlang/Elixir isn’t the same sort of thing? Or am I misunderstanding the semantics of “pass by value”?
- the pipe-equal operator is pretty neat, don't think I've seen any other language do that.
by anacrolix
1 subcomments
- Nobody has heard of persistent data structures?!
by bananasandrice
0 subcomment
- [dead]
- > In herd, everything is immutable unless declared with var
So basically everything is var?
- Small programming language with everything passed by value? You reinvented C?