I feel like this post is overly hyperbolic about choices an open-source maintainer made years ago, choices that no one cared enough about to pay him to revisit.
I tried to understand what was wrong with Yjs, as I'm using it myself, but your complaint doesn't really seem to be with Yjs; it's with how Yjs interacts with ProseMirror in your use case. I can see why you're raising these points, but I'm having a hard time understanding why you don't consider alternatives to ProseMirror instead. Put another way, the reasoning seems to be "because this integration was bad, the source system must also be bad." I can't get behind that part of your article. It reads like a sunk-cost fallacy to me, reasoned out at another project's expense, but perhaps not. Hoping to hear back from you.
I’ve spent 3+ years fighting the same problems while building DocNode and DocSync, two libraries that do exactly what you describe.
DocSync is a client-server library that synchronizes documents of any type (Yjs, Loro, Automerge, DocNode) while guaranteeing that all clients apply operations in the same order. It’s a lot more than 40 lines because it handles many things beyond what’s described here. For example:
It’s local-first, which means you have to handle race conditions.
Multi-tab synchronization works via BroadcastChannel even offline, which is another source of race conditions that needs to be controlled.
DocNode is an alternative to Yjs, but with all the simplicity that comes from assuming a central server. No tombstones, no metadata, no vector clock diffing, supports move operations, etc.
I think you might find them interesting. Take a look at https://docukit.dev and let me know what you think.
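The ordering guarantee in DocSync boils down to a single authority per document handing out sequence numbers. A rough sketch of the idea (simplified, with hypothetical names, not the actual DocSync API):

```javascript
// Minimal sketch of server-side total ordering, assuming a single
// authority per document. Names are hypothetical, not DocSync's API.
class DocumentAuthority {
  constructor() {
    this.version = 0; // monotonically increasing sequence number
    this.log = [];    // ordered operation log
  }

  // Accept an op only if the client has seen everything before it.
  submit(op, lastSeenVersion) {
    if (lastSeenVersion !== this.version) {
      // Client is behind: return the ops it missed so it can rebase.
      return { accepted: false, missing: this.log.slice(lastSeenVersion) };
    }
    this.version += 1;
    this.log.push(op);
    return { accepted: true, version: this.version };
  }
}
```

Because every accepted operation gets a unique position in the log, every client that replays the log ends up applying the operations in the same order.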
Having watched that space for nearly 25 years now... of all the projects I've abandoned over the years, that is the one I am most grateful I gave up on. The gulf between "hey, what if we could collaboratively edit live" and what it takes to actually implement it is one of the largest mismatches between intuition and reality I know of. I had no idea.
It is very true that there are nuances you have to deal with when using CRDT toolkits like Yjs and Automerge: the merged state is "correct" as a structure, but may not match your schema. You have to deal with that in your application (ProseMirror does this for you, if you want it and can live with the invalid nodes being removed).
You can't have your cake and eat it with CRDTs, just as you can't with OT. Both come with compromises and complexities. Your job as a developer is to weigh them for the use case you are designing for.
One area in particular that I feel CRDTs may really shine is in agentic systems. The ability to fork+merge at will is incredibly important for async long running tasks. You can validate the state after an agent has worked, and then decide to merge to main or not. Long running forks are more complex to achieve with OT.
There is some good content in this post, but it's leaning a little too far towards drama creation for my taste.
We're the nerdiest bunch in the world, absolutely willing to learn and adapt the most arcane stuff if it gives us a real or perceived advantage, yet the fact that Google Docs style CRDTs have been almost entirely ignored by the profession speaks volumes about their actual usefulness.
It would have been nice if the article had compared Yjs with Automerge and others. json-joy, in particular, appears very impressive. https://jsonjoy.com/
The current implementation does suffer from the same issue noted for the Yjs-ProseMirror binding: collaborative changes cause the entire document to be replaced, which messes with some ProseMirror plugins. Specifically, when the client receives a remote change, it rolls back to the previous server state (without any pending local updates), applies the incoming change, and then re-applies its pending local updates; instead of sending a minimal representation of this overall change to ProseMirror, we merely calculate the final state and replace with that.
This is not an inherent limitation of the collaboration algorithm, just an implementation shortcut (as with the Yjs binding). It could be solved by diffing ProseMirror states to find the minimal representation of the overall change, or perhaps by using ProseMirror's built-in undo/redo features to "map" the remote change through the rollback & re-apply steps.
That's why I created prosemirror-collab-commit.
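For illustration, the rollback / apply-remote / re-apply-pending sequence described above can be sketched with plain functions standing in for ProseMirror transactions (this shows the idea, not the actual plugin code):

```javascript
// Sketch of the rollback / apply-remote / re-apply-pending approach
// described above, using an abstract `apply(doc, op)` function instead
// of ProseMirror transactions. Illustrative only.
function rebaseOnRemote(serverState, pendingLocalOps, remoteOp, apply) {
  // 1. Roll back to the last confirmed server state.
  let doc = serverState;
  // 2. Apply the incoming remote change first.
  doc = apply(doc, remoteOp);
  const newServerState = doc;
  // 3. Re-apply the client's pending local updates on top.
  for (const op of pendingLocalOps) {
    doc = apply(doc, op);
  }
  // The editor is then replaced with `doc` wholesale, which is the
  // shortcut described above: no minimal diff is produced.
  return { newServerState, doc };
}
```

With `apply = (doc, op) => doc + op`, rebasing pending `['B']` over a remote `'C'` from server state `'A'` yields a new server state `'AC'` and a displayed doc `'ACB'`.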
Or internally: [[HLC, "id:2", "parent_id", "id:1"], [HLC, "id:2", "type", "text"], ...]
Merging is easy, and it allows for atomic modifications without rebuilding the entire tree, as well as easy conflict resolution. We tag each entry with an HLC (clock, peer id). If the time difference between the two clocks is significant, we create a new field: [HLC, id, "conflict:" + key, old_value].
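Roughly, the merge step looks like this (a simplified sketch; the threshold, data shapes, and names here are illustrative, not the real implementation):

```javascript
// Hypothetical sketch of merging two writes to the same field, each
// tagged with a hybrid logical clock (HLC). Tuples are
// [hlc, id, key, value]; an HLC here is { time, peer }. The threshold
// and names are illustrative guesses.
const CONFLICT_THRESHOLD_MS = 5000;

function compareHLC(a, b) {
  if (a.time !== b.time) return a.time - b.time;
  return a.peer < b.peer ? -1 : a.peer > b.peer ? 1 : 0;
}

// Returns the winning tuple; when the two writes are far apart in
// time, also returns a "conflict:" tuple preserving the losing value.
function mergeField(local, remote) {
  const [hlcL, id, key] = local;
  const [hlcR] = remote;
  const winner = compareHLC(hlcL, hlcR) >= 0 ? local : remote;
  const loser = winner === local ? remote : local;
  const result = [winner];
  if (Math.abs(hlcL.time - hlcR.time) > CONFLICT_THRESHOLD_MS) {
    result.push([winner[0], id, 'conflict:' + key, loser[3]]);
  }
  return result;
}
```

When the clocks are close, this is plain last-writer-wins; when they are far apart, the old value survives in the conflict field so the application (or the user) can resolve it later.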
Yes, here I agree: the Yjs core is well written, while the plugins are "nice to have".
EDIT: I live in Seattle and it is 12:34, so I must go to bed soon. But I will wake up and respond to comments first thing in the morning!
    const result = step.apply(this.doc);
    if (result.failed) return false;
I suspect this doesn't work. There seems to be a conflict of interest in how Yjs's performance is described, since Yjs (along with Automerge) basically does the same thing.
But there's another issue that the author hasn't even considered, and possibly it's the root cause of why ProseMirror (which I'd never heard of before, btw) does the thing the author thinks is broken... Say you have a document like "请来 means 'please go'" and, independently, both the Chinese and English collaborators look at that and realise it's wrong. One changes it to "请走 means 'please go'" and the other changes it to "请来 means 'please come'". Those changes are in different spans, so a merge would blindly accept both, resulting in "请走 means 'please come'", which is entirely different from the original, but just as incorrect. Depending on how much other interaction the authors have, this could turn into a back and forth of both repeatedly changing it, so the merged document always ends up incorrect, even though individually both authors made valid corrections.
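You don't even need CRDT machinery to demonstrate this failure mode; any merge that blindly accepts edits in disjoint spans produces it:

```javascript
// Toy demonstration that two individually-valid edits in disjoint
// spans can merge into a sentence neither author intended.
function applyEdit(text, find, replace) {
  return text.replace(find, replace);
}

const original = "请来 means 'please go'";
// Author A fixes the Chinese to match the English:
const editA = (t) => applyEdit(t, '请来', '请走');
// Author B fixes the English to match the Chinese:
const editB = (t) => applyEdit(t, "'please go'", "'please come'");

// Each edit alone produces a correct sentence; the spans don't
// overlap, so a position-based merge happily applies both:
const merged = editB(editA(original));
// merged === "请走 means 'please come'": still wrong, in a new way.
```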
That example seems a bit hypothetical, but I've experienced the same thing in software development where two BAs had created slightly incompatible documents stating how some functionality should work. One QA guy kept raising bugs saying "the spec says it should do X", the dev would check the cited spec and change the code to match the spec. Weeks later, a different QA guy with a different spec would raise a bug saying "why is this doing X? The spec says it should do Y", a different dev read the cited spec, and changed the code. In this case, the functionality flip-flopped about 10 times over the course of a year and it was only a random conversation one day where one of them complained about a bug they'd fixed many times and the other guy said "hey, that bug sounds familiar" and they realised they were the two who'd been changing the code back and forth.
This whole topic is interesting to me, because I'm essentially solving the same problem in a different context. I've used CRDT so far, but only for somewhat limited state where conflicts can be resolved. I'm now moving to a note-editing section of the app, and while there is only one primary author, their state might be on multiple devices and because offline is important to me, they might not always be in sync. I think I'm probably going to end up highlighting conflicts, I'm not sure. I might end up just re-implementing something akin to Quill's system of inserts / deletes.
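For reference, Quill's delta model expresses a change as a list of retain / insert / delete operations. Applying one to a plain string is straightforward (a sketch of the idea, not Quill's actual implementation):

```javascript
// Sketch of applying a Quill-style delta (retain / insert / delete
// ops) to a plain string. Not Quill's implementation, just the idea.
function applyDelta(text, ops) {
  let out = '';
  let pos = 0;
  for (const op of ops) {
    if (op.retain !== undefined) {
      out += text.slice(pos, pos + op.retain); // keep characters as-is
      pos += op.retain;
    } else if (op.insert !== undefined) {
      out += op.insert;                        // add new characters
    } else if (op.delete !== undefined) {
      pos += op.delete;                        // skip deleted characters
    }
  }
  return out + text.slice(pos); // implicit trailing retain
}
```

For example, `applyDelta('Hello world', [{ retain: 6 }, { delete: 5 }, { insert: 'there' }])` yields `'Hello there'`. The appeal for multi-device sync is that deltas compose and can be stored as an append-only log.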
I am a long-time CKEditor dev; I was responsible for implementing real-time collaboration in the editor, including the OT implementation.
Regarding the first part of your article. Guess what - CKEditor would output "" :). And even better, if the user who deleted all does undo, you'd get "u" where it was typed originally.
However, I fully agree that for every algorithm you will be able to find a scenario where it fails to resolve a conflict the way the user expects. But we cannot ask the user to resolve a conflict manually every time one happens.
Offline editing, as you correctly observed, is more difficult, because the conflicts pile up, and multiple wrong decisions can result in a horrifying final result. I fully agree that this is not only an algorithmic problem but also a UX problem. Add to this that in many apps you will also have other (meta)data that has to be synced (besides the document data).
CKEditor is, in theory, ready for offline editing. From the algorithm's POV, offline is no different from a very, very slow connection (*). In the end, you receive a set of operations to transform against another set of operations. However, currently we put the editor in a read-only state when the connection breaks. We are aware that even if all transformations resolve as expected, the end result may still be "weird". And even if the end result is actually as expected, the amount of changes may be overwhelming to a person who just got their connection back, so it may still be good to provide some UI/UX to help them understand what happened.
(*) - that is, unless the editing session on the server has already ended and, simply put, you don't have anything to connect to (to pull operations from).
Regarding OT. I have a feeling that one mistake most people make is that they take OT as it is described in some paper or article and don't want to iterate on the idea. To me, it is not just one algorithm, but rather an idea of how to think about and manage changes happening to the data.
For CKEditor, from the very beginning, we were forced to innovate over typical OT implementations. First of all, we focused on users' intentions. Second, we needed to adapt it to a tree data structure. These challenges shaped my way of thinking: OT is "an idea", and you need to adapt it to your project. Someone here asked if there's a library for OT because they want to use it for spreadsheets. I'll say: write it on your own and adapt it to spreadsheets. You'll discover that maybe you don't need some operations, or maybe you need new operations dedicated to spreadsheets. This is what we ended up doing. @Reinmar already posted this link here, but we describe our approach at: https://ckeditor.com/blog/lessons-learned-from-creating-a-ri....
Circling back to your example of typing while a whole sentence is removed. This is how you innovate over OT. To us, such a deletion is not deleting N individual characters starting from position P. The intention is to remove a continuous range of text. If someone writes inside the range, that just changes the boundary of the content to remove; we surely don't want to show some random letters after the deletion happens. We account for that and make modifications in our OT implementation.
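Stripped down to flat text, the transform looks roughly like this (an illustration of the idea, not our actual implementation):

```javascript
// Sketch of transforming a pending range deletion against a
// concurrent insert. If the insert lands inside the range, the range
// grows so the inserted text is removed too, matching the deleting
// user's intention. Illustration only, not CKEditor's code.
function transformDeleteAgainstInsert(del, ins) {
  // del: { start, end } (end exclusive), ins: { pos, length }
  if (ins.pos <= del.start) {
    // Insert before the range: shift the whole range right.
    return { start: del.start + ins.length, end: del.end + ins.length };
  }
  if (ins.pos < del.end) {
    // Insert inside the range: expand the range to swallow it.
    return { start: del.start, end: del.end + ins.length };
  }
  // Insert after the range: nothing to do.
  return del;
}
```

The naive alternative, treating the deletion as N character deletions, would leave the concurrently typed characters stranded in the middle of the removed sentence.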
Similarly with positions in the document. In CKEditor, you can use LivePositions and LiveRanges, which are basically paths in the tree data structure. Every position is transformed by every operation too. Many of our features build on that.
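A live position is essentially a stored path that gets remapped through every operation. A toy version for sibling inserts (illustrative only, not the LivePosition API):

```javascript
// Sketch of a "live position" in a tree: a path of child indexes that
// gets remapped whenever siblings are inserted before it. Illustration
// of the idea, not CKEditor's LivePosition implementation.
function mapPathThroughInsert(path, insertPath, howMany) {
  // The insert affects us only if it happened in the same parent and
  // at or before our index at that depth.
  const depth = insertPath.length - 1;
  const sameParent =
    path.length > depth &&
    insertPath.slice(0, depth).every((idx, i) => idx === path[i]);
  if (sameParent && insertPath[depth] <= path[depth]) {
    const mapped = path.slice();
    mapped[depth] += howMany; // shift right past the inserted siblings
    return mapped;
  }
  return path;
}
```

So a position at path [1, 4] becomes [1, 6] after two siblings are inserted at [1, 2], while inserts in other parents or after the position leave it untouched.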
So, my take here is: don't bash OT because your experience is based on some simple implementations. Possibly the same applies to Yjs. Don't bash CRDTs because Yjs is doing something badly.
And some final words regarding the second part.
We also follow the same pattern your diagram shows in the "How the simple thing works" section. As I was reading through the article and looking at the provided examples, it was hard for me not to think that what's happening is some kind of OT variant, maybe simplified, or maybe adapted to some specific cases. There are strong similarities between what you described and CKEditor 5, and we use OT. Looking at this from a top-level view, I could say, "well, we do the same". We have the same loop with conflict resolution; we just call "rebase" a "transformation", and instead of "steps" we have "operations".
Also, you say it is 40 LOC, but how much magic happens in `step.apply()`? How much of the architecture was built to make that possible? Even Marijn makes this comment here: https://news.ycombinator.com/item?id=47409647.
For comparison, this is CKEditor's file that includes the OT functions to transform operations: https://github.com/ckeditor/ckeditor5/blob/master/packages/c.... It's 2,600 LOC (!), but at least most of it is comments :). Again, the basic idea behind OT is very simple (and this implementation could be simpler; we also learned a lot in the process). It's up to you how deep you want to go into solving "user intention" issues.
I really disagree with this article: despite their protestations, I feel their issue is with Yjs, not with CRDTs in general.
Namely, their proposed solution:
1. For each document, there is a single authority that holds the source of truth: the document, applied steps, and the current version.
2. A client submits some transactional steps and the lastSeenVersion.
3. If the lastSeenVersion does not match the server’s version, the client must fetch recent changes(lastSeenVersion), rebase its own changes on top, and re-submit.
(3a) If the extra round-trip for rebasing changes is not good enough for you, prosemirror-collab-commit does pretty much the same thing, but it rebases the changes on the authority itself.
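The whole loop in steps 1-3 is small enough to sketch end to end (hypothetical names; network calls shown synchronously; `rebase` stands in for ProseMirror's step mapping):

```javascript
// Sketch of the client side of the submit / rebase / re-submit loop
// from the steps above. `server` is the single authority; `rebase` is
// a stand-in for mapping steps through the fetched changes.
// Hypothetical names, illustration only.
function commit(server, steps, lastSeenVersion, rebase) {
  for (;;) {
    const res = server.submit(steps, lastSeenVersion);
    if (res.accepted) return res.version;
    // Version mismatch: fetch what we missed, rebase on top, retry.
    const missed = server.changesSince(lastSeenVersion);
    steps = rebase(steps, missed);
    lastSeenVersion = res.version;
  }
}
```

The "80% of a CRDT" claim lives entirely in that `rebase` call: it has to transform the client's steps against the concurrent changes, which is exactly a merge function.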
This is 80% of a CRDT all by itself! Step 3 there, "rebase its own changes on top", is doing a lot of work and is essentially the core merge function of a CRDT. Also, the steps needed to get the rest of the way to a full CRDT are the solution to their logging woes: tracking every change and its causal history, which is exactly what is needed to re-run any failing trace and debug it. Here's a modified version of the steps of their proposed solution:
1. For each document, every participating member holds the document, applied steps, and the current version.
2. A client submits (to the "server" or p2p) some transactional steps and the lastSeenVersion.
3. If the lastSeenVersion does not match the "server"/peer’s version, the client must fetch recent changes(lastSeenVersion). The server still accepts the changes. Both the client and the "server" rebase the changes of one on top of the other. Which one gets rebased on top of the other can be determined by change depth, author id, real-world timestamp, "server" timestamp, whatever. If it's by server timestamp, you get the exact behavior from the article's solution.
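The "whatever" in step 3 just needs to be a total order both sides compute identically, e.g. a comparator over (server timestamp, local timestamp, author id), with offline changes sorting after server-stamped ones (a sketch with made-up field names):

```javascript
// Sketch of a deterministic tie-break both peers can compute: prefer
// server timestamps when present, fall back to local timestamps
// (which always sort after server-stamped changes), then break ties
// by author id. Field names are made up for illustration.
function changeKey(c) {
  // Server-stamped changes sort before any offline (local-only) change.
  return c.serverTs !== undefined ? [0, c.serverTs] : [1, c.localTs];
}

function compareChanges(a, b) {
  const [ka0, ka1] = changeKey(a);
  const [kb0, kb1] = changeKey(b);
  if (ka0 !== kb0) return ka0 - kb0;
  if (ka1 !== kb1) return ka1 - kb1;
  return a.authorId < b.authorId ? -1 : a.authorId > b.authorId ? 1 : 0;
}
```

Sorting with server timestamps first reproduces the article's server-ordered behavior exactly; the fallback just extends it to the offline case.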
If you store the causal history of each change, you can also replay the history of the document and see how every client saw the document change, exactly as it happened. This is the perfect debugging tool! CRDTs can store this causal history very efficiently using run-length encoding: diamond-types has done really good work here, with an explanation of their internals at https://github.com/josephg/diamond-types/blob/master/INTERNA...
In conclusion, the article seems to be really down on CRDTs in general, whereas I would argue that they're really down on Yjs and have written 80+% of a CRDT without meaning to, and would be happier if they finished the remaining 20%. You can still have the exact behavior they have now by using server timestamps when available and falling back to local timestamps that always sort after server timestamps when offline. A 100% causal-history CRDT would also give them much better debugging, since they could replay whatever view of history they want, over and over. The only downside is extra storage, which I think diamond-types has shown can be very reasonable.
It is a collaborative markdown file that also renders very fast. So far so good.
And then... it somehow adds Javascript? And React? And somehow AI is involved? I truly don't understand what it is, and I am (I think) the end customer...
edit: I tried it and I just get "Loading..." forever. So, anyway, next time.