Hacker News

Exploring the Fragmentation of Wayland, an xdotool adventure

94 points by viraptor

by ziotom78

Xdotool and Xmodmap are the two main reasons why, after a few months of running Wayland+keyd+dotool, I went back to X11. I found it really hard to get all of the following working at once:
- Italian layout for my keyboard with heavily-customized AltGr keys for mathematical notation (in X11 it's just a matter of having a Xmodmap file)
- Using Espanso for many common shortcuts like :date: (current YYYY-MM-DD date) and :pidigits:
- A reasonable way to run Windows in a VM while using an Italian layout for my keyboard
- The possibility to use automation scripts using something as close as possible to xdotool
- Sometimes I use my home keyboard, sometimes I use my work keyboard, and sometimes I use my laptop keyboard. I expect the system to work in the same way regardless of my input device
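Espanso itself is configured through YAML match files, so the following is not Espanso code. It is a minimal Python sketch of what the two triggers above compute, using a hypothetical `expand()` helper and trigger table. The π digits come from Gibbons' unbounded spigot algorithm, which is an assumption on our part, not necessarily how the commenter's `:pidigits:` match is implemented.

```python
from datetime import date
from itertools import islice
from typing import Callable, Dict, Iterator


def pi_digits() -> Iterator[int]:
    """Yield the decimal digits of pi one by one (Gibbons' unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is settled: emit it and shift the state by 10.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not settled yet: fold in the next term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


def pi_string(decimals: int = 10) -> str:
    """Format pi with the requested number of decimals, e.g. '3.14'."""
    digits = list(islice(pi_digits(), decimals + 1))
    return f"{digits[0]}." + "".join(str(d) for d in digits[1:])


# Hypothetical trigger table mirroring the two matches mentioned above.
TRIGGERS: Dict[str, Callable[[], str]] = {
    ":date:": lambda: date.today().isoformat(),  # YYYY-MM-DD
    ":pidigits:": lambda: pi_string(10),
}


def expand(text: str) -> str:
    """Replace every known trigger in text with its expansion."""
    for trigger, make in TRIGGERS.items():
        if trigger in text:
            text = text.replace(trigger, make())
    return text
```

For example, `expand("pi is :pidigits:")` gives `"pi is 3.1415926535"`. The hard part the comment describes is not this logic but getting the synthetic keystrokes delivered reliably under Wayland.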
It's not that Wayland prevents one from doing all this stuff, but the available solutions were fragile and complicated, and they took me so long to figure out, only to work partially... For instance, to make keyd work as expected, I was forced to set up my Italian keyboard as an English keyboard and then remap all the keys manually... And every time I plugged in a new keyboard, I had to tell keyd to enable my customizations on it, because telling it to apply the layout to any keyboard conflicted with VirtualBox.
I understand that X11 is too complicated to be maintained, but from a user's perspective, so far I am far more efficient in X11.

by clemens3

I am surprised, 87 comments and so far nobody has mentioned X11Libre.
They claim 30 developers right now. I don't know if that's true or if they are any good, but when the time comes to update my X server, I will have a look at them, just to show my support.
Ah, eh, Wayland: it too got pitched to me recently. But I don't see it improving any use case I have; instead it actively disables functionality I need. So it's a big no-go, and I pass. Why should I change anything in my single-user distro? Thank you.
Some of us thought to ourselves in the 90s: I don't have to use Windows, there is something called Linux. I remember installing SuSE from 5 1/4" floppies (A LOT of them), configuring scanlines, and hoping my CRT would survive the first startx command. I gave no one the authority to decide for me what gets deprecated and what doesn't. The only person who can do that is me.
I use Linux because of people like Linus Torvalds back then and Jordan Sissel today, who do the right thing out of passion and not in expectation of financial or other reward. They build things just for themselves and then share them, because they might be useful to others too. It's not a 9-to-5 job for them.
People like Lennart Poettering or some other kids who want to coax me into accepting their toys are a reason for me to run away as fast as possible from such shenanigans. I survived editing the scan lines, I don't need software from IBM.
Regarding Wayland and GUIs: GUIs have been much worse than command-line and batch environments for automation. xdotool is kind of the best we have (basically creating macros, like in an editor, for the whole system), but neither X11 nor the applications are really designed for automation. AppleScript and D-Bus never really worked out either. What will happen now, with text-based gen-AI models, is that we will go back to good old text (plus speech) interfaces. We will just tell the AI (e.g. in a text box) what we want, and it will find a way to deliver whatever we asked for. Then the AI will finally control a web browser properly for us, but we won't need to see any of that.

by dvntsemicolon

You'll never find me saying that Wayland development is good in its present state. I think it's a mess and it has a lot of issues.
But let's be honest about Xorg. The overwhelming majority of people who worked on Xorg are now developing Wayland. Why? Because developing Xorg is a massive pain in the butt. It is a 400K LOC behemoth of a project and it has a ridiculous amount of technical debt. I would have to imagine that if the Xorg developers thought they could fix Xorg, they would do that instead of making a new thing.

by cycomanic

I don't understand a lot of the complaints. It asks for a remote connection? That's because of XWayland (which is X11 inside), not Wayland, AFAIK. As for all the comments about how weird that is on a single system: the whole X server/client architecture always felt like running on a remote system anyway.
I actually like that compositors differ from each other much more than WMs used to, because that allows people to experiment much more. Also, let's not forget that X was a plethora of different plugins and incompatibilities. The reason many didn't encounter that was that almost everyone was running Xorg with all plugins. That said, I still remember the hoops one had to jump through to get transparency etc. You needed a compositor, and not all compositors were compatible with all WMs (and all had different capabilities).
That said, I do also wish that the protocol would evolve faster. My impression is that if it weren't for the wlroots people, not much would have happened, especially because the GNOME guys seem to prefer implementing something just for themselves rather than using or pushing the standard.

by erlkonig

The general movement of UI paradigms has been from one technology to the next, with a focus on backwards compatibility. Almost amusingly so at times, but this is how all the earlier users and use cases can most easily progress. E.g.:
* Hollerith cards and sundry + printer
* printing teletype
* dumb (video) terminal
* smart (cursor-addressable) terminal
* images of smart terminals
* images of smart terminals with color (businesses resisted color for years)
* ... ?
And in the meantime we have seen support evolve for modelling things visually and for working with more descriptive protocols, or even function-defining protocols that raise the level of abstraction when talking to the display server in real time. Here, "abstracted" means something that can be sent over the network instead of using a local buffer. These are in a less strict order than the foregoing...
* text, color plotters, VDST, and all that other old slow stuff
* [skipping a bit up through bitmapped greyscale graphics]
* bitmapped color graphics
* abstracted 2D graphics (-> W and X)
* abstracted 3D graphics (OpenGL + GLX)
* dynamically client-extendable remote graphics servers (NeWS, mostly 2D)
* ... ?
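Some rough, illustrative arithmetic for why "abstracted" (sent over the network as drawing requests) differs so much from shipping a local pixel buffer. The numbers assume an uncompressed 1920×1080 frame at 4 bytes per pixel and 60 Hz, which are our own example choices; the request size is that of a core X11 PolyFillRectangle carrying a single rectangle (12-byte header plus 8 bytes per rectangle):

```latex
\begin{aligned}
\text{one raw frame} &= 1920 \times 1080 \times 4\,\text{B} = 8\,294\,400\,\text{B} \approx 7.9\,\text{MiB} \\
\text{at } 60\,\text{Hz} &= 8\,294\,400\,\text{B} \times 60\,\text{s}^{-1} = 497\,664\,000\,\text{B/s} \approx 498\,\text{MB/s} \\
\text{one \texttt{PolyFillRectangle}} &= 12\,\text{B} + 1 \times 8\,\text{B} = 20\,\text{B}
\end{aligned}
```

The gap only holds while the content really is describable as primitives; photos, video, or client-side rendered widgets push a protocol back toward moving pixels.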
So here I am, waiting for the next stage in these. My hypothesis is that we'll finally get something with abstracted 3D network graphics (display lists in GLX, but accelerated with something like XCB?), where the primary display coördinate space is (x, y, z) instead of (x, y), and where the client can push some code to the remote server and raise the abstraction on the fly. Where maybe we'd be able to set permissions on the objects in that space and share it among users live. Where the 2D apps would live inside the 3D space instead of the other way around. Something for the 2000s instead of the familiar abilities provided in 1990.
But instead, Wayland. Wayland, which is not backwards compatible with X. Wayland, which is 2D at its heart. Wayland, another 1990 era graphics system with a super thin offering of features for actual end users (not devs) which come at substantial cost in lost X features. Wayland, which resists the one user doing things we've long thought of as normal - in the name of "security".
Wayland is not what I've been waiting for.

by codedokode

Yes the international keyboard support is pretty bad in both X and Wayland. For example, try using Left Shift to switch to layout 1 (while retaining its shift functionality) without patching Gnome. It's impossible.
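What is being asked for is a classic tap-versus-hold rule: a bare tap of Left Shift switches to layout 1, while holding it together with another key still produces a shifted character. Below is a minimal Python state machine for that rule. The class name, key names, and layout list are hypothetical, characters are simply the upper-cased key names, and real implementations (xkb options, keyd, compositor code) also have to handle timeouts and key repeat, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TapToSwitch:
    """Hypothetical model: Left Shift selects layout 1 when tapped on its own,
    but still behaves as Shift when another key is pressed while it is held."""

    layouts: Tuple[str, ...] = ("us", "gr")
    current: int = 1                 # start on layout 2 ("gr")
    shift_held: bool = False
    shift_was_used: bool = False     # did another key go down during the hold?
    typed: List[Tuple[str, str]] = field(default_factory=list)

    def press(self, key: str) -> None:
        if key == "LeftShift":
            self.shift_held = True
            self.shift_was_used = False
            return
        if self.shift_held:
            self.shift_was_used = True
        char = key.upper() if self.shift_held else key
        self.typed.append((self.layouts[self.current], char))

    def release(self, key: str) -> None:
        if key != "LeftShift":
            return
        if not self.shift_was_used:
            self.current = 0         # bare tap: switch to layout 1
        self.shift_held = False
```

Holding Shift while typing `a` yields `("gr", "A")` and leaves the layout alone; a bare tap then switches to `"us"`. The logic is small; the comment's point is that there is no supported place to hook it in without patching the desktop.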
Or try making a virtual on-screen keyboard that sends characters that are not in the layout (for example, Greek characters with a US keyboard layout). Again, you cannot do that, and it's difficult to understand why a virtual keyboard has to be restricted to the characters printed on a physical keyboard.
And if you want to use remote desktop from a computer with Greek layout to a computer with US layout... again, it's going to be difficult. X server-based remote apps would simply temporarily patch the layout and add non-existent keys there to be able to report the key press on a remote machine with different layout. xdotool, I think, used the same hack to input characters that are not in the layout.
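The temporary-remap trick can be simulated on an in-memory keymap (keycode to keysym) to show the sequence of steps: look the character up, bind its keysym to a spare keycode if the layout lacks it, press and release that keycode, then restore the original binding. This is a sketch under stated assumptions: the function names, the spare keycode 200, and the action tuples are hypothetical, and a real tool would also have to wait for clients to pick up the mapping change. The keysym rule itself follows the X11 convention (printable Latin-1 code points are their own keysyms; other Unicode characters use 0x01000000 + code point), although legacy named keysyms also exist for many characters.

```python
from typing import Dict, List, Tuple


def keysym_for_char(ch: str) -> int:
    """X11 keysym for one character: printable Latin-1 code points are their
    own keysyms; other Unicode characters use 0x01000000 + code point."""
    cp = ord(ch)
    if 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF:
        return cp
    return 0x01000000 + cp


def type_via_spare_keycode(
    text: str, keymap: Dict[int, int], spare: int
) -> List[Tuple[str, int]]:
    """Simulate the temporary-remap trick on an in-memory keymap
    (keycode -> keysym) and return the actions a real tool would perform."""
    by_keysym = {sym: code for code, sym in keymap.items() if code != spare}
    original = keymap.get(spare)
    actions: List[Tuple[str, int]] = []
    remapped = False
    for ch in text:
        sym = keysym_for_char(ch)
        code = by_keysym.get(sym)
        if code is None:
            if keymap.get(spare) != sym:
                keymap[spare] = sym          # bind the missing keysym
                actions.append(("remap", spare))
                remapped = True
            code = spare
        actions.append(("press", code))
        actions.append(("release", code))
    if remapped:                             # put the layout back afterwards
        if original is None:
            keymap.pop(spare, None)
        else:
            keymap[spare] = original
        actions.append(("restore", spare))
    return actions
```

With a keymap of `{38: 0x61}` (keycode 38 is `a` on a typical evdev-based X keymap), typing `"aα"` presses keycode 38 directly, then remaps the spare keycode to 0x10003B1 for `α`, and finally restores the keymap.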

by RVuRnvbM2e

This is analogous to calling unix account separation "fragmentation". Why can't I just run all my services as root? It has worked for years!?
The answer is that it is a fragile, unmaintainable security nightmare.
Wayland has separation of concerns to fix that problem, with the tradeoffs described in the blog post.

by rweichler

Serious question: I keep reading from experienced people everywhere that Wayland sucks. I need to start learning one of these stacks; should I go with Wayland or with Xorg?
If I didn't know any better I would learn the Wayland API. Just like how: if I didn't know any better I would learn Swift (instead of Objective-C). But thankfully I do know better and I know to stay far away from Swift [1]. Is it the same deal with Xorg/Wayland? It seems like noobs prefer Wayland but the experts prefer Xorg.
1. https://youtu.be/ovYbgbrQ-v8?t=1456

by Hannah203

The post shows a common issue with Wayland. The protocol is there, but each compositor handles things a bit differently, so tools like xdotool end up running into gaps or inconsistent behavior.
Wayland is improving, but there is still a difference between what the spec supports and what developers can rely on across the ecosystem.
A good look at why automation on Wayland still feels rough for some users.

by sho_hn

tl;dr Wayland doesn't have a good set of universally adopted input emulation and UI automation protocols yet, which makes a portable UI automation utility with the full scope of `xdotool` impossible to write. Work remains to be done to close this gap.
The X protocols in this area were not very good, but due to there being a single viable implementation you could rely on them being present (similar to using MSIE-only features in that browser's dominant era).
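To make the portability gap concrete, here is a minimal Python sketch of an X11-only automation step: it builds xdotool command lines (`search --name ... windowactivate --sync` using xdotool's command chaining, then `type --delay`) and guesses the session type from the `WAYLAND_DISPLAY` and `DISPLAY` environment variables. The helper names are hypothetical. Under a Wayland session, xdotool typically only reaches XWayland windows, which is exactly the missing piece described above.

```python
import shutil
import subprocess
from typing import List, Mapping, Sequence


def session_type(env: Mapping[str, str]) -> str:
    """Best-effort guess of the display server from environment variables."""
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


def focus_and_type(window_regex: str, text: str, delay_ms: int = 50) -> List[List[str]]:
    """Build (but do not run) xdotool commands: focus a window, then type."""
    return [
        ["xdotool", "search", "--name", window_regex, "windowactivate", "--sync"],
        ["xdotool", "type", "--delay", str(delay_ms), text],
    ]


def run(commands: Sequence[Sequence[str]], env: Mapping[str, str]) -> None:
    """Execute the commands, warning when the session is not plain X11."""
    if session_type(env) == "wayland":
        # xdotool speaks X11, so here it usually only sees XWayland clients.
        print("warning: Wayland session, native Wayland windows are out of reach")
    if shutil.which("xdotool") is None:
        raise FileNotFoundError("xdotool is not installed")
    for cmd in commands:
        subprocess.run(list(cmd), check=True)
```

A call such as `run(focus_and_type("Firefox", "hello"), os.environ)` works on X11; a portable Wayland equivalent would need the input-injection and UI-automation protocols that do not exist yet.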

by jchw

In my opinion, three basic things are needed:
- Device emulation: uinput covers this; requiring root is reasonable for what it does.
- Input injection. Like XTEST, but ideally with permissions and more event types (i.e. tablet and touch events.) libei is close but I think it should be a Wayland protocol.
- UI automation: Right now I think the closest you can get is with AT-SPI2, for apps that support it. This should also be a Wayland protocol.
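Device emulation through uinput works at the level of kernel key codes, which is why it is simple but layout-blind. The Python sketch below only builds the `(type, code, value)` triples a virtual keyboard would emit, using constants from `linux/input-event-codes.h`; actually writing them to a device (for example with python-evdev's `UInput.write()` followed by `syn()`) needs access to `/dev/uinput` and is left out. The helper names are hypothetical.

```python
from typing import List, Tuple

# Constants from linux/input-event-codes.h.
EV_SYN, EV_KEY, SYN_REPORT = 0x00, 0x01, 0
KEY_LEFTSHIFT = 42
# Letter keys, named by their position on a US QWERTY keyboard.
LETTER_CODES = {
    "q": 16, "w": 17, "e": 18, "r": 19, "t": 20,
    "y": 21, "u": 22, "i": 23, "o": 24, "p": 25,
    "a": 30, "s": 31, "d": 32, "f": 33, "g": 34,
    "h": 35, "j": 36, "k": 37, "l": 38,
    "z": 44, "x": 45, "c": 46, "v": 47, "b": 48, "n": 49, "m": 50,
}

Event = Tuple[int, int, int]  # (type, code, value)


def key_event(code: int, value: int) -> List[Event]:
    """One key transition (1 = press, 0 = release) followed by a sync report."""
    return [(EV_KEY, code, value), (EV_SYN, SYN_REPORT, 0)]


def events_for_text(text: str) -> List[Event]:
    """Translate ASCII letters into the events a virtual keyboard would send."""
    events: List[Event] = []
    for ch in text:
        code = LETTER_CODES.get(ch.lower())
        if code is None:
            # uinput sends physical key codes; the receiver's layout decides
            # which character appears, so there is no code for e.g. "α".
            raise ValueError(f"no key code for {ch!r} in this sketch")
        tap = key_event(code, 1) + key_event(code, 0)
        if ch.isupper():
            events += key_event(KEY_LEFTSHIFT, 1) + tap + key_event(KEY_LEFTSHIFT, 0)
        else:
            events += tap
    return events
```

The same events produce different characters under an Italian or Greek layout, so uinput alone cannot replace an input-injection protocol that understands text.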
None of these are actually easy if you want to make a good API. (XTEST is a convenient API, but not a particularly good one. Win32 has better input emulation and UI automation features IMO.)
Also the tangent about how crazy the compatibility layers are is weird. Yes, funny things are being done for the sake of compatibility. XWaylandVideoBridge is another example, but screen sharing is an area where Wayland is arguably better (despite what NVIDIA has to say) because you can get zero copy window and screen contents through PipeWire thanks to dmabufs.
Some of the lack of progress comes down to disagreements. libei mainly exists, by my best estimate, because the GNOME folks don't like putting things in Mutter, and don't want to figure out how to deal with moving things out of process while keeping them in protocol. (Never mind the fact that this still has to go through Mutter eventually, since it is the guy sending the events anyway...) However, as far as I know, lack of progress on UI automation and accessibility entirely comes down to funding. It's easy to say "why not just add SetCursorPos(x, y)" and laugh it off, but attacking these problems is really quite complex. There was Newton for the UI automation part, but unfortunately we haven't heard anything since 2024 AFAIK, and nobody else has stepped up.
https://blogs.gnome.org/a11y/2023/10/27/a-new-accessibility-...
Color management is the perfect example of how a simple ask can be complicated. How hard could it really be? Well, see for yourself.
https://gitlab.freedesktop.org/wayland/wayland-protocols/-/m...
If Wayland lasts as long as X11 did, it's preposterous to not spend the time to try to get the "new" version of these things right even if it is painful in the meantime.
After all, it isn't like UI automation on Linux was ever particularly good. Anyone who has ever used AutoHotkey could've told you that.

by AtlasBarfed

At this point the Wayland project is effectively keeping desktop Linux from succeeding. It might as well have been a plant project or a strategic intelligence war from Microsoft to keep Linux on the server only.
It's a ten-plus-year disaster of a project that held desktop Linux back at the precise moment of complete insanity on the part of the Windows designers, with Windows 8, the dual desktop/tiles disaster, and yet another window kit.
Microsoft is still pissing off its customers actively, but now we have real traction with Steam for getting gamers off of MS and onto Linux.
The opportunity is still there.

by kjellsbells

Second system effect is the curse of FOSS projects. It's been that way for decades. I don't see a reliable solution for the structural problem that doesn't somehow end up like a Benevolent Dictatorship. At the end of the day, designing complex systems by committee is hard to do. Maybe there is a maximum size of a group beyond which the communication matrix between the members starts to fracture?
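Fred Brooks, who named the second-system effect in The Mythical Man-Month, quantified the communication-matrix intuition in the same book: if every member of a group of n people may need to coordinate with every other member, the number of pairwise channels grows quadratically. The count is below; it illustrates the scaling only and says nothing specific about any particular project:

```latex
C(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad C(10) = 45, \quad C(50) = 1225
```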

by VeritySage07

Wayland’s fragmentation is less about one problem and more about how the ecosystem grew. Each compositor implements only what it needs, so tools like xdotool run into gaps and inconsistent behavior.
The post highlights a real coordination issue. The protocols exist, but adoption is uneven and expectations differ across compositors. Users see small breaks and developers face a moving target.
Wayland is improving, especially with work from GNOME and KDE, but stronger shared conventions for automation and accessibility are still needed.
Good write-up that shows why experiences on Wayland vary so much depending on the compositor.
