I tested Qwen3.6, Gemma4, and Nemotron3-nano-omni. All of them fully hallucinate x,y coordinates. (I have not tried GLM-5V yet.)
GPT-5.5 can do it easily. But Vocaela, a tiny 500M model, is also quite good at it. I hope they improve the training for x,y clicking soon on the smallish multi-modals.
Recently slopped together an HTTP service just so my local models can click, instead of relying on all the wild ways agents currently hack into the browser (browser-use, browser-harness, agent-browser, dev-browser, etc.): https://github.com/julius/vocaela-click-coords-http
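The idea is just: the model emits coordinates, and a plain HTTP call performs the click. A minimal client sketch, assuming a hypothetical `/click` route that accepts JSON `{"x": ..., "y": ...}` (the linked repo's actual API may differ):

```python
import json
import urllib.request

def build_click_payload(x: int, y: int) -> bytes:
    """Encode the coordinates the model produced as a JSON request body."""
    return json.dumps({"x": x, "y": y}).encode("utf-8")

def send_click(base_url: str, x: int, y: int) -> None:
    """POST the click to the service. The /click route is an assumption."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/click",  # hypothetical endpoint name
        data=build_click_payload(x, y),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()
```

Keeping the surface this small is the point: any local model that can print two numbers can drive it, with no browser automation framework in the loop.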
Comprehensive evaluation results at https://gertlabs.com/rankings
However, both Kimi and GLM can end up in doom loops, so be careful how you use them. Without a proper harness the agent can easily get into tricky situations with no escape.
We had to develop new heuristics in our cloud harness just because of this, but I am glad we did: the platform now feels more robust.
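These are not our actual heuristics, but a minimal sketch of the simplest such guard: flag the agent as stuck when it repeats the exact same action several times in a row.

```python
from collections import deque

class DoomLoopGuard:
    """Crude doom-loop detector: suspects a loop when the last `window`
    recorded actions are all identical. A real harness would use richer
    signals (page state, action similarity, time budgets)."""

    def __init__(self, window: int = 3):
        self.window = window
        self.recent = deque(maxlen=window)  # rolling action history

    def record(self, action: str) -> bool:
        """Record an action; return True if a doom loop is suspected."""
        self.recent.append(action)
        return len(self.recent) == self.window and len(set(self.recent)) == 1
```

Usage: call `record()` after every agent step; when it returns True, escalate (re-prompt, reset the page, or hand off to a human) instead of letting the model grind on.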
A small price to pay for model plug & play!
Turbo makes a huge difference in everyday use: it saves time, and you are not always in the mood to wait endlessly.