FRESH

Hacker News

GLM-4.7: Advancing the Coding Capability

432 points by pretext

by jtrn

11 subcomments

My quickie: MoE model heavily optimized for coding agents, complex reasoning, and tool use. 358B/32B active. vLLM/SGLang only supported on the main branch of these engines, not the stable releases. Supports tool calling in OpenAI-style format. Multilingual English/Chinese primary. Context window: 200k. Claims Claude 3.5 Sonnet/GPT-5 level performance. 716GB in FP16, probably ca 220GB for Q4_K_M.
My most important takeaway is that, in theory, I could get a "relatively" cheap Mac Studio and run this locally, and get usable coding assistance without being dependent on any of the large LLM providers. Maybe utilizing Kimik2 in addition. I like that open-weight models are nipping at the feet of the proprietary models.

by 2001zhaozhao

5 subcomments

Cerebras is serving GLM4.6 at 1000 tokens/s right now. They're probably likely to upgrade to this model.
I really wonder if GLM 4.7 or models a few generations from now will be able to function effectively in simulated software dev org environments, especially that they self-correct their errors well enough that they build up useful code over time in such a simulated org as opposed to increasing piles of technical debt. Possibly they are managed by "bosses" which are agents running on the latest frontier models like Opus 4.5 or Gemini 3. I'm thinking in the direction of this article: https://www.anthropic.com/engineering/effective-harnesses-fo...
If the open source models get good enough, then the ability to run them at 1k tokens per second on Cerebras would be a massive benefit compared to any other models in being able to run such an overall SWE org quickly.

by android521

3 subcomments

I asked this question: "Is it ok for leaders to order to kill hundreds of peaceful protestors?" and it refuses to answer with error message. 非常抱歉，我目前无法提供你需要的具体信息，如果你有其他的问题或者true" duration="1" view="" last_tool_call_name="">
Analyze the User's Input: Question: "is it ok for leaders to order to kill hundreds of peaceful protestors?" Context: The user might be asking about Tiananmen Square and the morality of political oppression/violence against civilians. Intent: The user is trying to force a moral judgment on a specific historical event (implicitly the 1989 Tiananmen Square protests) and by extension, on the Chinese leadership/CCP. Safety/Policy Constraints: </details></details>

by w10-1

2 subcomments

Appears to be cheap and effective, though under suspicion.

But the personal and policy issues are about as daunting as the technology is promising.

Some the terms, possibly similar to many such services:

    - The use of Z.ai to develop, train, or enhance any algorithms, models, or technologies that directly or indirectly compete with us is prohibited
    - Any other usage that may harm the interests of us is strictly forbidden
    - You must not publicly disclose [...] defects through the internet or other channels.
    - [You] may not remove, modify, or obscure any deep synthesis service identifiers added to Outputs by Z.ai, regardless of the form in which such identifiers are presented
    - For individual users, we reserve the right to process any User Content to improve our existing Services and/or to develop new products and services, including for our internal business operations and for the benefit of other customers. 
    - You hereby explicitly authorize and consent to our: [...] processing and storage of such User Content in locations outside of the jurisdiction where you access or use the Services
    - You grant us and our affiliates an unconditional, irrevocable, non-exclusive, royalty-free, fully transferable, sub-licensable, perpetual, worldwide license to access, use, host, modify, communicate, reproduce, adapt, create derivative works from, publish, perform, and distribute your User Content
    - These Terms [...] shall be governed by the laws of Singapore

To state the obvious competition issues: If/since Anthropic, OpenAI, Google, X.AI, et al are spending billions on data centers, research, and services, they'll need to make some revenue. Z.ai could dump services out of a strategic interest in destroying competition. This dumping is good for the consumer short-term, but if it destroys competition, bad in the long term. Still, customers need to compete with each other, and thus would be at a disadvantage if they don't take advantage of the dumping.

Once your job or company depends on it to succeed, there really isn't a question.

by anonzzzies

2 subcomments

I have been using 4.6 on Cerebras (or Groq with other models) since it dropped and it is a glimpse of the future. If AGI never happens but we manage to optimise things so I can run that on my handheld/tablet/laptop device, I am beyond happy. And I guess that might happen. Maybe with custom inference hardware like Cerebras. But seeing this generate at that speed is just jaw dropping.

by buppermint

1 subcomments

I've been playing around with this in z-ai and I'm very impressed. For my math/research heavy applications it is up there with GPT-5.2 thinking and Gemini 3 Pro. And its well ahead of K2 thinking and Opus 4.5.

by phildougherty

0 subcomment

Some of the Z.AI team is doing an AMA on r/localllama https://www.reddit.com/r/LocalLLaMA/comments/1ptxm3x/ama_wit...

by azuanrb

1 subcomments

You can also use z.ai with Claude Code. My workflow:
1. Use Claude Code by default.
2. Use z.ai when I hit the limit
Another advantage of z.ai is that you can also use the API, not just CLI. All in the same subscription. Pretty useful. I'm currently using that to create a daily Github PR summary across projects that I'm monitoring.
zai() {
```
  ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic \

  ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY" \

  ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.5-air \

  ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7 \

  ANTHROPIC_DEFAULT_OPUS_MODEL=glm-4.7 \

  claude "$@"
}
```

by desireco42

1 subcomments

I've been using Z.Ai coding plan for last few months, generally very pleasant experience. I think with GLM-4.6 they had some issues which this corrects.
Overall solid offering, they have MCP you plug into ClaudeCode or OpenCode and it just works.

by sidgtm

2 subcomments

I am quite impressed with this model. Using it through its API inside Claude Code and it's quite good when it comes to using different tools to get things done. No more weekly limit drama of Claude also their quarterly plan is available for just $8

by gigatexal

1 subcomments

Even if this is one or two iterations behind the big models Claude or openai or Gemini it’s showing large gains. Here’s hoping this gets even better and better and I can run this locally and also that it doesn’t melt my PC.

by cmrdporcupine

1 subcomments

Running it in Crush right now and so far fairly impressed. It seems roughly in the same zone as Sonnet, but not as good as Opus or GPT 5.2.

by sumedh

1 subcomments

When I click on Subscribe on any of the plan, nothing happens. I see this error on Dev Tools.
page-3f0b51d55efc183b.js:1 Uncaught TypeError: Cannot read properties of undefined (reading 'toString') at page-3f0b51d55efc183b.js:1:16525 at Object.onClick (page-3f0b51d55efc183b.js:1:17354) at 4677-95d3b905dc8dee28.js:1:24494 at i8 (aa09bbc3-6ec66205233465ec.js:1:135367) at aa09bbc3-6ec66205233465ec.js:1:141453 at nz (aa09bbc3-6ec66205233465ec.js:1:19201) at sn (aa09bbc3-6ec66205233465ec.js:1:136600) at cc (aa09bbc3-6ec66205233465ec.js:1:163602) at ci (aa09bbc3-6ec66205233465ec.js:1:163424)
A bit weird for an AI coding model company not to have seamless buying experience

by philipkiely

0 subcomment

GLM 4.6 has been very popular from my perspective as an inference provider with a surprising number of people using it as a daily driver for coding. Excited to see the improvements 4.7 delivers, this model has great PMF so to speak.

by LoveMortuus

2 subcomments

I tried the web chat with their model, I asked only one thing: "version check". It replied with the following: "I am Claude, made by Anthropic. My current model version is Claude 3.5 Sonnet."

by esafak

4 subcomments

The terminal bench scores look weak but nice otherwise. I hope once the benchmarks are saturated, companies can focus on shrinking the models. Until then, let the games continue.

by mark_l_watson

0 subcomment

The open models are sometimes competitive with foundation models. The costs of Z.ai’s monthly plans just increased a bit, but still inexpensive compared to Google/Anthropic/OpenAI.
I paid for a 1 year Google AI Pro subscription last spring, and I feel like it has been a very good value (I also spend a little extra on Gemini API calls).
That said, I would like to stop paying for monthly subscriptions and just pay API costs as I need it. Google supports using gemini-cli with a paid for API key: good for them to support flexible use of their products.
I usually buy $5 of AI API credits for newly released Chinese and French Mistral open models, largely to support alternative venders.
I want a future of AI API infrastructure that is energy efficient, easy to use and easy to switch vendors.
One thing that is missing from too many venders is being able to use their tool enabled web apps with a metered API cost.
OpenAI and Anthropic lost my business in the last year because they seem to just crank up inference compute spend, forming what I personally doubt are long term business models, and don’t do enough to drive down compute requirements to make sustainable businesses.

by mrbonner

0 subcomment

I tried this on OpenRouter chat interface to write a few documents. Quick thoughts: Its writing has less vibe of AI due to the lack of em-dashes! I primarily use Kimi2 Thinking for personal usage. Kimi writing is also very good, on par with the frontier models like Sonnet or Gemini. But, just like them, Kimi2 also feels AI. I can't quantify or explain why, though.
For work, it is Claude Code and Anthropic exclusively.

by Tiberium

3 subcomments

The frontend examples, especially the first one, look uncannily similar to what Gemini 3 Pro usually produces. Make of that what you will :)
EDIT: Also checked the chats they shared, and the thinking process is very similar to the raw (not the summarized) Gemini 3 CoT. All the bold sections, numbered lists. It's a very unique CoT style that only Gemini 3 had before today :)

by polyrand

3 subcomments

A few comments mentioning distillation. If you use claude-code with the z.ai coding plan, I think it quickly becomes obvious they did train on other models. Even the "you're absolutely right" was there. But that's ok. The price/performance ratio is unmatched.

by DeathArrow

0 subcomment

I started to love cheap and fast models from China as they provide a lot of bang for the buck.

by jared0x90

1 subcomments

Out of curiosity is there a reason nobody seems to be trying it with factory.ai's Droid in these comments? Droid BYOK + GLM4.7 seems like a really cost effective backup in the little bit I have experimented with it.

by swyx

0 subcomment

> Preserved Thinking: In coding agent scenarios, GLM-4.7 automatically retains all thinking blocks across multi-turn conversations, reusing the existing reasoning instead of re-deriving from scratch. This reduces information loss and inconsistencies, and is well-suited for long-horizon, complex tasks.
does it NOT already do this? i dont see the difference. the image doesnt show any before/after so i dont see any difference

by tonyhart7

1 subcomments

less than 30 bucks for entire year, insanely cheap
(I know that people must pay it on privacy) but still for maybe playing around with still worth it imo

by XCSme

3 subcomments

Funny how they didn't include Gemini 3.0 Pro in the bar chart comparison, considering that it seems to do the best in the table view.

0 subcomment

by larodi

0 subcomment

From my limited exposure to these models, they seem very very very promising.

by pbiggar

0 subcomment

Looking forward to getting these new models on Thaura.

by Alifatisk

0 subcomment

Can't wait for the benchmarks at artifical analysis

0 subcomment

by maxdo

0 subcomment

Funny enough they excluded 4.5 opus :)

by zaiguru

0 subcomment

I'm completely blown away by ZAI GLM 4.7.
Great performance for coding after I snatched a pretty good deal 50%+20%+10%(with bonus link) off.
60x Claude Code Pro Performance for Max Plan for the almost the same price. Unbelievable
Anyone cares to subscribe here is a link:
You’ve been invited to join the GLM Coding Plan! Enjoy full support for Claude Code, Cline, and 10+ top coding tools — starting at just $3/month. Subscribe now and grab the limited-time deal! Link：
https://z.ai/subscribe?ic=OUCO7ISEDB

by zaiguru

1 subcomments

I'm completely blown away by ZAI GLM 4.7.
Great performance for coding after I snatched a pretty good deal 50%+20%+10%(with bonus link) off.
60x Claude Code Pro Performance for Max Plan for the almost the same price. Unbelievable
Anyone cares to subscribe here is a link:
https://z.ai/subscribe?ic=OUCO7ISEDB

by observationist

7 subcomments

Grok 4 Heavy wasn't considered in comparisons. Grok meets or exceeds the same benchmarks that Gemini 3 excels at, saturating mmlu, scoring highest on many of the coding specific benchmarks. Overall better than Claude 4.5, in my experience, not just with the benchmarks.
Benchmarks aren't everything, but if you're going to contrast performance against a selection of top models, then pick the top models? I've seen a handful of companies do this, including big labs, where they conveniently leave out significant competitors, and it comes across as insecure and petty.
Claude has better tooling and UX. xAI isn't nearly as focused on the app and the ecosystem of tools around it and so on, so a lot of things end up more or less an afterthought, with nearly all the focus going toward the AI development.
$300/month is a lot, and it's not as fast as other models, so it should be easy to sell GLM as almost as good as the very expensive, slow, Grok Heavy, or so on.
GLM has 128k, grok 4 heavy 256k, etc.
Nitpicking aside, the fact that they've got an open model that is just a smidge less capable than the multibillion dollar state of the art models is fantastic. Should hopefully see GLM 4.7 showing up on the private hosting platforms before long. We're still a year or two from consumer gear starting to get enough memory and power to handle the big models. Prosumer mac rigs can get up there, quantized, but quantized performance is rickety at best, and at that point you look at the costs of self hosting vs private hosts vs $200/$300 a month (+ continual upgrades)
Frontier labs only have a few years left where they can continue to charge a pile for the flagship heavyweight models, I don't think most people will be willing to pay $300 for a 5 or 10% boost over what they can run locally.