Hacker News

Love C, hate C: Web framework memory problems

163 points by OneLessThing

by bluetomcat

9 subcomments

Good C code will try to avoid allocations as much as possible in the first place. You absolutely don’t need to copy strings around when handling a request. You can read data from the socket in a fixed-size buffer, do all the processing in-place, and then process the next chunk in-place too. You get predictable performance and the thing will work like precise clockwork. Reading the entire thing just to copy the body of the request in another location makes no sense. Most of the “nice” javaesque XXXParser, XXXBuilder, XXXManager abstractions seen in “easier” languages make little sense in C. They obfuscate what really needs to happen in memory to solve a problem efficiently.
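A minimal sketch of the fixed-buffer approach described above, assuming a POSIX `read()` on a file descriptor. The function name `count_lines` and the 4096-byte buffer size are hypothetical and chosen only for illustration. Each chunk is scanned where it lands and then overwritten by the next one, so nothing is heap-allocated or copied.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: count newline bytes arriving on fd using a single
 * fixed-size stack buffer. Returns the count, or -1 on a read error. */
long count_lines(int fd)
{
    char buf[4096];               /* assumed chunk size, reused every pass */
    long lines = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;                /* end of stream */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal, retry */
            return -1;
        }
        /* Process the chunk in place; nothing is copied elsewhere. */
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                lines++;
        }
    }
    return lines;
}
```

A real request parser would carry a small amount of state between chunks (for example, which header it is in the middle of), but the memory footprint stays constant regardless of request size.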

by acidx

1 subcomments

One thing to note, too, is that `atoi()` should be avoided as much as possible. On error (parse error, overflow, etc), it has an unspecified return value (!), although most libcs will return 0, which can be just as bad in some scenarios.
Also not mentioned is that atoi() can return a negative number, which is then passed to malloc(). malloc() takes a size_t, which is unsigned, so a negative argument is converted to a very large number.
It's better to use strtol(), but even that is a bit tricky to use, because it doesn't touch errno when there's no error but you need to check errno to know if things like overflow happened, so you need to set errno to 0 before calling the function. The man page explains how to use it properly.
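For illustration, the errno dance described above might look like the sketch below. The helper name `parse_content_length` and its `max` parameter are hypothetical, not taken from the framework under discussion. It rejects signs and leading whitespace up front because `strtol()` would otherwise accept them.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal Content-Length value.
 * Returns 0 and stores the value in *out on success, -1 on any failure. */
int parse_content_length(const char *s, long max, long *out)
{
    char *end;
    long value;

    /* strtol() skips whitespace and accepts '+' or '-', so insist on a digit. */
    if (s == NULL || *s < '0' || *s > '9')
        return -1;

    errno = 0;                      /* strtol() leaves errno alone on success */
    value = strtol(s, &end, 10);

    if (errno == ERANGE)            /* value does not fit in a long */
        return -1;
    if (end == s || *end != '\0')   /* no digits consumed, or trailing junk */
        return -1;
    if (value > max)                /* caller-chosen upper bound */
        return -1;

    *out = value;
    return 0;
}
```

The upper bound is a design choice: a server usually wants to refuse oversized bodies long before the value approaches the limits of `long`.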
I think it would be a very interesting exercise for that web framework author to make its HTTP request parser go through a fuzz-tester; clang comes with one that's quite good and easy to use (https://llvm.org/docs/LibFuzzer.html), especially if used alongside address sanitizer or the undefined behavior sanitizer. Errors like the one I mentioned will most likely be found by a fuzzer really quickly. :)
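A rough sketch of what such a libFuzzer harness could look like. `LLVMFuzzerTestOneInput` is the entry point libFuzzer expects; `parse_request` is a hypothetical stand-in for the framework's real parser, which is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parser under test: accepts only input
 * that starts with "GET ". Replace with the real parser entry point. */
int parse_request(const char *buf, size_t len)
{
    return (len >= 4 && memcmp(buf, "GET ", 4) == 0) ? 0 : -1;
}

/* libFuzzer calls this repeatedly with mutated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Copy into an exact-size heap buffer so AddressSanitizer can flag
     * any read past the end of the input. */
    char *copy = malloc(size ? size : 1);
    if (copy == NULL)
        return 0;
    if (size > 0)
        memcpy(copy, data, size);

    (void)parse_request(copy, size);

    free(copy);
    return 0;   /* libFuzzer expects 0 from the harness */
}
```

Assuming the file is named harness.c, it can be built with something like `clang -g -fsanitize=fuzzer,address,undefined harness.c` and the resulting binary run directly.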

by kazinator

2 subcomments

I definitely don't love C that does atoi on a Content-Length value that came from the network and passes that to malloc.
Even before we get to how a malicious user would interact with malloc, there is this:
> The functions atof, atoi, atol, and atoll are not required to affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined. [ISO C N3220 draft]
That includes not only out-of-range values but garbage that cannot be converted to a number at all. atoi("foo") can behave in any manner whatsoever and return anything.
Those functions are okay to use on something that has been validated in a way that it cannot cause a problem. If you know you have a nonempty sequence of nothing but digits, possibly with a minus sign, and the number of digits is small enough that the value will fit into int, you are okay.
> A malicious user can pass Content-Length of 4294967295
But why would they when it's fewer keystrokes to use -1, which will go to 4294967295 on a 32 bit malloc, while scaling to 18446744073709551615 on 64 bit?
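The conversion in question can be reproduced in a few lines. This is only a sketch; the function name `size_seen_by_malloc` is hypothetical. Converting a negative int to size_t is well defined in C and wraps modulo SIZE_MAX + 1, which is why -1 scales with the platform's pointer width.

```c
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX, the value that -1 converts to */
#include <stdlib.h>

/* Hypothetical helper: the size malloc() would receive if a header value
 * were run through atoi() and passed along unchecked. */
size_t size_seen_by_malloc(const char *content_length)
{
    int parsed = atoi(content_length);   /* "-1" parses cleanly to -1 */
    return (size_t)parsed;               /* -1 becomes SIZE_MAX */
}
```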

by yipikaya

4 subcomments

As an aside, it's amusing that it took 25 years for C coders to embrace the C99 named struct designator feature:

    HttpParser parser = {
        .isValid = true,
        .requestBuffer = strdup(request),
        .requestLength = strlen(request),
        .position = 0,
    };

All the kids are doing it now!

by lelanthran

2 subcomments

I can't completely blame the language here: anyone "coding" in a language new to them using an LLM is going to have real problems.

by dang

0 subcomment

Recent and related:
Show HN: I built a web framework in C - https://news.ycombinator.com/item?id=45526890 - Oct 2025 (208 comments)

by AdieuToLogic

1 subcomments

While the classic "Parse, don’t validate"[0] paper uses Haskell instead of C as its illustrative programming language, the approach detailed is very much applicable in these scenarios.
0 - https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
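One way the idea might translate to C, as a sketch: turn the untrusted token into a typed value once, at the boundary, and have the rest of the code accept only that type. The names `HttpMethod` and `http_method_parse` are hypothetical, and the set of methods is cut down for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed representation: later code switches on this enum
 * instead of re-checking strings. */
typedef enum { METHOD_GET, METHOD_HEAD, METHOD_POST } HttpMethod;

/* Returns true and stores the method in *out if token is a known method.
 * HTTP method names are case-sensitive, so strcmp() is used as is. */
bool http_method_parse(const char *token, HttpMethod *out)
{
    static const struct {
        const char *name;
        HttpMethod method;
    } table[] = {
        { "GET",  METHOD_GET  },
        { "HEAD", METHOD_HEAD },
        { "POST", METHOD_POST },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(token, table[i].name) == 0) {
            *out = table[i].method;
            return true;
        }
    }
    return false;
}
```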

by jqpabc123

0 subcomment

Reads like an indictment of vibe coding.
LLMs are fundamentally probabilistic --- not deterministic.
This basically means that anything produced this way is highly suspect. And this framework is an example.

by erichocean

2 subcomments

Give Fil-C a try, the speed hit is pretty minimal and you get full memory safety.
https://fil-c.org/

by jacquesm

5 subcomments

There are many, many more such issues with that code. The person that posted it is new to C and had an AI help them to write the code. That's a recipe for disaster: it means the OP does not actually understand what they wrote. It looks nice but it is full of footguns, and even though it is a useful learning exercise it is also a great example of why it is better to run battle-tested frameworks than to inexpertly roll your own.
As a learning exercise it is useful, but it should never see production use. What is interesting is that the apparent cleanliness of the code (it reads very well) is obscuring the fact that the quality is actually quite low.
If anything I think the conclusion should be that AI+novice does not create anything that is useable without expert review and that that probably adds up to a net negative other than that the novice will (hopefully) learn something. It would be great if someone could put in the time to do a full review of the code, I have just read through it casually and already picked up a couple of problems, I'm pretty sure that if you did a thorough job of it there would be many more.

by messe

4 subcomments

> Another interesting choice in this project is to make lengths signed:
There are good reasons for this choice in C (and C++) due to broken integer promotion and casting rules.
See: "Subscripts and sizes should be signed" (Bjarne Stroustrup) https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0...
As a nice bonus, it means that ubsan traps on overflow (unsigned overflows just wrap).
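A small sketch of the difference, using hypothetical function names. With unsigned sizes, a position that has run past the length silently wraps to a huge "remaining" count; with signed sizes the same mistake yields a negative number that an ordinary `< 0` check can catch.

```c
#include <stddef.h>

/* Unsigned: if position > length the subtraction wraps around. */
size_t remaining_unsigned(size_t length, size_t position)
{
    return length - position;
}

/* Signed: the same inputs give a negative, easily detected result. */
ptrdiff_t remaining_signed(ptrdiff_t length, ptrdiff_t position)
{
    return length - position;
}
```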

by qalmakka

0 subcomment

Integer operations, the one thing in computers where basically there's no non-annoying way to do them right except being over pedantic with checks
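As an example of the kind of pedantic check meant here, this is a sketch of overflow-safe signed addition in portable C. The name `checked_add_int` is hypothetical. The test happens before the addition because signed overflow itself is undefined behavior.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
 * false without touching *out if the sum would not fit in an int. */
bool checked_add_int(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;             /* would overflow upward */
    if (b < 0 && a < INT_MIN - b)
        return false;             /* would overflow downward */
    *out = a + b;
    return true;
}
```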

by qalmakka

0 subcomment

OT: using the `strcasecmp` family of functions is basically asking for trouble - unless you've previously set the locale to "C", which is basically the only locale with a defined behaviour. Otherwise you're basically bound to run into very funny internationalisation issues you'd rather know nothing about (and fail the Turkey Test)
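For protocol tokens such as HTTP header names, one locale-independent alternative is to fold only the ASCII letters A-Z by hand. A sketch, with the hypothetical names `ascii_lower` and `ascii_casecmp`:

```c
/* Lowercase A-Z only; every other byte is returned unchanged, whatever
 * the current locale is. */
unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical strcasecmp() replacement restricted to ASCII folding.
 * Returns <0, 0, or >0 like strcmp(). */
int ascii_casecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        unsigned char ca = ascii_lower((unsigned char)*a);
        unsigned char cb = ascii_lower((unsigned char)*b);
        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == '\0')
            return 0;             /* both strings ended together */
    }
}
```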

by pizlonator

0 subcomment

Just compile it with Fil-C

by ge96

0 subcomment

Long as you allocate me, it's alright