FRESH

Hacker News

A better streams API is possible for JavaScript

338 points by nnx

by steve_adams_86

0 subcomment

I ran into a performance issues a few months ago where native streams were behaving terribly, and it seemed to be due to bad back-pressure implementation.
I tried several implementations, tweaked settings, but ultimately couldn't get around it. In some cases I had bizarre drops in activity when the consumer was below capacity.
It could have been related to the other issue they mention, which is the cost of using promises. My streams were initiating HEAPS of promises. The cost is immense when you're operating on a ton of data.
Eventually I had to implement some complex logic to accomplish batching to reduce the number of promises, then figure out some clever concurrency strategies to manage backpressure more manually. It worked well.
Once I was happy with what I had, I ported it from Deno to Go and the result was so stunningly different. The performance improvement was several orders of magnitude.
I also built my custom/native solution using the Effect library, and although some people claim it's inefficient and slow, it out-performed mine by something like 15% off the shelf, with no fine-tuning or clever ideas. I wished I'd used it from the start.
The difference is likely in that it uses a fiber-based model rather than promises at the execution layer, but I'm not sure.

by conartist6

12 subcomments

As it happens i have an even better API than this article proposes!
They propose just using an async iterator of UInt8Array. I almost like this idea, but it's not quite all the way there.
They propose this:
```
  type Stream<T> = {
    next(): Promise<{ done, value: UInt8Array<T> }>
  }
```
I propose this, which I call a stream iterator!
```
  type Stream<T> = {
    next(): { done, value: T } | Promise<{ done, value: T }>
  }
```
Obviously I'm gonna be biased, but I'm pretty sure my version is also objectively superior:
- I can easily make mine from theirs
- In theirs the conceptual "stream" is defined by an iterator of iterators, meaning you need a for loop of for loops to step through it. In mine it's just one iterator and it can be consumed with one for loop.
- I'm not limited to having only streams of integers, they are
- My way, if I define a sync transform over a sync input, the whole iteration can be sync making it possible to get and use the result in sync functions. This is huge as otherwise you have to write all the code twice: once with sync iterator and for loops and once with async iterators and for await loops.
- The problem with thrashing Promises when splitting input up into words goes away. With async iterators, creating two words means creating two promises. With stream iterators if you have the data available there's no need for promises at all, you just yield it.
- Stream iterators can help you manage concurrency, which is a huge thing that async iterators cannot do. Async iterators can't do this because if they see a promise they will always wait for it. That's the same as saying "if there is any concurrency, it will always be eliminated."

by spankalee

2 subcomments

Async iterables aren't necessarily a great solution either because of the exact same promise and stack switching overhead - it can be huge compared to sync iterables.
If you're dealing with small objects at the production side, like individual tag names, attributes, bindings, etc. during SSR., the natural thing to do is to just write() each string. But then you see that performance is terrible compared to sync iterables, and you face a choice:
```
  1. Buffer to produce larger chunks and less stack switching. This is the exact same thing you need to do with Streams. or

  2. Use sync iterables and forgo being able to support async components.
```
The article proposes sync streams to get around this some, but the problem is that in any traversal of data where some of the data might trigger an async operation, you don't necessarily know ahead of time if you need a sync or async stream or not. It's when you hit an async component that you need it. What you really want is a way for only the data that needs it to be async.
We faced this problem in Lit-SSR and our solution was to move to sync iterables that can contain thunks. If the producer needs to do something async it sends a thunk, and if the consumer receives a thunk it must call and await the thunk before getting the next value. If the consumer doesn't even support async values (like in a sync renderToString() context) then it can throw if it receives one.
This produced a 12-18x speedup in SSR benchmarks over components extracted from a real-world website.
I don't think a Streams API could adopt such a fragile contract (ie, you call next() too soon it will break), but having some kind of way where a consumer can pull as many values as possible in one microtask and then await only if an async value is encountered would be really valuable, IMO. Something like `write()` and `writeAsync()`.
The sad thing here is that generators are really the right shape for a lot of these streaming APIs that work over tree-like data, but generators are far too slow.

by etler

0 subcomment

There are many use cases where having a value stream is very useful. I do agree having a separate simpler byte only stream would make sense though. I think the current capabilities of web streams should be kept and an IOStream could be added for optimizing byte streams.
Ideally splitting out the use cases would allow both implementations to be simpler, but that ship has probably sailed.

by szmarczak

0 subcomment

> This pattern has caused connection pool exhaustion in Node.js applications using undici (the fetch() implementation built into Node.js), and similar issues have appeared in other runtimes.
That's an inherent flaw of garbage collected languages. Requiring to explicitly close a resource feels like writing C. Otherwise you have a memory leak or resource exhaustion, because the garbage collector may or may not free the resource. Even C++ is better at this, because it does reference counting instead.

by bikeshaving

2 subcomments

A long time ago, I wrote an abstraction called a Repeater. Essentially, the idea behind it is, what would the Promise constructor look like if it was translated to async iterables.

  import { Repeater } from "@repeaterjs/repeater";
  
  const keys = new Repeater(async (push, stop) => {
    const listener = (ev) => {
      if (ev.key === "Escape") {
        stop();
      } else {
        push(ev.key);
      }
    };
    window.addEventListener("keyup", listener);
    await stop;
    window.removeEventListener("keyup", listener);
  });
  const konami = ["ArrowUp", "ArrowUp", "ArrowDown", "ArrowDown", "ArrowLeft", "ArrowRight", "ArrowLeft", "ArrowRight", "b", "a"];
  (async function() {
    let i = 0;
    for await (const key of keys) {
      if (key === konami[i]) {
        i++;
      } else {
        i = 0;
      }
      if (i >= konami.length) {
        console.log("KONAMI!!!");
        break; // removes the keyup listener
      }
    }
  })();

https://github.com/repeaterjs/repeater

It’s one of those abstractions that’s feature complete and stable, and looking at NPM it’s apparently getting 6.5mil+ downloads a week for some reason.

Lately I’ve just taken the opposite view of the author, which is that we should just use streams, especially with how embedded they are in the `fetch` proposals and whatever. But the tee critique is devastating, so maybe the author is right. It’s exciting to see people are still thinking about this. I do think async iterables as the default abstraction is the way to go.

by matheus-rr

1 subcomments

The practical pain with Web Streams in Node.js is that they feel like they were designed for the browser use case first and backported to the server. Any time I need to process large files or pipe data between services, I end up fighting with the API instead of just getting work done.
The async iterable approach makes so much more sense because it composes naturally with for-await-of and plays well with the rest of the async/await ecosystem. The current Web Streams API has this weird impedance mismatch where you end up wrapping everything in transform streams just to apply a simple operation.
Node's original stream implementation had problems too, but at least `.pipe()` was intuitive. You could chain operations and reason about backpressure without reading a spec. The Web Streams spec feels like it was written by the kind of person who thinks the solution to a complex problem is always more abstraction.

by cogman10

0 subcomment

Seems pretty similar to the design of OKIO in java [1]. With pretty similar goals ultimately. Here's a presentation on the internal details and design decisions. [2]
[1] https://github.com/square/okio
[2] https://www.youtube.com/watch?v=Du7YXPAV1M8

by tracker1

0 subcomment

One minor niggle on freeing resources... I'm hoping it becomes more popular with libraries, but there's using/await using with disppse/disposeAsync which works similarly to C#'s use of using.
I'm working on a db driver that uses it by convention as part of connection/pool usage cleanup.

by z3t4

0 subcomment

I like Node.JS streams. It's very satisfying to rent a 250MB memory machine and let it process GB's of data using streams.

by bennettpompi1

0 subcomment

I really enjoyed reading this article however I can't help but feeling that if you need anything described within it probably shouldn't be writing JS in the first place

by socalgal2

1 subcomments

Promises should not be a big overhead. If they are, that seems like a bug in JS engines.
At a native level (C++/rust), a Promise is just a closure added to a list of callbacks for the event loop. Yes, if you did 1 per streamed byte then it would be huge but if you're doing 1 promise per megabyte, (1000 per gig), it really shouldn't add up 1% of perf.

by rhodey

0 subcomment

the pull-stream module and its ecosystem is relevant here
the idea is basically just use functions. no classes and very little statefulness
https://www.npmjs.com/package/pull-stream

by notnullorvoid

1 subcomments

There's a lot I like about this API, mainly the pull-based iterator approach. I don't really see what the value of the sync APIs are though. What's the difference of just using iterators directly for sync streams?

by halfmatthalfcat

1 subcomments

The Observables spec should just get merged and implemented.
https://github.com/tc39/proposal-observable

by dilap

5 subcomments

> The problems aren't bugs; they're consequences of design decisions that may have made sense a decade ago, but don't align with how JavaScript developers write code today.
> I'm not here to disparage the work that came before — I'm here to start a conversation about what can potentially come next.
Terrible LLM-slop style. Is Mr Snell letting an LLM write the article for him or has he just appropriated the style?

by nottorp

0 subcomment

Well, it's also possible to replace JavaScript with a better language, it's just too late for it...

by kg

1 subcomments

It's a real shame that BYOB (bring your own buffer) reads are so complex and such a pain in the neck because for large reads they make a huge difference in terms of GC traffic (for allocating temporary buffers) and CPU time (for the copies).
In an ideal world you could just ask the host to stream 100MB of stuff into a byte array or slice of the wasm heap. Alas.

by adamnemecek

0 subcomment

It might be a good idea to look into the research on streams as coalgebras, there is quite a bit, for example here https://cs.ru.nl/~jrot/CTC20/.
Coalgebras might seem too academic but so were monads at some point and now they are everywhere.

by ralusek

0 subcomment

I tinkered with an alternative to stream interfaces:

https://github.com/ralusek/streamie

allows you to do things like

    infiniteRecords
    .map(item => doSomeAsyncThing(item), { concurrency: 5 });

And then because I found that I often want to switch between batching items vs dealing with single items:

    infiniteRecords
    .map(item => doSomeAsyncSingularThing(item), { concurrency: 5 })
    .map(groupOf10 => doSomeBatchThing(groupsOf10), { batchSize: 10 })
    // Can flatten back to single items
    .map(item => backToSingleItem(item), { flatten: true });

by murmansk

0 subcomment

For gods sake, finally, somebody have said this!

by paulddraper

0 subcomment

Just use AsyncIterator<UIntArray>.
The objection is
> The Web streams spec requires promise creation at numerous points — often in hot paths and often invisible to users. Each read() call doesn't just return a promise; internally, the implementation creates additional promises for queue management, pull() coordination, and backpressure signaling.
But that's 95% manageable by altering buffer sizes.
And as for that last 5%....what are you doing with JS to begin with?

by ai-christianson

2 subcomments

[flagged]

by animanoir

0 subcomment

[dead]

by Feathercrown

0 subcomment

[flagged]

by user3939382

2 subcomments

“ The Streams Standard was developed between 2014 and 2016 with an ambitious goal to provide "APIs for creating, composing, and consuming streams of data that map efficiently to low-level I/O primitives." Before Web streams, the web platform had no standard way to work with streaming data.”
This is what UDP is for. Everything actually has to be async all the way down and since it’s not, we’ll just completely reimplement the OS and network on top of itself and hey maybe when we’re done with that we can do it a third time to have the cloud of clouds.
The entire stack we’re using right down to the hardware is not fit for purpose and we’re burning our talent and money building these ever more brittle towering abstractions.

by shevy-java

4 subcomments

We deserve a better language than JavaScript.
Sadly it will never happen. WebAssembly failed to keep some of its promises here.