Hacker News

Show HN: Apple's SHARP running in the browser via ONNX runtime web

182 points by bring-shrubbery

by exabrial

3 subcomments

A *2.4gb* ONNX? That is wild. This format continues to impress me. ONNX uses 32-bit single-precision floats, I believe, so that's something like ~644m float params/constants. I recently dove deep into the 'traditional ML' side of the ONNX serialization format for the purposes of writing a JVM ML compiler for trees and regressions. ONNX is actually quite clever in the way it serializes trees into parallel arrays (which are then serialized using protobuf). My trees have capped out at < 32mb. I haven't dove into the neural net side of things yet, mainly because I don't have any models to run in prod. (https://github.com/exabrial/petrify if anyone is interested.)
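
For readers unfamiliar with the parallel-array idea mentioned above, here is a minimal sketch of how a single regression tree might be stored as flat arrays and walked at inference time. The array names, the `-1` leaf marker, and the `<=` branching rule are assumptions chosen for illustration; they are only loosely modeled on ONNX's tree-ensemble attributes and are not taken from the ONNX spec or from petrify.

```python
# Hypothetical parallel-array layout for one small regression tree.
# Index i in every list describes node i. All names are illustrative.
feature_ids = [0, 1, -1, -1, -1]          # feature tested at each node (-1 marks a leaf)
thresholds = [0.5, 2.0, 0.0, 0.0, 0.0]    # split threshold (unused for leaves)
true_ids = [1, 3, -1, -1, -1]             # next node when x[feature] <= threshold
false_ids = [2, 4, -1, -1, -1]            # next node otherwise
leaf_values = [0.0, 0.0, 10.0, 20.0, 30.0]  # prediction stored at leaf nodes


def predict(x):
    """Walk the tree from the root (node 0) until a leaf is reached."""
    node = 0
    while feature_ids[node] != -1:
        if x[feature_ids[node]] <= thresholds[node]:
            node = true_ids[node]
        else:
            node = false_ids[node]
    return leaf_values[node]


print(predict([0.2, 1.0]))
print(predict([0.9, 0.0]))
```

Because every per-node attribute lives in its own flat list, the whole tree serializes as a handful of repeated fields rather than a nested structure, which is presumably part of why the format stays compact.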

by andybak

2 subcomments

I vibecoded a simple web app using Sharp that allowed me to quickly browse any local image folder and view the images as "almost" volumetric 3d scenes in a VR headset.
I precomputed and cached each one so it was nearly instant. The effect, although only a crude wrapper around what Sharp already does, was quite transformative and mesmerising. Just the ease of pointing it at any folder of photos and viewing them fully spatially.
It was a bit of a mess code-wise and kinda specific to my local setup, but I should really clean it up and deploy it somewhere for other people to try. Although I keep assuming someone else will do it before me and make a better job of it.

by kodablah

1 subcomments

Nice, I've also been doing some similarly neat things via ONNX web at https://intabai.dev (caution, just PoC tools atm, only Chrome tested, only some mobile devices work, no filters).
I think all-client-side in-browser AI imagery is becoming very doable and has lots of privacy benefits. However, ONNX web leaves a lot to be desired (I had to proto patch many PyTorch conversions because things like Conv3D ops had WebGPU issues IIRC). I have yet to try Apache TVM WebGPU approaches or any others, but I feel if the WebGPU space were more invested in, running these models would be even more feasible.

by mattbaconz

0 subcomment

2.4GB in browser is crazy. I’m curious whether quantization works here or if the model quality falls apart too much.
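
As a rough back-of-the-envelope sketch of what quantization could buy in download size, the arithmetic below assumes the 2.4GB file is almost entirely 32-bit float weights and treats "GB" as GiB (which matches the ~644m parameter estimate elsewhere in this thread). The helper names are made up for illustration, and the numbers say nothing about whether model quality survives.

```python
GIB = 1024 ** 3


def param_count(file_bytes, bytes_per_param=4):
    """Estimate parameter count, assuming the file is all weights."""
    return file_bytes // bytes_per_param


def size_gib(params, bytes_per_param):
    """Size in GiB if every parameter took bytes_per_param bytes."""
    return params * bytes_per_param / GIB


params = param_count(int(2.4 * GIB))  # assumes fp32, i.e. 4 bytes per parameter

print(f"estimated params: {params / 1e6:.0f}M")
print(f"fp16 size: {size_gib(params, 2):.2f} GiB")
print(f"int8 size: {size_gib(params, 1):.2f} GiB")
```

Under these assumptions, fp16 would roughly halve the file and int8 would cut it to about a quarter; real exports also carry graph metadata and may keep some tensors in higher precision.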

by javier2

1 subcomments

Did not work in Firefox on Linux, but it runs on Chrome.
Have to admit, I don't get it. I tried it with 3 landscape photos I have and the results were nowhere close to the results in the demo, but that just speaks to the model.
Regardless, it's very cool as a browser tech showcase.

by amelius

2 subcomments

I don't like that it uses only a single photo. This means it is going to make up a lot of stuff. E.g. if I show it a photo of a poster, then it will make that poster 3D. With only two photos that problem would already be solved.

by parentheses

0 subcomment

I've been poking at running LLMs in the browser. It feels like we're definitely close (<1 year) to seeing real use cases there.
Ubiquity and coverage of devices is what will take longest. That is largely dependent on how well we can shrink models with similar performance and how much we can accelerate mobile devices. This feels like it's a bit further out (<3 years?)

by david_mchale

0 subcomment

I can't wait to get something like this small enough to fit into a browser extension. I already use ONNX for zoom + enhance in Ultra Zoom; zooming from 2d to 3d would be crazy.

by jeroenhd

1 subcomments

What are the requirements for running this? Chrome throws a whole bunch of "out of memory" errors into the console when I try to execute these. I'm guessing 4GiB of VRAM is not enough?

by vessenes

1 subcomments

This is cool. For practitioners, what’s the current state of the art for free-form multi-picture to splat? The last time I looked at it the pipeline was pretty janky and included a few separate steps.

by herpdyderp

1 subcomments

Loading the model crashes my browser tab from memory usage :/

by andruby

1 subcomments

Are there any examples one could view before downloading?

by sroussey

0 subcomment

Does it use both WASM and WebGPU?

by echelon

1 subcomments

> inference itself is a few seconds on a recent Mac
This is impressive as hell
Very cool demo. It runs in about 9 seconds on my machine.
A few asks if you're going to devote more time to the project: can you make a full orbital camera - it seems to not be able to orbit 360? Also, can you use double click drag to move the camera in non-orbiting mode for view refinement? (Super minor nitpicks - this demo is really cool.)
> Caveats: SHARP's released weights are research-use only (Apple's model license, not the code's).
Nobody should GAF about this. We have all the major players distilling each other in the open. This gives Apple the ability to slap you with lawyers, but in practice you'll often get more done if you just break the rules.
Do you know of any other image-to-splat models? WorldLabs has a few versions of their Marble model, and the Tencent Hunyuan team just released HyWorld as open weights:
https://github.com/Tencent-Hunyuan/HY-World-2.0
HyWorld looks to be SOTA and better than all the other players.
Apple's Sharp is awesome in that it is fast, but it only generates a small depth sample from the image. There are no back faces or splats, so if you move the camera even slightly from the original perspective, you'll see lots of holes.

by zb3

2 subcomments

Why is it so large? Is it the same model used to create 3D effects on iOS lockscreen?
