by superfish
4 subcomments
- "Unsplash > Gen3C > The fly video" is nightmare fuel. View at your own risk: https://apple.github.io/ml-sharp/video_selections/Unsplash/g...
by Leptonmaniac
9 subcomments
- Can someone ELI5 what this does? I read the abstract and tried to find differences in the provided examples, but I don't understand (and don't see) what the "photorealistic" part is.
- Well, I got _something_ to work on Apple Silicon:
https://github.com/rcarmo/ml-sharp (has a little demo GIF)
I am looking at ways to approximate Gaussian splats without having to reinvent the wheel, but I'm a bit out of my depth since I haven't been paying a lot of attention to them in general.
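One crude approach: back-project a predicted depth map into per-pixel isotropic Gaussians. A minimal sketch, assuming a pinhole camera and a metric depth map; the function and parameter names here are mine, not anything from ml-sharp:

    import numpy as np

    def depth_to_gaussians(rgb, depth, fx, fy, cx, cy):
        # Standard pinhole unprojection: pixel (u, v) at depth z maps
        # to ((u - cx) * z / fx, (v - cy) * z / fy, z).
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        means = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        colors = rgb.reshape(-1, 3)
        # A pixel subtends roughly z / fx in world units, so seed an
        # isotropic scale from that; a real pipeline would optimize it.
        scales = np.repeat((z / fx).reshape(-1, 1), 3, axis=1)
        opacities = np.ones((h * w, 1))
        return means, colors, scales, opacities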
by supermatt
1 subcomment
- I note the lack of human portraits in the example cases.
My experience with all these solutions to date (including whatever Apple is currently using) is that when viewed stereoscopically the people end up looking like 2D cutouts against the background.
I haven't seen this particular model in use stereoscopically, so I can't comment on its effectiveness, but the lack of a human face in the example set is likely a bit of a tell.
Granted, they do call it "Monocular View Synthesis", but I'm unclear what its accuracy or real-world use would be if you can't combine two views to form a convincing stereo pair.
- CUDA GPU only:
https://github.com/apple/ml-sharp#rendering-trajectories-cud...
- Is there a link to some sample Gaussian splat files produced by this model? I couldn't find one.
Without that, it's hard to tell how cherry-picked the NVS video samples are.
EDIT: I did it myself, if anyone wants to check out the result (caveat, n=1): https://github.com/avaer/ml-sharp-example
- > photorealistic 3D representation from a single photograph in less than a second
by derleyici
1 subcomment
- Apple's Spatial Scene in the Photos app shows similar behavior, turning a single photo into a small 3D scene that you can view by tilting the phone. Demo here: https://files.catbox.moe/93w7rw.mov
- Impressive, but something doesn't feel right to me... possibly too much sharpness, possibly a mix of clichés, all amplified at once.
by brcmthrowaway
1 subcomment
- So this is the secret sauce behind Cinematic mode. The fake bokeh insanity has reached its climax!
- This is incredibly cool. It's interesting how it fails in the sections where you need to inpaint. SVC seems to do that better than all the rest, though it's nowhere close to the photorealism of this model.
Is there a similar flow to transform a video/photo/NeRF of a scene into a tighter, minimal-polygon approximation of it? The reason I ask is that it would make some things really cool. To make my baby monitor mount I had to break out the calipers and measure the pins and so on, but if I could take a couple of photos and iterate in software, that would be sick.
- I could not find any mention of it, but does this use generative AI? I can't imagine it accomplishing anything like this without a large graphical model in the back end.
by Dumbledumb
0 subcomments
- In Section D.7 they describe: "The complex reflection in water is interpreted by the network as a distant mountain, therefore the water surface is broken."
This is really interesting to me, because the model would have to encode the reflection both as the depth of the reflecting surface (for texture, scattering, etc.) and as the "real depth" of the reflected object. The examples in Figures 11 and 12 already look amazing.
Long-tail problems indeed.
- Works great. The model file is 2.8 GB; on an M2, rendering took a few seconds, and the result is a Gaussian .ply file. The repo implementation requires a CUDA card to render video, so I used one of the WebGL live renderers linked here: https://github.com/scier/MetalSplatter?tab=readme-ov-file#re...
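If anyone wants to sanity-check the .ply before pointing a viewer at it, plyfile can dump the Gaussian fields. A minimal sketch, assuming the common 3DGS vertex layout; the filename is a placeholder and the field names depend on what the repo actually writes:

    # pip install plyfile
    from plyfile import PlyData

    ply = PlyData.read("splats.ply")  # placeholder path
    verts = ply["vertex"]
    print(f"{verts.count} Gaussians")
    print("fields:", [p.name for p in verts.properties])

    # In the common 3DGS layout positions are x/y/z; other fields
    # (f_dc_*, opacity, scale_*, rot_*) are stored pre-activation.
    for i in range(3):
        print(verts["x"][i], verts["y"][i], verts["z"][i])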
by alexgotoi
1 subcomment
- Apple dropping this is interesting. They've been quiet on the flashy AI stuff while everyone else is yelling about transformers, but 3D reconstruction from single images is actually useful hardware integration stuff.
What's weird is we're getting better at faking 3D from 2D than we are at just... capturing actual 3D data. Like we have LiDAR in phones already, but it's easier to neural-net your way around it than deal with the sensor data properly.
Five years from now we'll probably look back at this as the moment spatial computing stopped being about hardware and became mostly inference. Not sure if that's good or bad tbh.
Will include this one in my https://hackernewsai.com/ newsletter.
by pluralmonad
0 subcomments
- This seems like what they have been doing with album covers on Apple Music for a couple of years.
- That is really impressive. However, it was a bit confusing at first because in the koala example at the top, the zoomed-in area is only slightly bigger than the source area. I wonder why they didn't make it 2-3x as big on both axes like they did with the others.
- This is great for turning a photo into a dynamic-IPD stereo pair, and it allows some head movement in VR.
by reactordev
0 subcomments
- This would be really fun for creating stereoscopic videos. Take a video input, offset x by 0.5 or some coefficient, take the output, put the two side by side (or interlaced for shutter glasses), and voilà! 3D movies.
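Packing the two eyes is the easy part once the model has synthesized the offset view. A minimal sketch of just the composition step, assuming you already have per-frame left/right views as numpy arrays:

    import numpy as np

    def side_by_side(left, right):
        # HxWx3 + HxWx3 -> Hx(2W)x3 SBS frame
        return np.hstack([left, right])

    def row_interlaced(left, right):
        # Alternate scanlines for passive/shutter-glasses displays:
        # even rows from the left eye, odd rows from the right.
        out = left.copy()
        out[1::2] = right[1::2]
        return out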
- It would be interesting to see how much better this algorithm would be with a stereo pair as input.
Not only do many VR and AR systems acquire stereo, we have historical collections of stereo views in many libraries and museums.
by BoredPositron
0 subcomments
- The paper is just word salad, and it's not better than the previous SOTA? I might be missing a key element here.
by orthoxerox
0 subcomments
- The resulting animations feel more like "Live2D" than 3D.
- Enhance! https://www.youtube.com/watch?v=LhF_56SxrGk
- So Deckard got lucky that the picture-enhancement machine hallucinated the correct clue? But that was bound to happen 6 years ago, no AI yet.
by harhargange
2 subcomments
- TMPI looks just as good, if not better.
- I want to see it with people.
by stronglikedan
0 subcomments
- That's cool and all, but it seems like only the first step in this, where they go from a 2D photo all the way to fully animated (animatable?) characters: https://www.youtube.com/watch?v=DSRrSO7QhXY
- See also Spaitial [0], which today announced full 3D environment generation from a single image.
[0] https://www.spaitial.ai/
by somethingsome
0 subcomments
- Last time I tried Depth Pro it was not really metric; I wonder if this one is, as they claim. If someone has some experience on that side, I would be interested.
by codebyprakash
0 subcomments
- Quite cool!
by calvinmorrison
4 subcomments
- I understand AI for reasoning, knowledge, etc., but I haven't figured out why anyone wants to spend money on this visual and video stuff. It just seems like a bad idea.