That means it doesn’t need depth. Depth is helpful for getting good point locations, but SLAM on multiple frames should also work.
I’m guessing that they are researching this for AR or robot navigation. Otherwise, the focus on accurately dividing the scene into objects wouldn’t make sense to me.
Also, can it run on Apple silicon?