3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection.
This paper aims to develop a faster and more accurate solution to the amodal 3D object detection problem for indoor scenes. This is achieved through a novel neural network that takes a pair of RGB-D images as input and delivers oriented 3D bounding boxes as output. The network, named 3D-SSD, is composed of two parts: hierarchical feature fusion and multi-layer prediction. The hierarchical feature fusion combines appearance and geometric features from RGB-D images, while the multi-layer prediction utilizes multi-scale features for object detection. As a result, the network can exploit 2.5D representations in a synergetic way to improve both accuracy and efficiency. The issue of varying object sizes is addressed by attaching a set of 3D anchor boxes with varying sizes to every location of the prediction layers. At the final stage, category scores are generated for the 3D anchor boxes along with adjusted positions, sizes, and orientations, and the final detections are obtained using non-maximum suppression. In the training phase, the positive samples are identified with the aid of 2D ground truth to avoid the noisy estimation of depth from raw data, which leads to a better-converged model. Experiments performed on the challenging SUN RGB-D dataset show that our algorithm outperforms the state-of-the-art Deep Sliding Shape by 10.2% mAP and is 88x faster. Further experiments suggest that our approach achieves comparable accuracy to, and is 386x faster than, the state-of-the-art method on the NYUv2 dataset, even with a smaller input image size.
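The final non-maximum suppression step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned 3D boxes (the paper predicts oriented boxes, which would need a rotated-box overlap test), and the names `iou_3d`, `nms_3d`, and the threshold value are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (cx, cy, cz, w, h, d), i.e. center plus full extents.
# Assumption: boxes are axis-aligned; orientation is ignored in this sketch.
Box = Tuple[float, float, float, float, float, float]


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i] - a[i + 3] / 2, b[i] - b[i + 3] / 2)
        hi = min(a[i] + a[i + 3] / 2, b[i] + b[i + 3] / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    return inter / (vol_a + vol_b - inter)


def nms_3d(boxes: Sequence[Box], scores: Sequence[float],
           iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou_3d(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 0, 2, 2, 2), (0.1, 0, 0, 2, 2, 2), (5, 5, 5, 1, 1, 1)]
    scores = [0.9, 0.8, 0.7]
    print(nms_3d(boxes, scores))  # indices of the boxes that survive
```

In this sketch the second box nearly coincides with the first and has a lower score, so it is suppressed, while the distant third box is kept. In practice, suppression of this kind is typically applied per category.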