Building a system for robust 3D scene understanding and natural language interaction within complex environments.
Our projects build on one another over time, expanding the system's capabilities.
We are building a system for robust 3D scene understanding and natural language interaction within complex environments. Our goal is to develop a foundation model that can tokenize complex 3D scenes, perform instance detection, and enable spatial reasoning to answer complex language queries, ultimately making 3D scene understanding as accessible and powerful as its 2D counterpart.
Key capabilities of our 3D systems.
Our system takes a 3D scene, represented either as 3D Gaussian Splatting (3DGS) or as a point cloud (PC), as input and, in a single neural network forward pass, outputs a feature vector for each 3D primitive.
Our system provides real-time open-vocabulary 3D content search leveraging the initially extracted 3D features.
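To make the search step concrete, here is a minimal sketch of how an open-vocabulary query could be matched against precomputed per-primitive features. It assumes the query text has already been embedded into the same feature space by some text encoder, which is not shown here. The function names, the toy 3-dimensional features, and the use of cosine similarity are illustrative assumptions, not a description of the actual system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_primitives(features, query_embedding, top_k=3):
    """Return (primitive_index, score) pairs for the top_k best-matching primitives.

    `features` holds one vector per 3D primitive (hypothetical layout);
    `query_embedding` is the text query embedded in the same space (assumed).
    """
    scores = [cosine_similarity(f, query_embedding) for f in features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]


# Toy stand-ins: four primitives with 3-dimensional features.
primitive_features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
# Placeholder for the embedding of a text query.
query_embedding = [0.0, 1.0, 0.0]

results = search_primitives(primitive_features, query_embedding, top_k=2)
for index, score in results:
    print(f"primitive {index}: similarity {score:.3f}")
```

Because the per-primitive features are extracted once up front, each new query in this sketch costs only one similarity pass over the stored vectors. A real-time implementation would presumably batch this on a GPU rather than loop in pure Python.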
Multi-institutional research group.