Researchers at Carnegie Mellon University have proposed a novel method called Diff2Scene for open-vocabulary 3D semantic segmentation and visual grounding tasks. The method leverages frozen representations from text-to-image generative (diffusion) models, eliminating the need for labeled 3D training data. Source: https://dev.to/voxel51/eccv-2024-open-vocabulary-3d-semantic-segmentation-with-text-to-image-diffusion-models-35pm
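To make the open-vocabulary idea concrete, here is a minimal illustrative sketch, not the actual Diff2Scene implementation: each 3D point carries a feature vector (in Diff2Scene these would come from frozen 2D diffusion-model representations; here they are random placeholders), and a point is assigned the class whose text embedding is most similar, so no 3D labels are needed and the class list can be changed at inference time. All names and dimensions below are assumptions for illustration.

```python
import numpy as np

def normalize(x, axis=-1):
    """L2-normalize vectors so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
D = 64  # hypothetical shared embedding dimension

# Hypothetical text embeddings for an open vocabulary of class names.
class_names = ["chair", "table", "wall"]
text_emb = normalize(rng.normal(size=(len(class_names), D)))

# Hypothetical per-point features; in Diff2Scene these would be lifted
# from frozen text-to-image diffusion features, not sampled randomly.
point_feats = normalize(rng.normal(size=(1000, D)))

# Cosine similarity between every point and every class name,
# then argmax gives each point's open-vocabulary label.
scores = point_feats @ text_emb.T   # shape (N_points, N_classes)
labels = scores.argmax(axis=1)      # index into class_names
```

Because the vocabulary is just a list of embedded strings, adding a new class means embedding one more name and recomputing the similarities, with no retraining.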