✨ TL;DR
Asset Harvester converts sparse, real-world object observations from autonomous vehicle driving logs into complete 3D assets suitable for simulation. The system combines large-scale data curation, geometry-aware preprocessing, and a novel sparse-view-conditioned model (SparseViewDiT) to generate simulation-ready 3D objects from limited viewing angles.
Closed-loop simulation is essential for autonomous vehicle development, as it provides the interactive 3D environments needed for testing and validation. While neural scene reconstruction can convert driving logs into 3D environments, it does not yield the complete 3D object assets needed for agent manipulation or for novel-view synthesis under large viewpoint changes. Real-world AV data presents significant challenges: objects are observed from limited viewing angles with sparse coverage, captured by heterogeneous sensors under varying conditions. Existing image-to-3D methods struggle under these constraints, as they typically assume dense, controlled input views rather than the sparse, in-the-wild observations characteristic of driving logs.
Asset Harvester employs a system-level design that combines multiple components to handle the challenges of real-world AV data. The pipeline begins with large-scale curation of object-centric training tuples from driving logs, followed by geometry-aware preprocessing that normalizes heterogeneous sensor data. The core technical contribution is SparseViewDiT, a model designed explicitly for limited-angle observations, which couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. The training recipe incorporates hybrid data curation strategies, augmentation techniques that address data variability, and self-distillation to improve robustness. This end-to-end system transforms sparse object observations into complete, reusable 3D assets ready for simulation deployment.
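The pipeline stages above can be sketched as a sequence of stand-in functions. All names here (`curate_tuples`, `preprocess_geometry`, the SparseViewDiT and Gaussian-lifting stand-ins) are hypothetical and purely illustrative; the real system's interfaces are not described in this detail, so this is only a structural sketch of the data flow from driving-log frames to simulation-ready assets.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative containers; field names are assumptions, not the real schema.
@dataclass
class SparseObservation:
    object_id: str
    views: List[Dict]  # each: {"image": ..., "pose": ..., "sensor": ...}

@dataclass
class Asset:
    object_id: str
    num_gaussians: int  # size of the lifted 3D Gaussian representation

def curate_tuples(frames: List[Dict]) -> List[SparseObservation]:
    """Stage 1 (sketch): group per-frame detections into object-centric view tuples."""
    by_obj: Dict[str, List[Dict]] = {}
    for frame in frames:
        for det in frame["detections"]:
            by_obj.setdefault(det["object_id"], []).append(
                {"image": det["crop"], "pose": det["pose"], "sensor": frame["sensor"]}
            )
    return [SparseObservation(oid, views) for oid, views in by_obj.items()]

def preprocess_geometry(obs: SparseObservation) -> SparseObservation:
    """Stage 2 (placeholder): normalize heterogeneous sensor data into a
    shared object-centric frame; a real implementation would rectify
    intrinsics and rescale poses here."""
    return obs

def generate_multiviews(obs: SparseObservation, n_target: int = 8) -> List[Dict]:
    """Stage 3a (model stand-in): sparse-view-conditioned multiview generation."""
    return [{"view_index": i} for i in range(n_target)]

def lift_to_gaussians(views: List[Dict]) -> int:
    """Stage 3b (stand-in): fit a 3D Gaussian set to the generated views;
    returns only a placeholder primitive count."""
    return 1000 * len(views)

def harvest(frames: List[Dict]) -> List[Asset]:
    """End-to-end flow: curate -> preprocess -> generate -> lift."""
    assets = []
    for obs in curate_tuples(frames):
        obs = preprocess_geometry(obs)
        views = generate_multiviews(obs)
        assets.append(Asset(obs.object_id, lift_to_gaussians(views)))
    return assets

# Example: two frames observing the same parked car from different cameras.
frames = [
    {"sensor": "cam_front", "detections": [{"object_id": "car_7", "crop": "f0.png", "pose": "P0"}]},
    {"sensor": "cam_left",  "detections": [{"object_id": "car_7", "crop": "f1.png", "pose": "P1"}]},
]
assets = harvest(frames)
print(len(assets), assets[0].num_gaussians)  # → 1 8000
```

The point of the sketch is the one-to-many structure: many sparse per-frame detections collapse into one object-centric observation, which the generative stages then expand into a dense multiview set before lifting to a single reusable asset.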