Autonomous perception is entering a new era, one where camera-first architectures are rapidly gaining ground over traditional LiDAR-centric designs. This transformation is driven by Bird’s-Eye-View (BEV) neural networks, which reconstruct a top-down 3D understanding of the environment from multiple camera feeds.
By leveraging BEV perception, vehicles can achieve robust object detection and motion tracking with lower hardware cost and greater deployment flexibility. For Autoware, this aligns perfectly with its commitment to AI-first, software-defined, and hardware-agnostic autonomy.
In collaboration with the Autoware Foundation, MulticoreWare Inc. — a global technology company specializing in AI optimization, computer vision, and embedded acceleration — has integrated two BEV-based detection models into Autoware’s Perception stack.

BEVDet – Real-Time 3D Detection from Multi-Camera Images
BEVDet (Bird’s-Eye-View Detection) serves as the foundation of Autoware’s camera-based perception. It applies the Lift–Splat–Shoot principle: lifting image features into 3D space, splatting them onto a unified BEV grid, and detecting objects in the top-down view.
Key Features:
- ROS 2 Integration: Runs as a native Autoware perception node.
- Camera-Only Operation: Derives 3D geometry purely from multi-view camera inputs.
- Unified BEV Representation: Provides consistent spatial context for downstream planning.
- TensorRT Optimization: Upgraded to TensorRT 10.x with FP16 mixed-precision inference.
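The Lift–Splat steps above can be sketched in a few lines. The snippet below is a toy illustration, not Autoware's implementation: features and the BEV cell indices (`cam_points`) are assumed precomputed, and the splat is a simple sum-pool.

```python
import numpy as np

def lift_splat(image_feats, depth_probs, cam_points, grid_size=(4, 4)):
    """Toy Lift-Splat sketch: lift per-pixel features along depth bins,
    then splat (sum-pool) them into a BEV grid.

    image_feats: (HW, C)     per-pixel feature vectors
    depth_probs: (HW, D)     per-pixel depth distribution (each row sums to 1)
    cam_points:  (HW, D, 2)  precomputed (x, y) BEV cell index per pixel/depth
    """
    HW, C = image_feats.shape
    D = depth_probs.shape[1]
    bev = np.zeros((grid_size[0], grid_size[1], C))
    # Lift: outer product of features and depth distribution -> (HW, D, C)
    frustum = depth_probs[:, :, None] * image_feats[:, None, :]
    # Splat: accumulate each lifted feature into its BEV cell
    for p in range(HW):
        for d in range(D):
            x, y = cam_points[p, d]
            bev[x, y] += frustum[p, d]
    return bev
```

Because each pixel's depth distribution sums to one, the splat conserves feature mass: the BEV grid holds exactly the features that were lifted, just re-binned into top-down space.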

BEVFormer – Temporal Transformers for Advanced BEV Perception
BEVFormer builds upon BEVDet by introducing temporal reasoning, fusing features from multiple frames to handle motion, occlusion, and continuity over time.
Technical Highlights:
- Spatial Cross-Attention: Selectively gathers visual features from all camera views.
- Temporal Self-Attention: Maintains consistency of BEV features across frames.
- C++ Inference Pipeline: Entirely reimplemented for ROS 2 with an ONNX → TensorRT workflow.
- RViz Visualization: Enables real-time 3D bounding box rendering and trajectory tracking.
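The temporal self-attention idea can be sketched as follows: the previous frame's BEV features are first aligned to the current ego pose, then the current BEV queries attend jointly to themselves and the aligned history. This is a hedged, simplified illustration (single-head dense attention, integer ego-motion shift), not BEVFormer's deformable-attention implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(bev_query, bev_prev, shift=(0, 0)):
    """Toy temporal self-attention over a BEV grid.

    bev_query: (H, W, C) current-frame BEV queries
    bev_prev:  (H, W, C) BEV features from the previous frame
    shift:     integer (dy, dx) ego-motion offset used to align bev_prev
    """
    H, W, C = bev_query.shape
    # Align the previous frame's BEV grid with the current ego pose.
    aligned_prev = np.roll(bev_prev, shift, axis=(0, 1))
    q = bev_query.reshape(-1, C)                           # (HW, C)
    kv = np.concatenate([q, aligned_prev.reshape(-1, C)])  # (2*HW, C)
    # Scaled dot-product attention across both frames.
    attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)         # (HW, 2*HW)
    return (attn @ kv).reshape(H, W, C)
```

Attending to the ego-aligned history is what lets the network keep track of objects that are momentarily occluded in the current frame: their features persist in the previous BEV and are mixed back in by the attention weights.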
Why It Matters
- Demonstrating open collaboration between the Autoware Foundation and MulticoreWare to advance open-source, deployable AI.
- Transitioning from LiDAR-heavy to scalable, vision-based perception.
- Achieving real-time performance on embedded and edge devices.
- Strengthening spatial and temporal understanding in complex driving scenes.
Access the Full Technical Report