A previous article argued that AV players (like Waymo, Uber, Aurora, Cruise, Argo, Yandex) chose to control and own LiDAR sensor technology to ensure tighter coupling with the AI software stack. This coupling can also help identify which LiDAR performance features are critical for deployment. Working with multiple sensor modalities helps identify individual sensor features that are critical in different driving situations, eliminate duplicate and redundant information, and reduce unneeded sensor complexity. Tesla’s anti-LiDAR stance and Elon Musk’s “Lidar is a crutch” comment are an extreme case – presumably the data and machine learning based on radar and camera data from over 0.5M cars deployed in the field have convinced Tesla that LiDAR is not required in AVs.
RB: System 1 processes data in real time based on past and current information, delivering only meaningful points to enable more efficient point-wise classification and obstacle detection. Getting rid of the roughly 90% of raw LiDAR data points that are not relevant to dynamic object detection (points on the road surface, vegetation, sky, and static environment) allows you to feed the object identification layer with only relevant information (for example, moving or movable objects for object tracking, or road markings for lane-keeping). This makes the identification and control process more robust and faster, and lowers bandwidth requirements.
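To make the filtering idea concrete, here is a minimal sketch of discarding road-surface and static-environment points before object detection. It is not Outsight's algorithm: it assumes a flat road below a fixed height (`ground_z`), treats any voxel occupied in every previous frame as static, and assumes the frames are already ego-motion compensated. All function names and thresholds are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, vehicle frame
Voxel = Tuple[int, int, int]


def voxel_key(p: Point, size: float) -> Voxel:
    """Index of the cubic voxel (edge length `size`) containing point p."""
    return (int(p[0] // size), int(p[1] // size), int(p[2] // size))


def static_voxels(history: Iterable[List[Point]], size: float = 0.5) -> Set[Voxel]:
    """Voxels occupied in every past frame are treated as static environment.

    Assumes the frames are already ego-motion compensated (same reference frame).
    """
    sets = [{voxel_key(p, size) for p in frame} for frame in history]
    return set.intersection(*sets) if sets else set()


def filter_relevant(points: List[Point], static: Set[Voxel],
                    ground_z: float = 0.2, size: float = 0.5) -> List[Point]:
    """Drop road-surface points (flat-ground assumption) and points in static voxels."""
    kept = []
    for p in points:
        if p[2] <= ground_z:
            continue  # road surface
        if voxel_key(p, size) in static:
            continue  # static environment seen in earlier frames
        kept.append(p)
    return kept


history = [
    [(10.0, 2.0, 1.0), (5.0, 0.0, 0.05)],  # frame t-2
    [(10.1, 2.1, 1.1), (6.0, 0.0, 1.0)],   # frame t-1
]
current = [(10.2, 2.2, 1.2), (0.0, 0.0, 0.0), (8.0, -1.0, 0.9)]
print(filter_relevant(current, static_voxels(history)))  # [(8.0, -1.0, 0.9)]
```

Only the point that is neither on the road nor in a persistently occupied voxel survives; a production system would also need to handle moving-ego registration and sloped roads.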
Artificial Intelligence (AI) based systems are required to replace a human driver. Continued innovation and testing of these systems have driven the need for richer sensor data, either through the use of many sensors per AV or higher sensor capabilities – range, accuracy, speed, FoV (Field of View) coverage, resolution and data rates. Paradoxically, the increased sophistication of sensors raises barriers to deployment – higher sensor and compute costs, increased power consumption and thermal issues, reliability and durability concerns, longer time to decision (latency), and possibly more confusion and errors. It also increases requirements for data transmission bandwidth, memory and computing capabilities (all driving up power, heat, and $$$s).
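A back-of-the-envelope calculation shows how quickly raw sensor data adds up. The sketch below assumes an uncompressed 8-bit RGB HD camera at 30 fps and a hypothetical LiDAR producing 1.2M points per second at 16 bytes per point; these figures and the eight-camera suite are illustrative assumptions, not specifications from the article.

```python
def camera_bytes_per_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Raw (uncompressed) video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


def lidar_bytes_per_s(points_per_s: int, bytes_per_point: int) -> int:
    """Raw point-cloud data rate in bytes per second."""
    return points_per_s * bytes_per_point


# Illustrative numbers only (assumed, not taken from any vendor datasheet)
cam = camera_bytes_per_s(1920, 1080, 3, 30)  # one 8-bit RGB HD camera
lidar = lidar_bytes_per_s(1_200_000, 16)     # x, y, z, intensity as 4-byte floats
suite = 8 * cam + lidar                      # hypothetical 8-camera + 1-LiDAR suite
print(f"camera: {cam/1e6:.1f} MB/s, lidar: {lidar/1e6:.1f} MB/s, suite: {suite/1e9:.2f} GB/s")
# camera: 186.6 MB/s, lidar: 19.2 MB/s, suite: 1.51 GB/s
```

Even with these modest assumptions, the suite generates on the order of 1.5 GB of raw data every second, all of which must be moved, stored or processed.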
Human drivers sense a tremendous amount of information through different modalities – visual, audio, smell, haptic, etc. An inexperienced driver absorbs all this data, initially assuming that all of it is relevant. With practice and training, expert drivers learn to filter out irrelevant information and focus on what is relevant, both in time and space. This enables them to react quickly in the short term (braking for a sudden obstacle on the road, or safely navigating out of traffic in the event of a vehicle malfunction) and in the longer term (changing lanes to avoid a slower-moving vehicle). Machines trying to simulate human intelligence should be able to follow a similar model – initially acquire large amounts of sensor data and train on it, but become more discriminating once the training reaches a certain level.
AEye believes in the idea of “saliency” and argues that the goal of a perception system in an AV is to detect and react to surprises (if it is not surprising, it is boring!). The IDAR™ is designed to be agile: it judiciously chooses the locations in the scene that it transmits photons to and receives them from, rather than spraying them uniformly across the FoV. These decisions are guided by information from the LiDAR itself or from other sensors like a high-resolution camera, and by intelligence (the "I" in IDAR™). The goal is to inject photons into regions likely to produce salient laser returns (surprises!). As with Prophesee, dynamic regions in the scene are of more interest and more likely to create surprises.
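One simple way to picture saliency-guided sensing is as a budgeting problem: a fixed number of laser shots per frame is split across scene regions according to how "surprising" each region currently looks. The sketch below is a hypothetical illustration, not AEye's method: the region names, saliency scores and per-region floor are assumptions, and largest-remainder rounding keeps the total exactly on budget.

```python
from typing import Dict


def allocate_shots(saliency: Dict[str, float], budget: int, floor: int = 1) -> Dict[str, int]:
    """Split a per-frame laser shot budget across scene regions in proportion
    to a saliency score, while guaranteeing every region `floor` shots so that
    nothing is left completely unobserved."""
    regions = list(saliency)
    remaining = budget - floor * len(regions)
    if remaining < 0:
        raise ValueError("budget too small for the per-region floor")
    total = sum(saliency.values())
    if total > 0:
        exact = {r: saliency[r] * remaining / total for r in regions}
    else:  # no saliency information: fall back to a uniform scan
        exact = {r: remaining / len(regions) for r in regions}
    alloc = {r: floor + int(exact[r]) for r in regions}
    # Largest-remainder rounding so the allocation sums exactly to the budget
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(regions, key=lambda r: exact[r] - int(exact[r]), reverse=True)
    for r in by_remainder[:leftover]:
        alloc[r] += 1
    return alloc


# Hypothetical saliency scores, e.g. derived from camera motion cues
scores = {"sky": 0.0, "road_ahead": 3.0, "pedestrian": 6.0, "parked_cars": 1.0}
print(allocate_shots(scores, budget=50, floor=2))
# {'sky': 2, 'road_ahead': 15, 'pedestrian': 27, 'parked_cars': 6}
```

The floor is a design choice: without it, a region scored as boring would never be revisited, and a genuine surprise there could go unnoticed.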
Outsight solves the System 1 problem (reacting quickly to short-term surprises in the driving environment) by providing an artificial amygdala, and uses this to facilitate the long-term, neocortex-like functions. The semantic information that supports short-term decision making is based on an on-chip SLAM (Simultaneous Localization and Mapping) approach, which uses past and present raw point cloud data to create relevant, actionable point clouds and object detections. The SLAM information includes relative object locations and velocities in the car’s environment.
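Relative velocities of the kind mentioned above can, in the simplest case, be estimated by finite differences on an object's tracked position across consecutive frames. The sketch below assumes a hypothetical `Observation` record and a constant-velocity model, and adds a time-to-contact estimate as an example of the short-term, reflex-style quantity such information can feed; it is not Outsight's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Observation:
    """Hypothetical tracked-object observation in the vehicle frame."""
    position: Tuple[float, float]  # (x forward, y left) in metres
    timestamp: float               # seconds


def relative_velocity(prev: Observation, curr: Observation) -> Tuple[float, float]:
    """Finite-difference estimate of the object's velocity relative to the car (m/s)."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("observations must be in increasing time order")
    return ((curr.position[0] - prev.position[0]) / dt,
            (curr.position[1] - prev.position[1]) / dt)


def time_to_contact(curr: Observation, vel: Tuple[float, float]) -> float:
    """Seconds until the forward gap closes at constant relative velocity (inf if not closing)."""
    gap, closing = curr.position[0], -vel[0]
    return gap / closing if closing > 0 else float("inf")


prev = Observation((20.0, 0.0), 0.0)
curr = Observation((19.0, 0.0), 0.1)
v = relative_velocity(prev, curr)
print(v, time_to_contact(curr, v))  # (-10.0, 0.0) 1.9
```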
Successful machine learning should be able to identify the features that are important in the deployment phase. Analysis of neuron behavior in DNNs (Deep Neural Networks) can reveal which aspects of sensor data are important and which are superfluous (similar to how DNN neurons process 2D vision information). This in turn can help thin down sensor and compute specifications for deployment. One of the goals of machine learning during the AV development phase should be to specify sensor suites that provide actionable data at the right time with the optimal level of complexity, to enable timely and efficient driving decisions.
AEye (a California-based LiDAR company) promotes IDAR™ (Intelligent Detection and Ranging) – a 1550 nm wavelength LiDAR using Time of Flight (ToF) techniques to extract depth and intensity information from the scene. Per AEye’s website, the IDAR™ is "the world’s first solid-state, leading-edge artificial perception system for autonomous vehicles that leverages biomimicry, never misses anything, understands that all objects are not created equal and does everything in real-time".
SR: Couldn’t a fast frame-based camera do what you do, by differencing consecutive frames to locate events while also capturing intensity information for the whole scene? LV: No. Frame differencing requires FPGA or SoC resources, whereas Prophesee’s cameras deliver events natively. Typically, only 10-30% of the data collected by framed cameras is relevant for driving control.
Event-based and ROI (region of interest) sensing seem like logical directions to enable sensor “thinning” and make sensors practical for AV deployment. There are opposing views, however. According to Raul Bravo, President of Outsight (a France-based LiDAR and 3D sensing software company), relying on dynamically creating higher resolution in the event zone in real time is problematic: if you know where the event is, you should already be acting, and if you have to search for the event and then interrogate it, you are too late to act anyway.
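Bravo's timing argument can be made concrete with simple arithmetic: any extra step between detecting an event and acting on it becomes distance travelled at speed. In this minimal sketch the highway speed and the 100 ms penalty (one extra frame at an assumed 10 Hz scan rate) are illustrative assumptions, not figures from Outsight.

```python
def distance_during_delay(speed_kmh: float, delay_ms: float) -> float:
    """Metres the car travels while a perception decision is delayed."""
    speed_ms = speed_kmh / 3.6           # km/h -> m/s
    return speed_ms * delay_ms / 1000.0  # ms -> s


# Assumed figures: 108 km/h, and one extra frame at an assumed 10 Hz scan rate
print(f"{distance_during_delay(108, 100):.1f} m")  # 3.0 m
```

Three metres is roughly half a car length travelled before any reaction to the re-interrogated region can even begin.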
The software for our system is similar to what most other LiDARs have – firmware and embedded software for sensor control. We supply an SDK (Software Development Kit) to customers to experiment with adaptive control of LiDAR scan patterns. We also supply a library of scan patterns that customers can use for different driving environments (the figure above is an example of such a scan pattern).
RB: Since the processing software integrates information from past events and data, lower-resolution LiDARs can indeed be leveraged much better than with traditional methods. For example, in the figure below, the raw LiDAR creates very few returns from a low-reflectivity obstacle on the road like a tire (fewer than 5 points per frame, sometimes zero points).
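The benefit of integrating past data can be illustrated with generic multi-frame accumulation: sparse returns from several frames are re-expressed in the current vehicle frame using the car's pose history, so a static obstacle gathers more points than any single frame provides. This is a simplified 2D sketch that assumes the poses are already known (for example from SLAM or odometry) and that the obstacle is static; it is not Outsight's processing pipeline.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw) of the car in a fixed world frame
Point = Tuple[float, float]        # (x forward, y left) in the car frame at scan time


def to_world(p: Point, pose: Pose) -> Point:
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * p[0] - s * p[1], y + s * p[0] + c * p[1])


def to_vehicle(p: Point, pose: Pose) -> Point:
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = p[0] - x, p[1] - y
    return (c * dx + s * dy, -s * dx + c * dy)


def accumulate(frames: List[List[Point]], poses: List[Pose]) -> List[Point]:
    """Re-express the points of all past frames in the latest car frame."""
    latest = poses[-1]
    merged: List[Point] = []
    for points, pose in zip(frames, poses):
        merged.extend(to_vehicle(to_world(p, pose), latest) for p in points)
    return merged


# A tire lying at world position (30, 0); the car drives 1 m forward per frame.
poses = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frames = [[(30.0, 0.0)], [], [(28.0, 0.1)]]  # 1, 0 and 1 returns per frame
print(accumulate(frames, poses))  # [(28.0, 0.0), (28.0, 0.1)]
```

Even though the middle frame returns nothing from the tire, the accumulated cloud still places two points on it in the current frame.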
An event-based sensor requires significant innovation in the pixel circuitry. The pixels work asynchronously and at much higher speeds, since they do not have to integrate photons as in a conventional frame-based camera. The interconnection technologies to enable this are being developed in collaboration with Sony. The two companies recently announced a project to jointly develop a stacked event-based vision sensor with an industry-leading pixel size (< 5 μm) and dynamic range (124 dB).
In the scene below where the car and camera are stationary, the imagery collected by a standard framed HD camera provides a nice visual – but most of the data is not relevant for the immediate driving task (for example the buildings). A human driver easily filters out stationary objects like buildings and trees and focuses on moving pedestrians and cars to decide on the driving action.
Prediction: As AVs approach reality, practical deployment constraints (costs, size, heat, durability, decision making speed, hardware and software reliability) will force sensor and perception providers to focus on what is needed for field deployment rather than what is achievable in a lab or development environment. Superfluous requirements and specifications will continue to get eliminated, resulting in “thin” or lean sensing. In many ways, this is a circular game – a pull by AV providers to make sensors leaner and deployable, and a push by sensor providers to make AVs a reality.
Rapidly pointing the LiDAR towards regions of interest is achieved by a combination of intelligence and a fast, high-resonant-frequency 2D MEMS (Micro-Electro-Mechanical Systems) mirror. The advantages of this approach are increased performance (accuracy, latency, resolution, range, speed) in the ROI, and reduced system complexity, cost, and power consumption. The figure below shows an example of a scanning pattern that can be created by AEye’s software-definable LiDAR.
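A software-definable scan pattern can be thought of as a list of mirror positions chosen by software rather than fixed by hardware. The hypothetical sketch below generates vertical scan-line angles with coarse spacing across the FoV and fine spacing inside an ROI band; it is not AEye's SDK, and it ignores the mechanical dynamics and settling constraints a real MEMS mirror imposes.

```python
from typing import List


def scan_lines(fov_min: float, fov_max: float, coarse: float, fine: float,
               roi_min: float, roi_max: float) -> List[float]:
    """Elevation angles (degrees) for a hypothetical ROI-weighted scan pattern:
    `coarse` line spacing across the vertical FoV, `fine` spacing inside the ROI band."""
    lines: List[float] = []
    angle = fov_min
    while angle <= fov_max + 1e-9:  # small tolerance for float accumulation
        lines.append(round(angle, 6))
        angle += fine if roi_min <= angle < roi_max else coarse
    return lines


pattern = scan_lines(-10.0, 10.0, coarse=2.0, fine=0.5, roi_min=-2.0, roi_max=2.0)
print(len(pattern), pattern)
# 17 [-10.0, -8.0, -6.0, -4.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 4.0, 6.0, 8.0, 10.0]
```

Covering the same 20° FoV uniformly at 0.5° spacing would take 41 lines; concentrating the fine spacing in the ROI delivers that resolution where it matters with 17.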
The basic idea is to use camera and pixel architectures that detect changes in light intensity above a threshold (an event), and to provide only this data to the compute stack for further processing. Relative to a high-resolution framed camera, an event-based camera registers and transmits data only for the pixels experiencing intensity changes, which amounts to just 10-30% of the pixels.
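The event principle can be emulated in a few lines: a pixel "fires" only when its log intensity changes by more than a threshold, and unchanged pixels produce no data at all. This sketch is an emulation under assumed parameters (`threshold`, `eps`), not Prophesee's pixel circuit; real event pixels fire asynchronously and independently rather than by comparing whole frames, which is exactly the frame differencing that LV contrasts with native event output.

```python
import math
from typing import List, Sequence, Tuple

Event = Tuple[int, int, int]  # (row, col, polarity): +1 brighter, -1 darker


def events_from_frames(prev: Sequence[Sequence[float]], curr: Sequence[Sequence[float]],
                       threshold: float = 0.2, eps: float = 1e-3) -> List[Event]:
    """Emulate event generation: a pixel fires when its log intensity changes
    by at least `threshold`; unchanged pixels produce no data at all."""
    events: List[Event] = []
    for r, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for c, (a, b) in enumerate(zip(row_prev, row_curr)):
            delta = math.log(b + eps) - math.log(a + eps)  # eps avoids log(0)
            if abs(delta) >= threshold:
                events.append((r, c, 1 if delta > 0 else -1))
    return events


prev = [[100, 100, 100, 100, 100], [50, 50, 50, 50, 50]]
curr = [[100, 100, 180, 100, 100], [50, 50, 50, 20, 50]]
ev = events_from_frames(prev, curr)
print(ev, f"{len(ev) / 10:.0%} of pixels active")  # [(0, 2, 1), (1, 3, -1)] 20% of pixels active
```

Working in log intensity makes the threshold a relative contrast change, which is one reason event sensors can cover a wide dynamic range.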
Lean sensing – a critical enabler for autonomous vehicles https://t.co/bPHANQnmF5 (Forbes Tech, @ForbesTech, September 5, 2020)