When a falcon spies a tiny quail or other prey in the distance, it can dive from the clouds at a speed of over 300 kilometres per hour. Although humans are not naturally endowed with either of these skills, technology has brought us both — we can fly faster than the speed of sound, and now, thanks to advances in computer vision, we can visually search huge areas from the sky in the blink of an eye.
Visual search is the ability of an AI model to find images that are visually similar to a query image — a form of pattern recognition applied to the task of image retrieval. The scope and speed of a new visual search model for aerial images introduced by New Mexico’s Descartes Labs would make even a falcon envious.
“This system enables real-time visual search over the surface of the earth,” boasts the paper Visual Search Over Billions of Aerial and Satellite Images. The proposed system can search the continental United States at 1-meter pixel resolution — corresponding to approximately 2 billion images — in around 0.1 seconds. Because convolutional neural networks (CNNs) trained on large image datasets have proven effective at extracting rich feature representations from photographic imagery, the researchers built their visual search system around CNN-based content-based image retrieval.
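A quick back-of-envelope calculation shows where the 2-billion-image figure comes from. The tile size below is an assumption for illustration, not a number from the paper:

```python
# Rough sanity check on the "~2 billion images" claim.
# Assumptions (not from the paper): the continental US covers about
# 8.1 million km^2, and imagery is cut into 64x64-pixel tiles.
area_km2 = 8.1e6
pixels = area_km2 * 1_000_000   # at 1 m/pixel, each km^2 is 1e6 pixels
tiles = pixels / (64 * 64)      # one "image" per 64x64 tile
print(f"{tiles:.2e} tiles")     # on the order of 2e9
```

Under those assumptions the tile count lands at roughly 2 × 10⁹, consistent with the scale the paper reports.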
To reduce data and compute requirements and ensure the visual search system could run in real time, the researchers defined visual similarity using 512 abstract visual features generated by a CNN that had been trained on aerial and satellite imagery. Converting these features into binary values was essential to the process, as it dramatically reduces the data footprint of the index.
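The paper's exact binarization scheme is not reproduced here, but the general idea can be sketched with a simple sign threshold: each of the 512 float features becomes one bit, and similarity search reduces to Hamming distance over compact bit codes. All names and parameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for CNN outputs: 512 float activations per image tile.
n_tiles, n_features = 10_000, 512
features = rng.standard_normal((n_tiles, n_features)).astype(np.float32)

# Binarize with a sign threshold, then pack 512 bits into 64 bytes per tile.
bits = (features > 0).astype(np.uint8)
codes = np.packbits(bits, axis=1)       # shape (n_tiles, 64), dtype uint8

# Storage shrinks 32x: 512 float32 values (2048 bytes) -> 64 bytes per tile.
assert codes.nbytes == n_tiles * 64

def hamming_search(query_code, index_codes, k=5):
    """Return indices of the k tiles nearest to the query in Hamming distance."""
    xor = np.bitwise_xor(index_codes, query_code)     # differing bits
    dists = np.unpackbits(xor, axis=1).sum(axis=1)    # popcount per tile
    return np.argsort(dists)[:k]

# A tile's own code is always its nearest neighbour (distance 0).
nearest = hamming_search(codes[0], codes)
```

XOR-plus-popcount over packed bits is what makes brute-force scans over billions of codes feasible: the whole comparison is a handful of cheap bitwise operations per candidate.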
The researchers chose a convolutional neural network with a 50-layer ResNet architecture pretrained on ImageNet. They noticed that the last few layers of the ImageNet-trained network could deliver surprisingly good similarity search results for satellite imagery, even though the network was trained largely on photographic images of animals, plants and vehicles.
The team used two image datasets in the study: Aerial over USA and Landsat 8 over Earth. The former consists of aerial images from the National Agriculture Imagery Program and the Texas Orthoimagery Program. The latter was built by adapting data from one of the powerhouses of NASA’s satellite-based earth observation program — Landsat 8. Both datasets cover a wide range of object types and landscapes, from industrial infrastructure such as wind turbines to natural features such as ponds.
The visual search model performed well on query objects from the classes used during the fine-tuning stage, such as pictures of wind turbines from the NAIP dataset. Although less impressive, the results on generic images that were not part of the supervised learning phase were also judged good enough to be leveraged for applications such as training downstream computer vision models.
The team also identified some shortcomings. For example, some low-quality search results included images that were visually similar to the query images but did not contain the query object class. The delicate balance between offering good search results for common object classes and generic, class-agnostic search remains a challenge for further studies.
The researchers believe adding features such as multi-scale search, geospatial filtering and temporal filtering to the visual search system can provide an even more efficient tool for thoroughly searching and interpreting visual information in large collections of aerial and satellite imagery.
Visual Search Over Billions of Aerial and Satellite Images is available on ScienceDirect, and an interactive demo of the system is available at Descartes Labs.
Journalist: Fangyu Cai | Editor: Michael Sarazen