As AI models and the databases they rely on continue increasing in size, researchers are increasingly seeking ways to boost efficiency. While K-nearest neighbour search plays a crucial role in information retrieval tasks such as web search, its computation cost and query latency become prohibitively high in big data scenarios. This has led to the development of approximate nearest neighbour search (ANNS) methods for fast high-recall search. Still, even ANNS computation cost can be very expensive when dealing with billion-scale databases.
In the new paper SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search, a research team from Microsoft, Peking University, Tencent, and Baidu proposes SPANN, a simple but surprisingly efficient memory-disk hybrid vector indexing and search system that guarantees both low latency and high recall. SPANN achieves a 2× speedup over the state-of-the-art ANNS solution DiskANN while retaining the same recall quality and memory cost.
There are two main ANNS approaches for reducing the memory cost associated with billon-scale datasets: inverted index-based and graph-based. Inverted index-based methods split the vector space into K regions by KMeans clustering such that search is only performed in the few regions that are close to the query and employ vector quantization methods such as product quantization to compress the vectors. They use a lossy data compression solution that stores all the compressed vectors and the posting lists in the memory. Graph-based approaches such as DiskANN adopt a hybrid solution, storing the product quantization compressed vectors in memory while storing the navigating spread-out graph and full-precision vectors on the disk.
The proposed SPANN follows the inverted index methodology, but instead of leveraging lossy data compression, it adopts a simple memory-disk hybrid solution.
The team identifies three main challenges in addressing the search problem: posting length limitation, boundary issues and diverse search difficulty; and introduces three techniques designed to solve these challenges and enable their memory disk hybrid solution.
In the index-building stage, the team first limits the length of the posting lists to effectively reduce the number of disk accesses for each posting list in the online search. They then expand the points in the closure of the corresponding posting lists to improve the quality of posting lists, which increases the recall probability of the vectors located on the boundary of the posting lists. In the search stage, they propose a query-aware scheme to dynamically prune the access of unnecessary posting lists and ensure both high recall and low latency.
In their evaluations, the team compared SPANN with current state-of-the-art billion-scale disk-based ANNS algorithms, using recall, latency and vector-query as their performance metrics.
The results show that SPANN is 2× faster than the state-of-the-art ANNS solution DiskANN while maintaining the same recall quality of 90 percent with the same memory cost on billion-scale datasets. Moreover, SPANN can reach 90 percent recall@1 and recall@10 in just ~one millisecond with only 10 percent of original memory cost, indicating its ability to support super large scale vector search with high efficiency and low serving cost.
Author: Hecate He | Editor: Michael Sarazen
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.