From millions of fake accounts spreading across social media to credit hackers stealing from bank accounts, online fraud has become a nightmare for Internet users. As Internet companies redouble their efforts to detect and foil emerging fraud techniques, many are turning to artificial intelligence for solutions.
DataVisor is a Mountain View, California-based anti-fraud startup that uses AI to detect fake accounts, prevent money laundering, and protect financial institutions from credit scams. The company offers clients its DataVisor APIs for real-time data connection, and a specialized UI for direct results.
DataVisor Co-founders Yinglian Xie and Fang Yu were both Microsoft Senior Researchers for seven years, using data-driven approaches to solve the online service security challenges Internet companies were wrestling with.
The market for digital anti-fraud services is growing as transactions increasingly go online. Global non-cash transaction volume grew 11.2% in 2015 to reach 433.1 billion transactions, the highest growth rate of the past decade, according to the World Payments Report 2017. Online fraud is also moving beyond the financial services sector to become a universal problem, affecting industries such as gaming and social media.
Conventional anti-fraud approaches, which are largely grounded in credit investigation and hand-tuned rules, can no longer handle the scale of the problem. Fraud detection companies have been adopting AI, particularly supervised learning, to automate their fraud detection processes using sophisticated algorithms, vast amounts of data, and robust computing capabilities. Trained with historic labeled data that distinguishes real users from fraudulent accounts, supervised learning models can efficiently classify new data and detect suspicious activity.
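To make the idea of training on labeled history concrete, here is a minimal sketch of a supervised classifier. It is a toy nearest-centroid model, not a method attributed to any company in this article, and the two features (signups per IP address per day, seconds between actions) and all sample values are hypothetical.

```python
from math import dist


def train_centroids(samples, labels):
    """Average the feature vectors of each class in the labeled history."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign a new account to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled history: [signups per IP per day, seconds between actions]
history = [[1, 40.0], [2, 35.0], [1, 50.0], [30, 1.0], [25, 2.0], [40, 1.5]]
labels = ["real", "real", "real", "fraud", "fraud", "fraud"]

centroids = train_centroids(history, labels)
print(classify(centroids, [28, 1.2]))  # a new, unlabeled account
```

The model can only recognize behavior that resembles what it was shown during training, so it depends entirely on the quality and freshness of the labeled history.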
Yet supervised learning struggles when faced with rapidly changing fraud techniques. Says Xie, “Nowadays, online fraud can change in 24 hours. The characteristic of this area is that we are faced with a constantly changing fraudster, so it is difficult to get enough historical labeled data, limiting the effectiveness of supervised learning.”
The unpredictability of online fraud schemes raises another challenge for supervised learning. Users can potentially be defrauded while using any of a platform’s features, or, as Xie suggests, at any time during a normal “user life cycle.” This could involve registration, payment, comments, or even app installation. The more features there are in a user life cycle, the more likely it is that the user will get scammed.
“We have helped IGG, a renowned video game publisher and developer, to combat game installation fraud. I barely noticed this problem before, but then realized it could be a big issue in the game industry,” says Xie.
An increasing number of fraud detection companies like DataVisor are now employing unsupervised machine learning (UML), a type of machine learning that clusters unlabeled data by discovering hidden patterns. Madrid-based online financial institution Openbank, for example, uses UML algorithms to detect fraud and money laundering.
DataVisor’s fraud detection solution has three components: a UML Engine to cluster results, an Automated Rules Engine to replace time-consuming manual rule-making, and a Global Intelligence Network to gather vast amounts of information and domain knowledge.
The company’s flagship product is its UML Engine, which DataVisor bills as the first proven UML solution capable of handling vast volumes of data and discovering patterns from accounts and events in real time.
“Most papers around unsupervised machine learning are based on very small datasets, and no one has been very successful at applying advanced unsupervised learning algorithms to large-scale data. The challenge is difficult,” says Xie.
According to a company white paper, DataVisor’s UML Engine uses techniques such as natural language processing, image metadata analysis and graph analysis to extract features, including profile information, behaviors and activities, comments, and metadata, from both structured and unstructured data.
These features are then grouped into clusters once the important feature dimensions and distance functions have been selected. DataVisor says its UML Engine algorithms provide a more efficient and effective solution than common methods of dimensionality reduction, such as Principal Component Analysis (PCA). The Engine also deploys graph analysis and supervised machine learning algorithms to improve accuracy.
The last step is to rank the detected accounts, assign them confidence scores, and categorize a collection of malicious accounts with similar features, also known as “attack rings.”
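The general idea of grouping unlabeled accounts and then ranking the groups can be sketched in a few lines. The example below is an illustration only and does not reflect DataVisor's actual algorithms: it links accounts that share any attribute (a simple form of graph analysis using union-find), treats large connected groups as candidate attack rings, and ranks them with a toy size-based score. The account IDs, attributes, `min_size` threshold, and scoring formula are all hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x's group, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def find_rings(accounts, min_size=3):
    """Group accounts that share attributes and rank the groups.

    accounts maps an account ID to a set of attribute strings.
    Returns (sorted members, score) pairs, highest score first.
    """
    parent = {account: account for account in accounts}

    # Index accounts by attribute so shared attributes become graph edges.
    by_attribute = defaultdict(list)
    for account, attributes in accounts.items():
        for attribute in attributes:
            by_attribute[attribute].append(account)

    # Merge every account that shares an attribute into one group.
    for members in by_attribute.values():
        root = find(parent, members[0])
        for other in members[1:]:
            parent[find(parent, other)] = root

    groups = defaultdict(list)
    for account in accounts:
        groups[find(parent, account)].append(account)

    # Toy confidence score (an assumption): larger groups score closer to 1.
    rings = [
        (sorted(group), round(1 - 1 / len(group), 2))
        for group in groups.values()
        if len(group) >= min_size
    ]
    return sorted(rings, key=lambda ring: ring[1], reverse=True)


# Hypothetical unlabeled accounts with the attributes observed for each.
accounts = {
    "a1": {"ip:10.0.0.5", "device:X1"},
    "a2": {"ip:10.0.0.5", "device:X2"},
    "a3": {"ip:10.0.0.9", "device:X2"},
    "a4": {"ip:10.0.0.9", "device:X3"},
    "b1": {"ip:172.16.4.2", "device:Y1"},
    "b2": {"ip:192.168.1.7", "device:Y2"},
}

rings = find_rings(accounts)
print(rings)
```

No labels are needed here: accounts a1 through a4 are flagged only because they are chained together by shared IP addresses and devices, while the two unrelated accounts are left alone.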
The DataVisor white paper reports that the UML Engine has been in use for over three years, with detection accuracy of 90-99%, and that DataVisor has protected over two billion user accounts for large enterprises including Yelp, Pinterest, Fortune 500 banks, IGG, and Toutiao, impressive growth for an early-stage startup.
DataVisor is now looking to expand. The company raised US$40 million last month, led by Sequoia Capital China along with existing investors New Enterprise Associates (NEA) and GSR Ventures.
“We believe a company driven by algorithms and data, such as DataVisor, will be very competitive in the future. Because of some characteristics of the anti-fraud industry, we think the barriers will be high, and there will be a ‘Matthew effect’ wherein the rich get richer and the poor get poorer,” says Rock Wang, Managing Director of Sequoia China.
The anti-fraud industry is still in its early stages of development, and while there is a long road ahead for DataVisor, the company believes its detection techniques and services will stay ahead of emerging online fraud schemes. “The development of technology is endless. I think we can still move forward one step further.”
Journalist: Tony Peng | Editor: Michael Sarazen