From Andrew Ng’s “AI for everyone” courses on Coursera to tech giants’ open-sourced tools that lower the tech bar for building machine learning models, we are seeing a wide range of efforts aimed at simplifying AI to make it accessible to everyone.
Northstar is an interactive data science cloud platform introduced last year by MIT and Brown University. It enables users without programming experience or a background in statistics to explore and mine data through an intuitive black-and-white user interface on touchscreen devices such as smartphones, tablets or interactive whiteboards. The drag-and-drop interface allows users to discover patterns in the data and build machine learning pipelines.
MIT and Brown have now upgraded the Northstar platform with an AutoML-based component called Virtual Data Scientist (VDS), which helps users generate machine learning models to run prediction tasks on datasets. VDS was introduced in the paper Democratizing Data Science through Interactive Curation of ML Pipelines presented this week at the ACM SIGMOD conference in Amsterdam.
It’s believed that VDS could be used, for example, by doctors for disease diagnosis, by business owners for sales forecasts, and even by coffee shop owners for inventory planning, all without requiring a data science background or hiring machine learning consultants.
Users can also run predictive analytics tasks with VDS via models customized to their specific objectives, such as data prediction, image classification, or analyzing complex graph structures. For instance, if medical researchers want to predict potential blood disease in patients, they could simply drag and drop “AutoML” from the list of algorithms in the “operators” box on the screen and then add the “blood” feature from under the “target” tab. The system will then automatically recommend the best machine-learning pipelines for the task, along with their respective error rates, structure, computations, and so on.
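To make the recommendation step concrete, here is a minimal Python sketch of ranking candidate pipelines by error rate on a target column. It is not the VDS implementation: the function names, the hypothetical "hemoglobin" feature, the three toy "pipelines" and the four data rows are all assumptions made up for illustration, and a real AutoML system would also search over model structure and tuning.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Pipeline = Callable[[Row], int]  # maps a row of features to a predicted label


def error_rate(pipeline: Pipeline, rows: List[Row], target: str) -> float:
    """Fraction of rows where the prediction differs from the target column."""
    wrong = sum(1 for row in rows if pipeline(row) != row[target])
    return wrong / len(rows)


def rank_pipelines(
    candidates: Dict[str, Pipeline], rows: List[Row], target: str
) -> List[Tuple[str, float]]:
    """Score every candidate and return (name, error rate), best first."""
    scored = [(name, error_rate(p, rows, target)) for name, p in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy data, invented for this sketch: "blood" is the target (1 = disease).
rows = [
    {"hemoglobin": 9.0, "blood": 1},
    {"hemoglobin": 11.0, "blood": 1},
    {"hemoglobin": 13.5, "blood": 0},
    {"hemoglobin": 15.0, "blood": 0},
]

# Hypothetical stand-ins for machine learning pipelines.
candidates = {
    "threshold_12": lambda row: int(row["hemoglobin"] < 12),
    "threshold_10": lambda row: int(row["hemoglobin"] < 10),
    "always_healthy": lambda row: 0,
}

ranking = rank_pipelines(candidates, rows, target="blood")
for name, err in ranking:
    print(f"{name}: error rate {err:.2f}")
```

The sorted list mirrors what the article describes the interface showing: each recommended pipeline alongside its error rate, with the most accurate candidate first.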
Researchers evaluated VDS on 300 real-world datasets, where it ranked among the fastest interactive AutoML tools thanks to its custom “estimation engine.” This estimation engine sits between the interface and the cloud, and automatically creates representative samples from a dataset. These samples can be processed progressively to produce high-quality results in seconds.
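The idea of progressively processing representative samples can be illustrated with a short Python sketch. This is an assumption-laden toy, not the estimation engine itself: the function name `progressive_estimates`, the starting sample size, and the doubling schedule are all hypothetical, and the statistic estimated here is just a mean.

```python
import random
import statistics
from typing import Iterator, Sequence, Tuple


def progressive_estimates(
    values: Sequence[float], start: int = 100, growth: int = 2, seed: int = 0
) -> Iterator[Tuple[int, float]]:
    """Yield (sample size, estimated mean) on progressively larger random samples."""
    shuffled = list(values)
    if not shuffled:
        return
    # After shuffling, any prefix of the list is a uniform random sample.
    random.Random(seed).shuffle(shuffled)
    size = min(start, len(shuffled))
    while True:
        yield size, statistics.fmean(shuffled[:size])
        if size == len(shuffled):
            break  # the whole dataset has been processed
        size = min(size * growth, len(shuffled))


# Each step refines the estimate; a caller can stop as soon as it looks stable.
for size, estimate in progressive_estimates(range(1000)):
    print(f"n={size}: mean is approximately {estimate:.1f}")
```

Early estimates arrive after touching only a small fraction of the data, which is what lets an interactive tool show a rough answer quickly and sharpen it as larger samples are processed.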
The researchers say they hope to add a feature in the future that automatically alerts users to potential data bias or errors.
The paper Democratizing Data Science through Interactive Curation of ML Pipelines can be found here. The project demo and test installation/collaboration information is here.
Author: Yuqing Li | Editor: Michael Sarazen