This is the first installment of the Synced Lunar New Year Project, a series of interviews with AI experts reflecting on AI development in 2018 and looking ahead to 2019. In this article, Synced chats with Clarifai Founder and CEO Matt Zeiler on recent progress in computer vision and his company’s plans for the future. Founded in New York in 2013, Clarifai produces advanced image recognition systems.
Based on your observations over the past year, what are some trends in the field of computer vision? How do you evaluate the current stage of visual recognition technology?
In 2018, I think there was a lot of interesting work on generating data. What we saw was very photorealistic generated content. The value of that is either to generate a new kind of novel network, or to render a scene automatically from very little information. For example, maybe you just have a segmentation mask that says there’s a road in this area, a car over here, and a tree over there, and the model can render the scene from that. You could even use that rendered image as training data, which is really exciting: you could generate a lot of data and then use it to teach a new AI model how to recognize things.
Another thing we’ve been very passionate about here at Clarifai is running these AI models at the edge of a network: on iOS and Android mobile devices, IoT cameras, or on-premise servers. Wherever a customer wants to run AI, we want to be able to run it there. We also saw a lot of research from others in the AI space on compressing models and making them efficient enough to run on these new types of processors.
What are your expectations for the development of visual recognition technologies and applications in 2019?
I think there are going to be a few things. Computation at the edge is going to be really, really important. So far, people have been using the CPUs and GPUs built into today’s devices. But over the next year, you’re going to start seeing dedicated hardware become more widely available. We’re actually working on ways to run on dedicated hardware built for AI, which can deliver very low power usage and very fast processing near where the data is being created.
Another is getting smarter about fusing different data types together. That means not relying only on the pixels of images or video to do computer vision, but fusing them with other signals. If you want to search for a shirt, for example, color is one signal, sleeve length is another, and price range is another. Can we start understanding all these different types of data together, holistically, so that people can extract more value from that data? I think that’s a trend we’re going to see.
Can you summarize progress Clarifai made in 2018?
There have been a few things around pushing computation to the edge. We now have iOS, Android, and on-premise SDKs, so we can run anywhere, and they all connect to our cloud. You can use our user interfaces to manage your data and train models, and then run those models wherever you need them, which is a huge value-add for customers. It all works seamlessly together as one platform.
We’ve also launched some of our work on detection models, which find not just what things are in an image, but where those things are located. We launched a face detection model last week, and it’s way better than our previous version. We’re seeing huge leaps: as we collect more and more training data and algorithms advance, the models are getting super accurate.
How would you describe Clarifai’s strengths and weaknesses compared to other key players in computer vision, particularly in the visual search and image recognition markets?
A seamless platform is a big advantage of being a small company, where the different product teams are literally sitting next to each other. Our competitors, like Google, Microsoft, and Amazon, are really large, so it’s hard for them to coordinate efforts. Customers can come to us with a problem they want to solve in their business, and we can do the whole thing: collecting more training data, labeling it, training the models, deploying the models wherever they need to run, and even, once a model is out there, iterating on it and improving it over time.
What will be Clarifai’s focus for 2019?
I think there are really three areas. One is the edge of the network, which I like to call Smart X: smart homes, smart aerial devices like drones and aircraft, smart businesses, and smart stores. This is a physical-world initiative to get our computer vision out wherever seeing something in real time is important.
Another is digital insights. This is where we’ve focused over the last five years: understanding things in consumer photos, travel photos, and real estate. We’re going to double down on our search product to really get insights out of massive amounts of digital content, to slice and dice that content and extract insights from it.
And the last one is driving platform adoption. Over the years, we started with a one-size-fits-all model, and then built some models for specific use cases like travel, weddings, and food. But once we opened up the platform to allow custom classification models, the stickiness really grew, and so did the number of developers and users. So we’re going to be launching more customization features and opening up a lot more of the platform so developers can customize it in ways only they can. Giving developers that freedom brings a lot of creativity, and it also teaches us which use cases are important.
Clarifai’s technical strength in computer vision is widely recognized. Last year, you launched the Enhanced General Model, your smartest, most diverse, and most robust to date. How do you plan to innovate on your existing technologies and products? Is there a direction you will focus on?
This is the fifth iteration of the public version of our general model. We’re collecting more and more training data from a variety of sources. A really big source is our actual API traffic, and we’re doing research right now on how to leverage that traffic, which isn’t labeled. It comes in as just images or videos, and we want to be able to learn from it automatically, so that maybe the next version of the general model is actually learning on its own. There’s a lot of research here that I’m really excited about, because one of the holy grails of AI is semi-supervised or unsupervised learning, where there doesn’t have to be a human in the loop. And we’re getting some really interesting research results right now, showing improvements without humans in the loop.
What are some technical challenges Clarifai must overcome to reach the next level?
There are always two: the data challenge and the algorithm challenge. To be honest, I think the algorithm side is less important, because the research community around the world is very strong at publishing new ideas, and most of those ideas are algorithms, since the research is being done on fixed-size datasets. ImageNet was a very popular one, and there’s also COCO. Those types of datasets are great for the research community because they can come up with new algorithm ideas, and we benefit from that. As new algorithms come out, we put them into our tech stack, iterate on them, and add our own ideas to make them even better.
The data side is where, I think, a lot of innovation has to happen, in being able to use less and less data. One thing we did when building custom training for classification was to let new users come to the platform without needing a million images to get any benefit out of it. They only need a handful, and training doesn’t take weeks like it normally does; it literally takes seconds. It’s a much faster and easier process for people to get AI built for their applications. It took a lot of research on the data side and the engineering side to make it that fast, and I think that’s why we can have unique innovations here at Clarifai.
What technologies have you been paying more attention to recently? Which industrial application directions do you value most?
We’ve considered blockchain, distributed networks, and privacy preservation, which I think could be really important as we build computation at the edge of the network. There are still a lot of fundamental problems that need to be solved before something like Clarifai can really benefit from them. But that’s one field I’m monitoring…
I’ve always enjoyed consumer electronics, and a couple of weeks ago we presented some of our new Smart X results at the Consumer Electronics Show in Las Vegas: running security cameras in real time, recognizing people walking around, and producing heat maps of where people walk in a store, for example.
What other new technical directions do you think need to be explored in the future?
I started Clarifai five years ago after my PhD at NYU, which was focused one hundred percent on image understanding. We’ve since expanded into video, but long term, I want to expand into audio, text, and other types of data. I think that’s going to be crucial for the holistic understanding I was talking about earlier, and that’s where I see Clarifai going in the future.
In 2018, there was widespread public concern over AI and computer vision violating human rights and accessing private information. Data bias also remains one of the challenges haunting AI researchers. How do these issues affect your business?
I think one really important point is that AI is going to surpass human capability in many different ways: in accuracy, in speed, in the scale of data it can handle, and in making better, unbiased judgments. A lot of the contention around face recognition is in applications like law enforcement, where a truly biased classifier might make a poor judgment call.
But you have to consider that in today’s human system, every person has their own biases, and they get tired, hungry, and stressed… None of these things happen to machines. So over time, as we collect more and more training data, and companies like Clarifai focus on training models that are as unbiased as our experts can make them, we’ll deploy those models in the field, and they’re going to end up making better judgment calls than any one randomly selected law enforcement officer.
AI, long term, is going to be much, much better. I think the public just doesn’t understand that yet.
Is Clarifai involved with any partnerships or customers in China?
Yeah, we do have some customers in China using our API. ByButter (北京缪客科技有限公司: 黄油相机) is one of our bigger ones. I’ve actually never talked to them, but from my understanding, they’re like an Instagram-type app over there. The app is called “ButterCam” [a popular mobile photo editing app that lets users add text and effects to their images].
With a lot of these consumer apps, there are two things we can help with. One is organizing content: as users upload and create it, it’s tagged so that other users can find it. The other big application is moderating content. Whenever users can upload content, you need to look for things like nudity, drugs, and weapons and filter them out, and those are the types of things we can help a company like ByButter do.
Where do you see Clarifai in three years?
We’re trying to build the leading AI platform and, as I mentioned earlier, expand into other data types, run a lot of computation near the edge, and really go up against the tech giants that are our competitors. We think there are lots of unique advantages to being small, being a startup, and being one hundred percent focused on AI, versus the big guys that have retail stores and a hundred other divisions all at once. Think of the David and Goliath story…
Journalist: Tony Peng | Editor: Michael Sarazen