This spring one of the leading figures in machine learning, UC Berkeley Professor Michael I. Jordan, published the article “Artificial Intelligence — The Revolution Hasn’t Happened Yet” on Medium. In the piece Jordan points out that although the term “Artificial Intelligence” is now invoked by many people across various fields, much misunderstanding remains about the term. He argues that when people say “AI” they are in fact referring to three different tech aspirations: “Human-Imitative AI”, “Intelligence Augmentation (IA)” and “Intelligent Infrastructure (II)”.
Jordan believes each of these should be addressed separately and on its own merits, and that a human-imitative AI system should not be the principal strategy for approaching IA and II problems. The use of the single initialism “AI”, he suggests, prevents clear comprehension of the tech and creates obstacles to problem-solving.
Professor Jordan recently visited China and gave a series of academic talks. Synced caught up with him at the Machine Learning Summit 2018 in Shanghai, where he kindly shared his thoughts on AI.
Synced: Why did you classify AI into three categories?
Michael I. Jordan: I was trying to highlight a few coherent themes. And I was thinking not as an academic, but as someone interested in new use cases in industry and historical development of ideas. It’s to help people appreciate the range of problems and not just work on one problem area with one method like deep learning. They can see different aspects of a problem.
A lot of people who build machine learning systems are not thinking about the social consequences very much until afterward when it’s clear there is a problem. But if you take this broader, federated market type perspective from the very beginning, you’re going to perceive earlier if you’re creating some problems, or even make the system better from the very beginning.
I don’t ever try to say that this is the best and only way to think, or that II is different from IA and all that. I just want to make certain distinctions from time to time to help the clarity of thought. Right now everyone is lumping everything into AI, and often they do that when they don’t know what they’re talking about.
Synced: You’ve mentioned you’re most interested in Intelligent Infrastructure, can you tell us about that?
Michael I. Jordan: II is in some ways like the older description, Internet of Things (IoT). But the IoT was mostly developed by networking researchers, and the whole problem was just to get the “things” to have an IP address, to be on a network and to communicate.
Now the next problem is to let them communicate data, and to have that data be used for inferences, be coherent and build up a knowledge web of flowing data. I think II could be called IoT, but it is a broader class of ideas and problems.
Synced: What is the key component of an II system, what input/output is involved?
Michael I. Jordan: It’s more about what are the rules of engagement in the game — who plays in the game, and how do you engage? To be very concrete, a simple limited form of a market-based learning system would be something like what DiDi Chuxing provides in transportation. The system itself is not super intelligent, it’s just linking drivers and riders. It’s a two-sided market.
Riders have one app on their cell phone, and the drivers have a different app on their cell phone. So it’s not just one app for everybody. That’s already interesting. And then because you have the two apps, you can make a request, and the other side looks at it and makes bids, and a price is set, and the price is adaptive to the moment and the situation. And so that’s a real market.
It’s a little bit different than a classical economics kind of market, it is more adaptive since more data is being used, and it knows more about the preferences of the drivers and the people. On the other hand, it is still very limited: it is only about one thing — getting people from one place to the next.
But think about blending that with a recommendation system. Now, recommendation systems are also pretty limited, they’re not part of an overall market in any way. But if you put the two together, on both sides of the market you have a recommendation plus a market interaction, and that starts to feel much more powerful. A simple example involves restaurant owners and diners. It would be nice if, every evening when I walk out in Shanghai, I could push a button on my phone and say I am interested in food tonight, and the system knows where I am geolocated and that I like certain kinds of cuisines. And then it tells all the restaurant owners around there that I am available as a possible client.
The restaurants learn a little bit about me and my preferences and they bid on me, and maybe they give me a 10 percent discount, then I accept and we now have a pricing transaction. This system puts me in a market that I can accept or reject; and the owner could also get to know me after I’ve been there many times, we could start to have a relationship. Then I’m happy because I’m getting discounts and I start to develop a relationship; the restaurant owners are happy because they can start to fill their restaurant and favour certain kinds of clients and so on.
It’s not that complicated, but you have to do it at a scale with millions of entities on both sides, and the data changes every day. You could do this for any service, like restaurants, haircuts, music, etc.
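The restaurant example can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration and does not come from the interview: the `Restaurant` fields, the bidding rule in `make_bid` (2 percent off per empty seat, capped at 20 percent) and the diner's acceptance rule in `choose_offer` (take the offer with the largest positive surplus).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Restaurant:
    name: str
    cuisine: str
    base_price: float   # usual price of a meal
    empty_seats: int    # capacity still unfilled tonight


def make_bid(r: Restaurant, liked: Dict[str, float]) -> Optional[float]:
    """Hypothetical bidding rule: bid only on diners who like this cuisine,
    discounting 2% per empty seat, capped at 20%."""
    if r.empty_seats == 0 or r.cuisine not in liked:
        return None
    discount = min(0.20, 0.02 * r.empty_seats)
    return round(r.base_price * (1 - discount), 2)


def choose_offer(liked: Dict[str, float],
                 restaurants: List[Restaurant]) -> Optional[Tuple[str, float]]:
    """The diner accepts the offer with the largest surplus (value - price),
    or rejects all offers if none leaves a positive surplus."""
    best, best_surplus = None, 0.0
    for r in restaurants:
        price = make_bid(r, liked)
        if price is None:
            continue
        surplus = liked[r.cuisine] - price
        if surplus > best_surplus:
            best, best_surplus = (r.name, price), surplus
    return best


# How much the diner values a meal of each cuisine they like (assumed numbers).
prefs = {"sichuan": 120.0, "noodles": 60.0}
restaurants = [
    Restaurant("A", "sichuan", 100.0, 5),
    Restaurant("B", "noodles", 50.0, 0),    # full, so it does not bid
    Restaurant("C", "sichuan", 110.0, 20),
    Restaurant("D", "pizza", 40.0, 10),     # cuisine the diner did not list
]
accepted = choose_offer(prefs, restaurants)
print(accepted)
```

A real system of the kind Jordan describes would learn both sides' preferences from data and run at a scale of millions of participants; this sketch only shows the shape of a single request, bid and acceptance.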
Synced: You’ve written that II systems should be able to manage distributed decisions. Why is that important?
Michael I. Jordan: Distributed decisions are only difficult when there is scarcity. Any real-world situation has some amount of scarcity. For example, when I navigate someone to the airport, the scarce resource is the amount of possible flow on a given road. I can’t overload it. If I send too many people down the same road, I will create traffic.
Considering the delivery of packages, logistics companies do not have an infinite number of trucks. If the company sends a truck first to you, it cannot come first to me. So you will get your package earlier than me. Thinking like an economist, these companies do not solve the problem with some central algorithms that figure out “the best thing to do.” Instead, they let people bid. People who are in a rush and need the package earlier can pay a little bit more. Every day we would want to be able to express our desires and preferences. And that is again, what a simple market idea does.
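As a rough sketch of letting people bid for a scarce resource, the Python function below hands a limited number of early delivery slots to the highest bidders. The pricing rule (every winner pays the highest losing bid) is a standard textbook choice assumed here for illustration; the interview does not specify one, and the names `allocate_slots`, `bids` and `capacity` are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate_slots(bids: Dict[str, float], capacity: int) -> Tuple[List[str], float]:
    """Give the `capacity` early slots to the highest bidders.

    Assumed pricing rule: all winners pay the highest losing bid,
    or 0.0 if there are enough slots for everyone.
    """
    # Sort by bid, highest first; break ties by name so the result is deterministic.
    ranked = sorted(bids, key=lambda customer: (-bids[customer], customer))
    winners = ranked[:capacity]
    losers = ranked[capacity:]
    price = bids[losers[0]] if losers else 0.0
    return winners, price


# Four customers want an early delivery, but the truck can serve only two first.
bids = {"alice": 5.0, "bob": 1.0, "carol": 3.0, "dave": 0.5}
winners, price = allocate_slots(bids, capacity=2)
print(winners, price)
```

Customers in a rush express that by bidding more, and no central planner has to guess whose package matters most.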
We are now talking about building markets that allow data to flow and preferences to be expressed — not pure, simple classical markets. We are using data and preferences and creating a whole system for the markets. Economists are excited because they are in markets where there is much more information, and when they have more information, they can do better.
Synced: Is human-imitative AI required for solving II problems?
Michael I. Jordan: Simple markets don’t require a huge amount of intelligence from the individual agent. For example, the [Chinese ride-hailing company] DiDi Chuxing users are intelligent, but they are not using their intelligence so much; riders only need to say “I need to go to the airport” and so on. And that is true in most of the existing markets in the world, you just need to make simple decisions.
But markets are more efficient and better when each agent has more information and they can act more reliably and logically. And when getting into sophisticated markets like the commodity or stock exchanges, a lot more human intelligence is being used. So you can imagine that classical, human-imitative AI systems will start to be better players in certain interesting markets, and those two ideas [human-imitative AI and II] will come together.
Synced: Do you think there is a lack of attention on II problems in research communities?
Michael I. Jordan: I think there is. I don’t think it is an issue at all in the good, big companies. But I think that a lot of smaller companies hope they will solve big problems just by proposing an intelligent AI device or something, but mostly they won’t.
Also for some hardware companies, it’s very much in their interests to keep running all these AI algorithms on a huge amount of data, because they will make money whether it works or not. So they keep talking about the need for hardware, etc. I am not critical about this, but I think people have to understand that these companies are trying to sell that because it makes money for them in the near term, not because it solves the problem.
I think right now the amount of computation power available to each one of us is very high. A laptop is more than enough to do some very interesting things. Most applications I know of that are actually being used in real life can be run on a laptop. Occasionally for a business model that demands a search process that takes huge amounts of data, or if you are literally doing self-driving cars for example, you can go to the cloud.
On the academic side, the trend is moving pretty quickly. And one of the nice things about AI is that it got a lot more students interested in our field. I think that is only good. Students come in and learn about AI and deep learning. But a lot of them are not going to stick with one topic, they will also look at other broader problems.
Synced: Most of your II examples are somehow related to economics. Would current machine learning algorithms become very different in the world of economics?
Michael I. Jordan: Not that different. For current machine learning there is only one player, just the algorithm. When there are lots of auctions in a market, you get multiple players. Each player knows a little about other players and has strategic thinking about certain actions getting certain responses from others.
The style is more economic, but the basic underlying algorithms in our world, such as gradient descent and matrix algorithms, are very simple. And they are not that different in the economic world.
For example, in an economic system you often want to arrive at an equilibrium, where both you and I are happy and cannot move in any direction to make ourselves a little happier. But we’re competing, and your happiness and mine are not the same. In some sense, the algorithm is going down for you and up for me, or vice versa. Economists can use gradient descent to find saddle points, because the saddle point is where I am happy in one direction and you are also happy in the other direction. An algorithm should be able to find a saddle point. That’s not a new problem, but as usual there are often new things to do when you put it in a new context.
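The idea of going down for one player and up for the other can be sketched as simultaneous gradient descent-ascent. The objective below, f(x, y) = 0.5x² + xy − 0.5y², is a toy function chosen for this illustration and is not from the interview; one player minimizes over x, the other maximizes over y, and the saddle point is at (0, 0).

```python
def grad(x: float, y: float):
    """Gradient (df/dx, df/dy) of the toy objective f = 0.5*x**2 + x*y - 0.5*y**2."""
    return x + y, x - y


def descent_ascent(x: float, y: float, lr: float = 0.1, steps: int = 200):
    """Simultaneous updates: x steps downhill on f while y steps uphill on f."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y


x_star, y_star = descent_ascent(3.0, -2.0)
print(x_star, y_star)  # both values end up very close to 0
```

This simple scheme converges here because the toy objective is strongly convex in x and strongly concave in y. On a purely bilinear objective such as f = xy, the same simultaneous update spirals away from the saddle point, which is one reason the setting still leaves new things to do.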
Economists run simulations of game situations all the time to see what will happen when people’s different metrics interact. Real computer vision is in the context of an entity that is trying to learn about its environment and move around and find resources, etc. And when we start to get deeper into this field it will also become more economic as well at some point.
It is surprising how the basic matrix, gradient and probability ideas are all used in all these different fields.
Synced: What do you think is overlooked or not mentioned enough in current AI discussion?
Michael I. Jordan: The Uncertainty. Really what machine learning has been is ideas from statistics blended with ideas from computer science. But in that blending a few things have been lost. And one of them is worry about the uncertainty.
The researchers know about it, but sometimes they don’t focus on it enough. Researchers assume that if you have a huge amount of data, the algorithm will just output the right answers most of the time.
In computer vision problems, if you have many labels, somehow the uncertainty starts to go away. But that is not typical. For lots of other problems, there are still a lot of uncertainties because there are things you didn’t measure in the world. For example, if I am trying to make a decision for you about your medical treatment, there are lots of things happening in your body that I cannot know.
It usually requires more assumptions and efforts to get at uncertainty. For example the bootstrap can get an estimate of uncertainty by sampling data repeatedly, but that method is compute-intensive and costs a lot. Uncertainty estimation by Bayesian approaches where initial uncertainty is input and gets changed is also costly.
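To make the bootstrap concrete, here is a minimal sketch of a percentile bootstrap confidence interval using only Python's standard library. The function name `bootstrap_ci`, its defaults and the sample values are assumptions for illustration. The cost Jordan mentions is visible in the code: the statistic is recomputed once per resample.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample the data with replacement many times,
    recompute the statistic each time, and read off the middle (1 - alpha) range."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n))   # one resample of the same size as the data
        for _ in range(n_resamples)
    )
    lower = estimates[round((alpha / 2) * n_resamples)]
    upper = estimates[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements; their mean is 2.42.
sample = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4, 2.6, 2.0]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

With 2,000 resamples the statistic is evaluated 2,000 times, so the price of the uncertainty estimate scales directly with the cost of the underlying computation.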
So part of the engineering in these fields is to find cost-effective ways of getting uncertainty. If you’re unable to get uncertainty, you should at least give a statement, and maybe provide a bigger confidence interval to be conservative about the results.
The tradition in statistics is all about managing uncertainty and being clear on the errors etc., while on the computer science side there was no uncertainty. Computer science machine learning people then started to bring in at least some, with a training and test phase with differences due to randomness. But culturally, uncertainty is still less of a focus there.
Synced: You often mention the distributed framework “Ray” in your recent presentations. What is your motivation for the system design and what are the main features of Ray?
Michael I. Jordan: The need for good programming models and languages is always increasing and not going away. When I was young, there were Fortran and C. Today’s languages are far better than the previous generations and a lot of productive things have happened. But it has become more and more painful to program in such languages on a distributed platform with multiple threads, for models where you have data in various places and so on.
Some big success stories behind the latest deep learning include Hadoop, Spark and TensorFlow, all of which allow you to easily work with those models. But we are not done, because those are only useful for the current situation.
Ray is an attempt to pull out a lot of the distributed aspects of problems, so that most engineers won’t have to think or worry too much about whether to run models on one, five or hundreds of computers. It aims to provide an underlying infrastructure that can use all available resources: it can run the model on one processor for a little while and then move to another, or stop and start processes to keep the system robust and consistent. Ray is targeted more at the decision side of AI, where we’re doing reinforcement learning and search algorithms, than at the pattern recognition side.
Synced: Compared with pattern recognition, how does decision side AI differ regarding hardware utilization?
Michael I. Jordan: The most important difference is the heterogeneous workload. For pattern recognition, it’s often pretty easy to break up the problem into five or ten pieces, or a hundred pieces of roughly the same size that require the same resources and the same amount of time running the model. The MapReduce paradigm underlying Hadoop or Spark is one such paradigm.
But for decision making, if you break the problem into many searches and say “I’m going to try this parameter or try this one,” in some of those choices it’ll be clear immediately if it’s a terrible idea, and for others it will be a long time until it’s clear whether it’s a good or bad idea.
So each processor should run the appropriate amount of time for the particular search it’s doing. And when it’s done and knows whether it has a good answer or a bad answer, at that moment the system should stop and then start a new task on the same processor. So it’s a dynamic task graph with what’s called a heterogeneous workload; not every piece is the same. And that’s what Ray does that the previous architectures did not.
Synced: How do you discover new areas of interest, and what have you been working on recently?
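The scheduling pattern described above, where a worker picks up a new task the moment its current one finishes, can be sketched with Python's standard `concurrent.futures` module. This is not Ray's API and says nothing about how Ray is implemented; the `evaluate` function, its toy objective and its sleep times are invented to mimic trials that fail fast or run long.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def evaluate(param):
    """Hypothetical trial: clearly bad parameters finish almost immediately,
    promising ones take longer before their quality is known."""
    score = -(param - 0.3) ** 2                    # toy objective, best at 0.3
    time.sleep(0.001 if score < -0.1 else 0.01)    # heterogeneous run times
    return param, score


def dynamic_search(candidates, n_workers=2):
    """Keep every worker busy: whenever any trial finishes, start the next one."""
    todo = iter(candidates)
    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Start one trial per worker.
        pending = {pool.submit(evaluate, p) for p in islice(todo, n_workers)}
        while pending:
            # Wake up as soon as at least one trial is done, not when all are.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
                nxt = next(todo, None)
                if nxt is not None:
                    pending.add(pool.submit(evaluate, nxt))
    return max(results, key=lambda r: r[1])


best_param, best_score = dynamic_search([i / 10 for i in range(11)])
print(best_param, best_score)
```

A batch-style scheme would instead wait for the slowest trial in each round, leaving the workers that finished early idle.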
Michael I. Jordan: I am curious and I like to learn about everything. I’m social and I like to talk to people and learn what they are thinking about. I like to look at technology trends and see where things are going.
And then I start to see some ways of thinking that are new or some problem areas that are not being looked at, or something that feels like if you work on it for a little while interesting things will happen. And then I work on and talk about it and try to inspire other people, including my students, to work on it.
I like to talk about Ray because it’s really solid, good work, plus I want other people to join. Another area I like talking about is something called false discovery rate, and particularly online false discovery rate: how do I make decisions over time such that at any time you stop me, I’ve made mostly good decisions up until that moment?
I am also pretty interested in mathematics: the geometry of gradient descent algorithms, and search and statistical algorithms. These algorithms are mathematically doing some movements in a parameter space, having both dynamic and geometric properties. We study how these dynamic properties and the geometric properties interact. In particular, there is a study of saddle points. We do mathematics to optimize algorithms for efficient escapes from saddle points. Mathematical people like that because there are real theorems about geometry and dynamics that you can build on, and future students will be able to go further with it.
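One published procedure for the online setting is LOND (Javanmard and Montanari), sketched below in Python. The interview does not say which method Jordan has in mind, so the choice of LOND, the weight sequence β_i = α · 6 / (π² i²) and the example p-values are all assumptions for illustration. Each hypothesis is tested as it arrives, at a level that grows with the number of discoveries made so far.

```python
import math


def lond(p_values, alpha=0.05):
    """Online testing with the LOND rule: test the i-th p-value at level
    beta_i * (discoveries so far + 1), where the beta_i sum to alpha."""
    decisions = []
    discoveries = 0
    for i, p in enumerate(p_values, start=1):
        beta_i = alpha * 6 / (math.pi ** 2 * i ** 2)   # sums to alpha over all i
        level = beta_i * (discoveries + 1)
        reject = p <= level
        decisions.append(reject)
        discoveries += int(reject)
    return decisions


# Made-up stream of p-values arriving one at a time.
stream = [0.0001, 0.5, 0.004, 0.03, 0.00001]
decisions = lond(stream)
print(decisions)
```

The decision for each p-value depends only on what has been seen so far, so the procedure can be stopped at any moment. The guarantee of a false discovery rate below `alpha` relies on assumptions such as independent p-values.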
This is a community where we build ideas that others can build on. I think in ten years, no one will talk about me again, my era will be gone. I don’t think that any of us are doing thinking so important that it’s like Einstein, it’ll last for a hundred years. We’re all doing things a little smaller than that, but I really like the idea that it’s solid enough that the next person really wants to stand on it and build it further.
Source: Synced China
Journalist: Luna Qiu | Localization: Tingting Cao | Editor: Michael Sarazen