AAAI-17 Classic Paper Award
Introduction to the awards
The AAAI Classic Paper award honors the author(s) of paper(s) deemed most influential, chosen from a specific conference year. Each year, the time period considered will be advanced by one year.
The 2017 award will be given to the most influential paper(s) from the Sixteenth National Conference on Artificial Intelligence, held in 1999 in Orlando, Florida, USA.
Papers will be judged on the basis of impact, for example:
- Started a new research (sub)area
- Led to important applications
- Answered a long-standing question/issue or clarified what had been murky
- Made a major advance that figures in the history of the subarea
- Has been picked up as important and used by other areas within (or outside of) AI
- Has been very heavily cited
A Classic Paper Honorable Mention was given to: “Combining Collaborative Filtering with Personal Agents for Better Recommendations” by Nathaniel Good, J. Ben Schafer, Joseph A. Konstan, Al Borchers, Badrul Sarwar, Jon Herlocker, and John Riedl, for developing an effective way to combine collaborative filtering and content filtering to provide better recommendations to users.
Introduction to this paper
This paper shows a way to combine two approaches: Information Filtering and Collaborative Filtering. It argued that a Collaborative Filtering (CF) framework can be used to combine personal Information Filtering (IF) agents with the opinions of a community of users, and that the resulting recommendations are better than what the agents or the users can produce alone. It also shows that using CF to create a personal combination of a set of agents produces better results than either individual agents or other combination mechanisms. One key implication of these results is that users can avoid having to select among agents; they can use them all and let the CF framework pick the best ones for them. At the time, this was a sound way to construct such a framework.
In this review, in addition to introducing the basic ideas behind recommender systems (RSs), I will walk readers through the paper's hypotheses and experimental design and its discussion, and offer my own opinion on the experiment in light of present-day developments. There will also be some recommended materials to help readers study RSs.
Basic ideas about Recommender Systems
As computers, communication, and the Internet make it easier for anyone and everyone to speak to a large audience, we are presented with more and more information in our everyday lives, to the point where our ability to search for useful information becomes inadequate. Recommender Systems were created in response to this challenge of information overload.
Recommender Systems (RS) were envisioned in 1970, developed and commercialized in the 1990s, and have been studied for more than 20 years since; they are still being actively developed today. Martin and his colleagues identified two waves of RS research, the first from 1990 to 2000 and the second from 2000 to 2010, and expected a third wave to begin around 2011. This paper was published in 1999, at the end of the first wave and the beginning of the second.
There are two popular recommender algorithms, Content Filtering and Collaborative Filtering, but neither of them meets every requirement. Many methods therefore combine them to avoid the limitations of each. Below are four major techniques:
- Combining the predictions of a collaborative algorithm and a content-based algorithm, such as a voting scheme or a rating scheme.
- Starting from a collaborative algorithm and adding characteristics of a content-based algorithm [4,6].
- Starting from a content-based algorithm and adding characteristics of a collaborative algorithm.
- Constructing a new model that incorporates characteristics of both collaborative and content-based algorithms [8,9].
There are three different technologies that are commonly used to address information overload challenges. Each technology focuses primarily on a particular set of tasks or questions:
- Information retrieval: fulfilling ephemeral interest queries
- Information filtering: classifying streams of new content into categories
- Collaborative filtering: deciding which items (overall or from a set) a user should view, and how much the user will like particular items
Each of these technologies plays a role in producing an effective recommender system. We can loosely regard information retrieval as a search engine (e.g., Google) and information filtering as content-based filtering. Here are some simple explanations of these three techniques.
Information filtering (IF) requires a profile of user needs or preferences. The simplest systems require the user to create this profile manually or with limited assistance. For example, if Mike bought a book named C++ Programming, the RS will likely infer that Mike also likes Data Structures or C Programming, because these books are quite similar to each other.
Collaborative filtering (CF) systems build a database of user opinions of available items. For example, Mike and John are both students studying computer science. If Mike bought a book named Machine Learning that John did not, the RS will recommend this book to John as well.
Several systems have tried to combine information filtering and collaborative filtering techniques in an effort to overcome the limitations of each. Sarwar et al. (1998) showed that a simple but consistent rating agent, such as one that assesses the quality of spelling in a Usenet news article, could be a valuable participant in a collaborative filtering community. Their work showed how these filterbots (rating robots that participate as members of a collaborative filtering system) helped users who agreed with them by providing more ratings upon which recommendations could be made. For users who did not agree with a filterbot, the CF framework would notice a low preference correlation and not make use of its ratings.
This paper extends the filterbot concept in three key ways. First, it uses a more intelligent set of filterbots, including learning agents that are personalized to an individual user. Second, it applies the idea to small communities, including using CF to serve a single human user. Third, it evaluates the simultaneous use of multiple filterbots. In addition, the authors explore other combination mechanisms as alternatives to CF, and they demonstrate that CF is a useful framework both for integrating agents and for combining agents and humans.
There are many datasets that can be used to train RSs, such as the Netflix, CiteULike, Delicious, and MovieLens datasets. This paper was based on the MovieLens [2, 3] dataset. The user ratings for the experiment were drawn from the MovieLens system (http://movielens.umn.edu), which has more than 3 million ratings from over 80,000 users. Fifty users were selected at random from the set of users with more than 120 movie ratings. For each user, three sets of movies/ratings were selected at random without replacement. The first set of 50 ratings, termed the training set, was set aside for use in training the personalized information filtering agents. The second set of 50 ratings, termed the correlation set, was used when combining users, agents, or both together. The final set of 20 ratings served as the test set. In each experiment, the test ratings of the target user were withheld and compared against the recommendation values produced by the system.
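The per-user split described above can be sketched as follows. This is my own reconstruction of the mechanics (the paper does not publish its sampling code, and the function name is mine); it assumes each selected user has at least 120 ratings, as the paper requires.

```python
import random

def split_user_ratings(user_ratings, seed=0):
    """Partition one user's {movie: rating} dict into the three disjoint
    sets used in the experiment: 50 training ratings (for the personalized
    IF agents), 50 correlation ratings (for combining users/agents),
    and 20 test ratings (held out for evaluation)."""
    items = list(user_ratings)
    random.Random(seed).shuffle(items)  # random selection without replacement
    train = {m: user_ratings[m] for m in items[:50]}
    correlation = {m: user_ratings[m] for m in items[50:100]}
    test = {m: user_ratings[m] for m in items[100:120]}
    return train, correlation, test
```

Because the three slices come from one shuffle, the sets are disjoint by construction, so no test rating can leak into agent training or correlation computation.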
Hypotheses and Experimental Design
This paper systematically explores the value of collaborative filtering, information filtering, and different combinations of these techniques for creating an effective personal recommendation system. Specifically, it looks at four key models, as shown in figure 1:
- Pure collaborative filtering using the opinions of other community members
- A single personalized “agent” – a machine learning or syntactic filter
- A combination of many “agents”
- A combination of multiple agents and community member opinions
The experimental design uses two tiers. First, where there are several implementations for a particular model, the authors evaluate them to find the implementation that provides the best filtering. Second, they compare the best implementation from each model against those of the other models. These comparisons are operationalized as four primary hypotheses:
- The opinions of a community of users provide better recommendations than a single personalized agent.
- A personalized combination of several agents provides better recommendations than a single personalized agent.
- The opinions of a community of users provide better recommendations than a personalized combination of several agents.
- A personalized combination of several agents and community opinions provides better recommendations than either agents or user opinions alone.
The context in which these hypotheses are tested is a small, anonymous community of movie fans. The combination of small size and non-textual content puts both collaborative filtering and information filtering at a disadvantage; it provides a middle ground between the common contexts for collaborative filtering (many users, little content information) and information filtering (one user, much content information).
Their hypotheses are based on four models of recommender system:
- User opinions only,
- Individual IF agents,
- Combinations of IF agents
- Combinations of IF agents and user opinions.
The paper then describes the variety of implementations of these models, with an overview of how each implementation was constructed.
User Opinions Only. They used the DBLens research collaborative filtering engine, developed by the GroupLens Research project for the exploration of collaborative filtering algorithms. DBLens allows experimenters to control several parameters that trade off performance, coverage, and accuracy. For their experiments, they set each of these to prefer maximum coverage and to use all data regardless of performance. The CF result set was computed for each user by loading the correlation data set (50 ratings per user) into the engine, then loading the test set (20 ratings per user) and requesting a prediction for each test-set movie for each user. DBLens has a control that ignores a user’s own rating when making a prediction for that user. The resulting 20 predictions per user were compared against that user’s ratings to produce error and ROC statistics.
Individual IF Agents. Three types of IF agents, or filterbots, were created and studied in this project: DGBots, RipperBot, and a set of GenreBots.
Combinations of IF Agents. They identified four different strategies for combining agents: selecting one agent for each person, averaging the agents together, using regression to create a personal combination, and using CF to create a personal combination. In all but the first strategy, they found it valuable to create two combinations: one that used all 19 GenreBots and one that used the Mega-GenreBot. Adding the 3 DGBots and RipperBot, they refer to these as the 23-agent and 5-agent versions, respectively. For BestBot, the best agent per user was selected by testing each bot on the correlation data set (50 ratings) and choosing the bot with the lowest MAE; that bot’s ratings for the test data set were then used to produce statistics for evaluation.
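The BestBot selection rule is simple enough to sketch. This is my illustration of the idea (the function names are mine, and `bots` maps a hypothetical bot name to that bot's predicted ratings):

```python
def mean_absolute_error(predictions, actual):
    """MAE over the items that have both a prediction and a rating."""
    common = set(predictions) & set(actual)
    return sum(abs(predictions[m] - actual[m]) for m in common) / len(common)

def best_bot(bots, correlation_ratings):
    """Per user, pick the agent whose predictions on that user's
    correlation set have the lowest MAE; that agent's test-set ratings
    are then used for evaluation, as in the BestBot strategy."""
    return min(bots, key=lambda name: mean_absolute_error(bots[name], correlation_ratings))
```

Note that selection happens on the correlation set, not the test set, so BestBot never peeks at the held-out ratings it is evaluated on.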
Combination of Users and IF Agents. Because user ratings were incomplete, and because CF with 23 agents proved to be the most effective combination of IF agents, they used CF to combine the 23 agents and all 50 users. The method is identical to the CF combination of agents except that they also loaded the ratings for the other 49 users. Again, the database was cleared after each user.
Recommender systems researchers use several different measures for the quality of recommendations produced.
Coverage metrics evaluate the number of items for which the system could provide recommendations.
Statistical accuracy metrics evaluate the accuracy of a filtering system by comparing the numerical prediction values against user ratings for the items that have both predictions and ratings.
Decision-support accuracy metrics evaluate how effective a prediction engine is at helping a user select high-quality items from the item set.
Statistical significance is assessed for mean absolute errors using the Wilcoxon test on paired errors.
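As an illustration of the coverage and decision-support families of metrics, here is a simplified sketch (my own, not the paper's exact ROC computation): coverage is the fraction of test items that receive a prediction, and the decision-support measure below checks how often items predicted to be "good" actually were.

```python
def coverage(predictions, test_items):
    """Fraction of test items for which the system produced a prediction."""
    return sum(1 for m in test_items if m in predictions) / len(test_items)

def decision_support_accuracy(predictions, actual, threshold=4.0):
    """A simple threshold-based stand-in for decision-support accuracy
    (an assumption, not the paper's ROC curve): among items predicted
    'good' (>= threshold), the fraction the user actually rated good."""
    flagged = [m for m in predictions
               if m in actual and predictions[m] >= threshold]
    if not flagged:
        return 0.0
    return sum(1 for m in flagged if actual[m] >= threshold) / len(flagged)
```

A system can score well on one family and poorly on another, e.g. by predicting only for easy items (high accuracy, low coverage), which is why the paper reports all three.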
Evaluating the hypotheses in the face of multiple metrics can be a challenge. The authors considered it important to weigh both statistical and decision-support accuracy when evaluating different recommender systems. When several agents, for example, provide different but incomparable trade-offs between the two metrics, they consider each one a possible “best agent” and compare each of them against the alternative recommender. They consider one alternative to dominate another, however, if there is a significant improvement in one metric and no significant difference in the other.
Results, Discussion and Conclusion
The most important results they found were the value of combining agents with CF and of combining agents and users with CF. They were also pleased, though somewhat surprised, to find that CF outperformed linear regression as a combining mechanism for agents. Several of the results surprised them, and they sought to explain them; RipperBot in particular impressed them with its accuracy after extensive tuning, yet dismayed them by being close to random at distinguishing good movies from bad. They remain uncertain why RipperBot performs as it does, and believe further work is needed to understand its behavior and whether it could be trained to perform differently.
In the future, they plan to examine further combinations of users and agents in recommender systems. In particular, they are interested in developing a combined community where large numbers of users and agents co-exist. One question they hope to answer is whether users who agree with each other would also benefit from the opinions of each other’s trained agents.
Apart from a few mistakes, this paper clearly demonstrates a CF framework that can be used to combine personal IF agents with the opinions of a community of users. RSs have matured considerably since then. Around 1999, many research groups were trying to find good ways to improve RS performance, and this paper's idea of combining IF and CF was significant in those days, even though it looks commonplace now.
In this framework, the authors did not consider information retrieval, because it captures no information about user preferences other than the specific query. That is reasonable for this paper, since its framework suits such situations. However, I think information retrieval could actually help improve recommender systems: a specific query is an explicit request from the user, which can be used to adjust the results the RS produces. Besides, it is hard for users to express specific requests to an RS even when they have a clear need. Therefore, I think the combination of search engines and RSs is likely to become mainstream in the future.
As for metrics, nowadays we also have diversity, novelty, degree of surprise, and scrutability as metrics for RSs; we might make new discoveries by re-evaluating this experiment with these newer metrics.
- Adomavicius, Gediminas, and Alexander Tuzhilin. “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions.” IEEE Transactions on Knowledge and Data Engineering 17.6 (2005): 734-749.
- Ricci, Francesco, Lior Rokach, and Bracha Shapira. Introduction to Recommender Systems Handbook. Springer US, 2011.
- Martin, Francisco J., et al. “The big promise of recommender systems.” AI Magazine 32.3 (2011): 19-27.
- Resnick, Paul, and Hal R. Varian. “Recommender systems.” Communications of the ACM 40.3 (1997): 56-58.
- Burke, Robin, Alexander Felfernig, and Mehmet H. Göker. “Recommender systems: An overview.” AI Magazine 32.3 (2011): 13-18.
- Bennett, James, and Stan Lanning. “The Netflix prize.” Proceedings of KDD Cup and Workshop. Vol. 2007. 2007.
- Pazzani, Michael J. “A framework for collaborative, content-based and demographic filtering.” Artificial Intelligence Review 13.5-6 (1999): 393-408.
- Claypool, Mark, et al. “Combining content-based and collaborative filters in an online newspaper.” Proceedings of ACM SIGIR workshop on recommender systems. Vol. 60. 1999.
- Balabanović, Marko, and Yoav Shoham. “Fab: content-based, collaborative recommendation.” Communications of the ACM 40.3 (1997): 66-72.
- Soboroff, Ian, and Charles Nicholas. “Combining content and collaboration in text filtering.” Proceedings of the IJCAI. Vol. 99. 1999.
- Basu, Chumki, Haym Hirsh, and William Cohen. “Recommendation as classification: Using social and content-based information in recommendation.” Aaai/iaai. 1998.
- Popescul, Alexandrin, David M. Pennock, and Steve Lawrence. “Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments.” Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 2001.
- Negroponte, N. “The Architectural Machine: Toward a More Human Environment.” (1970): 5.
- Oscar Celma and Perfecto Herrera. A new approach to evaluating novel recommendations. In Proceedings of the 2008 ACM conference on Recommender systems, RecSys ’08, pages 179–186, New York, NY, USA, 2008. ACM.
- Mouzhi Ge, Carla Delgado-Battenfeld, and Dietmar Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In Proceedings of the fourth ACM conference on Recommender systems, Rec-Sys ’10, pages 257–260, New York, NY, USA, 2010. ACM.
- Burke, Robin, and Maryam Ramezani. “Matching recommendation technologies and domains.” Recommender systems handbook. Springer US, 2011. 367-386.
Analyst: Duke Lee | Localized by Synced Global Team : Xiang Chen