The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) kicked off today as a virtual conference. The organizing committee announced the Best Paper Awards and Runners Up during this morning’s opening ceremony. Three papers received Best Paper Awards and three were recognized as Runners Up.
The total of 9,034 submissions to AAAI 2021 marked another record high, surpassing last year’s 8,800. Submissions from China (3,319) were almost double those from the United States (1,822). Of the 7,911 papers that went to review, 1,692 were accepted. This year’s acceptance rate was 21 percent, slightly higher than last year’s 20.6 percent.
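As a quick check on the figures above, the short sketch below recomputes the acceptance rate and the China-to-US submission ratio from the reported counts. The variable names are illustrative only, and the rate is assumed to be accepted papers divided by papers that went to review, which is consistent with the 21 percent quoted.

```python
# Counts as reported in the article.
reviewed = 7911   # papers that went to review
accepted = 1692   # papers accepted
china = 3319      # submissions from China
usa = 1822        # submissions from the United States

# Assumption: acceptance rate = accepted / reviewed.
acceptance_rate = accepted / reviewed
china_to_usa = china / usa

print(f"Acceptance rate: {acceptance_rate:.1%}")
print(f"China-to-US submission ratio: {china_to_usa:.2f}")
```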
Best Paper Awards
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Institution(s): Beihang University, UC Berkeley, Rutgers University, Beijing Guowang Fuda Science & Technology Development Company
Authors: Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang
Abstract: Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, such as quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a ProbSparse Self-attention mechanism, which achieves O(L log L) in time complexity and memory usage, and has comparable performance on sequences’ dependency alignment. (ii) the self-attention distilling highlights dominating attention by halving cascading layer input, and efficiently handles extreme long input sequences. (iii) the generative style decoder, while conceptually simple, predicts the long time-series sequences at one forward operation rather than a step-by-step way, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
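To give a rough sense of what moving from quadratic to O(L log L) cost means for long inputs, the sketch below compares the two growth rates for a few sequence lengths. It ignores constant factors and assumes a base-2 logarithm. The function names and the sequence lengths are illustrative and are not taken from the Informer paper or its code.

```python
import math


def dense_attention_cost(seq_len: int) -> int:
    """Pairwise score count for full self-attention: L * L."""
    return seq_len * seq_len


def sparse_attention_cost(seq_len: int) -> float:
    """Growth term L * log2(L), constants ignored (an assumption)."""
    return seq_len * math.log2(seq_len)


# Print both growth terms and their ratio for some example lengths.
print(f"{'L':>6} {'L^2':>12} {'L log2 L':>12} {'ratio':>8}")
for seq_len in (128, 512, 2048, 8192):
    dense = dense_attention_cost(seq_len)
    sparse = sparse_attention_cost(seq_len)
    print(f"{seq_len:>6} {dense:>12,} {sparse:>12,.0f} {dense / sparse:>8.1f}")
```

The ratio between the two terms is L / log2(L), so the gap widens as the input sequence gets longer.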
Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory
Institution(s): Singapore University of Technology and Design
Authors: Stefanos Leonardos, Georgios Piliouras
Abstract: Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL), however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measure the effect of exploration in collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over-time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive as well as negative (and potentially unbounded) effects to system performance.
Mitigating Political Bias in Language Models Through Reinforced Calibration
Institution(s): Dartmouth College, University of Texas at Austin, Google AI
Authors: Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Lili Wang, Soroush Vosoughi
Abstract: Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real world settings. In this paper, we describe metrics for measuring political bias in GPT-2 generation and propose a reinforcement learning (RL) framework for mitigating political biases in generated text. By using rewards from word embeddings or a classifier, our RL framework guides debiased generation without having access to the training data or requiring the model to be retrained. In empirical experiments on three attributes sensitive to political bias (gender, location, and topic), our methods reduced bias according to both our metrics and human evaluation, while maintaining readability and semantic coherence.
Best Paper Runners Up
Learning From Extreme Bandit Feedback
Institution(s): UC Berkeley, University of Texas at Austin
Authors: Romain Lopez, Inderjit Dhillon, Michael I. Jordan
Abstract: We study the problem of batch learning from bandit feedback in the setting of extremely large action spaces. Learning from extreme bandit feedback is ubiquitous in recommendation systems, in which billions of decisions are made over sets consisting of millions of choices in a single day, yielding massive observational data. In these large-scale real-world applications, supervised learning frameworks such as eXtreme Multi-label Classification (XMC) are widely used despite the fact that they incur significant biases due to the mismatch between bandit feedback and supervised labels. Such biases can be mitigated by importance sampling techniques, but these techniques suffer from impractical variance when dealing with a large number of actions. In this paper, we introduce a selective importance sampling estimator (sIS) that operates in a significantly more favorable bias-variance regime. The sIS estimator is obtained by performing importance sampling on the conditional expectation of the reward with respect to a small subset of actions for each instance (a form of Rao-Blackwellization). We employ this estimator in a novel algorithmic procedure, named Policy Optimization for eXtreme Models (POXM), for learning from bandit feedback on XMC tasks. In POXM, the selected actions for the sIS estimator are the top-p actions of the logging policy, where p is adjusted from the data and is significantly smaller than the size of the action space. We use a supervised-to-bandit conversion on three XMC datasets to benchmark our POXM method against three competing methods: BanditNet, a previously applied partial matching pruning strategy, and a supervised learning baseline. Whereas BanditNet sometimes improves marginally over the logging policy, our experiments show that POXM systematically and significantly improves over all baselines.
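For readers unfamiliar with the importance sampling techniques the abstract refers to, the sketch below shows the plain inverse-propensity-scoring (IPS) estimator on a toy log. This is the standard baseline idea, not the paper's sIS estimator or POXM. The log format, the function names and the three-action toy data are all assumptions made for illustration.

```python
def ips_estimate(logs, target_prob):
    """Estimate a target policy's average reward from logged bandit feedback.

    Each log entry is (context, action, reward, logging_prob), where
    logging_prob is the probability the logging policy gave to the action.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to choose the logged action.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def always_zero(context, action):
    """Hypothetical target policy that always picks action 0."""
    return 1.0 if action == 0 else 0.0


# Toy log: a uniform logging policy over 3 actions; only action 0 pays off.
toy_logs = [
    (None, 0, 1.0, 1 / 3),
    (None, 1, 0.0, 1 / 3),
    (None, 2, 0.0, 1 / 3),
]

print(ips_estimate(toy_logs, always_zero))
```

Each weight is divided by the logging probability, so when the action space holds millions of choices the individual probabilities become tiny and the weights can become very large. That is the variance problem the abstract describes.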
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
Institution(s): Beihang University, Microsoft Research
Authors: Yaru Hao, Li Dong, Furu Wei, Ke Xu
Abstract: The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input. Prior work strives to attribute model decisions to individual input features with different saliency measures, but they fail to explain how these input features interact with each other to reach predictions. In this paper, we propose a self-attention attribution algorithm to interpret the information interactions inside Transformer. We take BERT as an example to conduct extensive studies. Firstly, we extract the most salient dependencies in each layer to construct an attribution graph, which reveals the hierarchical interactions inside Transformer. Furthermore, we apply self-attention attribution to identify the important attention heads, while others can be pruned with only marginal performance degradation. Finally, we show that the attribution results can be used as adversarial patterns to implement non-targeted attacks towards BERT.
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
Institution(s): Harvard University, Carnegie Mellon University
Authors: Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, Milind Tambe
Abstract: Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
AAAI 2021 runs virtually through February 9. AAAI 2022 is scheduled to take place in Vancouver, Canada.
Journalist: Fangyu Cai | Editor: Michael Sarazen