A limit order book (LOB) comprises the outstanding instructions to buy or sell a given security at a specific price or better. The introduction of AI-powered trading systems has significantly impacted limit order book markets in recent years. While studies have shown that LOB prices can be predictable over short time horizons, crafting trading strategies that translate this predictability into trading profits remains challenging.
In the new paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets, an Oxford University research team proposes Deep Double Duelling Q-Learning with the APEX (asynchronous prioritized experience replay) architecture. The novel approach uses deep reinforcement learning (RL) to train a trading agent that translates predictive signals into optimal limit order trading strategies. Given the same noisy signal of short-term forward mid-quote returns, the proposed agent outperforms benchmark trading strategies.
The team’s main contributions can be summarized as follows:
- By defining a novel action and state space in a LOB trading environment, we allow for the placement of limit orders at different prices.
- In addition to the timing and level placement of limit orders, our RL agent also learns to use limit orders of single units of stock to manage its inventory as it holds variably sized long or short positions over time.
- More broadly, we demonstrate the practical use case of RL to translate predictive signals into limit order trading strategies, which is still usually a hand-crafted component of a trading system.
- To the best of our knowledge, this is also the first study applying the APEX algorithm to limit order book environments.
The researchers model the trading problem as a Markov Decision Process (MDP). Observing the current environment state, the trading agent takes actions that cause the environment to transition to a new state according to a stochastic transition function, and seeks to maximize the reward it receives after each transition.
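The MDP interaction described above follows the standard RL loop of observing a state, acting, and collecting a reward. The sketch below illustrates that loop with a toy environment; the environment, its signal, and all names are hypothetical stand-ins, not the paper's actual simulator:

```python
import random


class ToyLOBEnv:
    """Hypothetical toy stand-in for a LOB trading environment (illustrative only)."""

    def __init__(self, seed=0, horizon=5):
        self.rng = random.Random(seed)
        self.t = 0
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        # Toy "directional signal": probabilities of (down, stable, up) mid-quote moves.
        p_up = self.rng.random()
        return (1.0 - p_up, 0.0, p_up)

    def step(self, action):
        # action: -1 = sell one share, 0 = do nothing, +1 = buy one share.
        self.t += 1
        reward = action * self.rng.uniform(-1.0, 1.0)  # toy stochastic P&L
        done = self.t >= self.horizon
        return self._state(), reward, done


def run_episode(env, policy):
    """Roll out one episode: observe state, act, accumulate reward until done."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total
```

A policy here is any function mapping a state to an action, e.g. `lambda s: 1 if s[2] > 0.5 else -1` to trade in the direction the toy signal favours.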
The team first builds a limit order book environment based on the ABIDES market simulator, exposed as an OpenAI Gym environment, where they simulate a realistic trading environment for NASDAQ equities using historical order book messages. They then employ Deep Double Q-learning with a duelling network architecture to approximate the optimal Q-function, using the APEX training architecture to speed up the learning process.
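The two value-function ideas combined here can be shown in a few lines. The sketch below, which uses plain Python lists in place of neural-network outputs, illustrates how a duelling architecture assembles Q-values from a state value plus mean-centred action advantages, and how Double Q-learning forms its bootstrap target by choosing the next action with the online network but evaluating it with the target network:

```python
def dueling_q_values(state_value, advantages):
    """Duelling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double Q-learning target: argmax from the online net, value from the target net."""
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[a_star]
```

Mean-centring the advantages makes the V/A decomposition identifiable, while decoupling action selection from evaluation mitigates the overestimation bias of vanilla Q-learning.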
In this setup, the agent receives artificial directional price signals modelled as a discrete probability distribution over whether the averaged mid-quote price will decrease, remain stable, or increase over a fixed future time horizon. At each time step, the agent receives a new state observation together with a history of previous values, then chooses the action that maximizes its expected reward: placing a buy or sell limit order for a single share at the bid or mid-quote, or doing nothing.
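One plausible enumeration of this discrete action space (sides crossed with price levels, plus a no-op) is sketched below; the paper's exact encoding may differ, and all names are illustrative:

```python
from itertools import product

# Hypothetical encoding of the discrete action space described above:
# a single-share limit order on either side, at either of two price levels,
# plus an explicit "do nothing" action.
SIDES = ("buy", "sell")
LEVELS = ("best_quote", "mid_quote")
ACTIONS = [f"{side}@{level}" for side, level in product(SIDES, LEVELS)] + ["do_nothing"]


def greedy_action(q_values):
    """Return the index of the action with the highest estimated Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

With five Q-value estimates per state, the greedy policy simply picks the index of the largest one; exploration (e.g. epsilon-greedy) would be layered on top during training.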
The team compared the proposed Deep Double Duelling Q-learning agent with a baseline trading algorithm on Apple (AAPL) limit order book data. Given access to the same artificially perturbed high-frequency signal of future mid-prices, the proposed agent is able to optimize the trading strategy and increase Sharpe ratios significantly, outperforming the baseline strategy at all levels of noise.
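The Sharpe ratio used for this comparison is a standard risk-adjusted performance measure: mean excess return divided by the standard deviation of returns. A minimal sketch (annualization omitted; function name and defaults are my own, not the paper's):

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Sharpe ratio: mean excess return divided by the sample std of excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)
```

A higher Sharpe ratio indicates more return earned per unit of volatility, which is why it is a natural yardstick for comparing trading strategies of different riskiness.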
The results confirm Deep Double Duelling Q-learning with asynchronous experience replay as a state-of-the-art reinforcement learning algorithm for translating high-frequency trading signals into effective trading strategies. The team hopes their work will motivate further research in this area, for example, in enlarging the action space.
The paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets is on arXiv.
Author: Hecate He | Editor: Michael Sarazen