
Oxford U’s Deep Double Duelling Q-Learning Translates Trading Signals Into SOTA Trading Strategies

In the new paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets, an Oxford University research team introduces Deep Double Duelling Q-learning with the APEX architecture to train a trading agent that translates predictive signals into optimal limit order trading strategies.

Limit order books (LOBs) traditionally comprise instructions to buy or sell a given security at a specific price or better. The introduction of AI-powered trading systems has significantly impacted limit order book markets in recent years. While studies have shown that LOB prices can be predictable over short time periods, quickly crafting an optimal trading strategy that translates this predictability into trading profits remains challenging.

The Oxford University team's approach combines Deep Double Duelling Q-learning with the APEX (asynchronous prioritized experience replay) architecture. It uses deep reinforcement learning (RL) to train a trading agent that translates predictive signals into optimal limit order trading strategies. Given the same noisy signal of short-term forward mid-quote returns, the Deep Double Q-learning agent outperforms benchmark trading strategies.
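To give a feel for what "prioritized" means in experience replay, here is a minimal sketch of proportional prioritized sampling, in which transitions with larger priorities (typically larger temporal-difference errors) are replayed more often. This is a generic illustration, not the paper's implementation: the function names, the exponent `alpha=0.6`, and the fixed random seed are assumptions made for the example.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha.

    alpha = 0 gives uniform sampling; larger alpha favours high-priority
    transitions more strongly. The default of 0.6 is an illustrative
    assumption, not a value taken from the paper.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Draw replay-buffer indices (with replacement) according to priority."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)
```

In a distributed setup of this kind, many actor processes would fill a shared buffer while a single learner samples from it; the sketch covers only the sampling rule.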

The team’s main contributions can be summarized as follows:

  1. By defining a novel action and state space in a LOB trading environment, we allow for the placement of limit orders at different prices.
  2. In addition to the timing and level placement of limit orders, our RL agent also learns to use limit orders of single units of stock to manage its inventory as it holds variably sized long or short positions over time.
  3. More broadly, we demonstrate the practical use case of RL to translate predictive signals into limit order trading strategies, which is still usually a hand-crafted component of a trading system.
  4. To the best of our knowledge, this is also the first study applying the APEX algorithm to limit order book environments.

The researchers model the trading problem as a Markov Decision Process (MDP). Observing the current environment state, the trading agent takes actions that transition the environment to a new state according to a stochastic transition function, and it seeks to maximize the cumulative reward it receives over these transitions.
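In a standard MDP formulation, the quantity the agent maximizes is the discounted sum of rewards collected along a trajectory. The short sketch below computes that sum for a list of per-step rewards; the function name and the discount factor `gamma=0.99` are assumptions chosen for illustration, not values reported in the paper.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for one episode.

    Iterating backwards lets each step reuse the return of the step after it.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three rewards of 1 with `gamma=0.5` give 1 + 0.5 + 0.25 = 1.75.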

The team first builds a limit order book environment based on the ABIDES market simulator in the OpenAI Gym, where they simulate a realistic trading environment for NASDAQ equities using historical order book messages. They then employ Deep Double Q-learning with a duelling network architecture to approximate the optimal Q-function, using the APEX training architecture to speed up the learning process.
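The two ingredients named above can be sketched in a few lines. A duelling head combines a scalar state value with per-action advantages, and the double Q-learning target lets the online network choose the next action while the target network evaluates it. The code below uses plain Python lists in place of neural network outputs; all function and parameter names are hypothetical, and it follows the textbook form of these techniques rather than the authors' code.

```python
def duelling_q_values(state_value, advantages):
    """Duelling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_q_target(reward, gamma, done, next_q_online, next_q_target):
    """Double Q-learning target for a single transition.

    The online network selects the greedy next action; the target network
    supplies the value of that action. Terminal transitions do not bootstrap.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from action evaluation in this way is the usual motivation for double Q-learning, since it reduces the overestimation that arises when one network does both.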

In this setup, the agent models the received artificial directional price signals as a discrete probability distribution over three outcomes: the averaged mid-quote price decreasing, remaining stable, or increasing over a fixed future time horizon. At each time step, the agent receives a new state observation and a history of the previous values, then chooses the action (place a buy or sell limit order of a single share at bid, mid-quote, or do nothing) that maximizes its reward.
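A rough sketch of how such a signal and action set might be represented is shown below. It is an illustration only: the dictionary keys, the function names, and the default price levels (taken from the levels named in the summary above) are assumptions, and the levels are left as a parameter so the exact set defined in the paper can be substituted.

```python
from itertools import product

SKIP = ("skip", None)  # hypothetical label for the "do nothing" action


def build_action_space(sides=("buy", "sell"), levels=("bid", "mid")):
    """Enumerate single-share limit order actions plus a do-nothing action."""
    return [SKIP] + list(product(sides, levels))


def is_valid_signal(signal, tol=1e-9):
    """Check that a three-way directional signal is a probability distribution."""
    probs = [signal["down"], signal["flat"], signal["up"]]
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) <= tol


def greedy_action(q_values, actions):
    """Pick the action with the highest estimated Q-value."""
    best = max(range(len(actions)), key=lambda i: q_values[i])
    return actions[best]
```

With the default arguments, `build_action_space()` yields five actions: the skip action followed by the four side and level combinations.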

The team compared the proposed Deep Double Duelling Q-learning agent with a baseline trading algorithm on Apple (AAPL) limit order book data. Given access to the same artificially perturbed high-frequency signal of future mid-prices, the proposed agent is able to optimize the trading strategy and increase Sharpe ratios significantly, outperforming the baseline strategy at all levels of noise.
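For readers unfamiliar with the metric, the Sharpe ratio is the mean excess return divided by the standard deviation of excess returns, optionally scaled by the square root of the number of periods per year. The sketch below is a generic implementation; the function name, the zero default risk-free rate, and the annualization parameter are assumptions, and the paper's exact evaluation convention may differ.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=1):
    """Mean excess return over its sample standard deviation, annualized."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)
```

A higher value means more return per unit of return volatility, which is why it is a common yardstick for comparing trading strategies driven by the same signal.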

The results confirm Deep Double Duelling Q-learning with asynchronous experience replay as a state-of-the-art reinforcement learning algorithm for translating high-frequency trading signals into effective trading strategies. The team hopes their work will motivate further research in this area, for example by enlarging the action space.

The paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets is on arXiv.


Author: Hecate He | Editor: Michael Sarazen

