Oxford U’s Deep Double Duelling Q-Learning Translates Trading Signals Into SOTA Trading Strategies

In the new paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets, an Oxford University research team introduces Deep Duelling Double Q-learning with the APEX architecture to train a trading agent to translate predictive signals into optimal limit order trading strategies.

by Synced

2023-01-25


Limit order books (LOBs) traditionally comprise instructions to buy or sell a given security at a specific price or better. The introduction of AI-powered trading systems has significantly impacted limit order book markets in recent years. While studies have shown that LOB prices can be predictable over short time periods, crafting an optimal trading strategy in a short time to translate this predictability into trading profits remains challenging.

In the new paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets, an Oxford University research team proposes Deep Duelling Double Q-Learning with the APEX (asynchronous prioritized experience replay) architecture. The novel approach uses deep reinforcement learning (RL) to train a trading agent to translate predictive signals into optimal limit order trading strategies. Given the same noisy signal of short-term forward mid-quote returns, Deep Double Q-learning outperforms benchmark trading strategies.
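To make the "prioritized experience replay" part of APEX concrete, here is a minimal sketch of proportional prioritized sampling, in which transitions with larger priorities (typically larger TD errors) are replayed more often. The function names, the `alpha` default, and the seeded generator are illustrative assumptions, not details from the paper, and the asynchronous actor/learner machinery is left out entirely.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha.

    alpha = 0 gives uniform sampling; alpha = 1 samples in direct
    proportion to priority. The default of 0.6 is an assumed value.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Draw replay-buffer indices (with replacement) according to priority."""
    rng = rng or random.Random(0)  # seeded only to keep the sketch reproducible
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)
```

With priorities `[1.0, 1.0, 2.0]` and `alpha=1.0`, the third transition is sampled half of the time and each of the other two a quarter of the time.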

The team’s main contributions can be summarized as follows:

By defining a novel action and state space in a LOB trading environment, we allow for the placement of limit orders at different prices.
In addition to the timing and level placement of limit orders, our RL agent also learns to use limit orders of single units of stock to manage its inventory as it holds variably sized long or short positions over time.
More broadly, we demonstrate the practical use case of RL to translate predictive signals into limit order trading strategies, which is still usually a hand-crafted component of a trading system.
To the best of our knowledge, this is also the first study applying the APEX algorithm to limit order book environments.

The researchers model the trading problem as a Markov Decision Process (MDP). Observing the current environment state, the trading agent takes actions that will transition the environment state according to the stochastic transition function, and seeks to maximize the reward it receives after a transition.
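The agent-environment loop of an MDP can be sketched in a few lines. The toy environment below is purely hypothetical (it is not the ABIDES-based environment from the paper): the state is the agent's inventory plus the last price move, the transition is a random price step, and the reward is the resulting mark-to-market change.

```python
import random


class ToyTradingMDP:
    """Hypothetical toy MDP, used only to illustrate the interaction loop."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.inventory = 0
        self.last_move = 0

    def reset(self):
        self.inventory = 0
        self.last_move = 0
        return (self.inventory, self.last_move)

    def step(self, action):
        # action is -1 (sell one unit), 0 (do nothing) or +1 (buy one unit)
        self.inventory += action
        move = self.rng.choice([-1, 0, 1])  # stochastic transition
        reward = self.inventory * move      # mark-to-market change in P&L
        self.last_move = move
        return (self.inventory, self.last_move), reward


def run_episode(env, policy, steps=100):
    """Observe the state, act, receive a reward, and repeat."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = policy(state)
        state, reward = env.step(action)
        total_reward += reward
    return total_reward
```

A policy here is any function from state to action; for example, `run_episode(ToyTradingMDP(), lambda state: 0)` never trades and therefore collects zero reward.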

The team first builds a limit order book environment on top of the ABIDES market simulator, using an OpenAI Gym interface, in which they simulate a realistic trading environment for NASDAQ equities from historical order book messages. They then employ Deep Double Q-learning with a duelling network architecture to approximate the optimal Q-function, using the APEX training architecture to speed up the learning process.
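The two ingredients named here have compact textbook definitions, sketched below in plain Python. A duelling head combines a state value with per-action advantages, and the double Q-learning target lets the online network choose the next action while the target network evaluates it. This is a generic illustration under those standard definitions; the function names are made up for the example, and the paper's network sizes and hyperparameters are not reproduced.

```python
def duelling_q_values(state_value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_q_target(reward, gamma, next_q_online, next_q_target, done=False):
    """Double Q-learning target for a single transition.

    The online network's estimates select the greedy next action and the
    target network's estimates score it, which reduces overestimation.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

For instance, with `next_q_online = [0.1, 0.9]` the second action is selected, so the target uses the second entry of `next_q_target` even if the first entry is larger.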

In this setup, the agent models the received artificial directional price signals as a discrete probability distribution over the averaged mid-quote price either decreasing, remaining stable, or increasing over a fixed future time horizon. At each time step, the agent receives a new state observation and a history of the previous values, then chooses the action that maximizes its reward: place a buy or sell limit order of a single share at bid, mid-quote, or do nothing.
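A three-way directional signal of this kind can be pictured as a probability vector over (down, stable, up). The sketch below builds one by blending a perfect one-hot signal with a uniform distribution. This mixing scheme and the `noise` parameter are assumptions made for illustration; the article does not say how the paper perturbs its signal.

```python
def make_signal(true_direction, noise):
    """Return [p_down, p_stable, p_up] for a true direction in {-1, 0, +1}.

    noise = 0.0 yields a perfect one-hot signal and noise = 1.0 a
    completely uninformative uniform one (an assumed noise model).
    """
    if true_direction not in (-1, 0, 1):
        raise ValueError("true_direction must be -1, 0 or +1")
    if not 0.0 <= noise <= 1.0:
        raise ValueError("noise must lie in [0, 1]")
    one_hot = [1.0 if d == true_direction else 0.0 for d in (-1, 0, 1)]
    return [(1.0 - noise) * p + noise / 3.0 for p in one_hot]
```

An agent would receive such a vector as part of its state observation and must learn how much to trust it at a given noise level.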

The team compared the proposed Deep Double Duelling Q-learning agent with a baseline trading algorithm on Apple (AAPL) limit order book data. Given access to the same artificially perturbed high-frequency signal of future mid-prices, the proposed agent is able to optimize the trading strategy and increase Sharpe ratios significantly, outperforming the baseline strategy at all levels of noise.
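For readers unfamiliar with the metric, the Sharpe ratio is the mean return divided by the standard deviation of returns, optionally scaled to a longer horizon. The sketch below assumes a zero risk-free rate and uses the sample standard deviation; the article does not specify how the paper computes or annualizes the figure, so treat these choices as assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=1):
    """Mean return over sample standard deviation, with optional scaling.

    Assumes a zero risk-free rate. Pass periods_per_year > 1 to
    annualize per-period returns.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    spread = statistics.stdev(returns)
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(returns) / spread * math.sqrt(periods_per_year)
```

A strategy with the same average return but less variable returns scores higher, which is why the metric is a common yardstick for comparing trading strategies.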

The results confirm Deep Double Duelling Q-learning with asynchronous experience replay as a state-of-the-art reinforcement learning algorithm for translating high-frequency trading signals into effective trading strategies. The team hopes their work will motivate further research in this area, for example, in enlarging the action space.

The paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets is on arXiv.

Author: Hecate He | Editor: Michael Sarazen

