Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action-Spaces

Haitong Ma; Ofir Nabati; Aviv Rosenberg; Bo Dai; Oran Lang; Craig Boutilier; Na Li; Shie Mannor; Lior Shani; Guy Tennenholtz

Proceedings of the 43rd International Conference on Machine Learning (ICML-26), Seoul, South Korea (2026)

Abstract

Reinforcement learning (RL) algorithms have achieved superhuman performance
on many sequential decision-making tasks, but often struggle in domains with
large, combinatorial action spaces. To address this, we introduce a practical and
stable algorithm for training discrete diffusion models to represent policies in
such environments. We formulate a policy mirror descent algorithm that enhances
training stability by reframing policy optimization as an inference problem, which
naturally aligns with the learning objective of discrete diffusion models. Through
extensive experiments on a suite of challenging benchmark tasks, we demonstrate
that our approach achieves significant improvements over existing methods in both
performance and sample efficiency. This work opens a promising new direction
for applying discrete diffusion models in RL to tackle long-standing challenges in
large-scale combinatorial action spaces.
