Non-Exchangeable Mean Field Markov Decision Processes with common noise: from Bellman equation to quantitative propagation of chaos
Recommended citation: https://hal.science/hal-05501600v1/document
Abstract
We study infinite-horizon Markov Decision Processes (MDPs) with a continuum of heterogeneous agents interacting through a common noise, without assuming exchangeability. We introduce the framework of Conditional Non-Exchangeable Mean Field MDPs (CNEMF-MDPs) in both a strong formulation and a label–state formulation. We establish the equivalence between these two formulations by showing that the control problem can be lifted to a standard MDP defined on the Wasserstein space $\mathcal{P}_{\lambda}(I \times \mathcal{X})$, where $I$ denotes the label (heterogeneity) space, $\mathcal{X}$ is the individual state space, and $\lambda$ specifies the fixed distribution of agent labels. Within this framework, we characterize the value function as the unique fixed point of an appropriate Bellman operator acting on functions over $\mathcal{P}_{\lambda}(I \times \mathcal{X})$. Our second contribution is a quantitative analysis of the propagation of chaos for this non-exchangeable setting with common noise. We derive sharp finite-population bounds by comparing the Bellman operator of the finite $N$-agent MDP, defined on the high-dimensional space $\mathcal{X}^N$, with its infinite-agent counterpart. This comparison yields explicit constructions of near-optimal policies for the $N$-agent system from $\epsilon$-optimal policies of the limiting CNEMF-MDP.
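
As a rough illustration of the lifted problem, one may picture a Bellman fixed-point equation of the following generic form. The symbols $r$, $\gamma$, $a$, $\Phi$, and $\xi^0$ are illustrative notation, not taken from the paper, for a per-step reward, a discount factor, a (measure-dependent) control, the one-step flow of the label–state distribution, and the common noise, respectively:

$$
V(\mu) \;=\; \sup_{a} \Big\{ r(\mu, a) \;+\; \gamma \, \mathbb{E}\big[ V\big(\Phi(\mu, a, \xi^0)\big) \big] \Big\},
\qquad \mu \in \mathcal{P}_{\lambda}(I \times \mathcal{X}).
$$

Under suitable contraction assumptions, the statement in the abstract corresponds to $V$ being the unique solution of such an equation on the Wasserstein space $\mathcal{P}_{\lambda}(I \times \mathcal{X})$.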
