1 Course reminders
The following is a reminder of the main theoretical results on Deep Learning algorithms for PDEs seen during the course.
1.1 Some reminders on PDE and stochastic control
1.1.1 Stochastic control in a nutshell
Mathematical setup
- Dynamics of the controlled state process
Let \((\Omega,\mathcal{F}, \mathbb{F}=(\mathcal{F}_t)_{t \geq 0}, \mathbb{P})\) be a probability space satisfying the usual assumptions and rich enough to support, for simplicity, an \(\mathbb{R}^n\)-valued Brownian motion \(W=(W_t)_{t \geq 0}\). We consider a control model where the state of the system is governed by a stochastic differential equation (SDE) valued in \(\mathbb{R}^n\) given by
\[ \begin{align} d X_s = b(X_s,\alpha_s) d s + \sigma(X_s, \alpha_s ) d W_s, \hspace{0.5 cm} \end{align} \tag{1.1}\] starting from \(X_0= x \in \mathbb{R}^n\), where we are given two measurable maps \(b : \mathbb{R}^n \times A \to \mathbb{R}^n\) and \(\sigma : \mathbb{R}^n \times A \to \mathbb{R}^{n \times n}\), and where the control \(\alpha=(\alpha_t)_{0 \leq t \leq T}\) is a progressively measurable (with respect to \(\mathbb{F}\)) process valued in \(A \subset \mathbb{R}^m\). We suppose that the measurable maps \((b,\sigma)\) satisfy a Lipschitz condition, uniformly in \(A\): \(\exists K \geq 0\), \(\forall x,y \in \mathbb{R}^n\), \(\forall a \in A\), \[\begin{align} | b(x,a) - b(y,a) |+ |\sigma(x,a) - \sigma(y,a) | \leq K |x-y|. \end{align}\] In the sequel, for \(0 \leq t \leq T\), we denote by \(\mathcal{T}_{t,T}\) the set of stopping times valued in \([t,T]\). We fix \(T > 0\) and denote by \(\mathcal{A}\) the set of progressively measurable (with respect to \(\mathbb{F}\)) control processes \(\alpha = (\alpha_t)_{0 \leq t \leq T}\) such that \[\begin{align} \mathbb{E} \Big[ \int_{0}^{T} | b(0,\alpha_t)|^2 + |\sigma(0,\alpha_t)|^2 dt \Big] < + \infty. \end{align}\] It is known that under the previous conditions, the SDE Equation 1.1 admits a unique strong solution for any initial condition \((t,x) \in [0,T] \times \mathbb{R}^n\). Starting from \(x\) at \(s=t\), we then denote by \(\big \lbrace X_s^{t,x}, t \leq s \leq T \big \rbrace\) the solution to Equation 1.1, which admits a modification with continuous paths, up to indistinguishability. We also recall that under the standard conditions on \((b,\sigma)\) and the integrability condition on \(\alpha\), we have \[\begin{align} \mathbb{E} \Big[ \underset{t \leq s \leq T}{\text{ sup }} |X_s^{t,x}|^2 \Big] < \infty. \end{align}\] A simple simulation sketch of these dynamics is given after this list of definitions.
- Functional objective
Let \(f :[0,T] \times \mathbb{R}^n \times A \to \mathbb{R}\) and \(g : \mathbb{R}^n \to \mathbb{R}\) be two measurable functions. We suppose that \(g\) satisfies a quadratic growth condition: \(|g(x)| \leq C( 1+ |x|^2)\), \(\quad\) \(\forall x \in \mathbb{R}^n\), for some constant \(C\) independent of \(x\). For \((t,x) \in [0,T] \times \mathbb{R}^n\), we denote by \(\mathcal{A}(t,x)\) the subset of controls \(\alpha \in \mathcal{A}\) such that \[ \begin{align} \mathbb{E} \Big [ \int_{t}^{T} |f(s,X_s^{t,x}, \alpha_s)| d s \Big] < \infty, \end{align} \] and we assume that \(\mathcal{A}(t,x)\) is not empty for all \((t,x) \in [0,T] \times \mathbb{R}^n\). We can then define the gain functional \[ \begin{align} J(t,x,\alpha) := \mathbb{E} \Big[ \int_{t}^{T} f(s,X_s^{t,x},\alpha_s) d s + g(X_T^{t,x}) \Big], \end{align} \] for all \((t,x) \in [0,T] \times \mathbb{R}^n\) and \(\alpha \in \mathcal{A}(t,x)\). The objective is to maximize the gain functional \(J\) over control processes, and we introduce the value function
\[ \begin{align} v(t,x) = \underset{\alpha \in \mathcal{A}(t,x)}{\text{ sup }} J(t,x,\alpha). \end{align} \tag{1.2}\]
- Given an initial condition \((t,x) \in [0,T) \times \mathbb{R}^n\), we say that \(\hat{\alpha} \in \mathcal{A}(t,x)\) is an optimal control if \(v(t,x) = J(t,x,\hat{\alpha})\).
- A control process \(\alpha= (\alpha_s)_{t \leq s \leq T}\) in the form \(\alpha_s = \mathrm{a}(s,X_s^{t,x})\) for some measurable function \(\mathrm{a}\) from \([0,T] \times \mathbb{R}^n\) into \(A\) is called a Markovian control.
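For concreteness, the controlled dynamics Equation 1.1 can be simulated with a standard Euler-Maruyama scheme. Below is a minimal Python sketch; the specific coefficients \(b\), \(\sigma\) and the constant control used at the end are illustrative placeholders, not objects from the course.

```python
import numpy as np

def simulate_controlled_sde(x0, control_fn, b, sigma, T=1.0, n_steps=100, seed=0):
    """Euler-Maruyama scheme for dX_s = b(X_s, a_s) ds + sigma(X_s, a_s) dW_s."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.array(x0, dtype=float)
    path = [X.copy()]
    for i in range(n_steps):
        a = control_fn(i * dt, X)                  # Markovian control a(t, X_t)
        dW = rng.normal(scale=np.sqrt(dt), size=len(X))
        X = X + b(X, a) * dt + sigma(X, a) @ dW    # one Euler-Maruyama step
        path.append(X.copy())
    return np.array(path)

# Illustrative (hypothetical) coefficients: mean-reverting drift, constant diffusion.
b = lambda x, a: a - x
sigma = lambda x, a: 0.2 * np.eye(len(x))
path = simulate_controlled_sde(x0=[1.0, 0.5],
                               control_fn=lambda t, x: np.zeros_like(x),
                               b=b, sigma=sigma)
```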
1.1.2 The dynamic programming approach: HJB equation
The dynamic programming principle (DPP) is a fundamental principle in the theory of stochastic control. For controlled Markov processes, it is formulated as follows
Theorem 1.1 (Dynamic Programming Principle.) Let \((t,x) \in [0,T] \times \mathbb{R}^n\). Then, we have \[ \begin{align} v(t,x) &= \underset{\alpha \in \mathcal{A}(t,x)}{\text{ sup }} \mathbb{E} \Big[ \int_{t}^{\theta} f(s,X_s^{t,x},\alpha_s) d s + v(\theta, X_{\theta}^{t,x}) \Big] \end{align} \tag{1.3}\] for any stopping time \(\theta \in \mathcal{T}_{t,T}\).
The interpretation of the DPP is that the optimization problem can be split into two parts: an optimal control on the whole interval \([t,T]\) may be obtained by first searching for an optimal control from time \(\theta\) onwards given the state value \(X_{\theta}^{t,x}\), i.e., computing \(v(\theta,X_{\theta}^{t,x})\), and then maximizing over controls on \([t,\theta]\) the quantity
\[\mathbb{E} \Big[ \int_{t}^{\theta} f(s,X_s^{t,x},\alpha_s ) d s + v(\theta, X_{\theta}^{t,x}) \Big].\]
Proof. The proof of this result is nontrivial and relies on a measurable selection argument; since it can easily be found in the literature, we omit it here.
- Intuitive derivation of the HJB equation
The Hamilton-Jacobi-Bellman (HJB) equation is the infinitesimal version of the dynamic programming principle, i.e., it describes the local behavior of the value function when we send the stopping time \(\theta\) in Equation 1.3 in Theorem 1.1 to \(t\). The HJB equation is also called the dynamic programming equation.
Let us consider the time \(\theta = t+h\) and a constant control \(\alpha = (\alpha_s)_{t \leq s \leq T} \equiv a\) for some arbitrary \(a \in A\). Then, from the relation Equation 1.3, we have
\[ \begin{align} v(t,x) \geq \mathbb{E} \Big[ \int_{t}^{t+h} f(s,X_s^{t,x},a) d s + v(t+h,X_{t+h}^{t,x}) \Big] \end{align} \tag{1.4}\]
Under some regularity conditions on \(v\), we may apply Itô’s formula between \(t\) and \(t+h\) which gives
\[ \begin{align} v(t+h, X_{t+h}^{t,x}) = v(t,x) + \int_{t}^{t+h} \Big( \partial_t v + \mathcal{L}^a v \Big)(s, X_s^{t,x}) d s + \text{ Local martingale }, \end{align} \] where \(\mathcal{L}^a\) denotes the generator associated with the diffusion driven by the constant control \(a\), defined as \[ \begin{align} \mathcal{L}^{a}v(t,x) = b(x,a) \cdot D_x v(t,x) + \frac{1}{2} \sigma \sigma^{\top}(x,a) : D_x^2 v(t,x), \end{align} \] where \(\cdot\) denotes the scalar product between vectors of \(\mathbb{R}^n\) and \(:\) denotes the scalar product between matrices, i.e., \(A:B= \text{Tr}(A^{\top} B)\). Substituting into Equation 1.4, we get \[ \begin{align} 0 \geq \mathbb{E} \Big[\int_{t}^{t+h} \Big( \partial_t v+ \mathcal{L}^a v \Big)(s,X_s^{t,x} ) + f(s,X_s^{t,x},a) d s \Big]. \end{align} \] Dividing by \(h\) and sending \(h \to 0\), from the continuity of the map inside the expectation, we get by the mean-value theorem
\[ \begin{align} 0 \geq \partial_t v (t,x) + \mathcal{L}^a v(t,x) + f(t,x,a). \end{align} \tag{1.5}\]
Since Equation 1.5 holds for any \(a \in A\), we obtain the inequality
\[ \begin{align} 0 \geq \partial_t v (t,x) + \underset{a \in A}{\text{ sup }} \big[ \mathcal{L}^a v(t,x) + f(t,x,a) \big]. \end{align} \tag{1.6}\] On the other hand, suppose that \(\alpha^{\star}= (\alpha^{\star}_t)_{0 \leq t \leq T}\) is an optimal control. Then in Equation 1.3, we have
\[ \begin{align} v(t,x) = \mathbb{E} \Big[ \int_{t}^{t+h} f(s,X_s^{\star}, \alpha_s^{\star}) d s + v(t+h, X_{t+h}^{\star}) \Big], \end{align} \] where \(X^{\star}\) is the state process solution to Equation 1.1 starting from \(x\) at \(t\), with the control \(\alpha^{\star}\). Using similar arguments as above, we end up with
\[ \begin{align} \partial_t v(t,x) + \mathcal{L}^{\alpha^{\star}_t} v(t,x) + f(t,x, \alpha_t^{\star}) =0, \end{align} \] which combined with Equation 1.6, suggests that \(v\) should satisfy
\[ \begin{align} \partial_t v(t,x) + \underset{a \in A}{\text{ sup }} \big[ \mathcal{L}^{a} v(t,x) + f(t,x,a) \big ] =0 , \quad \forall (t,x) \in [0,T) \times \mathbb{R}^n, \end{align} \tag{1.7}\] provided the above supremum is finite. The PDE Equation 1.7 is often rewritten in the form
\[ \begin{align} \partial_t v(t,x) + H \big(t,x, D_x v(t,x) , D^2_x v(t,x) \big) = 0, \quad \forall (t,x) \in [0,T) \times \mathbb{R}^n, \end{align} \tag{1.8}\] where for \((t,x,p,M) \in [0,T] \times \mathbb{R}^n \times \mathbb{R}^n \times \mathcal{S}_n\),1 we denote
\[ \begin{align} H(t,x,p,M) = \underset{a \in A}{\text{ sup }} \big[ b(x,a) \cdot p + \frac{1}{2} \sigma \sigma^{\top}(x,a) : M + f(t,x,a) \big]. \end{align} \] The map \(H\) is called the Hamiltonian of the associated control problem, and Equation 1.8 is called the dynamic programming equation or Hamilton-Jacobi-Bellman (HJB) equation. The terminal condition for this PDE is given by
\[ \begin{align} v(T,x) = g(x), \quad \forall x \in \mathbb{R}^n, \end{align} \] which is an immediate consequence of the definition of the value function.
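As a simple illustration of the Hamiltonian (a standard example, not specific to the course), take \(A = \mathbb{R}^n\), \(b(x,a) = a\), \(\sigma\) constant and \(f(t,x,a) = -\frac{1}{2}|a|^2\). The supremum is then attained at \(a = p\) and the Hamiltonian is explicit: \[ \begin{align} H(t,x,p,M) = \underset{a \in \mathbb{R}^n}{\text{ sup }} \Big[ a \cdot p - \frac{1}{2}|a|^2 + \frac{1}{2} \sigma \sigma^{\top} : M \Big] = \frac{1}{2}|p|^2 + \frac{1}{2} \sigma \sigma^{\top} : M, \end{align} \] so that the HJB equation reads \(\partial_t v + \frac{1}{2}|D_x v(t,x)|^2 + \frac{1}{2} \sigma \sigma^{\top} : D^2_x v(t,x) = 0\), with candidate optimal feedback \(\alpha^{\star}(t,x) = D_x v(t,x)\).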
Some remarks:
- When the control space \(A\) is a singleton, i.e., \(A= \lbrace a_0 \rbrace\), the HJB equation is reduced to the Cauchy problem
\[ \begin{align} \begin{cases} \partial_t v(t,x) + \mathcal{L}^{a_0} v(t,x) + f(t,x,a_0) &= 0, \quad \forall (t,x) \in [0,T) \times \mathbb{R}^n. \\ v(T,x) &= g(x), \quad \forall x \in \mathbb{R}^n. \end{cases} \end{align} \]
- The optimality argument of the DPP suggests that if one can find a measurable map \([0,T] \times \mathbb{R}^n \ni (t,x) \mapsto \alpha^{\star}(t,x) \in A\) such that
\[ \begin{align} \underset{a \in A}{\text{ sup }} \big[ \mathcal{L}^a v(t,x) + f(t,x,a) \big] = \mathcal{L}^{\alpha^{\star}(t,x)} v(t,x) + f(t,x, \alpha^{\star}(t,x)), \end{align} \] then we would get \[ \begin{align} \begin{cases} \partial_t v(t,x) + \mathcal{L}^{\alpha^{\star}(t,x)} v(t,x) + f(t,x, \alpha^{\star}(t,x)) &=0, \quad \forall (t,x) \in [0,T) \times \mathbb{R}^n \\ v(T,x) &= g(x), \quad x \in \mathbb{R}^n \end{cases} \end{align} \] and so, by the Feynman-Kac formula,
\[ \begin{align} v(t,x) = \mathbb{E} \Big[ \int_{t}^{T} f(s,X_s^{\star}, \alpha^{\star}(s,X_s^{\star})) d s + g(X_T^{\star}) \Big], \end{align} \] where \(X^{\star} = (X^{\star}_s)_{t \leq s \leq T}\) is the solution to the SDE \[\begin{align} \begin{cases} dX_s^{\star} &= b(X_s^{\star}, \alpha^{\star}(s,X_s^{\star})) d s + \sigma(X_s^{\star}, \alpha^{\star}(s,X_s^{\star})) d W_s , \quad t \leq s \leq T, \\ X_t^{\star} &= x. \end{cases} \end{align}\] This shows that the process \((\alpha^{\star}_s)_{t \leq s \leq T}= \big(\alpha^{\star}(s,X_s^{\star} )\big)_{t \leq s \leq T}\) is an optimal Markovian control, i.e., given by a measurable feedback map \(\alpha^{\star}\). 2
The crucial step in the classical approach to dynamic programming consists in proving that, given a smooth solution to the HJB equation, this candidate coincides with the value function. This result is called a verification theorem. We formulate a general version of it.
Theorem 1.2 (Verification theorem.) Let \(w\) be a function in \(\mathcal{C}^{1,2}([0,T) \times \mathbb{R}^n) \cap \mathcal{C}^0([0,T] \times \mathbb{R}^n)\), satisfying a quadratic growth condition, i.e., there exists a constant \(C \geq 0\) such that
\[ \begin{align} |w(t,x)| \leq C \big( 1+ |x|^2 \big), \quad \forall (t,x) \in [0,T] \times \mathbb{R}^n. \end{align} \]
- Suppose that
\[ \begin{align} \begin{cases} \partial_t w + \underset{a \in A}{\text{ sup }} \big[ \mathcal{L}^a w(t,x) + f(t,x,a) \big] &\leq 0, \quad (t,x) \in [0,T) \times \mathbb{R}^n, \\ w(T,x) &\geq g(x), \quad x \in \mathbb{R}^n. \end{cases} \end{align} \] Then \(w \geq v\) on \([0,T] \times \mathbb{R}^n\).
- Suppose further that \(w(T,\cdot)=g\) and that there exists a measurable map \([0,T] \times \mathbb{R}^n \ni (t,x) \mapsto \hat{\alpha}(t,x) \in A\) such that \[ \begin{align} \partial_t w(t,x) + \underset{a \in A}{\text{ sup }} \big[ \mathcal{L}^a w(t,x) + f(t,x,a) \big] = \partial_t w(t,x) + \mathcal{L}^{\hat{\alpha}(t,x)} w(t,x) + f(t,x,\hat{\alpha}(t,x)), \end{align} \] and the SDE given by \[ d X_s = b(X_s, \hat{\alpha}(s,X_s)) d s + \sigma(X_s, \hat{\alpha}(s,X_s)) d W_s, \] admits a unique solution, denoted by \(\hat{X}_s^{t,x}\), with the process \(\big(\hat{\alpha}(s,\hat{X}_s^{t,x}) \big)_{t \leq s \leq T} \in \mathcal{A}(t,x)\). Then, we have \(w=v\) on \([0,T] \times \mathbb{R}^n\) and \(\hat{\alpha}\) is an optimal Markovian control.3
Proof. The proof of this result follows essentially from Itô’s formula, after introducing the sequence of stopping times \(\tau_n = \text{inf} \big\lbrace s \geq t : \int_{t}^s |D_x w(l, X_l^{t,x})^{\top} \sigma(X_l^{t,x}, \alpha_l)|^2 d l \geq n \big \rbrace\), noticing that \(\tau_n \nearrow \infty\) as \(n\) tends to \(\infty\), and applying Itô’s formula to \(w\) between \(t\) and \(s \wedge \tau_n\).
1.1.3 The BSDE approach: Pontryagin’s formulation
We denote by \(\mathbb{S}^2([0,T];\mathbb{R})\) the set of \(\mathbb{R}\)-valued progressively measurable processes \(Y=(Y_t)_{0 \leq t \leq T}\) such that \(\mathbb{E} \Big[ \underset{0 \leq t \leq T}{\text{ sup }} |Y_t|^2 \Big] < \infty\) and by \(\mathbb{H}^2([0,T];\mathbb{R}^n)\) the set of \(\mathbb{R}^n\)-valued progressively measurable processes \(Z=(Z_t)_{0 \leq t \leq T}\) such that \(\mathbb{E} \Big[ \int_{0}^{T} |Z_t|^2 d t \Big] < \infty\). We are given a pair \((\xi, f)\), called terminal condition and generator (or driver), satisfying:
- \((A)\) \(\xi \in L^2(\Omega,\mathcal{F}_T, \mathbb{P}; \mathbb{R})\)
- \((B)\) \(f : \Omega \times [0,T] \times \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}\) such that
- \(f(\cdot,t,y,z)\) (written for simplicity \(f(t,y,z)\)) is progressively measurable for all \((y,z)\)
- \(f(t,0,0) \in \mathbb{H}^2([0,T];\mathbb{R})\).
- \(f\) satisfies a uniform Lipschitz condition in \((y,z)\), i.e., there exists a constant \(C_f\) such that \[ \begin{align} | f(t,y_1,z_1) - f(t,y_2,z_2)| \leq C_f \big ( |y_1 - y_2| + |z_1 - z_2| \big), \quad \forall y_1,y_2, \forall z_1, z_2, \quad d t \otimes d \mathbb{P} \text{ a.e.} \end{align} \] We now consider the (one-dimensional) backward stochastic differential equation (BSDE) \[ \begin{align} \begin{cases} dY_t &= - f(t,Y_t,Z_t) d t + Z_t d W_t ,\\ Y_T &= \xi \end{cases} \end{align} \tag{1.9}\]
Definition 1.1 A solution to the BSDE Equation 1.9 is a pair of processes \((Y,Z) \in \mathbb{S}^2([0,T];\mathbb{R}) \times \mathbb{H}^2([0,T]; \mathbb{R}^n)\) satisfying \[ \begin{align} Y_t = \xi + \int_{t}^{T} f(s,Y_s,Z_s) ds - \int_{t}^{T} Z_s d W_s, \quad 0 \leq t \leq T, \quad \mathbb{P - }\text{ a.s}. \notag \end{align} \]
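A useful sanity check (standard, and consistent with the construction in the proof of Theorem 1.3 below): when \(f \equiv 0\), taking conditional expectations in the above relation yields \[ \begin{align} Y_t = \mathbb{E} \big[ \xi \,|\, \mathcal{F}_t \big], \quad 0 \leq t \leq T, \end{align} \] and \(Z\) is then the integrand provided by the martingale representation theorem applied to the martingale \(\big(\mathbb{E}[\xi|\mathcal{F}_t]\big)_{0 \leq t \leq T}\).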
We now state an important result ensuring existence and uniqueness for the BSDE Equation 1.9.
Theorem 1.3 Given a pair \((\xi,f)\) satisfying Assumptions \((A)\) and \((B)\), there exists a unique solution \((Y,Z) \in \mathbb{S}^2([0,T];\mathbb{R}) \times \mathbb{H}^2([0,T];\mathbb{R}^n)\) to the BSDE Equation 1.9 in the sense of Definition 1.1.
Proof. The proof of Theorem 1.3 is based on a fixed point argument on the Banach space \(\mathbb{S}^2([0,T];\mathbb{R}) \times \mathbb{H}^2([0,T];\mathbb{R}^n)\) endowed with the norm \[ \begin{align} \lVert (Y,Z) \rVert_{\beta} = \bigg(\mathbb{E} \Big[ \int_{0}^{T} e^{\beta s} \big( |Y_s|^2 + |Z_s|^2 \big) d s \Big] \bigg)^{\frac{1}{2}}. \end{align} \] We first check that the mapping \(\Phi\) defined below maps \(\mathbb{S}^2([0,T];\mathbb{R}) \times \mathbb{H}^2([0,T];\mathbb{R}^n)\) into itself, where \((Y,Z) = \Phi(U,V)\) is constructed as follows. Define the martingale \(M=(M_t)_{0 \leq t \leq T}\) by \(M_t = \mathbb{E} \Big[ \xi + \int_{0}^{T} f(s,U_s,V_s) d s \Big| \mathcal{F}_t \Big]\); from the martingale representation theorem, we get the existence of a process \(Z=(Z_t)_{0 \leq t \leq T} \in \mathbb{H}^2([0,T];\mathbb{R}^n)\) such that \[ \begin{align} M_t = M_0 + \int_{0}^{t} Z_s d W_s. \end{align} \] We then define the process \(Y=(Y_t)_{0 \leq t \leq T}\) as \[ \begin{align} Y_t = M_t - \int_{0}^{t} f(s,U_s, V_s) d s . \end{align} \] It is easy to see that \(Y_T = \xi\) by construction and that this candidate \(Y\) satisfies the BSDE with frozen driver \(f(\cdot,U,V)\). One can check, under the assumptions on \(f\) and \(\xi\), that \[ \begin{align} \mathbb{E} \Big[ \underset{0 \leq t \leq T}{\text{ sup }} \Big|\int_{t}^{T} Z_s d W_s \Big|^2 \Big] \leq 4 \mathbb{E}\Big[ \int_{0}^{T} |Z_s|^2 d s \Big] < + \infty, \end{align} \] where the inequality follows from Doob’s inequality. Therefore, \(\Phi\) is a well-defined map. Finally, one can check that for \(\beta\) large enough, \(\Phi\) is a contraction, and the Banach fixed point theorem then ensures that there exists a unique solution to Equation 1.9.
BSDE, PDE and nonlinear Feynman-Kac formula
In this section, we study an extension of the Feynman-Kac formula for semi-linear PDEs of the form \[ \begin{align} \begin{cases} \partial_t v(t,x) + \mathcal{L}v(t,x) + f(t,x,v(t,x), \sigma(t,x)^{\top} D_x v(t,x)) = 0, \quad (t,x) \in [0,T) \times \mathbb{R}^n, \\ v(T,x) = g(x). \end{cases} \end{align} \tag{1.10}\] In fact, we will represent the solution of the PDE Equation 1.10 through a suitable BSDE, given below:
\[ \begin{align} \begin{cases} d Y_s &= - f(s,X_s,Y_s,Z_s) d s + Z_s d W_s , \\ Y_T &= g(X_T), \end{cases} \end{align} \tag{1.11}\] where we recall that the forward \(\mathbb{R}^n\)-valued SDE process \(X=(X_t)_{0 \leq t \leq T}\) is given by \[ \begin{align} d X_s = b(s,X_s) d s + \sigma(s,X_s) d W_s. \end{align} \] Proposition (Link between \(v\) and \((Y,Z)\)):
Let \(v \in \mathcal{C}^{1,2}([0,T) \times \mathbb{R}^n) \cap C^0([0,T] \times \mathbb{R}^n)\) be a solution to the PDE Equation 1.10, satisfying a linear growth condition and such that there exist positive constants \(C,q\) with \(| D_x v(t,x)| \leq C( 1+ |x|^q)\) for any \(x \in \mathbb{R}^n\). Then the pair of processes \((Y,Z)\) defined by
\[ \begin{align} \begin{cases} Y_t &= v(t,X_t), \\ Z_t &= \sigma^{\top}(t,X_t) D_x v(t,X_t), \quad 0 \leq t \leq T \end{cases} \end{align} \tag{1.12}\]
is the solution to the BSDE Equation 1.11.
Proof. This result is an immediate application of Itô’s formula applied to the process \(\big(v(t,X_t)\big)_{0 \leq t \leq T}\), together with the fact that \(v\) is a solution to the PDE Equation 1.10. Indeed, applying Itô’s formula between \(t\) and \(T\), we have \[ \begin{align} v(t,X_t)= v(T,X_T) - \int_{t}^{T} \Big( \partial_t v(s,X_s) + \mathcal{L}v(s,X_s) \Big) d s - \int_{t}^{T} \big(\sigma^{\top}(s,X_s) D_x v(s,X_s)\big) \cdot d W_s, \quad 0 \leq t \leq T. \end{align} \] Now, using that \(v\) is a solution to the PDE, we get that \[ \begin{align} Y_t = g(X_T) + \int_{t}^{T} f(s,X_s,Y_s,Z_s) d s - \int_{t}^{T} Z_s d W_s, \quad 0 \leq t \leq T, \end{align} \] where we set \((Y_t,Z_t) = \Big(v(t,X_t), \sigma^{\top}(t,X_t)D_x v(t,X_t)\Big)\) for any \(t \in [0,T]\). Moreover, from the growth assumptions on \(v\) and \(D_x v\), we get that \((Y,Z)\) lies in \(\mathbb{S}^2([0,T]; \mathbb{R}) \times \mathbb{H}^2([0,T]; \mathbb{R}^n)\), and it is the unique such solution by Theorem 1.3.
1.2 Neural networks based algorithms for solving PDEs
Now that we have shown how stochastic control problems naturally lead to PDEs, we rely on recent advances that allow one to solve these PDEs numerically, i.e., to approximate the function solving the PDE and/or its derivatives. Formally, we are going to tackle the following kind of problem.
PDE formulation
Let \(v\) be a function defined on \([0,T] \times \mathbb{R}^n\) supposed to satisfy the following PDE \[ \begin{align} \begin{cases} \partial_t v(t,x) + \mathcal{H}[v](t,x) &= 0, \quad (t,x) \in [0,T) \times \mathbb{R}^n, \\ v(T,x) &= g(x), \end{cases} \end{align} \tag{1.13}\] where the operator \(\mathcal{H}\) is defined over a space of (sufficiently regular) functions on \([0,T] \times \mathbb{R}^n\) and, in the cases we are going to cover, can be rewritten as \[\begin{align} \mathcal{H}[v](t,x) = H \big(t,x,v(t,x), D_x v(t,x), D^2_x v(t,x) \big). \end{align}\] The complexity of such systems comes notably from
- The nonlinearity of \(H\) with respect to \(v\) and its derivatives.
- The potentially high dimension of the underlying space (\(n \approx 100\)).
Examples of PDEs.
- Linear PDE:
\[\begin{align} \begin{cases} \partial_t v(t,x) - r(x) v(t,x) + b(x) \cdot D_x v + \frac{1}{2} (\sigma \sigma^{\top})(t,x) : D^2_x v(t,x) + f(x)= 0, \quad (t,x) \in [0,T) \times \mathbb{R}^n \\ v(T,x) = g(x), \quad x \in \mathbb{R}^n \end{cases} \end{align}\] In this case, the operator \(\mathcal{H}\) is linear with respect to its arguments, with the map \(H\) (recall the display above) given by \[\begin{align} H(t,x,y,z,\gamma):= -r(x) y + b(x) \cdot z + \frac{1}{2} (\sigma \sigma^{\top})(t,x) : \gamma. \end{align}\] This is typically the type of PDE which arises in option pricing in the Black-Scholes model, taking \(r(x)=r\) the interest rate, \(b(x)=rx\), \(\sigma\) the volatility, \(f=0\) and \(g\) the option payoff (a Monte-Carlo check is sketched after this list).
- Semi-linear PDE: \[\begin{align} \begin{cases} \partial_t v + \mathcal{H}[v] + f(x,v)= 0, \quad (t,x) \in [0,T) \times \mathbb{R}^n, \\ v(T,x) = g(x), \quad x \in \mathbb{R}^n \end{cases} \end{align}\] For instance, when \(f(x,y) = r \,\text{max}(y,0) - ry\), this is the type of PDE which arises from the pricing of CVA (Credit Valuation Adjustment), where \(r\) denotes the default intensity of a counterparty.
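For the linear example above, the Feynman-Kac representation \(v(t,x) = \mathbb{E}\big[e^{-r(T-t)} g(X_T^{t,x})\big]\) can be checked directly by Monte-Carlo. Here is a minimal Python sketch for a European call in the Black-Scholes model; the numerical values are illustrative.

```python
import numpy as np

def bs_call_mc(s0, K, r, sigma, T, n_paths=200_000, seed=0):
    """Monte-Carlo price of a European call under Black-Scholes, i.e. the
    Feynman-Kac representation v(0, s0) = E[exp(-rT) * max(S_T - K, 0)]."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    # Exact simulation of the risk-neutral GBM at maturity.
    ST = s0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

print(bs_call_mc(s0=100.0, K=100.0, r=0.02, sigma=0.2, T=1.0))  # ~8.92 (BS closed form)
```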
Numerical challenges
However, solving these PDEs has always been highly challenging, due to the curse of dimensionality: grid-based methods scale exponentially when discretizing the state space \(\mathbb{R}^n\), while Monte-Carlo methods are limited to low-dimensional settings (\(n \approx 6\)) and essentially compute the solution at a single fixed point \((t,x) \in [0,T] \times \mathbb{R}^n\). In order to solve Equation 1.13 efficiently, we will rely on neural-network-based algorithms, which can provide a functional representation of the solution \(v\) at any \((t,x) \in [0,T] \times \mathbb{R}^n\), even for \(n\) large.
1.2.1 Deep Galerkin Algorithm
Mathematical description:
The main idea of the Deep Galerkin algorithm is to rewrite the PDE Equation 1.13 as a stochastic optimization problem. Given a smooth function \(w\) on \([0,T] \times \mathbb{R}^n\), we define \[ \begin{align} \mathbb{L}(w) = \mathbb{E} \Big[ |w(T,\mathcal{X}) - g(\mathcal{X})|^2 \Big] + \mathbb{E} \Big[ | \partial_t w(\tau,\mathcal{X}) + \mathcal{H}[w](\tau,\mathcal{X}) |^2 \Big], \end{align} \tag{1.14}\] where \((\tau,\mathcal{X}) \sim \rho_T \otimes \rho_n\), with \(\rho_T\) and \(\rho_n\) laws supported on \([0,T]\) and \(\mathbb{R}^n\) respectively. In order to solve Equation 1.14, we parametrize the family of maps \(w\) by neural networks \(\mathcal{U}_{\theta}\), which reduces the infinite-dimensional optimization problem to a finite-dimensional one over the network parameters \(\theta \in \Theta\). The problem is then to solve \(\underset{\theta \in \Theta}{\text{ inf }} L(\theta)\) with \(L(\theta)=\mathbb{L}(\mathcal{U}_{\theta})\) by means of stochastic gradient descent. A training sketch is given below.
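The following is a minimal PyTorch sketch of the Deep Galerkin loss and training loop, in the illustrative special case \(\mathcal{H}[w] = \frac{1}{2}\Delta_x w\) (heat operator), so that the residual is cheap to compute by automatic differentiation. The architecture, the sampling laws \(\rho_T, \rho_n\) and the terminal condition are placeholder assumptions, not the course's prescription.

```python
import torch
import torch.nn as nn

n, T = 5, 1.0
g = lambda x: (x ** 2).sum(dim=1, keepdim=True)  # illustrative terminal condition g

# U_theta: a small feed-forward network taking (t, x) as input.
net = nn.Sequential(nn.Linear(n + 1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def pde_residual(t, x):
    """Residual dt_w + 0.5 * Laplacian_x(w) of the (illustrative) heat equation."""
    t.requires_grad_(True)
    x.requires_grad_(True)
    w = net(torch.cat([t, x], dim=1))
    w_t = torch.autograd.grad(w.sum(), t, create_graph=True)[0]
    w_x = torch.autograd.grad(w.sum(), x, create_graph=True)[0]
    lap = sum(torch.autograd.grad(w_x[:, i].sum(), x, create_graph=True)[0][:, i:i + 1]
              for i in range(n))   # trace of the Hessian, coordinate by coordinate
    return w_t + 0.5 * lap

for step in range(2000):
    tau = T * torch.rand(256, 1)   # tau ~ rho_T = Uniform[0, T] (assumption)
    X = torch.randn(256, n)        # X ~ rho_n = N(0, I_n) (assumption)
    XT = torch.randn(256, n)
    interior = pde_residual(tau, X).pow(2).mean()
    terminal = (net(torch.cat([torch.full((256, 1), T), XT], dim=1)) - g(XT)).pow(2).mean()
    loss = interior + terminal     # empirical version of L(theta) = L(U_theta)
    opt.zero_grad()
    loss.backward()
    opt.step()
```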
1.2.2 Deep Backward Solvers
Mathematical description of the Deep BSDE Solver:
The Deep BSDE Solver takes advantage of the structure of semi-linear PDEs. The idea is to approximate the unknowns \(Y_0= v(0,X_0)\) and \(Z_t = D_x v(t,X_t)\) by neural networks \(\mathcal{U}_{\theta}\) on \(\mathbb{R}^n\) and \(\mathcal{Z}_{\theta}\) on \([0,T] \times \mathbb{R}^n\). In practice, we compute these approximations by minimizing over \(\theta \in \Theta\) the cost function given by
\[ \begin{align} L(\theta) = \mathbb{E} \Big[ |Y_T^{\theta} - g(X_T)|^2 \Big], \end{align} \] where the dynamics of \((Y_t^{\theta})_{0 \leq t \leq T}\) is given by
\[ \begin{align} Y_t^{\theta} = \mathcal{U}_{\theta}(X_0) - \int_{0}^{t} f(s,X_s,Y_s^{\theta}, \mathcal{Z}_{\theta}(s,X_s)) d s + \int_{0}^{t} \mathcal{Z}_{\theta}(s,X_s)^{\top} \sigma(s,X_s) d W_s, \quad 0 \leq t \leq T. \end{align} \]
The method is called Deep “Backward”, but it in fact simulates the process \(Y\) forward in time, optimizing over its initial value \(Y_0 = \mathcal{U}_{\theta}(X_0)\). A training sketch is given below.
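Below is a condensed PyTorch sketch of the Deep BSDE training loop, under simplifying assumptions: forward dynamics \(dX_s = \sigma \, dW_s\) with a constant scalar \(\sigma\), and placeholder driver \(f\) and payoff \(g\). It is meant as a sketch of the method, not as the reference implementation.

```python
import torch
import torch.nn as nn

n, T, N = 10, 1.0, 20            # dimension, horizon, number of time steps
dt = T / N
sigma = 0.4                      # constant scalar diffusion (assumption)
f = lambda x, y, z: -0.05 * y    # illustrative linear driver
g = lambda x: x.max(dim=1, keepdim=True).values  # illustrative payoff

y0_net = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, 1))     # U_theta
z_net = nn.Sequential(nn.Linear(n + 1, 32), nn.ReLU(), nn.Linear(32, n))  # Z_theta
opt = torch.optim.Adam(list(y0_net.parameters()) + list(z_net.parameters()), lr=1e-3)

for step in range(2000):
    X = torch.zeros(512, n)      # batch of paths started at X_0 = 0 (illustrative)
    Y = y0_net(X)                # Y_0^theta = U_theta(X_0)
    for i in range(N):
        t = torch.full((512, 1), i * dt)
        Z = z_net(torch.cat([t, X], dim=1))
        dW = dt ** 0.5 * torch.randn(512, n)
        # Forward Euler step of the "backward" equation for Y:
        Y = Y - f(X, Y, Z) * dt + sigma * (Z * dW).sum(dim=1, keepdim=True)
        X = X + sigma * dW       # forward SDE step (b = 0, constant sigma)
    loss = (Y - g(X)).pow(2).mean()  # L(theta) = E|Y_T^theta - g(X_T)|^2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At convergence, \(\mathcal{U}_{\theta}(X_0)\) approximates \(v(0,X_0)\).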
Mathematical description of the Deep BDP Solver:
The Deep BDP Solver approximates, at every time step \(t_i\), the pair \((Y_{t_i},Z_{t_i})\) by minimizing local loss functions defined recursively through the backward Euler scheme:
\[ \begin{align} \begin{cases} Y_{t_{i+1}} &= Y_{t_i} - f(X_{t_i},Y_{t_i},Z_{t_i}) \Delta_{t_i} + Z_{t_i}^{\top} \sigma(X_{t_i}) \Delta W_{t_i}, \quad i = N-1, \ldots, 0, \\ Y_{t_N} &= g(X_{t_N}). \end{cases} \end{align} \] The method either learns \((Y_{t_i},Z_{t_i})\) at every \(i = N-1, \ldots,0\) by defining a pair of neural networks per time step, one for \(Y\) and one for \(Z\), or alternatively takes advantage of Equation 1.12 by using a single neural network for \(Y\) and obtaining \(Z\) by automatic differentiation. A sketch of the first variant is given below.
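Below is a PyTorch sketch of the first variant, with one pair of networks per time step trained backward in time on pre-simulated forward paths. As above, \(b = 0\), a constant scalar \(\sigma\) and the driver/payoff are illustrative assumptions.

```python
import torch
import torch.nn as nn

n, T, N = 10, 1.0, 20
dt = T / N
sigma = 0.4                      # constant scalar diffusion (assumption)
f = lambda x, y, z: -0.05 * y    # illustrative driver
g = lambda x: x.max(dim=1, keepdim=True).values  # illustrative payoff

# Pre-simulate forward paths X_{t_0}, ..., X_{t_N} (b = 0, constant sigma).
paths = torch.zeros(4096, N + 1, n)
for i in range(N):
    paths[:, i + 1] = paths[:, i] + sigma * dt ** 0.5 * torch.randn(4096, n)

y_next = g(paths[:, N])                       # Y_{t_N} = g(X_{t_N})
for i in reversed(range(N)):                  # backward induction over time steps
    y_net = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, 1))
    z_net = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, n))
    opt = torch.optim.Adam(list(y_net.parameters()) + list(z_net.parameters()), lr=1e-3)
    X = paths[:, i]
    dW = (paths[:, i + 1] - paths[:, i]) / sigma   # recover the Brownian increments
    for step in range(500):
        Y, Z = y_net(X), z_net(X)
        # One-step Euler prediction of Y_{t_{i+1}} from (Y_{t_i}, Z_{t_i}):
        pred = Y - f(X, Y, Z) * dt + sigma * (Z * dW).sum(dim=1, keepdim=True)
        loss = (pred - y_next).pow(2).mean()  # local loss at time step i
        opt.zero_grad()
        loss.backward()
        opt.step()
    y_next = y_net(X).detach()                # freeze Y_{t_i} and move one step back
```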
\(\mathcal{S}_n\) denotes the space of symmetric matrices over \(\mathbb{R}^{n \times n}\)↩︎
Feedback refers to a measurable function defined over time \(\times\) state space.↩︎
We recall that a control process \(\alpha= (\alpha_s)_{t \leq s \leq T}\) in the form \(\alpha_s = \mathrm{a}(s,X_s^{t,x})\) is said to be Markovian.↩︎