Data-driven Differential Equation Identification

Data-driven differential equation identification seeks to uncover the governing dynamical models behind observed phenomena, whether the data come from physical experiments or from biological processes.

Our research pursues three interconnected objectives:

Algorithm Development — designing effective, robust, and interpretable methods for recovering differential equations from noisy, sparse, or high-dimensional data.
Identifiability Theory — establishing mathematical frameworks to characterize identifiability conditions for various differential equations.
Uncertainty Quantification — characterizing and propagating uncertainty through the identification pipeline to produce reliable, trustworthy models.

Identification from Noisy Trajectory Data

Unlike in classical regression, the feature matrix here consists of differential quantities that must be estimated from a single noisy trajectory. Finite difference schemes inherently amplify noise, which makes accurate feature estimation and reliable model selection difficult. We develop robust computational techniques and rigorous model validation methods to recover governing equations accurately from noisy observations.
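The noise amplification mentioned above can be seen in a small numerical experiment. The sketch below is only an illustration under stated assumptions: the test signal sin(x), the noise level, and the moving-average smoother are all hypothetical choices made for this example, and the smoother is a simple stand-in, not the SDD method referenced on this page.

```python
import numpy as np

def moving_average(u, half_width):
    """Centered moving average with reflected padding at both ends."""
    window = 2 * half_width + 1
    padded = np.pad(u, half_width, mode="reflect")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def rms(v):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(v ** 2)))

# Hypothetical test problem: u(x) = sin(x) with additive Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 400)
dx = x[1] - x[0]
noise_level = 0.01
noisy = np.sin(x) + noise_level * rng.standard_normal(x.size)
true_derivative = np.cos(x)

# Differentiate the raw data and a smoothed copy with the same scheme.
half_width = 10
raw = np.gradient(noisy, dx)
smoothed = np.gradient(moving_average(noisy, half_width), dx)

# Compare on interior points only, where the smoother sees no padded values.
interior = slice(half_width + 1, -(half_width + 1))
raw_error = rms(raw[interior] - true_derivative[interior])
smoothed_error = rms(smoothed[interior] - true_derivative[interior])
print(f"raw: {raw_error:.3f}, smoothed: {smoothed_error:.3f}")
```

For independent noise of standard deviation σ, the central difference has noise of standard deviation σ/(√2 Δx), so refining the grid makes the raw derivative estimate worse rather than better. Smoothing before differentiating trades this variance for a small bias.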

Figure panels: Noisy finite difference; Denoised derivative (SDD); Model validation (MTEE); Accuracy vs. noise level.

References

(He et al., 2022)

Varying Coefficient PDE Identification

Description coming soon.

Some identification results

Consistent and Sparse Local Regression (CaSLR) identifies varying coefficient PDEs using local patches, within each of which the varying coefficients are well approximated by constants.
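The idea of approximating varying coefficients by constants on local patches can be sketched on a toy problem. Everything below is an assumption made for illustration: the transport equation u_t = a(x) u_x, the manufactured solution, the two-term dictionary, and the helper `fit_patches` are hypothetical. The fit uses plain least squares on each patch and is not the CaSLR algorithm itself, which additionally enforces sparsity and consistency across patches.

```python
import numpy as np

# Manufactured data: u(x, t) = sin(t + g(x)) solves u_t = a(x) u_x
# with a(x) = 1 / g'(x), where g(x) = x + 0.3 sin(x).
x = np.linspace(0.0, 2.0 * np.pi, 512)
t = np.linspace(0.0, 1.0, 101)
X, T = np.meshgrid(x, t, indexing="ij")          # shape (len(x), len(t))
u = np.sin(T + X + 0.3 * np.sin(X))

# Finite difference estimates of the response and the candidate features.
dx, dt = x[1] - x[0], t[1] - t[0]
u_t = np.gradient(u, dt, axis=1)
u_x = np.gradient(u, dx, axis=0)
u_xx = np.gradient(u_x, dx, axis=0)

def fit_patches(features, response, n_patches):
    """Fit constant coefficients by least squares on each spatial patch."""
    edges = np.linspace(0, response.shape[0], n_patches + 1).astype(int)
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        A = np.stack([f[lo:hi].ravel() for f in features], axis=1)
        b = response[lo:hi].ravel()
        c, *_ = np.linalg.lstsq(A, b, rcond=None)
        coeffs.append(c)
    return edges, np.array(coeffs)

edges, coeffs = fit_patches([u_x, u_xx], u_t, n_patches=32)

# Compare the fitted u_x coefficient with a(x) at each patch center.
centers = 0.5 * (x[edges[:-1]] + x[edges[1:] - 1])
a_at_centers = 1.0 / (1.0 + 0.3 * np.cos(centers))
max_error = float(np.max(np.abs(coeffs[:, 0] - a_at_centers)))
print(f"max error of patchwise a(x): {max_error:.4f}")
```

Smaller patches reduce the error of the piecewise-constant approximation but leave fewer samples per fit, which matters once the data are noisy.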

References

(He et al., 2023) (He et al., 2025) (Tang et al., 2025) (He et al., 2024)

Stochastic ODE/PDE Identification

Description coming soon.

References

(Cui & He, 2025)

Two-phase PDE Identification

Description coming soon.

Figure panels: Phase identification; Covering patches (SDD); Evolution-based localization; Uncertainty quantification.

References

(Yang & He, 2026)

Identifiability Theory

A fundamental question in data-driven PDE identification is: given observed solution data, when can the underlying PDE be uniquely recovered?

Data space characterization — for elliptic operators, all snapshots of a single trajectory stay $\varepsilon$-close to a linear space of dimension $O(|\log \varepsilon|^2)$, revealing the intrinsic ill-conditioning of single-trajectory identification
Identifiability from two instants — for PDEs with constant coefficients, the parameters are uniquely determined from solutions at two time instants $u(x,t_1)$, $u(x,t_2)$ if the Fourier support $Q$ satisfies sharp combinatorial and geometric conditions
Variable coefficient identifiability — for PDEs with variable coefficients, $\binom{n+d}{d}$ time instants suffice for local recovery, provided the solution contains sufficiently diverse Fourier modes
Stability analysis — high-frequency perturbations to elliptic operators have limited impact on the solution, with explicit bounds depending on the operator order and regularity of the initial data
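The data space characterization in the list above can be explored numerically. The sketch below is a rough illustration rather than a verification of the stated bound. It assumes the 1D heat equation u_t = u_xx on [0, π] with zero Dirichlet boundary values, hypothetical initial Fourier coefficients 1/k, and an arbitrary set of sampling times. It counts how many singular values of the snapshot matrix exceed a relative tolerance ε.

```python
import numpy as np

# Hypothetical setup: 64 sine modes with initial coefficients c_k = 1 / k.
n_modes = 64
k = np.arange(1, n_modes + 1)
initial_coeffs = 1.0 / k
x = np.linspace(0.0, np.pi, 256)
t = np.linspace(0.01, 1.0, 200)

# u(x, t) = sum_k c_k exp(-k^2 t) sin(k x) solves u_t = u_xx
# with u(0, t) = u(pi, t) = 0.
modes = np.sin(np.outer(x, k))                        # (len(x), n_modes)
decay = np.exp(-np.outer(k ** 2, t))                  # (n_modes, len(t))
snapshots = modes @ (initial_coeffs[:, None] * decay) # (len(x), len(t))

# Singular values of the snapshot matrix, scaled by the largest one.
singular_values = np.linalg.svd(snapshots, compute_uv=False)
relative = singular_values / singular_values[0]

def numerical_rank(eps):
    """Number of singular values above eps relative to the largest."""
    return int(np.sum(relative > eps))

tolerances = (1e-2, 1e-4, 1e-6, 1e-8)
ranks = {eps: numerical_rank(eps) for eps in tolerances}
for eps in tolerances:
    print(f"eps = {eps:.0e}: numerical rank = {ranks[eps]}")
```

In this setup the numerical rank stays well below the number of modes present in the initial data, which is one way to see why a regression built from a single trajectory tends to be ill-conditioned.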

References

(He et al., 2024) (He et al., 2022)