Experiment 4 — Gradient Alignment Depth Test | Symplectic PINN Research Notes

Purpose

Experiment 3 measured gradient alignment at a single depth. This experiment extends that diagnostic across depths 4, 10, 25, and 50 to answer two questions: does multi-objective conflict grow with depth, and does the SymplecticPINN's structural advantage hold as depth increases?

This is the depth dimension of Goal 3 (Teamwork). The central prediction is that the MLP's gradient conflict frequency increases with depth, while the symplectic architecture maintains cooperative gradients because its phase-space structure is preserved at every layer.

Abijuru et al.^[5] demonstrate that gradient shattering in standard PINNs scales with depth, making deep networks progressively harder to train without architectural intervention. Their residual flow solution ensures that update directions remain aligned with the PDE solution trajectory. Our hypothesis is that Störmer-Verlet integration provides a layer-local symplecticity guarantee that achieves this alignment structurally, making the architecture depth-independent by construction rather than just attenuated more slowly.

Setup

All three architectures — MLP, ResidualMLP, and SymplecticPINN — are trained at depths 4, 10, 25, and 50 on the 1D heat equation under matched conditions. The cosine similarity between ∇L_pde and ∇L_bc is logged at regular intervals throughout training.

Network depth

Standard PINN — trajectory drift

expected path low drift

Symplectic Flow — locked to path

expected path locked

Alignment grid

Grid of smoothed cosine similarity curves at depths 4, 10, 25, and 50. — **Alignment grid — one panel per depth** At depth 50 the MLP panel is a flat line — the gradient has fully collapsed with nothing left to measure.

Each panel shows the smoothed cosine similarity curves for all three architectures at one depth. The qualitative story changes dramatically as depth increases. At depth 4 all three architectures produce meaningful alignment signals. By depth 50, the MLP panel is a flat line at zero — the gradient has fully collapsed and there is nothing left to measure.

Tanaka et al.^[3] provide a theoretical account of this behavior: their Symplectic Spectrum Gaussian Processes show that a symplectic prior acts as a structural regularizer that keeps dynamics well-conditioned under data-sparse conditions. The alignment grid is the empirical analogue — the symplectic prior in our architecture maintains informative gradient signals at depths where the unstructured MLP has nothing left to measure.

Mean alignment vs depth

Mean cosine similarity averaged over all training epochs, plotted against depth. This captures whether each architecture's losses cooperate or fight on average across the full run. The SymplecticPINN maintains a positive mean at all depths. The MLP mean collapses to exactly zero at depth 50 — not approximately, but numerically exact, with zero variance. The gradient is dead.

Conflict frequency vs depth

Fraction of training epochs with negative cosine similarity, plotted against depth. — **Gradient conflict frequency vs depth** Fraction of logged epochs where losses are in direct conflict (A < 0). Lower is better.

The fraction of logged epochs where A < 0 (losses in direct conflict). Lower is better. At depth 4, the SymplecticPINN has roughly half the conflict frequency of the MLP, which is the cleanest controlled comparison in the experiment. The depth 25 result is a nuance discussed in the analysis section below.

Canizares et al.^[2] provide theoretical support for understanding this result: backward error analysis for symplectic integrators shows that the computed trajectory stays exponentially close to a nearby exact Hamiltonian flow. In our setting, this implies the SymplecticPINN is navigating a well-defined loss landscape at every depth — the geometric constraint bounds how far the training trajectory can stray from a coherent optimization path, even when the two objectives momentarily disagree.

Numerical summary

Raw numbers from the terminal output. The most important column is the negative fraction (NegFrac), which tells us what proportion of training steps had the two losses actively fighting each other. At depth 50, the MLP row shows Mean = −0.0000, Std = 0.0000 — not approximately zero but exactly zero with zero variance. The SymplecticPINN row at depth 50 reads Mean = +0.0630, Std = 0.2175, NegFrac = 0.345, confirming the model is alive and producing meaningful gradient signals. Its final loss is 1.38e-7, six orders of magnitude better than the MLP's 2.89e-1.

Analysis

The cleanest result — depth 4: under a fair controlled comparison where all three architectures are functional, the SymplecticPINN spends significantly less of its training in gradient conflict. This directly supports the theoretical motivation: symplectic structure in the latent space imposes a geometric constraint that keeps multi-objective optimization more cooperative. Aich and Aich^[1] prove this constraint formally for separable-Hamiltonian architectures — our depth-4 result is the empirical confirmation.

The catastrophic MLP collapse at depth 50: the MLP row is the most important entry in the full table. The gradient has completely died — the 55.5% negative fraction at that depth is numerical noise from a network that learned nothing in 10,000 epochs. The SymplecticPINN at the same depth achieves a final loss six orders of magnitude better. This is the clearest evidence that the symplectic constraint is not merely a mild regularizer: it maintains the fundamental trainability of the network at depths where standard architectures become entirely non-functional.

Honest nuance — depth 25: the MLP shows its highest mean alignment (+0.1515) and lowest conflict frequency (22.5%) at depth 25. The losses are cooperating, but the model still reaches a worse final loss (1.48e-6) than the SymplecticPINN (2.90e-7). High gradient alignment is necessary but not sufficient — the direction in which the losses cooperate also matters. The MLP at depth 25 has harmonious gradients pointing toward a shallower minimum.

The SymplecticPINN at depth 25 shows its highest NegFrac (35.5%). The loss is still the best of the three, so the conflict is not impeding convergence — the network is navigating a sharper landscape and finding a better solution. Canizares et al.^[2] show that symplectic integrators remain close to an exact Hamiltonian flow through backward error analysis; in this case the exact flow leads through a region of higher local gradient disagreement toward a better global minimum. This will be flagged as a caveat rather than omitted in the paper.

References

Aich, A. & Aich, A. (2025). Symplectic Generative Networks: A Hamiltonian Framework for Invertible Deep Generative Modeling. arXiv:2505.22527. ↗
Canizares, P., Murari, D., Schönlieb, C.-B., Sherry, F., & Shumaylov, Z. (2024). Symplectic Neural Flows for Modeling and Discovery. arXiv:2412.16787. ↗
Tanaka, Y., Iwata, T., & Ueda, N. (2022). Symplectic Spectrum Gaussian Processes: Learning Hamiltonians from Noisy and Sparse Data. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). ↗
Abijuru, J., Nagda, M., Ostheimer, P., Aurich, J. C., Vollmer, S. J., Kloft, M., & Fellenz, S. (2025). Physics-Informed Residual Flows. EurIPS 2025 Workshop on Differentiable Systems (DiffSys). ↗