From 48d4c1044adb99707c20839823c353f0217ef833 Mon Sep 17 00:00:00 2001 From: wassname <1103714+wassname@users.noreply.github.com> Date: Fri, 6 Mar 2026 13:39:53 +0800 Subject: [PATCH] refactor(pinn): extract heat exchanger specifics to refs/ Moved heat-exchanger-specific content from pinn/SKILL.md to pinn/refs/heat_exchanger.md: complexity ladder table, known failure modes (U->0, counterflow signs), property mappings (REFPROP/PCHIP), multi-episode training. PINN skill is now domain-agnostic. pinn/SKILL.md reduced from 4961w to 4274w (~14%). --- pinn/SKILL.md | 89 ++++--------------------------- pinn/refs/heat_exchanger.md | 103 ++++++++++++++++++++++++++++++++++++ 2 files changed, 114 insertions(+), 78 deletions(-) create mode 100644 pinn/refs/heat_exchanger.md diff --git a/pinn/SKILL.md b/pinn/SKILL.md index e0e3952..7e7ef19 100644 --- a/pinn/SKILL.md +++ b/pinn/SKILL.md @@ -9,7 +9,7 @@ description: "PINN (Physics-Informed Neural Network) training best practices and https://gist.github.com/wassname/a68314f731002b32aa637c3d43a081c9/edit -Consolidated from: NeuralPDE.jl tests/docs, ConFIG repo, Wang et al. 2021, Rathore et al. 2024 (ICML), ml_debug folklore, and practical experience with heat exchanger PINNs. +Consolidated from: NeuralPDE.jl tests/docs, ConFIG repo, Wang et al. 2021, Rathore et al. 2024 (ICML), ml_debug folklore, and practical experience. Heat-exchanger-specific notes in [refs/heat_exchanger.md](refs/heat_exchanger.md). Epistemic status: Patterns confirmed across multiple sources. Where sources disagree, noted. Paper claims marked with credence estimates. @@ -21,21 +21,7 @@ PINNs are complex. Before trusting a PINN, work up the complexity ladder and com **Make complexity pay rent.** If a fancier model doesn't improve on the simpler one, the added physics/architecture is either wrong, badly scaled, or unnecessary. -### Complexity ladder (heat exchanger example) - -| Level | Model | Assumptions | What it tests | -|---|---|---|---| -| 0 | Persistence: y(t) = y(t-1) | None | Does model capture any dynamics? | -| 1 | Exponential decay: T(t) = T_ss + (T_0 - T_ss) exp(-t/tau) | First-order response | Dominant thermal mode | -| 2 | Linear state-space / DMDc: fit A, B via OLS on finite differences (or Dynamic Mode Decomposition with control) | Linear dynamics | Near-linear regime | -| 3 | Pure data MLP (no physics loss) | NN can interpolate | If PINN doesn't beat this, physics constraint hurts | -| 4 | Effectiveness-NTU closed form | 1D, constant Cp, steady state | Analytical sanity check | -| 5 | 1D ODE with constant Cp (scipy solve_bvp) | 1D, constant Cp | Classical numerical baseline | -| 6 | 1D ODE with enthalpy (scipy solve_bvp) | 1D, variable properties | Phase change handling | -| 7 | 1D PINN with enthalpy | 1D, learned U | Differentiable model, parameter estimation | -| 8 | 2D PINN with radial maldistribution | 2D, learned U + u_r | Full physics | - -At each level, brainstorm: +Build a complexity ladder for your problem (see [refs/heat_exchanger.md](refs/heat_exchanger.md) for a heat exchanger example). At each level, brainstorm: - What assumption am I adding/relaxing? - What does this buy me (lower RMSE, new physics captured)? - What breaks if I simplify further? @@ -226,21 +212,7 @@ The PINN loss has multiple terms (PDE residual, BCs, ICs, data) with different g > "The proposed approach consistently outperforms a standard PINN-based collocation method." > Source: https://arxiv.org/abs/2104.08426, Abstract and Section 1 > Evidence: evidence/sukumar2022_exact_bc_distance.md -> For our heat exchanger: u_{r,tube}(z,r) = r(R-r) * NN(z,r) enforces zero radial velocity at r=0 and r=R by construction. Same principle. - -### Known failure modes - -**Degenerate U -> 0**: When learning a heat transfer coefficient U jointly with the network, the optimizer may find U -> 0 (no heat transfer, flat profiles). This satisfies the PDE trivially. -- Fix: bound U in [100, 5000], initialize near a physically reasonable value (~500). -- Fix: penalize |dT/dz| being too small (the network should predict temperature change). - -**Sign errors in counterflow**: The counterflow BVP in z-coordinates has both fluids' enthalpies decreasing in z: - - dh_t/dz = -U*A'*(T_t - T_s)/m_dot_t (tube cools, < 0 since T_t > T_s) - - dh_s/dz = -U*A'*(T_t - T_s)/m_dot_s (same sign! shell also decreases in z, but its flow is -z so it warms) - The counterflow is encoded in BCs (h_t at z=0, h_s at z=L), not in opposing ODE signs. - Common mistake: putting dh_s/dz = +q gives dh_s/dz > 0 (shell hot at z=0 exit, cold at z=L entry -- wrong). - -**NN vs scalar parameter**: When a scalar (U) and a NN (temperature field) are optimized jointly, the NN is easier to update (more parameters, more gradient signal). The scalar gets stuck. Fix: use different learning rates, or estimate U separately first. +> Domain-specific failure modes and hard BC examples: see [refs/heat_exchanger.md](refs/heat_exchanger.md). --- @@ -266,52 +238,13 @@ For 2D problems with radial integrals: use a regular grid in r (including r=0 an --- -## 6. Property Mappings (EoS for PINNs) +## 6. Property Mappings & Multi-Episode Training -For heat exchangers with phase change, the T(h) mapping from REFPROP/GERG-2008 must be made differentiable. - -**Recipe:** -1. Generate REFPROP data: T vs h at constant P, dense enough near the phase boundary (~200 points over T = [100, 300] K). -2. Fit a cubic spline to get ~20 minimal knots (reject redundant knots where the curve is nearly linear). -3. Build PCHIP on those knots for monotonicity guarantee (cubic spline can wiggle; PCHIP cannot). -4. Wrap for PyTorch autograd: - ```python - class PchipFunction(torch.autograd.Function): - @staticmethod - def forward(ctx, h): - T = pchip_interp(h.detach().numpy()) # scipy - ctx.save_for_backward(h) - return torch.tensor(T, dtype=h.dtype, device=h.device) - - @staticmethod - def backward(ctx, grad_output): - h, = ctx.saved_tensors - dTdh = pchip_interp.derivative()(h.detach().cpu().numpy()) # scipy - return grad_output * torch.tensor(dTdh, dtype=h.dtype, device=h.device) - ``` -5. Use float64 throughout. float32 loses precision near the phase boundary where dT/dh is small. - -**Why h -> T and not T -> h?** The h(T) direction has a steep region near phase change where dh/dT = Cp -> infinity. The inverse T(h) is smooth and monotone everywhere. Always solve in h space, convert to T only for losses and visualization. +> Domain-specific: differentiable EoS wrapping (REFPROP/PCHIP), IC handling for plant data, multi-episode training. See [refs/heat_exchanger.md](refs/heat_exchanger.md). --- -## 7. Initial Conditions and Multi-Episode Training - -IC is a loss term, but in practice ICs from real plant data are noisy and uncertain. - -**Options (in order of complexity):** -1. Hard-code IC from first observed data point. Simplest; biased if sensor lags. -2. Learnable IC: optimize IC scalars jointly with the network. Adds ~n_states parameters. Works well with L-BFGS polish. -3. Soft IC loss: penalize `||u(t=0) - u_obs(0)||^2` as one loss term. Lets optimizer trade off IC vs physics fit. - -**Multi-episode training** (e.g., ~10 cold restarts over a year): -- Each episode needs its own IC. -- Common pattern: learn per-episode IC vector `ic_k` while sharing the physics network across episodes. -- Warm-start trick: if you have a pure-data pre-fit (Level 3 baseline), initialize PINN data-loss weights from that fit. - ---- - -## 8. Validation and General ML Debugging +## 7. Validation and General ML Debugging ### PINN-specific checks @@ -337,7 +270,7 @@ These apply to PINNs too: | Symptom | Likely cause | |---|---| | Loss stuck from the start | LR too low, bad init, wrong loss function, nondimensionalization missing | -| Loss decreases then explodes | LR too high, numerical instability in T(h) near phase boundary | +| Loss decreases then explodes | LR too high, numerical instability in property mappings | | Loss NaN | log(0), 0/0 in property mapping, temperature outside EoS range | | Physics loss low but BCs violated | Gradient imbalance (Section 4), need loss weighting | | U converges to 0 or bound | Degenerate solution (Section 4), check sign convention | @@ -360,7 +293,7 @@ This takes 5 minutes and saves hours. --- -## 9. Multi-Loss Training Details (ConFIG) +## 8. Multi-Loss Training Details (ConFIG) If you must use multiple loss terms (and in PINNs you usually must): @@ -401,18 +334,18 @@ optimizer.step() --- -## 10. Domain Decomposition (XPINNs) +## 9. Domain Decomposition (XPINNs) For large or geometrically complex domains, split into subdomains each with a local PINN. Interface conditions enforce continuity between subdomains. > Jagtap et al. 2020. "Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations." Commun. Comput. Phys. https://arxiv.org/abs/2005.11025 > Credence ~70%: Multiple citations, implemented in DeepXDE. Enables parallelization; each subdomain network is smaller and easier to optimize. > Key: interface residuals must be added as additional loss terms. Continuity of u and its normal derivative across interfaces. -> Relevant for our heat exchanger: could split axially (cold end / hot end) if the solution has different character in different regions. +> Useful when the solution has different character in different regions (e.g., different phases, boundary layers). **Physics reparameterization**: Instead of learning u(x,t) from scratch, solve for the deviation from a known approximate solution: u(x,t) = u_approx(x,t) + delta(x,t), where the NN learns delta. If u_approx is good, delta is small and the landscape is smoother. The approximate solution can be the steady-state solution, a linearization, or a coarse numerical solution. -## 11. PIKANs (Kolmogorov-Arnold Networks for PINNs) +## 10. PIKANs (Kolmogorov-Arnold Networks for PINNs) > Toscano et al. 2024: PIKANs "lead to smaller models and may also contribute to lowering computational cost while maintaining good accuracy." > Source: https://arxiv.org/abs/2410.13228 diff --git a/pinn/refs/heat_exchanger.md b/pinn/refs/heat_exchanger.md new file mode 100644 index 0000000..8d5b9e6 --- /dev/null +++ b/pinn/refs/heat_exchanger.md @@ -0,0 +1,103 @@ +# Heat Exchanger PINN Notes + +Project-specific notes for heat exchanger PINNs with phase change. Extracted from the general [PINN skill](../SKILL.md) to keep it domain-agnostic. + +--- + +## Complexity ladder (heat exchanger example) + +| Level | Model | Assumptions | What it tests | +|---|---|---|---| +| 0 | Persistence: y(t) = y(t-1) | None | Does model capture any dynamics? | +| 1 | Exponential decay: T(t) = T_ss + (T_0 - T_ss) exp(-t/tau) | First-order response | Dominant thermal mode | +| 2 | Linear state-space / DMDc: fit A, B via OLS on finite differences (or Dynamic Mode Decomposition with control) | Linear dynamics | Near-linear regime | +| 3 | Pure data MLP (no physics loss) | NN can interpolate | If PINN doesn't beat this, physics constraint hurts | +| 4 | Effectiveness-NTU closed form | 1D, constant Cp, steady state | Analytical sanity check | +| 5 | 1D ODE with constant Cp (scipy solve_bvp) | 1D, constant Cp | Classical numerical baseline | +| 6 | 1D ODE with enthalpy (scipy solve_bvp) | 1D, variable properties | Phase change handling | +| 7 | 1D PINN with enthalpy | 1D, learned U | Differentiable model, parameter estimation | +| 8 | 2D PINN with radial maldistribution | 2D, learned U + u_r | Full physics | + +--- + +## Hard BC example (heat exchanger) + +For our heat exchanger: u_{r,tube}(z,r) = r(R-r) * NN(z,r) enforces zero radial velocity at r=0 and r=R by construction. Same principle as Sukumar & Srivastava 2022. + +--- + +## Known failure modes + +**Degenerate U -> 0**: When learning a heat transfer coefficient U jointly with the network, the optimizer may find U -> 0 (no heat transfer, flat profiles). This satisfies the PDE trivially. +- Fix: bound U in [100, 5000], initialize near a physically reasonable value (~500). +- Fix: penalize |dT/dz| being too small (the network should predict temperature change). + +**Sign errors in counterflow**: The counterflow BVP in z-coordinates has both fluids' enthalpies decreasing in z: + - dh_t/dz = -U*A'*(T_t - T_s)/m_dot_t (tube cools, < 0 since T_t > T_s) + - dh_s/dz = -U*A'*(T_t - T_s)/m_dot_s (same sign! shell also decreases in z, but its flow is -z so it warms) + The counterflow is encoded in BCs (h_t at z=0, h_s at z=L), not in opposing ODE signs. + Common mistake: putting dh_s/dz = +q gives dh_s/dz > 0 (shell hot at z=0 exit, cold at z=L entry -- wrong). + +**NN vs scalar parameter**: When a scalar (U) and a NN (temperature field) are optimized jointly, the NN is easier to update (more parameters, more gradient signal). The scalar gets stuck. Fix: use different learning rates, or estimate U separately first. + +--- + +## Property Mappings (EoS for PINNs) + +For heat exchangers with phase change, the T(h) mapping from REFPROP/GERG-2008 must be made differentiable. + +**Recipe:** +1. Generate REFPROP data: T vs h at constant P, dense enough near the phase boundary (~200 points over T = [100, 300] K). +2. Fit a cubic spline to get ~20 minimal knots (reject redundant knots where the curve is nearly linear). +3. Build PCHIP on those knots for monotonicity guarantee (cubic spline can wiggle; PCHIP cannot). +4. Wrap for PyTorch autograd: + ```python + class PchipFunction(torch.autograd.Function): + @staticmethod + def forward(ctx, h): + T = pchip_interp(h.detach().numpy()) # scipy + ctx.save_for_backward(h) + return torch.tensor(T, dtype=h.dtype, device=h.device) + + @staticmethod + def backward(ctx, grad_output): + h, = ctx.saved_tensors + dTdh = pchip_interp.derivative()(h.detach().cpu().numpy()) # scipy + return grad_output * torch.tensor(dTdh, dtype=h.dtype, device=h.device) + ``` +5. Use float64 throughout. float32 loses precision near the phase boundary where dT/dh is small. + +**Why h -> T and not T -> h?** The h(T) direction has a steep region near phase change where dh/dT = Cp -> infinity. The inverse T(h) is smooth and monotone everywhere. Always solve in h space, convert to T only for losses and visualization. + +--- + +## Initial Conditions and Multi-Episode Training + +IC is a loss term, but in practice ICs from real plant data are noisy and uncertain. + +**Options (in order of complexity):** +1. Hard-code IC from first observed data point. Simplest; biased if sensor lags. +2. Learnable IC: optimize IC scalars jointly with the network. Adds ~n_states parameters. Works well with L-BFGS polish. +3. Soft IC loss: penalize `||u(t=0) - u_obs(0)||^2` as one loss term. Lets optimizer trade off IC vs physics fit. + +**Multi-episode training** (e.g., ~10 cold restarts over a year): +- Each episode needs its own IC. +- Common pattern: learn per-episode IC vector `ic_k` while sharing the physics network across episodes. +- Warm-start trick: if you have a pure-data pre-fit (Level 3 baseline), initialize PINN data-loss weights from that fit. + +--- + +## Domain decomposition (heat exchanger note) + +Could split axially (cold end / hot end) if the solution has different character in different regions. + +--- + +## Heat-exchanger-specific symptom table entries + +| Symptom | Likely cause | +|---|---| +| Loss decreases then explodes | Numerical instability in T(h) near phase boundary | +| Loss NaN | temperature outside EoS range | +| U converges to 0 or bound | Degenerate solution, check sign convention | +| PINN worse than pure-data MLP | Physics constraint fighting the data |