Thermo-Mechanical-Metallurgical Surrogates

A proof-of-concept Thermo-Mechanical-Metallurgical (TMM) machine learning model using deal.ii and libtorch. The implementation will be entirely in C++: a fully automated pipeline that solves the models and trains a neural network in situ on the results.

The initial models will be based on the JMAK implementation from kmC-FEA.

Topics and avenues to explore

Metallurgical modeling (new domain)

Phase transformation kinetics:

  • JMAK theory extensions beyond isothermal conditions
  • Koistinen-Marburger model for martensitic transformations
  • Continuous cooling transformation (CCT) diagram integration
  • Multi-phase field approaches vs. empirical kinetics
  • Keywords: non-isothermal JMAK kinetics, CCT diagram FEM integration, Koistinen-Marburger welding, phase field additive manufacturing
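
As a concrete example of the kinetics involved, the Koistinen-Marburger model gives the martensite fraction as f = 1 − exp(−α(Ms − T)) below the martensite-start temperature Ms. A minimal C++ sketch (the α and Ms values are illustrative, not calibrated to a specific alloy):

```cpp
#include <cmath>

// Koistinen-Marburger martensite fraction below the martensite-start
// temperature Ms. alpha ~ 0.011 / K is a commonly quoted value for
// low-alloy steels; treat it as illustrative here.
double km_martensite_fraction(double T, double Ms, double alpha = 0.011) {
  if (T >= Ms) return 0.0;                 // no transformation above Ms
  return 1.0 - std::exp(-alpha * (Ms - T));
}
```

Because the fraction depends only on the current undercooling (Ms − T), not on the thermal path, KM is the athermal counterpart to the path-dependent JMAK kinetics above.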

Microstructure evolution:

  • Grain growth models (Monte Carlo Potts, cellular automata)
  • Texture evolution during WAAM thermal cycles
  • Precipitation kinetics (Kampmann-Wagner numerical model)
  • CALPHAD-coupled simulations (Thermo-Calc, Pandat integration)
  • Keywords: Monte Carlo Potts grain growth welding, cellular automata microstructure AM, Kampmann-Wagner precipitation, CALPHAD FEM coupling

Transformation-induced effects:

  • Transformation-induced plasticity (TRIP) models
  • Volume change from phase transformations and stress coupling
  • Greenwood-Johnson mechanism for transformation plasticity
  • Keywords: TRIP model welding FEM, Greenwood-Johnson transformation plasticity, phase transformation volume change stress

ML surrogate modeling (new domain)

Neural operators for PDE surrogates:

  • Fourier Neural Operators (FNO) for spatio-temporal fields
  • DeepONet architecture for operator learning
  • Graph Neural Operators for unstructured meshes
  • Low-rank factorization for high-dimensional outputs
  • Keywords: Fourier neural operator PDE, DeepONet surrogate model, graph neural operator FEM mesh, neural operator heat equation

Physics-informed approaches:

  • PINNs for transient heat conduction with moving sources
  • Conservative neural networks (energy/mass conservation)
  • Hard constraint enforcement vs. penalty methods
  • Hybrid FEM-ML: ML for closure terms, FEM for conservation
  • Keywords: PINN transient heat conduction, conservation neural network, hard constraint PINN, hybrid FEM machine learning

Reduced-order modeling:

  • Proper Orthogonal Decomposition (POD) for thermal fields
  • Autoencoder-based dimensionality reduction
  • POD-Galerkin projection with neural network closure
  • Dynamic mode decomposition (DMD) for transient behavior
  • Keywords: POD thermal field reduction, autoencoder FEM surrogate, POD neural network closure, dynamic mode decomposition transient
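
The method of snapshots makes POD cheap when snapshots are few: build the m × m snapshot correlation matrix and take its dominant eigenvector. A minimal sketch using plain power iteration (a production version would use LAPACK or deal.ii's linear algebra instead):

```cpp
#include <cmath>
#include <vector>

// Method-of-snapshots POD: given m snapshots of an n-dof field, form the
// m x m correlation matrix C_ij = x_i . x_j, extract its dominant
// eigenvector by power iteration, and return the corresponding linear
// combination of snapshots as the (normalized) first POD mode.
std::vector<double> leading_pod_mode(const std::vector<std::vector<double>>& X) {
  const std::size_t m = X.size(), n = X[0].size();
  std::vector<std::vector<double>> C(m, std::vector<double>(m, 0.0));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < m; ++j)
      for (std::size_t k = 0; k < n; ++k) C[i][j] += X[i][k] * X[j][k];
  // Power iteration for the dominant eigenvector a of C.
  std::vector<double> a(m, 1.0);
  for (int it = 0; it < 500; ++it) {
    std::vector<double> b(m, 0.0);
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t j = 0; j < m; ++j) b[i] += C[i][j] * a[j];
    double norm = 0.0;
    for (double v : b) norm += v * v;
    norm = std::sqrt(norm);
    for (std::size_t i = 0; i < m; ++i) a[i] = b[i] / norm;
  }
  // First mode phi = sum_i a_i * x_i, normalized.
  std::vector<double> phi(n, 0.0);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t k = 0; k < n; ++k) phi[k] += a[i] * X[i][k];
  double norm = 0.0;
  for (double v : phi) norm += v * v;
  norm = std::sqrt(norm);
  for (double& v : phi) v /= norm;
  return phi;
}
```

For thermal fields with a moving source, the mode count needed for a given energy fraction is the key diagnostic: slow decay of the POD spectrum signals that a linear ROM is insufficient and a neural operator is worth the cost.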

In-situ training and active learning (new domain)

Online training during simulation:

  • Training surrogate concurrently with FEM solver
  • When to trigger training (error threshold, iteration count)
  • Memory-efficient data pipelines for long simulations
  • Checkpointing and resuming training state
  • Keywords: online surrogate training simulation, concurrent FEM machine learning, adaptive surrogate model training
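
A minimal sketch of the trigger logic, combining a rolling-error threshold with a fixed iteration cadence (the class name, window size, and thresholds are hypothetical placeholders):

```cpp
#include <cstddef>
#include <deque>
#include <numeric>

// Decide when to (re)train the surrogate during a running simulation:
// trigger on a rolling-mean error threshold OR a fixed iteration cadence,
// whichever fires first.
class TrainingTrigger {
public:
  TrainingTrigger(double err_tol, std::size_t every_n, std::size_t window)
      : err_tol_(err_tol), every_n_(every_n), window_(window) {}

  // Call once per solver step with the current surrogate-vs-FEM error.
  // Returns true when a training pass should be launched.
  bool step(double surrogate_error) {
    ++iter_;
    errors_.push_back(surrogate_error);
    if (errors_.size() > window_) errors_.pop_front();
    const double mean =
        std::accumulate(errors_.begin(), errors_.end(), 0.0) / errors_.size();
    if (mean > err_tol_ || iter_ % every_n_ == 0) {
      errors_.clear();  // reset statistics after a training event
      return true;
    }
    return false;
  }

private:
  double err_tol_;
  std::size_t every_n_, window_, iter_ = 0;
  std::deque<double> errors_;
};
```

The rolling window damps single-step spikes (e.g. right after the heat source crosses a refinement boundary) so training is triggered by sustained drift, not noise.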

Active learning for simulation data:

  • Uncertainty-driven sampling (where surrogate is uncertain)
  • Query-by-committee, expected improvement strategies
  • Adaptive mesh refinement guided by surrogate error
  • Multi-fidelity: coarse FEM for exploration, fine for refinement
  • Keywords: active learning FEM sampling, uncertainty-driven surrogate training, multi-fidelity surrogate additive manufacturing, Bayesian optimization simulation
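
Uncertainty-driven sampling can be as simple as querying where an ensemble of surrogates disagrees the most; a sketch of that variance criterion (the function name is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Uncertainty-driven sampling: return the index of the candidate point
// where an ensemble of surrogates has the largest prediction variance;
// the expensive FEM solve is run there next.
// preds[i][p]: prediction of ensemble member i at candidate point p.
// Assumes a non-empty ensemble over a non-empty candidate set.
std::size_t next_query(const std::vector<std::vector<double>>& preds) {
  const std::size_t n_pts = preds[0].size();
  std::size_t best = 0;
  double best_var = -1.0;
  for (std::size_t p = 0; p < n_pts; ++p) {
    double mean = 0.0;
    for (const auto& m : preds) mean += m[p];
    mean /= preds.size();
    double var = 0.0;
    for (const auto& m : preds) var += (m[p] - mean) * (m[p] - mean);
    var /= preds.size();
    if (var > best_var) { best_var = var; best = p; }
  }
  return best;
}
```

Query-by-committee and expected-improvement strategies replace the plain variance with a committee-disagreement or acquisition score, but the loop structure is the same.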

Error estimation and UQ:

  • Ensemble methods for epistemic uncertainty
  • Conformal prediction for surrogate confidence bounds
  • A posteriori error estimation for neural operators
  • Propagation of surrogate error to downstream predictions
  • Keywords: ensemble uncertainty neural operator, conformal prediction surrogate, a posteriori error neural network, surrogate error propagation
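
Split conformal prediction is particularly easy to bolt onto any surrogate: compute absolute residuals on a held-out calibration set and take an adjusted empirical quantile. A sketch of the bound computation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Split conformal prediction: from held-out calibration residuals
// |y - surrogate(x)|, the (1 - alpha) bound is the
// ceil((n + 1) * (1 - alpha))-th smallest residual. The interval
// surrogate(x) +/- bound then covers new points with probability
// >= 1 - alpha, assuming exchangeable calibration and test data.
double conformal_bound(std::vector<double> residuals, double alpha) {
  std::sort(residuals.begin(), residuals.end());
  const std::size_t n = residuals.size();
  std::size_t k =
      static_cast<std::size_t>(std::ceil((n + 1) * (1.0 - alpha)));
  if (k > n) k = n;          // degenerate small-n case
  return residuals[k - 1];   // k-th smallest (1-indexed)
}
```

The caveat for TMM surrogates is the exchangeability assumption: residuals from one tool path may not be exchangeable with those from a very different path, so calibration data must span the intended operating envelope.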

deal.ii + libtorch integration (new tools)

deal.ii patterns for coupled problems:

  • Matrix-free operator implementation for thermal problems
  • Block preconditioners for coupled systems
  • Adaptive mesh refinement with solution transfer
  • Parallel distributed computation patterns (step-40, step-42, step-55)
  • Keywords: deal.ii matrix-free thermal, deal.ii block preconditioner, deal.ii solution transfer AMR, deal.ii distributed parallel

libtorch embedded in C++ simulation:

  • Model loading and inference within solver loops
  • Gradient computation for in-situ training
  • Memory management for large tensor operations
  • ONNX export for framework interoperability
  • Keywords: libtorch C++ embedded inference, libtorch gradient computation training, ONNX C++ runtime simulation

Literature review plan

Phase 1: Metallurgical foundations (weeks 1-2)

Goal: Build domain knowledge in phase transformations and microstructure modeling

  1. JMAK and phase transformation kinetics:

    • Search: ("JMAK" OR "Johnson-Mehl-Avrami-Kolmogorov") AND ("non-isothermal" OR "continuous cooling") AND ("welding" OR "additive manufacturing")
    • Focus: How JMAK is adapted for non-isothermal thermal cycles, limitations, extensions
    • Output: Summary of JMAK variants and their applicability to WAAM thermal histories
  2. CALPHAD-integrated simulations:

    • Search: ("CALPHAD" OR "Thermo-Calc") AND ("finite element" OR FEM) AND ("welding" OR "additive manufacturing")
    • Focus: How thermodynamic databases couple to FEM, computational cost, accuracy gains
    • Output: Map of CALPHAD-FEM coupling strategies and software implementations
  3. Transformation-induced plasticity:

    • Search: ("TRIP" OR "transformation induced plasticity") AND ("welding" OR "additive manufacturing") AND ("finite element" OR FEM)
    • Focus: When TRIP matters (material systems, thermal cycles), model formulations
    • Output: Decision tree for when to include TRIP in TMM models

Phase 2: ML surrogates for physics simulation (weeks 3-4)

Goal: Understand the landscape of neural operators and physics-informed ML

  1. Neural operator architectures:

    • Search: ("Fourier neural operator" OR DeepONet) AND ("PDE" OR "partial differential equation") AND ("heat transfer" OR "solid mechanics")
    • Focus: Architecture choices for spatio-temporal problems, generalization across geometries
    • Output: Comparison table of neural operator architectures for thermal/mechanical PDEs
  2. Physics-informed neural networks:

    • Search: ("physics-informed neural network" OR PINN) AND ("transient" OR "time-dependent") AND ("heat equation" OR "heat conduction")
    • Focus: Training stability for transient problems, handling moving sources, scalability
    • Output: PINN limitations and success factors for transient thermal problems
  3. Reduced-order modeling with ML:

    • Search: ("proper orthogonal decomposition" OR POD) AND ("neural network" OR "machine learning") AND ("finite element" OR FEM)
    • Focus: POD-NN hybrids, when ROM is sufficient vs. when neural operators needed
    • Output: Decision framework for choosing ROM vs. neural operator approach

Phase 3: In-situ training and active learning (week 5)

Goal: Understand how to train surrogates during simulation, not post-hoc

  1. Online/concurrent training:

    • Search: ("online training" OR "in-situ training" OR "on-the-fly") AND ("surrogate model" OR "reduced order") AND ("simulation" OR "FEM")
    • Focus: Training triggers, data selection strategies, convergence criteria
    • Output: Taxonomy of in-situ training approaches and their computational overhead
  2. Active learning for simulation:

    • Search: ("active learning" OR "adaptive sampling") AND ("surrogate" OR "emulator") AND ("computational model" OR "simulation")
    • Focus: Query strategies, uncertainty metrics, multi-fidelity approaches
    • Output: Active learning strategy recommendations for FEM surrogate training
  3. Error estimation and UQ:

    • Search: ("uncertainty quantification" OR UQ) AND ("neural operator" OR "surrogate model") AND ("PDE" OR "partial differential equation")
    • Focus: How to quantify surrogate confidence, error propagation to predictions
    • Output: UQ methods suitable for TMM surrogate predictions

Phase 4: deal.ii + libtorch implementation patterns (week 6)

Goal: Understand how to integrate ML into deal.ii-based simulations

  1. deal.ii for coupled thermal problems:

    • Search: deal.ii ("thermal" OR "heat transfer") AND ("matrix-free" OR "adaptive mesh")
    • Focus: Matrix-free implementations, AMR for moving heat sources, parallel scaling
    • Output: deal.ii tutorial/examples most relevant to TMM implementation
  2. libtorch in C++ simulation codes:

    • Search: ("libtorch" OR "PyTorch C++") AND ("simulation" OR "solver" OR "FEM") AND ("embedded" OR "integrated")
    • Focus: Performance overhead, memory management, gradient computation patterns
    • Output: Reference implementations of libtorch embedded in scientific codes
  3. Hybrid FEM-ML architectures:

    • Search: ("hybrid" OR "coupled") AND ("machine learning" OR "neural network") AND ("finite element" OR FEM) AND ("closure" OR "surrogate")
    • Focus: Where to insert ML (constitutive laws, boundary conditions, full field prediction)
    • Output: Architecture options for TMM surrogate with clear tradeoffs

Phase 5: Gap analysis (week 7)

Goal: Synthesize findings into specific research gap identification

  1. TMM + ML intersection matrix:

    • Rows: thermal-only, thermo-mechanical, thermo-metallurgical, full TMM
    • Columns: analytical, FEM, ROM, neural operator, hybrid FEM-ML
    • Mark existing work, identify empty cells as gaps
  2. In-situ training gap:

    • How many works train surrogates during simulation vs. post-hoc?
    • What problems have been solved with in-situ training?
    • What is missing for TMM problems specifically?
  3. Computational efficiency gap:

    • Tabulate: model fidelity vs. computational cost for existing TMM approaches
    • Where can surrogates provide 10x-100x speedup without accuracy loss?
    • What is the bottleneck in current TMM simulations?
  4. Open-source implementation gap:

    • How many TMM + ML works are open-source?
    • What frameworks are used (commercial vs. open-source)?
    • Where is there opportunity for a deal.ii + libtorch reference implementation?

Future exploration: complete fast TMM prediction model

Architecture options for chained prediction

Option 1: GNN → PINN chain

The GNN predicts temperature and strain histories from the tool path and feeds them into a PINN for metallurgical prediction. Feasible, but error propagation is the critical flaw: metallurgical kinetics are non-linear in temperature and strain, so upstream errors compound non-linearly through the JMAK/KM equations. The PINN enforces physics on the metallurgical equations but cannot correct approximate inputs.

Requires an autoregressive or recurrent GNN (GraphGRU, EvolveGCN) to capture history dependence; a static GNN is insufficient.

Option 2: Single PINN for all three physics

Low feasibility. Three coupled PDE systems in one loss function: heat equation (parabolic, moving source), momentum balance (elliptic with plasticity), kinetics ODEs (stiff, path-dependent). Training instability from different equation scales, stiffness, and convergence rates. PINNs already struggle with multi-term loss balancing for single PDEs.

Path dependence requires either augmenting input space with time (4D spatio-temporal) or recurrent PINN variants, both unproven at this complexity.

Option 3: Neural operator chain (recommended)

Three-stage architecture:

  1. Graph Neural Operator for thermal field:

    • Input: tool path + process parameters
    • Output: full temperature history T(x,t)
    • Trained on FEM thermal solutions
  2. Graph Neural Operator for mechanical field:

    • Input: T(x,t) from stage 1 + tool path
    • Output: strain/stress history ε(x,t), σ(x,t)
    • Can be trained jointly with stage 1 to reduce error propagation
  3. Local kinetics solver (not PINN):

    • Input: T(x,t) and ε(x,t) histories at each material point
    • JMAK/KM equations are ODEs at each point, decoupled spatially once T and ε are known
    • Solve analytically or with small MLP approximating ODE integration
    • Computationally cheap, no PINN needed
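
A sketch of the stage-3 idea: integrate the JMAK rate form dX/dt = n k(T)(1 − X)(−ln(1 − X))^((n−1)/n) pointwise along a prescribed temperature history, as stage 1 of the chain would supply it. The Arrhenius parameters here are illustrative, not fitted to any alloy:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Local JMAK kinetics at one material point: integrate the rate form
//   dX/dt = n * k(T) * (1 - X) * (-ln(1 - X))^((n - 1) / n)
// with forward Euler along a given temperature history (kelvin).
// k(T) = k0 * exp(-Q_over_R / T) is an Arrhenius law. For constant T
// this reproduces the classical X(t) = 1 - exp(-(k t)^n).
double jmak_nonisothermal(const std::vector<double>& T_history, double dt,
                          double n, double k0, double Q_over_R) {
  double X = 1e-8;  // small seed avoids the singularity at X = 0
  for (double T : T_history) {
    const double k = k0 * std::exp(-Q_over_R / T);
    const double rate =
        n * k * (1.0 - X) * std::pow(-std::log(1.0 - X), (n - 1.0) / n);
    X = std::min(X + rate * dt, 1.0 - 1e-12);
  }
  return X;
}
```

Because the ODE is local, this loop is embarrassingly parallel over material points, which is what keeps stage 3 cheap relative to the operator stages.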

Why neural operator chain beats PINNs

  • Neural operators learn the mapping directly, no loss balancing across physics
  • Path dependence handled naturally by learning full spatio-temporal operator
  • Metallurgical step is local (ODE per material point), no PDE-constrained network needed
  • Training is sequential and stable

Error propagation mitigation

  • Joint training: Train thermal and mechanical operators with combined loss including downstream metallurgical error, not just field-level MSE
  • Uncertainty quantification: Use ensemble or Bayesian GNO to propagate uncertainty bounds through the chain
  • Multi-fidelity correction: Train small correction network on high-fidelity FEM data that adjusts chain output

Literature status

  • Neural operators for thermal fields in AM: demonstrated (FNO for LPBF thermal, GNO for WAAM)
  • Neural operators for thermo-mechanical: emerging but feasible
  • PINNs for coupled TMM: no successful demonstrations of full coupling; published work covers only simplified single-physics cases
  • In-situ training of chained operators: unexplored, this is the gap