Thermo-Mechanical-Metallurgical Surrogates
A proof-of-concept Thermo-Mechanical-Metallurgical (TMM) machine learning model
using deal.ii and libtorch. The implementation will be written entirely in C++
as a fully automated pipeline that solves the TMM models and trains a neural
network in situ on their results.
The initial models will be based on the JMAK implementation from kmC-FEA.
Topics and avenues to explore
Metallurgical modeling (new domain)
Phase transformation kinetics:
- JMAK theory extensions beyond isothermal conditions
- Koistinen-Marburger model for martensitic transformations
- Continuous cooling transformation (CCT) diagram integration
- Multi-phase field approaches vs. empirical kinetics
- Keywords: non-isothermal JMAK kinetics, CCT diagram FEM integration, Koistinen-Marburger welding, phase field additive manufacturing
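The two kinetic laws named above fit in a few lines of C++. This is a sketch, not calibrated material data: the rate constant `k`, Avrami exponent `n`, and the Koistinen-Marburger parameters `Ms` and `alpha` are illustrative placeholders. The additivity (fictitious-time) rule is the standard way to push isothermal JMAK through a discretized non-isothermal history:

```cpp
#include <cmath>

// Isothermal JMAK: transformed fraction after time t at fixed temperature,
// with rate constant k and Avrami exponent n (material/temperature specific).
double jmak_isothermal(double k, double n, double t) {
    return 1.0 - std::exp(-k * std::pow(t, n));
}

// One non-isothermal JMAK step via the additivity (fictitious time) rule:
// map the current fraction X to the equivalent time at the new temperature's
// rate constant k, then advance by dt.
double jmak_step(double X, double k, double n, double dt) {
    const double t_fict = std::pow(-std::log(1.0 - X) / k, 1.0 / n);
    return 1.0 - std::exp(-k * std::pow(t_fict + dt, n));
}

// Koistinen-Marburger martensite fraction on cooling below Ms.
double km_fraction(double T, double Ms, double alpha) {
    if (T >= Ms) return 0.0;
    return 1.0 - std::exp(-alpha * (Ms - T));
}
```

Note that at constant temperature `jmak_step` reproduces `jmak_isothermal` exactly; the additivity rule only matters once `k` varies along the thermal cycle.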
Microstructure evolution:
- Grain growth models (Monte Carlo Potts, cellular automata)
- Texture evolution during WAAM thermal cycles
- Precipitation kinetics (Kampmann-Wagner numerical model)
- CALPHAD-coupled simulations (Thermo-Calc, Pandat integration)
- Keywords: Monte Carlo Potts grain growth welding, cellular automata microstructure AM, Kampmann-Wagner precipitation, CALPHAD FEM coupling
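For orientation, a minimal 2D Q-state Potts step (Metropolis dynamics on a periodic grid) shows the core of the Monte Carlo grain-growth approach. Grid size, `Q`, and the RNG seed are arbitrary; real grain-growth studies layer boundary mobility and anisotropy on top of this:

```cpp
#include <cmath>
#include <random>
#include <vector>

// Minimal 2D Q-state Potts model: each site carries a grain label in [0, Q),
// and the boundary energy is the count of unlike nearest-neighbor pairs.
struct PottsGrid {
    int L, Q;
    std::vector<int> spin;  // L*L grain labels
    std::mt19937 rng{42};

    PottsGrid(int L_, int Q_) : L(L_), Q(Q_), spin(L_ * L_) {
        std::uniform_int_distribution<int> d(0, Q - 1);
        for (auto &s : spin) s = d(rng);  // random initial microstructure
    }

    // Local energy of site (i,j) if it held label s: unlike 4-neighbors.
    int site_energy(int i, int j, int s) const {
        const int di[4] = {1, -1, 0, 0}, dj[4] = {0, 0, 1, -1};
        int e = 0;
        for (int k = 0; k < 4; ++k) {
            int ni = (i + di[k] + L) % L, nj = (j + dj[k] + L) % L;
            if (spin[ni * L + nj] != s) ++e;
        }
        return e;
    }

    // One Metropolis attempt at temperature kT: propose a new label and
    // accept if it lowers the boundary energy (or thermally at kT > 0).
    void step(double kT) {
        std::uniform_int_distribution<int> site(0, L - 1), q(0, Q - 1);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        int i = site(rng), j = site(rng), s_new = q(rng);
        int dE = site_energy(i, j, s_new) - site_energy(i, j, spin[i * L + j]);
        if (dE <= 0 || (kT > 0 && u(rng) < std::exp(-dE / kT)))
            spin[i * L + j] = s_new;
    }
};
```

At `kT = 0` only energy-lowering flips are accepted, so the total boundary energy decreases monotonically, which is the coarsening behavior the grain-growth literature builds on.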
Transformation-induced effects:
- Transformation-induced plasticity (TRIP) models
- Volume change from phase transformations and stress coupling
- Greenwood-Johnson mechanism for transformation plasticity
- Keywords: TRIP model welding FEM, Greenwood-Johnson transformation plasticity, phase transformation volume change stress
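A scalar, uniaxial sketch of the Greenwood-Johnson mechanism, using the common estimate K = 5Δε_V/(6σ_y) and the Desalos saturation function φ(X) = X(2 − X); the constants are placeholders, and a real model would work with the deviatoric stress tensor:

```cpp
#include <cmath>

// Greenwood-Johnson transformation-plasticity constant: dV is the
// transformation volume strain, sigma_y the yield stress of the weaker phase.
double gj_constant(double dV, double sigma_y) {
    return 5.0 * dV / (6.0 * sigma_y);
}

// Incremental TRIP strain for a phase-fraction increment dX under applied
// stress sigma, with the Desalos function phi(X) = X*(2 - X), so that
// dphi/dX = 2*(1 - X). Integrating dX from 0 to 1 gives eps_tp = K * sigma.
double trip_increment(double K, double sigma, double X, double dX) {
    return K * sigma * 2.0 * (1.0 - X) * dX;
}
```

The saturation function is why TRIP strain accumulates fastest early in the transformation: `dphi/dX` starts at 2 and drops to 0 as X → 1.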
ML surrogate modeling (new domain)
Neural operators for PDE surrogates:
- Fourier Neural Operators (FNO) for spatio-temporal fields
- DeepONet architecture for operator learning
- Graph Neural Operators for unstructured meshes
- Low-rank factorization for high-dimensional outputs
- Keywords: Fourier neural operator PDE, DeepONet surrogate model, graph neural operator FEM mesh, neural operator heat equation
Physics-informed approaches:
- PINNs for transient heat conduction with moving sources
- Conservative neural networks (energy/mass conservation)
- Hard constraint enforcement vs. penalty methods
- Hybrid FEM-ML: ML for closure terms, FEM for conservation
- Keywords: PINN transient heat conduction, conservation neural network, hard constraint PINN, hybrid FEM machine learning
Reduced-order modeling:
- Proper Orthogonal Decomposition (POD) for thermal fields
- Autoencoder-based dimensionality reduction
- POD-Galerkin projection with neural network closure
- Dynamic mode decomposition (DMD) for transient behavior
- Keywords: POD thermal field reduction, autoencoder FEM surrogate, POD neural network closure, dynamic mode decomposition transient
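The method of snapshots behind POD fits in plain C++ without a linear-algebra library. This sketch extracts only the first mode via power iteration on the snapshot correlation matrix; a real ROM would keep several modes and use a proper eigensolver:

```cpp
#include <cmath>
#include <vector>

// Method of snapshots: given n snapshot columns of length m, form the
// n x n correlation matrix C = X^T X, find its dominant eigenpair by
// power iteration, and return the first POD mode X * v / sqrt(lambda).
std::vector<double> first_pod_mode(const std::vector<std::vector<double>> &snaps) {
    const int n = static_cast<int>(snaps.size());
    const int m = static_cast<int>(snaps[0].size());
    // Correlation matrix C[i][j] = <snap_i, snap_j> (n is small, m is large).
    std::vector<std::vector<double>> C(n, std::vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < m; ++k)
                C[i][j] += snaps[i][k] * snaps[j][k];
    // Power iteration for the dominant eigenvector of C.
    std::vector<double> v(n, 1.0), w(n);
    double lambda = 0.0;
    for (int it = 0; it < 200; ++it) {
        for (int i = 0; i < n; ++i) {
            w[i] = 0.0;
            for (int j = 0; j < n; ++j) w[i] += C[i][j] * v[j];
        }
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        for (int i = 0; i < n; ++i) v[i] = w[i] / norm;
        lambda = norm;  // ||C v|| converges to the dominant eigenvalue
    }
    // Lift back to physical space: mode = X v / sqrt(lambda).
    std::vector<double> mode(m, 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < m; ++k)
            mode[k] += v[i] * snaps[i][k];
    for (double &x : mode) x /= std::sqrt(lambda);
    return mode;
}
```

Working with the n × n correlation matrix instead of the m × m covariance is what makes POD tractable for FEM fields, where m (degrees of freedom) dwarfs n (snapshots).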
In-situ training and active learning (new domain)
Online training during simulation:
- Training surrogate concurrently with FEM solver
- When to trigger training (error threshold, iteration count)
- Memory-efficient data pipelines for long simulations
- Checkpointing and resuming training state
- Keywords: online surrogate training simulation, concurrent FEM machine learning, adaptive surrogate model training
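One possible trigger policy for "when to train": combine an error threshold with an iteration cooldown. The struct, names, and default values below are our own illustration, not an existing API:

```cpp
// Hypothetical retraining trigger for in-situ surrogate training: fire when
// the rolling surrogate error exceeds a threshold, but at most once every
// `cooldown` solver iterations. Defaults are arbitrary placeholders.
struct RetrainTrigger {
    double err_threshold = 0.05;  // relative error above which retraining fires
    int cooldown = 100;           // minimum iterations between retrainings
    int last_trained = -1000000;  // iteration of the last retraining
    double rolling_err = 0.0;     // exponential moving average of the error
    double decay = 0.9;           // EMA weight on the previous estimate

    // Call once per solver iteration with the current surrogate-vs-FEM error;
    // returns true when the caller should retrain the surrogate now.
    bool update(int iteration, double current_err) {
        rolling_err = decay * rolling_err + (1.0 - decay) * current_err;
        if (rolling_err > err_threshold && iteration - last_trained >= cooldown) {
            last_trained = iteration;
            rolling_err = 0.0;  // reset after scheduling a retraining
            return true;
        }
        return false;
    }
};
```

The moving average smooths single-step error spikes, and the cooldown keeps training cost bounded during long simulations; both knobs would need tuning against the actual solver.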
Active learning for simulation data:
- Uncertainty-driven sampling (where surrogate is uncertain)
- Query-by-committee, expected improvement strategies
- Adaptive mesh refinement guided by surrogate error
- Multi-fidelity: coarse FEM for exploration, fine for refinement
- Keywords: active learning FEM sampling, uncertainty-driven surrogate training, multi-fidelity surrogate additive manufacturing, Bayesian optimization simulation
Error estimation and UQ:
- Ensemble methods for epistemic uncertainty
- Conformal prediction for surrogate confidence bounds
- A posteriori error estimation for neural operators
- Propagation of surrogate error to downstream predictions
- Keywords: ensemble uncertainty neural operator, conformal prediction surrogate, a posteriori error neural network, surrogate error propagation
deal.ii + libtorch integration (new tools)
deal.ii patterns for coupled problems:
- Matrix-free operator implementation for thermal problems
- Block preconditioners for coupled systems
- Adaptive mesh refinement with solution transfer
- Parallel distributed computation patterns (step-40, step-42, step-55)
- Keywords: deal.ii matrix-free thermal, deal.ii block preconditioner, deal.ii solution transfer AMR, deal.ii distributed parallel
libtorch embedded in C++ simulation:
- Model loading and inference within solver loops
- Gradient computation for in-situ training
- Memory management for large tensor operations
- ONNX export for framework interoperability
- Keywords: libtorch C++ embedded inference, libtorch gradient computation training, ONNX C++ runtime simulation
Literature review plan
Phase 1: Metallurgical foundations (weeks 1-2)
Goal: Build domain knowledge in phase transformations and microstructure modeling
- JMAK and phase transformation kinetics:
  - Search: ("JMAK" OR "Johnson-Mehl-Avrami-Kolmogorov") AND ("non-isothermal" OR "continuous cooling") AND ("welding" OR "additive manufacturing")
  - Focus: How JMAK is adapted for non-isothermal thermal cycles, limitations, extensions
  - Output: Summary of JMAK variants and their applicability to WAAM thermal histories
- CALPHAD-integrated simulations:
  - Search: ("CALPHAD" OR "Thermo-Calc") AND ("finite element" OR FEM) AND ("welding" OR "additive manufacturing")
  - Focus: How thermodynamic databases couple to FEM, computational cost, accuracy gains
  - Output: Map of CALPHAD-FEM coupling strategies and software implementations
- Transformation-induced plasticity:
  - Search: ("TRIP" OR "transformation induced plasticity") AND ("welding" OR "additive manufacturing") AND ("finite element" OR FEM)
  - Focus: When TRIP matters (material systems, thermal cycles), model formulations
  - Output: Decision tree for when to include TRIP in TMM models
Phase 2: ML surrogates for physics simulation (weeks 3-4)
Goal: Understand the landscape of neural operators and physics-informed ML
- Neural operator architectures:
  - Search: ("Fourier neural operator" OR DeepONet) AND ("PDE" OR "partial differential equation") AND ("heat transfer" OR "solid mechanics")
  - Focus: Architecture choices for spatio-temporal problems, generalization across geometries
  - Output: Comparison table of neural operator architectures for thermal/mechanical PDEs
- Physics-informed neural networks:
  - Search: ("physics-informed neural network" OR PINN) AND ("transient" OR "time-dependent") AND ("heat equation" OR "heat conduction")
  - Focus: Training stability for transient problems, handling moving sources, scalability
  - Output: PINN limitations and success factors for transient thermal problems
- Reduced-order modeling with ML:
  - Search: ("proper orthogonal decomposition" OR POD) AND ("neural network" OR "machine learning") AND ("finite element" OR FEM)
  - Focus: POD-NN hybrids, when ROM is sufficient vs. when neural operators are needed
  - Output: Decision framework for choosing ROM vs. neural operator approach
Phase 3: In-situ training and active learning (week 5)
Goal: Understand how to train surrogates during simulation, not post-hoc
- Online/concurrent training:
  - Search: ("online training" OR "in-situ training" OR "on-the-fly") AND ("surrogate model" OR "reduced order") AND ("simulation" OR "FEM")
  - Focus: Training triggers, data selection strategies, convergence criteria
  - Output: Taxonomy of in-situ training approaches and their computational overhead
- Active learning for simulation:
  - Search: ("active learning" OR "adaptive sampling") AND ("surrogate" OR "emulator") AND ("computational model" OR "simulation")
  - Focus: Query strategies, uncertainty metrics, multi-fidelity approaches
  - Output: Active learning strategy recommendations for FEM surrogate training
- Error estimation and UQ:
  - Search: ("uncertainty quantification" OR UQ) AND ("neural operator" OR "surrogate model") AND ("PDE" OR "partial differential equation")
  - Focus: How to quantify surrogate confidence, error propagation to predictions
  - Output: UQ methods suitable for TMM surrogate predictions
Phase 4: deal.ii + libtorch implementation patterns (week 6)
Goal: Understand how to integrate ML into deal.ii-based simulations
- deal.ii for coupled thermal problems:
  - Search: deal.ii ("thermal" OR "heat transfer") AND ("matrix-free" OR "adaptive mesh")
  - Focus: Matrix-free implementations, AMR for moving heat sources, parallel scaling
  - Output: deal.ii tutorials/examples most relevant to TMM implementation
- libtorch in C++ simulation codes:
  - Search: ("libtorch" OR "PyTorch C++") AND ("simulation" OR "solver" OR "FEM") AND ("embedded" OR "integrated")
  - Focus: Performance overhead, memory management, gradient computation patterns
  - Output: Reference implementations of libtorch embedded in scientific codes
- Hybrid FEM-ML architectures:
  - Search: ("hybrid" OR "coupled") AND ("machine learning" OR "neural network") AND ("finite element" OR FEM) AND ("closure" OR "surrogate")
  - Focus: Where to insert ML (constitutive laws, boundary conditions, full-field prediction)
  - Output: Architecture options for TMM surrogate with clear tradeoffs
Phase 5: Gap analysis (week 7)
Goal: Synthesize findings into specific research gap identification
- TMM + ML intersection matrix:
  - Rows: thermal-only, thermo-mechanical, thermo-metallurgical, full TMM
  - Columns: analytical, FEM, ROM, neural operator, hybrid FEM-ML
  - Mark existing work; identify empty cells as gaps
- In-situ training gap:
  - How many works train surrogates during simulation vs. post-hoc?
  - What problems have been solved with in-situ training?
  - What is missing for TMM problems specifically?
- Computational efficiency gap:
  - Tabulate model fidelity vs. computational cost for existing TMM approaches
  - Where can surrogates provide 10x-100x speedup without accuracy loss?
  - What is the bottleneck in current TMM simulations?
- Open-source implementation gap:
  - How many TMM + ML works are open-source?
  - What frameworks are used (commercial vs. open-source)?
  - Where is there opportunity for a deal.ii + libtorch reference implementation?
Future exploration: complete fast TMM prediction model
Architecture options for chained prediction
Option 1: GNN → PINN chain
A GNN predicts temperature and strain histories from the tool path and feeds them into a PINN for metallurgical prediction. Feasible, but error propagation is the critical weakness: metallurgical kinetics are non-linear in temperature and strain, so upstream errors compound non-linearly through the JMAK/KM equations. The PINN enforces physics on the metallurgical equations but cannot correct approximate inputs.
Requires an autoregressive or recurrent GNN (e.g. GraphGRU, EvolveGCN) to capture history dependence; a static GNN is insufficient.
Option 2: Single PINN for all three physics
Low feasibility. Three coupled PDE systems in one loss function: heat equation (parabolic, moving source), momentum balance (elliptic with plasticity), kinetics ODEs (stiff, path-dependent). Training instability from different equation scales, stiffness, and convergence rates. PINNs already struggle with multi-term loss balancing for single PDEs.
Path dependence requires either augmenting input space with time (4D spatio-temporal) or recurrent PINN variants, both unproven at this complexity.
Option 3: Neural operator chain (recommended)
Three-stage architecture:
- Graph Neural Operator for thermal field:
  - Input: tool path + process parameters
  - Output: full temperature history T(x,t)
  - Trained on FEM thermal solutions
- Graph Neural Operator for mechanical field:
  - Input: T(x,t) from stage 1 + tool path
  - Output: strain/stress history ε(x,t), σ(x,t)
  - Can be trained jointly with stage 1 to reduce error propagation
- Local kinetics solver (not a PINN):
  - Input: T(x,t) and ε(x,t) histories at each material point
  - JMAK/KM equations are ODEs at each point, decoupled spatially once T and ε are known
  - Solve analytically or with a small MLP approximating the ODE integration
  - Computationally cheap; no PINN needed
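Stage 3 of the chain reduces to code like the following at each material point: the temperature history plays the role of the thermal operator's output, and the kinetics integrate independently per point. The Arrhenius parameters, Avrami exponent, and the single-transformation window are illustrative placeholders:

```cpp
#include <cmath>
#include <vector>

// Per-point kinetics driven by a precomputed temperature history (as the
// chain's thermal operator would provide), using the JMAK additivity rule
// with a temperature-dependent Arrhenius rate constant. All parameters
// below are placeholders, not calibrated material data.
double transformed_fraction(const std::vector<double> &T_history, double dt) {
    const double n = 2.0;                        // Avrami exponent
    const double k0 = 1.0e3, Q_over_R = 8.0e3;   // rate prefactor, activation [K]
    const double T_min = 800.0;                  // transformation active above this [K]
    double X = 0.0;
    for (double T : T_history) {
        if (T < T_min) continue;                 // no diffusional transformation
        const double k = k0 * std::exp(-Q_over_R / T);
        // Additivity rule: fictitious time at the current k, then advance by dt.
        const double t_fict =
            (X > 0.0) ? std::pow(-std::log(1.0 - X) / k, 1.0 / n) : 0.0;
        X = 1.0 - std::exp(-k * std::pow(t_fict + dt, n));
    }
    return X;
}
```

Since each point only consumes its own T(x,t) (and, in the full model, ε(x,t)), this stage parallelizes trivially over the mesh, which is the argument above for not wrapping it in a PINN.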
Why the neural operator chain beats PINNs
- Neural operators learn the mapping directly, no loss balancing across physics
- Path dependence handled naturally by learning full spatio-temporal operator
- Metallurgical step is local (ODE per material point), no PDE-constrained network needed
- Training is sequential and stable
Error propagation mitigation
- Joint training: Train thermal and mechanical operators with combined loss including downstream metallurgical error, not just field-level MSE
- Uncertainty quantification: Use ensemble or Bayesian GNO to propagate uncertainty bounds through the chain
- Multi-fidelity correction: Train small correction network on high-fidelity FEM data that adjusts chain output
Literature status
- Neural operators for thermal fields in AM: demonstrated (FNO for LPBF thermal, GNO for WAAM)
- Neural operators for thermo-mechanical: emerging but feasible
- PINNs for coupled TMM: no successful demonstrations for full coupling; works only for simplified single-physics cases
- In-situ training of chained operators: unexplored; this is the gap