Thermo-Mechanical-Metallurgical Surrogates

A proof-of-concept Thermo-Mechanical-Metallurgical (TMM) machine learning model using deal.ii and libtorch. The implementation will be entirely in C++: a fully automated pipeline that solves the models and trains a neural network in situ on the results.

The initial models will be based on the JMAK implementation from kmC-FEA.

Topics and avenues to explore

Metallurgical modeling (new domain)

Phase transformation kinetics:

  • JMAK theory extensions beyond isothermal conditions
  • Koistinen-Marburger model for martensitic transformations
  • Continuous cooling transformation (CCT) diagram integration
  • Multi-phase field approaches vs. empirical kinetics
  • Keywords: non-isothermal JMAK kinetics, CCT diagram FEM integration, Koistinen-Marburger welding, phase field additive manufacturing
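
As a concrete example of the kinetics involved, the Koistinen-Marburger model gives the martensite fraction as f = 1 − exp(−α(Ms − T)) below the martensite-start temperature Ms. A minimal C++ sketch (the α and Ms values are illustrative, not calibrated to a specific alloy):

```cpp
#include <cmath>

// Koistinen-Marburger martensite fraction below the martensite-start
// temperature Ms. alpha ~ 0.011 / K is a commonly quoted value for
// low-alloy steels; treat it as illustrative here.
double km_martensite_fraction(double T, double Ms, double alpha = 0.011) {
  if (T >= Ms) return 0.0;                 // no transformation above Ms
  return 1.0 - std::exp(-alpha * (Ms - T));
}
```

Because the fraction depends only on the current undercooling (Ms − T), not on the thermal path, KM is the athermal counterpart to the path-dependent JMAK kinetics above.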

Microstructure evolution:

  • Grain growth models (Monte Carlo Potts, cellular automata)
  • Texture evolution during WAAM thermal cycles
  • Precipitation kinetics (Kampmann-Wagner numerical model)
  • CALPHAD-coupled simulations (Thermo-Calc, Pandat integration)
  • Keywords: Monte Carlo Potts grain growth welding, cellular automata microstructure AM, Kampmann-Wagner precipitation, CALPHAD FEM coupling

Transformation-induced effects:

  • Transformation-induced plasticity (TRIP) models
  • Volume change from phase transformations and stress coupling
  • Greenwood-Johnson mechanism for transformation plasticity
  • Keywords: TRIP model welding FEM, Greenwood-Johnson transformation plasticity, phase transformation volume change stress

ML surrogate modeling (new domain)

Neural operators for PDE surrogates:

  • Fourier Neural Operators (FNO) for spatio-temporal fields
  • DeepONet architecture for operator learning
  • Graph Neural Operators for unstructured meshes
  • Low-rank factorization for high-dimensional outputs
  • Keywords: Fourier neural operator PDE, DeepONet surrogate model, graph neural operator FEM mesh, neural operator heat equation

Physics-informed approaches:

  • PINNs for transient heat conduction with moving sources
  • Conservative neural networks (energy/mass conservation)
  • Hard constraint enforcement vs. penalty methods
  • Hybrid FEM-ML: ML for closure terms, FEM for conservation
  • Keywords: PINN transient heat conduction, conservation neural network, hard constraint PINN, hybrid FEM machine learning

Reduced-order modeling:

  • Proper Orthogonal Decomposition (POD) for thermal fields
  • Autoencoder-based dimensionality reduction
  • POD-Galerkin projection with neural network closure
  • Dynamic mode decomposition (DMD) for transient behavior
  • Keywords: POD thermal field reduction, autoencoder FEM surrogate, POD neural network closure, dynamic mode decomposition transient
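
The method of snapshots makes POD cheap when snapshots are few: build the m × m snapshot correlation matrix and take its dominant eigenvector. A minimal sketch using plain power iteration (a production version would use LAPACK or deal.ii's linear algebra instead):

```cpp
#include <cmath>
#include <vector>

// Method-of-snapshots POD: given m snapshots of an n-dof field, form the
// m x m correlation matrix C_ij = x_i . x_j, extract its dominant
// eigenvector by power iteration, and return the corresponding linear
// combination of snapshots as the (normalized) first POD mode.
std::vector<double> leading_pod_mode(const std::vector<std::vector<double>>& X) {
  const std::size_t m = X.size(), n = X[0].size();
  std::vector<std::vector<double>> C(m, std::vector<double>(m, 0.0));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < m; ++j)
      for (std::size_t k = 0; k < n; ++k) C[i][j] += X[i][k] * X[j][k];
  // Power iteration for the dominant eigenvector a of C.
  std::vector<double> a(m, 1.0);
  for (int it = 0; it < 500; ++it) {
    std::vector<double> b(m, 0.0);
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t j = 0; j < m; ++j) b[i] += C[i][j] * a[j];
    double norm = 0.0;
    for (double v : b) norm += v * v;
    norm = std::sqrt(norm);
    for (std::size_t i = 0; i < m; ++i) a[i] = b[i] / norm;
  }
  // First mode phi = sum_i a_i * x_i, normalized.
  std::vector<double> phi(n, 0.0);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t k = 0; k < n; ++k) phi[k] += a[i] * X[i][k];
  double norm = 0.0;
  for (double v : phi) norm += v * v;
  norm = std::sqrt(norm);
  for (double& v : phi) v /= norm;
  return phi;
}
```

For thermal fields with a moving source, the mode count needed for a given energy fraction is the key diagnostic: slow decay of the POD spectrum signals that a linear ROM is insufficient and a neural operator is worth the cost.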

In-situ training and active learning (new domain)

Online training during simulation:

  • Training surrogate concurrently with FEM solver
  • When to trigger training (error threshold, iteration count)
  • Memory-efficient data pipelines for long simulations
  • Checkpointing and resuming training state
  • Keywords: online surrogate training simulation, concurrent FEM machine learning, adaptive surrogate model training
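
A minimal sketch of the trigger logic, combining a rolling-error threshold with a fixed iteration cadence (the class name, window size, and thresholds are hypothetical placeholders):

```cpp
#include <cstddef>
#include <deque>
#include <numeric>

// Decide when to (re)train the surrogate during a running simulation:
// trigger on a rolling-mean error threshold OR a fixed iteration cadence,
// whichever fires first.
class TrainingTrigger {
public:
  TrainingTrigger(double err_tol, std::size_t every_n, std::size_t window)
      : err_tol_(err_tol), every_n_(every_n), window_(window) {}

  // Call once per solver step with the current surrogate-vs-FEM error.
  // Returns true when a training pass should be launched.
  bool step(double surrogate_error) {
    ++iter_;
    errors_.push_back(surrogate_error);
    if (errors_.size() > window_) errors_.pop_front();
    const double mean =
        std::accumulate(errors_.begin(), errors_.end(), 0.0) / errors_.size();
    if (mean > err_tol_ || iter_ % every_n_ == 0) {
      errors_.clear();  // reset statistics after a training event
      return true;
    }
    return false;
  }

private:
  double err_tol_;
  std::size_t every_n_, window_, iter_ = 0;
  std::deque<double> errors_;
};
```

The rolling window damps single-step spikes (e.g. right after the heat source crosses a refinement boundary) so training is triggered by sustained drift, not noise.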

Active learning for simulation data:

  • Uncertainty-driven sampling (where surrogate is uncertain)
  • Query-by-committee, expected improvement strategies
  • Adaptive mesh refinement guided by surrogate error
  • Multi-fidelity: coarse FEM for exploration, fine for refinement
  • Keywords: active learning FEM sampling, uncertainty-driven surrogate training, multi-fidelity surrogate additive manufacturing, Bayesian optimization simulation
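
Uncertainty-driven sampling can be as simple as querying where an ensemble of surrogates disagrees the most; a sketch of that variance criterion (the function name is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Uncertainty-driven sampling: return the index of the candidate point
// where an ensemble of surrogates has the largest prediction variance;
// the expensive FEM solve is run there next.
// preds[i][p]: prediction of ensemble member i at candidate point p.
// Assumes a non-empty ensemble over a non-empty candidate set.
std::size_t next_query(const std::vector<std::vector<double>>& preds) {
  const std::size_t n_pts = preds[0].size();
  std::size_t best = 0;
  double best_var = -1.0;
  for (std::size_t p = 0; p < n_pts; ++p) {
    double mean = 0.0;
    for (const auto& m : preds) mean += m[p];
    mean /= preds.size();
    double var = 0.0;
    for (const auto& m : preds) var += (m[p] - mean) * (m[p] - mean);
    var /= preds.size();
    if (var > best_var) { best_var = var; best = p; }
  }
  return best;
}
```

Query-by-committee and expected-improvement strategies replace the plain variance with a committee-disagreement or acquisition score, but the loop structure is the same.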

Error estimation and UQ:

  • Ensemble methods for epistemic uncertainty
  • Conformal prediction for surrogate confidence bounds
  • A posteriori error estimation for neural operators
  • Propagation of surrogate error to downstream predictions
  • Keywords: ensemble uncertainty neural operator, conformal prediction surrogate, a posteriori error neural network, surrogate error propagation
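
Split conformal prediction is particularly easy to bolt onto any surrogate: compute absolute residuals on a held-out calibration set and take an adjusted empirical quantile. A sketch of the bound computation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Split conformal prediction: from held-out calibration residuals
// |y - surrogate(x)|, the (1 - alpha) bound is the
// ceil((n + 1) * (1 - alpha))-th smallest residual. The interval
// surrogate(x) +/- bound then covers new points with probability
// >= 1 - alpha, assuming exchangeable calibration and test data.
double conformal_bound(std::vector<double> residuals, double alpha) {
  std::sort(residuals.begin(), residuals.end());
  const std::size_t n = residuals.size();
  std::size_t k =
      static_cast<std::size_t>(std::ceil((n + 1) * (1.0 - alpha)));
  if (k > n) k = n;          // degenerate small-n case
  return residuals[k - 1];   // k-th smallest (1-indexed)
}
```

The caveat for TMM surrogates is the exchangeability assumption: residuals from one tool path may not be exchangeable with those from a very different path, so calibration data must span the intended operating envelope.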

deal.ii + libtorch integration (new tools)

deal.ii patterns for coupled problems:

  • Matrix-free operator implementation for thermal problems
  • Block preconditioners for coupled systems
  • Adaptive mesh refinement with solution transfer
  • Parallel distributed computation patterns (step-40, step-42, step-55)
  • Keywords: deal.ii matrix-free thermal, deal.ii block preconditioner, deal.ii solution transfer AMR, deal.ii distributed parallel

libtorch embedded in C++ simulation:

  • Model loading and inference within solver loops
  • Gradient computation for in-situ training
  • Memory management for large tensor operations
  • ONNX export for framework interoperability
  • Keywords: libtorch C++ embedded inference, libtorch gradient computation training, ONNX C++ runtime simulation

Literature review plan

Phase 1: Metallurgical foundations (weeks 1-2)

Goal: Build domain knowledge in phase transformations and microstructure modeling

  1. JMAK and phase transformation kinetics:

    • Search: ("JMAK" OR "Johnson-Mehl-Avrami-Kolmogorov") AND ("non-isothermal" OR "continuous cooling") AND ("welding" OR "additive manufacturing")
    • Focus: How JMAK is adapted for non-isothermal thermal cycles, limitations, extensions
    • Output: Summary of JMAK variants and their applicability to WAAM thermal histories
  2. CALPHAD-integrated simulations:

    • Search: ("CALPHAD" OR "Thermo-Calc") AND ("finite element" OR FEM) AND ("welding" OR "additive manufacturing")
    • Focus: How thermodynamic databases couple to FEM, computational cost, accuracy gains
    • Output: Map of CALPHAD-FEM coupling strategies and software implementations
  3. Transformation-induced plasticity:

    • Search: ("TRIP" OR "transformation induced plasticity") AND ("welding" OR "additive manufacturing") AND ("finite element" OR FEM)
    • Focus: When TRIP matters (material systems, thermal cycles), model formulations
    • Output: Decision tree for when to include TRIP in TMM models

Phase 2: ML surrogates for physics simulation (weeks 3-4)

Goal: Understand the landscape of neural operators and physics-informed ML

  1. Neural operator architectures:

    • Search: ("Fourier neural operator" OR DeepONet) AND ("PDE" OR "partial differential equation") AND ("heat transfer" OR "solid mechanics")
    • Focus: Architecture choices for spatio-temporal problems, generalization across geometries
    • Output: Comparison table of neural operator architectures for thermal/mechanical PDEs
  2. Physics-informed neural networks:

    • Search: ("physics-informed neural network" OR PINN) AND ("transient" OR "time-dependent") AND ("heat equation" OR "heat conduction")
    • Focus: Training stability for transient problems, handling moving sources, scalability
    • Output: PINN limitations and success factors for transient thermal problems
  3. Reduced-order modeling with ML:

    • Search: ("proper orthogonal decomposition" OR POD) AND ("neural network" OR "machine learning") AND ("finite element" OR FEM)
    • Focus: POD-NN hybrids, when ROM is sufficient vs. when neural operators needed
    • Output: Decision framework for choosing ROM vs. neural operator approach

Phase 3: In-situ training and active learning (week 5)

Goal: Understand how to train surrogates during simulation, not post-hoc

  1. Online/concurrent training:

    • Search: ("online training" OR "in-situ training" OR "on-the-fly") AND ("surrogate model" OR "reduced order") AND ("simulation" OR "FEM")
    • Focus: Training triggers, data selection strategies, convergence criteria
    • Output: Taxonomy of in-situ training approaches and their computational overhead
  2. Active learning for simulation:

    • Search: ("active learning" OR "adaptive sampling") AND ("surrogate" OR "emulator") AND ("computational model" OR "simulation")
    • Focus: Query strategies, uncertainty metrics, multi-fidelity approaches
    • Output: Active learning strategy recommendations for FEM surrogate training
  3. Error estimation and UQ:

    • Search: ("uncertainty quantification" OR UQ) AND ("neural operator" OR "surrogate model") AND ("PDE" OR "partial differential equation")
    • Focus: How to quantify surrogate confidence, error propagation to predictions
    • Output: UQ methods suitable for TMM surrogate predictions

Phase 4: deal.ii + libtorch implementation patterns (week 6)

Goal: Understand how to integrate ML into deal.ii-based simulations

  1. deal.ii for coupled thermal problems:

    • Search: deal.ii ("thermal" OR "heat transfer") AND ("matrix-free" OR "adaptive mesh")
    • Focus: Matrix-free implementations, AMR for moving heat sources, parallel scaling
    • Output: deal.ii tutorial/examples most relevant to TMM implementation
  2. libtorch in C++ simulation codes:

    • Search: ("libtorch" OR "PyTorch C++") AND ("simulation" OR "solver" OR "FEM") AND ("embedded" OR "integrated")
    • Focus: Performance overhead, memory management, gradient computation patterns
    • Output: Reference implementations of libtorch embedded in scientific codes
  3. Hybrid FEM-ML architectures:

    • Search: ("hybrid" OR "coupled") AND ("machine learning" OR "neural network") AND ("finite element" OR FEM) AND ("closure" OR "surrogate")
    • Focus: Where to insert ML (constitutive laws, boundary conditions, full field prediction)
    • Output: Architecture options for TMM surrogate with clear tradeoffs

Phase 5: Gap analysis (week 7)

Goal: Synthesize findings into specific research gap identification

  1. TMM + ML intersection matrix:

    • Rows: thermal-only, thermo-mechanical, thermo-metallurgical, full TMM
    • Columns: analytical, FEM, ROM, neural operator, hybrid FEM-ML
    • Mark existing work, identify empty cells as gaps
  2. In-situ training gap:

    • How many works train surrogates during simulation vs. post-hoc?
    • What problems have been solved with in-situ training?
    • What is missing for TMM problems specifically?
  3. Computational efficiency gap:

    • Tabulate: model fidelity vs. computational cost for existing TMM approaches
    • Where can surrogates provide 10x-100x speedup without accuracy loss?
    • What is the bottleneck in current TMM simulations?
  4. Open-source implementation gap:

    • How many TMM + ML works are open-source?
    • What frameworks are used (commercial vs. open-source)?
    • Where is there opportunity for a deal.ii + libtorch reference implementation?

Future exploration: complete fast TMM prediction model

Architecture options for chained prediction

Option 1: GNN → PINN chain

The GNN predicts temperature and strain histories from the tool path and feeds them into a PINN for metallurgical prediction. Feasible, but error propagation is the critical flaw: metallurgical kinetics are non-linear in temperature and strain, so upstream errors compound non-linearly through the JMAK/KM equations. The PINN enforces physics on the metallurgical equations but cannot correct approximate inputs.

Requires an autoregressive or recurrent GNN (GraphGRU, EvolveGCN) to capture history dependence; a static GNN is insufficient.

Option 2: Single PINN for all three physics

Low feasibility. Three coupled PDE systems in one loss function: heat equation (parabolic, moving source), momentum balance (elliptic with plasticity), kinetics ODEs (stiff, path-dependent). Training instability from different equation scales, stiffness, and convergence rates. PINNs already struggle with multi-term loss balancing for single PDEs.

Path dependence requires either augmenting input space with time (4D spatio-temporal) or recurrent PINN variants, both unproven at this complexity.

Option 3: Neural operator chain (recommended)

Three-stage architecture:

  1. Graph Neural Operator for thermal field:

    • Input: tool path + process parameters
    • Output: full temperature history T(x,t)
    • Trained on FEM thermal solutions
  2. Graph Neural Operator for mechanical field:

    • Input: T(x,t) from stage 1 + tool path
    • Output: strain/stress history ε(x,t), σ(x,t)
    • Can be trained jointly with stage 1 to reduce error propagation
  3. Local kinetics solver (not PINN):

    • Input: T(x,t) and ε(x,t) histories at each material point
    • JMAK/KM equations are ODEs at each point, decoupled spatially once T and ε are known
    • Solve analytically or with small MLP approximating ODE integration
    • Computationally cheap, no PINN needed
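
A sketch of the stage-3 idea: integrate the JMAK rate form dX/dt = n k(T)(1 − X)(−ln(1 − X))^((n−1)/n) pointwise along a prescribed temperature history, as stage 1 of the chain would supply it. The Arrhenius parameters here are illustrative, not fitted to any alloy:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Local JMAK kinetics at one material point: integrate the rate form
//   dX/dt = n * k(T) * (1 - X) * (-ln(1 - X))^((n - 1) / n)
// with forward Euler along a given temperature history (kelvin).
// k(T) = k0 * exp(-Q_over_R / T) is an Arrhenius law. For constant T
// this reproduces the classical X(t) = 1 - exp(-(k t)^n).
double jmak_nonisothermal(const std::vector<double>& T_history, double dt,
                          double n, double k0, double Q_over_R) {
  double X = 1e-8;  // small seed avoids the singularity at X = 0
  for (double T : T_history) {
    const double k = k0 * std::exp(-Q_over_R / T);
    const double rate =
        n * k * (1.0 - X) * std::pow(-std::log(1.0 - X), (n - 1.0) / n);
    X = std::min(X + rate * dt, 1.0 - 1e-12);
  }
  return X;
}
```

Because the ODE is local, this loop is embarrassingly parallel over material points, which is what keeps stage 3 cheap relative to the operator stages.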

Why neural operator chain beats PINNs

  • Neural operators learn the mapping directly, no loss balancing across physics
  • Path dependence handled naturally by learning full spatio-temporal operator
  • Metallurgical step is local (ODE per material point), no PDE-constrained network needed
  • Training is sequential and stable

Error propagation mitigation

  • Joint training: Train thermal and mechanical operators with combined loss including downstream metallurgical error, not just field-level MSE
  • Uncertainty quantification: Use ensemble or Bayesian GNO to propagate uncertainty bounds through the chain
  • Multi-fidelity correction: Train small correction network on high-fidelity FEM data that adjusts chain output

Literature status

  • Neural operators for thermal fields in AM: demonstrated (FNO for LPBF thermal, GNO for WAAM)
  • Neural operators for thermo-mechanical: emerging but feasible
  • PINNs for coupled TMM: no successful demonstrations of full coupling; published work covers only simplified single-physics cases
  • In-situ training of chained operators: unexplored, this is the gap