Sam Foreman

Computational Scientist @ Argonne National Laboratory. AI Group @ Leadership Computing Facility (ALCF)

Latest Posts

In ancient times, back in ~2022–2023, virtually all (production) PyTorch code was designed to run on NVIDIA GPUs. In April 2023, AMD announced day-zero support for PyTorch 2.0 within the ROCm 6.0 ecosystem, leveraging new features like...
I’d like to try and post more this year. Ideally these would be less-polished, more-frequent updates on what I’m thinking about / working on. Ongoing Projects AuroraGPT: Large Language Models for Scientific Applications on...
🧑🏻‍💻 About Me 🏡 samforeman.me UIUC (2015): Engineering Physics + Applied Mathematics University of Iowa (2015–2019): PhD, Physics ANL (2019–2022): Postdoctoral Researcher ANL (2022–Present): Assistant Computational Scientist Member of...
📉 Simple Experiment to Compare Validation Loss Cool Down Comparison Note📑 W&B Report See W&B Report: Cooling Down Checkpoints for more details. ☃️ Cooling Down 256 Nodes of Aurora: Cooled down over last 10%: W&B Run: volcanic-blaze-4312...
🌐 Distributed Training 🚀 Scaling: Overview ✅ Goal: Minimize: Cost (i.e. amount of time spent training) Maximize: Performance Note📑 Note See 🤗 Performance and Scalability for more details In this talk, we will explore the intricacies of...
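The data-parallel idea previewed in that outline can be illustrated without any framework: each worker computes a gradient on its own data shard, the gradients are averaged (the role an all-reduce plays), and every worker then applies the same update. A minimal sketch, assuming scalar lists for parameters and gradients (the names `all_reduce_mean` and `data_parallel_step` are illustrative, not from the talk):

```python
def all_reduce_mean(grads):
    """Average per-worker gradients elementwise (a stand-in for all-reduce)."""
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]

def data_parallel_step(params, shards, grad_fn, lr=0.1):
    """One synchronous data-parallel step: local gradients, then a shared update.

    Each entry of `shards` is one worker's slice of the batch; `grad_fn`
    returns that worker's gradient for the current parameters.
    """
    local_grads = [grad_fn(params, shard) for shard in shards]
    mean_grad = all_reduce_mean(local_grads)
    return [p - lr * g for p, g in zip(params, mean_grad)]
```

Because every worker sees the same averaged gradient, all replicas stay bit-identical after each step; in practice the averaging is done by a collective such as `torch.distributed.all_reduce`.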
🌎 AERIS Figure 1: arXiv:2509.13523 ACM Gordon Bell Prize for Climate Modeling Finalist @ SC’25 We demonstrate a significant advancement in AI weather and climate modeling with AERIS by efficient scaling of window-based transformer...
Motivation When training on multiple data sources or domains, it is often desirable to smoothly interpolate between two distributions rather than switching abruptly. This ensures stable optimization and avoids sudden shifts in gradient...
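The smooth interpolation described above can be sketched as a sampling schedule: the probability of drawing from the first data source is annealed linearly toward the second over training, instead of switching at a hard boundary. A minimal sketch, assuming a linear schedule (the helper names `mixture_weight` and `sample_source` are hypothetical, not from the post):

```python
import random

def mixture_weight(step, total_steps, start=1.0, end=0.0):
    """Linearly anneal the probability of sampling from source A
    from `start` to `end` over the course of training."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return (1.0 - t) * start + t * end

def sample_source(step, total_steps, source_a, source_b, rng=random):
    """Draw the next example from A or B according to the annealed weight."""
    if rng.random() < mixture_weight(step, total_steps):
        return next(source_a)
    return next(source_b)
```

Because the weight changes by only `1 / total_steps` per step, the effective data distribution (and hence the gradient distribution) drifts gradually rather than jumping, which is the stability property the motivation argues for.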
👀 Scaling: Overview ✅ Goal: Minimize: Cost (i.e. amount of time spent training) Maximize: Performance Note📑 Note See 🤗 Performance and Scalability for more details 🐢 Training on a Single Device See also: Scientific AI at Scale:...
pbs-tui Figure 1: A terminal dashboard for monitoring PBS Pro schedulers 👀 Overview A terminal user interface built with Textual for monitoring PBS Pro schedulers at the Argonne Leadership Computing Facility. The dashboard surfaces job,...
📌 Source Repositories Things are changing quickly, so to avoid confusion, here are the exact branches used for this demo: Using: auroraGPT-ANL/torchtitan @ saforem2/blendcorpus saforem2/blendcorpus @ saforem2/reorg-imports1 🏃‍♂️ Running...
🎯 AuroraGPT: Goals AuroraGPT: General purpose scientific LLM Broadly trained on general corpora plus scientific {papers, texts, data} Explore pathways towards a “Scientific Assistant” model Build with international partners (RIKEN,...
👀 Overview 📊 Slides @ samforeman.me/talks/openskai25/training/slides 📄 HTML version: samforeman.me/talks/openskai25/training 📑 Outline Scaling: Overview Data Parallel Training Communication Why Distributed Training? Beyond Data...
# %matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')

import os
os.environ['COLORTERM'] = 'truecolor'

import lovely_tensors as lt
lt.monkey_patch()
lt.set_config(color=False)
# ...