Sam Foreman

Computational Scientist @ Argonne National Laboratory. AI Group @ Leadership Computing Facility (ALCF)

Latest Posts

In ancient times, back in ~2022–2023, virtually all (production) PyTorch code was designed to run on NVIDIA GPUs. In April 2023, AMD announced day-zero support for PyTorch 2.0 within the ROCm 6.0 ecosystem, leveraging new features like...
I’d like to try and post more this year. Ideally these would be less-polished, more-frequent updates on what I’m thinking about / working on. Ongoing Projects AuroraGPT: Large Language Models for Scientific Applications on...
🧑🏻‍💻 About Me 🏡 samforeman.me UIUC (2015): Engineering Physics + Applied Mathematics University of Iowa (2015–2019): PhD, Physics ANL (2019–2022): Postdoctoral Researcher ANL (2022–Present): Assistant Computational Scientist Member of...
📉 Simple Experiment to Compare Validation Loss Cool Down Comparison Note📑 W&B Report See W&B Report: Cooling Down Checkpoints for more details. ☃️ Cooling Down 256 Nodes of Aurora: Cooled down over last 10%: W&B Run: volcanic-blaze-4312...
🌐 Distributed Training 🚀 Scaling: Overview ✅ Goal: Minimize: Cost (i.e. amount of time spent training) Maximize: Performance Note📑 Note See 🤗 Performance and Scalability for more details In this talk, we will explore the intricacies of...
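The data-parallel idea previewed in that outline can be illustrated without any framework: each worker computes a gradient on its own data shard, the gradients are averaged (the role an all-reduce plays), and every worker then applies the same update. A minimal sketch, assuming scalar lists for parameters and gradients (the names `all_reduce_mean` and `data_parallel_step` are illustrative, not from the talk):

```python
def all_reduce_mean(grads):
    """Average per-worker gradients elementwise (a stand-in for all-reduce)."""
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]

def data_parallel_step(params, shards, grad_fn, lr=0.1):
    """One synchronous data-parallel step: local gradients, then a shared update.

    Each entry of `shards` is one worker's slice of the batch; `grad_fn`
    returns that worker's gradient for the current parameters.
    """
    local_grads = [grad_fn(params, shard) for shard in shards]
    mean_grad = all_reduce_mean(local_grads)
    return [p - lr * g for p, g in zip(params, mean_grad)]
```

Because every worker sees the same averaged gradient, all replicas stay bit-identical after each step; in practice the averaging is done by a collective such as `torch.distributed.all_reduce`.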
🌎 AERIS Figure 1: arXiv:2509.13523 ACM Gordon Bell Prize for Climate Modeling Finalist @ SC’25 We demonstrate a significant advancement in AI weather and climate modeling with AERIS by efficient scaling of window-based transformer...
Motivation When training on multiple data sources or domains, it is often desirable to smoothly interpolate between two distributions rather than switching abruptly. This ensures stable optimization and avoids sudden shifts in gradient...
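The smooth interpolation described above can be sketched as a sampling schedule: the probability of drawing from the first data source is annealed linearly toward the second over training, instead of switching at a hard boundary. A minimal sketch, assuming a linear schedule (the helper names `mixture_weight` and `sample_source` are hypothetical, not from the post):

```python
import random

def mixture_weight(step, total_steps, start=1.0, end=0.0):
    """Linearly anneal the probability of sampling from source A
    from `start` to `end` over the course of training."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return (1.0 - t) * start + t * end

def sample_source(step, total_steps, source_a, source_b, rng=random):
    """Draw the next example from A or B according to the annealed weight."""
    if rng.random() < mixture_weight(step, total_steps):
        return next(source_a)
    return next(source_b)
```

Because the weight changes by only `1 / total_steps` per step, the effective data distribution (and hence the gradient distribution) drifts gradually rather than jumping, which is the stability property the motivation argues for.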
👀 Scaling: Overview ✅ Goal: Minimize: Cost (i.e. amount of time spent training) Maximize: Performance Note📑 Note See 🤗 Performance and Scalability for more details 🐢 Training on a Single Device See also: Scientific AI at Scale:...
pbs-tui Figure 1: A terminal dashboard for monitoring PBS Pro schedulers 👀 Overview A terminal user interface built with Textual for monitoring PBS Pro schedulers at the Argonne Leadership Computing Facility. The dashboard surfaces job,...
📌 Source Repositories Things are changing quickly, so to avoid confusion, here are the exact branches used for this demo: Using: auroraGPT-ANL/torchtitan @ saforem2/blendcorpus saforem2/blendcorpus @ saforem2/reorg-imports1 🏃‍♂️ Running...
🎯 AuroraGPT: Goals AuroraGPT: General purpose scientific LLM Broadly trained on general corpora plus scientific {papers, texts, data} Explore pathways towards a “Scientific Assistant” model Build with international partners (RIKEN,...
👀 Overview 📊 Slides @ samforeman.me/talks/openskai25/training/slides 📄 HTML version: samforeman.me/talks/openskai25/training 📑 Outline Scaling: Overview Data Parallel Training Communication Why Distributed Training? Beyond Data...
# %matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')

import os
os.environ['COLORTERM'] = 'truecolor'

import lovely_tensors as lt
lt.monkey_patch()
lt.set_config(color=False)
# ...