Peter Norgaard

Peter Norgaard studied Mechanical and Aerospace Engineering at the University of Washington and then at Princeton University. His undergraduate and graduate research focused on experimental and computational plasma physics, particularly its applications to magnetic confinement nuclear fusion. He also studied and worked on numerical methods for ordinary and partial differential equations. At Google, Peter works on the inverse problem of reconstructing the plasma state from sparse measurements using Bayesian inference.
Authored Publications
An AI system to help scientists write expert-level empirical software
Eser Aygün
Anastasiya Belyaeva
Gheorghe Comanici
Hao Cui
Renee Johnston
Zahra Shamsi
David Smalling
James Thompson
Sarah Martinson
Lai Wei
Yuchen Zhou
Qian-Ze Zhu
Matthew Abraham
Erica Brand
Anna Bulanova
Jeffrey Cardille
Chris Co
Scott Ellsworth
Grace Joseph
Malcolm Kane
Ryan Krueger
Johan Kartiwa
Jackson Cui
Paul Raccuglia
Julie Wang
Kat Chou
James Manyika
Lizzie Dorfman
Shibl Mourad
Nature (2026)
The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present Empirical Research Assistance (ERA), an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. ERA achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a diverse range of tasks. In bioinformatics, ERA discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, ERA generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. ERA also produced expert-level software for geospatial analysis, neural activity prediction in zebrafish, and the numerical evaluation of integrals, as well as a novel rule-based construction for time series forecasting. By devising and implementing novel solutions to diverse tasks, ERA represents a significant step towards accelerating scientific progress.
Keywords: Tree Search, Generative AI, Scorable Scientific Tasks, Empirical Software
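The score-guided tree search described in this abstract can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in:

- `propose_variants` plays the role of the LLM rewriting a candidate solution.
- A "candidate" is a single number rather than a program.
- `quality_metric` is an invented score.

The search shown is plain best-first expansion, which may differ from the tree-search policy ERA actually uses.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class Node:
    neg_score: float  # heapq is a min-heap, so store -score to pop the best node first
    candidate: float = field(compare=False)
    depth: int = field(compare=False)


def quality_metric(candidate: float) -> float:
    """Hypothetical scorable task: higher is better, maximized at candidate == 3.0."""
    return -(candidate - 3.0) ** 2


def propose_variants(candidate: float, rng: random.Random, k: int = 3) -> list:
    """Hypothetical stand-in for an LLM proposing k rewrites of a candidate."""
    return [candidate + rng.gauss(0.0, 1.0) for _ in range(k)]


def tree_search(root: float, budget: int = 200, max_depth: int = 20, seed: int = 0) -> float:
    """Best-first search: repeatedly expand the highest-scoring unexpanded node."""
    rng = random.Random(seed)
    frontier = [Node(-quality_metric(root), root, 0)]
    best = root
    for _ in range(budget):
        if not frontier:
            break
        node = heapq.heappop(frontier)
        for child in propose_variants(node.candidate, rng):
            score = quality_metric(child)
            if score > quality_metric(best):
                best = child  # keep the best-scoring candidate seen anywhere in the tree
            if node.depth + 1 < max_depth:
                heapq.heappush(frontier, Node(-score, child, node.depth + 1))
    return best
```

Calling `tree_search(0.0)` returns a candidate close to 3.0. In the setting the abstract describes, candidates would instead be pieces of scientific software scored by a task-specific quality metric.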
CURIE: Evaluating LLMs on multitask long context scientific understanding and reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
Maria Tikhanovskaya
Nayantara Mudur
Paul Raccuglia
Victor V. Albert
Pranesh Srinivasan
Haining Pan
Philippe Faist
Brian Rohr
Ekin Dogus Cubuk
Muratahan Aykol
Amil Merchant
Michael Statt
Drew Purves
Elise Kleeman
Ruth Alcantara
Matthew Abraham
Muqthar Mohammad
Ean Phing VanLee
Chenfei Jiang
Lizzie Dorfman
Eun-Ah Kim
International Conference on Learning Representations (ICLR) (2025)
The core of the scientific problem-solving process involves synthesizing information while applying expert knowledge. Large Language Models (LLMs) have the potential to accelerate this process due to their extensive knowledge across a variety of domains. Recent advancements have also made it possible for LLMs to handle very long "in-context" content. However, existing evaluations of long-context LLMs have focused on assessing their ability to summarize or retrieve information within the given context, primarily in generalist tasks that do not require deep scientific expertise. To facilitate analogous assessments of domain-specific tasks, we introduce the scientific long-Context Understanding and Reasoning Inference Evaluations (CURIE) benchmark. This benchmark provides a set of 8 challenging tasks, derived from around 250 scientific research papers, that require domain expertise, comprehension of long in-context information, and multi-step reasoning, and that test the ability of LLMs to assist scientists in realistic workflows. Tasks in CURIE have been collected from experts in six disciplines (materials science, theoretical condensed matter physics, quantum computing, geospatial analysis, biodiversity, and protein sequencing), covering both experimental and theoretical workflows in science. We evaluate a range of closed and open LLMs on these tasks. Additionally, we propose strategies for task decomposition, which allow for a more nuanced evaluation of the models and facilitate staged multi-step assessments. We hope that insights gained from CURIE can guide the future development of LLMs.
Building precise simulations of the real world and using numerical methods to solve quantitative problems is an essential task in engineering and physics. We present FEABench, a benchmark to evaluate the ability of large language models (LLMs) and LLM agents to simulate and solve physics, mathematics and engineering problems using finite element analysis (FEA) software. We introduce a multipronged evaluation scheme to investigate the ability of LLMs to solve these problems using COMSOL Multiphysics. We further design an LLM agent equipped with the ability to interact with the software through its Application Programming Interface (API), examine its outputs, and use tools to improve its solution over several iterations. Our best-performing strategy generates executable API calls 88% of the time. However, this benchmark still proves challenging enough that the LLMs and agents we tested were not able to completely and correctly solve any problem. LLMs that can successfully interact with and operate FEA software to solve problems such as those in our benchmark would push the frontiers of automation in engineering. Acquiring this capability would augment LLMs' reasoning skills with the precision of numerical solvers and advance the development of autonomous systems that can tackle complex problems in the real world.
Neural general circulation models for weather and climate
Dmitrii Kochkov
Janni Yuval
Ian Langmore
Jamie Smith
Griffin Mooers
Milan Kloewer
James Lottes
Peter Dueben
Samuel Hatfield
Peter Battaglia
Alvaro Sanchez
Matthew Willson
Stephan Hoyer
Nature, 632 (2024), pp. 1060-1066
General circulation models (GCMs) are the foundation of weather and climate prediction. GCMs are physics-based simulators that combine a numerical solver for large-scale dynamics with tuned representations for small-scale processes such as cloud formation. Recently, machine-learning models trained on reanalysis data have achieved comparable or better skill than GCMs for deterministic weather forecasting. However, these models have not demonstrated improved ensemble forecasts, or shown sufficient stability for long-term weather and climate simulations. Here we present a GCM that combines a differentiable solver for atmospheric dynamics with machine-learning components and show that it can generate forecasts of deterministic weather, ensemble weather and climate on par with the best machine-learning and physics-based methods. NeuralGCM is competitive with machine-learning models for one- to ten-day forecasts, and with the European Centre for Medium-Range Weather Forecasts ensemble prediction for one- to fifteen-day forecasts. With prescribed sea surface temperature, NeuralGCM can accurately track climate metrics for multiple decades, and climate forecasts with 140-kilometre resolution show emergent phenomena such as realistic frequency and trajectories of tropical cyclones. For both weather and climate, our approach offers orders of magnitude computational savings over conventional GCMs, although our model does not extrapolate to substantially different future climates. Our results show that end-to-end deep learning is compatible with tasks performed by conventional GCMs and can enhance the large-scale physical simulations that are essential for understanding and predicting the Earth system.
Hamiltonian Monte Carlo is discussed in the context of a fusion plasma reconstruction, with an in-depth treatment of ill-conditioned covariance and multi-modality.
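Multi-modality is commonly addressed with replica exchange (parallel tempering). The following sketch illustrates it on an assumed one-dimensional target with two well-separated modes.

Several simplifications apply:

- Each replica uses random-walk Metropolis rather than HMC, for brevity.
- The inverse temperatures `betas` and the proposal scale `step` are illustrative choices, not values from the work described here.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalized log density of an equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(log_ratio: float, rng: random.Random) -> bool:
    # Metropolis rule: accept with probability min(1, exp(log_ratio)).
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


def parallel_tempering(n_steps=20000, betas=(1.0, 0.5, 0.25, 0.1), step=1.0, seed=0):
    """Return samples from the beta = 1 (untempered) replica."""
    rng = random.Random(seed)
    xs = [-4.0] * len(betas)  # every replica starts in the left mode
    cold = []
    for _ in range(n_steps):
        # Within-replica Metropolis update targeting p(x) ** beta.
        for i, beta in enumerate(betas):
            prop = xs[i] + rng.gauss(0.0, step)
            if accept(beta * (log_target(prop) - log_target(xs[i])), rng):
                xs[i] = prop
        # Propose exchanging the states of one random adjacent pair of temperatures.
        j = rng.randrange(len(betas) - 1)
        log_swap = (betas[j] - betas[j + 1]) * (log_target(xs[j + 1]) - log_target(xs[j]))
        if accept(log_swap, rng):
            xs[j], xs[j + 1] = xs[j + 1], xs[j]
        cold.append(xs[0])
    return cold
```

The hot replicas cross the low-density region between the modes easily. Swaps then pass those crossings down to the untempered chain, so its samples visit both modes even though it starts in only one.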
TAE Technologies, Inc. (TAE) is pursuing an alternative approach to magnetically confined fusion that relies on field-reversed configuration (FRC) plasmas composed mostly of energetic and well-confined particles, injected by a state-of-the-art tunable-energy neutral-beam (NB) injector system. TAE's current experimental device, C-2W (also called "Norman"), is the world's largest compact-toroid device and has made significant progress in FRC performance. It has produced record-breaking, high-temperature (electron temperature Te > 500 eV; total electron and ion temperature Ttot > 3 keV) advanced beam-driven FRC plasmas, dominated by injected fast particles and sustained in steady state for up to 30 ms, a duration limited by the NB pulse length. C-2W produces significantly better FRC performance than the preceding C-2U experiment. This is due in part to Google's machine-learning framework for experimental optimization, which contributed to the discovery of a new operational regime in which novel settings for the formation sections yield consistently reproducible, hot, and stable plasmas. An active plasma control system has been developed and used in C-2W to produce consistent FRC performance and reliable machine operation using magnets, electrodes, gas injection, and the tunable NBs. The active control system has demonstrated stabilization of the FRC axial instability. Overall FRC performance is well correlated with the NBs and the edge-biasing system: higher total plasma energy is obtained by increasing both the NB injection power and the voltage applied to the biasing electrodes. The C-2W divertors have demonstrated good electron heat confinement on open field lines by using strong magnetic mirror fields and by expanding the magnetic field in the divertors (expansion ratio > 30). The electron energy lost per ion is ~6–8, which is close to the ideal theoretical minimum.
The Hamiltonian Monte Carlo (HMC) method allows sampling from continuous densities. Its favorable scaling with dimension has led to wide adoption of HMC by the statistics community, and modern automatic differentiation software should allow more widespread use in Bayesian inverse problems. This paper analyzes the two major difficulties encountered when using HMC for inverse problems: poor conditioning and multi-modality. Novel results on preconditioning and on parameter selection for replica exchange Monte Carlo are presented in the context of spectroscopy. The recommendations are analyzed rigorously in the Gaussian case and shown to generalize to a fusion plasma reconstruction.
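The poor-conditioning difficulty can be sketched with an axis-aligned Gaussian target whose standard deviations are 1 and 0.01 (hypothetical values).

- **Unit mass matrix:** the leapfrog integrator needs a step size below 2 × 0.01 to stay stable in the narrow direction.
- **Diagonal mass matrix equal to the inverse covariance:** every direction looks alike to the integrator.

This is the textbook diagonal preconditioner, not necessarily the preconditioning developed in the paper.

```python
import math
import random

# Hypothetical target: N(0, diag(SIGMAS**2)); its covariance condition number is 1e4.
SIGMAS = (1.0, 0.01)


def neg_log_p(x):
    return 0.5 * sum((xi / s) ** 2 for xi, s in zip(x, SIGMAS))


def grad_neg_log_p(x):
    return [xi / s**2 for xi, s in zip(x, SIGMAS)]


def kinetic(p, mass):
    # K(p) = p^T M^{-1} p / 2 for a diagonal mass matrix M = diag(mass).
    return 0.5 * sum(pi * pi / m for pi, m in zip(p, mass))


def hmc(n_samples, step_size, n_leapfrog, mass, seed=0):
    """Return (samples, acceptance rate) for HMC with mass matrix diag(mass)."""
    rng = random.Random(seed)
    x = [0.0] * len(SIGMAS)
    samples, accepted = [], 0
    for _ in range(n_samples):
        p = [rng.gauss(0.0, math.sqrt(m)) for m in mass]  # p ~ N(0, M)
        x_new = list(x)
        # Leapfrog: half momentum step, alternating full steps, final half momentum step.
        p_new = [pi - 0.5 * step_size * gi for pi, gi in zip(p, grad_neg_log_p(x_new))]
        for i in range(n_leapfrog):
            x_new = [xi + step_size * pi / m for xi, pi, m in zip(x_new, p_new, mass)]
            kick = step_size if i < n_leapfrog - 1 else 0.5 * step_size
            p_new = [pi - kick * gi for pi, gi in zip(p_new, grad_neg_log_p(x_new))]
        # Metropolis correction for the integrator's energy error.
        log_accept = (neg_log_p(x) + kinetic(p, mass)) - (neg_log_p(x_new) + kinetic(p_new, mass))
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = x_new
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_samples
```

With `mass` set to the inverse variances, `tuple(1 / s**2 for s in SIGMAS)`, a step size of 0.5 with 10 leapfrog steps is accepted almost always. With `mass=(1.0, 1.0)`, the same step size is unstable in the narrow direction and essentially every proposal is rejected. The stable alternative is a step size below 0.02, which then needs many more steps to traverse the wide direction.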
We determined the time-dependent geometry, including high-frequency oscillations, of the plasma density in TAE's C-2W experiment. This was done as a joint Bayesian reconstruction from three diagnostics: a 14-chord far-infrared (FIR) interferometer in the midplane, 32 Mirnov probes at the periphery, and 8 shine-through detectors at the targets of the neutral beams. For each point in time we recovered, with credible intervals: the radial density profile of the plasma; the bulk plasma displacement; and the amplitudes, frequencies, and phases of the azimuthal modes n = 1 to n = 4. We also reconstructed the radial profiles of the deformations associated with each of the azimuthal modes. Bayesian posterior sampling was done via Hamiltonian Monte Carlo with custom preconditioning. This gave a comprehensive uncertainty quantification of the reconstructed values, including correlations, and some understanding of multimodal posteriors. The method was applied to thousands of experimental shots on C-2W, producing a rich data set for analysis of plasma performance.
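The azimuthal-mode part of this reconstruction can be made concrete with a sketch that fits amplitudes and phases of modes n = 1 to 4 from a single snapshot of 32 Mirnov probe signals. It rests on two assumptions:

- The probes are equally spaced in azimuth.
- The forward model is noiseless, of the form b(φ) = Σ A_n cos(nφ + θ_n). This model is hypothetical.

The result is a least-squares point estimate only. The reconstruction described above is a joint Bayesian posterior over all diagnostics, sampled with HMC.

```python
import math

MODES = (1, 2, 3, 4)  # azimuthal mode numbers, as in the abstract


def probe_angles(n_probes: int) -> list:
    # Assumption: probes equally spaced in azimuth.
    return [2.0 * math.pi * k / n_probes for k in range(n_probes)]


def synthetic_probe_signals(amps, phases, n_probes: int = 32) -> list:
    """Hypothetical noiseless forward model b(phi) = sum_n A_n cos(n phi + theta_n)."""
    return [
        sum(a * math.cos(n * phi + th) for n, a, th in zip(MODES, amps, phases))
        for phi in probe_angles(n_probes)
    ]


def fit_modes(signals) -> dict:
    """Least-squares amplitude and phase for each mode from one time slice.

    With equally spaced probes and every n < n_probes / 2, the cos and sin
    columns are orthogonal, so least squares reduces to these projections.
    """
    n_probes = len(signals)
    angles = probe_angles(n_probes)
    fit = {}
    for n in MODES:
        c = 2.0 / n_probes * sum(y * math.cos(n * phi) for y, phi in zip(signals, angles))
        s = 2.0 / n_probes * sum(y * math.sin(n * phi) for y, phi in zip(signals, angles))
        # A cos(n phi + theta) = A cos(theta) cos(n phi) - A sin(theta) sin(n phi)
        fit[n] = (math.hypot(c, s), math.atan2(-s, c))
    return fit
```

Real probe data add noise, radial structure and time dependence, which is where a posterior with credible intervals, rather than a single projection, becomes valuable.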
Hamiltonian Monte Carlo (HMC) is a popular sampling technique for smooth target densities. The scale lengths of the target have long been known to influence integration error and sampling efficiency; however, quantitative measures intrinsic to the target have been lacking. In this paper, we restrict attention to the multivariate Gaussian and the leapfrog integrator, and obtain a condition number corresponding to sampling efficiency. This number, based on the spectral and Schatten norms, quantifies the number of leapfrog steps needed to sample efficiently. We demonstrate its utility by using this condition number to analyze HMC preconditioning techniques. We also find the condition number of large inverse Wishart matrices, from which we derive burn-in heuristics.
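The classical intuition behind such a condition number can be sketched for an axis-aligned Gaussian with unit mass. Each direction behaves like a harmonic oscillator with frequency ω = 1/σ, and leapfrog stays stable only if εω < 2. The narrowest direction therefore caps the step size, while the widest direction sets how long a trajectory must run. The resulting step count grows like σ_max/σ_min, the square root of the ordinary covariance condition number.

This is the textbook spectral argument, not the Schatten-norm condition number derived in the paper. The safety factor and the quarter-period trajectory length are assumptions.

```python
import math


def leapfrog_stays_bounded(step: float, omega: float, n_steps: int = 1000) -> bool:
    """Integrate x'' = -omega**2 x (unit mass) with leapfrog and report whether
    the trajectory stays bounded. Theory: bounded iff step * omega < 2."""
    x, p = 1.0, 0.0
    for _ in range(n_steps):
        p -= 0.5 * step * omega**2 * x  # half kick
        x += step * p                   # drift
        p -= 0.5 * step * omega**2 * x  # half kick
        if abs(x) > 1e6:
            return False
    return True


def leapfrog_steps_needed(sigmas, safety: float = 0.8) -> int:
    """Heuristic leapfrog step count for an axis-aligned Gaussian with std devs `sigmas`.

    The stiffest direction (omega_max = 1 / sigma_min) caps the step size below
    2 * sigma_min; the trajectory should last about a quarter period of the
    softest direction, (pi / 2) * sigma_max, to move far along it.
    """
    step = safety * 2.0 * min(sigmas)
    duration = 0.5 * math.pi * max(sigmas)
    return math.ceil(duration / step)
```

For standard deviations (1.0, 0.01) the heuristic gives 99 steps per trajectory. A preconditioner that whitens the covariance would make all σ equal and reduce it to 1.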
Fusion plasma reconstruction work done at Google in partnership with TAE is presented.