Dave Patterson

David Patterson received BA, MS, and PhD degrees from UCLA. He is a UC Berkeley Pardee professor emeritus, a Google distinguished engineer since 2016, and Laude Institute Board Chair.

His two most influential Berkeley projects were likely RISC and RAID. He has received 50 awards overall, including service awards for his roles as ACM President, UC Berkeley CS Division Chair, and CRA Chair, as well as education awards for teaching and textbooks. The most prominent of his seven co-authored books is Computer Architecture: A Quantitative Approach.

He and his co-author John Hennessy shared the 2017 ACM A.M. Turing Award, the 2021 BBVA Foundation Frontiers of Knowledge Award, and the 2022 NAE Charles Stark Draper Prize for Engineering. (The Turing Award is often referred to as the "Nobel Prize of Computing," and the Draper Prize is considered a "Nobel Prize of Engineering.")

Outside of work he plays soccer, lifts weights, cycles, and bodysurfs. He has been married to his high-school sweetheart since 1967, and they have raised two sons, who in turn are raising their four grandchildren.

Authored Publications
Databases in the Era of Memory-Centric Computing
Anastasia Ailamaki
Lawrence Benson
Jana Gičeva
Eric Seldar
Lisa Wu Wills
2025
The increasing disparity between processor core counts and memory bandwidth, coupled with the rising cost and underutilization of memory, introduces a performance and cost Memory Wall and presents a significant challenge to the scalability of database systems. We argue that current processor-centric designs are unsustainable, and we advocate for a shift towards memory-centric computing, where disaggregated memory pools enable cost-effective scaling and robust performance. Database systems are uniquely positioned to leverage memory-centric systems because of their intrinsic data-centric nature. We demonstrate how memory-centric database operations can be realized with current hardware, paving the way for more efficient and scalable data management in the cloud.
Many recent papers highlight the importance of thinking about carbon emissions (CO2e) in machine learning (ML) workloads. While elevating the discussion, some early work was also based on incomplete information. (Unfortunately, the most widely cited quantitative estimate that was the basis for many of these papers was off by 88X.) Inspired by these concerns, we looked for approaches that would make ML training considerably less carbon intensive. We identified four best practices that dramatically reduce carbon emissions, and demonstrate two concrete examples of reducing CO2e by 650X over four years and 40X over one year by following them. Provided ML stakeholders follow best practices, we predict that the field will bend the curve of carbon footprint increases from ML training runs to first flatten and then reduce it by 2030 without sacrificing the current rate of rapid advances in ML, contrary to prior dire warnings that ML CO2e will soar.
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi
Nishant Patil
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Ben Gelb
Tara Vazir Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
Dan Hurt
Julian Ibarz
Aaron Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
Rahul Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
ISCA (2017) (to appear)
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.