Robert B. Gramacy, Professor of Statistics
Front cover: Lake Wanaka, New Zealand
Surrogates
Gaussian process modeling, design and optimization for the applied sciences
A graduate textbook, or professional handbook, on topics at the interface between machine learning, spatial statistics, computer simulation, meta-modeling (i.e., emulation), design of experiments, and optimization. Experimentation through simulation, “human out-of-the-loop” statistical support, management of dynamic processes, online and real-time analysis, automation, and practical application are at the forefront.
Topics include:
- Gaussian process (GP) regression for flexible nonparametric and nonlinear modeling.
- Uncertainty quantification, sensitivity analysis, calibration, sequential design/active learning and (blackbox/Bayesian) optimization.
- Advanced topics: treed partitioning, local GP approximation, coupled nonlinear mean and variance (heteroskedastic) models.
- Treatment appreciates historical response surface methodology (RSM), but emphasizes contemporary methods and implementation in R at modern scale.
- Rmarkdown facilitates a fully reproducible tour, complete with motivation from, application to, and illustration with, compelling real-data examples.
Presentation targets numerically competent practitioners in engineering, physical, and biological sciences. Writing is statistical in form, but the subjects are not about statistics. Rather, they’re about prediction and synthesis under uncertainty; about visualization and information, design and decision making, computing and clean code.
Access and content
- Download an electronic "print version".
- Links across/below point directly to HTML renderings of the chapters. Or start from the title page.
- Please consider buying a physical copy from CRC, Amazon, Barnes & Noble, or anywhere fine books are sold. Royalties will help subsidize whiskey consumed by content production.
- Access to solution files may be granted to instructors who reach out to me directly. Sharing of access to/copies of solution files is strictly prohibited.
- Please use this BibTeX entry for citation.
Errata
Comments/corrections by email are much appreciated.
- The HTML is updated in near real-time;
- grammar/cosmetic fixes addressed without fuss;
- so errata/corrections apply to major changes in print/PDF version only.
Errata text file linked here.
Review
Many thanks to folks who have helped promote the book.
- Max Morris (IA State) and Brian Williams (LANL) provided blurbs for CRC's marketing materials.
- Shuai Huang (UW) wrote a review for JQT.
- Tony Pourmohamad (Genentech) wrote a review for Technometrics.
- Debashis Ghosh (U Colorado) wrote a review for ISI.
Miscellany
In the event that compiling dependencies proves problematic, some pre-compiled libraries are provided below.
- TPM (satellite drag) simulator: tpm_osx.so for Apple OSX; tpm_win.so for Windows. Rename to "tpm.so" before using. See tpm_win.doc for a student's step-by-step instructions for a custom compile on Windows.
- Groundwater remediation AEM: Bluebird_osx and Ostrich_osx. Rename to remove "_osx". Shell scripts gluing these together make Windows versions harder.
- CRAN Apple OSX binaries are compiled with Clang, which does not have OpenMP support. Here is a binary compiled with GCC and OpenMP: laGP_1.5-3_gccosx.tgz.
(It's never a good idea to trust binaries from the web. I take no responsibility for their content; they're not regularly updated.)
Chapters
- Chapter 1: Historical Perspective (summary)
Data: wires
Solutions to homework exercises
- Chapter 2: Four Motivating Datasets (summary)
Supplementary code: lockwood archive
Data: lgbb archive, crash archive, and tpm-git
Solutions to homework exercises
- Chapter 3: Steepest Ascent and Ridge Analysis (summary)
Data: plasma (delta), chemical, rising ridge, saddle point, confidence region, sadat, metallurgy, heat, bumper, turbine, viscosity
Solutions to homework exercises
- Chapter 4: Space-filling Design (summary)
Data: lola (cands)
Solutions to homework exercises
- Chapter 5: Gaussian Process Regression (summary)
A webinar for ASA/SPES covering material from the first half of the chapter.
Supplementary notes on Basis Expansion and Splines (thanks to Hastie, Tibshirani and Friedman (2017), Chapter 5) with code in splines2d.R
Data: lgbb archive
Solutions to homework exercises, requiring fried.RData
- Chapter 6: Model-based Design for GPs (summary)
Data: lgbb archive
Solutions to homework exercises, requiring mymaximin.R, lgbbpart_sol.R, maximin_cand_sol.R, lgbbpart_rmses_sol.RData and lgbbpart_designs_sol.RData
- Chapter 7: Optimization (summary)
Supplementary code: gp_ei_sin.R and aimprob.R.
Data: lockwood archive
Solutions to homework exercises, requiring htc_sol.R, lockwood_sol.R, htc_sol.RData and lockwood_prog_sol.RData
- Chapter 8: Calibration and Sensitivity (summary)
Data: wiffle balls
Solutions to homework exercises
- Chapter 9: GP Fidelity and Scale (summary)
You'll need a customized SparseEm package for compactly supported kernels.
Data: lgbb archive, crash archive, lanl archive, and sarcos train (test)
Solutions to homework exercises, requiring lgbbpart2_sol.R, sarcos_sol.R, satdrag_sol.R, lgbbpart_rmses2_sol.RData, lgbbpart_designs2_sol.RData, sarcos_sol.RData, sarcos_nomle_sol.RData, satdrag_grace_sol.RData and satdrag_hst_sol.RData
- Chapter 10: Heteroskedasticity (summary)
A talk from DSSV 2020 covering highlights.
Data: ocean field and runs
Supplementary code: fksim.R requiring fkset.RData
Solutions to homework exercises, requiring ocean_seqdes_sol.R and osean_seqdes_sol.RData
- Appendices: (summary)
A: Intel MKL and OSX Accelerate
B: An experiment game, requiring the yield archive