Robert B. Gramacy, Professor of Statistics
Dynamic Trees for Learning and Design
dynaTree is an R package implementing sequential Monte Carlo inference for dynamic tree regression and classification models by particle learning (PL). The sequential nature of inference and the active learning (AL) hooks provided facilitate thrifty sequential design and optimization.
This software is licensed under the GNU Lesser General Public License (LGPL), version 2 or later. See the change log and an archive of previous versions.
The current version provides:
- regression by constant and linear leaf models (a brief usage sketch follows this list)
- classification by multinomial leaf models
- sequential design for regression models by active learning heuristics including predictive variance (ALM and ALC); and for classification boundaries by predictive entropy
- optimization of regression models by expected improvement (EI) statistics
- variable selection by relevance statistics and Saltelli-style sensitivity indices
- fully online learning via retirement and active discarding for massive data, and forgetting factors for drifting concepts
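To make the leaf models and online updating concrete, here is a minimal sketch of a fit-predict-update cycle under the linear leaf model. It is illustrative only: argument names such as N and model, and output fields such as $mean, are as I recall them from the package help pages (?dynaTree, ?predict.dynaTree, ?update.dynaTree), which are the authoritative reference.

library(dynaTree)

## toy 1-d regression data
X <- matrix(seq(0, 10, length = 100), ncol = 1)
y <- sin(X[, 1]) + rnorm(100, sd = 0.1)

## fit a dynamic tree with linear leaf models via particle learning
fit <- dynaTree(X, y, N = 1000, model = "linear")

## posterior predictive summaries on a dense grid
XX <- matrix(seq(0, 10, length = 500), ncol = 1)
fit <- predict(fit, XX)
plot(X[, 1], y)
lines(XX[, 1], fit$mean, col = 2)

## absorb new observations sequentially, without refitting from scratch
Xnew <- matrix(runif(5, 0, 10), ncol = 1)
ynew <- sin(Xnew[, 1]) + rnorm(5, sd = 0.1)
fit <- update(fit, Xnew, ynew)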
Obtaining the package
- Download R from cran.r-project.org by selecting the version for your operating system.
- Install the dynaTree package from within R:
R> install.packages(c("dynaTree"))
- Optionally, install the akima, plgp, and tgp packages, which are helpful for some of the comparisons in the examples and demos.
R> install.packages(c("akima", "plgp", "tgp"))
- Load the library as you would for any R library:
R> library(dynaTree)
Documentation
See the package documentation. A PDF version of the reference manual, or help pages, is also available. The help pages can be accessed from within R.
The best way to acquaint yourself with the functionality of this package is to run the demos, which illustrate the examples contained in the papers referenced below. Try starting with:
R> help(package=dynaTree)
R> ?dynaTree # follow the examples
R> demo(package="dynaTree") # for a listing of the demos
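As a taste of what the demos cover, a skeleton of a sequential design loop in the spirit of the ALM heuristic might look like the sketch below. This is a hedged illustration, not package-canonical code: it uses the width of the predictive quantiles ($q1, $q2, as documented for predict.dynaTree) as a simple uncertainty score, whereas the package's own ALC, entropy, and EI calculations are exposed through dedicated functions described in the help pages (e.g., ?alcX.dynaTree).

library(dynaTree)

f <- function(x) sin(5 * x) + x              ## toy truth
X <- matrix(runif(10), ncol = 1)             ## small initial design
y <- f(X[, 1]) + rnorm(10, sd = 0.1)
fit <- dynaTree(X, y, N = 1000, model = "linear")

XX <- matrix(seq(0, 1, length = 200), ncol = 1)  ## candidate grid
for (i in 1:20) {
  fit <- predict(fit, XX, quants = TRUE)
  spread <- fit$q2 - fit$q1                  ## predictive interval width
  xstar <- XX[which.max(spread), , drop = FALSE]
  ystar <- f(xstar[, 1]) + rnorm(1, sd = 0.1)
  fit <- update(fit, xstar, ystar)           ## online update with the chosen point
}

Each pass reuses the particle cloud rather than refitting from scratch, which is what makes the design loop thrifty.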
References
- Dynamic trees for learning and design (2011) with Matt Taddy and Nicholas Polson. Journal of the American Statistical Association, 106(493), pp. 109-123; preprint on arXiv:0912.1586
- Variable selection and sensitivity analysis via dynamic trees with an application to computer code performance tuning (2013) with Matt Taddy and Stefan Wild. Annals of Applied Statistics, 7(1), pp. 51-80; preprint on arXiv:1108.4739; also see our science highlight at Argonne
- Information-theoretic data discarding for dynamic trees on data streams (2013) with Christoforos Anagnostopoulos; Entropy 15(12), pp. 5510-5535; preprint on arXiv:1201.5568. A short version was presented at the NIPS workshop on Bayesian Optimization, Experimental Design and Bandits (Granada, Spain)
- Sequential regression for optimal stopping problems (2013) with Mike Ludkovski; preprint on arXiv:1309.3832
- Empirical performance modeling of GPU kernels using active learning (2014) with Prasanna Balaprakash, Karl Rupp, Azamat Mametjanov, Paul Hovland and Stefan Wild; ParCo 2013 proceedings in Parallel Computing: Accelerating Computational Science and Engineering (CSE) vol. 25, pp. 646-655; preprint at ANL/MCS-P4097-0713
- Active-learning-based surrogate models for empirical performance tuning (2013) with Prasanna Balaprakash and Stefan Wild; in IEEE Cluster 2013 proceedings; preprint at ANL/MCS-P4073-0513
- Bayesian treed response surface models (2013) with Hugh Chipman, Ed George and Rob McCulloch; WIREs Data Mining and Knowledge Discovery, 3(4)