Robert B. Gramacy, Professor of Statistics
Dynamic Trees for Learning and Design
dynaTree is an R package implementing sequential Monte Carlo inference for dynamic tree regression and classification models by particle learning (PL). The sequential nature of inference and the active learning (AL) hooks provided facilitate thrifty sequential design and optimization.
This software is licensed under the GNU Lesser General Public License (LGPL), version 2 or later. See the change log and an archive of previous versions.
The current version provides:
- regression by constant and linear leaf models (a brief usage sketch follows this list)
- classification by multinomial leaf models
- sequential design for regression models by active learning heuristics including predictive variance (ALM and ALC); and for classification boundaries by predictive entropy
- optimization of regression models by expected improvement (EI) statistics
- variable selection by relevance statistics and Saltelli-style sensitivity indices
- fully online learning via retirement and active discarding for massive data, and forgetting factors for drifting concepts
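To make the leaf models and online updating concrete, here is a minimal sketch of a fit-predict-update cycle under the linear leaf model. It is illustrative only: argument names such as N and model, and output fields such as $mean, are as I recall them from the package help pages (?dynaTree, ?predict.dynaTree, ?update.dynaTree), which are the authoritative reference.

library(dynaTree)

## toy 1-d regression data
X <- matrix(seq(0, 10, length = 100), ncol = 1)
y <- sin(X[, 1]) + rnorm(100, sd = 0.1)

## fit a dynamic tree with linear leaf models via particle learning
fit <- dynaTree(X, y, N = 1000, model = "linear")

## posterior predictive summaries on a dense grid
XX <- matrix(seq(0, 10, length = 500), ncol = 1)
fit <- predict(fit, XX)
plot(X[, 1], y)
lines(XX[, 1], fit$mean, col = 2)

## absorb new observations sequentially, without refitting from scratch
Xnew <- matrix(runif(5, 0, 10), ncol = 1)
ynew <- sin(Xnew[, 1]) + rnorm(5, sd = 0.1)
fit <- update(fit, Xnew, ynew)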
Obtaining the package
- Download R from cran.r-project.org by selecting the version for your operating system.
- Install the dynaTree package from within R:
R> install.packages(c("dynaTree"))
- Optionally, install the akima, plgp, and tgp packages, which are helpful for some of the comparisons in the examples and demos.
R> install.packages(c("akima", "plgp", "tgp"))
- Load the library as you would for any R library:
R> library(dynaTree)
Documentation
See the package documentation. A PDF version of the reference manual, or help pages, is also available. The help pages can be accessed from within R.
The best way to acquaint yourself with the functionality of this package is to run the demos, which illustrate the examples contained in the papers referenced below. Try starting with:
R> help(package=dynaTree)
R> ?dynaTree # follow the examples
R> demo(package="dynaTree") # for a listing of the demos
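As a taste of what the demos cover, a skeleton of a sequential design loop in the spirit of the ALM heuristic might look like the sketch below. This is a hedged illustration, not package-canonical code: it uses the width of the predictive quantiles ($q1, $q2, as documented for predict.dynaTree) as a simple uncertainty score, whereas the package's own ALC, entropy, and EI calculations are exposed through dedicated functions described in the help pages (e.g., ?alcX.dynaTree).

library(dynaTree)

f <- function(x) sin(5 * x) + x              ## toy truth
X <- matrix(runif(10), ncol = 1)             ## small initial design
y <- f(X[, 1]) + rnorm(10, sd = 0.1)
fit <- dynaTree(X, y, N = 1000, model = "linear")

XX <- matrix(seq(0, 1, length = 200), ncol = 1)  ## candidate grid
for (i in 1:20) {
  fit <- predict(fit, XX, quants = TRUE)
  spread <- fit$q2 - fit$q1                  ## predictive interval width
  xstar <- XX[which.max(spread), , drop = FALSE]
  ystar <- f(xstar[, 1]) + rnorm(1, sd = 0.1)
  fit <- update(fit, xstar, ystar)           ## online update with the chosen point
}

Each pass reuses the particle cloud rather than refitting from scratch, which is what makes the design loop thrifty.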
References
- Dynamic trees for learning and design (2011) with Matt Taddy and Nicholas Polson. Journal of the American Statistical Association, 106(493), pp. 109-123; preprint on arXiv:0912.1586
- Variable selection and sensitivity analysis via dynamic trees with an application to computer code performance tuning (2013) with Matt Taddy and Stefan Wild. Annals of Applied Statistics, 7(1), pp. 51-80; preprint on arXiv:1108.4739; also see our science highlight at Argonne
- Information-theoretic data discarding for dynamic trees on data streams (2013) with Christoforos Anagnostopoulos; Entropy 15(12), pp. 5510-5535; preprint on arXiv:1201.5568. A short version was presented at the NIPS workshop on Bayesian Optimization, Experimental Design and Bandits (Granada, Spain)
- Sequential regression for optimal stopping problems (2013) with Mike Ludkovski; preprint on arXiv:1309.3832
- Empirical performance modeling of GPU kernels using active learning (2014) with Prasanna Balaprakash, Karl Rupp, Azamat Mametjanov, Paul Hovland and Stefan Wild; ParCo 2013 proceedings in Parallel Computing: Accelerating Computational Science and Engineering (CSE) vol. 25, pp. 646-655; preprint at ANL/MCS-P4097-0713
- Active-learning-based surrogate models for empirical performance tuning (2013) with Prasanna Balaprakash and Stefan Wild; in IEEE Cluster 2013 proceedings; preprint at ANL/MCS-P4073-0513
- Bayesian treed response surface models (2013) with Hugh Chipman, Ed George and Rob McCulloch; WIREs Data Mining and Knowledge Discovery, 3(4)