Software Projects

I began working on software in the 1970s as an undergraduate, both at Caltech and during summer employment at UC-Berkeley Entomology (with Bland Ewing). I experimented with APL, Pascal, Fortran and assembly language. In one project, I redesigned the Pascal compiler so that it could insert assembly code for fast computation. UC-Berkeley had a CDC 6600 and a 7600 computer, which required users to stand in line with decks of computer cards (think one line of code per piece of cardboard, each fitting in a legal envelope). Nevertheless, our team got access through early “high-speed” telephone lines to two remote computers, one in San Francisco and one at UCLA.

Later, as a graduate student at UC-Berkeley, my thesis involved a substantial computing part, along with high-end theory. I happened to be in the same building as some of the designers of 4BSD Unix and some of the popular early Unix tools. Networking was rather rudimentary, with a colleague rigging a 2-wire connection between terminals to transfer code between machines. There were still only a few computers on campus, with the new generation being PDP-11 minicomputers. The UC-Berkeley CS and Stat departments shared a machine with 11 MB for each on a disk the size of an airport-friendly suitcase. One day, I watched a colleague trash the superblock of this computer, which lost all the pointers to computer file components. I helped him design tools (functions) to recover most of the files from the major crash.

My professor-track employment at UW-Madison involved a careful balance of theoretical work (to establish my bona fides) and computational projects (to explore tools and ground ideas in data-driven stories). That is, I needed to write theory papers to justify my tenure case, but managed to back up most of these with computer tools to justify the methodology.

I was actually stretched in a third, important way, through my career-long interest in collaboration. This has involved developing professional relationships with colleagues across campus, and around the world. While some of this work extends existing, or develops new, stats theory, many of my collaborations have involved more attention to the practical aspects of addressing challenging research questions through data analysis and visualization. This work led to my book, Practical Data Analysis for Designed Experiments, along with a companion package, pda.

Software Releases

QTL Software

I have contributed to multiple quantitative trait loci (QTL) studies and software projects since the early 1990s. In particular, I am a contributor to R/qtl and R/qtl2, both led by Karl Broman with Saunak Sen, and I have developed or co-developed multiple companion packages to these widely used resources (see below). I co-taught a QTL course with Zhao-Bang Zeng and Chris Basten, and contributed insights and code to QTL Cartographer. In addition, working with Gary Churchill and Elias Chaibub Neto, we developed various tests and packages for mediation analysis.

  • intermediate: Mediation analysis building on the work of Gary Churchill's team and Elias Chaibub Neto. See links under qtl2mediate, qtlnet and qtlhot below.
  • R/qtl extensions
    • R/qtlbim: QTL Bayesian Interval Mapping. Improved and totally revamped R library for model selection with Bayesian interval mapping, allowing for covariates and epistasis. CRAN in 2006. [deprecated]
    • R/qdg: QTL-driven dependent graphs R library (CRAN 2008). [deprecated]
      • Chaibub Neto E, Ferrara C, Attie AD, Yandell BS (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100. doi:10.1534/genetics.107.085167.
    • R/qtlhot: QTL Hotspot analysis.
      • Chaibub Neto E, Keller MP, Broman AF, Attie AD, Jansen RC, Broman KW, Yandell BS (2012) Quantile-based permutation thresholds for QTL hotspots. Genetics 191: 1355-1365. doi:10.1534/genetics.112.139451.
      • Chaibub Neto E, Broman AT, Keller MP, Attie AD, Zhang B, Zhu J, Yandell BS (2013) Modeling causality for pairs of phenotypes in system genetics. Genetics 193: 1003-1013. doi:10.1534/genetics.112.147124.
    • R/qtlnet: QTL Network analysis.
      • Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Annals of Applied Statistics 4: 320-339. doi:10.1214/09-AOAS288.
  • R/qtl2 extensions
  • DO Founder Studies
  • Miscellaneous QTL Packages
    • MCMC-QTL: Markov chain Monte Carlo inference for Quantitative Trait Loci. (free release in 1998; Satagopan, Yandell, Newton and Osborn 1996).
    • RevJump-QTL: Bayesian Model Determination of the Number of QTLs Using Reversible Jump MCMC. (free release in 1999; Satagopan and Yandell 1998).
    • Bmapqtl: Bayesian QTL mapping module for QTL Cartographer. (public domain release in 2001; Gaffney 2001)
    • R/bim: Bayesian interval mapping R library. (free release in 2002; CRAN in 2003; Bioconductor in 2004). [deprecated]