I began working on software in the 1970s as an undergraduate, both at Caltech and during summer employment at UC-Berkeley Entomology (with Bland Ewing). I played around with the APL, Pascal, Fortran and Assembly languages. In one project, I redesigned the Pascal compiler so that it could insert Assembly code for fast computation. UC-Berkeley had a CDC 6600 and a CDC 7600 computer, which required users to stand in line with decks of computer cards (think one line of code per piece of cardboard, each fitting in a legal envelope). Nevertheless, our team got access through early “high-speed” telephone wires to one computer in San Francisco and another at UCLA.
Later, as a graduate student at UC-Berkeley, my thesis involved a substantial computing component, along with high-end theory. I happened to be in the same building as some of the designers of 4BSD Unix and of some popular early Unix tools. Networking was rather rudimentary, with a colleague rigging a 2-wire connection between terminals to transfer code between machines. There were still only a few computers on campus, with the new generation being PDP-11 minicomputers. The UC-Berkeley CS and Stat departments shared a machine with 11 MB for each department on a disk the size of an airport-friendly suitcase. One day, I watched a colleague trash the superblock of this computer, which wiped out all the pointers to file components. I helped him design tools (functions) to recover most of the files from the major crash.
My professor-track employment at UW-Madison involved a careful balance of theoretical work (to establish my bona fides) and computational projects (to explore tools and ground ideas in data-driven stories). That is, I needed to write theory papers to justify my tenure case, but managed to back up most of these with computer tools that demonstrated the methodology.
I was actually stretched in a third, important way, through my career-long interest in collaboration. This has involved developing professional relationships with colleagues across campus, and around the world. While some of this work extends existing, or develops new, stats theory, many of my collaborations have involved more attention to the practical aspects of addressing challenging research questions through data analysis and visualization. This work led to my book, Practical Data Analysis for Designed Experiments, along with a companion package, pda.
- R Software Introduction for Stat 571
- R Appendices for the Stat/For/Hort 571 Course Notes
- Graphical Data Presentation with Emphasis on Genetic Data (R code for ASHS paper)
Software Releases
- Geyser Shiny Module Demo
- GCVPACK: Routines for Generalized Cross Validation (free release in 1986; now part of base R; Bates, Lindstrom, Wahba and Yandell 1987)
- Splus/QDA: Quality Data Attributes Analysis. (proprietary release in 1999; Yandell and Tragon Corporation).
- Practical Data Analysis: library(pda) for Splus and R. (free release in 1997; revised in 2000)
- Microarray Data Analysis: library(pickgene) for R. (2001; Lin et al. 2001) Bioconductor
- Quantitative Population Ethology: library(ewing) for R. (free release in 2001; Ewing et al. 2001)
QTL Software
I have contributed to multiple quantitative trait loci (QTL) studies and software projects since the early 1990s. In particular, I am a contributor to R/qtl and R/qtl2, both led by Karl Broman with Saunak Sen, and I have developed or co-developed multiple companion packages to these widely used resources (see below). I co-taught a QTL course with Zhao-Bang Zeng and Chris Basten, and contributed insights and code to QTL Cartographer. In addition, working with Gary Churchill and Elias Chaibub Neto, we developed various tests and packages for mediation analysis.
- intermediate: Mediation analysis building on work of the Gary Churchill team and Elias Chaibub Neto. See links under qtl2mediate, qtlnet and qtlhot below.
- R/qtl extensions
- R/qtlbim: QTL Bayesian Interval Mapping. Improved and totally revamped R library for model selection with Bayesian interval mapping, allowing for covariates and epistasis. CRAN in 2006. [deprecated]
- R/qdg: QTL-driven dependent graphs R library (CRAN 2008). [deprecated]
- Chaibub Neto E, Ferrara C, Attie AD, Yandell BS (2008) Inferring causal phenotype networks from segregating populations. Genetics 179 : 1089-1100. doi:10.1534/genetics.107.085167.
- R/qtlhot: QTL Hotspot analysis.
- Chaibub Neto E, Keller MP, Broman AF, Attie AD, Jansen RC, Broman KW, Yandell BS (2012) Quantile-based permutation thresholds for QTL hotspots. Genetics 191 : 1355-1365. doi:10.1534/genetics.112.139451.
- Chaibub Neto E, Broman AT, Keller MP, Attie AD, Zhang B, Zhu J, Yandell BS (2013) Modeling causality for pairs of phenotypes in system genetics. Genetics 193 : 1003-1013. doi:10.1534/genetics.112.147124.
- R/qtlnet: QTL Network analysis.
- Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Annals of Applied Statistics 4: 320-339. doi:10.1214/09-AOAS288.
- R/qtl2 extensions
- R/qtl2ggplot: Visualize qtl2 objects with package ggplot2.
- R/qtl2fst: Fast access to genotype probabilities using FST Package.
- R/qtl2pattern: Routines to investigate strain distribution patterns (SDPs).
- R/qtl2mediate: QTL mediation using intermediate.
- Chaibub Neto E, Broman AT, Keller MP, Attie AD, Zhang B, Zhu J, Yandell BS (2013) Modeling causality for pairs of phenotypes in system genetics. Genetics 193 : 1003–1013. doi:10.1534/genetics.112.147124
- R/qtl2shiny: Shiny app for fine-scale analysis and visualization.
- DO Founder Studies
- R/foundr: Package to analyze and visualize Diversity Outbred (DO) founder lines by sex and condition.
- R/foundrShiny: Shiny app for foundr package.
- R/foundrHarmony: Data input and harmonization for foundr package.
- Foundr Shiny App Developer Guide
- Miscellaneous QTL Packages
- MCMC-QTL: Markov chain Monte Carlo inference for Quantitative Trait Loci. (free release in 1998; Satagopan, Yandell, Newton and Osborn 1996).
- RevJump-QTL: Bayesian model Determination of the Number of QTLs using Reversible Jump MCMC. (free release in 1999; Satagopan and Yandell 1998).
- Bmapqtl: Bayesian QTL mapping module for QTL Cartographer. (public domain release in 2001; Gaffney 2001)
- R/bim: Bayesian interval mapping R library. (free release in 2002; CRAN in 2003; Bioconductor in 2004). [deprecated]