R for the Data Sciences

Study Groups, Radian in VS Code, and Core Workflows

Brian Yandell (byandell.github.io)

2026-06-18

COMBEE Study Groups & Syllabus

R for Teams in the Data Sciences

Overview & Purpose

Web References

Broad Syllabus Organization

Material is organized into “verb” folders representing key stages of the data science workflow:

  • curate: Manage and transform data structures.
  • visualize: Explore data relationships visually.
  • organize: Document and structure reproducible workflows.
  • analyze: Fit linear models and run statistical analyses.
  • profile: Check efficiency and test code performance.
  • connect: Bridge R with other computing environments and databases.

Study Materials & Indices

Additional resources located in the repository root directory:

Core Documentation

Background & Legacy

  • background.md: Introductory slides on R history.
  • Bates.md: Special teaching index developed by Douglas Bates.
  • data: Local sample folders.

Tip

Many thanks to Douglas Bates and the COMBEE Study Group for co-teaching and sharing the early versions of these materials in 2014–2017.

The 6 Data Science Verbs

1. Curate: Data Structures & Import

Tidyverse Curation

  • Read, manipulate, and display data summaries in concise tables.
  • Work with data frames using tidyverse tools.
  • Write cleaned-up data tables to CSV or other flat-file formats.
  • Manipulate and clean strings using character functions and regular expressions.
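As a rough illustration of the curation steps above, the sketch below reads a CSV file, summarizes it by group, writes the summary back out, and cleans a few strings with a regular expression. It assumes the dplyr and readr packages are installed. The built-in mtcars data, the temporary files, and the object names are stand-ins chosen for this example, not part of the course materials.

```r
library(dplyr)
library(readr)

# Write a built-in data set to a temporary CSV so the example is self-contained
raw_file <- tempfile(fileext = ".csv")
write_csv(mtcars, raw_file)

# Read the file back and summarize mpg by number of cylinders
cars <- read_csv(raw_file, show_col_types = FALSE)
mpg_summary <- cars |>
  group_by(cyl) |>
  summarize(n = n(), mean_mpg = mean(mpg), .groups = "drop")

# Display the summary table, then save it as a flat file
print(mpg_summary)
clean_file <- tempfile(fileext = ".csv")
write_csv(mpg_summary, clean_file)

# Clean a messy character vector: trim, replace runs of
# non-alphanumeric characters with "_", and lower-case the result
labels <- c(" Mazda RX4 ", "Datsun-710", "hornet 4 drive")
clean_labels <- tolower(gsub("[^A-Za-z0-9]+", "_", trimws(labels)))
print(clean_labels)
```

The string step uses base R only; the stringr package offers equivalent helpers if a tidyverse-only style is preferred.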

Key Repository Modules

  • tidyverse / intro_dplyr / data_tables
  • purrr: Iterate over lists and vectors (map & transpose).
  • species: Integrated dplyr, tidyr, and purrr workflow.
  • string / regex / file
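The map and transpose ideas named in the purrr module can be sketched as follows. This assumes the purrr package is installed; the split of mtcars by cylinder count and all object names are illustrative choices, not taken from the repository.

```r
library(purrr)

# Split a data frame into a named list of per-group data frames
by_cyl <- split(mtcars, mtcars$cyl)

# map_dbl() applies a function to each element and returns a numeric vector
mean_mpg <- map_dbl(by_cyl, \(d) mean(d$mpg))

# map() always returns a list; here each element is a small named list
stats <- map(by_cyl, \(d) list(n = nrow(d), max_hp = max(d$hp)))

# transpose() turns a list of (n, max_hp) pairs into a pair of lists
by_stat <- transpose(stats)
str(by_stat)
```

Recent purrr releases mark transpose() as superseded by list_transpose(); it still works, but new code may prefer the newer function.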

2. Visualize: Grammar of Graphics

Graphical Explorations

  • Understand key components of the grammar of graphics.
  • Visualize data relationships with the ggplot2 package.
  • Examine formatting/layout packages (cowplot / GGally).
  • Represent relationships among observations as connected graphs (network).
  • Share results on the web with interactive shiny apps.
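To make the grammar-of-graphics components concrete, here is a minimal sketch that builds one plot layer by layer. It assumes ggplot2 is installed; the mtcars variables and the labels are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data + aesthetic mapping
  geom_point() +                                                   # geometry layer
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +        # statistical layer
  facet_wrap(~ am) +                                               # one panel per transmission type
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```

Each `+` adds one component, so layers can be swapped or removed without touching the rest of the plot specification.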

Key Repository Modules

  • surveys_ggplot.R
  • ggplot2.Rmd
  • visualize.md
  • graphics.md
  • shiny.Rmd

3. Organize: Workflows & Databases

Structure & Reproducibility

  • Document ongoing research with R Markdown reports.
  • Collapse repeated code blocks into reusable functions.
  • Use git and GitHub for version control.
  • Organize documentation and tasks into shared packages.
  • Store and query data using SQLite databases (via DBI and dbplyr) or RDS and feather files.
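The database bullet above can be illustrated with a small sketch. It assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed; the in-memory database, the table name `cars`, and the object names are hypothetical choices for this example.

```r
library(DBI)
library(dplyr)

# Open a throwaway in-memory SQLite database and load a table into it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)

# dplyr verbs on a database table are translated to SQL by dbplyr
four_cyl <- tbl(con, "cars") |>
  filter(cyl == 4) |>
  summarize(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) |>
  collect()   # run the query and pull the result into R

dbDisconnect(con)

# For a single R object, an RDS file is a simpler alternative
rds_file <- tempfile(fileext = ".rds")
saveRDS(four_cyl, rds_file)
restored <- readRDS(rds_file)
```

Nothing is computed in R until collect() is called, which is what lets the same dplyr code work on tables too large to load in full.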

Key Repository Modules

  • Rmarkdown.md / RmarkdownExample.Rmd
  • function.Rmd
  • package.Rmd
  • github.md
  • database.md / SQLiteDplyr.Rmd
  • writing.md

4. Analyze: Models & Formulas

Statistical Foundations

  • Correlate measurements and compare across groups using linear models (lm).
  • Specify design matrices and regression variables using R formulas.
  • Organize model parameters cleanly using broom, car, or emmeans packages.
  • Study GLMs, non-linear models, and hierarchical systems genetics.
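A short sketch of how a formula drives both the model fit and the underlying model matrix, using base R only. The choice of mtcars and of the model `mpg ~ wt + cyl` is an assumption made for illustration.

```r
# Treat cylinder count as a grouping factor
cars <- transform(mtcars, cyl = factor(cyl))

# Compare mpg across cylinder groups, adjusting for weight
fit <- lm(mpg ~ wt + cyl, data = cars)

# The same formula determines the columns of the model matrix:
# an intercept, the wt slope, and indicator columns for cyl 6 and 8
X <- model.matrix(mpg ~ wt + cyl, data = cars)
print(colnames(X))

# Coefficient table and sequential ANOVA for the fitted model
print(coef(summary(fit)))
print(anova(fit))
```

Packages such as broom or emmeans build on the same fitted object, for example to return the coefficient table as a data frame or to compute adjusted group means.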

Key Repository Modules

  • linear_model.Rmd
  • Formulas.Rmd
  • correlate.Rmd
  • covary.Rmd
  • sysgen.md

5. Profile: Optimization & Simulation

Performance & Testing

  • Profile code to identify speed bottlenecks using Rprof and profvis.
  • Debug code using traceback, interactive browser sessions, and R Markdown breakpoints.
  • Simulate data to study statistical properties and model-fitting speed.
  • Create diagnostic plots to test statistical model assumptions.
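The profiling and simulation bullets can be combined in one small sketch: simulate many data sets, time the study, then profile it with Rprof. It uses base R only; the function `sim_slope()`, the sample sizes, and the true slope of 2 are hypothetical choices for this example.

```r
set.seed(2026)

# Simulate one data set and return the fitted slope (true slope is 2)
sim_slope <- function(n = 1000) {
  x <- rnorm(n)
  y <- 1 + 2 * x + rnorm(n)
  coef(lm(y ~ x))[["x"]]
}

# Time the whole simulation study
timing <- system.time(slopes <- replicate(1000, sim_slope()))
print(timing)

# Sampling distribution of the slope estimate
print(c(mean = mean(slopes), sd = sd(slopes)))

# Sampling profiler: record where time is spent, then summarize by function
prof_file <- tempfile()
Rprof(prof_file, interval = 0.01)
invisible(replicate(1000, sim_slope()))
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.self))
```

The profvis package presents the same kind of profile as an interactive flame graph, which is usually easier to read for longer scripts.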

Key Repository Modules

  • profile.Rmd / lineprof.Rmd
  • simulate.Rmd: Simulation studies (Doug Bates).
  • SimSpeed.Rmd / LmSimulation.Rmd

6. Connect: Interoperability

Beyond R

  • Use the Unix/Linux shell to search, edit, and organize projects.
  • Build automated, reproducible pipelines using GNU Make.
  • Scale up workflows and the split-apply-combine paradigm using remote clusters.
  • Create reproducible, isolated containers of software using Docker and Guix.
  • Typeset scientific articles using LaTeX.

Key Repository Modules

  • linux.md / beyondR.md
  • docker.md
  • latex.md
  • reproducible.md
  • scaling_up.md

Development Environments & Radian

Using R in VS Code

VS Code has become a highly customizable IDE for R programming, with features comparable to RStudio's.

Interactive VS Code Setup

  • Install the VS Code R extension to execute lines of code and preview datasets.
  • Set up markdown engines for inline output rendering.
  • Configure custom code-chunk snippets to speed up typing.
  • Bind keyboard shortcuts to quickly insert R Markdown/Quarto code chunks.

Key Guides

Radian: A Modern Console for R

Features of Radian

  • Radian is a modern console alternative for R.
  • Built on Python’s prompt_toolkit to provide:
    • Syntax highlighting.
    • Multi-line editing and completion.
    • Bracketed paste (pasting whole code blocks intact).
    • Searchable command history.

AI Environments Note

Radian does not always work smoothly inside terminal-based AI environments (such as Cursor or terminal agents). It can interfere with the interactive memory or command execution of these tools. If you run into problems, fall back to Posit's RStudio for debugging.

Radian VS Code Configuration

Add the following settings to your VS Code settings.json, adjusting the paths to match where radian and R are installed on your system:

// Define binary path based on OS
"r.rterm.mac": "/usr/local/bin/radian",
"r.rterm.linux": "/usr/local/bin/radian",
"r.rterm.windows": "C:\\Users\\<username>\\AppData\\Local\\Programs\\Python\\Python310\\Scripts\\radian.exe",

// Set console behaviors
"r.bracketedPaste": true,
"r.alwaysUseActiveTerminal": true, // Run code (CTRL + ENTER) in current terminal
"r.rterm.option": [
    "--no-save",
    "--no-restore",
    "--quiet",
    "--r-binary=/usr/local/bin/R" // Path to the R binary itself (not radian); adjust for your system
],
"r.sessionWatcher": true,
"r.sessionWatcher.showSavePrompt": false

In the .Rprofile in your home directory, disable completion while typing to reduce latency:

options(radian.complete_while_typing = FALSE)

Core Coding: Native vs Magrittr Pipes

Pipes: Native |> vs Magrittr %>%

Pipes chain multiple nested function calls into sequential, readable steps.

magrittr Pipe (%>%)

  • Loaded via magrittr or dplyr.
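  • Implemented as an ordinary R function (an infix operator), so each step goes through a function call at run time.
  • Allows placing the piped value anywhere using the dot . placeholder.

# magrittr style (dplyr re-exports the pipe and provides filter())
library(dplyr)
df <- data.frame(x = c(5, 15, 25))
df %>% filter(x > 10)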

Native Pipe (|>)

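  • Built into base R (version 4.1+).
  • No packages needed and no extra function-call overhead: the parser rewrites x |> f(y) into f(x, y).
  • Supports the underscore _ placeholder (R 4.2+), which must be passed as a named argument.

# Native style (filter() still comes from dplyr)
library(dplyr)
df <- data.frame(x = c(5, 15, 25))
df |> filter(x > 10)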
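The difference in how the two pipes are processed can be checked directly by inspecting parsed expressions. This sketch needs only base R 4.1 or later; `x`, `f`, and `y` are placeholder symbols that are quoted, never evaluated.

```r
# The parser rewrites the native pipe into an ordinary call
native_call <- quote(x |> f(y))
print(native_call)
print(identical(native_call, quote(f(x, y))))

# The magrittr pipe survives parsing as a call to the `%>%` function,
# which does its work later, at run time
magrittr_call <- quote(x %>% f(y))
print(as.character(magrittr_call[[1]]))
```

Because the native pipe disappears at parse time, stack traces and error messages refer only to the underlying function calls.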

Key Resource

Check the tidyverse blog post Base R (4.1+) |> vs magrittr %>% pipe for detailed performance differences.

Pipe Comparison Matrix

Feature          magrittr Pipe (%>%)                          Native Pipe (|>)
Availability     Requires library(magrittr) or the tidyverse  Built into base R since version 4.1
Execution speed  Slower (extra function-call overhead)        Faster (parsed directly into standard calls)
Placeholder      . placeholder, usable anywhere               _ placeholder (R 4.2+, named arguments only)
Syntax errors    Caught at run time                           Caught at parse time
Debugging        Deeply nested stack traces                   Simple, standard stack traces

# Placeholder comparison:
# magrittr:
library(magrittr)
mtcars %>% lm(mpg ~ cyl, data = .)

# Native R (4.2+):
mtcars |> lm(mpg ~ cyl, data = _)
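Because the native placeholder must be a named argument and may appear only once in a call, one common workaround is to pipe into an anonymous function. The sketch below assumes R 4.2 or later; the object names are illustrative.

```r
# Native placeholder: the piped value is passed by name
fit_native <- mtcars |> lm(mpg ~ cyl, data = _)

# Anonymous-function form: the piped value becomes an ordinary argument d,
# which can then be used anywhere, and more than once, inside the body
fit_lambda <- mtcars |> (\(d) lm(mpg ~ cyl, data = d))()

# Both forms fit the same model
print(all.equal(coef(fit_native), coef(fit_lambda)))
```

The anonymous-function form also works in R 4.1, which has the native pipe but not the _ placeholder.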