I'm Simon (he/him), an R developer and data scientist. I build tools for data scientists at Posit PBC (formerly RStudio).🐛
Most of the time, I'm focused on helping R users get the most out of LLMs:
- tidyverse/vitals: LLM evaluation
 - posit-dev/mcptools: Model Context Protocol (MCP) servers and clients in R
 - posit-dev/btw: easily provide context on R stuff to LLMs
 - simonpcouch/gander: high-performance, low-friction chat for data science
 - simonpcouch/chores: an extensible collection of LLM assistants
 - simonpcouch/predictive: an agentic frontend for predictive modeling with tidymodels
 - simonpcouch/kapa: RAG-based search via the kapa.ai API
 
Some LLM-related experiments that didn't end up going far:
- simonpcouch/buggy: automatically explain and address errors
 - simonpcouch/ensure: automated unit testing for R developers
 
I previously spent most of my time on R packages for statistical modeling, and still maintain a good few of those packages:
- tidymodels/broom: convert statistical analysis objects to tidy tibbles
 - tidymodels/infer: a grammar for tidy statistical inference
 - tidymodels/stacks: tidymodels-friendly model stacking and ensembling
 - tidymodels/bonsai: model wrappers for tree-based models
 - tidymodels/workflows: combine preprocessing, modeling, and postprocessing objects
 - tidymodels/tailor: postprocessing with tidymodels
 - tidymodels/workflowsets: creating collections of modeling workflows
 - rstudio/bundle: a consistent interface for model serialization
 
Related to the above packages, I've also put together some pieces of a book called Efficient Machine Learning with R.

Another part of my gig is maintaining database interfaces for R:
- r-dbi/odbc: connect to any ODBC-compliant database with DBI
 - tidyverse/dbplyr: database backend for dplyr
 
I also maintain some personal R packages that range in functionality from performance profiling to data querying to biological methods:
- extendr/mdl: performant reimagining of R model matrices, written in Rust
 - simonpcouch/syrup: profile memory and CPU usage of parallel R code
 - simonpcouch/stopwatch: high precision timings using mocking
 - simonpcouch/anyflights: query nycflights13-like data for any recent year and US airport
 - simonpcouch/gbfs: query data on public bikes from hundreds of bikeshare programs
 - simonpcouch/carpentR: predicting lake algal blooms using plankton dynamics
 - rudeboybert/forestecology: methods for model fitting and assessment in forest ecology
 - simonpcouch/detectors: prediction data from GPT detectors
 - simonpcouch/readmission: hospital readmission data for patients with type 1 diabetes
 - simonpcouch/forested: forest attributes in Washington State
 
Keep up to date with what I'm up to on my website's blog as well as the tidyverse blog.





