5 Efficient workflow

Efficient programming is an important and sometimes vital skill for generating the correct result, on time. Yet coding is only one part of a wider skillset needed for successful project outcomes which involve R programming. In this context we define ‘workflow’ as the sum of practices, habits and systems that enable productivity.8 To some extent workflow is about personal preferences. Everyone’s mind works differently so the most appropriate workflow varies from person to person and from one project to the next. We recommend trying different working practices to discover which works best for you.9

There are, however, concrete steps that can be taken to improve workflow in most projects that involve R programming. Learning them will, in the long run, improve productivity and reproducibility. With these motivations in mind, the purpose of this chapter is simple: to highlight some key ingredients of an efficient R workflow. It builds on the concept of an R/RStudio project, introduced in Chapter 2, and is ordered chronologically by the stages involved in a typical project’s lifespan, from its inception to publication:

  • Project planning. This should happen before any code has been written, to avoid time wasted using poor packages or a mistaken analysis strategy.

  • Package selection. After planning your project you should identify which packages are most suitable to get the work done quickly and effectively. This matters because of the burgeoning number of packages available and because some R packages now perform better than base R for certain functions (*_join, for example, is better than merge).

  • Importing data. This can depend on external packages and can become a time-consuming computational bottleneck that prevents progress.

  • Tidying the data. This critical stage results in datasets that are convenient for analysis and processing, with implications for the efficiency of all subsequent stages (Wickham 2014b).

  • Data processing. This stage involves manipulating data to help test hypotheses. The focus is on the dplyr and data.table packages, which are designed to make this stage both fast to type and fast to process.

  • Publication. This final stage is relevant if you want your R code to be useful for others in the long term. To this end, Section 5.7 touches on documentation using knitr and on package development, a much stricter approach to code publication.
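
To make the package selection stage more concrete, the sketch below runs the same left join twice: once with base R's merge() and once with left_join() from dplyr, one of the *_join functions mentioned above. The data frames orders and customers are invented purely for illustration, and the dplyr part assumes that the package is installed. It is a minimal comparison of the two interfaces, not a benchmark.

```r
# Two small, made-up data frames used only for illustration
orders <- data.frame(
  id = c(1, 2, 3, 4),
  amount = c(10.5, 20.0, 7.25, 3.0)
)
customers <- data.frame(
  id = c(1, 2, 3),
  name = c("Ann", "Bob", "Cy")
)

# Base R: keep every row of `orders`, matching on the shared `id` column
base_result <- merge(orders, customers, by = "id", all.x = TRUE)

# dplyr equivalent; skipped if dplyr is not installed (an assumption of this sketch)
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_result <- dplyr::left_join(orders, customers, by = "id")
  # Both approaches should keep all rows of `orders`
  stopifnot(nrow(base_result) == nrow(dplyr_result))
}

print(base_result)
```

To compare speed on your own data, each call can be wrapped in system.time(). Any difference depends on the size of the data and on the package versions in use, so it is worth measuring rather than assuming.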