# 2 Efficient set-up

An efficient computer set-up is analogous to a well-tuned vehicle: its components work in harmony, it is well-serviced, and it is fast. This chapter describes the software decisions that will enable a productive workflow. Starting with the basics and moving to progressively more advanced topics, we explore how the operating system, R version, startup files and IDE can make your R work faster (though an IDE could be seen as a basic need for efficient programming). Ensuring correct configuration of these elements will have knock-on benefits in many aspects of your R workflow. That’s why we cover them at this early stage (hardware, the other fundamental consideration, is covered in the next chapter). By the end of this chapter you should understand how to set up your computer and R installation (skip to section 2.3 if R is not already installed on your computer) for optimal computational and programmer efficiency. It covers the following topics:

• R and the operating systems: system monitoring on Linux, Mac and Windows
• R version: how to keep your base R installation and packages up-to-date
• R start-up: how and why to adjust your .Rprofile and .Renviron files
• RStudio: an integrated development environment (IDE) to boost your programming productivity
• BLAS and alternative R interpreters: looks at ways to make R faster

For lazy readers, and to provide a taster of what’s to come, we begin with our ‘top 5’ tips for an efficient R set-up. It is important to understand that efficient programming is not simply the result of following a recipe of tips: understanding is vital for knowing when to use a memorised solution to a problem and when to go back to first principles. Thinking about and understanding R in depth, e.g. by reading this chapter carefully, will make efficiency second nature in your R workflow.

## 2.1 Top 5 tips for an efficient R set-up

• Use system monitoring to identify bottlenecks in your hardware/code
• Keep your R installation and packages up-to-date
• Make use of RStudio’s powerful autocompletion capabilities and shortcuts
• Store API keys in the .Renviron file
• Use BLAS if your R number crunching is too slow

## 2.2 Operating system

R works on all three consumer operating systems (OS) (Linux, Mac and Windows) as well as the server-orientated Solaris OS. R is predominantly platform-independent, meaning that it should behave in the same way on each of these platforms. This is partly facilitated by CRAN tests which ensure that R packages work on all OSs mentioned above. There are some operating system-specific quirks that may influence the choice of OS and how it is set up for R programming in the long-term. Basic system information can be queried from within R using Sys.info(), as illustrated below for a selection of its output:

Sys.info()
#R>                   sysname
#R>                   "Linux"
#R>                  release
#R>       "4.2.0-35-generic"
#R>                  machine
#R>                 "x86_64"
#R>                     user
#R>                  "robin" 

Translated into English, this means that R is running on a 64 bit (x86_64) Linux distribution (kernel version 4.2.0-35-generic) and that the current user is robin. Four other pieces of information (not shown) are also produced by the command, the meaning of which is well documented in ?Sys.info.

Pro tip. The assertive.reflection package can be used to report additional information about your computer’s operating system and R set-up with functions for asserting operating system and other system characteristics. The assert_* functions work by testing the truth of the statement and erroring if the statement is untrue. On a Linux system assert_is_linux() will run silently, whereas assert_is_solaris() will cause an error. The package can also test for the IDE you are using (e.g. assert_is_rstudio()), the capabilities of R (assert_r_has_libcurl_capability etc.), and what OS tools are available (e.g. assert_r_can_compile_code). These functions can be useful for running code that is designed only to run on one type of set-up.
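The same idea can be sketched in base R, without any additional packages. In the example below the helper stop_if_not_os() is a hypothetical name of our own (it is not part of assertive.reflection); it simply stops with an error if the current operating system is not in the list supplied.

```r
# Base R sketch: branch on the operating system without extra packages
os = Sys.info()[["sysname"]]  # e.g. "Linux", "Darwin" (Mac) or "Windows"
is_windows = .Platform$OS.type == "windows"

# Hypothetical helper: stop early if code is run on an unsupported OS
stop_if_not_os = function(allowed) {
  if (!os %in% allowed)
    stop("This code is designed for: ", paste(allowed, collapse = ", "))
  invisible(TRUE)
}

stop_if_not_os(os) # runs silently: the current OS is always in its own list
```

Calling stop_if_not_os("Linux") at the top of a script that relies on Linux-only tools would then fail early on other systems, rather than part-way through the analysis.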

### 2.2.1 Operating system and resource monitoring

Minor differences aside,1 R’s computational efficiency is broadly the same across different operating systems. This is important as it means the techniques will, in general, work equally well on different OSs. Beyond the 32 vs 64 bit issue (covered in the next chapter) and process forking (covered in Chapter 6) the main issue for many will be user friendliness and compatibility with other programs used alongside R for work. Changing operating system can be a time-consuming process so our advice is usually to stick to whatever OS you are most comfortable with.

Some packages (e.g. those that must be compiled and that depend on external libraries) are best installed at the operating system level (i.e. not using install.packages) on Linux systems. On Debian-based operating systems such as Ubuntu, these are named with the prefix r-cran- (see Section 2.4).

Regardless of your operating system, it is good practice to track how system resources (primarily CPU and RAM use) respond when running time-consuming or RAM-intensive tasks. If you only process small datasets, system monitoring may not be necessary but when handling datasets at the limits of your computer’s resources, it can be a useful tool for identifying bottlenecks, such as when you are running low on RAM. Alongside R profiling functions such as profvis (see Section XXX), system monitoring can help identify performance bottlenecks and opportunities for making tasks run faster.

A common use case for system monitoring of R processes is to identify how much RAM is being used and whether more is needed (covered in Chapter 3). System monitors also report the percentage of CPU resource allocated over time. On modern multi-threaded CPUs, many tasks will use only a fraction of the available CPU resource because R is by default a single-threaded program (see Chapter 6 on parallel programming). Monitoring CPU load in this context can be useful for identifying whether R is running in parallel (see Figure 2.1).
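Some of this information can also be queried from within R itself, which complements an external system monitor. The following is a minimal sketch using only base R and the parallel package; the object names are ours and the figures reported will differ from one machine to the next.

```r
# Number of logical cores visible to R (may be NA on some platforms)
n_cores = parallel::detectCores()

# Size of an object in RAM, in a human-readable unit
x = rnorm(1e6)
x_size = format(object.size(x), units = "MB")

# Time a task: 'elapsed' is wall-clock time, 'user.self' is CPU time used by R
timing = system.time(sort(x))

# gc() triggers garbage collection and reports the memory currently used by R
mem = gc()
```

If the elapsed time is much larger than the CPU time, the task is probably waiting on something other than the processor (e.g. disk or network access).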

System monitoring is a complex topic that spills over into system administration and server management. Fortunately there are many tools designed to ease monitoring on all major operating systems.

• On Linux, the shell command top displays key resource use figures for most distributions. htop and Gnome’s System Monitor (gnome-system-monitor, see Figure 2.1) are more refined alternatives which use command-line and graphical user interfaces respectively. A number of options such as nethogs monitor internet usage.
• On Windows the Task Manager provides key information on RAM and CPU use by process. This can be started in modern Windows versions by pressing Ctrl-Alt-Del or by right-clicking the task bar and selecting ‘Start Task Manager’.
• On Mac the Activity Monitor provides similar functionality. This can be initiated from the Utilities folder in Launchpad.

#### Exercises

1. What is the exact version of your computer’s operating system?
2. Start an activity monitor then type and execute the following code. How do the results on your system compare to those presented in Figure 2.1?

# 1: Create large dataset
X = data.frame(matrix(rnorm(1e8), nrow = 1e7))
# 2: Find the median of each column using a single core
r1 = lapply(X, median)
# 3: Find the median of each column using many cores
r2 = parallel::mclapply(X, median) # runs in serial on Windows
3. What do you notice regarding CPU usage, RAM and system time, during and after each of the three operations?
4. Bonus question: how would the results change depending on operating system?

## 2.3 R version

It is important to be aware that R is an evolving software project, whose behaviour changes over time. This applies to an even greater extent to packages, which occasionally change substantially from one release to the next. For most use cases we recommend always using the most up-to-date version of R and packages, so you have the latest code. In some circumstances (e.g. on a production server) you may alternatively want to use specific versions which have been tested, to ensure stability. Keeping packages up-to-date is desirable because new code tends to be more efficient, intuitive, robust and feature rich. This section explains how.

Previous R versions can be installed from CRAN’s archive of previous R releases. The binary versions for all OSs can be found at cran.r-project.org/bin/. To download binary versions for Ubuntu ‘Wily’, for example, see cran.r-project.org/bin/linux/ubuntu/wily/. To ‘pin’ specific versions of R packages you can use the packrat package. For more on pinning R versions and R packages see articles on RStudio’s website Using-Different-Versions-of-R and rstudio.github.io/packrat/.

### 2.3.1 Installing R

The method of installing R varies for Windows, Linux and Mac.

On Windows, a single .exe file (hosted at cran.r-project.org/bin/windows/base/) will install the base R package.

On a Mac, the latest version should be installed by downloading the .pkg files hosted at cran.r-project.org/bin/macosx/.

On Debian-based systems, R can be installed by adding the CRAN repository to your list of software sources. The following bash command will add the repository to /etc/apt/sources.list and keep your operating system updated with the latest version of R:

sudo apt-add-repository https://cran.rstudio.com/bin/linux/ubuntu

In the above code cran.rstudio.com is the ‘mirror’ from which r-base and other r- packages can be installed using the apt system. The following two commands, for example, would install the base R package (a ‘bare-bones’ install) and the package RCurl, which has an external dependency:

sudo apt-get install r-base # install base R
sudo apt-get install r-cran-rcurl # install the RCurl package

Typical output from the second command is illustrated below:

The following extra packages will be installed:
libcurl3-nss
The following NEW packages will be installed
libcurl3-nss r-cran-rcurl
0 to upgrade, 2 to newly install, 0 to remove and 16 not to upgrade.
Need to get 699 kB of archives.
After this operation, 2,132 kB of additional disk space will be used.
Do you want to continue? [Y/n]


R also works on FreeBSD and other Unix-based systems.2

Once R is installed it should be kept up-to-date.

### 2.3.2 Updating R

R is a mature and stable language so well-written code in base R should work on most versions. However, it is important to keep your R version relatively up-to-date, because:

• Bug fixes are introduced in each version, making errors less likely;
• Performance enhancements are made from one version to the next, meaning your code may run faster in later versions.

Release notes with details on each of these issues are hosted at cran.r-project.org/src/base/NEWS. R release versions have 3 components corresponding to major.minor.patch changes. Generally 2 or 3 patches are released before the next minor increment - each ‘patch’ is released roughly every 3 months. R 3.2, for example, has consisted of 3 versions: 3.2.0, 3.2.1 and 3.2.2.
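The three components of the version number can be inspected and compared from within R. The sketch below uses base R functions only; the object names v and parts are ours, and "3.2.0" is used purely as an illustrative threshold.

```r
R.version.string      # the full version string of the running R session
v = getRversion()     # the same information as a comparable version object
parts = unlist(v)     # integer vector of length 3
names(parts) = c("major", "minor", "patch")

# Version objects can be compared directly against a version string
recent_enough = v >= "3.2.0"

# Version comparison is numeric, component by component, not alphabetical
numeric_version("3.2.10") > numeric_version("3.2.9")
```

A check such as recent_enough can be placed at the top of a script that depends on features introduced in a particular release.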

• On Ubuntu-based systems, new versions of R should be automatically detected through the software management system, and can be installed with apt-get upgrade.
• On Mac, the latest version should be installed by the user from the .pkg files mentioned above.
• On Windows, the installr package makes updating easy:

# check and install the latest R version
installr::updateR() 

For information about changes to expect in the next version, you can subscribe to R’s NEWS RSS feed: developer.r-project.org/blosxom.cgi/R-devel/NEWS/index.rss. It’s a good way of keeping up-to-date.

### 2.3.3 Installing R packages

Large projects may need several packages to be installed. In this case, the required packages can be installed at once. Using the example of packages for handling spatial data, this can be done quickly and concisely with the following code:

pkgs = c("raster", "leaflet", "rgeos") # package names
install.packages(pkgs)

In the above code all the required packages are installed in two lines rather than three, reducing typing. Note that we can now re-use the pkgs object to load them all:

inst = lapply(pkgs, library, character.only = TRUE) # load them

In the above code library(pkgs[i]) is executed for every package stored in the text string vector. We use library here instead of require because the former produces an error if the package is not available.

Loading all packages at the beginning of a script is good practice as it ensures all dependencies have been installed before time is spent executing code. Storing package names in a character vector object such as pkgs is also useful because it allows us to refer back to them again and again.
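Storing the names in a vector also makes it easy to check which packages still need to be installed before loading them. The following is a minimal sketch of that check; the object names installed and missing_pkgs are ours, and the install.packages() call is commented out so that nothing is downloaded unless you choose to.

```r
pkgs = c("raster", "leaflet", "rgeos") # package names, as above
installed = rownames(installed.packages())
missing_pkgs = setdiff(pkgs, installed) # requested but not yet installed

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs) # uncomment to install them
}
```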

### 2.3.4 Installing R packages with dependencies

Some packages have external dependencies (i.e. they call libraries outside R). On Unix-like systems, these are best installed onto the operating system, bypassing install.packages. This will ensure the necessary dependencies are installed and set up correctly alongside the R package. On Debian-based distributions such as Ubuntu, for example, packages with names starting with r-cran- can be searched for and installed as follows (see cran.r-project.org/bin/linux/ubuntu/ for a list of these):

apt-cache search r-cran- # search for available cran Debian packages
sudo apt-get install r-cran-rgdal # install the rgdal package (with dependencies)

On Windows the installr package helps manage and update R packages with system-level dependencies. For example the Rtools package for compiling C/C++ code on Windows can be installed with the following command:

installr::install.rtools() 

### 2.3.5 Updating R packages

An efficient R set-up will contain up-to-date packages. This can be done for all packages with:

update.packages() # update installed CRAN packages

The default for this function is for the ask argument to be set to TRUE, giving control over what is downloaded onto your system. This is generally desirable as updating dozens of large packages can consume a large proportion of available system resources.

To update packages automatically, you can add the line update.packages(ask = FALSE) to your .Rprofile startup file (see the next section for more on .Rprofile). Thanks to Richard Cotton for this tip.

An even more interactive method for updating packages in R is provided by RStudio via Tools > Check for Package Updates. Many such time saving tricks are enabled by RStudio, as described in a subsequent section. Next (after the exercises) we take a look at how to configure R using start-up files.

#### Exercises

1. What version of R are you using? Is it the most up-to-date?
2. Do any of your packages need updating?

## 2.4 R startup

Every time R starts a number of things happen. It can be useful to understand this startup process, so you can make R work the way you want it, fast. This section explains how.

### 2.4.1 R startup arguments

The arguments passed to the R startup command (typically simply R from a shell environment) determine what happens. The following arguments are particularly important from an efficiency perspective:

• --no-environ tells R not to read any .Renviron files to set environment variables. (Do not worry if you don’t understand what this means at present: it will become clear later in the section.)

• --no-restore tells R not to load any .RData files knocking around in the current working directory.

• --no-save tells R not to ask the user if they want to save objects saved in RAM when the session is ended with q().

Adding each of these will make R load slightly faster, and mean that slightly less user input is needed when you quit. R’s default setting of loading data from the last session automatically is potentially problematic in this context. See An Introduction to R, Appendix B, for more startup arguments.

Some of R’s startup arguments can be controlled interactively in RStudio. See the online help file Customizing RStudio for more on this.

### 2.4.2 An overview of R’s startup files

There are two special files, .Renviron and .Rprofile, which determine how R performs for the duration of the session. R searches for and acts on both files by default every time it starts. To turn off this default behaviour, R can be launched with the command line options --no-environ and --no-init-file respectively. Thus to launch R without your usual .Rprofile settings (e.g. to ensure that your personal settings are not responsible for an error) from a shell, one would enter the following:

R --no-init-file

But what do these mysterious files do? Their purpose is summarised in the bullet points below and the rest of this chapter is dedicated to demystifying them.

• The primary purpose of .Renviron is to set environment variables. These are settings that relate to the operating system, telling R where to find external programs, and the contents of user-specific variables that other users should not have access to, such as API keys: small text strings used to verify the user when interacting with web services.

• .Rprofile is a plain text file (which is always called .Rprofile, hence its name) that simply runs lines of R code every time R starts. If you want R to check for package updates each time it starts (as explained in the previous section), you simply add the relevant line somewhere in this file.

When R starts (unless it was launched with --no-environ) it first searches for .Renviron and then .Rprofile, in that order. Although .Renviron is searched for first, we will look at .Rprofile first as it is simpler and for many set-up tasks more frequently useful. Both files can exist in three directories on your computer.

Modification of R’s startup files should not be taken lightly. This is an advanced topic. If you modify your startup files in the wrong way, it can cause problems. We recommend proceeding with caution and taking a conservative approach to modifying your startup files so that your R installation behaves similarly to other ‘vanilla’ R installations, aiding reproducibility.

### 2.4.3 The location of startup files

Confusingly, multiple versions of these files can exist on the same computer, only one of which will be used per session. Note also that these files should only be changed with caution and if you know what you are doing. This is because they can make your R version behave differently to other R installations, potentially reducing the reproducibility of your code.

Files in three folders are important in this process:

• R_HOME, the directory in which R is installed. The etc sub-directory can contain start-up files read early on in the start-up process. Find out where your R_HOME is with the R.home() command.

• HOME, the user’s home directory. Typically this is /home/username on Unix machines or C:\Users\username on Windows (since Windows 7). Ask R where your home directory is with Sys.getenv("HOME").

• R’s current working directory. This is reported by getwd().

It is important to know the location of the .Rprofile and .Renviron set-up files that are being used out of these three options. R only uses one .Rprofile and one .Renviron in any session: if you have a .Rprofile file in your current project, R will ignore .Rprofile in R_HOME and HOME. Likewise, .Rprofile in HOME overrides .Rprofile in R_HOME. The same applies to .Renviron: you should remember that adding project specific environment variables with .Renviron will de-activate other .Renviron files.
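The search order for the user-level .Rprofile described above can be sketched as a small function. Note that find_rprofile() is a hypothetical helper of our own, not part of R, and it is deliberately simplified: it only looks in the working directory and HOME, and ignores the site-wide file in R_HOME and any environment variables that may redirect R to a different file.

```r
# Simplified sketch: return the first .Rprofile found, project folder first
find_rprofile = function(dirs = c(getwd(), Sys.getenv("HOME"))) {
  candidates = file.path(dirs, ".Rprofile")
  found = candidates[file.exists(candidates)]
  if (length(found) == 0) NA_character_ else found[1]
}

find_rprofile() # path of the file found, or NA if there is none
```

Because the directories are searched in the order given, a file in the project folder is reported in preference to one in the home directory.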

To create a project-specific start-up script, simply create a .Rprofile file in the project’s root directory and start adding R code, e.g. via file.edit(".Rprofile"). Remember that this will make .Rprofile in the home directory be ignored. The following commands will open your .Rprofile from within an R editor:

file.edit("~/.Rprofile") # edit .Rprofile in HOME
file.edit(".Rprofile") # edit project specific .Rprofile

Warning: file paths provided by Windows operating systems will not always work in R. Specifically, if you use a path that contains single backslashes, such as C:\DATA\data.csv, as provided by Windows, this will generate the error: Error: unexpected input in "C:\". To overcome this issue R provides two functions, file.path and normalizePath. The former can be used to specify file locations without having to type the path separator yourself, as follows: file.path("C:", "DATA", "data.csv"). The latter takes any input string for a file name and outputs a text string that is standard (canonical) for the operating system. normalizePath("C:/DATA/data.csv"), for example, outputs C:\DATA\data.csv on a Windows machine but C:/DATA/data.csv on Unix-based platforms. Note that only the latter would work on both platforms so standard Unix file path notation is safe for all operating systems.
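A short sketch of these path-handling functions is given below. The path C:/DATA/data.csv is the illustrative one used above and does not need to exist for the code to run; the object names are ours.

```r
# Build a path from its components: "/" is used as the separator
p = file.path("C:", "DATA", "data.csv")
p            # "C:/DATA/data.csv"
basename(p)  # "data.csv"
dirname(p)   # "C:/DATA"

# A backslash must be doubled inside an R string; here a Windows-style
# path is converted to the forward-slash form that works on all platforms
win_path = "C:\\DATA\\data.csv"
unix_style = gsub("\\\\", "/", win_path)
```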

Editing the .Renviron file in the same locations will have the same effect. The following code will create a user specific .Renviron file (where API keys and other cross-project environment variables can be stored), without overwriting any existing file.

user_renviron = path.expand(file.path("~", ".Renviron"))
if(!file.exists(user_renviron)) # check to see if the file already exists
  file.create(user_renviron)
file.edit(user_renviron) # open with another text editor if this fails
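Once a variable has been added to .Renviron it can be read in any R session with Sys.getenv(). The sketch below simulates this with Sys.setenv() so that it runs without editing any files; MY_API_KEY and its value are hypothetical names chosen for illustration.

```r
# Simulate the effect of a line such as MY_API_KEY=abc123 in .Renviron
Sys.setenv(MY_API_KEY = "abc123")
key = Sys.getenv("MY_API_KEY")

# Unset variables return "" by default; unset = NA makes them easier to detect
missing_key = Sys.getenv("A_VARIABLE_THAT_IS_NOT_SET", unset = NA)
if (is.na(missing_key))
  message("Variable not set: check your .Renviron")
```

Reading the key with Sys.getenv() means the key itself never needs to appear in your scripts, which matters if the code is shared.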

The pathological package can help find where .Rprofile and .Renviron files are located on your system, thanks to the os_path() function. The output of example(startup) is also instructive.

The location, contents and uses of each is outlined in more detail below.

### 2.4.4 The .Rprofile file

By default, R looks for and runs .Rprofile files in the three locations described above, in a specific order. .Rprofile files are simply R scripts that run each time R runs and they can be found within R_HOME, HOME and the project’s home directory, found with getwd(). To check if you have a site-wide .Rprofile, which will run for all users on start-up, run:

site_path = R.home(component = "home")
fname = file.path(site_path, "etc", "Rprofile.site")
file.exists(fname)

The above code checks for the presence of Rprofile.site in that directory. As outlined above, the .Rprofile located in your home directory is user-specific. Again, we can test whether this file exists using

file.exists("~/.Rprofile")

We can use R to create and edit .Rprofile (warning: do not overwrite your previous .Rprofile - we suggest you try project-specific .Rprofile first):

if(!file.exists("~/.Rprofile")) # only create if not already there
  file.create("~/.Rprofile")    # (don't overwrite it)
file.edit("~/.Rprofile")

### 2.4.5 An example .Rprofile file

The example below provides a taster of what goes into .Rprofile. Note that this is simply a usual R script, but with an unusual name. The best way to understand what is going on is to create this same script, save it as .Rprofile in your current working directory and then restart your R session to observe what changes. To restart your R session from within RStudio you can click Session > Restart R or use the keyboard shortcut Ctrl+Shift+F10.

# A fun welcome message
message("Hi Robin, welcome to R")
# Customise the R prompt that prefixes every command
# (use " " for a blank prompt)
options(prompt = "R4geo> ")
# Don't convert text strings to factors with base read functions
options(stringsAsFactors = FALSE)

To quickly explain each line of code: the first simply prints a message in the console each time a new R session is started. The latter two modify options used to change R’s behaviour, first to change the prompt in the console (set to > by default) and second to ensure that unwanted factor variables are not created when read.csv and other functions derived from read.table are used to load external data into R. Note that simply adding more lines to the .Rprofile will set more features. An important aspect of .Rprofile (and .Renviron) is that each line is run once and only once for each R session. That means that the options set within .Rprofile can easily be changed during the session. The following command run mid-session, for example, will return the default prompt:

options(prompt = "> ")

More details on these, and other potentially useful .Rprofile options are described subsequently. For more suggestions of useful startup settings, see Examples in help("Startup") and online resources such as those at statmethods.net. The help pages for R options (accessible with ?options) are also worth a read before writing your own .Rprofile.

Ever been frustrated by unwanted + symbols that prevent copied and pasted multi-line functions from working? These potentially annoying +s can be eradicated by adding options(continue = " ") to your .Rprofile.

#### 2.4.5.1 Setting options

The function options, used above, contains a number of default settings. Typing options() provides a good indication of what can be configured. Because options() are often related to personal preference (with few implications for reproducibility), and are settings that you will want for many of your R sessions, .Rprofile in your home directory or in your project’s folder are sensible places to set them. Other illustrative options are shown below:

options(prompt = "R> ", digits = 4, show.signif.stars = FALSE)

This changes three default options in a single line.

• The R prompt, from the boring > to the exciting R>.
• The number of digits displayed.
• Removing the stars after significant $$p$$-values.

Try to avoid adding options to the start-up file that make your code non-portable. Adding the stringsAsFactors = FALSE option used above to your start-up script, for example, has knock-on effects for read.table and related functions including read.csv, making them convert text strings into characters rather than into factors as is the default. This may be useful for you, but can make your code less portable, so be warned.
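One way to limit such side effects is to change an option only for as long as it is needed and then restore the previous value. The sketch below relies on the fact that options() returns the old values of the options it changes; digits is used purely as an example and the object names are ours.

```r
old = options(digits = 4) # set a new value, keeping the previous one in 'old'
getOption("digits")       # 4
fmt = format(pi)          # "3.142": formatted with 4 significant digits
options(old)              # restore whatever was set before
```

Used inside a script, this pattern gives the same output on any machine, regardless of what is set in the user’s .Rprofile.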

#### 2.4.5.2 Setting the CRAN mirror

To avoid setting the CRAN mirror each time you run install.packages you can permanently set the mirror in your .Rprofile.

# local creates a new, empty environment
# This avoids polluting .GlobalEnv with the object r
local({
  r = getOption("repos")
  r["CRAN"] = "https://cran.rstudio.com/"
  options(repos = r)
})

The RStudio mirror is a virtual machine run by Amazon’s EC2 service, and it syncs with the main CRAN mirror in Austria once per day. Since RStudio is using Amazon’s CloudFront, the repository is automatically distributed around the world, so no matter where you are in the world, the data doesn’t need to travel very far, and is therefore fast to download.

#### 2.4.5.3 The fortunes package

This section illustrates what .Rprofile does with reference to a package that was developed for fun. The code below could easily be altered to automatically connect to a database, or ensure that the latest packages have been downloaded.

The fortunes package contains a number of memorable quotes that the community has collected over many years, called R fortunes. Each fortune has a number. To get fortune number 50, for example, enter

fortunes::fortune(50)
#>
#> To paraphrase provocatively, 'machine learning is statistics minus any
#> checking of models and assumptions'.
#>    -- Brian D. Ripley (about the difference between machine learning and
#>       statistics)
#>       useR! 2004, Vienna (May 2004)

It is easy to make R print out one of these nuggets of truth each time you start a session, by adding the following to ~/.Rprofile:

if(interactive())
  try(fortunes::fortune(), silent = TRUE)

The interactive function tests whether R is being used interactively in a terminal. The fortune function is called within try. If the fortunes package is not available, we avoid raising an error and move on. By using :: we avoid adding the fortunes package to our list of attached packages.

Typing search() gives the list of attached packages. By using fortunes::fortune() we avoid adding the fortunes package to that list.

The function .Last, if it exists in the .Rprofile, is always run at the end of the session. We can use it to install the fortunes package if needed. To load the package, we use require, since if the package isn’t installed, the require function returns FALSE and raises a warning.

.Last = function() {
  cond = suppressWarnings(!require(fortunes, quietly = TRUE))
  if(cond)
    try(install.packages("fortunes"), silent = TRUE)
  message("Goodbye at ", date(), "\n")
}

#### 2.4.5.4 Useful functions

You can use .Rprofile to define new ‘helper’ functions or redefine existing ones so they’re faster to type. For example, we could load the following two functions for examining data frames:

# ht == headtail
ht = function(d, n=6) rbind(head(d, n), tail(d, n))
# Show the first 5 rows & first 5 columns of a data frame
hh = function(d) d[1:5, 1:5]

and a function for setting a nice plotting window:

setnicepar = function(mar = c(3, 3, 2, 1), mgp = c(2, 0.4, 0),
                      tck = -0.01, cex.axis = 0.9,
                      las = 1, mfrow = c(1, 1), ...) {
  par(mar = mar, mgp = mgp, tck = tck, cex.axis = cex.axis,
      las = las, mfrow = mfrow, ...)
}

Note that these functions are for personal use and are unlikely to interfere with code from other people. For this reason even if you use a certain package every day, we don’t recommend loading it in your .Rprofile. Shortening long function names can be useful for interactive (but not reproducible) code writing. If you frequently use View(), for example, you may be able to save time by referring to it in abbreviated form. This is illustrated below to make it faster to view datasets (although with IDE-driven autocompletion, outlined in the next section, the time saving is smaller).

v = utils::View

Also beware the dangers of loading many functions by default: it may make your code less portable. Another potentially useful setting to change in .Rprofile is R’s current working directory. If you want R to automatically set the working directory to the R folder of your project, for example, one would add the following line of code to the project-specific .Rprofile:

setwd("R")

#### 2.4.5.5 Creating hidden environments with .Rprofile

Beyond making your code less portable, another downside of putting functions in your .Rprofile is that it can clutter up your work space: when you run the ls() command, your .Rprofile functions will appear. Also if you run rm(list=ls()), your functions will be deleted. One neat trick to overcome this issue is to use hidden objects and environments. When an object name starts with ., by default it doesn’t appear in the output of the ls() function:

.obj = 1
".obj" %in% ls()
#> [1] FALSE

This concept also works with environments. In the .Rprofile file we can create a hidden environment

.env = new.env()

and then add functions to this environment
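A minimal sketch of how this could look is given below, assuming the ht() helper defined earlier in this section. Attaching the environment is one way (an assumption on our part) of making its functions available by name without copying them into the workspace.

```r
.env = new.env()

# Add the ht() helper to the hidden environment
.env$ht = function(d, n = 6) rbind(head(d, n), tail(d, n))

# Put the environment on the search path so ht() can be called directly
attach(.env)

ht(mtcars, n = 2) # first and last two rows of a built-in dataset
```

Because ht() lives in the attached environment rather than in the workspace, it does not appear in the output of ls() and is not removed by rm(list = ls()).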

USArrests$# a dropdown menu of columns should appear in RStudio To take a more complex example, variable names stored in the data slot of the class SpatialPolygonsDataFrame (a class defined by the foundational spatial package sp) are referred to in the long form spdf@data$varname.5 In this case spdf is the object name, data is the slot and varname is the variable name. RStudio makes such S4 objects easier to use by enabling autocompletion of the short form spdf$varname. Another example is RStudio’s ability to find files hidden away in sub-folders. Typing "te will find test.R even if it is located in a sub-folder such as R/test.R. There are a number of other clever autocompletion tricks that can boost R’s productivity when using RStudio which are best found by experimenting and hitting Tab frequently during your R programming work. ### 2.5.5 Keyboard shortcuts RStudio has many useful shortcuts that can help make your programming more efficient by reducing the need to reach for the mouse and point and click your way around code and RStudio. These can be viewed by using a little known but extremely useful keyboard shortcut (this can also be accessed via the Tools menu). Alt+Shift+K This will display the default shortcuts in RStudio. It is worth spending time identifying which of these could be useful in your work and practising interacting with RStudio rapidly with minimal reliance on the mouse. The power of these autocompletion capabilities can be further enhanced by setting your own keyboard shortcuts. However, as with setting .Rprofile and .Renviron settings, this risks reducing the portability of your workflow. To set your own RStudio keyboard shortcuts, navigate to Tools > Modify Keyboard Shortcuts. Some more useful shortcuts are listed below. There are many more gems to find that could boost your R writing productivity: • Ctrl+Z/Shift+Z: Undo/Redo. • Ctrl+Enter: Execute the current line or code selection in the Source pane. 
• Ctrl+Alt+R: Execute all the R code in the currently open file in the Source pane.
• Ctrl+Left/Right: Navigate code quickly, word by word.
• Home/End: Navigate to the beginning/end of the current line.
• Alt+Shift+Up/Down: Duplicate the current line up or down.
• Ctrl+D: Delete the current line.

### 2.5.6 Object display and output table

It is useful to know what is in your current R environment. This information can be revealed with ls(), but this function only provides object names. RStudio provides an efficient mechanism to show currently loaded objects, and their details, in real-time: the Environment tab in the top right corner. It makes sense to keep an eye on which objects are loaded and to delete objects that are no longer useful. Doing so will minimise the probability of confusion in your workflow (e.g. by using the wrong version of an object) and reduce the amount of RAM R needs. The details provided in the Environment tab include the object’s dimension and some additional details depending on the object’s class (e.g. size in MB for large datasets).

A very useful feature of RStudio is its advanced viewing functionality. This is triggered either by executing View(object) or by double-clicking on the object name in the Environment tab. Although you cannot edit data in the Viewer (this should be considered a good thing from a data integrity perspective), recent versions of RStudio provide an efficient search mechanism to rapidly filter and view the records that are of most interest (see Figure 2-3 above).

### 2.5.7 Project management

In the far top-right of RStudio there is a diminutive drop-down menu illustrated with R inside a transparent box. This menu may be small and simple, but it is hugely efficient in terms of organising large, complex and long-term projects.

The idea of RStudio projects is that the bulk of R programming work is part of a wider task, which will likely consist of input data, R code, graphical and numerical outputs and documents describing the work. It is possible to scatter each of these elements at random across your hard disks but this is not recommended. Instead, the concept of projects encourages reproducible working, such that anyone who opens the particular project folder that you are working from should be able to repeat your analyses and replicate your results.

It is therefore highly recommended that you use projects to organise your work. It could save hours in the long run. Organising data, code and outputs also makes sense from a portability perspective: if you copy the folder (e.g. via GitHub) you can work on it from any computer without worrying about having the right files on your current machine. These tasks are implemented using RStudio’s simple project system, in which the following things happen each time you open an existing project:

• The working directory automatically switches to the project’s folder. This enables data and script files to be referred to using relative file paths, which are much shorter than absolute file paths. This means that switching directory using setwd(), a common source of error for R users, is rarely if ever needed.
• The most recently open file is loaded into the Source pane. The history of R commands executed in previous sessions is also loaded into the History tab. This assists with continuity between one session and the next.
• The Files tab displays the associated files and folders in the project, allowing you to quickly find your previous work.
• Any settings associated with the project, such as Git settings, are loaded. This assists with collaboration and project-specific set-up.

Each project is different but most contain input data, R code and outputs.
To keep things tidy, we recommend a sub-directory structure resembling the following:

    project/
      - README.Rmd # Project description
      - set-up.R   # Required packages
      - R/         # For R code
      - input/     # Data files
      - graphics/
      - output/    # Results

Proper use of projects ensures that all R source files are neatly stashed in one folder with a meaningful structure. This way data and documentation can be found where one would expect them. Under this system figures and project outputs are ‘first class citizens’ within the project’s design, each with their own folder.

Another approach to project management is to treat projects as R packages. This is not recommended for most use cases, as it places restrictions on where you can put files. However, if the aim is code development and sharing, creating a small R package may be the way forward, even if you never intend to submit it to CRAN. Creating R packages is easier than ever before, as documented in Cotton (2013) and, more recently, H. Wickham (2015c). The devtools package helps manage R’s quirks, making the process much less painful. If you use GitHub, the advantage of this approach is that anyone should be able to reproduce your work using devtools::install_github("username/projectname"), although the administrative overheads of creating an entire package for each small project will outweigh the benefits for many.

Note that a set-up.R or even a .Rprofile file in the project’s root directory enables project-specific settings to be loaded each time people work on the project. As described in the previous section, .Rprofile can be used to tweak how R works at start-up. It is also a portable way to manage R’s configuration on a project-by-project basis.

### Exercises

1. Try modifying the look and appearance of your RStudio setup.
2. What is the keyboard shortcut to show the other shortcuts? (Hint: it begins with Alt+Shift on Linux and Windows.)
3. Try as many of the shortcuts revealed by the previous step as you like.
Write down the ones that you think will save you time, perhaps on a post-it note to go on your computer.

## 2.6 BLAS and alternative R interpreters

In this section we cover a few system-level options available to speed up R’s performance. Note that for many applications stability rather than speed is a priority, so these should only be considered if a) you have exhausted options for writing your R code more efficiently and b) you are confident tweaking system-level settings. This should therefore be seen as an advanced section: if you are not interested in speeding up base R, feel free to skip to the next chapter, on hardware.

Many statistical algorithms manipulate matrices. R uses the Basic Linear Algebra Subprograms (BLAS) framework for linear algebra operations. Whenever we carry out a matrix operation, such as transpose or finding the inverse, we use the underlying BLAS library. By switching to a different BLAS library, it may be possible to speed up your R code. Changing your BLAS library is straightforward if you are using Linux, but can be tricky for Windows users.

The two open source alternative BLAS libraries are ATLAS and OpenBLAS. The Intel MKL is another implementation, designed for Intel processors by Intel and used in Revolution R (described in the next section) but it requires licensing fees. The MKL library is provided with the Revolution analytics system. Depending on your application, switching your BLAS library can make linear algebra operations run several times faster than with the base BLAS routines.
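
To get a feel for which operations a faster BLAS would affect, it can help to time a couple of matrix operations on your own machine before and after switching library. The following is a minimal sketch using only base R; the matrix size n is an arbitrary choice for illustration, and the timings will vary from computer to computer.

```r
# Minimal sketch: time two matrix operations that rely on BLAS/LAPACK.
# The size n is a hypothetical choice; increase it for a tougher test.
set.seed(1)
n = 500
m = matrix(rnorm(n * n), nrow = n)

# crossprod(m) computes t(m) %*% m.
# Note: <- is needed inside system.time(), because = would be
# interpreted as naming a function argument.
t_cross = system.time(cp <- crossprod(m))[["elapsed"]]

# solve(m) computes the inverse of m.
t_solve = system.time(inv <- solve(m))[["elapsed"]]

# Elapsed times in seconds
print(c(crossprod = t_cross, solve = t_solve))
```

Running the same script again after changing the BLAS library gives a rough, machine-specific indication of the gain for this kind of work.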

If you use Linux, you can find out which BLAS library R is set up to use with the following function, from benchmarkme:

    library("benchmarkme")
    get_linear_algebra()

### 2.6.1 Testing performance gains from BLAS

As an illustrative test of the performance gains offered by BLAS, the following test was run on a new laptop running Ubuntu 15.10 on a 6th generation Core i7 processor, before and after OpenBLAS was installed (see footnote 6).

    res = benchmark_std() # run a suite of tests to test R's performance

It was found that the installation of OpenBLAS led to a 2-fold speed-up (from around 150 to 70 seconds). The majority of the speed gain was from the matrix algebra tests, as can be seen in Figure 2.4. Note that the results of such tests are highly dependent on the particularities of each computer. However, it clearly shows that ‘programming’ benchmarks (e.g. the calculation of 3,500,000 Fibonacci numbers) are not much faster, whereas matrix calculations and functions receive a substantial speed boost. This demonstrates that the speed-up you can expect from BLAS depends heavily on the type of computations you are undertaking.

### 2.6.2 Other interpreters

The R language can be separated from the R interpreter. The former refers to the meaning of R commands, the latter refers to how the computer executes the commands. Alternative interpreters have been developed to try to make R faster and, while promising, none of the following options has fully taken off.

• Microsoft R Open, formerly known as Revolution R Open (RRO), is the enhanced distribution of R from Microsoft. The key enhancement is that it uses multithreaded mathematics libraries, which can improve performance.
• Rho (previously called CXXR, short for C++), a re-implementation of the R interpreter for speed and efficiency. Of the new interpreters, this is the one that has the most recent development activity (as of April 2016).
• pqR (pretty quick R) is a new version of the R interpreter.
One major downside is that it is based on R-2.15.0. The developer (Radford Neal) has made many improvements, some of which have now been incorporated into base R. pqR is an open-source project licensed under the GPL. One notable improvement in pqR is that it is able to do some numeric computations in parallel with each other, and with other operations of the interpreter, on systems with multiple processors or processor cores.
• Renjin reimplements the R interpreter in Java, so it can run on the Java Virtual Machine (JVM). Because the interpreter is pure Java, it can run anywhere a JVM is available.
• Tibco created a C++ based interpreter called TERR.
• Oracle also offer an R interpreter that uses Intel’s mathematics library and therefore achieves a higher performance without changing R’s core.

At the time of writing, switching interpreters is something to consider carefully. But in the future, it may become more routine.

### 2.6.3 Useful BLAS/benchmarking resources

• The gcbd package benchmarks performance of a few standard linear algebra operations across a number of different BLAS libraries as well as a GPU implementation. It has an excellent vignette summarising the results.
• Brett Klamer provides a nice comparison of ATLAS, OpenBLAS and Intel MKL BLAS libraries. He also gives a description of how to install the different libraries.
• The official R manual section on BLAS.

### Exercises

1. What BLAS system is your version of R using?

### References

Cotton, Richard. 2013. Learning R. O’Reilly Media.

Wickham, Hadley. 2015c. R Packages. O’Reilly Media.

1. Benchmarking conducted for a presentation “R on Different Platforms” at useR 2006 found that R was marginally faster on Windows than Linux set-ups. Similar results were reported in an academic paper, with R completing statistical analyses faster on Linux than on Mac OS (Sekhon 2006). In 2015 Revolution R supported these results with slightly faster run times for certain benchmarks on Ubuntu than Mac systems.
The data from the benchmarkme package also suggests that running code under the Linux OS is faster.

2. See jason-french.com/blog/2013/03/11/installing-r-in-linux/ for more information on installing R on a variety of Linux distributions.

3. See vignette("api-packages") from the httr package for more on this.

4. Other open source R IDEs exist, including RKWard, Tinn-R and JGR. Emacs is another popular software environment. However, it has a very steep learning curve.

5. ‘Slots’ are elements of an object (specifically, S4 objects) analogous to a column in a data.frame but referred to with @ not $.

6. OpenBLAS was installed on the computer via sudo apt-get install libopenblas-base, after which it was automatically detected and used by R.