3.4 RStudio

RStudio is an Integrated Development Environment (IDE) for R. It makes life easy for R users and developers with its intuitive and flexible interface. RStudio encourages good programming practice. Through its wide range of features RStudio can help make you a more efficient and productive R programmer, for example by reducing the amount of time spent remembering and typing function names thanks to intelligent autocompletion. Some of the most important features of RStudio include:

  • Flexible window pane layouts to optimise use of screen space and enable fast interactive visual feedback.
  • Intelligent auto-completion of function names, packages and R objects.
  • A wide range of keyboard shortcuts.
  • Visual display of objects, including a searchable data display table.
  • Real-time code checking and error detection.
  • Menus to install and update packages.
  • Project management and integration with version control.

The above list of features should make it clear that a well set-up IDE can be as important as a well set-up R installation for becoming an efficient R programmer. As with R itself, the best way to learn about RStudio is by using it. It is therefore worth reading through this section in parallel with using RStudio to boost your productivity.

3.4.1 Installing and updating RStudio

RStudio can be installed from the RStudio website rstudio.com and is available for all major operating systems. Updating RStudio is simple: click on Help > Check for Updates in the menu. For fast and efficient work, keyboard shortcuts should be used wherever possible, reducing the reliance on the mouse. RStudio has many keyboard shortcuts that will help with this. To get into good habits early, try accessing the RStudio Update interface without touching the mouse. On Linux and Windows, dropdown menus are activated with the Alt key, so the menu item can be found with:

Alt+H U

On Mac it works differently. Cmd+? should activate a search across menu items, allowing the same operation to be achieved with:

Cmd+? update

Note: in RStudio the keyboard shortcuts differ between Linux and Windows versions on one hand and Mac on the other. In this section we generally only use the Windows/Linux shortcut keys for brevity. The Mac equivalent is usually found by simply replacing Ctl and Alt with the Mac-specific Cmd button.

3.4.2 Window pane layout

RStudio has four main window ‘panes’ (see Figure 3.2), each of which serves a range of purposes:

  • The Source pane, for editing, saving, and dispatching R code to the console (top left). Note that this pane does not exist by default when you start RStudio: it appears when you open an R script, e.g. via File > New File > R Script. A common task in this pane is to send code on the current line to the console, via Ctl+Enter (or Cmd+Enter on Mac).

  • The Console pane. Any code entered here is processed by R, line by line. This pane is ideal for interactively testing ideas before saving the final results in the Source pane above.

  • The Environment pane (top right) contains information about the objects currently loaded in the workspace, including their class, dimension (if they are a data frame) and name. This pane also contains tabbed sub-panes with a searchable history of the commands dispatched to the console and (if applicable to the project) Build and Git options.

  • The Files pane (bottom right) contains a simple file browser, a Plots tab, Help and Packages tabs and a Viewer for visualising interactive R output, such as that produced by the leaflet package and HTML ‘widgets’.

Figure 3.2: RStudio Panels

Using each of the panels effectively and navigating between them quickly is a skill that will develop over time, and will only improve with practice.

3.4.3 Exercises

You are developing a project to visualise data. Test out the multi-panel RStudio workflow by following the steps below:

  1. Create a new folder for the input data using the Files pane.

  2. Type in downl in the Source pane and hit Enter to make the function download.file() autocomplete. Then type ", which will autocomplete to "", paste the URL of a file to download (e.g. https://www.census.gov/2010census/csv/pop_change.csv) and a file name (e.g. pop_change.csv).

  3. Execute the full command with Ctl+Enter:

    download.file("https://www.census.gov/2010census/csv/pop_change.csv",
                  "data/pop_change.csv")
  4. Write and execute a command to read-in the data, such as

    pop_change = read.csv("data/pop_change.csv", skip = 2)
  5. Use the Environment pane to click on the data object pop_change. Note that this runs the command View(pop_change), which launches an interactive data explorer pane in the top left panel (see Figure 3.3).

    Figure 3.3: The data viewing tab in RStudio.

  6. Use the Console to test different plot commands to visualise the data, saving the code you want to keep back into the Source pane, as pop_change.R.

  7. Use the Plots tab in the Files pane to scroll through past plots. Save the best using the Export dropdown button.
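
Steps 1 to 3 can also be wrapped in a small script so that re-running it does not download the file again. The following is a minimal sketch, not part of the exercise itself: the helper name fetch_if_missing() is hypothetical, and it assumes the input data folder is called data, as in the commands above.

```r
# Hypothetical helper: create the target folder if needed and download
# the file only when it is not already present on disk
fetch_if_missing = function(url, dest_file) {
  dir.create(dirname(dest_file), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(dest_file)) {
    download.file(url, dest_file)
  }
  invisible(dest_file)
}

# Usage (requires internet access, so it is left commented out here):
# fetch_if_missing("https://www.census.gov/2010census/csv/pop_change.csv",
#                  "data/pop_change.csv")
# pop_change = read.csv("data/pop_change.csv", skip = 2)
```

Checking for the file first keeps the script quick to re-run while you experiment with plot commands in the Console.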

The exercise above shows how understanding these panes and using them interactively can improve the speed and productivity of your R programming. Further, there are a number of RStudio settings that can help ensure that it works for your needs.

3.4.4 RStudio options

A range of Project Options and Global Options are available in RStudio from the Tools menu (accessible in Linux and Windows from the keyboard via Alt+T). Most of these are self-explanatory but it is worth mentioning a few that can boost your programming efficiency:

  • GIT/SVN project settings allow RStudio to provide a graphical interface to your version control system, described in Chapter XX.

  • R version settings allow RStudio to ‘point’ to different R versions/interpreters, which may be faster for some projects.

  • Restore .RData: Unticking this default prevents previously created R objects from being loaded at start-up. This will make starting R quicker and also reduce the chance of bugs caused by previously created objects.

  • Code editing options can make RStudio adapt to your coding style, for example, by preventing the autocompletion of braces, which some experienced programmers may find annoying. Enabling Vim mode makes RStudio act as a (partial) Vim emulator.

  • Diagnostic settings can make RStudio more efficient by adding additional diagnostics or by removing diagnostics if they are slowing down your work. This may be an issue for people using RStudio to analyse large datasets on older low-spec computers.

  • Appearance: if you are struggling to see the source code, changing the default font size may make you a more efficient programmer by reducing the time overheads associated with squinting at the screen. Other options in this area relate more to aesthetics, which are also important because feeling comfortable in your programming environment can boost productivity.
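
To see why the Restore .RData option matters, the following sketch imitates a restored workspace using an ordinary environment. It is an illustration under stated assumptions rather than a description of how RStudio restores objects: the object name threshold and the two environments are made up for the example.

```r
# An environment standing in for a workspace restored from .RData,
# containing an object left over from an earlier session
stale_workspace = new.env()
assign("threshold", 10, envir = stale_workspace)

# Code that forgot to define 'threshold' still runs, silently using the old value
stale_result = evalq(sum(c(5, 15, 25) > threshold), stale_workspace)

# In a clean environment the same code fails straight away, exposing the bug
clean_workspace = new.env(parent = baseenv())
clean_result = tryCatch(
  evalq(sum(c(5, 15, 25) > threshold), clean_workspace),
  error = function(e) conditionMessage(e)
)

print(stale_result)
print(clean_result)
```

The clean run fails early and loudly, which is usually preferable to a result that depends on a forgotten object.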

3.4.5 Auto-completion

R provides some basic autocompletion functionality. Typing the beginning of a function name, for example rn (short for rnorm()), and hitting Tab will result in the full function names associated with this text string being printed. In this case two options would be displayed: rnbinom and rnorm, providing a useful reminder to the user about what is available. The same applies to file names enclosed in quote marks: typing "te followed by Tab in the console, in a project which contains a file called test.R, should result in the full name "test.R" being auto-completed. RStudio builds on this functionality and takes it to a new level.

Instead of only autocompleting options when Tab is pressed, RStudio autocompletes them at any point. Building on the previous example, RStudio’s autocompletion triggers when the first three characters are typed: rno. The same functionality works when only the first characters are typed, followed by Tab: automatic autocompletion does not replace Tab autocompletion but supplements it. Note that in RStudio two more options are provided to the user after entering rn Tab compared with entering the same text into base R’s console described in the previous paragraph: RNGkind and RNGversion. This illustrates that RStudio’s autocompletion functionality is not case sensitive in the same way that R is. This is a good thing because R has no consistent function name style!
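
The difference between case-sensitive and case-insensitive matching can be explored from the console with apropos(), which searches the names of objects in attached packages. This is only a rough analogy: it does not call the completion engine of either R or RStudio, and the exact results depend on which packages are attached.

```r
# Case-sensitive search, comparable to what Tab shows in the base R console
case_sensitive = apropos("^rn", ignore.case = FALSE)

# Case-insensitive search, comparable to the extra suggestions in RStudio
case_insensitive = apropos("^rn", ignore.case = TRUE)

print(case_sensitive)

# Names that are only found when case is ignored
print(setdiff(case_insensitive, case_sensitive))
```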

RStudio also has more intelligent auto-completion of objects and file names than R’s built-in command line. To test this functionality, try typing US, followed by the Tab key. After pressing down until USArrests is selected, press Enter so it autocompletes. Finally, typing $ should leave the following text on the screen and the four columns should be shown in a drop-down box, ready for you to select the variable of interest with the down arrow.

USArrests$ # a dropdown menu of columns should appear in RStudio
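
The entries offered in the dropdown correspond to the column names of the data frame, which can be checked in any R console. A minimal sketch using the built-in USArrests dataset:

```r
# The four columns that the dropdown menu should list
column_names = names(USArrests)
print(column_names)

# Selecting one of them by name, as the dropdown lets you do
print(head(USArrests$Murder))
```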

To take a more complex example, variable names stored in the data slot of the class SpatialPolygonsDataFrame (a class defined by the foundational spatial package sp) are referred to in the long form spdf@data$varname. In this case spdf is the object name, data is the slot and varname is the variable name. RStudio makes such S4 objects easier to use by enabling autocompletion of the short form spdf$varname. Another example is RStudio’s ability to find files hidden away in sub-folders. Typing "te will find test.R even if it is located in a sub-folder such as R/test.R. There are a number of other clever auto-completion tricks that can boost your productivity when using RStudio, which are best found by experimenting and hitting Tab frequently during your R programming work.
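
The long and short forms can be tried without installing sp by defining a toy S4 class. The sketch below is an assumption-laden illustration: ToySpatial is a made-up class, not the real SpatialPolygonsDataFrame, and the `$` method is defined here only to show how a short form can forward to the data slot.

```r
library(methods)  # S4 tools; attached explicitly in case the session lacks them

# Toy S4 class with a single 'data' slot (hypothetical, not from sp)
setClass("ToySpatial", representation(data = "data.frame"))

spdf = new("ToySpatial", data = data.frame(varname = c(1.5, 2.5, 3.5)))

# Long form: access the slot, then the column
long_form = spdf@data$varname

# A `$` method that forwards to the data slot makes the short form work
setMethod("$", "ToySpatial", function(x, name) x@data[[name]])
short_form = spdf$varname

print(identical(long_form, short_form))
```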

3.4.6 Keyboard shortcuts

RStudio has many useful shortcuts that can help make your programming more efficient by reducing the need to reach for the mouse and point and click your way around code and RStudio. These can be viewed by using a little known but extremely useful keyboard shortcut:

Alt+Shift+K

This will display the default shortcuts in RStudio. It is worth spending time identifying which of these could be useful in your work and practising interacting with RStudio rapidly with minimal reliance on the mouse. Some more useful shortcuts are listed below. There are many more gems to find that could boost your R writing productivity:

  • Ctl+Z/Ctl+Shift+Z: Undo/Redo.
  • Ctl+Enter: Execute the current line or code selection in the Source pane.
  • Ctl+Alt+R: Execute all the R code in the currently open file in the Source pane.
  • Ctl+Left/Right: Navigate code quickly, word by word.
  • Home/End: Navigate to the beginning/end of the current line.
  • Alt+Shift+Up/Down: Duplicate the current line up or down.
  • Ctl+D: Delete the current line.

3.4.7 Object display and output table

It is useful to know what is in your current R environment. This information can be revealed with ls(), but this function only provides object names. RStudio provides an efficient mechanism to show currently loaded objects, and their details, in real-time: the Environment tab in the top right corner. It makes sense to keep an eye on which objects are loaded and to delete objects that are no longer useful. Doing so will minimise the probability of confusion in your workflow (e.g. by using the wrong version of an object) and reduce the amount of RAM R needs. The details provided in the Environment tab include the object’s dimension and some additional details depending on the object’s class (e.g. size in MB for large datasets).
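
Some of the details shown in the Environment tab can be approximated from the console. The following sketch uses a separate environment (demo_env, a name chosen for this example) in place of the real workspace so that it can be run without touching your own objects.

```r
# A separate environment stands in for the workspace
demo_env = new.env()
assign("big_vector", rnorm(1e5), envir = demo_env)
assign("small_df", data.frame(id = 1:3), envir = demo_env)

# ls() alone only returns the object names
object_names = ls(demo_env)

# Add the class and approximate size in MB of each object
object_summary = data.frame(
  name = object_names,
  class = vapply(object_names,
                 function(nm) class(get(nm, envir = demo_env))[1],
                 character(1)),
  size_mb = vapply(object_names,
                   function(nm) as.numeric(object.size(get(nm, envir = demo_env))) / 1024^2,
                   numeric(1)),
  row.names = NULL
)
print(object_summary)

# Delete an object that is no longer useful to free memory
rm(list = "big_vector", envir = demo_env)
print(ls(demo_env))
```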

A very useful feature of RStudio is its advanced viewing functionality. This is triggered either by executing View(object) or by double clicking on the object name in the Environment tab. Although you cannot edit data in the Viewer (this should be considered a good thing from a data integrity perspective), recent versions of RStudio provide an efficient search mechanism to rapidly filter and view the records that are of most interest (see Figure 3.3 above).

3.4.8 Project management

In the far top-right of RStudio there is a diminutive drop-down menu illustrated with R inside a transparent box. This menu may be small and simple, but it is hugely efficient in terms of organising large, complex and long-term projects.

The idea of RStudio projects is that the bulk of R programming work is part of a wider task, which will likely consist of input data, R code, graphical and numerical outputs and documents describing the work. It is possible to scatter each of these elements at random across your hard-discs but this is not recommended. Instead, the concept of projects encourages reproducible working, such that anyone who opens the particular project folder that you are working from should be able to repeat your analyses and replicate your results.

It is therefore highly recommended that you use projects to organise your work. It could save hours in the long-run. Organising data, code and outputs also makes sense from a portability perspective: if you copy the folder (e.g. via GitHub) you can work on it from any computer without worrying about having the right files on your current machine. These tasks are implemented using RStudio’s simple project system, in which the following things happen each time you open an existing project:

  • The working directory automatically switches to the project’s folder. This enables data and script files to be referred to using relative file paths, which are much shorter than absolute file paths. This means that switching directory using setwd(), a common source of error for R users, is rarely if ever needed.

  • The most recently opened file is loaded into the Source pane. The history of R commands executed in previous sessions is also loaded into the History tab. This assists with continuity between one session and the next.

  • The Files tab displays the associated files and folders in the project, allowing you to quickly find your previous work.

  • Any settings associated with the project, such as Git settings, are loaded. This assists with collaboration and project-specific set-up.
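
The benefit of relative file paths can be seen by comparing the two forms directly. This is a small sketch that assumes a data/pop_change.csv file inside the project, as in the earlier exercise; the file does not need to exist for the comparison to run.

```r
# Relative path: resolved against the working directory, which RStudio
# sets to the project folder when the project is opened
relative_path = file.path("data", "pop_change.csv")

# The equivalent absolute path depends on where the project sits on this machine
absolute_path = file.path(getwd(), relative_path)

print(relative_path)
print(absolute_path)

# TRUE only if the file is present in the current working directory
print(file.exists(relative_path))
```

Because the relative form contains nothing machine-specific, the same script keeps working when the project folder is moved or copied to another computer.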

Each project is different but most contain input data, R code and outputs. To keep things tidy, we recommend a sub-directory structure resembling the following:

project/
  - README.rmd # Project description
  - set-up.R  # Required packages
  - R/ # For R code
  - input/ # Data files
  - graphics/
  - output/ # Results

Proper use of projects ensures that all R source files are neatly stashed in one folder with a meaningful structure. This way data and documentation can be found where one would expect them. Under this system figures and project outputs are ‘first class citizens’ within the project’s design, each with their own folder.
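
A structure like the one above can be created from R rather than by hand. The following sketch builds it inside a temporary directory so that it is safe to run anywhere; in real use project_root would point at your own project folder.

```r
# Build the skeleton in a temporary location (replace with your own path)
project_root = file.path(tempdir(), "project")

# Sub-directories for code, data, figures and results
sub_dirs = c("R", "input", "graphics", "output")
for (d in sub_dirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Empty placeholder files for the project description and set-up script
top_level_files = file.path(project_root, c("README.rmd", "set-up.R"))
created = file.create(top_level_files)

print(list.files(project_root))
```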

Another approach to project management is to treat projects as R packages. Creating R packages is easier than ever before, with tools such as devtools for managing R’s quirks and making the process user friendly. If you use GitHub, the advantage of this approach is that anyone should be able to reproduce your working using devtools::install_github("username/projectname"), although the administrative overheads of creating an entire package for each small project will outweigh the benefits for many.

Note that a set-up.R or even a .Rprofile file in the project’s root directory enables project-specific settings to be loaded each time people work on the project. As described in the previous section, .Rprofile can be used to tweak how R works at start-up. It is also a portable way to manage R’s configuration on a project-by-project basis.
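
What goes into such a file depends on the project. As a hedged illustration, a set-up.R might check that the required packages are available and set a project-specific option. The package names below are placeholders chosen because they ship with R, and the digits option is an arbitrary example.

```r
# Hypothetical set-up.R: packages this project is assumed to need
required_packages = c("stats", "utils")  # placeholders, replace with your own

# Check which packages are installed without attaching them
is_installed = vapply(required_packages, requireNamespace, logical(1),
                      quietly = TRUE)
missing_packages = required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Please install: ", paste(missing_packages, collapse = ", "))
}

# Example of a project-specific setting
options(digits = 4)
```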

3.4.9 Exercises

  1. Try modifying the look and appearance of your RStudio setup.

  2. What is the keyboard shortcut to show the other shortcuts? (Hint: it begins with Alt+Shift.)

  3. Try as many of the shortcuts revealed by the previous step as you like. Write down the ones that you think will save you time, perhaps on a post-it note to go on your computer.