3.1 Operating system

R works on all three major operating systems (OSs): Linux, Mac and Windows. R is predominantly platform-independent, meaning that it should behave in the same way on each of these platforms. This is partly facilitated by CRAN tests, which ensure that R packages work on all major operating systems. There are, however, some operating system-specific quirks that may influence the choice of OS and how it is set up for R programming in the long term.
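As a minimal sketch of how R code can find out which platform it is running on, the base R calls below report the operating system. The helper `is_windows()` is a hypothetical name introduced here purely for illustration.

```r
# Name of the operating system, e.g. "Linux", "Darwin" (Mac) or "Windows"
os_name = Sys.info()[["sysname"]]

# Broad OS family as seen by R: "unix" (Linux and Mac) or "windows"
os_type = .Platform$OS.type

# Platform that this build of R was compiled for
r_platform = R.version$platform

# Hypothetical helper: branch on the OS only where behaviour differs
is_windows = function() .Platform$OS.type == "windows"

print(c(os_name, os_type, r_platform))
```

Branching on the OS in this way is best kept to the few places where behaviour genuinely differs, such as process forking.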

3.1.1 Operating system and resource monitoring

Minor differences aside, R’s computational efficiency is the same across different operating systems. This is important as it means the techniques will, in general, work equally well on different OSs. Beyond the 32-bit vs 64-bit issue (covered in the next chapter) and process forking (covered in Chapter 6), the main issues for many will be user friendliness and compatibility with other programs used alongside R for work. Changing operating system can be a time-consuming process, so our advice is usually to stick to whatever OS you are most comfortable with.

Some packages (e.g. those that must be compiled and that depend on external libraries) are best installed at the operating system level (i.e. not using install.packages) on Linux systems. On Debian-based operating systems such as Ubuntu, these are named with the prefix r-cran- (see Section 2.4).
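One way to check where an installed package lives, and so whether it came from the operating system's package manager or from install.packages, is to inspect R's library paths. The sketch below uses only base R. The package name `stats` is just a stand-in that is always available, and the assumption that OS-level packages sit in a system-wide library (rather than a per-user one) depends on how R was set up.

```r
# Libraries that R searches for packages, in order of priority
lib_paths = .libPaths()
print(lib_paths)

# Stand-in package name (assumption: replace with the package of interest)
pkg = "stats"

# Check that the package is installed without attaching it
pkg_installed = requireNamespace(pkg, quietly = TRUE)

# Directory of the library that the package would be loaded from
if (pkg_installed) {
  pkg_library = dirname(find.package(pkg))
  print(pkg_library)
}
```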

Regardless of your operating system, it is good practice to track how system resources (primarily CPU and RAM use) respond when running time-consuming tasks. Alongside R profiling functions such as profvis (see Section XXX), system monitoring can help identify performance bottlenecks and opportunities for making tasks run faster.

A common use case for system monitoring of R processes is to identify how much RAM is being used and whether more is needed (covered in Chapter 3). System monitors also report the percentage of CPU resource allocated over time. On modern multi-threaded CPUs, many tasks will use only a fraction of the available CPU resource because R is by default a single-threaded program (see Chapter 6 on parallel programming). Monitoring CPU load in this context can be useful for identifying whether R is running in parallel (see Figure 3.1).
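Some of this information is also available from inside R, which can help when interpreting what the system monitor shows. The following is a minimal sketch using base R and the parallel package that ships with R. The vector length of 1e6 is an arbitrary choice for illustration, and `detectCores()` reports logical cores by default and may return `NA` on platforms where the count cannot be determined.

```r
# Number of (logical) cores that R can detect; may be NA on some systems
n_cores = parallel::detectCores()
print(n_cores)

# One million doubles at 8 bytes each: roughly 8 MB of RAM
x = rnorm(1e6)
size_mb = as.numeric(object.size(x)) / 1024^2
print(object.size(x), units = "MB")

# Memory currently used by R objects, as seen by the garbage collector
mem_summary = gc()
print(mem_summary)
```

A single R process that keeps one core busy on a machine with `n_cores` cores will show up in a system monitor as roughly 1 / `n_cores` of total CPU capacity.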


Figure 3.1: Output from a system monitor (gnome-system-monitor running on Ubuntu) showing the resources consumed by running the code presented in the second of the Exercises at the end of this section. Of the three operations, the first increases RAM use, the second is single-threaded and the third is multi-threaded.

System monitoring is a complex topic that spills over into system administration and server management. Fortunately, there are many tools designed to ease monitoring on all major operating systems.

  • On Linux, the shell command top displays key resource use figures for most distributions. htop and Gnome’s System Monitor (gnome-system-monitor, see Figure 3.1) are more refined alternatives which use command-line and graphical user interfaces respectively. Other tools, such as nethogs, monitor network usage.
  • On Windows, the Task Manager provides key information on RAM and CPU use by process. This can be started in modern Windows versions by pressing Ctrl-Alt-Del or by right-clicking the task bar and selecting ‘Start Task Manager’.
  • On Mac, the Activity Monitor provides similar functionality. This can be launched from the Utilities folder in Launchpad.
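The readings from these tools can be complemented by timings taken inside R. The following is a minimal sketch using the base function `system.time()`. The task being timed (generating and sorting random numbers) is an arbitrary stand-in.

```r
# Time an arbitrary stand-in task
timing = system.time({
  x = rnorm(1e6)
  s = sort(x)
})
print(timing)

# Wall-clock time that passed while the task ran
elapsed = timing[["elapsed"]]

# CPU time charged to this R process (user code plus system calls)
cpu_time = timing[["user.self"]] + timing[["sys.self"]]
```

For a single-threaded task, `cpu_time` is usually close to or below `elapsed`. CPU time well above the elapsed time suggests that more than one core was in use, which should match what the system monitor displays.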

3.1.2 Exercises

  1. What is the exact version of your computer’s operating system?
  2. Start a system monitor, then type and execute the following code. How do the results on your system compare to those presented in Figure 3.1?

    # 1: Create large dataset
    X = data.frame(matrix(rnorm(1e8), nrow = 1e7))
    
    # 2: Find the median of each column using a single core
    r1 = lapply(X, median)
    
    # 3: Find the median of each column using many cores
    r2 = parallel::mclapply(X, median) # runs in serial on Windows
  3. What do you notice regarding CPU usage, RAM use and system time during and after each of the three operations?
  4. Bonus question: how would the results change depending on operating system?
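
Because mclapply relies on process forking, it runs in serial on Windows. As a hedged sketch of one cross-platform alternative, and not necessarily the approach intended for the exercise, the parallel package also provides socket clusters via `makeCluster()` and `parLapply()`. The dataset `X_small` and the choice of 2 workers are assumptions made to keep the example quick to run.

```r
# Small stand-in dataset (assumption: far smaller than X in the exercise)
X_small = data.frame(matrix(rnorm(1e5), nrow = 1e4))

# Serial baseline: median of each column on a single core
r_serial = lapply(X_small, median)

# Socket cluster with 2 workers (arbitrary choice); does not rely on forking
cl = parallel::makeCluster(2)
r_parallel = parallel::parLapply(cl, X_small, median)
parallel::stopCluster(cl)

# The two approaches should agree on the medians
print(all.equal(unlist(r_serial), unlist(r_parallel)))
```

Starting a socket cluster carries some overhead, so for a dataset this small the parallel version may well be slower than the serial one.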