## 4.2 Random access memory: RAM

Random access memory (RAM) is a type of computer memory that can be accessed randomly: any byte of memory can be read without touching the preceding bytes. RAM is found in computers, phones, tablets and even printers. The amount of RAM R has access to directly limits the size of the data sets it can process, so it is important to have sufficient RAM for your work.

Even if the original data set is relatively small, the analysis can generate large objects. For example, suppose we want to perform a standard cluster analysis. The built-in data set USArrests is a data frame with $$50$$ rows and $$4$$ columns, where each row corresponds to a state in the USA:

head(USArrests, 3)
##         Murder Assault UrbanPop Rape
## Alabama   13.2     236       58 21.2
## Alaska    10.0     263       48 44.5
## Arizona    8.1     294       80 31.0

If we want to group states that have similar crime statistics, a standard first step is to calculate a distance or similarity matrix:

d = dist(USArrests)

When we inspect the object size of the original data set and the distance object using the pryr package

pryr::object_size(USArrests)
## 5.23 kB
pryr::object_size(d)
## 14.3 kB

we see that we have created an object almost three times larger than the original data set. The object d represents a symmetric $$n \times n$$ distance matrix, where $$n$$ is the number of rows in USArrests, so as $$n$$ increases the size of d increases at rate $$O(n^2)$$. If our original data set contained $$10,000$$ records, the associated distance matrix would contain $$10^8$$ values; since the matrix is symmetric, this corresponds to around $$50$$ million unique values. A rough rule of thumb is that your RAM should be three times the size of your data set.
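A back-of-the-envelope sketch makes the scaling concrete. Assuming that dist() stores only the lower triangle of the matrix, as $$n(n-1)/2$$ double-precision values of $$8$$ bytes each, we can estimate the memory needed when $$n = 10,000$$:

```r
n = 1e4
# dist() stores the lower triangle only: n * (n - 1) / 2 doubles,
# each occupying 8 bytes; divide by 1e6 to express the result in MB
n * (n - 1) / 2 * 8 / 1e6
## [1] 399.96
```

So a $$10,000$$-row data set that itself occupies well under a megabyte yields a distance object of roughly $$400$$ MB.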

Another benefit of increasing the amount of onboard RAM is that the ‘garbage collector’, a process that runs periodically to free up system memory occupied by R, is called less often (we will cover this in more detail in chapter XXX).
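You can also invoke the collector manually with the base function gc(), which reclaims memory that is no longer referenced and reports current usage. A minimal illustration:

```r
x = matrix(0, nrow = 1000, ncol = 1000)  # allocate a reasonably large object
rm(x)                                    # remove the only reference to it
gc()                                     # reclaim the memory; returns a usage summary
```

Explicit calls to gc() are rarely necessary in practice, since R triggers collection automatically when it needs more memory.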

It is straightforward to determine how much RAM you have. Under Windows:

1. Click the Start button, then right-click Computer. Next click Properties.
2. In the System section, the amount of RAM your computer has is shown next to Installed memory (RAM). Note that Windows reports how much RAM it can use, not the amount installed; this is only an issue if you are using a 32-bit version of Windows.

On a Mac, click the Apple menu and select About This Mac; a window appears with the relevant information.

On almost all Unix-based OSs, you can find out how much RAM you have using the vmstat command, whilst on all Linux distributions you can use the command free. Using this in conjunction with the -h flag will provide the answer in human-readable format, as illustrated below for a 16 GB machine:

$ free -h
             total       used       free
Mem:           15G       4.0G        11G
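From within R itself, the benchmarkme package offers a cross-platform way to query the amount of RAM (an assumption here: the package must first be installed from CRAN):

```r
# install.packages("benchmarkme")  # if not already installed
benchmarkme::get_ram()             # reports the amount of RAM detected
```

The value returned will of course depend on the machine the code is run on.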

It is sometimes possible to increase your computer’s RAM. A computer motherboard typically has $$2$$ or $$4$$ RAM (memory) slots. If you have free slots, then you can add more RAM. However, it is common for all slots to be already taken, which means that to upgrade your computer’s memory, some or all of the existing RAM will have to be removed. To go from $$8$$GB to $$16$$GB, for example, you may have to discard the two $$4$$GB RAM cards and replace them with two $$8$$GB cards. Upgrading your laptop or desktop from $$4$$GB to $$16$$GB or $$32$$GB is cheap and should definitely be considered. As R Core member Uwe Ligges states,

fortunes::fortune(192)
##
## RAM is cheap and thinking hurts.
##    -- Uwe Ligges (about memory requirements in R)
##       R-help (June 2007)

It is a testament to the design of R that it is still relevant and its popularity is growing. Ross Ihaka, one of the originators of the R programming language, made a throw-away comment in 2003:

fortunes::fortune(21)
##
## I seem to recall that we were targetting 512k Macintoshes. In our dreams
## we might have seen 16Mb Sun.
##    -- Ross Ihaka (in reply to the question whether R&R thought when they
##       started out that they would see R using 16G memory on a dual Opteron
##       computer)
##       R-help (November 2003)

Considering that a standard smartphone now contains $$1$$GB of RAM, the fact that R was designed for “basic” computers, but can scale across clusters, is impressive. R’s origins on computers with limited resources help explain its efficiency at dealing with large data sets.