Docker Engine is the most commonly used containerization platform. Docker containers provide virtual machine-like environments, with multiple containers running on the same underlying OS. A system virtual machine (VM), by contrast, runs on a hypervisor (such as VMware or VirtualBox), which in turn runs on the host machine. The main difference between the two is that a virtual machine includes a complete guest OS, which could be several GB, whereas a Docker container is lightweight: it includes only the application binaries and dependencies required to run the application and relies on the kernel of the underlying OS. A virtual machine does not share the OS kernel, but multiple Docker containers running on the same OS do. Because the containers run in isolation, each with its own file system and networking, better use is made of the OS kernel than with one full OS per application.
In this article we shall discuss how R, a commonly used language and software for statistical computing and graphics, may be used with Docker Engine. This article has the following sections:
- Setting the Environment
- Using the Docker Image for R
- Using R Interactively
- Using R in Batch Mode with an R Script
- Creating a Docker Image for an R Application
- Running the Docker Image for an R Application
- Stopping a Docker Container
- Removing a Docker Image
- Conclusion
Setting the Environment
Docker Engine may be installed on almost any OS, including most Linux distributions, macOS, and Windows, and on cloud platforms such as Azure and AWS. We shall use CoreOS Linux, which has Docker pre-installed. Create an Amazon EC2 instance with the CoreOS Stable AMI. Copy the Public IP Address of the CoreOS instance to connect to the instance from a local machine.
Figure 1: Getting the Public IP Address
Log in to the CoreOS instance over SSH using the Public IP Address.
ssh -i "coreos.pem" core@107.23.107.17
Using the Docker Image for R
A Docker image encapsulates a file system and parameters for a specific application or software, and is used to create Docker containers that run on a Docker Engine. A Docker image by itself does not run an application or software, but forms the basis from which Docker containers are created. Multiple Docker images may be available for particular software. We shall use the official Docker image for R, which is called r-base and is available at https://hub.docker.com/_/r-base/ on Docker Hub, a repository for Docker images. A Docker image is versioned using tags. A complete name for a Docker image is of the format <imagename>:<tag>. If the :<tag> suffix is omitted, the tag is assumed to be “latest,” which most Docker images provide but are not required to provide. The Docker documentation may be referred to for an introduction to Docker and Docker commands.
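The <imagename>:<tag> naming rule, including the default tag, can be illustrated with a small shell sketch. The helper names image_name and image_tag are hypothetical, and the sketch assumes simple references such as r-base:3.3.2; it does not handle registry hosts with ports or image digests.

```shell
# Hypothetical helpers that split a simple image reference of the
# form <imagename>:<tag>. Registry ports and digests are not handled.
image_name() {
  # Drop everything from the first ":" onwards.
  printf '%s\n' "${1%%:*}"
}

image_tag() {
  case "$1" in
    *:*) printf '%s\n' "${1#*:}" ;;   # explicit tag after the ":"
    *)   printf '%s\n' "latest" ;;    # no tag given: Docker assumes "latest"
  esac
}

image_name "r-base:3.3.2"   # prints r-base
image_tag  "r-base:3.3.2"   # prints 3.3.2
image_tag  "r-base"         # prints latest
```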
Using R Interactively
A Docker container is created from a Docker image with the docker run command. The docker run command provides several options and the complete syntax may be seen at https://docs.docker.com/engine/reference/run/. To interact with software running in a Docker container from STDIN, use the -i option, and, to attach a pseudo-tty to the STDIN stream, use the -t option. To remove the Docker container after it is exited, use the --rm option. Run the following command to start R interactively:
docker run -ti --rm r-base
The Docker image r-base gets pulled from the Docker Hub repo and R gets started.
Figure 2: Using a Docker Image to run R
The R command line interface (CLI) gets displayed.
Figure 3: R Command Line Interface
The Docker container created may be displayed with the following command:
docker ps
The Docker images may be displayed with the following command:
docker images
The Docker containers get displayed with information about each running container, such as the Container ID, the Docker image used to create the container, the time elapsed since the container was created, and a generated container name. The Docker images listed are all the images available locally and include image information such as image name, ID, tag, and when created.
Figure 4: Listing Docker containers and Docker images
Output the version of R installed with the following command in the R CLI:
R.version
As indicated, the R version is 3.3.2.
Figure 5: Listing the R version
List all the files in the current path.
list.files(path=".")
The various directories installed by R get listed.
Figure 6: Listing the directories
Create an R function with the following listing:
hello <- function( name ) { sprintf( "Hello, %s", name ); }
Invoke the R function:
hello("John Smith")
A message gets displayed.
Figure 7: Invoking an R Function
R makes use of packages to package functions for an application or software, and packages are available on repositories. R may be configured to specify the repositories from which R packages are to be downloaded with the following command:
setRepositories()
Select one or more repositories by specifying the repository listing number/s separated by a space.
Figure 8: Setting repositories
Though not required, it is recommended to set the download file method with the following option:
options(download.file.method = "wget")
When a package is installed, files for the package are downloaded and, without the download method configuration, some of the files may not get downloaded. When all the package files get downloaded, no error message is displayed.
Figure 9: Setting download file method
Without the file download method option, a message “cannot download all files” could get generated, indicating that all the files for the package could not be downloaded.
Figure 10: Message “cannot download all files”
An R package is installed and set to be used with the following commands in which the package name is specified with the variable package_name:
install.packages('package_name')
library('package_name')
Some of the R packages are pre-installed and may be listed with the following command:
installed.packages()
As an example, the “graphics” package is one of the pre-installed R packages. To use the package, run the following command:
library(graphics)
The command has no output, but the package named in the library command, here the graphics package, gets loaded and is ready to be used.
Figure 11: Loading the graphics library
The R functions provided by an R package may be listed with the ls command. As an example, to list the functions in the “graphics” package, run the following command.
ls(getNamespace("graphics"), all.names=TRUE)
All the functions in the “graphics” package get listed.
Next, we shall use a package. The stringr package provides string functions and, if not installed already, may be installed with the following command:
install.packages("stringr")
Subsequently, load the package with the following command:
library(stringr)
List the functions in the stringr package with the following command:
ls(getNamespace("stringr"), all.names=TRUE)
All the functions in the stringr package get listed.
Figure 12: Listing the functions in the stringr package
Next, use some of the functions. As an example, convert a lower case string to upper case with the str_to_upper function. Convert an upper case string to lower case with the str_to_lower function. The string to be converted could also be a mixed case string, as demonstrated by the input string to the str_to_lower function.
Figure 13: Using the str_to_upper and str_to_lower functions
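The two stringr conversions can also be tried without an interactive session. The following is a minimal sketch that runs a single R expression in a throwaway r-base container; the helper name r_eval is hypothetical, and the CRAN mirror URL and the input strings are assumptions chosen for illustration.

```shell
# Hypothetical helper: evaluate one R expression in a temporary
# r-base container, which is removed (--rm) when Rscript exits.
r_eval() {
  docker run --rm r-base Rscript -e "$1"
}

# Usage (requires Docker and network access). stringr is installed on
# every call because the container is discarded afterwards, so the call
# can take a while:
# r_eval 'install.packages("stringr", repos = "https://cloud.r-project.org");
#         library(stringr);
#         print(str_to_upper("hello world"));
#         print(str_to_lower("Hello WORLD"))'
```

Because nothing persists between calls, this form suits quick checks; for repeated work the interactive session described above avoids reinstalling the package.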
In this section, we started the R CLI and input R commands interactively. In the next section, we shall run an R script.
Using R in Batch Mode with an R Script
In R batch mode, the input to R is read from a file, called an R script. An R script has an .R suffix. Create a helloworld.R script and copy the following listing to the R script:
hello <- function( name ) {
  sprintf( "Hello, %s", name );
}
hello("John Smith")
Run the following docker run command to start a bash shell. The command is similar to the command used before to start an R CLI for the interactive R section, except that a bash shell is started to run R scripts on the Docker container started by the command.
docker run -ti --rm r-base /usr/bin/bash
A bash shell prompt gets displayed.
Figure 14: Starting a Bash Shell Prompt
The Docker container started is listed with the docker ps command.
Figure 15: Listing the Docker containers
To run the R script helloworld.R in the bash shell, we need to copy the script to the Docker container. Use the following command syntax to copy the helloworld.R script to the container. Run the command on the CoreOS host (for example, from a second SSH session), not in the container's bash shell. The containerid is a placeholder for the container ID, which may be obtained from the output of the docker ps command.
docker cp helloworld.R containerid:/helloworld.R
The helloworld.R script gets copied to the Docker container and may be accessed from the bash shell.
Figure 16: Copying files to the Docker container
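Looking up the container ID by hand can be avoided with a small sketch like the following, which is run on the host. The helper names container_id_for and copy_script are hypothetical; the sketch assumes that the first container listed by docker ps for the image is the one intended.

```shell
# Hypothetical helper: print the ID of the first running container
# that docker ps lists for the given image.
container_id_for() {
  docker ps -q --filter "ancestor=$1" | head -n 1
}

# Hypothetical helper: copy a file into the root directory of that container.
copy_script() {
  cid=$(container_id_for "$2")
  if [ -z "$cid" ]; then
    echo "no running container found for image $2" >&2
    return 1
  fi
  docker cp "$1" "$cid:/$(basename "$1")"
}

# Usage, with the bash shell container from r-base still running:
# copy_script helloworld.R r-base
```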
List the files in the bash shell and the helloworld.R script gets listed.
Figure 17: Listing files in the Docker container
Run the helloworld.R script in the bash shell with the following command:
Rscript helloworld.R
The output from the R script gets displayed.
Figure 18: Running the helloworld.R script
An alternative to copying the R script to the Docker container is to create the R script in the bash shell itself with the vim.tiny command, which is a limited version of vim. As an example, create an R script, hello-world.R, as follows:
vim.tiny hello-world.R
Copy the same listing as before to the R script, and the message output may be modified to differentiate from the earlier script.
Figure 19: The modified hello-world.R script
Run the hello-world.R script:
Rscript hello-world.R
The R script message gets output.
Figure 20: Outputting the R script message
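If an editor is not convenient, a script can also be written from the bash shell with a here-document. This is a sketch rather than the method used above; the modified message text is an assumption chosen to distinguish the script from helloworld.R.

```shell
# Write the script with a quoted here-document ('EOF') so that the
# shell leaves the R code (quotes, %s, parentheses) untouched.
cat > hello-world.R <<'EOF'
hello <- function(name) {
  sprintf("Hello from a here-document, %s", name)
}
hello("John Smith")
EOF

# Show what was written.
cat hello-world.R
```

The file can then be run with Rscript hello-world.R in the same way as before.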
We have discussed two methods of using R. The first was to start an R command line interface and run R commands interactively; the second was to run R scripts from a bash shell. Both methods involved running the R Docker image r-base directly. In the next section, we shall discuss how the base R image r-base could be used to create another Docker image for an R application, and how that Docker image is subsequently run.
Creating a Docker Image for an R Application
A Docker image is built from a Dockerfile, which contains a set of instructions. Most of the instructions run when the Docker image is built, while the CMD instruction specifies the command to run when the Docker image is used to start a Docker container. A reference to all the Dockerfile instructions is available at https://docs.docker.com/engine/reference/builder/, although we shall make use of only a few of them.
A Dockerfile file does not include a suffix. A file without a suffix may be created in a Windows command line as follows with the notepad command; note and include the “.” after “Dockerfile”.
>notepad Dockerfile.
Click Yes in the dialog with the message “Do you want to create a new file?” A file named Dockerfile, without a suffix, gets created in the directory from which the notepad command is run. Copy the following listing to the Dockerfile. The FROM instruction sets the base Docker image as r-base. The COPY instruction copies the files from the current directory (the build context) to the /usr/local/src/scripts directory in the image. The WORKDIR instruction sets the working directory for the instructions that follow it, which here is the CMD instruction; the COPY instruction precedes it and uses an absolute destination path. The directory specified in the WORKDIR instruction is created if it does not already exist. The CMD instruction sets the command that a container started from the image runs by default, which is Rscript helloworld.R.
FROM r-base
COPY . /usr/local/src/scripts
WORKDIR /usr/local/src/scripts
CMD ["Rscript", "helloworld.R"]
The Dockerfile script refers to a helloworld.R R script, which is the same script used in the preceding section. The helloworld.R script needs to be made available to the Dockerfile in the same directory. If the files are created on a local machine, copy the Dockerfile and the R script to the CoreOS instance with the following scp commands in which the Public IP of the CoreOS instance may be obtained from the EC2 Console as discussed earlier.
scp -i "coreos.pem" helloworld.R core@107.23.107.17:
scp -i "coreos.pem" Dockerfile core@107.23.107.17:
The Dockerfile and helloworld.R script get copied to the CoreOS instance.
Figure 21: Copying Dockerfile and helloworld.R script to a CoreOS instance
List the Dockerfile and R script in the CoreOS instance with the ls -l command.
Figure 22: Listing the Dockerfile and R script in the CoreOS instance
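Because the CMD instruction refers to helloworld.R by name, a build succeeds even if the script is missing and the failure only appears when the container runs. A small pre-build check, sketched below, can catch this earlier. The helper name check_context is hypothetical and the file names are those used in this article.

```shell
# Hypothetical helper: verify that the build context directory
# (default: the current directory) holds both files the image needs.
check_context() {
  dir=${1:-.}
  for f in Dockerfile helloworld.R; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "build context ok: $dir"
}

# Usage, from the directory that contains the two files:
# check_context . && docker build -t helloworld .
```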
To build a Docker image from the Dockerfile, run the following docker build command. The command ends with a “.”, which specifies the current directory as the build context; by default, docker build looks for the Dockerfile in the root of the build context.
docker build -t helloworld .
The docker build command runs to create a Docker image.
Figure 23: Running the docker build command
Subsequently, list the Docker images with the docker images command. The helloworld image gets listed. If no tag is specified, the default tag “latest” is used to create the Docker image with docker build.
Figure 24: Listing the Docker images
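The effect of the tag on docker build can be sketched with a small wrapper. The helper name build_tagged and the version 1.0 are hypothetical; the wrapper only makes explicit the “latest” default that docker build applies when the tag is omitted.

```shell
# Hypothetical wrapper: build the current directory into <name>:<tag>,
# spelling out the "latest" default when no tag is given.
build_tagged() {
  docker build -t "$1:${2:-latest}" .
}

# Usage:
# build_tagged helloworld 1.0   # creates helloworld:1.0
# build_tagged helloworld       # creates helloworld:latest
# docker images helloworld      # lists only the helloworld images
```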
Running the Docker Image for an R Application
Next, we shall run the Docker image to create a Docker container with the docker run command. Use the following docker run command, in which the --rm option is specified to remove the Docker container after the command in the CMD instruction has run.
docker run -ti --rm helloworld
The message from the R script gets output. The R CLI command prompt is not displayed because the CMD instruction runs an Rscript command and, subsequently, the Docker container exits.
Figure 25: Running the Docker image helloworld
Stopping a Docker Container
A Docker container may be stopped with the docker stop command, as demonstrated for the Docker container created for running R commands interactively from the R CLI.
Figure 26: Stopping the Docker container
The docker ps command, which lists running containers only, does not list the container because it has been stopped.
Figure 27: No Docker containers get listed
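The docker stop command takes a container ID or name as its argument. As a sketch, the following helper stops every running container created from a given image; the helper name stop_containers_for is hypothetical.

```shell
# Hypothetical helper: stop all running containers created from an image.
stop_containers_for() {
  for cid in $(docker ps -q --filter "ancestor=$1"); do
    docker stop "$cid"
  done
}

# Usage:
# stop_containers_for r-base
# docker ps -a   # also lists stopped containers, unless they were started with --rm
```

Containers started with the --rm option, as in this article, are removed as soon as they stop, so they do not appear in the docker ps -a listing either.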
Removing a Docker Image
To remove a Docker image, the docker rmi command may be used. Remove the helloworld Docker image with the following command:
docker rmi helloworld
The Docker image gets removed.
Figure 28: Removing the Docker image
Conclusion
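In this article, we used R with Docker Engine. We ran R interactively and in batch mode from the official r-base Docker image, and then built and ran a Docker image for an R application.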