# Chapter 5 Getting Started with `R`

If you are completely new to all things `R`, welcome!

If you have a background in computer programming languages or software such as Python, Stata®, SAS®, or Matlab®, you may notice many familiar concepts and terminology, such as functions, variables, and operators, in the example `R` code recipes referenced in this book.

## 5.1 Why `R`?

`R` is a free, open source statistical programming language that is powerful, flexible, and evolving. `R`, which has grown significantly in popularity, is an interactive and object-oriented programming language that offers a variety of data structures, graphical capabilities, functions, packages, documentation, and community support. In addition, its ecosystem can handle different data types and perform complex analysis on individual and distributed computer systems, which are important capabilities to consider when developing data analytics solutions of any size or scale.

## 5.2 Download R (Required)

`R` is compatible with Windows™, macOS, and a variety of Unix systems.

The latest version of `R` is available for download via the Comprehensive R Archive Network (CRAN).

All of the `R` code in this book has been tested to work with `R` version 3.5.2 (2018-12-20). Please check your existing `R` installation and upgrade to the latest version if needed.
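One way to check an existing installation from the console is sketched below. It uses the base `R` objects `R.version.string` and `getRversion()`; the comparison against 3.5.2 simply reuses the version number quoted above.

```r
# Print the version string of the running R installation
R.version.string

# Compare the running version with the version used to test the book's code.
# getRversion() returns a version object that compares correctly
# against a version string such as "3.5.2".
is_recent_enough <- getRversion() >= "3.5.2"
is_recent_enough
```

If the last line prints `FALSE`, the installation is older than the version the examples were tested with.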

You can download the `R` code files and data by visiting http://www.nandeshwar.info/ds4fundraisingcode.

## 5.3 Install RStudio (Optional)

RStudio is an integrated development environment (IDE), which includes a code editor, debugger, and visualization tools that make `R` more user friendly.

RStudio Desktop (Open Source Edition) is available for free download via the following links:

RStudio: https://www.rstudio.com/

RStudio Desktop: https://www.rstudio.com/products/rstudio/#Desktop

## 5.4 Install Packages

`R` is a popular programming language that benefits from community-driven support and ongoing enhancements.

`R` packages, which you may have already heard about, are bundles of reusable `R` functions, supporting documentation, and (optionally) sample data. As of this writing, there are 12,106 `R` packages available to download, install, and use. The fact that there are over 12,000 freely available add-on packages speaks to the flexibility of the `R` language and the robust commitment of the `R` user community. The data analytics solutions you can develop using these packages are perhaps limited only by your curiosity, creativity, and willingness to learn `R`!

We assume you’ve already installed `R` on your computer, so now it’s time to get your feet wet and install two popular `R` packages, `dplyr` and `ggplot2`, using the `install.packages` function to familiarize yourself with the `R` package installation process.

To run the following code, copy and paste each line into your `R` console window and press the Enter key. Alternatively, you can copy and paste these commands into a new `R` script by selecting `File > New File > R Script` within RStudio.

```
# Install dplyr package
install.packages("dplyr", repos='http://cran.us.r-project.org')
# Install ggplot2 package
install.packages("ggplot2", repos='http://cran.us.r-project.org')
```

Voila! You successfully ran your first `R` code, which downloaded and installed two popular `R` packages for data manipulation and visualization. We’ll cover these tools later in greater detail.
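Re-running `install.packages` downloads a package again even when it is already present. The sketch below shows one possible way to skip that step. The helper name `install_if_missing` and the mirror URL are assumptions made for illustration; `requireNamespace` and `install.packages` are standard `R` functions.

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # The repository URL is an assumption; any CRAN mirror works
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # Report (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
available <- install_if_missing("stats")
available
#> [1] TRUE

# For the packages in this chapter you would call, for example:
# install_if_missing("dplyr")
# install_if_missing("ggplot2")
```

The last two calls are commented out because they may trigger a download when the packages are missing.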

These lines of `R` code contain two `install.packages` commands, each of which is preceded by a comment line indicated by the `#` symbol. `R` does not execute anything that follows the `#` symbol on a line. As a good programming practice, comment your code liberally with the `#` symbol so that you can later reference, check, test, and update your code as needed.

If you create a new `R` script, you can also highlight all four lines of code in your script with your mouse cursor and then manually select `Code > Run Selected Line(s)` from the RStudio menu. Alternatively, you can use the keyboard shortcut `Command + Enter` on a Mac or `Control + Enter` on Windows or Linux to run these lines of code.

For a full list of RStudio keyboard shortcuts, please refer to RStudio’s knowledge base.

Now that you’ve installed both `R` packages, let’s load these packages and make them available for use on your system using the `library("package name")` function.

```
# Load dplyr package
library("dplyr")
# Load ggplot2 package
library("ggplot2")
```

To see all of the `R` packages installed on your system, call the `library` function without any arguments (that is, inputs) or package names.

```
# List all packages installed
library()
```

In the `library` function output, you should see both the `dplyr` and `ggplot2` packages listed alphabetically along with the following brief package descriptions.

- `dplyr`: A Grammar of Data Manipulation
- `ggplot2`: Create Elegant Data Visualizations Using the Grammar of Graphics
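Scanning the `library()` listing by eye can be tedious. As a sketch of a programmatic check, the base `R` function `installed.packages()` returns a table whose row names are the installed package names. The example tests for `stats` and `utils` only because they ship with every `R` installation; substitute `"dplyr"` and `"ggplot2"` to check the packages installed above.

```r
# Names of all packages installed on this system
installed <- rownames(installed.packages())

# Check whether specific packages are in that list
has_pkgs <- c("stats", "utils") %in% installed
has_pkgs
#> [1] TRUE TRUE

# Look up the installed version of a package
packageVersion("stats")
```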

Congratulations!

You just completed an `R` package installation process using repeatable and reusable `R` code, which downloaded, installed, and loaded `R` packages on your computer.

## 5.5 Learning R

Although `R` is a powerful statistical modeling and programming environment, it can take some time to get comfortable using `R`, especially if you don’t have any background in statistics or computer programming. For users with minimal experience in writing code, we encourage you to be patient while you get the hang of working with `R`. The benefits (flexibility, extensibility, and speed, just to name a few) are well worth the time and effort to overcome the initial learning curve associated with `R`.

Here are some tips for learning `R`:

- Do: Many people learn `R` best through hands-on learning and directly entering `R` commands within the `R` console window.
- Review: Check out code samples and retype the commands you find in this book and beyond.
- Experiment: Try modifying `R` commands and running the code to see what happens to develop a better sense and understanding of how it works.
- Research: You will encounter errors in `R`. Fortunately, `R` has excellent error messages that (usually) offer useful diagnostic information to help you figure out the root cause of the issue.

## 5.6 R Console

Assuming you’ve already installed `R` on your computer, the first thing you will encounter when you launch `R` is the `R` console window and the command prompt `>`, which indicates `R` is ready for your instructions.

As previously mentioned, `R` is an interactive programming environment, so let’s use `R` as a calculator and enter some basic arithmetic operators to explore what it can do.

```
# Addition
1+8
#> [1] 9
# Subtraction
1-7
#> [1] -6
# Division
1/7
#> [1] 0.143
# Multiplication
1*7
#> [1] 7
# Exponentiation
2^3
#> [1] 8
# Order of Operations
1+2*3
#> [1] 7
```

After you enter each command into the `R` command prompt, each result will be interactively displayed in the `R` console as shown in Figure 5.2.

If you’ve installed RStudio, the `R` Console command prompt and interactive output will be displayed at the bottom of your RStudio session window.
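The console accepts a few more arithmetic operators than the calculator example above shows. As a brief sketch, the commands below use parentheses to override the order of operations, along with the integer division (`%/%`) and modulo (`%%`) operators from base `R`. The numbers and the names `grouped`, `whole`, and `remainder` are arbitrary.

```r
# Parentheses override the default order of operations
grouped <- (1 + 2) * 3
grouped
#> [1] 9
# Integer division: how many whole times 3 goes into 17
whole <- 17 %/% 3
whole
#> [1] 5
# Modulo: the remainder left over after integer division
remainder <- 17 %% 3
remainder
#> [1] 2
```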

## 5.7 Built-in Functions

R has many built-in functions, which are reusable expressions that involve zero or more variables.

```
# Natural logarithm (base e)
log(x = 100)
#> [1] 4.61
# Square Root
sqrt(x = 16)
#> [1] 4
# Round
round(x = 8.3)
#> [1] 8
```

These variables are arguments (inputs or parameters) that are passed to functions in order to perform various types of calculations. For example, the `sqrt` function takes a single argument, `x`. We passed 16 as `x`, and the function returned its square root, 4.

Functions can also take more than one parameter, separated by commas.

In the previous example, the `round` function took the number 8.3 and rounded it to the closest integer, which is 8. However, if we pass the `round` function a number such as pi (3.141592…), we can instruct `R` to round pi to the nearest hundredth by setting the additional parameter `digits` to `2`.

```
# Round
round(x = 3.141592, digits = 2)
#> [1] 3.14
```
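The examples so far name every argument. As a short sketch of two related points, arguments can also be matched by position, and `log` accepts an optional `base` argument (it computes the natural logarithm when `base` is omitted). All functions shown are part of base `R`; the result names are arbitrary.

```r
# Arguments matched by position: x first, then digits
rounded <- round(3.141592, 2)
rounded
#> [1] 3.14
# Supply a second argument to change the base of the logarithm
log_base10 <- log(x = 100, base = 10)
log_base10
#> [1] 2
# log10() is a shortcut for the base-10 logarithm
log10(100)
#> [1] 2
```

Naming arguments is more verbose but makes the intent clearer, which is why the examples in this chapter spell them out.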

The base installation of `R` includes several built-in constants, one of which is `pi`.

- `LETTERS`: The 26 upper-case letters of the Roman alphabet
- `letters`: The 26 lower-case letters of the Roman alphabet
- `month.abb`: The three-letter abbreviations for the English month names
- `month.name`: The English names for the months of the year
- `pi`: The ratio of the circumference of a circle to its diameter

Rather than manually typing the value of pi in the previous example, you could have used the built-in constant `pi`.

```
# Round
round(x = pi, digits = 2)
#> [1] 3.14
```
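The other built-in constants can be used the same way. The sketch below is illustrative only: it selects elements by position with square brackets and reuses `pi` in a calculation. The radius of 2 and the names `first_months` and `circle_area` are arbitrary.

```r
# First three month abbreviations
first_months <- month.abb[1:3]
first_months
#> [1] "Jan" "Feb" "Mar"
# Full name of the twelfth month
month.name[12]
#> [1] "December"
# Converting the lower-case letters to upper case reproduces LETTERS
identical(toupper(letters), LETTERS)
#> [1] TRUE
# Area of a circle with radius 2, rounded to two decimal places
circle_area <- round(x = pi * 2^2, digits = 2)
circle_area
#> [1] 12.57
```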

If you want additional information about a function and its parameters, the base `R` installation comes with useful help pages with function descriptions, usage, arguments, details, and examples.

You can open a help page with the `?` operator or the `help` function. Another way to learn about a function is to run the examples from its help page with the `example(function_name)` command. Try `example(round)` in your console.

To learn more about the `round` function and its usage details, try entering either of the following commands in your `R` console.

```
# ? Operator Help
?round
# Help Function
help(round)
```

To learn more about built-in constants in the base `R` namespace, try entering either of the following commands.

```
# ? Operator Help
?Constants
# Help Function
help(Constants)
```

`R` also allows you to write your own functions. If you are curious or are already comfortable using built-in functions, we encourage you to explore and try creating your own custom functions. For additional details, you can check out this article.
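As a minimal sketch of what a custom function can look like, the example below defines a hypothetical `round_to` function that rounds an amount to the nearest multiple of a chosen base. The function name, its parameters, and the sample values are invented for illustration and do not come from the book's code files.

```r
# Hypothetical function: round `amount` to the nearest multiple of `base`.
# `base` has a default value of 5, so it can be omitted when calling.
round_to <- function(amount, base = 5) {
  base * round(amount / base)
}

# Use the default base of 5
round_to(amount = 1234)
#> [1] 1235
# Override the default base
round_to(amount = 1234, base = 100)
#> [1] 1200
```

The value of the last expression in the function body is what the function returns.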

## 5.8 Variables

Variables allow you to store data in a named object, whose values can later be retrieved and changed as needed. To create a variable in `R`, use the assignment operator `<-` to assign data to a variable name.

For example, suppose we wanted to store the value of the square root calculation for later use. Here’s a code snippet that stores the calculation in a variable.

```
# Calculate square root and assign to "sqroot" variable
sqroot <- sqrt(16)
# Print "sqroot" value
sqroot
#> [1] 4
```

In this example, you will note that we selected `sqroot` as the variable name to avoid a naming conflict with the `sqrt` function. To further extend this example, suppose we needed to regularly update the `sqrt` function input value instead of hard-coding the value “16”. We can modify the code to use another variable for the input parameter.

```
# Square Root Function Input (Parameter)
input <- 16
# Calculate square root and assign to "sqroot" variable
sqroot <- sqrt(input)
# Print "sqroot" value
sqroot
#> [1] 4
```
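One detail worth seeing in action: a variable stores the value computed at the time of assignment, so changing `input` afterwards does not update `sqroot` until the calculation is run again. The sketch below reuses the variable names from the example above; the new input value of 25 and the name `stale` are arbitrary.

```r
input <- 16
sqroot <- sqrt(input)
# Change the input; sqroot still holds the old result
input <- 25
stale <- sqroot
stale
#> [1] 4
# Re-run the calculation to pick up the new input
sqroot <- sqrt(input)
sqroot
#> [1] 5
```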

## 5.9 Conditional Logic

`R` provides a variety of comparison and logical operators that return a value of `TRUE` or `FALSE`.

```
# Less Than
1 < 2
#> [1] TRUE
# Less Than or Equal To
2 <= 2
#> [1] TRUE
# Greater Than
1 > 2
#> [1] FALSE
# Greater Than or Equal to
2 >= 2
#> [1] TRUE
# Exactly Equal to
2 == 2
#> [1] TRUE
# Not Equal To
1 != 1
#> [1] FALSE
# Not X
X <- TRUE
!X
#> [1] FALSE
# X or Y
X <- FALSE
Y <- TRUE
X | Y
#> [1] TRUE
# X AND Y
X <- FALSE
Y <- TRUE
X & Y
#> [1] FALSE
# Test whether value of X is TRUE
X <- FALSE
isTRUE(X)
#> [1] FALSE
```
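These `TRUE` and `FALSE` values become useful when they control which code runs. The following sketch uses the `if` and `else` keywords from base `R`. The variable names, the age cutoff of 65, and the giving threshold of 10,000 are assumptions chosen purely for illustration.

```r
# Illustrative values; not taken from the book's data
donor_age <- 72
lifetime_giving <- 14225

# && combines two single TRUE/FALSE conditions inside if()
if (donor_age >= 65 && lifetime_giving > 10000) {
  segment <- "senior major donor"
} else {
  segment <- "other"
}
segment
#> [1] "senior major donor"
```

Only the first branch runs here because both conditions evaluate to `TRUE`.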

## 5.10 Data Types

Everything in `R` is an object. `R` offers a variety of data types and structures, such as scalars (which `R` treats as vectors of length one), vectors, matrices, data frames, and lists.

## 5.11 Vectors

A vector is an ordered collection of atomic (integer, numeric, character, or logical) values. Vectors are one of the most common and basic data structures in `R`, so it is useful to familiarize yourself with them.

Vectors can be one of two different types: (1) atomic vectors and (2) lists.

You can manually create a vector by using the `c` (short for "combine") function to combine a collection of data values. For example, suppose we needed to create a vector of donor ages and store it in a variable called `donor_age`.

```
# Create donor_age vector
donor_age <- c(28, 32, 77, 57, 52, 41, 42, 49)
```

We can use the `c` function again to append elements to `donor_age` if needed.

```
# Update donor_age with additional donor age values
donor_age <- c(donor_age, 72, 68)
```
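Once a vector exists, most work with it involves measuring, selecting, and summarizing its elements. The sketch below rebuilds the ten-element `donor_age` vector from above and applies a few base `R` functions to it. The cutoff of 65 and the names `older_donors` and `average_age` are arbitrary.

```r
# The full donor_age vector after both steps above
donor_age <- c(28, 32, 77, 57, 52, 41, 42, 49, 72, 68)

# Number of elements
length(donor_age)
#> [1] 10
# Select the first element by position
donor_age[1]
#> [1] 28
# Select every element that meets a condition
older_donors <- donor_age[donor_age > 65]
older_donors
#> [1] 77 72 68
# Summarize the whole vector
average_age <- mean(donor_age)
average_age
#> [1] 51.8
```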

## 5.12 Sequences

You can also create vectors as a sequence of numbers using the `seq` function or the `:` operator.

```
# Sequence using the seq function
seq(from = 1, to = 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10
# Sequence using the : operator
series <- 1:10
series
#>  [1]  1  2  3  4  5  6  7  8  9 10
# Check whether both approaches give the same result
identical(x = seq(1, 10), y = series)
#> [1] TRUE
```
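The `seq` function is more flexible than the `:` operator because it also accepts a step size or a target length. A short sketch using the `by` and `length.out` arguments of base `R`'s `seq`, with arbitrary endpoints and result names:

```r
# Step from 0 to 100 in increments of 25
by_25 <- seq(from = 0, to = 100, by = 25)
by_25
#> [1]   0  25  50  75 100
# Five evenly spaced values between 0 and 1
evenly <- seq(from = 0, to = 1, length.out = 5)
evenly
#> [1] 0.00 0.25 0.50 0.75 1.00
# The : operator can also count down
countdown <- 10:1
countdown
#>  [1] 10  9  8  7  6  5  4  3  2  1
```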

## 5.13 Matrices

Matrices are a special type of atomic (integer, numeric, character, or logical) vector with dimension attributes (rows and columns). By default, matrices are filled column-wise.
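To illustrate column-wise filling, the sketch below builds a small matrix with base `R`'s `matrix` function and then repeats the call with `byrow = TRUE` for comparison. The variable names `m` and `m_by_row` and the values 1 through 6 are arbitrary.

```r
# Fill a 2 x 3 matrix column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
# Fill the same values row by row instead
m_by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_by_row
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6
# Dimensions: number of rows, then number of columns
dim(m)
#> [1] 2 3
# Select the element in row 2, column 3
m[2, 3]
#> [1] 6
```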

## 5.14 Lists

A list is a special vector type where elements are not restricted to a single data type. Because the contents of a list can include a mixture of data types, lists are flexible data structures and sometimes referred to as generic vectors.

To create a list, use the `list` function.

```
# Store individual donor attributes in variables
donor_name <- "John Smith"
donor_age <- 58
donor_city <- "San Francisco"
donor_lifetimegiving <- 14225
# Combine values of different data types into a list
donor_profile <- list(donor_name, donor_age,
                      donor_city, donor_lifetimegiving)
# Print the list
donor_profile
#> [[1]]
#> [1] "John Smith"
#>
#> [[2]]
#> [1] 58
#>
#> [[3]]
#> [1] "San Francisco"
#>
#> [[4]]
#> [1] 14225
```
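The list above identifies its elements only by position (`[[1]]`, `[[2]]`, and so on). As a sketch of a common alternative, elements can be given names when the list is created and then retrieved with `$` or `[[ ]]`. The element names used here (`name`, `age`, `city`, `lifetime_giving`) are chosen for illustration.

```r
# Create a list with named elements
donor_profile <- list(name = "John Smith",
                      age = 58,
                      city = "San Francisco",
                      lifetime_giving = 14225)

# Retrieve an element by name with $
donor_profile$city
#> [1] "San Francisco"
# Retrieve an element by name with [[ ]]
donor_profile[["age"]]
#> [1] 58
# Number of elements in the list
length(donor_profile)
#> [1] 4
```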

## 5.15 Factors

Factors are vectors used to represent categorical data labels.

Factors can be ordered or unordered and are especially useful when organizing and working with categorical data due to their speed and efficiency. Although factors look like character vectors, they are actually stored internally within `R` as integers, so you need to be careful when treating them like characters to avoid running into errors. It is also important to note that factors can only contain pre-defined label values, also known as levels.

```
# Create a factor of donor indicators
donor_ind <- factor(c("no", "no", "yes",
                      "yes", "yes", "no",
                      "no", "yes", "yes",
                      "yes"))
# Print the factor and its levels
donor_ind
```

Let’s use the `table` function to create a frequency table that shows the count of donors versus non-donors using the donor indicator variable `donor_ind` we just created.

```
donor_ind <- factor(c("no", "no", "yes",
                      "yes", "yes", "no",
                      "no", "yes", "yes",
                      "yes"))
table(donor_ind)
#> donor_ind
#>  no yes
#>   4   6
```
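The points about levels and integer storage can be checked directly. The sketch below uses the base `R` functions `levels` and `as.integer` on `donor_ind`, and then builds an ordered factor. The `gift_level` variable and its labels are hypothetical.

```r
donor_ind <- factor(c("no", "no", "yes", "yes", "yes",
                      "no", "no", "yes", "yes", "yes"))

# The pre-defined labels (levels), sorted alphabetically by default
levels(donor_ind)
#> [1] "no"  "yes"
# The integer codes stored internally: 1 = "no", 2 = "yes"
as.integer(donor_ind)
#>  [1] 1 1 2 2 2 1 1 2 2 2

# An ordered factor with an explicit level order (hypothetical labels)
gift_level <- factor(c("low", "high", "medium", "low"),
                     levels = c("low", "medium", "high"),
                     ordered = TRUE)
# Ordered factors support comparisons based on the level order
gift_level < "high"
#> [1]  TRUE FALSE  TRUE  TRUE
```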

## 5.16 Data Frame

A data frame is a special kind of list where each element has the same length. Data frames are important in `R` because they are used frequently for storing tabular data for analysis.

In addition to length, data frames have additional attributes, such as `rownames`, which can be used to organize and annotate data labels, such as `donor_id`.

Let’s create a data frame using the `donor_age` and `donor_ind` vectors we just created.

```
donor_age <- c(28, 32, 77,
               57, 52, 41, 42,
               49, 72, 68)
donor_ind <- factor(c("no", "no", "yes",
                      "yes", "yes", "no",
                      "no", "yes", "yes",
                      "yes"))
dd <- data.frame(donor_age, donor_ind)
dd
#>    donor_age donor_ind
#> 1         28        no
#> 2         32        no
#> 3         77       yes
#> 4         57       yes
#> 5         52       yes
#> 6         41        no
#> 7         42        no
#> 8         49       yes
#> 9         72       yes
#> 10        68       yes
```

Let’s use the `table` function to display a frequency table of `donor_age` and `donor_ind`.

```
table(dd)
#> donor_ind
#> donor_age no yes
#> 28 1 0
#> 32 1 0
#> 41 1 0
#> 42 1 0
#> 49 0 1
#> 52 0 1
#> 57 0 1
#> 68 0 1
#> 72 0 1
#> 77 0 1
```
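Because every age appears only once, the cross-tabulation above is sparse. As an illustrative alternative, the sketch below rebuilds `dd` and summarizes it with base `R` functions, selecting columns with `$` and rows with a logical condition. The names `mean_age_donors` and `mean_age_nondonors` are chosen for this example.

```r
dd <- data.frame(
  donor_age = c(28, 32, 77, 57, 52, 41, 42, 49, 72, 68),
  donor_ind = factor(c("no", "no", "yes", "yes", "yes",
                       "no", "no", "yes", "yes", "yes"))
)

# Number of rows and the column names
nrow(dd)
#> [1] 10
names(dd)
#> [1] "donor_age" "donor_ind"

# Average age of donors and of non-donors
mean_age_donors <- mean(dd$donor_age[dd$donor_ind == "yes"])
mean_age_donors
#> [1] 62.5
mean_age_nondonors <- mean(dd$donor_age[dd$donor_ind == "no"])
mean_age_nondonors
#> [1] 35.75
```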

## 5.17 Data Types

`R` provides several functions to examine the features of various data types, such as:

- `class`: What kind of data object?
- `typeof`: What kind of data storage type?
- `length`: What is the length of the data object?
- `attributes`: What kind of metadata?
- `str`: What kind of data object and internal structure?
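The sketch below applies these inspection functions to a small numeric vector and a small factor. The three-element inputs are shortened versions of the `donor_age` and `donor_ind` examples, trimmed here to keep the output brief.

```r
donor_age <- c(28, 32, 77)
donor_ind <- factor(c("no", "yes", "yes"))

# A numeric vector is stored as double-precision numbers
class(donor_age)
#> [1] "numeric"
typeof(donor_age)
#> [1] "double"
length(donor_age)
#> [1] 3

# A factor is stored as integers with levels attached as metadata
class(donor_ind)
#> [1] "factor"
typeof(donor_ind)
#> [1] "integer"
attributes(donor_ind)
#> $levels
#> [1] "no"  "yes"
#>
#> $class
#> [1] "factor"

# Compact summary of the object type and its contents
str(donor_ind)
#>  Factor w/ 2 levels "no","yes": 1 2 2
```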

## 5.18 Additional Support

We encourage you to start where you are and embrace the learning curve you inevitably encounter when learning any type of new language, whether computer or human.

For reference, the following is a link to `R` manuals provided by the `R` Development Core Team as a learning resource.

The following is a list of `R` community support sites with knowledgeable and helpful `R` user forums, which can be a useful resource when you encounter questions or run into a technical hurdle.
