An AWESOME feature of R, and perhaps a main reason for using it, is that it allows you to program your own functions. In fact, all the commands that you use in R are functions.
We can look at the basic structure of a function in R by typing the
function name, but excluding the parentheses. For example, lets take a
look at the function for taking the row mean of an object,
rowMeans()
. Just type:
The basic structure of a function is:
function.name <- function(
arguments or things you need
separated by commas){
actions or stuff you want the function to be doing
return(
information you want from the
function)
}
Basically, we are passing in an object or objects, doing something to it, and then returning an output.
You can think of a function as a “data recipe” because when we build a function, it is just like cooking: we have ingredients (inputs), instructions (steps), and some output.
For example, say you want to make chocolate chip cookies:
Ingredients
Instructions
In R, the recipe would look something like this:
function( butter=0.33, sugar=0.5, eggs=1, flour=2, temp=375 )
{
dry.goods <- combine( flour, sugar )
batter <- mix( dry.goods, butter, eggs )
cookies <- bake( batter, temp, time=10 )
return( cookies )
}
Note that this function to make cookies relies on other functions for each step, combine(), mix(), and bake(). Each of these functions would have to be defined as well, or more likely someone else in the open source community has already written a package called “baking” that contains simple functions for cooking so that you can use them for more complicated recipes.
You will find that R allows you to conduct powerful analysis primarily because you can build on top of and extend a lot of existing functionality.
For example, let’s create a function that calculates the mean of a variable:
This function takes an object input
and divides the sum
of input by the length of input and returns this ratio as an object
called our.mean
. Note that the names we use for the objects
are arbitrary.
The function reads as: create a function called
function.mean
, this function takes one argument called
input
, the function will create an object called
our.mean
by summing input and dividing it by the length of
input, the function then returns the object our.mean
.
Remember that we can see our function by just typing the name
function.mean
.
Now, let’s create an object called x
and calculate the
mean using our function:
## [1] 3
Note here that x
is the input in the function.
We can also create a object that is returned by the function.
## [1] 3
The command: function.mean(x)
just returns the mean, but
the command: x.mean <- function.mean(x)
creates the
object x.mean
that is the mean of the object
x
.
Commenting inside the function is helpful for keeping track of what
is going on, as well as for others who might want to use a function you
have created. In R, any line with a #
sign in front is
commented out of the code in the function and it will pass over it. For
example:
function.mean <- function( input ){
our.mean <- sum( input )/length( input ) #the equation for the mean.
return( our.mean ) #the object 'our.mean' is what is created.
}
It is important to note that R will overwrite the names of existing
functions. For example, the function rowMeans()
can be
overwritten by the command
rowMeans <- function(input){
…}
. So be
careful with your naming of functions. To be safe, just type the name of
what you want to call your function to see if it exists in the loaded
libraries.
function()
FunctionThe return()
command within the function()
function is helpful when our function creates multiple objects, but we
only want particular objects returned by the function. For example,
let’s create a function that takes the mean of two different inputs,
then creates a new object that has both of these means as a single
object:
function.two.means <- function( input1,input2 ){ #this function takes two arguments.
mean.1 <- sum( input1 )/length( input1 ) # take the mean of the first object.
mean.2 <- sum( input2 )/length( input2 ) # take the mean of the second object.
both.means <- c( mean.1,mean.2 ) #combine the means into a object.
return( both.means ) #return the object that is both means.
}
Let’s give it a try:
Create the mean of x
:
## [1] 3
Create the mean of y
:
## [1] 8
Now, create the mean of the means. Note that we have to put in two arguments here:
## [1] 3 8
Note that this function creates an object with two means. That is to say, a vector with dimensions 1 x 2. That is different then a function which takes the mean of the means. For example:
function.mean.of.means <- function( input1,input2 ){ #this function takes two arguments.
mean.1 <- sum( input1 )/length( input1 ) # take the mean of the first object.
mean.2 <- sum( input2 )/length( input2 ) # take the mean of the second object.
mean.of.means <- mean( c( mean.1,mean.2 ) ) #take the mean of the means.
return( mean.of.means ) #return the object that is the mean of both means.
}
Let’s give it a try:
## [1] 3
## [1] 8
## [1] 5.5
Now, compare that to:
## [1] 3 8
Question: What is the difference between the statement
both.means <- c(mean.1,mean.2)
in the first function and
the statement mean.of.means <- mean(c(mean.1,mean.2))
in
the second function?
Task: Create a function to square all the numbers 1: 10 in a vector called y.
Rather than writing line-by-line, it is easier to create functions in a text editor and then copy and paste them into R. Many text editors will auto-generate brackets, parentheses, etc., making the programming much easier.
Within R, you can create script files (which have the .R extension)
than can be called using the source()
function. Rather than
copy and pasting all of the text in a script, you can use the
source()
function to run an entire script.
For example, execute the following script to plot a Christmas tree:
Now, let’s take a look at the file (click here) and see what the script is doing.
Often we need to repeat an action several times and would like a
procedure that does this for us. In R, the for()
function
accomplishes this task and takes the following form:
In this syntax, for
indicates that we are going to loop
from a start index to an end index, i
is the index we are
looping over, 1
is our start index, n
is the
end index, {
opens the loop and }
closes the
loop. With loops, actions that take place in the loop involve objects,
and these objects must be created before we initiate the loop.
Let’s create a loop that takes each value of an object and squares it. For example:
x <- seq( 1, 10, 1 ) # create a object of values 1 through 10.
x.squared <- NULL #create an object that is empty which we will use in the loop.
for( i in 1:10 ){ #for each element for 1 through 10.
x.squared[i] <- x[i]^2 #each element of x.squared will be x[i] squared.
}
x.squared # print the object to see the value.
## [1] 1 4 9 16 25 36 49 64 81 100
Task:
Create an object, y, that is the numbers 1 through 5.
Create an object, x, that is NULL
.
Write a loop that takes the elements of y and a) squares it then b) adds that squared value to the original value.
For example, if my value in y is \(6\), then the output in x should be \(42\) because \(6^2 + 6 = 42\).
R also has some very helpful loops already programmed in the
apply()
function family. The apply()
function
allows us apply to a function to a two-dimensional object. The
apply()
function has three arguments: the object, whether
to apply across rows or columns, the functions to use. Let’s check out
the help:
For example, say we have a data set with 5 individuals and 3 variables and we want the mean for each of the variables:
Person | Var1 | Var2 | Var3 |
---|---|---|---|
John | 1 | 2 | 3 |
Essa | 4 | 5 | 5 |
Stan | 4 | 3 | 2 |
Cora | 1 | 1 | 2 |
Dezy | 3 | 4 | 5 |
—— | |||
Mean | ? | ? | ? |
If we read these data in as a matrix, then the mean for each variable is just the column means:
#Create the data matrix using the matrix function.
dat <- matrix(
c( 1,2,3,4,5,5,4,3,2,1,1,2,3,4,5 ), # a vector of the values.
nrow=5, # the number of rows in the matrix.
ncol=3, # the number of columns in the matrix.
byrow=TRUE #instructions to read by row (vs. by column).
)
Note, we could write:
Or, we could just use the apply()
function:
## [1] 2.6 3.0 3.4
R has a useful function called which()
that returns T/F
statements about objects. The function returns the position of the
values meeting some condition. For example:
## [1] 4 5 6 7 14 15
Shows that elements in the 4th, 5th, 6th, 7th, 14th, and 15th positions are > 3.
Using the same idea, the if()
function can be used if we
want to perform a certain task depending on some condition. The basic
structure is:
if(
statement that gives TRUE or FALSE) {
action to take
}
For example, say we want to recode some value if it meets some
condition. Specifically, lets recode all values in x
to be
0 of they are greater than 3. To do this, we need to put our if
statement in a for()
loop.
x <- c( 1,2,3,4,5,5,4,3,2,1,1,2,3,4,5 )
for( i in 1:15 ){
if( x[i] > 3 ){ #define the condition.
x[i] <- 0 #assign the value if condition is met.
}
}
x
## [1] 1 2 3 0 0 0 0 3 2 1 1 2 3 0 0
There is also the else
command that lets us perform
actions if some condition is not met. For example, recode all values to
be 1 if they are not greater than 3 (i.e. going to be changed to 0).
x <- c( 1,2,3,4,5,5,4,3,2,1,1,2,3,4,5 )
for( i in 1:15 ){
if( x[i] > 3 ){
x[i] <- 0
}
else x[i] <- 1
}
x
## [1] 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0
Finally, there is also the else if()
function that lets
you combine several possible actions. For example:
x <- c( 1,2,3,4,5,5,4,3,2,1,1,2,3,4,5 )
for( i in 1:15 ){
if( x[i] > 3 ){ #recode 4 and 5 to 0.
x[i] <- 0
}
else if( x[i] == 3 ){ #recode 3 to 2.
x[i] <- 2
}
else x[i] <- 1 # recode 1 and 2 to 1.
}
x
## [1] 1 1 2 0 0 0 0 2 1 1 1 1 2 0 0