Name Function Number of R Again
Programming with R
Creating functions
Learning Objectives
- Define a function that takes arguments.
- Return a value from a function.
- Test a function.
- Explain what a call stack is, and trace changes to the call stack as functions are called.
- Set default values for function arguments.
- Explain why we should divide programs into small, single-purpose functions.
If we only had one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot some simple statistics. But we have twelve files to check, and may have more in the future. In this lesson, we'll learn how to write a function so that we can repeat several operations with a single command.
Defining a Function
Let's start by defining a function fahr_to_kelvin that converts temperatures from Fahrenheit to Kelvin:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
We define fahr_to_kelvin by assigning it to the output of function. The list of argument names is contained within parentheses. Next, the body of the function (the statements that are executed when it runs) is contained within curly braces ({}). The statements in the body are indented by two spaces. This makes the code easier to read but does not affect how the code operates.
When we call the function, the values we pass to it are assigned to those argument variables so that we can use them inside the function. Inside the function, we use a return statement to send a result back to whoever asked for it.
Let's try running our function. Calling our own function is no different from calling any other function:
# freezing point of water
fahr_to_kelvin(32)
[1] 273.15
# boiling point of water
fahr_to_kelvin(212)
[1] 373.15
We've successfully called the function that we defined, and we have access to the value that we returned.
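R's arithmetic operators work element by element. As a result, fahr_to_kelvin also accepts a whole vector of temperatures and converts each one. The minimal sketch below repeats the definition so that it runs on its own; the example temperatures are arbitrary.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# A single call converts every element of the input vector.
temps_f <- c(32, 212, 98.6)
temps_k <- fahr_to_kelvin(temps_f)
print(temps_k)
```

Because the function contains no loop, it works unchanged for one value or for many.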
Composing Functions
Now that we've seen how to turn Fahrenheit into Kelvin, it's easy to turn Kelvin into Celsius:
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
# absolute zero in Celsius
kelvin_to_celsius(0)
[1] -273.15
What about converting Fahrenheit to Celsius? We could write out the formula, but we don't need to. Instead, we can compose the two functions we have already created:
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}
# freezing point of water in Celsius
fahr_to_celsius(32.0)
[1] 0
This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here (typically half a dozen to a few dozen lines), but they shouldn't ever be much longer than that, or the next person who reads it won't be able to understand what's going on.
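One way to gain confidence in a composed function is to compare it with the formula it replaces. The minimal sketch below assumes the usual direct formula C = (F - 32) * 5/9. The test temperatures are our own choice, and they include -40, where the two scales meet.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}

# Direct formula, used here only as a cross-check.
direct_celsius <- (c(32, 212, -40) - 32) * (5 / 9)
composed_celsius <- fahr_to_celsius(c(32, 212, -40))

# all.equal() tolerates the tiny rounding error introduced by
# adding and then subtracting 273.15.
isTRUE(all.equal(composed_celsius, direct_celsius))
```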
Challenge - Create a function
- In the last lesson, we learned to concatenate elements into a vector using the c function, e.g. x <- c("A", "B", "C") creates a vector x with three elements. Furthermore, we can extend that vector again using c, e.g. y <- c(x, "D") creates a vector y with four elements. Write a function called fence that takes two vectors as arguments, called original and wrapper, and returns a new vector that has the wrapper vector at the beginning and end of the original:
best_practice <- c("Write", "programs", "for", "people", "not", "computers")
asterisk <- "***"  # R interprets a variable with a single value as a vector
                   # with one element.
fence(best_practice, asterisk)
[1] "***"       "Write"     "programs"  "for"       "people"    "not"
[7] "computers" "***"
- If the variable v refers to a vector, then v[1] is the vector's first element and v[length(v)] is its last (the function length returns the number of elements in a vector). Write a function called outside that returns a vector made up of just the first and last elements of its input:
dry_principle <- c("Don't", "repeat", "yourself", "or", "others")
outside(dry_principle)
[1] "Don't"  "others"
The Call Stack
Let's take a closer look at what happens when we call fahr_to_celsius(32). To make things clearer, we'll start by putting the initial value 32 in a variable and storing the final result in one as well:
original <- 32
final <- fahr_to_celsius(original)
The diagram below shows what memory looks like after the first line has been executed:
When we call fahr_to_celsius, R doesn't create the variable temp right away. Instead, it creates something called a stack frame to keep track of the variables defined by fahr_to_celsius. Initially, this stack frame only holds the value of temp:
When we call fahr_to_kelvin inside fahr_to_celsius, R creates another stack frame to hold fahr_to_kelvin's variables:
It does this because there are now two variables in play called temp: the argument to fahr_to_celsius, and the argument to fahr_to_kelvin. Having two variables with the same name in the same part of the program would be ambiguous, so R (and every other modern programming language) creates a new stack frame for each function call to keep that function's variables separate from those defined by other functions.
When the call to fahr_to_kelvin returns a value, R throws away fahr_to_kelvin's stack frame and creates a new variable in the stack frame for fahr_to_celsius to hold the temperature in Kelvin:
It then calls kelvin_to_celsius, which means it creates a stack frame to hold that function's variables:
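A small experiment makes this separation visible. The two functions below are hypothetical; inner_step and outer_step are not part of the lesson. Each has its own argument called temp, and the inner one reassigns its copy without affecting the caller's.

```r
# Hypothetical functions: both have an argument named temp.
inner_step <- function(temp) {
  temp <- temp * 100  # changes only inner_step's own temp
  return(temp)
}

outer_step <- function(temp) {
  from_inner <- inner_step(temp + 1)
  # outer_step's temp is untouched by what happened inside inner_step
  return(c(outer_temp = temp, inner_result = from_inner))
}

outer_step(2)  # outer_temp = 2, inner_result = 300
```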
Once more, R throws away that stack frame when kelvin_to_celsius is done and creates the variable result in the stack frame for fahr_to_celsius:
Finally, when fahr_to_celsius is done, R throws away its stack frame and puts its result in a new variable called final that lives in the stack frame we started with:
This final stack frame is always there; it holds the variables we defined outside the functions in our code. What it doesn't hold is the variables that were in the various stack frames. If we try to get the value of temp after our functions have finished running, R tells us that there's no such thing:
temp
Error in eval(expr, envir, enclos): object 'temp' not found
Why go to all this trouble? Well, here's a function called span that calculates the difference between the minimum and maximum values in an array:
span <- function(a) {
  diff <- max(a) - min(a)
  return(diff)
}
dat <- read.csv(file = "data/inflammation-01.csv", header = FALSE)
# span of inflammation data
span(dat)
[1] 20
Notice that span assigns a value to a variable called diff. We might very well use a variable with the same name (diff) to hold the inflammation data:
diff <- read.csv(file = "data/inflammation-01.csv", header = FALSE)
# span of inflammation data
span(diff)
[1] 20
We don't expect the variable diff to have the value 20 after this function call, so the name diff cannot refer to the same variable defined inside span as it does in the main body of our program (which R refers to as the global environment). And yes, we could probably choose a different name than diff for our variable in this case, but we don't want to have to read every line of code of the R functions we call to see what variable names they use, just in case they change the values of our variables.
The big idea here is encapsulation, and it's the key to writing correct, comprehensible programs. A function's job is to turn several operations into one so that we can think about a single function call instead of a dozen or a hundred statements each time we want to do something. That only works if functions don't interfere with each other; if they do, we have to pay attention to the details once again, which quickly overloads our short-term memory.
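The same point can be checked without the data file. This minimal sketch uses a made-up vector in place of the inflammation data. The diff assigned inside span lives in span's own stack frame, so the global diff survives the call unchanged.

```r
span <- function(a) {
  diff <- max(a) - min(a)
  return(diff)
}

# A small made-up vector stands in for the inflammation data.
# (As in the text, this global diff masks base R's diff() function.)
diff <- c(3, 9, 4)
span(diff)  # 6
diff        # still 3 9 4: span's local diff did not overwrite it
```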
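R can also report the call stack directly. Base R's sys.nframe() returns the number of the frame it is called from, and the global environment counts as frame 0. In the sketch below, depth_outer and depth_inner are hypothetical names, and each nested call adds one frame.

```r
depth_inner <- function() {
  return(sys.nframe())
}

depth_outer <- function() {
  outer_depth <- sys.nframe()    # frame number of depth_outer's call
  inner_depth <- depth_inner()   # one frame deeper
  return(c(outer = outer_depth, inner = inner_depth))
}

depths <- depth_outer()
depths
```

Typed at the console, this should report outer as 1 and inner as 2. The absolute numbers can be larger when the code runs through source() or another wrapper, so only the difference of one frame is meaningful.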
Challenge - Following the call stack
- We previously wrote functions called fence and outside. Draw a diagram showing how the call stack changes when we run the following:
inner_vec <- "carbon"
outer_vec <- "+"
result <- outside(fence(inner_vec, outer_vec))
Testing and Documenting
Once we start putting things in functions so that we can re-use them, we need to start testing that those functions are working correctly. To see how to do this, let's write a function to center a dataset around a particular value:
center <- function(data, desired) {
  new_data <- (data - mean(data)) + desired
  return(new_data)
}
We could test this on our actual data, but since we don't know what the values ought to be, it will be hard to tell if the result was correct. Instead, let's create a vector of 0s and then center that around 3. This will make it simple to see if our function is working as expected:
z <- c(0, 0, 0, 0)
z
[1] 0 0 0 0
center(z, 3)
[1] 3 3 3 3
That looks right, so let's try center on our real data. We'll center the inflammation data from day 4 around 0:
dat <- read.csv(file = "data/inflammation-01.csv", header = FALSE)
centered <- center(dat[, 4], 0)
head(centered)
[1]  1.25 -0.75  1.25 -1.75  1.25  0.25
It's hard to tell from the default output whether the result is correct, but there are a few simple tests that will reassure us:
# original min
min(dat[, 4])
[1] 0
# original mean
mean(dat[, 4])
[1] 1.75
# original max
max(dat[, 4])
[1] 3
# centered min
min(centered)
[1] -1.75
# centered mean
mean(centered)
[1] 0
# centered max
max(centered)
[1] 1.25
That seems almost right: the original mean was about 1.75, so the lower bound from zero is now about -1.75. The mean of the centered data is 0. We can even go further and check that the standard deviation hasn't changed:
# original standard deviation
sd(dat[, 4])
[1] 1.067628
# centered standard deviation
sd(centered)
[1] 1.067628
Those values look the same, but we probably wouldn't notice if they were different in the sixth decimal place. Let's do this instead:
# difference in standard deviations before and after
sd(dat[, 4]) - sd(centered)
[1] 0
Sometimes, a very small difference can be detected due to rounding at very low decimal places. R has a useful function for comparing two objects allowing for rounding errors, all.equal:
all.equal(sd(dat[, 4]), sd(centered))
[1] TRUE
It's still possible that our function is wrong, but it seems unlikely enough that we should probably get back to doing our analysis. We have one more task first, though: we should write some documentation for our function to remind ourselves later what it's for and how to use it.
A common way to put documentation in software is to add comments like this:
center <- function(data, desired) {
  # return a new vector containing the original data centered around the
  # desired value.
  # Example: center(c(1, 2, 3), 0) => c(-1, 0, 1)
  new_data <- (data - mean(data)) + desired
  return(new_data)
}
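The checks we ran by hand can be bundled into a function of their own and rerun every time center changes. This is a minimal sketch. test_center and its made-up data are hypothetical, and stopifnot() (base R) stops with an error if any of its conditions is not TRUE.

```r
center <- function(data, desired) {
  # return a new vector containing the original data centered around the
  # desired value.
  new_data <- (data - mean(data)) + desired
  return(new_data)
}

# Hypothetical helper collecting the checks from the text in one place.
test_center <- function() {
  # A vector of zeros centered around 3 should be all 3s.
  stopifnot(identical(center(c(0, 0, 0, 0), 3), c(3, 3, 3, 3)))

  x <- c(0, 2, 1, 3, 1, 4)  # made-up test data
  centered <- center(x, 0)
  # The mean moves to the desired value (up to rounding)...
  stopifnot(isTRUE(all.equal(mean(centered), 0)))
  # ...while the spread of the data is unchanged.
  stopifnot(isTRUE(all.equal(sd(centered), sd(x))))
  stopifnot(isTRUE(all.equal(max(centered) - min(centered), max(x) - min(x))))
  invisible(TRUE)
}

test_center()
```

If a later edit breaks center, rerunning test_center() stops with an error that names the failing condition.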
Challenge - A more advanced function
- Write a function called analyze that takes a filename as an argument and displays the three graphs produced in the previous lesson (average, min and max inflammation over time). analyze("data/inflammation-01.csv") should produce the graphs already shown, while analyze("data/inflammation-02.csv") should produce corresponding graphs for the second data set. Be sure to document your function with comments.
- Write a function rescale that takes a vector as input and returns a corresponding vector of values scaled to lie in the range 0 to 1. (If L and H are the lowest and highest values in the original vector, then the replacement for a value v should be (v - L)/(H - L).) Be sure to document your function with comments.
- Test that your rescale function is working properly using min, max, and plot.
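Here is one possible rescale, sketched directly from the formula (v - L)/(H - L) in the challenge; compare it with your own attempt. It assumes a numeric input that is not constant. When H equals L, the division is 0/0 and every value becomes NaN.

```r
# rescale: map the values of v linearly onto the range 0 to 1.
# Example: rescale(c(2, 4, 6, 10)) => c(0, 0.25, 0.5, 1)
# Assumes v is numeric and not constant (otherwise H - L is 0).
rescale <- function(v) {
  L <- min(v)
  H <- max(v)
  result <- (v - L) / (H - L)
  return(result)
}

scaled <- rescale(c(2, 4, 6, 10))
min(scaled)  # 0
max(scaled)  # 1
```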
Defining Defaults
We have passed arguments to functions in two ways: directly, as in dim(dat), and by name, as in read.csv(file = "data/inflammation-01.csv", header = FALSE). In fact, we can pass the arguments to read.csv without naming them:
dat <- read.csv("data/inflammation-01.csv", FALSE)
However, the position of the arguments matters if they are not named.
dat <- read.csv(header = FALSE, file = "data/inflammation-01.csv")
dat <- read.csv(FALSE, "data/inflammation-01.csv")
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  'file' must be a character string or connection
To understand what's going on, and make our own functions easier to use, let's re-define our center function like this:
center <- function(data, desired = 0) {
  # return a new vector containing the original data centered around the
  # desired value (0 by default).
  # Example: center(c(1, 2, 3), 0) => c(-1, 0, 1)
  new_data <- (data - mean(data)) + desired
  return(new_data)
}
The key change is that the second argument is now written desired = 0 instead of just desired. If we call the function with two arguments, it works as it did before:
test_data <- c(0, 0, 0, 0)
center(test_data, 3)
[1] 3 3 3 3
But we can also now call center() with just one argument, in which case desired is automatically assigned the default value of 0:
more_data <- 5 + test_data
more_data
[1] 5 5 5 5
center(more_data)
[1] 0 0 0 0
This is handy: if we usually want a function to work one way, but occasionally need it to do something else, we can allow people to pass an argument when they need to but provide a default to make the normal case easier.
The example below shows how R matches values to arguments:
display <- function(a = 1, b = 2, c = 3) {
  result <- c(a, b, c)
  names(result) <- c("a", "b", "c")  # This names each element of the vector
  return(result)
}
# no arguments
display()
a b c
1 2 3
# one argument
display(55)
 a  b  c
55  2  3
# two arguments
display(55, 66)
 a  b  c
55 66  3
# three arguments
display(55, 66, 77)
 a  b  c
55 66 77
As this example shows, arguments are matched from left to right, and any that haven't been given a value explicitly get their default value. We can override this behavior by naming the value as we pass it in:
# only setting the value of c
display(c = 77)
 a  b  c
 1  2 77
With that in hand, let's look at the help for read.csv():
?read.csv
There's a lot of information there, but the most important part is the first couple of lines:
read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
This tells us that read.csv() has one argument, file, that doesn't have a default value, and six others that do. Now we understand why the following gives an error:
dat <- read.csv(FALSE, "data/inflammation-01.csv")
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  'file' must be a character string or connection
It fails because FALSE is assigned to file and the filename is assigned to the argument header.
Challenge - A function with default argument values
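To see exactly how R paired values with arguments in that failing call, you can use base R's match.call(). It matches a call against a function's definition without running it. In this short sketch the call is only matched and never evaluated, so no file is read.

```r
# quote() captures the call without running it; match.call() then applies
# R's argument-matching rules using read.csv's argument list.
bad_call <- match.call(utils::read.csv,
                       quote(read.csv(FALSE, "data/inflammation-01.csv")))
bad_call$file    # FALSE
bad_call$header  # "data/inflammation-01.csv"

# Naming the arguments fixes the pairing, whatever order they are given in.
good_call <- match.call(utils::read.csv,
                        quote(read.csv(header = FALSE, file = "data/inflammation-01.csv")))
good_call$file    # "data/inflammation-01.csv"
good_call$header  # FALSE
```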
- Rewrite the rescale function so that it scales a vector to lie between 0 and 1 by default, but will allow the caller to specify lower and upper bounds if they want. Compare your implementation to your neighbour's: do the two functions always behave the same way?
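Here is a sketch of one way to add the optional bounds, assuming the same linear mapping as before. The defaults lower = 0 and upper = 1 reproduce the original behaviour, and the argument names are our own choice. Your neighbour's version may differ, for example in how it handles a constant vector or bounds given in reverse order.

```r
# rescale: map v linearly onto [lower, upper] (0 to 1 by default).
# Example: rescale(c(2, 4, 6, 10), -1, 1) => c(-1, -0.5, 0, 1)
# Assumes v is numeric and not constant (otherwise H - L is 0).
rescale <- function(v, lower = 0, upper = 1) {
  L <- min(v)
  H <- max(v)
  result <- (v - L) / (H - L) * (upper - lower) + lower
  return(result)
}

rescale(c(2, 4, 6, 10))                          # 0.00 0.25 0.50 1.00
rescale(c(2, 4, 6, 10), lower = -1, upper = 1)   # -1.0 -0.5 0.0 1.0
```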
Source: http://monashbioinformaticsplatform.github.io/2015-09-28-rbioinformatics-intro-r/02-func-R.html