class: center, middle, inverse, title-slide

# Lec 02 - Logic and types in R
## Statistical Programming
### Sem 1, 2020
### Dr. Colin Rundel

---
exclude: true

---
class: middle
count: false

# In R (almost) <br/> everything is a vector

---

## Vectors

The fundamental building blocks of data in R are vectors (collections of related values, objects, data structures, functions, etc).

R has two types of vectors:

* **atomic** vectors (*vectors*) - homogeneous collections of the *same* type (e.g. all `TRUE`/`FALSE` values, all numbers, or all character strings).

* **generic** vectors (*lists*) - heterogeneous collections of *any* type of R object, even other lists <br/> (meaning they can have a hierarchical/tree-like structure).

---
class: middle
count: false

# Atomic Vectors

---

## Atomic Vectors

R has six atomic vector types; we can check the type of any object in R using the `typeof()` function.

`typeof()` | `mode()`
:-----------|:------------
logical | logical
double | numeric
integer | numeric
character | character
complex | complex
raw | raw

Mode is a higher level abstraction; we will discuss this in more detail later.
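
The `complex` and `raw` rows of the table are not demonstrated in the slides that follow. As a minimal sketch (the values `z` and `b` are illustrative assumptions, not from the lecture), they can be checked in the same way as the other types:

```r
# Illustrative values, not taken from the lecture
z = 1+2i         # a complex number
b = as.raw(255)  # a single byte

typeof(z)  # "complex"
mode(z)    # "complex"
typeof(b)  # "raw"
mode(b)    # "raw"
```

For these two types `typeof()` and `mode()` agree; they differ only for `double` and `integer`, which share the mode `numeric`.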
---

## Vector types

`logical` - boolean values `TRUE` and `FALSE`

.pull-left[
```r
typeof(TRUE)
```

```
## [1] "logical"
```
]

.pull-right[
```r
mode(TRUE)
```

```
## [1] "logical"
```
]

--

<br/>

`character` - text strings

<div>
.pull-left[
```r
typeof("hello")
```

```
## [1] "character"
```

```r
typeof('world')
```

```
## [1] "character"
```
]

.pull-right[
```r
mode("hello")
```

```
## [1] "character"
```

```r
mode('world')
```

```
## [1] "character"
```
]
</div>

---

`double` - floating point numerical values (default numerical type)

.pull-left[
```r
typeof(1.33)
```

```
## [1] "double"
```

```r
typeof(7)
```

```
## [1] "double"
```
]

.pull-right[
```r
mode(1.33)
```

```
## [1] "numeric"
```

```r
mode(7)
```

```
## [1] "numeric"
```
]

--

<br/>

`integer` - integer numerical values (indicated with an `L`)

<div>
.pull-left[
```r
typeof( 7L )
```

```
## [1] "integer"
```

```r
typeof( 1:3 )
```

```
## [1] "integer"
```
]

.pull-right[
```r
mode( 7L )
```

```
## [1] "numeric"
```

```r
mode( 1:3 )
```

```
## [1] "numeric"
```
]
</div>

---

## Concatenation

Atomic vectors can be constructed using the concatenate `c()` function.

```r
c(1, 2, 3)
```

```
## [1] 1 2 3
```

--

```r
c("Hello", "World!")
```

```
## [1] "Hello" "World!"
```

--

```r
c(1, 1:10)
```

```
## [1] 1 1 2 3 4 5 6 7 8 9 10
```

--

```r
c(1,c(2, c(3)))
```

```
## [1] 1 2 3
```

**Note** - atomic vectors are *always* flat.

---
class: split-thirds

## Inspecting types

* `typeof(x)` - returns a character vector (length 1) of the *type* of object `x`.

* `mode(x)` - returns a character vector (length 1) of the *mode* of object `x`.
.pull-left[
```r
typeof(1)
```

```
## [1] "double"
```

```r
typeof(1L)
```

```
## [1] "integer"
```

```r
typeof("A")
```

```
## [1] "character"
```

```r
typeof(TRUE)
```

```
## [1] "logical"
```
]

.pull-right[
```r
mode(1)
```

```
## [1] "numeric"
```

```r
mode(1L)
```

```
## [1] "numeric"
```

```r
mode("A")
```

```
## [1] "character"
```

```r
mode(TRUE)
```

```
## [1] "logical"
```
]

---

## Type Predicates

* `is.logical(x)` - returns `TRUE` if `x` has *type* `logical`.
* `is.character(x)` - returns `TRUE` if `x` has *type* `character`.
* `is.double(x)` - returns `TRUE` if `x` has *type* `double`.
* `is.integer(x)` - returns `TRUE` if `x` has *type* `integer`.
* `is.numeric(x)` - returns `TRUE` if `x` has *mode* `numeric`.

.col3_left[
```r
is.integer(1)
```

```
## [1] FALSE
```

```r
is.integer(1L)
```

```
## [1] TRUE
```

```r
is.integer(3:7)
```

```
## [1] TRUE
```
]

.col3_mid[
```r
is.double(1)
```

```
## [1] TRUE
```

```r
is.double(1L)
```

```
## [1] FALSE
```

```r
is.double(3:8)
```

```
## [1] FALSE
```
]

.col3_right[
```r
is.numeric(1)
```

```
## [1] TRUE
```

```r
is.numeric(1L)
```

```
## [1] TRUE
```

```r
is.numeric(3:7)
```

```
## [1] TRUE
```
]

---

## Other useful predicates

* `is.atomic(x)` - returns `TRUE` if `x` is an *atomic vector*.
* `is.list(x)` - returns `TRUE` if `x` is a *list*.
* `is.vector(x)` - returns `TRUE` if `x` is either an *atomic vector* or *list*.

.pull-left[
```r
is.atomic(c(1,2,3))
```

```
## [1] TRUE
```

```r
is.list(c(1,2,3))
```

```
## [1] FALSE
```

```r
is.vector(c(1,2,3))
```

```
## [1] TRUE
```
]

.pull-right[
```r
is.atomic(list(1,2,3))
```

```
## [1] FALSE
```

```r
is.list(list(1,2,3))
```

```
## [1] TRUE
```

```r
is.vector(list(1,2,3))
```

```
## [1] TRUE
```
]

---

## Type Coercion

R is a dynamically typed language -- it will automatically convert between most types without raising warnings or errors. Keep in mind the rule that atomic vectors must always contain values of the same type.
```r
c(1, "Hello")
```

```
## [1] "1" "Hello"
```

--

.top-pad[]

```r
c(FALSE, 3L)
```

```
## [1] 0 3
```

--

.top-pad[]

```r
c(1.2, 3L)
```

```
## [1] 1.2 3.0
```

---

## Operator coercion

Operators and functions will also attempt to coerce values to an appropriate type for the given operation.

<div>
.pull-left[
```r
3.1+1L
```

```
## [1] 4.1
```

```r
5 + FALSE
```

```
## [1] 5
```
]

.pull-right[
```r
log(1)
```

```
## [1] 0
```

```r
log(TRUE)
```

```
## [1] 0
```
]
</div>

--

<br/><br/>

.pull-left[
```r
TRUE & FALSE
```

```
## [1] FALSE
```

```r
TRUE & 7
```

```
## [1] TRUE
```
]

.pull-right[
```r
TRUE | FALSE
```

```
## [1] TRUE
```

```r
FALSE | !5
```

```
## [1] FALSE
```
]

---

## Explicit Coercion

Most of the `is` functions we just saw have an `as` variant which can be used for *explicit* coercion.

.pull-left[
```r
as.logical(5.2)
```

```
## [1] TRUE
```

```r
as.character(TRUE)
```

```
## [1] "TRUE"
```

```r
as.integer(pi)
```

```
## [1] 3
```
]

.pull-right[
```r
as.numeric(FALSE)
```

```
## [1] 0
```

```r
as.double("7.2")
```

```
## [1] 7.2
```

```r
as.double("one")
```

```
## Warning: NAs introduced by coercion
```

```
## [1] NA
```
]

---
count: false
class: middle

# Conditionals

---

## Logical (boolean) operators

<br/><br/>

| Operator                      | Operation     | Vectorized?
|:-----------------------------:|:-------------:|:------------:
| <code>x &#124; y</code>       | or            | Yes
| `x & y`                       | and           | Yes
| `!x`                          | not           | Yes
| <code>x &#124;&#124; y</code> | or            | No
| `x && y`                      | and           | No
| `xor(x, y)`                   | exclusive or  | Yes

---

## Vectorized?

```r
x = c(TRUE,FALSE,TRUE)
y = c(FALSE,TRUE,TRUE)
```

.pad-top[]

.pull-left[
```r
x | y
```

```
## [1] TRUE TRUE TRUE
```

```r
x || y
```

```
## [1] TRUE
```
]

.pull-right[
```r
x & y
```

```
## [1] FALSE FALSE TRUE
```

```r
x && y
```

```
## [1] FALSE
```
]

.footnote[
**Note** both `||` and `&&` only use the *first* value in the vector; all other values are ignored, and there is no warning about the ignored values. This is the behavior of R before 4.3.0; from 4.3.0 onward, an operand of length greater than one is an error.
]

---

## Vectorization and math

Almost all of the basic mathematical operations (and many other functions) in R are vectorized.

.pull-left[
```r
c(1, 2, 3) + c(3, 2, 1)
```

```
## [1] 4 4 4
```

```r
c(1, 2, 3) / c(3, 2, 1)
```

```
## [1] 0.3333333 1.0000000 3.0000000
```
]

.pull-right[
```r
log(c(1, 3, 0))
```

```
## [1] 0.000000 1.098612 -Inf
```

```r
sin(c(1, 2, 3))
```

```
## [1] 0.8414710 0.9092974 0.1411200
```
]

---

## Length coercion

```r
x = c(TRUE, FALSE, TRUE)
y = c(TRUE)
z = c(FALSE, TRUE)
```

--

.pad-top[]

.pull-left[
```r
x | y
```

```
## [1] TRUE TRUE TRUE
```

```r
x & y
```

```
## [1] TRUE FALSE TRUE
```
]

.pull-right[
```r
y | z
```

```
## [1] TRUE TRUE
```

```r
y & z
```

```
## [1] FALSE TRUE
```
]

<div/>

--

<br/>

.pad-top[]

```r
x | z
```

```
## Warning in x | z: longer object length is not a multiple of shorter object
## length
```

```
## [1] TRUE TRUE TRUE
```

---

## Comparisons

Operator    | Comparison                 | Vectorized?
:----------:|:--------------------------:|:----------------:
`x < y`     | less than                  | Yes
`x > y`     | greater than               | Yes
`x <= y`    | less than or equal to      | Yes
`x >= y`    | greater than or equal to   | Yes
`x != y`    | not equal to               | Yes
`x == y`    | equal to                   | Yes
`x %in% y`  | contained in               | Yes (over `x`)

---

## Comparisons

```r
x = c("A","B","C")
z = c("A")
```

.pad-top[]

.pull-left[
```r
x == z
```

```
## [1] TRUE FALSE FALSE
```

```r
x != z
```

```
## [1] FALSE TRUE TRUE
```

```r
x > z
```

```
## [1] FALSE TRUE TRUE
```
]

--

.pull-right[
```r
x %in% z
```

```
## [1] TRUE FALSE FALSE
```

```r
z %in% x
```

```
## [1] TRUE
```
]

---

## Conditional Control Flow

Conditional execution of code blocks is achieved via `if` statements.

```r
x = c(1,3)
```

--

.pad-top[]

```r
if (3 %in% x)
  print("This!")
```

```
## [1] "This!"
```

--

.pad-top[]

```r
if (1 %in% x)
  print("That!")
```

```
## [1] "That!"
```

--

.pad-top[]

```r
if (5 %in% x)
  print("Other!")
```

---

## `if` is not vectorized

```r
x = c(1,3)
```

--

.pad-top[]

```r
if (x == 1)
  print("x is 1!")
```

```
## Warning in if (x == 1) print("x is 1!"): the condition has length > 1 and only
## the first element will be used
```

```
## [1] "x is 1!"
```

--

.pad-top[]

```r
if (x == 3)
  print("x is 3!")
```

```
## Warning in if (x == 3) print("x is 3!"): the condition has length > 1 and only
## the first element will be used
```

.footnote[
**Note** the warnings above are the behavior of R before 4.2.0; from 4.2.0 onward, a condition of length greater than one is an error.
]

---

## Collapsing logical vectors

There are a couple of helper functions for collapsing a logical vector down to a single value: `any` and `all`.

```r
x = c(3,4,1)
```

.top-pad[]

.pull-left[
```r
x >= 2
```

```
## [1] TRUE TRUE FALSE
```

```r
any(x >= 2)
```

```
## [1] TRUE
```

```r
all(x >= 2)
```

```
## [1] FALSE
```
]

.pull-right[
```r
x <= 4
```

```
## [1] TRUE TRUE TRUE
```

```r
any(x <= 4)
```

```
## [1] TRUE
```

```r
all(x <= 4)
```

```
## [1] TRUE
```
]

<div/>

--

<br/>

```r
if (any(x == 3))
  print("x contains 3!")
```

```
## [1] "x contains 3!"
```

---

## Nesting Conditionals

.pull-left[
```r
x = 3
if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}
```

```
## [1] "x is positive"
```
]

.pull-right[
```r
x = 0
if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}
```

```
## [1] "x is zero"
```
]

---
class: middle
count: false

# Error Checking

---

## `stop` and `stopifnot`

Often we want to validate user input or function arguments - if our assumptions are not met then we usually want to report the error and stop execution.

```r
ok = FALSE
if (!ok)
  stop("Things are not ok.")
```

```
## Error in eval(expr, envir, enclos): Things are not ok.
```

--

.pad-top[]

```r
stopifnot(ok)
```

```
## Error: ok is not TRUE
```

--

.pad-top[]

```r
stopifnot(is.logical(ok))
```

--

.pad-top[]

```r
stopifnot(is.logical(ok+0))
```

```
## Error: is.logical(ok + 0) is not TRUE
```

---

## Style choices

Simple is usually better than complicated - generally it is better to have fewer clauses and to put the more important conditions first (e.g. failure conditions).

.pull-left[
Do stuff (ok):
```r
if (condition_one) {
  ##
  ## Do stuff
  ##
} else if (condition_two) {
  ##
  ## Do other stuff
  ##
} else if (condition_error) {
  stop("Condition error occurred")
}
```
]

.pull-right[
Do stuff (better):
```r
# Do stuff better
if (condition_error) {
  stop("Condition error occurred")
}

if (condition_one) {
  ##
  ## Do stuff
  ##
} else if (condition_two) {
  ##
  ## Do other stuff
  ##
}
```
]

---
class: middle, center

# Missing Values

---

## Missing Values

R uses `NA` to represent missing values in its data structures; what may not be obvious is that there are different `NA`s for the different types.

.pull-left[
```r
typeof(NA)
```

```
## [1] "logical"
```

```r
typeof(NA+1)
```

```
## [1] "double"
```

```r
typeof(NA+1L)
```

```
## [1] "integer"
```
]

.pull-right[
```r
typeof(NA_character_)
```

```
## [1] "character"
```

```r
typeof(NA_real_)
```

```
## [1] "double"
```

```r
typeof(NA_integer_)
```

```
## [1] "integer"
```
]

---

## NA contagion

Because `NA`s represent missing values it makes sense that any calculation using them should also be missing.

.pull-left[
```r
1 + NA
```

```
## [1] NA
```

```r
1 / NA
```

```
## [1] NA
```

```r
NA * 5
```

```
## [1] NA
```
]

.pull-right[
```r
mean(c(1, 2, 3, NA))
```

```
## [1] NA
```

```r
sqrt(NA)
```

```
## [1] NA
```

```r
3^NA
```

```
## [1] NA
```
]

---

## NAs are not always contagious

A useful mental model for `NA`s is to consider them as an unknown value that could take any of the possible values for that type.
For numbers or characters this isn't very helpful, but for a logical value we know that the value must either be `TRUE` or `FALSE` and we can use that when deciding what value to return.

--

```r
TRUE & NA
```

```
## [1] NA
```

--

```r
FALSE & NA
```

```
## [1] FALSE
```

--

```r
TRUE | NA
```

```
## [1] TRUE
```

--

```r
FALSE | NA
```

```
## [1] NA
```

---

## Conditionals and missing values

`NA`s can be problematic in some cases (particularly for control flow).

```r
1 == NA
```

```
## [1] NA
```

--

.pad-top[]

```r
if (2 != NA)
  "Here"
```

```
## Error in if (2 != NA) "Here": missing value where TRUE/FALSE needed
```

--

.pad-top[]

```r
if (all(c(1,2,NA,4) >= 1))
  "There"
```

```
## Error in if (all(c(1, 2, NA, 4) >= 1)) "There": missing value where TRUE/FALSE needed
```

--

.pad-top[]

```r
if (any(c(1,2,NA,4) >= 1))
  "There"
```

```
## [1] "There"
```

---

## Testing for `NA`

To explicitly test if a value is missing it is necessary to use `is.na` (often along with `any` or `all`).

.pull-left[
```r
NA == NA
```

```
## [1] NA
```

```r
is.na(NA)
```

```
## [1] TRUE
```

```r
is.na(1)
```

```
## [1] FALSE
```
]

.pull-right[
```r
is.na(c(1,2,3,NA))
```

```
## [1] FALSE FALSE FALSE TRUE
```

```r
any(is.na(c(1,2,3,NA)))
```

```
## [1] TRUE
```

```r
all(is.na(c(1,2,3,NA)))
```

```
## [1] FALSE
```
]

---

## Other Special values (double)

These are defined as part of the IEEE floating point standard (not unique to R).

* `NaN` - Not a number
* `Inf` - Positive infinity
* `-Inf` - Negative infinity

.pull-left[
```r
pi / 0
```

```
## [1] Inf
```

```r
0 / 0
```

```
## [1] NaN
```

```r
1/0 + 1/0
```

```
## [1] Inf
```
]

.pull-right[
```r
1/0 - 1/0
```

```
## [1] NaN
```

```r
NaN / NA
```

```
## [1] NaN
```

```r
NaN * NA
```

```
## [1] NaN
```
]

---

## Testing for `Inf` and `NaN`

`Inf` doesn't have the same testing issues that `NA`s do, but `NaN` does (`NaN == NaN` is `NA`), so there are convenience functions for testing for these types of values.

.pull-left[
```r
NA
```

```
## [1] NA
```

```r
1/0+1/0
```

```
## [1] Inf
```

```r
1/0-1/0
```

```
## [1] NaN
```

```r
1/0 == Inf
```

```
## [1] TRUE
```

```r
-1/0 == Inf
```

```
## [1] FALSE
```
]

--

.pull-right[
```r
is.finite(1/0+1/0)
```

```
## [1] FALSE
```

```r
is.finite(1/0-1/0)
```

```
## [1] FALSE
```

```r
is.nan(1/0-1/0)
```

```
## [1] TRUE
```

```r
is.finite(NA)
```

```
## [1] FALSE
```

```r
is.nan(NA)
```

```
## [1] FALSE
```
]

---

## Coercion for infinity and NaN

First remember that `Inf`, `-Inf`, and `NaN` have type double; however, their coercion behavior is not the same as for other doubles.

```r
as.integer(Inf)
```

```
## Warning: NAs introduced by coercion to integer range
```

```
## [1] NA
```

```r
as.integer(NaN)
```

```
## [1] NA
```

.top-pad[]

.pull-left[
```r
as.logical(Inf)
```

```
## [1] TRUE
```

```r
as.logical(NaN)
```

```
## [1] NA
```
]

.pull-right[
```r
as.character(Inf)
```

```
## [1] "Inf"
```

```r
as.character(NaN)
```

```
## [1] "NaN"
```
]

---

## Exercise 1

**Part 1**

What is the type of the following vectors? Explain why they have that type.

* `c(1, NA+1L, "C")`
* `c(1L / 0, NA)`
* `c(1:3, 5)`
* `c(3L, NaN+1L)`
* `c(NA, TRUE)`

**Part 2**

Considering only the four (common) data types, what is R's implicit type conversion hierarchy (from highest priority to lowest priority)?

*Hint* - think about the pairwise interactions between types.

---
class: middle
count: false

# Loops

---

## for loops

Simplest, and most common type of loop in R - given a vector, iterate through the elements and evaluate the code block for each.

```r
res = c()
for(x in 1:10) {
  res = c(res, x^2)
}
res
```

```
## [1] 1 4 9 16 25 36 49 64 81 100
```

--

.pad-top[]

```r
res = c()
for(y in list(1:3, LETTERS[1:7], c(TRUE,FALSE))) {
  res = c(res, length(y))
}
res
```

```
## [1] 3 7 2
```

---

## `while` loops

Repeat until the given condition is **not** met (i.e.
evaluates to `FALSE`)

```r
i = 1
res = rep(NA,10)
while (i <= 10) {
  res[i] = i^2
  i = i+1
}
res
```

```
## [1] 1 4 9 16 25 36 49 64 81 100
```

---

## `repeat` loops

Repeat the loop until a `break` is encountered

```r
i = 1
res = rep(NA,10)
repeat {
  res[i] = i^2
  i = i+1
  if (i > 10)
    break
}
res
```

```
## [1] 1 4 9 16 25 36 49 64 81 100
```

---
class: split-50

## Special keywords - `break` and `next`

These are special actions that only work *inside* of a loop

* `break` - ends the current **loop** (inner-most)
* `next` - ends the current **iteration**

.pull-left[
```r
res = c()
for(i in 1:10) {
  if (i %% 2 == 0)
    break
  res = c(res, i)
  print(res)
}
```

```
## [1] 1
```
]

.pull-right[
```r
res = c()
for(i in 1:10) {
  if (i %% 2 == 0)
    next
  res = c(res,i)
  print(res)
}
```

```
## [1] 1
## [1] 1 3
## [1] 1 3 5
## [1] 1 3 5 7
## [1] 1 3 5 7 9
```
]

---

## Some helpful functions

Often we want to use a loop across the indexes of an object and not the elements themselves. There are several useful functions to help you do this: `:`, `length`, `seq`, `seq_along`, `seq_len`, etc.

.pull-left[
```r
4:7
```

```
## [1] 4 5 6 7
```

```r
length(4:7)
```

```
## [1] 4
```

```r
seq(4,7)
```

```
## [1] 4 5 6 7
```
]

.pull-right[
```r
seq_along(4:7)
```

```
## [1] 1 2 3 4
```

```r
seq_len(length(4:7))
```

```
## [1] 1 2 3 4
```

```r
seq(4,7,by=2)
```

```
## [1] 4 6
```
]

---

## Exercise 2

Below is a vector containing all prime numbers between 2 and 100:

.center[
```r
primes = c( 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
           43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)
```
]

If you were given the vector `x = c(3,4,12,19,23,51,61,63,78)`, write the R code necessary to print only the values of `x` that are *not* prime (without using subsetting or the `%in%` operator).

Your code should use *nested* loops to iterate through the vector of primes and `x`.
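
---

## Looping over indexes

As a minimal sketch of the index-based looping mentioned under *Some helpful functions* (the vector `x` and its values are illustrative assumptions, not from the lecture), `seq_along` supplies the indexes to loop over, and the result can be preallocated as in the `while` example:

```r
# Illustrative vector, not taken from the lecture
x = c(10, 20, 30)

# Preallocate the result, then fill it in one index at a time
res = rep(NA, length(x))
for (i in seq_along(x)) {
  res[i] = i * x[i]
}
res  # 10 40 90

# With an empty vector the two ways of building indexes differ
empty = c()
seq_along(empty)  # integer(0), so a loop over it never runs
1:length(empty)   # 1 0, so a loop over it would run twice
```

`seq_along(x)` is generally a safer choice than `1:length(x)` because `1:0` counts down rather than producing an empty sequence.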
---
count: false

# Acknowledgments

Above materials are derived in part from the following sources:

* Hadley Wickham - [Advanced R](http://adv-r.had.co.nz/)
* [R Language Definition](http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html)