useR to programmeR

Iteration 2

Emma Rand and Ian Lyttle

Learning objectives

This session is (mostly) about functional programming:

Aside: managing file paths within your project
Example: read a bunch of files, then put them in a single data frame
Fundamental paradigms in {purrr}:
- map(), keep(), and reduce()
Adverbs to handle failure
More generally, using functions as arguments to functions 🤯

For coding, we will use r-programming-exercises:

Open R/iteration-02-01-reading-files.R.
Restart R.

Aside: {here} package

For me, here::here() is a truly magical function:

useful in scripts: .R files (like today!)
useful in documents: .Rmd and .qmd files

If you need to:

refer to a file, and
it’s in a fixed place within your project

here() can make your life much simpler!

Here: Example











👋

/Users/ijlyttle/repos/r-programming-exercises/ 
|-- r-programming-exercises.Rproj 
|-- README.md
|-- LICENCE.md
|-- data/
    |-- gapminder/
        |-- 1952.xlsx
        |-- ...
    |-- ...
|-- R/ 
    |-- iteration-02-01-reading-files.R 
    |-- ...

Within iteration-02-01-reading-files.R:

here("data/gapminder/1952.xlsx")

Works just as well for .Rmd, .qmd files.

Here: Searches










🔎

/Users/ijlyttle/repos/r-programming-exercises/ 
|-- r-programming-exercises.Rproj 
|-- README.md
|-- LICENCE.md
|-- data/
    |-- gapminder/
        |-- 1952.xlsx
        |-- ...
    |-- ...
|-- R/ 
    |-- iteration-02-01-reading-files.R 
    |-- ...

Looks in directory for an .Rproj file (simplified)
Doesn’t find one

Here: Moves up and searches

🔎

/Users/ijlyttle/repos/r-programming-exercises/ 
|-- r-programming-exercises.Rproj 
|-- README.md
|-- LICENCE.md
|-- data/
    |-- gapminder/
        |-- 1952.xlsx
        |-- ...
    |-- ...
|-- R/ 
    |-- iteration-02-01-reading-files.R 
    |-- ...

Moves up one directory
Looks again

Here: Finds `.Rproj`

✅

/Users/ijlyttle/repos/r-programming-exercises/ 
|-- r-programming-exercises.Rproj 
|-- README.md
|-- LICENCE.md
|-- data/
    |-- gapminder/
        |-- 1952.xlsx
        |-- ...
    |-- ...
|-- R/ 
    |-- iteration-02-01-reading-files.R 
    |-- ...

Here: Flags project-root

🚩

/Users/ijlyttle/repos/r-programming-exercises/ 
|-- r-programming-exercises.Rproj 
|-- README.md
|-- LICENCE.md
|-- data/
    |-- gapminder/
        |-- 1952.xlsx
        |-- ...
    |-- ...
|-- R/ 
    |-- iteration-02-01-reading-files.R 
    |-- ...

/Users/ijlyttle/repos/r-programming-exercises/

Here: Returns full path

🚩





🎯

/Users/ijlyttle/repos/r-programming-exercises/ 
|-- r-programming-exercises.Rproj 
|-- README.md
|-- LICENCE.md
|-- data/
    |-- gapminder/
        |-- 1952.xlsx
        |-- ...
    |-- ...
|-- R/ 
    |-- iteration-02-01-reading-files.R 
    |-- ...

here("data/gapminder/1952.xlsx")

/Users/ijlyttle/repos/r-programming-exercises/data/gapminder/1952.xlsx

here() returns a string that represents a path.

It makes no guarantee that the path exists.
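
Because here() only builds the string, any existence check is up to you. A minimal sketch, assuming the project layout shown above (the file may or may not exist on your machine):

```r
library(here)

# build a path relative to the project root; nothing is read or checked here
path <- here("data", "gapminder", "1952.xlsx")

# check for the file yourself before relying on it
if (file.exists(path)) {
  message("found: ", path)
} else {
  message("no file at: ", path)
}
```

Passing the path components as separate arguments is equivalent to here("data/gapminder/1952.xlsx").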

Here: Epilogue

here() works especially well if you need to rearrange your source (e.g. .R) files.

However, if you move target files (e.g. .xlsx files), you need to modify your calls to here().

The here way:

read_excel(here("data/gapminder/1952.xlsx"))

🧐 Where here() can help

read_excel("../data/gapminder/1952.xlsx")

🔥 Meme Alert

Do not do this:

setwd("/Users/ijlyttle/repos/r-programming-exercises/data/gapminder")

read_excel("1952.xlsx")

Reading multiple files

Iteration functions in {purrr} can help with repetitive tasks.

Example

Read Excel files from a directory, then combine into a single data-frame.

Our turn: Reading data manually

Here’s our starting code:

data1952 <- read_excel(here("data/gapminder/1952.xlsx"))
data1957 <- read_excel(here("data/gapminder/1957.xlsx"))
data1962 <- read_excel(here("data/gapminder/1952.xlsx"))
data1967 <- read_excel(here("data/gapminder/1967.xlsx"))

data_manual <- bind_rows(data1952, data1957, data1962, data1967)

What problems do you see?

(I see two real problems, and one philosophical problem)

Run this example code, discuss with your neighbor.

Our turn: Make list of paths

I see this as a two-step problem:

make a named list of paths, where each name is the year
use the list of paths to read data frames, then combine them

Let’s work together to improve this code to get paths:

paths <-
  # get the filepaths from the directory
  fs::dir_ls(here("data/gapminder")) |>
  # convert to list
  # extract the year as names
  print()

Our turn: Read data

Let’s work together to improve this code to read data:

data <-
  paths |>
  # read each file from excel, into data frame
  # keep only non-null elements
  # set list-names as column `year`
  # bind into single data-frame
  # convert year to number
  print()
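
One way the two steps might fit together, as a hedged sketch rather than the workshop solution. Assumptions: small CSV files in a temporary directory stand in for the Excel files so the code runs anywhere, base read.csv() stands in for read_excel(), and purrr 1.0.0 or later is installed (for list_rbind()).

```r
library(purrr)

# stand-in data: two tiny CSV files named by year (an assumption for this sketch)
dir <- file.path(tempdir(), "gapminder-demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(country = "A", pop = 1), file.path(dir, "1952.csv"), row.names = FALSE)
write.csv(data.frame(country = "A", pop = 2), file.path(dir, "1957.csv"), row.names = FALSE)

# step 1: named list of paths, name is the year taken from the file name
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
years <- tools::file_path_sans_ext(basename(files))
paths <- set_names(as.list(files), years)

# step 2: read each file, drop NULLs, bind rows, keep the name as `year`
data <-
  paths |>
  map(\(path) read.csv(path)) |>
  keep(\(df) !is.null(df)) |>
  list_rbind(names_to = "year")

# the names were strings, so convert year to a number
data$year <- as.numeric(data$year)
data
```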

Fundamental paradigms

Functional programming has three fundamental paradigms; they act on lists or vectors:

map - do this to each element: purrr::map()
filter - like spaghetti, not coffee: purrr::keep()
reduce - combine into new thing: purrr::reduce()

Each of these takes a function as an argument, to tell the operator what to do.

For coding, we will use r-programming-exercises:

Open R/iteration-02-02-fundamental-paradigms.R.
Restart R.

Map: Intro

num <- 1:4
num |> map(\(x) x + 1)

map() takes:

list or atomic vector
function to apply to each member of the vector

Map: Intermediate result

num <- 1:4
num |> map(\(x) x + 1)

Input	Result
1	2
2
3
4

Map: Result

num <- 1:4
num |> map(\(x) x + 1)

Input	Result
1	2
2	3
3	4
4	5

Map: Atomic variants

map() always returns a list:

num <- 1:4
num |> map(\(x) x + 1)

[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4

[[4]]
[1] 5

Use an atomic variant to specify type:

num <- 1:4
num |> map_int(\(x) x + 1)

[1] 2 3 4 5
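
A short sketch of the other atomic variants; the variable names are only for illustration. Each variant returns an atomic vector of the type in its name.

```r
library(purrr)

num <- 1:4

halves <- num |> map_dbl(\(x) x / 2)          # double vector
labels <- num |> map_chr(\(x) paste0("n", x)) # character vector
flags  <- num |> map_lgl(\(x) x > 2)          # logical vector

halves
labels
flags
```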

Keep: Intro

num <- 1:4
num |> keep(\(x) x %% 2 == 0)

Outside {purrr}: known as filter(), but {dplyr} took this name first.

keep() takes:

list or vector
function that, when applied to each member, returns TRUE or FALSE
- this is called a predicate function

Keep: Intermediate result

num <- 1:4
num |> keep(\(x) x %% 2 == 0)

Input	Evaluation	Result
1	`FALSE`
2	`TRUE`	2
3
4

Keep: Result

num <- 1:4
num |> keep(\(x) x %% 2 == 0)

Input	Evaluation	Result
1	`FALSE`
2	`TRUE`	2
3	`FALSE`
4	`TRUE`	4

Reduce: Intro

num <- 1:4
num |> reduce(\(acc, x) acc + x)

reduce() takes:

a list or vector
a reducer function, which takes two arguments:
- the accumulated value
- the “next” value of the input

Reduce: First result

num <- 1:4
num |> reduce(\(acc, x) acc + x)

Input	Result
1	1
2
3
4

Reduce: Intermediate result

num <- 1:4
num |> reduce(\(acc, x) acc + x)

Input	Result
1
2	3
3
4

Reduce: Result

num <- 1:4
num |> reduce(\(acc, x) acc + x)

Input	Result
1
2
3
4	10
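
To see the accumulated value at every step, purrr::accumulate() takes the same kind of reducer function but keeps each intermediate result. A small sketch:

```r
library(purrr)

num <- 1:4

# accumulate() keeps each intermediate value of the accumulator
steps <- num |> accumulate(\(acc, x) acc + x)
steps

# reduce() returns only the final value
total <- num |> reduce(\(acc, x) acc + x)
total
```

The last element of steps is the value that reduce() returns.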

Reduce: Initialize

num <- 1:4
num |> reduce(\(acc, x) acc + x, .init = 1)

Input	Result
1
2
3
4	11

Reduce: Use existing functions

num <- 1:4
num |> reduce(sum, .init = 1)

Input	Result
1
2
3
4	11

Additional arguments

num <- c(1, 2, 3, NA, 4)
num |> reduce(sum)

[1] NA

The default behavior for sum() is not to remove NA values.

To change the behavior, use an anonymous function:

num |> reduce(\(acc, x) sum(acc, x, na.rm = TRUE))

[1] 10

No longer recommended

num |> reduce(sum, na.rm = TRUE)

Using an anonymous function will:

make it more explicit which argument goes to which function.
tend to yield better error messages.

Variants and adverbs

Some useful variants, can mix and match:

map_lgl() , map_int(), map_dbl(), map_chr()
walk(): like map(), but called for side-effect
imap(): use index or list-name as an additional argument
lmap(): like map(), but the function takes and returns a list
map2(), pmap(): apply over sets of inputs

Adverbs modify verbs (functions):

possibly(), quietly(), slowly(), insistently(), safely()
negate(): return the negative of a predicate
compose(): put two functions together
partial(): pre-fill some arguments of a function
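
A few of these in action, as a hedged sketch; the data and helper names (scores, is_odd, paste_dash) are made up for illustration.

```r
library(purrr)

scores <- c(a = 1, b = 2)

# imap(): the function receives the value and its name (or index)
labelled <- imap_chr(scores, \(value, name) paste0(name, "=", value))

# map2(): iterate over two inputs in parallel
products <- map2_dbl(1:3, 4:6, \(x, y) x * y)

# negate(): flip a predicate
is_odd <- \(x) x %% 2 == 1
evens <- keep(1:6, negate(is_odd))

# partial(): pre-fill an argument of an existing function
paste_dash <- partial(paste, sep = "-")
joined <- paste_dash("a", "b")
```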

Handling failures with adverbs

If we have a failure, we may not want to stop everything.

library("readr")
read_csv("not/a/file.csv")

Error: 'not/a/file.csv' does not exist in current working directory ('/home/runner/work/r-programming/r-programming').

For coding, we will use r-programming-exercises:

Open R/iteration-02-03-adverbs.R.
Restart R.

Function operators a.k.a. adverbs

Function operators:

take a function
return a modified function

library("purrr")

possibly_read_csv <- possibly(read_csv, otherwise = NULL, quiet = FALSE)

possibly_read_csv("not/a/file.csv")

Error: 'not/a/file.csv' does not exist in current working directory ('/home/runner/work/r-programming/r-programming').

NULL

possibly_read_csv(I("a, b\n 1, 2"), col_types = "dd")

# A tibble: 1 × 2
      a     b
  <dbl> <dbl>
1     1     2
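
To make "take a function, return a modified function" concrete, here is a simplified, hand-written stand-in. my_possibly() is a hypothetical name; this is not how purrr implements possibly(), only a sketch of the idea using tryCatch().

```r
library(purrr)

# hypothetical, simplified stand-in for purrr::possibly()
my_possibly <- function(f, otherwise = NULL) {
  force(f)
  force(otherwise)
  # return a new function that calls f, falling back to `otherwise` on error
  function(...) {
    tryCatch(f(...), error = function(e) otherwise)
  }
}

possibly_log10 <- my_possibly(log10, otherwise = NA_real_)

# the middle input makes log10() fail; the others succeed
inputs <- list(1, "a", 100)
results <- map_dbl(inputs, possibly_log10)
results
```

The failing element becomes NA instead of stopping the whole map_dbl() call.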

Our turn: Handle failure

In the r-programming-exercises repository:

look at data/gapminder_party/
try running your script using this directory

Create a new function:

possibly_read_excel <- possibly() # we do the rest

Use this function in your script.

Our turn: Re-implement `list_rbind()`

Re-implement list_rbind() using functional-programming techniques:

data_reimplemented <-
  paths_party |>
  map(possibly_read_excel) |>
  # keep(negate(is.null)) |>
  # imap(\(df, name) mutate(df, "year" := parse_number(name))) |>
  # reduce(rbind) |>
  print()

Let’s run this, uncommenting one line at a time.

keeps not-NULL values, purrr::keep()
maps name of element to data-column, purrr::imap()
reduces list to single data-frame, purrr::reduce()

Functions as arguments

We have seen functions as arguments in:

map(), keep(), reduce(): tells them what to do
adverbs, like possibly(): tells what behavior to modify

Using functions, themselves, as arguments takes a little getting used to.

Once you wrap your mind around it, it’s like seeing in more dimensions.

For coding, we will use r-programming-exercises:

Open R/iteration-02-04-functions-as-arguments.R.
Restart R.

Labelling scales

library("tidyverse")
library("palmerpenguins")
library("conflicted")
conflicts_prefer(palmerpenguins::penguins)

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()

What if we want lower-case names for the species?

Specify labels 🧐

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() + 
  scale_color_discrete(labels = c("adelie", "chinstrap", "gentoo"))

We can do it manually, but what if we get a dataset with more species?

Use a labelling function 😎

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() + 
  scale_color_discrete(labels = tolower) # tolower is a function

Look at ?discrete_scale: labels can take a function.
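
A labelling function is an ordinary function that takes the vector of breaks and returns a vector of labels. A small sketch outside of any plot; label_short is a hypothetical helper, and breaks imitates what the scale would pass in.

```r
# what the scale would hand to the labelling function
breaks <- c("Adelie", "Chinstrap", "Gentoo")

# an existing function works as-is
tolower(breaks)

# so does a function you write yourself (hypothetical helper)
label_short <- \(x) substr(tolower(x), 1, 3)
label_short(breaks)
```

Either function could be supplied as the labels argument of a discrete scale.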

Function factories

Function operators (adverbs) return modified functions 🤯

Function factories return functions “out of thin air” 🤯🤯

{scales}, used by {ggplot2}, is full of these function factories!
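
A hand-written function factory, to show the pattern. label_suffix() is a hypothetical name and is not part of {scales}; it only imitates the shape of such factories.

```r
# hypothetical factory: returns a labelling function that appends a suffix
label_suffix <- function(suffix) {
  force(suffix) # evaluate now, so the returned function remembers this value
  function(x) paste0(x, suffix)
}

# calling the factory gives us a function
label_mm <- label_suffix(" mm")

# calling that function gives us labels
label_mm(c(10, 20))
```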

Our turn: Labeller

## use scales:: notation, vs. library(), to help autocomplete
percent_labeller <- scales::label_percent(accuracy = 1)

# percent_labeller is a function
percent_labeller(c(0, 0.01, 0.1, 1))

[1] "0%"   "1%"   "10%"  "100%"

Play around with:

accuracy
values sent to percent_labeller
whatever else seems interesting to you

Your turn: Labeller

ggplot(penguins, aes(x = bill_length_mm, color = species)) +
  stat_ecdf()

Add scale_y_continuous() to this plot, to use a percentage-labeller.

Your turn: Labeller (solution)

ggplot(penguins, aes(x = bill_length_mm, color = species)) +
  stat_ecdf() +
  scale_y_continuous(labels = scales::label_percent(accuracy = 1))

To me, this is a cleaner solution than mutating the data from decimal to percent.

Declarative vs. Imperative Programming

Let’s say you wanted to double this:

original <- 1:4

Declarative

Focus on what:

double <- original |> map_dbl(\(x) 2 * x)
double

[1] 2 4 6 8

Imperative

Focus on how:

double <- numeric(length(original))
for (i in seq_along(original)) {
  double[i] <- original[i] * 2
}
double

[1] 2 4 6 8

Of course, base R has the ultimate declarative approach:

2 * original

[1] 2 4 6 8

ui.dev has a very accessible article on the two approaches.

If we have time

Three fundamental paradigms in functional programming

Given a list and a function:

filter, keep(): make a new list, subset of old list
map(): make a new list, operating on each element
reduce(): make a new “thing”

For coding, we will use r-programming-exercises:

Open R/iteration-02-05-dpurrr.R.
Restart R.

dplyr using purrr?

We can use purrr::keep(), purrr::map(), purrr::reduce() to “implement” dplyr's filter(), mutate(), and summarise().

I claim it’s possible, I don’t claim it’s a good idea.

Our turn: Simplified penguins

library("conflicted")
library("palmerpenguins")
library("dplyr")
library("purrr")

conflicts_prefer(palmerpenguins::penguins)

# simplify penguins (Sorry Allison!)
penguins_local <-
  penguins |>
  mutate(across(where(is.factor), as.character)) |> # use strings, not factors
  select(species, island, body_mass_g, sex) |>      # fewer columns
  print()

# A tibble: 344 × 4
   species island    body_mass_g sex   
   <chr>   <chr>           <int> <chr> 
 1 Adelie  Torgersen        3750 male  
 2 Adelie  Torgersen        3800 female
 3 Adelie  Torgersen        3250 female
 4 Adelie  Torgersen          NA <NA>  
 5 Adelie  Torgersen        3450 female
 6 Adelie  Torgersen        3650 male  
 7 Adelie  Torgersen        3625 female
 8 Adelie  Torgersen        4675 male  
 9 Adelie  Torgersen        3475 <NA>  
10 Adelie  Torgersen        4250 <NA>  
# ℹ 334 more rows

Tabular data: Two perspectives

column-based: named list of column vectors

{
  "species": ["Adelie", "Adelie", ...],
  "island": ["Torgersen", "Torgersen", ...],
  "body_mass_g": [3750, 3800, ...],
  "sex": ["male", "female", ...]
}

row-based: collection of rows, each a named list

[
  {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3750, "sex": "male"}, 
  {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3800, "sex": "female"}, 
  ...
]

Our turn: Helper functions

We have a couple of helper functions to convert to:

Data frames: column-based

#' @param .d unnamed list of named lists, i.e. transposed data frame
#'
#' @return tibble
dpurrr_to_tibble <- function(.d) {
  .d |>
    purrr::list_transpose() |>
    tibble::as_tibble()
}

Lists of lists: row-based

#' @param .data data frame or tibble
#'
#' @return unnamed list of named lists, i.e. transposed data frame
dpurrr_to_list <- function(.data) {
  .data |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE)
}

Our turn: Experiment

# experiment with helpers
penguins_local |>
  head(2) |>
  dpurrr_to_list() |>
  # dpurrr_to_tibble() |>
  str()

List of 2
 $ :List of 4
  ..$ species    : chr "Adelie"
  ..$ island     : chr "Torgersen"
  ..$ body_mass_g: int 3750
  ..$ sex        : chr "male"
 $ :List of 4
  ..$ species    : chr "Adelie"
  ..$ island     : chr "Torgersen"
  ..$ body_mass_g: int 3800
  ..$ sex        : chr "female"

Comment and change lines as you see fit.

Our turn: dpurrr filter (first element)

# filter is just purrr::keep()
penguins_local |>
  dpurrr_to_list() |>
  keep(\(d) d$sex == "female" && !is.na(d$sex)) |>
  dpurrr_to_tibble()

Predicate function acts on each “row”, d, which is a list:

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3750
 $ sex        : chr "male"

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3800
 $ sex        : chr "female"

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3250
 $ sex        : chr "female"

Our turn: dpurrr filter (result)

# filter is just purrr::keep()
penguins_local |>
  dpurrr_to_list() |>
  keep(\(d) d$sex == "female" && !is.na(d$sex)) |>
  dpurrr_to_tibble()

Re-assembled into a tibble:

# A tibble: 165 × 4
   species island    body_mass_g sex   
   <chr>   <chr>           <int> <chr> 
 1 Adelie  Torgersen        3800 female
 2 Adelie  Torgersen        3250 female
 3 Adelie  Torgersen        3450 female
 4 Adelie  Torgersen        3625 female
 5 Adelie  Torgersen        3200 female
 6 Adelie  Torgersen        3700 female
 7 Adelie  Torgersen        3450 female
 8 Adelie  Torgersen        3325 female
 9 Adelie  Biscoe           3400 female
10 Adelie  Biscoe           3800 female
# ℹ 155 more rows

Our turn: dpurrr mutate

#' @param .d unnamed list of named lists, i.e. transposed data frame
#' @param mapper function applied to each member of `.d`
#' 
#' @return unnamed list of named lists, i.e. transposed data frame
dpurrr_mutate <- function(.d, mapper) {
  # modifyList() used to keep current elements
  .d |> purrr::map(\(d) modifyList(d, mapper(d)))
}

This version of mutate operates on every “row”, modifying its list.

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_mutate(\(d) list(body_mass_kg = d$body_mass_g / 1000)) |>
  dpurrr_to_tibble() |>
  print()

Our turn: dpurrr mutate (start)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_mutate(\(d) list(body_mass_kg = d$body_mass_g / 1000)) |>
  dpurrr_to_tibble()

# A tibble: 344 × 4
   species island    body_mass_g sex   
   <chr>   <chr>           <int> <chr> 
 1 Adelie  Torgersen        3750 male  
 2 Adelie  Torgersen        3800 female
 3 Adelie  Torgersen        3250 female
 4 Adelie  Torgersen          NA <NA>  
 5 Adelie  Torgersen        3450 female
 6 Adelie  Torgersen        3650 male  
 7 Adelie  Torgersen        3625 female
 8 Adelie  Torgersen        4675 male  
 9 Adelie  Torgersen        3475 <NA>  
10 Adelie  Torgersen        4250 <NA>  
# ℹ 334 more rows

Our turn: dpurrr mutate (by row, before)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_mutate(\(d) list(body_mass_kg = d$body_mass_g / 1000)) |>
  dpurrr_to_tibble()

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3750
 $ sex        : chr "male"

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3800
 $ sex        : chr "female"

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3250
 $ sex        : chr "female"

Our turn: dpurrr mutate (by row, after)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_mutate(\(d) list(body_mass_kg = d$body_mass_g / 1000)) |>
  dpurrr_to_tibble()

List of 5
 $ species     : chr "Adelie"
 $ island      : chr "Torgersen"
 $ body_mass_g : int 3750
 $ sex         : chr "male"
 $ body_mass_kg: num 3.75

List of 5
 $ species     : chr "Adelie"
 $ island      : chr "Torgersen"
 $ body_mass_g : int 3800
 $ sex         : chr "female"
 $ body_mass_kg: num 3.8

List of 5
 $ species     : chr "Adelie"
 $ island      : chr "Torgersen"
 $ body_mass_g : int 3250
 $ sex         : chr "female"
 $ body_mass_kg: num 3.25

Our turn: dpurrr mutate (result)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_mutate(\(d) list(body_mass_kg = d$body_mass_g / 1000)) |>
  dpurrr_to_tibble()

# A tibble: 344 × 5
   species island    body_mass_g sex    body_mass_kg
   <chr>   <chr>           <int> <chr>         <dbl>
 1 Adelie  Torgersen        3750 male           3.75
 2 Adelie  Torgersen        3800 female         3.8 
 3 Adelie  Torgersen        3250 female         3.25
 4 Adelie  Torgersen          NA <NA>          NA   
 5 Adelie  Torgersen        3450 female         3.45
 6 Adelie  Torgersen        3650 male           3.65
 7 Adelie  Torgersen        3625 female         3.62
 8 Adelie  Torgersen        4675 male           4.68
 9 Adelie  Torgersen        3475 <NA>           3.48
10 Adelie  Torgersen        4250 <NA>           4.25
# ℹ 334 more rows

Our turn: dpurrr summarise

#' @param .d unnamed list of named lists, i.e. transposed data frame
#' @param reducer function applied to the accumulator and to each member of `.d`
#' @param .init initial value of accumulator, if empty: first element of `.d`
#' @param ... other arguments passed to `purrr::reduce()`
#'
#' @return unnamed list of named lists, i.e. transposed data frame
dpurrr_summarise <- function(.d, reducer, .init, ...) {
  # wrap result in a list, to return a transposed data frame
  .d |> purrr::reduce(reducer, .init = .init, ...) |> list()
}

Takes a transposed data frame, returns a transposed data frame with a single “row”.

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_summarise(
    \(acc, d) list(
      body_mass_g_min = min(acc$body_mass_g_min, d$body_mass_g, na.rm = TRUE),
      body_mass_g_max = max(acc$body_mass_g_max, d$body_mass_g, na.rm = TRUE)
    )
  ) |>
  dpurrr_to_tibble() |>
  print()

Our turn: dpurrr summarise (start)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_summarise(
    \(acc, d) list(
      body_mass_g_min = min(acc$body_mass_g_min, d$body_mass_g, na.rm = TRUE),
      body_mass_g_max = max(acc$body_mass_g_max, d$body_mass_g, na.rm = TRUE)
    )
  ) |>
  dpurrr_to_tibble() |>
  print()

# A tibble: 344 × 4
   species island    body_mass_g sex   
   <chr>   <chr>           <int> <chr> 
 1 Adelie  Torgersen        3750 male  
 2 Adelie  Torgersen        3800 female
 3 Adelie  Torgersen        3250 female
 4 Adelie  Torgersen          NA <NA>  
 5 Adelie  Torgersen        3450 female
 6 Adelie  Torgersen        3650 male  
 7 Adelie  Torgersen        3625 female
 8 Adelie  Torgersen        4675 male  
 9 Adelie  Torgersen        3475 <NA>  
10 Adelie  Torgersen        4250 <NA>  
# ℹ 334 more rows

Our turn: dpurrr summarise (by row, before)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_summarise(
    \(acc, d) list(
      body_mass_g_min = min(acc$body_mass_g_min, d$body_mass_g, na.rm = TRUE),
      body_mass_g_max = max(acc$body_mass_g_max, d$body_mass_g, na.rm = TRUE)
    )
  ) |>
  dpurrr_to_tibble() |>
  print()

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3750
 $ sex        : chr "male"

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3800
 $ sex        : chr "female"

List of 4
 $ species    : chr "Adelie"
 $ island     : chr "Torgersen"
 $ body_mass_g: int 3250
 $ sex        : chr "female"

Our turn: dpurrr summarise (by row, after)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_summarise(
    \(acc, d) list(
      body_mass_g_min = min(acc$body_mass_g_min, d$body_mass_g, na.rm = TRUE),
      body_mass_g_max = max(acc$body_mass_g_max, d$body_mass_g, na.rm = TRUE)
    )
  ) |>
  dpurrr_to_tibble() |>
  print()

List of 2
 $ body_mass_g_min: int 2700
 $ body_mass_g_max: int 6300

Our turn: dpurrr summarise (result)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_summarise(
    \(acc, d) list(
      body_mass_g_min = min(acc$body_mass_g_min, d$body_mass_g, na.rm = TRUE),
      body_mass_g_max = max(acc$body_mass_g_max, d$body_mass_g, na.rm = TRUE)
    )
  ) |>
  dpurrr_to_tibble() |>
  print()

# A tibble: 1 × 2
  body_mass_g_min body_mass_g_max
            <int>           <int>
1            2700            6300

Our turn: dpurrr summarise with grouping

We need two more functions, one to split and one to combine, plus a named function for our reducer:

#' @param .d unnamed list of named lists, i.e. transposed data frame
#' @param name string, name of variable on which to split
#'
#' @return named list of transposed data frames, names: values of split variable
dpurrr_split <- function(.d, name) {
  # uses purrr::map(), purrr::set_names(), purrr::keep()
}

#' @param .nd named list of transposed data frames
#' @param name string, name of variable to put into combined list
#'
#' @return transposed data frame
dpurrr_combine <- function(.nd, name) {
  # uses purrr::imap(), purrr::reduce()
}

body_mass_g_min_max <- function(acc, d) {
  list(
    body_mass_g_min = min(acc$body_mass_g_min, d$body_mass_g, na.rm = TRUE),
    body_mass_g_max = max(acc$body_mass_g_max, d$body_mass_g, na.rm = TRUE)
  )
}
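
The bodies of dpurrr_split() and dpurrr_combine() are left out above. One possible implementation, offered as a sketch and not as the workshop's solution; it assumes the split variable is a string in every row, and it is shown on a tiny hand-made list so it runs on its own.

```r
library(purrr)

# one possible implementation (an assumption, not the workshop's code)
dpurrr_split <- function(.d, name) {
  # distinct values of the split variable, in order of first appearance
  values <- .d |> map_chr(\(d) d[[name]]) |> unique()
  values |>
    set_names() |>
    map(\(value) keep(.d, \(d) identical(d[[name]], value)))
}

dpurrr_combine <- function(.nd, name) {
  .nd |>
    # add the group name to every "row" of each group
    imap(\(.d, value) map(.d, \(d) c(d, set_names(list(value), name)))) |>
    # concatenate the groups into one unnamed list of rows
    reduce(c)
}

rows <- list(
  list(species = "Adelie", body_mass_g = 3750L),
  list(species = "Gentoo", body_mass_g = 4500L),
  list(species = "Adelie", body_mass_g = 3800L)
)

groups <- dpurrr_split(rows, "species")

# stand-in summary: one "row" per group, holding the number of rows
summaries <- groups |> map(\(.d) list(list(n = length(.d))))

combined <- dpurrr_combine(summaries, "species")
str(combined)
```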

Our turn: dpurrr summarise with grouping (start)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_split("species") |>
  map(\(d) d |> dpurrr_summarise(body_mass_g_min_max)) |>
  dpurrr_combine("species") |>
  dpurrr_to_tibble() |>
  print()

# A tibble: 344 × 4
   species island    body_mass_g sex   
   <chr>   <chr>           <int> <chr> 
 1 Adelie  Torgersen        3750 male  
 2 Adelie  Torgersen        3800 female
 3 Adelie  Torgersen        3250 female
 4 Adelie  Torgersen          NA <NA>  
 5 Adelie  Torgersen        3450 female
 6 Adelie  Torgersen        3650 male  
 7 Adelie  Torgersen        3625 female
 8 Adelie  Torgersen        4675 male  
 9 Adelie  Torgersen        3475 <NA>  
10 Adelie  Torgersen        4250 <NA>  
# ℹ 334 more rows

Our turn: dpurrr summarise with grouping (by row)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_split("species") |>
  map(\(d) d |> dpurrr_summarise(body_mass_g_min_max)) |>
  dpurrr_combine("species") |>
  dpurrr_to_tibble() |>
  print()

  List of 4
   $ species    : chr "Adelie"
   $ island     : chr "Torgersen"
   $ body_mass_g: int 3750
   $ sex        : chr "male"

  List of 4
   $ species    : chr "Adelie"
   $ island     : chr "Torgersen"
   $ body_mass_g: int 3800
   $ sex        : chr "female"

  List of 4
   $ species    : chr "Gentoo"
   $ island     : chr "Biscoe"
   $ body_mass_g: int 4500
   $ sex        : chr "female"

  List of 4
   $ species    : chr "Gentoo"
   $ island     : chr "Biscoe"
   $ body_mass_g: int 5700
   $ sex        : chr "male"

  List of 4
   $ species    : chr "Chinstrap"
   $ island     : chr "Dream"
   $ body_mass_g: int 3500
   $ sex        : chr "female"

  List of 4
   $ species    : chr "Chinstrap"
   $ island     : chr "Dream"
   $ body_mass_g: int 3900
   $ sex        : chr "male"

Our turn: dpurrr summarise with grouping (split)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_split("species") |>
  map(\(d) d |> dpurrr_summarise(body_mass_g_min_max)) |>
  dpurrr_combine("species") |>
  dpurrr_to_tibble() |>
  print()

$Adelie

  List of 4
   $ species    : chr "Adelie"
   $ island     : chr "Torgersen"
   $ body_mass_g: int 3750
   $ sex        : chr "male"

  List of 4
   $ species    : chr "Adelie"
   $ island     : chr "Torgersen"
   $ body_mass_g: int 3800
   $ sex        : chr "female"

$Gentoo

  List of 4
   $ species    : chr "Gentoo"
   $ island     : chr "Biscoe"
   $ body_mass_g: int 4500
   $ sex        : chr "female"

  List of 4
   $ species    : chr "Gentoo"
   $ island     : chr "Biscoe"
   $ body_mass_g: int 5700
   $ sex        : chr "male"

$Chinstrap

  List of 4
   $ species    : chr "Chinstrap"
   $ island     : chr "Dream"
   $ body_mass_g: int 3500
   $ sex        : chr "female"

  List of 4
   $ species    : chr "Chinstrap"
   $ island     : chr "Dream"
   $ body_mass_g: int 3900
   $ sex        : chr "male"

Our turn: dpurrr summarise with grouping (summarise)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_split("species") |>
  map(\(d) d |> dpurrr_summarise(body_mass_g_min_max)) |>
  dpurrr_combine("species") |>
  dpurrr_to_tibble() |>
  print()

$Adelie

  List of 2
   $ body_mass_g_min: int 2850
   $ body_mass_g_max: int 4775

$Gentoo

  List of 2
   $ body_mass_g_min: int 3950
   $ body_mass_g_max: int 6300

$Chinstrap

  List of 2
   $ body_mass_g_min: int 2700
   $ body_mass_g_max: int 4800

Our turn: dpurrr summarise with grouping (combine 🌗)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_split("species") |>
  map(\(d) d |> dpurrr_summarise(body_mass_g_min_max)) |>
  dpurrr_combine("species") |>
  dpurrr_to_tibble() |>
  print()

$Adelie

  List of 3
   $ body_mass_g_min: int 2850
   $ body_mass_g_max: int 4775
   $ species        : chr "Adelie"

$Gentoo

  List of 3
   $ body_mass_g_min: int 3950
   $ body_mass_g_max: int 6300
   $ species        : chr "Gentoo"

$Chinstrap

  List of 3
   $ body_mass_g_min: int 2700
   $ body_mass_g_max: int 4800
   $ species        : chr "Chinstrap"

Our turn: dpurrr summarise with grouping (combine 🌕)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_split("species") |>
  map(\(d) d |> dpurrr_summarise(body_mass_g_min_max)) |>
  dpurrr_combine("species") |>
  dpurrr_to_tibble() |>
  print()

  List of 3
   $ body_mass_g_min: int 2850
   $ body_mass_g_max: int 4775
   $ species        : chr "Adelie"

  List of 3
   $ body_mass_g_min: int 3950
   $ body_mass_g_max: int 6300
   $ species        : chr "Gentoo"

  List of 3
   $ body_mass_g_min: int 2700
   $ body_mass_g_max: int 4800
   $ species        : chr "Chinstrap"

Our turn: dpurrr summarise with grouping (result)

penguins_local |>
  dpurrr_to_list() |>
  dpurrr_split("species") |>
  map(\(d) d |> dpurrr_summarise(body_mass_g_min_max)) |>
  dpurrr_combine("species") |>
  dpurrr_to_tibble() |>
  print()

# A tibble: 3 × 3
  body_mass_g_min body_mass_g_max species  
            <int>           <int> <chr>    
1            2850            4775 Adelie   
2            3950            6300 Gentoo   
3            2700            4800 Chinstrap

Our turn: Finally…

We can agree this presents no danger to dplyr.

In JavaScript, data frames are often arrays of objects (lists); you can use tools like tidyjs.

Summary

{here} can help you manage file paths within projects.
Functional programming has three fundamental paradigms:
- filter (purrr::keep()), map, reduce
{purrr} offers variants and adverbs.
Adverbs can help you handle failure.
Functions can be used as arguments to functions.
Functions can be returned by functions.
Another view of data frames (if we had time).

Wrap-up

Please go to pos.it/conf-workshop-survey.

Your feedback is crucial!

Data from the survey informs curriculum and format decisions for future conf workshops, and we really appreciate you taking the time to provide it.

Thank you!

Emma
Garrett
Mine Çetinkaya-Rundel, Posit
You 🤗
