
What is the opposite of gather in R?

Published in Data Reshaping 2 mins read

The direct opposite of gather() in R, specifically within the tidyr package, is the spread() function.

gather() and spread() are complementary functions designed for reshaping data between "wide" and "long" formats, which are fundamental operations in data tidying.

Understanding Data Reshaping with gather() and spread()

Data often comes in various formats, and for analysis or visualization, it frequently needs to be transformed. The tidyr package in R provides powerful tools for this, with gather() and spread() (and their modern successors pivot_longer() and pivot_wider()) being key players.

What gather() Does

The gather() function is used to convert data from a wide format to a long format. It takes multiple columns that represent different measurements or variables and "gathers" them into just two new columns:

  • One column stores the original column names (often called the "key" or "name" column).
  • Another column stores the values from those original columns (often called the "value" column).

This process increases the number of rows and decreases the number of columns, making it easier to perform analyses where values for a specific category are needed in a single column.
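The row and column trade-off can be seen on a tiny made-up table. This is a minimal sketch; the data frame and the names wide, long, id, a, b, and c are illustrative assumptions, not part of any real dataset.

```r
library(tidyr)

# Hypothetical wide table: one row per id, one column per measurement
wide <- data.frame(
  id = c(1, 2),
  a  = c(10, 20),
  b  = c(30, 40),
  c  = c(50, 60)
)

# Gather columns a, b, and c into one key column and one value column
long <- gather(wide, key = "key", value = "value", a, b, c)

dim(wide)  # 2 rows, 4 columns
dim(long)  # 6 rows, 3 columns
```

Each of the 2 original rows contributes one row per gathered column, so 2 rows and 3 gathered columns give 6 rows.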

What spread() Does (The Opposite of gather())

Conversely, the spread() function performs the reverse operation of gather(). It converts data from a long format back into a wide format. To do this, spread() takes two existing columns:

  1. A key column: This column contains the unique categories or names that will become the new column headers in the wide format.
  2. A value column: This column contains the data points that will populate the cells under these new column headers.

By using these two columns, spread() effectively expands rows into multiple new columns, decreasing the number of rows and increasing the number of columns.
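The reverse trade-off can be sketched the same way. The table below is made up for illustration, and the names long, wide, id, key, and value are assumptions chosen for this example.

```r
library(tidyr)

# Hypothetical long table: one row per (id, key) pair
long <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  key   = c("a", "b", "c", "a", "b", "c"),
  value = c(10, 30, 50, 20, 40, 60)
)

# Each distinct value of `key` becomes a column, filled from `value`
wide <- spread(long, key = key, value = value)

dim(long)  # 6 rows, 3 columns
dim(wide)  # 2 rows, 4 columns
```

Note that spread() needs each combination of the remaining columns and the key to identify a single row; duplicated combinations cause an error.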

Example: Reshaping Data in R

Let's illustrate how gather() and spread() work together with a simple dataset. First, ensure you have the tidyr package installed and loaded:

# install.packages("tidyr")
library(tidyr)
library(dplyr) # Optional here: tidyr already re-exports the %>% pipe used below

1. Starting with Wide Data

Imagine we have data showing student scores for different subjects in a wide format:

# Create a sample wide dataset
wide_data <- data.frame(
  Student = c("Alice", "Bob"),
  Math = c(90, 85),
  Science = c(95, 80),
  History = c(88, 92)
)

print(wide_data)

Output:

  Student Math Science History
1   Alice   90      95      88
2     Bob   85      80      92

2. Using gather() to Make Data Long

Now, let's "gather" the Math, Science, and History columns into a long format.

long_data <- wide_data %>%
  gather(key = "Subject", value = "Score", Math, Science, History)

print(long_data)

Output:

  Student Subject Score
1   Alice    Math    90
2     Bob    Math    85
3   Alice Science    95
4     Bob Science    80
5   Alice History    88
6     Bob History    92

Notice how the Math, Science, and History column names are now values in the Subject column, and their corresponding scores are in the Score column.

3. Using spread() to Return to Wide Format

To demonstrate that spread() is the opposite, we can take long_data and "spread" it back into wide format, using Subject as the key and Score as the value.

re_widened_data <- long_data %>%
  spread(key = "Subject", value = "Score")

print(re_widened_data)

Output:

  Student History Math Science
1   Alice      88   90      95
2     Bob      92   85      80

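As you can see, re_widened_data holds the same values as wide_data. Only the column order differs: spread() arranges the new columns by the sorted values of the key, so History now comes before Math and Science. This confirms that spread() reverses the operation of gather().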
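One way to check the round trip programmatically is to put the columns back in their original order and compare the two data frames. This is a sketch that rebuilds the example data so it runs on its own; round_trip is a name introduced here for illustration.

```r
library(tidyr)

# Same sample data as in the example above
wide_data <- data.frame(
  Student = c("Alice", "Bob"),
  Math = c(90, 85),
  Science = c(95, 80),
  History = c(88, 92)
)

long_data <- gather(wide_data, key = "Subject", value = "Score",
                    Math, Science, History)
re_widened_data <- spread(long_data, key = "Subject", value = "Score")

# Reorder the columns to match wide_data before comparing
round_trip <- re_widened_data[, names(wide_data)]

# all.equal() compares the two data frames, tolerating tiny numeric differences
print(isTRUE(all.equal(wide_data, round_trip)))
```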

Evolution to pivot_longer() and pivot_wider()

It's important to note that gather() and spread() have been superseded since tidyr 1.0.0. They still work and remain in the package, but they are no longer under active development, and tidyr recommends the newer functions for new code:

  • pivot_longer() is the successor to gather().
  • pivot_wider() is the successor to spread().

These new functions offer enhanced capabilities and a more consistent syntax for complex reshaping tasks. However, the underlying concepts remain the same: pivot_wider() is the opposite of pivot_longer(), just as spread() is the opposite of gather().
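As a rough sketch, the same student-scores round trip can be written with the newer functions (this assumes tidyr 1.0.0 or later; the data is rebuilt here so the block runs on its own):

```r
library(tidyr)  # pivot_longer() and pivot_wider() need tidyr >= 1.0.0

wide_data <- data.frame(
  Student = c("Alice", "Bob"),
  Math = c(90, 85),
  Science = c(95, 80),
  History = c(88, 92)
)

# Counterpart of gather(): column names go to Subject, cell values to Score
long_data <- pivot_longer(
  wide_data,
  cols = c(Math, Science, History),
  names_to = "Subject",
  values_to = "Score"
)

# Counterpart of spread(): the values of Subject become columns again
re_widened_data <- pivot_wider(
  long_data,
  names_from = Subject,
  values_from = Score
)

print(long_data)
print(re_widened_data)
```

Two differences from the older functions are worth knowing: the pivot functions return tibbles, and pivot_wider() by default keeps the new columns in order of first appearance (Math, Science, History here) instead of sorting them.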

For more details on tidyr functions, you can refer to the official tidyr package documentation.