Merge Data Frames in R

Merging data frames is useful when related data is stored in different places. Combining data frames in R brings that data into a single data set for analysis. We can combine data frames in R with the merge() and rbind() functions.

merge() is a built-in R function that merges data frames by one or more common column names. It merges the two data frames horizontally.

The rbind() function in R combines data frames vertically. Both data frames must have the same variables. If a variable from one data frame is not available in the second data frame, either add it to the second data frame and set it to NA (missing), or remove it from the first data frame.

In this tutorial, we will discuss how to merge data frames in R using the merge() function and combine two data frames using the rbind() function.

Merge Data Frames in R using merge()

Using the built-in merge() function, we can combine two data frames by a common key variable.

Let’s consider an example to merge two data frames in R.

# Create a data frame
student_info <- data.frame(
  id = c(1,2,3,4,5,6),
  name = c("Tom","Kim","Sam","Julie","Emily","Chris"),
  age = c(20,21,19,20,21,22),
  gender = c('M','F','M','F','F','M'),
  marks = c(72,77,65,80,85,87)
)
# Print the data frame
student_info

# Create a second data frame
library_info <- data.frame(
  id = c(1,2,4,6,3,5),
  book_name = c("Statistics","R-Programming","Algebra","Python","Geometry","AP"),
  book_isbn = c(978,829,129,233,120,23)
)
# Print the data frame
library_info

In the above R code, we have created two data frames using data.frame().

Both data frames have a common key variable, id.

We can merge the two data frames by this common key column.

# Merge two data frames in R using merge()
student_book <- merge(student_info,library_info,by="id")
# Print the merged data set
student_book

The output of the merged data frame, which merge() sorts by the key column id by default, is:

  id  name age gender marks     book_name book_isbn
1  1   Tom  20      M    72    Statistics       978
2  2   Kim  21      F    77 R-Programming       829
3  3   Sam  19      M    65      Geometry       120
4  4 Julie  20      F    80       Algebra       129
5  5 Emily  21      F    85            AP        23
6  6 Chris  22      M    87        Python       233
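
In the example above, every id appears in both data frames, so no rows are lost. By default, merge() keeps only the rows whose key is present in both inputs. The sketch below uses two small hypothetical data frames (scores and clubs are made up for illustration and are not part of this tutorial's data) to show how the all.x and all arguments of merge() keep unmatched rows.

```r
# Hypothetical data frames for illustration only
scores <- data.frame(id = c(1, 2, 3), marks = c(72, 77, 65))
clubs  <- data.frame(id = c(2, 3, 4), club = c("Chess", "Drama", "Music"))

# Default merge: only ids present in both data frames (2 and 3) are kept
inner <- merge(scores, clubs, by = "id")

# all.x = TRUE: keep every row of scores; club is NA where there is no match
left <- merge(scores, clubs, by = "id", all.x = TRUE)

# all = TRUE: keep every row from both data frames
full <- merge(scores, clubs, by = "id", all = TRUE)

inner
left
full
```

Which variant fits depends on whether unmatched rows should be dropped or kept with missing values.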

Combine Data Frames in R using rbind()

Using the rbind() R function, we can combine two data frames in R.

Combining two data frames with rbind() requires the following:

  • Both data frames should have the same variables/columns
  • If a variable is missing from one data frame, either add it there with NA (missing) values or delete the extra column from the other data frame.
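
The requirements above can be checked in code before calling rbind(). This is a minimal sketch, assuming two hypothetical data frames df_a and df_b (not part of this tutorial's data); it uses setdiff() to find the missing columns and fills them with NA.

```r
# Hypothetical data frames for illustration only
df_a <- data.frame(Name = c("Tom", "Kim"), City = c("Pune", "Oslo"), Score = c(10, 20))
df_b <- data.frame(Name = c("Sam"), Score = c(30))

# Columns present in df_a but missing from df_b
missing_in_b <- setdiff(names(df_a), names(df_b))
missing_in_b

# Add each missing column to df_b, filled with NA
for (col in missing_in_b) {
  df_b[[col]] <- NA
}

# Check that both data frames now share the same set of column names
setequal(names(df_a), names(df_b))
```

The column order does not have to match, because rbind() on data frames matches columns by name.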

Let’s consider an example to demonstrate the merging of two data frames using rbind() in R.

Create two data frames using the below R code.

# Create a data frame as account1 using data.frame()
account1 <- data.frame(Name =c("Tom","Aroy","Kim"), BankName=c("Citi", "HSBC", "HSBC"), Balance=c(3550, 4500, 2800))
# Create data frame as account2
account2 <- data.frame(Name=c("Keory","Elon"), Balance=c(2500, 8000))

In the above R code, we have created two data frames.

Both account1 and account2 have the Name and Balance variables.

However, account2 doesn’t have the BankName column.

If we try to combine the two data frames using rbind(), we will get the following error:

Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match
Calls: rbind -> rbind
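
To inspect this failure without stopping a script, the call can be wrapped in tryCatch(). The sketch below reuses the two account data frames and stores the error message in a variable named result, a name chosen here for illustration.

```r
# Data frames from the example above
account1 <- data.frame(Name = c("Tom", "Aroy", "Kim"),
                       BankName = c("Citi", "HSBC", "HSBC"),
                       Balance = c(3550, 4500, 2800))
account2 <- data.frame(Name = c("Keory", "Elon"), Balance = c(2500, 8000))

# rbind() fails because account2 has no BankName column;
# tryCatch() returns the error message instead of halting execution
result <- tryCatch(
  rbind(account1, account2),
  error = function(e) conditionMessage(e)
)
result
```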

There are two options to deal with the problem.

Either delete the BankName column from the account1 data frame, or add a BankName column to the account2 data frame and set it to NA (missing).

Let’s implement the second option: add the missing BankName variable to the second data frame and assign it NA.

account2$BankName <- NA

Now that both data frames have the same variables, use the rbind() function in R to join them.

It combines data frames vertically.

# Create a data frame as account1 using data.frame()
account1 <- data.frame(Name =c("Tom","Aroy","Kim"), BankName=c("Citi", "HSBC", "HSBC"), Balance=c(3550, 4500, 2800))
# Create data frame as account2
account2 <- data.frame(Name=c("Keory","Elon"), Balance=c(2500, 8000))
# Add the missing variable and assign it the NA (missing) value, not the string "NA"
account2$BankName <- NA
# Use rbind() in R to combine the two data frames
merge_account <- rbind(account1,account2)
# Print the combined account data set
merge_account

The output of the above R code to join two data frames is:

   Name BankName Balance
1   Tom     Citi    3550
2  Aroy     HSBC    4500
3   Kim     HSBC    2800
4 Keory     <NA>    2500
5  Elon     <NA>    8000
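
For comparison, the first option mentioned earlier (deleting the BankName column from account1) can be sketched as follows. The names account1_trimmed and combined are chosen here for illustration only.

```r
# Data frames from the example above
account1 <- data.frame(Name = c("Tom", "Aroy", "Kim"),
                       BankName = c("Citi", "HSBC", "HSBC"),
                       Balance = c(3550, 4500, 2800))
account2 <- data.frame(Name = c("Keory", "Elon"), Balance = c(2500, 8000))

# Keep only the columns that account2 also has (Name and Balance)
account1_trimmed <- account1[names(account2)]

# Both data frames now have the same variables, so rbind() succeeds
combined <- rbind(account1_trimmed, account2)
combined
```

This option discards the bank names for the first three accounts, so it only suits cases where that column is not needed.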

Conclusion

I hope the above article on how to merge data frames in R using the merge() function is useful to you. The merge() function joins data frames horizontally by a common key variable.

You can use the rbind() function in R to bind two data frames. It combines two data frames vertically and requires them to have the same variables.