
R Subset Data Frame by Column

The R subset() function returns the part of a data frame that meets a given condition. Subsetting a data frame extracts the observations (rows) that satisfy the condition and, optionally, only the variables (columns) you choose.

subset() takes three main arguments: x, the data frame to subset; subset, a logical expression indicating which rows to keep; and select, an expression indicating which columns to keep.

subset(x, subset, select, drop = FALSE, …)
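To make the roles of these arguments concrete, here is a minimal sketch on a small made-up data frame. The name scores_df and its columns are assumptions used only for illustration. It names each argument explicitly and also shows drop = TRUE, which is passed on to the underlying indexing, so a single selected column comes back as a plain vector.

```r
# Hypothetical data frame used only for this illustration
scores_df <- data.frame(
  id = 1:4,
  score = c(10, 25, 30, 5)
)

# x = data frame, subset = row condition, select = columns to keep
high <- subset(x = scores_df, subset = score > 20, select = c(id, score))
high

# With drop = TRUE and a single selected column, the result is a vector
high_scores <- subset(scores_df, score > 20, select = score, drop = TRUE)
high_scores
```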

In this tutorial, we will discuss how to subset a data frame by column name and by column value in R, with examples.

R Subset Data Frame by Column Name

To subset a data frame by column, use the subset() function with the select argument.

Let's consider an example to understand how subsetting a data frame works in R.

In the following R code, we create a data frame with the columns name, age, gender, and marks, and store it in student_info.

# Create a data frame
student_info <- data.frame(
  name = c("Tom","Kim","Sam","Julie","Emily","Chris"),
  age = c(20,21,19,20,21,22),
  gender = c('M','F','M','F','F','M'),
  marks = c(72,77,65,80,85,87)
)
# Print the data frame
student_info

In the following example, we select the columns name and gender from the data frame.

subset(student_info, select = c('name', 'gender'))

Here, the first argument is the data frame to subset, and the select argument lists the columns to keep. Because no row condition is given, all rows are returned.

The output of the above R code is:

   name gender
1   Tom      M
2   Kim      F
3   Sam      M
4 Julie      F
5 Emily      F
6 Chris      M
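As a small sketch of two other ways to write select, the code below uses unquoted column names and a range of adjacent columns written as first:last. It rebuilds the same student_info data frame so that it runs on its own; the variable names by_name and by_range are assumptions for illustration.

```r
# Same student data frame as above
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Unquoted names select the same columns as the quoted form
by_name <- subset(student_info, select = c(name, gender))
names(by_name)

# A range selects every column from name through gender
by_range <- subset(student_info, select = name:gender)
names(by_range)
```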

R Subset Data Frame by Column Value

The subset() function has a subset argument that takes a logical expression. Only the rows for which the expression is TRUE are kept.

Let's use the student data frame created above.

We can select only the female candidates from the data frame using the following R code.

# Display all columns for female candidates
subset(student_info, gender == "F")

In the above R code, the condition gender == "F" filters the data frame to the rows whose gender column has the value "F".

The output of the above R code is:

   name age gender marks
2   Kim  21      F    77
4 Julie  20      F    80
5 Emily  21      F    85
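One detail worth knowing when filtering by value: subset() treats a missing (NA) result of the condition as FALSE and drops that row, whereas plain bracket indexing returns a row of NAs for it. The sketch below shows the difference on a small made-up data frame; the name with_missing and its contents are assumptions for illustration.

```r
# Hypothetical data frame with one missing gender value
with_missing <- data.frame(
  name = c("Ann", "Bob", "Cara", "Dev"),
  gender = c("F", NA, "F", "M")
)

# subset() silently drops the row where the condition is NA
kept <- subset(with_missing, gender == "F")
nrow(kept)

# Bracket indexing keeps an all-NA row for the NA condition
bracketed <- with_missing[with_missing$gender == "F", ]
nrow(bracketed)
```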

Subsetting Data Frame and Select Multiple Columns

Let's use the student data frame to select the name and marks of the female candidates only.

The following subset() call combines a row condition with the select argument.

# Use select to choose columns
# Display only the name and marks columns for female candidates
subset(student_info, gender == 'F', select = c(name, marks))

The select argument lists the columns to keep; wrap multiple column names in c().

The output of the above R code is:

   name marks
2   Kim    77
4 Julie    80
5 Emily    85
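The R documentation describes subset() as a convenience function intended mainly for interactive use. As a hedged sketch of the equivalent written with standard bracket indexing, which is often preferred inside functions and scripts, the same rows and columns can be extracted as follows. The variable names via_subset and via_bracket are assumptions for illustration.

```r
# Same student data frame as above
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# subset() version from the example above
via_subset <- subset(student_info, gender == "F", select = c(name, marks))

# Bracket version: a logical row index plus a character vector of column names
via_bracket <- student_info[student_info$gender == "F", c("name", "marks")]
via_bracket
```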

Subset Data Frame by Multiple Conditions

Using the subset() function in R, you can subset a data frame by multiple conditions combined with logical operators such as & (and).

Let's use the student data frame to get the rows where gender is "F" and marks is greater than 80.

# Get female students with marks greater than 80
subset(student_info, marks > 80 & gender == 'F')

In the above R code, the subset argument combines two conditions: marks > 80 & gender == "F". A row is kept only if both are TRUE.

The output of the above R code is:

   name age gender marks
5 Emily  21      F    85
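Conditions can also be combined with | (or) and with %in% for matching against a set of values. The following is a minimal sketch on the same data; the variable names either_cond and in_set are assumptions for illustration.

```r
# Same student data frame as above
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Keep rows where either condition holds
either_cond <- subset(student_info, marks > 80 | age == 19)
either_cond

# Keep rows whose name is in a given set and whose marks are at least 75
in_set <- subset(student_info, name %in% c("Tom", "Kim") & marks >= 75)
in_set
```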

R Subset Data Frame and Exclude a Specified Column

Using the select argument in the R subset() function, you can either select columns or exclude them by prefixing the column name with a minus sign.

Let's use the student_info data frame to get all columns except the age column for the female candidates.

# Display all columns except the age column for female candidates
subset(student_info, gender == "F", select = -age)

In the above R code, the condition gender == "F" extracts the data for female candidates only.

In the select argument, the age column is given with a minus sign, which excludes it from the output.

The output of the above R code is:

   name gender marks
2   Kim      F    77
4 Julie      F    80
5 Emily      F    85
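To exclude more than one column, the names can be wrapped in c() and negated together. This is a short sketch on the same data; the variable name without_two is an assumption for illustration.

```r
# Same student data frame as above
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Drop both age and gender for female candidates
without_two <- subset(student_info, gender == "F", select = -c(age, gender))
names(without_two)
```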

Conclusion

I hope this article on how to subset a data frame in R using the subset() function is helpful to you.

Using the subset() function, you can specify one or more conditions to extract rows from a data frame. Use the select argument to keep or exclude columns from the data frame.
