R Data Frames

A data frame is the primary data structure for handling tabular data in R, much like a spreadsheet. It is two-dimensional and displays data as a table of rows and columns. Internally, a data frame is a list of equal-length vectors, one per column, rather than an atomic vector.

Data frames are like matrices, except that the columns can be of different types. In other words, a data frame stores heterogeneous data: each column holds a single type, but different columns can hold different types.
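The contrast with a matrix can be sketched as follows. This is a minimal illustration; the names `m` and `df` are hypothetical, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# A matrix holds a single type, so mixing text and numbers
# coerces every value to character
m <- cbind(name = c("Tom", "Aron"), age = c(18, 20))
typeof(m)          # "character"

# A data frame keeps a separate type for each column
df <- data.frame(
  name = c("Tom", "Aron"),
  age = c(18, 20),
  stringsAsFactors = FALSE
)
sapply(df, class)  # name is "character", age is "numeric"
```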

In this tutorial, we will discuss R data frames: how to create them, how to access their elements, and how to add or remove rows and columns.

Create Data Frame in R

Using the data.frame() function, we can create a data frame in R.

It converts the collection of vectors or a matrix into a data frame.
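As a rough sketch of the matrix case, the example below converts a small numeric matrix into a data frame. The matrix `scores` and its column names are made up for illustration.

```r
# Hypothetical 2 x 3 numeric matrix with named columns
scores <- matrix(
  c(90, 85, 70, 65, 88, 92),
  nrow = 2,
  dimnames = list(NULL, c("math", "physics", "art"))
)

# data.frame() turns each matrix column into a data frame column
scores_df <- data.frame(scores)

class(scores_df)   # "data.frame"
dim(scores_df)     # 2 3
names(scores_df)   # "math" "physics" "art"
```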

Creating an empty data frame in R

To create an empty data frame with variable names and types, call the data.frame() function with zero-length vectors. This creates the structure of a data frame (its columns and their types) without any rows.

# Create the empty data frame
student_data <- data.frame(
  Id = numeric(),
  Name = character(),
  Age = numeric()
)
str(student_data)
# Print the data frame
student_data

In the above R code, the data.frame() function creates an empty data frame.

The output shows the structure of the data frame, followed by the printed (empty) data frame.

'data.frame':	0 obs. of  3 variables:
 $ Id  : num 
 $ Name: chr 
 $ Age : num 

[1] Id   Name Age 
<0 rows> (or 0-length row.names)

Creating a data frame using data.frame() function

Let’s consider an example to create a data frame using the data.frame() function.

student_data <- data.frame(
  name = c("Tom", "Aron", "Gary","Jeannie"),
  gender = c('Male', 'Male','Male', 'Female'),
  age = c(18, 20, 17, 21))

# Print the data frame  
student_data

The output of the above R data frame is:

     name gender age
1     Tom   Male  18
2    Aron   Male  20
3    Gary   Male  17
4 Jeannie Female  21

To get the structure of the data frame, use the str() function.

str(student_data)

The output of the data frame structure is:

'data.frame':   4 obs. of  3 variables:
 $ name  : chr  "Tom" "Aron" "Gary" "Jeannie"
 $ gender: chr  "Male" "Male" "Male" "Female"
 $ age   : num  18 20 17 21

Creating a data frame from vectors

A data frame can be created from vectors. To construct a data frame from vector data, create one vector for each column; all vectors must have the same length.

name <- c("Tom", "Aron", "Gary","Jeannie")
gender <- c("Male", "Male", "Male", "Female")
age <- c(18, 20, 17, 21)

Use the data.frame() function to combine the vectors into a data frame.

The code below assigns the result to an object called student_data, which stores the values of the variables name, gender, and age.

student_data <- data.frame(name, gender, age)
# Display the class of the data frame
class(student_data)
[1] "data.frame"

Using the names() function, you can get the name of the variables in the data frame.

# display the name of the variables from the data frame
names(student_data)  
[1] "name" "gender" "age"

Use the str() function to get the structure of the data frame in R.

# Display the structure of a data frame
str(student_data)  
'data.frame':   4 obs. of  3 variables:
 $ name  : chr  "Tom" "Aron" "Gary" "Jeannie"
 $ gender: chr  "Male" "Male" "Male" "Female"
 $ age   : num  18 20 17 21
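The equal-length requirement is easy to check with a small sketch. Here one vector is deliberately one value short, and tryCatch() is used only so the example keeps running and captures the error message; the variable names are illustrative.

```r
name <- c("Tom", "Aron", "Gary")
age <- c(18, 20)   # one value short, on purpose

# data.frame() stops with an error when column lengths do not match
result <- tryCatch(
  data.frame(name, age),
  error = function(e) conditionMessage(e)
)

# result holds the error message, which reports the differing numbers of rows
result
```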

Creating a data frame from list()

You can create a data frame in R from a list by passing the list to the data.frame() function. Each list element becomes a column.

Let’s practice!

Consider the above example of student data.

Create a list using the vectors.

name <- c("Tom", "Aron", "Gary","Jeannie")
gender <- c("Male", "Male", "Male", "Female")
age <- c(18, 20, 17, 21)

# Create a list from a vector
student_data.list <- list(
  name = name,
  gender = gender,
  age = age
  )
class(student_data.list)

To get the class of the list, use the class() function. The output is:

[1] "list"

Create a data frame from the list

# Create a data frame from list
student_data <- data.frame(student_data.list)

# Display the structure of a data frame
str(student_data)

The output of the above data frame structure is:

'data.frame':   4 obs. of  3 variables:
 $ name  : chr  "Tom" "Aron" "Gary" "Jeannie"
 $ gender: chr  "Male" "Male" "Male" "Female"
 $ age   : num  18 20 17 21

Get the dimension of a data frame using the dim() function

# Display dimension of a data frame
dim(student_data) 
[1] 4 3

To get attributes of a data frame, use the attributes() function

attributes(student_data)
$names
[1] "name"   "gender" "age"

$class
[1] "data.frame"

$row.names
[1] 1 2 3 4

Summarize the Data Frame in R

Using the summary() function (all lowercase, since R is case-sensitive), you can get the summary of a data frame in R.

Let’s consider an example to get the summary of the data frame.

employee_data <- data.frame (
  Name = c("Tom", "Andrea", "Aaron"),
  Exp = c(8, 15, 10),
  Salary = c(100000, 180000, 145000)
)

# Print the data frame
employee_data

# Display the summary of data frame
summary(employee_data)

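In the above R code, printing employee_data displays the data frame, and the summary() function displays a summary of each column. The Name summary shown below counts each level, which is what you get when Name is a factor (the default before R 4.0, or with stringsAsFactors = TRUE). In R 4.0 and later, a character column is summarized by its Length, Class, and Mode instead.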

    Name Exp Salary
1    Tom   8 100000
2 Andrea  15 180000
3  Aaron  10 145000

     Name        Exp           Salary      
 Aaron :1   Min.   : 8.0   Min.   :100000  
 Andrea:1   1st Qu.: 9.0   1st Qu.:122500  
 Tom   :1   Median :10.0   Median :145000  
            Mean   :11.0   Mean   :141667  
            3rd Qu.:12.5   3rd Qu.:162500  
            Max.   :15.0   Max.   :180000 

Access Elements of a Data Frame in R

You can access the elements of a data frame by specifying row and column indices inside square brackets.

  • df[i,] returns the i-th row of data frame df,
  • df[,j] returns the j-th column of data frame df, and
  • df[i,j] returns the element in row i and column j of data frame df.

Accessing Rows using index

Let’s consider an example to create a sales data frame and access the elements of the sales data.

sales_data <- data.frame (
  Name = c("Ebook", "Book", "Video"),
  Revenue = c(25000, 15000, 100000),
  Profit = c(10000, 7000, 45000)
) 

To get the first row of data from the data frame in R, use the following code

# Returns the first row of data
sales_data[1,]
   Name Revenue Profit
1 Ebook   25000  10000

Access a data frame column by index

You can get a column of the data frame by specifying its column index.

# Returns the 2nd column of data
sales_data[,2]
[1]  25000  15000 100000

The above R code returns the column as a plain vector rather than a data frame.

To keep the result as a one-column data frame, use drop = FALSE.

# returns 2nd column of data frame
sales_data[, 2, drop = FALSE]   
  Revenue
1   25000
2   15000
3  100000
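A quick way to confirm the difference is to compare the classes of the two results. This sketch rebuilds the same sales_data frame so that it runs on its own.

```r
sales_data <- data.frame(
  Name = c("Ebook", "Book", "Video"),
  Revenue = c(25000, 15000, 100000),
  Profit = c(10000, 7000, 45000)
)

# Without drop = FALSE, a single column is simplified to a vector
class(sales_data[, 2])                 # "numeric"

# With drop = FALSE, the result stays a data frame
class(sales_data[, 2, drop = FALSE])   # "data.frame"
```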

Access Rows and Columns in a Data Frame

You can get a specific value from the data frame by specifying both the row and the column number.

Let’s practice!

# Returns value from 2nd row and 3rd column
sales_data[2,3] 
[1] 7000

To get the first two rows of data and the third column of the data frame, use the following code.

# Returns the first two rows of data and third column of data frame
sales_data[1:2,3]
[1] 10000  7000

You can extract non-adjacent rows or columns of the data frame by passing a vector of indices, as in the following R code.

# Returns the first and third rows of the sales data frame
sales_data[c(1,3),]
   Name Revenue Profit
1 Ebook   25000  10000
3 Video  100000  45000

Access a Data Frame Column by Name

You can access a data frame column by placing the name of the variable, in quotes, inside square brackets.

# Returns the Revenue column of data frame
sales_data["Revenue"]
  Revenue
1   25000
2   15000
3  100000

You can also access the data frame column data using the $ symbol and the name of the variable.

# Retrieve the "Profit" column of data frame
sales_data$Profit
[1] 10000  7000 45000

You can also access a column of the data frame using double square brackets [[ ]], which return the column as a vector.

# Returns Name column of data frame 
sales_data[["Name"]]
[1] "Ebook" "Book"  "Video"
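The three name-based forms differ mainly in what they return. The sketch below rebuilds sales_data so it is self-contained and compares them; the variable `two_cols` is introduced only for illustration.

```r
sales_data <- data.frame(
  Name = c("Ebook", "Book", "Video"),
  Revenue = c(25000, 15000, 100000),
  Profit = c(10000, 7000, 45000)
)

# Single brackets keep the data frame structure
class(sales_data["Revenue"])      # "data.frame"

# Double brackets and $ both return the underlying vector
class(sales_data[["Revenue"]])    # "numeric"
identical(sales_data[["Revenue"]], sales_data$Revenue)   # TRUE

# Single brackets also accept several names at once
two_cols <- sales_data[c("Name", "Profit")]
names(two_cols)                   # "Name" "Profit"
```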

Add Rows to a Data Frame

You can add a row to a data frame using the rbind() function.

Let’s practice with an example of Sales data from the above R code.

sales_data <- data.frame (
  Name = c("Ebook", "Book", "Video"),
  Revenue = c(25000, 15000, 100000),
  Profit = c(10000, 7000, 45000)
)
# Create the new row as a one-row data frame so each column keeps its type
new_row <- data.frame(Name = "Audio", Revenue = 80000, Profit = 55000)
# Add row to data frame using rbind()
sales_data <- rbind(sales_data, new_row)

# Print the data frame
sales_data
   Name Revenue Profit
1 Ebook   25000  10000
2  Book   15000   7000
3 Video  100000  45000
4 Audio   80000  55000
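One detail worth checking when building a new row: c() returns an atomic vector, so mixing text and numbers turns every value into a character string, and binding such a vector to a data frame can silently turn numeric columns into text. A small sketch, with the names `mixed_row` and `typed_row` chosen only for illustration:

```r
# c() coerces mixed values to a single type, here character
mixed_row <- c("Audio", 80000, 55000)
class(mixed_row)           # "character"

# A one-row data frame keeps a separate type per column
typed_row <- data.frame(Name = "Audio", Revenue = 80000, Profit = 55000)
sapply(typed_row, class)   # Revenue and Profit stay "numeric"
```

After any rbind(), running str() on the result is a cheap way to verify that the column types are still what you expect.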

Removing rows from a data frame

To remove rows from a data frame, use a negative row index. Wrapping the row numbers in the concatenate function c() lets you remove several rows at once.

# Create a data frame
visitor_info <- data.frame(
  country = c("US","UK","IN","AUS"),
  visitors = c(10000,808,120,340),
  social = c(3200,1220,450,120),
  direct = c(1500,430,200,100)
)
# Print the data frame
visitor_info

# Remove the second row using a negative index
visitor_info <- visitor_info[-c(2),]

# Print the data frame again
visitor_info

In the above R code, data.frame() creates the data frame, and the negative index -c(2) removes the second row. The first output below is the original data frame; the second is the data frame after the row is removed.

  country visitors social direct
1      US    10000   3200   1500
2      UK      808   1220    430
3      IN      120    450    200
4     AUS      340    120    100

  country visitors social direct
1      US    10000   3200   1500
3      IN      120    450    200
4     AUS      340    120    100
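The same pattern extends to several rows. The sketch below drops two rows at once and, as an alternative swapped in for illustration, keeps rows by a logical condition instead of by position. The names `without_2_and_4` and `busy_countries` are hypothetical.

```r
visitor_info <- data.frame(
  country = c("US", "UK", "IN", "AUS"),
  visitors = c(10000, 808, 120, 340),
  social = c(3200, 1220, 450, 120),
  direct = c(1500, 430, 200, 100),
  stringsAsFactors = FALSE
)

# Drop the 2nd and 4th rows in one step
without_2_and_4 <- visitor_info[-c(2, 4), ]
without_2_and_4$country    # "US" "IN"

# Keep only the rows where visitors exceed 500 (logical indexing)
busy_countries <- visitor_info[visitor_info$visitors > 500, ]
busy_countries$country     # "US" "UK"
```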

Add a Column to a Data Frame

You can add a column to the data frame in R using a simple assignment or using the cbind() function.

Let’s consider an example to add a new column to the visitor_info data frame using a simple assignment.

# Create a data frame
visitor_info <- data.frame(
  country = c("US","UK","IN","AUS"),
  visitors = c(10000,808,120,340),
  social = c(3200,1220,450,120),
  direct = c(1500,430,200,100)
)
# Print the data frame
visitor_info

# Create new vector refer
refer <- c(800,120,80,60)
# Add refer column to data frame
visitor_info$refer <- refer

# Print the data frame
visitor_info

In the above R code, we create a new vector and assign it to a new column of the data frame using the $ operator.

The output of the above R code is:

  country visitors social direct
1      US    10000   3200   1500
2      UK      808   1220    430
3      IN      120    450    200
4     AUS      340    120    100

  country visitors social direct refer
1      US    10000   3200   1500   800
2      UK      808   1220    430   120
3      IN      120    450    200    80
4     AUS      340    120    100    60   

Another method to add a column to an existing data frame is using the cbind() function.

# Create new vector as linkedin
linkedin <- c(1000,250,100,45)

# Use the cbind() function to add column to existing data frame
visitor_info <- cbind(visitor_info,linkedin)

# Prints the data frame
visitor_info

The output of the above R code, which adds a new column to the existing data frame, is:

  country visitors social direct refer linkedin
1      US    10000   3200   1500   800     1000
2      UK      808   1220    430   120      250
3      IN      120    450    200    80      100
4     AUS      340    120    100    60       45
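Both approaches also work when the new column is computed or named on the spot. The following is a minimal sketch on a trimmed-down copy of visitor_info; the column name `social_plus_direct` is invented for illustration.

```r
visitor_info <- data.frame(
  country = c("US", "UK", "IN", "AUS"),
  social = c(3200, 1220, 450, 120),
  direct = c(1500, 430, 200, 100)
)

# Assignment: the new column is computed element-wise from existing columns
visitor_info$social_plus_direct <- visitor_info$social + visitor_info$direct

# cbind() accepts name = value pairs, so the new column can be named inline
visitor_info <- cbind(visitor_info, linkedin = c(1000, 250, 100, 45))

names(visitor_info)
# "country" "social" "direct" "social_plus_direct" "linkedin"
```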

Conclusion

I hope this article on R data frames is helpful to you. It covered how to create a data frame in R, how to access its elements using different methods, and how to modify it by adding or removing rows and columns.