Machine Learning Series: Manipulation of Biological Datasets in R using Dplyr and TidyR

About Course

Manipulation of Biological Datasets in R Using dplyr and tidyr for Machine Learning and Data Science

Real-world data is messy and is often created, processed, and stored by a variety of humans, business processes, and applications. As a result, a data set may be missing individual fields, contain manual input errors, or have duplicate data or different names to describe the same thing. Therefore, it is necessary to perform some type of data pre-processing on the raw biological data so that the biological data analysis provides reliable, precise, and robust results.

Data pre-processing generally means transforming data into a format that can be processed more easily and effectively in data mining, machine learning, and other data science tasks. It is the part of biological data preparation that covers any processing applied to raw biological data to prepare it for a later analysis step. Pre-processing plays an important role in model building: it helps us structure the biological data and engineer features according to our own requirements.

Data pre-processing and manipulation in R was not easy before the advent of the dplyr and tidyr packages. These two packages provide many functions for data manipulation, for instance grouping, summarization, and filtering. The dplyr package makes tabular data manipulation easy, and the tidyr package lets you swiftly convert between data formats.

Because data pre-processing is often a difficult part of a machine learning project, we are offering a simplified yet in-depth course through which you can learn how to pre-process biological data for your machine learning projects.

In BioCode’s Manipulation of Biological Datasets in R Using dplyr and tidyr for Machine Learning and Data Science course, you’ll learn how to perform data manipulation on biological data using various functions provided by the dplyr and tidyr packages in R. This course is for absolute beginners in bioinformatics scripting, and you don’t need any prior knowledge of scripting or even bioinformatics to get started.

This course will include the following sections:

Section 1: Introduction to Machine Learning and Data Science

Description: This section introduces the core concepts of machine learning and data science. Students with no prior knowledge of these fields will gain an understanding of both and will learn how data is pre-processed using the R language.

Learning Outcomes: Upon completion of this section, students will be able to:

Discuss R Language.
Describe Data Science.
Explain Machine Learning.
Explain Data Pre-processing.

Section 2: Cancer & Biological Data Pre-processing for Machine Learning Using Dplyr

Description: This section focuses on how biological data is pre-processed for machine learning using the dplyr package in R. Sometimes you need only a few rows from your biological dataset to perform an analysis; in that case, you can keep just those rows with the filter() function. Similarly, when you need to analyze only selected columns of your dataset, you can use the select() function. The dplyr package provides many other functions to pre-process your biological data.
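As a minimal sketch of the two functions mentioned above, the following example filters rows and selects columns. The `samples` data frame, its column names, and all of its values are invented for illustration and are not taken from the course material.

```r
library(dplyr)

# Hypothetical toy dataset: one row per tumour sample (all values invented).
samples <- data.frame(
  sample_id  = c("S1", "S2", "S3", "S4"),
  tissue     = c("breast", "lung", "breast", "colon"),
  age        = c(52, 67, 45, 71),
  expression = c(8.1, 5.4, 9.3, 6.0),
  stringsAsFactors = FALSE
)

# filter() keeps only the rows that satisfy every condition given.
breast_high <- filter(samples, tissue == "breast", expression > 8.5)

# select() keeps only the named columns, in the order given.
id_and_expr <- select(samples, sample_id, expression)

print(breast_high)
print(id_and_expr)
```

Separating multiple conditions with commas in filter() combines them with a logical AND, so only breast samples with expression above 8.5 are kept here.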

Learning Outcomes: Upon completion of this section, students will be able to:

Describe the dplyr Package.
Filter Rows with the filter() Function.
Select Columns Using the select() Function.
Add New Variables Using the mutate() Function.
Create Grouped Summaries Using the group_by() and summarize() Functions.
Filter Data and Create New Variables by Group Using the filter(), mutate(), and group_by() Functions.
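The grouped operations in the outcomes above can be sketched as follows. This is an illustrative example only: the `expr` data frame and its columns (`tissue`, `raw_count`) are hypothetical, and the log transform is just one common choice of derived variable.

```r
library(dplyr)

# Hypothetical read counts for six samples from three tissues (values invented).
expr <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4", "S5", "S6"),
  tissue    = c("breast", "breast", "lung", "lung", "colon", "colon"),
  raw_count = c(120, 80, 40, 60, 10, 30),
  stringsAsFactors = FALSE
)

# mutate() adds a new column computed from existing ones.
expr <- mutate(expr, log_count = log2(raw_count + 1))

# group_by() + summarize(): collapse each tissue to a single summary row.
tissue_summary <- expr %>%
  group_by(tissue) %>%
  summarize(n_samples = n(), mean_count = mean(raw_count))

# Grouped mutate and filter: mean() is evaluated within each tissue,
# so each sample is compared against the mean of its own group.
above_group_mean <- expr %>%
  group_by(tissue) %>%
  mutate(centered = raw_count - mean(raw_count)) %>%
  filter(centered > 0) %>%
  ungroup()

print(tissue_summary)
print(above_group_mean)
```

Calling ungroup() at the end is a defensive habit: it prevents later steps from silently operating per group.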

Section 3: Cancer & Biological Data Pre-processing for Machine Learning Using Tidyr

Description: This section focuses on how biological data is pre-processed for machine learning using the tidyr package in R. Biological data often contains missing values; the drop_na(), fill(), and replace_na() functions let you remove the affected rows or fill in the gaps. Similarly, you may need to select a column from your dataset and transform it into a vector; you can easily do this with the pull() function, which is provided by dplyr and works well alongside tidyr. The tidyr package provides many other functions to pre-process your biological data.
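A minimal sketch of the missing-value functions and pull() described above is shown here. The `clinical` data frame is hypothetical, and whether dropping, filling, or replacing is appropriate depends on the dataset; replacing a missing marker value with 0 is used purely to demonstrate the call.

```r
library(dplyr)
library(tidyr)

# Hypothetical clinical table with gaps (all values invented).
clinical <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  batch   = c("A", NA, NA, "B"),
  marker  = c(2.5, NA, 3.1, NA),
  stringsAsFactors = FALSE
)

# drop_na(): remove rows that have NA in the listed column(s).
complete_marker <- drop_na(clinical, marker)

# fill(): carry the last non-missing value downward (the default direction).
filled_batch <- fill(clinical, batch)

# replace_na(): substitute a fixed value per column.
zero_marker <- replace_na(clinical, list(marker = 0))

# pull() (from dplyr): extract one column as a plain vector.
marker_values <- pull(zero_marker, marker)

print(complete_marker)
print(filled_batch)
print(marker_values)
```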

Learning Outcomes: Upon completion of this section, students will be able to:

Describe the tidyr Package.
Perform Data Spreading and Gathering Using the spread() and gather() Functions.
Perform Data Separating Using the separate() Function.
Select a Column in a Data Frame and Transform It Into a Vector Using the pull() Function.
Handle Missing Values in Data Using the drop_na(), fill(), and replace_na() Functions.
Solve a Case Study Using the tidyr Package.
Handle Nontidy Data.
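The reshaping functions listed above can be sketched as follows. The `wide` data frame, its gene names as row labels, and its column naming scheme (`condition_replicate`) are assumptions made for this illustration.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample (values invented).
wide <- data.frame(
  gene        = c("TP53", "BRCA1"),
  tumor_rep1  = c(5.2, 3.3),
  normal_rep1 = c(7.8, 6.1),
  stringsAsFactors = FALSE
)

# gather(): wide -> long, one row per gene/sample pair.
long <- gather(wide, key = "sample", value = "expression",
               tumor_rep1, normal_rep1)

# separate(): split "tumor_rep1" into "tumor" and "rep1" at the underscore.
long_sep <- separate(long, sample,
                     into = c("condition", "replicate"), sep = "_")

# spread(): long -> wide again, one column per distinct sample name.
wide_again <- spread(long, key = sample, value = expression)

print(long_sep)
print(wide_again)
```

In recent tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(); the older functions still work and are used here because they are the ones named in this course.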


Student Ratings & Reviews

3.0 (Total 2 Ratings)

Muhammad Abdullah

6 months ago

I recently took the BioCode course "Machine Learning Series: Manipulation of Biological Datasets in R using Dplyr and TidyR," and it was fantastic! The instructors were knowledgeable, and the hands-on exercises made complex concepts easy to grasp. This course is perfect for anyone looking to boost their data manipulation skills in R. Highly recommend!

José Jiménez

2 years ago

It wasn't what I expected. Nothing about machine learning, just the basic use of R.

Course Content

Introduction to Data Science, Machine Learning, and Data Pre-processing

Introduction to R

Introduction to Data Science

Introduction to Machine Learning

Introduction to Data Pre-processing

Hands-on: Manipulation and Data Pre-processing Using Dplyr

Introduction to dplyr

Filter Rows with filter()

Select Columns with select()

Add New Variables with mutate()

Grouped Summaries with summarize()

Grouped Mutates (and Filters)

Hands-on: Manipulation and Data Pre-processing Using TidyR

Introduction to tidyr

Data Spreading Function

Data Gathering Function

Data Separating & Pull

Missing Values

Exercise

Quiz
