Assign a value to a variable with <-
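As a quick illustration of assignment, here is a minimal sketch. The variable names (x, cities, y) are made up for this example.

```r
# Assign values with <- (all names here are illustrative)
x <- 5                            # a single number
cities <- c("Austin", "Boston")   # a character vector
y <- x * 2                        # use an existing variable in an expression
print(y)                          # prints [1] 10
```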
Before you can use a library (package), you need to install it. For example, to use the readxl library to read data from Excel, install it first by running: install.packages("readxl"). After I ran that command, I got the following message: package ‘readxl’ successfully unpacked and MD5 sums checked. Once the package (or library) has been installed, I can load it with: library(readxl).
Installing the tidyverse package is highly recommended because it bundles several very popular packages (such as dplyr and ggplot2). I installed it by running: install.packages("tidyverse").
To use the datasets included in tidyverse, you first need to load it: library(tidyverse). After running it, the datasets that ship with the tidyverse packages (such as mpg, which comes from ggplot2) become available.
Now we need to learn an essential function: data(). Running data() lists the available datasets, both those built into R and those that come with loaded libraries such as tidyverse.
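The install-then-load steps can be combined so that a package is installed only when it is missing. This is a sketch of one common pattern, not the only way to do it; it assumes an internet connection and a configured CRAN mirror.

```r
# Install each package only if it is not already available, then load it.
pkgs <- c("readxl", "tidyverse")
for (pkg in pkgs) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # runs only when the package is missing
  }
}
library(readxl)
library(tidyverse)
```

Installing is needed once per machine, while library() has to be run again in every new R session.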
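To narrow the list to one package, data() accepts a package argument. The sketch below assumes ggplot2 is installed (it comes with tidyverse) and checks that mpg is among its datasets.

```r
library(ggplot2)

# List only the datasets that ship with ggplot2
data(package = "ggplot2")

# The same information as a plain character vector of dataset names
ggplot2_datasets <- data(package = "ggplot2")$results[, "Item"]
"mpg" %in% ggplot2_datasets   # TRUE: mpg is one of them

# Look at the first rows of mpg
head(mpg)
```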
1) Use glimpse(mpg) to quickly view the structure and sample values of a large dataset: the number of rows and columns, and each column's data type (e.g., int, chr, dbl).
2) Run a simple statistical command, mean(mpg$cyl), to get the mean of the cyl column.
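The same dataset$column pattern works with other base R summary functions. A short sketch, assuming ggplot2 is installed so that mpg is available:

```r
library(ggplot2)   # provides the mpg dataset

mean(mpg$cty)      # average city mileage
median(mpg$cty)    # middle value of city mileage
range(mpg$hwy)     # smallest and largest highway mileage
summary(mpg$cty)   # min, quartiles, mean, and max in one call
table(mpg$cyl)     # how many cars have each cylinder count
```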
3) Often you want statistics for a subset of the data that meets some condition (such as female respondents, or cars with mileage >= 20). To get the subset of mpg with city mileage >= 20, I can simply run: filter(mpg, cty >= 20). I can even store the result as a new dataset: mpg_efficiency <- filter(mpg, cty >= 20). Then you can view the new dataset with: view(mpg_efficiency). To get the subset for a single manufacturer: mpg_ford <- filter(mpg, manufacturer == "ford"). This creates a new dataset, mpg_ford, which you can view with: view(mpg_ford).
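filter() also accepts several conditions at once. In this sketch the dataset names (mpg_ford_efficient, mpg_either) and the thresholds are made up for illustration; conditions separated by commas must all hold, while | means "or".

```r
library(dplyr)
library(ggplot2)   # provides the mpg dataset

# Ford cars with city mileage of at least 15 (both conditions must hold)
mpg_ford_efficient <- filter(mpg, manufacturer == "ford", cty >= 15)

# Cars that are efficient in the city OR on the highway
mpg_either <- filter(mpg, cty >= 20 | hwy >= 30)

nrow(mpg_ford_efficient)   # number of rows that passed the first filter
nrow(mpg_either)           # number of rows that passed the second filter
```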
4) To add a column to a dataset, use the mutate() function. For example, to add a column that converts cty (city miles per gallon) to kilometers per liter, run: mpg_metric <- mutate(mpg, cty_metric = cty * 0.425144). This creates a new dataset, mpg_metric, with a new column, cty_metric. You can use glimpse() to see the structure of the new dataset.
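mutate() can add more than one column in a single call. The sketch below converts both cty and hwy; the names mpg_metric2 and hwy_metric are illustrative and not part of the original dataset.

```r
library(dplyr)
library(ggplot2)   # provides the mpg dataset

km_per_l <- 0.425144   # 1 mile per gallon is about 0.425144 km per liter

# Add two new columns at once; the original mpg is left unchanged
mpg_metric2 <- mutate(
  mpg,
  cty_metric = cty * km_per_l,
  hwy_metric = hwy * km_per_l
)

glimpse(mpg_metric2)   # the two new columns appear at the end
```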
5) R also has the pipe operator, %>%, which passes a dataset into the next function more directly and efficiently. For example, "mpg %>%" literally means: use the mpg dataset as the input to what follows.
mpg_metric <- mpg %>%
mutate(cty_metric = cty * 0.425144)
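Pipes become more useful when several steps are chained together. In this sketch each step receives the result of the previous one; the name ford_metric and the choice of columns are only for illustration.

```r
library(dplyr)
library(ggplot2)   # provides the mpg dataset

ford_metric <- mpg %>%
  filter(manufacturer == "ford") %>%        # keep only Ford cars
  mutate(cty_metric = cty * 0.425144) %>%   # add km per liter
  select(model, year, cty, cty_metric)      # keep a few columns

glimpse(ford_metric)
```

Without pipes the same result would need nested calls or intermediate datasets, which is harder to read.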
6) Another task we often need to handle is to get results by groups.
view(mpg) # open the mpg dataset in the viewer
mpg %>% # start from the mpg dataset
group_by(class) %>% # group the rows by class
summarise(mean(cty), median(cty)) # get the mean and median of cty for each class
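Giving the summary columns names makes the result easier to reuse. A sketch of the same grouping with named columns, a row count, and sorting; the column names (n_cars, mean_cty, median_cty) are my own choice.

```r
library(dplyr)
library(ggplot2)   # provides the mpg dataset

cty_by_class <- mpg %>%
  group_by(class) %>%
  summarise(
    n_cars     = n(),           # number of cars in each class
    mean_cty   = mean(cty),     # average city mileage per class
    median_cty = median(cty)    # median city mileage per class
  ) %>%
  arrange(desc(mean_cty))       # most efficient class first

cty_by_class
```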
7) Data visualization with ggplot2 # gg stands for "grammar of graphics"
The main function is ggplot().
#The aes() function is used to define aesthetic mappings, which describe how variables in your data are translated into visual properties (aesthetics) of a plot.
ggplot(mpg, aes(x=cty)) + geom_histogram() # The second part is to tell R to generate a histogram chart
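By default geom_histogram() uses 30 bins and prints a message suggesting that you pick a better value. A sketch that sets the bin width explicitly; the width of 2 and the white outline are arbitrary choices for illustration.

```r
library(ggplot2)

p_hist <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 2, color = "white")   # each bar covers 2 mpg

p_hist   # printing the object draws the plot
```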
If we want to add a label on the x axis, we can do the following:
ggplot(mpg, aes(x=cty)) +
geom_histogram() +
labs(x = "City Mileages") # Add label for the x axis
We can also draw both geom_histogram() and geom_freqpoly() at the same time:
ggplot(mpg, aes(x=cty)) +
geom_histogram() +
geom_freqpoly() +
labs(x = "City Mileages") # Add label for the x axis
We can also make a scatter plot with both x and y:
ggplot(mpg, aes(x=cty, y=hwy)) +
geom_point() + #scatterplot
labs(x = "City Mileages") # Add label for the x axis
If we want to add a linear regression line, we can do the following:
ggplot(mpg, aes(x=cty, y=hwy)) +
geom_point() + #scatter plot
geom_smooth(method = "lm") + # regression line
labs(x = "City Mileages") # Add label for the x axis
# We can add color for different classes
ggplot(mpg, aes(x=cty, y=hwy, color = class)) +
geom_point() + #scatter plot
geom_smooth(method = "lm") + # regression line
labs(x = "City Mileages") # Add label for the x axis
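A plot can also be stored in a variable and saved to a file with ggsave(). This is a sketch: the file name cty_vs_hwy.png and the size are hypothetical, and the file is written to a temporary folder so nothing in your project is overwritten.

```r
library(ggplot2)

# Build the colored scatter plot and keep it in a variable
p_scatter <- ggplot(mpg, aes(x = cty, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "City Mileages")

# Save it as a PNG (width and height are in inches by default)
out_file <- file.path(tempdir(), "cty_vs_hwy.png")
ggsave(out_file, plot = p_scatter, width = 7, height = 5)
```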
8) Communicating results with others
In RStudio, click the green "+" sign at the top left of the menu bar and select R Markdown from the list.
In the New R Markdown window that pops up, you can leave the format set to HTML and click OK.
Then modify the template and click the small Knit button on the top menu bar to generate the HTML file.
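The Knit step can also be run from the console with the rmarkdown package. This is a sketch under assumptions: report.Rmd is a hypothetical file name, and the rmarkdown package must be installed.

```r
# "report.Rmd" is a hypothetical file name; replace it with your own file
rmd_file <- "report.Rmd"

if (file.exists(rmd_file)) {
  # Produces an HTML file next to the .Rmd file
  rmarkdown::render(rmd_file, output_format = "html_document")
} else {
  message("File not found: ", rmd_file)
}
```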