# Import ggplot2 package for graphs
install.packages("ggplot2")
library(ggplot2)
## Numeric Distribution
# Default: Histogram
qplot(GDP.per.capita.2017,data= Countries_Sub)
# Add Title and labels:
qplot(GDP.per.capita.2017,data= Countries_Sub,
main = "GDP Per Capita Distribution in 2017",
xlab = "GDP per Capita",
ylab = "Frequency")
# Save plot in global environment
p = qplot(GDP.per.capita.2017,data= Countries_Sub,
main = "SAVED - GDP Per Capita Distribution in 2017",
xlab = "GDP per Capita",
ylab = "Frequency")
# Call object to display the saved plot
p
# Density
qplot(GDP.per.capita.2017,data= Countries_Sub, geom ="density",
main = "GDP Per Capita Distribution in 2017",
xlab = "GDP per Capita",
ylab = "Density")
## Categorical distribution
# Default: Bar plot
qplot(Region,data= Countries_Sub,
main = "Region Distribution",
xlab = "Region",
ylab = "Frequency")
## Numerical against Categorical
# Default: Dot plot
qplot(Region,Under.5.Mortality.Rate.2017,data=Countries_Sub,
main = "Mortality Rate Distribution per Region",
xlab = "Region",
ylab = "Under 5 Mortality Rate")
# Box plot
qplot(Region, Under.5.Mortality.Rate.2017,data=Countries_Sub, geom= "boxplot",
main = "Mortality Rate Distribution per Region",
xlab = "Region",
ylab = "Under 5 Mortality Rate")
# Colored Density plot
qplot( Under.5.Mortality.Rate.2017,fill = Region, data=Countries_Sub, geom= "density", alpha=I(.8),
main = "Mortality Rate Distribution per Region",
xlab = "Under 5 Mortality Rate",
ylab = "Density")
## Categorical against categorical
# Colored bar plot
qplot(Region,fill=IncomeGroup,data=Countries_Sub, geom= "bar",
main = "Income Group Distribution per Region",
xlab = "Region",
ylab = "Frequency")
## Numerical against numerical
# Default: Dot plot
qplot(GDP.per.capita.2017, Under.5.Mortality.Rate.2017,data=Countries_Sub,
main = "Mortality Rate Distribution against Countries' GDP per Capita in 2017",
xlab = "GDP per Capita",
ylab = "Under 5 Mortality Rate")
## Three variables
# Colored Dot plot
qplot(GDP.per.capita.2017, Under.5.Mortality.Rate.2017, color = Region, data=Countries_Sub,
main = "Mortality Rate Distribution against Countries' GDP per Capita in 2017",
xlab = "GDP per Capita",
ylab = "Under 5 Mortality Rate")
## Save last plot as an image
ggsave('Mortality_Rate_GDP_Region.jpeg', width = 9, height = 6)
## Save any plot from global environment as an image
ggsave('GDP_per_Capita_Distrib.jpeg',p, width = 9, height = 6)
The script is explained section by section below.
1. Installing and Loading the Package
# Import ggplot2 package for graphs
install.packages("ggplot2")
library(ggplot2)
install.packages("ggplot2"): This command downloads and installs the ggplot2 package from the CRAN repository onto your computer. It only needs to run once per R installation, so it can be skipped or commented out on later runs of the script. This package is essential for creating the plots that follow.
library(ggplot2): This command loads the installed ggplot2 package into your current R session, making all its functions (like qplot, ggsave, etc.) available for use.
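Because the script calls install.packages() on every run, one common pattern is to install only when the package is missing. The lines below are a minimal sketch of that idea; ensure_installed is a hypothetical helper name chosen for illustration, not a ggplot2 or base R function.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be found.
  requireNamespace(pkg, quietly = TRUE)
}

ensure_installed("ggplot2")
library(ggplot2)
```

requireNamespace() only checks that the package can be found; library() is still needed to attach it to the session.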
2. Analyzing Numeric Distribution (One Variable)
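This section focuses on visualizing the distribution of a single numerical variable: GDP.per.capita.2017. The primary function used is qplot() (Quick Plot), which is a simplified way to use ggplot2. Note that qplot() is deprecated as of ggplot2 3.4.0: it still works but prints a warning, and ggplot() is the recommended interface for new code.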
qplot(GDP.per.capita.2017, data= Countries_Sub)
Creates a histogram showing the frequency of different GDP per capita ranges. A histogram is the default when qplot() receives a single numeric variable.
qplot(..., main = "...", xlab = "...", ylab = "...")
Adds a title (main) and custom labels for the X and Y axes (xlab, ylab) to the histogram for better clarity.
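For comparison, the labelled histogram can also be written with ggplot() directly. This is a sketch only: toy is a made-up stand-in for Countries_Sub (which is not available here), and only its column name matches the script.

```r
library(ggplot2)

# Made-up stand-in for Countries_Sub; only the column name matches the script.
set.seed(42)
toy <- data.frame(GDP.per.capita.2017 = rlnorm(100, meanlog = 9, sdlog = 1))

# ggplot() version of the labelled qplot() histogram.
p_hist <- ggplot(toy, aes(x = GDP.per.capita.2017)) +
  geom_histogram(bins = 30) +  # 30 bins is also the default bin count
  labs(title = "GDP Per Capita Distribution in 2017",
       x = "GDP per Capita",
       y = "Frequency")

p_hist
```

Setting bins (or binwidth) explicitly also silences the message ggplot2 prints when it falls back to 30 bins.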
p = qplot(...) and p
Saves the plot to an object named p in the environment, rather than just displaying it. The command p then displays the saved plot.
qplot(..., geom ="density")
Creates a density plot: a smoothed version of the histogram based on a kernel density estimate. The y-axis shows estimated probability density rather than counts, so the area under the curve is 1.
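To see what "density" means numerically, the sketch below uses base R's density() on made-up values (not the Countries_Sub data) and checks that the area under the estimated curve is close to 1.

```r
# Made-up, right-skewed values standing in for GDP per capita.
set.seed(42)
x <- rlnorm(200, meanlog = 9, sdlog = 1)

# Kernel density estimate on an evenly spaced grid.
d <- density(x)

# Approximate the area under the curve with a simple Riemann sum.
step <- d$x[2] - d$x[1]
area <- sum(d$y) * step
area
```

The individual y values can be very small when the variable has a wide range, which is why the axis is better labelled "Density" than "Frequency".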
3. Analyzing Categorical Distribution (One Variable)
# Default: Bar plot
qplot(Region,data= Countries_Sub, ...)
This creates a bar plot that shows the count (Frequency) of countries belonging to each unique Region. A bar plot is the default when qplot() receives a single categorical variable.
4. Numerical Against Categorical (Two Variables)
This compares the distribution of a numeric variable (Under.5.Mortality.Rate.2017) across different categories (Region).
qplot(Region, Under.5.Mortality.Rate.2017, ...)
Creates a dot plot (or scatter plot) by default, showing individual mortality rate values grouped by region.
qplot(..., geom= "boxplot")
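Creates a box plot (or box-and-whisker plot) for each region. The box spans the first to third quartile with a line at the median. The whiskers extend to the most extreme values within 1.5 times the interquartile range of the box, and any values beyond that are drawn as individual outlier points, so the whisker ends are not necessarily the minimum and maximum.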
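The whisker rule can be reproduced by hand. The sketch below uses made-up rates for a single region and quantile-based hinges, which is how ggplot2 documents its box plot; base R's boxplot() computes hinges slightly differently, so the box edges can differ a little between the two.

```r
# Made-up mortality rates for one region; 100 is a deliberate extreme value.
rates <- c(1, 2, 3, 4, 5, 100)

# Box edges (first and third quartile) and median.
q <- unname(quantile(rates, c(0.25, 0.5, 0.75)))
iqr <- q[3] - q[1]

# Values outside these fences are treated as outliers.
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Whiskers end at the most extreme values still inside the fences.
inside <- rates[rates >= lower_fence & rates <= upper_fence]
whiskers <- range(inside)
outliers <- rates[rates < lower_fence | rates > upper_fence]

q         # 2.25 3.50 4.75
whiskers  # 1 5
outliers  # 100
```

Here the upper whisker stops at 5 even though the maximum is 100, which is then plotted as a separate point.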
qplot(..., fill = Region, geom= "density", ...)
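Creates a density plot of the mortality rate with a separate filled curve for each Region, so their distributions can be compared visually. alpha = I(.8) makes the fills slightly transparent so overlapping curves stay visible; wrapping the value in I() sets it as a constant instead of mapping it to a legend.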
5. Categorical Against Categorical (Two Variables)
# Colored bar plot
qplot(Region, fill=IncomeGroup, data=Countries_Sub, geom= "bar", ...)
This creates a stacked bar plot (stacking is the default position for bars). There is one bar per Region, and each bar is divided into colored segments (fill) by IncomeGroup. This visualizes the composition of income groups within each geographic region.
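If side-by-side bars are easier to compare than stacked segments, the bar position can be changed. The sketch below uses ggplot() with a small made-up data frame (toy_regions is hypothetical; only the column names match the script) to show both variants.

```r
library(ggplot2)

# Made-up data; only the column names match Countries_Sub.
toy_regions <- data.frame(
  Region      = c("Asia", "Asia", "Asia", "Europe", "Europe", "Africa"),
  IncomeGroup = c("High", "Low", "Low", "High", "High", "Low")
)

# Stacked bars: the default, matching the qplot() call in the script.
p_stack <- ggplot(toy_regions, aes(x = Region, fill = IncomeGroup)) +
  geom_bar() +
  labs(title = "Income Group Distribution per Region", y = "Frequency")

# Grouped bars: one bar per income group, placed side by side.
p_dodge <- ggplot(toy_regions, aes(x = Region, fill = IncomeGroup)) +
  geom_bar(position = "dodge") +
  labs(title = "Income Group Distribution per Region", y = "Frequency")

p_stack
p_dodge
```

Stacking makes the total per region easy to read, while dodging makes the individual income groups easier to compare.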
6. Numerical Against Numerical (Two or Three Variables)
This section explores the relationship between two numeric variables, then adds a third, categorical variable through color.
qplot(GDP.per.capita.2017, Under.5.Mortality.Rate.2017, ...)
Creates a scatter plot (dot plot) where each country is a point, allowing you to visually assess the relationship (or correlation) between GDP and Mortality Rate.
qplot(..., color = Region, ...)
Takes the scatter plot from above and adds a third variable (Region) by coloring each point according to its regional group. This helps identify if the relationship between GDP and Mortality Rate changes based on the region.
7. Saving the Plots
## Save last plot as an image
ggsave('Mortality_Rate_GDP_Region.jpeg', width = 9, height = 6)
## Save any plot from global environment as an image
ggsave('GDP_per_Capita_Distrib.jpeg',p, width = 9, height = 6)
ggsave(...): This function saves the specified plot as an image file.
The first ggsave saves the last plot displayed (the Colored Dot plot) to a file named Mortality_Rate_GDP_Region.jpeg.
The second ggsave explicitly saves the plot object p (which was the "GDP Per Capita Distribution" histogram saved earlier) to a file named GDP_per_Capita_Distrib.jpeg.
width and height: These parameters specify the size of the output image in inches.
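The sketch below shows the same ggsave() call with the plot and units spelled out. It uses a made-up plot (p_demo) and writes a PDF to a temporary file so that nothing lands in the working directory; both choices are assumptions made for illustration, and changing the extension to .jpeg would match the script.

```r
library(ggplot2)

# Made-up data and plot, used only to have something to save.
toy_points <- data.frame(x = c(1, 2, 3), y = c(2, 4, 8))
p_demo <- ggplot(toy_points, aes(x = x, y = y)) +
  geom_point()

# Temporary output path; ggsave() picks the device from the file extension.
out_file <- tempfile(fileext = ".pdf")

# Passing the plot explicitly avoids relying on "the last plot displayed".
ggsave(out_file, plot = p_demo, width = 9, height = 6, units = "in")

file.exists(out_file)
```

For raster formats such as JPEG or PNG, the pixel size is width and height multiplied by the dpi argument, which defaults to 300, so a 9 by 6 inch image is 2700 by 1800 pixels.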