Nathalie Vialaneix and Sébastien Déjean
12 October 2018
Edward Tufte on the 1986 Challenger space shuttle disaster
Ref: Visual Explanations: Images and Quantities, Evidence and Narrative. Chap. 5 deals with the Challenger disaster.
Base graphics (plot works for many different objects)
ggplot2 is part of the tidyverse (www.tidyverse.org), a collection of R packages designed for data science
ggplot2 takes its data as a data.frame
The ggplot2 learning curve is steep but worth the effort
people <- data.frame(weight = c(80, 49, 62, 57),
                     height = c(1.82, 1.58, 1.71, 1.63),
                     gender = c("M", "F", "F", "F"))
people
  weight height gender
1     80   1.82      M
2     49   1.58      F
3     62   1.71      F
4     57   1.63      F
We want to make a scatterplot of weight against height:
\( \Rightarrow \) a plot can be thought of as a mapping of data to geometric objects (point, line, bar…) and their aesthetic attributes (shape, color, size…) \[ \mbox{height} \rightarrow x \qquad \mbox{weight} \rightarrow y \qquad \mbox{gender} \rightarrow \mbox{color} \]
Scales and a coordinate system are also needed to convert data units to physical drawing units
geometric objects + scales and coordinate system \( \rightarrow \) plot
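As a minimal sketch of this mapping, assuming the people data frame defined above and the ggplot2 package, the three variables can be tied to x, y and colour as follows (the object name p_people is only illustrative):

```r
library(ggplot2)

# Same toy data as above
people <- data.frame(weight = c(80, 49, 62, 57),
                     height = c(1.82, 1.58, 1.71, 1.63),
                     gender = c("M", "F", "F", "F"))

# height -> x, weight -> y, gender -> colour; points are the geometric object
p_people <- ggplot(people, aes(x = height, y = weight, colour = gender)) +
  geom_point()
p_people
```

Scales and the coordinate system are not specified here, so ggplot2 falls back on its defaults (continuous x and y scales, Cartesian coordinates).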
Data: a data.frame and nothing else!
Aesthetic mapping: describes how data are mapped to things we can see on the plot, through the function aes().
Geometric object: performs the actual rendering of the plot and controls the type of plot to create (points, line, histogram, boxplot…).
Statistical transformations: transform the data, for instance by summarising it in some manner (sum, density, smooth…).
Position adjustments: apply minor changes to the position of elements (jitter, fill, stack, dodge, identity).
Basic idea: specify different parts of the plot, and add them together using the + operator.
library(ggplot2)
ggplot(data = <DATA>, 
       aes(x = <X AXIS VARIABLE>,
           y = <Y AXIS VARIABLE>, ... ), ...) +
  geom_<TYPE>(aes(size = <SIZE VARIABLE>, ...),
                   data = <DATA>,
                   stat = <FUNCTION>,
                   position = <POSITION>,
                   color = <"COLOR">, ...) +
  scale_<AESTHETIC>_<TYPE>(name = <NAME>,
                   breaks = <WHERE>,
                   labels = <LABELS>, ... ) +
  theme(...) +
  facet_<TYPE>(<FORMULA>)
~54,000 round diamonds from http://www.diamondse.info with carat, colour, clarity, cut, total depth, table, depth, width, height, price
data(diamonds)
dim(diamonds)
[1] 53940    10
head(diamonds)
# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.230 Ideal     E     SI2      61.5   55.   326  3.95  3.98  2.43
2 0.210 Premium   E     SI1      59.8   61.   326  3.89  3.84  2.31
3 0.230 Good      E     VS1      56.9   65.   327  4.05  4.07  2.31
4 0.290 Premium   I     VS2      62.4   58.   334  4.20  4.23  2.63
5 0.310 Good      J     SI2      63.3   58.   335  4.34  4.35  2.75
6 0.240 Very Good J     VVS2     62.8   57.   336  3.94  3.96  2.48
Use a subsample of 1,000 diamonds in the following graphics
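The slides do not show how the subsample was drawn. One possible way, given here as an assumption rather than the authors' code, is to sample row indices with sample(); the seed and the name diamonds_small are hypothetical:

```r
library(ggplot2)   # provides the diamonds dataset
data(diamonds)

set.seed(2018)     # arbitrary seed, only there for reproducibility
# Draw 1,000 row indices without replacement and keep those rows
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]
dim(diamonds_small)
```

Any of the plots that follow can then be built on diamonds_small instead of diamonds to speed up rendering.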
Base graphics histogram example:
hist(diamonds$price, main = "", xlab = "Price", breaks = 50)
ggplot2 graphics histogram example:
ggplot(diamonds, aes(x = price)) + geom_histogram(bins = 50)
ggplot(diamonds, aes(x = price, fill = color)) + 
  geom_histogram(binwidth = 1000) + facet_wrap( ~ cut) + theme_bw()
Bars are filled according to color and the plot is faceted by cut (one panel for each type of cut)
## Create ggplot object, populate it with data
ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +  
## Add layer(s)
  geom_point(alpha = 0.3) +
  geom_smooth() + 
## Scales for dimensions, color palettes
    scale_y_log10() +
## Condition on variables
    facet_grid(~ cut) +
## More options
    ggtitle("First example") + theme_bw()
## Create ggplot object
MyPlot <- ggplot(diamonds)
class(MyPlot)
summary(MyPlot); MyPlot
## Add aesthetics
MyPlot <- MyPlot + aes(x = carat, y = price, colour = cut)
summary(MyPlot)
MyPlot
## Add layer(s)
MyPlot <- MyPlot + geom_point(alpha=0.3)
summary(MyPlot) ; MyPlot  
MyPlot <- MyPlot + geom_smooth()
summary(MyPlot) ; MyPlot
## Scales for dimensions
MyPlot + scale_y_log10()
## Condition on variables
MyPlot + facet_grid(~ cut)
## More options
MyPlot + ggtitle("First example") + theme_bw()
Create a plot object (classes gg and ggplot)
Plot objects can be stored as variables; it's easy to share a plot.
The plot object displays no data until at least one layer is added (recent versions of ggplot2 draw an empty panel)
ggplot2 aesthetic = “something you can see”, set with the aes() function
Examples: position (x, y), colour, fill, shape, size, transparency (alpha)
All geom_XXX require some aesthetics (at least one)
?geom_point
...
Aesthetics
The following aesthetics can be used with geom_point. Aesthetics are mapped 
to variables in the data with the aes function: geom_point(aes(x = var))
x: x position (required)
y: y position (required)
shape: shape of point
colour: border colour
size: size
fill: internal colour
alpha: transparency
All of the following are correct (and equivalent):
ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point()
ggplot(diamonds) +  geom_point(aes(x = carat, y = price, color = cut))
ggplot(diamonds, aes(x = carat, y = price)) + geom_point(aes(color = cut))
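One way to check that the three calls are equivalent is to compare the data that each one computes for its point layer. The sketch below uses layer_data() from ggplot2 on the first 500 rows only, to keep it fast; the names d, plot_a, plot_b, plot_c and cols are illustrative.

```r
library(ggplot2)

d <- diamonds[1:500, ]

plot_a <- ggplot(d, aes(x = carat, y = price, color = cut)) + geom_point()
plot_b <- ggplot(d) + geom_point(aes(x = carat, y = price, color = cut))
plot_c <- ggplot(d, aes(x = carat, y = price)) + geom_point(aes(color = cut))

# layer_data() returns the data frame actually drawn by a layer
data_a <- layer_data(plot_a)
data_b <- layer_data(plot_b)
data_c <- layer_data(plot_c)

# Compare the positions and colours of the points across the three plots
cols <- c("x", "y", "colour")
isTRUE(all.equal(data_a[cols], data_b[cols])) &&
  isTRUE(all.equal(data_a[cols], data_c[cols]))
```

The difference between the three forms only matters once several layers are added: aesthetics given in ggplot() are inherited by every layer, while those given in a geom apply to that layer alone.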
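Aesthetic mapping: a variable is mapped to an aesthetic inside aes()
ggplot(diamonds, aes(x = carat, y = price, color = clarity)) + geom_point()
Setting: an aesthetic is set to a constant value in geom_<TYPE>, outside aes()
ggplot(diamonds, aes(x = carat, y = price)) + geom_point(color = "red")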
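A common slip is to put a constant inside aes(). In the sketch below (object names are illustrative), ggplot2 then treats "red" as a data value with a single level and colours it with its default palette, whereas passing the constant to the geom draws red points.

```r
library(ggplot2)

d <- diamonds[1:200, ]

# Setting: the constant is passed to the geom, points are drawn in red
p_set <- ggplot(d, aes(x = carat, y = price)) + geom_point(color = "red")

# Mapping a constant: "red" becomes a data value, coloured by the default scale
p_map <- ggplot(d, aes(x = carat, y = price)) + geom_point(aes(color = "red"))

unique(layer_data(p_set)$colour)  # the literal colour "red"
unique(layer_data(p_map)$colour)  # a colour from the default palette, not "red"
```

The second plot also gets a legend with a single entry labelled "red", which is usually the visible symptom of this mistake.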
Exercise: use just geom_histogram and geom_point, then improve the result by adding transparency to the points
Tip: the bin width is 0.2
ggplot2 geometric object = the actual marks put on the plot; a plot must have at least one geom
Examples:
geom_point() for scatterplots, dot plots, etc.
geom_line() for time series, geom_smooth() for trend lines (a loess or GAM smoother by default, depending on sample size), etc.
geom_boxplot() and geom_histogram()
geom_bar()
help.search("geom_", package = "ggplot2")
geom_abline       geom_jitter
geom_area         geom_line
geom_bar          geom_linerange
geom_bin2d        geom_path
geom_blank        geom_point
geom_boxplot      geom_pointrange
geom_contour      geom_polygon
geom_crossbar     geom_quantile
geom_density      geom_rect
geom_density2d    geom_ribbon
geom_errorbar     geom_rug
geom_errorbarh    geom_segment
geom_freqpoly     geom_smooth
geom_hex          geom_step
geom_histogram    geom_text
geom_hline        geom_tile
geom_vline        ...
p <- ggplot(diamonds)
## Overall histogram
p + geom_histogram(aes(x = price))
## Composition of each bin
p + geom_histogram(aes(x = price, fill = cut))
## Relative proportions
p + geom_histogram(aes(x = price, fill = cut), position = "fill")
p + geom_density(aes(x = price, fill = cut), alpha = 0.5)
p + geom_boxplot(aes(x = cut, y = price), notch = TRUE)
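About notches: they show an approximate 95% confidence interval around the median (median ± 1.58 × IQR / √n); notches that do not overlap suggest that the medians differ.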
p + geom_boxplot(aes(x = cut, y = price, fill = color))
geom_rect requires xmin, xmax, ymin, ymax
geom_tile requires x, y and optionally handles width and height
p + geom_tile(aes(x = as.numeric(cut), y = as.numeric(color), fill = depth))
/!\ Tiles for the same (cut, color) pair are drawn on top of each other, so only the last value of depth is represented…
geom_raster is a special case of geom_tile with all tiles having the same size
p + geom_raster(aes(x = as.numeric(cut), y = as.numeric(color), fill = depth))
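Since many diamonds share the same cut and color, one option is to summarise depth per cell before drawing the tiles. This is an assumption about what the reader wants to display, not something the slides prescribe. The sketch uses aggregate() from base R; mean_depth and p_tile are illustrative names.

```r
library(ggplot2)

# One row per (cut, color) cell, holding the mean depth of that cell
mean_depth <- aggregate(depth ~ cut + color, data = diamonds, FUN = mean)

# Each tile now represents a single, well-defined value
p_tile <- ggplot(mean_depth, aes(x = cut, y = color, fill = depth)) +
  geom_tile()
p_tile
```

Because cut and color are kept as factors here, the axes show the level names rather than the numeric codes produced by as.numeric().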
ggplot(diamonds, aes(x = color, y = price, fill = color)) + 
  geom_boxplot(outlier.size = 0) +
  geom_point(aes(fill = color), alpha = 0.1, shape = 21)
ggplot(diamonds, aes(x = color, y = price, fill = color)) + 
  geom_boxplot(outlier.size = 0) +
  geom_point(aes(fill = color), alpha = 0.1, shape = 21,
             position = position_jitter(width = .3))
ggplot(diamonds, aes(x = reorder(color, price), y = price, fill = color)) + 
  geom_boxplot(outlier.size = 0) +
  geom_point(aes(fill = color), alpha = 0.1, shape = 21,
             position = position_jitter(width = .4))
reorder is not a ggplot2 function. It treats its first argument as a categorical variable (color) and reorders its levels based on the values of a second variable (price). The third argument (FUN, default is mean) is the function applied to price for each level of color.
ggplot(diamonds, aes(x = reorder(color, price), y = price, fill = color)) + 
  geom_violin() +
  geom_point(aes(fill = color), alpha = 0.5, shape = 21,
             position = position_jitter(width = .4))
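The behaviour of reorder described above can be checked on a toy example in base R, without any plotting. The vectors grp and val are made up for illustration; they are chosen so that the group means and medians rank differently.

```r
# Toy data: three groups whose means and medians rank differently
grp <- factor(rep(c("a", "b", "c"), each = 3))
val <- c(1, 1, 10,   # a: mean 4, median 1
         3, 3, 3,    # b: mean 3, median 3
         2, 2, 2)    # c: mean 2, median 2

levels(grp)                               # "a" "b" "c" (alphabetical)
levels(reorder(grp, val))                 # "c" "b" "a" (increasing mean)
levels(reorder(grp, val, FUN = median))   # "a" "c" "b" (increasing median)
```

In the plots above, passing FUN = median to reorder would therefore sort the boxes by their median price instead of their mean price.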
ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(shape = 21) + geom_smooth() + geom_rug()
Base graphics?
Tip: check the position option of geom_bar
Themes handle non-data plot elements like axis labels, plot background, legend
appearance, …: theme_gray() (default), theme_bw(), 
theme_classic(), theme_linedraw(), …
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + 
  geom_smooth(aes(colour = cut)) + theme_bw()
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + 
  geom_smooth(aes(colour = cut)) + theme_bw() + ylab("price (in USD)") +
  ggtitle("My beautiful plot")
ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot() + scale_fill_discrete(name = "Clarity of diamond",
                                       labels = paste("C", 1:8))
library(RColorBrewer)
ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot() + scale_fill_manual(name = "Clarity of\n diamond",
                                     labels = paste("C", 1:8),
                                     values = brewer.pal(8, "Set2"))
ggplot(diamonds, aes(x = z, y = carat, colour = price)) +
  geom_point() + scale_y_log10() + 
  scale_colour_gradient(low = "grey", high = "pink")
ggplot(diamonds, aes(x = z, y = carat, colour = price)) +
  geom_point() + scale_y_log10() + xlim(0, 10) +
  scale_colour_gradient(low = "grey", high = "pink")
ggplot(diamonds, aes(x = price)) + geom_histogram() + facet_wrap(~ cut)
ggplot(diamonds, aes(x = price)) + geom_histogram(fill = "red") + 
  facet_grid(clarity ~ cut) + theme_dark()
ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot() + 
  theme(legend.text = element_text(size = 5, colour = "red"),
        legend.position = "top", 
        axis.ticks = element_blank(), 
        axis.text.x = element_text(size = 10, angle = 45, face = "bold"))
To plot several ggplot graphics together, use gridExtra:
library(gridExtra)
p1 <- ggplot(diamonds) + geom_point(aes(x = carat, y = price, color = cut))
p2 <- ggplot(diamonds) + geom_density(aes(x = price, fill = cut), alpha = 0.5)
grid.arrange(p1, p2, ncol = 2)
Use ggsave to save a ggplot (uses file name extension to determine file 
type: .ps, .eps, .tex, .pdf, .jpg, .tiff, .png, .bmp, .svg, .wmf)
p <- ggplot(...) + ...
ggsave("...", plot = p, width = 4, height = 4)
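A concrete version of the call above might look like the sketch below. The plot and the temporary file name are illustrative; only ggsave() and its plot, width and height arguments come from the slide.

```r
library(ggplot2)

p_save <- ggplot(diamonds[1:500, ], aes(x = carat, y = price)) + geom_point()

# The ".pdf" extension selects the PDF device;
# width and height are expressed in inches by default
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p_save, width = 4, height = 4)
file.exists(out_file)
```

Passing the plot explicitly through plot = avoids relying on the default, which is the last plot displayed.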
library(ggnetwork); library(network)
data(emon)
ggplot(ggnetwork(emon[[1]], layout = "kamadakawai", arrow.gap = 0.025),
  aes(x, y, xend = xend, yend = yend)) +
  geom_edges(aes(color = Frequency), curvature = 0.1,
  arrow = arrow(length = unit(10, "pt"), type = "open")) +
  geom_nodes(aes(size = Formalization)) +
  scale_color_gradient(low = "grey50", high = "tomato") +
  scale_size_area(breaks = 1:3) + theme_blank()
ggbio = Bioconductor package for ggplots of genomics data http://bioconductor.org/packages/release/bioc/html/ggbio.html