Wednesday, November 13, 2013

ggplot2: Cheatsheet for Scatterplots


  1. Thanks! Very clear and helpful.

  2. Indeed, very clear and helpful. One question: in your last example, you change both colour and shape to vary with vs. Having colour represent vs, and shape, say, am, is not a problem; but how does one construct a suitable legend?

    1. Thanks! You would change scale_shape_manual and scale_color_manual accordingly. I took out the regression lines because they would be confusing, but here is the plot with color by vs and shape by am, with the legend:

      # color encodes engine type (vs), shape encodes transmission (am)
      g2 <- ggplot(mtc, aes(x = hp, y = mpg)) +
        geom_point(size = 3, aes(color = factor(vs), shape = factor(am))) +
        # one manual scale per aesthetic gives one legend per variable
        scale_color_manual(name = "Engine",
                           labels = c("V-engine", "Straight"),
                           values = c("red", "blue")) +
        scale_shape_manual(name = "Transmission",
                           labels = c("Automatic", "Manual"),
                           values = c(0, 2)) +
        theme_bw() +
        # the element settings must be wrapped in theme()
        theme(axis.title.x = element_text(face = "bold", color = "black", size = 12),
              axis.title.y = element_text(face = "bold", color = "black", size = 12),
              plot.title = element_text(face = "bold", color = "black", size = 12),
              legend.justification = c(1, 1)) +
        labs(x = "Horsepower", y = "Miles per Gallon",
             title = "MPG vs Horsepower by Engine and Transmission")

    2. Some of the plots are not loading (e.g. 4, 6, 8, 10, ...)

    3. Hmm, they look fine to me. Which one specifically doesn't load? Or can you send me a screenshot?

  3. Hi Rokicki. I'm also a public health researcher and admire R very much. It's amazing to learn more about R from your blog. I liked this particular ggplot series on scatterplots. I would like to know how we can put the regression equation onto the plot, for example in your plot
    p3 <- p1 + geom_point(color="red") + geom_smooth(method = "lm", se = TRUE) #add regression line

    Thank you.

    1. Hi Manoj, Great question! I have updated the Scatterplot blog post to answer it. Check out the last section now and I hope it helps! Thanks for reading.
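
For readers following this thread, here is a minimal sketch of one way to place a regression equation on a scatterplot. It is not necessarily the approach used in the updated post. It assumes the built-in mtcars data rather than the post's mtc data frame, and the names fit, eq_label and p_eq, as well as the label position (x = 250, y = 30), are chosen only for illustration.

```r
library(ggplot2)

# Fit the same straight-line model that geom_smooth(method = "lm") draws.
# Assumption: built-in mtcars stands in for the post's `mtc` data frame.
fit <- lm(mpg ~ hp, data = mtcars)
b <- coef(fit)  # b[1] = intercept, b[2] = slope

# Build the label text once, choosing the sign separately so a negative
# slope prints as "a - b x" rather than "a + -b x".
eq_label <- sprintf("y = %.2f %s %.3f x,  R^2 = %.2f",
                    b[1],
                    if (b[2] < 0) "-" else "+",
                    abs(b[2]),
                    summary(fit)$r.squared)

# annotate() draws the text a single time at the given coordinates.
p_eq <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "red") +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  annotate("text", x = 250, y = 30, label = eq_label)

print(p_eq)
```

The equation is computed from lm() directly because geom_smooth() does not expose its fitted coefficients; both fit the same model here, so the printed equation matches the drawn line.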

  4. Thanks for sharing, that was useful. However, annotate() is a better choice than geom_text() here, as you can see from the poor, jagged annotations geom_text() produces, caused by printing the label over and over, once per row of data. See

  5. Thank you very much for taking the initiative to organize this very useful information in a clear and concise way.

    I recently finished MITx's excellent 15.071x MOOC in data analytics, and this post plus your

    complement the visualization unit of that course very well.

    1. Thanks Nick! I'm really glad it's helpful. That class sounds really interesting. I'll check it out.