Friday, October 4, 2013

Loops revisited: How to rethink macros when using R


  1. Hi Slawa, thank you for this wonderful post!

    In the first several examples, generating sub data frames could be costly in terms of memory when the data set is very large. I think a better way is to work with formulas. We can write a function to do this:

    genForm = function(dep, main, control){
      # Build "dep ~ main + control" from character vectors
      indep = paste(paste(main, collapse = " + "), "+", paste(control, collapse = " + "))
      form = paste(dep, "~", indep)
      as.formula(form)
    }

    list.control = c("z", "w")
    list.main = c("x")

    summary(lm(genForm("y", "x", list.control), data = mydata))
    summary(glm(genForm("ybin", "x", list.control), data = mydata))

    And regression with a repeatedly used sub data frame can be done this way:

    con1 = expression(x > 2 & z < 3)
    # Evaluate the condition inside mydata so that x and z refer to its columns
    summary(glm(genForm("ybin", "x", list.control), data = mydata[eval(con1, mydata), ]))

    And looping over variables:

    lapply(c("y", "ybin") , function(outcome) summary(lm(genForm(outcome, "x", list.control), data = mydata)) )
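A minimal sketch of the formula-building idea above, using base R's `reformulate()` instead of pasting strings by hand. The data frame `mydata` is simulated purely for illustration, and `gen_form` is a hypothetical helper name that does not appear in the original comment.

```r
# Simulated data for illustration only (not from the original post)
set.seed(1)
n <- 200
mydata <- data.frame(x = rnorm(n), z = rnorm(n), w = rnorm(n))
mydata$y <- 1 + 2 * mydata$x - mydata$z + rnorm(n)
mydata$ybin <- as.integer(mydata$y > 1)

# Hypothetical helper: build "dep ~ main + control" from character vectors
gen_form <- function(dep, main, control) {
  reformulate(c(main, control), response = dep)
}

list.control <- c("z", "w")

# Loop over outcomes; setNames() keeps each outcome name attached to its fit
outcomes <- c("y", "ybin")
fits <- lapply(setNames(outcomes, outcomes), function(outcome) {
  lm(gen_form(outcome, "x", list.control), data = mydata)
})

# Coefficients for each outcome, accessible by name (fits$y, fits$ybin)
lapply(fits, coef)
```

Naming the list elements is a small design choice: it makes it easier to pull out one model later without remembering the order in which the outcomes were looped over.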

    1. Hi Yimeng, thanks for your reply! Those are great suggestions. I try to use the shortest possible code but you're right about needing to think about memory, especially with large data sets, which a lot of people work with. Thanks for your contribution!

  2. Actually, there is an even more elegant way to subset on rows within most lm-like model-fitting functions:

    summary(lm(ybin ~ xvars.sub, data = data.sub, subset = x > 2 & z < 3))
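To illustrate the `subset` argument mentioned above, here is a small sketch on simulated data (the data frame and its values are assumptions made for illustration). It checks that `subset =` gives the same coefficients as indexing the data frame first; the condition is evaluated inside `data`, so the sub data frame does not have to be built by hand.

```r
# Simulated data for illustration only (not from the original post)
set.seed(42)
n <- 500
mydata <- data.frame(x = rnorm(n, mean = 2), z = rnorm(n, mean = 2), w = rnorm(n))
mydata$y <- 0.5 + 1.5 * mydata$x + 0.3 * mydata$z + rnorm(n)

# Option A: let lm() apply the row condition via `subset`
fit_subset <- lm(y ~ x + z + w, data = mydata, subset = x > 2 & z < 3)

# Option B: index the rows first, then fit
fit_indexed <- lm(y ~ x + z + w, data = mydata[mydata$x > 2 & mydata$z < 3, ])

# Both approaches should use the same rows and give the same estimates
all.equal(coef(fit_subset), coef(fit_indexed))
```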
