Sunday, January 20, 2013

Translating Weird R Errors


I love R. I think it's intuitive and clever and overall a great language. But I do get really annoyed sometimes at the completely ridiculous, cryptic error messages it often gives me.  This post will go over some of those seemingly nonsensical errors so you don't have to go crazy trying to find the bug in your code.

1. all arguments must have the same length

To start with, I just make up some quick data:

prob1<-as.data.frame(cbind(c(1,2,3),c(5,4,3)))
colnames(prob1)<-c("Education","Ethnicity")

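And now I just want to do a simple table, but I get this error:

table(prob1$Eduction, prob1$Ethnicity)
Error in table(prob1$Eduction, prob1$Ethnicity) :
  all arguments must have the same length
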
What the heck. I look back at my dataset and make sure that both variables are the same length, which they are. The problem here is that I misspelled "Education": I typed "Eduction", with a missing "a". Instead of telling me that I referenced a variable that doesn't exist, R quietly returns NULL for prob1$Eduction, and since NULL has length 0 while Ethnicity has length 3, it bizarrely tells me to check the length of my variables. Remember: any time you get an error, check to make sure you've spelled everything right.

If I do this, everything works out great:
table(prob1$Education, prob1$Ethnicity)
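
As a minimal sketch of how to catch this kind of typo before table() ever runs, you can look at what R actually returns for the misspelled column and compare the names you think you have against names(prob1). The vector wanted and the name missing_cols are just made up for this illustration:

```r
# Same toy data as above
prob1 <- as.data.frame(cbind(c(1, 2, 3), c(5, 4, 3)))
colnames(prob1) <- c("Education", "Ethnicity")

# A misspelled column silently comes back as NULL, which has length 0
is.null(prob1$Eduction)
length(prob1$Eduction)
length(prob1$Ethnicity)

# Hypothetical list of the columns I believe are in the data
wanted <- c("Eduction", "Ethnicity")

# Any name that is not really a column shows up here
missing_cols <- setdiff(wanted, names(prob1))
missing_cols
```

If missing_cols comes back non-empty, the names in it are the ones to go back and fix.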


2. replacement has 0 rows, data has 3

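A very similar problem, with a very different error message. Let's say I forgot which columns were in my prob1 data and thought I had a Sex indicator in there. So I try to recode it like this:

prob1$Sex_recode<-as.numeric(prob1$Sex)
Error in `$<-.data.frame`(`*tmp*`, "Sex_recode", value = numeric(0)) :
  replacement has 0 rows, data has 3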

This error message is also pretty unhelpful. The syntax is totally correct; the problem is that I just don't have a variable named Sex in my dataset, so prob1$Sex is empty and there is nothing to put into the 3 rows. If I instead recode Education, a variable that exists, everything is fine:

prob1$Educ_recode<-as.numeric(prob1$Education==2)
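
To see where the "0 rows" comes from, here is a rough sketch that takes the recode apart one step at a time. The intermediate names sex, flag and result are only for illustration, and I'm assuming a comparison against 2 like in the Education recode above:

```r
# Same toy data as above
prob1 <- as.data.frame(cbind(c(1, 2, 3), c(5, 4, 3)))
colnames(prob1) <- c("Education", "Ethnicity")

# There is no Sex column, so this is NULL rather than an error
sex <- prob1$Sex

# Comparing NULL to 2 gives an empty logical vector,
# and as.numeric() keeps it empty
flag <- as.numeric(sex == 2)
length(flag)

# The assignment is the step that finally fails: 0 values for 3 rows.
# tryCatch() just captures the message instead of stopping the script.
result <- tryCatch({
  prob1$Sex_recode <- flag
  "assigned"
}, error = function(e) conditionMessage(e))
result
```

So the message is really talking about the right-hand side of the assignment, which turned out to be empty.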


3. undefined columns selected

Ironically, the error we so badly wanted before comes up but for a completely different reason. See if you can find the problem here.  I'll take that same little dataset and I just want to know how many rows there are in which Education is not equal to 1.

So, if I want to know the number of rows of the dataframe prob1, I do:

nrow(prob1)

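and if I want to know how many have a value of Education not equal to 1, I do the following (incorrectly) and get an error:

nrow(prob1[prob1$Education!=1])
Error in `[.data.frame`(prob1, prob1$Education != 1) :
  undefined columns selected
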
Now I check my variable name and I've definitely spelled Education right this time. The problem, actually, is not that I have misspelled a column name, and it isn't really nrow() either; it's the subsetting inside it. When I do,

prob1[prob1$Education!=1]

I'm giving the brackets just one condition and no comma. With a single index, R treats a data frame like a list of columns, so it reads the condition as a choice of columns rather than rows. The condition has three values (FALSE, TRUE, TRUE) but prob1 only has two columns, so the third value points at a column that doesn't exist, hence "undefined columns selected". To filter rows, I have to give both a row part and a column part, separated by a comma. See my post on subsetting for more details on this.

If I do it the following way, all is good, since I'm saying to subset prob1 to only the rows with Education != 1 and all columns:

nrow(prob1[prob1$Education!=1,])

So this error message does make sense in a way, but it's still a bit cryptic in my opinion.
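
Here is a small sketch that puts the two forms side by side. The names keep, msg, n_rows and n_sum are just for illustration, and summing the logical vector at the end is an alternative I'm suggesting here, not something you have to do:

```r
# Same toy data as above
prob1 <- as.data.frame(cbind(c(1, 2, 3), c(5, 4, 3)))
colnames(prob1) <- c("Education", "Ethnicity")

# One TRUE/FALSE value per row
keep <- prob1$Education != 1

# One index, no comma: R reads it as a column selector.
# Three logical values for two columns is what triggers the error.
msg <- tryCatch(prob1[keep], error = function(e) conditionMessage(e))
msg

# Index plus comma: the index picks rows, the empty slot means all columns
n_rows <- nrow(prob1[keep, ])
n_rows

# If all I need is the count, summing the logical vector gives the same answer
n_sum <- sum(keep)
n_sum
```

Both n_rows and n_sum count the rows where Education is not 1, so they should agree.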


11 comments:

  1. No doubt the wording could be made more helpful but as a matter of fact, the messages make perfect sense:

    > prob1<-as.data.frame(cbind(c(1,2,3),c(5,4,3)))
    > colnames(prob1)<-c("Education","Ethnicity")
    > table(prob1$Eduction, prob1$Ethnicity)
    Error in table(prob1$Eduction, prob1$Ethnicity) :
    all arguments must have the same length

    This is literally true because prob1$Eduction has length 0 and prob1$Ethnicity has length 3. You can check it:

    length(prob1$Eduction ) # 0
    length(prob1$Ethnicity ) # 3

    > prob1$Sex_recode <- as.numeric(prob1$Sex)
    Error in `$<-.data.frame`(`*tmp*`, "Sex_recode", value = numeric(0)) :
    replacement has 0 rows, data has 3

    NROW(prob1) # 3
    # so the data has 3 rows
    NROW(prob1$Sex) # 0

    > prob1[prob1$Education!=1]
    Error in `[.data.frame`(prob1, prob1$Education != 1) :
    undefined columns selected

    - This is in my opinion cryptic and makes the least sense of all because on the one hand [.data.frame expects 2 arguments and you give just one (so you've left one necessary argument unspecified) but

    > prob1[prob1$Education!=1, ]

    ... here, too, the second argument is missing and there's only a comma to show you want R to think there are 2 arguments. Here the missing argument says "I want it all", but in the other version the second argument is also (technically) missing and you get nothing.

    BUT when using R one needs subsetting -- ['ing -- a lot, so this should be one of the first things you learn to use and after that, there should be no problems.

  2. In your first example, yes, the problem is that prob1$Eduction returns NULL. If it scares you that R does not error out when you make a typo, or if it also scares you that R allows prob1$E or prob1$Educ as short ways to access prob1$Education, then the solution is to use the less "interactive" / more "programmatic" "[" function.

    prob1["Education"] will work.

    prob1["Educ"] or prob1["Eduction"] will both error out with a meaningful message: "undefined columns selected".

  3. I agree, these are annoying features. However, it is difficult to write good error messages. One programmer's 'cryptic' is another programmer's 'specific'. I've yet to meet the language/software/application that issues perfect error messages.

  4. @Kenn I believe Slawa is saying these error messages are "unhelpful" not "inaccurate" or "don't make sense". The problem in general is that you have to understand the error in order to understand the error message not the other way round.

    That said, I'm sure it's tough for programmers to anticipate needed error messages. I'm not sure we'll ever get to a satisfactory solution here. But what could work is for more posts like this to exist on the internet. So if you get an error message you don't understand, you can google it and find posts like this that explain what's going on.

  5. Thanks for the comments everyone. I really appreciate all the input and I didn't know about the "[" function so thanks! I do agree that the errors make sense if you're quite R literate and experienced, but as a programmer starting out in R (and most people who read my blog I believe are people just starting out in R), it can be frustrating to read an error message and not understand at all what is wrong with your code. Sometimes I get that "PC load letter" feeling :)

    @Tom is definitely right in that hopefully people can google that message and find my post helpful.

  6. To avoid typos you can also use the TAB key (autocomplete) in the following way:

    1. type "prob1$E"
    2. now press TAB
    3. if there is only one variable starting with E, then "autocomplete" will automatically find it. Otherwise you may have to type in a few more letters.

    This also works with file names; start quotes, type in the first few letters of a file name, and press TAB.

    The only problem with autocomplete is that, in my experience, it can sometimes be unreasonably slow on Windows. (But that may depend on the computer.)

  7. Maybe this one is interesting: "Error: cannot allocate vector of size 130.4 Mb"

    Replies
    1. This is interesting - can you set up an example of how you got it?

  8. Just one more hint. Instead of

    table(prob1$Education, prob1$Ethnicity)

    ... you might try

    with(prob1, table(Education, Ethnicity))

    This will give more meaningful error messages and may save you some typing, especially if your data frame has a long name:

    compare ...

    table(my.favourite.data.frame.with.a.longish.name$y, my.favourite.data.frame.with.a.longish.name$x)

    ... with

    with(my.favourite.data.frame.with.a.longish.name, table(y,x))

    But for recoding there are different solutions:

    # prob1$Educ_recode<-as.numeric(prob1$Education==2)
    # could be written as:

    prob1$Educ_recode <- with(prob1, as.numeric(Education==2))

    # or

    prob1 <- within(prob1, Educ_recode <- as.numeric(Education==2))

    Again, not shorter this time but useful in many cases.

    As for the "cannot allocate vector ..." error:

    > 1:7^10
    Error: cannot allocate vector of size 1.1 Gb

    I.e., this basically means "out of memory".

    Replies
    1. Thanks! I have not been in the habit of using with or within, but I will try to incorporate it. I appreciate the comments!
