## R: watch out when comparing vectors using “==”

This question was on the R-help mailing list today:

I have a data frame “test”:

    test<-data.frame(x=c(1,2,3,4,5,6,7,8),y=c(2,3,4,5,6,7,8,9),total=c(7,7,8,8,9,9,10,10))
    test

I have a vector "needed":

    needed<-c(7,9)
    needed

I need the result to look like this:

    1 2 7
    2 3 7
    5 6 9
    6 7 9

When I do the following:

    result<-test[test["total"]==needed,]
    result

I only get unique rows that have 7 or 9 in "total":

    1 2 7
    6 7 9

[...]

The solution is to use `%in%` instead:

test[test$total %in% needed,]
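As a quick check, the sketch below rebuilds the data frame from the question and confirms that `%in%` keeps rows 1, 2, 5 and 6. The names `keep` and `result` are only illustrative.

```r
# Rebuild the example data from the question
test <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                   y = c(2, 3, 4, 5, 6, 7, 8, 9),
                   total = c(7, 7, 8, 8, 9, 9, 10, 10))
needed <- c(7, 9)

# %in% asks, for each element of test$total, whether it occurs anywhere in needed
keep <- test$total %in% needed
keep
# [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE

# Use the logical vector as a row index
result <- test[keep, ]
rownames(result)
# [1] "1" "2" "5" "6"
```

Because `%in%` always returns one logical value per element of its left-hand side, the length of `needed` does not matter here.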

A quick aside: note that `test["total"]` does not give you the column "total" as a vector; it returns a one-column data frame. To get the column itself, index with `test[,"total"]` or, more succinctly for data frames, use `test$total`. Compare `identical(test["total"],test$total)` and `identical(test[,"total"],test$total)`.
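To see the difference between these indexing forms, one can inspect the class of each result. This is a small sketch using the same example data; the `[[` form is an extra I added for comparison and is not part of the original question.

```r
test <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                   y = c(2, 3, 4, 5, 6, 7, 8, 9),
                   total = c(7, 7, 8, 8, 9, 9, 10, 10))

class(test["total"])     # "data.frame": list-style indexing keeps the data frame
class(test[, "total"])   # "numeric": matrix-style indexing drops to a vector
class(test$total)        # "numeric"
class(test[["total"]])   # "numeric": double brackets extract the column itself

identical(test["total"], test$total)     # FALSE
identical(test[, "total"], test$total)   # TRUE
```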

Why doesn't `==` work here? It works just fine if you want to find all rows where the column `total` equals 7:

test[test$total == 7,]

The trick is that to R, `test$total`, `needed` and `7` are all vectors. `==` compares them element by element. Since `7` and `needed` are shorter than `test$total`, they are recycled as often as needed to give the same length as `test$total`. (Note, by the way, that you can of course directly look at the logical index by typing `test$total==needed`.) So what R is doing is comparing this:

    > test$total
    [1] 7 7 8 8 9 9 10 10

to this:

    > rep(needed,length.out=length(test$total))
    [1] 7 9 7 9 7 9 7 9

The two vectors just happen to coincide at `test$total[1]` and `test$total[6]`. The recycling is of course no problem when you look for matches with a single number, because:

    > rep(7,length.out=length(test$total))
    [1] 7 7 7 7 7 7 7 7

will do just fine.
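The recycling described above can be checked directly. The following is a minimal sketch with the two vectors pulled out of the data frame; the name `recycled` is only illustrative.

```r
total  <- c(7, 7, 8, 8, 9, 9, 10, 10)
needed <- c(7, 9)

# Spell out by hand what == does implicitly with the shorter vector
recycled <- rep(needed, length.out = length(total))

# The implicit and the explicit comparison give the same logical vector
identical(total == needed, total == recycled)
# [1] TRUE

# Positions where the element-wise comparison happens to match
which(total == needed)
# [1] 1 6

# Positions where the value occurs anywhere in needed
which(total %in% needed)
# [1] 1 2 5 6
```

Note that R recycles silently here because 8 is a multiple of 2. When the longer length is not a multiple of the shorter one, `==` still recycles but emits a warning.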

To get the help text on `%in%`, by the way, you need to quote it:

?"%in%"