Is there an R function for finding the index of an element in a vector?

r indexing match vectorization

In R, I have an element x and a vector v. I want to find the first index of an element in v that is equal to x. I know that one way to do this is: which(x == v)[[1]], but that seems excessively inefficient. Is there a more direct way to do it?

For bonus points, is there a function that works if x is a vector? That is, it should return a vector of indices indicating the position of each element of x in v.

As R is optimized to work with vectors, which(x == v)[[1]] is not so very inefficient. It's one comparison (==) applied to all vector elements, followed by one extraction of the TRUE indices (which). That's it. Nothing that should matter, as long as you're not running this function 10,000 times. Other solutions like match and Position may not return as much data as which, but they're not necessarily more efficient.

My question specified that I would prefer a function that was vectorized over x, and which(x == v)[[1]] is not.

So this is what I first see when I ask a question on SO

TylerH

The function match works on vectors:

x <- sample(1:10)
x
# [1]  4  5  9  3  8  1  6 10  7  2
match(c(4,8),x)
# [1] 1 5

match only returns the first encounter of a match, as you requested. It returns the position in the second argument of the values in the first argument.

For multiple matching, %in% is the way to go:

x <- sample(1:4,10,replace=TRUE)
x
# [1] 3 4 3 3 2 3 1 1 2 2
which(x %in% c(2,4))
# [1]  2  5  9 10

%in% returns a logical vector as long as the first argument, with a TRUE if that value can be found in the second argument and a FALSE otherwise.

I think that an example with c(2,3,3) and c(1,2,3,4) with both match and %in% would be more instructive with fewer changes between the examples. match(c(2,3,3), c(1:4)) returns different results from which(c(2,3,3) %in% c(1:4)) without needing a longer first vector and as many changes from example to example. It's also worth noting that they handle non-matches very differently.
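A minimal sketch of the comparison suggested in the comment above, using c(2,3,3) and 1:4. The variable names (x, v, pos, hits and so on) are only for illustration:

```r
x <- c(2, 3, 3)
v <- 1:4

pos <- match(x, v)        # position in v of each element of x
pos
# [1] 2 3 3
hits <- which(x %in% v)   # positions in x whose value occurs somewhere in v
hits
# [1] 1 2 3

# Non-matches: 5 does not occur in v
pos_nm <- match(c(2, 5), v)        # the non-match is reported as NA
pos_nm
# [1]  2 NA
hits_nm <- which(c(2, 5) %in% v)   # the non-match is silently dropped
hits_nm
# [1] 1
```

So match keeps one result per element of its first argument, while which(... %in% ...) only lists the positions that matched.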

@John : that's all true, but that is not what the OP asked. The OP asked, starting from a long vector, to find the first match of elements given in another one. And for completeness, I added that if you are interested in all indices, you'll have to use which(%in%). BTW, there is no reason to delete your answer. It's valid information.

I think it would be helpful to stress that the order of the arguments in match matters if you want the index of the first occurrence. For your example, match(x,c(4,8)) gives different results, which is not super obvious at first.

@goldenoslik It helps if you read the help page of match. It's all explained there. But I added that piece of information.
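To illustrate the point about argument order, here is a small sketch that hard-codes the permutation printed in the answer (sample() would give a different one on each run); the names a and b are only for illustration:

```r
x <- c(4, 5, 9, 3, 8, 1, 6, 10, 7, 2)   # the vector shown in the answer

a <- match(c(4, 8), x)   # where do 4 and 8 sit in x?
a
# [1] 1 5

b <- match(x, c(4, 8))   # where does each element of x sit in c(4, 8)?
b
# [1]  1 NA NA NA  2 NA NA NA NA NA
```

The result always has the length of the first argument, and the positions refer to the second.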

pedroteixeira

The function Position, documented under funprog {base}, also does the job. It lets you pass an arbitrary predicate function and returns the position of the first or last match.

Position(f, x, right = FALSE, nomatch = NA_integer_)
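A short usage sketch, assuming a small example vector v and ad hoc predicate functions (neither comes from the answer itself):

```r
v <- c(3, 2, -7, -3, 5, 2)

first_neg <- Position(function(e) e < 0, v)                # first negative element
first_neg
# [1] 3

last_two <- Position(function(e) e == 2, v, right = TRUE)  # last element equal to 2
last_two
# [1] 6

none <- Position(function(e) e > 100, v)                   # no match gives nomatch
none
# [1] NA
```

Note that Position calls the predicate once per element and returns a single index, so unlike match it is not vectorized over several search values.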

augenbrot

A small note about the efficiency of the above-mentioned methods:

    library(microbenchmark)

    microbenchmark(
      which("Feb" == month.abb)[[1]],
      which(month.abb %in% "Feb"))

    Unit: nanoseconds
     min     lq    mean median     uq  max neval
     891  979.0 1098.00   1031 1135.5 3693   100
    1052 1175.5 1339.74   1235 1390.0 7399   100

So, the best one is

    which("Feb" == month.abb)[[1]]

Your benchmark is based on a length-12 vector and hence not meaningful. Also, in your example which("Feb" == month.abb) returns 2, so why the [[1]]?

@markus The code which("Feb" == month.abb)[[1]] returns 2, and which(month.abb %in% "Feb") also returns 2. Also, it's not clear why using a vector is not meaningful.

It is not about the vector, but about its length. You should generate a vector of appropriate length and then run the benchmark on that. Quoting from the OP's question: "I know that one way to do this is: which(x == v)[[1]], but that seems excessively inefficient."
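A rough sketch of what such a benchmark could look like. The vector length (1e6), the position of the target, and the seed are arbitrary assumptions, and the timing part only runs if the microbenchmark package happens to be installed. No timings are shown here because they depend on the machine:

```r
set.seed(1)            # arbitrary seed, only for reproducibility
v <- sample(1e6)       # hypothetical long vector: a permutation of 1..1e6
x <- v[9e5]            # a value known to sit late in v

# Both approaches should agree on the index before timing them
idx_which <- which(x == v)[[1]]
idx_match <- match(x, v)
stopifnot(idx_which == idx_match)

# Timing is optional and skipped when microbenchmark is not available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    which(x == v)[[1]],
    match(x, v),
    times = 20
  ))
}
```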

Martin Gal

Yes, we can find the index of an element in a vector as follows:

> a <- c(3, 2, -7, -3, 5, 2)
> b <- (a==-7)  # a logical (TRUE/FALSE) vector, one entry per element of a
> c <- which(a==-7) # the numeric index of each TRUE entry
> a
[1]  3  2 -7 -3  5  2
> b
[1] FALSE FALSE  TRUE FALSE FALSE FALSE
> c
[1] 3

This is one of the most efficient methods of finding the index of an element in a vector.
