In R, I have an element x and a vector v. I want to find the first index of an element in v that is equal to x. I know that one way to do this is which(x == v)[[1]], but that seems excessively inefficient. Is there a more direct way to do it?
For bonus points, is there a function that works if x is a vector? That is, it should return a vector of indices indicating the position of each element of x in v.
which(x == v)[[1]] is not all that inefficient. It is one comparison operator (==) applied to all vector elements, and one extraction of the matching indices (which). That's it. This should not matter unless you are running this function 10,000 times. Other solutions like match and Position may not return as much data as which, but they are not necessarily more efficient.
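To make the comparison concrete, here is a minimal sketch with made-up values for v, x and y (none of them come from the question). It suggests that the three approaches agree when a match exists, and shows how they differ when there is none.

```r
# Hypothetical example data
v <- c(10, 20, 30, 20, 10)
x <- 20

which(x == v)[[1]]                # 2
match(x, v)                       # 2
Position(function(e) e == x, v)   # 2

# No match: which() returns integer(0), so [[1]] fails,
# whereas match() and Position() return NA.
y <- 99
which(y == v)                     # integer(0)
match(y, v)                       # NA
Position(function(e) e == y, v)   # NA
res <- tryCatch(which(y == v)[[1]], error = function(e) "error")
res                               # "error"
```

So beyond speed, the missing-value case may be a reason to prefer match if x is not guaranteed to occur in v.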
which(x == v)[[1]] is not.
The function match works on vectors:
x <- sample(1:10)
x
# [1] 4 5 9 3 8 1 6 10 7 2
match(c(4,8),x)
# [1] 1 5
match returns only the first occurrence of each match, as you requested. It returns the position in the second argument of the values in the first argument.
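Because the argument order is easy to get backwards, a short sketch may help. It reuses the permutation printed above as a fixed vector (so no sampling is involved) and adds one hypothetical absent value, 11, to show the nomatch argument.

```r
x <- c(4, 5, 9, 3, 8, 1, 6, 10, 7, 2)  # the permutation printed above

# Where do 4 and 8 sit in x?
match(c(4, 8), x)
# [1] 1 5

# Swapped arguments: for each element of x, its position in c(4, 8)
match(x, c(4, 8))
# [1]  1 NA NA NA  2 NA NA NA NA NA

# Values that are absent give NA by default; nomatch changes that
match(c(4, 11), x)
# [1]  1 NA
match(c(4, 11), x, nomatch = 0L)
# [1] 1 0
```

The values to look up go first and the vector to search goes second.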
For multiple matching, %in% is the way to go:
x <- sample(1:4,10,replace=TRUE)
x
# [1] 3 4 3 3 2 3 1 1 2 2
which(x %in% c(2,4))
# [1] 2 5 9 10
%in% returns a logical vector as long as the first argument, with a TRUE if that value can be found in the second argument and a FALSE otherwise.
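As a sketch of what happens step by step, the example below reuses the sample printed above as a fixed vector and stores the intermediate logical vector in hits (a name chosen here for illustration). It also checks the equivalence with match that the R help page for %in% describes.

```r
x <- c(3, 4, 3, 3, 2, 3, 1, 1, 2, 2)  # the sample printed above

hits <- x %in% c(2, 4)
hits
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE

which(hits)        # all positions holding a 2 or a 4
# [1]  2  5  9 10

which(hits)[1]     # first such position
# [1] 2

# %in% is documented as equivalent to this match() call
identical(hits, match(x, c(2, 4), nomatch = 0L) > 0L)
# [1] TRUE
```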
The function Position in funprog {base} also does the job. It allows you to pass an arbitrary function, and returns the first or last match.
Position(f, x, right = FALSE, nomatch = NA_integer_)
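A minimal usage sketch, assuming a made-up vector v and a made-up predicate is_big (neither is from the answer above). Find is the sibling function from the same help page that returns the matching value rather than its index.

```r
v <- c(1, 5, 2, 7, 3)          # hypothetical data
is_big <- function(e) e > 4    # hypothetical predicate

Position(is_big, v)                  # first match
# [1] 2
Position(is_big, v, right = TRUE)    # last match
# [1] 4
Position(function(e) e > 100, v)     # no match: returns nomatch
# [1] NA
Find(is_big, v)                      # the value instead of the index
# [1] 5
```

Since the predicate can be any function of a single element, this also covers conditions other than equality, which match cannot express.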
A small note about the efficiency of the above-mentioned methods:
library(microbenchmark)
microbenchmark(
which("Feb" == month.abb)[[1]],
which(month.abb %in% "Feb"))
Unit: nanoseconds
                           expr  min     lq    mean median     uq  max neval
 which("Feb" == month.abb)[[1]]  891  979.0 1098.00   1031 1135.5 3693   100
    which(month.abb %in% "Feb") 1052 1175.5 1339.74   1235 1390.0 7399   100
So, of the two, the faster one is
which("Feb" == month.abb)[[1]]
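The timings above are for a 12-element vector, where a gap of a couple of hundred nanoseconds is likely dominated by call overhead. Here is a rough sketch for repeating the comparison on a longer vector, with match added. The vector length, the seed and the target position are arbitrary assumptions, and no timings are shown because they depend on the machine.

```r
set.seed(1)
v <- sample(1e6)     # hypothetical large vector: a permutation of 1..1e6
x <- v[5e5]          # a value known to sit at position 500000

# All three expressions should agree on the index
stopifnot(which(x == v)[[1]] == 5e5,
          which(v %in% x)[[1]] == 5e5,
          match(x, v) == 5e5)

# Only run the benchmark if the microbenchmark package is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    which(x == v)[[1]],
    which(v %in% x)[[1]],
    match(x, v),
    times = 20
  ))
}
```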
which("Feb" == month.abb) returns 2. Why the [[1]]?
Yes, we can find the index of an element in a vector as follows:
> a <- c(3, 2, -7, -3, 5, 2)
> b <- (a==-7) # this will output a TRUE/FALSE vector
> c <- which(a==-7) # this will give you numerical value
> a
[1] 3 2 -7 -3 5 2
> b
[1] FALSE FALSE TRUE FALSE FALSE FALSE
> c
[1] 3
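This is a simple and reasonably efficient way of finding the index of an element in a vector. Note that which returns every matching index, so take the first element of the result if you only want the first occurrence.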
match matters if you want the index of the first occurrence. For your example, match(x, c(4,8)) gives different results, which is not super obvious at first.
match. It's all explained there. But I added that piece of information.