How to access the last value in a vector?

r dataframe vector

Suppose I have a vector that is nested one or two levels deep in a data frame. Is there a quick and dirty way to access the last value without using the length() function? Something à la Perl's $# special variable?

So I would like something like:

dat$vec1$vec2[$#]

instead of

dat$vec1$vec2[length(dat$vec1$vec2)]

I am by no means an R expert, but a quick google turned up this: <stat.ucl.ac.be/ISdidactique/Rhelp/library/pastecs/html/…> There appears to be a "last" function.

Related: stackoverflow.com/q/6136613/946850

MATLAB has the notation "myvariable(end-k)" where k is an integer less than the length of the vector that will return the (length(myvariable)-k)th element. That would be nice to have in R.

Jack Bashford

I use the tail function:

tail(vector, n=1)

The nice thing with tail is that it works on dataframes too, unlike the x[length(x)] idiom.

However, x[length(x[,1]),] works on data frames, as does x[dim(x)[1],].

Note that for data frames, length(x) == ncol(x) so that's definitely wrong, and dim(x)[1] can more descriptively be written nrow(x).

@hadley - kpierce8's suggestion of x[length(x[,1]),] is not wrong (note the comma in the x subset), but it's certainly awkward.
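A minimal sketch of the data frame behaviour discussed in these comments. The data frame df and its columns are made up for illustration and do not come from the original posts:

```r
# Hypothetical example data frame
df <- data.frame(id = 1:5, score = c(2.5, 3.1, 4.8, 1.2, 9.9))

# length() on a data frame counts columns, so this selects the last COLUMN
cols_match <- length(df) == ncol(df)   # TRUE
last_col   <- df[length(df)]           # one-column data frame holding "score"

# Either of these selects the last ROW instead
last_row_tail <- tail(df, n = 1)
last_row_idx  <- df[nrow(df), ]
```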

Please note that my benchmark below shows this to be slower than x[length(x)] by a factor of 30 on average for larger vectors!

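This doesn't work if you want to do arithmetic on the elements, though: tail(vector, n=1)-tail(vector, n=2) does not give the difference between the last two elements, because tail(vector, n=2) returns two values.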
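One possible reading of this comment, sketched with a made-up vector v: tail(v, n = 2) returns the last two elements rather than only the second-to-last one, so the subtraction recycles the single value on the left.

```r
v <- c(3, 8, 15)

# tail(v, n = 2) is c(8, 15), so the length-1 left-hand side is recycled
recycled <- tail(v, n = 1) - tail(v, n = 2)   # c(15 - 8, 15 - 15)

# Two ways to get "last minus second-to-last" as a single number
by_index <- v[length(v)] - v[length(v) - 1]
by_diff  <- diff(tail(v, n = 2))
```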

anonymous

To answer this from a performance-oriented rather than an aesthetic point of view, I've put all of the above suggestions through a benchmark. To be precise, I've considered the suggestions

x[length(x)]

mylast(x), where mylast is a C++ function implemented through Rcpp,

tail(x, n=1)

dplyr::last(x)

x[end(x)[1]]

rev(x)[1]

and applied them to random vectors of various sizes (10^3, 10^4, 10^5, 10^6, and 10^7). Before we look at the numbers, I think it should be clear that anything that becomes noticeably slower with greater input size (i.e., anything that is not O(1)) is not an option. Here's the code that I used:

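# Compile a small C++ helper that returns the last element (requires Rcpp)
Rcpp::cppFunction('double mylast(NumericVector x) { int n = x.size(); return x[n-1]; }')
options(width=100)
# Benchmark each candidate on random vectors of increasing size
for (n in c(1e3,1e4,1e5,1e6,1e7)) {
  x <- runif(n)
  print(microbenchmark::microbenchmark(x[length(x)],
                                       mylast(x),
                                       tail(x, n=1),
                                       dplyr::last(x),
                                       x[end(x)[1]],
                                       rev(x)[1]))
}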

It gives me

Unit: nanoseconds
           expr   min      lq     mean  median      uq   max neval
   x[length(x)]   171   291.5   388.91   337.5   390.0  3233   100
      mylast(x)  1291  1832.0  2329.11  2063.0  2276.0 19053   100
 tail(x, n = 1)  7718  9589.5 11236.27 10683.0 12149.0 32711   100
 dplyr::last(x) 16341 19049.5 22080.23 21673.0 23485.5 70047   100
   x[end(x)[1]]  7688 10434.0 13288.05 11889.5 13166.5 78536   100
      rev(x)[1]  7829  8951.5 10995.59  9883.0 10890.0 45763   100
Unit: nanoseconds
           expr   min      lq     mean  median      uq    max neval
   x[length(x)]   204   323.0   475.76   386.5   459.5   6029   100
      mylast(x)  1469  2102.5  2708.50  2462.0  2995.0   9723   100
 tail(x, n = 1)  7671  9504.5 12470.82 10986.5 12748.0  62320   100
 dplyr::last(x) 15703 19933.5 26352.66 22469.5 25356.5 126314   100
   x[end(x)[1]] 13766 18800.5 27137.17 21677.5 26207.5  95982   100
      rev(x)[1] 52785 58624.0 78640.93 60213.0 72778.0 851113   100
Unit: nanoseconds
           expr     min        lq       mean    median        uq     max neval
   x[length(x)]     214     346.0     583.40     529.5     720.0    1512   100
      mylast(x)    1393    2126.0    4872.60    4905.5    7338.0    9806   100
 tail(x, n = 1)    8343   10384.0   19558.05   18121.0   25417.0   69608   100
 dplyr::last(x)   16065   22960.0   36671.13   37212.0   48071.5   75946   100
   x[end(x)[1]]  360176  404965.5  432528.84  424798.0  450996.0  710501   100
      rev(x)[1] 1060547 1140149.0 1189297.38 1180997.5 1225849.0 1383479   100
Unit: nanoseconds
           expr     min        lq        mean    median         uq      max neval
   x[length(x)]     327     584.0     1150.75     996.5     1652.5     3974   100
      mylast(x)    2060    3128.5     7541.51    8899.0     9958.0    16175   100
 tail(x, n = 1)   10484   16936.0    30250.11   34030.0    39355.0    52689   100
 dplyr::last(x)   19133   47444.5    55280.09   61205.5    66312.5   105851   100
   x[end(x)[1]] 1110956 2298408.0  3670360.45 2334753.0  4475915.0 19235341   100
      rev(x)[1] 6536063 7969103.0 11004418.46 9973664.5 12340089.5 28447454   100
Unit: nanoseconds
           expr      min         lq         mean      median          uq       max neval
   x[length(x)]      327      722.0      1644.16      1133.5      2055.5     13724   100
      mylast(x)     1962     3727.5      9578.21      9951.5     12887.5     41773   100
 tail(x, n = 1)     9829    21038.0     36623.67     43710.0     48883.0     66289   100
 dplyr::last(x)    21832    35269.0     60523.40     63726.0     75539.5    200064   100
   x[end(x)[1]] 21008128 23004594.5  37356132.43  30006737.0  47839917.0 105430564   100
      rev(x)[1] 74317382 92985054.0 108618154.55 102328667.5 112443834.0 187925942   100

This immediately rules out anything involving rev or end since they're clearly not O(1) (and the resulting expressions are evaluated in a non-lazy fashion). tail and dplyr::last are not far from being O(1) but they're also considerably slower than mylast(x) and x[length(x)]. Since mylast(x) is slower than x[length(x)] and provides no benefits (rather, it's custom and does not handle an empty vector gracefully), I think the answer is clear: Please use x[length(x)].
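On the remark about empty vectors, here is a small sketch of how the recommended idiom behaves on zero-length input. The vectors are made up, and the behaviour of the Rcpp version on empty input is deliberately not shown:

```r
x <- c(4.2, 7.7, 1.5)
empty <- numeric(0)

last_x     <- x[length(x)]           # 1.5
last_empty <- empty[length(empty)]   # empty[0], i.e. a zero-length numeric vector

# [[ ]] is stricter: it signals an error instead of returning a zero-length vector
strict <- tryCatch(empty[[length(empty)]], error = function(e) "error")
```

Which of the two behaviours is preferable depends on whether an empty input should pass through silently or be reported.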

^ O(1) solutions should be the only acceptable answer in this question.

Thanks for timing all those anon +1!

I tried mylastR=function(x) {x[length(x)]}. It's faster than mylast in Rcpp, but one time slower than writing x[length(x)] directly.

Even with big vectors there is no meaningful difference. Converting to seconds shows that for the longest vector the fastest method takes 0.000001133 seconds and the slowest method takes 0.102328667 seconds (both medians). Nobody will notice that in real life. I would choose readability over benchmarks here.
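The unit conversion in this comment can be reproduced from the medians in the n = 10^7 table. The variable names are chosen here for illustration:

```r
# Medians (in nanoseconds) taken from the n = 1e7 benchmark table above
median_ns <- c(fastest = 1133.5, slowest = 102328667.5)

median_s <- median_ns / 1e9                                   # nanoseconds -> seconds
slowdown <- median_ns[["slowest"]] / median_ns[["fastest"]]   # relative factor
```

The absolute gap is about a tenth of a second per call, while the relative factor is roughly 90,000. Which of the two matters presumably depends on how often the expression is evaluated.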

Gregg Lind

If you're looking for something as nice as Python's x[-1] notation, I think you're out of luck. The standard idiom is

x[length(x)]

but it's easy enough to write a function to do this:

last <- function(x) { return( x[length(x)] ) }

This missing feature in R annoys me too!

Do note that if you want the last few elements of a vector rather than just the last element, there's no need to do anything complex when adapting this solution. R's vectorization allows you to do neat things like get the last four elements of x by doing x[length(x)-0:3].
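A short sketch of that indexing trick on a made-up vector. It assumes x has at least four elements; for shorter vectors the index would contain zero or negative values and needs a length check.

```r
x <- c(10, 20, 30, 40, 50, 60)

last_four_rev <- x[length(x) - 0:3]   # 60 50 40 30 (last element first)
last_four     <- x[length(x) - 3:0]   # 30 40 50 60 (original order)

# The second form gives the same values as tail()
same_as_tail <- identical(last_four, tail(x, n = 4))
```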

Jack Bashford

Combining lindelof's and Gregg Lind's ideas:

last <- function(x) { tail(x, n = 1) }

Working at the prompt, I usually omit the n=, i.e. tail(x, 1).

Unlike last from the pastecs package, head and tail (from utils) work not only on vectors but also on data frames etc., and also can return data "without first/last n elements", e.g.

but.last <- function(x) { head(x, n = -1) }

(Note that you have to use head for this, instead of tail.)
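To illustrate why head is the right choice for "all but the last", here is a small sketch of how negative n behaves in both functions, using a made-up vector:

```r
x <- 1:5

all_but_last  <- head(x, n = -1)   # 1 2 3 4
all_but_first <- tail(x, n = -1)   # 2 3 4 5
only_last     <- tail(x, n = 1)    # 5
```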

Please note that my benchmark below shows this to be slower than x[length(x)] by a factor of 30 on average for larger vectors!

MichaelChirico

The dplyr package includes a function last():

last(mtcars$mpg)
# [1] 21.4

This basically boils down to x[[length(x)]] again.

Similar under the hood, but with this answer you don't have to write your own function last() and store that function somewhere, like several people have done above. You get the improved readability of a function, with the portability of it coming from CRAN so that someone else can run the code.

Can also write as mtcars$mpg %>% last, depending on your preference.

@RichScriven Unfortunately, it's considerably slower than x[[length(x)]], though!

scuerda

I just benchmarked these two approaches on a data frame with 663,552 rows using the following code:

system.time(
  resultsByLevel$subject <- sapply(resultsByLevel$variable, function(x) {
    s <- strsplit(x, ".", fixed=TRUE)[[1]]
    s[length(s)]
  })
  )

   user  system elapsed 
  3.722   0.000   3.594

and

system.time(
  resultsByLevel$subject <- sapply(resultsByLevel$variable, function(x) {
    s <- strsplit(x, ".", fixed=TRUE)[[1]]
    tail(s, n=1)
  })
  )

   user  system elapsed 
 28.174   0.000  27.662

So, assuming you're working with vectors, accessing the length position is significantly faster.

Why not test tail(strsplit(x,".",fixed=T)[[1]],1) for the second case? To me the main advantage of tail is that you can write it in one line. ;)
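For comparison, both variants can be written compactly because strsplit is vectorised over its input. This is only a sketch: variable is a made-up stand-in for resultsByLevel$variable, and no timing claim is implied.

```r
# Hypothetical stand-in for resultsByLevel$variable
variable <- c("math.algebra.q1", "reading.q7", "science")

# Split every string once; the result is a list of character vectors
parts <- strsplit(variable, ".", fixed = TRUE)

# Last component of each split, via indexing and via tail()
by_length <- vapply(parts, function(s) s[length(s)], character(1))
by_tail   <- vapply(parts, function(s) tail(s, n = 1), character(1))
```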

James

Another way is to take the first element of the reversed vector:

rev(dat$vec1$vec2)[1]

This will be expensive though!

Please note that this is an operation whose computational cost is linear in the length of the input; in other words, while O(n), it is not O(1). See also my benchmark below for actual numbers.

@anonymous Unless you use an iterator

@James Right. But in that case, your code also wouldn't work, would it? If by iterator you mean what's provided by the iterators package, then (1) you cannot use [1] to access the first element and (2) while you can apply rev to an iterator, it does not behave as expected: it just treats the iterator object as a list of its members and reverses that.

Akash

I have another method for finding the last element in a vector. Say the vector is a.

> a<-c(1:100,555)
> end(a)      #Gives the end time and cycle position (a is treated as a time series starting at 1)
[1] 101   1
> a[end(a)[1]]   #Gives last element in a vector
[1] 555

There you go!
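A caveat worth sketching: end() comes from the time-series machinery in stats, so its first value is the end time, which matches the last index only when the series starts at time 1 with frequency 1. The quarterly series a_ts below is a made-up example:

```r
a <- c(1:100, 555)

end_plain  <- end(a)            # 101 1 for a plain vector
last_plain <- a[end_plain[1]]   # 555

# Hypothetical quarterly series holding the same values
a_ts   <- ts(a, start = c(2000, 1), frequency = 4)
end_ts <- end(a_ts)             # 2025 1, not a valid index into 101 values

last_ts <- a_ts[end_ts[1]]      # out of range, so NA
last_ok <- a_ts[length(a_ts)]   # 555
```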

MichaelChirico

The data.table package includes a last() function:

library(data.table)
last(c(1:10))
# [1] 10

This basically boils down to x[[length(x)]] again.

Kurt Ludikovsky

What about

> a <- c(1:100,555)
> a[NROW(a)]
[1] 555

I appreciate that NROW does what you would expect on a lot of different data types, but it's essentially the same as a[length(a)] that OP is hoping to avoid. Using OP's example of a nested vector, dat$vec1$vec2[NROW(dat$vec1$vec2)] is still pretty messy.

may be written as nrow

Note: Unlike nrow, NROW treats a vector as a 1-column matrix.
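The difference between the two functions can be seen in a short sketch. The data frame df is made up for illustration:

```r
a  <- c(1:100, 555)
df <- data.frame(x = 1:3, y = c("p", "q", "r"))   # hypothetical data frame

rows_lower <- nrow(a)    # NULL: a plain vector has no dim attribute
rows_upper <- NROW(a)    # 101
df_lower   <- nrow(df)   # 3
df_upper   <- NROW(df)   # 3

last_upper <- a[NROW(a)]   # 555
last_lower <- a[nrow(a)]   # a[NULL], a zero-length vector
```

So the lowercase form is not a drop-in replacement for plain vectors.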

Toby Speight

The xts package provides a last function:

library(xts)
a <- 1:100
last(a)
[1] 100
