Unique combination of all elements from two (or more) vectors

r r-faq

I am trying to create a unique combination of all elements from two vectors of different size in R.

For example, the first vector is

a <- c("ABC", "DEF", "GHI")

and the second one holds dates, currently stored as strings

b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

I need to create a data frame with two columns like this

> data
    a          b
1  ABC 2012-05-01
2  ABC 2012-05-02
3  ABC 2012-05-03
4  ABC 2012-05-04
5  ABC 2012-05-05
6  DEF 2012-05-01
7  DEF 2012-05-02
8  DEF 2012-05-03
9  DEF 2012-05-04
10 DEF 2012-05-05
11 GHI 2012-05-01
12 GHI 2012-05-02
13 GHI 2012-05-03
14 GHI 2012-05-04
15 GHI 2012-05-05

So basically, I am looking for a unique combination by considering all the elements of one vector (a) juxtaposed with all the elements of the second vector (b).

An ideal solution would generalize to more input vectors.

See also: How to generate a matrix of combinations

Gregor Thomas

This may be what you are after:

> expand.grid(a,b)
   Var1       Var2
1   ABC 2012-05-01
2   DEF 2012-05-01
3   GHI 2012-05-01
4   ABC 2012-05-02
5   DEF 2012-05-02
6   GHI 2012-05-02
7   ABC 2012-05-03
8   DEF 2012-05-03
9   GHI 2012-05-03
10  ABC 2012-05-04
11  DEF 2012-05-04
12  GHI 2012-05-04
13  ABC 2012-05-05
14  DEF 2012-05-05
15  GHI 2012-05-05

If the resulting order isn't what you want, you can sort afterwards. If you name the arguments to expand.grid, they will become column names:

df = expand.grid(a = a, b = b)
df[order(df$a), ]

And expand.grid generalizes to any number of input columns.
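As a minimal sketch of that generalization, the snippet below crosses three vectors. The third vector `s` is a hypothetical addition for illustration and is not part of the question; `b` is shortened to two dates to keep the result small.

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02")
s <- c("x", "y")  # hypothetical third vector, not from the question

# One named argument per input vector; the names become column names
g3 <- expand.grid(a = a, b = b, s = s, stringsAsFactors = FALSE)
nrow(g3)  # length(a) * length(b) * length(s) rows

# expand.grid also accepts a single named list, which helps when the
# number of input vectors is not known in advance
inputs <- list(a = a, b = b, s = s)
g3_list <- expand.grid(inputs, stringsAsFactors = FALSE)
```

The list form should be the easier one to use inside a function that receives an arbitrary set of vectors.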

And to sort without needing plyr: result <- expand.grid(a = a, b = b); result <- result[order(result$a, result$b), ]

is someone with more rep than me able to accept this answer?

If order and names should be as in the question: expand.grid(b=b,a=a)[2:1]

Note that the title says "unique combinations". This answer solves the OP's problem, but if the two columns are of the same data type and you apply expand.grid to them, you get every ordered pair (permutations), not unique combinations.
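To illustrate the distinction in the comment above, here is a small sketch that crosses a vector with itself. The vector `x` and the column names `first` and `second` are made up for the example; `combn` is the base R function for unordered selections.

```r
x <- c("A", "B", "C")

# expand.grid returns every ordered pair, including (A, A)
# and both (A, B) and (B, A)
ordered_pairs <- expand.grid(first = x, second = x, stringsAsFactors = FALSE)
nrow(ordered_pairs)  # 9

# combn returns each unordered pair once, one pair per matrix column;
# transpose to get one pair per row
unordered_pairs <- as.data.frame(t(combn(x, 2)), stringsAsFactors = FALSE)
names(unordered_pairs) <- c("first", "second")
nrow(unordered_pairs)  # 3

# The same pairs can be obtained by filtering the full grid
filtered <- ordered_pairs[ordered_pairs$first < ordered_pairs$second, ]
```

For the question as asked (two different vectors), this does not matter, since every row of the grid is already a distinct pairing.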

hypothesis

The tidyr package provides the nice alternative crossing, which works better than the classic expand.grid function because (1) strings are not converted into factors and (2) the sorting is more intuitive:

library(tidyr)

a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

crossing(a, b)

# A tibble: 15 x 2
       a          b
   <chr>      <chr>
 1   ABC 2012-05-01
 2   ABC 2012-05-02
 3   ABC 2012-05-03
 4   ABC 2012-05-04
 5   ABC 2012-05-05
 6   DEF 2012-05-01
 7   DEF 2012-05-02
 8   DEF 2012-05-03
 9   DEF 2012-05-04
10   DEF 2012-05-05
11   GHI 2012-05-01
12   GHI 2012-05-02
13   GHI 2012-05-03
14   GHI 2012-05-04
15   GHI 2012-05-05
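If adding a package is not an option, both points can be handled in base R as well. The sketch below relies on the `stringsAsFactors` argument of expand.grid and on the column-reversal trick from the comments on the first answer; the variable names are only illustrative.

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

# Default behaviour: character inputs come back as factors
g_default <- expand.grid(a = a, b = b)
class(g_default$a)  # "factor"

# Keep them as character vectors instead
g_chr <- expand.grid(a = a, b = b, stringsAsFactors = FALSE)
class(g_chr$a)  # "character"

# Passing b first and then reversing the columns makes a vary slowest,
# which matches the row order shown in the question
g_sorted <- expand.grid(b = b, a = a, stringsAsFactors = FALSE)[2:1]
head(g_sorted, 3)
```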

Jaap

Missing in this r-faq overview is the CJ-function from the data.table-package. Using:

library(data.table)
CJ(a, b, unique = TRUE)

gives:

      a          b
 1: ABC 2012-05-01
 2: ABC 2012-05-02
 3: ABC 2012-05-03
 4: ABC 2012-05-04
 5: ABC 2012-05-05
 6: DEF 2012-05-01
 7: DEF 2012-05-02
 8: DEF 2012-05-03
 9: DEF 2012-05-04
10: DEF 2012-05-05
11: GHI 2012-05-01
12: GHI 2012-05-02
13: GHI 2012-05-03
14: GHI 2012-05-04
15: GHI 2012-05-05

NOTE: since version 1.12.2, CJ autonames the resulting columns.

tmfmnk

Since version 1.0.0, tidyr offers its own version of expand.grid(). It completes the existing family of expand(), nesting(), and crossing() with a low-level function that works with vectors.

When compared to base::expand.grid():

Produces sorted output by varying the first element slowest (expand.grid varies it fastest). Never converts strings to factors. Does not add any additional attributes. Returns a tibble, not a data frame. Can expand any generalised vector, including data frames.

a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

tidyr::expand_grid(a, b)

   a     b         
   <chr> <chr>     
 1 ABC   2012-05-01
 2 ABC   2012-05-02
 3 ABC   2012-05-03
 4 ABC   2012-05-04
 5 ABC   2012-05-05
 6 DEF   2012-05-01
 7 DEF   2012-05-02
 8 DEF   2012-05-03
 9 DEF   2012-05-04
10 DEF   2012-05-05
11 GHI   2012-05-01
12 GHI   2012-05-02
13 GHI   2012-05-03
14 GHI   2012-05-04
15 GHI   2012-05-05

jay.sf

You can use the order function to sort by any number of columns. For your example:

> df <- expand.grid(a,b)
> df
   Var1       Var2
1   ABC 2012-05-01
2   DEF 2012-05-01
3   GHI 2012-05-01
4   ABC 2012-05-02
5   DEF 2012-05-02
6   GHI 2012-05-02
7   ABC 2012-05-03
8   DEF 2012-05-03
9   GHI 2012-05-03
10  ABC 2012-05-04
11  DEF 2012-05-04
12  GHI 2012-05-04
13  ABC 2012-05-05
14  DEF 2012-05-05
15  GHI 2012-05-05

> df[order( df[,1], df[,2] ),] 
   Var1       Var2
1   ABC 2012-05-01
4   ABC 2012-05-02
7   ABC 2012-05-03
10  ABC 2012-05-04
13  ABC 2012-05-05
2   DEF 2012-05-01
5   DEF 2012-05-02
8   DEF 2012-05-03
11  DEF 2012-05-04
14  DEF 2012-05-05
3   GHI 2012-05-01
6   GHI 2012-05-02
9   GHI 2012-05-03
12  GHI 2012-05-04
15  GHI 2012-05-05
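When the number of columns is not fixed, the sort keys need not be spelled out one by one. A possible sketch, assuming every column should be used as a key from left to right, passes all columns to order through do.call (`b` is shortened here to keep the example small):

```r
df <- expand.grid(a = c("ABC", "DEF", "GHI"),
                  b = c("2012-05-01", "2012-05-02"),
                  stringsAsFactors = FALSE)

# order() accepts any number of keys; do.call passes each column as one key.
# unname() prevents a column name from being mistaken for an argument
# of order(), such as "decreasing" or "method".
sorted <- df[do.call(order, unname(as.list(df))), ]
rownames(sorted) <- NULL  # drop the original row numbers
sorted
```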
