Convert a list to a data frame

I have a nested list of data. Its length is 132 and each item is a list of length 20. Is there a quick way to convert this structure into a data frame that has 132 rows and 20 columns of data?

Here is some sample data to work with:

l <- replicate(
  132,
  as.list(sample(letters, 20)),
  simplify = FALSE
)
So you want each list element as a row of data in your data.frame?
@RichieCotton That's not the right example. The question says "each item is a list of length 20", but in your example each item is a one-element list containing a vector of length 20.
Late to the party, but I didn't see anyone mention this, which I thought was very handy (for what I was looking to do).

Marek

With rbind

do.call(rbind.data.frame, your_list)

Edit: The previous version returned a data.frame of lists instead of vectors (as @IanSudbery pointed out in the comments).


Why does this work but rbind(your_list) returns a 1x32 list matrix?
@eykanal do.call passes the elements of your_list as arguments to rbind. It's equivalent to rbind(your_list[[1]], your_list[[2]], your_list[[3]], ....., your_list[[length of your_list]]).
This method has trouble when the list contains NULL elements.
@FrankWANG But this method is not designed for NULL elements. It requires that your_list contain equally sized vectors. NULL has length 0, so it is expected to fail.
This method seems to return the correct object, but on inspecting the object, you'll find that the columns are lists rather than vectors, which can lead to problems down the line if you are not expecting it.
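
To illustrate the do.call point from the comments above, here is a minimal base R sketch. The three-element your_list is made-up sample data (not the list from the question), and plain rbind is used instead of rbind.data.frame to keep the comparison simple.

```r
# Hypothetical list of three equal-length vectors
your_list <- list(c(1, 2), c(3, 4), c(5, 6))

# do.call() spreads the list elements out as separate arguments to rbind() ...
via_do_call <- do.call(rbind, your_list)

# ... so it should match calling rbind() on each element by hand
by_hand <- rbind(your_list[[1]], your_list[[2]], your_list[[3]])
identical(via_do_call, by_hand)

# rbind(your_list) passes the whole list as ONE argument,
# which gives a one-row list matrix instead of a 3 x 2 matrix
dim(rbind(your_list))
```

The one-row list matrix is the same effect the first comment describes, just with 3 columns instead of 32.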
ATpoint

Update July 2020:

As of R 4.0.0, the default for the stringsAsFactors parameter is default.stringsAsFactors(), which in turn returns FALSE.

Assuming your list of lists is called l:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))

In R versions before 4.0.0, the above converts all character columns to factors. To avoid this, add a parameter to the data.frame() call:

df <- data.frame(matrix(unlist(l), nrow=132, byrow=TRUE),stringsAsFactors=FALSE)

Careful here if your data is not all of the same type. Passing through a matrix means that all data will be coerced into a common type. I.e. if you have one column of character data and one column of numeric data the numeric data will be coerced to string by matrix() and then both to factor by data.frame().
@Dave: Works for me... see here r-fiddle.org/#/fiddle?id=y8DW7lqL&version=3
Also take care if you have character data type - data.frame will convert it to factors.
@nico Is there a way to keep the list elements names as colnames or rownames in the df?
This answer is quite old, but maybe this is useful for somebody else (also @N.Varela asked for it): If you want to keep the list element names, try names(df) <- names(unlist(l[1])) after running the command above.
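
A small sketch of the coercion issue raised in the comments above, using a made-up two-row list (l_mixed and fixed are hypothetical names). Passing through matrix() turns the numbers into strings; type.convert() is one possible way to recover the types afterwards, assuming each column holds a single type.

```r
# Hypothetical mixed-type input: one character and one numeric value per row
l_mixed <- list(list("a", 1), list("b", 2))

df <- data.frame(
  matrix(unlist(l_mixed), nrow = length(l_mixed), byrow = TRUE),
  stringsAsFactors = FALSE
)
sapply(df, class)  # every column is character after the matrix() step

# One possible repair: let R re-guess the type of each column
fixed <- df
fixed[] <- lapply(fixed, type.convert, as.is = TRUE)
sapply(fixed, class)
```

This only helps when the strings can be parsed back unambiguously; approaches that never go through a matrix avoid the problem altogether.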
mropa

You can use the plyr package. For example a nested list of the form

l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
      , b = list(var.1 = 4, var.2 = 5, var.3 = 6)
      , c = list(var.1 = 7, var.2 = 8, var.3 = 9)
      , d = list(var.1 = 10, var.2 = 11, var.3 = 12)
      )

now has a length of 4, and each element of l is itself a list of length 3. Now you can run

library(plyr)
df <- ldply(l, data.frame)

and you should get the same result as in the answers by @Marek and @nico.


Great answer. Could you explain a little how that works? Does it simply return a data frame for each list entry?
IMHO the BEST answer. It returns an honest data.frame. All the data types (character, numeric, etc.) are correctly transformed. If the list has different data types, they will all be converted to character with the matrix approach.
The sample provided here isn't the one provided in the question. The result of this answer on the original dataset is incorrect.
Works great for me! And the names of the columns in the resulting Data Frame are set! Tx
plyr is being deprecated in favour of dplyr
Alex Brown

Fixing the sample data so it matches the original description, 'each item is a list of length 20':

mylistlist <- replicate(
  132,
  as.list(sample(letters, 20)),
  simplify = FALSE
)

we can convert it to a data frame like this:

data.frame(t(sapply(mylistlist,c)))

sapply converts it to a matrix. data.frame converts the matrix to a data frame.

resulting in:

https://i.stack.imgur.com/Lv0kn.png


best answer by far! None of the other solutions get the types/column names correct. THANK YOU!
What role are you intending c to play here, one instance of the list's data? Oh wait, c for the concatenate function right? Getting confused with @mnel's usage of c. I also concur with @dchandler, getting the column names right was a valuable need in my use case. Brilliant solution.
That's right: it's the standard c function. From ?c: Combine Values into a Vector or List
Doesn't work with the sample data provided in the question.
Doesn't this generate a data.frame of lists?
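
The last comment can be checked directly. The sketch below reuses the sample data from this answer (the seed and the names df_lists and df_flat are my own additions) and shows one possible way to flatten the columns, assuming every cell holds exactly one value.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
mylistlist <- replicate(132, as.list(sample(letters, 20)), simplify = FALSE)

df_lists <- data.frame(t(sapply(mylistlist, c)))
dim(df_lists)           # rows and columns of the result
is.list(df_lists[[1]])  # checks whether the first column is a list

# One way to flatten: unlist every column (assumes one value per cell)
df_flat <- df_lists
df_flat[] <- lapply(df_flat, unlist)
is.character(df_flat[[1]])
```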
jdeng

Assuming your list is called L:

data.frame(Reduce(rbind, L))

Nice one! There is one difference with @Alex Brown's solution compared to yours, going your route yielded the following warning message for some reason: `Warning message: In data.row.names(row.names, rowsi, i) : some row.names duplicated: 3,4 --> row.names NOT used'
Very good!! Worked for me here: stackoverflow.com/questions/32996321/…
Works well unless the list has only one element in it: data.frame(Reduce(rbind, list(c('col1','col2')))) produces a data frame with 2 rows, 1 column (I expected 1 row 2 columns)
Instead of using the base function Reduce, you can use the purrr function reduce, as in: reduce(L, rbind). This outputs a single data frame and assumes that each data frame in your list (L) is organized the same way (i.e. contains the same number of columns in the same order).
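
The single-element caveat mentioned in the comments can be reproduced with a short sketch. The list named one is made up for illustration, and do.call(rbind, ...) is shown only as one alternative that keeps the row orientation in this case.

```r
# Hypothetical list with a single element
one <- list(c("col1", "col2"))

# Reduce() has nothing to combine, so it returns the bare vector,
# which data.frame() then lays out as a single column
from_reduce <- data.frame(Reduce(rbind, one))
dim(from_reduce)

# do.call() still calls rbind(), so the vector stays a single row
from_do_call <- data.frame(do.call(rbind, one))
dim(from_do_call)
```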
mnel

The package data.table has the function rbindlist which is a superfast implementation of do.call(rbind, list(...)).

It can take a list of lists, data.frames or data.tables as input.

library(data.table)
ll <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
  , b = list(var.1 = 4, var.2 = 5, var.3 = 6)
  , c = list(var.1 = 7, var.2 = 8, var.3 = 9)
  , d = list(var.1 = 10, var.2 = 11, var.3 = 12)
  )

DT <- rbindlist(ll)

This returns a data.table, which inherits from data.frame.

If you really want to convert back to a data.frame, use as.data.frame(DT).


Regarding the last line, setDF now allows for returning to data.frame by reference.
For my list with 30k items, rbindlist worked way faster than ldply
This is indeed super fast!
Matt Dancho

The tibble package has a function enframe() that solves this problem by coercing nested list objects to nested tibble ("tidy" data frame) objects. Here's a brief example from R for Data Science:

x <- list(
    a = 1:5,
    b = 3:4, 
    c = 5:6
) 

df <- enframe(x)
df
#> # A tibble: 3 × 2
#>    name     value
#>   <chr>    <list>
#>    1     a <int [5]>
#>    2     b <int [2]>
#>    3     c <int [2]>

Since your list, l, has several levels of nesting, you can use unlist(recursive = FALSE) to strip the unnecessary nesting and get a single flat list, then pass that to enframe(). I use tidyr::unnest() to unnest the output into a single-level "tidy" data frame with two columns (one for the group name and one for the observations within each group). If you want a wide layout instead, use add_column() to add an index column that repeats the positions 1 to 20 for each of the 132 groups, then spread() the values.

library(tidyverse)

l <- replicate(
    132,
    list(sample(letters, 20)),
    simplify = FALSE
)

l_tib <- l %>% 
    unlist(recursive = FALSE) %>% 
    enframe() %>% 
    unnest()
l_tib
#> # A tibble: 2,640 x 2
#>     name value
#>    <int> <chr>
#> 1      1     d
#> 2      1     z
#> 3      1     l
#> 4      1     b
#> 5      1     i
#> 6      1     j
#> 7      1     g
#> 8      1     w
#> 9      1     r
#> 10     1     p
#> # ... with 2,630 more rows

l_tib_spread <- l_tib %>%
    add_column(index = rep(1:20, 132)) %>%
    spread(key = index, value = value)
l_tib_spread
#> # A tibble: 132 x 21
#>     name   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`
#> *  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1      1     d     z     l     b     i     j     g     w     r     p     y
#> 2      2     w     s     h     r     i     k     d     u     a     f     j
#> 3      3     r     v     q     s     m     u     j     p     f     a     i
#> 4      4     o     y     x     n     p     i     f     m     h     l     t
#> 5      5     p     w     v     d     k     a     l     r     j     q     n
#> 6      6     i     k     w     o     c     n     m     b     v     e     q
#> 7      7     c     d     m     i     u     o     e     z     v     g     p
#> 8      8     f     s     e     o     p     n     k     x     c     z     h
#> 9      9     d     g     o     h     x     i     c     y     t     f     j
#> 10    10     y     r     f     k     d     o     b     u     i     x     s
#> # ... with 122 more rows, and 9 more variables: `12` <chr>, `13` <chr>,
#> #   `14` <chr>, `15` <chr>, `16` <chr>, `17` <chr>, `18` <chr>,
#> #   `19` <chr>, `20` <chr>

Quoting the OP: "Is there a quick way to convert this structure into a data frame that has 132 rows and 20 columns of data?" So maybe you need a spread step or something.
Ah yes, there just needs to be an index column that can be spread. I will update shortly.
sbha

Depending on the structure of your lists there are some tidyverse options that work nicely with unequal length lists:

l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
        , b = list(var.1 = 4, var.2 = 5)
        , c = list(var.1 = 7, var.3 = 9)
        , d = list(var.1 = 10, var.2 = 11, var.3 = NA))

df <- dplyr::bind_rows(l)
df <- purrr::map_df(l, dplyr::bind_rows)
df <- purrr::map_df(l, ~.x)

# all create the same data frame:
# A tibble: 4 x 3
  var.1 var.2 var.3
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5    NA
3     7    NA     9
4    10    11    NA

You can also mix vectors and data frames:

library(dplyr)
bind_rows(
  list(a = 1, b = 2),
  tibble(a = 3:4, b = 5:6),
  c(a = 7)
)

# A tibble: 4 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     5
3     4     6
4     7    NA

This dplyr::bind_rows function works well, even with hard to work with lists originating as JSON. From JSON to a surprisingly clean dataframe. Nice.
@sbha I tried to use df <- purrr::map_df(l, ~.x) but it doesn't seem to work. The error message I get is: Error: Column X2 can't be converted from integer to character
SavedByJESUS

This method uses a tidyverse package (purrr).

The list:

x <- as.list(mtcars)

Converting it into a data frame (a tibble more specifically):

library(purrr)
map_df(x, ~.x)

EDIT: May 30, 2021

This can actually be achieved with the bind_rows() function in dplyr.

x <- as.list(mtcars)
dplyr::bind_rows(x)

# A tibble: 32 x 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ... with 22 more rows

Jack Ryan

Reshape2 yields the same output as the plyr example above:

library(reshape2)
l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
          , b = list(var.1 = 4, var.2 = 5, var.3 = 6)
          , c = list(var.1 = 7, var.2 = 8, var.3 = 9)
          , d = list(var.1 = 10, var.2 = 11, var.3 = 12)
)
l <- melt(l)
dcast(l, L1 ~ L2)

yields:

  L1 var.1 var.2 var.3
1  a     1     2     3
2  b     4     5     6
3  c     7     8     9
4  d    10    11    12

If you were almost out of pixels you could do this all in 1 line w/ recast().


I think reshape2 is being deprecated for dplyr, tidyr, etc
laubbas

Extending @Marek's answer: if you want to avoid strings being turned into factors and efficiency is not a concern, try

do.call(rbind, lapply(your_list, data.frame, stringsAsFactors=FALSE))
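
A short usage sketch of the line above. The two-row your_list with name and value fields is made up for illustration; the point is only that each row becomes its own one-row data frame before binding, so column types survive.

```r
# Hypothetical named rows with mixed types
your_list <- list(
  list(name = "a", value = 1),
  list(name = "b", value = 2)
)

# Each inner list becomes a one-row data frame; rbind() then stacks them
df <- do.call(rbind, lapply(your_list, data.frame, stringsAsFactors = FALSE))
str(df)
```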

RubenLaguna

For the general case of deeply nested lists with 3 or more levels like the ones obtained from a nested JSON:

{
"2015": {
  "spain": {"population": 43, "GNP": 9},
  "sweden": {"population": 7, "GNP": 6}},
"2016": {
  "spain": {"population": 45, "GNP": 10},
  "sweden": {"population": 9, "GNP": 8}}
}

consider the approach of melt() to convert the nested list to a tall format first:

myjson <- jsonlite::fromJSON(file("test.json"))
tall <- reshape2::melt(myjson)[, c("L1", "L2", "L3", "value")]
    L1     L2         L3 value
1 2015  spain population    43
2 2015  spain        GNP     9
3 2015 sweden population     7
4 2015 sweden        GNP     6
5 2016  spain population    45
6 2016  spain        GNP    10
7 2016 sweden population     9
8 2016 sweden        GNP     8

followed by dcast() to go wide again into a tidy dataset where each variable forms a column and each observation forms a row:

wide <- reshape2::dcast(tall, L1+L2~L3) 
# left side of the formula defines the rows/observations and the 
# right side defines the variables/measurements
    L1     L2 GNP population
1 2015  spain   9         43
2 2015 sweden   6          7
3 2016  spain  10         45
4 2016 sweden   8          9

Community

More answers, along with timings in the answer to this question: What is the most efficient way to cast a list as a data frame?

The quickest way that doesn't produce a data frame with lists rather than vectors for columns appears to be (from Martin Morgan's answer):

l <- list(list(col1="a",col2=1),list(col1="b",col2=2))
f = function(x) function(i) unlist(lapply(x, `[[`, i), use.names=FALSE)
as.data.frame(Map(f(l), names(l[[1]])))
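
The curried f above can be hard to read at first. As a rough, unrolled sketch of what I understand it to do (cols, columns and df are names introduced here, not part of the original answer): for each column name, pull that field out of every row and flatten the result into a plain vector.

```r
l <- list(list(col1 = "a", col2 = 1), list(col1 = "b", col2 = 2))

# Column names are taken from the first row (assumes all rows share them)
cols <- names(l[[1]])

# For each column name, extract that field from every row as a plain vector
columns <- lapply(cols, function(nm) {
  unlist(lapply(l, `[[`, nm), use.names = FALSE)
})
names(columns) <- cols

df <- as.data.frame(columns, stringsAsFactors = FALSE)
str(df)
```

Building the data frame column by column is what keeps each column an atomic vector rather than a list.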

user36302

Sometimes your data may be a list of lists of vectors of the same length.

lolov = list(list(c(1,2,3),c(4,5,6)), list(c(7,8,9),c(10,11,12),c(13,14,15)) )

(The inner vectors could also be lists, but I'm simplifying to make this easier to read).

Then you can make the following modification. Remember that you can unlist one level at a time:

lov = unlist(lolov, recursive = FALSE )
> lov
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5 6

[[3]]
[1] 7 8 9

[[4]]
[1] 10 11 12

[[5]]
[1] 13 14 15

Now use your favorite method mentioned in the other answers:

library(plyr)
>ldply(lov)
  V1 V2 V3
1  1  2  3
2  4  5  6
3  7  8  9
4 10 11 12
5 13 14 15

plyr is being deprecated in favour of dplyr
UseR_10085

The following simple command worked for me:

myDf <- as.data.frame(myList)

Reference (Quora answer)

> myList <- list(a = c(1, 2, 3), b = c(4, 5, 6))
> myList
$a
[1] 1 2 3
 
$b
[1] 4 5 6
 
> myDf <- as.data.frame(myList)
  a b
1 1 4
2 2 5
3 3 6
> class(myDf)
[1] "data.frame"

But this will fail if it’s not obvious how to convert the list to a data frame:

> myList <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))
> myDf <- as.data.frame(myList)

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 3, 4

Note: This answer addresses the title of the question and may skip some details of the question body.


A note that on the input from the question this only sort of works. OP asks for 132 rows and 20 columns, but this gives 20 rows and 132 columns.
For your example with different-length input where it fails, it's not clear what the desired result would be...
@Gregor True, but the question title is "R - list to data frame". Many visitors of the question and those who voted it up don't have the exact problem of OP. Based on the question title, they just look for a way to convert list to data frame. I myself had the same problem and the solution I posted solved my problem
Yup, just noting. Not downvoting. It might be nice to note in the answer that it does something similar--but distinctly different than--pretty much all the other answers.
John Karuitha

If your list has elements with the same dimensions, you could use the bind_rows function from the tidyverse.

# Load the tidyverse
library(tidyverse)

# make a list with elements having same dimensions
My_list <- list(a = c(1, 4, 5), b = c(9, 3, 8))

## Bind the rows
My_list %>% bind_rows()

The result is a data frame (a tibble) with three rows and two columns, a and b.


Thank you very much, this is the simplest solution. I tried all other solutions but none worked. Thanks for posting this.
How do you keep each sublist as a column name?
Amit Kohli

This is what finally worked for me:

do.call("rbind", lapply(S1, as.data.frame))


trevi

For a parallel (multicore, multisession, etc.) solution in the purrr family, use:

library(furrr)
plan(multisession) # see below for how to find the most efficient plan()
myTibble <- future_map_dfc(l, ~.x)

Where l is the list.

To benchmark the most efficient plan() you can use:

library(tictoc)
plan(sequential) # reference time
# plan(multisession) # benchmark plan() goes here. See ?plan().
tic()
myTibble <- future_map_dfc(l, ~.x)
toc()

Will C

A short (but perhaps not the fastest) way to do this is to use base R, since a data frame is just a list of equal-length vectors. Thus the conversion between your input list and a 20 x 132 data.frame would be:

df <- data.frame(l)

From there we can transpose it to a 132 x 20 matrix, and convert it back to a data frame:

new_df <- data.frame(t(df))

As a one-liner:

new_df <- data.frame(t(data.frame(l)))

The rownames will be pretty annoying to look at, but you could always rename those with

rownames(new_df) <- 1:nrow(new_df)


Why was this downvoted? I'd like to know so I don't continue to spread misinformation.
I've definitely done this before, using a combination of data.frame and t! I guess the people who downvoted feel there are better ways, particularly those that don't mess up the names.
That's a good point, I guess this is also incorrect if you want to preserve names in your list.
zhan2383

l <- replicate(10, list(sample(letters, 20)))
a <- lapply(l[1:10], data.frame)
do.call("cbind", a)

Mark Miller

Every solution I have found seems to apply only when every object in a list has the same length. I needed to convert a list to a data.frame when the objects in the list were of unequal length. Below is the base R solution I came up with. It is no doubt very inefficient, but it does seem to work.

x1 <- c(2, 13)
x2 <- c(2, 4, 6, 9, 11, 13)
x3 <- c(1, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 9, 9, 10, 11, 11, 12, 13, 13)
my.results <- list(x1, x2, x3)

# identify length of each list
my.lengths <- unlist(lapply(my.results, function (x) { length(unlist(x))}))
my.lengths
#[1]  2  6 20

# create a vector of values in all lists
my.values <- as.numeric(unlist(c(do.call(rbind, lapply(my.results, as.data.frame)))))
my.values
#[1]  2 13  2  4  6  9 11 13  1  1  2  3  3  4  5  5  6  7  7  8  9  9 10 11 11 12 13 13

my.matrix <- matrix(NA, nrow = max(my.lengths), ncol = length(my.lengths))

my.cumsum <- cumsum(my.lengths)

mm <- 1

for(i in 1:length(my.lengths)) {

     my.matrix[1:my.lengths[i],i] <- my.values[mm:my.cumsum[i]]

     mm <- my.cumsum[i]+1

}

my.df <- as.data.frame(my.matrix)
my.df
#   V1 V2 V3
#1   2  2  1
#2  13  4  1
#3  NA  6  2
#4  NA  9  3
#5  NA 11  3
#6  NA 13  4
#7  NA NA  5
#8  NA NA  5
#9  NA NA  6
#10 NA NA  7
#11 NA NA  7
#12 NA NA  8
#13 NA NA  9
#14 NA NA  9
#15 NA NA 10
#16 NA NA 11
#17 NA NA 11
#18 NA NA 12
#19 NA NA 13
#20 NA NA 13
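
For comparison, here is a shorter base R sketch that should give the same shape of result as the loop above. It is not the author's method: it pads each vector with NA using the replacement function `length<-`, and the names n, padded and my.df2 are introduced here for illustration.

```r
x1 <- c(2, 13)
x2 <- c(2, 4, 6, 9, 11, 13)
x3 <- c(1, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 9, 9, 10, 11, 11, 12, 13, 13)
my.results <- list(x1, x2, x3)

# Length of the longest vector
n <- max(lengths(my.results))

# `length<-` extends a vector to length n, filling the tail with NA
padded <- lapply(my.results, `length<-`, n)

# Bind the padded vectors as columns, then convert to a data frame
my.df2 <- as.data.frame(do.call(cbind, padded))
head(my.df2)
```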

Bảo Trần

How about using a map_ function together with a for loop? Here is my solution:

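library(purrr)  # provides map_dfr()

list_to_df <- function(list_to_convert) {
  tmp_data_frame <- data.frame()
  # seq_along() is safe for empty input, unlike 1:length()
  for (i in seq_along(list_to_convert)) {
    # turn each element of the i-th sub-list into rows, then append them
    tmp <- map_dfr(list_to_convert[[i]], data.frame)
    tmp_data_frame <- rbind(tmp_data_frame, tmp)
  }
  return(tmp_data_frame)
}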

where map_dfr converts each list element into a data.frame and rbind then combines them all.

In your case, I guess it would be:

converted_list <- list_to_df(l)

1. The results are wrong. 2. The loop is inefficient. Better to use a nested map: map(list_to_convert, ~map_dfr(., data.frame)), but it is still wrong.
Sebastian

Try collapse::unlist2d (shorthand for 'unlist to data.frame'):

l <- replicate(
  132,
  list(sample(letters, 20)),
  simplify = FALSE
)

library(collapse)
head(unlist2d(l))
  .id.1 .id.2 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1     1     1  e  x  b  d  s  p  a  c  k   z   q   m   u   l   h   n   r   t   o   y
2     2     1  r  t  i  k  m  b  h  n  s   e   p   f   o   c   x   l   g   v   a   j
3     3     1  t  r  v  z  a  u  c  o  w   f   m   b   d   g   p   q   y   e   n   k
4     4     1  x  i  e  p  f  d  q  k  h   b   j   s   z   a   t   v   y   l   m   n
5     5     1  d  z  k  y  a  p  b  h  c   v   f   m   u   l   n   q   e   i   w   j
6     6     1  l  f  s  u  o  v  p  z  q   e   r   c   h   n   a   t   m   k   y   x

head(unlist2d(l, idcols = FALSE))
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1  e  x  b  d  s  p  a  c  k   z   q   m   u   l   h   n   r   t   o   y
2  r  t  i  k  m  b  h  n  s   e   p   f   o   c   x   l   g   v   a   j
3  t  r  v  z  a  u  c  o  w   f   m   b   d   g   p   q   y   e   n   k
4  x  i  e  p  f  d  q  k  h   b   j   s   z   a   t   v   y   l   m   n
5  d  z  k  y  a  p  b  h  c   v   f   m   u   l   n   q   e   i   w   j
6  l  f  s  u  o  v  p  z  q   e   r   c   h   n   a   t   m   k   y   x

Roelof Waaijman

Or you could use the tibble package (from tidyverse):

#create examplelist
l <- replicate(
  132,
  as.list(sample(letters, 20)),
  simplify = FALSE
)

#package tidyverse
library(tidyverse)

#make a dataframe (or use as_tibble)
df <- as_data_frame(l,.name_repair = "unique")




It creates a df with 20 rows and 132 columns, but it should be the other way around.
Dimitrios Zacharatos

I want to suggest this solution as well. Although it looks similar to other solutions, it uses rbind.fill from the plyr package. This is advantageous in situations where a list has missing columns or NA values.

l <- replicate(10,as.list(sample(letters,10)),simplify = FALSE)

res<-data.frame()
for (i in 1:length(l))
  res<-plyr::rbind.fill(res,data.frame(t(unlist(l[i]))))

res

NCC1701

From a different perspective:

install.packages("smotefamily")
library(smotefamily)
library(dplyr)

data_example = sample_generator(5000,ratio = 0.80)
genData = BLSMOTE(data_example[,-3],data_example[,3])
# genData contains many lists. Suppose we want to convert one of them to a data frame.

sentetic=as.data.frame.array(genData$syn_data)
# as.data.frame.array seems to be working.