I am trying to LEFT Join 2 data frames but I do not want join all the variables from the second data set:
As an example, I have dataset 1 (DF1):
Cl Q Sales Date
A 2 30 01/01/2014
A 3 24 02/01/2014
A 1 10 03/01/2014
B 4 10 01/01/2014
B 1 20 02/01/2014
B 3 30 03/01/2014
And I would like to left join dataset 2 (DF2):
Client LO CON
A 12 CA
B 11 US
C 12 UK
D 10 CA
E 15 AUS
F 91 DD
I am able to left join with the following code:
merge(x = DF1, y = DF2, by = "Client", all.x=TRUE) :
Client Q Sales Date LO CON
A 2 30 01/01/2014 12 CA
A 3 24 02/01/2014 12 CA
A 1 10 03/01/2014 12 CA
B 4 10 01/01/2014 11 US
B 1 20 02/01/2014 11 US
B 3 30 03/01/2014 11 US
However, it merges both column LO and CON. I would only like to merge the column LO.
Client Q Sales Date LO
A 2 30 01/01/2014 12
A 3 24 02/01/2014 12
A 1 10 03/01/2014 12
B 4 10 01/01/2014 11
B 1 20 02/01/2014 11
B 3 30 03/01/2014 11
You can do this by subsetting the data you pass into your merge:
merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE)
Or you can simply delete the column after your current merge :)
I think it's a little simpler to use the dplyr
functions select
and left_join
; at least it's easier for me to understand. The join function from dplyr
are made to mimic sql arguments.
library(tidyverse)
DF2 <- DF2 %>%
select(client, LO)
joined_data <- left_join(DF1, DF2, by = "Client")
You don't actually need to use the "by" argument in this case because the columns have the same name.
%<%
or %>%
?
Nothing elegant but this could be another satisfactory answer.
merge(x = DF1, y = DF2, by = "Client", all.x=TRUE)[,c("Client","LO","CON")]
This will be useful especially when you don't need the keys that were used to join the tables in your results.
Alternative solution using left_join()
and select()
from the dplyr
package, without intermediate steps:
DF1 <- DF1 %>%
left_join(DF2, by = "Client") %>%
select(-CON)
For Client column in both tables:
DF_joined <- DF1 %>% left_join(DF2 %>% select(Client,CON))
One-liner with dplyr
DF_joined <- left_join(DF1, select(DF2, -CON), by = "Client")
Success story sharing