I am trying to join two pandas data frames using two columns:
new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
but got the following error:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()
KeyError: '[B_1, c2]'
Any idea what should be the right way to do this? Thanks!
left_on
and right_on
should be a list of strings, not a string that looks like a list.
left_on=['A_c1','c2']
instead of '[A_c1,c2]'
. It needs a list of string, as @root said. Similarly right_on = ['B_c1','c2']
.
A_df
be named starting with a prefix 'A_', and the columns from B_df
named B_...
. It's totally unnecessary, and it makes basic operations like joins, merges, groupbys etc. annoying.
Try this
new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
left_on : label or list, or array-like Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns right_on : label or list, or array-like Field names to join on in right DataFrame or vector/list of vectors per left_on docs
the problem here is that by using the apostrophes you are setting the value being passed to be a string, when in fact, as @Shijo stated from the documentation, the function is expecting a label or list, but not a string! If the list contains each of the name of the columns beings passed for both the left and right dataframe, then each column-name must individually be within apostrophes. With what has been stated, we can understand why this is inccorect:
new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
And this is the correct way of using the function:
new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
Another way of doing this:
new_df = A_df.merge(B_df, left_on=['A_c1','c2'], right_on = ['B_c1','c2'], how='left')
you can use below which is short and simple to understand:
merged_data= df1.merge(df2, on=["column1","column2"])
Success story sharing
left_on
andright_on
are samea
andb
, can we useon = ['a', 'b']
?left_on
andright_on
, i.e., the i-th element ofleft_on
will match with the i-th ofright_on
. It would be nice if the docs state that more explicitly.left_on='[A_c1, c2]', right_on='[B_c1, c2]')
will result in three columns:A_c1
,B_c1
andc2
, whereA_c1
andB_c1
are identical columns.