ChatGPT解决这个技术问题 Extra ChatGPT

pandas: merge (join) two data frames on multiple columns

I am trying to join two pandas data frames using two columns:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

but got the following error:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: '[B_1, c2]'

Any idea what should be the right way to do this? Thanks!

left_on and right_on should be a list of strings, not a string that looks like a list.
Simple typo: left_on=['A_c1','c2'] instead of '[A_c1,c2]'. It needs a list of string, as @root said. Similarly right_on = ['B_c1','c2'].
By the way it's a bad practice to have the columns of dataframe A_df be named starting with a prefix 'A_', and the columns from B_df named B_.... It's totally unnecessary, and it makes basic operations like joins, merges, groupbys etc. annoying.

C
Community

Try this

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns right_on : label or list, or array-like Field names to join on in right DataFrame or vector/list of vectors per left_on docs


If left_on and right_on are same a and b, can we use on = ['a', 'b']?
Yes that is perfectly valid.
For those wondering like me, it will merge according to the sequence/ordering of left_on and right_on, i.e., the i-th element of left_on will match with the i-th of right_on. It would be nice if the docs state that more explicitly.
Note that whenever the join keys have different names, all names will occur as columns in the merged table. E.g., left_on='[A_c1, c2]', right_on='[B_c1, c2]') will result in three columns: A_c1, B_c1 and c2, where A_c1 and B_c1 are identical columns.
C
Celius Stingher

the problem here is that by using the apostrophes you are setting the value being passed to be a string, when in fact, as @Shijo stated from the documentation, the function is expecting a label or list, but not a string! If the list contains each of the name of the columns beings passed for both the left and right dataframe, then each column-name must individually be within apostrophes. With what has been stated, we can understand why this is inccorect:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

And this is the correct way of using the function:

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])

j
john ed

Another way of doing this:

new_df = A_df.merge(B_df, left_on=['A_c1','c2'], right_on = ['B_c1','c2'], how='left')

A
Ali karimi

you can use below which is short and simple to understand:

merged_data= df1.merge(df2, on=["column1","column2"])