Pandas join issue: columns overlap but no suffix specified

python join pandas

I have the following data frames:

print(df_a)
     mukey  DI  PI
0   100000  35  14
1  1000005  44  14
2  1000006  44  14
3  1000007  43  13
4  1000008  43  13

print(df_b)
    mukey  niccdcd
0  190236        4
1  190237        6
2  190238        7
3  190239        4
4  190240        7

When I try to join these data frames:

join_df = df_a.join(df_b, on='mukey', how='left')

I get the error:

*** ValueError: columns overlap but no suffix specified: Index([u'mukey'], dtype='object')

Why is this so? The data frames do have common 'mukey' values.

EdChum

Your error on the snippet of data you posted is a little cryptic, in that because there are no common values, the join operation fails because the values don't overlap it requires you to supply a suffix for the left and right hand side:

In [173]:

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
Out[173]:
       mukey_left  DI  PI  mukey_right  niccdcd
index                                          
0          100000  35  14          NaN      NaN
1         1000005  44  14          NaN      NaN
2         1000006  44  14          NaN      NaN
3         1000007  43  13          NaN      NaN
4         1000008  43  13          NaN      NaN

merge works because it doesn't have this restriction:

In [176]:

df_a.merge(df_b, on='mukey', how='left')
Out[176]:
     mukey  DI  PI  niccdcd
0   100000  35  14      NaN
1  1000005  44  14      NaN
2  1000006  44  14      NaN
3  1000007  43  13      NaN
4  1000008  43  13      NaN

Nazim Kerimbekov

The .join() function is using the index of the passed as argument dataset, so you should use set_index or use .merge function instead.

Please find the two examples that should work in your case:

join_df = LS_sgo.join(MSU_pi.set_index('mukey'), on='mukey', how='left')

join_df = df_a.merge(df_b, on='mukey', how='left')

I'd like to add that set_index has to be applied on both sides and then it improves performance considerably over merge.

Reddspark

This error indicates that the two tables have one or more column names that have the same column name.

The error message translates to: "I can see the same column in both tables but you haven't told me to rename either one before bringing them into the same table"

You either want to delete one of the columns before bringing it in from the other on using del df['column name'], or use lsuffix to re-write the original column, or rsuffix to rename the one that is being brought in.

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')

Joe Eifert

The error indicates that the two tables have the 1 or more column names that have the same column name.

Anyone with the same error who doesn't want to provide a suffix can rename the columns instead. Also make sure the index of both DataFrames match in type and value if you don't want to provide the on='mukey' setting.

# rename example
df_a = df_a.rename(columns={'a_old': 'a_new', 'a2_old': 'a2_new'})
# set the index
df_a = df_a.set_index(['mukus'])
df_b = df_b.set_index(['mukus'])

df_a.join(df_b)

user12690524

Mainly join is used exclusively to join based on the index,not on the attribute names,so change the attributes names in two different dataframes,then try to join,they will be joined,else this error is raised

Pandas join issue: columns overlap but no suffix specified

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Contact US