Replace column values based on another dataframe python pandas – better way?

Attention: In latest version of pandas, both answers above doesn’t work anymore:

KSD’s answer will raise error:

df1 = pd.DataFrame([["X",1,1,0],
              ["Y",0,1,0],
              ["Z",0,0,0],
              ["Y",0,0,0]],columns=["Name","Nonprofit","Business", "Education"])    

df2 = pd.DataFrame([["Y",1,1],
              ["Z",1,1]],columns=["Name","Nonprofit", "Education"])   

df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2.loc[df2.Name.isin(df1.Name),['Nonprofit', 'Education']].values

df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']].values

Out[851]:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (3,)

and EdChum’s answer will give us the wrong result:

 df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']]

df1
Out[852]: 
  Name  Nonprofit  Business  Education
0    X        1.0         1        0.0
1    Y        1.0         1        1.0
2    Z        NaN         0        NaN
3    Y        NaN         1        NaN

Well, it will work safely only if values in column ‘Name’ are unique and are sorted in both data frames.

Here is my answer:

Way 1:

df1 = df1.merge(df2,on='Name',how="left")
df1['Nonprofit_y'] = df1['Nonprofit_y'].fillna(df1['Nonprofit_x'])
df1['Business_y'] = df1['Business_y'].fillna(df1['Business_x'])
df1.drop(["Business_x","Nonprofit_x"],inplace=True,axis=1)
df1.rename(columns={'Business_y':'Business','Nonprofit_y':'Nonprofit'},inplace=True)

Way 2:

df1 = df1.set_index('Name')
df2 = df2.set_index('Name')
df1.update(df2)
df1.reset_index(inplace=True)

More guide about update.. The columns names of both data frames need to set index are not necessary same before ‘update’. You could try ‘Name1’ and ‘Name2’. Also, it works even if other unnecessary row in df2, which won’t update df1. In other words, df2 doesn’t need to be the super set of df1.

Example:

df1 = pd.DataFrame([["X",1,1,0],
              ["Y",0,1,0],
              ["Z",0,0,0],
              ["Y",0,1,0]],columns=["Name1","Nonprofit","Business", "Education"])    

df2 = pd.DataFrame([["Y",1,1],
              ["Z",1,1],
              ['U',1,3]],columns=["Name2","Nonprofit", "Education"])   

df1 = df1.set_index('Name1')
df2 = df2.set_index('Name2')


df1.update(df2)

result:

      Nonprofit  Business  Education
Name1                                
X           1.0         1        0.0
Y           1.0         1        1.0
Z           1.0         0        1.0
Y           1.0         1        1.0

Leave a Comment