Monday, August 29, 2016

One Line Code Challenge: Special Merge



Suppose we have two data frames:

In[29]: df_data
Out[29]: 
   A  B  C  val_0
0  Y  M  1      1
1  Y  F  1      2
2  N  M  1      3
3  N  F  1      4
4  Y  M  2      5
5  Y  F  2      4
6  N  M  2      3
7  N  F  2      2

In[30]: df_rule
Out[30]: 
   A    B   C  val_1
0  Y  NaN NaN    100
1  N    F NaN    200

2  N    M NaN    300

In the context of an ordinary merge on ['A','B','C'], there is no matches because there are NaN in the df_rule.

In the context of a special merge, we allow NaN to be anything. That is to say
(Y,M,1) can match with (Y,M,NaN) (or (Y,NaN,1)etc)

The task is to achieve this special merge.

Solution:
df_rule.groupby(df_rule.apply(lambda x: '-'.join(x[x.notnull()].index.values), axis=1)).apply(lambda df: pd.merge(df, df_data,on=df.dropna(axis=1,how='all').columns.intersection(df_data.columns).tolist(), how='left',suffixes=('_to_remove','')))[df_rule.columns.union(df_data.columns)]




No comments:

Post a Comment