2019-04-22 19:13:00 by wst
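Data processing: understanding the join types available in pandas helps you implement business logic more accurately. Each is introduced below.
Logic of a left join (`how='left'`): every row from the left side appears in the result. When a left row is matched against the right side, it yields one output row per matching right row; if nothing matches, the right-side fields are left empty (NaN).
Case 1: no match on the right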
In [1]: import pandas as pd
   ...: df1 = pd.DataFrame([{'a':1,'b':2,'c':3},{'a':2,'b':4,'c':3}])
In [2]: df1
Out[2]:
   a  b  c
0  1  2  3
1  2  4  3
In [3]: df2 = pd.DataFrame([{'a':2,'b':20,'c':30},{'a':3,'b':40,'c':30}])
In [4]: df2
Out[4]:
   a   b   c
0  2  20  30
1  3  40  30
In [5]: df = pd.merge(df1,df2,how='left',on='a')
In [6]: df
Out[6]:
   a  b_x  c_x   b_y   c_y
0  1    2    3   NaN   NaN
1  2    4    3  20.0  30.0
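To find out which left rows went unmatched in a left join like the one above, one option is the `indicator` parameter of `pd.merge`. The following is a minimal sketch; the names `marked` and `unmatched` are illustrative and not part of the original session.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 4, 'c': 3}])
df2 = pd.DataFrame([{'a': 2, 'b': 20, 'c': 30}, {'a': 3, 'b': 40, 'c': 30}])

# indicator=True adds a `_merge` column whose values are
# 'left_only', 'right_only' or 'both'.
marked = pd.merge(df1, df2, how='left', on='a', indicator=True)

# Rows of df1 that found no partner in df2.
unmatched = marked[marked['_merge'] == 'left_only']
print(unmatched[['a', 'b_x', 'c_x']])

# The right-side columns hold NaN for unmatched rows, so pandas
# upcasts them from int64 to float64 (hence 20.0 rather than 20).
print(marked['b_y'].dtype)
```

The upcast to float is why the right-side values print as `20.0` and `30.0` in the output above, even though `df2` was built from integers.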
Case 2: multiple matches on the right
In [7]: df2 = pd.DataFrame([{'a':2,'b':20,'c':30},{'a':2,'b':40,'c':30}])
In [8]: df2
Out[8]:
   a   b   c
0  2  20  30
1  2  40  30
In [9]: df = pd.merge(df1,df2,how='left',on='a')
In [10]: df
Out[10]:
   a  b_x  c_x   b_y   c_y
0  1    2    3   NaN   NaN
1  2    4    3  20.0  30.0
2  2    4    3  40.0  30.0
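Because a left row is repeated once per matching right row, duplicate keys on the right can silently inflate the row count. If the business logic expects at most one match per key, a check along the following lines may help. This is a sketch under that assumption; `dup_keys` and `fan_out` are illustrative names.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 4, 'c': 3}])
df2 = pd.DataFrame([{'a': 2, 'b': 20, 'c': 30}, {'a': 2, 'b': 40, 'c': 30}])

# Key values that occur more than once on the right side.
dup_keys = df2.loc[df2['a'].duplicated(keep=False), 'a'].unique().tolist()
print(dup_keys)

# validate='many_to_one' makes merge raise MergeError when the
# right-side keys are not unique, instead of duplicating left rows.
try:
    pd.merge(df1, df2, how='left', on='a', validate='many_to_one')
    fan_out = False
except MergeError:
    fan_out = True
print(fan_out)
```

When duplicates on the right are intended, as in case 2, simply omit `validate`.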
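The following example uses a right join (`how='right'`) and covers both situations: key 2 matches more than one row, and key 5 has no match on the left, so the left-side fields are NaN.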
In [15]: df2 = pd.DataFrame([{'a':2,'b':20,'c':30},{'a':2,'b':40,'c':30},{'a':5,'b':6,'c':7}])
In [16]: df2
Out[16]:
   a   b   c
0  2  20  30
1  2  40  30
2  5   6   7
In [17]: df = pd.merge(df1,df2,how='right',on='a')
In [18]: df
Out[18]:
   a  b_x  c_x  b_y  c_y
0  2  4.0  3.0   20   30
1  2  4.0  3.0   40   30
2  5  NaN  NaN    6    7
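A right join can also be read as a left join with the two frames swapped. The sketch below illustrates this on the same data; the `suffixes` argument and the column reordering are only there to make the two results comparable, and the names `right`, `swapped` and `same` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 4, 'c': 3}])
df2 = pd.DataFrame([{'a': 2, 'b': 20, 'c': 30},
                    {'a': 2, 'b': 40, 'c': 30},
                    {'a': 5, 'b': 6, 'c': 7}])

# Right join: every row of df2 is kept.
right = pd.merge(df1, df2, how='right', on='a')

# Left join with the frames swapped. The suffixes are swapped too,
# so df2's columns still end in _y and df1's in _x.
swapped = pd.merge(df2, df1, how='left', on='a', suffixes=('_y', '_x'))

# Put the columns in the same order before comparing.
swapped = swapped[right.columns]
same = right.equals(swapped)
print(same)
```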
Inner join (`how='inner'`): only the rows that match on both sides are listed.
In [19]: df = pd.merge(df1,df2,how='inner',on='a')
In [20]: df
Out[20]:
   a  b_x  c_x  b_y  c_y
0  2    4    3   20   30
1  2    4    3   40   30
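To compare the three join types side by side, the following sketch runs each `how` value on the same two frames and collects the key column of the result. The dictionary `keys_by_how` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 4, 'c': 3}])
df2 = pd.DataFrame([{'a': 2, 'b': 20, 'c': 30},
                    {'a': 2, 'b': 40, 'c': 30},
                    {'a': 5, 'b': 6, 'c': 7}])

# Collect the key column of the result for each join type.
keys_by_how = {}
for how in ('left', 'right', 'inner'):
    merged = pd.merge(df1, df2, how=how, on='a')
    keys_by_how[how] = merged['a'].tolist()
    print(how, keys_by_how[how])
```

Key 1 exists only in `df1` and key 5 only in `df2`, so the left join keeps key 1, the right join keeps key 5, and the inner join keeps neither.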