How to replace the values of a DataFrame column with another column's values using groupby.first()?
2020-02-19
I have a df like this:
            Value1  Value2
2008-01-01      -1       4
2008-01-01      -1       5
2008-01-03      -1       6
2008-02-25       0       7
2008-02-26      -1       8
2008-02-27       0       9
2008-03-02       5      10
2008-03-16      -1      11
2008-03-17      -1      12
2009-04-04      -1      13
2009-04-07       0      14
I want to check the first Value1 of each year-month. If it is less than 0, I want to overwrite that row's Value2 with its Value1. How can I do this?
For this example, the result should be:
            Value1  Value2
2008-01-01      -1      -1
2008-01-01      -1       5
2008-01-03      -1       6
2008-02-25       0       7
2008-02-26      -1       8
2008-02-27       0       9
2008-03-02       5      10
2008-03-16      -1      11
2008-03-17      -1      12
2009-04-04      -1      -1
2009-04-07       0      14
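This is because only in 2008-01 and 2009-04 is the first Value1 negative. In 2008-02 and 2008-03 the first Value1 is not negative (0 and 5), so those rows are left unchanged.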
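The per-month check described above is what groupby(...).first() computes. A minimal sketch, assuming the question's table is retyped by hand and grouping by a monthly period (to_period('M')) instead of separate year and month keys:

```python
import pandas as pd

# The question's sample data, retyped by hand and indexed by date
idx = pd.to_datetime([
    "2008-01-01", "2008-01-01", "2008-01-03", "2008-02-25", "2008-02-26",
    "2008-02-27", "2008-03-02", "2008-03-16", "2008-03-17", "2009-04-04",
    "2009-04-07",
])
df = pd.DataFrame(
    {"Value1": [-1, -1, -1, 0, -1, 0, 5, -1, -1, -1, 0],
     "Value2": [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]},
    index=idx,
)

# first() returns one value per calendar month, not one value per row
first_v1 = df.groupby(df.index.to_period("M"))["Value1"].first()
print(first_v1)

# months whose first Value1 is negative
neg_months = first_v1[first_v1 < 0].index.astype(str).tolist()
print(neg_months)
```

Note that the result is indexed by month, so it cannot be used directly as a row mask on df.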
I tried:
g = df.groupby([df.index.year,df.index.month])
if g['value1'].first() < 0:
g['value1'].first() = g['value2'].first()
It doesn't seem to work. Thanks.
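The attempt above cannot run as written: assigning to a function call (g['value1'].first() = ...) is a SyntaxError, and even the if test would fail because first() returns a whole Series (one value per month), whose truth value is ambiguous. The columns are also named Value1/Value2, not value1/value2. A minimal sketch of one working alternative, assuming the question's table retyped by hand: it swaps first() for cumcount(), which yields a row-aligned mask marking the first row of each (year, month) group, and then overwrites Value2 as the question asks.

```python
import pandas as pd

# The question's sample data, retyped by hand and indexed by date
idx = pd.to_datetime([
    "2008-01-01", "2008-01-01", "2008-01-03", "2008-02-25", "2008-02-26",
    "2008-02-27", "2008-03-02", "2008-03-16", "2008-03-17", "2009-04-04",
    "2009-04-07",
])
df = pd.DataFrame(
    {"Value1": [-1, -1, -1, 0, -1, 0, 5, -1, -1, -1, 0],
     "Value2": [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]},
    index=idx,
)

# cumcount() numbers the rows inside each (year, month) group 0, 1, 2, ...
# so == 0 marks the first row of every month, one boolean per row
is_first = df.groupby([df.index.year, df.index.month]).cumcount().eq(0)

# positional boolean mask: first row of the month AND negative Value1
mask = (is_first & df["Value1"].lt(0)).to_numpy()

# overwrite Value2 with Value1 on exactly those rows
df.loc[mask, "Value2"] = df.loc[mask, "Value1"].to_numpy()
print(df)
```

Only the first of the two 2008-01-01 rows is changed, because cumcount() works on row order rather than on dates.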
3 answers
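Here is one approach:
import numpy as np
import pandas as pd
df.index = pd.to_datetime(df.index)
# format the dates as year-month strings so it is simpler to find where the month changes
year_month = df.index.strftime('%Y-%m')
# positions of the first row of each year-month; row 0 always starts the first group
rows_first = np.r_[0, np.flatnonzero(year_month[:-1] != year_month[1:]) + 1]
# keep only the first rows whose Value1 is negative
rows_neg = rows_first[df['Value1'].to_numpy()[rows_first] < 0]
# set Value1 to whatever Value2 is on those rows (positional, so repeated dates are safe)
df.iloc[rows_neg, df.columns.get_loc('Value1')] = df['Value2'].to_numpy()[rows_neg]
print(df)
            Value1  Value2
2008-01-01       5       5
2008-01-03      -1       6
2008-02-25       0       7
2008-02-26      -1       8
2008-02-27       0       9
2008-03-02       5      10
2008-03-16      -1      11
2008-03-17      -1      12
2009-04-04      13      13
2009-04-07       0      14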
yatu
2020-02-19
Here is another approach. I hope it works for you:
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': ['2008-01-01', '2008-01-03', '2008-02-25', '2008-02-26', '2008-02-27',
                            '2008-03-02', '2008-03-16', '2008-03-17', '2009-04-04', '2009-04-07'],
                   'value1': [-1, -1, 0, -1, 0, 5, -1, -1, -1, 0],
                   'value2': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]})
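df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
# first day present in each (year, month) group, flagged with a helper column
df_a = df.groupby(['year', 'month'])['day'].first().reset_index()
df_a['aux'] = True
# left merge: aux is True on the first row of each month and NaN elsewhere
df_output = df.merge(df_a, how='left', on=['year', 'month', 'day'])
# on those first rows, replace a negative value1 with value2
df_output['value1'] = np.where((df_output['aux'] == True) & (df['value1'] < 0),
                               df['value2'],
                               df['value1'])
# drop the helper columns
df_output = df_output.drop(columns=['year', 'month', 'day', 'aux'])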
print(df_output)
Output:
        date  value1  value2
0 2008-01-01       5       5
1 2008-01-03      -1       6
2 2008-02-25       0       7
3 2008-02-26      -1       8
4 2008-02-27       0       9
5 2008-03-02       5      10
6 2008-03-16      -1      11
7 2008-03-17      -1      12
8 2009-04-04      13      13
9 2009-04-07       0      14
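The helper aux column acts as a merge indicator. As a hedged variant (not the answerer's code), pandas' merge also accepts indicator=True, which adds a _merge column set to 'both' on matched rows; the sketch below matches on each month's first full date instead of separate year/month/day columns. Like the original, it assumes a month's first date is not repeated, otherwise every row with that date would be flagged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2008-01-01', '2008-01-03', '2008-02-25', '2008-02-26', '2008-02-27',
                            '2008-03-02', '2008-03-16', '2008-03-17', '2009-04-04', '2009-04-07'],
                   'value1': [-1, -1, 0, -1, 0, 5, -1, -1, -1, 0],
                   'value2': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]})
df['date'] = pd.to_datetime(df['date'])

# first date of each calendar month, as a one-column frame to merge against
firsts = (df.groupby(df['date'].dt.to_period('M'))['date']
            .first()
            .reset_index(drop=True)
            .to_frame())

# indicator=True adds '_merge'; 'both' marks rows whose date is a month's first date
out = df.merge(firsts, how='left', on='date', indicator=True)
is_first = out['_merge'].eq('both')

# on those first rows, replace a negative value1 with value2
out['value1'] = np.where(is_first & out['value1'].lt(0), out['value2'], out['value1'])
out = out.drop(columns='_merge')
print(out)
```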
Celius Stingher
2020-02-19
My approach uses groupby().head() to extract the index of each month's first row, and loc to do the update:
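# keep the first row of each calendar month
s = df.groupby(df.index.to_period('M'), as_index=False).head(1)
# overwrite Value1 with Value2 on those first rows whose Value1 is negative;
# the right-hand side Series is aligned on the index labels
df.loc[s[s['Value1'].lt(0)].index, 'Value1'] = df['Value2']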
Output:
            Value1  Value2
2008-01-01       5       5
2008-01-03      -1       6
2008-02-25       0       7
2008-02-26      -1       8
2008-02-27       0       9
2008-03-02       5      10
2008-03-16      -1      11
2008-03-17      -1      12
2009-04-04      13      13
2009-04-07       0      14
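A self-contained version of the same head(1) approach, for checking it end to end. It assumes the ten-row sample that the outputs above were produced from (without the repeated 2008-01-01 row from the question), and drops as_index=False, which should not affect a filtering method such as head().

```python
import pandas as pd

# Ten-row sample matching the outputs above, indexed by date
idx = pd.to_datetime(["2008-01-01", "2008-01-03", "2008-02-25", "2008-02-26", "2008-02-27",
                      "2008-03-02", "2008-03-16", "2008-03-17", "2009-04-04", "2009-04-07"])
df = pd.DataFrame({"Value1": [-1, -1, 0, -1, 0, 5, -1, -1, -1, 0],
                   "Value2": [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]}, index=idx)

# head(1) keeps the first row of every calendar month, in original order
s = df.groupby(df.index.to_period("M")).head(1)

# index labels of those first rows whose Value1 is negative
labels = s.index[s["Value1"].lt(0).to_numpy()]

# overwrite Value1 with Value2 on those labels
df.loc[labels, "Value1"] = df.loc[labels, "Value2"].to_numpy()
print(df)
```

Because the update goes through index labels, it assumes the dates are unique; with the question's own table, where 2008-01-01 appears twice, loc would select both of those rows.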
Quang Hoang
2020-02-19