Question

Why doesn't groupby work in this case?

2023-03-18

1048

python pandas dataframe group-by

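I want to do mean imputation on car prices, and the mean should be based on the car model. So I tried to group my data by the car model column and apply the imputation per group, but I get this error:

TypeError: 'DataFrameGroupBy' object does not support item assignment

Here is what I tried: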

grouped_df = df1.groupby('modele')
def replace_zero_or_1000_with_nan(x):
  x[x == 0.0] = pd.np.nan
  x[x == 1000000.0] = pd.np.nan
  return x

# Use apply() to apply the function to the car_price column of each group
grouped_df['prix_millions'] = grouped_df['prix_millions'].apply(replace_zero_or_1000_with_nan)


# Use transform() to apply the mean value of each car model to the NaN values
imputed_df = grouped_df.transform(lambda x: x.fillna(x.mean()))

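I need to turn the values that are 0 and 10000 into nulls and then replace them with a mean imputation based on the car model group, so that the values make sense.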

Answer 1

You can use boolean indexing to replace the values 0 and 1000 with the mean of their group (computed without those values):

# boolean mask, check where values are 0 and 1000
mask = df['prix_millions'].isin([0, 1000])

# compute the mean for each group
mean_per_group = df.loc[~mask].groupby('modele')['prix_millions'].mean()

# replace 0 and 1000 values by the mean of the group
df.loc[mask, 'prix_millions'] = df['modele'].map(mean_per_group)

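Output:

>>> df
   modele  prix_millions
0       1     364.833333
1       1     756.000000
2       1     347.000000
3       1     364.833333
4       1     364.833333
5       1     364.833333
6       1     137.000000
7       1     748.000000
8       1     123.000000
9       1      78.000000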

Details:

>>> mask
0     True
1    False
2    False
3     True
4     True
5     True
6    False
7    False
8    False
9    False
Name: prix_millions, dtype: bool

>>> mean_per_group
modele
1    364.833333
Name: prix_millions, dtype: float64

Input:

>>> df
   modele  prix_millions
0       1              0
1       1            756
2       1            347
3       1           1000
4       1           1000
5       1              0
6       1            137
7       1            748
8       1            123
9       1             78
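The input above contains a single model, so the per-group behaviour is hard to see. Here is a minimal sketch of the same approach with two models. The sample data and the model names "a" and "b" are made up for illustration, and the price column is created as float so the means can be written back without a dtype change:

```python
import pandas as pd

# Hypothetical sample data with two car models (not from the question)
df = pd.DataFrame({
    "modele": ["a", "a", "a", "b", "b", "b"],
    "prix_millions": [0.0, 100.0, 300.0, 1000.0, 40.0, 60.0],
})

# Rows holding a placeholder value (0 or 1000)
mask = df["prix_millions"].isin([0, 1000])

# Mean per model, computed only from the valid rows
mean_per_group = df.loc[~mask].groupby("modele")["prix_millions"].mean()

# Look up each placeholder row's model in the per-model means
df.loc[mask, "prix_millions"] = df.loc[mask, "modele"].map(mean_per_group)

print(df)
```

Row 0 gets the mean of model "a" (200.0) and row 3 gets the mean of model "b" (50.0). If a model had only placeholder values, it would be missing from mean_per_group and map would leave NaN for those rows.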

Answer 2

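The error message "TypeError: 'DataFrameGroupBy' object does not support item assignment" is raised because you are trying to assign a value to the groupby object, which is not allowed.

To fix this, assign the result to the 'prix_millions' column of the DataFrame (df1) instead of to grouped_df. It also helps to use the transform method rather than apply when applying replace_zero_or_1000_with_nan to that column: transform applies the function to each group and returns a result with the same index as the original data, so it can be assigned straight back. Note that pd.np was removed in pandas 2.0, so import numpy directly.

Here is the updated code:

import numpy as np

def replace_zero_or_1000_with_nan(x):
  # Work on a float copy so NaN can be stored and the input is not mutated
  x = x.astype(float)
  x[x == 0.0] = np.nan
  x[x == 1000000.0] = np.nan
  return x

# Use transform() on the price column of each group and assign the result
# to the DataFrame column, not to the GroupBy object
df1['prix_millions'] = df1.groupby('modele')['prix_millions'].transform(replace_zero_or_1000_with_nan)

# Use transform() to fill the NaN values with the mean value of each car model
df1['prix_millions'] = df1.groupby('modele')['prix_millions'].transform(lambda x: x.fillna(x.mean()))

This should let you replace the placeholder values with NaN and then fill those NaN values with the mean of each car model.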
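To see where the TypeError actually comes from, here is a minimal sketch with made-up data. It suggests that the error is raised by the assignment to the GroupBy object, whichever method (apply or transform) is called on the right-hand side, and that the same assignment works when the target is the DataFrame:

```python
import pandas as pd

# Hypothetical sample data (not from the question)
df1 = pd.DataFrame({
    "modele": ["a", "a", "b"],
    "prix_millions": [float("nan"), 5.0, 7.0],
})
grouped_df = df1.groupby("modele")

raised = False
message = ""
try:
    # The right-hand side is computed fine; the assignment target is the problem
    grouped_df["prix_millions"] = grouped_df["prix_millions"].transform(
        lambda x: x.fillna(x.mean())
    )
except TypeError as exc:
    raised = True
    message = str(exc)
    print(message)

# Assigning the same result to the DataFrame column works
df1["prix_millions"] = df1.groupby("modele")["prix_millions"].transform(
    lambda x: x.fillna(x.mean())
)
print(df1)
```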

Answer 3

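This is because the groupby method returns a DataFrameGroupBy, not a DataFrame. In fact, do you really need to group the dataset before replacing the values? I think you can solve this by reordering the operations.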
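One way to read "reordering the operations" is sketched below: replace the placeholder values on the whole column first, and group only when the per-model mean is needed. The sample data is made up, and the placeholder values (0.0 and 1000000.0) are taken from the question's code, so adjust them if your data uses something else:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question)
df1 = pd.DataFrame({
    "modele": ["a", "a", "a", "b", "b"],
    "prix_millions": [0.0, 10.0, 30.0, 1000000.0, 8.0],
})

# Step 1: no groupby needed, replace the placeholders on the whole column
df1["prix_millions"] = df1["prix_millions"].replace([0.0, 1000000.0], np.nan)

# Step 2: group only to get the per-model mean, aligned with the original rows
# (mean skips NaN by default)
group_mean = df1.groupby("modele")["prix_millions"].transform("mean")

# Step 3: fill each NaN with the mean of its own model
df1["prix_millions"] = df1["prix_millions"].fillna(group_mean)

print(df1)
```

Nothing is ever assigned to the GroupBy object here, so the TypeError cannot occur. A model whose prices are all placeholders would have a NaN mean and its rows would stay NaN.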