Question

Python pandas apply does not accept numpy.float64 arguments

2022-09-29

python pandas numpy apply series

I'm running into a problem when passing numpy.float64 variables as arguments to pandas.Series.apply(). Is there a way to force the pandas versions of .mean() and .std() to be used so that pandas accepts them?

Code

def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd

voting_df['pop_estimate'].info()

pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()

voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)

Output

The key line is at the bottom.

<class 'pandas.core.series.Series'>
Int64Index: 3145 entries, 0 to 3144
Series name: pop_estimate
Non-Null Count  Dtype  
--------------  -----  
3145 non-null   float64
dtypes: float64(1)
memory usage: 49.1 KB

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [46], line 7
      4 voting_df['pop_estimate'].info()
      6 pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
----> 7 voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py:4774, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4664 def apply(
   4665     self,
   4666     func: AggFuncType,
   (...)
   4669     **kwargs,
   4670 ) -> DataFrame | Series:
   4671     """
   4672     Invoke function on values of Series.
   4673 
   (...)
   4772     dtype: float64
   4773     """
-> 4774     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1100, in SeriesApply.apply(self)
   1097     return self.apply_str()
   1099 # self.f is Callable
-> 1100 return self.apply_standard()

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1151, in SeriesApply.apply_standard(self)
   1149     else:
   1150         values = obj.astype(object)._values
-> 1151         mapped = lib.map_infer(
   1152             values,
   1153             f,
   1154             convert=self.convert_dtype,
   1155         )
   1157 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1158     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1159     #  See also GH#25959 regarding EA support
   1160     return obj._constructor_expanddim(list(mapped), index=obj.index)

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\lib.pyx:2919, in pandas._libs.lib.map_infer()

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:139, in Apply.__init__.<locals>.f(x)
    138 def f(x):
--> 139     return func(x, *args, **kwargs)

TypeError: Value after * must be an iterable, not numpy.float64

Answer 1

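To supply additional arguments to a function called through pd.Series.apply, pass them either as keyword arguments or as a tuple via the args keyword argument.

From the documentation:

Series.apply(func, convert_dtype=True, args=(), **kwargs)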

Invoke function on values of Series.

Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

Parameters

func : function
Python function or NumPy ufunc to apply.

convert_dtype : bool, default True
Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.

args : tuple
Positional arguments passed to func after the series value.

**kwargs
Additional keyword arguments passed to func.

So, to call it with positional arguments:

voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))

Or, with keyword arguments:

voting_df['pop_estimate'].apply(normalization, col_mean=pop_mean, col_sd=pop_sd)
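A third option, not mentioned in the answer above, is to bind the extra arguments before handing the function to apply, so that apply only ever sees a one-argument callable. The following is a minimal sketch using functools.partial and a lambda; the five-value Series and the name normalize_pop are made up for illustration.

```python
from functools import partial

import pandas as pd


def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd


# Made-up sample data standing in for voting_df['pop_estimate'].
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="pop_estimate")
pop_mean, pop_sd = s.mean(), s.std()

# Pre-bind the column statistics; the result takes a single value.
normalize_pop = partial(normalization, col_mean=pop_mean, col_sd=pop_sd)
via_partial = s.apply(normalize_pop)

# The same idea written as a lambda that closes over the statistics.
via_lambda = s.apply(lambda v: normalization(v, pop_mean, pop_sd))
```

Both forms avoid relying on how apply forwards extra arguments, at the cost of one more name or an inline lambda.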

Answer 2

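This has nothing to do with the data type. You are passing pop_mean and pop_sd as positional arguments, so they are consumed by apply itself rather than forwarded to normalization. As the traceback shows, apply's own parameters after func are convert_dtype and args, so pop_sd ends up as args and the later call func(x, *args, **kwargs) fails because a numpy.float64 cannot be unpacked.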
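The binding can be reproduced without pandas. The sketch below uses a hypothetical stand-in, apply_like, that only mirrors the parameter order from the traceback (func, convert_dtype, args, **kwargs) and reports how the caller's arguments were bound; it is not pandas code, and the numbers are made up.

```python
def apply_like(func, convert_dtype=True, args=(), **kwargs):
    """Hypothetical stand-in mirroring the parameter order of Series.apply."""
    # Report how the arguments were bound instead of applying func.
    return {"convert_dtype": convert_dtype, "args": args, "kwargs": kwargs}


def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd


def call(func, x, bound):
    # Same shape as the inner wrapper in the traceback: func(x, *args, **kwargs)
    return func(x, *bound["args"], **bound["kwargs"])


pop_mean, pop_sd = 10.0, 2.0  # made-up statistics

# Extra positionals bind to apply_like's own parameters:
# convert_dtype=10.0 and args=2.0.
wrong = apply_like(normalization, pop_mean, pop_sd)

# Passing a tuple through args keeps both values together for func.
right = apply_like(normalization, args=(pop_mean, pop_sd))

result = call(normalization, 14.0, right)  # (14.0 - 10.0) / 2.0

try:
    call(normalization, 14.0, wrong)
    error_type = None
except TypeError as exc:
    # A bare float cannot be unpacked with *.
    error_type = type(exc).__name__
```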

To pass them on to normalization, use args or keyword arguments:

import pandas as pd

# sample data setup
voting_df = pd.DataFrame({"pop_estimate": range(3144)})

def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd

pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()

Method 1: use args:

method1 = voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))

Method 2: use keyword arguments:

method2 = voting_df['pop_estimate'].apply(normalization,  
                                          col_mean=pop_mean, 
                                          col_sd=pop_sd)

Also, in your case you don't need apply at all. Arithmetic on a Series is already element-wise, so call normalization directly on the column:

method3 = normalization(voting_df["pop_estimate"], pop_mean, pop_sd)

Or, better still, use an existing library function, for example scipy.stats.zscore:

from scipy.stats import zscore

method4 = zscore(voting_df["pop_estimate"], ddof=1)
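The ddof=1 argument matters because pandas' Series.std() defaults to the sample standard deviation (ddof=1), whereas scipy.stats.zscore defaults to the population standard deviation (ddof=0). The difference can be sketched with the standard library alone; the eight values below are made up for illustration.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
mean = statistics.fmean(values)

# Sample standard deviation: divides by n - 1 (the ddof=1 convention).
sample_sd = statistics.stdev(values)

# Population standard deviation: divides by n (the ddof=0 convention).
population_sd = statistics.pstdev(values)

# Z-scores under each convention; they differ by a constant factor.
z_sample = [(v - mean) / sample_sd for v in values]
z_population = [(v - mean) / population_sd for v in values]
```

With a few thousand rows the two conventions give nearly identical z-scores, but they are not exactly equal, so an exact comparison against the apply-based results only has a chance of passing when the same ddof is used.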

Verification:

import numpy as np

np.all([
    np.array_equal(method1, method2),
    np.array_equal(method2, method3),
    np.array_equal(method3, method4)    
])
# True