Python pandas apply does not accept numpy.float64 arguments
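I'm having trouble passing numpy.float64 variables as arguments to pandas.Series.apply(). Is there a way to force use of the pandas versions of the .mean() and .std() functions to satisfy pandas's requirements?
Code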
def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd
voting_df['pop_estimate'].info()
pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)
Output
The key line is at the bottom.
<class 'pandas.core.series.Series'>
Int64Index: 3145 entries, 0 to 3144
Series name: pop_estimate
Non-Null Count Dtype
-------------- -----
3145 non-null float64
dtypes: float64(1)
memory usage: 49.1 KB
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [46], line 7
4 voting_df['pop_estimate'].info()
6 pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
----> 7 voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py:4774, in Series.apply(self, func, convert_dtype, args, **kwargs)
4664 def apply(
4665 self,
4666 func: AggFuncType,
(...)
4669 **kwargs,
4670 ) -> DataFrame | Series:
4671 """
4672 Invoke function on values of Series.
4673
(...)
4772 dtype: float64
4773 """
-> 4774 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1100, in SeriesApply.apply(self)
1097 return self.apply_str()
1099 # self.f is Callable
-> 1100 return self.apply_standard()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1151, in SeriesApply.apply_standard(self)
1149 else:
1150 values = obj.astype(object)._values
-> 1151 mapped = lib.map_infer(
1152 values,
1153 f,
1154 convert=self.convert_dtype,
1155 )
1157 if len(mapped) and isinstance(mapped[0], ABCSeries):
1158 # GH#43986 Need to do list(mapped) in order to get treated as nested
1159 # See also GH#25959 regarding EA support
1160 return obj._constructor_expanddim(list(mapped), index=obj.index)
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\lib.pyx:2919, in pandas._libs.lib.map_infer()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:139, in Apply.__init__.<locals>.f(x)
138 def f(x):
--> 139 return func(x, *args, **kwargs)
TypeError: Value after * must be an iterable, not numpy.float64
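To supply additional arguments to the function called by pd.Series.apply, you need to pass them either as keyword arguments or as a tuple through the args keyword argument.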
From the documentation:
Series.apply(func, convert_dtype=True, args=(), **kwargs)
Invoke function on values of Series.
Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
Parameters
func : function
    Python function or NumPy ufunc to apply.
convert_dtype : bool, default True
    Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.
args : tuple
    Positional arguments passed to func after the series value.
**kwargs
    Additional keyword arguments passed to func.
So, to call it with positional arguments:
voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))
Or, with keyword arguments:
voting_df['pop_estimate'].apply(normalization, col_mean=pop_mean, col_sd=pop_sd)
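A further option, not part of the answer above, is to bind the extra arguments before calling apply, for example with functools.partial. This is only a sketch: the sample Series is made up, and partial is an alternative to the args and keyword forms rather than something the pandas documentation prescribes.

```python
from functools import partial

import pandas as pd

def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd

# Made-up sample data, standing in for voting_df['pop_estimate'].
pop = pd.Series([10.0, 20.0, 30.0, 40.0])
pop_mean, pop_sd = pop.mean(), pop.std()

# Bind the extra arguments up front; apply then only supplies each value.
normalize_pop = partial(normalization, col_mean=pop_mean, col_sd=pop_sd)
via_partial = pop.apply(normalize_pop)

# For comparison: the same extras passed through apply's args tuple.
via_args = pop.apply(normalization, args=(pop_mean, pop_sd))
```

Both calls run the same arithmetic on each element, so via_partial and via_args should hold the same values.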
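This has nothing to do with the data type. You are passing pop_mean and pop_sd as positional arguments, so they are consumed by apply itself (bound to its convert_dtype and args parameters) rather than forwarded to normalization.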
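To see why the extra positional arguments never reach normalization, here is a minimal stand-in that needs no pandas. fake_apply is a hypothetical function, not pandas code; it only mimics the parameter order shown in the traceback (func, convert_dtype, args) and the inner func(x, *args, **kwargs) call.

```python
def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd

# Hypothetical stand-in for Series.apply: same parameter order as in the
# traceback, applied to a plain list instead of a Series.
def fake_apply(values, func, convert_dtype=True, args=(), **kwargs):
    return [func(x, *args, **kwargs) for x in values]

values = [1.0, 2.0, 3.0]
mean, sd = 2.0, 1.0

# Positional extras bind to convert_dtype and args, so args becomes a
# bare float and unpacking it with * raises TypeError.
try:
    fake_apply(values, normalization, mean, sd)
    error = None
except TypeError as exc:
    error = exc

# Passing a tuple through args forwards both values to normalization.
result = fake_apply(values, normalization, args=(mean, sd))
```

The first call fails for the same reason as the original code: the value bound to args is a single float, not an iterable.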
To pass them through to normalization, use args or keyword arguments:
import pandas as pd
# sample data setup
voting_df = pd.DataFrame({"pop_estimate": range(3144)})
def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd
pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
Method 1: using args:
method1 = voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))
Method 2: using keyword arguments:
method2 = voting_df['pop_estimate'].apply(
    normalization, col_mean=pop_mean, col_sd=pop_sd
)
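Also, in your case you don't need apply at all. The arithmetic in normalization works element-wise on a whole Series, so call it directly on the column: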
method3 = normalization(voting_df["pop_estimate"], pop_mean, pop_sd)
Or better yet, use an existing library function. For example, scipy.stats.zscore:
from scipy.stats import zscore
method4 = zscore(voting_df["pop_estimate"], ddof=1)
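The ddof=1 argument matters here: Series.std() uses the sample standard deviation (ddof=1) by default, while scipy.stats.zscore defaults to ddof=0. The standard-library sketch below, on made-up numbers, shows that the two conventions give different z-scores. statistics.stdev and statistics.pstdev stand in for the two defaults.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
mean = statistics.fmean(data)

sample_sd = statistics.stdev(data)       # divides by n - 1 (ddof=1)
population_sd = statistics.pstdev(data)  # divides by n (ddof=0)

# Same centering, different scaling: the z-scores differ by a constant factor.
z_sample = [(x - mean) / sample_sd for x in data]
z_population = [(x - mean) / population_sd for x in data]
```

Without ddof=1, the zscore result would be scaled slightly differently from the values computed with pop_sd.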
Verification:
import numpy as np
np.all([
    np.array_equal(method1, method2),
    np.array_equal(method2, method3),
    np.array_equal(method3, method4),
])
# True