Question

Iterating over a DataFrame column: TypeError: 'float' object is not subscriptable

2019-01-10

5804

python pandas

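I have a DataFrame (df) with a column named Id, as shown below

The column's type is dtype: object. I have computed the maximum Id value and assigned it to a variable named maxId (which is 678), and I want to fill the empty elements with sequentially increasing values starting above maxId, so in this example my output would be:

where elements 3 and 53 are assigned the values 679 and 680, respectively.

I tried the following code, which loops over the column looking for empty elements and then applies maxId to them: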

for item, frame in df['Id'].iteritems():
        if pd.isnull(frame):
            maxId = maxId + 1
            frame['Id'] = maxId

But I get the error:

TypeError: 'float' object is not subscriptable

What do I need to do to fix this?

Answer 1

Use pd.Series.isnull and np.arange:

import numpy as np
import pandas as pd

# calculate maximum value, ignoring non-numeric entries such as 'P4'
maxId = int(pd.to_numeric(df['Id'], errors='coerce').max())

# calculate Boolean series of nulls
nulls = df['Id'].isnull()

# assign range starting from one above maxId
df.loc[nulls, 'Id'] = np.arange(maxId + 1, maxId + 1 + nulls.sum())

print(df)

#      Id
# 0     3
# 1    67
# 2   356
# 3   679
# 50   P4
# 51   P5
# 52  678
# 53  680
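# 54    2

To make the role of errors='coerce' concrete, here is a minimal self-contained sketch of the same approach. The sample data is hypothetical (the question's actual table is not shown here) and the snake_case variable names are mine; it only assumes an object column mixing integers, strings such as 'P4', and NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an object column mixing ints, strings and NaN
df = pd.DataFrame(
    {"Id": [3, 67, 356, np.nan, "P4", "P5", 678, np.nan, 2]}, dtype=object
)

# Non-numeric entries such as 'P4' become NaN, so max() only sees numbers
numeric = pd.to_numeric(df["Id"], errors="coerce")
max_id = int(numeric.max())

# Boolean mask of the genuinely missing entries in the original column
nulls = df["Id"].isnull()

# One new Id per missing entry, starting one above the current maximum
df.loc[nulls, "Id"] = np.arange(max_id + 1, max_id + 1 + nulls.sum())

print(df)
```

Note that the mask is taken from the original column, not from the coerced one, so the 'P4' and 'P5' entries are left untouched.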
# 54    2

Answer 2

Since you say you have already determined the maximum value, you can try this vectorized solution:

585088123

Output:

3222248359

Answer 3

Do you need values like "P4" and "P5"? I tried to reproduce a DataFrame similar to yours but without those values, and it works fine:

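import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [20, 4, np.nan, np.nan, 12, np.nan, 6, 10]})

maxID = df['A'].max()

for i in range(len(df['A'])):
    if pd.isnull(df.loc[i, 'A']):
        maxID += 1
        # assign through df.loc directly to avoid chained assignment
        df.loc[i, 'A'] = maxID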

I think your error occurs because you are trying to access an element of a float, the way you would with a list.

Example:

my_float = 3.0 
my_float[0]

TypeError: 'float' object is not subscriptable
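
Applied to the loop in the question, this suggests the second loop variable is the cell value itself (the float NaN for a missing entry), not a row, so indexing it with ['Id'] fails. A minimal sketch of a corrected loop, assuming hypothetical sample data and using Series.items() (iteritems() was removed in pandas 2.0), might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an object column mixing ints, strings and NaN
df = pd.DataFrame({"Id": [3, np.nan, "P4", 678, np.nan]}, dtype=object)
max_id = 678  # assumed to have been computed beforehand, as in the question

for idx, value in df["Id"].items():
    # value is a scalar; for a missing entry it is the float NaN,
    # which is why value['Id'] raises "'float' object is not subscriptable"
    if pd.isnull(value):
        max_id += 1
        # write back through the DataFrame using the index label
        df.loc[idx, "Id"] = max_id

print(df)
```

The loop works, but for large frames the vectorized df.loc assignment from Answer 1 is likely to be faster.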