开发者问题收集

将列转换为具有混合类型的日期。“to_datetime”不起作用

2020-11-13
323

这是我的数据框:

   INV_NUM        LOCATION        CREATED_DATE     DATE_OPENED
      1             North           10/10/2020      10/11/2020
      2             North           10/11/2020      10/11/2020
      3             South           10/12/2020      10/01/2020
      4             West            10/13/2020      WIP

我想创建一个输出,生成一个名为“STATUS”的新列。如果“CREATED_DATE”在“DATE_OPENED”之前,则返回“Closed”,如果在“DATE_OPENED”当天或之后,则返回“Open”,如果“DATE_OPENED”为“WIP”,则返回“Closed”。

下面是我为此编写的代码:

定义一个函数:

def date_test(row):
    if row['DATE OPENED'] == 'WIP':
        return 'Closed'
    if row['CREATED_DATE'] < row['DATE OPENED']:
        return 'Closed'
    if row['CREATED_DATE'] >= row['DATE OPENED']:
        return 'Open'
    else:
        return 0000

将类型更改为 datetime:

df['DATE OPENED'] = pd.to_datetime(df['DATE OPENED'], errors='ignore')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='raise')

测试该函数:

df['STATUS'] = df.apply(lambda row: date_test(row), axis=1)

当我运行该代码时,出现以下错误:

TypeError: '<' not supported between instances of 'Timestamp' and 'str'

我还发现,在运行 to_datetime 后,“DATE_OPENED”列的 dtype 仍为“非空对象”。

我认为是“WIP”导致的,但它应该忽略它并继续更改其他的,函数首先查找“WIP”字符串。

2个回答

我认为问题出在这行:

df['DATE OPENED'] = pd.to_datetime(df['DATE OPENED'], errors='ignore')

errors='ignore' 似乎以原始形式返回整个系列,因此该列中的任何字符串都不会转换为日期时间对象。我建议使用 errors='coerce' ,这会将您的“WIP”字符串更改为 NaT 对象,您必须更新函数来处理该问题,但类似这样的方法应该有效:

import pandas as pd


df = pd.DataFrame(
    dict(
        INV_NUM = [1,2,3,4],
        LOCATION = "North North South West".split(),
        CREATED_DATE = "10/10/2020 10/11/2020 10/12/2020 10/13/2020".split(),
        DATE_OPENED = "10/11/2020 10/11/2020 10/01/2020 WIP".split(),
))

def date_test(row):
    if row['DATE_OPENED'] is pd.NaT:
        return 'Closed'
    if row['CREATED_DATE'] < row['DATE_OPENED']:
        return 'Closed'
    if row['CREATED_DATE'] >= row['DATE_OPENED']:
        return 'Open'
    else:
        return 0000

df['DATE_OPENED'] = pd.to_datetime(df['DATE_OPENED'], errors='coerce')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='raise')

df['STATUS'] = df.apply(lambda row: date_test(row), axis=1)

在 python tutor 中逐步执行代码

编辑: 如果您想保留 errors='ignore' 和那些“WIP”字符串,那么这应该有效:

df['DATE_OPENED'] = df['DATE_OPENED'].apply(
    lambda x: pd.to_datetime(x, errors='ignore')
)

python 导师链接 2

Phillyclause89
2020-11-13

您可以简化逻辑。给定 df,将两列都转换为_datetime,将 NaT 保留为无效日期(“WIP”)

df['DATE_OPENED'] = pd.to_datetime(df['DATE_OPENED'], errors='coerce')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='coerce')

现在创建一个默认为“已关闭”的 STATUS 列:

df['STATUS'] = 'Closed'

由于与 NaT 的任何比较都会给出 False ,您可以将“打开”设置为

df.loc[df['CREATED_DATE'] >= df['DATE_OPENED'], 'STATUS'] = 'Open'

为您提供

df
   INV_NUM LOCATION CREATED_DATE DATE_OPENED  STATUS
0        1    North   2020-10-10  2020-10-11  Closed
1        2    North   2020-10-11  2020-10-11  Open
2        3    South   2020-10-12  2020-10-01  Open
3        4     West   2020-10-13         NaT  Closed
FObersteiner
2020-11-13