将列转换为具有混合类型的日期。“to_datetime”不起作用
2020-11-13
323
这是我的数据框:
INV_NUM LOCATION CREATED_DATE DATE_OPENED
1 North 10/10/2020 10/11/2020
2 North 10/11/2020 10/11/2020
3 South 10/12/2020 10/01/2020
4 West 10/13/2020 WIP
我想创建一个输出,生成一个名为“STATUS”的新列。如果“CREATED_DATE”在“DATE_OPENED”之前,则返回“Closed”,如果在“DATE_OPENED”当天或之后,则返回“Open”,如果“DATE_OPENED”为“WIP”,则返回“Closed”。
下面是我为此编写的代码:
定义一个函数:
def date_test(row):
if row['DATE OPENED'] == 'WIP':
return 'Closed'
if row['CREATED_DATE'] < row['DATE OPENED']:
return 'Closed'
if row['CREATED_DATE'] >= row['DATE OPENED']:
return 'Open'
else:
return 0000
将类型更改为 datetime:
df['DATE OPENED'] = pd.to_datetime(df['DATE OPENED'], errors='ignore')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='raise')
测试该函数:
df['STATUS'] = df.apply(lambda row: date_test(row), axis=1)
当我运行该代码时,出现以下错误:
TypeError: '<' not supported between instances of 'Timestamp' and 'str'
我还发现,在运行 to_datetime 后,“DATE_OPENED”列的 dtype 仍为“非空对象”。
我认为是“WIP”导致的,但它应该忽略它并继续更改其他的,函数首先查找“WIP”字符串。
2个回答
我认为问题出在这行:
df['DATE OPENED'] = pd.to_datetime(df['DATE OPENED'], errors='ignore')
errors='ignore'
似乎以原始形式返回整个系列,因此该列中的任何字符串都不会转换为日期时间对象。我建议使用
errors='coerce'
,这会将您的“WIP”字符串更改为 NaT 对象,您必须更新函数来处理该问题,但类似这样的方法应该有效:
import pandas as pd
df = pd.DataFrame(
dict(
INV_NUM = [1,2,3,4],
LOCATION = "North North South West".split(),
CREATED_DATE = "10/10/2020 10/11/2020 10/12/2020 10/13/2020".split(),
DATE_OPENED = "10/11/2020 10/11/2020 10/01/2020 WIP".split(),
))
def date_test(row):
if row['DATE_OPENED'] is pd.NaT:
return 'Closed'
if row['CREATED_DATE'] < row['DATE_OPENED']:
return 'Closed'
if row['CREATED_DATE'] >= row['DATE_OPENED']:
return 'Open'
else:
return 0000
df['DATE_OPENED'] = pd.to_datetime(df['DATE_OPENED'], errors='coerce')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='raise')
df['STATUS'] = df.apply(lambda row: date_test(row), axis=1)
编辑:
如果您想保留
errors='ignore'
和那些“WIP”字符串,那么这应该有效:
df['DATE_OPENED'] = df['DATE_OPENED'].apply(
lambda x: pd.to_datetime(x, errors='ignore')
)
Phillyclause89
2020-11-13
您可以简化逻辑。给定 df,将两列都转换为_datetime,将 NaT 保留为无效日期(“WIP”)
df['DATE_OPENED'] = pd.to_datetime(df['DATE_OPENED'], errors='coerce')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='coerce')
现在创建一个默认为“已关闭”的 STATUS 列:
df['STATUS'] = 'Closed'
由于与
NaT
的任何比较都会给出
False
,您可以将“打开”设置为
df.loc[df['CREATED_DATE'] >= df['DATE_OPENED'], 'STATUS'] = 'Open'
为您提供
df
INV_NUM LOCATION CREATED_DATE DATE_OPENED STATUS
0 1 North 2020-10-10 2020-10-11 Closed
1 2 North 2020-10-11 2020-10-11 Open
2 3 South 2020-10-12 2020-10-01 Open
3 4 West 2020-10-13 NaT Closed
FObersteiner
2020-11-13