
Convert Pandas Column to DateTime

I have one field in a pandas DataFrame that was imported as a string. It should be a datetime variable. How do I convert it to a datetime column and then filter based on date?

Example:

DataFrame Name: raw_data

Column Name: Mycol

Value Format in Column: '05SEP2014:00:00:00.000'


atwalsh

Use the to_datetime function, specifying a format to match your data.

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
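The question also asks how to filter by date once the column is converted. Here is a minimal sketch that uses a small hypothetical raw_data in place of the real one. After conversion the column has a datetime64 dtype, and pandas lets you compare it directly against date strings.

```python
import pandas as pd

# Hypothetical sample data in the question's string format
raw_data = pd.DataFrame({
    "Mycol": ["05SEP2014:00:00:00.000", "12OCT2014:13:30:00.000", "01JAN2015:08:15:00.000"]
})

# %d day, %b abbreviated month name, %Y four-digit year, then the time with fractional seconds
raw_data["Mycol"] = pd.to_datetime(raw_data["Mycol"], format="%d%b%Y:%H:%M:%S.%f")

# A datetime64 column can be compared against date strings to build a boolean mask
after_oct = raw_data[raw_data["Mycol"] >= "2014-10-01"]

# Series.between selects a closed range (both ends inclusive by default)
in_2014 = raw_data[raw_data["Mycol"].between("2014-01-01", "2014-12-31 23:59:59")]

print(after_oct)
print(in_2014)
```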

Note: the format argument isn't required; to_datetime is smart. Go ahead and try it without specifying a format.
To avoid the SettingWithCopyWarning, use @darth-behfans's answer: stackoverflow.com/a/42773096/4487805
What if you just want time and not date?
Not terribly smart. Even if some values in the column are unambiguously day-first, it will still default to dayfirst=False for the others in the same column. So it is safer to use an explicit format specification, or at least the dayfirst parameter.
Omitting the format string can make this operation slow with lots of records. This answer discusses why. It looks like infer_datetime_format=True could also increase parsing speed by up to ~5-10x (according to the pandas docs) if you don't include a format string.
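To illustrate the dayfirst point, here is a sketch with a single ambiguous, made-up date string. With no hints, pandas reads it month-first. dayfirst=True flips that, and an explicit format removes the guessing entirely. The pandas docs describe dayfirst as a preference rather than a strict rule, which is why the explicit format is the safer choice for a whole column.

```python
import pandas as pd

# '01/02/2014' is ambiguous: 2 January or 1 February?
s = "01/02/2014"

default = pd.to_datetime(s)                      # month-first by default -> 2014-01-02
day_first = pd.to_datetime(s, dayfirst=True)     # day-first preference -> 2014-02-01
explicit = pd.to_datetime(s, format="%d/%m/%Y")  # strict format, no guessing -> 2014-02-01

# With an explicit format, every value in the column is parsed the same way
col = pd.to_datetime(pd.Series(["01/02/2014", "13/02/2014"]), format="%d/%m/%Y")
```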
Vlad Bezden

If you have more than one column to convert, you can do the following:

df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)

I needed to do the following to specify a format:
states_df[['from_datetime','to_datetime','timestamp']].apply(lambda _: pd.to_datetime(_, format='%Y-%m-%d %H:%M:%S.%f', errors='coerce'))
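Here is a variant of the comment above that avoids the lambda. It is a sketch using a hypothetical states_df with made-up values. DataFrame.apply passes extra keyword arguments through to the applied function, so format and errors can be given directly.

```python
import pandas as pd

# Hypothetical frame with several string timestamp columns (one bad value on purpose)
states_df = pd.DataFrame({
    "from_datetime": ["2020-01-01 08:00:00.000", "2020-01-02 09:30:00.500"],
    "to_datetime": ["2020-01-01 17:00:00.000", "not a timestamp"],
    "timestamp": ["2020-01-01 12:00:00.250", "2020-01-02 12:00:00.000"],
})

cols = ["from_datetime", "to_datetime", "timestamp"]

# Keyword arguments after the function are forwarded by DataFrame.apply to pd.to_datetime
states_df[cols] = states_df[cols].apply(
    pd.to_datetime, format="%Y-%m-%d %H:%M:%S.%f", errors="coerce"
)

# errors="coerce" turns the unparseable entry into NaT instead of raising
print(states_df.dtypes)
```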
mechanical_meat

You can use the Series method .apply() to operate on the values in Mycol:

>>> import datetime as dt
>>> import pandas as pd
>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'], columns=['Mycol'])
>>> df
                    Mycol
0  05SEP2014:00:00:00.000
>>> df['Mycol'] = df['Mycol'].apply(lambda x:
...                                 dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
       Mycol
0 2014-09-05

Thanks! This is nice because it is more broadly applicable, but the other answer was more direct. I had a hard time deciding which I liked better :)
I like this answer better, because it produces a datetime object as opposed to a pandas.tslib.Timestamp object
RobC

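Use the pandas to_datetime function to parse the column as datetime. Also, by passing infer_datetime_format=True, pandas will try to infer the format from the first non-null value and use it to convert the rest of the column. (Since pandas 2.0 this argument is deprecated: a strict form of format inference is now the default, so it can simply be omitted.)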

import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)

Combining two or more sheets can be a pain in the neck, especially when datetimes are involved. This infer_datetime_format saved me big time. Thanks, chief!
Happy to help, @Mike_Leigh! Also, according to the docs, setting infer_datetime_format=True can increase parsing speed by ~5-10x in some cases.
Does not work for my date-format "Jan-18" which should be equal to "%b-%Y"
@Pfinnn if you know the exact date format, you can pass it explicitly: pd.to_datetime('Jan-18', format='%b-%y'). Note the lowercase %y, which matches a two-digit year; %Y expects four digits. For a Python strftime cheat sheet, see strftime.org.
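Here is a quick check of the reply above, using made-up month-year strings. With %y the two-digit year "18" becomes 2018, and because the strings carry no day, the day defaults to the 1st.

```python
import pandas as pd

# %b = abbreviated month name, %y = two-digit year ("18" -> 2018)
ts = pd.to_datetime("Jan-18", format="%b-%y")

# The same format applies to a whole column of month-year strings
months = pd.to_datetime(pd.Series(["Jan-18", "Feb-18", "Dec-19"]), format="%b-%y")
```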
Petter Friberg
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')

works; however, it results in a Python warning: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"

I would guess this is due to chained indexing, i.e. raw_data itself being a slice of another DataFrame.


It took me a few tries, but this works: raw_data.loc[:,'Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
This worked for me: raw_data.loc[:,'Mycol'] = pd.to_datetime(raw_data.loc[:,'Mycol'], format='%d%b%Y:%H:%M:%S.%f')
This still gives me the warning:
df2.loc[:,'datetime'] = pd.to_datetime(df2['datetime'])
/usr/lib/python3/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. See the caveats in the documentation: pandas.pydata.org/pandas-docs/stable/…
  self.obj[item] = s
Or just reset the index on a copy of the DataFrame.
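One common way to avoid the warning is sketched below. It assumes raw_data was produced by filtering a larger DataFrame (full is a hypothetical name). Take an explicit .copy() of the subset before assigning the converted column, so raw_data is an independent frame rather than a possible view of full.

```python
import pandas as pd

# Hypothetical larger frame; raw_data is a filtered subset of it
full = pd.DataFrame({
    "Mycol": ["05SEP2014:00:00:00.000", "06SEP2014:12:00:00.000"],
    "keep": [True, False],
})

# .copy() detaches the subset from `full`, so the assignment below
# modifies raw_data only and does not trigger SettingWithCopyWarning
raw_data = full[full["keep"]].copy()
raw_data["Mycol"] = pd.to_datetime(raw_data["Mycol"], format="%d%b%Y:%H:%M:%S.%f")
```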
Gil Baggio

Time Saver:

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])

hotplasma

It is important to note that pandas.to_datetime will almost never return a datetime.datetime. From the docs:


Returns: datetime
If parsing succeeded. Return type depends on input:

list-like: DatetimeIndex
Series: Series of datetime64 dtype
scalar: Timestamp

In case when it is not possible to return designated types (e.g. when any element of input is before Timestamp.min or after Timestamp.max) return will have datetime.datetime type (or corresponding array/Series).

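Here is a small sketch of the return types listed in the docs quote, using made-up date strings. It also shows that a Timestamp is a subclass of datetime.datetime, and that Timestamp.to_pydatetime() gives a plain datetime.datetime when one is strictly required.

```python
import datetime as dt

import pandas as pd

idx = pd.to_datetime(["2014-09-05", "2014-09-06"])             # list-like -> DatetimeIndex
ser = pd.to_datetime(pd.Series(["2014-09-05", "2014-09-06"]))  # Series -> datetime64 Series
ts = pd.to_datetime("2014-09-05")                              # scalar -> Timestamp

print(type(idx).__name__, ser.dtype, type(ts).__name__)

# Timestamp subclasses datetime.datetime, so it is usually accepted where a datetime is expected
print(isinstance(ts, dt.datetime))

# Explicit conversion when an exact datetime.datetime instance is needed
py_dt = ts.to_pydatetime()
```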

