🕒 DateTime Operations
DateTime operations are the foundation of time-based analysis! Converting text dates to proper datetime objects unlocks powerful capabilities like date arithmetic, filtering by time periods, and extracting components like year or month.
Think of datetime conversion like translating different languages into one common format that everyone can understand and work with.
import pandas as pd
# Sample data with date strings
events = pd.DataFrame({
'event_date': ['2023-01-15', '2023-02-20', '2023-03-10'],
'event_name': ['Product Launch', 'Conference', 'Update Release'],
'attendance': [150, 300, 200]
})
print("Original data:")
print(events)
print("Date column type:", events['event_date'].dtype)
print()
# Convert to datetime
events['event_date'] = pd.to_datetime(events['event_date'])
print("After datetime conversion:")
print(events)
print("Date column type:", events['event_date'].dtype)
🔄 Converting Strings to DateTime
The first step is always converting text dates to pandas datetime objects:
Basic Date Conversion
import pandas as pd
# Different date formats
dates_data = pd.DataFrame({
'iso_format': ['2023-01-15', '2023-02-20', '2023-03-10'],
'us_format': ['01/15/2023', '02/20/2023', '03/10/2023'],
'text_format': ['Jan 15, 2023', 'Feb 20, 2023', 'Mar 10, 2023'],
'sales': [100, 150, 120]
})
print("Original date strings:")
print(dates_data)
print()
# Convert each format
dates_data['iso_converted'] = pd.to_datetime(dates_data['iso_format'])
dates_data['us_converted'] = pd.to_datetime(dates_data['us_format'])
dates_data['text_converted'] = pd.to_datetime(dates_data['text_format'])
print("All converted to same datetime format:")
print(dates_data[['iso_converted', 'us_converted', 'text_converted']])
Handling Conversion Errors
import pandas as pd
# Data with some invalid dates
messy_dates = pd.DataFrame({
'date_string': ['2023-01-15', 'invalid_date', '2023-03-10', ''],
'value': [100, 200, 150, 75]
})
print("Data with invalid dates:")
print(messy_dates)
print()
# Convert with error handling
messy_dates['date_converted'] = pd.to_datetime(messy_dates['date_string'], errors='coerce')
print("After conversion (invalid dates become NaT):")
print(messy_dates)
print()
# Remove rows with invalid dates
clean_dates = messy_dates.dropna(subset=['date_converted'])
print("Clean data only:")
print(clean_dates)
📅 Extracting Date Components
Once you have datetime objects, you can extract useful components:
import pandas as pd
# Sales data
sales = pd.DataFrame({
'sale_date': ['2023-01-15', '2023-02-20', '2023-03-10', '2023-04-05'],
'amount': [250, 300, 180, 220]
})
# Convert to datetime
sales['sale_date'] = pd.to_datetime(sales['sale_date'])
print("Original sales data:")
print(sales)
print()
# Extract components
sales['year'] = sales['sale_date'].dt.year
sales['month'] = sales['sale_date'].dt.month
sales['month_name'] = sales['sale_date'].dt.month_name()
sales['day_of_week'] = sales['sale_date'].dt.day_name()
print("With extracted components:")
print(sales)
🔢 Date Arithmetic
Perform calculations with dates:
import pandas as pd
from datetime import datetime, timedelta
# Order data
orders = pd.DataFrame({
'order_date': ['2023-01-15', '2023-01-20', '2023-01-25'],
'delivery_date': ['2023-01-18', '2023-01-25', '2023-01-30'],
'order_id': [1001, 1002, 1003]
})
# Convert to datetime
orders['order_date'] = pd.to_datetime(orders['order_date'])
orders['delivery_date'] = pd.to_datetime(orders['delivery_date'])
print("Order data:")
print(orders)
print()
# Calculate delivery time
orders['delivery_days'] = (orders['delivery_date'] - orders['order_date']).dt.days
# Calculate days since order
today = datetime.now()
orders['days_since_order'] = (today - orders['order_date']).dt.days
print("With calculated date differences:")
print(orders)
📊 Filtering by Date Ranges
Date filtering is essential for time-based analysis:
import pandas as pd
# Transaction data
transactions = pd.DataFrame({
'date': ['2023-01-15', '2023-02-20', '2023-03-10', '2023-04-05', '2023-05-12'],
'amount': [250, 300, 180, 220, 275],
'customer': ['Alice', 'Bob', 'Charlie', 'Alice', 'Diana']
})
# Convert to datetime
transactions['date'] = pd.to_datetime(transactions['date'])
print("All transactions:")
print(transactions)
print()
# Filter by date range
q1_transactions = transactions[transactions['date'] < '2023-04-01']
print("Q1 transactions (before April):")
print(q1_transactions)
print()
# Filter by specific month
march_transactions = transactions[transactions['date'].dt.month == 3]
print("March transactions:")
print(march_transactions)
print()
# Filter by date range
spring_transactions = transactions[
transactions['date'].between('2023-03-01', '2023-05-31')
]
print("Spring transactions (March-May):")
print(spring_transactions)
📈 Working with Quarters and Periods
Pandas makes it easy to work with business periods:
import pandas as pd
# Quarterly sales data
sales = pd.DataFrame({
'date': pd.date_range('2023-01-01', periods=12, freq='M'),
'sales': [100, 120, 110, 130, 140, 125, 135, 145, 155, 165, 150, 160]
})
print("Monthly sales:")
print(sales.head())
print()
# Add quarter information
sales['year'] = sales['date'].dt.year
sales['quarter'] = sales['date'].dt.quarter
sales['month_name'] = sales['date'].dt.month_name()
print("With quarter info:")
print(sales[['date', 'sales', 'quarter', 'month_name']])
print()
# Quarterly summary
quarterly_summary = sales.groupby(['year', 'quarter'])['sales'].sum()
print("Quarterly totals:")
print(quarterly_summary)
🎯 Custom Date Formats
Sometimes you need to specify the exact date format:
import pandas as pd
# Data with custom date format
custom_data = pd.DataFrame({
'date_string': ['15-Jan-2023', '20-Feb-2023', '10-Mar-2023'],
'revenue': [5000, 7500, 6200]
})
print("Custom format dates:")
print(custom_data)
print()
# Convert with specific format
custom_data['date_parsed'] = pd.to_datetime(
custom_data['date_string'],
format='%d-%b-%Y'
)
print("Parsed with custom format:")
print(custom_data)
print()
# Extract components
custom_data['month'] = custom_data['date_parsed'].dt.month
custom_data['month_name'] = custom_data['date_parsed'].dt.month_name()
print("With extracted components:")
print(custom_data[['date_parsed', 'revenue', 'month_name']])
📝 Real-World Example: Customer Activity Analysis
Let's analyze customer activity patterns over time:
import pandas as pd
# Customer activity data
activity = pd.DataFrame({
'login_date': ['2023-01-15', '2023-01-16', '2023-01-20', '2023-02-01', '2023-02-15'],
'customer_id': [1, 2, 1, 3, 2],
'session_duration': [45, 30, 60, 25, 40] # minutes
})
# Convert to datetime
activity['login_date'] = pd.to_datetime(activity['login_date'])
print("Customer activity:")
print(activity)
print()
# Extract time components
activity['month'] = activity['login_date'].dt.month
activity['day_of_week'] = activity['login_date'].dt.day_name()
activity['week_of_year'] = activity['login_date'].dt.isocalendar().week
print("With time components:")
print(activity)
print()
# Analyze patterns
monthly_stats = activity.groupby('month').agg({
'session_duration': ['mean', 'count'],
'customer_id': 'nunique'
})
monthly_stats.columns = ['Avg_Duration', 'Total_Sessions', 'Unique_Customers']
print("Monthly activity summary:")
print(monthly_stats)
🎯 Key Takeaways
🚀 What's Next?
Excellent! You now know how to convert and manipulate datetime data effectively. Next, let's learn about time series analysis and resampling - powerful techniques for analyzing data over time periods.
Continue to: Time Series and Resampling
Keep working with time! 🕒📊
Was this helpful?
Track Your Learning Progress
Sign in to bookmark tutorials and keep track of your learning journey.
Your progress is saved automatically as you read.