📊 Time Series and Resampling
Resampling changes the frequency of your time series data, for example converting daily sales to monthly totals or hourly readings to daily averages. That makes it a core tool for trend analysis and reporting.
Think of resampling like changing the zoom level on a timeline: zoom out (downsample) to see long-term trends, or zoom in (upsample) to a finer time grid, where the new points have to be filled or interpolated.
import pandas as pd
# Daily sales data
daily_sales = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=14, freq='D'),
    'sales': [100, 120, 110, 130, 140, 160, 150, 170, 180, 165, 175, 185, 190, 200]
})
# Set date as index for time series operations
daily_sales.set_index('date', inplace=True)
print("Daily sales data:")
print(daily_sales.head(7))
print()
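# Resample to weekly totals. 'W' bins end on Sunday, and 2023-01-01 is a Sunday,
# so the first bin holds only that one day and the last bin holds six days.
weekly_sales = daily_sales.resample('W').sum()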
print("Weekly sales totals:")
print(weekly_sales)
print()
# Resample to weekly averages
weekly_avg = daily_sales.resample('W').mean()
print("Weekly sales averages:")
print(weekly_avg.round(1))
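The weekly output above has three rows rather than two, because 'W' is shorthand for weeks ending on Sunday and 2023-01-01 is itself a Sunday. As a minimal sketch, reusing the same sales figures, the code below counts the days in each bin and then anchors the weeks on Saturday ('W-SAT') so the 14 days split into two full weeks. The variable names are illustrative only.

```python
import pandas as pd

sales = pd.Series(
    [100, 120, 110, 130, 140, 160, 150, 170, 180, 165, 175, 185, 190, 200],
    index=pd.date_range('2023-01-01', periods=14, freq='D'),
    name='sales',
)

# How many daily rows land in each default weekly bin (weeks ending Sunday)
days_per_bin = sales.resample('W').count()
print(days_per_bin)

# Anchor weeks on Saturday instead, so each bin covers Sunday through Saturday
weekly_sat = sales.resample('W-SAT').sum()
print(weekly_sat)
```

Checking the row count per bin like this is a quick way to spot partial weeks before comparing weekly totals.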
🎯 Understanding Time Series
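Time series data uses dates or times as the index (a DatetimeIndex in pandas), which enables temporal operations such as resampling, date-based filtering, and grouping by calendar components.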
📊 Creating Time Series Data
Let's start by setting up proper time series data:
import pandas as pd
import numpy as np
# Create sample time series data
dates = pd.date_range('2023-01-01', periods=30, freq='D')
np.random.seed(42) # For consistent results
values = 100 + np.random.randn(30).cumsum() # Random walk starting at 100
# Create time series DataFrame
ts_data = pd.DataFrame({
    'value': values
}, index=dates)
print("Time series data (first 10 days):")
print(ts_data.head(10))
print()
# Basic time series info
print("Time series info:")
print(f"Start date: {ts_data.index.min()}")
print(f"End date: {ts_data.index.max()}")
print(f"Frequency: {ts_data.index.freq}")
print(f"Number of observations: {len(ts_data)}")
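Real data rarely arrives with a ready-made date index. The sketch below assumes a hypothetical raw table whose dates are strings in arbitrary order, and shows one way to turn it into a time series: parse with pd.to_datetime, set the index, and sort. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw data: dates arrive as strings and out of order
raw = pd.DataFrame({
    'date': ['2023-01-03', '2023-01-01', '2023-01-02', '2023-01-04'],
    'value': [103.0, 101.0, 102.0, 104.0],
})

# Parse the strings into timestamps, then index and sort by date
raw['date'] = pd.to_datetime(raw['date'])
ts = raw.set_index('date').sort_index()

print(type(ts.index).__name__)   # DatetimeIndex
print(pd.infer_freq(ts.index))   # D (daily spacing detected)
print(ts)
```

Sorting matters because date slicing and resampling are easiest to reason about on a monotonically increasing index.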
🔄 Resampling Fundamentals
Resampling changes the frequency of your time series data:
Downsampling Examples
import pandas as pd
# Daily revenue data
daily_revenue = pd.DataFrame({
    'revenue': [1000, 1200, 1100, 1300, 1400, 1600, 1500, 1700, 1800, 1650, 1750, 1850, 1900, 2000]
}, index=pd.date_range('2023-01-01', periods=14, freq='D'))
print("Daily revenue (14 days):")
print(daily_revenue)
print()
# Resample to weekly totals (bins end on Sunday, so the first and last bins here are partial weeks)
weekly_total = daily_revenue.resample('W').sum()
print("Weekly revenue totals:")
print(weekly_total)
print()
# Resample to weekly averages
weekly_avg = daily_revenue.resample('W').mean()
print("Weekly revenue averages:")
print(weekly_avg.round(0))
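The examples above all downsample, going from a finer frequency to a coarser one. Going the other way (upsampling) creates new rows that have no data, so you have to choose how to fill them. The following is a minimal sketch with made-up weekly figures, comparing forward fill with linear interpolation; which one is appropriate depends on what the numbers mean.

```python
import pandas as pd

# Hypothetical weekly revenue readings, one per Sunday
weekly = pd.DataFrame(
    {'revenue': [7000, 10500]},
    index=pd.to_datetime(['2023-01-01', '2023-01-08']),
)

# Upsample to daily: carry the last known value forward
daily_ffill = weekly.resample('D').ffill()

# Upsample to daily: draw a straight line between known values
daily_interp = weekly.resample('D').interpolate()

print(daily_ffill)
print(daily_interp)
```

Note that neither method adds real information; the daily rows are estimates derived from the two weekly readings.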
Multiple Aggregations
import pandas as pd
# Customer visits data
visits = pd.DataFrame({
    'visits': [50, 60, 45, 70, 80, 90, 75, 85, 95, 80, 88, 92, 100, 105]
}, index=pd.date_range('2023-01-01', periods=14, freq='D'))
print("Daily visits:")
print(visits)
print()
# Multiple aggregations at once
weekly_stats = visits.resample('W').agg(['sum', 'mean', 'max', 'min'])
print("Weekly visit statistics:")
print(weekly_stats.round(1))
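When a table has several columns, each one often needs a different aggregation. As a sketch, the code below passes a dict to agg so that visits are summed while a second column is reduced with max. The signups column and its values are hypothetical, and the dates start on a Monday so both weekly bins are full.

```python
import pandas as pd

# Visits come from the example above; 'signups' is a made-up second column
traffic = pd.DataFrame({
    'visits': [50, 60, 45, 70, 80, 90, 75, 85, 95, 80, 88, 92, 100, 105],
    'signups': [5] * 7 + [10] * 7,
}, index=pd.date_range('2023-01-02', periods=14, freq='D'))

# One aggregation per column: total visits, best signup day
weekly = traffic.resample('W').agg({'visits': 'sum', 'signups': 'max'})
print(weekly)
```

The result keeps flat column names, which is usually easier to work with than the two-level columns produced by a list of functions.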
📅 Working with Different Frequencies
Different business needs require different time frequencies:
import pandas as pd
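# Monthly sales data, indexed by month-end dates
# (pandas 2.2 deprecates the 'M' alias in favor of 'ME'; switch if you see a FutureWarning)
monthly_data = pd.DataFrame({
    'sales': [10000, 12000, 11000, 13000, 14000, 16000]
}, index=pd.date_range('2023-01-01', periods=6, freq='M'))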
print("Monthly sales:")
print(monthly_data)
print()
# Quarterly aggregation (pandas 2.2 deprecates the 'Q' alias in favor of 'QE')
quarterly = monthly_data.resample('Q').sum()
print("Quarterly sales:")
print(quarterly)
print()
# Running total; this equals year-to-date only because the data starts in January
# and stays within a single year
monthly_data['ytd_total'] = monthly_data['sales'].cumsum()
print("With year-to-date totals:")
print(monthly_data)
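A plain cumsum keeps accumulating across a year boundary, so it stops being a year-to-date figure once the data spans more than one year. One possible approach, sketched here with made-up numbers, is to group by calendar year before taking the running total so that it restarts each January.

```python
import pandas as pd

# Hypothetical month-end sales that cross from 2022 into 2023
sales = pd.Series(
    [100, 200, 300, 400],
    index=pd.to_datetime(['2022-11-30', '2022-12-31', '2023-01-31', '2023-02-28']),
    name='sales',
)

# Running total within each calendar year
ytd = sales.groupby(sales.index.year).cumsum()
print(ytd)
```

Here the January 2023 value is 300 rather than 600, because the total restarts with the new year.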
🎨 Advanced Resampling Options
Customize how resampling handles your data:
import pandas as pd
# Daily data
daily_data = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60, 70]
}, index=pd.date_range('2023-01-01', periods=7, freq='D'))
print("Daily data:")
print(daily_data)
print()
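# Different labeling options. Weekly bins run from just after one Sunday through the next Sunday.
# label='right' (the default for 'W') names each bin by the Sunday that closes it;
# label='left' names it by the Sunday before the bin, which is not part of the bin.
weekly_left = daily_data.resample('W', label='left').sum()
weekly_right = daily_data.resample('W', label='right').sum()
print("Weekly sum (left label - Sunday before each bin):")
print(weekly_left)
print()
print("Weekly sum (right label - Sunday that closes each bin):")
print(weekly_right)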
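The label option only changes the name given to each bin, not which rows fall into it. Bin membership is controlled by the closed parameter of resample. As a sketch on the same seven days, combining closed='left' with label='left' gives weeks that start on Sunday and are named by that starting Sunday.

```python
import pandas as pd

daily_data = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60, 70]
}, index=pd.date_range('2023-01-01', periods=7, freq='D'))

# closed='left' includes the left edge (Sunday) and excludes the right edge;
# label='left' then names the bin by the Sunday that starts it
sunday_weeks = daily_data.resample('W', closed='left', label='left').sum()
print(sunday_weeks)
```

With these settings all seven days land in a single bin labeled 2023-01-01, instead of being split across two bins as in the default case.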
📊 Time-Based Grouping
Group data by time components without resampling:
import pandas as pd
# Transaction data across multiple months
transactions = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=90, freq='D'),
    'amount': [100 + i % 50 for i in range(90)]  # Varying amounts
})
# Set date as index
transactions.set_index('date', inplace=True)
print("Sample transactions:")
print(transactions.head())
print()
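# Group by month number (1-12); with multi-year data, the same month from
# different years would be pooled together
monthly_totals = transactions.groupby(transactions.index.month)['amount'].sum()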
print("Total amount by month:")
print(monthly_totals)
print()
# Group by day of week
dow_avg = transactions.groupby(transactions.index.day_name())['amount'].mean()
print("Average amount by day of week:")
print(dow_avg.round(1))
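Because groupby sorts its keys, the day names in the output above come out alphabetically (Friday, Monday, Saturday, ...) rather than in calendar order. A small sketch of one way to fix the display is to reindex the result with an explicit day order; the day_order list is an illustrative helper, not part of pandas.

```python
import pandas as pd

transactions = pd.DataFrame({
    'amount': [100 + i % 50 for i in range(90)]
}, index=pd.date_range('2023-01-01', periods=90, freq='D'))

dow_avg = transactions.groupby(transactions.index.day_name())['amount'].mean()

# Put the rows back in calendar order, Monday first
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
dow_ordered = dow_avg.reindex(day_order)
print(dow_ordered.round(1))
```

Grouping by transactions.index.dayofweek (0 for Monday through 6 for Sunday) is an alternative that sorts correctly on its own, at the cost of less readable labels.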
🔍 Time Series Filtering
Filter time series data using date ranges:
import pandas as pd
# Year of sales data
full_year = pd.DataFrame({
    'sales': [1000 + i*10 for i in range(365)]
}, index=pd.date_range('2023-01-01', periods=365, freq='D'))
print("Full year data shape:", full_year.shape)
print("Sample of full year:")
print(full_year.head())
print()
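# Filter by month. Partial-string row selection needs .loc; plain full_year['2023-03']
# looks for a column named '2023-03' and raises KeyError in pandas 2.0 and later.
march_data = full_year.loc['2023-03']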
print("March 2023 data:")
print(f"Shape: {march_data.shape}")
print(march_data.head())
print()
# Filter by quarter (first 3 months); both ends of a .loc date slice are inclusive
q1_data = full_year.loc['2023-01':'2023-03']
print("Q1 2023 summary:")
print(f"Total sales: {q1_data['sales'].sum()}")
print(f"Average daily sales: {q1_data['sales'].mean():.1f}")
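Date filtering is not limited to whole months. The sketch below, using the same full-year data, shows two further patterns: an exact date range with .loc, and a boolean mask built from a calendar attribute of the index (here, weekends).

```python
import pandas as pd

full_year = pd.DataFrame({
    'sales': [1000 + i*10 for i in range(365)]
}, index=pd.date_range('2023-01-01', periods=365, freq='D'))

# Exact date range; both endpoints are included
three_days = full_year.loc['2023-03-10':'2023-03-12']
print(three_days)

# Boolean mask: dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday
weekends = full_year[full_year.index.dayofweek >= 5]
print(f"Weekend days in 2023: {len(weekends)}")
```

The mask approach works with any index attribute, such as month, quarter, or hour.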
📈 Real-World Example: Website Traffic Analysis
Let's analyze website traffic patterns:
import pandas as pd
import numpy as np
# Simulate hourly website traffic for a week
# (pandas 2.2 deprecates the 'H' alias in favor of lowercase 'h')
np.random.seed(42)
hours = pd.date_range('2023-01-01', periods=24*7, freq='H')
# Create a realistic traffic pattern: a daily sine cycle that bottoms out at midnight
# and peaks at noon, plus random noise, clipped so counts never go negative
base_traffic = 100
hour_of_day = hours.hour.to_numpy()
daily_cycle = 50 + 50 * np.sin((hour_of_day - 6) * np.pi / 12)
noise = np.random.normal(0, 10, len(hours))
traffic = np.maximum(0, base_traffic + daily_cycle + noise)
website_traffic = pd.DataFrame({
    'visitors': traffic
}, index=hours)
print("Hourly traffic (first 24 hours):")
print(website_traffic.head(24))
print()
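# Daily traffic summary
daily_summary = website_traffic.resample('D').agg({
    'visitors': ['sum', 'mean', 'max', 'min']
})
# Flatten the two-level column names. Peak_Hour and Low_Hour hold the visitor counts
# of the busiest and quietest hour of each day, not the hour itself.
daily_summary.columns = ['Total_Visitors', 'Avg_Hourly', 'Peak_Hour', 'Low_Hour']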
print("Daily traffic summary:")
print(daily_summary.round(1))
print()
# Average traffic by hour of day (stored under its own name rather than reusing hourly_pattern)
avg_by_hour = website_traffic.groupby(website_traffic.index.hour)['visitors'].mean()
print("Average visitors by hour of day:")
print(avg_by_hour.round(1))
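Once traffic is averaged by hour of day, a natural follow-up is to ask which hour is busiest. The sketch below uses idxmax and idxmin on a noise-free version of the same sine pattern, so the answer is predictable. The data is hypothetical, and the frequency is passed as a Timedelta so the code does not depend on the 'H' versus 'h' alias spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical noise-free traffic for two days, one reading per hour
hours = pd.date_range('2023-01-01', periods=48, freq=pd.Timedelta(hours=1))
visitors = 150 + 50 * np.sin((hours.hour.to_numpy() - 6) * np.pi / 12)
traffic = pd.DataFrame({'visitors': visitors}, index=hours)

# Average by hour of day, then look up the hours with the highest and lowest averages
avg_by_hour = traffic.groupby(traffic.index.hour)['visitors'].mean()
peak_hour = avg_by_hour.idxmax()
quiet_hour = avg_by_hour.idxmin()
print(f"Busiest hour: {peak_hour}:00, quietest hour: {quiet_hour}:00")
```

On the noisy simulated data above, the same two calls should land on or near noon and midnight, though random noise can shift the result by an hour or so.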
🎯 Key Takeaways
🚀 What's Next?
Perfect! You now understand time series operations and resampling. Next, let's learn about saving your work - how to export your analyzed data to different formats for sharing and reporting.
Continue to: Saving Your Work
Keep analyzing time! 📊⏰