📊 Grouping and Aggregation
Grouping and aggregation are among the most powerful features in pandas! They let you split your data into groups, calculate summaries for each group, and discover patterns you couldn't see before. This is essential for data analysis and reporting.
Think of it like organizing your music by genre and then finding the average song length for each genre: you're grouping (by genre) and aggregating (calculating averages).
```python
import pandas as pd

# Sales data by department
sales = pd.DataFrame({
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales'],
    'employee': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'],
    'revenue': [50000, 75000, 45000, 30000, 80000, 55000]
})
print("Sales data:")
print(sales)
print()

# Group by department and calculate total revenue
dept_summary = sales.groupby('department')['revenue'].sum()
print("Total revenue by department:")
print(dept_summary)
print()

# Multiple statistics at once
dept_stats = sales.groupby('department')['revenue'].agg(['sum', 'mean', 'count'])
print("Department statistics:")
print(dept_stats)
```
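By default, `groupby` turns the group keys into the index of the result, so the department totals above come back as a Series indexed by department. The sketch below (reusing the same sales data, redefined so it runs on its own) shows two standard ways to get a flat table with `department` as an ordinary column: `reset_index()` after the fact, or `as_index=False` up front.

```python
import pandas as pd

# Same sales data as in the example above
sales = pd.DataFrame({
    "department": ["Sales", "IT", "Sales", "HR", "IT", "Sales"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"],
    "revenue": [50000, 75000, 45000, 30000, 80000, 55000],
})

# Default: the group keys become the index of the result (a Series here)
by_dept = sales.groupby("department")["revenue"].sum()
print(type(by_dept).__name__)   # Series
print(by_dept.index.tolist())   # ['HR', 'IT', 'Sales'] (keys are sorted by default)

# Option 1: move the keys back into a regular column afterwards
flat = by_dept.reset_index()

# Option 2: tell groupby not to use the keys as the index at all
flat_too = sales.groupby("department", as_index=False)["revenue"].sum()

print(flat)
print(flat_too)
```

Both options give a two-column DataFrame (`department`, `revenue`), which is usually easier to merge, plot, or export than a keyed Series.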
🎯 Why Grouping and Aggregation Matter
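Raw data holds the details, but insights come from summaries. Grouping helps you compare categories side by side, condense many rows into a compact report, and spot patterns that are hard to see row by row.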
📚 What You'll Learn in This Section
Master the art of data summarization:
- 🔗 GroupBy Operations: Learn the fundamentals of splitting data into groups and performing basic aggregations.
- 📈 Aggregation Functions: Explore different ways to summarize grouped data with built-in and custom functions.
- 📋 Pivot Tables and Cross Tabulation: Create summary tables and cross-tabulations for analysis.
🔍 Grouping Concepts
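Conceptually, every grouping operation follows a split-apply-combine pattern: pandas splits the rows into groups that share a key, applies a function (such as sum or mean) to each group, and combines the per-group results into a new Series or DataFrame.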
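To make the three steps concrete, here is a minimal sketch that performs them by hand on the sales data from the first example and then checks the result against the one-line `groupby` call. The manual loop is only for illustration; in practice you would let pandas do all three steps for you.

```python
import pandas as pd

sales = pd.DataFrame({
    "department": ["Sales", "IT", "Sales", "HR", "IT", "Sales"],
    "revenue": [50000, 75000, 45000, 30000, 80000, 55000],
})

# 1. Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
# 2. Apply: compute one summary value for each sub-DataFrame
manual = {}
for dept, group in sales.groupby("department"):
    manual[dept] = group["revenue"].mean()

# 3. Combine: collect the per-group results into a single Series
manual_result = pd.Series(manual, name="revenue")
manual_result.index.name = "department"

# The one-liner performs split, apply, and combine internally
builtin_result = sales.groupby("department")["revenue"].mean()

print(manual_result)
print(builtin_result)
```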
🛠️ Common Grouping Patterns
Here are the most frequent grouping operations you'll use:
```python
import pandas as pd

# Customer orders data ('price' is the unit price)
orders = pd.DataFrame({
    'customer': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'Alice'],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse'],
    'quantity': [1, 2, 1, 1, 1, 3],
    'price': [999, 25, 75, 999, 300, 25]
})
# Line total = quantity x unit price
orders['total'] = orders['quantity'] * orders['price']
print("Customer orders:")
print(orders)
print()

# Pattern 1: Count rows (order lines) per group
customer_orders = orders.groupby('customer').size()
print("Orders per customer:")
print(customer_orders)
print()

# Pattern 2: Sum by group (sum the line totals, not the unit prices)
customer_total = orders.groupby('customer')['total'].sum()
print("Total spent per customer:")
print(customer_total)
print()

# Pattern 3: Different aggregations for different columns
customer_summary = orders.groupby('customer').agg({
    'quantity': 'sum',
    'total': ['sum', 'mean']
})
print("Customer summary:")
print(customer_summary)
```
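Another frequent pattern is attaching a group statistic back onto every original row, for example to see what fraction of a customer's spending each order line represents. Unlike `agg`, which returns one row per group, `transform` returns a result aligned to the original rows. This sketch redefines the orders data and assumes a `total` column computed as quantity times unit price.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer": ["Alice", "Bob", "Alice", "Charlie", "Bob", "Alice"],
    "product": ["Laptop", "Mouse", "Keyboard", "Laptop", "Monitor", "Mouse"],
    "quantity": [1, 2, 1, 1, 1, 3],
    "price": [999, 25, 75, 999, 300, 25],
})
# Assumed line total: quantity x unit price
orders["total"] = orders["quantity"] * orders["price"]

# transform("sum") returns one value per ORIGINAL row (same index as orders),
# so each order line gets its customer's overall total next to it
orders["customer_total"] = orders.groupby("customer")["total"].transform("sum")

# Fraction of the customer's spending that each order line represents
orders["share"] = orders["total"] / orders["customer_total"]

print(orders[["customer", "product", "total", "customer_total", "share"]].round(3))
```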
📊 Aggregation Functions Overview
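A quick sketch of the most common built-in aggregations, applied to the sales data from the first example (redefined so the snippet runs on its own). The names passed to `.agg()` are a representative selection rather than an exhaustive list; any function that reduces a Series to a single value, including a lambda, also works.

```python
import pandas as pd

sales = pd.DataFrame({
    "department": ["Sales", "IT", "Sales", "HR", "IT", "Sales"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"],
    "revenue": [50000, 75000, 45000, 30000, 80000, 55000],
})

grouped = sales.groupby("department")["revenue"]

# Built-in aggregations can be requested by name
overview = grouped.agg(["count", "sum", "mean", "median", "min", "max", "std"])
print(overview)  # std is NaN for HR, which has only one row

# nunique counts the distinct values in each group
staff = sales.groupby("department")["employee"].nunique()
print(staff)

# A custom function: the range (max - min) of revenue per group
spread = grouped.agg(lambda s: s.max() - s.min())
print(spread)
```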
🚀 Grouping Preview
Get a taste of what's coming:
```python
import pandas as pd

# Survey responses by age group
survey = pd.DataFrame({
    'age_group': ['18-25', '26-35', '18-25', '36-45', '26-35', '18-25'],
    'satisfaction': [4, 5, 3, 4, 5, 4],
    'recommend': ['Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes'],
    'city': ['NYC', 'LA', 'NYC', 'Chicago', 'LA', 'Boston']
})
print("Survey responses:")
print(survey)
print()

# Multiple grouping insights
print("Average satisfaction by age group:")
age_satisfaction = survey.groupby('age_group')['satisfaction'].mean()
print(age_satisfaction)
print()

print("Recommendation rate by city:")
city_recommend = survey.groupby('city')['recommend'].apply(lambda x: (x == 'Yes').mean())
print(city_recommend.round(2))
```
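The recommendation rate can also be computed without a Python-level lambda: build a boolean Series first, then group it by the city column, since the mean of `True`/`False` values is the share of `True`. The same survey data also gives a first look at cross-tabulation, where `pd.crosstab` with `normalize="index"` shows the Yes/No split within each city. This is a sketch on the data above, not the only way to write it.

```python
import pandas as pd

survey = pd.DataFrame({
    "age_group": ["18-25", "26-35", "18-25", "36-45", "26-35", "18-25"],
    "satisfaction": [4, 5, 3, 4, 5, 4],
    "recommend": ["Yes", "Yes", "No", "Yes", "Yes", "Yes"],
    "city": ["NYC", "LA", "NYC", "Chicago", "LA", "Boston"],
})

# Vectorized version of the lambda: True/False per row, grouped by city,
# averaged (True counts as 1, False as 0)
rate = survey["recommend"].eq("Yes").groupby(survey["city"]).mean()
print(rate.round(2))

# Cross-tabulation: share of each answer within each city (rows sum to 1)
table = pd.crosstab(survey["city"], survey["recommend"], normalize="index")
print(table.round(2))
```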
📈 Real-World Applications
🎯 Grouping Best Practices
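Two habits that keep grouped results predictable, sketched on a small hypothetical orders table (the missing customer value is made up to show the behavior). Named aggregation (`output_name=(column, function)`) produces flat, readable column names instead of the MultiIndex columns you get from a dict of lists. And because `groupby` drops rows whose key is missing by default (`dropna=True`), passing `dropna=False` keeps them as their own group so totals still add up.

```python
import pandas as pd

# Hypothetical data: one order has no customer recorded
orders = pd.DataFrame({
    "customer": ["Alice", "Bob", "Alice", None, "Bob"],
    "quantity": [1, 2, 1, 4, 1],
    "price": [999, 25, 75, 10, 300],
})

# Named aggregation: output_column=(input_column, function)
summary = orders.groupby("customer").agg(
    items=("quantity", "sum"),
    avg_price=("price", "mean"),
    n_orders=("price", "count"),
)
print(summary)  # the row with a missing customer is silently excluded

# Keep rows with a missing key as their own group
with_missing = orders.groupby("customer", dropna=False)["quantity"].sum()
print(with_missing)
```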
🚀 What's Next?
Ready to start grouping your data and discovering insights? Let's begin with the fundamentals of GroupBy operations.
Start with: GroupBy Operations
Time to group and aggregate! 📊🔍