🔗 Combining Data
Real-world data often comes from multiple sources! Combining DataFrames is essential for comprehensive analysis - whether you're stacking monthly reports, merging customer data with sales records, or joining information from different databases.
Think of combining data like assembling a puzzle - each DataFrame is a piece, and you need to fit them together correctly to see the complete picture.
import pandas as pd
# Customer information
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['NYC', 'LA', 'Chicago']
})
# Order information
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 2, 1],
    'amount': [250, 150, 300]
})
print("Customers:")
print(customers)
print()
print("Orders:")
print(orders)
print()
# Combine: Merge customer info with orders
combined = orders.merge(customers, on='customer_id')
print("Combined data:")
print(combined)
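One detail worth noticing in the result above: Charlie has no orders, so he does not appear in `combined`. `merge` performs an inner join by default, which keeps only keys present in both DataFrames. The sketch below reuses the same sample data to show the difference, assuming you want to keep every customer (the names `inner` and `all_customers` are illustrative only):

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['NYC', 'LA', 'Chicago']
})
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 2, 1],
    'amount': [250, 150, 300]
})

# Default (inner) merge: only customers with at least one order survive
inner = orders.merge(customers, on='customer_id')
print(sorted(inner['name'].unique()))

# Left merge starting from customers: every customer is kept,
# and the order columns are NaN where no order exists
all_customers = customers.merge(orders, on='customer_id', how='left')
print(all_customers)
```

Which side you start from matters with `how='left'`: the left DataFrame decides which rows are guaranteed to survive.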
🎯 Why Combine Data?
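Combining data lets you answer questions that no single table can, such as which customers placed which orders or how sales changed from one month to the next.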
📚 What You'll Learn in This Section
Master the essential data combination techniques:
- 📚 Concatenating DataFrames: Learn to stack DataFrames vertically and horizontally for simple data combination.
- 🔗 Merging and Join Operations: Master different types of joins to combine related data from multiple sources.
🛠️ Types of Data Combination
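There are two main ways to combine DataFrames: concatenation, which stacks DataFrames that share the same structure, and merging, which joins related DataFrames on a shared key column.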
📊 Quick Combination Examples
Here's a preview of what you can do:
import pandas as pd
# Example 1: Concatenation (stacking monthly data)
jan_sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse'],
    'sales': [5, 20]
})
feb_sales = pd.DataFrame({
    'product': ['Laptop', 'Keyboard'],
    'sales': [8, 15]
})
print("January sales:")
print(jan_sales)
print()
print("February sales:")
print(feb_sales)
print()
# Concatenate (stack) the monthly data; ignore_index=True renumbers the rows 0..n-1
combined_sales = pd.concat([jan_sales, feb_sales], ignore_index=True)
print("Combined monthly data:")
print(combined_sales)
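After a plain concatenation you can no longer tell which month a row came from. One way to preserve that information, sketched here with the same sample data, is to tag each DataFrame before stacking. The `month` column and the `labeled` and `totals` names are assumptions made for this illustration:

```python
import pandas as pd

jan_sales = pd.DataFrame({'product': ['Laptop', 'Mouse'], 'sales': [5, 20]})
feb_sales = pd.DataFrame({'product': ['Laptop', 'Keyboard'], 'sales': [8, 15]})

# Tag each row with its month before stacking
labeled = pd.concat(
    [jan_sales.assign(month='Jan'), feb_sales.assign(month='Feb')],
    ignore_index=True
)
print(labeled)

# With the label in place, per-product totals across months are one groupby away
totals = labeled.groupby('product')['sales'].sum()
print(totals)
```

`pd.concat` also accepts a `keys` argument that records the source in the index instead of a column; which form is more convenient depends on the analysis that follows.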
Merging Related Data
import pandas as pd
# Example 2: Merging (joining related data)
products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'category': ['Electronics', 'Accessories', 'Accessories']
})
sales = pd.DataFrame({
    'product_id': [1, 2, 1],
    'quantity': [2, 10, 1],
    'revenue': [2000, 250, 1000]
})
print("Products:")
print(products)
print()
print("Sales:")
print(sales)
print()
# Merge product details with sales
detailed_sales = sales.merge(products, on='product_id')
print("Sales with product details:")
print(detailed_sales)
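The Keyboard (product 3) has no sales, so it is missing from the merged result above. To see which rows failed to match, one option is an outer merge with `indicator=True`, which adds a `_merge` column describing where each row came from. A minimal sketch on the same data (the `check` and `unsold` names are illustrative):

```python
import pandas as pd

products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'category': ['Electronics', 'Accessories', 'Accessories']
})
sales = pd.DataFrame({
    'product_id': [1, 2, 1],
    'quantity': [2, 10, 1],
    'revenue': [2000, 250, 1000]
})

# how='outer' keeps unmatched rows from both sides;
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
check = sales.merge(products, on='product_id', how='outer', indicator=True)
print(check[['product_id', 'name', 'quantity', '_merge']])

# Products that exist in the catalog but never appear in sales
unsold = check.loc[check['_merge'] == 'right_only', 'name'].tolist()
print(unsold)
```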
🎨 Combination Workflow
📈 Real-World Scenarios
Common business cases for data combination:
import pandas as pd
# Scenario: E-commerce analysis
# Customer demographics
demographics = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age_group': ['25-34', '35-44', '18-24'],
    'region': ['North', 'South', 'East']
})
# Purchase history
purchases = pd.DataFrame({
    'customer_id': [1, 1, 2, 3],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'amount': [999, 25, 75, 300]
})
print("Customer demographics:")
print(demographics)
print()
print("Purchase history:")
print(purchases)
print()
# Combine for customer analysis
customer_analysis = purchases.merge(demographics, on='customer_id')
print("Customer purchase analysis:")
print(customer_analysis)
print()
# Summary by region
regional_summary = customer_analysis.groupby('region')['amount'].agg(['sum', 'count'])
print("Sales summary by region:")
print(regional_summary)
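The merge above repeats each customer's demographics once per purchase. When the analysis needs one row per customer instead, a common alternative is to aggregate first and merge second. A sketch using the same sample data; `per_customer`, `customer_summary`, and the `total_spent` / `n_purchases` column names are illustrative choices, not part of the original example:

```python
import pandas as pd

demographics = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age_group': ['25-34', '35-44', '18-24'],
    'region': ['North', 'South', 'East']
})
purchases = pd.DataFrame({
    'customer_id': [1, 1, 2, 3],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'amount': [999, 25, 75, 300]
})

# Step 1: collapse purchases to one row per customer
per_customer = (
    purchases.groupby('customer_id')['amount']
    .agg(total_spent='sum', n_purchases='count')
    .reset_index()
)

# Step 2: attach the totals to the demographics table
customer_summary = demographics.merge(per_customer, on='customer_id', how='left')
print(customer_summary)
```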
🔍 Data Quality Considerations
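A few checks are commonly run around a merge. The sketch below shows one possible set, not an exhaustive list. It uses small hypothetical `customers` and `orders` DataFrames that were deliberately built with problems: a duplicated key and an order pointing at a customer that does not exist.

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [1, 2, 2],          # note the duplicated key
    'name': ['Alice', 'Bob', 'Bobby']
})
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 2, 4],          # customer 4 is not in customers
    'amount': [250, 150, 300]
})

# Check 1: is the key unique in the lookup table?
print(customers['customer_id'].is_unique)

# Check 2: let merge enforce the expected relationship.
# validate='many_to_one' raises MergeError if the right-hand keys repeat.
try:
    orders.merge(customers, on='customer_id', validate='many_to_one')
    validation_error = None
except pd.errors.MergeError as exc:
    validation_error = exc
    print(f"Merge rejected: {exc}")

# Check 3: which order keys have no matching customer?
missing = sorted(set(orders['customer_id']) - set(customers['customer_id']))
print(missing)
```

Without checks like these, a duplicated key silently multiplies rows in the merged result, and unmatched keys silently disappear from an inner merge.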
🎯 Key Combination Concepts
🚀 What's Next?
Ready to start combining your data like a pro? Let's begin with concatenation - the simpler method for stacking similar DataFrames.
Start with: Concatenating DataFrames
Time to combine! 🔗✨