🔗 Combining Data

Real-world data often comes from multiple sources! Combining DataFrames lets you analyze that data as a whole, whether you're stacking monthly reports, merging customer data with sales records, or joining information from different databases.

Think of combining data like assembling a puzzle: each DataFrame is a piece, and you need to fit them together correctly to see the complete picture.

import pandas as pd

# Customer information
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['NYC', 'LA', 'Chicago']
})

# Order information
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 2, 1],
    'amount': [250, 150, 300]
})

print("Customers:")
print(customers)
print()
print("Orders:")
print(orders)
print()

# Combine: merge customer info into orders (inner join by default,
# so Charlie, who has no orders, does not appear in the result)
combined = orders.merge(customers, on='customer_id')
print("Combined data:")
print(combined)
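
The merge above relies on pandas' default inner join, which keeps only the customer_id values found in both tables. As a minimal sketch of the alternative, the snippet below reuses the same sample data and passes `how='left'` to keep every customer. The names `inner` and `all_customers` are illustrative, not part of the original example.

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['NYC', 'LA', 'Chicago']
})

orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 2, 1],
    'amount': [250, 150, 300]
})

# Default merge is an inner join: only customer_ids present in both tables survive
inner = orders.merge(customers, on='customer_id')

# A left join starting from customers keeps every customer;
# customers without orders get NaN in the order columns
all_customers = customers.merge(orders, on='customer_id', how='left')

print("Inner join:")
print(inner)
print()
print("Left join (all customers):")
print(all_customers)
```

Which join to use depends on the question: an inner join suits "what did customers order?", while a left join suits "which customers have not ordered yet?".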

🎯 Why Combine Data?

Data combination unlocks powerful analysis possibilities:

📚 What You'll Learn in This Section

Master the essential data combination techniques:

🛠️ Types of Data Combination

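There are two main ways to combine DataFrames: concatenation, which stacks similar DataFrames on top of each other, and merging, which joins related DataFrames on a shared key column.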

📊 Quick Combination Examples

Here's a preview of what you can do:

import pandas as pd

# Example 1: Concatenation (stacking monthly data)
jan_sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse'],
    'sales': [5, 20]
})

feb_sales = pd.DataFrame({
    'product': ['Laptop', 'Keyboard'],
    'sales': [8, 15]
})

print("January sales:")
print(jan_sales)
print()
print("February sales:")
print(feb_sales)
print()

# Concatenate (stack) the monthly data
combined_sales = pd.concat([jan_sales, feb_sales], ignore_index=True)
print("Combined January and February data:")
print(combined_sales)
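
Once the two months are stacked, nothing in the result says which month a row came from. One possible way to keep that information, sketched here with the same sample data, is to add a `month` column before concatenating. The `month` column and the names `stacked` and `totals` are assumptions made for this illustration.

```python
import pandas as pd

jan_sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse'],
    'sales': [5, 20]
})

feb_sales = pd.DataFrame({
    'product': ['Laptop', 'Keyboard'],
    'sales': [8, 15]
})

# Tag each frame with its month before stacking so the source stays visible
stacked = pd.concat(
    [jan_sales.assign(month='Jan'), feb_sales.assign(month='Feb')],
    ignore_index=True
)
print(stacked)
print()

# With the month preserved, per-product totals across months are one groupby away
totals = stacked.groupby('product')['sales'].sum()
print(totals)
```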

import pandas as pd

# Example 2: Merging (joining related data)
products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'category': ['Electronics', 'Accessories', 'Accessories']
})

sales = pd.DataFrame({
    'product_id': [1, 2, 1],
    'quantity': [2, 10, 1],
    'revenue': [2000, 250, 1000]
})

print("Products:")
print(products)
print()
print("Sales:")
print(sales)
print()

# Merge product details with sales (Keyboard has no sales rows,
# so the default inner join leaves it out)
detailed_sales = sales.merge(products, on='product_id')
print("Sales with product details:")
print(detailed_sales)
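
To see which rows found a match during a merge, pandas offers the `indicator` and `validate` parameters of `merge`. The following is a rough sketch on the same sample data, not a required step. The names `checked` and `unsold` are illustrative.

```python
import pandas as pd

products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'category': ['Electronics', 'Accessories', 'Accessories']
})

sales = pd.DataFrame({
    'product_id': [1, 2, 1],
    'quantity': [2, 10, 1],
    'revenue': [2000, 250, 1000]
})

# how='outer' keeps unmatched rows from both sides
# validate='many_to_one' raises an error if product_id repeats in products
# indicator=True adds a _merge column: 'both', 'left_only', or 'right_only'
checked = sales.merge(
    products,
    on='product_id',
    how='outer',
    validate='many_to_one',
    indicator=True
)
print(checked)
print()

# Products that never appear in sales show up as 'right_only'
unsold = checked[checked['_merge'] == 'right_only']
print("Products with no sales:")
print(unsold[['product_id', 'name']])
```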

🎨 Combination Workflow

📈 Real-World Scenarios

Common business cases for data combination:

import pandas as pd

# Scenario: E-commerce analysis
# Customer demographics
demographics = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age_group': ['25-34', '35-44', '18-24'],
    'region': ['North', 'South', 'East']
})

# Purchase history
purchases = pd.DataFrame({
    'customer_id': [1, 1, 2, 3],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'amount': [999, 25, 75, 300]
})

print("Customer demographics:")
print(demographics)
print()
print("Purchase history:")
print(purchases)
print()

# Combine for customer analysis
customer_analysis = purchases.merge(demographics, on='customer_id')
print("Customer purchase analysis:")
print(customer_analysis)
print()

# Summary by region
regional_summary = customer_analysis.groupby('region')['amount'].agg(['sum', 'count'])
print("Sales summary by region:")
print(regional_summary)
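
The regional summary above produces columns named `sum` and `count`. If more descriptive column names are preferred, named aggregation is one option. This sketch rebuilds the same merged table. The column names `total_amount` and `purchase_count` are chosen here for illustration only.

```python
import pandas as pd

demographics = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age_group': ['25-34', '35-44', '18-24'],
    'region': ['North', 'South', 'East']
})

purchases = pd.DataFrame({
    'customer_id': [1, 1, 2, 3],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'amount': [999, 25, 75, 300]
})

customer_analysis = purchases.merge(demographics, on='customer_id')

# Named aggregation: output_column=(input_column, function)
regional = customer_analysis.groupby('region').agg(
    total_amount=('amount', 'sum'),
    purchase_count=('amount', 'count')
)
print(regional)
```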

🔍 Data Quality Considerations
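
As a hedged illustration of the kind of checks that can be run before a merge, the sketch below inspects the key column of the e-commerce sample tables. The three checks and the variable names are assumptions chosen for this example, not an exhaustive list.

```python
import pandas as pd

demographics = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age_group': ['25-34', '35-44', '18-24'],
    'region': ['North', 'South', 'East']
})

purchases = pd.DataFrame({
    'customer_id': [1, 1, 2, 3],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'amount': [999, 25, 75, 300]
})

# Check 1: the lookup table should have one row per customer_id,
# otherwise the merge would duplicate purchase rows
keys_unique = demographics['customer_id'].is_unique

# Check 2: key columns should share a dtype (not int on one side, str on the other)
same_dtype = demographics['customer_id'].dtype == purchases['customer_id'].dtype

# Check 3: purchases whose customer_id has no match in demographics
unmatched = purchases[~purchases['customer_id'].isin(demographics['customer_id'])]

print("Keys unique:", keys_unique)
print("Same key dtype:", same_dtype)
print("Unmatched purchases:", len(unmatched))
```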

🎯 Key Combination Concepts

🚀 What's Next?

Ready to start combining your data like a pro? Let's begin with concatenation, the simpler of the two methods, which stacks similar DataFrames.

Start with: Concatenating DataFrames

Time to combine! 🔗✨
