📚 Concatenating DataFrames
Concatenation is the simpler way to combine DataFrames! It's like stacking papers on top of each other or placing them side by side. Use concatenation when your DataFrames have similar structures and you want to combine them into one larger dataset.
Think of concatenation like combining monthly reports into a quarterly report - you're stacking similar data together.
import pandas as pd
# Monthly sales data
january = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard'],
'sales': [10, 25, 15]
})
february = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Monitor'],
'sales': [12, 30, 8]
})
print("January sales:")
print(january)
print()
print("February sales:")
print(february)
print()
# Concatenate vertically (stack on top)
quarterly = pd.concat([january, february], ignore_index=True)
print("Combined quarterly sales:")
print(quarterly)
🎯 Understanding Concatenation
Concatenation combines DataFrames along an axis (rows or columns):
📊 Vertical Concatenation (Stacking Rows)
Most common use case - stacking DataFrames with same columns:
import pandas as pd
# Department sales data
sales_dept = pd.DataFrame({
'employee': ['Alice', 'Bob'],
'department': ['Sales', 'Sales'],
'revenue': [50000, 45000]
})
it_dept = pd.DataFrame({
'employee': ['Charlie', 'Diana'],
'department': ['IT', 'IT'],
'revenue': [75000, 80000]
})
print("Sales department:")
print(sales_dept)
print()
print("IT department:")
print(it_dept)
print()
# Concatenate departments
all_employees = pd.concat([sales_dept, it_dept], ignore_index=True)
print("All employees:")
print(all_employees)
Handling Different Columns
import pandas as pd
# DataFrames with some different columns
online_sales = pd.DataFrame({
'product': ['Laptop', 'Mouse'],
'sales': [5, 20],
'website': ['Amazon', 'eBay']
})
store_sales = pd.DataFrame({
'product': ['Keyboard', 'Monitor'],
'sales': [10, 3],
'store_location': ['NYC', 'LA']
})
print("Online sales:")
print(online_sales)
print()
print("Store sales:")
print(store_sales)
print()
# Concatenate with different columns
all_sales = pd.concat([online_sales, store_sales], ignore_index=True)
print("All sales (missing values filled with NaN):")
print(all_sales)
↔️ Horizontal Concatenation (Side by Side)
Combining DataFrames side by side when they have the same number of rows:
import pandas as pd
# Customer basic info
basic_info = pd.DataFrame({
'customer_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})
# Customer contact info
contact_info = pd.DataFrame({
'email': ['alice@email.com', 'bob@email.com', 'charlie@email.com'],
'phone': ['555-0101', '555-0102', '555-0103']
})
print("Basic info:")
print(basic_info)
print()
print("Contact info:")
print(contact_info)
print()
# Concatenate horizontally
complete_info = pd.concat([basic_info, contact_info], axis=1)
print("Complete customer info:")
print(complete_info)
🏷️ Adding Source Labels
You can track which DataFrame each row came from:
import pandas as pd
# Regional sales
north_sales = pd.DataFrame({
'product': ['Laptop', 'Mouse'],
'sales': [100, 150]
})
south_sales = pd.DataFrame({
'product': ['Laptop', 'Keyboard'],
'sales': [80, 120]
})
print("North region:")
print(north_sales)
print()
print("South region:")
print(south_sales)
print()
# Concatenate with source labels
regional_sales = pd.concat([north_sales, south_sales],
keys=['North', 'South'],
names=['Region', 'Row'])
print("Regional sales with labels:")
print(regional_sales)
print()
# Reset index to make it a regular DataFrame
regional_flat = regional_sales.reset_index()
print("Flattened with region column:")
print(regional_flat)
🔧 Advanced Concatenation Options
Handling Index Conflicts
import pandas as pd
# DataFrames with overlapping indices
q1_data = pd.DataFrame({
'revenue': [100, 200]
}, index=[1, 2])
q2_data = pd.DataFrame({
'revenue': [150, 180]
}, index=[2, 3]) # Note: index 2 appears in both
print("Q1 data:")
print(q1_data)
print()
print("Q2 data:")
print(q2_data)
print()
# Option 1: Keep original indices (default)
combined_keep = pd.concat([q1_data, q2_data])
print("Keep original indices (note duplicate index 2):")
print(combined_keep)
print()
# Option 2: Reset indices
combined_reset = pd.concat([q1_data, q2_data], ignore_index=True)
print("Reset indices:")
print(combined_reset)
Join Types in Concatenation
import pandas as pd
# DataFrames with some overlapping columns
df1 = pd.DataFrame({
'A': [1, 2],
'B': [3, 4],
'C': [5, 6]
})
df2 = pd.DataFrame({
'B': [7, 8],
'C': [9, 10],
'D': [11, 12]
})
print("DataFrame 1:")
print(df1)
print()
print("DataFrame 2:")
print(df2)
print()
# Outer join (default) - keeps all columns
outer_join = pd.concat([df1, df2], ignore_index=True)
print("Outer join (all columns):")
print(outer_join)
print()
# Inner join - only common columns
inner_join = pd.concat([df1, df2], join='inner', ignore_index=True)
print("Inner join (common columns only):")
print(inner_join)
📋 Real-World Example: Monthly Reports
Let's combine multiple monthly reports into a comprehensive analysis:
import pandas as pd
# Create monthly data
months = ['January', 'February', 'March']
monthly_data = []
for i, month in enumerate(months, 1):
month_df = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard'],
'sales': [10 + i*2, 20 + i*3, 15 + i*1],
'revenue': [(10 + i*2) * 1000, (20 + i*3) * 25, (15 + i*1) * 75]
})
monthly_data.append(month_df)
# Show individual months
for i, month in enumerate(months):
print(f"{month} sales:")
print(monthly_data[i])
print()
# Combine with month labels
quarterly = pd.concat(monthly_data,
keys=months,
names=['Month', 'Product_Index'])
print("Quarterly summary with month labels:")
print(quarterly)
print()
# Flatten for analysis
quarterly_flat = quarterly.reset_index()
print("Quarterly data for analysis:")
print(quarterly_flat)
print()
# Quick analysis: Total sales by product
product_totals = quarterly_flat.groupby('product')[['sales', 'revenue']].sum()
print("Total sales by product:")
print(product_totals)
⚡ Performance and Best Practices
🎯 Key Takeaways
🚀 What's Next?
Perfect! You now know how to concatenate DataFrames for simple data combination. Next, let's learn about merging and join operations - more sophisticated ways to combine related data.
Continue to: Merging and Join Operations
Keep concatenating! 📚🔗
Was this helpful?
Track Your Learning Progress
Sign in to bookmark tutorials and keep track of your learning journey.
Your progress is saved automatically as you read.