🔄 Data Transformation
Data transformation is where the magic happens! Once you have clean data, you need to shape it for analysis. This means adding new columns, modifying existing ones, and applying functions to transform your data into exactly what you need.
Think of transformation like cooking - you have your ingredients (raw data), and now you're combining, seasoning, and preparing them into something useful for analysis.
import pandas as pd
# Start with simple sales data
sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999, 25, 75],
    'quantity': [2, 5, 3]
})
print("Original data:")
print(sales)
print()
# Transform: Add total revenue column
sales['total'] = sales['price'] * sales['quantity']
# Transform: Add price category
sales['price_category'] = sales['price'].apply(lambda x: 'High' if x > 100 else 'Low')
print("After transformation:")
print(sales)
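The `apply` call above runs a Python function once per value. As an alternative sketch (not part of the original example), the same category column can be built with NumPy's `np.where`, which evaluates the condition on the whole column at once. This assumes NumPy is installed alongside pandas, which is normally the case.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999, 25, 75],
    'quantity': [2, 5, 3]
})

# Column arithmetic is vectorized: it operates on whole columns at once
sales['total'] = sales['price'] * sales['quantity']

# np.where(condition, value_if_true, value_if_false) applies the rule
# to every row without calling a Python lambda per value
sales['price_category'] = np.where(sales['price'] > 100, 'High', 'Low')

print(sales)
```

Both versions produce the same columns. On small DataFrames the difference is mostly stylistic; the vectorized form tends to matter more as row counts grow.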
🎯 Why Data Transformation Matters
Raw data is rarely in the exact format you need for analysis. Transformation helps you derive new values from existing columns, standardize text, and group records into categories.
📚 What You'll Learn in This Section
Master the essential data transformation techniques:
- ➕ Adding and Modifying Columns: Learn to create new columns and update existing ones with calculations and logic.
- 🔧 Applying Functions: Use built-in and custom functions to transform your data efficiently.
- 📝 Renaming and Sorting Data: Organize your DataFrame with better names and logical ordering.
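To make the three topics above concrete, here is a minimal sketch that touches each one in turn. The `scores` data, the `letter_grade` function, and the grade thresholds are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
scores = pd.DataFrame({
    'student': ['Ana', 'Ben', 'Cy'],
    'pts': [82, 95, 67]
})

# 1. Adding a column: a boolean flag computed from an existing column
scores['passed'] = scores['pts'] >= 70


# 2. Applying a function: a custom function run on each value
def letter_grade(points):
    """Map a numeric score to a letter (thresholds are made up)."""
    if points >= 90:
        return 'A'
    if points >= 80:
        return 'B'
    return 'C'


scores['grade'] = scores['pts'].apply(letter_grade)

# 3. Renaming and sorting: clearer column name, highest score first
scores = scores.rename(columns={'pts': 'points'})
scores = scores.sort_values('points', ascending=False).reset_index(drop=True)

print(scores)
```

Note that `rename` and `sort_values` return new DataFrames by default, so the result is assigned back to `scores`.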
🛠️ Common Transformation Patterns
Here are the most frequent data transformations you'll use:
import pandas as pd
# Employee data example
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 75000, 60000],
    'department': ['Sales', 'IT', 'Sales']
})
print("Common transformations:")
print()
# 1. Mathematical calculations
employees['annual_bonus'] = employees['salary'] * 0.1
# 2. Text transformations
employees['name_upper'] = employees['name'].str.upper()
# 3. Conditional logic
employees['seniority'] = employees['salary'].apply(
    lambda x: 'Senior' if x > 65000 else 'Junior'
)
print(employees)
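When a rule has more than two outcomes, nesting conditions inside a lambda gets hard to read. One possible approach, shown here as a sketch, is NumPy's `np.select`, which checks a list of conditions in order and uses the first one that matches. The `level` column, the 'Mid' tier, and the 60000 threshold are assumptions added for illustration; only the 65000 cutoff comes from the example above.

```python
import numpy as np
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 75000, 60000],
    'department': ['Sales', 'IT', 'Sales']
})

# Conditions are checked top to bottom; the first True one wins
conditions = [
    employees['salary'] > 65000,   # from the original example
    employees['salary'] >= 60000,  # hypothetical middle tier
]
choices = ['Senior', 'Mid']

# Rows that match no condition fall back to the default label
employees['level'] = np.select(conditions, choices, default='Junior')

print(employees[['name', 'salary', 'level']])
```

Bob's salary satisfies both conditions, but he is labeled 'Senior' because that condition comes first in the list.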
🚀 Transformation Preview
Get a taste of what's coming:
import pandas as pd
# Product data
products = pd.DataFrame({
    'name': ['laptop pro', 'wireless mouse'],
    'price': [1299.99, 49.99],
    'category': ['electronics', 'electronics']
})
print("Before transformation:")
print(products)
print()
# Multiple transformations
products['name_clean'] = products['name'].str.title()
products['price_rounded'] = products['price'].round(0)
products['price_range'] = pd.cut(
    products['price'],
    bins=[0, 100, 500, 2000],
    labels=['Low', 'Medium', 'High']
)
print("After transformation:")
print(products)
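The `pd.cut` call above is worth a closer look, because the bin edges decide where boundary values land. With the default settings each bin includes its right edge and excludes its left edge, and values outside every bin become missing (`NaN`). The sketch below uses made-up prices chosen to sit on or near the edges.

```python
import pandas as pd

# Hypothetical prices picked to probe the bin edges
prices = pd.Series([49.99, 100.00, 100.01, 1299.99, 2500.00])

# Bins are (0, 100], (100, 500], (500, 2000] by default (right=True)
price_range = pd.cut(
    prices,
    bins=[0, 100, 500, 2000],
    labels=['Low', 'Medium', 'High']
)

# Show each price next to the label it received
print(pd.DataFrame({'price': prices, 'price_range': price_range}))
```

Here exactly 100.00 is 'Low', 100.01 is 'Medium', and 2500.00 gets no label because it exceeds the last edge. A price of exactly 0 would also be unlabeled unless you pass `include_lowest=True`.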
🚀 What's Next?
Ready to start transforming your data? Let's begin with adding and modifying columns - the foundation of data transformation.
Start with: Adding and Modifying Columns
Time to transform your data! 🔄✨