🔧 Applying Functions
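Functions are powerful tools for transforming data. Instead of writing explicit loops, you can apply a function to an entire column or DataFrame at once. Built-in vectorized methods such as the .str accessor are usually much faster than a Python loop, while .apply() with a custom function mainly makes your code cleaner and more readable.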
Think of functions as specialized tools - each one designed for a specific job, and pandas helps you use them efficiently on your data.
import pandas as pd
# Customer data
customers = pd.DataFrame({
    'name': ['alice smith', 'BOB JONES', 'Charlie Brown'],
    'email': ['ALICE@email.com', 'bob@EMAIL.COM', 'charlie@company.org'],
    'age': [25, 30, 35]
})
print("Original data:")
print(customers)
print()
# Apply functions to clean the data
customers['name_clean'] = customers['name'].str.title()
customers['email_clean'] = customers['email'].str.lower()
customers['age_group'] = customers['age'].apply(lambda x: 'Young' if x < 30 else 'Older')
print("After applying functions:")
print(customers[['name_clean', 'email_clean', 'age_group']])
🎯 What Are Functions in Pandas?
Functions are instructions that transform data. Pandas provides many built-in functions, such as string and math methods on a column, and you can also write your own and run them with .apply().
📝 String Functions
String functions help you clean and format text data:
import pandas as pd
# Messy text data
data = pd.DataFrame({
    'product': [' iPhone 14 ', 'SAMSUNG GALAXY', 'google pixel'],
    'description': ['Latest Apple phone', 'Android smartphone', 'Google phone']
})
print("Messy data:")
print(data)
print()
# Apply string functions
data['product_clean'] = data['product'].str.strip().str.title()
data['product_upper'] = data['product'].str.upper()
data['description_short'] = data['description'].str[:10] + '...'
print("Cleaned data:")
print(data[['product_clean', 'description_short']])
Common String Operations
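As an illustrative sketch, here are a few .str methods that come up often. The products Series and the variable names below are hypothetical and chosen only for this example:

```python
import pandas as pd

# Hypothetical product names, used only for illustration
products = pd.Series(['  iPhone 14 ', 'SAMSUNG GALAXY', 'google pixel'])

cleaned = products.str.strip()                          # drop leading/trailing whitespace
lowered = cleaned.str.lower()                           # lowercase every value
has_google = lowered.str.contains('google')             # boolean mask, one value per row
replaced = lowered.str.replace(' ', '_', regex=False)   # literal (non-regex) replacement
lengths = cleaned.str.len()                             # character count per value
first_word = cleaned.str.split().str[0]                 # first whitespace-separated token

print(pd.DataFrame({
    'cleaned': cleaned,
    'has_google': has_google,
    'replaced': replaced,
    'length': lengths,
    'first_word': first_word,
}))
```

Each call returns a new Series, so the methods can be chained the same way .str.strip().str.title() is chained above.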
🔢 Mathematical Functions
Apply math operations to numerical data:
import pandas as pd
# Sales data with some negative values (returns)
sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard'],
    'revenue': [1250.75, -29.99, 85.50],  # negative = return
    'tax_rate': [0.085, 0.085, 0.085]
})
print("Sales data:")
print(sales)
print()
# Apply mathematical functions
sales['revenue_abs'] = sales['revenue'].abs() # Remove negative signs
sales['revenue_rounded'] = sales['revenue'].round(0) # Round to whole numbers
sales['tax_amount'] = (sales['revenue'] * sales['tax_rate']).round(2)
print("With math functions:")
print(sales)
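NumPy functions can also be called directly on a column. The sketch below reuses the revenue figures from the example above. The choice of log1p, clip, and sign is only an illustration, not something the tutorial prescribes:

```python
import numpy as np
import pandas as pd

revenue = pd.Series([1250.75, -29.99, 85.50], name='revenue')

# NumPy ufuncs accept a Series and return a Series with the same index
revenue_log = np.log1p(revenue.abs())     # log(1 + |x|) for every value
revenue_sign = np.sign(revenue)           # -1.0 for returns, 1.0 for sales

# Series.clip limits values to a range; here returns are treated as zero
revenue_clipped = revenue.clip(lower=0)

print(pd.DataFrame({
    'revenue': revenue,
    'log1p_abs': revenue_log.round(3),
    'sign': revenue_sign,
    'clipped': revenue_clipped,
}))
```

Because these operate on the whole column at once, no .apply() call is needed.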
🔨 Custom Functions with Apply
Sometimes you need custom logic that built-in functions can't handle:
Lambda Functions (Quick Custom Functions)
import pandas as pd
# Employee data
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 75000, 65000],
    'years_experience': [2, 8, 5]
})
print("Employee data:")
print(employees)
print()
# Apply lambda functions for quick transformations
employees['salary_k'] = employees['salary'].apply(lambda x: f"{x//1000}k")
employees['experience_level'] = employees['years_experience'].apply(
    lambda x: 'Senior' if x >= 5 else 'Junior'
)
employees['salary_category'] = employees['salary'].apply(
    lambda x: 'High' if x > 60000 else 'Standard'
)
print("With lambda functions:")
print(employees)
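For a simple two-way condition like the salary category, numpy.where is one possible alternative to a lambda. This is a minimal sketch using the same salary values. The names via_apply and via_where are made up for the comparison:

```python
import numpy as np
import pandas as pd

employees = pd.DataFrame({'salary': [50000, 75000, 65000]})

# Lambda version: the function is called once per value
via_apply = employees['salary'].apply(
    lambda x: 'High' if x > 60000 else 'Standard'
)

# np.where version: the comparison runs on the whole column at once
via_where = pd.Series(
    np.where(employees['salary'] > 60000, 'High', 'Standard'),
    index=employees.index,
)

print(via_apply.tolist())
print(via_where.tolist())
```

Both produce the same labels. The lambda form is easier to extend with extra branches, while np.where tends to scale better on large columns.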
Regular Custom Functions
For more complex logic, create regular functions:
import pandas as pd
def categorize_grade(score):
    """Convert numerical score to letter grade"""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

def format_phone(phone):
    """Format phone number"""
    # Remove all non-digits
    digits = ''.join(filter(str.isdigit, str(phone)))
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    # Leave anything that is not a 10-digit number unchanged
    return phone
# Student data
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [95, 83, 67],
    'phone': ['1234567890', '555-123-4567', '9876543210']
})
print("Student data:")
print(students)
print()
# Apply custom functions
students['letter_grade'] = students['score'].apply(categorize_grade)
students['phone_formatted'] = students['phone'].apply(format_phone)
print("With custom functions:")
print(students)
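A custom function can also take extra parameters. In the sketch below, pass_fail and its threshold parameter are hypothetical names introduced for illustration. Series.apply forwards additional keyword arguments to the function:

```python
import pandas as pd

def pass_fail(score, threshold):
    """Return 'Pass' when score meets the threshold, else 'Fail'."""
    return 'Pass' if score >= threshold else 'Fail'

scores = pd.Series([95, 83, 67])

# Keyword arguments given after the function are passed along on every call
result_70 = scores.apply(pass_fail, threshold=70)
result_90 = scores.apply(pass_fail, threshold=90)

print(result_70.tolist())
print(result_90.tolist())
```

This keeps one function reusable across different cutoffs instead of hard-coding each value.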
📊 Applying Functions to Multiple Columns
Sometimes you need to use data from multiple columns:
import pandas as pd
def calculate_bmi(row):
    """Calculate BMI from height and weight"""
    height_m = row['height_cm'] / 100
    bmi = row['weight_kg'] / (height_m ** 2)
    return round(bmi, 1)

def full_address(row):
    """Combine address components"""
    return f"{row['street']}, {row['city']}, {row['state']}"
# Health data
health = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'height_cm': [165, 180, 175],
    'weight_kg': [65, 80, 70],
    'street': ['123 Main St', '456 Oak Ave', '789 Pine Rd'],
    'city': ['Boston', 'Chicago', 'Denver'],
    'state': ['MA', 'IL', 'CO']
})
print("Health data:")
print(health[['name', 'height_cm', 'weight_kg']])
print()
# Apply functions using multiple columns
health['bmi'] = health.apply(calculate_bmi, axis=1)
health['full_address'] = health.apply(full_address, axis=1)
print("With calculated values:")
print(health[['name', 'bmi', 'full_address']])
⚡ Performance Tips
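One common tip is to prefer whole-column arithmetic over a row-by-row apply when the calculation allows it. The sketch below assumes the same height and weight values as the BMI example and computes the result both ways. The names bmi_apply and bmi_vectorized are made up for the comparison:

```python
import pandas as pd

health = pd.DataFrame({
    'height_cm': [165, 180, 175],
    'weight_kg': [65, 80, 70],
})

# Row-wise version: a Python function is called once per row
bmi_apply = health.apply(
    lambda row: round(row['weight_kg'] / (row['height_cm'] / 100) ** 2, 1),
    axis=1,
)

# Vectorized version: the arithmetic runs on entire columns
bmi_vectorized = (health['weight_kg'] / (health['height_cm'] / 100) ** 2).round(1)

print(bmi_apply.tolist())
print(bmi_vectorized.tolist())
```

The two approaches give matching values here. On large DataFrames the vectorized form is typically much faster, so apply with axis=1 is best kept for logic that cannot be written as column operations.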
🔄 Alternative: Map for Simple Replacements
For simple value replacements, .map() with a dictionary is usually simpler and often faster than .apply():
import pandas as pd
# Survey responses
survey = pd.DataFrame({
    'response': ['Y', 'N', 'Y', 'N', 'Y'],
    'rating': [1, 2, 3, 4, 5]
})
print("Survey data:")
print(survey)
print()
# Use map for simple replacements
response_map = {'Y': 'Yes', 'N': 'No'}
rating_map = {1: 'Poor', 2: 'Fair', 3: 'Good', 4: 'Very Good', 5: 'Excellent'}
survey['response_text'] = survey['response'].map(response_map)
survey['rating_text'] = survey['rating'].map(rating_map)
print("With mapped values:")
print(survey)
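One behavior worth knowing: when a value has no matching key in the dictionary, .map() returns NaN for it. A minimal sketch, assuming a hypothetical 'Maybe' response that is not in the mapping:

```python
import pandas as pd

responses = pd.Series(['Y', 'N', 'Maybe'])
response_map = {'Y': 'Yes', 'N': 'No'}

# 'Maybe' has no key in response_map, so it becomes NaN
mapped = responses.map(response_map)

# Fill the gaps with a default label if missing values are not wanted
mapped_filled = mapped.fillna('Unknown')

print(mapped.tolist())
print(mapped_filled.tolist())
```

Checking mapped.isna() after a map call is a quick way to spot values the dictionary did not cover.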
🎯 Key Takeaways
🚀 What's Next?
Excellent! You now know how to apply functions to transform your data efficiently. Next, let's learn about renaming columns and sorting data to organize your DataFrames.
Continue to: Renaming and Sorting Data
Keep applying those functions! 🔧✨