✅ Boolean and Conditional Selection
Boolean selection is like having a smart filter that shows you only the data that meets your conditions. It's the difference between "show me all customers" and "show me customers who spent more than $1000 and live in New York."
🎯 Simple Conditions
Start with basic True/False questions about your data:
import pandas as pd
# Student data
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [20, 22, 19, 21, 23],
    'grade': ['A', 'B', 'A', 'C', 'B'],
    'score': [95, 82, 88, 76, 85]
})
print("Student Data:")
print(students)
print()
print("✅ Simple Conditions:")
print()
# Students over 20
print("1️⃣ Students over 20:")
older_students = students[students['age'] > 20]
print(older_students)
print()
# Grade A students
print("2️⃣ Grade A students:")
a_students = students[students['grade'] == 'A']
print(a_students)
print()
# High scores (>85)
print("3️⃣ High scorers (>85):")
high_scorers = students[students['score'] > 85]
print(high_scorers[['name', 'score']])
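It may help to look at the intermediate step. The comparison on its own yields a Series of True/False values (often called a mask), and indexing with that mask keeps the rows marked True. The sketch below reuses two of the student columns from above; the variable names `mask` and `older_names` are only illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [20, 22, 19, 21, 23],
})

# The comparison alone returns one True/False value per row
mask = students['age'] > 20
print(mask.tolist())      # [False, True, False, True, True]

# True counts as 1, so sum() gives the number of matching rows
print(mask.sum())         # 3

# Indexing with the mask keeps only the rows marked True
older_names = students[mask]['name'].tolist()
print(older_names)        # ['Bob', 'Diana', 'Eve']
```

Printing the mask before applying it is a quick way to see why a row was kept or dropped.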
📊 Comparison Operators
Different ways to ask True/False questions:
| Operator | Meaning | Example |
|---|---|---|
| == | Equal to | df['grade'] == 'A' |
| != | Not equal to | df['grade'] != 'F' |
| > | Greater than | df['age'] > 18 |
| >= | Greater than or equal to | df['score'] >= 90 |
| < | Less than | df['price'] < 100 |
| <= | Less than or equal to | df['stock'] <= 10 |
import pandas as pd
# Product inventory
products = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Tablet'],
    'price': [999, 25, 75, 300, 450],
    'stock': [5, 50, 20, 8, 12],
    'category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Computer']
})
print("Product Inventory:")
print(products)
print()
print("📊 Different Comparisons:")
print()
print("💰 Expensive items (>= $300):")
expensive = products[products['price'] >= 300]
print(expensive[['product', 'price']])
print()
print("📦 Low stock (< 10):")
low_stock = products[products['stock'] < 10]
print(low_stock[['product', 'stock']])
print()
print("🖥️ Not accessories:")
not_accessories = products[products['category'] != 'Accessory']
print(not_accessories[['product', 'category']])
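The examples above filter the rows first and then pick columns in a second step. One possible alternative, sketched here under the assumption that you want both in a single expression, is `.loc`, which accepts the row condition and the column list together.

```python
import pandas as pd

products = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Tablet'],
    'price': [999, 25, 75, 300, 450],
    'stock': [5, 50, 20, 8, 12],
})

# Rows where stock < 10, columns 'product' and 'stock', in one step
low_stock = products.loc[products['stock'] < 10, ['product', 'stock']]
print(low_stock)
```

Both styles select the same rows here; `.loc` mainly saves the intermediate DataFrame.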
🔗 Combining Conditions
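Combine multiple conditions with & (AND) and | (OR). Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==: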
import pandas as pd
# Employee data
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [25, 35, 30, 28, 45],
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 55000, 48000, 80000],
    'years': [2, 8, 5, 3, 12]
})
print("Employee Data:")
print(employees)
print()
print("🔗 Combined Conditions:")
print()
print("1️⃣ Young AND high paid (age < 30 AND salary > 50000):")
young_high_paid = employees[
    (employees['age'] < 30) & (employees['salary'] > 50000)
]
print(young_high_paid[['name', 'age', 'salary']])
print()
print("2️⃣ Sales OR IT department:")
sales_or_it = employees[
    (employees['department'] == 'Sales') | (employees['department'] == 'IT')
]
print(sales_or_it[['name', 'department']])
print()
print("3️⃣ Experienced (>5 years) AND well-paid (>60000):")
experienced_well_paid = employees[
    (employees['years'] > 5) & (employees['salary'] > 60000)
]
print(experienced_well_paid[['name', 'years', 'salary']])
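Besides & and |, pandas also supports ~ (NOT) to invert a condition. The sketch below shows it next to a second habit that tends to keep longer filters readable: storing each condition in a named variable first. The names `is_it` and `under_40` are made up for this illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [25, 35, 30, 28, 45],
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
})

# ~ inverts a condition: everyone who is NOT in Sales
not_sales = employees[~(employees['department'] == 'Sales')]
print(not_sales['name'].tolist())   # ['Bob', 'Diana', 'Eve']

# Naming each condition keeps longer filters readable
is_it = employees['department'] == 'IT'
under_40 = employees['age'] < 40
young_it = employees[is_it & under_40]
print(young_it['name'].tolist())    # ['Bob']
```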
📝 Text-Based Conditions
The .str accessor provides string methods such as startswith() and contains() for filtering text columns:
import pandas as pd
# Customer data
customers = pd.DataFrame({
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown', 'Diana Lee'],
    'email': ['alice@gmail.com', 'bob@yahoo.com', 'charlie@gmail.com', 'diana@outlook.com'],
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'status': ['Active', 'Inactive', 'Active', 'Active']
})
print("Customer Data:")
print(customers)
print()
print("📝 Text Filtering:")
print()
print("1️⃣ Names starting with 'A':")
a_names = customers[customers['name'].str.startswith('A')]
print(a_names[['name']])
print()
print("2️⃣ Gmail users:")
gmail_users = customers[customers['email'].str.contains('gmail')]
print(gmail_users[['name', 'email']])
print()
print("3️⃣ New York customers who are active:")
ny_active = customers[
    (customers['city'] == 'New York') & (customers['status'] == 'Active')
]
print(ny_active[['name', 'city', 'status']])
print()
print("4️⃣ Names containing 'o':")
names_with_o = customers[customers['name'].str.contains('o', case=False)]
print(names_with_o[['name']])
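The customer data above has no gaps, but real text columns often do. When a value is missing, the mask from str.contains() can itself contain a missing value, which pandas cannot use for filtering. A minimal sketch of one way to handle this, using a hypothetical `contacts` table with one missing email and the `na=False` argument:

```python
import pandas as pd

# Hypothetical data: Bob has no email on record
contacts = pd.DataFrame({
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown'],
    'email': ['alice@gmail.com', None, 'charlie@gmail.com'],
})

# na=False treats a missing email as "no match" instead of leaving a gap in the mask
is_gmail = contacts['email'].str.contains('gmail', na=False)
gmail_users = contacts[is_gmail]
print(gmail_users['name'].tolist())   # ['Alice Johnson', 'Charlie Brown']
```

Whether a missing value should count as a non-match depends on the question being asked, so treat `na=False` as a choice rather than a default.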
📋 Using isin() for Multiple Values
Filter for multiple specific values at once:
import pandas as pd
# Order data
orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004, 1005, 1006],
    'customer': ['Alice', 'Bob', 'Charlie', 'Alice', 'Diana', 'Bob'],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Tablet', 'Laptop'],
    'status': ['Shipped', 'Pending', 'Delivered', 'Shipped', 'Cancelled', 'Delivered']
})
print("Order Data:")
print(orders)
print()
print("📋 Multiple Value Filtering:")
print()
print("1️⃣ Orders from Alice or Bob:")
specific_customers = orders[orders['customer'].isin(['Alice', 'Bob'])]
print(specific_customers[['order_id', 'customer', 'product']])
print()
print("2️⃣ Computer products (Laptop, Monitor, Tablet):")
computers = ['Laptop', 'Monitor', 'Tablet']
computer_orders = orders[orders['product'].isin(computers)]
print(computer_orders[['product', 'customer']])
print()
print("3️⃣ Active statuses (Shipped or Delivered):")
active_statuses = ['Shipped', 'Delivered']
active_orders = orders[orders['status'].isin(active_statuses)]
print(active_orders[['order_id', 'status']])
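isin() can also be inverted with ~ to exclude a list of values rather than keep them. The sketch below reuses the order statuses from above; the grouping into `closed_statuses` is an assumption made for the example, not a rule from the data.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004, 1005, 1006],
    'status': ['Shipped', 'Pending', 'Delivered', 'Shipped', 'Cancelled', 'Delivered'],
})

# Keep every order whose status is NOT in the list
closed_statuses = ['Delivered', 'Cancelled']
open_orders = orders[~orders['status'].isin(closed_statuses)]
print(open_orders['order_id'].tolist())   # [1001, 1002, 1004]
```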
🎯 Practical Filtering Examples
Real-world filtering scenarios:
import pandas as pd
# Sales data
sales = pd.DataFrame({
    'date': ['2023-01-15', '2023-02-10', '2023-01-25', '2023-03-05', '2023-02-20'],
    'salesperson': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'amount': [1500, 800, 2200, 600, 1800],
    'region': ['North', 'South', 'North', 'East', 'South'],
    'product_type': ['Software', 'Hardware', 'Software', 'Hardware', 'Software']
})
print("Sales Data:")
print(sales)
print()
print("🎯 Business Filtering Examples:")
print()
print("1️⃣ High-value sales (>$1500) in North region:")
high_value_north = sales[
    (sales['amount'] > 1500) & (sales['region'] == 'North')
]
print(high_value_north[['salesperson', 'amount', 'region']])
print()
print("2️⃣ Alice's software sales:")
alice_software = sales[
    (sales['salesperson'] == 'Alice') & (sales['product_type'] == 'Software')
]
print(alice_software[['date', 'amount']])
print()
print("3️⃣ Small sales (<$1000) OR East region:")
small_or_east = sales[
    (sales['amount'] < 1000) | (sales['region'] == 'East')
]
print(small_or_east[['salesperson', 'amount', 'region']])
print()
print("4️⃣ Top performers (Alice or Bob) with big sales (>$1200):")
top_big_sales = sales[
    sales['salesperson'].isin(['Alice', 'Bob']) & (sales['amount'] > 1200)
]
print(top_big_sales[['salesperson', 'amount']])
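The date column in the sales data is stored as text and is not used in the filters above. A hedged sketch of filtering by a date range, assuming it is acceptable to convert the column with pd.to_datetime first and that February 2023 is the period of interest:

```python
import pandas as pd

sales = pd.DataFrame({
    'date': ['2023-01-15', '2023-02-10', '2023-01-25', '2023-03-05', '2023-02-20'],
    'amount': [1500, 800, 2200, 600, 1800],
})

# Convert the text column to real dates so comparisons are chronological
sales['date'] = pd.to_datetime(sales['date'])

# Sales in February 2023 (between() includes both endpoints by default)
february = sales[sales['date'].between('2023-02-01', '2023-02-28')]
print(february['amount'].tolist())   # [800, 1800]
```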
🔍 Checking Your Filters
Always verify your filtering results, for example by comparing the number of rows kept with the size of the original DataFrame:
import pandas as pd
# Survey data
survey = pd.DataFrame({
    'respondent': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'age': [25, 45, 32, 28, 38, 52],
    'satisfaction': [4, 2, 5, 3, 4, 1],
    'would_recommend': [True, False, True, True, True, False]
})
print("Survey Data:")
print(survey)
print()
print("🔍 Filtering with Verification:")
print()
# Filter and check
satisfied_customers = survey[survey['satisfaction'] >= 4]
print("Satisfied customers (satisfaction >= 4):")
print(satisfied_customers)
print(f"Count: {len(satisfied_customers)} out of {len(survey)}")
print()
# Multiple conditions with check
promoters = survey[
    (survey['satisfaction'] >= 4) & survey['would_recommend']
]
print("Promoters (satisfied AND would recommend):")
print(promoters[['respondent', 'satisfaction', 'would_recommend']])
print(f"Promoter rate: {len(promoters)/len(survey)*100:.1f}%")
print()
# Check the low end of the scale as well
detractors = survey[survey['satisfaction'] <= 2]
print("Detractors (satisfaction <= 2):")
print(detractors[['respondent', 'satisfaction']])
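Another way to check a filter, sketched below, is to look at the rows it dropped as well as the rows it kept. Inverting the mask with ~ gives the dropped rows, and the two groups together should account for every row. The names `is_satisfied`, `kept`, and `dropped` are illustrative.

```python
import pandas as pd

survey = pd.DataFrame({
    'respondent': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'satisfaction': [4, 2, 5, 3, 4, 1],
})

is_satisfied = survey['satisfaction'] >= 4
kept = survey[is_satisfied]
dropped = survey[~is_satisfied]

# Every row should land in exactly one of the two groups
assert len(kept) + len(dropped) == len(survey)
print(f"Kept {len(kept)}, dropped {len(dropped)}")   # Kept 3, dropped 3

# value_counts() on the mask shows the True/False split at a glance
print(is_satisfied.value_counts())
```

Note that the dropped group here includes P4 (satisfaction 3), who is neither satisfied nor a detractor under the thresholds used above.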
⚠️ Common Filtering Mistakes
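The most common boolean selection pitfalls are using Python's and/or instead of & and |, and leaving out the parentheses around each condition. Both usually raise an error. The examples below show the correct forms: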
import pandas as pd
# Sample data for demonstrating correct usage
data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [85, 90, 75]
})
print("✅ Correct Boolean Selection Examples:")
print()
# Correct filtering examples
result1 = data[(data['age'] > 25) & (data['score'] > 80)]
print("High age AND high score:")
print(result1)
print()
result2 = data[data['name'].isin(['Alice', 'Bob'])]
print("Alice or Bob:")
print(result2)
print()
result3 = data[(data['age'] > 25) | (data['score'] > 85)]
print("High age OR high score:")
print(result3)
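The block above shows only the correct forms. For contrast, here is a sketch of two mistakes that commonly trip people up, each wrapped in try/except so the script keeps running and prints the error message. The `errors` list exists only to record which mistakes raised.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [85, 90, 75]
})

errors = []

# Mistake 1: Python's `and` cannot combine two Series
try:
    data[(data['age'] > 25) and (data['score'] > 80)]
except ValueError as exc:
    errors.append('and')
    print("Using 'and':", exc)

# Mistake 2: missing parentheses. & binds tighter than >, so this is
# read as data['age'] > (25 & data['score']) > 80
try:
    data[data['age'] > 25 & data['score'] > 80]
except ValueError as exc:
    errors.append('parentheses')
    print("Missing parentheses:", exc)

# Correct form: & with each condition in its own parentheses
result = data[(data['age'] > 25) & (data['score'] > 80)]
print(result['name'].tolist())   # ['Bob']
```

In both failing cases pandas is asked to reduce a whole Series to a single True or False, which it refuses to do because the answer is ambiguous.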
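🎯 Key Takeaways
A comparison on a column produces True/False values, and indexing with them keeps only the matching rows.
Combine conditions with & (AND) and | (OR), wrapping each condition in parentheses.
Use .str methods such as startswith() and contains() for text, and isin() to match any value in a list.
After filtering, check the row count to confirm the result is what you expected.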
🎮 Filtering Challenge
Practice your boolean selection skills:
import pandas as pd
# E-commerce data
products = pd.DataFrame({
    'product_id': [101, 102, 103, 104, 105, 106],
    'name': ['Gaming Laptop', 'Wireless Mouse', 'Keyboard Pro', 'Monitor 4K', 'Tablet Air', 'Phone Case'],
    'category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Computer', 'Accessory'],
    'price': [1299, 49, 129, 599, 449, 19],
    'rating': [4.5, 4.2, 4.8, 4.1, 4.6, 3.9],
    'in_stock': [True, True, False, True, True, True]
})
print("E-commerce Products:")
print(products)
print()
print("🎮 Filtering Challenges:")
print()
print("1️⃣ High-rated available products (rating >= 4.5 AND in stock):")
high_rated_available = products[
    (products['rating'] >= 4.5) & products['in_stock']
]
print(high_rated_available[['name', 'rating', 'in_stock']])
print()
print("2️⃣ Affordable computers (<$600):")
affordable_computers = products[
    (products['category'] == 'Computer') & (products['price'] < 600)
]
print(affordable_computers[['name', 'price']])
print()
print("3️⃣ Premium products (>$400) OR top-rated (>4.7):")
premium_or_top = products[
    (products['price'] > 400) | (products['rating'] > 4.7)
]
print(premium_or_top[['name', 'price', 'rating']])
print()
print(f"📊 Summary: Found {len(premium_or_top)} premium/top-rated products")
🚀 What's Next?
Fantastic! You now know how to filter data with precise conditions. Next, let's learn about cleaning your data to make it analysis-ready.
Continue to: Data Cleaning
You're mastering data selection like a pro! ✅🎯