✅ Boolean and Conditional Selection

Boolean selection is like having a smart filter that shows you only the data that meets your conditions. It's the difference between "show me all customers" and "show me customers who spent more than $1000 and live in New York."

🎯 Simple Conditions

Start with basic True/False questions about your data:

import pandas as pd

# Student data
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [20, 22, 19, 21, 23],
    'grade': ['A', 'B', 'A', 'C', 'B'],
    'score': [95, 82, 88, 76, 85]
})

print("Student Data:")
print(students)
print()

print("✅ Simple Conditions:")
print()

# Students over 20
print("1️⃣ Students over 20:")
older_students = students[students['age'] > 20]
print(older_students)
print()

# Grade A students
print("2️⃣ Grade A students:")
a_students = students[students['grade'] == 'A']
print(a_students)
print()

# High scores (>85)
print("3️⃣ High scorers (>85):")
high_scorers = students[students['score'] > 85]
print(high_scorers[['name', 'score']])
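
A comparison such as students['age'] > 20 does not filter anything by itself. It produces a Series of True/False values (often called a mask), and indexing the DataFrame with that mask keeps the rows marked True. The sketch below shows the intermediate step on a cut-down copy of the student table; the variable name mask is only an illustrative choice.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [20, 22, 19, 21, 23],
})

# The comparison alone yields one True/False value per row
mask = students['age'] > 20
print(mask.tolist())   # [False, True, False, True, True]

# Summing a boolean mask counts the True values
print(mask.sum())      # 3

# Indexing with the mask keeps only the rows marked True
print(students[mask]['name'].tolist())  # ['Bob', 'Diana', 'Eve']
```

Looking at the mask before applying it is a quick way to see why a row was kept or dropped.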

📊 Comparison Operators

Different ways to ask True/False questions:

Operator    Meaning                  Example
==          Equal to                 df['grade'] == 'A'
!=          Not equal to             df['grade'] != 'F'
>           Greater than             df['age'] > 18
>=          Greater than or equal    df['score'] >= 90
<           Less than                df['price'] < 100
<=          Less than or equal       df['stock'] <= 10

import pandas as pd

# Product inventory
products = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Tablet'],
    'price': [999, 25, 75, 300, 450],
    'stock': [5, 50, 20, 8, 12],
    'category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Computer']
})

print("Product Inventory:")
print(products)
print()

print("📊 Different Comparisons:")
print()

print("💰 Expensive items (>= $300):")
expensive = products[products['price'] >= 300]
print(expensive[['product', 'price']])
print()

print("📦 Low stock (< 10):")
low_stock = products[products['stock'] < 10]
print(low_stock[['product', 'stock']])
print()

print("🖥️ Not accessories:")
not_accessories = products[products['category'] != 'Accessory']
print(not_accessories[['product', 'category']])

🔗 Combining Conditions

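Combine conditions with & (AND) and | (OR). Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==: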

import pandas as pd

# Employee data
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [25, 35, 30, 28, 45],
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 55000, 48000, 80000],
    'years': [2, 8, 5, 3, 12]
})

print("Employee Data:")
print(employees)
print()

print("🔗 Combined Conditions:")
print()

print("1️⃣ Young AND high paid (age < 30 AND salary > 50000):")
# Nobody under 30 earns more than 50000 in this data, so the result is an empty DataFrame
young_high_paid = employees[
    (employees['age'] < 30) & (employees['salary'] > 50000)
]
print(young_high_paid[['name', 'age', 'salary']])
print()

print("2️⃣ Sales OR IT department:")
sales_or_it = employees[
    (employees['department'] == 'Sales') | (employees['department'] == 'IT')
]
print(sales_or_it[['name', 'department']])
print()

print("3️⃣ Experienced (>5 years) AND well-paid (>60000):")
experienced_well_paid = employees[
    (employees['years'] > 5) & (employees['salary'] > 60000)
]
print(experienced_well_paid[['name', 'years', 'salary']])
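
Alongside & and |, pandas also accepts ~ (NOT) to invert a condition. The following is a minimal sketch on a smaller copy of the employee table; the reduced column set and the variable names are assumptions made for brevity.

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 55000, 48000, 80000],
})

# ~ flips every True to False and every False to True
not_sales = employees[~(employees['department'] == 'Sales')]
print(not_sales['name'].tolist())  # ['Bob', 'Diana', 'Eve']

# Conditions can be mixed: NOT in IT, AND earning at least 50000
non_it_50k = employees[
    ~(employees['department'] == 'IT') & (employees['salary'] >= 50000)
]
print(non_it_50k['name'].tolist())  # ['Alice', 'Charlie']
```

For a single equality test, != gives the same result as ~(... == ...); ~ becomes more useful when inverting a longer combined condition.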

📝 Text-Based Conditions

Text columns expose string methods through the .str accessor, such as .str.startswith() and .str.contains():

import pandas as pd

# Customer data
customers = pd.DataFrame({
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown', 'Diana Lee'],
    'email': ['alice@gmail.com', 'bob@yahoo.com', 'charlie@gmail.com', 'diana@outlook.com'],
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'status': ['Active', 'Inactive', 'Active', 'Active']
})

print("Customer Data:")
print(customers)
print()

print("📝 Text Filtering:")
print()

print("1️⃣ Names starting with 'A':")
a_names = customers[customers['name'].str.startswith('A')]
print(a_names[['name']])
print()

print("2️⃣ Gmail users:")
gmail_users = customers[customers['email'].str.contains('gmail')]
print(gmail_users[['name', 'email']])
print()

print("3️⃣ New York customers who are active:")
ny_active = customers[
    (customers['city'] == 'New York') & (customers['status'] == 'Active')
]
print(ny_active[['name', 'city', 'status']])
print()

print("4️⃣ Names containing 'o':")
names_with_o = customers[customers['name'].str.contains('o', case=False)]
print(names_with_o[['name']])
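
Real text columns often contain missing values, and str.contains() treats its pattern as a regular expression by default. The sketch below, on a hypothetical contacts table that is not part of the examples above, passes na=False so missing emails count as "no match" and regex=False so the dot in '@gmail.com' is matched literally. How a missing value is reported without na=False can depend on the pandas version and column dtype, so setting it explicitly is the cautious choice.

```python
import pandas as pd

# Hypothetical data with one missing email
contacts = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['alice@gmail.com', None, 'charlie@outlook.com'],
})

# na=False: rows with a missing email are treated as not matching
# regex=False: the pattern is a plain substring, not a regular expression
gmail_mask = contacts['email'].str.contains('@gmail.com', na=False, regex=False)

print(gmail_mask.tolist())                    # [True, False, False]
print(contacts[gmail_mask]['name'].tolist())  # ['Alice']
```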

📋 Using isin() for Multiple Values

isin() checks whether each value appears in a list, which replaces a long chain of == comparisons joined with |:

import pandas as pd

# Order data
orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004, 1005, 1006],
    'customer': ['Alice', 'Bob', 'Charlie', 'Alice', 'Diana', 'Bob'],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Tablet', 'Laptop'],
    'status': ['Shipped', 'Pending', 'Delivered', 'Shipped', 'Cancelled', 'Delivered']
})

print("Order Data:")
print(orders)
print()

print("📋 Multiple Value Filtering:")
print()

print("1️⃣ Orders from Alice or Bob:")
specific_customers = orders[orders['customer'].isin(['Alice', 'Bob'])]
print(specific_customers[['order_id', 'customer', 'product']])
print()

print("2️⃣ Computer products (Laptop, Monitor, Tablet):")
computers = ['Laptop', 'Monitor', 'Tablet']
computer_orders = orders[orders['product'].isin(computers)]
print(computer_orders[['product', 'customer']])
print()

print("3️⃣ Active statuses (Shipped or Delivered):")
active_statuses = ['Shipped', 'Delivered']
active_orders = orders[orders['status'].isin(active_statuses)]
print(active_orders[['order_id', 'status']])
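
The same isin() call can be turned into an exclusion filter by inverting it with ~. This is a small sketch on a reduced copy of the order table; the list name excluded is an illustrative assumption.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004, 1005, 1006],
    'status': ['Shipped', 'Pending', 'Delivered', 'Shipped', 'Cancelled', 'Delivered'],
})

# Keep every order whose status is NOT in the excluded list
excluded = ['Pending', 'Cancelled']
kept = orders[~orders['status'].isin(excluded)]

print(kept['order_id'].tolist())  # [1001, 1003, 1004, 1006]
```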

🎯 Practical Filtering Examples

Real-world filtering scenarios:

import pandas as pd

# Sales data
sales = pd.DataFrame({
    'date': ['2023-01-15', '2023-02-10', '2023-01-25', '2023-03-05', '2023-02-20'],
    'salesperson': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'amount': [1500, 800, 2200, 600, 1800],
    'region': ['North', 'South', 'North', 'East', 'South'],
    'product_type': ['Software', 'Hardware', 'Software', 'Hardware', 'Software']
})

print("Sales Data:")
print(sales)
print()

print("🎯 Business Filtering Examples:")
print()

print("1️⃣ High-value sales (>$1500) in North region:")
high_value_north = sales[
    (sales['amount'] > 1500) & (sales['region'] == 'North')
]
print(high_value_north[['salesperson', 'amount', 'region']])
print()

print("2️⃣ Alice's software sales:")
alice_software = sales[
    (sales['salesperson'] == 'Alice') & (sales['product_type'] == 'Software')
]
print(alice_software[['date', 'amount']])
print()

print("3️⃣ Small sales (<$1000) OR East region:")
small_or_east = sales[
    (sales['amount'] < 1000) | (sales['region'] == 'East')
]
print(small_or_east[['salesperson', 'amount', 'region']])
print()

print("4️⃣ Top performers (Alice or Bob) with big sales (>$1200):")
top_big_sales = sales[
    (sales['salesperson'].isin(['Alice', 'Bob'])) & (sales['amount'] > 1200)
]
print(top_big_sales[['salesperson', 'amount']])
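
When a filter has three or more parts, one option is to store each condition in a named variable and combine the names afterwards. The sketch below uses a reduced copy of the sales table; the names is_big, is_south, and is_alice are illustrative, not a pandas convention.

```python
import pandas as pd

sales = pd.DataFrame({
    'salesperson': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'amount': [1500, 800, 2200, 600, 1800],
    'region': ['North', 'South', 'North', 'East', 'South'],
})

# Each mask answers one True/False question
is_big = sales['amount'] > 1200
is_south = sales['region'] == 'South'
is_alice = sales['salesperson'] == 'Alice'

# Big sales that were either made in the South or made by Alice
result = sales[is_big & (is_south | is_alice)]
print(result['amount'].tolist())  # [1500, 2200, 1800]
```

Named masks keep the final expression readable, and the inner parentheses make the intended grouping of AND and OR explicit.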

🔍 Checking Your Filters

Verify each filter by counting how many rows it kept and comparing that number with the total:

import pandas as pd

# Survey data
survey = pd.DataFrame({
    'respondent': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'age': [25, 45, 32, 28, 38, 52],
    'satisfaction': [4, 2, 5, 3, 4, 1],
    'would_recommend': [True, False, True, True, True, False]
})

print("Survey Data:")
print(survey)
print()

print("🔍 Filtering with Verification:")
print()

# Filter and check
satisfied_customers = survey[survey['satisfaction'] >= 4]
print("Satisfied customers (satisfaction >= 4):")
print(satisfied_customers)
print(f"Count: {len(satisfied_customers)} out of {len(survey)}")
print()

# Multiple conditions with check
# would_recommend is already a boolean column, so it can be used as a condition directly
promoters = survey[
    (survey['satisfaction'] >= 4) & survey['would_recommend']
]
print("Promoters (satisfied AND would recommend):")
print(promoters[['respondent', 'satisfaction', 'would_recommend']])
print(f"Promoter rate: {len(promoters)/len(survey)*100:.1f}%")
print()

# Check what was filtered out
detractors = survey[survey['satisfaction'] <= 2]
print("Detractors (satisfaction <= 2):")
print(detractors[['respondent', 'satisfaction']])
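
One further check, sketched here on a reduced copy of the survey table, is to split the data into kept and dropped rows with the same mask and confirm that the two parts add up to the original. The variable names are illustrative.

```python
import pandas as pd

survey = pd.DataFrame({
    'respondent': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'satisfaction': [4, 2, 5, 3, 4, 1],
})

mask = survey['satisfaction'] >= 4
kept = survey[mask]       # rows where the condition is True
dropped = survey[~mask]   # rows where the condition is False

# Every row should land in exactly one of the two groups
print(len(kept), len(dropped))                   # 3 3
print(len(kept) + len(dropped) == len(survey))   # True

# Spot-check the boundary: lowest kept value and highest dropped value
print(kept['satisfaction'].min())     # 4
print(dropped['satisfaction'].max())  # 3
```

Checking the boundary values is a simple way to catch a > that should have been >=.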

⚠️ Common Filtering Mistakes

Most boolean selection errors come from using Python's and/or instead of &/|, or from leaving out the parentheses around each condition. The examples below show the correct forms:

import pandas as pd

# Sample data for demonstrating correct usage
data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [85, 90, 75]
})

print("✅ Correct Boolean Selection Examples:")
print()

# Correct filtering examples
result1 = data[(data['age'] > 25) & (data['score'] > 80)]
print("High age AND high score:")
print(result1)
print()

result2 = data[data['name'].isin(['Alice', 'Bob'])]
print("Alice or Bob:")
print(result2)
print()

result3 = data[(data['age'] > 25) | (data['score'] > 85)]
print("High age OR high score:")
print(result3)
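
For contrast, here is a sketch of two mistakes and what happens when they run. It reuses the same three-row table; the errors list is only there to collect the messages for display.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [85, 90, 75]
})

errors = []

# Mistake 1: Python's `and` tries to reduce a whole Series to one True/False
try:
    data[(data['age'] > 25) and (data['score'] > 80)]
except ValueError as err:
    errors.append(str(err))

# Mistake 2: missing parentheses. & binds tighter than >, so Python reads this
# as data['age'] > (25 & data['score']) > 80, which also fails
try:
    data[data['age'] > 25 & data['score'] > 80]
except ValueError as err:
    errors.append(str(err))

for message in errors:
    print("ValueError:", message)

# Fix: use & and wrap each condition in parentheses
fixed = data[(data['age'] > 25) & (data['score'] > 80)]
print(fixed['name'].tolist())  # ['Bob']
```

Both mistakes raise a ValueError about the truth value of a Series being ambiguous, which is a useful message to recognize.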

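🎯 Key Takeaways

A comparison such as df['age'] > 20 gives True or False for every row, and df[condition] keeps the rows marked True.
Combine conditions with & (AND) and | (OR), wrapping each condition in parentheses.
Use .str.startswith() and .str.contains() to filter on text.
Use isin() to match any value from a list.
Check the row count after filtering to confirm the result is what you expected.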

🎮 Filtering Challenge

Practice your boolean selection skills:

import pandas as pd

# E-commerce data
products = pd.DataFrame({
    'product_id': [101, 102, 103, 104, 105, 106],
    'name': ['Gaming Laptop', 'Wireless Mouse', 'Keyboard Pro', 'Monitor 4K', 'Tablet Air', 'Phone Case'],
    'category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Computer', 'Accessory'],
    'price': [1299, 49, 129, 599, 449, 19],
    'rating': [4.5, 4.2, 4.8, 4.1, 4.6, 3.9],
    'in_stock': [True, True, False, True, True, True]
})

print("E-commerce Products:")
print(products)
print()

print("🎮 Filtering Challenges:")
print()

print("1️⃣ High-rated available products (rating >= 4.5 AND in stock):")
# in_stock is already a boolean column, so it can be used as a condition directly
high_rated_available = products[
    (products['rating'] >= 4.5) & products['in_stock']
]
print(high_rated_available[['name', 'rating', 'in_stock']])
print()

print("2️⃣ Affordable computers (<$600):")
affordable_computers = products[
    (products['category'] == 'Computer') & (products['price'] < 600)
]
print(affordable_computers[['name', 'price']])
print()

print("3️⃣ Premium products (>$400) OR top-rated (>4.7):")
premium_or_top = products[
    (products['price'] > 400) | (products['rating'] > 4.7)
]
print(premium_or_top[['name', 'price', 'rating']])
print()

print(f"📊 Summary: Found {len(premium_or_top)} premium/top-rated products")

🚀 What's Next?

Fantastic! You now know how to filter data with precise conditions. Next, let's learn about cleaning your data to make it analysis-ready.

Continue to: Data Cleaning

You're mastering data selection like a pro! ✅🎯
