📋 Selecting Columns and Rows
Learning to select specific columns and rows is fundamental to working with DataFrames. It's like knowing how to pick items from a menu - you want exactly what you need, nothing more, nothing less!
📊 Selecting Single Columns
The most basic selection - getting one column at a time:
import pandas as pd
# Sample data
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 22, 21],
    'grade': ['A', 'B', 'A'],
    'score': [85, 78, 92]
})
print("Original Data:")
print(students)
print()
print("📋 Single Column Selection:")
print()
# Get just names
names = students['name']
print("Student names:")
print(names)
print(f"Type: {type(names)}")
print()
# Get just scores
scores = students['score']
print("Student scores:")
print(scores)
print(f"Average score: {scores.mean()}")
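One detail worth checking early is that the number of brackets decides what you get back. The short sketch below uses a trimmed copy of the `students` data and compares `students['score']` with `students[['score']]`. The variable names `as_series` and `as_frame` are only illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [85, 78, 92]
})

# Single brackets with one column name return a Series (one-dimensional)
as_series = students['score']
print(type(as_series).__name__)  # Series
print(as_series.shape)           # (3,)

# Double brackets with a one-item list return a DataFrame (two-dimensional)
as_frame = students[['score']]
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```

Both hold the same values. Pick the Series when you want to compute on one column, and the DataFrame when later code expects a table.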
📊 Selecting Multiple Columns
When you need several columns but not all:
import pandas as pd
# Employee data
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'department': ['Sales', 'IT', 'Marketing'],
    'salary': [50000, 75000, 60000],
    'years': [2, 5, 3]
})
print("Employee Data:")
print(employees)
print()
print("📋 Multiple Column Selection:")
print()
# Select name and salary
basic_info = employees[['name', 'salary']]
print("Name and salary:")
print(basic_info)
print()
# Select name, department, and years
summary = employees[['name', 'department', 'years']]
print("Employee summary:")
print(summary)
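The inner brackets in a multi-column selection are an ordinary Python list, so the result follows the order of that list, and the list can live in a variable. A minimal sketch, reusing the `employees` data (the names `reordered` and `wanted` are illustrative):

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'department': ['Sales', 'IT', 'Marketing'],
    'salary': [50000, 75000, 60000],
    'years': [2, 5, 3]
})

# Columns come back in the order listed, not the original order
reordered = employees[['salary', 'name']]
print(list(reordered.columns))  # ['salary', 'name']

# The list of names can be built first and reused
wanted = ['name', 'years']
print(employees[wanted])
```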
🔢 Selecting Rows by Position
Select rows by their integer position with iloc. Positions start at 0, and a slice such as 0:2 stops before the end position:
import pandas as pd
# Product data
products = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'stock': [10, 50, 30, 8]
})
print("Product Data:")
print(products)
print()
print("🔢 Row Selection by Position:")
print()
# First row
first_product = products.iloc[0]
print("First product:")
print(first_product)
print()
# First 2 rows
first_two = products.iloc[0:2]
print("First 2 products:")
print(first_two)
print()
# Specific rows (1st and 3rd)
selected = products.iloc[[0, 2]]
print("1st and 3rd products:")
print(selected)
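Position and index label are the same thing only while the rows are in their original order. The sketch below sorts the `products` data by price to show that `iloc` follows the current row order, while the index labels travel with the rows. The names `by_price` and `cheapest` are illustrative.

```python
import pandas as pd

products = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'stock': [10, 50, 30, 8]
})

# Sorting reorders the rows but keeps their original index labels
by_price = products.sort_values('price')
print(by_price)

# iloc[0] is the first row as currently ordered (the cheapest product)
cheapest = by_price.iloc[0]
print(cheapest['product'])      # Mouse
print(by_price.index[0])        # 1 (label carried over from the original)

# Slices exclude the end position: 0:2 gives positions 0 and 1
print(len(by_price.iloc[0:2]))  # 2
```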
🎯 Combining Column and Row Selection
Select specific rows AND specific columns:
import pandas as pd
# Survey data
survey = pd.DataFrame({
    'respondent': ['Person1', 'Person2', 'Person3', 'Person4'],
    'age': [25, 34, 28, 45],
    'satisfaction': [4, 5, 3, 4],
    'city': ['NYC', 'LA', 'Chicago', 'Miami']
})
print("Survey Data:")
print(survey)
print()
print("🎯 Combined Selection:")
print()
# First 2 people, age and satisfaction only
subset = survey.iloc[0:2][['age', 'satisfaction']]
print("First 2 respondents - age and satisfaction:")
print(subset)
print()
# Specific people and columns
specific = survey.iloc[[0, 3]][['respondent', 'city']]
print("1st and 4th respondent - name and city:")
print(specific)
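Chaining two selections works, but `iloc` can also take rows and columns in a single call as `iloc[rows, columns]`, with both given as positions. The sketch below repeats the two selections above that way. Looking up column positions with `columns.get_indexer` is one option among several, and the name `col_positions` is illustrative.

```python
import pandas as pd

survey = pd.DataFrame({
    'respondent': ['Person1', 'Person2', 'Person3', 'Person4'],
    'age': [25, 34, 28, 45],
    'satisfaction': [4, 5, 3, 4],
    'city': ['NYC', 'LA', 'Chicago', 'Miami']
})

# Rows at positions 0-1, columns at positions 1 and 2 ('age', 'satisfaction')
subset = survey.iloc[0:2, [1, 2]]
print(subset)

# Look up column positions by name instead of counting by hand
col_positions = survey.columns.get_indexer(['respondent', 'city'])
specific = survey.iloc[[0, 3], col_positions]
print(specific)
```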
📋 Column Selection Patterns
Common patterns you'll use all the time:
import pandas as pd
# Sales data
sales = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03'],
    'product': ['A', 'B', 'C'],
    'quantity': [10, 5, 8],
    'revenue': [1000, 500, 800],
    'region': ['North', 'South', 'North']
})
print("Sales Data:")
print(sales)
print()
print("📋 Common Column Patterns:")
print()
# Pattern 1: Key metrics only
metrics = sales[['product', 'revenue']]
print("1️⃣ Key metrics:")
print(metrics)
print()
# Pattern 2: All except one column
# (Select all columns except 'date')
no_date = sales[['product', 'quantity', 'revenue', 'region']]
print("2️⃣ Everything except date:")
print(no_date)
print()
# Pattern 3: Related columns
financials = sales[['product', 'quantity', 'revenue']]
print("3️⃣ Financial data:")
print(financials)
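Typing out every column you want to keep gets tedious on wide tables. Two built-in shortcuts can help: `drop(columns=...)` for "everything except" and `select_dtypes` for "all columns of a given type". This is a small sketch on the same `sales` data, and the names `no_date` and `numeric` are illustrative.

```python
import pandas as pd

sales = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03'],
    'product': ['A', 'B', 'C'],
    'quantity': [10, 5, 8],
    'revenue': [1000, 500, 800],
    'region': ['North', 'South', 'North']
})

# Everything except 'date'; drop returns a new DataFrame and leaves sales unchanged
no_date = sales.drop(columns=['date'])
print(list(no_date.columns))  # ['product', 'quantity', 'revenue', 'region']

# Only the numeric columns
numeric = sales.select_dtypes(include='number')
print(list(numeric.columns))  # ['quantity', 'revenue']
```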
🔍 Working with Selected Data
Once you select data, you can analyze it immediately:
import pandas as pd
# Customer data
customers = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 35, 30, 28],
    'purchases': [5, 12, 8, 15],
    'total_spent': [500, 1200, 800, 1500]
})
print("Customer Data:")
print(customers)
print()
# Select and analyze spending data
spending = customers[['name', 'total_spent']]
print("💰 Spending Analysis:")
print(spending)
print(f"Average spending: ${spending['total_spent'].mean():.0f}")
print(f"Top spender: {spending.loc[spending['total_spent'].idxmax(), 'name']}")
print()
# Select and analyze purchase behavior
behavior = customers[['name', 'purchases']]
print("🛒 Purchase Behavior:")
print(behavior)
print(f"Average purchases: {behavior['purchases'].mean():.1f}")
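Reading from a selection is always safe. If you plan to change a selection, for example by adding a column, it is usually clearer to take an explicit copy first with `.copy()`. Some pandas versions may otherwise emit a warning about modifying a selection. A minimal sketch, where the column name `avg_per_purchase` is made up for illustration:

```python
import pandas as pd

customers = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 35, 30, 28],
    'purchases': [5, 12, 8, 15],
    'total_spent': [500, 1200, 800, 1500]
})

# Take an independent copy of the selected columns
spending = customers[['name', 'total_spent']].copy()

# Add a derived column to the copy only
spending['avg_per_purchase'] = customers['total_spent'] / customers['purchases']
print(spending)

# The original DataFrame still has its four columns
print(list(customers.columns))
```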
⚠️ Common Selection Mistakes
Most selection errors come from mixing up single and double brackets. The examples below show the correct forms and the type each one returns:
import pandas as pd
# Sample data for demonstrating correct selection
data = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'city': ['NYC', 'LA']
})
print("✅ Correct Selection Examples:")
print()
# Correct way to select multiple columns
result = data[['name', 'age']]
print("Multiple columns with double brackets:")
print(result)
print()
# Single column selection
single_col = data['name']
print("Single column selection:")
print(single_col)
print(f"Type: {type(single_col)}")
print()
# Multiple columns vs single column return types
print("📊 Understanding Return Types:")
print(f"Single column returns: {type(data['name'])}")
print(f"Multiple columns return: {type(data[['name', 'age']])}")
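For contrast, here is what the mistake itself looks like. With single brackets, pandas treats `'name', 'age'` as one key (a tuple) and looks for a single column with that label, which does not exist in this data. The sketch catches the resulting `KeyError` so the script keeps running. The `raised` flag is only there for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'city': ['NYC', 'LA']
})

raised = False
try:
    # Mistake: single brackets around two names
    data['name', 'age']
except KeyError as err:
    raised = True
    print(f"KeyError: {err}")

# Fix: pass a list of names inside the selection brackets
result = data[['name', 'age']]
print(result)
```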
📋 Selection Reference Guide
| What You Want | Syntax | Returns |
|---|---|---|
| One column | df['name'] | Series |
| Multiple columns | df[['name', 'age']] | DataFrame |
| First row | df.iloc[0] | Series |
| First 3 rows | df.iloc[0:3] | DataFrame |
| Specific rows | df.iloc[[0, 2, 4]] | DataFrame |
| Last row | df.iloc[-1] | Series |
| Last 2 rows | df.iloc[-2:] | DataFrame |
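If you want to confirm the table yourself, the sketch below runs each syntax on a small five-row DataFrame and prints the type it returns. The data and the names `selections` and `returned` are made up for illustration.

```python
import pandas as pd

# Hypothetical five-row frame so every row position in the table exists
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Evan'],
    'age': [25, 30, 35, 28, 41]
})

selections = {
    "df['name']": df['name'],
    "df[['name', 'age']]": df[['name', 'age']],
    "df.iloc[0]": df.iloc[0],
    "df.iloc[0:3]": df.iloc[0:3],
    "df.iloc[[0, 2, 4]]": df.iloc[[0, 2, 4]],
    "df.iloc[-1]": df.iloc[-1],
    "df.iloc[-2:]": df.iloc[-2:],
}

# Map each syntax to the name of the type it returned
returned = {syntax: type(value).__name__ for syntax, value in selections.items()}
for syntax, kind in returned.items():
    print(f"{syntax:22} -> {kind}")
```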
🎯 Key Takeaways
🎮 Practice Selection
Let's practice with a realistic dataset:
import pandas as pd
# Practice dataset
orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004],
    'customer': ['Alice', 'Bob', 'Charlie', 'Alice'],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'quantity': [1, 2, 1, 1],
    'price': [999, 25, 75, 300]
})
print("📦 Orders Dataset:")
print(orders)
print()
print("🎯 Practice Selections:")
print()
print("1️⃣ Customer and product only:")
print(orders[['customer', 'product']])
print()
print("2️⃣ First 2 orders:")
print(orders.iloc[0:2])
print()
print("3️⃣ Financial data (quantity and price):")
financial = orders[['quantity', 'price']]
print(financial)
print(f"Total revenue: ${(financial['quantity'] * financial['price']).sum()}")
🚀 What's Next?
Great! You now know the basics of selecting columns and rows. Next, let's learn about more precise selection methods using loc and iloc.
Continue to: Using loc and iloc
You're building solid selection skills! 📋🎯