📋 Selecting Columns and Rows

Selecting specific columns and rows is fundamental to working with DataFrames. It's like ordering from a menu: you pick exactly what you need, nothing more, nothing less!

📊 Selecting Single Columns

The most basic selection is getting one column at a time. Single brackets with a column name return that column as a Series:

import pandas as pd

# Sample data
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 22, 21],
    'grade': ['A', 'B', 'A'],
    'score': [85, 78, 92]
})

print("Original Data:")
print(students)
print()

print("📋 Single Column Selection:")
print()

# Get just names
names = students['name']
print("Student names:")
print(names)
print(f"Type: {type(names)}")
print()

# Get just scores
scores = students['score']
print("Student scores:")
print(scores)
print(f"Average score: {scores.mean()}")
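
A quick sketch of how the bracket style changes the result, using a trimmed copy of the students data above. The variable names `as_series` and `as_frame` are made up for illustration:

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [85, 78, 92]
})

# Single brackets with a column name give a one-dimensional Series
as_series = students['score']

# A list containing one column name gives a one-column DataFrame
as_frame = students[['score']]

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

Both hold the same values. The difference matters when later code expects one type or the other.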

📊 Selecting Multiple Columns

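When you need several columns but not all, pass a list of column names inside the brackets (this is why you see double brackets). The result is a DataFrame: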

import pandas as pd

# Employee data
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'department': ['Sales', 'IT', 'Marketing'],
    'salary': [50000, 75000, 60000],
    'years': [2, 5, 3]
})

print("Employee Data:")
print(employees)
print()

print("📋 Multiple Column Selection:")
print()

# Select name and salary
basic_info = employees[['name', 'salary']]
print("Name and salary:")
print(basic_info)
print()

# Select name, department, and years
summary = employees[['name', 'department', 'years']]
print("Employee summary:")
print(summary)
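
The list of names can also live in a variable, and the result follows the order of that list rather than the original column order. A small sketch with the same employee data (the `wanted` and `reordered` names are just for illustration):

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'department': ['Sales', 'IT', 'Marketing'],
    'salary': [50000, 75000, 60000],
    'years': [2, 5, 3]
})

# Keep the column names in a list so they can be reused
wanted = ['salary', 'name']

# The result's columns appear in the order given in the list
reordered = employees[wanted]
print(reordered)

# The original DataFrame keeps its own columns and order
print(list(employees.columns))
```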

🔢 Selecting Rows by Position

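Select rows by their integer position with iloc. Positions start at 0, and a slice such as 0:2 stops before position 2, so it returns the rows at positions 0 and 1: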

import pandas as pd

# Product data
products = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'stock': [10, 50, 30, 8]
})

print("Product Data:")
print(products)
print()

print("🔢 Row Selection by Position:")
print()

# First row
first_product = products.iloc[0]
print("First product:")
print(first_product)
print()

# First 2 rows
first_two = products.iloc[0:2]
print("First 2 products:")
print(first_two)
print()

# Specific rows (1st and 3rd)
selected = products.iloc[[0, 2]]
print("1st and 3rd products:")
print(selected)
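
Negative positions count from the end, just like Python lists. A short sketch with the same product data, showing the last row and the last two rows:

```python
import pandas as pd

products = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'stock': [10, 50, 30, 8]
})

# -1 is the last position; a single position returns a Series
last_product = products.iloc[-1]
print(last_product)

# A slice starting at -2 returns the last two rows as a DataFrame
last_two = products.iloc[-2:]
print(last_two)
```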

🎯 Combining Column and Row Selection

Select specific rows AND specific columns:

import pandas as pd

# Survey data
survey = pd.DataFrame({
    'respondent': ['Person1', 'Person2', 'Person3', 'Person4'],
    'age': [25, 34, 28, 45],
    'satisfaction': [4, 5, 3, 4],
    'city': ['NYC', 'LA', 'Chicago', 'Miami']
})

print("Survey Data:")
print(survey)
print()

print("🎯 Combined Selection:")
print()

# First 2 people, age and satisfaction only
subset = survey.iloc[0:2][['age', 'satisfaction']]
print("First 2 respondents - age and satisfaction:")
print(subset)
print()

# Specific people and columns
specific = survey.iloc[[0, 3]][['respondent', 'city']]
print("1st and 4th respondent - name and city:")
print(specific)
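
The examples above select rows first and columns second, in two steps. As a rough sketch of an alternative, iloc also accepts row positions and column positions together in one call. The column positions 1 and 2 below assume the column order of the survey data shown above:

```python
import pandas as pd

survey = pd.DataFrame({
    'respondent': ['Person1', 'Person2', 'Person3', 'Person4'],
    'age': [25, 34, 28, 45],
    'satisfaction': [4, 5, 3, 4],
    'city': ['NYC', 'LA', 'Chicago', 'Miami']
})

# Two steps: rows by position, then columns by name
two_step = survey.iloc[0:2][['age', 'satisfaction']]

# One step: rows by position, columns by position
# (age is column 1 and satisfaction is column 2 in this DataFrame)
one_step = survey.iloc[0:2, [1, 2]]

print(one_step)
print(two_step.equals(one_step))
```

The two-step form is easier to read while you are learning. The one-step form avoids creating an intermediate DataFrame.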

📋 Column Selection Patterns

Common patterns you'll use all the time:

import pandas as pd

# Sales data
sales = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03'],
    'product': ['A', 'B', 'C'],
    'quantity': [10, 5, 8],
    'revenue': [1000, 500, 800],
    'region': ['North', 'South', 'North']
})

print("Sales Data:")
print(sales)
print()

print("📋 Common Column Patterns:")
print()

# Pattern 1: Key metrics only
metrics = sales[['product', 'revenue']]
print("1️⃣ Key metrics:")
print(metrics)
print()

# Pattern 2: All except one column
# (Select all columns except 'date')
no_date = sales[['product', 'quantity', 'revenue', 'region']]
print("2️⃣ Everything except date:")
print(no_date)
print()

# Pattern 3: Related columns
financials = sales[['product', 'quantity', 'revenue']]
print("3️⃣ Financial data:")
print(financials)
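
For Pattern 2, listing every column you want to keep gets tedious on wide tables. One possible alternative, sketched here with the same sales data, is to name only the column you want to leave out using `drop`:

```python
import pandas as pd

sales = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03'],
    'product': ['A', 'B', 'C'],
    'quantity': [10, 5, 8],
    'revenue': [1000, 500, 800],
    'region': ['North', 'South', 'North']
})

# drop returns a new DataFrame without the listed columns
no_date = sales.drop(columns=['date'])
print(no_date)

# The original DataFrame still has its date column
print(list(sales.columns))
```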

🔍 Working with Selected Data

Once you select data, you can analyze it immediately:

import pandas as pd

# Customer data
customers = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 35, 30, 28],
    'purchases': [5, 12, 8, 15],
    'total_spent': [500, 1200, 800, 1500]
})

print("Customer Data:")
print(customers)
print()

# Select and analyze spending data
spending = customers[['name', 'total_spent']]
print("💰 Spending Analysis:")
print(spending)
print(f"Average spending: ${spending['total_spent'].mean():.0f}")
print(f"Top spender: {spending.loc[spending['total_spent'].idxmax(), 'name']}")
print()

# Select and analyze purchase behavior
behavior = customers[['name', 'purchases']]
print("🛒 Purchase Behavior:")
print(behavior)
print(f"Average purchases: {behavior['purchases'].mean():.1f}")
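
The "Top spender" line above packs two steps into one expression. Here is a sketch that splits it apart, with the same customer data. The names `top_label` and `top_name` are chosen for illustration:

```python
import pandas as pd

customers = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 35, 30, 28],
    'purchases': [5, 12, 8, 15],
    'total_spent': [500, 1200, 800, 1500]
})

# Step 1: idxmax gives the index label of the row with the largest value
top_label = customers['total_spent'].idxmax()
print(f"Row label of top spender: {top_label}")

# Step 2: loc looks up a value by row label and column name
top_name = customers.loc[top_label, 'name']
print(f"Top spender: {top_name}")
```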

⚠️ Common Selection Mistakes

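The most common mistakes are forgetting the double brackets when selecting several columns and expecting a DataFrame back from a single-column selection. The examples below show the correct forms and the type each one returns: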

import pandas as pd

# Sample data for demonstrating correct selection
data = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'city': ['NYC', 'LA']
})

print("✅ Correct Selection Examples:")
print()

# Correct way to select multiple columns
result = data[['name', 'age']]
print("Multiple columns with double brackets:")
print(result)
print()

# Single column selection
single_col = data['name']
print("Single column selection:")
print(single_col)
print(f"Type: {type(single_col)}")
print()

# Multiple columns vs single column return types
print("📊 Understanding Return Types:")
print(f"Single column returns: {type(data['name'])}")
print(f"Multiple columns return: {type(data[['name', 'age']])}")
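
To see what the mistakes themselves look like, here is a small sketch that triggers two of them on purpose and catches the error. The `caught` list is only there to record what happened:

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'city': ['NYC', 'LA']
})

caught = []

# Mistake 1: two names in single brackets.
# pandas treats ('name', 'age') as one column label and cannot find it.
try:
    data['name', 'age']
except KeyError as exc:
    caught.append('single brackets')
    print(f"KeyError: {exc}")

# Mistake 2: wrong capitalization. Column names are case-sensitive.
try:
    data['Name']
except KeyError as exc:
    caught.append('wrong case')
    print(f"KeyError: {exc}")

# The fix for mistake 1 is a list of names inside the brackets
print(data[['name', 'age']])
```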

📋 Selection Reference Guide

What You Want       Syntax                  Returns
One column          df['name']              Series
Multiple columns    df[['name', 'age']]     DataFrame
First row           df.iloc[0]              Series
First 3 rows        df.iloc[0:3]            DataFrame
Specific rows       df.iloc[[0, 2, 4]]      DataFrame
Last row            df.iloc[-1]             Series
Last 2 rows         df.iloc[-2:]            DataFrame

🎯 Key Takeaways

🎮 Practice Selection

Let's practice with a realistic dataset:

import pandas as pd

# Practice dataset
orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004],
    'customer': ['Alice', 'Bob', 'Charlie', 'Alice'],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'quantity': [1, 2, 1, 1],
    'price': [999, 25, 75, 300]
})

print("📦 Orders Dataset:")
print(orders)
print()

print("🎯 Practice Selections:")
print()

print("1️⃣ Customer and product only:")
print(orders[['customer', 'product']])
print()

print("2️⃣ First 2 orders:")
print(orders.iloc[0:2])
print()

print("3️⃣ Financial data (quantity and price):")
financial = orders[['quantity', 'price']]
print(financial)
print(f"Total revenue: ${(financial['quantity'] * financial['price']).sum()}")

🚀 What's Next?

Great! You now know the basics of selecting columns and rows. Next, let's learn about more precise selection methods using loc and iloc.

Continue to: Using loc and iloc

You're building solid selection skills! 📋🎯
