🔍 Data Exploration
Data exploration is like being a detective - you're investigating your data to understand what you're working with! Before you can analyze or clean data, you need to know its shape, structure, and contents.
🎯 Why Explore Data First?
Jumping into analysis without exploration is like cooking without checking ingredients:
import pandas as pd
# Sample dataset to explore
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [20, 22, None, 23],  # None becomes NaN, so this column is stored as float
    'grade': ['A', 'B', 'A', 'C'],
    'score': [85, 78, 92, 69]
})
print("📊 Our Dataset:")
print(students)
print()
# Quick exploration reveals important info
print("🔍 Quick Discovery:")
print(f"Size: {students.shape[0]} students, {students.shape[1]} columns")
print(f"Missing data: {students.isnull().sum().sum()} values")
print(f"Score range: {students['score'].min()} to {students['score'].max()}")
📋 The Data Exploration Process
| Step | What to Check | Why It Matters |
|---|---|---|
| 1. Size | Rows and columns | Know what you're dealing with |
| 2. Structure | Data types, column names | Understand the format |
| 3. Content | First/last rows, samples | See actual data |
| 4. Quality | Missing values, duplicates | Spot problems early |
| 5. Patterns | Statistics, distributions | Find interesting insights |
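The five steps above can be sketched as one small function. This is only an illustration: `explore` is a hypothetical helper name, not part of pandas, and the dictionary keys are arbitrary labels chosen here.

```python
import pandas as pd

def explore(df: pd.DataFrame) -> dict:
    """Hypothetical helper: run the five exploration steps and collect the results."""
    return {
        "size": df.shape,                            # 1. Size: (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # 2. Structure: column name -> dtype
        "first_rows": df.head(3),                    # 3. Content: a peek at real rows
        "missing": int(df.isnull().sum().sum()),     # 4. Quality: total missing cells
        "duplicates": int(df.duplicated().sum()),    # 4. Quality: fully repeated rows
        "stats": df.describe(),                      # 5. Patterns: numeric summaries
    }

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [20, 22, None, 23],
    'grade': ['A', 'B', 'A', 'C'],
    'score': [85, 78, 92, 69]
})

report = explore(students)
print(f"Size: {report['size']}")
print(f"Missing cells: {report['missing']}")
print(f"Duplicate rows: {report['duplicates']}")
```

Collecting the results in a dictionary keeps each step inspectable on its own; printing everything at once would work just as well for a quick look.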
👀 First Look at Your Data
Always start with these basic checks:
import pandas as pd
# Load sample data
sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'quantity': [2, 10, 5, 1]
})
# Step 1: Basic Info
print(f"Shape: {sales.shape}")
print(f"Columns: {list(sales.columns)}")
print()
# Step 2: First Look
print("First 3 rows:")
print(sales.head(3))
print()
# Step 3: Data Types
print("Data types:")
print(sales.dtypes)
📊 What You'll Learn in This Section
Master the art of data exploration:
- 👁️ Viewing Your Data: Learn different ways to look at and sample your DataFrame.
- 📏 Data Shape and Statistics: Understand your data's size, structure, and numerical summaries.
- 🔍 Column Information and Data Types: Explore individual columns and understand data types.
🔍 Quick Exploration Example
Here's how exploration works in practice:
import pandas as pd
# Survey data example
survey = pd.DataFrame({
    'age_group': ['18-25', '26-35', '18-25', '36-45'],
    'satisfaction': [4, 5, 3, 4],
    'city': ['NYC', 'LA', 'Chicago', 'NYC']
})
print("=== Quick Data Exploration ===")
print()
print("1️⃣ Overview:")
print(f" {survey.shape[0]} responses, {survey.shape[1]} questions")
print()
print("2️⃣ Sample Data:")
print(survey.head(2))
print()
print("3️⃣ Quick Stats:")
print(f" Average satisfaction: {survey['satisfaction'].mean():.1f}")
print(f" Most common city: {survey['city'].value_counts().index[0]}")
print()
print("4️⃣ Missing Data Check:")
missing = survey.isnull().sum()
print(f" Missing values: {missing.sum()}")
🛠️ Essential Exploration Commands
| Command | What It Shows | When to Use |
|---|---|---|
| .head() | First 5 rows (by default) | See what data looks like |
| .tail() | Last 5 rows (by default) | Check end of dataset |
| .sample() | Random rows (1 by default; pass n for more) | Spot-check rows beyond the top and bottom |
| .shape | (rows, columns) | Know dataset size |
| .info() | Column names, non-null counts, dtypes, memory usage | Understand structure |
| .describe() | Summary statistics (count, mean, std, min, quartiles, max) | Analyze numerical data |
| .nunique() | Number of distinct values per column | Check data variety |
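As a rough illustration, here are the commands from the table applied to the small `sales` DataFrame used earlier. The variable names are chosen here for the example, and `random_state=0` is an arbitrary seed added only so the random sample is repeatable.

```python
import pandas as pd

sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'quantity': [2, 10, 5, 1]
})

first_two = sales.head(2)                        # first 2 rows instead of the default 5
last_two = sales.tail(2)                         # last 2 rows
random_two = sales.sample(n=2, random_state=0)   # 2 random rows, repeatable via the seed
n_rows, n_cols = sales.shape                     # shape is an attribute, not a method

sales.info()                                     # prints its report directly and returns None
summary = sales.describe()                       # numeric columns only, by default
distinct = sales.nunique()                       # distinct values per column

print(f"{n_rows} rows, {n_cols} columns")
print(summary)
print(distinct)
```

Note that `.shape` has no parentheses, while the others are method calls.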
🎯 Exploration Red Flags
Watch out for these warning signs:
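A minimal sketch of checking for three warning signs: missing values, duplicate rows, and numbers stored as text. These three are picked here as assumptions, based on the Quality step in the process table above. The `orders` data is made up for illustration.

```python
import pandas as pd

# Made-up data with deliberate problems
orders = pd.DataFrame({
    'order_id': [1, 2, 2, 4],
    'amount': ['10.5', '20', '20', 'n/a'],     # numbers stored as text
    'region': ['East', 'West', 'West', None]   # one missing value
})

# Red flag 1: missing values per column
missing_per_column = orders.isnull().sum()

# Red flag 2: rows that exactly repeat an earlier row
duplicate_rows = int(orders.duplicated().sum())

# Red flag 3: text that cannot be read as a number
# errors='coerce' turns unparseable entries into NaN instead of raising
as_number = pd.to_numeric(orders['amount'], errors='coerce')
unparseable = int(as_number.isnull().sum())

print("Missing values per column:")
print(missing_per_column)
print(f"Duplicate rows: {duplicate_rows}")
print(f"Unparseable amounts: {unparseable}")
```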
📈 Quick Pattern Discovery
Exploration helps you spot interesting patterns immediately:
import pandas as pd
# Employee data
employees = pd.DataFrame({
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 52000, 48000, 80000],
    'experience': [2, 5, 3, 1, 7]
})
print("🔍 Pattern Discovery:")
print()
print("Department breakdown:")
print(employees['department'].value_counts())
print()
print("Salary by department:")
dept_salary = employees.groupby('department')['salary'].mean()
for dept, avg_salary in dept_salary.items():
    print(f" {dept}: ${avg_salary:,.0f}")
print()
print("Experience vs Salary correlation:")
correlation = employees['experience'].corr(employees['salary'])
print(f" Correlation: {correlation:.2f}")
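If you want several per-department figures at once, one option is to pass a list of statistic names to `.agg()`. This is a sketch of an alternative to the loop, using the same employee data; the choice of statistics is arbitrary.

```python
import pandas as pd

employees = pd.DataFrame({
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 52000, 48000, 80000],
    'experience': [2, 5, 3, 1, 7]
})

# One row per department, one column per statistic
summary = employees.groupby('department')['salary'].agg(['count', 'mean', 'min', 'max'])
print(summary)
```

With only five rows, treat any pattern here (including the correlation) as a hint to investigate, not a conclusion.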
🎯 Key Takeaways
🚀 What's Next?
Ready to become a data detective? Let's start by learning different ways to view and sample your data.
Start with: Viewing Your Data
Time to explore! 🔍📊