🔍 Data Exploration

Data exploration is like being a detective - you're investigating your data to understand what you're working with! Before you can analyze or clean data, you need to know its shape, structure, and contents.

🎯 Why Explore Data First?

Jumping into analysis without exploring first is like cooking without checking your ingredients. Even a few quick checks reveal the essentials:

import pandas as pd

# Sample dataset to explore
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [20, 22, None, 23],
    'grade': ['A', 'B', 'A', 'C'],
    'score': [85, 78, 92, 69]
})

print("📊 Our Dataset:")
print(students)
print()

# Quick exploration reveals important info
print("🔍 Quick Discovery:")
print(f"Size: {students.shape[0]} students, {students.shape[1]} columns")
print(f"Missing data: {students.isnull().sum().sum()} values")
print(f"Score range: {students['score'].min()} to {students['score'].max()}")

📋 The Data Exploration Process

Step	What to Check	Why It Matters
1. Size	Rows and columns	Know what you're dealing with
2. Structure	Data types, column names	Understand the format
3. Content	First/last rows, samples	See actual data
4. Quality	Missing values, duplicates	Spot problems early
5. Patterns	Statistics, distributions	Find interesting insights
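
The five steps above can be sketched as one small helper. This is only an illustration, not a standard pandas function: the name `explore` and the `orders` sample data are made up for this example.

```python
import pandas as pd

def explore(df):
    """Return a small summary dict covering the five exploration steps.

    Hypothetical helper for illustration; not part of pandas.
    """
    return {
        # 1. Size: number of rows and columns
        "rows": df.shape[0],
        "columns": df.shape[1],
        # 2. Structure: column name -> data type
        "dtypes": df.dtypes.astype(str).to_dict(),
        # 3. Content: the first rows as plain records
        "preview": df.head(2).to_dict("records"),
        # 4. Quality: missing cells and fully duplicated rows
        "missing": int(df.isnull().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        # 5. Patterns: summary statistics for numeric columns
        "stats": df.describe(),
    }

# Made-up sample data with one missing value and one duplicated row
orders = pd.DataFrame({
    'item': ['Pen', 'Book', 'Pen', 'Bag'],
    'price': [2.5, 12.0, 2.5, None]
})

summary = explore(orders)
print(f"Size: {summary['rows']} rows, {summary['columns']} columns")
print(f"Missing values: {summary['missing']}")
print(f"Duplicate rows: {summary['duplicates']}")
print(summary['stats'])
```

Returning a dictionary instead of printing inside the helper keeps each result available for later checks, such as stopping early when `summary['missing']` is above zero.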

👀 First Look at Your Data

Always start with these basic checks:

import pandas as pd

# Load sample data
sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'quantity': [2, 10, 5, 1]
})

# Step 1: Basic Info
print(f"Shape: {sales.shape}")
print(f"Columns: {list(sales.columns)}")
print()

# Step 2: First Look
print("First 3 rows:")
print(sales.head(3))
print()

# Step 3: Data Types
print("Data types:")
print(sales.dtypes)

📊 What You'll Learn in This Section

Master the art of data exploration:

👁️ Viewing Your Data: Learn different ways to look at and sample your DataFrame.
📏 Data Shape and Statistics: Understand your data's size, structure, and numerical summaries.
🔍 Column Information and Data Types: Explore individual columns and understand data types.

🔍 Quick Exploration Example

Here's how exploration works in practice:

import pandas as pd

# Survey data example
survey = pd.DataFrame({
    'age_group': ['18-25', '26-35', '18-25', '36-45'],
    'satisfaction': [4, 5, 3, 4],
    'city': ['NYC', 'LA', 'Chicago', 'NYC']
})

print("=== Quick Data Exploration ===")
print()

print("1️⃣ Overview:")
print(f"   {survey.shape[0]} responses, {survey.shape[1]} questions")
print()

print("2️⃣ Sample Data:")
print(survey.head(2))
print()

print("3️⃣ Quick Stats:")
print(f"   Average satisfaction: {survey['satisfaction'].mean():.1f}")
print(f"   Most common city: {survey['city'].value_counts().index[0]}")
print()

print("4️⃣ Missing Data Check:")
missing = survey.isnull().sum()
print(f"   Missing values: {missing.sum()}")

🛠️ Essential Exploration Commands

Command	What It Shows	When to Use
`.head()`	First 5 rows (default)	See what the data looks like
`.tail()`	Last 5 rows (default)	Check the end of the dataset
`.sample()`	One random row (pass `n=` for more)	Spot-check rows beyond the top and bottom
`.shape`	(rows, columns)	Know the dataset size
`.info()`	Column names, non-null counts, data types, memory usage	Understand the structure
`.describe()`	Summary statistics for numeric columns	Analyze numerical data
`.nunique()`	Number of unique values per column	Check data variety
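
Here is a short sketch of the commands from the table that the other examples in this section don't use. The `weather` data is made up for illustration, and `random_state=0` is only there so the random pick is repeatable.

```python
import pandas as pd

# Made-up sample data for illustration
weather = pd.DataFrame({
    'city': ['NYC', 'LA', 'Chicago', 'NYC', 'LA', 'Boston'],
    'temp_c': [12.0, 24.5, 8.0, 14.0, 26.0, 10.5],
    'rainy': [True, False, True, False, False, True]
})

print("Last 2 rows:")
print(weather.tail(2))
print()

print("3 random rows:")
print(weather.sample(n=3, random_state=0))
print()

print("Structure:")
weather.info()  # prints its report directly and returns None
print()

print("Summary statistics:")
print(weather.describe())
print()

print("Unique values per column:")
print(weather.nunique())
```

Note that `.info()` writes its report itself, so wrapping it in `print()` would add a stray `None` line.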

🎯 Exploration Red Flags

Watch out for these warning signs:
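
A minimal sketch of how a few warning signs can be checked in code. Missing values and duplicate rows come from the Quality step in the process table; numbers stored as text is an assumption added for this example, as is the made-up `inventory` data.

```python
import pandas as pd

# Made-up sample data containing three deliberate problems
inventory = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Mouse', 'Monitor'],
    'price': ['999', '25', '25', 'unknown'],  # numbers stored as text
    'stock': [3, 10, 10, None]                # one missing value
})

# Red flag 1: missing values, counted per column
missing_per_column = inventory.isnull().sum()
print("Missing values per column:")
print(missing_per_column)
print()

# Red flag 2: rows that repeat an earlier row exactly
duplicate_rows = int(inventory.duplicated().sum())
print(f"Duplicate rows: {duplicate_rows}")
print()

# Red flag 3 (assumed): text in a column that should be numeric.
# errors='coerce' turns anything unparseable into NaN instead of raising.
as_number = pd.to_numeric(inventory['price'], errors='coerce')
bad_prices = inventory.loc[as_number.isnull(), 'price'].tolist()
print(f"Prices that are not numbers: {bad_prices}")
```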

📈 Quick Pattern Discovery

Exploration helps you spot interesting patterns immediately:

import pandas as pd

# Employee data
employees = pd.DataFrame({
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 52000, 48000, 80000],
    'experience': [2, 5, 3, 1, 7]
})

print("🔍 Pattern Discovery:")
print()

print("Department breakdown:")
print(employees['department'].value_counts())
print()

print("Salary by department:")
dept_salary = employees.groupby('department')['salary'].mean()
for dept, avg_salary in dept_salary.items():
    print(f"   {dept}: ${avg_salary:,.0f}")
print()

print("Experience vs Salary correlation:")
correlation = employees['experience'].corr(employees['salary'])
print(f"   Correlation: {correlation:.2f}")

🎯 Key Takeaways

🚀 What's Next?

Ready to become a data detective? Let's start by learning different ways to view and sample your data.

Start with: Viewing Your Data

Time to explore! 🔍📊
