🔍 Data Exploration

Data exploration is like being a detective: you're investigating your data to understand what you're working with! Before you can analyze or clean data, you need to know its shape, structure, and contents.

🎯 Why Explore Data First?

Jumping into analysis without exploration is like cooking without checking ingredients:

import pandas as pd

# Sample dataset to explore
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [20, 22, None, 23],
    'grade': ['A', 'B', 'A', 'C'],
    'score': [85, 78, 92, 69]
})

print("📊 Our Dataset:")
print(students)
print()

# Quick exploration reveals important info
print("🔍 Quick Discovery:")
print(f"Size: {students.shape[0]} students, {students.shape[1]} columns")
print(f"Missing data: {students.isnull().sum().sum()} values")
print(f"Score range: {students['score'].min()} to {students['score'].max()}")

📋 The Data Exploration Process

| Step | What to Check | Why It Matters |
|---|---|---|
| 1. Size | Rows and columns | Know what you're dealing with |
| 2. Structure | Data types, column names | Understand the format |
| 3. Content | First/last rows, samples | See actual data |
| 4. Quality | Missing values, duplicates | Spot problems early |
| 5. Patterns | Statistics, distributions | Find interesting insights |
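
The five steps above can be sketched as one small helper. This is only an illustration: the function name `explore` and the keys of the dictionary it returns are made up for this example, and the students table is the same one used earlier.

```python
import pandas as pd


def explore(df: pd.DataFrame) -> dict:
    """Collect one quick fact per step of the exploration process."""
    numeric = df.select_dtypes(include="number")
    return {
        "size": df.shape,                            # 1. Size
        "dtypes": df.dtypes.astype(str).to_dict(),   # 2. Structure
        "first_rows": df.head(3),                    # 3. Content
        "missing": int(df.isnull().sum().sum()),     # 4. Quality
        "duplicates": int(df.duplicated().sum()),    # 4. Quality
        "stats": numeric.describe(),                 # 5. Patterns
    }


# Same small students table as in the earlier example
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [20, 22, None, 23],
    'grade': ['A', 'B', 'A', 'C'],
    'score': [85, 78, 92, 69]
})

report = explore(students)
for step, value in report.items():
    print(f"--- {step} ---")
    print(value)
    print()
```

Returning a dictionary instead of printing inside the function keeps each result available for later checks, for example stopping early when `report["missing"]` is larger than expected.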

👀 First Look at Your Data

Always start with these basic checks:

import pandas as pd

# Load sample data
sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'quantity': [2, 10, 5, 1]
})

# Step 1: Basic Info
print(f"Shape: {sales.shape}")
print(f"Columns: {list(sales.columns)}")
print()

# Step 2: First Look
print("First 3 rows:")
print(sales.head(3))
print()

# Step 3: Data Types
print("Data types:")
print(sales.dtypes)
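
Checking data types matters because a numeric column that was loaded as text will not behave like numbers. The sketch below uses a hypothetical `raw` table (not part of the example above) where `price` arrived as strings. `pd.to_numeric` with `errors='coerce'` is one common way to convert such a column; it turns entries that cannot be parsed into NaN, so treat it as a starting point rather than the only fix.

```python
import pandas as pd

# Hypothetical data: prices loaded as text, and one entry is not a number
raw = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard'],
    'price': ['999', '25', 'unknown']
})

# Is the column numeric? Here it is not, because it holds strings
print(f"price is numeric: {pd.api.types.is_numeric_dtype(raw['price'])}")

# Convert to numbers; values that cannot be parsed become NaN
raw['price_num'] = pd.to_numeric(raw['price'], errors='coerce')

print(raw.dtypes)
print(f"Unparseable prices: {raw['price_num'].isnull().sum()}")
```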

📊 What You'll Learn in This Section

Master the art of data exploration:

🔍 Quick Exploration Example

Here's how exploration works in practice:

import pandas as pd

# Survey data example
survey = pd.DataFrame({
    'age_group': ['18-25', '26-35', '18-25', '36-45'],
    'satisfaction': [4, 5, 3, 4],
    'city': ['NYC', 'LA', 'Chicago', 'NYC']
})

print("=== Quick Data Exploration ===")
print()

print("1️⃣ Overview:")
print(f"   {survey.shape[0]} responses, {survey.shape[1]} questions")
print()

print("2️⃣ Sample Data:")
print(survey.head(2))
print()

print("3️⃣ Quick Stats:")
print(f"   Average satisfaction: {survey['satisfaction'].mean():.1f}")
print(f"   Most common city: {survey['city'].value_counts().index[0]}")
print()

print("4️⃣ Missing Data Check:")
missing = survey.isnull().sum()
print(f"   Missing values: {missing.sum()}")

🛠️ Essential Exploration Commands

| Command | What It Shows | When to Use |
|---|---|---|
| .head() | First 5 rows | See what data looks like |
| .tail() | Last 5 rows | Check end of dataset |
| .sample() | Random rows (1 by default) | Get representative sample |
| .shape | (rows, columns) | Know dataset size |
| .info() | Complete overview | Understand structure |
| .describe() | Statistics | Analyze numerical data |
| .nunique() | Unique values count | Check data variety |
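
Here is a short sketch that runs several of these commands on the sales table from the first-look example. The `random_state=0` argument is added only so the random pick is repeatable; it is not required.

```python
import pandas as pd

# Same sales table as in the first-look example
sales = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999, 25, 75, 300],
    'quantity': [2, 10, 5, 1]
})

print(sales.tail(2))                    # last 2 rows
print()

print(sales.sample(2, random_state=0))  # 2 random rows, seeded for repeatability
print()

sales.info()                            # prints directly: columns, non-null counts, dtypes
print()

print(sales.describe())                 # count, mean, std, min, quartiles, max
print()

print(sales.nunique())                  # number of distinct values per column
```

Note that `.info()` prints its report itself and returns `None`, so it is called without `print()`.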

🎯 Exploration Red Flags

Watch out for these warning signs:
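
Two warning signs named in the exploration process earlier, missing values and duplicates, can be checked in a few lines. The `orders` table below is hypothetical and built only to trigger both checks.

```python
import pandas as pd

# Hypothetical orders table with one missing amount and one repeated row
orders = pd.DataFrame({
    'order_id': [1, 2, 2, 3],
    'amount': [50.0, 20.0, 20.0, None]
})

# Count missing values in each column
missing_per_column = orders.isnull().sum()

# Count rows that exactly repeat an earlier row
duplicate_rows = orders.duplicated().sum()

print("Missing values per column:")
print(missing_per_column)
print()
print(f"Fully duplicated rows: {duplicate_rows}")
```

By default `duplicated()` flags a row only when every column matches an earlier row; pass `subset=[...]` to compare on selected columns instead.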

📈 Quick Pattern Discovery

Exploration helps you spot interesting patterns immediately:

import pandas as pd

# Employee data
employees = pd.DataFrame({
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 52000, 48000, 80000],
    'experience': [2, 5, 3, 1, 7]
})

print("🔍 Pattern Discovery:")
print()

print("Department breakdown:")
print(employees['department'].value_counts())
print()

print("Salary by department:")
dept_salary = employees.groupby('department')['salary'].mean()
for dept, avg_salary in dept_salary.items():
    print(f"   {dept}: ${avg_salary:,.0f}")
print()

print("Experience vs Salary correlation:")
correlation = employees['experience'].corr(employees['salary'])
print(f"   Correlation: {correlation:.2f}")
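
An average can hide how few rows are behind it. As a small sketch on the same employees table, `agg` can report the row count next to the mean, so a department with a single employee (HR here) is easy to spot before reading much into its average. The variable name `summary` is just for this example.

```python
import pandas as pd

# Same employees table as in the example above
employees = pd.DataFrame({
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 52000, 48000, 80000],
    'experience': [2, 5, 3, 1, 7]
})

# Row count, mean, min, and max salary for each department
summary = employees.groupby('department')['salary'].agg(['count', 'mean', 'min', 'max'])
print(summary)
```

The same caution applies to the correlation: with only five rows, it is a hint worth checking on more data, not a firm conclusion.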

🎯 Key Takeaways

🚀 What's Next?

Ready to become a data detective? Let's start by learning different ways to view and sample your data.

Start with: Viewing Your Data

Time to explore! 🔍📊
