🗄️ Loading from Databases

Databases are incredibly powerful for data analysis! Instead of loading entire files into memory, you can query databases to get exactly the data you need. This makes working with large datasets much faster and more efficient.

🛠️ Basic Database Connection

First, let's create a simple SQLite database to practice with:

import pandas as pd
import sqlite3

# Create a simple SQLite database
conn = sqlite3.connect('sample.db')

# Create sample data
sample_data = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'department': ['Sales', 'IT', 'Sales', 'HR', 'IT'],
    'salary': [50000, 75000, 52000, 48000, 80000]
})

# Save DataFrame to database
sample_data.to_sql('employees', conn, if_exists='replace', index=False)

print("✅ Created sample database with employees table")
print(f"Saved {len(sample_data)} records")

# Close connection
conn.close()

📊 Loading Data from Database

Now let's load data from our database:

import pandas as pd
import sqlite3

# Connect to database
conn = sqlite3.connect('sample.db')

# Load all data
df = pd.read_sql('SELECT * FROM employees', conn)
print("All employees:")
print(df)
print()

# Load specific data with WHERE clause
sales_df = pd.read_sql('SELECT * FROM employees WHERE department = "Sales"', conn)
print("Sales department only:")
print(sales_df)
print()

# Load with calculations
summary = pd.read_sql("""
    SELECT department, 
           COUNT(*) as employee_count,
           AVG(salary) as avg_salary
    FROM employees 
    GROUP BY department
""", conn)
print("Department summary:")
print(summary)

conn.close()

🔍 Different Types of Queries

You can use SQL to filter and process data before loading:

import pandas as pd
import sqlite3

# Reconnect to our database
conn = sqlite3.connect('sample.db')

# Different query examples
queries = {
    "High earners": "SELECT * FROM employees WHERE salary > 60000",
    "IT department": "SELECT name, salary FROM employees WHERE department = 'IT'",
    "Sorted by salary": "SELECT * FROM employees ORDER BY salary DESC",
    "Name and department": "SELECT name, department FROM employees"
}

for description, query in queries.items():
    print(f"📊 {description}:")
    df = pd.read_sql(query, conn)
    print(df)
    print()

conn.close()

🌐 Connecting to Different Databases

Different databases need different connection methods:

SQLite (File-based)

import pandas as pd
import sqlite3

# SQLite - simple file database
conn = sqlite3.connect('database.db')
df = pd.read_sql('SELECT * FROM table_name', conn)
conn.close()

print("✅ SQLite connection established and closed safely")

PostgreSQL / MySQL Examples

import pandas as pd
# Note: These require additional packages

# Example connection patterns (see InfoBox above for details)
print("Database connection patterns moved to InfoBox above")
print("Remember to install required packages and use environment variables for credentials")

🔧 Safe Database Connections

Always handle database connections safely:

import pandas as pd
import sqlite3

def safe_db_query(db_file, query):
    """Safely execute database query"""
    try:
        # Connect to database
        conn = sqlite3.connect(db_file)
        
        # Execute query
        df = pd.read_sql(query, conn)
        
        print(f"✅ Query successful: {len(df)} rows returned")
        return df
        
    except sqlite3.Error as e:
        print(f"❌ Database error: {e}")
        return None
    except Exception as e:
        print(f"❌ Error: {e}")
        return None
    finally:
        # Always close connection
        if 'conn' in locals():
            conn.close()

# Test safe query
df = safe_db_query('sample.db', 'SELECT * FROM employees')
if df is not None:
    print(df.head())

📋 Database Loading Best Practices

PracticeWhyExample
Use LIMITDon't load huge datasetsSELECT * FROM table LIMIT 1000
Filter with WHEREOnly get needed dataWHERE date > '2023-01-01'
Close connectionsPrevent database locksconn.close()
Handle errorsGraceful failuretry/except blocks
Test queries firstAvoid long waitsRun in database tool first

🎯 Common SQL Queries for Analysis

import pandas as pd
import sqlite3

# Useful SQL patterns for data analysis
conn = sqlite3.connect('sample.db')

queries = [
    # Get recent data
    "SELECT * FROM employees ORDER BY employee_id DESC LIMIT 5",
    
    # Count by category
    "SELECT department, COUNT(*) as count FROM employees GROUP BY department",
    
    # Basic statistics
    "SELECT AVG(salary) as avg_salary, MAX(salary) as max_salary FROM employees",
    
    # Filter and sort
    "SELECT name, salary FROM employees WHERE salary > 50000 ORDER BY salary"
]

for i, query in enumerate(queries, 1):
    print(f"Query {i}:")
    df = pd.read_sql(query, conn)
    print(df)
    print()

conn.close()

🎯 Key Takeaways

🚀 What's Next?

Great! You can now load data from databases. Next, let's learn how to handle common errors that occur when loading files and databases.

Continue to: Handling File Errors

You're mastering data loading! 🗄️💪

Was this helpful?

😔Poor
🙁Fair
😊Good
😄Great
🤩Excellent