📄 Reading Files (CSV, Excel, JSON)

Loading data from files is one of the most important skills in data analysis. Let's learn how to read the three most common file formats: CSV, Excel, and JSON files.

📄 Reading CSV Files

CSV is the most common format. It's simple and works everywhere:

import pandas as pd

# Basic CSV reading
# df = pd.read_csv('data.csv')

# Let's simulate with sample data
sample_csv_data = """name,age,city
Alice,25,New York
Bob,30,London
Charlie,35,Tokyo"""

# Save sample data to show loading
with open('sample.csv', 'w') as f:
    f.write(sample_csv_data)

# Now load it
df = pd.read_csv('sample.csv')
print("Loaded CSV data:")
print(df)
print(f"Shape: {df.shape}")

CSV with Different Separators

Not all "CSV" files use commas:

import pandas as pd

# Create sample data with semicolons
sample_data = """name;age;salary
Alice;25;50000
Bob;30;60000
Charlie;35;70000"""

with open('sample_semicolon.csv', 'w') as f:
    f.write(sample_data)

# Load with different separator
df = pd.read_csv('sample_semicolon.csv', sep=';')
print("Data with semicolon separator:")
print(df)

📊 Reading Excel Files

Excel files are everywhere in business. Pandas makes them easy to read:

import pandas as pd

# For Excel files, you might need: pip install openpyxl

# Basic Excel reading
# df = pd.read_excel('data.xlsx')

# Simulate Excel data (you'd normally load from actual .xlsx file)
excel_data = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999, 25, 75],
    'stock': [10, 50, 30]
})

# Save as Excel to demonstrate
excel_data.to_excel('sample.xlsx', index=False)

# Load it back
df = pd.read_excel('sample.xlsx')
print("Loaded Excel data:")
print(df)

Excel with Multiple Sheets

Excel files often have multiple worksheets:

import pandas as pd

# Reading specific sheet
# df = pd.read_excel('data.xlsx', sheet_name='Sales')

# See all sheet names
# sheet_names = pd.ExcelFile('data.xlsx').sheet_names
# print("Available sheets:", sheet_names)

# Example sheet operations
df_sheet1 = pd.read_excel('file.xlsx', sheet_name='Sheet1')
df_first = pd.read_excel('file.xlsx', sheet_name=0)  # First sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets

🌐 Reading JSON Files

JSON is popular for web data and APIs:

import pandas as pd
import json

# Create sample JSON data
json_data = [
    {"name": "Alice", "age": 25, "city": "New York"},
    {"name": "Bob", "age": 30, "city": "London"},
    {"name": "Charlie", "age": 35, "city": "Tokyo"}
]

# Save as JSON file
with open('sample.json', 'w') as f:
    json.dump(json_data, f)

# Load JSON file
df = pd.read_json('sample.json')
print("Loaded JSON data:")
print(df)

🔧 Common Loading Options

Here are the most useful parameters for file loading:

CSV Options

import pandas as pd

# Common CSV parameters
print("CSV loading options:")
print()

# Basic with common options
sample_data = """Name,Age,Salary,Department
Alice,25,50000,Sales
Bob,30,,Engineering
Charlie,35,70000,Marketing"""

with open('sample_options.csv', 'w') as f:
    f.write(sample_data)

# Load with options
df = pd.read_csv('sample_options.csv', 
                 na_values=['', 'N/A'],  # What counts as missing
                 dtype={'Age': int})     # Force Age to be integer

print("Data with options:")
print(df)
print()
print("Data types:")
print(df.dtypes)

Excel Options

import pandas as pd

# Example Excel loading with options
# df = pd.read_excel('sales.xlsx', sheet_name='Q1_Data')
# df = pd.read_excel('report.xlsx', header=1, usecols='A:F')
# df = pd.read_excel('large_file.xlsx', nrows=1000)  # Test with first 1000 rows

# For demonstration (these would work with real files)
sample_excel_options = {
    'sheet_name': 'Data',
    'header': 0,
    'usecols': 'A:D',
    'nrows': 100
}
print(f"Excel options: {sample_excel_options}")

⚠️ Handling Common Problems

File loading often has issues. Here's how to handle them:

import pandas as pd

def safe_file_load(filename, file_type='csv'):
    """Safely load a file with error handling"""
    try:
        if file_type == 'csv':
            df = pd.read_csv(filename)
        elif file_type == 'excel':
            df = pd.read_excel(filename)
        elif file_type == 'json':
            df = pd.read_json(filename)
        
        print(f"✅ Successfully loaded {filename}")
        print(f"   Shape: {df.shape}")
        print(f"   Columns: {list(df.columns)}")
        return df
        
    except FileNotFoundError:
        print(f"❌ File '{filename}' not found")
    except Exception as e:
        print(f"❌ Error loading {filename}: {e}")
    return None

# Test with our sample file
df = safe_file_load('sample.csv', 'csv')
if df is not None:
    print(df.head())

📋 File Loading Checklist

StepWhat to CheckCommand
1. File exists?Is the file path correct?os.path.exists('file.csv')
2. Right method?CSV→read_csv, Excel→read_excelMatch file extension
3. Loaded correctly?Check shape and columns.shape, .columns
4. Data looks right?Inspect first few rows.head()
5. Missing values?Check for NaN values.isnull().sum()

🎯 Quick File Loading Reference

🚀 What's Next?

Great! You can now load data from the most common file formats. Next, let's learn how to connect to databases and load data directly from them.

Continue to: Loading from Databases

You're ready for real-world data! 📊💪

Was this helpful?

😔Poor
🙁Fair
😊Good
😄Great
🤩Excellent