📄 Reading Files (CSV, Excel, JSON)
Loading data from files is one of the most important skills in data analysis. Let's learn how to read the three most common file formats: CSV, Excel, and JSON files.
📄 Reading CSV Files
CSV is the most common format. It's simple and works everywhere:
import pandas as pd
# Basic CSV reading
# df = pd.read_csv('data.csv')
# Let's simulate with sample data
sample_csv_data = """name,age,city
Alice,25,New York
Bob,30,London
Charlie,35,Tokyo"""
# Save sample data to show loading
with open('sample.csv', 'w') as f:
f.write(sample_csv_data)
# Now load it
df = pd.read_csv('sample.csv')
print("Loaded CSV data:")
print(df)
print(f"Shape: {df.shape}")
CSV with Different Separators
Not all "CSV" files use commas:
import pandas as pd
# Create sample data with semicolons
sample_data = """name;age;salary
Alice;25;50000
Bob;30;60000
Charlie;35;70000"""
with open('sample_semicolon.csv', 'w') as f:
f.write(sample_data)
# Load with different separator
df = pd.read_csv('sample_semicolon.csv', sep=';')
print("Data with semicolon separator:")
print(df)
📊 Reading Excel Files
Excel files are everywhere in business. Pandas makes them easy to read:
import pandas as pd
# For Excel files, you might need: pip install openpyxl
# Basic Excel reading
# df = pd.read_excel('data.xlsx')
# Simulate Excel data (you'd normally load from actual .xlsx file)
excel_data = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard'],
'price': [999, 25, 75],
'stock': [10, 50, 30]
})
# Save as Excel to demonstrate
excel_data.to_excel('sample.xlsx', index=False)
# Load it back
df = pd.read_excel('sample.xlsx')
print("Loaded Excel data:")
print(df)
Excel with Multiple Sheets
Excel files often have multiple worksheets:
import pandas as pd
# Reading specific sheet
# df = pd.read_excel('data.xlsx', sheet_name='Sales')
# See all sheet names
# sheet_names = pd.ExcelFile('data.xlsx').sheet_names
# print("Available sheets:", sheet_names)
# Example sheet operations
df_sheet1 = pd.read_excel('file.xlsx', sheet_name='Sheet1')
df_first = pd.read_excel('file.xlsx', sheet_name=0) # First sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets
🌐 Reading JSON Files
JSON is popular for web data and APIs:
import pandas as pd
import json
# Create sample JSON data
json_data = [
{"name": "Alice", "age": 25, "city": "New York"},
{"name": "Bob", "age": 30, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Tokyo"}
]
# Save as JSON file
with open('sample.json', 'w') as f:
json.dump(json_data, f)
# Load JSON file
df = pd.read_json('sample.json')
print("Loaded JSON data:")
print(df)
🔧 Common Loading Options
Here are the most useful parameters for file loading:
CSV Options
import pandas as pd
# Common CSV parameters
print("CSV loading options:")
print()
# Basic with common options
sample_data = """Name,Age,Salary,Department
Alice,25,50000,Sales
Bob,30,,Engineering
Charlie,35,70000,Marketing"""
with open('sample_options.csv', 'w') as f:
f.write(sample_data)
# Load with options
df = pd.read_csv('sample_options.csv',
na_values=['', 'N/A'], # What counts as missing
dtype={'Age': int}) # Force Age to be integer
print("Data with options:")
print(df)
print()
print("Data types:")
print(df.dtypes)
Excel Options
import pandas as pd
# Example Excel loading with options
# df = pd.read_excel('sales.xlsx', sheet_name='Q1_Data')
# df = pd.read_excel('report.xlsx', header=1, usecols='A:F')
# df = pd.read_excel('large_file.xlsx', nrows=1000) # Test with first 1000 rows
# For demonstration (these would work with real files)
sample_excel_options = {
'sheet_name': 'Data',
'header': 0,
'usecols': 'A:D',
'nrows': 100
}
print(f"Excel options: {sample_excel_options}")
⚠️ Handling Common Problems
File loading often has issues. Here's how to handle them:
import pandas as pd
def safe_file_load(filename, file_type='csv'):
"""Safely load a file with error handling"""
try:
if file_type == 'csv':
df = pd.read_csv(filename)
elif file_type == 'excel':
df = pd.read_excel(filename)
elif file_type == 'json':
df = pd.read_json(filename)
print(f"✅ Successfully loaded {filename}")
print(f" Shape: {df.shape}")
print(f" Columns: {list(df.columns)}")
return df
except FileNotFoundError:
print(f"❌ File '{filename}' not found")
except Exception as e:
print(f"❌ Error loading {filename}: {e}")
return None
# Test with our sample file
df = safe_file_load('sample.csv', 'csv')
if df is not None:
print(df.head())
📋 File Loading Checklist
Step | What to Check | Command |
---|---|---|
1. File exists? | Is the file path correct? | os.path.exists('file.csv') |
2. Right method? | CSV→read_csv, Excel→read_excel | Match file extension |
3. Loaded correctly? | Check shape and columns | .shape , .columns |
4. Data looks right? | Inspect first few rows | .head() |
5. Missing values? | Check for NaN values | .isnull().sum() |
🎯 Quick File Loading Reference
🚀 What's Next?
Great! You can now load data from the most common file formats. Next, let's learn how to connect to databases and load data directly from them.
Continue to: Loading from Databases
You're ready for real-world data! 📊💪
Was this helpful?
Track Your Learning Progress
Sign in to bookmark tutorials and keep track of your learning journey.
Your progress is saved automatically as you read.