🗄️ Data Processing
Data processing is at the heart of many Python applications. Whether you're cleaning text, fetching data from APIs, working with databases, or processing large files, Python provides powerful tools to handle various data formats and sources efficiently.
# Example of common data processing tasks
import re
import json
# Text processing with regex
text = "Contact us at info@company.com or call (555) 123-4567"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
phones = re.findall(r'\(\d{3}\) \d{3}-\d{4}', text)
print(f"Found emails: {emails}")
print(f"Found phones: {phones}")
# JSON data processing
data = {
    "users": [
        {"name": "Alice", "score": 95},
        {"name": "Bob", "score": 87},
        {"name": "Charlie", "score": 92}
    ]
}
# Filter and transform data
high_scorers = [user for user in data["users"] if user["score"] > 90]
json_output = json.dumps(high_scorers, indent=2)
print(f"\nHigh scorers: {json_output}")
🎯 Why Data Processing Matters
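Modern applications pull data from many sources (files, web APIs, databases) in many formats (JSON, CSV, XML, plain text). Python's standard library handles most of these formats directly, and third-party packages such as pandas and requests extend it further.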
📚 Data Processing Topics
Learn essential data processing techniques:
- 🔍 Work with Regular Expressions: Pattern matching and text extraction with regex.
- 🌐 Parse XML and HTML: Extract data from markup languages and web content.
- 🔗 Handle API Requests: Fetch and process data from web APIs and services.
- 💾 Work with Databases: Connect to and query SQL databases efficiently.
- 📁 Process Large Files: Handle big datasets with memory-efficient techniques.
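For the API topic, here is a minimal sketch that uses only the standard library's `urllib`; the third-party `requests` package is a popular alternative. The endpoint URL and the helpers `build_url`, `fetch_json`, and `parse_users` are hypothetical, and the usage at the bottom processes a hard-coded sample payload so it runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_url(base_url, params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body (hypothetical helper; needs network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        # The body arrives as bytes; json.loads accepts bytes directly (Python 3.6+)
        return json.loads(response.read())


def parse_users(payload):
    """Keep only the fields we need from an API-style payload."""
    return [{"name": u["name"], "score": u["score"]} for u in payload.get("users", [])]


# Offline usage: build a request URL and process a sample response body
url = build_url("https://api.example.com/users", {"min_score": 90, "page": 1})
print(url)

sample_body = '{"users": [{"id": 1, "name": "Alice", "score": 95}]}'
print(parse_users(json.loads(sample_body)))
```

Separating fetching from parsing keeps the parsing logic testable without a live server.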
📊 Data Processing Overview
Data Format Types
Format | Use Case | Python Tools |
---|---|---|
JSON | APIs, configuration | json module |
CSV | Spreadsheets, data export | csv module |
XML/HTML | Web content, documents | xml.etree, BeautifulSoup |
SQL | Structured databases | sqlite3, sqlalchemy |
Text | Logs, documents | re, string methods |
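As a sketch of how the `json` and `csv` rows of this table fit together, the example below converts a small, made-up JSON array into CSV text and reads it back. `io.StringIO` stands in for a real file here; with an actual file you would open it with `newline=""`, as the `csv` module documentation recommends.

```python
import csv
import io
import json

# A JSON array of flat objects, e.g. an API response or a data export
json_text = '''[
    {"name": "Alice", "score": 95},
    {"name": "Bob", "score": 87}
]'''
records = json.loads(json_text)

# Write the records as CSV (in-memory buffer instead of open(path, "w", newline=""))
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back: csv.DictReader yields every value as a string
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows)
```

Note that the round trip loses types: `95` comes back as the string `"95"`, so numeric columns need explicit conversion after reading CSV.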
Processing Patterns
Pattern | Purpose | Example |
---|---|---|
Extract | Get specific data | Regex, CSS selectors |
Transform | Change format/structure | JSON to CSV conversion |
Validate | Check data quality | Type checking, ranges |
Aggregate | Summarize data | Counting, grouping |
Filter | Select subset | Conditions, criteria |
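The Validate, Filter, and Aggregate patterns often appear together. A minimal sketch, using made-up order records and a validation rule chosen only for illustration (the amount must be a non-negative number):

```python
from collections import Counter, defaultdict

orders = [
    {"customer": "alice", "category": "books", "amount": 12.5},
    {"customer": "bob", "category": "games", "amount": -3},       # invalid: negative
    {"customer": "alice", "category": "games", "amount": 40.0},
    {"customer": "carol", "category": "books", "amount": "n/a"},  # invalid: not a number
    {"customer": "bob", "category": "books", "amount": 7.5},
]


def is_valid(order):
    """Validate: type check plus a range check on the amount."""
    amount = order.get("amount")
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(amount, (int, float)) and not isinstance(amount, bool) and amount >= 0


# Validate + Filter: keep only records that pass the checks
valid = [o for o in orders if is_valid(o)]

# Aggregate: count orders and total the amounts per category
counts = Counter(o["category"] for o in valid)
totals = defaultdict(float)
for o in valid:
    totals[o["category"]] += o["amount"]

print(f"Valid orders: {len(valid)} of {len(orders)}")
print(f"Orders per category: {dict(counts)}")
print(f"Totals per category: {dict(totals)}")
```

Running validation before aggregation keeps bad records (such as the string `"n/a"`) from raising errors or skewing the totals.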
🌟 Quick Data Processing Examples
Here's what you'll learn to build:
import re
import json
# Text processing pipeline
def process_log_entry(log_line):
    # Extract timestamp, level, and message
    pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)'
    match = re.match(pattern, log_line)
    if match:
        return {
            'timestamp': match.group(1),
            'level': match.group(2),
            'message': match.group(3)
        }
    return None

# Sample data processing
log_lines = [
    "2024-01-15 10:30:45 [INFO] User login successful",
    "2024-01-15 10:31:12 [ERROR] Database connection failed",
    "2024-01-15 10:31:45 [INFO] User logout"
]
processed_logs = []
for line in log_lines:
    parsed = process_log_entry(line)
    if parsed:
        processed_logs.append(parsed)
print("Processed log entries:")
for log in processed_logs:
    print(f" {log['level']}: {log['message']}")
# Data transformation
error_count = sum(1 for log in processed_logs if log['level'] == 'ERROR')
info_count = sum(1 for log in processed_logs if log['level'] == 'INFO')
summary = {
    'total_entries': len(processed_logs),
    'errors': error_count,
    'info': info_count
}
print(f"\nLog summary: {json.dumps(summary, indent=2)}")
🔄 Data Processing Workflow
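One common way to chain the patterns from the Processing Patterns table into a workflow is a pipeline of generator functions. The sketch below reuses the log format from the previous example; the stage names `extract`, `only_levels`, and `count_by_level` are hypothetical. Because each stage yields one item at a time, the same pipeline can consume a file object from `open()` line by line instead of loading the whole file into memory.

```python
import re
from collections import Counter

LOG_PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)')


def extract(lines):
    """Extract: parse each line into a dict, dropping lines that don't match."""
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            timestamp, level, message = match.groups()
            yield {"timestamp": timestamp, "level": level, "message": message}


def only_levels(entries, levels):
    """Filter: keep entries whose level is in the given set."""
    for entry in entries:
        if entry["level"] in levels:
            yield entry


def count_by_level(entries):
    """Aggregate: consume the stream and count entries per level."""
    return Counter(entry["level"] for entry in entries)


lines = [
    "2024-01-15 10:30:45 [INFO] User login successful",
    "not a log line",
    "2024-01-15 10:31:12 [ERROR] Database connection failed",
    "2024-01-15 10:31:20 [WARNING] Slow query",
    "2024-01-15 10:31:45 [INFO] User logout",
]

# The stages are lazy: nothing runs until count_by_level pulls items through.
# For a real file, pass the open file object instead of the list (path is up to you).
summary = count_by_level(only_levels(extract(lines), {"ERROR", "WARNING"}))
print(dict(summary))
```

Each stage does one job, so stages can be reordered, tested in isolation, or swapped out (for example, a different `extract` for another log format).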
💡 Real-World Applications
Data processing powers many applications:
- Web Scraping 🕷️: Extract product info, prices, news
- Log Analysis 📊: Monitor system health, user behavior
- Data Integration 🔗: Combine data from multiple sources
- API Services 🌐: Process requests, return formatted data
- Report Generation 📈: Transform raw data into insights
- File Processing 📁: Batch convert, clean, or validate files
🧪 Processing Tools Reference
Task | Built-in Modules | Third-party Options |
---|---|---|
Regex | re | |
JSON | json | |
CSV | csv | pandas |
XML/HTML | xml.etree | BeautifulSoup, lxml |
HTTP/APIs | urllib | requests |
Databases | sqlite3 | sqlalchemy, psycopg2 |
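As a sketch of combining two built-in modules from this table, the example below parses a small, made-up XML product catalog with `xml.etree.ElementTree` and loads it into an in-memory `sqlite3` database. The element names and table schema are assumptions for illustration. `xml.etree` requires well-formed XML; for messy real-world HTML, the third-party options listed (BeautifulSoup, lxml) are more forgiving.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML export of products
xml_text = """<catalog>
  <product id="1"><name>Keyboard</name><price>49.99</price></product>
  <product id="2"><name>Mouse</name><price>19.50</price></product>
  <product id="3"><name>Monitor</name><price>189.00</price></product>
</catalog>"""

# Parse: fromstring returns the root element; findall iterates matching children
root = ET.fromstring(xml_text)
products = [
    (int(p.get("id")), p.findtext("name"), float(p.findtext("price")))
    for p in root.findall("product")
]

# Load into an in-memory SQLite database (pass a file path instead to persist it)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
# ? placeholders let sqlite3 bind values safely instead of formatting them into SQL
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", products)
conn.commit()

# Query with a bound parameter
rows = conn.execute(
    "SELECT name, price FROM products WHERE price < ? ORDER BY price", (50,)
).fetchall()
print(rows)
conn.close()
```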
🚀 Ready to Process Data?
Data processing skills are essential for building real-world applications. Master these techniques to handle various data sources and formats effectively.
Start your journey: begin with the Work with Regular Expressions tutorial.
Transform raw data into valuable insights with Python! 🗄️✨