🗄️ Data Processing

Data processing is at the heart of many Python applications. Whether you're cleaning text, fetching data from APIs, working with databases, or processing large files, Python provides powerful tools to handle various data formats and sources efficiently.

# Example of common data processing tasks
import re
import json

# Text processing with regex
text = "Contact us at info@company.com or call (555) 123-4567"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
phones = re.findall(r'\(\d{3}\) \d{3}-\d{4}', text)

print(f"Found emails: {emails}")
print(f"Found phones: {phones}")

# JSON data processing
data = {
    "users": [
        {"name": "Alice", "score": 95},
        {"name": "Bob", "score": 87},
        {"name": "Charlie", "score": 92}
    ]
}

# Filter and transform data
high_scorers = [user for user in data["users"] if user["score"] > 90]
json_output = json.dumps(high_scorers, indent=2)

print(f"\nHigh scorers: {json_output}")

🎯 Why Data Processing Matters

Modern applications pull data from many sources and in many formats. Python is well suited to connecting to those sources, transforming the data, and analyzing the results.

📊 Data Processing Overview

Data Format Types

| Format | Use Case | Python Tools |
| --- | --- | --- |
| JSON | APIs, configuration | json module |
| CSV | Spreadsheets, data export | csv module |
| XML/HTML | Web content, documents | xml.etree, BeautifulSoup |
| SQL | Structured databases | sqlite3, sqlalchemy |
| Text | Logs, documents | re, string methods |
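
As a rough illustration of two of the formats above, the sketch below reads CSV text with the csv module and loads it into an in-memory SQLite database with sqlite3. The CSV content, the table name `users`, and the column names are made up for this example; a real script would read from a file and a persistent database.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; a real script would open a file instead.
csv_text = "name,score\nAlice,95\nBob,87\nCharlie,92\n"

# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Load the rows into an in-memory SQLite database (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO users (name, score) VALUES (?, ?)",
    [(row["name"], int(row["score"])) for row in rows],
)

# Let SQL do the filtering instead of a Python loop.
high_scorers = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM users WHERE score > 90 ORDER BY name"
    )
]
conn.close()

print(high_scorers)
```

Note that csv returns every field as a string, so the score is converted with `int()` before it is stored.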

Processing Patterns

| Pattern | Purpose | Example |
| --- | --- | --- |
| Extract | Get specific data | Regex, CSS selectors |
| Transform | Change format/structure | JSON to CSV conversion |
| Validate | Check data quality | Type checking, ranges |
| Aggregate | Summarize data | Counting, grouping |
| Filter | Select subset | Conditions, criteria |
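
To make these patterns more concrete, here is a minimal sketch that applies four of them (validate, filter, aggregate, transform) to a small list of records. The records, the field names, and the 0 to 100 score range are assumptions chosen for illustration; the extract pattern is covered by the regex examples elsewhere on this page.

```python
import csv
import io
from collections import Counter

# Hypothetical records; the field names are illustrative only.
records = [
    {"name": "Alice", "team": "red", "score": 95},
    {"name": "Bob", "team": "blue", "score": 87},
    {"name": "Charlie", "team": "red", "score": 192},  # out of range
]

def is_valid(record):
    """Validate: the score must be an int between 0 and 100 (assumed range)."""
    score = record.get("score")
    return isinstance(score, int) and 0 <= score <= 100

valid = [r for r in records if is_valid(r)]

# Filter: select the subset that meets a condition.
high = [r for r in valid if r["score"] > 90]

# Aggregate: count valid records per team.
per_team = Counter(r["team"] for r in valid)

# Transform: convert the JSON-like list of dicts to CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["name", "team", "score"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(valid)
csv_text = buffer.getvalue()

print(csv_text)
```

Validating before filtering and aggregating keeps bad records out of every later step, which is usually easier to reason about than handling them in each step separately.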

🌟 Quick Data Processing Examples

Here's what you'll learn to build:

import re
import json

# Text processing pipeline
def process_log_entry(log_line):
    # Extract timestamp, level, and message
    pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)'
    match = re.match(pattern, log_line)
    
    if match:
        return {
            'timestamp': match.group(1),
            'level': match.group(2),
            'message': match.group(3)
        }
    return None

# Sample data processing
log_lines = [
    "2024-01-15 10:30:45 [INFO] User login successful",
    "2024-01-15 10:31:12 [ERROR] Database connection failed",
    "2024-01-15 10:31:45 [INFO] User logout"
]

processed_logs = []
for line in log_lines:
    parsed = process_log_entry(line)
    if parsed:
        processed_logs.append(parsed)

print("Processed log entries:")
for log in processed_logs:
    print(f"  {log['level']}: {log['message']}")

# Data transformation
error_count = sum(1 for log in processed_logs if log['level'] == 'ERROR')
info_count = sum(1 for log in processed_logs if log['level'] == 'INFO')

summary = {
    'total_entries': len(processed_logs),
    'errors': error_count,
    'info': info_count
}

print(f"\nLog summary: {json.dumps(summary, indent=2)}")
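
The summary above counts each level with its own `sum()` call. When the set of levels is not known in advance, `collections.Counter` may be a simpler option. The sketch below is one possible variant, not part of the original example; it reuses the same log format and adds one malformed line (hypothetical) to show that unmatched lines are skipped.

```python
import re
from collections import Counter

# Same log format as above, compiled once for reuse.
LOG_PATTERN = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)'
)

log_lines = [
    "2024-01-15 10:30:45 [INFO] User login successful",
    "2024-01-15 10:31:12 [ERROR] Database connection failed",
    "2024-01-15 10:31:45 [INFO] User logout",
    "this line does not follow the format",  # hypothetical malformed entry
]

# Count entries per level; lines that do not match are ignored.
levels = Counter(
    match.group(2)
    for match in map(LOG_PATTERN.match, log_lines)
    if match
)

print(dict(levels))
```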

💡 Real-World Applications

Data processing powers many applications:

  • Web Scraping 🕷️: Extract product info, prices, news
  • Log Analysis 📊: Monitor system health, user behavior
  • Data Integration 🔗: Combine data from multiple sources
  • API Services 🌐: Process requests, return formatted data
  • Report Generation 📈: Transform raw data into insights
  • File Processing 📁: Batch convert, clean, or validate files
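
As one illustration of the data integration case in the list above, the following sketch joins user profiles from a JSON string with scores from CSV text. Both inputs, the `id` key, and the decision to skip scores without a matching profile are assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical inputs: user profiles as JSON, scores as CSV.
profiles_json = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
scores_csv = "id,score\n1,95\n2,87\n3,70\n"

# Index the profiles by id so each lookup is a dictionary access.
profiles = {p["id"]: p for p in json.loads(profiles_json)}

combined = []
for row in csv.DictReader(io.StringIO(scores_csv)):
    profile = profiles.get(int(row["id"]))
    if profile is None:
        continue  # score without a matching profile; skipped in this sketch
    combined.append({"name": profile["name"], "score": int(row["score"])})

print(json.dumps(combined, indent=2))
```

Whether unmatched rows should be skipped, logged, or treated as errors depends on the application; skipping is simply the shortest choice here.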

🧪 Processing Tools Reference

| Task | Built-in Modules | Third-party Options |
| --- | --- | --- |
| Regex | re | |
| JSON | json | |
| CSV | csv | pandas |
| XML/HTML | xml.etree | BeautifulSoup, lxml |
| HTTP/APIs | urllib | requests |
| Databases | sqlite3 | sqlalchemy, psycopg2 |
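
The built-in modules are often enough for small jobs. As a brief sketch, the snippet below parses an XML string with `xml.etree.ElementTree` and turns it into a list of dictionaries. The `<catalog>` document and its element and attribute names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are illustrative.
xml_text = """
<catalog>
  <product sku="A1"><name>Keyboard</name><price>49.99</price></product>
  <product sku="B2"><name>Mouse</name><price>19.50</price></product>
</catalog>
""".strip()

root = ET.fromstring(xml_text)

# Build one dict per <product>, reading an attribute and two child elements.
products = [
    {
        "sku": product.get("sku"),
        "name": product.findtext("name"),
        "price": float(product.findtext("price")),
    }
    for product in root.findall("product")
]

for item in products:
    print(item)
```

For untrusted or very large XML, or for real-world HTML, one of the third-party options in the table may be a better fit.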

🚀 Ready to Process Data?

Data processing skills are essential for building real-world applications. Master these techniques to handle various data sources and formats effectively.

Start your journey with "Work with Regular Expressions".

Transform raw data into valuable insights with Python! 🗄️✨
