📊 Array Aggregation
Array aggregation transforms collections of data into meaningful summaries! Whether you need totals, averages, extremes, or organized data, NumPy's aggregation functions help you extract insights from large datasets efficiently.
import numpy as np
# Array aggregation overview
sales_data = np.array([[120, 135, 145, 160],
                       [98, 112, 125, 140],
                       [156, 167, 175, 185]])
print(f"Sales data: \n{sales_data}")
# Basic aggregations
print(f"Total sales: {np.sum(sales_data)}")
print(f"Average: {np.mean(sales_data):.1f}")
print(f"Best performance: {np.max(sales_data)}")
print(f"Worst performance: {np.min(sales_data)}")
🔧 Core Aggregation Types
NumPy aggregation functions fall into three main categories:
- Reduction Functions 📈: Sum, mean, min, max, std
- Sorting Operations 🔢: Sort, argsort, partition
- Data Organization 🗂️: Unique values, counting, grouping
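As a quick orientation, the sketch below runs one representative call from each category. The values array is made-up sample data used only for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = np.array([7, 3, 9, 3, 5, 7, 3])

# Reduction: collapse the array to a single number
total = np.sum(values)
spread = np.std(values)  # population standard deviation (ddof=0 by default)

# Sorting: np.partition puts the k-th smallest element at index k,
# with smaller-or-equal elements before it (the rest is not fully sorted)
k = 3
fourth_smallest = np.partition(values, k)[k]

# Data organization: distinct values and how often each occurs
unique_values, counts = np.unique(values, return_counts=True)

print(f"Total: {total}")
print(f"Std deviation: {spread:.2f}")
print(f"4th smallest: {fourth_smallest}")
print(f"Unique: {unique_values}, counts: {counts}")
```

np.partition is a reasonable choice when only the k-th value is needed, since it avoids a full sort.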
📈 Why Aggregation Matters
Aggregation helps you understand your data by:
- Summarizing large datasets into key metrics
- Finding patterns and trends in data
- Identifying extremes and outliers
- Organizing information for analysis
import numpy as np
# Student test scores across subjects
test_scores = np.array([[85, 92, 78, 88],   # Alice
                        [79, 85, 91, 82],   # Bob
                        [94, 89, 96, 93],   # Carol
                        [72, 78, 74, 76]])  # David
students = ['Alice', 'Bob', 'Carol', 'David']
subjects = ['Math', 'Science', 'English', 'History']
print(f"Test scores: \n{test_scores}")
# Different aggregation insights
print(f"Class average: {np.mean(test_scores):.1f}")
print(f"Top score: {np.max(test_scores)}")
print(f"Student averages: {np.mean(test_scores, axis=1).round(1)}")
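The students and subjects lists can be paired with axis-wise results to label them. The following is a minimal sketch using the same scores; the variable names for the results are chosen here for illustration.

```python
import numpy as np

test_scores = np.array([[85, 92, 78, 88],   # Alice
                        [79, 85, 91, 82],   # Bob
                        [94, 89, 96, 93],   # Carol
                        [72, 78, 74, 76]])  # David
students = ['Alice', 'Bob', 'Carol', 'David']
subjects = ['Math', 'Science', 'English', 'History']

# axis=0 collapses the student rows, leaving one average per subject
subject_averages = np.mean(test_scores, axis=0)
# axis=1 collapses the subject columns, leaving one average per student
student_averages = np.mean(test_scores, axis=1)

for subject, avg in zip(subjects, subject_averages):
    print(f"{subject}: {avg:.2f}")

# argmax/argmin return positions, which index into the label lists
top_student = students[np.argmax(student_averages)]
lowest_subject = subjects[np.argmin(subject_averages)]
print(f"Highest student average: {top_student}")
print(f"Lowest subject average: {lowest_subject}")
```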
🎯 Aggregation Categories
Reduction Functions
Reduce an array to a single value, or to a smaller array with fewer dimensions.
import numpy as np
monthly_sales = np.array([1200, 1350, 1180, 1420, 1290, 1380])
# Common reductions
print(f"Sales: {monthly_sales}")
print(f"Total: {np.sum(monthly_sales)}")
print(f"Average: {np.mean(monthly_sales):.0f}")
print(f"Best month: {np.max(monthly_sales)}")
print(f"Worst month: {np.min(monthly_sales)}")
print(f"Range: {np.ptp(monthly_sales)}") # peak-to-peak
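Reductions also combine well with position lookups and boolean masks. The sketch below reuses the monthly sales figures; the month labels are hypothetical, since the data above does not name the months.

```python
import numpy as np

monthly_sales = np.array([1200, 1350, 1180, 1420, 1290, 1380])
# Hypothetical month labels, added only to make the output readable
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

# Median: the middle value, less sensitive to a single unusual month than the mean
median_sales = np.median(monthly_sales)

# argmax/argmin return the position of the extreme rather than its value
best_month = months[np.argmax(monthly_sales)]
worst_month = months[np.argmin(monthly_sales)]

# Summing a boolean mask counts how many elements satisfy the condition
months_above_average = np.sum(monthly_sales > np.mean(monthly_sales))

print(f"Median: {median_sales:.0f}")
print(f"Best month: {best_month}, worst month: {worst_month}")
print(f"Months above average: {months_above_average}")
```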
Sorting Operations
Organize data in meaningful order.
import numpy as np
response_times = np.array([245, 123, 456, 189, 334, 267, 198])
# Sorting operations
sorted_times = np.sort(response_times)
sort_indices = np.argsort(response_times)
print(f"Original: {response_times}")
print(f"Sorted: {sorted_times}")
print(f"Sort indices: {sort_indices}")
print(f"Fastest response: {sorted_times[0]}ms")
print(f"Slowest response: {sorted_times[-1]}ms")
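The indices from np.argsort are most useful for reordering a second, parallel array. A sketch under the assumption that each response time belongs to a named endpoint (the endpoint names are invented for this example):

```python
import numpy as np

response_times = np.array([245, 123, 456, 189, 334, 267, 198])
# Hypothetical endpoint labels, one per measurement (not part of the original data)
endpoints = np.array(['/home', '/login', '/search', '/cart',
                      '/checkout', '/profile', '/help'])

# Apply the sort order of one array to another
order = np.argsort(response_times)
ranked_endpoints = endpoints[order]  # fastest to slowest

# Descending order: reverse the ascending result
slowest_first = np.sort(response_times)[::-1]

# The three largest values without a full sort; their internal order is not guaranteed
three_slowest = np.partition(response_times, -3)[-3:]

print(f"Fastest endpoint: {ranked_endpoints[0]}")
print(f"Slowest endpoint: {ranked_endpoints[-1]}")
print(f"Descending times: {slowest_first}")
print(f"Three slowest (unordered): {three_slowest}")
```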
Data Organization
Find patterns and unique elements in data.
import numpy as np
survey_responses = np.array([5, 3, 4, 5, 2, 4, 5, 3, 4, 1, 5, 4])
# Unique value analysis
unique_values, counts = np.unique(survey_responses, return_counts=True)
print(f"Responses: {survey_responses}")
print(f"Unique ratings: {unique_values}")
print(f"Response counts: {counts}")
# Most common response (np.argmax returns the first maximum if several counts tie)
most_common_idx = np.argmax(counts)
print(f"Most common rating: {unique_values[most_common_idx]} ({counts[most_common_idx]} times)")
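In this survey, ratings 4 and 5 each occur four times, so a single argmax reports only one of them. The sketch below shows one way to list every tied rating and to turn counts into percentage shares.

```python
import numpy as np

survey_responses = np.array([5, 3, 4, 5, 2, 4, 5, 3, 4, 1, 5, 4])
unique_values, counts = np.unique(survey_responses, return_counts=True)

# Share of each rating as a percentage of all responses
shares = counts / counts.sum() * 100
for rating, share in zip(unique_values, shares):
    print(f"Rating {rating}: {share:.1f}%")

# Every rating whose count equals the maximum count (handles ties)
modes = unique_values[counts == counts.max()]
print(f"Most common rating(s): {modes}")
```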
📐 Multi-Dimensional Aggregation
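Work with aggregations across different axes of multi-dimensional arrays. The axis argument names the dimension that gets collapsed: on a 2D array, axis=0 aggregates down the rows (one result per column) and axis=1 aggregates across the columns (one result per row).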
Axis-Specific Operations
import numpy as np
# Quarterly sales: 4 stores × 4 quarters
quarterly_sales = np.array([[120, 135, 145, 160],   # Store A
                            [98, 112, 125, 140],    # Store B
                            [156, 167, 175, 185],   # Store C
                            [89, 95, 105, 125]])    # Store D
stores = ['Store A', 'Store B', 'Store C', 'Store D']
# Different axis aggregations
store_totals = np.sum(quarterly_sales, axis=1) # Sum across quarters
quarter_totals = np.sum(quarterly_sales, axis=0) # Sum across stores
print(f"Store year totals: {store_totals}")
print(f"Quarterly totals: {quarter_totals}")
# Best performing store and quarter
best_store = np.argmax(store_totals)
best_quarter = np.argmax(quarter_totals)
print(f"Best store: {stores[best_store]} ({store_totals[best_store]})")
print(f"Best quarter: Q{best_quarter + 1} ({quarter_totals[best_quarter]})")
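When an axis-wise result needs to be combined with the original array, keepdims=True keeps the collapsed axis as length 1 so broadcasting lines up. A minimal sketch computing each quarter's share of its store's yearly total:

```python
import numpy as np

quarterly_sales = np.array([[120, 135, 145, 160],   # Store A
                            [98, 112, 125, 140],    # Store B
                            [156, 167, 175, 185],   # Store C
                            [89, 95, 105, 125]])    # Store D

# Shape (4, 1) instead of (4,), so it broadcasts against the (4, 4) array row by row
store_totals = np.sum(quarterly_sales, axis=1, keepdims=True)

# Each row now sums to 100 percent
quarter_share = quarterly_sales / store_totals * 100

# Position of the strongest quarter for every store at once
best_quarter_per_store = np.argmax(quarterly_sales, axis=1)

print(f"Store totals shape: {store_totals.shape}")
print(f"Quarter share (%):\n{quarter_share.round(1)}")
print(f"Best quarter per store: {best_quarter_per_store + 1}")
```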
🎯 Performance Insights
Combining axis-wise reductions with argmax and argmin turns raw numbers into concrete findings, such as which day of the week draws the most traffic.
import numpy as np
# Website traffic data: daily visitors for 4 weeks
daily_traffic = np.array([[1200, 1350, 1180, 1420, 1290, 1100, 980],    # Week 1
                          [1250, 1400, 1220, 1380, 1340, 1150, 1020],   # Week 2
                          [1180, 1320, 1160, 1450, 1280, 1080, 950],    # Week 3
                          [1300, 1480, 1240, 1520, 1390, 1200, 1050]])  # Week 4
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Traffic analysis
weekly_totals = np.sum(daily_traffic, axis=1)
daily_averages = np.mean(daily_traffic, axis=0)
print(f"Weekly traffic totals: {weekly_totals}")
print(f"Average by day: {daily_averages.round(0).astype(int)}")
# Find patterns
best_day_idx = np.argmax(daily_averages)
worst_day_idx = np.argmin(daily_averages)
print(f"Best day: {days[best_day_idx]} ({daily_averages[best_day_idx]:.0f} avg)")
print(f"Worst day: {days[worst_day_idx]} ({daily_averages[worst_day_idx]:.0f} avg)")
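The same traffic data supports a few further comparisons. The sketch below assumes the columns run Monday to Sunday, as in the days list, and treats the last two columns as the weekend.

```python
import numpy as np

daily_traffic = np.array([[1200, 1350, 1180, 1420, 1290, 1100, 980],    # Week 1
                          [1250, 1400, 1220, 1380, 1340, 1150, 1020],   # Week 2
                          [1180, 1320, 1160, 1450, 1280, 1080, 950],    # Week 3
                          [1300, 1480, 1240, 1520, 1390, 1200, 1050]])  # Week 4
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Slice the columns, then reduce over everything that remains
weekday_avg = np.mean(daily_traffic[:, :5])
weekend_avg = np.mean(daily_traffic[:, 5:])

# Change in total traffic from one week to the next
weekly_totals = np.sum(daily_traffic, axis=1)
week_over_week = np.diff(weekly_totals)

# Busiest day within each week (one index per row)
busiest_day_per_week = np.argmax(daily_traffic, axis=1)

print(f"Weekday average: {weekday_avg:.0f}, weekend average: {weekend_avg:.0f}")
print(f"Week-over-week change: {week_over_week}")
print(f"Busiest day each week: {[days[i] for i in busiest_day_per_week]}")
```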
📚 What You'll Learn
This section covers essential aggregation techniques:
- 📈 Sum, Mean, Min, Max - Master basic reduction functions for data summarization
- 🔢 Sorting Arrays - Organize data with sorting operations and finding positions
- 🗂️ Finding Unique Values - Discover patterns and count occurrences in data
🧠 Real-World Applications
Business Analytics
import numpy as np
# Product sales data
product_sales = np.array([45, 67, 89, 34, 56, 78, 45, 23, 67, 89])
product_ids = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'A', 'G', 'B', 'C'])
# Aggregation insights
total_revenue = np.sum(product_sales)
avg_sale = np.mean(product_sales)
best_sale = np.max(product_sales)
print(f"Total revenue: ${total_revenue}")
print(f"Average sale: ${avg_sale:.2f}")
print(f"Best single sale: ${best_sale}")
# Product behind the largest single sale (np.argmax returns the first match on ties)
top_sale_idx = np.argmax(product_sales)
print(f"Largest single sale: Product {product_ids[top_sale_idx]} (${product_sales[top_sale_idx]})")
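Because products A, B, and C each appear twice, the largest single sale does not show which product earned the most overall. One possible way to total sales per product with plain NumPy is to combine np.unique(return_inverse=True) with np.bincount; this is a sketch of that approach, not the only way to group data.

```python
import numpy as np

product_sales = np.array([45, 67, 89, 34, 56, 78, 45, 23, 67, 89])
product_ids = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'A', 'G', 'B', 'C'])

# inverse[i] is the position of product_ids[i] within unique_ids
unique_ids, inverse = np.unique(product_ids, return_inverse=True)

# bincount adds up the weights that fall into each group index (result is float)
product_totals = np.bincount(inverse, weights=product_sales)

for pid, total in zip(unique_ids, product_totals):
    print(f"Product {pid}: ${total:.0f}")

top_product = unique_ids[np.argmax(product_totals)]
print(f"Highest total revenue: Product {top_product}")
```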
Quality Control
import numpy as np
# Manufacturing measurements
measurements = np.array([10.2, 9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.1])
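target = 10.0
tolerance = 0.3 # ±0.3 units from target
# Quality metrics
mean_value = np.mean(measurements)
std_deviation = np.std(measurements) # population std (ddof=0)
# The small slack keeps boundary values (10.3, 9.7) from failing due to floating-point rounding
within_tolerance = np.abs(measurements - target) <= tolerance + 1e-9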
print(f"Measurements: {measurements}")
print(f"Mean: {mean_value:.2f}")
print(f"Std deviation: {std_deviation:.3f}")
print(f"Within tolerance: {np.sum(within_tolerance)}/{len(measurements)}")
print(f"Quality rate: {np.mean(within_tolerance)*100:.1f}%")
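A pass rate says how many parts fail but not which ones. The sketch below locates the failing measurements and also reports the sample standard deviation, which may be the better estimate when the ten readings are a sample from a larger production run. The 1e-9 slack is an assumed value that guards against floating-point rounding at the tolerance boundary; adjust it to your measurement precision.

```python
import numpy as np

measurements = np.array([10.2, 9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.1])
target = 10.0
tolerance = 0.3

deviation = measurements - target

# Boolean mask of failing parts; the slack (assumed 1e-9) absorbs rounding error
out_of_tolerance = np.abs(deviation) > tolerance + 1e-9
failing_positions = np.flatnonzero(out_of_tolerance)

# ddof=1 divides by N - 1 (sample std) instead of N (population std)
sample_std = np.std(measurements, ddof=1)
largest_deviation = np.max(np.abs(deviation))

print(f"Failing positions: {failing_positions}")
print(f"Failing values: {measurements[failing_positions]}")
print(f"Sample std deviation: {sample_std:.3f}")
print(f"Largest deviation from target: {largest_deviation:.2f}")
```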
🚀 Ready to Aggregate?
Master data summarization with NumPy's aggregation functions! Start with the fundamental reduction operations.
Begin with: Sum, Mean, Min, Max