📊 Array Aggregation

Array aggregation transforms collections of data into meaningful summaries! Whether you need totals, averages, extremes, or organized data, NumPy's aggregation functions compute them in optimized compiled code, which is much faster than looping over large datasets in pure Python.

import numpy as np

# Array aggregation overview
sales_data = np.array([[120, 135, 145, 160],
                       [98, 112, 125, 140],
                       [156, 167, 175, 185]])

print(f"Sales data: \n{sales_data}")

# Basic aggregations
print(f"Total sales: {np.sum(sales_data)}")
print(f"Average: {np.mean(sales_data):.1f}")
print(f"Best performance: {np.max(sales_data)}")
print(f"Worst performance: {np.min(sales_data)}")

🔧 Core Aggregation Types

NumPy aggregation functions fall into three main categories:

  • Reduction Functions 📈: Sum, mean, min, max, std
  • Sorting Operations 🔢: Sort, argsort, partition
  • Data Organization 🗂️: Unique values, counting, grouping

📈 Why Aggregation Matters

Aggregation helps you understand your data by:

  • Summarizing large datasets into key metrics
  • Finding patterns and trends in data
  • Identifying extremes and outliers
  • Organizing information for analysis

import numpy as np

# Student test scores across subjects
test_scores = np.array([[85, 92, 78, 88],   # Alice
                        [79, 85, 91, 82],   # Bob
                        [94, 89, 96, 93],   # Carol
                        [72, 78, 74, 76]])  # David

students = ['Alice', 'Bob', 'Carol', 'David']
subjects = ['Math', 'Science', 'English', 'History']

print(f"Test scores: \n{test_scores}")

# Different aggregation insights
print(f"Class average: {np.mean(test_scores):.1f}")
print(f"Top score: {np.max(test_scores)}")
print(f"Student averages: {np.mean(test_scores, axis=1).round(1)}")
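The students and subjects lists above are defined but never used. The sketch below shows one way they can label axis-wise results, assuming the same score matrix. The names subject_averages and top_per_subject are illustrative and not part of the original example:

```python
import numpy as np

# Same scores as above: rows are students, columns are subjects
test_scores = np.array([[85, 92, 78, 88],   # Alice
                        [79, 85, 91, 82],   # Bob
                        [94, 89, 96, 93],   # Carol
                        [72, 78, 74, 76]])  # David

students = ['Alice', 'Bob', 'Carol', 'David']
subjects = ['Math', 'Science', 'English', 'History']

# axis=0 collapses the student rows: one value per subject (column)
subject_averages = np.mean(test_scores, axis=0)
# Row index of the highest score in each column
top_per_subject = np.argmax(test_scores, axis=0)

for subject, avg, idx in zip(subjects, subject_averages, top_per_subject):
    print(f"{subject}: average {avg:.2f}, top student {students[idx]}")

# axis=1 collapses the subject columns: one average per student
student_averages = np.mean(test_scores, axis=1)
best = np.argmax(student_averages)
print(f"Highest average: {students[best]} ({student_averages[best]:.2f})")
```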

🎯 Aggregation Categories

Reduction Functions

Reduce an array to a single value, or to an array with fewer dimensions when you specify an axis.

import numpy as np

monthly_sales = np.array([1200, 1350, 1180, 1420, 1290, 1380])

# Common reductions
print(f"Sales: {monthly_sales}")
print(f"Total: {np.sum(monthly_sales)}")
print(f"Average: {np.mean(monthly_sales):.0f}")
print(f"Best month: {np.max(monthly_sales)}")
print(f"Worst month: {np.min(monthly_sales)}")
print(f"Range: {np.ptp(monthly_sales)}")  # peak-to-peak
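Real data often has gaps. Suppose a missing month is stored as np.nan (a hypothetical variant of monthly_sales). The ordinary reductions then return nan, while NumPy's NaN-aware counterparts such as np.nanmean and np.nansum skip the missing entries:

```python
import numpy as np

# Hypothetical variant of monthly_sales with one missing month recorded as NaN
sales_with_gap = np.array([1200, 1350, np.nan, 1420, 1290, 1380])

print(f"np.mean: {np.mean(sales_with_gap)}")            # NaN propagates through the mean
print(f"np.nanmean: {np.nanmean(sales_with_gap):.0f}")  # NaN entries are ignored
print(f"np.nansum: {np.nansum(sales_with_gap):.0f}")
print(f"np.nanmax: {np.nanmax(sales_with_gap):.0f}")

# Count how many months actually have data
print(f"Months with data: {np.count_nonzero(~np.isnan(sales_with_gap))}")
```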

Sorting Operations

Organize data in meaningful order.

import numpy as np

response_times = np.array([245, 123, 456, 189, 334, 267, 198])

# Sorting operations
sorted_times = np.sort(response_times)
sort_indices = np.argsort(response_times)

print(f"Original: {response_times}")
print(f"Sorted: {sorted_times}")
print(f"Sort indices: {sort_indices}")
print(f"Fastest response: {sorted_times[0]}ms")
print(f"Slowest response: {sorted_times[-1]}ms")
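Partition appears in the list of core aggregation types but not in the example above. When only the k smallest values matter, np.partition (or np.argpartition for their indices) can avoid a full sort on large arrays. A sketch, with k = 3 chosen purely for illustration:

```python
import numpy as np

response_times = np.array([245, 123, 456, 189, 334, 267, 198])
k = 3  # how many fastest responses we want (illustrative choice)

# np.partition moves the k smallest values into the first k slots,
# but does not sort them among themselves
fastest_k = np.partition(response_times, k - 1)[:k]
print(f"{k} fastest (unordered): {fastest_k}")
print(f"{k} fastest (sorted): {np.sort(fastest_k)}")

# np.argpartition returns indices, which tell us which requests were fastest
fastest_idx = np.argpartition(response_times, k - 1)[:k]
print(f"Indices of fastest requests: {np.sort(fastest_idx)}")

# Descending order: reverse the result of np.sort
print(f"Slowest first: {np.sort(response_times)[::-1]}")
```

The partitioned slice is not guaranteed to be in order, which is why the sketch sorts it afterwards; sorting k elements is cheap when k is small.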

Data Organization

Find patterns and unique elements in data.

import numpy as np

survey_responses = np.array([5, 3, 4, 5, 2, 4, 5, 3, 4, 1, 5, 4])

# Unique value analysis
unique_values, counts = np.unique(survey_responses, return_counts=True)

print(f"Responses: {survey_responses}")
print(f"Unique ratings: {unique_values}")
print(f"Response counts: {counts}")

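# Most common response(s): np.argmax alone would report only the first of tied
# ratings, and here ratings 4 and 5 both appear 4 times
max_count = counts.max()
most_common = unique_values[counts == max_count]
print(f"Most common rating(s): {most_common} ({max_count} times each)")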
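When responses are small non-negative integers, np.bincount is an alternative to np.unique: it returns a count for every possible value, so a rating nobody chose still shows up as 0 instead of being omitted. This sketch assumes a 1-5 rating scale:

```python
import numpy as np

survey_responses = np.array([5, 3, 4, 5, 2, 4, 5, 3, 4, 1, 5, 4])

# Assumption: ratings are integers on a 1-5 scale.
# np.bincount counts each value 0..max; minlength=6 guarantees slots 0-5,
# and slot 0 is dropped because 0 is not a valid rating.
counts_by_rating = np.bincount(survey_responses, minlength=6)[1:]
shares = counts_by_rating / counts_by_rating.sum()

for rating, count, share in zip(range(1, 6), counts_by_rating, shares):
    print(f"Rating {rating}: {count} responses ({share:.0%})")

print(f"Mean rating: {np.mean(survey_responses):.2f}")
```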

📐 Multi-Dimensional Aggregation

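Pass the axis argument to aggregate along one dimension of a multi-dimensional array: axis=0 collapses the rows and returns one result per column, while axis=1 collapses the columns and returns one result per row.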

Axis-Specific Operations

import numpy as np

# Quarterly sales: 4 stores × 4 quarters
quarterly_sales = np.array([[120, 135, 145, 160],  # Store A
                           [98, 112, 125, 140],   # Store B
                           [156, 167, 175, 185],  # Store C
                           [89, 95, 105, 125]])   # Store D

stores = ['Store A', 'Store B', 'Store C', 'Store D']

# Different axis aggregations
store_totals = np.sum(quarterly_sales, axis=1)  # Sum across quarters
quarter_totals = np.sum(quarterly_sales, axis=0)  # Sum across stores

print(f"Store year totals: {store_totals}")
print(f"Quarterly totals: {quarter_totals}")

# Best performing store and quarter
best_store = np.argmax(store_totals)
best_quarter = np.argmax(quarter_totals)
print(f"Best store: {stores[best_store]} ({store_totals[best_store]})")
print(f"Best quarter: Q{best_quarter + 1} ({quarter_totals[best_quarter]})")
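A common next step is to express each quarter as a share of its store's yearly total. The sketch below uses keepdims=True so the row totals keep a column shape and broadcast correctly; quarter_share is an illustrative name:

```python
import numpy as np

quarterly_sales = np.array([[120, 135, 145, 160],  # Store A
                            [98, 112, 125, 140],   # Store B
                            [156, 167, 175, 185],  # Store C
                            [89, 95, 105, 125]])   # Store D

# keepdims=True keeps the result as a (4, 1) column instead of shape (4,),
# so it broadcasts against the (4, 4) array row by row
store_totals = quarterly_sales.sum(axis=1, keepdims=True)
quarter_share = quarterly_sales / store_totals  # each quarter's share of its store's year

print(f"store_totals shape: {store_totals.shape}")
print(f"Quarter shares per store:\n{quarter_share.round(3)}")
print(f"Row sums: {quarter_share.sum(axis=1)}")  # each row sums to 1 (up to rounding)

# With no axis argument, the reduction runs over every element
print(f"Grand total: {quarterly_sales.sum()}")
```

Without keepdims=True, the totals have shape (4,) and broadcast along the last axis, so each column would be divided by a different store's total. Because this array happens to be square, that mistake would run without raising an error.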

🎯 Performance Insights

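Combining axis-wise aggregations with np.argmax and np.argmin turns raw numbers into answers to business questions, such as which weekday draws the most traffic.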

import numpy as np

# Website traffic data: daily visitors for 4 weeks
daily_traffic = np.array([[1200, 1350, 1180, 1420, 1290, 1100, 980],   # Week 1
                         [1250, 1400, 1220, 1380, 1340, 1150, 1020],   # Week 2
                         [1180, 1320, 1160, 1450, 1280, 1080, 950],    # Week 3
                         [1300, 1480, 1240, 1520, 1390, 1200, 1050]])  # Week 4

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Traffic analysis
weekly_totals = np.sum(daily_traffic, axis=1)
daily_averages = np.mean(daily_traffic, axis=0)

print(f"Weekly traffic totals: {weekly_totals}")
print(f"Average by day: {daily_averages.round(0).astype(int)}")

# Find patterns
best_day_idx = np.argmax(daily_averages)
worst_day_idx = np.argmin(daily_averages)
print(f"Best day: {days[best_day_idx]} ({daily_averages[best_day_idx]:.0f} avg)")
print(f"Worst day: {days[worst_day_idx]} ({daily_averages[worst_day_idx]:.0f} avg)")
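Slicing before aggregating answers other questions about the same data. Assuming columns 0-4 are Monday to Friday and columns 5-6 are the weekend (as the days list indicates), this sketch compares weekday and weekend averages and uses np.diff on the weekly totals to show week-over-week change:

```python
import numpy as np

daily_traffic = np.array([[1200, 1350, 1180, 1420, 1290, 1100, 980],    # Week 1
                          [1250, 1400, 1220, 1380, 1340, 1150, 1020],   # Week 2
                          [1180, 1320, 1160, 1450, 1280, 1080, 950],    # Week 3
                          [1300, 1480, 1240, 1520, 1390, 1200, 1050]])  # Week 4

# Columns 0-4 are Mon-Fri, columns 5-6 are Sat-Sun
weekday_avg = daily_traffic[:, :5].mean()  # no axis: mean over all selected cells
weekend_avg = daily_traffic[:, 5:].mean()
print(f"Weekday average: {weekday_avg:.1f}")
print(f"Weekend average: {weekend_avg:.1f}")

# Week-over-week change in total traffic
weekly_totals = daily_traffic.sum(axis=1)
print(f"Weekly totals: {weekly_totals}")
print(f"Change vs previous week: {np.diff(weekly_totals)}")
```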

📚 What You'll Learn

This section covers the three aggregation categories introduced above: reduction functions, sorting operations, and data organization.

🧠 Real-World Applications

Business Analytics

import numpy as np

# Product sales data
product_sales = np.array([45, 67, 89, 34, 56, 78, 45, 23, 67, 89])
product_ids = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'A', 'G', 'B', 'C'])

# Aggregation insights
total_revenue = np.sum(product_sales)
avg_sale = np.mean(product_sales)
best_sale = np.max(product_sales)

print(f"Total revenue: ${total_revenue}")
print(f"Average sale: ${avg_sale:.2f}")
print(f"Best single sale: ${best_sale}")

# Find the product behind the largest single sale
top_sale_idx = np.argmax(product_sales)
print(f"Largest single sale: Product {product_ids[top_sale_idx]} (${product_sales[top_sale_idx]})")
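Because products A, B and C each appear more than once, the largest single sale does not necessarily identify the best-selling product. The minimal group-by sketch below is one of several possible approaches. It combines np.unique(..., return_inverse=True) with np.bincount(..., weights=...) to total sales per product:

```python
import numpy as np

product_sales = np.array([45, 67, 89, 34, 56, 78, 45, 23, 67, 89])
product_ids = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'A', 'G', 'B', 'C'])

# return_inverse=True maps each sale to the index of its product in `products`
products, group_idx = np.unique(product_ids, return_inverse=True)

# bincount adds up the weights that share a group index: a group-by sum
totals = np.bincount(group_idx, weights=product_sales)
sales_count = np.bincount(group_idx)  # number of sales per product

for product, total, n in zip(products, totals, sales_count):
    print(f"Product {product}: total ${total:.0f} over {n} sale(s)")

best = np.argmax(totals)
print(f"Top product by total: {products[best]} (${totals[best]:.0f})")
```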

Quality Control

import numpy as np

# Manufacturing measurements
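measurements = np.array([10.2, 9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.1])
target = 10.0
tolerance = 0.3  # ±0.3 units from target

# Quality metrics
mean_value = np.mean(measurements)
std_deviation = np.std(measurements)
deviation = np.abs(measurements - target)
# In floating point, 10.3 - 10.0 evaluates to 0.3000000000000007, so a plain
# `deviation <= tolerance` would wrongly reject 10.3 and 9.7; a tiny epsilon absorbs this
within_tolerance = deviation <= tolerance + 1e-9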

print(f"Measurements: {measurements}")
print(f"Mean: {mean_value:.2f}")
print(f"Std deviation: {std_deviation:.3f}")
print(f"Within tolerance: {np.sum(within_tolerance)}/{len(measurements)}")
print(f"Quality rate: {np.mean(within_tolerance)*100:.1f}%")
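np.std returns the population standard deviation by default (ddof=0, dividing by n). When the measurements are a sample from a larger production run, the sample standard deviation (ddof=1, dividing by n - 1) is often preferred. A short comparison:

```python
import numpy as np

measurements = np.array([10.2, 9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.1])

# Default ddof=0: divide the summed squared deviations by n
population_std = np.std(measurements)
# ddof=1: divide by n - 1, commonly used when the data are a sample
sample_std = np.std(measurements, ddof=1)

print(f"Population std (ddof=0): {population_std:.3f}")
print(f"Sample std (ddof=1): {sample_std:.3f}")

# The same formula written out, to show what ddof changes
n = measurements.size
squared_dev = (measurements - measurements.mean()) ** 2
print(np.isclose(sample_std, np.sqrt(squared_dev.sum() / (n - 1))))
```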

🚀 Ready to Aggregate?

Master data summarization with NumPy's aggregation functions! Start with the fundamental reduction operations.

Begin with: Sum, Mean, Min, Max