🎭 Copying and Views

A view shares memory with the original array: creating it is fast, but a write through either one shows up in the other. A copy owns its own data: it is safe to modify, but costs time and memory to create. Knowing which one an operation returns matters for both performance and correctness.

import numpy as np

original = np.array([1, 2, 3, 4, 5])

# View - shares memory
view = original[1:4]
print(f"View: {view}")
print(f"Shares memory: {np.shares_memory(original, view)}")

# Copy - independent data
copy = original.copy()
print(f"Copy: {copy}")
print(f"Shares memory: {np.shares_memory(original, copy)}")
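
Besides np.shares_memory, an array's base attribute and its OWNDATA flag give a quick way to tell a view from a copy. The snippet below is a minimal sketch that reuses the variable names from the example above:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
view = original[1:4]
copy = original.copy()

# A view keeps a reference to the array that owns its data.
print(view.base is original)  # True

# A copy owns its data, so it has no base.
print(copy.base is None)  # True

# The OWNDATA flag reports the same distinction.
print(view.flags.owndata, copy.flags.owndata)  # False True
```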

🔍 Understanding Views

Views look at the same data but can have different shapes or access patterns.

When Views Are Created

import numpy as np

data = np.arange(12).reshape(3, 4)
print(f"Original: \n{data}")

# Operations that create views
slice_view = data[1:3, 1:3]     # Slicing
reshape_view = data.reshape(4, 3)  # Reshaping (a view whenever the memory layout allows)
transpose_view = data.T          # Transposing

print("All are views:")
print(f"Slice: {np.shares_memory(data, slice_view)}")
print(f"Reshape: {np.shares_memory(data, reshape_view)}")
print(f"Transpose: {np.shares_memory(data, transpose_view)}")
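
Reshaping returns a view only when the requested shape can be expressed over the existing memory layout. The sketch below shows one case where it cannot, so NumPy falls back to a copy. The names contiguous_reshape and transposed_flat are illustrative:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# Reshaping a contiguous array can reuse the same buffer.
contiguous_reshape = data.reshape(4, 3)

# The transpose is a view with swapped strides. Flattening it in C order
# needs the elements in a different order, so NumPy has to copy.
transposed_flat = data.T.reshape(-1)

print(np.shares_memory(data, contiguous_reshape))  # True
print(np.shares_memory(data, transposed_flat))     # False
```

When the distinction matters, checking with np.shares_memory is more reliable than assuming that reshape always returns a view.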

Views Affect Original Data

import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Create view and modify it
view = matrix[0:2, 1:3]
print(f"Original: \n{matrix}")
print(f"View: \n{view}")

# Modify view
view[0, 0] = 999
print(f"After modifying view: \n{matrix}")
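
If you want the speed of a view but not the risk of writing through it by accident, one option is to mark the view read-only. This is a minimal sketch; the name guarded is illustrative:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

guarded = matrix[0:2, 1:3]
guarded.flags.writeable = False  # Only this view becomes read-only

try:
    guarded[0, 0] = 999
except ValueError as err:
    print(f"Write blocked: {err}")

print(matrix[0, 1])  # 2
```

The original array stays writable, so this protects against accidental writes through the view only, not against changes made to matrix directly.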

📝 Understanding Copies

Copies create completely independent arrays with their own memory.

When Copies Are Created

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

# Operations that create copies
explicit_copy = data.copy()           # Explicit copy
fancy_copy = data[[0, 1]]            # Fancy indexing
boolean_copy = data[data > 3]        # Boolean indexing
flatten_copy = data.flatten()        # Flatten always copies

print("Copy independence:")
print(f"Explicit: {np.shares_memory(data, explicit_copy)}")
print(f"Fancy: {np.shares_memory(data, fancy_copy)}")
print(f"Boolean: {np.shares_memory(data, boolean_copy)}")
print(f"Flatten: {np.shares_memory(data, flatten_copy)}")
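
flatten() has a close relative, ravel(), which returns a view when it can. The sketch below contrasts the two on a C-contiguous array; the names flat_copy and flat_view are illustrative:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = data.flatten()  # Always a copy
flat_view = data.ravel()    # A view here, because data is C-contiguous

print(np.shares_memory(data, flat_copy))  # False
print(np.shares_memory(data, flat_view))  # True

# Writing through the ravel() result changes the original.
flat_view[0] = 100
print(data[0, 0])  # 100
```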

Copies Are Safe to Modify

import numpy as np

original = np.array([[1, 2], [3, 4]])
copy = original.copy()

print(f"Original: \n{original}")
print(f"Copy: \n{copy}")

# Modify copy - original unchanged
copy[0, 0] = 999
print("After modifying copy:")
print(f"Original: \n{original}")
print(f"Copy: \n{copy}")
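
A related pitfall: plain assignment creates neither a view nor a copy. It only binds a second name to the same array object. A minimal sketch, with alias and real_copy as illustrative names:

```python
import numpy as np

original = np.array([[1, 2], [3, 4]])

alias = original             # Same object, no new array is created
real_copy = original.copy()  # Independent data

alias[0, 0] = 999

print(alias is original)  # True
print(original[0, 0])     # 999
print(real_copy[0, 0])    # 1
```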

⚡ Performance Comparison

Creating a view takes roughly constant time because no data is moved. Creating a copy takes time and memory proportional to the number of elements:

import numpy as np
import time

# Create large array
large_array = np.random.rand(1000, 1000)

# Time view creation (perf_counter has finer resolution than time.time)
start = time.perf_counter()
view = large_array[100:900, 100:900]
view_time = time.perf_counter() - start

# Time copy creation
start = time.perf_counter()
copy = large_array[100:900, 100:900].copy()
copy_time = time.perf_counter() - start

print(f"View creation: {view_time:.6f} seconds")
print(f"Copy creation: {copy_time:.6f} seconds")
# Guard against a zero reading on a coarse clock before dividing
if view_time > 0:
    print(f"Copy is {copy_time/view_time:.1f}x slower")

# Memory usage
print(f"Original: {large_array.nbytes / 1024**2:.1f} MB")
print("View extra memory: 0 MB")
print(f"Copy extra memory: {copy.nbytes / 1024**2:.1f} MB")
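
A single timing of an operation this fast is noisy. As a rough sketch, the standard-library timeit module can average over many runs instead. The helper names make_view and make_copy and the run count are arbitrary choices for illustration:

```python
import timeit

import numpy as np

large_array = np.random.rand(1000, 1000)

def make_view():
    return large_array[100:900, 100:900]

def make_copy():
    return large_array[100:900, 100:900].copy()

runs = 200  # Arbitrary; more runs give steadier averages
view_time = timeit.timeit(make_view, number=runs) / runs
copy_time = timeit.timeit(make_copy, number=runs) / runs

print(f"View: {view_time * 1e6:.2f} microseconds per call")
print(f"Copy: {copy_time * 1e6:.2f} microseconds per call")
```

Exact numbers depend on the machine, so treat the result as an order-of-magnitude comparison.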

🧠 When to Use Each

Use Views When:

  • Memory is limited
  • Performance is critical
  • You want changes to propagate
  • Doing temporary calculations

Use Copies When:

  • Data safety is important
  • Independent processing needed
  • Working with multiple threads
  • You want to keep a small piece without keeping the whole original in memory
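
One memory-related case is easy to miss: a view holds a reference to the array it came from, so a small view can keep a large buffer alive. The sketch below illustrates this with two hypothetical helper functions:

```python
import numpy as np

def first_rows_view(n_rows):
    big = np.zeros((1000, 1000))
    return big[:n_rows]  # View: keeps all of big alive

def first_rows_copy(n_rows):
    big = np.zeros((1000, 1000))
    return big[:n_rows].copy()  # Copy: big can be freed after return

kept_view = first_rows_view(2)
kept_copy = first_rows_copy(2)

# The view is small, but its base is still the full 8 MB array.
print(kept_view.nbytes, kept_view.base.nbytes)  # 16000 8000000

# The copy owns just its own 16 KB.
print(kept_copy.nbytes, kept_copy.base)  # 16000 None
```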

🔧 Practical Examples

Safe Data Processing

import numpy as np

def process_safely(data):
    """Process data without affecting original"""
    # Work on a copy to be safe
    processed = data.copy()
    processed *= 2
    processed += 1
    return processed

def process_efficiently(data):
    """Process using views for efficiency"""
    # Slice as a view; the arithmetic below allocates a new array, so data is untouched
    subset = data[10:-10]  # View
    result = subset * 2 + 1  # Creates new array
    return result

# Test both approaches
original_data = np.random.rand(100)
snapshot = original_data.copy()  # Reference copy for the check below
safe_result = process_safely(original_data)
efficient_result = process_efficiently(original_data)

print(f"Original unchanged: {np.array_equal(original_data, snapshot)}")
print(f"Results match: {np.allclose(safe_result[10:-10], efficient_result)}")
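
The efficient version is safe only because subset * 2 + 1 builds a new array. An in-place operator on the same view would write through to the input. A small sketch of the difference, using a short array so the effect is visible:

```python
import numpy as np

data = np.arange(5, dtype=float)
subset = data[1:4]  # View

result = subset * 2 + 1  # New array; data is untouched
print(np.shares_memory(data, result))  # False
print(data)  # [0. 1. 2. 3. 4.]

subset *= 2  # In place: writes through to data
print(data)  # [0. 2. 4. 6. 4.]
```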

ML Data Splitting

import numpy as np

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)

def split_with_views(X, y, split_ratio=0.8):
    """Fast but linked to original"""
    split_idx = int(split_ratio * len(X))
    return X[:split_idx], X[split_idx:], y[:split_idx], y[split_idx:]

def split_with_copies(X, y, split_ratio=0.8):
    """Safe but uses more memory"""
    split_idx = int(split_ratio * len(X))
    return (X[:split_idx].copy(), X[split_idx:].copy(), 
            y[:split_idx].copy(), y[split_idx:].copy())

# Choose based on your needs
X_train, X_test, y_train, y_test = split_with_copies(X, y)
print(f"Training set: {X_train.shape}")
print(f"Independent: {not np.shares_memory(X, X_train)}")
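
Both functions above split in order. If you shuffle before splitting, indexing with an integer array is fancy indexing, so the result is already a copy and no explicit .copy() is needed. A minimal sketch; split_shuffled and its rng parameter are hypothetical additions, not part of the functions above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = rng.integers(0, 2, 1000)

def split_shuffled(X, y, split_ratio=0.8, rng=None):
    """Shuffle rows, then split; fancy indexing returns copies."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(X))
    split_idx = int(split_ratio * len(X))
    train_idx, test_idx = order[:split_idx], order[split_idx:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_train, X_test, y_train, y_test = split_shuffled(X, y, rng=rng)
print(X_train.shape, X_test.shape)   # (800, 10) (200, 10)
print(np.shares_memory(X, X_train))  # False
```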

🎯 Key Takeaways

  • Views share memory with the original array; copies own independent data
  • Slicing, transposing, and (usually) reshaping return views
  • .copy(), fancy indexing, boolean indexing, and flatten() return copies
  • Modifying a view changes the original; modifying a copy does not
  • Use np.shares_memory() to check which one you have

🚀 What's Next?

You've mastered array manipulation! Next, explore mathematical functions and operations.

Continue to: Mathematical Functions
