🎭 Copying and Views
Views share memory with the original array (fast but linked). Copies create independent data (safe but slower). Understanding this difference is crucial for performance and avoiding unexpected behavior.
import numpy as np
original = np.array([1, 2, 3, 4, 5])
# View - shares memory
view = original[1:4]
print(f"View: {view}")
print(f"Shares memory: {np.shares_memory(original, view)}")
# Copy - independent data
copy = original.copy()
print(f"Copy: {copy}")
print(f"Shares memory: {np.shares_memory(original, copy)}")
🔍 Understanding Views
Views look at the same data but can have different shapes or access patterns.
When Views Are Created
import numpy as np
data = np.arange(12).reshape(3, 4)
print(f"Original: \n{data}")
# Operations that create views
slice_view = data[1:3, 1:3] # Slicing
reshape_view = data.reshape(4, 3) # Reshaping (a view here because data is contiguous)
transpose_view = data.T # Transposing
print("All are views:")
print(f"Slice: {np.shares_memory(data, slice_view)}")
print(f"Reshape: {np.shares_memory(data, reshape_view)}")
print(f"Transpose: {np.shares_memory(data, transpose_view)}")
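Reshaping returns a view only when the new shape can be described over the existing memory layout. The sketch below is an illustrative case (the variable names are ours, not from the tutorial) in which reshaping a transposed array falls back to a copy:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# Reshaping a C-contiguous array can reuse the same buffer.
contiguous_reshape = data.reshape(4, 3)

# The transpose is a view with swapped strides, so it is not C-contiguous.
transposed = data.T

# The (2, 6) layout of the transposed elements cannot be described with
# strides over the original buffer, so NumPy has to copy the data.
transposed_reshape = transposed.reshape(2, 6)

shares_contiguous = np.shares_memory(data, contiguous_reshape)
shares_transposed = np.shares_memory(data, transposed_reshape)
print(f"Contiguous reshape shares memory: {shares_contiguous}")
print(f"Reshape after transpose shares memory: {shares_transposed}")
```

When in doubt, checking with np.shares_memory is more reliable than assuming that a given operation returns a view.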
Views Affect Original Data
import numpy as np
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
# Create view and modify it
view = matrix[0:2, 1:3]
print(f"Original: \n{matrix}")
print(f"View: \n{view}")
# Modify view
view[0, 0] = 999
print(f"After modifying view: \n{matrix}")
📝 Understanding Copies
Copies create completely independent arrays with their own memory.
When Copies Are Created
import numpy as np
data = np.array([[1, 2, 3], [4, 5, 6]])
# Operations that create copies
explicit_copy = data.copy() # Explicit copy
fancy_copy = data[[0, 1]] # Fancy indexing
boolean_copy = data[data > 3] # Boolean indexing
flatten_copy = data.flatten() # Flatten always copies
print("Shares memory with original (False means independent):")
print(f"Explicit: {np.shares_memory(data, explicit_copy)}")
print(f"Fancy: {np.shares_memory(data, fancy_copy)}")
print(f"Boolean: {np.shares_memory(data, boolean_copy)}")
print(f"Flatten: {np.shares_memory(data, flatten_copy)}")
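As a small companion to the flatten example, the sketch below contrasts flatten() with ravel(), which returns a view when the layout allows it. It also uses the .base attribute, which is None for an array that owns its memory. The variable names are illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = data.flatten()  # Always allocates a new array
flat_view = data.ravel()    # A view here, because data is C-contiguous

# .base is None when an array owns its memory; a view refers back to its source.
copy_owns_memory = flat_copy.base is None
view_points_to_data = flat_view.base is data

# Writing through the ravel() view changes the original array.
flat_view[0] = 100

print(f"flatten owns its memory: {copy_owns_memory}")
print(f"ravel refers back to data: {view_points_to_data}")
print(f"data[0, 0] after writing to the view: {data[0, 0]}")
print(f"flat_copy[0] is unaffected: {flat_copy[0]}")
```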
Copies Are Safe to Modify
import numpy as np
original = np.array([[1, 2], [3, 4]])
copy = original.copy()
print(f"Original: \n{original}")
print(f"Copy: \n{copy}")
# Modify copy - original unchanged
copy[0, 0] = 999
print(f"After modifying copy:")
print(f"Original: \n{original}")
print(f"Copy: \n{copy}")
⚡ Performance Comparison
Creating a view takes roughly constant time because no data is moved, while creating a copy takes time and memory proportional to the number of elements:
import numpy as np
import time
# Create large array
large_array = np.random.rand(1000, 1000)
# Time view creation (perf_counter has finer resolution than time.time)
start = time.perf_counter()
view = large_array[100:900, 100:900]
view_time = time.perf_counter() - start
# Time copy creation
start = time.perf_counter()
copy = large_array[100:900, 100:900].copy()
copy_time = time.perf_counter() - start
print(f"View creation: {view_time:.6f} seconds")
print(f"Copy creation: {copy_time:.6f} seconds")
# A single measurement is noisy, so guard against a zero reading
if view_time > 0:
    print(f"Copy is roughly {copy_time/view_time:.1f}x slower")
# Memory usage
print(f"Original: {large_array.nbytes / 1024**2:.1f} MB")
print(f"View extra memory: 0 MB")
print(f"Copy extra memory: {copy.nbytes / 1024**2:.1f} MB")
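A single timing can be dominated by noise. The sketch below is one possible way to get steadier numbers: it averages over repeated runs with the standard library's timeit module. The helper names and the repeat count are arbitrary choices for illustration.

```python
import timeit
import numpy as np

large_array = np.random.rand(1000, 1000)

def make_view():
    # Slicing only creates a new array object pointing at the same buffer.
    return large_array[100:900, 100:900]

def make_copy():
    # .copy() allocates a new buffer and copies 800 x 800 values into it.
    return large_array[100:900, 100:900].copy()

repeats = 50  # Arbitrary; more repeats give steadier averages
view_time = timeit.timeit(make_view, number=repeats) / repeats
copy_time = timeit.timeit(make_copy, number=repeats) / repeats

print(f"Average view creation: {view_time:.8f} seconds")
print(f"Average copy creation: {copy_time:.8f} seconds")
```

The exact figures depend on the machine, so treat them as relative rather than absolute.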
🧠 When to Use Each
Use Views When:
- Memory is limited
- Performance is critical
- You want changes to propagate
- Doing temporary calculations
Use Copies When:
- Data safety is important
- Independent processing needed
- Working with multiple threads
- The original might be modified later, or you want its memory released (a view keeps the whole original buffer alive)
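One point from the lists above that is easy to miss: a small view still refers to the entire original buffer, while a small copy does not. A minimal sketch, with illustrative array sizes and names:

```python
import numpy as np

big = np.zeros((1000, 1000))

small_view = big[:2, :2]         # Four elements, but tied to the full buffer
small_copy = big[:2, :2].copy()  # Four elements with their own memory

view_refers_to_big = small_view.base is big
copy_is_independent = small_copy.base is None

print(f"View refers back to big: {view_refers_to_big}")
print(f"Copy owns its memory: {copy_is_independent}")
print(f"Bytes addressed by the view: {small_view.nbytes}")
print(f"Bytes kept alive by the view: {small_view.base.nbytes}")
```

If only a small slice of a large array is needed long term, copying the slice lets the large buffer be freed once nothing else references it.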
🔧 Practical Examples
Safe Data Processing
import numpy as np
def process_safely(data):
    """Process data without affecting the original."""
    # Work on a copy so the in-place operations below cannot touch the input
    processed = data.copy()
    processed *= 2
    processed += 1
    return processed
def process_efficiently(data):
    """Read through a view and return a new array."""
    # The slice is a view; the arithmetic allocates a new array for the result
    subset = data[10:-10] # View
    result = subset * 2 + 1 # Creates new array
    return result
# Test both approaches
original_data = np.random.rand(100)
snapshot = original_data.copy() # Reference copy to verify the input is untouched
safe_result = process_safely(original_data)
efficient_result = process_efficiently(original_data)
print(f"Original unchanged: {np.array_equal(original_data, snapshot)}")
print(f"Results match: {np.allclose(safe_result[10:-10], efficient_result)}")
ML Data Splitting
import numpy as np
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)
def split_with_views(X, y, split_ratio=0.8):
    """Fast but linked to original"""
    split_idx = int(split_ratio * len(X))
    return X[:split_idx], X[split_idx:], y[:split_idx], y[split_idx:]
def split_with_copies(X, y, split_ratio=0.8):
    """Safe but uses more memory"""
    split_idx = int(split_ratio * len(X))
    return (X[:split_idx].copy(), X[split_idx:].copy(),
            y[:split_idx].copy(), y[split_idx:].copy())
# Choose based on your needs
X_train, X_test, y_train, y_test = split_with_copies(X, y)
print(f"Training set: {X_train.shape}")
print(f"Independent: {not np.shares_memory(X, X_train)}")
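To see why the choice matters, consider an in-place preprocessing step such as mean-centering. The sketch below uses a small made-up dataset rather than the functions above: centering a view-based training split writes through to the full dataset, while centering a copied split does not.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))
X_before = X.copy()  # Snapshot for comparison

# View-based split: in-place centering also changes the first 8 rows of X.
X_train_view = X[:8]
X_train_view -= X_train_view.mean(axis=0)
original_changed = not np.allclose(X, X_before)

# Copy-based split: the same operation leaves the source array untouched.
X2 = X_before.copy()
X_train_copy = X2[:8].copy()
X_train_copy -= X_train_copy.mean(axis=0)
original_preserved = np.allclose(X2, X_before)

print(f"View split modified the dataset: {original_changed}")
print(f"Copy split preserved the dataset: {original_preserved}")
```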
🎯 Key Takeaways
🚀 What's Next?
You've mastered array manipulation! Next, explore mathematical functions and operations.
Continue to: Mathematical Functions