Update Performance Optimization Workflow

John McCardle 2025-12-02 03:12:11 +00:00
parent da291638c5
commit ac90800e8f
1 changed files with 493 additions and 493 deletions

@ -1,494 +1,494 @@
# Performance Optimization Workflow # Performance Optimization Workflow
Systematic approach to identifying and resolving performance bottlenecks in McRogueFace. Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
## Quick Reference ## Quick Reference
**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]] **Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
**Tools:** **Tools:**
- F3 profiler overlay (in-game) - F3 profiler overlay (in-game)
- `src/Profiler.h` - ScopedTimer - `src/Profiler.h` - ScopedTimer
- `tests/benchmark_*.py` - Performance benchmarks - `tests/benchmark_*.py` - Performance benchmarks
## The Optimization Cycle ## The Optimization Cycle
``` ```
1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY 1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
↑ | ↑ |
└─────────────────────────────────────────────────────────┘ └─────────────────────────────────────────────────────────┘
``` ```
--- ---
## Step 1: Profile - Find the Bottleneck ## Step 1: Profile - Find the Bottleneck
### Using F3 Overlay ### Using F3 Overlay
**Start the game and press F3:** **Start the game and press F3:**
Look for: Look for:
- **Red frame times** (>33ms) - Unacceptable performance - **Red frame times** (>33ms) - Unacceptable performance
- **Yellow frame times** (16-33ms) - Marginal performance - **Yellow frame times** (16-33ms) - Marginal performance
- **High subsystem times** - Which system is slow? - **High subsystem times** - Which system is slow?
- Grid rendering > 10ms? → Grid optimization needed - Grid rendering > 10ms? → Grid optimization needed
- Entity rendering > 5ms? → Entity culling/SpatialHash needed - Entity rendering > 5ms? → Entity culling/SpatialHash needed
- Python script time > 5ms? → Python callback optimization - Python script time > 5ms? → Python callback optimization
**Example profiler output:** **Example profiler output:**
``` ```
Frame: 45.2ms (RED) Frame: 45.2ms (RED)
FPS: 22 FPS: 22
Grid Render: 32.1ms ← BOTTLENECK! Grid Render: 32.1ms ← BOTTLENECK!
Entity Render: 8.5ms Entity Render: 8.5ms
Python Script: 2.1ms Python Script: 2.1ms
Animation: 1.2ms Animation: 1.2ms
Cells Rendered: 10000 Cells Rendered: 10000
Entities: 150 (visible: 50) Entities: 150 (visible: 50)
``` ```
**Analysis:** Grid rendering is the bottleneck (32ms of 45ms total). **Analysis:** Grid rendering is the bottleneck (32ms of 45ms total).
### Running Benchmarks ### Running Benchmarks
**Static Grid Benchmark:** **Static Grid Benchmark:**
```bash ```bash
cd build cd build
./mcrogueface --headless --exec ../tests/benchmark_static_grid.py ./mcrogueface --headless --exec ../tests/benchmark_static_grid.py
``` ```
**Moving Entities Benchmark:** **Moving Entities Benchmark:**
```bash ```bash
./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py ./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py
``` ```
**Output:** Baseline performance metrics to measure improvement against. **Output:** Baseline performance metrics to measure improvement against.
--- ---
## Step 2: Identify - Understand the Problem ## Step 2: Identify - Understand the Problem
### Common Performance Issues ### Common Performance Issues
**Issue: High Grid Render Time on Static Screens** **Issue: High Grid Render Time on Static Screens**
- **Symptom:** 20-40ms grid render, nothing changing - **Symptom:** 20-40ms grid render, nothing changing
- **Cause:** Redrawing unchanged cells every frame - **Cause:** Redrawing unchanged cells every frame
- **Solution:** Implement dirty flag system ([#116](../../issues/116)) - **Solution:** Implement dirty flag system ([#116](../../issues/116))
**Issue: High Entity Render Time with Many Entities** **Issue: High Entity Render Time with Many Entities**
- **Symptom:** 10-20ms entity render with 500+ entities - **Symptom:** 10-20ms entity render with 500+ entities
- **Cause:** O(n) iteration, no spatial indexing - **Cause:** O(n) iteration, no spatial indexing
- **Solution:** Implement SpatialHash ([#115](../../issues/115)) - **Solution:** Implement SpatialHash ([#115](../../issues/115))
**Issue: Slow Bulk Grid Updates from Python** **Issue: Slow Bulk Grid Updates from Python**
- **Symptom:** Frame drops when updating many cells - **Symptom:** Frame drops when updating many cells
- **Cause:** Python/C++ boundary crossings for each cell - **Cause:** Python/C++ boundary crossings for each cell
- **Solution:** Implement batch operations ([#113](../../issues/113)) - **Solution:** Implement batch operations ([#113](../../issues/113))
**Issue: High Python Script Time** **Issue: High Python Script Time**
- **Symptom:** 10-50ms in Python callbacks - **Symptom:** 10-50ms in Python callbacks
- **Cause:** Heavy computation in Python update loops - **Cause:** Heavy computation in Python update loops
- **Solution:** Move hot paths to C++ or optimize Python - **Solution:** Move hot paths to C++ or optimize Python
### Analyzing Call Stacks ### Analyzing Call Stacks
**When F3 overlay isn't enough:** **When F3 overlay isn't enough:**
```bash ```bash
# Build with debug symbols # Build with debug symbols
make clean make clean
cmake .. -DCMAKE_BUILD_TYPE=Debug cmake .. -DCMAKE_BUILD_TYPE=Debug
make make
# Profile with gdb # Profile with gdb
gdb ./mcrogueface gdb ./mcrogueface
(gdb) run (gdb) run
<trigger slow behavior> <trigger slow behavior>
(gdb) info threads (gdb) info threads
(gdb) bt # Backtrace (gdb) bt # Backtrace
``` ```
--- ---
## Step 3: Instrument - Measure Precisely ## Step 3: Instrument - Measure Precisely
### Adding ScopedTimer ### Adding ScopedTimer
**Identify the slow function and wrap it:** **Identify the slow function and wrap it:**
```cpp ```cpp
#include "Profiler.h" #include "Profiler.h"
void MySystem::slowFunction() { void MySystem::slowFunction() {
// Add timer // Add timer
ScopedTimer timer(Resources::game->metrics.mySystemTime); ScopedTimer timer(Resources::game->metrics.mySystemTime);
// Your slow code here // Your slow code here
for (auto& item : items) { for (auto& item : items) {
item->process(); item->process();
} }
} }
``` ```
### Adding Custom Metrics ### Adding Custom Metrics
**1. Add metric field to ProfilingMetrics:** **1. Add metric field to ProfilingMetrics:**
`src/GameEngine.h`: `src/GameEngine.h`:
```cpp ```cpp
struct ProfilingMetrics { struct ProfilingMetrics {
// ... existing fields ... // ... existing fields ...
float mySystemTime = 0.0f; // Add this float mySystemTime = 0.0f; // Add this
}; };
``` ```
**2. Reset in resetPerFrame():** **2. Reset in resetPerFrame():**
```cpp ```cpp
void ProfilingMetrics::resetPerFrame() { void ProfilingMetrics::resetPerFrame() {
// ... existing resets ... // ... existing resets ...
mySystemTime = 0.0f; // Add this mySystemTime = 0.0f; // Add this
} }
``` ```
**3. Display in ProfilerOverlay:** **3. Display in ProfilerOverlay:**
`src/ProfilerOverlay.cpp`: `src/ProfilerOverlay.cpp`:
```cpp ```cpp
void ProfilerOverlay::update(const ProfilingMetrics& metrics) { void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
// ... existing formatting ... // ... existing formatting ...
ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n"; ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
} }
``` ```
**4. Rebuild and test:** **4. Rebuild and test:**
```bash ```bash
make make
cd build cd build
./mcrogueface ./mcrogueface
# Press F3 - see your metric! # Press F3 - see your metric!
``` ```
### Creating Benchmarks ### Creating Benchmarks
**Create `tests/benchmark_mysystem.py`:** **Create `tests/benchmark_mysystem.py`:**
```python ```python
import mcrfpy import mcrfpy
import sys import sys
import time import time
def benchmark(): def benchmark():
# Setup # Setup
mcrfpy.createScene("bench") mcrfpy.createScene("bench")
# ... create test scenario ... # ... create test scenario ...
frame_times = [] frame_times = []
def measure(runtime_ms): def measure(runtime_ms):
frame_times.append(runtime_ms) frame_times.append(runtime_ms)
if len(frame_times) >= 300: # 5 seconds at 60fps if len(frame_times) >= 300: # 5 seconds at 60fps
# Report statistics # Report statistics
avg = sum(frame_times) / len(frame_times) avg = sum(frame_times) / len(frame_times)
min_time = min(frame_times) min_time = min(frame_times)
max_time = max(frame_times) max_time = max(frame_times)
print(f"Average: {avg:.2f}ms") print(f"Average: {avg:.2f}ms")
print(f"Min: {min_time:.2f}ms") print(f"Min: {min_time:.2f}ms")
print(f"Max: {max_time:.2f}ms") print(f"Max: {max_time:.2f}ms")
print(f"FPS: {1000/avg:.1f}") print(f"FPS: {1000/avg:.1f}")
sys.exit(0) sys.exit(0)
mcrfpy.setTimer("benchmark", measure, 16) # Every frame mcrfpy.setTimer("benchmark", measure, 16) # Every frame
benchmark() benchmark()
``` ```
**Run:** **Run:**
```bash ```bash
./mcrogueface --headless --exec ../tests/benchmark_mysystem.py ./mcrogueface --headless --exec ../tests/benchmark_mysystem.py
``` ```
--- ---
## Step 4: Optimize - Make It Faster ## Step 4: Optimize - Make It Faster
### Optimization Strategies ### Optimization Strategies
#### Strategy 1: Reduce Work #### Strategy 1: Reduce Work
**Example: Grid Dirty Flags** **Example: Grid Dirty Flags**
**Before:** Redraw all cells every frame **Before:** Redraw all cells every frame
```cpp ```cpp
for (int x = 0; x < grid_x; x++) { for (int x = 0; x < grid_x; x++) {
for (int y = 0; y < grid_y; y++) { for (int y = 0; y < grid_y; y++) {
renderCell(x, y); // Always renders renderCell(x, y); // Always renders
} }
} }
``` ```
**After:** Only redraw when changed **After:** Only redraw when changed
```cpp ```cpp
if (grid_dirty) { if (grid_dirty) {
for (int x = 0; x < grid_x; x++) { for (int x = 0; x < grid_x; x++) {
for (int y = 0; y < grid_y; y++) { for (int y = 0; y < grid_y; y++) {
renderCell(x, y); renderCell(x, y);
} }
} }
grid_dirty = false; grid_dirty = false;
} }
``` ```
**Expected:** 10-50x improvement for static scenes **Expected:** 10-50x improvement for static scenes
#### Strategy 2: Reduce Complexity #### Strategy 2: Reduce Complexity
**Example: SpatialHash for Entity Queries** **Example: SpatialHash for Entity Queries**
**Before:** O(n) search through all entities **Before:** O(n) search through all entities
```cpp ```cpp
for (auto& entity : entities) { for (auto& entity : entities) {
if (distanceTo(entity, target) < radius) { if (distanceTo(entity, target) < radius) {
nearby.push_back(entity); nearby.push_back(entity);
} }
} }
``` ```
**After:** O(1) hash lookup **After:** O(1) hash lookup
```cpp ```cpp
auto cell = spatialHash.getCell(target.x, target.y); auto cell = spatialHash.getCell(target.x, target.y);
for (auto& entity : cell.entities) { for (auto& entity : cell.entities) {
nearby.push_back(entity); nearby.push_back(entity);
} }
``` ```
**Expected:** 100x+ improvement for large entity counts **Expected:** 100x+ improvement for large entity counts
#### Strategy 3: Batch Operations #### Strategy 3: Batch Operations
**Example: Grid Batch Updates** **Example: Grid Batch Updates**
**Before:** Multiple Python/C++ crossings **Before:** Multiple Python/C++ crossings
```python ```python
for x in range(100): for x in range(100):
for y in range(100): for y in range(100):
grid.at((x, y)).tilesprite = 42 # 10,000 calls! grid.at((x, y)).tilesprite = 42 # 10,000 calls!
``` ```
**After:** Single batch operation **After:** Single batch operation
```python ```python
grid.fill_rect(0, 0, 100, 100, 42) # 1 call! grid.fill_rect(0, 0, 100, 100, 42) # 1 call!
``` ```
**Expected:** 10-100x improvement for bulk updates **Expected:** 10-100x improvement for bulk updates
#### Strategy 4: Cache Results #### Strategy 4: Cache Results
**Example: Path Caching in Entities** **Example: Path Caching in Entities**
**Before:** Recompute path every frame **Before:** Recompute path every frame
```cpp ```cpp
void Entity::update() { void Entity::update() {
path = computePathTo(target); // Expensive! path = computePathTo(target); // Expensive!
followPath(path); followPath(path);
} }
``` ```
**After:** Cache and reuse **After:** Cache and reuse
```cpp ```cpp
void Entity::update() { void Entity::update() {
if (!cachedPath || targetMoved) { if (!cachedPath || targetMoved) {
cachedPath = computePathTo(target); cachedPath = computePathTo(target);
} }
followPath(cachedPath); followPath(cachedPath);
} }
``` ```
**Expected:** 10x+ improvement for pathfinding-heavy scenarios **Expected:** 10x+ improvement for pathfinding-heavy scenarios
### Optimization Checklist ### Optimization Checklist
Before optimizing: Before optimizing:
- [ ] Profiled and identified real bottleneck - [ ] Profiled and identified real bottleneck
- [ ] Measured baseline performance - [ ] Measured baseline performance
- [ ] Understood root cause - [ ] Understood root cause
During optimization: During optimization:
- [ ] Changed only one thing at a time - [ ] Changed only one thing at a time
- [ ] Kept original code for comparison - [ ] Kept original code for comparison
- [ ] Added comments explaining optimization - [ ] Added comments explaining optimization
After optimization: After optimization:
- [ ] Measured improvement - [ ] Measured improvement
- [ ] Verified correctness (no bugs introduced) - [ ] Verified correctness (no bugs introduced)
- [ ] Updated tests if needed - [ ] Updated tests if needed
--- ---
## Step 5: Verify - Measure Improvement ## Step 5: Verify - Measure Improvement
### Re-run Benchmarks ### Re-run Benchmarks
**Before optimization:** **Before optimization:**
``` ```
Average: 45.2ms Average: 45.2ms
Min: 38.1ms Min: 38.1ms
Max: 62.3ms Max: 62.3ms
FPS: 22.1 FPS: 22.1
``` ```
**After optimization:** **After optimization:**
``` ```
Average: 8.5ms ← 5.3x improvement! Average: 8.5ms ← 5.3x improvement!
Min: 7.2ms Min: 7.2ms
Max: 12.1ms Max: 12.1ms
FPS: 117.6 FPS: 117.6
``` ```
### Check Correctness ### Check Correctness
**Visual testing:** **Visual testing:**
1. Run game normally (not headless) 1. Run game normally (not headless)
2. Verify visual output unchanged 2. Verify visual output unchanged
3. Test edge cases (empty grids, max entities, etc.) 3. Test edge cases (empty grids, max entities, etc.)
**Automated testing:** **Automated testing:**
```bash ```bash
# Run existing test suite # Run existing test suite
./mcrogueface --headless --exec tests/test_grid_operations.py ./mcrogueface --headless --exec tests/test_grid_operations.py
./mcrogueface --headless --exec tests/test_entity_movement.py ./mcrogueface --headless --exec tests/test_entity_movement.py
``` ```
### Document Results ### Document Results
**Create issue comment:** **Create issue comment:**
```markdown ```markdown
## Performance Optimization Results ## Performance Optimization Results
**Issue:** #116 (Dirty Flag System) **Issue:** #116 (Dirty Flag System)
**Baseline:** **Baseline:**
- Static grid (100x100): 32.1ms average - Static grid (100x100): 32.1ms average
- FPS: 22 - FPS: 22
**After Optimization:** **After Optimization:**
- Static grid (100x100): 0.8ms average - Static grid (100x100): 0.8ms average
- FPS: 118 - FPS: 118
**Improvement:** 40x faster for static scenes **Improvement:** 40x faster for static scenes
**Test:** `tests/benchmark_static_grid.py` **Test:** `tests/benchmark_static_grid.py`
**Commit:** abc123def **Commit:** abc123def
``` ```
--- ---
## Common Optimization Patterns ## Common Optimization Patterns
### Pattern 1: Early Exit ### Pattern 1: Early Exit
```cpp ```cpp
// Check cheap conditions first // Check cheap conditions first
if (!visible) return; if (!visible) return;
if (opacity <= 0.0f) return; if (opacity <= 0.0f) return;
if (!inViewport(bounds)) return; if (!inViewport(bounds)) return;
// Then do expensive work // Then do expensive work
render(); render();
``` ```
### Pattern 2: Lazy Evaluation ### Pattern 2: Lazy Evaluation
```cpp ```cpp
// Don't compute until needed // Don't compute until needed
mutable bool fovComputed = false; mutable bool fovComputed = false;
mutable std::vector<Point> visibleCells; mutable std::vector<Point> visibleCells;
std::vector<Point>& getVisibleCells() { std::vector<Point>& getVisibleCells() {
if (!fovComputed) { if (!fovComputed) {
visibleCells = computeFOV(); visibleCells = computeFOV();
fovComputed = true; fovComputed = true;
} }
return visibleCells; return visibleCells;
} }
``` ```
### Pattern 3: Object Pooling ### Pattern 3: Object Pooling
```cpp ```cpp
// Reuse instead of allocate/deallocate // Reuse instead of allocate/deallocate
class EntityPool { class EntityPool {
std::vector<Entity> pool; std::vector<Entity> pool;
std::vector<bool> active; std::vector<bool> active;
public: public:
Entity* spawn() { Entity* spawn() {
for (size_t i = 0; i < pool.size(); i++) { for (size_t i = 0; i < pool.size(); i++) {
if (!active[i]) { if (!active[i]) {
active[i] = true; active[i] = true;
return &pool[i]; return &pool[i];
} }
} }
// Grow pool if needed // Grow pool if needed
pool.emplace_back(); pool.emplace_back();
active.push_back(true); active.push_back(true);
return &pool.back(); return &pool.back();
} }
}; };
``` ```
### Pattern 4: Space-Time Tradeoff ### Pattern 4: Space-Time Tradeoff
```cpp ```cpp
// Cache expensive computation // Cache expensive computation
std::unordered_map<int, std::vector<Point>> pathCache; std::unordered_map<int, std::vector<Point>> pathCache;
std::vector<Point> getPathTo(int targetId) { std::vector<Point> getPathTo(int targetId) {
if (pathCache.contains(targetId)) { if (pathCache.contains(targetId)) {
return pathCache[targetId]; // O(1) lookup return pathCache[targetId]; // O(1) lookup
} }
auto path = computeExpensivePath(targetId); auto path = computeExpensivePath(targetId);
pathCache[targetId] = path; pathCache[targetId] = path;
return path; return path;
} }
``` ```
--- ---
## When NOT to Optimize ## When NOT to Optimize
**Don't optimize if:** **Don't optimize if:**
- Performance is already acceptable (< 16ms frame time) - Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex - Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!) - You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere (optimize hot paths first) - The bottleneck is elsewhere (optimize hot paths first)
**Premature optimization is the root of all evil** - Donald Knuth **Premature optimization is the root of all evil** - Donald Knuth
Focus on: Focus on:
1. Correctness first 1. Correctness first
2. Profile to find real bottlenecks 2. Profile to find real bottlenecks
3. Optimize hot paths only 3. Optimize hot paths only
4. Keep code maintainable 4. Keep code maintainable
--- ---
## Related Documentation ## Related Documentation
- [[Performance-and-Profiling]] - Profiling tools reference - [[Performance-and-Profiling]] - Profiling tools reference
- [[Grid-System]] - Grid optimization opportunities - [[Grid-System]] - Grid optimization opportunities
- [[Writing-Tests]] - Creating performance tests - [[Writing-Tests]] - Creating performance tests
**Open Issues:** **Open Issues:**
- [#115](../../issues/115) - SpatialHash Implementation - [#115](../../issues/115) - SpatialHash Implementation
- [#116](../../issues/116) - Dirty Flag System - [#116](../../issues/116) - Dirty Flag System
- [#113](../../issues/113) - Batch Operations for Grid - [#113](../../issues/113) - Batch Operations for Grid
- [#117](../../issues/117) - Memory Pool for Entities - [#117](../../issues/117) - Memory Pool for Entities