diff --git a/Performance-Optimization-Workflow.-.md b/Performance-Optimization-Workflow.md similarity index 95% rename from Performance-Optimization-Workflow.-.md rename to Performance-Optimization-Workflow.md index d1bce45..6d43fca 100644 --- a/Performance-Optimization-Workflow.-.md +++ b/Performance-Optimization-Workflow.md @@ -1,494 +1,494 @@ -# Performance Optimization Workflow - -Systematic approach to identifying and resolving performance bottlenecks in McRogueFace. - -## Quick Reference - -**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]] - -**Tools:** -- F3 profiler overlay (in-game) -- `src/Profiler.h` - ScopedTimer -- `tests/benchmark_*.py` - Performance benchmarks - -## The Optimization Cycle - -``` -1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY - ↑ | - └─────────────────────────────────────────────────────────┘ -``` - ---- - -## Step 1: Profile - Find the Bottleneck - -### Using F3 Overlay - -**Start the game and press F3:** - -Look for: -- **Red frame times** (>33ms) - Unacceptable performance -- **Yellow frame times** (16-33ms) - Marginal performance -- **High subsystem times** - Which system is slow? - - Grid rendering > 10ms? → Grid optimization needed - - Entity rendering > 5ms? → Entity culling/SpatialHash needed - - Python script time > 5ms? → Python callback optimization - -**Example profiler output:** -``` -Frame: 45.2ms (RED) -FPS: 22 - -Grid Render: 32.1ms ← BOTTLENECK! -Entity Render: 8.5ms -Python Script: 2.1ms -Animation: 1.2ms - -Cells Rendered: 10000 -Entities: 150 (visible: 50) -``` - -**Analysis:** Grid rendering is the bottleneck (32ms of 45ms total). - -### Running Benchmarks - -**Static Grid Benchmark:** -```bash -cd build -./mcrogueface --headless --exec ../tests/benchmark_static_grid.py -``` - -**Moving Entities Benchmark:** -```bash -./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py -``` - -**Output:** Baseline performance metrics to measure improvement against. - ---- - -## Step 2: Identify - Understand the Problem - -### Common Performance Issues - -**Issue: High Grid Render Time on Static Screens** -- **Symptom:** 20-40ms grid render, nothing changing -- **Cause:** Redrawing unchanged cells every frame -- **Solution:** Implement dirty flag system ([#116](../../issues/116)) - -**Issue: High Entity Render Time with Many Entities** -- **Symptom:** 10-20ms entity render with 500+ entities -- **Cause:** O(n) iteration, no spatial indexing -- **Solution:** Implement SpatialHash ([#115](../../issues/115)) - -**Issue: Slow Bulk Grid Updates from Python** -- **Symptom:** Frame drops when updating many cells -- **Cause:** Python/C++ boundary crossings for each cell -- **Solution:** Implement batch operations ([#113](../../issues/113)) - -**Issue: High Python Script Time** -- **Symptom:** 10-50ms in Python callbacks -- **Cause:** Heavy computation in Python update loops -- **Solution:** Move hot paths to C++ or optimize Python - -### Analyzing Call Stacks - -**When F3 overlay isn't enough:** - -```bash -# Build with debug symbols -make clean -cmake .. -DCMAKE_BUILD_TYPE=Debug -make - -# Profile with gdb -gdb ./mcrogueface -(gdb) run - -(gdb) info threads -(gdb) bt # Backtrace -``` - ---- - -## Step 3: Instrument - Measure Precisely - -### Adding ScopedTimer - -**Identify the slow function and wrap it:** - -```cpp -#include "Profiler.h" - -void MySystem::slowFunction() { - // Add timer - ScopedTimer timer(Resources::game->metrics.mySystemTime); - - // Your slow code here - for (auto& item : items) { - item->process(); - } -} -``` - -### Adding Custom Metrics - -**1. Add metric field to ProfilingMetrics:** - -`src/GameEngine.h`: -```cpp -struct ProfilingMetrics { - // ... existing fields ... - float mySystemTime = 0.0f; // Add this -}; -``` - -**2. Reset in resetPerFrame():** - -```cpp -void ProfilingMetrics::resetPerFrame() { - // ... existing resets ... - mySystemTime = 0.0f; // Add this -} -``` - -**3. Display in ProfilerOverlay:** - -`src/ProfilerOverlay.cpp`: -```cpp -void ProfilerOverlay::update(const ProfilingMetrics& metrics) { - // ... existing formatting ... - ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n"; -} -``` - -**4. Rebuild and test:** - -```bash -make -cd build -./mcrogueface -# Press F3 - see your metric! -``` - -### Creating Benchmarks - -**Create `tests/benchmark_mysystem.py`:** - -```python -import mcrfpy -import sys -import time - -def benchmark(): - # Setup - mcrfpy.createScene("bench") - # ... create test scenario ... - - frame_times = [] - - def measure(runtime_ms): - frame_times.append(runtime_ms) - if len(frame_times) >= 300: # 5 seconds at 60fps - # Report statistics - avg = sum(frame_times) / len(frame_times) - min_time = min(frame_times) - max_time = max(frame_times) - - print(f"Average: {avg:.2f}ms") - print(f"Min: {min_time:.2f}ms") - print(f"Max: {max_time:.2f}ms") - print(f"FPS: {1000/avg:.1f}") - - sys.exit(0) - - mcrfpy.setTimer("benchmark", measure, 16) # Every frame - -benchmark() -``` - -**Run:** -```bash -./mcrogueface --headless --exec ../tests/benchmark_mysystem.py -``` - ---- - -## Step 4: Optimize - Make It Faster - -### Optimization Strategies - -#### Strategy 1: Reduce Work - -**Example: Grid Dirty Flags** - -**Before:** Redraw all cells every frame -```cpp -for (int x = 0; x < grid_x; x++) { - for (int y = 0; y < grid_y; y++) { - renderCell(x, y); // Always renders - } -} -``` - -**After:** Only redraw when changed -```cpp -if (grid_dirty) { - for (int x = 0; x < grid_x; x++) { - for (int y = 0; y < grid_y; y++) { - renderCell(x, y); - } - } - grid_dirty = false; -} -``` - -**Expected:** 10-50x improvement for static scenes - -#### Strategy 2: Reduce Complexity - -**Example: SpatialHash for Entity Queries** - -**Before:** O(n) search through all entities -```cpp -for (auto& entity : entities) { - if (distanceTo(entity, target) < radius) { - nearby.push_back(entity); - } -} -``` - -**After:** O(1) hash lookup -```cpp -auto cell = spatialHash.getCell(target.x, target.y); -for (auto& entity : cell.entities) { - nearby.push_back(entity); -} -``` - -**Expected:** 100x+ improvement for large entity counts - -#### Strategy 3: Batch Operations - -**Example: Grid Batch Updates** - -**Before:** Multiple Python/C++ crossings -```python -for x in range(100): - for y in range(100): - grid.at((x, y)).tilesprite = 42 # 10,000 calls! -``` - -**After:** Single batch operation -```python -grid.fill_rect(0, 0, 100, 100, 42) # 1 call! -``` - -**Expected:** 10-100x improvement for bulk updates - -#### Strategy 4: Cache Results - -**Example: Path Caching in Entities** - -**Before:** Recompute path every frame -```cpp -void Entity::update() { - path = computePathTo(target); // Expensive! - followPath(path); -} -``` - -**After:** Cache and reuse -```cpp -void Entity::update() { - if (!cachedPath || targetMoved) { - cachedPath = computePathTo(target); - } - followPath(cachedPath); -} -``` - -**Expected:** 10x+ improvement for pathfinding-heavy scenarios - -### Optimization Checklist - -Before optimizing: -- [ ] Profiled and identified real bottleneck -- [ ] Measured baseline performance -- [ ] Understood root cause - -During optimization: -- [ ] Changed only one thing at a time -- [ ] Kept original code for comparison -- [ ] Added comments explaining optimization - -After optimization: -- [ ] Measured improvement -- [ ] Verified correctness (no bugs introduced) -- [ ] Updated tests if needed - ---- - -## Step 5: Verify - Measure Improvement - -### Re-run Benchmarks - -**Before optimization:** -``` -Average: 45.2ms -Min: 38.1ms -Max: 62.3ms -FPS: 22.1 -``` - -**After optimization:** -``` -Average: 8.5ms ← 5.3x improvement! -Min: 7.2ms -Max: 12.1ms -FPS: 117.6 -``` - -### Check Correctness - -**Visual testing:** -1. Run game normally (not headless) -2. Verify visual output unchanged -3. Test edge cases (empty grids, max entities, etc.) - -**Automated testing:** -```bash -# Run existing test suite -./mcrogueface --headless --exec tests/test_grid_operations.py -./mcrogueface --headless --exec tests/test_entity_movement.py -``` - -### Document Results - -**Create issue comment:** - -```markdown -## Performance Optimization Results - -**Issue:** #116 (Dirty Flag System) - -**Baseline:** -- Static grid (100x100): 32.1ms average -- FPS: 22 - -**After Optimization:** -- Static grid (100x100): 0.8ms average -- FPS: 118 - -**Improvement:** 40x faster for static scenes - -**Test:** `tests/benchmark_static_grid.py` - -**Commit:** abc123def -``` - ---- - -## Common Optimization Patterns - -### Pattern 1: Early Exit - -```cpp -// Check cheap conditions first -if (!visible) return; -if (opacity <= 0.0f) return; -if (!inViewport(bounds)) return; - -// Then do expensive work -render(); -``` - -### Pattern 2: Lazy Evaluation - -```cpp -// Don't compute until needed -mutable bool fovComputed = false; -mutable std::vector visibleCells; - -std::vector& getVisibleCells() { - if (!fovComputed) { - visibleCells = computeFOV(); - fovComputed = true; - } - return visibleCells; -} -``` - -### Pattern 3: Object Pooling - -```cpp -// Reuse instead of allocate/deallocate -class EntityPool { - std::vector pool; - std::vector active; - -public: - Entity* spawn() { - for (size_t i = 0; i < pool.size(); i++) { - if (!active[i]) { - active[i] = true; - return &pool[i]; - } - } - // Grow pool if needed - pool.emplace_back(); - active.push_back(true); - return &pool.back(); - } -}; -``` - -### Pattern 4: Space-Time Tradeoff - -```cpp -// Cache expensive computation -std::unordered_map> pathCache; - -std::vector getPathTo(int targetId) { - if (pathCache.contains(targetId)) { - return pathCache[targetId]; // O(1) lookup - } - - auto path = computeExpensivePath(targetId); - pathCache[targetId] = path; - return path; -} -``` - ---- - -## When NOT to Optimize - -**Don't optimize if:** -- Performance is already acceptable (< 16ms frame time) -- Optimization makes code significantly more complex -- You haven't profiled yet (no guessing!) -- The bottleneck is elsewhere (optimize hot paths first) - -**Premature optimization is the root of all evil** - Donald Knuth - -Focus on: -1. Correctness first -2. Profile to find real bottlenecks -3. Optimize hot paths only -4. Keep code maintainable - ---- - -## Related Documentation - -- [[Performance-and-Profiling]] - Profiling tools reference -- [[Grid-System]] - Grid optimization opportunities -- [[Writing-Tests]] - Creating performance tests - -**Open Issues:** -- [#115](../../issues/115) - SpatialHash Implementation -- [#116](../../issues/116) - Dirty Flag System -- [#113](../../issues/113) - Batch Operations for Grid +# Performance Optimization Workflow + +Systematic approach to identifying and resolving performance bottlenecks in McRogueFace. + +## Quick Reference + +**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]] + +**Tools:** +- F3 profiler overlay (in-game) +- `src/Profiler.h` - ScopedTimer +- `tests/benchmark_*.py` - Performance benchmarks + +## The Optimization Cycle + +``` +1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY + ↑ | + └─────────────────────────────────────────────────────────┘ +``` + +--- + +## Step 1: Profile - Find the Bottleneck + +### Using F3 Overlay + +**Start the game and press F3:** + +Look for: +- **Red frame times** (>33ms) - Unacceptable performance +- **Yellow frame times** (16-33ms) - Marginal performance +- **High subsystem times** - Which system is slow? + - Grid rendering > 10ms? → Grid optimization needed + - Entity rendering > 5ms? → Entity culling/SpatialHash needed + - Python script time > 5ms? → Python callback optimization + +**Example profiler output:** +``` +Frame: 45.2ms (RED) +FPS: 22 + +Grid Render: 32.1ms ← BOTTLENECK! +Entity Render: 8.5ms +Python Script: 2.1ms +Animation: 1.2ms + +Cells Rendered: 10000 +Entities: 150 (visible: 50) +``` + +**Analysis:** Grid rendering is the bottleneck (32ms of 45ms total). + +### Running Benchmarks + +**Static Grid Benchmark:** +```bash +cd build +./mcrogueface --headless --exec ../tests/benchmark_static_grid.py +``` + +**Moving Entities Benchmark:** +```bash +./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py +``` + +**Output:** Baseline performance metrics to measure improvement against. + +--- + +## Step 2: Identify - Understand the Problem + +### Common Performance Issues + +**Issue: High Grid Render Time on Static Screens** +- **Symptom:** 20-40ms grid render, nothing changing +- **Cause:** Redrawing unchanged cells every frame +- **Solution:** Implement dirty flag system ([#116](../../issues/116)) + +**Issue: High Entity Render Time with Many Entities** +- **Symptom:** 10-20ms entity render with 500+ entities +- **Cause:** O(n) iteration, no spatial indexing +- **Solution:** Implement SpatialHash ([#115](../../issues/115)) + +**Issue: Slow Bulk Grid Updates from Python** +- **Symptom:** Frame drops when updating many cells +- **Cause:** Python/C++ boundary crossings for each cell +- **Solution:** Implement batch operations ([#113](../../issues/113)) + +**Issue: High Python Script Time** +- **Symptom:** 10-50ms in Python callbacks +- **Cause:** Heavy computation in Python update loops +- **Solution:** Move hot paths to C++ or optimize Python + +### Analyzing Call Stacks + +**When F3 overlay isn't enough:** + +```bash +# Build with debug symbols +make clean +cmake .. -DCMAKE_BUILD_TYPE=Debug +make + +# Profile with gdb +gdb ./mcrogueface +(gdb) run + +(gdb) info threads +(gdb) bt # Backtrace +``` + +--- + +## Step 3: Instrument - Measure Precisely + +### Adding ScopedTimer + +**Identify the slow function and wrap it:** + +```cpp +#include "Profiler.h" + +void MySystem::slowFunction() { + // Add timer + ScopedTimer timer(Resources::game->metrics.mySystemTime); + + // Your slow code here + for (auto& item : items) { + item->process(); + } +} +``` + +### Adding Custom Metrics + +**1. Add metric field to ProfilingMetrics:** + +`src/GameEngine.h`: +```cpp +struct ProfilingMetrics { + // ... existing fields ... + float mySystemTime = 0.0f; // Add this +}; +``` + +**2. Reset in resetPerFrame():** + +```cpp +void ProfilingMetrics::resetPerFrame() { + // ... existing resets ... + mySystemTime = 0.0f; // Add this +} +``` + +**3. Display in ProfilerOverlay:** + +`src/ProfilerOverlay.cpp`: +```cpp +void ProfilerOverlay::update(const ProfilingMetrics& metrics) { + // ... existing formatting ... + ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n"; +} +``` + +**4. Rebuild and test:** + +```bash +make +cd build +./mcrogueface +# Press F3 - see your metric! +``` + +### Creating Benchmarks + +**Create `tests/benchmark_mysystem.py`:** + +```python +import mcrfpy +import sys +import time + +def benchmark(): + # Setup + mcrfpy.createScene("bench") + # ... create test scenario ... + + frame_times = [] + + def measure(runtime_ms): + frame_times.append(runtime_ms) + if len(frame_times) >= 300: # 5 seconds at 60fps + # Report statistics + avg = sum(frame_times) / len(frame_times) + min_time = min(frame_times) + max_time = max(frame_times) + + print(f"Average: {avg:.2f}ms") + print(f"Min: {min_time:.2f}ms") + print(f"Max: {max_time:.2f}ms") + print(f"FPS: {1000/avg:.1f}") + + sys.exit(0) + + mcrfpy.setTimer("benchmark", measure, 16) # Every frame + +benchmark() +``` + +**Run:** +```bash +./mcrogueface --headless --exec ../tests/benchmark_mysystem.py +``` + +--- + +## Step 4: Optimize - Make It Faster + +### Optimization Strategies + +#### Strategy 1: Reduce Work + +**Example: Grid Dirty Flags** + +**Before:** Redraw all cells every frame +```cpp +for (int x = 0; x < grid_x; x++) { + for (int y = 0; y < grid_y; y++) { + renderCell(x, y); // Always renders + } +} +``` + +**After:** Only redraw when changed +```cpp +if (grid_dirty) { + for (int x = 0; x < grid_x; x++) { + for (int y = 0; y < grid_y; y++) { + renderCell(x, y); + } + } + grid_dirty = false; +} +``` + +**Expected:** 10-50x improvement for static scenes + +#### Strategy 2: Reduce Complexity + +**Example: SpatialHash for Entity Queries** + +**Before:** O(n) search through all entities +```cpp +for (auto& entity : entities) { + if (distanceTo(entity, target) < radius) { + nearby.push_back(entity); + } +} +``` + +**After:** O(1) hash lookup +```cpp +auto cell = spatialHash.getCell(target.x, target.y); +for (auto& entity : cell.entities) { + nearby.push_back(entity); +} +``` + +**Expected:** 100x+ improvement for large entity counts + +#### Strategy 3: Batch Operations + +**Example: Grid Batch Updates** + +**Before:** Multiple Python/C++ crossings +```python +for x in range(100): + for y in range(100): + grid.at((x, y)).tilesprite = 42 # 10,000 calls! +``` + +**After:** Single batch operation +```python +grid.fill_rect(0, 0, 100, 100, 42) # 1 call! +``` + +**Expected:** 10-100x improvement for bulk updates + +#### Strategy 4: Cache Results + +**Example: Path Caching in Entities** + +**Before:** Recompute path every frame +```cpp +void Entity::update() { + path = computePathTo(target); // Expensive! + followPath(path); +} +``` + +**After:** Cache and reuse +```cpp +void Entity::update() { + if (!cachedPath || targetMoved) { + cachedPath = computePathTo(target); + } + followPath(cachedPath); +} +``` + +**Expected:** 10x+ improvement for pathfinding-heavy scenarios + +### Optimization Checklist + +Before optimizing: +- [ ] Profiled and identified real bottleneck +- [ ] Measured baseline performance +- [ ] Understood root cause + +During optimization: +- [ ] Changed only one thing at a time +- [ ] Kept original code for comparison +- [ ] Added comments explaining optimization + +After optimization: +- [ ] Measured improvement +- [ ] Verified correctness (no bugs introduced) +- [ ] Updated tests if needed + +--- + +## Step 5: Verify - Measure Improvement + +### Re-run Benchmarks + +**Before optimization:** +``` +Average: 45.2ms +Min: 38.1ms +Max: 62.3ms +FPS: 22.1 +``` + +**After optimization:** +``` +Average: 8.5ms ← 5.3x improvement! +Min: 7.2ms +Max: 12.1ms +FPS: 117.6 +``` + +### Check Correctness + +**Visual testing:** +1. Run game normally (not headless) +2. Verify visual output unchanged +3. Test edge cases (empty grids, max entities, etc.) + +**Automated testing:** +```bash +# Run existing test suite +./mcrogueface --headless --exec tests/test_grid_operations.py +./mcrogueface --headless --exec tests/test_entity_movement.py +``` + +### Document Results + +**Create issue comment:** + +```markdown +## Performance Optimization Results + +**Issue:** #116 (Dirty Flag System) + +**Baseline:** +- Static grid (100x100): 32.1ms average +- FPS: 22 + +**After Optimization:** +- Static grid (100x100): 0.8ms average +- FPS: 118 + +**Improvement:** 40x faster for static scenes + +**Test:** `tests/benchmark_static_grid.py` + +**Commit:** abc123def +``` + +--- + +## Common Optimization Patterns + +### Pattern 1: Early Exit + +```cpp +// Check cheap conditions first +if (!visible) return; +if (opacity <= 0.0f) return; +if (!inViewport(bounds)) return; + +// Then do expensive work +render(); +``` + +### Pattern 2: Lazy Evaluation + +```cpp +// Don't compute until needed +mutable bool fovComputed = false; +mutable std::vector visibleCells; + +std::vector& getVisibleCells() { + if (!fovComputed) { + visibleCells = computeFOV(); + fovComputed = true; + } + return visibleCells; +} +``` + +### Pattern 3: Object Pooling + +```cpp +// Reuse instead of allocate/deallocate +class EntityPool { + std::vector pool; + std::vector active; + +public: + Entity* spawn() { + for (size_t i = 0; i < pool.size(); i++) { + if (!active[i]) { + active[i] = true; + return &pool[i]; + } + } + // Grow pool if needed + pool.emplace_back(); + active.push_back(true); + return &pool.back(); + } +}; +``` + +### Pattern 4: Space-Time Tradeoff + +```cpp +// Cache expensive computation +std::unordered_map> pathCache; + +std::vector getPathTo(int targetId) { + if (pathCache.contains(targetId)) { + return pathCache[targetId]; // O(1) lookup + } + + auto path = computeExpensivePath(targetId); + pathCache[targetId] = path; + return path; +} +``` + +--- + +## When NOT to Optimize + +**Don't optimize if:** +- Performance is already acceptable (< 16ms frame time) +- Optimization makes code significantly more complex +- You haven't profiled yet (no guessing!) +- The bottleneck is elsewhere (optimize hot paths first) + +**Premature optimization is the root of all evil** - Donald Knuth + +Focus on: +1. Correctness first +2. Profile to find real bottlenecks +3. Optimize hot paths only +4. Keep code maintainable + +--- + +## Related Documentation + +- [[Performance-and-Profiling]] - Profiling tools reference +- [[Grid-System]] - Grid optimization opportunities +- [[Writing-Tests]] - Creating performance tests + +**Open Issues:** +- [#115](../../issues/115) - SpatialHash Implementation +- [#116](../../issues/116) - Dirty Flag System +- [#113](../../issues/113) - Batch Operations for Grid - [#117](../../issues/117) - Memory Pool for Entities \ No newline at end of file