diff --git a/Performance-Optimization-Workflow.-.md b/Performance-Optimization-Workflow.-.md new file mode 100644 index 0000000..d1bce45 --- /dev/null +++ b/Performance-Optimization-Workflow.-.md @@ -0,0 +1,494 @@ +# Performance Optimization Workflow + +Systematic approach to identifying and resolving performance bottlenecks in McRogueFace. + +## Quick Reference + +**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]] + +**Tools:** +- F3 profiler overlay (in-game) +- `src/Profiler.h` - ScopedTimer +- `tests/benchmark_*.py` - Performance benchmarks + +## The Optimization Cycle + +``` +1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY + ↑ | + └─────────────────────────────────────────────────────────┘ +``` + +--- + +## Step 1: Profile - Find the Bottleneck + +### Using F3 Overlay + +**Start the game and press F3:** + +Look for: +- **Red frame times** (>33ms) - Unacceptable performance +- **Yellow frame times** (16-33ms) - Marginal performance +- **High subsystem times** - Which system is slow? + - Grid rendering > 10ms? → Grid optimization needed + - Entity rendering > 5ms? → Entity culling/SpatialHash needed + - Python script time > 5ms? → Python callback optimization + +**Example profiler output:** +``` +Frame: 45.2ms (RED) +FPS: 22 + +Grid Render: 32.1ms ← BOTTLENECK! +Entity Render: 8.5ms +Python Script: 2.1ms +Animation: 1.2ms + +Cells Rendered: 10000 +Entities: 150 (visible: 50) +``` + +**Analysis:** Grid rendering is the bottleneck (32ms of 45ms total). + +### Running Benchmarks + +**Static Grid Benchmark:** +```bash +cd build +./mcrogueface --headless --exec ../tests/benchmark_static_grid.py +``` + +**Moving Entities Benchmark:** +```bash +./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py +``` + +**Output:** Baseline performance metrics to measure improvement against. + +--- + +## Step 2: Identify - Understand the Problem + +### Common Performance Issues + +**Issue: High Grid Render Time on Static Screens** +- **Symptom:** 20-40ms grid render, nothing changing +- **Cause:** Redrawing unchanged cells every frame +- **Solution:** Implement dirty flag system ([#116](../../issues/116)) + +**Issue: High Entity Render Time with Many Entities** +- **Symptom:** 10-20ms entity render with 500+ entities +- **Cause:** O(n) iteration, no spatial indexing +- **Solution:** Implement SpatialHash ([#115](../../issues/115)) + +**Issue: Slow Bulk Grid Updates from Python** +- **Symptom:** Frame drops when updating many cells +- **Cause:** Python/C++ boundary crossings for each cell +- **Solution:** Implement batch operations ([#113](../../issues/113)) + +**Issue: High Python Script Time** +- **Symptom:** 10-50ms in Python callbacks +- **Cause:** Heavy computation in Python update loops +- **Solution:** Move hot paths to C++ or optimize Python + +### Analyzing Call Stacks + +**When F3 overlay isn't enough:** + +```bash +# Build with debug symbols +make clean +cmake .. -DCMAKE_BUILD_TYPE=Debug +make + +# Profile with gdb +gdb ./mcrogueface +(gdb) run + +(gdb) info threads +(gdb) bt # Backtrace +``` + +--- + +## Step 3: Instrument - Measure Precisely + +### Adding ScopedTimer + +**Identify the slow function and wrap it:** + +```cpp +#include "Profiler.h" + +void MySystem::slowFunction() { + // Add timer + ScopedTimer timer(Resources::game->metrics.mySystemTime); + + // Your slow code here + for (auto& item : items) { + item->process(); + } +} +``` + +### Adding Custom Metrics + +**1. Add metric field to ProfilingMetrics:** + +`src/GameEngine.h`: +```cpp +struct ProfilingMetrics { + // ... existing fields ... + float mySystemTime = 0.0f; // Add this +}; +``` + +**2. Reset in resetPerFrame():** + +```cpp +void ProfilingMetrics::resetPerFrame() { + // ... existing resets ... + mySystemTime = 0.0f; // Add this +} +``` + +**3. Display in ProfilerOverlay:** + +`src/ProfilerOverlay.cpp`: +```cpp +void ProfilerOverlay::update(const ProfilingMetrics& metrics) { + // ... existing formatting ... + ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n"; +} +``` + +**4. Rebuild and test:** + +```bash +make +cd build +./mcrogueface +# Press F3 - see your metric! +``` + +### Creating Benchmarks + +**Create `tests/benchmark_mysystem.py`:** + +```python +import mcrfpy +import sys +import time + +def benchmark(): + # Setup + mcrfpy.createScene("bench") + # ... create test scenario ... + + frame_times = [] + + def measure(runtime_ms): + frame_times.append(runtime_ms) + if len(frame_times) >= 300: # 5 seconds at 60fps + # Report statistics + avg = sum(frame_times) / len(frame_times) + min_time = min(frame_times) + max_time = max(frame_times) + + print(f"Average: {avg:.2f}ms") + print(f"Min: {min_time:.2f}ms") + print(f"Max: {max_time:.2f}ms") + print(f"FPS: {1000/avg:.1f}") + + sys.exit(0) + + mcrfpy.setTimer("benchmark", measure, 16) # Every frame + +benchmark() +``` + +**Run:** +```bash +./mcrogueface --headless --exec ../tests/benchmark_mysystem.py +``` + +--- + +## Step 4: Optimize - Make It Faster + +### Optimization Strategies + +#### Strategy 1: Reduce Work + +**Example: Grid Dirty Flags** + +**Before:** Redraw all cells every frame +```cpp +for (int x = 0; x < grid_x; x++) { + for (int y = 0; y < grid_y; y++) { + renderCell(x, y); // Always renders + } +} +``` + +**After:** Only redraw when changed +```cpp +if (grid_dirty) { + for (int x = 0; x < grid_x; x++) { + for (int y = 0; y < grid_y; y++) { + renderCell(x, y); + } + } + grid_dirty = false; +} +``` + +**Expected:** 10-50x improvement for static scenes + +#### Strategy 2: Reduce Complexity + +**Example: SpatialHash for Entity Queries** + +**Before:** O(n) search through all entities +```cpp +for (auto& entity : entities) { + if (distanceTo(entity, target) < radius) { + nearby.push_back(entity); + } +} +``` + +**After:** O(1) hash lookup +```cpp +auto cell = spatialHash.getCell(target.x, target.y); +for (auto& entity : cell.entities) { + nearby.push_back(entity); +} +``` + +**Expected:** 100x+ improvement for large entity counts + +#### Strategy 3: Batch Operations + +**Example: Grid Batch Updates** + +**Before:** Multiple Python/C++ crossings +```python +for x in range(100): + for y in range(100): + grid.at((x, y)).tilesprite = 42 # 10,000 calls! +``` + +**After:** Single batch operation +```python +grid.fill_rect(0, 0, 100, 100, 42) # 1 call! +``` + +**Expected:** 10-100x improvement for bulk updates + +#### Strategy 4: Cache Results + +**Example: Path Caching in Entities** + +**Before:** Recompute path every frame +```cpp +void Entity::update() { + path = computePathTo(target); // Expensive! + followPath(path); +} +``` + +**After:** Cache and reuse +```cpp +void Entity::update() { + if (!cachedPath || targetMoved) { + cachedPath = computePathTo(target); + } + followPath(cachedPath); +} +``` + +**Expected:** 10x+ improvement for pathfinding-heavy scenarios + +### Optimization Checklist + +Before optimizing: +- [ ] Profiled and identified real bottleneck +- [ ] Measured baseline performance +- [ ] Understood root cause + +During optimization: +- [ ] Changed only one thing at a time +- [ ] Kept original code for comparison +- [ ] Added comments explaining optimization + +After optimization: +- [ ] Measured improvement +- [ ] Verified correctness (no bugs introduced) +- [ ] Updated tests if needed + +--- + +## Step 5: Verify - Measure Improvement + +### Re-run Benchmarks + +**Before optimization:** +``` +Average: 45.2ms +Min: 38.1ms +Max: 62.3ms +FPS: 22.1 +``` + +**After optimization:** +``` +Average: 8.5ms ← 5.3x improvement! +Min: 7.2ms +Max: 12.1ms +FPS: 117.6 +``` + +### Check Correctness + +**Visual testing:** +1. Run game normally (not headless) +2. Verify visual output unchanged +3. Test edge cases (empty grids, max entities, etc.) + +**Automated testing:** +```bash +# Run existing test suite +./mcrogueface --headless --exec tests/test_grid_operations.py +./mcrogueface --headless --exec tests/test_entity_movement.py +``` + +### Document Results + +**Create issue comment:** + +```markdown +## Performance Optimization Results + +**Issue:** #116 (Dirty Flag System) + +**Baseline:** +- Static grid (100x100): 32.1ms average +- FPS: 22 + +**After Optimization:** +- Static grid (100x100): 0.8ms average +- FPS: 118 + +**Improvement:** 40x faster for static scenes + +**Test:** `tests/benchmark_static_grid.py` + +**Commit:** abc123def +``` + +--- + +## Common Optimization Patterns + +### Pattern 1: Early Exit + +```cpp +// Check cheap conditions first +if (!visible) return; +if (opacity <= 0.0f) return; +if (!inViewport(bounds)) return; + +// Then do expensive work +render(); +``` + +### Pattern 2: Lazy Evaluation + +```cpp +// Don't compute until needed +mutable bool fovComputed = false; +mutable std::vector visibleCells; + +std::vector& getVisibleCells() { + if (!fovComputed) { + visibleCells = computeFOV(); + fovComputed = true; + } + return visibleCells; +} +``` + +### Pattern 3: Object Pooling + +```cpp +// Reuse instead of allocate/deallocate +class EntityPool { + std::vector pool; + std::vector active; + +public: + Entity* spawn() { + for (size_t i = 0; i < pool.size(); i++) { + if (!active[i]) { + active[i] = true; + return &pool[i]; + } + } + // Grow pool if needed + pool.emplace_back(); + active.push_back(true); + return &pool.back(); + } +}; +``` + +### Pattern 4: Space-Time Tradeoff + +```cpp +// Cache expensive computation +std::unordered_map> pathCache; + +std::vector getPathTo(int targetId) { + if (pathCache.contains(targetId)) { + return pathCache[targetId]; // O(1) lookup + } + + auto path = computeExpensivePath(targetId); + pathCache[targetId] = path; + return path; +} +``` + +--- + +## When NOT to Optimize + +**Don't optimize if:** +- Performance is already acceptable (< 16ms frame time) +- Optimization makes code significantly more complex +- You haven't profiled yet (no guessing!) +- The bottleneck is elsewhere (optimize hot paths first) + +**Premature optimization is the root of all evil** - Donald Knuth + +Focus on: +1. Correctness first +2. Profile to find real bottlenecks +3. Optimize hot paths only +4. Keep code maintainable + +--- + +## Related Documentation + +- [[Performance-and-Profiling]] - Profiling tools reference +- [[Grid-System]] - Grid optimization opportunities +- [[Writing-Tests]] - Creating performance tests + +**Open Issues:** +- [#115](../../issues/115) - SpatialHash Implementation +- [#116](../../issues/116) - Dirty Flag System +- [#113](../../issues/113) - Batch Operations for Grid +- [#117](../../issues/117) - Memory Pool for Entities \ No newline at end of file