Add "Performance-Optimization-Workflow"

2025-10-25 22:37:10 +00:00 · 2025-10-25 22:37:10 +00:00 · 3b9f342ef1
parent dd7a569928
commit 3b9f342ef1
1 changed files with 494 additions and 0 deletions
--- a/Performance-Optimization-Workflow.-.md
+++ b/Performance-Optimization-Workflow.-.md
@ -0,0 +1,494 @@
+# Performance Optimization Workflow
+
+Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
+
+## Quick Reference
+
+**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
+
+**Tools:**
+- F3 profiler overlay (in-game)
+- `src/Profiler.h` - ScopedTimer
+- `tests/benchmark_*.py` - Performance benchmarks
+
+## The Optimization Cycle
+
+```
+1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
+     ↑                                                         |
+     └─────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Step 1: Profile - Find the Bottleneck
+
+### Using F3 Overlay
+
+**Start the game and press F3:**
+
+Look for:
+- **Red frame times** (>33ms) - Unacceptable performance
+- **Yellow frame times** (16-33ms) - Marginal performance
+- **High subsystem times** - Which system is slow?
+  - Grid rendering > 10ms? → Grid optimization needed
+  - Entity rendering > 5ms? → Entity culling/SpatialHash needed
+  - Python script time > 5ms? → Python callback optimization
+
+**Example profiler output:**
+```
+Frame: 45.2ms (RED)
+FPS: 22
+
+Grid Render: 32.1ms  ← BOTTLENECK!
+Entity Render: 8.5ms
+Python Script: 2.1ms
+Animation: 1.2ms
+
+Cells Rendered: 10000
+Entities: 150 (visible: 50)
+```
+
+**Analysis:** Grid rendering is the bottleneck (32ms of 45ms total).
+
+### Running Benchmarks
+
+**Static Grid Benchmark:**
+```bash
+cd build
+./mcrogueface --headless --exec ../tests/benchmark_static_grid.py
+```
+
+**Moving Entities Benchmark:**
+```bash
+./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py
+```
+
+**Output:** Baseline performance metrics to measure improvement against.
+
+---
+
+## Step 2: Identify - Understand the Problem
+
+### Common Performance Issues
+
+**Issue: High Grid Render Time on Static Screens**
+- **Symptom:** 20-40ms grid render, nothing changing
+- **Cause:** Redrawing unchanged cells every frame
+- **Solution:** Implement dirty flag system ([#116](../../issues/116))
+
+**Issue: High Entity Render Time with Many Entities**
+- **Symptom:** 10-20ms entity render with 500+ entities
+- **Cause:** O(n) iteration, no spatial indexing
+- **Solution:** Implement SpatialHash ([#115](../../issues/115))
+
+**Issue: Slow Bulk Grid Updates from Python**
+- **Symptom:** Frame drops when updating many cells
+- **Cause:** Python/C++ boundary crossings for each cell
+- **Solution:** Implement batch operations ([#113](../../issues/113))
+
+**Issue: High Python Script Time**
+- **Symptom:** 10-50ms in Python callbacks
+- **Cause:** Heavy computation in Python update loops
+- **Solution:** Move hot paths to C++ or optimize Python
+
+### Analyzing Call Stacks
+
+**When F3 overlay isn't enough:**
+
+```bash
+# Build with debug symbols
+make clean
+cmake .. -DCMAKE_BUILD_TYPE=Debug
+make
+
+# Profile with gdb
+gdb ./mcrogueface
+(gdb) run
+<trigger slow behavior>
+(gdb) info threads
+(gdb) bt  # Backtrace
+```
+
+---
+
+## Step 3: Instrument - Measure Precisely
+
+### Adding ScopedTimer
+
+**Identify the slow function and wrap it:**
+
+```cpp
+#include "Profiler.h"
+
+void MySystem::slowFunction() {
+    // Add timer
+    ScopedTimer timer(Resources::game->metrics.mySystemTime);
+    
+    // Your slow code here
+    for (auto& item : items) {
+        item->process();
+    }
+}
+```
+
+### Adding Custom Metrics
+
+**1. Add metric field to ProfilingMetrics:**
+
+`src/GameEngine.h`:
+```cpp
+struct ProfilingMetrics {
+    // ... existing fields ...
+    float mySystemTime = 0.0f;  // Add this
+};
+```
+
+**2. Reset in resetPerFrame():**
+
+```cpp
+void ProfilingMetrics::resetPerFrame() {
+    // ... existing resets ...
+    mySystemTime = 0.0f;  // Add this
+}
+```
+
+**3. Display in ProfilerOverlay:**
+
+`src/ProfilerOverlay.cpp`:
+```cpp
+void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
+    // ... existing formatting ...
+    ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
+}
+```
+
+**4. Rebuild and test:**
+
+```bash
+make
+cd build
+./mcrogueface
+# Press F3 - see your metric!
+```
+
+### Creating Benchmarks
+
+**Create `tests/benchmark_mysystem.py`:**
+
+```python
+import mcrfpy
+import sys
+import time
+
+def benchmark():
+    # Setup
+    mcrfpy.createScene("bench")
+    # ... create test scenario ...
+    
+    frame_times = []
+    
+    def measure(runtime_ms):
+        frame_times.append(runtime_ms)
+        if len(frame_times) >= 300:  # 5 seconds at 60fps
+            # Report statistics
+            avg = sum(frame_times) / len(frame_times)
+            min_time = min(frame_times)
+            max_time = max(frame_times)
+            
+            print(f"Average: {avg:.2f}ms")
+            print(f"Min: {min_time:.2f}ms")
+            print(f"Max: {max_time:.2f}ms")
+            print(f"FPS: {1000/avg:.1f}")
+            
+            sys.exit(0)
+    
+    mcrfpy.setTimer("benchmark", measure, 16)  # Every frame
+
+benchmark()
+```
+
+**Run:**
+```bash
+./mcrogueface --headless --exec ../tests/benchmark_mysystem.py
+```
+
+---
+
+## Step 4: Optimize - Make It Faster
+
+### Optimization Strategies
+
+#### Strategy 1: Reduce Work
+
+**Example: Grid Dirty Flags**
+
+**Before:** Redraw all cells every frame
+```cpp
+for (int x = 0; x < grid_x; x++) {
+    for (int y = 0; y < grid_y; y++) {
+        renderCell(x, y);  // Always renders
+    }
+}
+```
+
+**After:** Only redraw when changed
+```cpp
+if (grid_dirty) {
+    for (int x = 0; x < grid_x; x++) {
+        for (int y = 0; y < grid_y; y++) {
+            renderCell(x, y);
+        }
+    }
+    grid_dirty = false;
+}
+```
+
+**Expected:** 10-50x improvement for static scenes
+
+#### Strategy 2: Reduce Complexity
+
+**Example: SpatialHash for Entity Queries**
+
+**Before:** O(n) search through all entities
+```cpp
+for (auto& entity : entities) {
+    if (distanceTo(entity, target) < radius) {
+        nearby.push_back(entity);
+    }
+}
+```
+
+**After:** O(1) hash lookup
+```cpp
+auto cell = spatialHash.getCell(target.x, target.y);
+for (auto& entity : cell.entities) {
+    nearby.push_back(entity);
+}
+```
+
+**Expected:** 100x+ improvement for large entity counts
+
+#### Strategy 3: Batch Operations
+
+**Example: Grid Batch Updates**
+
+**Before:** Multiple Python/C++ crossings
+```python
+for x in range(100):
+    for y in range(100):
+        grid.at((x, y)).tilesprite = 42  # 10,000 calls!
+```
+
+**After:** Single batch operation
+```python
+grid.fill_rect(0, 0, 100, 100, 42)  # 1 call!
+```
+
+**Expected:** 10-100x improvement for bulk updates
+
+#### Strategy 4: Cache Results
+
+**Example: Path Caching in Entities**
+
+**Before:** Recompute path every frame
+```cpp
+void Entity::update() {
+    path = computePathTo(target);  // Expensive!
+    followPath(path);
+}
+```
+
+**After:** Cache and reuse
+```cpp
+void Entity::update() {
+    if (!cachedPath || targetMoved) {
+        cachedPath = computePathTo(target);
+    }
+    followPath(cachedPath);
+}
+```
+
+**Expected:** 10x+ improvement for pathfinding-heavy scenarios
+
+### Optimization Checklist
+
+Before optimizing:
+- [ ] Profiled and identified real bottleneck
+- [ ] Measured baseline performance
+- [ ] Understood root cause
+
+During optimization:
+- [ ] Changed only one thing at a time
+- [ ] Kept original code for comparison
+- [ ] Added comments explaining optimization
+
+After optimization:
+- [ ] Measured improvement
+- [ ] Verified correctness (no bugs introduced)
+- [ ] Updated tests if needed
+
+---
+
+## Step 5: Verify - Measure Improvement
+
+### Re-run Benchmarks
+
+**Before optimization:**
+```
+Average: 45.2ms
+Min: 38.1ms
+Max: 62.3ms
+FPS: 22.1
+```
+
+**After optimization:**
+```
+Average: 8.5ms  ← 5.3x improvement!
+Min: 7.2ms
+Max: 12.1ms
+FPS: 117.6
+```
+
+### Check Correctness
+
+**Visual testing:**
+1. Run game normally (not headless)
+2. Verify visual output unchanged
+3. Test edge cases (empty grids, max entities, etc.)
+
+**Automated testing:**
+```bash
+# Run existing test suite
+./mcrogueface --headless --exec tests/test_grid_operations.py
+./mcrogueface --headless --exec tests/test_entity_movement.py
+```
+
+### Document Results
+
+**Create issue comment:**
+
+```markdown
+## Performance Optimization Results
+
+**Issue:** #116 (Dirty Flag System)
+
+**Baseline:**
+- Static grid (100x100): 32.1ms average
+- FPS: 22
+
+**After Optimization:**
+- Static grid (100x100): 0.8ms average
+- FPS: 118
+
+**Improvement:** 40x faster for static scenes
+
+**Test:** `tests/benchmark_static_grid.py`
+
+**Commit:** abc123def
+```
+
+---
+
+## Common Optimization Patterns
+
+### Pattern 1: Early Exit
+
+```cpp
+// Check cheap conditions first
+if (!visible) return;
+if (opacity <= 0.0f) return;
+if (!inViewport(bounds)) return;
+
+// Then do expensive work
+render();
+```
+
+### Pattern 2: Lazy Evaluation
+
+```cpp
+// Don't compute until needed
+mutable bool fovComputed = false;
+mutable std::vector<Point> visibleCells;
+
+std::vector<Point>& getVisibleCells() {
+    if (!fovComputed) {
+        visibleCells = computeFOV();
+        fovComputed = true;
+    }
+    return visibleCells;
+}
+```
+
+### Pattern 3: Object Pooling
+
+```cpp
+// Reuse instead of allocate/deallocate
+class EntityPool {
+    std::vector<Entity> pool;
+    std::vector<bool> active;
+    
+public:
+    Entity* spawn() {
+        for (size_t i = 0; i < pool.size(); i++) {
+            if (!active[i]) {
+                active[i] = true;
+                return &pool[i];
+            }
+        }
+        // Grow pool if needed
+        pool.emplace_back();
+        active.push_back(true);
+        return &pool.back();
+    }
+};
+```
+
+### Pattern 4: Space-Time Tradeoff
+
+```cpp
+// Cache expensive computation
+std::unordered_map<int, std::vector<Point>> pathCache;
+
+std::vector<Point> getPathTo(int targetId) {
+    if (pathCache.contains(targetId)) {
+        return pathCache[targetId];  // O(1) lookup
+    }
+    
+    auto path = computeExpensivePath(targetId);
+    pathCache[targetId] = path;
+    return path;
+}
+```
+
+---
+
+## When NOT to Optimize
+
+**Don't optimize if:**
+- Performance is already acceptable (< 16ms frame time)
+- Optimization makes code significantly more complex
+- You haven't profiled yet (no guessing!)
+- The bottleneck is elsewhere (optimize hot paths first)
+
+**Premature optimization is the root of all evil** - Donald Knuth
+
+Focus on:
+1. Correctness first
+2. Profile to find real bottlenecks
+3. Optimize hot paths only
+4. Keep code maintainable
+
+---
+
+## Related Documentation
+
+- [[Performance-and-Profiling]] - Profiling tools reference
+- [[Grid-System]] - Grid optimization opportunities
+- [[Writing-Tests]] - Creating performance tests
+
+**Open Issues:**
+- [#115](../../issues/115) - SpatialHash Implementation
+- [#116](../../issues/116) - Dirty Flag System
+- [#113](../../issues/113) - Batch Operations for Grid
+- [#117](../../issues/117) - Memory Pool for Entities