1 Performance-Optimization-Workflow
John McCardle edited this page 2025-10-25 22:37:10 +00:00

Performance Optimization Workflow

Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.

Quick Reference

Related Systems: Performance-and-Profiling, Grid-System

Tools:

  • F3 profiler overlay (in-game)
  • src/Profiler.h - ScopedTimer
  • tests/benchmark_*.py - Performance benchmarks

The Optimization Cycle

1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
     ↑                                                         |
     └─────────────────────────────────────────────────────────┘

Step 1: Profile - Find the Bottleneck

Using F3 Overlay

Start the game and press F3:

Look for:

  • Red frame times (>33ms) - Unacceptable performance
  • Yellow frame times (16-33ms) - Marginal performance
  • High subsystem times - Which system is slow?
    • Grid rendering > 10ms? → Grid optimization needed
    • Entity rendering > 5ms? → Entity culling/SpatialHash needed
    • Python script time > 5ms? → Python callback optimization

Example profiler output:

Frame: 45.2ms (RED)
FPS: 22

Grid Render: 32.1ms  ← BOTTLENECK!
Entity Render: 8.5ms
Python Script: 2.1ms
Animation: 1.2ms

Cells Rendered: 10000
Entities: 150 (visible: 50)

Analysis: Grid rendering is the bottleneck (32ms of 45ms total).

Running Benchmarks

Static Grid Benchmark:

cd build
./mcrogueface --headless --exec ../tests/benchmark_static_grid.py

Moving Entities Benchmark:

./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py

Output: Baseline performance metrics to measure improvement against.


Step 2: Identify - Understand the Problem

Common Performance Issues

Issue: High Grid Render Time on Static Screens

  • Symptom: 20-40ms grid render, nothing changing
  • Cause: Redrawing unchanged cells every frame
  • Solution: Implement dirty flag system (#116)

Issue: High Entity Render Time with Many Entities

  • Symptom: 10-20ms entity render with 500+ entities
  • Cause: O(n) iteration, no spatial indexing
  • Solution: Implement SpatialHash (#115)

Issue: Slow Bulk Grid Updates from Python

  • Symptom: Frame drops when updating many cells
  • Cause: Python/C++ boundary crossings for each cell
  • Solution: Implement batch operations (#113)

Issue: High Python Script Time

  • Symptom: 10-50ms in Python callbacks
  • Cause: Heavy computation in Python update loops
  • Solution: Move hot paths to C++ or optimize Python

Analyzing Call Stacks

When F3 overlay isn't enough:

# Build with debug symbols
make clean
cmake .. -DCMAKE_BUILD_TYPE=Debug
make

# Profile with gdb
gdb ./mcrogueface
(gdb) run
<trigger slow behavior>
(gdb) info threads
(gdb) bt  # Backtrace

Step 3: Instrument - Measure Precisely

Adding ScopedTimer

Identify the slow function and wrap it:

#include "Profiler.h"

void MySystem::slowFunction() {
    // Add timer
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    
    // Your slow code here
    for (auto& item : items) {
        item->process();
    }
}

Adding Custom Metrics

1. Add metric field to ProfilingMetrics:

src/GameEngine.h:

struct ProfilingMetrics {
    // ... existing fields ...
    float mySystemTime = 0.0f;  // Add this
};

2. Reset in resetPerFrame():

void ProfilingMetrics::resetPerFrame() {
    // ... existing resets ...
    mySystemTime = 0.0f;  // Add this
}

3. Display in ProfilerOverlay:

src/ProfilerOverlay.cpp:

void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
    // ... existing formatting ...
    ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
}

4. Rebuild and test:

make
cd build
./mcrogueface
# Press F3 - see your metric!

Creating Benchmarks

Create tests/benchmark_mysystem.py:

import mcrfpy
import sys
import time

def benchmark():
    # Setup
    mcrfpy.createScene("bench")
    # ... create test scenario ...
    
    frame_times = []
    
    def measure(runtime_ms):
        frame_times.append(runtime_ms)
        if len(frame_times) >= 300:  # 5 seconds at 60fps
            # Report statistics
            avg = sum(frame_times) / len(frame_times)
            min_time = min(frame_times)
            max_time = max(frame_times)
            
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min_time:.2f}ms")
            print(f"Max: {max_time:.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            
            sys.exit(0)
    
    mcrfpy.setTimer("benchmark", measure, 16)  # Every frame

benchmark()

Run:

./mcrogueface --headless --exec ../tests/benchmark_mysystem.py

Step 4: Optimize - Make It Faster

Optimization Strategies

Strategy 1: Reduce Work

Example: Grid Dirty Flags

Before: Redraw all cells every frame

for (int x = 0; x < grid_x; x++) {
    for (int y = 0; y < grid_y; y++) {
        renderCell(x, y);  // Always renders
    }
}

After: Only redraw when changed

if (grid_dirty) {
    for (int x = 0; x < grid_x; x++) {
        for (int y = 0; y < grid_y; y++) {
            renderCell(x, y);
        }
    }
    grid_dirty = false;
}

Expected: 10-50x improvement for static scenes

Strategy 2: Reduce Complexity

Example: SpatialHash for Entity Queries

Before: O(n) search through all entities

for (auto& entity : entities) {
    if (distanceTo(entity, target) < radius) {
        nearby.push_back(entity);
    }
}

After: O(1) hash lookup

auto cell = spatialHash.getCell(target.x, target.y);
for (auto& entity : cell.entities) {
    nearby.push_back(entity);
}

Expected: 100x+ improvement for large entity counts

Strategy 3: Batch Operations

Example: Grid Batch Updates

Before: Multiple Python/C++ crossings

for x in range(100):
    for y in range(100):
        grid.at((x, y)).tilesprite = 42  # 10,000 calls!

After: Single batch operation

grid.fill_rect(0, 0, 100, 100, 42)  # 1 call!

Expected: 10-100x improvement for bulk updates

Strategy 4: Cache Results

Example: Path Caching in Entities

Before: Recompute path every frame

void Entity::update() {
    path = computePathTo(target);  // Expensive!
    followPath(path);
}

After: Cache and reuse

void Entity::update() {
    if (!cachedPath || targetMoved) {
        cachedPath = computePathTo(target);
    }
    followPath(cachedPath);
}

Expected: 10x+ improvement for pathfinding-heavy scenarios

Optimization Checklist

Before optimizing:

  • Profiled and identified real bottleneck
  • Measured baseline performance
  • Understood root cause

During optimization:

  • Changed only one thing at a time
  • Kept original code for comparison
  • Added comments explaining optimization

After optimization:

  • Measured improvement
  • Verified correctness (no bugs introduced)
  • Updated tests if needed

Step 5: Verify - Measure Improvement

Re-run Benchmarks

Before optimization:

Average: 45.2ms
Min: 38.1ms
Max: 62.3ms
FPS: 22.1

After optimization:

Average: 8.5ms  ← 5.3x improvement!
Min: 7.2ms
Max: 12.1ms
FPS: 117.6

Check Correctness

Visual testing:

  1. Run game normally (not headless)
  2. Verify visual output unchanged
  3. Test edge cases (empty grids, max entities, etc.)

Automated testing:

# Run existing test suite
./mcrogueface --headless --exec tests/test_grid_operations.py
./mcrogueface --headless --exec tests/test_entity_movement.py

Document Results

Create issue comment:

## Performance Optimization Results

**Issue:** #116 (Dirty Flag System)

**Baseline:**
- Static grid (100x100): 32.1ms average
- FPS: 22

**After Optimization:**
- Static grid (100x100): 0.8ms average
- FPS: 118

**Improvement:** 40x faster for static scenes

**Test:** `tests/benchmark_static_grid.py`

**Commit:** abc123def

Common Optimization Patterns

Pattern 1: Early Exit

// Check cheap conditions first
if (!visible) return;
if (opacity <= 0.0f) return;
if (!inViewport(bounds)) return;

// Then do expensive work
render();

Pattern 2: Lazy Evaluation

// Don't compute until needed
mutable bool fovComputed = false;
mutable std::vector<Point> visibleCells;

std::vector<Point>& getVisibleCells() {
    if (!fovComputed) {
        visibleCells = computeFOV();
        fovComputed = true;
    }
    return visibleCells;
}

Pattern 3: Object Pooling

// Reuse instead of allocate/deallocate
class EntityPool {
    std::vector<Entity> pool;
    std::vector<bool> active;
    
public:
    Entity* spawn() {
        for (size_t i = 0; i < pool.size(); i++) {
            if (!active[i]) {
                active[i] = true;
                return &pool[i];
            }
        }
        // Grow pool if needed
        pool.emplace_back();
        active.push_back(true);
        return &pool.back();
    }
};

Pattern 4: Space-Time Tradeoff

// Cache expensive computation
std::unordered_map<int, std::vector<Point>> pathCache;

std::vector<Point> getPathTo(int targetId) {
    if (pathCache.contains(targetId)) {
        return pathCache[targetId];  // O(1) lookup
    }
    
    auto path = computeExpensivePath(targetId);
    pathCache[targetId] = path;
    return path;
}

When NOT to Optimize

Don't optimize if:

  • Performance is already acceptable (< 16ms frame time)
  • Optimization makes code significantly more complex
  • You haven't profiled yet (no guessing!)
  • The bottleneck is elsewhere (optimize hot paths first)

Premature optimization is the root of all evil - Donald Knuth

Focus on:

  1. Correctness first
  2. Profile to find real bottlenecks
  3. Optimize hot paths only
  4. Keep code maintainable

Open Issues:

  • #115 - SpatialHash Implementation
  • #116 - Dirty Flag System
  • #113 - Batch Operations for Grid
  • #117 - Memory Pool for Entities