1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
     ↑                                                         |
     └─────────────────────────────────────────────────────────┘

Step 1: Profile - Find the Bottleneck

Using F3 Overlay

Start the game and press F3:

Look for:

Red frame times (>33ms) - Unacceptable performance
Yellow frame times (16-33ms) - Marginal performance
High subsystem times - Which system is slow?
- Grid rendering > 10ms? → Grid optimization needed
- Entity rendering > 5ms? → Entity culling/SpatialHash needed
- Python script time > 5ms? → Python callback optimization

Example profiler output:

Frame: 45.2ms (RED)
FPS: 22

Grid Render: 32.1ms  ← BOTTLENECK!
Entity Render: 8.5ms
Python Script: 2.1ms
Animation: 1.2ms

Cells Rendered: 10000
Entities: 150 (visible: 50)

Analysis: Grid rendering is the bottleneck (32ms of 45ms total).

Running Benchmarks

Static Grid Benchmark:

cd build
./mcrogueface --headless --exec ../tests/benchmark_static_grid.py

Moving Entities Benchmark:

./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py

Output: Baseline performance metrics to measure improvement against.

Step 2: Identify - Understand the Problem

Common Performance Issues

Issue: High Grid Render Time on Static Screens

Symptom: 20-40ms grid render, nothing changing
Cause: Redrawing unchanged cells every frame
Solution: Implement dirty flag system (#116)

Issue: High Entity Render Time with Many Entities

Symptom: 10-20ms entity render with 500+ entities
Cause: O(n) iteration, no spatial indexing
Solution: Implement SpatialHash (#115)

Issue: Slow Bulk Grid Updates from Python

Symptom: Frame drops when updating many cells
Cause: Python/C++ boundary crossings for each cell
Solution: Implement batch operations (#113)

Issue: High Python Script Time

Symptom: 10-50ms in Python callbacks
Cause: Heavy computation in Python update loops
Solution: Move hot paths to C++ or optimize Python

Analyzing Call Stacks

When F3 overlay isn't enough:

# Build with debug symbols
make clean
cmake .. -DCMAKE_BUILD_TYPE=Debug
make

# Profile with gdb
gdb ./mcrogueface
(gdb) run
<trigger slow behavior>
(gdb) info threads
(gdb) bt  # Backtrace

Step 3: Instrument - Measure Precisely

Adding ScopedTimer

Identify the slow function and wrap it:

#include "Profiler.h"

void MySystem::slowFunction() {
    // Add timer
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    
    // Your slow code here
    for (auto& item : items) {
        item->process();
    }
}

Adding Custom Metrics

1. Add metric field to ProfilingMetrics:

src/GameEngine.h:

struct ProfilingMetrics {
    // ... existing fields ...
    float mySystemTime = 0.0f;  // Add this
};

2. Reset in resetPerFrame():

void ProfilingMetrics::resetPerFrame() {
    // ... existing resets ...
    mySystemTime = 0.0f;  // Add this
}

3. Display in ProfilerOverlay:

src/ProfilerOverlay.cpp:

void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
    // ... existing formatting ...
    ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
}

4. Rebuild and test:

make
cd build
./mcrogueface
# Press F3 - see your metric!

Creating Benchmarks

Create tests/benchmark_mysystem.py:

import mcrfpy
import sys
import time

def benchmark():
    # Setup
    mcrfpy.createScene("bench")
    # ... create test scenario ...
    
    frame_times = []
    
    def measure(runtime_ms):
        frame_times.append(runtime_ms)
        if len(frame_times) >= 300:  # 5 seconds at 60fps
            # Report statistics
            avg = sum(frame_times) / len(frame_times)
            min_time = min(frame_times)
            max_time = max(frame_times)
            
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min_time:.2f}ms")
            print(f"Max: {max_time:.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            
            sys.exit(0)
    
    mcrfpy.setTimer("benchmark", measure, 16)  # Every frame

benchmark()

Run:

./mcrogueface --headless --exec ../tests/benchmark_mysystem.py

Step 4: Optimize - Make It Faster

Optimization Strategies

Strategy 1: Reduce Work

Example: Grid Dirty Flags

Before: Redraw all cells every frame

for (int x = 0; x < grid_x; x++) {
    for (int y = 0; y < grid_y; y++) {
        renderCell(x, y);  // Always renders
    }
}

After: Only redraw when changed

if (grid_dirty) {
    for (int x = 0; x < grid_x; x++) {
        for (int y = 0; y < grid_y; y++) {
            renderCell(x, y);
        }
    }
    grid_dirty = false;
}

Expected: 10-50x improvement for static scenes

Strategy 2: Reduce Complexity

Example: SpatialHash for Entity Queries

Before: O(n) search through all entities

for (auto& entity : entities) {
    if (distanceTo(entity, target) < radius) {
        nearby.push_back(entity);
    }
}

After: O(1) hash lookup

auto cell = spatialHash.getCell(target.x, target.y);
for (auto& entity : cell.entities) {
    nearby.push_back(entity);
}

Expected: 100x+ improvement for large entity counts

Strategy 3: Batch Operations

Example: Grid Batch Updates

Before: Multiple Python/C++ crossings

for x in range(100):
    for y in range(100):
        grid.at((x, y)).tilesprite = 42  # 10,000 calls!

After: Single batch operation

grid.fill_rect(0, 0, 100, 100, 42)  # 1 call!

Expected: 10-100x improvement for bulk updates

Strategy 4: Cache Results

Example: Path Caching in Entities

Before: Recompute path every frame

void Entity::update() {
    path = computePathTo(target);  // Expensive!
    followPath(path);
}

After: Cache and reuse

void Entity::update() {
    if (!cachedPath || targetMoved) {
        cachedPath = computePathTo(target);
    }
    followPath(cachedPath);
}

Expected: 10x+ improvement for pathfinding-heavy scenarios

Optimization Checklist

Before optimizing:

Profiled and identified real bottleneck
Measured baseline performance
Understood root cause

During optimization:

Changed only one thing at a time
Kept original code for comparison
Added comments explaining optimization

After optimization:

Measured improvement
Verified correctness (no bugs introduced)
Updated tests if needed

Step 5: Verify - Measure Improvement

Re-run Benchmarks

Before optimization:

Average: 45.2ms
Min: 38.1ms
Max: 62.3ms
FPS: 22.1

After optimization:

Average: 8.5ms  ← 5.3x improvement!
Min: 7.2ms
Max: 12.1ms
FPS: 117.6

Check Correctness

Visual testing:

Run game normally (not headless)
Verify visual output unchanged
Test edge cases (empty grids, max entities, etc.)

Automated testing:

# Run existing test suite
./mcrogueface --headless --exec tests/test_grid_operations.py
./mcrogueface --headless --exec tests/test_entity_movement.py

Document Results

Create issue comment:

## Performance Optimization Results

**Issue:** #116 (Dirty Flag System)

**Baseline:**
- Static grid (100x100): 32.1ms average
- FPS: 22

**After Optimization:**
- Static grid (100x100): 0.8ms average
- FPS: 118

**Improvement:** 40x faster for static scenes

**Test:** `tests/benchmark_static_grid.py`

**Commit:** abc123def

Common Optimization Patterns

Pattern 1: Early Exit

// Check cheap conditions first
if (!visible) return;
if (opacity <= 0.0f) return;
if (!inViewport(bounds)) return;

// Then do expensive work
render();

Pattern 2: Lazy Evaluation

// Don't compute until needed
mutable bool fovComputed = false;
mutable std::vector<Point> visibleCells;

std::vector<Point>& getVisibleCells() {
    if (!fovComputed) {
        visibleCells = computeFOV();
        fovComputed = true;
    }
    return visibleCells;
}

Pattern 3: Object Pooling

// Reuse instead of allocate/deallocate
class EntityPool {
    std::vector<Entity> pool;
    std::vector<bool> active;
    
public:
    Entity* spawn() {
        for (size_t i = 0; i < pool.size(); i++) {
            if (!active[i]) {
                active[i] = true;
                return &pool[i];
            }
        }
        // Grow pool if needed
        pool.emplace_back();
        active.push_back(true);
        return &pool.back();
    }
};

Pattern 4: Space-Time Tradeoff

// Cache expensive computation
std::unordered_map<int, std::vector<Point>> pathCache;

std::vector<Point> getPathTo(int targetId) {
    if (pathCache.contains(targetId)) {
        return pathCache[targetId];  // O(1) lookup
    }
    
    auto path = computeExpensivePath(targetId);
    pathCache[targetId] = path;
    return path;
}

When NOT to Optimize

Don't optimize if:

Performance is already acceptable (< 16ms frame time)
Optimization makes code significantly more complex
You haven't profiled yet (no guessing!)
The bottleneck is elsewhere (optimize hot paths first)

Premature optimization is the root of all evil - Donald Knuth

Focus on:

Correctness first
Profile to find real bottlenecks
Optimize hot paths only
Keep code maintainable

Performance-and-Profiling - Profiling tools reference
Grid-System - Grid optimization opportunities
Writing-Tests - Creating performance tests

Open Issues:

#115 - SpatialHash Implementation
#116 - Dirty Flag System
#113 - Batch Operations for Grid
#117 - Memory Pool for Entities

Performance Optimization Workflow

Quick Reference

The Optimization Cycle