Table of Contents
- Performance Optimization Workflow
- Quick Reference
- The Optimization Cycle
- Step 1: Profile - Find the Bottleneck
- Step 2: Identify - Understand the Problem
- Step 3: Instrument - Measure Precisely
- Step 4: Optimize - Make It Faster
- Optimization Strategies
- Strategy 1: Reduce Work
- Strategy 2: Reduce Complexity
- Strategy 3: Batch Operations
- Strategy 4: Cache Results
- Optimization Checklist
- Step 5: Verify - Measure Improvement
- Common Optimization Patterns
- Pattern 1: Early Exit
- Pattern 2: Lazy Evaluation
- Pattern 3: Object Pooling
- Pattern 4: Space-Time Tradeoff
- When NOT to Optimize
- Related Documentation
Performance Optimization Workflow
Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
Quick Reference
Related Systems: Performance-and-Profiling, Grid-System
Tools:
- F3 profiler overlay (in-game)
src/Profiler.h- ScopedTimertests/benchmark_*.py- Performance benchmarks
The Optimization Cycle
1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
↑ |
└─────────────────────────────────────────────────────────┘
Step 1: Profile - Find the Bottleneck
Using F3 Overlay
Start the game and press F3:
Look for:
- Red frame times (>33ms) - Unacceptable performance
- Yellow frame times (16-33ms) - Marginal performance
- High subsystem times - Which system is slow?
- Grid rendering > 10ms? → Grid optimization needed
- Entity rendering > 5ms? → Entity culling/SpatialHash needed
- Python script time > 5ms? → Python callback optimization
Example profiler output:
Frame: 45.2ms (RED)
FPS: 22
Grid Render: 32.1ms ← BOTTLENECK!
Entity Render: 8.5ms
Python Script: 2.1ms
Animation: 1.2ms
Cells Rendered: 10000
Entities: 150 (visible: 50)
Analysis: Grid rendering is the bottleneck (32ms of 45ms total).
Running Benchmarks
Static Grid Benchmark:
cd build
./mcrogueface --headless --exec ../tests/benchmark_static_grid.py
Moving Entities Benchmark:
./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py
Output: Baseline performance metrics to measure improvement against.
Step 2: Identify - Understand the Problem
Common Performance Issues
Issue: High Grid Render Time on Static Screens
- Symptom: 20-40ms grid render, nothing changing
- Cause: Redrawing unchanged cells every frame
- Solution: Implement dirty flag system (#116)
Issue: High Entity Render Time with Many Entities
- Symptom: 10-20ms entity render with 500+ entities
- Cause: O(n) iteration, no spatial indexing
- Solution: Implement SpatialHash (#115)
Issue: Slow Bulk Grid Updates from Python
- Symptom: Frame drops when updating many cells
- Cause: Python/C++ boundary crossings for each cell
- Solution: Implement batch operations (#113)
Issue: High Python Script Time
- Symptom: 10-50ms in Python callbacks
- Cause: Heavy computation in Python update loops
- Solution: Move hot paths to C++ or optimize Python
Analyzing Call Stacks
When F3 overlay isn't enough:
# Build with debug symbols
make clean
cmake .. -DCMAKE_BUILD_TYPE=Debug
make
# Profile with gdb
gdb ./mcrogueface
(gdb) run
<trigger slow behavior>
(gdb) info threads
(gdb) bt # Backtrace
Step 3: Instrument - Measure Precisely
Adding ScopedTimer
Identify the slow function and wrap it:
#include "Profiler.h"
void MySystem::slowFunction() {
// Add timer
ScopedTimer timer(Resources::game->metrics.mySystemTime);
// Your slow code here
for (auto& item : items) {
item->process();
}
}
Adding Custom Metrics
1. Add metric field to ProfilingMetrics:
src/GameEngine.h:
struct ProfilingMetrics {
// ... existing fields ...
float mySystemTime = 0.0f; // Add this
};
2. Reset in resetPerFrame():
void ProfilingMetrics::resetPerFrame() {
// ... existing resets ...
mySystemTime = 0.0f; // Add this
}
3. Display in ProfilerOverlay:
src/ProfilerOverlay.cpp:
void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
// ... existing formatting ...
ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
}
4. Rebuild and test:
make
cd build
./mcrogueface
# Press F3 - see your metric!
Creating Benchmarks
Create tests/benchmark_mysystem.py:
import mcrfpy
import sys
import time
def benchmark():
# Setup
mcrfpy.createScene("bench")
# ... create test scenario ...
frame_times = []
def measure(runtime_ms):
frame_times.append(runtime_ms)
if len(frame_times) >= 300: # 5 seconds at 60fps
# Report statistics
avg = sum(frame_times) / len(frame_times)
min_time = min(frame_times)
max_time = max(frame_times)
print(f"Average: {avg:.2f}ms")
print(f"Min: {min_time:.2f}ms")
print(f"Max: {max_time:.2f}ms")
print(f"FPS: {1000/avg:.1f}")
sys.exit(0)
mcrfpy.setTimer("benchmark", measure, 16) # Every frame
benchmark()
Run:
./mcrogueface --headless --exec ../tests/benchmark_mysystem.py
Step 4: Optimize - Make It Faster
Optimization Strategies
Strategy 1: Reduce Work
Example: Grid Dirty Flags
Before: Redraw all cells every frame
for (int x = 0; x < grid_x; x++) {
for (int y = 0; y < grid_y; y++) {
renderCell(x, y); // Always renders
}
}
After: Only redraw when changed
if (grid_dirty) {
for (int x = 0; x < grid_x; x++) {
for (int y = 0; y < grid_y; y++) {
renderCell(x, y);
}
}
grid_dirty = false;
}
Expected: 10-50x improvement for static scenes
Strategy 2: Reduce Complexity
Example: SpatialHash for Entity Queries
Before: O(n) search through all entities
for (auto& entity : entities) {
if (distanceTo(entity, target) < radius) {
nearby.push_back(entity);
}
}
After: O(1) hash lookup
auto cell = spatialHash.getCell(target.x, target.y);
for (auto& entity : cell.entities) {
nearby.push_back(entity);
}
Expected: 100x+ improvement for large entity counts
Strategy 3: Batch Operations
Example: Grid Batch Updates
Before: Multiple Python/C++ crossings
for x in range(100):
for y in range(100):
grid.at((x, y)).tilesprite = 42 # 10,000 calls!
After: Single batch operation
grid.fill_rect(0, 0, 100, 100, 42) # 1 call!
Expected: 10-100x improvement for bulk updates
Strategy 4: Cache Results
Example: Path Caching in Entities
Before: Recompute path every frame
void Entity::update() {
path = computePathTo(target); // Expensive!
followPath(path);
}
After: Cache and reuse
void Entity::update() {
if (!cachedPath || targetMoved) {
cachedPath = computePathTo(target);
}
followPath(cachedPath);
}
Expected: 10x+ improvement for pathfinding-heavy scenarios
Optimization Checklist
Before optimizing:
- Profiled and identified real bottleneck
- Measured baseline performance
- Understood root cause
During optimization:
- Changed only one thing at a time
- Kept original code for comparison
- Added comments explaining optimization
After optimization:
- Measured improvement
- Verified correctness (no bugs introduced)
- Updated tests if needed
Step 5: Verify - Measure Improvement
Re-run Benchmarks
Before optimization:
Average: 45.2ms
Min: 38.1ms
Max: 62.3ms
FPS: 22.1
After optimization:
Average: 8.5ms ← 5.3x improvement!
Min: 7.2ms
Max: 12.1ms
FPS: 117.6
Check Correctness
Visual testing:
- Run game normally (not headless)
- Verify visual output unchanged
- Test edge cases (empty grids, max entities, etc.)
Automated testing:
# Run existing test suite
./mcrogueface --headless --exec tests/test_grid_operations.py
./mcrogueface --headless --exec tests/test_entity_movement.py
Document Results
Create issue comment:
## Performance Optimization Results
**Issue:** #116 (Dirty Flag System)
**Baseline:**
- Static grid (100x100): 32.1ms average
- FPS: 22
**After Optimization:**
- Static grid (100x100): 0.8ms average
- FPS: 118
**Improvement:** 40x faster for static scenes
**Test:** `tests/benchmark_static_grid.py`
**Commit:** abc123def
Common Optimization Patterns
Pattern 1: Early Exit
// Check cheap conditions first
if (!visible) return;
if (opacity <= 0.0f) return;
if (!inViewport(bounds)) return;
// Then do expensive work
render();
Pattern 2: Lazy Evaluation
// Don't compute until needed
mutable bool fovComputed = false;
mutable std::vector<Point> visibleCells;
std::vector<Point>& getVisibleCells() {
if (!fovComputed) {
visibleCells = computeFOV();
fovComputed = true;
}
return visibleCells;
}
Pattern 3: Object Pooling
// Reuse instead of allocate/deallocate
class EntityPool {
std::vector<Entity> pool;
std::vector<bool> active;
public:
Entity* spawn() {
for (size_t i = 0; i < pool.size(); i++) {
if (!active[i]) {
active[i] = true;
return &pool[i];
}
}
// Grow pool if needed
pool.emplace_back();
active.push_back(true);
return &pool.back();
}
};
Pattern 4: Space-Time Tradeoff
// Cache expensive computation
std::unordered_map<int, std::vector<Point>> pathCache;
std::vector<Point> getPathTo(int targetId) {
if (pathCache.contains(targetId)) {
return pathCache[targetId]; // O(1) lookup
}
auto path = computeExpensivePath(targetId);
pathCache[targetId] = path;
return path;
}
When NOT to Optimize
Don't optimize if:
- Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere (optimize hot paths first)
Premature optimization is the root of all evil - Donald Knuth
Focus on:
- Correctness first
- Profile to find real bottlenecks
- Optimize hot paths only
- Keep code maintainable
Related Documentation
- Performance-and-Profiling - Profiling tools reference
- Grid-System - Grid optimization opportunities
- Writing-Tests - Creating performance tests
Open Issues: