Performance and Profiling
Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.
Quick Reference
Related Issues:
- #104 - Basic Profiling/Metrics (Closed - Implemented)
- #148 - Dirty Flag RenderTexture Caching (Closed - Implemented)
- #123 - Chunk-based Grid Rendering (Closed - Implemented)
- #115 - SpatialHash Implementation (Open - Tier 1)
- #113 - Batch Operations for Grid (Open - Tier 1)
- #117 - Memory Pool for Entities (Open - Tier 1)
Key Files:
- src/Profiler.h - ScopedTimer RAII helper
- src/ProfilerOverlay.cpp - F3 overlay visualization
- src/GameEngine.h - ProfilingMetrics struct
Benchmark API
The benchmark API captures detailed per-frame timing data to JSON files. C++ handles all timing responsibility; Python processes results afterward.
Basic Usage
```python
import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
# e.g., "benchmark_12345_20250528_143022.json"
```
Adding Log Messages
Mark specific events within the benchmark:
```python
mcrfpy.start_benchmark()

# Your code...
mcrfpy.log_benchmark("Player spawned")

# More code...
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()
```
Log messages appear in the logs array of the frame during which they were recorded.
Output Format
The JSON file contains per-frame data:
```json
{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}
```
Processing Results
Since Python processes results after capture, timing overhead doesn't affect measurements:
```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    frames = data["frames"]
    slow_frames = [f for f in frames if f["frame_time_ms"] > 16.67]

    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```
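Averages can hide stutter: a run with a good mean but frequent spikes still feels bad. A sketch of a percentile analysis (nearest-rank method) over the same frame records, assuming the JSON format shown above; the function name is our own, not part of the mcrfpy API:

```python
def frame_time_percentiles(frames, percentiles=(50, 95, 99)):
    """Compute frame-time percentiles from benchmark frame records."""
    times = sorted(f["frame_time_ms"] for f in frames)
    n = len(times)
    results = {}
    for p in percentiles:
        # Nearest-rank percentile: index of the p-th percentile sample
        idx = min(n - 1, max(0, round(p / 100 * n) - 1))
        results[f"p{p}"] = times[idx]
    return results

# Synthetic example: 95 fast frames plus 5 spikes
frames = [{"frame_time_ms": 10.0}] * 95 + [{"frame_time_ms": 30.0}] * 5
print(frame_time_percentiles(frames))  # → {'p50': 10.0, 'p95': 10.0, 'p99': 30.0}
```

Here p95 looks fine but p99 exposes the spikes, which is typically where to start digging with the per-frame logs.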
F3 Profiler Overlay
Activation: Press F3 during gameplay
Displays:
- Frame time (ms) with color coding:
- Green: < 16ms (60+ FPS)
- Yellow: 16-33ms (30-60 FPS)
- Red: > 33ms (< 30 FPS)
- FPS (averaged over 60 frames)
- Detailed breakdowns:
- Grid rendering time
- Entity rendering time
- Python script time
- Animation update time
- Per-frame counts:
- Grid cells rendered
- Entities rendered (visible/total)
- Draw calls
Implementation: src/ProfilerOverlay.cpp
Current Performance
Implemented Optimizations
Chunk-based Rendering (#123):
- Large grids divided into chunks (~256 cells each)
- Only visible chunks processed
- 1000x1000+ grids render efficiently
Dirty Flag Caching (#148):
- Layers track changes per-chunk
- Unchanged chunks reuse cached RenderTexture
- Static scenes: near-zero CPU cost after initial render
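The dirty-flag pattern behind #148 can be illustrated with a minimal Python sketch; this stands in for a chunk's cached RenderTexture and is not the engine's actual C++ implementation:

```python
class CachedChunk:
    """Dirty-flag caching sketch: the expensive render only re-runs
    when the chunk has been marked dirty since the last draw."""

    def __init__(self):
        self.dirty = True      # new chunks must render at least once
        self.cache = None
        self.render_count = 0  # tracks how often the expensive path runs

    def mark_dirty(self):
        self.dirty = True

    def draw(self, cells):
        if self.dirty:
            # Expensive path: re-render every cell into the cache
            self.cache = [c.upper() for c in cells]
            self.render_count += 1
            self.dirty = False
        return self.cache      # static scenes reuse this for free

chunk = CachedChunk()
chunk.draw(["a", "b"])   # renders
chunk.draw(["a", "b"])   # cache hit, no re-render
chunk.mark_dirty()
chunk.draw(["a", "c"])   # renders again
print(chunk.render_count)  # → 2
```

The per-chunk granularity matters: editing one cell dirties one chunk, so the rest of a large grid keeps its cached textures.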
Viewport Culling:
- Only cells/entities within viewport processed
- Camera position and zoom respected
Current Bottlenecks
Entity Spatial Queries - O(n) iteration:
- Finding entities at position requires checking all entities
- Becomes noticeable at 500+ entities
- Solution: #115 SpatialHash
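The idea behind #115 can be sketched in Python: bucket entities by grid cell so a position query checks one bucket instead of iterating every entity (O(1) average instead of O(n)). This is illustrative only; the real implementation would live in C++:

```python
from collections import defaultdict

class SpatialHash:
    """Sketch of a spatial hash: entities bucketed by grid cell."""

    def __init__(self, cell_size=1):
        self.cell_size = cell_size
        self.buckets = defaultdict(set)

    def _key(self, x, y):
        return (x // self.cell_size, y // self.cell_size)

    def insert(self, entity, x, y):
        self.buckets[self._key(x, y)].add(entity)

    def remove(self, entity, x, y):
        self.buckets[self._key(x, y)].discard(entity)

    def at(self, x, y):
        # Only entities in the matching bucket are candidates
        return set(self.buckets[self._key(x, y)])

grid = SpatialHash()
grid.insert("goblin", 4, 7)
grid.insert("orc", 4, 7)
grid.insert("rat", 9, 2)
print(sorted(grid.at(4, 7)))  # → ['goblin', 'orc']
```

The main bookkeeping cost is that moving an entity requires a remove at the old cell and an insert at the new one.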
Bulk Grid Updates - Python/C++ boundary:
- Many individual layer.set() calls are slower than batch operations
- Each call crosses the Python/C++ boundary
- Solution: #113 Batch Operations
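Since #113 is not yet implemented, here is a stand-in sketch of why batching helps: a fake layer counts simulated boundary crossings, comparing per-cell calls against a single batched call. Both set() and the batch method here are hypothetical models, not the real engine API:

```python
class FakeLayer:
    """Stand-in layer counting simulated Python/C++ boundary crossings."""

    def __init__(self):
        self.cells = {}
        self.crossings = 0

    def set(self, x, y, value):
        self.crossings += 1          # one crossing per cell
        self.cells[(x, y)] = value

    def set_batch(self, updates):
        self.crossings += 1          # one crossing for the whole batch
        self.cells.update(updates)

layer = FakeLayer()
for x in range(100):
    layer.set(x, 0, "#")             # 100 crossings
per_cell = layer.crossings

layer = FakeLayer()
layer.set_batch({(x, 0): "#" for x in range(100)})
print(per_cell, layer.crossings)     # → 100 1
```

The fixed per-call overhead (argument conversion, GIL bookkeeping) is what a batch API amortizes over many cells.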
Entity Allocation - Memory fragmentation:
- Frequent spawn/destroy cycles fragment memory
- Solution: #117 Memory Pool
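A free-list pool, the idea behind #117, can be sketched in a few lines of Python; in practice this would be a C++ allocator, and the entity dict here is just a placeholder object:

```python
class EntityPool:
    """Free-list pool sketch: destroyed entities are recycled on the
    next spawn instead of allocating fresh memory each time."""

    def __init__(self):
        self.free = []
        self.allocations = 0  # counts real allocations, not reuses

    def spawn(self):
        if self.free:
            return self.free.pop()   # reuse a destroyed entity's slot
        self.allocations += 1
        return {"active": True}      # stand-in for a new entity object

    def destroy(self, entity):
        entity["active"] = False
        self.free.append(entity)

pool = EntityPool()
wave = [pool.spawn() for _ in range(3)]
for e in wave:
    pool.destroy(e)
wave2 = [pool.spawn() for _ in range(3)]  # all recycled, no new allocations
print(pool.allocations)  # → 3
```

Because allocations level off at the high-water mark of live entities, spawn/destroy churn stops fragmenting the heap.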
Optimization Workflow
1. Profile: Press F3, identify which metric is high
2. Benchmark: Use start_benchmark() to capture detailed data
3. Analyze: Process the JSON to find patterns in slow frames
4. Optimize: Make targeted changes
5. Verify: Re-run the benchmark, compare results
6. Iterate: Repeat until acceptable performance
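The verify step comes down to comparing two benchmark files. A small helper for that, assuming the summary format shown earlier (the function name is our own):

```python
def compare_runs(before, after):
    """Percentage change in average frame time between two benchmark
    result dicts; negative means the optimization helped."""
    b = before["summary"]["avg_frame_time_ms"]
    a = after["summary"]["avg_frame_time_ms"]
    return (a - b) / b * 100

before = {"summary": {"avg_frame_time_ms": 20.0}}
after = {"summary": {"avg_frame_time_ms": 15.0}}
print(f"{compare_runs(before, after):+.1f}%")  # → -25.0%
```

For noisy workloads, compare p95/p99 as well as the average; a change that improves the mean but worsens the tail is usually not a win.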
Performance Targets
| Metric | Target | Notes |
|---|---|---|
| Frame time | < 16.67ms | 60 FPS |
| Grid render | < 5ms | For typical game grids |
| Entity render | < 2ms | For < 200 entities |
| Python callbacks | < 2ms | Keep logic light |
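The targets in the table map directly onto the per-frame metric names in the benchmark JSON, so a benchmark run can be checked against them automatically. A sketch (the helper and the target constants are ours, derived from the table above):

```python
# Targets from the table above, in milliseconds
TARGETS_MS = {
    "frame_time_ms": 16.67,
    "grid_render_time_ms": 5.0,
    "entity_render_time_ms": 2.0,
    "python_time_ms": 2.0,
}

def over_budget(frame):
    """Return the metrics in a frame record that exceed their target."""
    return {k: v for k, v in frame.items()
            if k in TARGETS_MS and v > TARGETS_MS[k]}

frame = {"frame_number": 42, "frame_time_ms": 18.0,
         "grid_render_time_ms": 4.1, "python_time_ms": 9.2}
print(over_budget(frame))  # → {'frame_time_ms': 18.0, 'python_time_ms': 9.2}
```

Running this over every frame in a capture quickly shows which budget is blown most often.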
C++ Instrumentation
ScopedTimer (RAII)
Automatic timing for code blocks:
```cpp
#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}
```
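The same scope-based pattern can be handy on the Python side for timing sections of game-logic callbacks. A sketch using a context manager; this is not part of the mcrfpy API, just an analog of ScopedTimer:

```python
import time
from contextlib import contextmanager

@contextmanager
def scoped_timer(metrics, key):
    """Accumulate the elapsed time of a with-block, in ms, into a dict.

    Mirrors ScopedTimer: timing starts on entry and is recorded when
    the block exits, even if it raises.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        metrics[key] = metrics.get(key, 0.0) + elapsed_ms

metrics = {}
with scoped_timer(metrics, "ai_update_ms"):
    sum(range(10000))  # stand-in for expensive game logic
print(metrics)
```

Accumulating (rather than overwriting) matches the engine convention of per-frame totals that are reset between frames.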
Adding New Metrics
1. Add a field to ProfilingMetrics in src/GameEngine.h
2. Reset it in resetPerFrame() if it is a per-frame counter
3. Display it in src/ProfilerOverlay.cpp::update()
4. Instrument the code with ScopedTimer
Related Systems
- Grid-Rendering-Pipeline - Chunk caching and dirty flags
- Entity-Management - Entity performance considerations
- Writing-Tests - Performance test creation
Last updated: 2025-11-29