Performance and Profiling

Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.

Quick Reference

Related Issues:

  • #104 - Basic Profiling/Metrics (Closed - Implemented)
  • #148 - Dirty Flag RenderTexture Caching (Closed - Implemented)
  • #123 - Chunk-based Grid Rendering (Closed - Implemented)
  • #115 - SpatialHash Implementation (Open - Tier 1)
  • #113 - Batch Operations for Grid (Open - Tier 1)
  • #117 - Memory Pool for Entities (Open - Tier 1)

Key Files:

  • src/Profiler.h - ScopedTimer RAII helper
  • src/ProfilerOverlay.cpp - F3 overlay visualization
  • src/GameEngine.h - ProfilingMetrics struct

Benchmark API

The benchmark API captures detailed per-frame timing data to JSON files. All timing is done in C++; Python only processes the results after the capture ends.

Basic Usage

import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
# e.g., "benchmark_12345_20250528_143022.json"

Adding Log Messages

Mark specific events within the benchmark:

mcrfpy.start_benchmark()

# Your code...
mcrfpy.log_benchmark("Player spawned")

# More code...
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()

Log messages appear in the logs array of each frame in the output JSON.

Output Format

The JSON file contains per-frame data:

{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}

Processing Results

Since Python processes results after capture, timing overhead doesn't affect measurements:

import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    frames = data["frames"]
    slow_frames = [frame for frame in frames if frame["frame_time_ms"] > 16.67]

    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")

F3 Profiler Overlay

Activation: Press F3 during gameplay

Displays:

  • Frame time (ms) with color coding:
    • Green: < 16ms (60+ FPS)
    • Yellow: 16-33ms (30-60 FPS)
    • Red: > 33ms (< 30 FPS)
  • FPS (averaged over 60 frames)
  • Detailed breakdowns:
    • Grid rendering time
    • Entity rendering time
    • Python script time
    • Animation update time
  • Per-frame counts:
    • Grid cells rendered
    • Entities rendered (visible/total)
    • Draw calls

Implementation: src/ProfilerOverlay.cpp


Current Performance

Implemented Optimizations

Chunk-based Rendering (#123):

  • Large grids divided into chunks (~256 cells each)
  • Only visible chunks processed
  • 1000x1000+ grids render efficiently

Dirty Flag Caching (#148):

  • Layers track changes per-chunk
  • Unchanged chunks reuse cached RenderTexture
  • Static scenes: near-zero CPU cost after initial render

Viewport Culling:

  • Only cells/entities within viewport processed
  • Camera position and zoom respected
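
These optimizations are easy to confirm with the benchmark API described above: in a static scene, frame and grid-render times should stay flat after the first few frames. A minimal sketch using only the documented calls (the scene is assumed to be set up beforehand and left untouched during the capture):

import json
import mcrfpy

mcrfpy.start_benchmark()
mcrfpy.log_benchmark("static scene - no layer or entity changes")
# ... let the scene run for a while without modifying anything ...
filename = mcrfpy.end_benchmark()

with open(filename) as f:
    summary = json.load(f)["summary"]
print(f"static scene avg: {summary['avg_frame_time_ms']:.2f}ms "
      f"(max {summary['max_frame_time_ms']:.2f}ms)")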

Current Bottlenecks

Entity Spatial Queries - O(n) iteration:

  • Finding entities at position requires checking all entities
  • Becomes noticeable at 500+ entities
  • Solution: #115 SpatialHash
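
The costly pattern is any per-frame query that walks the full entity list. A hypothetical illustration of what degrades linearly (names like grid.entities and e.x/e.y are placeholders, not the exact mcrfpy API):

def entities_at(grid, x, y):
    # O(n): every lookup scans every entity
    return [e for e in grid.entities if e.x == x and e.y == y]

# Doing this once per entity per frame is O(n^2) -
# with 500+ entities the scans dominate the frame time
blocked = [e for e in grid.entities if entities_at(grid, e.x, e.y - 1)]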

Bulk Grid Updates - Python/C++ boundary:

  • Many individual layer.set() calls are slower than batch operations
  • Each call crosses the Python/C++ boundary
  • Solution: #113 Batch Operations
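
Until #113 lands, the mitigation is to minimize boundary crossings: compute changes in Python and apply them in as few calls as possible. A hedged sketch of the slow pattern (the layer.set(x, y, tile) signature is assumed from the description above; the batched call is hypothetical and only shows the shape #113 is aiming for):

WIDTH, HEIGHT, FLOOR = 100, 100, 0

# 'layer' is a grid layer object obtained from your mcrfpy grid
# Slow today: one Python -> C++ crossing per cell (10,000 calls)
for y in range(HEIGHT):
    for x in range(WIDTH):
        layer.set(x, y, FLOOR)

# Hypothetical batched form once #113 exists: one crossing total
# layer.set_rect(0, 0, WIDTH, HEIGHT, FLOOR)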

Entity Allocation - Memory fragmentation:

  • Frequent spawn/destroy cycles fragment memory
  • Solution: #117 Memory Pool

Optimization Workflow

  1. Profile: Press F3, identify which metric is high
  2. Benchmark: Use start_benchmark() to capture detailed data
  3. Analyze: Process JSON to find patterns in slow frames
  4. Optimize: Make targeted changes
  5. Verify: Re-run benchmark, compare results
  6. Iterate: Repeat until acceptable performance
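
Steps 2-5 are easy to script so that every run produces comparable numbers. A minimal sketch using only the benchmark API and the JSON summary described above (run_scenario stands in for whatever you want to exercise):

import json
import mcrfpy

def capture(label, scenario):
    """Capture one benchmark run and return its summary block."""
    mcrfpy.start_benchmark()
    mcrfpy.log_benchmark(label)
    scenario()
    filename = mcrfpy.end_benchmark()
    with open(filename) as f:
        return json.load(f)["summary"]

baseline = capture("baseline", run_scenario)
# ... apply the optimization, then re-run the same scenario ...
improved = capture("optimized", run_scenario)
print(f"avg frame time: {baseline['avg_frame_time_ms']:.2f}ms -> "
      f"{improved['avg_frame_time_ms']:.2f}ms")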

Performance Targets

Metric            Target      Notes
Frame time        < 16.67ms   60 FPS
Grid render       < 5ms       For typical game grids
Entity render     < 2ms       For < 200 entities
Python callbacks  < 2ms       Keep logic light
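
A capture can be checked against these targets directly; the field names below follow the JSON format shown earlier, and the thresholds mirror the table:

import json

TARGETS_MS = {
    "frame_time_ms": 16.67,        # 60 FPS
    "grid_render_time_ms": 5.0,    # typical game grids
    "entity_render_time_ms": 2.0,  # < 200 entities
    "python_time_ms": 2.0,         # keep logic light
}

def check_targets(filename):
    with open(filename) as fh:
        frames = json.load(fh)["frames"]
    for frame in frames:
        over = {k: v for k, v in frame.items()
                if k in TARGETS_MS and v > TARGETS_MS[k]}
        if over:
            print(f"frame {frame['frame_number']} over target: {over}")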

C++ Instrumentation

ScopedTimer (RAII)

Automatic timing for code blocks:

#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}

Adding New Metrics

  1. Add field to ProfilingMetrics in src/GameEngine.h
  2. Reset in resetPerFrame() if per-frame counter
  3. Display in src/ProfilerOverlay.cpp::update()
  4. Instrument code with ScopedTimer


Last updated: 2025-11-29