Performance and Profiling

Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.

Quick Reference

Related Issues:

  • #104 - Basic Profiling/Metrics (Closed - Implemented)
  • #148 - Dirty Flag RenderTexture Caching (Closed - Implemented)
  • #123 - Chunk-based Grid Rendering (Closed - Implemented)
  • #115 - SpatialHash Implementation (Open - Tier 1)
  • #113 - Batch Operations for Grid (Open - Tier 1)
  • #117 - Memory Pool for Entities (Open - Tier 1)

Key Files:

  • src/Profiler.h - ScopedTimer RAII helper
  • src/ProfilerOverlay.cpp - F3 overlay visualization
  • src/GameEngine.h - ProfilingMetrics struct

Benchmark API

The benchmark API captures detailed per-frame timing data to JSON files. All timing is done in C++; Python only processes the results after the capture ends.

Basic Usage

import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
# e.g., "benchmark_12345_20250528_143022.json"

Adding Log Messages

Mark specific events within the benchmark:

mcrfpy.start_benchmark()

# Your code...
mcrfpy.log_benchmark("Player spawned")

# More code...
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()

Log messages appear in the logs array of each frame in the output JSON.

Output Format

The JSON file contains per-frame data:

{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}

Processing Results

Since Python processes results after capture, timing overhead doesn't affect measurements:

import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    frames = data["frames"]
    slow_frames = [frame for frame in frames if frame["frame_time_ms"] > 16.67]

    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")

F3 Profiler Overlay

Activation: Press F3 during gameplay

Displays:

  • Frame time (ms) with color coding:
    • Green: < 16ms (60+ FPS)
    • Yellow: 16-33ms (30-60 FPS)
    • Red: > 33ms (< 30 FPS)
  • FPS (averaged over 60 frames)
  • Detailed breakdowns:
    • Grid rendering time
    • Entity rendering time
    • Python script time
    • Animation update time
  • Per-frame counts:
    • Grid cells rendered
    • Entities rendered (visible/total)
    • Draw calls

Implementation: src/ProfilerOverlay.cpp


Current Performance

Implemented Optimizations

Chunk-based Rendering (#123):

  • Large grids divided into chunks (~256 cells each)
  • Only visible chunks processed
  • 1000x1000+ grids render efficiently

Dirty Flag Caching (#148):

  • Layers track changes per-chunk
  • Unchanged chunks reuse cached RenderTexture
  • Static scenes: near-zero CPU cost after initial render

Viewport Culling:

  • Only cells/entities within viewport processed
  • Camera position and zoom respected
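
These optimizations are easy to confirm with the benchmark API described above: in a static scene, frame and grid-render times should stay flat after the first few frames. A minimal sketch using only the documented calls (the scene is assumed to be set up beforehand and left untouched during the capture):

import json
import mcrfpy

mcrfpy.start_benchmark()
mcrfpy.log_benchmark("static scene - no layer or entity changes")
# ... let the scene run for a while without modifying anything ...
filename = mcrfpy.end_benchmark()

with open(filename) as f:
    summary = json.load(f)["summary"]
print(f"static scene avg: {summary['avg_frame_time_ms']:.2f}ms "
      f"(max {summary['max_frame_time_ms']:.2f}ms)")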

Current Bottlenecks

Entity Spatial Queries - O(n) iteration:

  • Finding entities at position requires checking all entities
  • Becomes noticeable at 500+ entities
  • Solution: #115 SpatialHash
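
The costly pattern is any per-frame query that walks the full entity list. A hypothetical illustration of what degrades linearly (names like grid.entities and e.x/e.y are placeholders, not the exact mcrfpy API):

def entities_at(grid, x, y):
    # O(n): every lookup scans every entity
    return [e for e in grid.entities if e.x == x and e.y == y]

# Doing this once per entity per frame is O(n^2) -
# with 500+ entities the scans dominate the frame time
blocked = [e for e in grid.entities if entities_at(grid, e.x, e.y - 1)]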

Bulk Grid Updates - Python/C++ boundary:

  • Many individual layer.set() calls are slower than batch operations
  • Each call crosses the Python/C++ boundary
  • Solution: #113 Batch Operations
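
Until #113 lands, the mitigation is to minimize boundary crossings: compute changes in Python and apply them in as few calls as possible. A hedged sketch of the slow pattern (the layer.set(x, y, tile) signature is assumed from the description above; the batched call is hypothetical and only shows the shape #113 is aiming for):

WIDTH, HEIGHT, FLOOR = 100, 100, 0

# 'layer' is a grid layer object obtained from your mcrfpy grid
# Slow today: one Python -> C++ crossing per cell (10,000 calls)
for y in range(HEIGHT):
    for x in range(WIDTH):
        layer.set(x, y, FLOOR)

# Hypothetical batched form once #113 exists: one crossing total
# layer.set_rect(0, 0, WIDTH, HEIGHT, FLOOR)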

Entity Allocation - Memory fragmentation:

  • Frequent spawn/destroy cycles fragment memory
  • Solution: #117 Memory Pool

Optimization Workflow

  1. Profile: Press F3, identify which metric is high
  2. Benchmark: Use start_benchmark() to capture detailed data
  3. Analyze: Process JSON to find patterns in slow frames
  4. Optimize: Make targeted changes
  5. Verify: Re-run benchmark, compare results
  6. Iterate: Repeat until acceptable performance
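
Steps 2-5 are easy to script so that every run produces comparable numbers. A minimal sketch using only the benchmark API and the JSON summary described above (run_scenario stands in for whatever you want to exercise):

import json
import mcrfpy

def capture(label, scenario):
    """Capture one benchmark run and return its summary block."""
    mcrfpy.start_benchmark()
    mcrfpy.log_benchmark(label)
    scenario()
    filename = mcrfpy.end_benchmark()
    with open(filename) as f:
        return json.load(f)["summary"]

baseline = capture("baseline", run_scenario)
# ... apply the optimization, then re-run the same scenario ...
improved = capture("optimized", run_scenario)
print(f"avg frame time: {baseline['avg_frame_time_ms']:.2f}ms -> "
      f"{improved['avg_frame_time_ms']:.2f}ms")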

Performance Targets

Metric            Target      Notes
Frame time        < 16.67ms   60 FPS
Grid render       < 5ms       For typical game grids
Entity render     < 2ms       For < 200 entities
Python callbacks  < 2ms       Keep logic light
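
A capture can be checked against these targets directly; the field names below follow the JSON format shown earlier, and the thresholds mirror the table:

import json

TARGETS_MS = {
    "frame_time_ms": 16.67,        # 60 FPS
    "grid_render_time_ms": 5.0,    # typical game grids
    "entity_render_time_ms": 2.0,  # < 200 entities
    "python_time_ms": 2.0,         # keep logic light
}

def check_targets(filename):
    with open(filename) as fh:
        frames = json.load(fh)["frames"]
    for frame in frames:
        over = {k: v for k, v in frame.items()
                if k in TARGETS_MS and v > TARGETS_MS[k]}
        if over:
            print(f"frame {frame['frame_number']} over target: {over}")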

C++ Instrumentation

ScopedTimer (RAII)

Automatic timing for code blocks:

#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}

Adding New Metrics

  1. Add field to ProfilingMetrics in src/GameEngine.h
  2. Reset in resetPerFrame() if per-frame counter
  3. Display in src/ProfilerOverlay.cpp::update()
  4. Instrument code with ScopedTimer


Last updated: 2025-11-29