John McCardle edited this page 2025-10-25 20:59:11 +00:00

Performance and Profiling

Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics.

Quick Reference

Related Issues:

  • #104 - Basic Profiling/Metrics (Tier 1 - Implemented)
  • #115 - SpatialHash Implementation (Tier 1 - Active)
  • #116 - Dirty Flag System (Tier 1 - Active)
  • #117 - Memory Pool for Entities (Tier 1 - Active)
  • #113 - Batch Operations for Grid (Tier 1 - Active)

Key Files:

  • src/Profiler.h - ScopedTimer RAII helper
  • src/ProfilerOverlay.cpp - F3 overlay visualization
  • src/GameEngine.h - ProfilingMetrics struct
  • tests/benchmark_*.py - Performance benchmarks

Implementation: Commit e9e9cd2 (October 2025)

Profiling Tools

F3 Profiler Overlay

Activation: Press F3 during gameplay

Displays:

  • Frame time (ms) with color coding:
    • Green: < 16ms (60+ FPS)
    • Yellow: 16-33ms (30-60 FPS)
    • Red: > 33ms (< 30 FPS)
  • FPS (averaged over 60 frames)
  • Detailed breakdowns:
    • Grid rendering time
    • Entity rendering time
    • FOV overlay time
    • Python script time
    • Animation update time
  • Per-frame counts:
    • Grid cells rendered
    • Entities rendered (visible/total)
    • Draw calls

Implementation: src/ProfilerOverlay.cpp::update(), src/ProfilerOverlay.cpp::render()
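
The color bands above amount to a simple threshold check. A minimal sketch follows, assuming both boundary values (16ms and 33ms) fall in the yellow band; `FrameColor` and `classifyFrameTime` are hypothetical names, not taken from src/ProfilerOverlay.cpp.

```cpp
#include <cassert>

// Hypothetical names: the real overlay code may be organized differently.
enum class FrameColor { Green, Yellow, Red };

// Map a frame time in milliseconds to the overlay color bands.
// Assumption: 16 ms and 33 ms themselves count as yellow.
inline FrameColor classifyFrameTime(float frameTimeMs) {
    assert(frameTimeMs >= 0.0f && "frame time cannot be negative");
    if (frameTimeMs < 16.0f) return FrameColor::Green;    // 60+ FPS
    if (frameTimeMs <= 33.0f) return FrameColor::Yellow;  // 30-60 FPS
    return FrameColor::Red;                               // below 30 FPS
}
```

For example, `classifyFrameTime(8.0f)` yields `FrameColor::Green` and `classifyFrameTime(40.0f)` yields `FrameColor::Red`.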

ScopedTimer (RAII)

Automatic timing for code blocks:

#include "Profiler.h"

void expensiveFunction() {
    // functionTime is a placeholder: pass any float timing field of ProfilingMetrics
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}

How it works:

  • Constructor: Records start time
  • Destructor: Calculates elapsed time, adds to metric
  • Negligible overhead when profiling is disabled

Implementation: src/Profiler.h::ScopedTimer
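
The constructor/destructor behavior described above could look roughly like the sketch below. It is not the code from src/Profiler.h: the class name `ScopedTimerSketch`, the choice of `std::chrono::steady_clock`, and millisecond units are assumptions, and it omits any enable/disable switch.

```cpp
#include <chrono>

// Sketch only: the real ScopedTimer lives in src/Profiler.h and may differ.
class ScopedTimerSketch {
public:
    // Constructor: bind to the metric to update and record the start time.
    explicit ScopedTimerSketch(float& targetMs)
        : target_(targetMs), start_(std::chrono::steady_clock::now()) {}

    // Destructor: add the elapsed milliseconds to the bound metric.
    ~ScopedTimerSketch() {
        const auto end = std::chrono::steady_clock::now();
        target_ += std::chrono::duration<float, std::milli>(end - start_).count();
    }

    // Non-copyable: a copy would record the same interval twice.
    ScopedTimerSketch(const ScopedTimerSketch&) = delete;
    ScopedTimerSketch& operator=(const ScopedTimerSketch&) = delete;

private:
    float& target_;
    std::chrono::steady_clock::time_point start_;
};
```

Because the destructor adds to the metric rather than overwriting it, several timed scopes in one frame accumulate into the same field. The destructor also runs during stack unwinding, which is what makes the timing exception-safe.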

ProfilingMetrics Struct

Centralized metrics collection:

struct ProfilingMetrics {
    float frameTime = 0.0f;        // Current frame time (ms)
    float avgFrameTime = 0.0f;     // 60-frame average
    int fps = 0;                   // Frames per second
    
    // Detailed timing
    float gridRenderTime = 0.0f;   // Grid rendering
    float entityRenderTime = 0.0f; // Entity rendering
    float fovOverlayTime = 0.0f;   // FOV overlay
    float pythonScriptTime = 0.0f; // Python callbacks
    float animationTime = 0.0f;    // Animation updates
    
    // Per-frame counters
    int gridCellsRendered = 0;
    int entitiesRendered = 0;
    int totalEntities = 0;
    int drawCalls = 0;
    
    void resetPerFrame();  // Call at start of frame
};

Usage: Access via Resources::game->metrics

Implementation: src/GameEngine.h::ProfilingMetrics
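
One way to derive `avgFrameTime` and `fps` from individual frame times is a fixed-size ring buffer with a running sum. The sketch below is an illustration under that assumption; `FrameTimeAverager` and `fpsFromFrameTime` are hypothetical names, and the engine may compute its average differently.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: keeps the last N frame times in a ring buffer.
template <std::size_t N = 60>
class FrameTimeAverager {
    static_assert(N > 0, "window must hold at least one sample");

public:
    // Record one frame time (ms) and return the average over the stored window.
    float push(float frameTimeMs) {
        sum_ -= samples_[next_];        // drop the oldest sample (0 until the buffer fills)
        samples_[next_] = frameTimeMs;
        sum_ += frameTimeMs;
        next_ = (next_ + 1) % N;
        if (count_ < N) ++count_;
        return sum_ / static_cast<float>(count_);
    }

private:
    std::array<float, N> samples_{};
    std::size_t next_ = 0;
    std::size_t count_ = 0;
    float sum_ = 0.0f;
};

// Convert an average frame time (ms) to frames per second, rounded to nearest.
inline int fpsFromFrameTime(float avgFrameTimeMs) {
    return avgFrameTimeMs > 0.0f
        ? static_cast<int>(1000.0f / avgFrameTimeMs + 0.5f)
        : 0;
}
```

The running sum keeps each update O(1), at the cost of slow floating-point drift over very long sessions; recomputing the sum from the buffer occasionally would remove that drift.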

Optimization Workflow

See Performance-Optimization-Workflow for a detailed guide.

Quick workflow:

  1. Profile: Press F3, identify bottleneck
  2. Instrument: Add ScopedTimers around suspect code
  3. Benchmark: Run tests/benchmark_*.py for baseline
  4. Optimize: Make targeted changes
  5. Verify: Re-run benchmark, check improvement
  6. Iterate: Repeat until acceptable performance

Performance Benchmarks

Benchmark Suite

Files:

  • tests/benchmark_static_grid.py - Grid rendering performance
  • tests/benchmark_moving_entities.py - Entity rendering + movement

Running benchmarks:

cd build
./mcrogueface --headless --exec ../tests/benchmark_static_grid.py

Output: Frame time statistics, entity counts, render metrics
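
The benchmark scripts themselves are Python and their exact output format is not documented here. As an illustration of the kind of frame time statistics worth comparing between runs, the sketch below summarizes a list of samples; `FrameStats` and `summarize` are hypothetical names, and the nearest-rank 95th percentile is one choice among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of one benchmark run (all values in milliseconds).
struct FrameStats {
    float minMs;
    float maxMs;
    float meanMs;
    float p95Ms;
};

// Summarize frame times; takes the vector by value so it can sort its own copy.
inline FrameStats summarize(std::vector<float> frameTimesMs) {
    assert(!frameTimesMs.empty());
    std::sort(frameTimesMs.begin(), frameTimesMs.end());
    const float sum = std::accumulate(frameTimesMs.begin(), frameTimesMs.end(), 0.0f);
    const std::size_t n = frameTimesMs.size();
    // Nearest-rank percentile: rank = ceil(0.95 * n), 1-based.
    const std::size_t rank = (n * 95 + 99) / 100;
    return {frameTimesMs.front(), frameTimesMs.back(),
            sum / static_cast<float>(n), frameTimesMs[rank - 1]};
}
```

Comparing the 95th percentile as well as the mean helps catch optimizations that improve the average but leave occasional slow frames in place.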

Current Performance Characteristics

Grid Rendering:

  • Static 100x100 grid: ~3-5ms (optimized with culling)
  • Large grids (200x200): can exceed the 16ms budget for 60 FPS without optimization
  • Bottleneck: Redrawing unchanged cells every frame

Entity Rendering:

  • < 100 entities: < 1ms (with culling)
  • 1000 entities: ~5-10ms (needs SpatialHash optimization)
  • Bottleneck: O(n) iteration, no spatial indexing

Python Callbacks:

  • Minimal overhead when callbacks are short (typically < 1ms)
  • Bottleneck: Heavy computation in Python update loops

Planned Optimizations

Tier 1 (Active Development)

#116: Dirty Flag System

  • Problem: Static grids are redrawn every frame even when nothing has changed
  • Solution: Track changes, only redraw when modified
  • Expected: 10-50x improvement for static scenes
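
The dirty flag idea can be sketched as follows. This is not the design from #116: `DirtyGridSketch` is a hypothetical class, and the counter stands in for an expensive redraw into a cached surface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid: caches its rendered output and rebuilds only after a change.
class DirtyGridSketch {
public:
    DirtyGridSketch(int width, int height)
        : width_(width), height_(height),
          cells_(static_cast<std::size_t>(width) * height, 0) {
        assert(width > 0 && height > 0);
    }

    // A mutation marks the cache stale, but only if the value actually changes.
    void setCell(int x, int y, int tile) {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        int& cell = cells_[static_cast<std::size_t>(y) * width_ + x];
        if (cell != tile) {
            cell = tile;
            dirty_ = true;
        }
    }

    // Called once per frame; the expensive rebuild runs only when dirty.
    void render() {
        if (dirty_) {
            rebuildCache();
            dirty_ = false;
        }
        // ... draw the cached result here ...
    }

    int rebuildCount() const { return rebuilds_; }

private:
    void rebuildCache() { ++rebuilds_; }  // stand-in for redrawing every cell

    int width_;
    int height_;
    std::vector<int> cells_;
    bool dirty_ = true;  // the first frame always needs a full draw
    int rebuilds_ = 0;
};
```

With this shape, rendering a static scene for many frames triggers a single rebuild, and writing a cell's existing value again does not invalidate the cache.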

#115: SpatialHash

  • Problem: O(n) entity iteration for spatial queries
  • Solution: Hash grid for average-case O(1) entity lookups by position
  • Expected: 100x+ improvement for large entity counts
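
A minimal sketch of a position-keyed hash follows, assuming integer cell coordinates and integer entity IDs. `SpatialHashSketch` and its methods are hypothetical and do not reflect the interface planned in #115.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical spatial hash: maps a grid cell to the entity IDs standing on it.
class SpatialHashSketch {
public:
    using EntityId = int;

    void insert(EntityId id, int x, int y) { buckets_[key(x, y)].push_back(id); }

    void remove(EntityId id, int x, int y) {
        auto it = buckets_.find(key(x, y));
        if (it == buckets_.end()) return;
        auto& ids = it->second;
        ids.erase(std::remove(ids.begin(), ids.end(), id), ids.end());
        if (ids.empty()) buckets_.erase(it);  // keep the map free of empty buckets
    }

    // An entity that moves must be re-bucketed, or lookups go stale.
    void move(EntityId id, int fromX, int fromY, int toX, int toY) {
        remove(id, fromX, fromY);
        insert(id, toX, toY);
    }

    // Entities on one cell: a single hash lookup, independent of total entity count.
    const std::vector<EntityId>& at(int x, int y) const {
        static const std::vector<EntityId> empty;
        auto it = buckets_.find(key(x, y));
        return it == buckets_.end() ? empty : it->second;
    }

private:
    // Pack both 32-bit coordinates into one 64-bit key (negative values included).
    static std::uint64_t key(int x, int y) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x)) << 32)
             | static_cast<std::uint32_t>(y);
    }

    std::unordered_map<std::uint64_t, std::vector<EntityId>> buckets_;
};
```

A radius query would visit only the cells inside the radius and call `at()` for each, so its cost scales with the query area rather than with the number of entities in the scene.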

#113: Batch Operations

  • Problem: Python/C++ boundary crossings for individual cells
  • Solution: NumPy-style batch operations
  • Expected: 10-100x improvement for bulk updates
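
On the C++ side, a batch operation is a single entry point that loops internally, so a script pays one boundary crossing instead of one per cell. The sketch below assumes a flat tile array; `TileGridSketch` and `fillRect` are hypothetical and not the API proposed in #113.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid with a per-cell setter and a batch setter.
struct TileGridSketch {
    int width;
    int height;
    std::vector<int> tiles;

    TileGridSketch(int w, int h)
        : width(w), height(h), tiles(static_cast<std::size_t>(w) * h, 0) {}

    // Per-cell setter: called from a script, each call is one boundary crossing.
    void setTile(int x, int y, int tile) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        tiles[static_cast<std::size_t>(y) * width + x] = tile;
    }

    // Batch setter: one call fills a rectangle clipped to the grid.
    // Returns the number of cells written.
    int fillRect(int x, int y, int w, int h, int tile) {
        const int x0 = std::max(x, 0);
        const int y0 = std::max(y, 0);
        const int x1 = std::min(x + w, width);
        const int y1 = std::min(y + h, height);
        int written = 0;
        for (int row = y0; row < y1; ++row) {
            for (int col = x0; col < x1; ++col) {
                setTile(col, row, tile);
                ++written;
            }
        }
        return written;
    }
};
```

Filling a 100x100 region this way costs one call from the script instead of 10,000.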

#117: Memory Pool

  • Problem: Entity allocation/deallocation overhead
  • Solution: Pre-allocated entity pool, recycling
  • Expected: Reduced memory fragmentation, faster spawning
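
Pre-allocation with recycling can be sketched with a fixed slot array and a free list. `PooledEntity` and `EntityPoolSketch` are hypothetical stand-ins; the real entity type and the design in #117 may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an entity.
struct PooledEntity {
    int x = 0;
    int y = 0;
    bool alive = false;
};

// Fixed-capacity pool: all slots are allocated once, then recycled.
class EntityPoolSketch {
public:
    explicit EntityPoolSketch(std::size_t capacity) : slots_(capacity) {
        freeList_.reserve(capacity);
        // Push indices in reverse so slot 0 is handed out first.
        for (std::size_t i = capacity; i > 0; --i) freeList_.push_back(i - 1);
    }

    // Reuse a free slot; returns nullptr when the pool is exhausted.
    PooledEntity* spawn(int x, int y) {
        if (freeList_.empty()) return nullptr;
        PooledEntity& e = slots_[freeList_.back()];
        freeList_.pop_back();
        e.x = x;
        e.y = y;
        e.alive = true;
        return &e;
    }

    // Return a slot to the free list; no heap deallocation happens.
    void despawn(PooledEntity* e) {
        assert(e != nullptr && e->alive);
        e->alive = false;
        freeList_.push_back(static_cast<std::size_t>(e - slots_.data()));
    }

    std::size_t available() const { return freeList_.size(); }

private:
    std::vector<PooledEntity> slots_;     // never resized, so pointers stay valid
    std::vector<std::size_t> freeList_;   // indices of unused slots
};
```

Spawning and despawning touch only the free list, so neither allocates after construction, and live entities stay contiguous in memory.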

Common Performance Issues

Issue: Low FPS on Static Screens

Symptom: Red FPS counter, high grid render time, nothing moving

Cause: Grid redrawing unchanged cells

Solution:

  • Short-term: Reduce grid size
  • Long-term: Wait for #116 (Dirty flags)

Issue: Slow Entity Queries

Symptom: High Python script time when finding nearby entities

Cause: O(n) iteration through all entities

Solution:

  • Short-term: Limit entity counts (< 100)
  • Long-term: Wait for #115 (SpatialHash)

Issue: Frame Drops During Bulk Updates

Symptom: Yellow/red FPS when updating many grid cells from Python

Cause: Python/C++ boundary crossings

Solution:

  • Short-term: Batch updates manually in C++
  • Long-term: Wait for #113 (Batch operations)

Instrumentation Examples

Adding Metrics to New Systems

// In the system's update() or render() function.
// mySystemTime and itemsRendered are placeholder fields: add them to ProfilingMetrics first.
void MySystem::render() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    
    // Your rendering code
    for (auto& item : items) {
        item->render();
        Resources::game->metrics.itemsRendered++;
    }
}

Adding New Metric Fields

  1. Add the field to ProfilingMetrics in src/GameEngine.h
  2. Reset it in resetPerFrame() if it is a per-frame counter
  3. Display it in src/ProfilerOverlay.cpp::update()
  4. Instrument the code with ScopedTimer
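
Steps 1 and 2 might look like the sketch below, shown on a trimmed-down stand-in for the struct. `ProfilingMetricsSketch`, `pathfindingTime`, and `pathsComputed` are hypothetical names. Resetting the timing field each frame is an assumption that follows from ScopedTimer adding to its metric rather than overwriting it.

```cpp
// Trimmed-down stand-in for ProfilingMetrics; only two existing fields are shown.
struct ProfilingMetricsSketch {
    float gridRenderTime = 0.0f;
    int drawCalls = 0;

    // Step 1: new fields (hypothetical names).
    float pathfindingTime = 0.0f;  // accumulated by a scoped timer, in ms
    int pathsComputed = 0;         // per-frame counter

    // Step 2: zero everything that accumulates within a frame.
    void resetPerFrame() {
        gridRenderTime = 0.0f;
        drawCalls = 0;
        pathfindingTime = 0.0f;
        pathsComputed = 0;
    }
};
```

A field that is forgotten in `resetPerFrame()` keeps growing, which shows up in the overlay as a value that climbs steadily instead of fluctuating.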

Design Decisions

Why F3 Toggle?

  • Standard in game development (Minecraft, Unity, etc.)
  • Non-intrusive: Disabled by default
  • Real-time feedback during gameplay

Why RAII ScopedTimer?

  • Automatic cleanup prevents missing timer stops
  • Exception-safe timing
  • Negligible overhead when profiling is disabled

Tradeoffs:

  • Overlay rendering adds small overhead (~0.1ms)
  • Metrics collection adds branching (negligible)
  • But: Essential visibility for optimization

See Also: