# Performance and Profiling

Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.

## Quick Reference

**Related Issues:**

- [#104](../../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
- [#148](../../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
- [#123](../../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
- [#115](../../issues/115) - SpatialHash Implementation (Open - Tier 1)
- [#113](../../issues/113) - Batch Operations for Grid (Open - Tier 1)
- [#117](../../issues/117) - Memory Pool for Entities (Open - Tier 1)

**Key Files:**

- `src/Profiler.h` - ScopedTimer RAII helper
- `src/ProfilerOverlay.cpp` - F3 overlay visualization
- `src/GameEngine.h` - ProfilingMetrics struct

---

## Benchmark API

The benchmark API captures detailed per-frame timing data to JSON files. C++ handles all of the timing; Python processes the results afterward.

### Basic Usage

```python
import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
# e.g., "benchmark_12345_20250528_143022.json"
```

### Adding Log Messages

Mark specific events within the benchmark:

```python
mcrfpy.start_benchmark()

# Your code...
mcrfpy.log_benchmark("Player spawned")

# More code...
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()
```

Log messages appear in the `logs` array of each frame in the output JSON.

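As one example of putting those log entries to use, a small helper can pull every logged event out of a capture, paired with the timing of the frame it landed in. This is a sketch, not part of the engine API; the field names follow the Output Format section below.

```python
import json

def extract_events(filename):
    """Collect (frame_number, frame_time_ms, message) tuples for every
    logged event in a benchmark capture file."""
    with open(filename) as f:
        data = json.load(f)

    events = []
    for frame in data["frames"]:
        # "logs" holds messages recorded via log_benchmark() during this frame
        for message in frame.get("logs", []):
            events.append((frame["frame_number"], frame["frame_time_ms"], message))
    return events
```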
### Output Format

The JSON file contains per-frame data:

```json
{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}
```

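The per-subsystem fields make it easy to see where frame time is actually going. A minimal sketch (assuming the field names above; this helper is illustrative, not an engine API) that averages each timed subsystem across a capture:

```python
import json

def subsystem_breakdown(filename):
    """Average each timed subsystem across all frames, so the dominant
    cost (grid vs. entity vs. Python) stands out."""
    with open(filename) as f:
        frames = json.load(f)["frames"]

    keys = ["grid_render_time_ms", "entity_render_time_ms", "python_time_ms"]
    # Missing fields default to 0.0 so partial captures still analyze cleanly
    return {k: sum(fr.get(k, 0.0) for fr in frames) / len(frames) for k in keys}
```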
### Processing Results

Because Python processes results *after* capture, analysis overhead doesn't skew the measurements:

```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    frames = data["frames"]
    slow_frames = [fr for fr in frames if fr["frame_time_ms"] > 16.67]

    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```

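Averages can hide stutter; percentiles surface it. A hedged sketch (again just a post-processing helper, using the documented `frame_time_ms` field) computing nearest-rank percentiles of frame time:

```python
import json
import math

def frame_time_percentiles(filename, percentiles=(50, 95, 99)):
    """Nearest-rank percentiles of frame time; p95/p99 expose the worst
    frames that an average smooths over."""
    with open(filename) as f:
        times = sorted(fr["frame_time_ms"] for fr in json.load(f)["frames"])

    def rank(p):
        # Nearest-rank: the value at ceil(p% of n), 1-indexed
        return times[max(0, math.ceil(p / 100 * len(times)) - 1)]

    return {p: rank(p) for p in percentiles}
```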
---

## F3 Profiler Overlay

**Activation:** Press F3 during gameplay

**Displays:**
- Frame time (ms) with color coding:
  - Green: < 16ms (60+ FPS)
  - Yellow: 16-33ms (30-60 FPS)
  - Red: > 33ms (< 30 FPS)
- FPS (averaged over 60 frames)
- Detailed breakdowns:
  - Grid rendering time
  - Entity rendering time
  - Python script time
  - Animation update time
- Per-frame counts:
  - Grid cells rendered
  - Entities rendered (visible/total)
  - Draw calls

**Implementation:** `src/ProfilerOverlay.cpp`

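The color thresholds are just the frame-time/FPS identity, fps = 1000 / frame_time_ms, evaluated at the 16ms and 33ms boundaries:

```python
def fps(frame_time_ms):
    """Frames per second implied by a per-frame cost in milliseconds."""
    return 1000.0 / frame_time_ms

# 16ms -> 62.5 FPS (green boundary), 33ms -> ~30.3 FPS (red boundary)
```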
---

## Current Performance

### Implemented Optimizations

**Chunk-based Rendering** ([#123](../../issues/123)):
- Large grids divided into chunks (~256 cells each)
- Only visible chunks processed
- 1000x1000+ grids render efficiently

**Dirty Flag Caching** ([#148](../../issues/148)):
- Layers track changes per-chunk
- Unchanged chunks reuse a cached RenderTexture
- Static scenes: near-zero CPU cost after the initial render

**Viewport Culling:**
- Only cells/entities within the viewport processed
- Camera position and zoom respected

### Current Bottlenecks

**Entity Spatial Queries** - O(n) iteration:
- Finding entities at a position requires checking all entities
- Becomes noticeable at 500+ entities
- **Solution:** [#115](../../issues/115) SpatialHash

**Bulk Grid Updates** - Python/C++ boundary:
- Many individual `layer.set()` calls are slower than batch operations
- Each call crosses the Python/C++ boundary
- **Solution:** [#113](../../issues/113) Batch Operations

**Entity Allocation** - Memory fragmentation:
- Frequent spawn/destroy cycles fragment memory
- **Solution:** [#117](../../issues/117) Memory Pool

---

## Optimization Workflow

1. **Profile**: Press F3, identify which metric is high
2. **Benchmark**: Use `start_benchmark()` to capture detailed data
3. **Analyze**: Process the JSON to find patterns in slow frames
4. **Optimize**: Make targeted changes
5. **Verify**: Re-run the benchmark, compare results
6. **Iterate**: Repeat until performance is acceptable

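The verify step boils down to diffing two captures. A sketch of such a comparison (a hypothetical helper, not an engine API; it reads the `summary` block produced by `end_benchmark()`):

```python
import json

def compare_runs(before_file, after_file):
    """Delta in summary stats between two captures; negative values
    mean the optimization helped."""
    def summary(path):
        with open(path) as f:
            return json.load(f)["summary"]

    before, after = summary(before_file), summary(after_file)
    return {key: round(after[key] - before[key], 2)
            for key in ("avg_frame_time_ms", "max_frame_time_ms")}
```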
### Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Frame time | < 16.67ms | 60 FPS |
| Grid render | < 5ms | For typical game grids |
| Entity render | < 2ms | For < 200 entities |
| Python callbacks | < 2ms | Keep logic light |

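Targets like these can be enforced mechanically against a capture. A minimal sketch of a budget check, assuming only the documented `summary` fields (the helper itself is hypothetical):

```python
import json

def check_frame_budget(filename, frame_budget_ms=16.67):
    """True when the capture's average frame time stays inside the
    60 FPS frame-time budget."""
    with open(filename) as f:
        summary = json.load(f)["summary"]
    return summary["avg_frame_time_ms"] < frame_budget_ms
```

Hooked into a test script, this turns a performance regression into a failing assertion instead of a subjective judgment.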
---

## C++ Instrumentation

### ScopedTimer (RAII)

Automatic timing for code blocks:

```cpp
#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records the duration on destruction
}
```

### Adding New Metrics

1. Add a field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset it in `resetPerFrame()` if it is a per-frame counter
3. Display it in `src/ProfilerOverlay.cpp::update()`
4. Instrument the code with `ScopedTimer`

---

## Related Systems

- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Entity-Management]] - Entity performance considerations
- [[Writing-Tests]] - Performance test creation

---

*Last updated: 2025-11-29*