Grounded Multi-Agent Testbed: LLM Agents in Discrete Simulated Environments #154

Open · opened 2025-12-01 15:59:23 +00:00 by john (Owner) · 1 comment

Grounded Multi-Agent Testbed

Umbrella issue for research infrastructure enabling LLM agents to operate in McRogueFace environments.

Overview

This project implements a testbed for studying grounded language understanding in AI systems. Based on cognitive neuroscience research suggesting that language understanding requires integration of perceptual simulation, world knowledge, and situation modeling (Casto et al., 2025), we use McRogueFace's discrete roguelike environment to study how language agents learn affordances, develop theories of mind, and generalize across novel situations.

Three-Level Architecture

Level 1: Environment Comprehension (Base Physics)

  • Grid movement, walls, walkability
  • Field of view computation
  • Pathfinding (A*, Dijkstra)
  • Line of sight for interactions

Engine support: Mostly complete via mcrfpy.libtcod
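
For a sense of what Level 1 looks like from Python, here is a minimal sketch using the FOV properties and entity visibility calls documented in the progress update below. The surrounding Grid/Entity setup is assumed to already exist, and pathfinding calls are omitted because their bindings are not shown in this issue.

```python
import mcrfpy

def configure_base_physics(grid, agent):
    """Configure FOV for a grid and query visibility for one entity.

    'grid' and 'agent' are assumed to be an existing mcrfpy Grid and Entity;
    the FOV properties and entity methods used here are the ones documented
    in the progress update comment below.
    """
    mcrfpy.default_fov = mcrfpy.FOV.SYMMETRIC_SHADOWCAST  # module-wide default algorithm
    grid.fov = mcrfpy.FOV.SHADOW                          # per-grid override
    grid.fov_radius = 8

    agent.update_visibility()           # recompute this entity's visibility gridstate
    return agent.visible_entities()     # entities currently in line of sight
```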

Level 2: Game Behavior Comprehension (Middle Level)

  • Object affordances: pushable, openable, lockable
  • NPC behaviors: patrol routes, flee thresholds, scripted responses
  • Cause-effect chains: button→door, key→lock
  • Environmental rules: shop hours, trap triggers

Engine support: Entity event system exists (bump, ev_enter, ev_exit), needs formalization
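
As an illustration of how a cause-effect chain could sit on top of those hooks, here is a hedged sketch. The callback signature, the `locked` flag, and the convention that a `bump` handler returns truthy to allow passage are assumptions, not the formalized API.

```python
def wire_button_door(button, door):
    """Wire a button -> door cause-effect chain on the existing entity event
    hooks (bump / ev_enter / ev_exit).

    Assumptions: handlers receive (entity, other), 'locked' is a plain
    attribute, and returning True from bump allows the bumping entity to pass.
    """
    def on_enter(btn, other):
        door.locked = False      # any entity standing on the button unlocks the door

    def on_exit(btn, other):
        door.locked = True       # stepping off the button relocks it

    def on_bump(d, other):
        return not d.locked      # assumed: truthy return lets the entity through

    button.ev_enter = on_enter
    button.ev_exit = on_exit
    door.bump = on_bump
```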

Level 3: Equal Agent Comprehension (High Level)

  • LLM-driven agents with natural language capability
  • Per-agent perspective rendering for VLM input
  • Memory/observation history per agent
  • Speech affordances: "announce" (room-wide), "speak" (proximity-based)
  • Emergent behaviors: barter, negotiation, team formation

Engine support: Requires new infrastructure
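
A rough sketch of the per-agent turn this level implies is below. Only update_visibility() and visible_entities() are confirmed engine calls (see the progress update comment); llm_complete and apply_action are hypothetical placeholders for the missing infrastructure.

```python
def agent_turn(agent, memory, llm_complete, apply_action):
    """One turn for an LLM-driven agent (hypothetical orchestration sketch).

    llm_complete and apply_action are placeholder callables; the visibility
    calls are the documented engine API.
    """
    agent.update_visibility()
    seen = agent.visible_entities()                  # entities in this agent's line of sight
    observation = {
        "visible": [getattr(e, "name", repr(e)) for e in seen],  # 'name' may not exist; fall back
        "recent_events": memory[-10:],               # per-agent observation history
    }
    prompt = (
        "You are an agent in a discrete grid world.\n"
        f"Observation: {observation}\n"
        "Choose one action: move <dir>, speak <text>, announce <text>, or wait."
    )
    action = llm_complete(prompt)                    # one full LLM API call per turn (headless mode)
    memory.append({"obs": observation, "action": action})
    apply_action(agent, action)                      # hypothetical dispatcher into engine commands
    return action
```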

Two Operating Modes

Headless Simulation Mode

  • No framerate requirements
  • Sequential agent turns with full LLM API calls
  • All randomized NPC behaviors resolved and stored as concrete values (enabling deterministic replay)
  • Entities teleport (no animation)
  • Output: Simulation log with all decisions and world states

Animated Demo Mode

  • Replay stored simulation outputs
  • Full animation of movement paths
  • Step through simulation as animations complete
  • Deterministic playback (same random seeds)
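
One way to connect the two modes is a per-turn log that the headless run writes and the demo mode replays. The JSON shape below is purely illustrative, not a defined format.

```python
import json

# Illustrative log entry written during a headless run. Field names are
# hypothetical; the point is that every random roll and LLM decision is
# stored as a concrete value so the animated replay is deterministic.
turn_record = {
    "turn": 42,
    "rng_seed": 1337,
    "agent": "agent_b",
    "observation": {"visible": ["button", "door"], "messages": []},
    "llm_action": "speak 'press the button when I say go'",
    "npc_moves": [{"npc": "cat", "from": [3, 4], "to": [3, 5]}],  # resolved, not re-rolled
}

with open("simulation_log.jsonl", "a") as f:
    f.write(json.dumps(turn_record) + "\n")

# Animated demo mode then reads the log back and animates each stored move
# instead of re-querying the LLM or re-rolling NPC behavior.
```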

Research Questions (from proposal)

  1. Does grounding improve compositional generalization?
  2. Does physics randomization induce investigative behavior?
  3. What transfers between cognitive domains?
  4. Can agents develop accurate theories of mind for scripted NPCs?
  5. What emergent social behaviors arise from multi-agent interaction?
  6. How do agents resolve conflicts between visual, textual, and queried information?

Development Phases

Phase 1: Two Agents in a Room

  • 10x10 room with locked door
  • Two LLM agents, one button, one door
  • Agent A can reach the button; Agent B's goal is blocked by the door
  • Success criterion: the agents solve the coordination problem through emergent communication
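
A possible declarative spec for this scenario, kept engine-agnostic; every key name below is illustrative rather than an mcrfpy API.

```python
# Illustrative Phase 1 scenario spec; field names are hypothetical and
# describe the setup, not engine calls.
PHASE1_SCENARIO = {
    "map_size": (10, 10),
    "entities": {
        "agent_a": {"pos": (2, 2), "controller": "llm"},
        "agent_b": {"pos": (7, 7), "controller": "llm"},
        "button":  {"pos": (2, 5), "affordances": ["pressable"]},
        "door":    {"pos": (5, 7), "locked": True, "linked_to": "button"},
    },
    "goals": {"agent_b": "reach the tile behind the locked door"},
    "success": "agents coordinate via speech so agent_a presses the button "
               "while agent_b moves through the open door",
}
```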

Phase 2: Learning Middle-Level Behaviors

  • Add scripted NPCs (cat, guard, shop door)
  • Agents must predict and exploit NPC behavior patterns

Phase 3: Affordance Learning and Puzzles

  • ConceptNet-based object generation with varied properties
  • Physics randomization (same puzzle, different rules)
  • Compositional generalization testing
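
A sketch of what "same puzzle, different rules" could mean in practice, expressed as a randomized rule table; the affordance names echo Level 2, everything else is illustrative.

```python
import random

def randomize_rules(seed):
    """Re-roll the rules governing object affordances while keeping the
    puzzle layout fixed, so agents must investigate rather than memorize.

    The rule names and options are illustrative only.
    """
    rng = random.Random(seed)
    return {
        "button_behavior": rng.choice(["toggle", "hold_to_open", "one_shot"]),
        "boulder": {"pushable": rng.random() < 0.7},
        "door": {"openable_by": rng.choice(["button", "key", "either"])},
    }

rules = randomize_rules(seed=7)   # stored in the simulation log for deterministic replay
```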

Phase 4: Economic Reasoning

  • Shops, quest givers, inter-agent trade
  • Asymmetric information scenarios

Phase 5: Town and Dungeon Integration

  • Full scenario combining all elements
  • Emergent dynamics: reputation, party formation, betrayal

Blocking Issues

Engine infrastructure required:

  • #153 - Separate render loop from game state loop (critical for two-mode system)
  • #16 - Entity knowledge contents / per-entity perspective
  • #113 - Batch Operations for Grid (efficient state manipulation)
  • #114 - CellView API (convenient grid interaction)

Project-specific issues:

  • #155 - Deterministic Text Descriptions From Room Graph
  • #156 - Turn-based LLM Agent Orchestration

Related Issues

  • #55 - McRogueFace as Agent Simulation Environment (predecessor, will be closed early)
  • #67 - Grid Stitching (future: infinite world scenarios)

References

  • Casto, C., Ivanova, A., Fedorenko, E., & Kanwisher, N. (2025). What does it mean to understand language? arXiv:2511.19757
  • Park, J. S., et al. (2023). Generative agents: Interactive simulacra of human behavior. UIST 2023
  • Leike, J., et al. (2017). AI safety gridworlds. arXiv:1711.09883
  • Küttler, H., et al. (2020). The NetHack Learning Environment. NeurIPS 2020
john added the Major Feature, Demo Target, and priority:tier1-active labels 2025-12-01 16:15:11 +00:00
john (Author, Owner) commented:

Progress Update: FOV/Perspective System Complete

Commits c5b4200 and a529e5e implement the per-agent perspective rendering infrastructure:

New API

FOV Configuration:

  • mcrfpy.FOV enum (BASIC, DIAMOND, SHADOW, PERMISSIVE_0-8, RESTRICTIVE, SYMMETRIC_SHADOWCAST)
  • mcrfpy.default_fov module property
  • grid.fov and grid.fov_radius properties

ColorLayer Perspective Methods:

  • fill_rect(x, y, w, h, color) - Fill rectangular region
  • draw_fov(source, radius, fov, visible, discovered, unknown) - One-time FOV visualization
  • apply_perspective(entity, visible, discovered, unknown) - Bind layer to entity
  • update_perspective() - Refresh from bound entity's gridstate
  • clear_perspective() - Remove binding

Entity Methods:

  • entity.update_visibility() - Updates gridstate AND all bound ColorLayers
  • entity.visible_entities(fov=None, radius=None) - Get list of visible entities
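
Putting the new calls together, a minimal per-agent fog-of-war binding might look like the sketch below. The RGBA tuples and the assumption that a ColorLayer is already attached to the agent's grid are illustrative; the method calls themselves are the ones listed above.

```python
import mcrfpy

def bind_agent_perspective(layer, agent):
    """Bind a ColorLayer to one agent's point of view (fog of war).

    'layer' is assumed to be a ColorLayer already attached to the agent's
    grid, and the color tuples are illustrative values.
    """
    visible = (255, 255, 255, 0)      # transparent over currently visible cells
    discovered = (0, 0, 0, 160)       # dimmed over previously seen cells
    unknown = (0, 0, 0, 255)          # opaque over never-seen cells
    layer.apply_perspective(agent, visible, discovered, unknown)

def agent_sees(agent):
    """Refresh and query what one agent can currently see."""
    agent.update_visibility()         # updates gridstate AND all bound ColorLayers
    return agent.visible_entities(fov=mcrfpy.FOV.SHADOW, radius=8)
```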

Demo

tests/demo/perspective_patrol_demo.py - Interactive fog of war demonstration

Relevance to Phase 1

This provides the foundation for "per-agent perspective rendering for VLM input" mentioned in Level 3. Each agent can have its own ColorLayer showing what it can see, and visible_entities() enables AI decision-making based on line-of-sight.
