Grounded Multi-Agent Testbed: LLM Agents in Discrete Simulated Environments #154

Open · opened 2025-12-01 15:59:23 +00:00 by john (Owner) · 1 comment

Grounded Multi-Agent Testbed

Umbrella issue for research infrastructure enabling LLM agents to operate in McRogueFace environments.

Overview

This project implements a testbed for studying grounded language understanding in AI systems. Based on cognitive neuroscience research suggesting that language understanding requires integration of perceptual simulation, world knowledge, and situation modeling (Casto et al., 2025), we use McRogueFace's discrete roguelike environment to study how language agents learn affordances, develop theories of mind, and generalize across novel situations.

Three-Level Architecture

Level 1: Environment Comprehension (Base Physics)

  • Grid movement, walls, walkability
  • Field of view computation
  • Pathfinding (A*, Dijkstra)
  • Line of sight for interactions

Engine support: Mostly complete via mcrfpy.libtcod
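
For a sense of what Level 1 looks like from Python, here is a minimal sketch using the FOV properties and entity visibility calls documented in the progress update below. The surrounding Grid/Entity setup is assumed to already exist, and pathfinding calls are omitted because their bindings are not shown in this issue.

```python
import mcrfpy

def configure_base_physics(grid, agent):
    """Configure FOV for a grid and query visibility for one entity.

    'grid' and 'agent' are assumed to be an existing mcrfpy Grid and Entity;
    the FOV properties and entity methods used here are the ones documented
    in the progress update comment below.
    """
    mcrfpy.default_fov = mcrfpy.FOV.SYMMETRIC_SHADOWCAST  # module-wide default algorithm
    grid.fov = mcrfpy.FOV.SHADOW                          # per-grid override
    grid.fov_radius = 8

    agent.update_visibility()           # recompute this entity's visibility gridstate
    return agent.visible_entities()     # entities currently in line of sight
```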

Level 2: Game Behavior Comprehension (Middle Level)

  • Object affordances: pushable, openable, lockable
  • NPC behaviors: patrol routes, flee thresholds, scripted responses
  • Cause-effect chains: button→door, key→lock
  • Environmental rules: shop hours, trap triggers

Engine support: Entity event system exists (bump, ev_enter, ev_exit), needs formalization
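
As an illustration of how a cause-effect chain could sit on top of those hooks, here is a hedged sketch. The callback signature, the `locked` flag, and the convention that a `bump` handler returns truthy to allow passage are assumptions, not the formalized API.

```python
def wire_button_door(button, door):
    """Wire a button -> door cause-effect chain on the existing entity event
    hooks (bump / ev_enter / ev_exit).

    Assumptions: handlers receive (entity, other), 'locked' is a plain
    attribute, and returning True from bump allows the bumping entity to pass.
    """
    def on_enter(btn, other):
        door.locked = False      # any entity standing on the button unlocks the door

    def on_exit(btn, other):
        door.locked = True       # stepping off the button relocks it

    def on_bump(d, other):
        return not d.locked      # assumed: truthy return lets the entity through

    button.ev_enter = on_enter
    button.ev_exit = on_exit
    door.bump = on_bump
```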

Level 3: Equal Agent Comprehension (High Level)

  • LLM-driven agents with natural language capability
  • Per-agent perspective rendering for VLM input
  • Memory/observation history per agent
  • Speech affordances: "announce" (room-wide), "speak" (proximity-based)
  • Emergent behaviors: barter, negotiation, team formation

Engine support: Requires new infrastructure
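
A rough sketch of the per-agent turn this level implies is below. Only update_visibility() and visible_entities() are confirmed engine calls (see the progress update comment); llm_complete and apply_action are hypothetical placeholders for the missing infrastructure.

```python
def agent_turn(agent, memory, llm_complete, apply_action):
    """One turn for an LLM-driven agent (hypothetical orchestration sketch).

    llm_complete and apply_action are placeholder callables; the visibility
    calls are the documented engine API.
    """
    agent.update_visibility()
    seen = agent.visible_entities()                  # entities in this agent's line of sight
    observation = {
        "visible": [getattr(e, "name", repr(e)) for e in seen],  # 'name' may not exist; fall back
        "recent_events": memory[-10:],               # per-agent observation history
    }
    prompt = (
        "You are an agent in a discrete grid world.\n"
        f"Observation: {observation}\n"
        "Choose one action: move <dir>, speak <text>, announce <text>, or wait."
    )
    action = llm_complete(prompt)                    # one full LLM API call per turn (headless mode)
    memory.append({"obs": observation, "action": action})
    apply_action(agent, action)                      # hypothetical dispatcher into engine commands
    return action
```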

Two Operating Modes

Headless Simulation Mode

  • No framerate requirements
  • Sequential agent turns with full LLM API calls
  • All randomized NPC behaviors resolved and stored as concrete values (enabling deterministic replay)
  • Entities teleport (no animation)
  • Output: Simulation log with all decisions and world states

Animated Demo Mode

  • Replay stored simulation outputs
  • Full animation of movement paths
  • Step through simulation as animations complete
  • Deterministic playback (same random seeds)
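
One way to connect the two modes is a per-turn log that the headless run writes and the demo mode replays. The JSON shape below is purely illustrative, not a defined format.

```python
import json

# Illustrative log entry written during a headless run. Field names are
# hypothetical; the point is that every random roll and LLM decision is
# stored as a concrete value so the animated replay is deterministic.
turn_record = {
    "turn": 42,
    "rng_seed": 1337,
    "agent": "agent_b",
    "observation": {"visible": ["button", "door"], "messages": []},
    "llm_action": "speak 'press the button when I say go'",
    "npc_moves": [{"npc": "cat", "from": [3, 4], "to": [3, 5]}],  # resolved, not re-rolled
}

with open("simulation_log.jsonl", "a") as f:
    f.write(json.dumps(turn_record) + "\n")

# Animated demo mode then reads the log back and animates each stored move
# instead of re-querying the LLM or re-rolling NPC behavior.
```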

Research Questions (from proposal)

  1. Does grounding improve compositional generalization?
  2. Does physics randomization induce investigative behavior?
  3. What transfers between cognitive domains?
  4. Can agents develop accurate theories of mind for scripted NPCs?
  5. What emergent social behaviors arise from multi-agent interaction?
  6. How do agents resolve conflicts between visual, textual, and queried information?

Development Phases

Phase 1: Two Agents in a Room

  • 10x10 room with locked door
  • Two LLM agents, one button, one door
  • Agent A can reach the button; Agent B's goal is blocked by the door
  • Success criterion: the agents solve the coordination problem through emergent communication
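
A possible declarative spec for this scenario, kept engine-agnostic; every key name below is illustrative rather than an mcrfpy API.

```python
# Illustrative Phase 1 scenario spec; field names are hypothetical and
# describe the setup, not engine calls.
PHASE1_SCENARIO = {
    "map_size": (10, 10),
    "entities": {
        "agent_a": {"pos": (2, 2), "controller": "llm"},
        "agent_b": {"pos": (7, 7), "controller": "llm"},
        "button":  {"pos": (2, 5), "affordances": ["pressable"]},
        "door":    {"pos": (5, 7), "locked": True, "linked_to": "button"},
    },
    "goals": {"agent_b": "reach the tile behind the locked door"},
    "success": "agents coordinate via speech so agent_a presses the button "
               "while agent_b moves through the open door",
}
```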

Phase 2: Learning Middle-Level Behaviors

  • Add scripted NPCs (cat, guard, shop door)
  • Agents must predict and exploit NPC behavior patterns

Phase 3: Affordance Learning and Puzzles

  • ConceptNet-based object generation with varied properties
  • Physics randomization (same puzzle, different rules)
  • Compositional generalization testing
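
A sketch of what "same puzzle, different rules" could mean in practice, expressed as a randomized rule table; the affordance names echo Level 2, everything else is illustrative.

```python
import random

def randomize_rules(seed):
    """Re-roll the rules governing object affordances while keeping the
    puzzle layout fixed, so agents must investigate rather than memorize.

    The rule names and options are illustrative only.
    """
    rng = random.Random(seed)
    return {
        "button_behavior": rng.choice(["toggle", "hold_to_open", "one_shot"]),
        "boulder": {"pushable": rng.random() < 0.7},
        "door": {"openable_by": rng.choice(["button", "key", "either"])},
    }

rules = randomize_rules(seed=7)   # stored in the simulation log for deterministic replay
```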

Phase 4: Economic Reasoning

  • Shops, quest givers, inter-agent trade
  • Asymmetric information scenarios

Phase 5: Town and Dungeon Integration

  • Full scenario combining all elements
  • Emergent dynamics: reputation, party formation, betrayal

Blocking Issues

Engine infrastructure required:

  • #153 - Separate render loop from game state loop (critical for two-mode system)
  • #16 - Entity knowledge contents / per-entity perspective
  • #113 - Batch Operations for Grid (efficient state manipulation)
  • #114 - CellView API (convenient grid interaction)

Project-specific issues:

  • #155 - Deterministic Text Descriptions From Room Graph
  • #156 - Turn-based LLM Agent Orchestration

Related Issues

  • #55 - McRogueFace as Agent Simulation Environment (predecessor, will be closed early)
  • #67 - Grid Stitching (future: infinite world scenarios)

References

  • Casto, C., Ivanova, A., Fedorenko, E., & Kanwisher, N. (2025). What does it mean to understand language? arXiv:2511.19757
  • Park, J. S., et al. (2023). Generative agents: Interactive simulacra of human behavior. UIST 2023
  • Leike, J., et al. (2017). AI safety gridworlds. arXiv:1711.09883
  • Küttler, H., et al. (2020). The NetHack Learning Environment. NeurIPS 2020
john added the Major Feature, Demo Target, and priority:tier1-active labels 2025-12-01 16:15:11 +00:00
john (Author, Owner) commented:

Progress Update: FOV/Perspective System Complete

Commits c5b4200 and a529e5e implement the per-agent perspective rendering infrastructure:

New API

FOV Configuration:

  • mcrfpy.FOV enum (BASIC, DIAMOND, SHADOW, PERMISSIVE_0-8, RESTRICTIVE, SYMMETRIC_SHADOWCAST)
  • mcrfpy.default_fov module property
  • grid.fov and grid.fov_radius properties

ColorLayer Perspective Methods:

  • fill_rect(x, y, w, h, color) - Fill rectangular region
  • draw_fov(source, radius, fov, visible, discovered, unknown) - One-time FOV visualization
  • apply_perspective(entity, visible, discovered, unknown) - Bind layer to entity
  • update_perspective() - Refresh from bound entity's gridstate
  • clear_perspective() - Remove binding

Entity Methods:

  • entity.update_visibility() - Updates gridstate AND all bound ColorLayers
  • entity.visible_entities(fov=None, radius=None) - Get list of visible entities
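
Putting the new calls together, a minimal per-agent fog-of-war binding might look like the sketch below. The RGBA tuples and the assumption that a ColorLayer is already attached to the agent's grid are illustrative; the method calls themselves are the ones listed above.

```python
import mcrfpy

def bind_agent_perspective(layer, agent):
    """Bind a ColorLayer to one agent's point of view (fog of war).

    'layer' is assumed to be a ColorLayer already attached to the agent's
    grid, and the color tuples are illustrative values.
    """
    visible = (255, 255, 255, 0)      # transparent over currently visible cells
    discovered = (0, 0, 0, 160)       # dimmed over previously seen cells
    unknown = (0, 0, 0, 255)          # opaque over never-seen cells
    layer.apply_perspective(agent, visible, discovered, unknown)

def agent_sees(agent):
    """Refresh and query what one agent can currently see."""
    agent.update_visibility()         # updates gridstate AND all bound ColorLayers
    return agent.visible_entities(fov=mcrfpy.FOV.SHADOW, radius=8)
```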

Demo

tests/demo/perspective_patrol_demo.py - Interactive fog of war demonstration

Relevance to Phase 1

This provides the foundation for "per-agent perspective rendering for VLM input" mentioned in Level 3. Each agent can have its own ColorLayer showing what it can see, and visible_entities() enables AI decision-making based on line-of-sight.
