Learning to Search from Demonstration Sequences

Abstract

We study the problem of learning to search from demonstration sequences. Online search algorithms, such as Monte Carlo Tree Search (MCTS), iteratively simulate trajectories and update action-values to find good actions at decision time. Designing the right search heuristics to guide these simulations is challenging and typically requires substantial domain knowledge. In this work, we propose to learn the search heuristics directly from expert demonstration sequences, enabling the search algorithm to focus on the most promising parts of the search space without hand-crafted domain-specific knowledge. We show that our approach achieves strong performance on planning benchmarks including Sokoban and grid-world navigation, consistently outperforming baselines that do not leverage demonstration data.
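To make the simulate-and-update loop concrete, the following is a minimal sketch of an MCTS-style search in which a heuristic prior guides action selection. It is not the method proposed in this work: the chain-world environment, the PUCT-style selection rule, and the names `step`, `uniform_prior`, `Node`, `select_action`, `simulate`, and `search` are all assumptions made for illustration. The `prior_fn` argument marks the place where a heuristic learned from demonstrations could be plugged in.

```python
import math

# Toy chain world (an assumption for illustration): states 0..GOAL, actions -1/+1.
GOAL = 5
ACTIONS = (-1, 1)

def step(state, action):
    """Deterministic transition; reward 1.0 only when GOAL is reached."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def uniform_prior(state):
    """Placeholder heuristic; a learned model would replace this (hypothetical)."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}

class Node:
    def __init__(self, prior):
        self.prior = prior      # heuristic probability of the action leading here
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}      # action -> Node

    def q(self):
        """Mean return observed through this node (0.0 if unvisited)."""
        return self.value_sum / self.visits if self.visits else 0.0

def select_action(node, c_puct=1.5):
    """PUCT-style rule: action-value plus prior-weighted exploration bonus."""
    total = math.sqrt(node.visits + 1)
    return max(
        node.children,
        key=lambda a: node.children[a].q()
        + c_puct * node.children[a].prior * total / (1 + node.children[a].visits),
    )

def simulate(node, state, depth, prior_fn, gamma=0.95):
    """Simulate one trajectory and return the discounted return from `state`."""
    if depth == 0:
        return 0.0
    if not node.children:  # expand a leaf using the heuristic priors
        node.children = {a: Node(p) for a, p in prior_fn(state).items()}
    action = select_action(node)
    nxt, reward, done = step(state, action)
    child = node.children[action]
    if done:
        ret = reward
    else:
        ret = reward + gamma * simulate(child, nxt, depth - 1, prior_fn, gamma)
    # Update the action-value statistics along the simulated path.
    child.visits += 1
    child.value_sum += ret
    node.visits += 1
    return ret

def search(state, prior_fn=uniform_prior, n_simulations=200, depth=10):
    """Run simulations from `state` and return the most-visited root action."""
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        simulate(root, state, depth, prior_fn)
    return max(root.children, key=lambda a: root.children[a].visits)
```

For example, `search(4)` uses the uniform placeholder, while `search(4, prior_fn=lambda s: {-1: 0.1, 1: 0.9})` biases the simulations toward moving right. In this sketch the prior only shapes where simulations are spent; the final choice still depends on visit counts accumulated during search.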

Publication
In International Conference on Learning Representations
Dixant Mittal

My research interests include reinforcement learning, planning & search, large language models, and decision-making under uncertainty.