Using Posterior Variance Estimates to Improve Exploration in Monte Carlo Tree Search

Last updated on 25 May 2026

Reformulated the MCTS value estimate as a Gaussian posterior over each node’s true value, propagating both mean and variance up the tree during back-up to capture epistemic uncertainty in unexplored sub-trees. Replaced the standard UCB1 exploration bonus with a posterior-variance-based bonus, using Thompson Sampling to select the action branch to visit next rather than relying on visit-count heuristics alone. Demonstrated improved sample efficiency and stronger final policies on benchmark planning tasks compared to vanilla UCT at matched simulation budgets.
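The back-up and selection steps described above might look roughly like the following minimal sketch. It assumes a conjugate Gaussian prior at each node and a known observation-noise variance. The summary does not say how variance is actually propagated, so the update rule here is a stand-in, and the names `Node`, `thompson_select`, `backup`, and `noise_var` are hypothetical rather than taken from the original work.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    # Gaussian posterior N(mean, var) over this node's true value.
    mean: float = 0.0  # prior mean (assumed, not from the source)
    var: float = 1.0   # prior variance (assumed, not from the source)
    children: dict = field(default_factory=dict)  # action -> Node

    def update(self, ret: float, noise_var: float = 1.0) -> None:
        """Conjugate Gaussian update with a single observed return."""
        precision = 1.0 / self.var + 1.0 / noise_var
        self.mean = (self.mean / self.var + ret / noise_var) / precision
        self.var = 1.0 / precision


def thompson_select(node: Node, rng: random.Random):
    """Draw one sample from each child's posterior and pick the largest draw."""
    draws = {
        action: rng.gauss(child.mean, math.sqrt(child.var))
        for action, child in node.children.items()
    }
    return max(draws, key=draws.get)


def backup(path: list, ret: float, noise_var: float = 1.0) -> None:
    """Apply the same simulated return to every node on the visited path."""
    for node in path:
        node.update(ret, noise_var)
```

Under these assumptions, a rarely visited child keeps a wide posterior, so its sampled value occasionally exceeds that of a well-explored sibling. That is how the variance plays the role of the exploration bonus, with no explicit visit-count term.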

Dixant Mittal