Using Posterior Variance Estimates to Improve Exploration in Monte Carlo Tree Search
Reformulated the MCTS value estimate as a Gaussian posterior over each node’s true value, propagating both mean and variance up the tree during backup to capture epistemic uncertainty in unexplored subtrees. Replaced the standard UCB1 exploration bonus with a posterior-variance-based bonus, using Thompson sampling to select the next action branch to visit rather than relying on visit-count heuristics alone. Demonstrated improved sample efficiency and stronger final policies than vanilla UCT on benchmark planning tasks at matched simulation budgets.
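
A minimal sketch of the core idea, not the project's actual code: each node keeps a Gaussian posterior over its value, child selection draws one sample per child (Thompson sampling), and backup updates mean and variance along the visited path. The names `Node`, `update`, `thompson_select`, and `backup` are hypothetical, and the conjugate normal-normal update with a known return-noise variance `obs_var` is an assumption chosen for simplicity; the original work may propagate variance differently.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Search-tree node with a Gaussian posterior N(mean, var) over its value."""
    mean: float = 0.0   # prior mean (assumed)
    var: float = 1.0    # prior variance (assumed)
    visits: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def update(node: Node, ret: float, obs_var: float = 1.0) -> None:
    """Conjugate normal-normal update for one return with known noise obs_var."""
    precision = 1.0 / node.var + 1.0 / obs_var
    node.mean = (node.mean / node.var + ret / obs_var) / precision
    node.var = 1.0 / precision
    node.visits += 1


def thompson_select(node: Node, rng: random.Random) -> str:
    """Draw one value per child from its posterior and pick the largest draw."""
    draws = {
        action: rng.gauss(child.mean, math.sqrt(child.var))
        for action, child in node.children.items()
    }
    return max(draws, key=draws.get)


def backup(path: List[Node], ret: float, obs_var: float = 1.0) -> None:
    """Apply the posterior update to every node on the visited path, leaf to root."""
    for node in reversed(path):
        update(node, ret, obs_var)
```

Under these assumptions (unit prior variance, unit return noise), a node's variance after n backed-up returns is 1/(n+1), so rarely visited children keep wide posteriors and are sampled above their mean more often. Exploration in this sketch therefore comes from the posterior itself rather than from an explicit visit-count term.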