Lecture 9 | Sparse Markov Decision Processes

Markov Decision Processes (MDPs)

Maximum Entropy Probability Distribution

• The distribution that best describes a given dataset is the one with the highest entropy among all distributions that satisfy the constraints.

S_{q,k}(p) = \frac{k}{q-1}\left(1-\sum_i p_i^q\right)

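• With q = k = 1 (read as the limit q → 1, since the formula is 0/0 at q = 1), S_{q,k} reduces to the Boltzmann-Gibbs entropy, -\sum_i p_i \log p_i.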

• With q = 2 and k = \frac{1}{2}, S_{q,k} gives the sparse Tsallis entropy, \frac{1}{2}\left(1-\sum_i p_i^2\right).
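The two special cases of S_{q,k} can be checked numerically. The sketch below is only an illustration: the function name `tsallis_entropy` and the example distribution are assumptions, not part of the lecture.

```python
import numpy as np

def tsallis_entropy(p, q, k):
    """S_{q,k}(p) = k / (q - 1) * (1 - sum_i p_i^q), with the q -> 1 limit handled separately."""
    p = np.asarray(p, dtype=float)
    if np.isclose(q, 1.0):
        # Limit q -> 1: Boltzmann-Gibbs (Shannon) entropy scaled by k.
        nonzero = p[p > 0]
        return -k * np.sum(nonzero * np.log(nonzero))
    return k / (q - 1.0) * (1.0 - np.sum(p ** q))

p = np.array([0.5, 0.5])  # hypothetical example distribution
print(tsallis_entropy(p, q=1, k=1))       # Boltzmann-Gibbs entropy: log(2), about 0.693
print(tsallis_entropy(p, q=2, k=0.5))     # sparse Tsallis entropy: 0.25
print(tsallis_entropy(p, q=1.0001, k=1))  # general formula near q = 1, close to log(2)
```

Evaluating the general formula at q slightly above 1 gives a value close to the Boltzmann-Gibbs entropy, which is why the q = 1 case is read as a limit.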

Sparse Bellman Equation

• Actions are ordered by their Q-values, and those whose values fall below a threshold are cut off, that is, assigned zero probability.
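The cut-off can be sketched with the standard sparsemax projection: sort the values, find how many of them stay above the threshold, and clip the rest to zero. This is a minimal sketch, not the lecture's own code. The scaling by a regularization coefficient `alpha` and the example Q-values are assumptions made for illustration.

```python
import numpy as np

def sparsemax(z):
    """Return (p, tau): the sparsemax distribution of z and its threshold tau."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # values in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1.0 + k * z_sorted > cumsum  # holds for the top-K values only
    K = k[in_support][-1]                     # number of actions that survive the cut-off
    tau = (cumsum[K - 1] - 1.0) / K           # threshold
    p = np.maximum(z - tau, 0.0)              # values below tau get probability zero
    return p, tau

q_values = np.array([1.0, 0.8, 0.1])  # hypothetical Q(s, a) for three actions
alpha = 1.0                           # assumed regularization coefficient
policy, tau = sparsemax(q_values / alpha)
print(policy, tau)
```

For these values the threshold is 0.4, so the policy is approximately [0.6, 0.4, 0]: the third action falls below the threshold and is never selected, whereas a softmax over the same values would still give it nonzero probability.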

Performance Error Bounds

•

Experiment: Reinforcement Learning

• The sparsemax distribution explores the action space more efficiently than the other methods it was compared against.