I found some useful papers on PAC learning for MDPs and the associated VC (and …) dimensions. To read later:
Rahul Jain and Pravin Varaiya, “PAC Learning for Markov Decision Processes and Dynamic Games,” ?, 2004.
R. Jain and P. Varaiya, “Extension of PAC Learning for Partially Observable Markov Decision Processes,” ?, 2004.
Yishay Mansour, “Reinforcement Learning and Mistake Bounded Algorithms,” ?, 1999.