First Steps
This page will guide you through the first steps of using the pymdps package. We will first make sure that the package is installed correctly, and then we will create a simple Markov Decision Process (MDP) and solve it using the Value Iteration algorithm.
Installation
To install the package, you can use pip:
This will install the latest version of the package from PyPI. If you want to install the latest development version from GitHub, you can use:
Creating an MDP
A Markov Decision Process (MDP) is defined by a tuple \((S, A, P, R, \gamma)\) where \(S\) is the set of states, \(A\) is the set of actions, \(P(s' \mid s, a)\) is the transition probability function, \(R(s, a, s')\) is the reward function, and \(\gamma\) is the discount factor.
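Value Iteration, the algorithm named at the top of this page, repeatedly applies the Bellman optimality update until the values stop changing. Written with the symbols defined above, the update looks as follows. The iteration index \(k\) and the value estimate \(V_k\) are notation introduced here for illustration and are not taken from the package:

\[
V_{k+1}(s) = \max_{a \in A} \sum_{s' \in S} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V_k(s') \right]
\]

For \(0 \le \gamma < 1\) and bounded rewards, this update is a contraction, so \(V_k\) converges to the optimal value function \(V^*\) from any starting point.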
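The example below defines a small three-state MDP by subclassing BaseMDP and hands it to MDPSolver:

from pymdps import BaseMDP, MDPSolver


class MyMDP(BaseMDP):
    def __init__(self):
        super().__init__()

    def states(self):
        # The state space S.
        return ['s1', 's2', 's3']

    def actions(self, state):
        # Actions available in the given state.
        return ['a1', 'a2'] if state == 's1' else ['a3', 'a4']

    def transition_probabilities(self, state, action):
        # Maps each reachable next state to its probability (each distribution sums to 1).
        return {'s1': 0.5, 's2': 0.5} if action == 'a1' else {'s3': 1.0}

    def reward(self, state, action, next_state):
        # Constant reward of 1.0 for every transition.
        return 1.0


if __name__ == '__main__':
    mdp = MyMDP()
    solver = MDPSolver(mdp)
    solver.solve()
    print('Done')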
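To see what solving this MDP involves, here is a minimal plain-Python sketch of value iteration on the same three-state example. It does not use pymdps and makes no claim about how MDPSolver works internally. The discount factor GAMMA, the tolerance TOL, and the helper names q_value, value_iteration, and greedy_policy are assumptions chosen for illustration.

```python
# Plain-Python value iteration on the toy MDP above (no pymdps required).
# GAMMA and TOL are illustrative assumptions, not values from the pymdps docs.
GAMMA = 0.9
TOL = 1e-10

STATES = ['s1', 's2', 's3']


def actions(state):
    return ['a1', 'a2'] if state == 's1' else ['a3', 'a4']


def transition_probabilities(state, action):
    return {'s1': 0.5, 's2': 0.5} if action == 'a1' else {'s3': 1.0}


def reward(state, action, next_state):
    return 1.0


def q_value(values, state, action):
    """Expected reward of `action` in `state` plus discounted value of the next state."""
    return sum(
        prob * (reward(state, action, nxt) + GAMMA * values[nxt])
        for nxt, prob in transition_probabilities(state, action).items()
    )


def value_iteration():
    """Apply the Bellman optimality update until the largest change is below TOL."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {s: max(q_value(values, s, a) for a in actions(s)) for s in STATES}
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < TOL:
            return values


def greedy_policy(values):
    """Pick, for each state, an action that maximizes the one-step lookahead value."""
    return {s: max(actions(s), key=lambda a: q_value(values, s, a)) for s in STATES}


if __name__ == '__main__':
    values = value_iteration()
    print(values)
    print(greedy_policy(values))
```

Because every transition in this example pays a reward of 1.0, each state's value approaches 1 / (1 - GAMMA), which is 10 for the assumed GAMMA of 0.9. All actions are therefore equally good, and the greedy policy is only one of several optimal choices.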