First Steps

This page walks you through your first steps with the pymdps package. First we install the package and make sure it is installed correctly; then we define a simple Markov Decision Process (MDP) and solve it with the Value Iteration algorithm.

Installation

To install the package, you can use pip:

pip install pymdps

This installs the latest released version of the package from PyPI. To install the latest development version directly from GitHub instead, use:

pip install git+https://github.com/duncaneddy/pymdps
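To check that the installation worked, you can ask Python's packaging metadata which version of the pymdps distribution is installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical and is not part of pymdps.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pymdps") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    installed_version is a hypothetical helper for this guide, not a pymdps API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("pymdps is not installed")
    else:
        print(f"pymdps {v} is installed")
```

If this reports that pymdps is missing, make sure the pip you ran belongs to the same Python interpreter you are using.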

Creating an MDP

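A Markov Decision Process (MDP) is defined by a tuple \((S, A, P, R, \gamma)\) where \(S\) is the set of states, \(A\) is the set of actions, \(P(s' \mid s, a)\) is the transition probability function, \(R(s, a, s')\) is the reward function, and \(\gamma\) is the discount factor. In pymdps, you describe an MDP by subclassing BaseMDP and implementing methods that return the states, the actions available in each state, the transition probabilities (a mapping from next states to probabilities), and the reward for each transition.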
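To make the tuple concrete, here is a sketch that writes each component out as plain Python data for the same toy MDP that the pymdps example on this page defines. It does not use pymdps at all, and the discount factor of 0.9 is an assumed value, since the pymdps example does not set one.

```python
# Plain-Python encoding of (S, A, P, R, gamma) for the toy three-state MDP.
# Not pymdps code; GAMMA = 0.9 is an assumed value for illustration.

S = ["s1", "s2", "s3"]

# A(s): the actions available in each state
A = {
    "s1": ["a1", "a2"],
    "s2": ["a3", "a4"],
    "s3": ["a3", "a4"],
}


def P(s, a):
    """Return the distribution P(. | s, a) as {next_state: probability}."""
    return {"s1": 0.5, "s2": 0.5} if a == "a1" else {"s3": 1.0}


def R(s, a, s_next):
    """Reward R(s, a, s'); constant in this toy example."""
    return 1.0


GAMMA = 0.9  # assumed discount factor


def check_transitions():
    """Check that every P(. | s, a) is a valid distribution over S."""
    for s in S:
        for a in A[s]:
            dist = P(s, a)
            assert all(nxt in S for nxt in dist), (s, a)
            assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)


check_transitions()
```

Checking that each P(. | s, a) sums to 1 is a cheap sanity test worth running on any hand-written transition function before solving.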


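from pymdps import BaseMDP, MDPSolver


class MyMDP(BaseMDP):
    """A toy three-state MDP."""

    def __init__(self):
        super().__init__()

    def states(self):
        # S: the set of states
        return ['s1', 's2', 's3']

    def actions(self, state):
        # A(s): the actions available in each state
        return ['a1', 'a2'] if state == 's1' else ['a3', 'a4']

    def transition_probabilities(self, state, action):
        # P(s' | s, a) as {next_state: probability}; each mapping sums to 1
        return {'s1': 0.5, 's2': 0.5} if action == 'a1' else {'s3': 1.0}

    def reward(self, state, action, next_state):
        # R(s, a, s'): every transition earns a reward of 1
        return 1.0


if __name__ == '__main__':
    mdp = MyMDP()

    # Create a solver for the MDP and solve it
    solver = MDPSolver(mdp)
    solver.solve()

    print('Done')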
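The pymdps example above only prints 'Done', so it does not show what solving the MDP computes. The following self-contained sketch runs textbook value iteration on the same toy MDP in plain Python: it repeats the Bellman optimality backup \(V(s) \leftarrow \max_a \sum_{s'} P(s' \mid s, a)\,[R(s, a, s') + \gamma V(s')]\) until the values stop changing, then reads off a greedy policy. It is an illustration of the algorithm, not pymdps's implementation; the discount factor (0.9) and stopping tolerance are assumed values.

```python
# Value iteration on the toy MDP, written without pymdps.
# GAMMA and TOL are assumed values chosen for illustration.
from typing import Dict, List, Tuple

STATES: List[str] = ["s1", "s2", "s3"]
GAMMA = 0.9  # assumed discount factor
TOL = 1e-8   # stop once no state value changes by more than this


def actions(state: str) -> List[str]:
    return ["a1", "a2"] if state == "s1" else ["a3", "a4"]


def transition_probabilities(state: str, action: str) -> Dict[str, float]:
    return {"s1": 0.5, "s2": 0.5} if action == "a1" else {"s3": 1.0}


def reward(state: str, action: str, next_state: str) -> float:
    return 1.0


def q_value(V: Dict[str, float], s: str, a: str) -> float:
    """Expected return of taking action a in state s, then valuing successors with V."""
    return sum(
        p * (reward(s, a, s_next) + GAMMA * V[s_next])
        for s_next, p in transition_probabilities(s, a).items()
    )


def value_iteration() -> Tuple[Dict[str, float], Dict[str, str]]:
    V = {s: 0.0 for s in STATES}
    while True:
        # Synchronous Bellman optimality backup over all states
        V_new = {s: max(q_value(V, s, a) for a in actions(s)) for s in STATES}
        delta = max(abs(V_new[s] - V[s]) for s in STATES)
        V = V_new
        if delta < TOL:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(actions(s), key=lambda a: q_value(V, s, a)) for s in STATES}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    print(V)
    print(policy)
```

Because every transition earns a reward of 1, each state's value converges to 1 / (1 - gamma), which is 10 for gamma = 0.9, and all actions are equally good. With gamma = 1 this infinite-horizon sum would diverge, which is why a discount factor below 1 matters for this MDP.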