Course Project: Reinforcement Learning

  • Multi-agent collision-free path planning algorithm in unknown and deterministic environment.
  • Built the long short-term memory (LSTM) with the self-attention mechanism to encode the other agents’ states.
  • Fed the agent’s own state and the other agents’ states into two multilayer perceptrons (MLP) to output the optimal value function and the optimal policy.