Course Project: Reinforcement Learning

Multi-agent collision-free path planning algorithm in unknown and deterministic environment.
Built the long short-term memory (LSTM) with the self-attention mechanism to encode the other agents’ states.
Fed the agent’s own state and the other agents’ states into two multilayer perceptrons (MLP) to output the optimal value function and the optimal policy.

Share on

Twitter Facebook LinkedIn