how to use ppo for gpu-parallelised robot manipulation
there are lots of tricks beyond what was introduced; here I keep a list of the ones I have found relevant for continuous-control robotic manipulation.
resources:
Adam optimizer (PyTorch signature with its defaults): `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)`
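A minimal sketch of instantiating Adam for a PPO update; the parameter tensor and the `lr`/`eps` values here are illustrative choices, not prescribed by the line above (many PPO codebases lower `eps` from the 1e-8 default, but that is a tuning decision):

```python
import torch

# a single parameter tensor standing in for a policy network (assumed shape)
params = [torch.nn.Parameter(torch.ones(4))]

# PyTorch defaults except lr and eps, which are common PPO tuning choices
optimizer = torch.optim.Adam(params, lr=3e-4, betas=(0.9, 0.999),
                             eps=1e-5, weight_decay=0)

# one gradient step on a toy quadratic loss
loss = (params[0] ** 2).sum()
loss.backward()
optimizer.step()
```
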
orthogonal initialization of weights and constant initialization of biases: policy head gain 0.01, value head gain 1, encoder gain √2.
entropy: Andrychowicz et al. (2021) overall find no evidence that the entropy term improves performance on continuous-control environments (decision C13, figures 76 and 77).
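For context, a sketch of where the entropy coefficient enters a PPO-style loss; the distribution parameters and coefficient value are placeholders, not from the source:

```python
import torch
from torch.distributions import Normal

# hypothetical batch of Gaussian action-distribution parameters
mean = torch.zeros(32, 2)
std = torch.ones(32, 2)
dist = Normal(mean, std)

ent_coef = 0.0  # the finding above suggests 0 is a reasonable default
entropy_bonus = ent_coef * dist.entropy().sum(-1).mean()

# the bonus is subtracted from the total loss, so entropy is maximised:
# loss = policy_loss + vf_coef * value_loss - entropy_bonus
```
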
continuous actions via normal distributions: the Gaussian is the most popular choice,
while Haarnoja et al. (2018) use a state-dependent standard deviation, i.e. the network outputs the mean and the standard deviation together.
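A sketch of such a state-dependent-std Gaussian head; the class name, layer sizes, and the log-std clamp range are assumptions (the clamp is a common stability trick, not something the source specifies):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianPolicy(nn.Module):
    # the same trunk outputs both mean and log-std, so the std
    # depends on the state, as in Haarnoja et al. (2018)
    def __init__(self, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.trunk(obs)
        mean = self.mean_head(h)
        # clamp log-std for numerical stability (assumed range)
        log_std = self.log_std_head(h).clamp(-20, 2)
        return Normal(mean, log_std.exp())

policy = GaussianPolicy()
dist = policy(torch.zeros(4, 8))  # batch of 4 observations
action = dist.sample()            # shape (4, 2)
```

The common PPO alternative is a state-independent learned log-std parameter shared across all states; the variant above instead lets the exploration noise vary per state.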