Hi, thank you for the quality code. I'm wondering why the critic loss on the walker_stand task gets so high (up to 1e+3) in my experiment. I used your conda.yaml and changed `env: walker_stand`, `action_repeat: 2`, and `batch_size: 512`, as mentioned in the paper. How can I get a stable critic loss (for example, via reward scaling)?
Thank you for reading.
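To clarify what I mean by reward scaling: a minimal sketch of what I am considering, with a hypothetical `scale_rewards` helper and a `scale` value I picked for illustration (not from your repo):

```python
import numpy as np

def scale_rewards(rewards, scale=0.1):
    """Multiply raw environment rewards by a constant factor before they
    enter the Bellman target, shrinking the critic's regression targets
    (and hence the critic loss) without changing the optimal policy."""
    return np.asarray(rewards, dtype=np.float32) * scale

# DMC rewards are in [0, 1] per step; with action_repeat=2 the summed
# per-transition reward can reach ~2, so a small scale keeps targets modest.
batch_rewards = np.array([2.0, 1.5, 0.0])
scaled = scale_rewards(batch_rewards)
```

Would applying something like this to the sampled rewards in the critic update be the right approach, or is there a better way?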