[rllib] Misc fixes, A2C (#2679)

A bunch of minor rllib fixes:

- pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
- move reward clipping to policy evaluator
- add a2c variant of a3c
- reduce vision network fc layer size to 256 units
- switch to 84x84 images
- doc tweaks
- print timesteps in tune status
Eric Liang
2018-08-20 15:28:03 -07:00
committed by GitHub
parent 880ef1bd21
commit fbe6c59f72
34 changed files with 220 additions and 129 deletions
@@ -43,6 +43,7 @@ class SyncSamplesOptimizer(PolicyOptimizer):
                 else:
                     samples.append(self.local_evaluator.sample())
             samples = SampleBatch.concat_samples(samples)
+            self.sample_timer.push_units_processed(samples.count)

         with self.grad_timer:
             for i in range(self.num_sgd_iter):
@@ -64,5 +65,7 @@ class SyncSamplesOptimizer(PolicyOptimizer):
                                         3),
                 "opt_peak_throughput": round(self.grad_timer.mean_throughput,
                                              3),
+                "sample_peak_throughput": round(
+                    self.sample_timer.mean_throughput, 3),
                 "opt_samples": round(self.grad_timer.mean_units_processed, 3),
             })
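
The two hunks above work together: the sampler now reports how many timesteps each sampling pass produced, and `stats()` exposes the resulting throughput. A minimal sketch of how such a timer could derive `mean_throughput` from pushed unit counts follows. `ThroughputTimer` is a hypothetical stand-in written for illustration, not RLlib's actual timer class, and the formula (total units divided by total seconds) is an assumption about the intended semantics.

```python
import time


class ThroughputTimer:
    """Hypothetical stand-in for the optimizer's timer objects.

    Assumption: throughput is total units processed divided by total
    seconds spent inside the timed blocks.
    """

    def __init__(self):
        self._durations = []  # seconds spent in each timed block
        self._units = []      # units (e.g. timesteps) reported per block
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._durations.append(time.perf_counter() - self._start)
        self._start = None

    def push(self, seconds):
        """Record a duration directly, without using the context manager."""
        self._durations.append(float(seconds))

    def push_units_processed(self, n):
        """Record how many units the most recent block handled."""
        self._units.append(float(n))

    @property
    def mean(self):
        if not self._durations:
            return 0.0
        return sum(self._durations) / len(self._durations)

    @property
    def mean_units_processed(self):
        if not self._units:
            return 0.0
        return sum(self._units) / len(self._units)

    @property
    def mean_throughput(self):
        total_time = sum(self._durations)
        if not total_time:
            return 0.0
        return sum(self._units) / total_time


# Two sampling passes of 2 seconds each, yielding 100 and 300 timesteps.
timer = ThroughputTimer()
timer.push(2.0)
timer.push_units_processed(100)
timer.push(2.0)
timer.push_units_processed(300)
print(timer.mean_throughput)  # 100.0 timesteps per second
```

Under this sketch, a timer that never receives unit counts reports a throughput of 0, which is why the added `push_units_processed(samples.count)` call is what makes the new `sample_peak_throughput` stat meaningful.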