Adversarial Policies
Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Abstract
Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify the victim's observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a shared environment, so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum simulated robotics games, against state-of-the-art victims trained via self-play to be robust to adversaries. The adversarial policies reliably win against the victims despite not actually playing the game: they simply fall to the ground and look similar to a random policy. We find the attack is more successful in high-dimensional environments, and that it induces activations in the victim's policy network that are highly unlikely to occur when playing against a normal opponent.
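To make the threat model concrete, here is a minimal sketch of the attack setup: the victim's parameters are frozen, and only the adversary is trained, so the adversary can influence the victim solely through the observations it creates by acting in the shared environment. Everything in the sketch is an illustrative stand-in, not the paper's actual setup: the toy environment (ToyTwoPlayerEnv), the linear policies, and the REINFORCE update are hypothetical simplifications (the paper trains adversaries with PPO against pretrained victims in MuJoCo robotics games).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a two-player zero-sum environment.
# Both agents act on a shared state; the adversary is rewarded only at the
# end of the episode, when the victim "loses".
OBS_DIM, N_ACTIONS, HORIZON = 8, 4, 25

class ToyTwoPlayerEnv:
    def reset(self):
        self.t = 0
        self.state = rng.normal(size=OBS_DIM)
        return self.state

    def step(self, victim_action, adversary_action):
        # The shared state is driven by both agents' actions, so the adversary
        # shapes the *natural* observations the victim receives.
        self.state += 0.1 * (adversary_action - victim_action) * rng.normal(size=OBS_DIM)
        self.t += 1
        done = self.t >= HORIZON
        # Zero-sum outcome: the adversary wins if it pushes the state past a threshold.
        adversary_reward = float(done and self.state.sum() > 0.0)
        return self.state, adversary_reward, done

# Frozen victim: a fixed (here random, in practice pretrained) linear policy.
W_victim = rng.normal(size=(OBS_DIM, N_ACTIONS))

def victim_action(obs):
    return int(np.argmax(obs @ W_victim))  # victim parameters are never updated

# Adversary: a linear softmax policy trained with REINFORCE.
W_adv = np.zeros((OBS_DIM, N_ACTIONS))

def adversary_probs(obs):
    logits = obs @ W_adv
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

env = ToyTwoPlayerEnv()
lr = 0.05
for episode in range(500):
    obs, done, grads, episode_return = env.reset(), False, [], 0.0
    while not done:
        probs = adversary_probs(obs)
        a = rng.choice(N_ACTIONS, p=probs)
        # Gradient of log pi(a | obs) for a linear softmax policy.
        one_hot = np.eye(N_ACTIONS)[a]
        grads.append(np.outer(obs, one_hot - probs))
        obs, reward, done = env.step(victim_action(obs), a)
        episode_return += reward
    for g in grads:  # REINFORCE update: only the adversary learns
        W_adv += lr * episode_return * g
```

The key design choice mirrored here is that the attack never touches the victim's weights or observations directly; the adversary only selects its own actions, and any influence on the victim flows through the environment dynamics.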
Explore Videos
Use "Load Preset" to compare videos from the same environment. Use the "Add" button to explore beyond the presets. By default, the videos are synchronized. Click the lock icon to unlock the playback controls.