Published on October 26, 2023, 8:37 am

  • Google is expanding its vulnerability rewards program (VRP) to include specific attack scenarios related to generative AI. This expansion aims to incentivize research around AI safety and security and shed light on potential issues in generative AI models. The VRP now covers attacks such as prompt injection, training-data extraction, model manipulation, and model theft. Monetary rewards vary based on severity, with up to $31,337 available for uncovering critical vulnerabilities. Google's initiative demonstrates a commitment to the safety and reliability of generative AI while fostering a community of researchers.
  • Google Expands Vulnerability Rewards Program for Generative AI Safety

Google is taking steps to ensure the safety and security of generative AI by expanding its vulnerability rewards program (VRP). The VRP incentivizes ethical hackers to identify and responsibly disclose security flaws, and now it includes attack scenarios specific to generative AI.

By broadening the scope of the VRP, Google aims to encourage research around AI safety and security. This expansion will shed light on potential issues that can arise in generative AI models, ultimately making AI safer for everyone involved.

To address the unique challenges posed by generative AI, such as unfair bias or model manipulation, Google sought to rethink how bugs are categorized and reported. The company drew inspiration from its newly formed AI Red Team, a group of skilled hackers who simulate various adversaries like nation-states and hacktivists. Their mission is to identify vulnerabilities in technology.

During a recent exercise conducted by the AI Red Team, significant threats were discovered in large language models (LLMs) used in products like ChatGPT and Google Bard. One vulnerability identified was prompt injection attacks, where hackers craft prompts that can influence the behavior of the model. Adversarial prompts could generate harmful or offensive text or leak sensitive information. Another type of attack called training-data extraction allows hackers to reconstruct training examples verbatim, potentially extracting personally identifiable information or passwords.

Google’s expanded VRP now covers these types of attacks along with model manipulation and model theft attacks. However, rewards will not be offered for bugs related to copyright issues or data extraction that reconstructs non-sensitive or public information.

The monetary rewards for uncovering vulnerabilities will vary based on their severity. For instance, researchers can earn up to $31,337 if they discover command injection attacks or deserialization bugs in highly sensitive applications like Google Search or Google Play. Lower priority apps carry a maximum reward of $5,000.

In 2022 alone, Google paid out over $12 million in rewards to security researchers. The company’s commitment to generative AI goes beyond cybersecurity as it aims to foster safer and more reliable applications of this cutting-edge technology.

By expanding its vulnerability rewards program, Google demonstrates its dedication to ensuring the safety and integrity of generative AI. This initiative not only enhances the security of AI systems but also fosters a community of researchers working towards making AI technology more robust and trustworthy.


Comments are closed.