Published on November 1, 2023, 8:02 am

TLDR: Singapore's Infocomm Media Development Authority (IMDA) and the AI Verify Foundation are leading an initiative called Sandbox to establish standardized benchmarks and testing protocols for generative AI products. Supported by companies such as Amazon Web Services, Google, and Microsoft, the goal is to create a common approach for evaluating these applications and addressing associated risks. The initiative aims to develop a draft catalog of widely-used technical testing tools and recommend a baseline set of tests for evaluating generative AI products. By establishing a common language and facilitating broader adoption, the initiative promotes safety and trust in the use of generative AI.

Efforts are underway to establish standardized benchmarks and testing protocols for generative artificial intelligence (AI) products. The goal is to create a common approach to evaluating these applications and address the associated risks. This initiative, called Sandbox, is led by Singapore’s Infocomm Media Development Authority (IMDA) and AI Verify Foundation, with support from global market players like Amazon Web Services (AWS), Anthropic, Google, and Microsoft.

Sandbox aims to develop a draft catalog that categorizes existing benchmarks and evaluation methods for large language models (LLMs). The catalog will compile widely-used technical testing tools and recommend a baseline set of tests for evaluating generative AI products. This approach will establish a common language and facilitate the broader adoption of generative AI while ensuring safety and trust in its use.

IMDA emphasizes that systematic and robust evaluation of models is crucial for LLM governance as it helps build trust in these technologies. Through rigorous evaluation, developers can understand a model’s capabilities, intended uses, and limitations. It also provides developers with valuable insights on how to improve their models.

To achieve this common standard, IMDA acknowledges the need for standardized taxonomy and pre-deployment safety evaluations for LLMs. The agency hopes that the draft catalog will initiate global discussions leading to consensus on safety standards in this field.

Moving towards common standards also requires involving stakeholders beyond model developers – including application developers building on top of models and developers of third-party testing tools. IMDA’s Sandbox project aims to demonstrate how different players in the ecosystem can collaborate effectively. For instance, model developers like Anthropic or Google can work alongside app developers OCBC or Singtel, along with third-party testers such as Deloitte and EY, on generative AI use cases in sectors like financial services or telecommunications. Regulators like Singapore’s Personal Data Protection Commission should also be involved to create an environment that encourages experimentation, development, transparency, and meets the needs of all parties involved.

Through Sandbox, IMDA expects to identify gaps in current generative AI evaluations. These include domain-specific applications like human resources and areas specific to different cultures, which are currently underdeveloped. The agency aims to develop benchmarks for evaluating model performance in these specific areas, taking into account cultural and language specificities.

IMDA is collaborating with Anthropic on a Sandbox project that focuses on red-teaming, which challenges existing policies and assumptions used in AI systems. By adopting an adversarial approach, the project aims to evaluate AI models within Singapore’s diverse linguistic and cultural landscape. This will help identify aspects for improvement and ensure the models perform well within the country’s multilingual context.

In July, the Singapore government launched two sandboxes powered by Google Cloud’s generative AI toolsets. One sandbox is exclusively used by government agencies to develop and test generative AI applications, while the other is available to local organizations for up to 100 use cases at no cost for three months.

Overall, the Sandbox initiative led by IMDA and its partners aims to establish standardized benchmarks and testing protocols for generative AI. By creating a common approach globally, this effort promotes safety, trustworthiness, and broader adoption of generative AI applications across various sectors.

Share.

Comments are closed.