China proposes stricter curbs on training data and models used to build generative AI services in bid to tighten security

  • Draft targets two key areas for improvement – the security of raw training data and large language models used to build the generative AI services
  • The draft proposes a blacklist system to block training data materials that contain more than 5 per cent of illegal content

China proposes tougher rules for generative AI. Photo: Shutterstock
Ben Jiang in Beijing

China is planning stricter curbs on how generative artificial intelligence (AI) services are applied in the country, as authorities attempt to strike a balance between harnessing the benefits of the technology and mitigating the risks.

New draft guidance published on Wednesday by the National Information Security Standardisation Technical Committee, an agency that enacts standards for IT security, targets two key areas for improvement – the security of raw training data and the large language models (LLMs) used to build the generative AI services.

The draft stipulates that AI training materials should not infringe copyright or breach personal data security. It also requires that training data first pass security checks and be processed by authorised data labellers and reviewers.

Second, the LLMs that developers build – the deep learning algorithms trained on massive data sets that power generative AI chatbots such as Baidu’s Ernie Bot – should be based on foundational models filed with and licensed by authorities, according to the draft.

The draft also proposes a blacklist system to block training data sources that contain more than 5 per cent illegal content, along with information deemed harmful under the nation’s cybersecurity laws.