In the early hours of December 9th, the European Union Parliament and Council reached a provisional agreement on the contents of the Artificial Intelligence Act (AIA). In this blog post, we summarize the main contents of the AIA and discuss its possible implications and open questions, using the development and deployment of Large Language Models (LLMs) as an example.
The short version
The EU’s Artificial Intelligence Act aims to govern the development and deployment of AI systems in the EU, while ensuring that these systems are safe and respect the health, safety, and fundamental rights and freedoms of EU citizens. The provisional agreement states that the Act will apply two years after its entry into force (i.e. following its publication in the Official Journal of the EU), shortened to six months for the bans it contains. The Act most notably impacts AI system deployers, who are regulated according to the risk category of their use case. On the generative AI side, foundation model developers face significant requirements for transparency, safeguards, and testing.
Digging a little deeper
The first draft of the Act was published in April 2021, and its final version is currently undergoing the EU legislative procedure. After the latest agreement, the Act still needs to be confirmed by both the Parliament and the Council, as well as undergo legal-linguistic revisions, before formal adoption.
The Act defines an “AI system” as a machine-based system that, with varying levels of autonomy and for explicit or implicit objectives, generates outputs such as predictions, recommendations, or decisions that can influence physical or virtual environments. The regulation applies to providers, deployers, and distributors of AI systems as well as “affected persons”, meaning individuals or groups of persons who are subject to or otherwise affected by an AI system.
The AIA establishes varying obligations for developers and deployers of AI systems, depending on which risk classification the system in question falls into. The Act presents four risk categories, namely:
- Unacceptable risk: AI systems that are a clear threat to the safety, livelihoods, and rights of individuals (e.g. systems used for social scoring and systems that exploit vulnerable groups such as children). The use of these systems is prohibited.
- High risk: AI systems that pose a significant risk of harm to the health, safety, or fundamental rights of individuals. Examples of high-risk AI systems include those used for the management of critical infrastructure, education, employment, law enforcement, and border control. High-risk systems will be subject to strict obligations before they can be placed on the market: providers and deployers of these systems must, for instance, develop a risk management process for risk identification and mitigation; apply appropriate data governance & management practices to training, validation, and testing data sets; enable human oversight; ensure technical robustness and cybersecurity; as well as draw up documentation that demonstrates AIA compliance. (For a complete list of obligations, see Arts. 9-17 AIA.)
- Limited risk: Examples of limited-risk AI systems include systems intended to interact with individuals, e.g. chatbots and deep fakes. The compliance obligations for limited-risk AI focus on transparency: users of these systems must be clearly informed that they are interacting with an AI system.
- Minimal risk: Examples of minimal risk AI include spam filters, AI-enabled video games, and inventory management systems. The AIA allows for the free use of minimal risk AI.
The risk categories have fluctuated throughout the drafting stages of the AIA.
Implications for model developers and deployers
AI model and application developers are, understandably, anxious about the Act, because it has the potential to fundamentally reshape how models are developed and used. As the AIA proposal phase is being finalized, it is important to consider possible scenarios and think about the impact the Act would have on different groups in the AI field.
Let’s consider the hottest AI topic of 2023: Large Language Models (LLMs). One way to view the LLM lifecycle is to divide it into three phases (upstream to downstream): foundation model (FM) development, fine-tuning, and deployment. What possible implications would the AI Act have on these phases?
Foundation model developers are the ones doing the “heavy lifting”. They design the model architecture, scrape together and process the enormous datasets required to pre-train the model, and execute the actual pre-training, during which the model learns most of its capabilities. These are organizations backed by significant resources, since gathering the data and especially the compute-intensive pre-training are expensive activities. Having the most impact on the model itself, an FM developer will, according to the current proposal, be regulated relative to the cumulative amount of compute used for model training. For example, an FM classified as “high-impact” (more than 10^25 floating point operations during training) would face stricter transparency requirements concerning, for instance, the disclosure of copyrighted training material. This is a heavy requirement: the amount of data required for pre-training is so massive that its collection is highly automated, leaving only minimal control over the content itself. An interesting detail is that according to the latest agreement, open-source models will be subject to lighter regulation.
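To give a sense of scale for that 10^25 threshold, here is a back-of-envelope sketch using the widely cited approximation that training compute is roughly 6 × parameters × training tokens. The model sizes and token counts below are illustrative assumptions, not figures for any specific model or any official method of measurement.

```python
# Back-of-envelope check against the AIA's 10^25 FLOP "high-impact" threshold.
# Uses the common approximation: training compute ≈ 6 × parameters × training tokens.
# The model scales used here are hypothetical, purely for illustration.

HIGH_IMPACT_THRESHOLD_FLOPS = 1e25  # threshold named in the provisional agreement


def estimated_training_flops(n_parameters: float, n_tokens: float) -> float:
    """Rough estimate of total training compute in floating point operations."""
    return 6 * n_parameters * n_tokens


def is_high_impact(n_parameters: float, n_tokens: float) -> bool:
    """Would a model of this scale cross the 10^25 FLOP threshold?"""
    return estimated_training_flops(n_parameters, n_tokens) > HIGH_IMPACT_THRESHOLD_FLOPS


# A hypothetical 70B-parameter model trained on 2 trillion tokens:
print(f"{estimated_training_flops(70e9, 2e12):.2e}")   # ~8.40e+23 → below threshold
print(is_high_impact(70e9, 2e12))                       # False

# A hypothetical 1-trillion-parameter model trained on 10 trillion tokens:
print(f"{estimated_training_flops(1e12, 10e12):.2e}")  # ~6.00e+25 → above threshold
print(is_high_impact(1e12, 10e12))                      # True
```

Under this rough approximation, only the very largest training runs would cross into the “high-impact” tier, which is presumably the intent of the threshold.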
Fine-tuners have a smaller, yet significant, impact on the model. They take a pre-trained FM and continue training it on a smaller, more specialized dataset. In a way, they perform the same manipulations on the model as the FM developer, just on a smaller scale. An interesting question follows: how will the AIA distinguish between them? Will fine-tuners be subject to the same compute-based transparency requirements as FM developers? In any case, fine-tuners will have it easier in the sense that they have far more control over the content of their datasets.
Model deployers (considering them separate from fine-tuners) do not affect the LLM itself. Rather, they decide on the final use case (although the fine-tuner might already have trained the model for that case) and control how the model can be used. This means that they will most likely be subject to the bulk of the AIA’s risk-category-based regulation. Deployers also build the software around the FM, which determines how the model can be used, how its inputs and outputs are processed, and how much control the end user can exercise over it. Consequently, more “classical” questions of software and information security might well become a critical part of AIA compliance.
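As a rough illustration of where that deployer-side control sits, the sketch below wraps a model call behind input screening, output handling, and audit logging. The function names, filter rule, and logging setup are hypothetical placeholders, not anything prescribed by the Act; the point is simply that these layers live in the deployer’s software, outside the model itself.

```python
# Minimal sketch of a deployer-side wrapper around a foundation model.
# Everything here (filter rules, logging, the model stub) is illustrative only.

import logging
import re

logger = logging.getLogger("llm_audit")

# Illustrative input rule: block prompts fishing for sensitive personal data.
BLOCKED_INPUT_PATTERNS = [re.compile(r"(?i)social security number")]


def call_model(prompt: str) -> str:
    """Placeholder for the actual foundation model API call (hypothetical)."""
    return f"[model response to a {len(prompt)}-character prompt]"


def handle_request(user_prompt: str) -> str:
    # Input screening: the deployer decides what reaches the model at all.
    for pattern in BLOCKED_INPUT_PATTERNS:
        if pattern.search(user_prompt):
            logger.warning("Blocked prompt matching %s", pattern.pattern)
            return "This request cannot be processed."

    response = call_model(user_prompt)

    # Output handling and audit logging: part of demonstrating oversight and control.
    logger.info("prompt_len=%d response_len=%d", len(user_prompt), len(response))
    return response


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(handle_request("Summarize this contract clause for me."))
    print(handle_request("What is this person's social security number?"))
```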
What next?
For now, we must wait for the finalized texts to come out to grasp the details of the Act. Meanwhile, every organization dealing with AI systems will have to ponder the implications of what we know now. Deployers will already have to start giving serious thought to risk categorization and the requirements that follow from it. FM developers must brace themselves for the additional work that comes with curating masses of training data, while weighing open- versus closed-source development in a new light.