Copyright Challenges in the Age of AI – Part 2

Meet The Authors

Olli Pitkänen


CLO

Dr. Olli Pitkänen is a proficient expert with extensive experience in ICT and law, leading multidisciplinary projects and providing expertise in the legal aspects of ICT, IPRs, privacy, and data as the founder of an IT law firm and an advisor to companies and the Finnish government.

Sami Jokela


CTO

Dr. Sami Jokela is a seasoned leader with 20+ years of experience in data, technology, and strategy, including roles at Nokia, co-founding startups, and leading Accenture’s technology and information insight practices.

Waltter Roslin


Lawyer

Waltter is a lawyer focusing on questions concerning data sharing, governance, privacy and technology. He is also a PhD researcher at the University of Helsinki where his research focuses on the Finnish pharmaceutical reimbursement scheme.

Copyright Challenges in the Age of AI – Part 2: Is the output of a generative AI system copyrightable and who is the author?

Introduction

Artificial Intelligence (AI) presents new challenges to the copyright system.

In the previous part, we analysed these challenges and showed how they appear from the perspectives of developing and using AI systems. In particular, we noted that, depending on the algorithm, the machine learning process can be copyright-relevant. If the training involves copying the creative choices made by the original author of a copyrighted work in the training data, it could violate the author’s exclusive rights. On the other hand, if the machine learning process can be considered text and data mining, it may fall within the limitation or exception defined in the DSM Directive and therefore be lawful within the EU. Yet, if the output of a generative AI system includes copies of the works in the training data, that cannot be justified by that limitation or exception.

In this second part of our three-part posting on these challenges, we discuss the authorship of AI-generated output.

Authorship

According to copyright law, the author is the person who created the work and who originally holds the copyright to it. Copyright is often transferred directly to the employer by law, or it may be transferred to a publisher, for example, but the original creator is still regarded as the author. As the role of AI grows, the question arises: who is the author of a work that has been influenced by AI?

As discussed in the previous section, to qualify for copyright protection, a work must be original, the author’s own intellectual creation. The author must have made creative choices in creating the work.  

If AI is used in a process that produces something that is perceived as creative, there are several possibilities as to where the creativity comes from.  

First, the AI user can use the system in a creative way. This is often the case today, for example, when a software product such as Adobe Photoshop contains several tools incorporating AI technology. When editing images in Photoshop, these AI tools can enable complex editing, but in most cases the choices are still made by the human user of the software.

Second, the software developer may have made creative choices that affect the output of the system in such a way that it seems creative no matter how it is used. 

Third, an AI system can be based on a general model trained using data that includes creative works such as newspaper articles, novels, compositions, pictures or paintings. Likewise, data used to refine a model for a specific application may contain similar works. As discussed in the first part, it is possible that the creative choices made for these works are also reflected in the output produced by the system. Thus, the creativity of the output can be traced back to the original authors of the training material. 

So, an AI output that appears creative may in fact be the result of creative choices made by different actors. Often it is a combination of several individuals’ creativity, which copyright law calls joint authorship. Unless the authors have assigned their rights or agreed otherwise, the consent of all of them is required for commercial exploitation of the joint work, which can be quite complex.

Currently, most definitions of originality require a human author. Thus, AI cannot at the moment be considered an author. Yet a good question is: should an automatically generated work be copyrightable in the first place? While exclusive rights such as copyright can motivate people and companies to research, create and invent new, useful goods, they can also set up monopolies that are harmful to society and the economy. Careful consideration should therefore be given to whether granting exclusive rights to automatically created works is societally desirable. If it is, who should get the copyright?

Some believe that one day it will be necessary to grant rights to AI itself. This would probably require some AI systems to achieve legal personhood. However, at the moment it is very difficult to see what problems this would solve.

Observable originality

One of the key principles behind the copyright system is that copyright does not require filing an application or an official examination process, as patents do. Instead, everybody should be able to evaluate a work and, just by observation, assess whether it is original and thus copyrightable. In practice, that can be difficult even for an expert, but in principle anyone can read a text or listen to a piece of music and tell whether it is copyrightable or not.


Ernest Hemingway: Across the River and Into the Trees

Technological development has already challenged that. From a photograph, it can be impossible to tell whether the photographer made creative choices while producing the image or whether the picture is merely a random snapshot. The application of AI makes it ever more difficult to evaluate the copyrightability of a work only by observing the end result. For example, if an image were a painting by a human artist, it would be protected by copyright, but if it were automatically generated by an AI system, it would not. The problem is therefore that in the future we will need more information about how works are created in order to assess their copyrightability, which seriously challenges a basic principle of the copyright regime. Will we need a new intellectual property right between copyright and patent, one that would protect originality and creativity like copyright but would require an application and examination before an authority officially grants the right?


Reijo Keskikiikonen: Tommi, Copyright Council 2016:4, https://okm.fi/lausunnot-2016

Is this a copyrightable work, or merely a random snapshot? Did the photographer make creative choices? It is impossible to tell if we do not know how the picture was produced. In this case, the Copyright Council considered that the key element of the photograph is the successful timing of its taking, but that this alone is not sufficient to make the picture independent and original. Therefore, it is not copyrightable. However, the photograph still enjoys more limited protection under the photographer’s neighbouring right in Finnish copyright law.

Neighbouring rights

Another problem area of the copyright system in relation to technological developments is neighbouring rights. They are similar to copyright but are to some extent weaker rights and do not require a threshold of originality to be exceeded. In Europe, for example, the Copyright Term Directive (Directive 2006/116/EC) allows Member States to provide for the protection of photographs other than those that are sufficiently original to qualify for copyright protection. Therefore, in many European countries, non-original photographs are protected by a neighbouring right, often called the photographer’s right. Similarly, producers of sound and image recordings – music, television and film – are granted protection for their recordings, but producers of games or events, for example, are not. Anyone who takes a photograph obtains a right, whereas a drawing must be original before it is protected. It seems rather arbitrary when something is protected by a neighbouring right and when it is not. If anything, a common factor behind the complexity of neighbouring rights might be a tendency to protect investments in intellectual property. [1]

The main difference between copyright and neighbouring rights is that the latter do not require creative choices. The development of artificial intelligence makes these limited rights even more ambiguous, as digital convergence blurs the boundaries between artificial classifications: is there any difference between a photograph edited by AI and an AI-generated image that is not a photograph? Why should one be protected and the other not?

The Anglo-American copyright tradition tends to emphasise the economic aspects of copyright, such as the right of authors to benefit financially from their creativity and the relatively wide scope for employers to obtain rights to works created by their employees (“contract for services” in the UK or “work for hire” in the US). In the continental European tradition, particularly in France, more emphasis is placed on the droit d’auteur, i.e. the moral rights of the original author to his creation. Most jurisdictions fall somewhere between these two extremes and seek to balance the interests of creative individuals and paying commissioners.  

As noted above, at least some neighbouring rights seem to protect, in particular, investments in intellectual property. The economic aspects of copyright may have a somewhat similar rationale: an author who has invested time, skill and creativity, or an employer who has paid a salary to an employee, should benefit from the work. It could therefore be desirable to move the copyright system in the direction of giving the rights to works created with the help of AI to the party that has invested in the AI system.

Conclusions

To conclude this second part, the results produced by an AI system may well fall outside the scope of copyright protection, because the author must be human. However, the originality of the results of a generative AI system can be traced back to the creative choices made by the user, the software developers, the authors of the works in the training material – or any combination of these. Some neighbouring rights, such as photographer’s or producer’s rights, may also apply to works produced by an AI system. 

In the third part, we’ll present our ideas on copyright and other rights in AI models. 

1001 Lakes’ experts are happy to discuss these topics with you if you have concerns about AI and copyright or about how to develop and use AI in compliance with copyright law.

[1] Pitkänen, O.: Mitä lähioikeus suojaa? [What is protected by neighbouring rights?] Lakimies 5/2017, pp. 580–602.

What’s the deal with the AI Act?

Meet The Authors

Emeline Banzuzi


Privacy & Data Governance Counsel

Emeline Banzuzi serves as a legal counsel and researcher specializing in the dynamic field of law, technology and society, with expertise in data protection consulting, risk management, compliance within FinTech, and academic research.

Joel Himanen


Data Scientist

Joel Himanen is a versatile data scientist with a strong emphasis on advanced analytics, machine learning, and artificial intelligence, having prior experience in data-driven sustainability projects in both the private and public sectors.

What’s the deal with the AI Act?

In the early hours of December 9th, the European Union Parliament and Council finally came out with a provisional agreement on the contents of the Artificial Intelligence Act (AIA). In this blog post, we will summarize the main contents of the AIA and discuss its possible implications and open questions using the development and deployment of Large Language Models (LLM) as an example. 

The short version

The EU’s Artificial Intelligence Act aims to govern the development and deployment of AI systems in the EU, while ensuring that these systems are safe and respect the health, safety and fundamental rights and freedoms of EU citizens. The provisional agreement states that the Act will apply two years after its entry into force (i.e. following its publication in the Official Journal of the EU), shortened to six months for the bans it contains. The Act most notably impacts AI system deployers, who are regulated according to the risk category of their use case. On the side of generative AI, foundation model developers are facing significant requirements for transparency, safeguards, and testing.

Digging a little deeper

The first draft of the Act was published in April 2021, and its final version is currently undergoing the EU legislative procedure. After the latest agreement, the Act still needs to be confirmed by both the Parliament and the Council, as well as undergo legal-linguistic revisions, before formal adoption. 

The Act defines an “AI system” as a machine-based system that, with varying levels of autonomy and for explicit or implicit objectives, generates outputs such as predictions, recommendations, or decisions that can influence physical or virtual environments. The regulation applies to providers, deployers, and distributors of AI systems as well as “affected persons”, meaning individuals or groups of persons who are subject to or otherwise affected by an AI system.

The AIA establishes varying obligations for developers and deployers of AI systems, depending on the risk classification into which the system in question falls. The Act defines four risk categories, namely:

  • Unacceptable risk: AI systems that are a clear threat to the safety, livelihoods, and rights of individuals (e.g. systems used for social scoring and systems that exploit vulnerable groups such as children). The use of these systems is prohibited. 
  • High risk: AI systems that pose a significant risk of harm to the health, safety, or fundamental rights of individuals. Examples of high-risk AI systems include those used for the management of critical infrastructure, education, employment, law enforcement, and border control. High-risk systems will be subject to strict obligations before they can be placed on the market: providers and deployers of these systems must, for instance, develop a risk management process for risk identification and mitigation; apply appropriate data governance and management practices to training, validation, and testing data sets; enable human oversight; ensure technical robustness and cybersecurity; and draw up documentation that demonstrates AIA compliance. (For a complete list of obligations, see Arts. 9–17 AIA.)
  • Limited risk: Examples of limited-risk AI systems include systems intended to interact with individuals, e.g. chatbots and deep fakes. The compliance obligations for limited-risk AI focus on transparency: users of these systems must be clearly informed that they are interacting with an AI system. 
  • Minimal risk: Examples of minimal risk AI include spam filters, AI-enabled video games, and inventory management systems. The AIA allows for the free use of minimal risk AI.  
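
Purely as an illustration of how an organization might begin mapping its AI systems to these categories, here is a minimal Python sketch. The category names follow the AIA, but the example use cases and the mapping itself are simplifying assumptions of ours, not legal guidance.

```python
from enum import Enum


class AIARiskCategory(Enum):
    """The four risk tiers named in the AI Act's provisional agreement."""
    UNACCEPTABLE = "unacceptable"  # prohibited, e.g. social scoring
    HIGH = "high"                  # strict obligations, e.g. critical infrastructure
    LIMITED = "limited"            # transparency duties, e.g. chatbots
    MINIMAL = "minimal"            # free use, e.g. spam filters


# Illustrative mapping only; a real assessment must follow the Act's annexes.
EXAMPLE_USE_CASES = {
    "social scoring of citizens": AIARiskCategory.UNACCEPTABLE,
    "CV screening for recruitment": AIARiskCategory.HIGH,
    "customer service chatbot": AIARiskCategory.LIMITED,
    "spam filtering": AIARiskCategory.MINIMAL,
}


def triage(use_case: str) -> AIARiskCategory:
    """Return the example category for a known use case, or raise for unknown ones."""
    try:
        return EXAMPLE_USE_CASES[use_case]
    except KeyError:
        raise ValueError(f"No example mapping for {use_case!r}; assess against the AIA annexes.")


if __name__ == "__main__":
    for case, category in EXAMPLE_USE_CASES.items():
        print(f"{case} -> {category.value} risk")
```

In practice, of course, borderline cases cannot be looked up from a table: they have to be assessed against the Act's annexes and, where necessary, with legal advice.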

The risk categories have fluctuated throughout the drafting stages of the AIA.

Implications for model developers and deployers

AI model and application developers are, of course, quite anxious about the Act, because it has the potential to impact development and usage processes monumentally. As the AIA proposal phase is being finalized, it is important to consider possible scenarios and think about the impact the Act would have on different groups in the AI field.

Let’s consider the hottest AI topic of 2023: Large Language Models (LLMs). One way to view the LLM lifespan is to divide it into three phases (upstream to downstream): foundation model (FM) development, fine-tuning, and deployment. What possible implications would the AI Act have for these phases?

Foundation model developers are the ones doing the “heavy lifting”. They develop the model architecture, scrape together and process the enormous masses of data required to pre-train the model, and execute the actual pre-training, during which the model learns most of its capabilities. These are organizations backed by significant resources, since gathering the data and especially the compute-intensive pre-training are expensive activities. Having the most impact on the model itself, an FM developer will, according to the current proposal, be regulated relative to the cumulative amount of compute used for model training. For example, an FM classified as “high-impact” (more than 10^25 floating-point operations during training) would also face stricter transparency requirements concerning, for instance, the disclosure of copyrighted training material. This is a huge requirement: the amount of data required for pre-training is so massive that its collection is highly automated, and thus there is only minimal control over the content itself. An interesting detail is that, according to the latest agreement, open-source models will be subject to lighter regulation.
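
To give a feel for where the 10^25 threshold sits, the following sketch applies the commonly used “training compute ≈ 6 × parameters × training tokens” rule of thumb for dense transformer models. Both the rule of thumb and the example figures are our assumptions for illustration; the AIA does not prescribe how compute is to be measured.

```python
# Rough, illustrative estimate of training compute against the AIA's
# 10^25 FLOP "high-impact" threshold. The 6 * N * D approximation is a
# common rule of thumb for dense transformer training, not an official
# AIA calculation method.

HIGH_IMPACT_THRESHOLD_FLOPS = 1e25  # threshold cited in the provisional agreement


def estimated_training_flops(n_parameters: float, n_training_tokens: float) -> float:
    """Approximate total training compute as 6 * parameters * tokens."""
    return 6 * n_parameters * n_training_tokens


def is_high_impact(n_parameters: float, n_training_tokens: float) -> bool:
    """Return True if the estimated compute meets or exceeds the 10^25 FLOP threshold."""
    return estimated_training_flops(n_parameters, n_training_tokens) >= HIGH_IMPACT_THRESHOLD_FLOPS


if __name__ == "__main__":
    # Hypothetical example: a 70-billion-parameter model trained on 15 trillion tokens.
    params = 70e9
    tokens = 15e12
    flops = estimated_training_flops(params, tokens)
    print(f"Estimated training compute: {flops:.2e} FLOPs")
    print("High-impact under the 10^25 threshold:", is_high_impact(params, tokens))
```

With these hypothetical figures the estimate lands just below 10^25 FLOPs, which illustrates how the threshold is meant to separate the very largest training runs from more modest ones.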

Fine tuners have a smaller, yet significant, impact on the model. They take a pre-trained FM and continue training it on a smaller, more specialized dataset. In a way, they perform the same manipulations on the model as the FM developer, just on a smaller scale. An interesting question follows: how will the AIA distinguish between the two? Will fine tuners be subject to the same compute-based transparency requirements as FM developers? In any case, fine tuners will have it easier in the sense that they have far more control over the content of their datasets.

Model deployers (considering them separate from fine tuners) do not affect the LLM itself. Rather, they decide on the final use case (although the fine tuner might already have trained the model for that case) and control how the model can be used. This means that they will most likely be subject to the bulk of the AIA’s risk-category-based regulation. Deployers also build the software around the FM, affecting how the model can be used, how its inputs and outputs are processed, and how much control the end user is able to exercise over it. Consequently, more “classical” questions of software and information security might well become a critical part of AIA compliance.
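
To make the deployer’s position concrete, here is a minimal, hypothetical sketch of the kind of software wrapper a deployer might place around a foundation model: the model itself is untouched, while inputs, outputs, and user-facing transparency are controlled by the deployer. The function call_foundation_model is a placeholder, not a real API.

```python
# Minimal, hypothetical sketch of deployer-side controls around a foundation
# model: the deployer does not change the model, but decides how inputs and
# outputs are filtered and how users are informed. `call_foundation_model`
# stands in for whatever model API the deployer actually uses.

BLOCKED_INPUT_TERMS = ["social security number"]          # illustrative policy only
AI_DISCLOSURE = "You are interacting with an AI system."  # limited-risk transparency duty


def call_foundation_model(prompt: str) -> str:
    """Placeholder for the actual model call (e.g. a hosted LLM API)."""
    return f"[model output for: {prompt}]"


def handle_user_request(prompt: str) -> str:
    # Input control: refuse prompts the deployer does not want to send onward.
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_INPUT_TERMS):
        return "Request refused by deployer policy."

    # Output control: here only a length cap, standing in for real output filtering.
    output = call_foundation_model(prompt)[:2000]

    # Transparency: make clear to the user that an AI system is responding.
    return f"{AI_DISCLOSURE}\n{output}"


if __name__ == "__main__":
    print(handle_user_request("Summarize the AI Act for me."))
```

Even a wrapper this small shows why conventional software and information-security practices end up on the deployer’s AIA compliance checklist.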

What next?

For now, we must wait for the finalized texts to come out to grasp the details of the Act. Meanwhile, every organization dealing with AI systems will have to ponder the implications of what we know now. Deployers will already have to start giving serious thought to risk categorization and the requirements that follow from it. FM developers must brace themselves for the additional work that comes with curating masses of training data, while weighing open- versus closed-source development in a new light.
