A place to cache linked articles (think custom and personal wayback machine)
title: The mounting human and environmental costs of generative AI
url: https://arstechnica.com/gadgets/2023/04/generative-ai-is-cool-but-lets-not-forget-its-human-and-environmental-costs/
hash_url: 230f8f7224199132de4ce030458536de
Over the past few months, the field of artificial intelligence has seen rapid growth, with wave after wave of new models like Dall-E and GPT-4 emerging one after another. Every week brings the promise of new and exciting models, products, and tools. It’s easy to get swept up in the waves of hype, but these shiny capabilities come at a real cost to society and the planet.

Downsides include the environmental toll of mining rare minerals, the human costs of the labor-intensive process of data annotation, and the escalating financial investment required to train AI models as they incorporate more parameters.

Let’s look at the innovations that have fueled recent generations of these models—and raised their associated costs.

## Bigger models

In recent years, AI models have been getting bigger, with researchers now measuring their size in the hundreds of billions of parameters. “Parameters” are the internal connections used within the models to learn patterns based on the training data.

For large language models (LLMs) like ChatGPT, we’ve gone from around 100 million parameters in 2018 to 500 billion in 2023 with Google’s PaLM model. The theory behind this growth is that models with more parameters should have better performance, even on tasks they were not initially trained on, although this hypothesis remains unproven.
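To make “parameters” concrete, here is a minimal back-of-the-envelope sketch in Python, using the common approximation that a decoder-only transformer has roughly 12 × layers × width² parameters. The configurations below are approximate, publicly reported values, used only for illustration.

```python
# Rough parameter count for a decoder-only transformer.
# Assumption: the common approximation ~12 * n_layers * d_model^2
# (attention + feed-forward blocks), ignoring embeddings and biases.

def approx_params(n_layers: int, d_model: int) -> float:
    """Very rough parameter estimate for a decoder-only transformer."""
    return 12 * n_layers * d_model ** 2

# Illustrative configurations (publicly reported, approximate):
configs = {
    "GPT-2 (2019)": (48, 1600),   # ~1.5 billion parameters
    "GPT-3 (2020)": (96, 12288),  # ~175 billion parameters
}

for name, (layers, width) in configs.items():
    print(f"{name}: ~{approx_params(layers, width) / 1e9:.1f}B parameters")
```

Running it shows how quickly a couple of architectural choices push a model from the billions into the hundreds of billions of parameters.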
*Model size growth over the years.*
Bigger models typically take longer to train, which means they also need more GPUs, which cost more money, so only a select few organizations are able to train them. Estimates put the training cost of GPT-3, which has 175 billion parameters, at $4.6 million—out of reach for the majority of companies and organizations. (It's worth noting that the cost of training models is dropping in some cases, as with LLaMA, the recent model trained by Meta.)
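As a rough sanity check on figures like that $4.6 million estimate, one can combine the common heuristic that training takes about 6 × parameters × tokens floating-point operations with assumed GPU throughput and cloud prices. The throughput and price below are illustrative assumptions, not OpenAI's actual hardware or billing.

```python
# Back-of-the-envelope training cost estimate.
# Heuristic: total training compute ~= 6 * N_params * N_tokens FLOPs.

n_params = 175e9   # GPT-3 parameter count
n_tokens = 300e9   # reported training tokens, approximate
flops = 6 * n_params * n_tokens

sustained_flops_per_gpu = 30e12  # assumed ~30 TFLOP/s sustained per GPU
usd_per_gpu_hour = 1.50          # assumed cloud price per GPU-hour

gpu_hours = flops / sustained_flops_per_gpu / 3600
cost = gpu_hours * usd_per_gpu_hour
print(f"~{gpu_hours / 1e6:.1f}M GPU-hours, ~${cost / 1e6:.1f}M")
# -> roughly 2.9M GPU-hours and ~$4.4M, the same ballpark as published estimates
```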
This creates a digital divide in the AI community between those who can train the most cutting-edge LLMs (mostly Big Tech companies and rich institutions in the Global North) and those who can’t (nonprofit organizations, startups, and anyone without access to a supercomputer or millions in cloud credits). Building and deploying these behemoths requires a lot of planetary resources: rare metals for manufacturing GPUs, water to cool huge data centers, energy to keep those data centers running 24/7 on a planetary scale… all of these are often overlooked in favor of focusing on the future potential of the resulting models.

## Planetary impacts
A study from Carnegie Mellon University professor Emma Strubell about the carbon footprint of training LLMs estimated that training a 2019 model called BERT, which has only 213 million parameters, emitted 280 metric tons of carbon emissions, roughly equivalent to the emissions from five cars over their lifetimes. Since then, models have grown and hardware has become more efficient, so where are we now?
In a recent academic article I wrote to study the carbon emissions incurred by training BLOOM, a 176-billion parameter language model, we compared the power consumption and ensuing carbon emissions of several LLMs, all of which came out in the last few years. The goal of the comparison was to get an idea of the scale of emissions of different sizes of LLMs and what impacts them.
*Comparison of power consumption and carbon emissions across several LLMs. (Sasha Luccioni, et al.)*
Depending on the energy source used for training and its carbon intensity, training a 2022-era LLM emits at least 25 metric tons of carbon equivalents if you use renewable energy, as we did for the BLOOM model. If you use carbon-intensive energy sources like coal and natural gas, which was the case for GPT-3, this number quickly goes up to 500 metric tons of carbon emissions, roughly equivalent to over a million miles driven by an average gasoline-powered car.
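The arithmetic behind those figures is simple: emissions are roughly the electricity consumed during training multiplied by the carbon intensity of the grid that supplied it. Here is a minimal sketch; the energy and intensity values are approximate figures from public reports, not exact measurements.

```python
# Training emissions ~= energy consumed * carbon intensity of the grid.

def emissions_tonnes(energy_mwh: float, kg_co2_per_kwh: float) -> float:
    """CO2-equivalent emissions, in metric tons."""
    return energy_mwh * 1000 * kg_co2_per_kwh / 1000  # kWh * kg/kWh -> kg -> t

# BLOOM: trained on a mostly nuclear-powered (low-carbon) French grid.
print(f"BLOOM: ~{emissions_tonnes(433, 0.057):.0f} t CO2eq")   # ~25 t
# GPT-3: estimated for a more carbon-intensive US grid mix.
print(f"GPT-3: ~{emissions_tonnes(1287, 0.39):.0f} t CO2eq")   # ~500 t
```

The same model trained on the same hardware can therefore land an order of magnitude apart in emissions depending purely on where the electricity comes from.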
And this calculation doesn’t consider the manufacturing of the hardware used for training the models, nor the emissions incurred when LLMs are deployed in the real world. For instance, with ChatGPT, which was queried by tens of millions of users at its peak a month ago, thousands of copies of the model are running in parallel, responding to user queries in real time, all while using megawatt hours of electricity and generating metric tons of carbon emissions. It’s hard to estimate the exact quantity of emissions this results in, given the secrecy and lack of transparency around these big LLMs.
## Closed, proprietary models

Let’s go back to the LLM size plot above. You may notice that neither ChatGPT nor GPT-4 are on it. Why? Because we have no idea how big they are. Although there are several reports published about them, we know almost nothing about their size and how they work. Access is provided via APIs, which means they are essentially black boxes that can be queried by users.

These boxes may contain either a single model (with a trillion parameters?) or multiple models, or, as I told Bloomberg, “It could be three raccoons in a trench coat.” We really don’t know.

The plot below presents a timeline of recent releases of LLMs and the type of access that each model creator provided. As you can see, the biggest models (Megatron, PaLM, Gopher, etc.) are all closed source. And if you buy into the theory that the bigger the model, the more powerful it is (I don’t), this means the most powerful AI tech is only accessible to a select few organizations, who monopolize access to it.
*A timeline of recent releases of LLMs and the type of access each model creator provided. (Irene Solaiman)*
Why is this problematic? It means it’s difficult to carry out external evaluations and audits of these models since you can’t even be sure that the underlying model is the same every time you query it. It also means that you can’t do scientific research on them, given that studies must be reproducible.
The only people who can keep improving these models are the organizations that trained them in the first place, and they do keep refining them and adding new features over time.
## Human costs

How many humans does it take to train an AI model? You may think the answer is zero, but the amount of human labor needed to make recent generations of LLMs is steadily rising.

When Transformer models came out a few years ago, researchers heralded them as a new era in AI because they could be trained on “raw data.” In this case, raw data means “unlabeled data”—books, encyclopedia articles, and websites that have been scraped and collected in massive quantities.

That was the case for models like BERT and GPT-2, which required relatively little human intervention in terms of data gathering and filtering. While this was convenient for the model creators, it also meant that all sorts of undesirable content, like hate speech and pornography, were sucked up during the model training process, then often parroted back by the models themselves.
This data collection approach changed with the advent of RLHF (reinforcement learning from human feedback), the technique used by newer generations of LLMs like ChatGPT. As its name indicates, RLHF adds extra steps to the LLM training process, and these steps require much more human intervention.

Essentially, once a model has been trained on large quantities of unlabeled data (from the web, books, etc.), humans are then asked to interact with the model, coming up with prompts (e.g., “Write me a recipe for chocolate cake”) and providing their own answers or evaluating answers provided by the model. This data is used to continue training the model, which is then again tested by humans, ad nauseam, until the model is deemed good enough to be released into the world.
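To show where the human labor enters, here is a deliberately toy sketch of that loop in plain Python: no neural networks, invented prompts and scores, just the shape of the process in which human ratings feed a reward signal that steers the model.

```python
import random

# Toy sketch of the RLHF loop. All prompts, answers, and "human" scores
# below are invented for illustration; real systems use neural policies,
# learned reward models, and RL algorithms such as PPO.

random.seed(0)

prompts = ["Write me a recipe for chocolate cake", "Explain photosynthesis"]

# Step 1: the pretrained "policy" proposes candidate answers (stubbed here).
def policy_generate(prompt: str) -> list[str]:
    return [f"{prompt} -> draft {i}" for i in range(3)]

# Step 2: human raters score each candidate (random stand-in here;
# in practice this is the labor performed by paid annotators).
def human_rate(answer: str) -> float:
    return random.random()

# Step 3: fit a "reward model" on the human labels.
# Toy version: just remember the average human score per answer.
reward_data = {
    answer: human_rate(answer)
    for prompt in prompts
    for answer in policy_generate(prompt)
}

def reward_model(answer: str) -> float:
    return reward_data.get(answer, 0.0)

# Step 4: update the policy toward high-reward answers
# (here we simply pick the best-scoring draft per prompt).
for prompt in prompts:
    best = max(policy_generate(prompt), key=reward_model)
    print(f"Preferred answer for '{prompt}': {best}")
```

Every pass through steps 2 and 3 in a real system corresponds to thousands of human judgments, which is exactly the labor discussed below.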
This kind of RLHF training is what made ChatGPT feasible for wide release since it could decline to answer many classes of potentially harmful questions.
*An illustration of RLHF training.*
But that success has a dirty secret behind it: To keep the costs of AI low, the people providing this “human feedback” are underpaid, overexploited workers. In January, Time wrote a report about Kenyan laborers paid less than $2 an hour to examine thousands of messages for OpenAI. This kind of work can have long-lasting psychological impacts, as we've seen in content-moderation workers.

To make it worse, the efforts of these nameless workers aren’t recognized in the reports accompanying AI models. Their labor remains invisible.

## What should we do about it?

For the creators of these models, instead of focusing on scale and size and optimizing solely for performance, it’s possible to train smaller, more efficient models and make models accessible so that they can be reused and fine-tuned (read: adapted) by members of the AI community, who won’t need to train models from scratch. Dedicating more efforts toward improving the safety and security of these models—developing features like watermarks for machine-generated content, more reliable safety filters, and the ability to cite sources when generating answers to questions—can also contribute toward making LLMs more accessible and robust.
As users of these models (sometimes despite ourselves), it's within our power to demand transparency and push back against the deployment of AI models in high-risk scenarios, such as services that provide mental health therapy or generate forensic sketches. These models are still too new, poorly documented, and unpredictable to be deployed in circumstances that can have such major repercussions.
And the next time someone tells you that the latest AI model will benefit humanity at large or that it displays evidence of artificial general intelligence, I hope you'll think about its hidden costs to people and the planet, some of which I’ve addressed in the sections above. And these are only a fraction of the broader societal impacts and costs of these systems (some of which you can see on the image below, crowdsourced via Twitter)—things like job impacts, the spread of disinformation and propaganda, and copyright infringement concerns.
*There are many hidden costs of generative AI.*
The current trend is toward creating bigger and more closed and opaque models. But there’s still time to push back, demand transparency, and get a better understanding of the costs and impacts of LLMs while limiting how they are deployed in society at large. Legislation like the Algorithmic Accountability Act in the US and legal frameworks on AI governance in the European Union and Canada are defining our AI future and putting safeguards in place to ensure safety and accountability in future generations of AI systems deployed in society. As members of that society and users of these systems, we should have our voices heard by their creators.