Physics-Grounded AI, ESG & Transition

Sustainable Prompting: How Smarter AI Use Turns Keystrokes into ESG Wins

22 Sept 2025

New evidence shows that modest, practical tweaks, such as using task-tuned smaller models, quantising precision, and trimming wordy prompts, can slash LLM energy use by 50–90% with little to no accuracy loss. It's an engineering-based roadmap to scale AI responsibly, where every keystroke translates into an ESG decision.

Executive Summary 

Modest changes to how Large Language Models (LLMs) are built and used can reduce their energy consumption by up to 90% without appreciable loss in performance. For repetitive tasks like translation or summarisation, switching to specialised smaller models and keeping interactions concise achieves substantial energy savings. Quantisation alone can save up to 44%, while brevity, especially in querying LLMs, can lower energy use by more than 50%. This research underpins a responsible shift towards greener, more accessible AI, highlighting that every keystroke can be an ESG-driven choice.

[Image: hands typing on a laptop, illustrating how every AI prompt and keystroke carries an energy footprint and ESG impact]

An Engineering-Based Imperative: Understanding Energy in AI 

Let’s step back. The energy footprint of generative AI today is staggering. By 2030, electricity demand from data centres worldwide is projected to equal what Japan consumes today. In physical terms, every prompt-response cycle draws measurable power and energy: watts, joules, kilowatt-hours. This is more than an academic issue; it is engineering and infrastructure in collision. For AI to scale sustainably, we need solutions rooted not in abstract ideals but in tangible physical efficiencies.

AI's value is not a function of size; it lies in optimising compute, latency and energy consumption. That's why the UNESCO-UCL report matters: it shows how execution and architecture choices for LLMs drive real reductions in energy use, and how small tweaks ripple into massive savings.

Right-Sizing the Compute 

What if, instead of one giant general-purpose brain, we had many lean, task-tuned brains? The report explores exactly that: smaller models tailored to specific functions, such as summarisation and translation, use far less energy than general-purpose models. In the case of LLaMA 3.1, energy use dropped by a factor of 15 for summarisation, 35 for translation, and 50 for question answering. What's impressive is that these energy wins were demonstrated while maintaining accuracy comparable to the large model.

This approach echoes a broader lesson for industrial use of AI: real gains come from specialisation, from designing systems customised to a particular purpose. (See: General vs Applied AI: Why Refineries Need Specialist Intelligence.)
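To make this concrete, here is a minimal sketch of routing a repetitive task to a small, task-tuned model rather than a general-purpose LLM, using the Hugging Face transformers pipeline. The specific model, a distilled BART summariser, is an illustrative stand-in, not one of the models benchmarked in the report:

```python
# Minimal sketch: route a repetitive task (summarisation) to a small,
# task-tuned model instead of a large general-purpose LLM.
# The model choice is illustrative, not one benchmarked in the report.
from transformers import pipeline

# DistilBART is a distilled summarisation model, a small fraction of the
# size (and energy budget) of a general-purpose model such as LLaMA 3.1.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "By 2030, electricity demand from data centres worldwide is projected "
    "to equal what Japan consumes today. Modest engineering changes, such "
    "as task-tuned smaller models, quantisation and shorter prompts, can "
    "cut LLM energy use by 50-90% with little to no accuracy loss."
)

result = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```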

The Quantisation Leverage: Precision Without Excess 

Diving deeper into the physics: the calculations inside LLMs run on floating-point numbers, and quantisation reduces their numerical precision, trading minimal accuracy for substantial savings. In the experiments conducted by the UCL team, quantisation achieved up to a 44% energy reduction while maintaining at least 97% accuracy. In CPU or GPU terms, fewer bits of precision mean narrower data paths, reduced memory movement, and lower energy per operation. That's physics at work: smaller electrical swings, leaner data transfers, and fewer wasted cycles.
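A toy example makes the mechanism visible. The sketch below applies symmetric 8-bit quantisation to a stand-in weight tensor; it is not the UCL team's setup, just an illustration of how fewer bits shrink storage fourfold at the cost of a small rounding error:

```python
# Toy illustration of quantisation: map float32 weights onto int8.
# Storage shrinks 4x and the data paths narrow, at the cost of a small
# rounding error. This mirrors the idea, not the UCL experimental setup.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=10_000).astype(np.float32)  # stand-in layer

# Symmetric linear quantisation: one scale maps the int8 range onto the weights.
scale = np.abs(weights).max() / 127.0
quantised = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
restored = quantised.astype(np.float32) * scale

print(f"storage: {weights.nbytes:,} bytes -> {quantised.nbytes:,} bytes")
print(f"mean absolute rounding error: {np.abs(weights - restored).mean():.6f}")
```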

Brevity Matters: Shorter Prompts, Efficient Responses 

Beyond model architecture, how we talk to AI also unlocks sustainability gains. In the experiments, halving the prompt length cut energy use by about 5%, while halving the response length cut it by around 50%. The asymmetry is telling: constructing the output is far more energy-intensive than parsing the input.

Combine brevity with right-sized models, and suddenly the energy bill plunges. In a real-world scaling scenario, trimming both prompt and response word counts by half can translate to massive energy savings, all without sacrificing task quality in broad use.  
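That asymmetry is easy to capture in a back-of-envelope model. The per-token constants below are made up purely to encode the finding that generating a token costs far more than reading one; with that single assumption, figures close to the report's roughly 5% and 50% fall out:

```python
# Back-of-envelope model of the prompt/response asymmetry. The per-token
# costs are illustrative constants, not measured values: they encode only
# that generating a token costs far more than reading one.
E_IN = 0.1    # relative energy to process one input (prompt) token
E_OUT = 1.0   # relative energy to generate one output (response) token

def query_energy(n_in: int, n_out: int) -> float:
    """Toy energy estimate: reading tokens is cheap, generating is expensive."""
    return E_IN * n_in + E_OUT * n_out

base = query_energy(200, 200)
print(f"halve prompt:   {1 - query_energy(100, 200) / base:.0%} saved")  # ~5%
print(f"halve response: {1 - query_energy(200, 100) / base:.0%} saved")  # ~45%
print(f"halve both:     {1 - query_energy(100, 100) / base:.0%} saved")  # ~50%
```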

Aggregate Impact: From Lab to Global Scale 

A single model change is compelling on its own. Multiplied across real-world usage, the impact becomes transformative. Using ChatGPT's estimated one billion daily queries as a reference point, applying quantisation alongside shorter dialogues could reduce daily energy consumption by 75%, savings equivalent to powering 30,000 UK households. For repetitive tasks such as translation and summarisation, the impact is even greater: shifting to smaller models and encouraging brevity yields over 90% savings, enough to power 34,000 UK households each day.
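The arithmetic behind those household figures can be reconstructed under stated assumptions: roughly 0.3 Wh per query (a widely cited ChatGPT estimate, not given in this article) and about 7.4 kWh per day for an average UK household. Neither constant comes from the report itself, but together they roughly reproduce its headline numbers:

```python
# Reproducing the scaling arithmetic under stated assumptions:
# ~0.3 Wh per query (a widely cited estimate, assumed here) and
# ~7.4 kWh/day for an average UK household (~2,700 kWh/year).
QUERIES_PER_DAY = 1_000_000_000
WH_PER_QUERY = 0.3              # assumed per-query energy, Wh
UK_HOUSEHOLD_KWH_PER_DAY = 7.4  # assumed household consumption, kWh/day

daily_kwh = QUERIES_PER_DAY * WH_PER_QUERY / 1000  # 300,000 kWh/day baseline

for label, saving in [("quantisation + shorter dialogues", 0.75),
                      ("small models + brevity (repetitive tasks)", 0.90)]:
    saved_kwh = daily_kwh * saving
    households = saved_kwh / UK_HOUSEHOLD_KWH_PER_DAY
    print(f"{label}: {saved_kwh:,.0f} kWh/day ~ {households:,.0f} UK households")
```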

[Image: a small suburban village, symbolising how AI efficiency gains can save enough energy to power tens of thousands of UK households each day]

Conclusion: Towards a Leaner AI Frontier 

You don’t need to hitch everything to the biggest rocket. You need the right tools, aligned with the use case and focused on structural leverage. Matching model size to task, applying quantisation, and designing prompts and responses for brevity can unlock dramatic energy savings without sacrificing accuracy.

The takeaway is clear. AI does not scale sustainably by being bigger. It scales sustainably by being smarter, leaner, and use-case specific. Matching the model to the task efficiently is where the future lies, so we can democratise AI in a sustainable manner. 


Walid is an Imperial and NYU alumnus currently working on applied AI for the energy sector. The Applied Computing team can be reached at info@appliedcomputing.com.


© Applied Computing Technologies 2025

Applied Computing Technologies is a remote-first company headquartered in London, UK
