Most of my year-end AI conversations with execs sound like this:
“We're zeroing in on, or revising, a formal AI strategy and the results we expect it to deliver.
Also… our sustainability team is quietly freaking out about all the extra compute.”
That tension is real. The same AI story you’re telling to investors and employees—faster, smarter, more automated—is bumping up against the story you’re telling about energy, water, and long-term ESG commitments.
This piece is about the silver lining I’m seeing in those conversations: if you’re willing to rebalance where AI runs, you can take real pressure off data centers by shifting a meaningful slice of routine AI work onto hardware you already own.
No magic. No greenwashing. Just better placement of workloads.
The uncomfortable math behind “AI everywhere”
A few data points to ground this:
- AI isn’t niche anymore. One large global survey found that 96% of organizations are already deploying AI models in some form.
- The infrastructure to serve that demand is exploding. Data-center construction tied to AI has soared ~40% year over year, raising red flags not just about power but about water consumption to keep those facilities cool.
- At the model level, training compute and power requirements are doubling on aggressive cycles (roughly every five to twelve months, depending on what you measure), so the power drawn by AI training keeps climbing year over year.
- The emissions gap is widening fast. Training early models like AlexNet emitted hundredths of a ton of CO₂; newer frontier models are in the hundreds to thousands of tons per training run—far above the ~18 tons a typical American emits in a year.
Even with hardware getting more energy-efficient over time, the combination of bigger models, more usage, and more automation is pushing total demand sharply upward.
That’s why AI is starting to show up not just in your digital strategy decks, but in ESG briefs and sustainability risk registers.
AI is now a sustainability question, not just a tech question
We’re already seeing early signs of cultural pushback.
Fast Company recently highlighted UK public-sector workers who are reluctant to use AI tools specifically because of net-zero and climate commitments, and U.S. city IT teams that are beginning to vet AI projects through a sustainability lens.
At the same time:
- Regulators are moving quickly on climate and AI governance.
- Investors are reading ESG disclosures with more skepticism.
- Younger employees—especially in tech and healthcare—are paying attention to where and how AI runs, not just what it can do.
McKinsey’s latest tech trends work calls out data-center power constraints, grid access, and physical infrastructure frictions as a key scaling challenge for AI and other compute-heavy workloads.
In other words: even if your AI business case clears the financial hurdle, you still have to answer, “What is this doing to our footprint and our story?”
The silver lining: you already own a lot of the compute
Here’s the part that doesn’t get enough airtime in boardrooms:
You’ve already paid for an enormous amount of compute that sits on desks, in carts, and in bags—laptops, desktops, workstations, thin clients—with CPUs, GPUs, and increasingly NPUs that are idle or underused most of the time.
A few trends make this strategically interesting:
- Endpoint hardware is getting more efficient every generation. ML hardware performance keeps improving while energy efficiency rises around 40% per year, so you get more useful work per kilowatt on newer chips.
- Smaller, cheaper models are catching up. The AI Index shows the cost of running GPT-3.5-level performance dropping more than 280x in about 18 months, as small, efficient models become viable.
- Most high-volume enterprise use cases today are lightweight but frequent: drafting, summarizing, rewriting, translating, routing, and nudging actions—what I usually call “digital chores.”
Those “digital chores” are exactly the type of workloads that can run well on devices you already have, using combinations of CPU, GPU, and NPU, instead of hitting a distant data center for every single prompt.
This doesn’t eliminate the need for cloud AI. You’ll still need large models and shared services for:
- Multi-party workflows and external experiences
- Heavy multimodal workloads
- Training and fine-tuning
- Cross-tenant analytics
But you absolutely do not need to ship every bit of day-to-day reasoning to the cloud.
What “local, on-device AI” actually means (in plain English)
When I say “local, on-device AI,” I’m talking about:
- Models and automation that run on the employee’s machine (laptop, desktop, workstation), not in someone else’s data center.
- Data that never leaves the device for routine tasks—drafting, summarizing, translating, or triggering automations across apps on that same machine.
- Hardware-agnostic acceleration:
  - Runs on CPU only for older devices
  - Takes advantage of GPU where it's available
  - Lights up NPUs on newer "AI PCs" as they roll into the fleet
That last part matters. A credible strategy here cannot assume every device has a cutting-edge NPU. You need software that:
- Is efficient enough to be useful on CPU.
- Can offload selectively to GPU where present.
- Automatically accelerates on NPUs as refresh cycles bring newer endpoints into the mix.
That's how you honor prior CapEx on existing hardware while staying ready for the next generation of devices. When we work closely with partners like Intel, we're constantly pushing the value of their silicon as far as it will go. On-device AI processing is an important part of managing an enterprise workforce: it pulls the value of that investment forward rather than merely "future-proofing" it. Spoiler alert: the future is already here (cliché, but true).
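If you want to picture what that looks like in practice, here's a minimal sketch using ONNX Runtime as one illustrative stack. The model file name and the provider preference order are assumptions, not a product recommendation; the point is the pattern of CPU always working, with GPU and NPU acceleration lighting up where the device has them.

```python
# Hedged sketch: pick the best available execution provider on this device,
# preferring NPU-class accelerators, then GPU, then falling back to CPU.
# Assumes ONNX Runtime is installed; "summarizer.onnx" is a hypothetical
# stand-in for a small local model.
import onnxruntime as ort

# Ordered preference: NPU-class providers first, then GPU, then CPU.
PREFERRED_PROVIDERS = [
    "QNNExecutionProvider",       # NPU on many Arm-based "AI PCs"
    "OpenVINOExecutionProvider",  # Intel CPU/GPU/NPU acceleration
    "DmlExecutionProvider",       # DirectML (Windows GPUs)
    "CUDAExecutionProvider",      # NVIDIA GPUs
    "CPUExecutionProvider",       # always available fallback
]

def build_local_session(model_path: str = "summarizer.onnx") -> ort.InferenceSession:
    available = set(ort.get_available_providers())
    providers = [p for p in PREFERRED_PROVIDERS if p in available]
    # CPUExecutionProvider is always present, so older devices still get a
    # working session; newer hardware simply gets accelerated automatically.
    return ort.InferenceSession(model_path, providers=providers)

session = build_local_session()
print("Running local model on:", session.get_providers()[0])
```

The same fleet-wide rollout then follows device refresh cycles naturally: the software doesn't change, only the provider it lands on.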
Why this matters for ESG narratives in finserv, tech, and healthcare
The executives I talk to aren’t trying to turn “we moved some prompts off the cloud” into their core ESG pillar. They’re trying to do three things:
- Avoid an ugly surprise in their climate math.
AI is now a non-trivial line item in your energy and water story. The AI Index shows that model training emissions are already at "hundreds or thousands of tons" per frontier model, and that total power usage continues to rise even as hardware gets more efficient.
- Show they're not blindly scaling compute.
Reports from McKinsey, Deloitte, and others all converge on the same theme: AI adoption is accelerating, but scaling is constrained by infrastructure, governance, and risk—not just algorithms.
Being able to say, "We deliberately kept routine AI workloads on existing hardware and reserved cloud capacity for what truly needs it" is a credible posture.
- Align AI with existing "sovereign" and data-residency commitments.
In financial services and healthcare, “sovereign AI” (keeping sensitive data and models within national borders or strict network boundaries) is becoming a design requirement, not a buzzword.
Local, on-device AI is one of the simplest ways to keep a large percentage of sensitive work out of multi-tenant clouds altogether.
And there’s a business backdrop that’s easy to forget: Microsoft’s Work Trend Index shows 82% of leaders say productivity must increase, while 80% of the global workforce says they don’t have enough time or energy to do their jobs.
Digital labor is coming. The question is whether you’ll only buy it via massive data-center expansion, or whether you’ll let some of that intelligence live on the devices you already power.
A simple blueprint: rebalance, don’t rip and replace
If you’re in a CIO/CTO/CSO triangle, here’s a pragmatic way to approach this without turning it into a science project.
Tier your AI workloads
Do a fast classification of existing and planned AI use cases along two axes:
- Data sensitivity
  - Public / marketing
  - Internal but low-risk
  - Regulated / highly sensitive (PHI, PII, trading data, clinical notes)
- Compute intensity
  - Lightweight, single-user "digital chores"
  - Medium, team-scale tasks
  - Heavy, organization-wide or external workloads
Then apply a simple rule of thumb:
- Local first for lightweight, single-user tasks with sensitive or internal data:
  - Email drafting and rewrites
  - Note-taking and summarization
  - Internal translation and tone-shifting
  - Simple "do X, then Y" automations on the same device
- Cloud when you must for:
  - Cross-team or external experiences
  - Heavy multimodal reasoning (long video, complex agents)
  - Training, fine-tuning, and shared analytics
You’re not trying to maximize “edge” for its own sake. You’re trying to minimize unnecessary data-center hits for routine work.
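To make the rule of thumb concrete, here's a minimal sketch of a placement function built on those two axes. The workload names, enums, and thresholds are illustrative assumptions; your AI steering committee would own the real mapping.

```python
# Hedged sketch: route an AI workload to "local" or "cloud" based on the two
# axes above. The categories and example workloads are illustrative only.
from dataclasses import dataclass
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = 1        # public / marketing
    INTERNAL = 2      # internal but low-risk
    REGULATED = 3     # PHI, PII, trading data, clinical notes

class Intensity(Enum):
    LIGHT = 1         # lightweight, single-user "digital chores"
    MEDIUM = 2        # medium, team-scale tasks
    HEAVY = 3         # heavy, organization-wide or external workloads

@dataclass
class Workload:
    name: str
    sensitivity: Sensitivity
    intensity: Intensity

def place(w: Workload) -> str:
    # Local first for lightweight tasks, especially on sensitive data;
    # cloud only when the compute genuinely demands it.
    if w.intensity is Intensity.LIGHT:
        return "local"
    if w.intensity is Intensity.MEDIUM and w.sensitivity is Sensitivity.REGULATED:
        return "local"  # keep regulated data on the device where feasible
    return "cloud"

for w in [
    Workload("email drafting", Sensitivity.INTERNAL, Intensity.LIGHT),
    Workload("clinical note summarization", Sensitivity.REGULATED, Intensity.LIGHT),
    Workload("cross-tenant analytics", Sensitivity.INTERNAL, Intensity.HEAVY),
]:
    print(f"{w.name}: {place(w)}")
```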
Make hardware-aware decisions, not NPU-only bets
When you evaluate software and platforms:
- Require a clear story for CPU-only performance on your existing fleet.
- Ask how the same stack uses GPU where it exists—especially in engineering, research, and imaging-heavy teams.
- Confirm that NPU acceleration is additive, not a hard requirement, so your roll-out can follow natural device refresh cycles instead of a forced forklift upgrade.
This is how you avoid a two-class workforce where only the people on brand-new hardware get the ESG-friendly, low-latency AI experience.
Bake it into ESG and AI governance together
Most organizations are still catching up on AI governance. F5’s AI Readiness research found that only 2% of surveyed organizations qualify as “highly ready” to scale and secure AI across environments.
Use that to your advantage:
- Add “workload placement” (cloud vs device) to your AI steering committee charter.
- Involve your Chief Sustainability Officer early, not as an after-the-fact reviewer.
- Track a couple of simple metrics:
  - % of AI interactions served on local hardware vs cloud
  - Estimated incremental cloud energy / water impact avoided by keeping routine tasks local (even if it's coarse at first)
You don’t need perfect telemetry on day one. You need a defensible narrative and a plan to make it more precise over time.
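As a starting point for those two metrics, even a coarse pass over tagged interaction logs gives you a baseline. Here's a minimal sketch; the log format and the per-request energy figures are placeholder assumptions you'd swap for your own telemetry and estimates.

```python
# Hedged sketch: coarse placement metrics from tagged interaction logs.
# The log entries and the per-request energy estimates below are assumptions,
# not measured values; replace them with real telemetry as it matures.
interactions = [
    {"task": "summarize meeting notes", "ran_on": "local"},
    {"task": "draft customer email", "ran_on": "local"},
    {"task": "multimodal agent workflow", "ran_on": "cloud"},
    {"task": "translate internal memo", "ran_on": "local"},
]

# Placeholder marginal-energy estimates (Wh per request).
EST_WH_PER_REQUEST = {"cloud": 3.0, "local": 0.3}

total = len(interactions)
local = sum(1 for i in interactions if i["ran_on"] == "local")
pct_local = 100 * local / total

# Energy "avoided" = what the local requests would have cost if sent to the
# cloud, minus what they actually cost on-device.
avoided_wh = local * (EST_WH_PER_REQUEST["cloud"] - EST_WH_PER_REQUEST["local"])

print(f"AI interactions served locally: {pct_local:.0f}% ({local}/{total})")
print(f"Estimated incremental cloud energy avoided: {avoided_wh:.1f} Wh")
```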
How to talk about this in your next strategy or board review
If you want language that lands with non-technical stakeholders without overselling, something like this tends to resonate:
“As we scale AI, we’re being intentional about where the compute runs.
For routine, single-user tasks—drafting, summarization, translation—we’re shifting more work to the laptops and workstations we already own, using their CPUs, GPUs, and NPUs instead of always calling out to distant data centers.
We reserve cloud AI for the large, shared workloads that truly require it. This lets us capture productivity gains while reducing incremental data-center energy and water impact, and it keeps more of our sensitive data inside our existing network and hardware footprint.”
That’s honest. It’s measurable. And it frames local AI as part of responsible scaling, not a side project.
If you’re adapting this for your own organization
You don’t need to answer these back to me, but they’re the questions I’d have you run through internally:
- Which of our current AI use cases are truly cloud-dependent, and which could be served just as well (or better) on existing endpoints?
- How much of our workforce is already on hardware with GPUs or NPUs—and how will that change over the next 24–36 months?
- Where are ESG, security, and AI governance currently disconnected—and how do we get those stakeholders looking at workload placement together?
- What would it take for us to report, even at a high level, the percentage of AI work we keep local vs in the data center?
If you're wrestling with those questions in finserv, tech, or healthcare and want a sparring partner, not a pitch deck, I'm always up for a conversation.
References
https://www.f5.com/resources/reports/state-of-ai-application-strategy-report
https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf
https://arxiv.org/pdf/2504.07139
https://www.fastcompany.com/91411720/ai-energy-use-pr-problem
https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/the-top-trends-in-tech
https://www.microsoft.com/en-us/worklab/work-trend-index/ai-at-work-is-here-now-comes-the-hard-part

