PUE has leveled off at around 1.6, but the majority of inefficient datacenters still average a PUE of 2.0-2.5 ...
Facebook and some other private companies have a PUE of about 1.2; they're known in the industry as hyperscalers ...
I think that's the important point here. A large fraction of all servers today are installed in hyperscaler locations. In particular, a large fraction of all worldwide compute power is used by a small group of internet companies (those that provide services to individual users, for example Amazon (*), Facebook, Google, Netflix), and by the big cloud service companies where businesses of all sizes (from mom-and-pop stores to the world's largest) do their computing, dominated by Amazon, Google and Microsoft. That's the sloppily-named FAANG (plus their Chinese counterparts).
The hyperscaler data centers were running at a PUE of ~1.2 in about 2008. They have since gotten lower; both Google and Facebook report an average of 1.1 (not for small experimental systems, but for their fleet with many data centers). That means that cooling has become a relatively small energy overhead, and at this point reducing the energy usage of "the computer" itself is 10x more important than making the cooling better. How does one go from the historical PUE of 1.5 ... 2 down to 1.1? The use of evaporative cooling (turning liquid water into vapor) is one ingredient, but not the major one, nor is it always used. Other tools are reducing wasteful air movement, not mixing hot and cold air, running electronics at efficient temperatures (which is much warmer than commonly expected), moving cooling water to where it really helps, and an enormous amount of attention to detail. Data center cooling is a large and important field of engineering.
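To make the "10x more important" arithmetic concrete: PUE is total facility energy divided by the energy the IT equipment itself consumes, so a PUE of 1.1 means only ~10% overhead for cooling, power conversion and so on. A quick sketch with illustrative numbers (the 1000 kW load is made up):

```python
# PUE = total facility energy / IT equipment energy.
# At PUE 1.1, the overhead (cooling, power conversion, lighting) is
# only 10% of the IT load, so cutting the IT load itself has roughly
# 10x the leverage of further improving the cooling.

def total_power(it_kw: float, pue: float) -> float:
    """Total facility power for a given IT load and PUE."""
    return it_kw * pue

it_load = 1000.0  # kW of IT equipment (illustrative)

legacy = total_power(it_load, 2.0)       # 2000 kW total, 1000 kW overhead
hyperscale = total_power(it_load, 1.1)   # 1100 kW total, 100 kW overhead

# Perfecting the cooling (PUE -> 1.0) saves at most 100 kW here,
# while a mere 10% cut in IT power saves 110 kW
# (the IT load reduction plus the overhead it drags along).
cooling_ceiling = hyperscale - total_power(it_load, 1.0)
it_cut = hyperscale - total_power(it_load * 0.9, 1.1)
print(cooling_ceiling, it_cut)
```

That asymmetry is why the frontier has shifted from better cooling to more efficient computing.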
The other approach is to reduce the energy usage of the computing itself. Examples include changing the instruction set of the computer from x86 to Arm or RISC-V, which is happening right now. Another example is moving AI workloads away from general-purpose CPUs (very inefficient) to GPUs (better) and then to dedicated AI chips (best). But as you said, AI training uses a lot of compute cycles. There are many other things being done to reduce power usage: Amazon turns disk drives off for hours at a time, other companies use drives that spin slowly. On the opposite side, IO-intensive workloads are being moved from generic enterprise nearline disks to fast multi-actuator disks (where the cost of running the spindle motor is amortized over multiple sets of heads that move independently), and to SSDs (which are more energy efficient per IO, even if their total energy usage can be high). Similar things are happening in networking, by not overprovisioning networks, making hops shorter (fewer routers touched), and avoiding media conversion (fiber to the chip). Even the power delivery mechanism within the computer is being optimized, with large-scale adoption of low-voltage DC distribution, coupled with superconducting cables; and UPS batteries deployed in optimal places and with optimal capacities (just large enough to handle the starting delay of the diesel generator).
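The multi-actuator amortization argument is easy to see in a toy model: the spindle motor draws roughly constant power no matter how many actuators share the platters, while each actuator adds its own smaller draw and its own stream of IOs. All the wattages and IOPS figures below are made-up illustrative values, not measurements of any real drive:

```python
# Toy model of why multi-actuator disks improve energy per IO:
# spindle power is paid once, per-actuator power and IOPS scale
# with the number of actuators. Numbers are illustrative only.

def joules_per_io(spindle_w: float, actuator_w: float,
                  actuators: int, iops_per_actuator: float) -> float:
    total_w = spindle_w + actuator_w * actuators
    total_iops = iops_per_actuator * actuators
    return total_w / total_iops  # watts per (IO/s) == joules per IO

single = joules_per_io(spindle_w=5.0, actuator_w=2.0,
                       actuators=1, iops_per_actuator=150.0)
dual = joules_per_io(spindle_w=5.0, actuator_w=2.0,
                     actuators=2, iops_per_actuator=150.0)

# The dual-actuator drive does twice the IOs but pays the spindle
# power only once, so its energy per IO comes out lower.
print(single, dual)
```

The same fixed-cost-amortization logic shows up all over this list, from UPS sizing to router hop counts.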
It is true that a lot of data center cooling is done by evaporating water. And this is where activists start screaming "these evil computer companies are using water that could be used for millions of people to drink". Those statements are vastly exaggerated, to the point of being partly nonsense. To begin with, some of the evaporated water is not treated and drinkable city water, but existing surface water. There is a reason a lot of data centers are built along the Mississippi and Columbia rivers, where large amounts of fresh water are running into the ocean anyway. And a lot of data centers do not use evaporative cooling at all, in particular in colder climates. There is a reason many data centers are way up north (Canada, Scandinavia).
(*) Amazon is in the list of internet companies because their public-facing "we sell everything" side has become one of the most-used search engines when people search for products, and because it delivers a significant fraction of all advertising. In theory, Microsoft should also be in the list of internet companies, with Bing being the 2nd-largest search engine and a reasonable fraction of advertising delivery.
Having said that: The PUE for hyperscalers is excellent, so much so that cooling computers has become a minor inefficiency when deployed like that. And as you said, this does not apply to all data centers, and not to all modes of computing. The gamer with his 750W tower with blinking lights, who has to crank up the AC in the summer: that is ridiculously inefficient. As is the typical small business that converted an old cleaning closet into a "server room" by sticking a cooling air duct in there, putting in a single rack with their DSL gear, network and phone switch, and a few servers, and keeping it at 12 degrees C (about 54 degrees F) because someone told them that computers like it cool: that's also insanely inefficient.
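A back-of-the-envelope number for the gamer case: every watt the tower dissipates into the room must also be pumped back out by the air conditioner, which costs extra electricity depending on the AC's coefficient of performance (COP). The COP of 3 below is an assumed, typical-ish value for a room AC, not a measurement:

```python
# Rough "effective PUE" of a gaming PC cooled by room AC.
# An AC with coefficient of performance (COP) 3 spends about 1 W of
# electricity to move 3 W of heat outdoors. COP and wattage are
# illustrative assumptions.

def effective_pue(it_watts: float, ac_cop: float) -> float:
    ac_watts = it_watts / ac_cop          # extra power the AC draws
    return (it_watts + ac_watts) / it_watts

pue_gamer = effective_pue(750.0, ac_cop=3.0)
print(round(pue_gamer, 2))
```

Under these assumptions the gamer lands around PUE 1.33, noticeably worse than a hyperscaler fleet at 1.1; the over-chilled closet "server room" fares worse still, since it also pays to cool the air far below what the electronics need.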