For a while, x86 CPUs were simply aimed at getting faster: more and more clock ticks. Then there was a shift from faster to more efficient (in terms of power used and heat generated vs. performance) in the Pentium 3/4 era. Then came another shift, towards multiple cores. Now the next shift has occurred: CPU and GPU fusion. (AMD has been talking about it for longer than I can recall. I think I was talking to an AMD VP, shortly before they purchased ATI, about how GPU and CPU fusion was the future.)
Intel recently released Sandy Bridge, its combined CPU/GPU processor, and AMD's Fusion APUs have been around for a while. Interestingly, both are targeted at laptops (the bulk of the market), and both still expect discrete GPUs for high-end performance. The fusion is just a bit of a bonus.
Intel and AMD aren't standing still on the CPU front and are pushing ahead with AVX, but the really interesting part is that OpenCL is being pushed across the board. Recent publications from the UK GPU computing conference demonstrate that ARM are pushing OpenCL as their platform of choice (for both their CPUs and their Mali GPU; the implementation is Clang/LLVM based), and AMD and Intel have been strong supporters of OpenCL too. Did you know Samsung supports OpenCL as well?
It would seem that OpenCL has a very strong support base, and is likely to be the platform of choice for developers on Intel, AMD, ARM, Apple, IBM, etc. What chance does CUDA stand? It seems nVidia will eventually be forced to drop CUDA, or to invest heavily in CPU-based CUDA (Hello Ocelot!) or OpenCL translation. Eventually people will tire of writing and re-writing CUDA programs as they swap between platforms.
Envision the not-so-distant future where you have a workstation with multiple CPU/GPU fusion cores [non-CUDA] and a discrete fusion GPU (GPU [CUDA] / ARM [non-CUDA] core). Are you really willing to write a specific CUDA routine for just the nVidia GPU, or are you more likely to try a less optimal OpenCL routine that will also run on the CPU, CPU SIMD, CPU/GPU fusion and ARM cores?
It seems that nVidia's argument for writing everything in CUDA for optimal performance will not hold: you stand to gain more from having all the other devices help out with the computation than you lose by using OpenCL instead of CUDA.
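To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. Everything in it is an assumption made for illustration: the device names, the throughput figures (arbitrary units) and the 0.8 "portability efficiency" factor are hypothetical, and the model ignores data transfer, scheduling overhead and load imbalance, all of which would eat into the multi-device gain in practice.

```python
def throughput(devices, efficiency=1.0):
    """Idealised combined throughput when work is split across all devices.

    `efficiency` is the fraction of peak the code actually reaches, e.g. a
    portable kernel that is not hand-tuned for any one device.
    """
    return efficiency * sum(devices.values())


# Hypothetical peak throughput per device, in arbitrary units (assumed values).
devices = {
    "discrete_gpu": 100.0,  # the only device a CUDA routine can use
    "fusion_gpu": 30.0,
    "cpu_simd": 15.0,
    "arm_cores": 5.0,
}

# Hand-tuned CUDA: full efficiency, but only on the discrete GPU.
cuda_only = throughput({"discrete_gpu": devices["discrete_gpu"]})

# Portable OpenCL: assumed to reach 80% of peak, but on every device.
opencl_all = throughput(devices, efficiency=0.8)

# Efficiency at which the portable version merely matches CUDA-only.
break_even = devices["discrete_gpu"] / sum(devices.values())

print(f"CUDA on the discrete GPU only: {cuda_only:.1f}")
print(f"OpenCL across all devices:     {opencl_all:.1f}")
print(f"Break-even OpenCL efficiency:  {break_even:.2f}")
```

With these made-up figures the portable version comes out ahead as long as it keeps more than about two thirds of the tuned performance. The more the discrete GPU dominates the machine's total throughput, the closer that break-even point moves to 1 and the weaker the argument becomes.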
Of course, there are other choices out there, such as Accelerator and RapidMind.
The UK conference reveals that Microsoft's GPGPU language "Accelerator" has made progress and is no longer limited to GPUs (it now supports SSE3 on modern CPUs). I'm still not aware of anyone using it for anything practical, though, which seems a bit of a shame. And ever since Intel bought RapidMind (now Array Building Blocks), the tech has become very Intel-specific. So neither of those choices sounds too promising.
It will be interesting to see what nVidia decide to do with CUDA, how developers will adjust to fusion CPUs/GPUs, and which higher-level language (or language extension?) will become the de facto standard in the future. Stay tuned.