We live in a very interesting time! I’m curious to see where the computing industry will go in the coming years, and that is extremely hard to predict. Tech giants collaborate with and compete against each other in many directions. For instance, Intel and AMD compete with each other for the best CPU, but at the same time, together they defend the x86 ecosystem against ARM, which is backed by big players like Apple, Amazon, Samsung, and others. One thing I know is that we are at the beginning of a new computing era. The world progressed from the PC era to the cloud era, and now computing is becoming even more distributed and heterogeneous. In this short post, I will share my humble opinion on this.
Software programmers have had an “easy ride” for decades, thanks to Moore’s law. Unfortunately, growth in single-threaded performance, which is so critical for general-purpose computing, is slowing down. John Hennessy said in his Google I/O 2018 talk: “That’s the end of the general-purpose processor performance as we know it”. Now it is the world of specialized HW built for a particular domain. I talk a lot about that in the first chapter of my book, so read it for an expanded discussion. According to the popular paper “There’s plenty of room at the top” by Leiserson et al., streamlined HW is the future we will face, where computing devices will be heavily customized for the work they are designed to do. Everything will look like a computer. Even simple earphones will employ a streaming processor crafted to do just noise canceling and nothing else.
Anything more complex than earbuds will have multiple accelerators integrated into a single SoC. I believe our computing world is moving from the traditional CPU + GPU design towards an XPU design, where multiple types of HW are integrated into a single system. In a cloud environment, heterogeneity is already the norm, with CPUs, GPUs, FPGAs, and other accelerators hosted in the same rack. But we also see this trend in user devices. For example, the Apple iPhone chip has a CPU, a GPU, and an AI accelerator integrated into a single SoC. I think this trend can also be observed in the actions of the industry’s major players. HW companies are broadening their portfolios to provide a wide range of HW architectures. NVidia is going after the CPU market with its bid to buy ARM, AMD acquired Xilinx, one of the major players in the FPGA world, and so on.
There is one problem with this trend. It will be extremely hard for a casual developer to target this insane number of architectures and get a good level of performance out of them. One of the existing solutions is CUDA, which is proprietary to NVidia and only targets their HW. For developers to be able to program a variety of different HW architectures, we need an open standard supported by all the vendors.
Sorry for the shameless plug, but here is where Intel’s OneAPI comes into play. This is the project that I’m working on, so I’m happy to talk about it. OneAPI is built on a bold idea: that all HW devices can be programmed using a single API, just like with OpenCL, but much more easily. OneAPI is built upon the modern C++ and Khronos SYCL standards. The core of [ … ]