You would need to define “the most powerful computers”. If you mean supercomputing clusters, the answer is likely heat and price, but understand that those machines are stacks of individual computers, not a single computer. They have a specialized use and do not do the same things desktops, laptops, and phones do. All sorts of different chips have been used in supercomputers over the years, even that wacky one made out of PlayStation 3s because it was the cheapest source of Cell processors.
Now, mainframes, which still exist and can’t seem to be replaced in functionality by their consumer cousins, don’t necessarily use ARM or x86. IBM still uses its PowerPC architecture, with added instructions and hardware that make it almost more CISC than RISC. These machines run super hot but can handle certain operations very fast, and their functionality comes before their power consumption. You might count modern mainframes as the fastest “individual” computers for their intended purposes.
If you want to know about home computers, you will find that x86 and ARM still have around the same performance at their high ends, despite what Apple says. In fact, the super-fast M1 and M2 Apple Silicon chips to some extent just have features, like a HUGE cache or a tiny fabrication process, that are in no way unique to ARM and could just as easily be added to any particular x86 model… if you wanted it to be more expensive.
So, really, the question is vague, and you need to define what you mean by most powerful, but the fact is that none of these architectures really has amazing superpowers. They are all basically on the same tech level and the differences are generally negligible, except for how much effort is put into ARM power efficiency, and even some of that is a bit of smoke and mirrors.
When you are talking HPC, the only metric that matters is TFLOPS per watt.
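To make the metric concrete, here is a minimal sketch of a TFLOPS-per-watt comparison. The two nodes and all of their numbers are hypothetical, chosen only to show that the node with less raw throughput can still win on efficiency.

```python
def tflops_per_watt(tflops: float, watts: float) -> float:
    """Efficiency metric: sustained TFLOPS divided by power draw in watts."""
    return tflops / watts

# Hypothetical nodes, for illustration only (not measured values).
node_a = tflops_per_watt(tflops=10.0, watts=500.0)  # faster but hotter
node_b = tflops_per_watt(tflops=6.0, watts=200.0)   # slower but cooler

best = "B" if node_b > node_a else "A"
print(f"node A: {node_a:.3f} TFLOPS/W, node B: {node_b:.3f} TFLOPS/W, best: {best}")
```

Under a fixed power budget, the more efficient node is the one you can deploy more of, which is the argument being made here.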
ARM still wins because it doesn’t have the massive x86 silicon bloat. So silicon real-estate becomes more cores instead of widgets, gizmos, and doodads within each core. ARM thus becomes more redundant and easier to scale.
Intel is stuck with its “ring bus” topology and AMD is constrained by “infinity fabric.” ARM doesn’t really have those inherent limitations and can therefore be grown to “wafer scale” lattices where the cores are laid out in an X,Y plane like a GPU.
A good example of an ARM core lattice is Tesla’s Dojo. E…
The premise is not really true. If we look at the current top 10 supercomputers ranked by the LINPACK benchmark, the one used for the TOP500 list, we see:
| Rank | PetaFLOPS | CPU | GPU |
|---|---|---|---|
| 1 | 1194 | EPYC 64C | AMD MI250X |
| 2 | 443 | Fujitsu A64FX | none (!) |
| 3 | 309 | EPYC 64C | AMD MI250X |
| 4 | 238 | Xeon 8358 | Nvidia Ampere A100 |
| 5 | 149 | POWER9 | Nvidia Tesla V100 |
| 6 | 95 | POWER9 | Nvidia Tesla V100 |
| 7 | 93 | Sunway SW26010 | none |
| 8 | 71 | EPYC 64C | Nvidia Ampere A100 |
| 9 | 64 | EPYC 64C | Nvidia Ampere A100 |
| 10 | 61 | Xeon E5-2692 | Matrix-2000 (?) |
Four of the top ten use AMD’s EPYC processor, which runs the x86-64 instruction set. Two use Intel Xeon CPUs! Two are based on IBM POWER processors, which are not ARM based either. Only one CPU on the list, the Fujitsu A64FX, is an ARM design; the Sunway is its own idiosyncratic thing (a Chinese RISC architecture).
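The tally can be checked mechanically. This is a minimal sketch that takes the CPU column of the list above and groups it by instruction set; the names `TOP10_CPUS` and `ISA_BY_FAMILY` and the family-to-ISA mapping are my own labels for illustration, not part of the TOP500 data.

```python
from collections import Counter

# CPU column of the top-10 list above, in rank order.
TOP10_CPUS = [
    "EPYC 64C", "Fujitsu A64FX", "EPYC 64C", "Xeon 8358", "POWER9",
    "POWER9", "Sunway SW26010", "EPYC 64C", "EPYC 64C", "Xeon E5-2692",
]

# Assumed mapping from CPU family (first word of the name) to instruction set.
ISA_BY_FAMILY = {
    "EPYC": "x86-64",
    "Xeon": "x86-64",
    "POWER9": "POWER",
    "Fujitsu": "ARM",
    "Sunway": "Sunway (custom RISC)",
}

def isa_of(cpu: str) -> str:
    """Look up the instruction set from the first word of the CPU name."""
    return ISA_BY_FAMILY[cpu.split()[0]]

isa_counts = Counter(isa_of(cpu) for cpu in TOP10_CPUS)
for isa, count in isa_counts.most_common():
    print(f"{isa}: {count}")
```

Six of the ten entries come out as x86-64, which is the point of the argument: ARM appears exactly once.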
We could argue that what most of these supercomputers really use is their GPUs, but those aren’t ARM-based in any way either.
You might be able to make the case that the most powerful laptop or server CPUs are ARM based right now, and argue about the sources of the benefit, which is an interesting question even if debatable. But the most powerful computers still mainly use the x86-64 architecture.
Instruction set tends not to matter as much as we thought twenty or thirty years ago, and the advantage of the newer ARM CPUs may not lie in instruction set at all so much as starting over with a more power-conscious approach. It looks to me more like a classic case of disruption — ARM cores were not good enough for the most demanding applications, until they suddenly were, because they’ve been eating the lower end of the market and improving performance throughout.
But they haven’t yet taken over the supercomputing market.
Amateurs look at processor clock rates. Professionals look at bottlenecks.
In a “supercomputer,” the processors are only one part of the system. With modern processors, they are rarely the bottleneck of the system: communication latency and bandwidth to other nodes is often the limiting factor. There’s no point in speeding up the processors if the system’s performance is going to be basically the same. It’s like trying to drive a Ferrari on a dirt track.
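The bottleneck argument is the same reasoning as Amdahl's law, and a toy model makes it visible. This is a sketch only: the split between compute time and communication time is hypothetical, and `step_time` is a name invented for the example.

```python
def step_time(compute_s: float, comm_s: float, cpu_speedup: float = 1.0) -> float:
    """Time for one step: compute shrinks with a faster CPU, communication does not."""
    return compute_s / cpu_speedup + comm_s

# Hypothetical per-step costs, chosen only for illustration.
COMPUTE_S = 0.2  # seconds spent computing on the node
COMM_S = 0.8     # seconds spent waiting on the interconnect

baseline = step_time(COMPUTE_S, COMM_S)
faster_cpu = step_time(COMPUTE_S, COMM_S, cpu_speedup=2.0)
overall_speedup = baseline / faster_cpu

print(f"baseline: {baseline:.2f} s, 2x CPU: {faster_cpu:.2f} s, "
      f"overall speedup: {overall_speedup:.2f}x")
```

With these assumed numbers, doubling processor speed improves the whole step by only about 11%, because the interconnect wait is untouched.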
In addition, the cost of running a supercomputer is substantial: it can be on the order of millions of dollars a year in electricity costs alon…
“Why does the most powerful supercomputer in the world use ARM processors if x86 is faster?”
To try to distill a simple answer to what I think you are trying to ask …
When we talk about ARM vs. x86, what we are talking about is instruction sets, in other words, the list of instruction types that each processor category comprehends. Obviously there are lots of different implementations of the ARM instruction set, each with its own speed, power consumption, etc. and there are similarly lots of implementations of the x86 instruction set.
The x86 instruction set originated in the 1970s and has been expan…
I attended one of the ARM Technology conferences in Cambridge UK about ten years ago. The CEO gave the keynote speech. He said (paraphrasing from memory):
Let’s talk about the elephant in the room. Intel. Intel doesn’t want our business. We’re a rounding error to them. They want your business. They want to build all the chips you’re making and they have all the engineering resources to do it.
The room went very quiet! Intel at that time ruled. Using a small x86 core like the Atom just meant that they could take over your product space any time they wanted. They had better fabs, they had better design techniques, and they had a killer range of processors they could drop in.
So people went with ARM to avoid getting pulled into the Intel vortex. ARM was entirely happy to just sell you a core, and didn’t care what you did with it.
TSMC had a similar business philosophy and succeeded for a similar reason. They didn’t care what your chip did, so long as the transistors they built for you met their specs. They were not in competition with you.
Intel is no longer the 800-pound gorilla - that role has been taken by Samsung. They don’t have their own processor IP, but now have excellent fabs. They’re trying for TSMC. If you build with them, you can’t be sure that they won’t use your design to upgrade their own. They swear that they won’t, but what’s your recourse if they do? They’re very much in competition with their customers, and that’s going to deter a lot of people.
The major Chinese fab, SMIC, is the same way. They’ll offer a great wafer price, but you can’t be sure that they won’t reverse engineer your design, or even stick in back door security holes. That’s not difficult, and easy to justify if the outside world is constantly ragging on you. Use them at your peril.
ARM and Intel aren’t specific processors, they are instruction sets that chipmakers build whole flotillas of different designs of processors around. And these designs are optimized for different purposes— different communication architectures, different silicon processes, etc.
If you want to compare apples to apples, you can compare Apple’s to Intel’s. If you compare the new Apple M1 (ARM) to Intel’s latest laptop chip, the M1 totally destroys Intel in most benchmarks, winning on both processing speed and power by quite a lot, and this is just the very first generation, lowest-end laptop processor Apple has built. The iMac and 16” MacBook versions of this sucker are going to be monsters.
It’s instructive to look at why this is. Three main reasons:
1. Apple directly stacks RAM on top of the CPU, and also adds GPU, DSP, image processor, and “neural engine” in the package, all able to access this RAM without an intermediate cache. This means more bandwidth, shorter connections, faster access and less waste heat.
2. Apple is at TSMC, on a 5 nm process node vs. Intel’s 10 nm — this is a big deal, TSMC is 2 nodes ahead, which means more components on a chip and less power consumption and heat generation.
3. ARM uses a RISC instruction set, which makes it a lot easier to do things like out of order execution — you wind up with each core having more throughput, despite a lower clock rate
Now, a supercomputer isn’t necessarily going to implement the same ARM as Apple, they’re going to implement something even better suited to bulk computation. More cores, maybe a different balance of the peripherals based on the common application load.
But the low power consumption of the M1 and its ilk is their most important superpower. The biggest restriction on supercomputer speed is size, which is driven by heat. The low power of a battery-friendly device also means low power dissipated as heat, which means more cores can be packed closely together. The 5nm node is practically the whole game, although customizability, tight coupling to RAM and better reordering are big too.
And these custom processors are getting affordable. For example, Raspberry Pi was able to build their own chip without Apple’s wallet. A supercomputer built out of custom 5nm ARM chips is going to utterly blow the doors off any of the off-the-shelf 10nm Intel chips.
- They’re orders of magnitude cheaper. The smallest ARM core, the Cortex-M0, costs a penny’s worth of silicon, and even a large one like the Cortex-A72, costs only tens of cents. That’s for just the processor and its local caches, and doesn’t include all the other memory and IO that a system needs.
- They can be built anywhere. They come as “soft IP”, meaning register-transfer-level designs that can be synthesized to the logic cells for any process. You can add them to whatever is special on your own chip.
- You can use them even if you’re competing with Intel or AMD for some application. They would n
Did you know that even today a lot of companies, banks, and public services still run Windows 95/XP and use 30-year-old software, e.g. programs written in COBOL?
Now imagine how many x86 programs exist and are in use today. Start with yourself: you bought an x86 game for $100. Are you going to throw it away because ARM is a better CPU?
Anyway, ARM has already replaced everything else in the mobile segment, thanks to Intel making a buttload of money from enterprise and consumers and neglecting mobile. Today it is too late: even if Intel made Atom less power hungry and more powerful than ARM, it would have no chance against ARM. Maybe in the future, but that process would take a very long time.
It is similar with replacing x86: there is simply too much software. Apple is a totally different story from Windows/Linux; it is a fully closed ecosystem under Apple's total control. Anyone who wants to make something for an Apple device must obey. It was easy for Apple to switch from x86 to ARM: developers were forced to follow and got all the necessary tools. The other side is completely different. For example, look at Microsoft fighting to push C#, .NET, and UWP, and mostly failing. Windows on ARM is not popular and has already failed a few times in the past, e.g. Surface RT.
On the performance side, ARM is not as powerful as x86. The most powerful ARM chip, the Apple M1, cannot compete with laptop x86 chips and was recently obliterated by Intel's 12th Gen. Yes, Intel consumes more power, but here we are talking about performance.
What also matters is that the M1 is produced on a 5nm node that no other desktop CPU uses (it is strictly for mobile), which draws 40% less power than the 7nm node used by AMD and even less compared to Intel's 7nm. Another reason, as mentioned above, is that Apple is a fully closed ecosystem and as such is much better optimized than anything else.
The M1 works only in Apple devices, and Apple removed 32-bit support. The latest x86 CPUs can run 40-year-old x86 software, even 16-bit software from the MS-DOS era. Windows 10/11 can also run on a 30-year-old CPU as well as on the latest one.
Internally, x86 is not much different from ARM and vice versa. Today the only ways to get performance are parallelization and clever tricks using billions of transistors. Both kinds of CPU act like optimizing compilers: they convert the machine code used in programs into internal code consisting of macro- and micro-ops, optimize it, and execute it in parallel. Every single core in a CPU is internally another “multi-core” processing device.
The M1 Max has a maximum clock of 3.2 GHz, while x86 chips are designed to work at 5 GHz, and Intel's even at 5.5 GHz or higher. Unfortunately, for now, modern nodes such as 5nm are not capable of reaching 5 GHz. And in mobile devices speed/performance is not the key factor.
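Clock rate alone does not settle the comparison, since throughput is roughly instructions per clock (IPC) times clock frequency. A minimal sketch follows: the 3.2 GHz and 5 GHz figures are the ones quoted above, while both IPC values are hypothetical and picked only to show how a wider, slower-clocked core can land close to a narrower, faster-clocked one.

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Rough throughput: instructions per clock times clocks per second."""
    return ipc * clock_ghz * 1e9

# Clock rates from the text; IPC values are hypothetical, for illustration only.
wide_core = instructions_per_second(ipc=6.0, clock_ghz=3.2)
fast_core = instructions_per_second(ipc=4.0, clock_ghz=5.0)

ratio = wide_core / fast_core
print(f"wide core: {wide_core / 1e9:.1f} G instr/s, "
      f"fast core: {fast_core / 1e9:.1f} G instr/s, ratio: {ratio:.2f}")
```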
So no, ARM is not going to replace x86 anytime soon. RISC-V is becoming more popular (Intel invested 1 billion) but is way behind ARM in performance. And when RISC-V reaches ARM's performance level, it will have the same power consumption.
About 15 years ago, AMD was far better than Intel. Talking about absolute performance or performance per dollar, AMD was better with both.
AMD’s Barton-core Athlons, released in 2003, were far better than any of the Intel CPUs. Back then, almost no custom PCs were built with Intel, yet the statistics showed that Intel sold more CPUs. We custom PC builders didn’t understand how that was possible, as about 99% of the custom PCs we built used AMD, but the later antitrust cases showed that Intel offered such unfair benefits to the large branded manufacturers that they didn’t use AMD at…
Each supercomputer is designed for the kind of work it is expected to do. It is optimized for performance per money spent on acquisition and operation (electricity and cooling). In supercomputers with GPUs, the CPU is mostly used for controlling network I/O (including to persistent storage) and the GPUs.
Some supercomputers for applications that use large amounts of memory for data have been built with slow cores having small data caches. A larger cache wouldn’t have made much difference. Faster cores would just spend most of their time stalled, waiting for memory to load into registers.
The microarchitecture of any two CPUs can be pretty different. Even between two x86-64 CPUs (Intel’s Core or AMD’s Zen), the internals can be pretty different.
But let’s talk about it from an architectural point of view. That is, what the software sees.
Instructions
The obvious first difference is the encoding of the instructions and the assembly language itself. Here’s “Hello World!” in x86:
And here’s “Hello World!” in ARM:
The major difference is that ARM is a load-store architecture. Instructions that do compute (add, subtract, multiply, move, etc.) must operate on registers only. There are dedi…
From today’s software development perspective there are no differences; well, the difference is in which compiler or language runtime executable is used. For example, if Python is used for development, the only thing that matters is which Python executable is used. Because Windows runs on both x86 and ARM, executable files have the same .exe extension, but an x86 .exe will not run natively on ARM Windows. And for a Python developer that’s the only difference.
It is similar with other programming languages: C or C++ also require a CPU-compatible executable. Other languages like Java do not even care about the “executable”, but they must run on the proper CPU-specific VM
I regularly build computers with AMD processors rather than Intel. Even though the high-end Intels like the i7 are much faster than the high-end AMDs, most people who ask me to build them a PC just want a good PC at an average price. The minute you add a £150 Intel CPU into the equation instead of a £30 AMD, and then a £150 Intel motherboard instead of a £40 AMD motherboard, the price of a new PC quickly rockets. Besides, the basic AMD chips aren’t that much different, and in some instances the AMD chips can be far superior, especially when it comes to GPU. The CPU’s with inter…
Short answer: ARM is a RISC architecture and x86 is a CISC architecture.
But what does that mean?
Consider the following C code:
    int square(int num) {
        return num * num;
    }
When compiled to x86 machine code it looks like this:
    89 f8       // mov eax,edi
    0f af c7    // imul eax,edi
    c3          // ret
So that function compiles into 3 instructions: the first is 2 bytes long, the second is 3 bytes, and the last is 1 byte.
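The byte counts can be checked with a short script. This sketch simply stores the machine-code bytes exactly as listed for `square` and measures each instruction; the variable names (`SQUARE_X86`, `lengths`, `total_bytes`) are mine, and nothing here disassembles or executes the code.

```python
# Machine-code bytes for each instruction of square(), copied from the listing above.
SQUARE_X86 = [
    ("mov eax,edi", bytes.fromhex("89f8")),
    ("imul eax,edi", bytes.fromhex("0fafc7")),
    ("ret", bytes.fromhex("c3")),
]

# Length in bytes of each encoded instruction.
lengths = [len(code) for _, code in SQUARE_X86]
total_bytes = sum(lengths)

for (asm, code), n in zip(SQUARE_X86, lengths):
    print(f"{code.hex(' '):<10} {asm:<14} {n} byte(s)")
print(f"total: {total_bytes} bytes in {len(SQUARE_X86)} instructions")
```

The three instructions have three different lengths, which is the variable-length property of x86 encoding being described.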
Some characteristics of this instruction set are:
- As you can see some instructions can be longer than others, in x86 an instruction may be anywhere between 1 and 15 by
Seymour Cray of Cray supercomputers fame once said “Any idiot can design a fast processor, the challenge is to design a fast system”. Or something very close to that. :)
So the answer is “Supercomputers are not about fast processors, they are about fast everything else”. Memory access times, NUMA, interconnect speed and topology, oh my! And that’s just the tip of an iceberg. :)
It depends on how you measure. Modern x86 has many modes also.
On x86, you have the backward compatibility cruft such as Real Mode, as well as Protected Mode, VM86 (which allows 16-bit code to run in a 32-bit Protected Mode environment), Long Mode, and Compatibility Mode (which allows you to run 32-bit Protected Mode code on a 64-bit machine). That’s 5 separate modes!
And then there’s virtualization, which allows you to encapsulate a virtual x86 environment behind VMENTER and VMEXIT. As I recall, x86 treats guests as a parallel privilege hierarchy to the host OS, with its own virtual-flavored ve
First, ARM processors are not necessarily slower than x86. You can get mobile ARM chips and you can get server ARM chips, and the speed is totally different.
Second, you obviously have no clue how supercomputers work. The speed of the individual chips is less important than the number of chips and the speed of the interconnects between them. Supercomput...
They are just completely different designs. ARM was designed in the 80s for the Acorn Archimedes desktop computer. The design was aimed at low power use from the very beginning. The original reason was cost: low power meant they could use plastic packaging. ARM is based on the so-called RISC, or reduced instruction set, philosophy. The term is somewhat misleading, as ARM has instructions that Intel CPUs do not have, like conditional execution of normal instructions. I think “simplified” or “streamlined” would describe it better. RISC processors also have plenty of registers, like about 32 in ARM. Intel originally has only ei…
It will be one day, but not yet.
Currently there are no ARM cores powerful enough, and the infrastructure for high-powered ARM PCs is not yet there. ARM-based servers will not really take off until we have a powerful ARM-based development desktop PC on which all the server software can easily be developed without needing either a remote connection to a different computer or cross-compilation.
But there is absolutely nothing preventing making those in the future.
Some answers to this question talk about RISC vs CISC without REALLY understanding the actual differences between them, …
Power consumption directly relates to the number of transistors, and transistors do not care whether they are implementing ARM or x86.
Consider the connection between power draw and transistor count. Because of silicon limitations related to clocks and power dissipation, the only way to achieve higher performance is a more complex microarchitecture. We say such cores are internally “wide”, meaning that as much work as possible is done in parallel. Since the Pentium Pro days, 20 years ago, x86 has used a superscalar architecture in which the instructions used to write programs are converted internally, by the core, into micro and macro op…
This is an unanswerable question as stated, because it depends on, well, quite a lot.
- What exact model of x86 and what exact model of ARM? The microarchitecture of different x86 models varies widely in “cycles per instruction”, which is a rough measure of microarchitectural efficiency. The same is true of ARM.
- There is an architectural “efficiency” as well, with different versions of x86 offering many levels of vector performance, for example. The same is true of ARM.
- What compilers are used? On x86 you are likely to get different performance from gcc or icc or clang, for example. What switches are used?
- Is Open
Supercomputers are not single-CPU devices. The performance of a single Intel core is greater than that of a single ARM core; however, the programs that run on supercomputers are things like weather simulation, where the program is massively parallel. The main limitation for a particular facility will be cooling, followed closely by power, so having simple processors that consume less power allows you to pack more cores into the data centre. I do not know if RISC-V will allow greater densities to beat ARM, but if it does we will see supercomputers using such processors.
I interpret your question as one that asks about a system where both x86 and ARM processors will be accessible to the programmer or user (*).
Apart from the need to run ARM code natively, there is no particularly good reason to do anything like this. The comparative power consumption advantage of the ARM architecture is largely a myth, as the ISA itself has nothing to do with power consumption. (See the paper “Power Struggles” by Blem et al. to convince yourself.) ARM cores are not made of magical fairy dust and unicorn poop. There are x86 cores that are competitive with ARM for performance-per-wa…
Intel has a trick up its sleeve called Lakefield, which is designed to work like ARM processors. But here’s the thing: x86 is one of the oldest instruction sets still in use, so its designers never thought about multiple cores, instruction-level parallelism, or out-of-order execution. A wider decoder is hard to build for x86 because of its variable-length instructions. These new techniques have become the standard over the past 5 years. Apple’s M1 performance is heavily based on this parallelism, which is much harder to achieve with x86.
Intel has also been stuck on the same process node for years n…
They will start making Intel® branded ARM processors a little over 20 years ago.
That's an Intel® Xscale processor from 2001. It runs the ARMv5 instruction set.
Intel® ships ARM cores in FPGAs and smart NICs today.
I should know. I was a lead ARM subsystem architect for Intel®’s Mount Evans and Mount Morgan IPUs. That's a 16 core ARM server in the upper right:
If you're wondering whether they're going to build something that drops into a motherboard like a typical desktop or server processor, I can't see why they’d bother.
Who would buy it? How would it benefit Intel®?
Intel® has a huge software eco…
ARM (v8, v7, v6, v5…) and x86 are Instruction Set Architectures, while Cell is a microarchitecture which combines a PowerPC core (PPE) and a set of co-processors (SPEs). In effect, you’re comparing a single processor to an entire ISA, which is not an apples-to-apples comparison.
image: Understanding the Cell Microprocessor
PowerPC has achieved some level of success and was notably present in most of Apple’s computers from 1994 to 2006. It also served as the ISA for the Xbox 360, PS3, GameCube, Wii and Wii U, on top of a variety of other applications. That being said, it did have a more limited s…
Do you mean power as in electrical power consumption, or performance?
If we talk about electrical power, or better, TDP, then there is not much difference for the same platform. The difference is related primarily to the process node used, e.g. 3nm, and to OS optimizations.
Today we have more CPUs to compare:
- Apple M3 Max ARM produced in 3nm having 12+4 cores and TDP of 78W
- Snapdragon X Elite ARM having 12 cores, produced in 4nm and TDP up to 80W
- AMD Ryzen 7945HX having 16 cores, produced in 5nm and TDP 75W
- AMD Ryzen 8945HS having 8 cores, produced in 4nm and TDP 54W
- Intel i9 185H with 6+8+2 cores, 7nm (multiple n
That’s not the way it works. Comparing specific designs, for example Intel and AMD CPU cores (both x86/x64), by instructions per clock (IPC) has some meaning. However, the three designs mentioned are all close enough in performance. Any significant differences are usually due to differences in the detailed design of specific processors.* Software matters a lot, as do operating systems, so once you start with a given processor family, you usually stay with it.
- There is one difference right now that can cause a significant difference in floating-point throughput. Unlikely, to affect you, but if y
There's no general definition of “computing power”: that's why benchmarks are a thing.
- Sometimes the Linpack benchmark suite is run, and that gives you floating point throughput for that benchmark. You can then compare processors for those sorts of workloads.
- You might use a more general benchmark, like SPEC. That tests a variety of loads, and tells you how a specific processor performs in those workloads.
- You might run something like 3DMark Fire Strike as a benchmark, which will give you a rough idea of performance in some workloads (notably gaming).
You can really only compare processors for a gi…
As said many times, ARM's biggest benefit is power consumption, but it is not that different from the latest x86 CPUs (50 - 100%). Apple's big benefit here is the TSMC 5nm process, which is, according to TSMC, up to 30% lower power. AMD CPUs are at 7nm, while Intel is somewhere around there (do not forget that Intel 10nm is like TSMC 7nm!).
Also, the ARM ISA (instruction set) is way less complicated than x86, but the complexity stops at the instruction decoders. Once instructions are converted into macro-ops (mOp) or micro-ops (uOp), what follows is, let's say, identical.
M1 is here “from another world”, its reorder buffer (OoO) is 6
Arm has twice as many general purpose registers as x86_64. Of course both will have plenty of shadow registers and such. But in terms of number of registers which a programmer can access Arm has a lot more.
Arm is much closer to a classic RISC architecture than x86, which more closely matches a CISC architecture. What does that mean in practice? Arm instructions all have a fixed length of 32 bits. x86…
A SoC is simply a highly integrated chip which contains a lot of secondary IP blocks and fixed function units on the same die alongside the CPU/GPU complex.
There are also x86 SoCs out there with the same level of integration. They are not as widespread as ARM SoCs given that ARM’s main market is the low power embedded and mobile segment while x86 targets higher performance devices where you don't necessarily need to have everything on a single chip.
Below is a block diagram of a modern x86 SoC from Intel’s Gemini Lake series
