ICube UPU, the next step in processor evolution?

qwerrty

Reported by Nebojsa Novakovic on Friday, January 13 2012 4:34 pm

In the last part of our Chinese CPU story, we covered ICube and their processors with a brand new instruction set, something not seen in about two decades. Here's a bit more about the new architecture after our visit to their Shenzhen HQ.

Any processor guru will tell you: it's very difficult to prop up a new instruction set architecture. Even after going through all the technical difficulties, the whole software stack has to be created from scratch: BIOS, compilers, libraries, operating system ports, drivers, basic applications, not to mention convincing the critical third-party software vendors to actually support the new architecture. All this requires substantial engineering, marketing and financial resources, and a lot of guts.

Ever since Alpha in 1991, no major new instruction set architecture has appeared in the general market. In fact, since then, most of the non-X86 architectures have disappeared from the scene, leaving X86, even though widely agreed to be technically the worst, as the predominant one. Power and Sparc still hold a part of the server field, while ARM is, of course, the king of the hill right now in the mobile arena, with its old RISC competitor, MIPS, making some inroads as well.

Now, for the first time in two decades, there's a company openly promoting its own new instruction set, and launching a processor based on it right into the hot waters of the mobile device market. Furthermore, UPU is a brand new philosophy too - for the first time, CPU and GPU are truly fused into one processor core, MVP (Multi-thread Virtual Pipeline), where even the register file is shared! And, the extra surprise element is that the whole thing is fully designed and made in China, a true 'China Core' right from the instruction set definition, without any licensing or other dependencies on the US technology.

ICube was set up by Fred Chow and Simon Moy, two industry veterans. Simon was behind the world's first 64-bit MIPS processors at SGI, and after that was a principal engineer at Nvidia for 7 years until 2004, in charge of all the initial GPU, shader and GPU computing efforts. Fred was chief scientist at SGI, in its golden days of funky coloured superworkstations, and a principal engineer at MIPS, later developing the Pathscale compiler suite that gave the AMD Opteron its first 64-bit X86 support. He is the chief architect of the open-source Open64 compiler suite.

So, interestingly, here you have two CPU designers actually driving the initial CPU design and instruction set. In the past, that role often fell to memory companies (Intel was a DRAM maker when designing its first CPUs) or system companies like IBM, DEC or HP.

The UPU (Unified Processor Unit) approach in their 'Harmony' architecture is the first case where CPU and GPU threads share the same execution units, register file and many instructions. In a sense, it is a 'total fusion' of the two, unlike the AMD Fusion APU approach where CPU and GPU are still distinct, with separate instruction sets, registers, execution units and such. You could consider the UPU an example of 'homogeneous computing', just like any standard processor we are used to, while the APU belongs to 'heterogeneous computing', where threads of different nature run separately on the CPU and GPU portions.

A very simple, elegant 32-bit RISC core, not unlike the original MIPS, does both functions, and the single 32-entry, 32-bit register file is there for all operations. To support further parallelism, 4-way multithreading per core is supported, with optimised logic to remove the need for 4 separate register files. The compromises in the initial version? No SIMD vector stuff like Intel AVX, and no double-precision FP either. If you want more performance, you use more cores, which can be piled up together easily due to the comparatively very small core footprint: only 2.7 square mm in the old 65 nm process. In a 32 nm process, it would likely be only about 1 square mm. This means that, on an average current 200 square mm CPU chip in a 32 nm process, you could fit over 100 of these cores, plus interconnect logic and a huge multi-megabyte shared cache, all together.
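
The core-count estimate above can be checked with a rough back-of-the-envelope sketch. It assumes idealised scaling, where area shrinks with the square of the feature size (real processes rarely scale this cleanly), and a purely hypothetical 50% of the die given to cores, with the rest left for cache and interconnect. The function names are illustrative only and do not come from ICube.

```python
def scaled_area_mm2(area_mm2, from_nm, to_nm):
    """Idealised shrink: area scales with the square of the feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2


def cores_that_fit(die_mm2, core_mm2, core_fraction):
    """Whole cores that fit in the share of the die reserved for cores.

    core_fraction is an assumption: the rest of the die is left for
    shared cache, interconnect logic and I/O.
    """
    return int((die_mm2 * core_fraction) // core_mm2)


# 2.7 mm^2 at 65 nm, shrunk to 32 nm under ideal scaling.
ideal_core = scaled_area_mm2(2.7, 65, 32)

# The article's more conservative figure of about 1 mm^2 per core.
article_core = 1.0

print(f"ideal 32 nm core area: {ideal_core:.2f} mm^2")
print("cores on 200 mm^2 (1 mm^2 each, half the die):",
      cores_that_fit(200, article_core, 0.5))
print("cores on 200 mm^2 (ideal scaling, half the die):",
      cores_that_fit(200, ideal_core, 0.5))
```

Under these assumptions the ideal shrink gives roughly 0.65 square mm per core, so the article's 1 square mm figure leaves some margin for imperfect scaling, and a core count on the order of 100 still leaves half of the die for cache and interconnect.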

The result is a very small, compact dual-core chip even in the initial IC1 iteration, which they plan to scale to a quad-core IC2 next year, using a 40 nm or better process. Simon Moy told us that, since China needs to catch up fast, we are likely to see its CPU vendors jumping two process generations in one go to reach higher performance. That also applies to the high-end MIPS-based Loongson processors there.

The first chip's specs are not bad, considering it's the very first iteration of a brand new instruction set:

[Image: first chip specifications]

While this first iteration is aimed squarely at inexpensive smartphones and tablets, without the need for Full HD display or encoding, the potential of the architecture is there. A small, very low power core with a clean, elegant instruction set, and the ability to put thousands of cores in a single rack, can cover both the client and server sides of a cloud. Many-user web page serving doesn't require 64-bitness or fast cores with large memory, just many cores, so that each of many threads has its own resources without context switching overhead.

What they could obviously add in some future iteration is 64-bitness with SIMD support, as it would ultimately help both integer and GPU performance and address the high-end market. Also, a single channel of DDR2-533 is enough for a low-end phone, but multichannel DDR3 support will be important for expansion to higher-end devices like servers, where the new instruction set is less of a problem, since the whole open source stack can usually be recompiled quickly.
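
To put the memory point in numbers, here is a minimal sketch of peak theoretical bandwidth. The article does not state the bus width, so the 32-bit and 64-bit widths below are assumptions, as is the dual-channel DDR3-1333 configuration used for comparison. Real sustained bandwidth will be lower than these peak figures.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Single channel of DDR2-533; the bus width is an assumption.
ddr2_32bit = peak_bandwidth_gb_s(533, 32)
ddr2_64bit = peak_bandwidth_gb_s(533, 64)

# Hypothetical higher-end setup: dual-channel DDR3-1333, 64-bit channels.
ddr3_dual = peak_bandwidth_gb_s(1333, 64, channels=2)

print(f"DDR2-533, 1 x 32-bit:  {ddr2_32bit:.2f} GB/s")
print(f"DDR2-533, 1 x 64-bit:  {ddr2_64bit:.2f} GB/s")
print(f"DDR3-1333, 2 x 64-bit: {ddr3_dual:.2f} GB/s")
```

Even with the wider 64-bit assumption, a single DDR2-533 channel peaks at a few GB/s, which many cores would have to share. That is why multichannel DDR3 matters for the many-core server scenario.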

Once the Android port is fully tuned over the next few months, we'll have a look at the device's performance under a real OS, and its real potential. ICube has a tough job on its hands convincing partners that a brand new instruction set should be supported, but what they created here can ultimately apply across many market segments, from smartphone to superserver. Let's see how they do in their initially chosen market segments first.


Read more: vr-zone.com/articles/icube-upu-the-next-step-in-processor-evolution-/14518.html#ixzz1jdTThtlX
................................
 
But today's consumer processors are all about HD display (decoding) and encoding...
 
Unless China manages to shrink its fab process from 65nm to the current 32/28nm standard, it's not going to be competitive in terms of either core performance or energy efficiency. Right now China can barely manage a 65nm process. Another issue is that integrated graphics are aimed at the lower-end market, and cannot replace dedicated GPUs anytime soon.

If your post is implying that China will be able to carve out its share of the pie in the international CPU market, then you are at least 10 years premature.
 
they are only aiming to steal market share from ARM in the mobile market, not to compete with x86 or in the GPU business.

[Image: ICube roadmap]
 
Windows 8 is going to support ARM, so the future battle of processors for the tablet, mobile and low-end PC market is going to be intense. On top of that, the upper-end PC market is where the big bucks can be made, with media professionals and gamers willing to shell out hundreds for a good CPU/GPU. More than profits, it is this market segment that is driving developers to push the boundaries of integrated circuits. If China wants to get in on the bleeding edge of technology, it will have to clash with AMD/Intel/Nvidia at some point.

Oh and the problem of fab still exists. China at this point has only managed to get familiar with 65nm process, with its first fab in mid 2011. If you can't make the chip at home, you don't have any advantage over competitors.
 
Furthermore, UPU is a brand new philosophy too - for the first time, CPU and GPU are truly fused into one processor core, MVP (Multi-thread Virtual Pipeline), where even the register file is shared! And, the extra surprise element is that the whole thing is fully designed and made in China, a true 'China Core' right from the instruction set definition, without any licensing or other dependencies on the US technology.

The UPU design seems to be a tremendous innovation. I have never heard of a fused CPU and GPU sharing a register file. I'm impressed. I used to know how to program in assembly language and work with registers.

Oh and the problem of fab still exists. China at this point has only managed to get familiar with 65nm process, with its first fab in mid 2011. If you can't make the chip at home, you don't have any advantage over competitors.

China's SMIC has 40nm technology and plenty of fabs.

Semiconductor Manufacturing International Corporation - Wikipedia, the free encyclopedia

"Semiconductor Manufacturing International Corporation, (abbrev. SMIC, NYSE: SMI, SEHK: 981) is a semiconductor foundry in mainland China, providing integrated circuit (IC) manufacturing services on 350nm to 40nm process technologies. Headquartered in Shanghai, SMIC has wafer fabrication sites throughout China, offices in the U.S., Italy, Japan, and Taiwan, and a representative office in Hong Kong. Notable customers include Qualcomm, Broadcom, and Texas Instruments.[1][2][3]

Wafer Fabs

• Shanghai mega-fab: 300mm wafer fabrication facility (fab) and three 200mm wafer fabs
• Beijing mega-fab: Two 300mm wafer fabs
• Tianjin: 200mm wafer fab
• Shenzhen: 200mm wafer fab (under construction)
• Wuhan: 300mm wafer fab (owned by Wuhan Xinxin Semiconductor Manufacturing Corporation, managed and operated by SMIC)[4]"

----------

China Expects to Compete with Intel, in 20 Years

"China Expects to Compete with Intel, in 20 Years
7:00 PM - March 14, 2011 by Douglas Perry - source: HPCWire

China, which recently ascended to the top of the world's supercomputer powers, said that, by the end of this year, it will not be using foreign processors for its future supercomputers anymore.

Instead, the country intends to rely on its own Loongson series of processing cores.

The upcoming Dawning 6000 supercomputer is expected to use 10,000 of these chips to achieve an estimated 1 Petaflop of processing performance. While it is a national priority to use its own processors, scientists admitted that it is difficult at this time to exploit the full potential of the chips as there aren't programmers who can squeeze every bit of horsepower out of them.

China's Tianhe-1A supercomputer, currently the world's fastest supercomputer, uses a total of 186,368 cores and achieves a peak performance of 4.7 PFlops. The system uses Intel Xeon X5670 CPUs as well as Nvidia Tesla GPUs.

China expects that it will take at least 10 years until the chips will be able to supply the domestic market. In 20 years, China wants to compete with companies such as Intel here in the U.S."
 
China's SMIC has 40nm technology and plenty of fabs.

A pure-play fab is not the same as manufacturing the required optical lithography equipment. China can even import the machinery needed to produce Loongson 3B, based on a 28nm process. That's not the subject here. SMIC depends on foreign equipment for its manufacturing. Currently, Canon, Nikon and ASML dominate that market. To put this into an analogy: you can build a car, but you can't build the tools needed to build that car. Right now, China is stuck at the 65nm microfabrication process. Intel has already moved on to a 22nm process, leaving China two generations (roughly 8 years) behind.
 
You pulled a fast one. You switched the topic to semiconductor manufacturing equipment.

Let's return to your original points. You said China only had 65nm technology. Not true. SMIC has 40nm technology.

You said China's first fab was built in mid-2011. Also, not true. SMIC has been in business for many years.

You're not going to admit you were wrong? There's no shame in being wrong. I make occasional mistakes too. However, I admit my errors. Will you?

----------

Anyway, a CPU and GPU are two completely separate units. The GPU used to be an add-on card (e.g. NVIDIA specializes in GPUs). The UPU design seems to be very unique and interesting. I'll read up on it in the next few days.

I used to bake wafers (after photoresist, masking, and etching, etc.) in quartz boats to make chips, which explains my continued interest in the semiconductor industry.
 
both intel & AMD are currently working on this type of UPU. In a few years, they will have their first chip, all the ignorant fools will say them stupid chinamen COPY their UPU architecture. :lol:


Today's APUs


Available since January of this year, AMD's line-up of APUs is the first to integrate x86 CPU cores and DirectX 11-capable Radeon GPU cores on a single die and has been widely adopted by computing OEMs worldwide. Being on the same chip reduces the system power and bill-of-materials, speeds the flow of data between the CPU and GPU through shared memory, and allows the GPU to function as both a graphics engine and an application accelerator in highly efficient compute platforms.

APUs of Tomorrow

Building on the success of the integration of CPU and GPU processing cores on the same chip, AMD is now focused on evolving the architecture to make it appear as a unified processing element to the software programmer. That includes a number of evolutionary steps expected to continue through 2014 such as:

• Support for C++ features that more fully leverage the GPU as a parallel processor
• User-mode scheduling for lower latency task dispatch between CPUs and GPUs
• Unified memory address space and fully coherent memory shared by the CPU and GPU so they operate seamlessly together

AMD also announced plans to publish a detailed specification on the features and functionality required to meet the requirements of the architecture.

*ttp://www.amd.com/us/press-releases/Pages/amd-details-future-technical-2011june14.aspx
 
The UPU design seems to be a tremendous innovation. I have never heard of a fused CPU and GPU sharing a register file. I'm impressed. I used to know how to program in assembly language and work with registers.

i program in assembly, often the job is to use the cheapest processor and squeeze the fastest speed out of it. register addressing is often the fastest addressing mode. having the gpu and cpu use the same register file would definitely improve performance, as we wouldn't have to move data out of a register and then pass it over to the other core. wonder how many registers this thing has.
 
both intel & AMD are currently working on this type of UPU. In a few years, they will have their first chip, all the ignorant fools will say them stupid chinamen COPY their UPU architecture. :lol:

I think Intel's Sandy Bridge and AMD's Fusion are already offering this. The new Core i7 has a built-in GPU core running at 850MHz.
 
AMD's and Intel's offerings just integrate the CPU and GPU on the same die; they are still separate, heterogeneous cores with their own independent registers and caches.
 
China's SMIC has 40nm technology and plenty of fabs.

You know that the 200mm wafer fabs were built by the Taiwanese, right?
 