China Dominates the World TOP500 Supercomputers


Team Swangeese!
Kids from USTC won SCC16.

http://www.studentclustercompetition.us/

Having the fastest HPC and winning in HPC Application Competition are not the same thing, bro!

These kids sure have brains the size of a pumpkin.
 


Six outstanding research efforts in high performance technical computing have been selected as finalists in supercomputing’s most prestigious competition, the ACM Gordon Bell Prize in High Performance Computing. This year’s finalists represent the broad impact that the field of high performance computing has across the many disciplines of science and engineering:

  1. “A Highly Effective Global Surface Wave Numerical Simulation with Ultra-High Resolution,” by a research team from the First Institute of Oceanography (China), National Research Center of Parallel Computer Engineering and Technology (China) and Tsinghua University (China)
  2. “10M-Core Scalable Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics,” by a research team from the Chinese Academy of Sciences, Tsinghua University (China), the National Research Center of Parallel Computer Engineering and Technology (China) and Beijing Normal University (China)
  3. “Extreme-Scale Phase Field Simulations of Coarsening Dynamics on the Sunway Taihulight Supercomputer,” by a team of researchers from the Chinese Academy of Sciences, the University of South Carolina, Columbia University (New York), the National Research Center of Parallel Computer Engineering and Technology (China) and the National Supercomputing Center in Wuxi (China)
  4. “Simulations of Below-Ground Dynamics of Fungi: 1.184 Pflops Attained by Automated Generation and Autotuning of Temporal Blocking Codes,” by a research team from RIKEN (Japan), Chiba University (Japan), Kobe University (Japan) and Fujitsu Ltd. (Japan)
  5. “Towards Green Aviation with Python at Petascale,” by a research team from Imperial College London (England)
  6. “Modeling Dilute Solutions Using First-Principles Molecular Dynamics: Computing More than a Million Atoms with Over a Million Cores,” by a research team from Lawrence Livermore National Laboratory (Calif.)
Three of the six finalists are from China (the other three are from Japan, the US and the UK), so exactly which Chinese team won the prize?


The second one, 10 M core scalable fully implicit solver...
 
Does it come as a surprise? The two fastest supercomputers are from China, so it should be no surprise that we won such a competition easily. Beyond that, China is in fact already a leader in computer technology.


I started this thread and I congratulate China on all its achievements, but China is NOT the leader in computer technology.

China is emerging, but right now the US is leaps and bounds ahead in computer technology and overall HPC expertise.

As I have said, I follow this scene, and I know that even the Chinese know this and are trying to remedy it.

PS- Having the top computer doesn't mean anything if you don't know how to use it.
 

You often get semiconductor technology, computer architecture and software confused, so I'm not sure what you actually mean by any of this.
 


You are right. Slowly but surely, Chinese are making progress in each and every field. We don't talk big, we work hard, which separates a great power from wannabes.
 
Team Swan Geese from USTC wins Student Cluster Competition awards in SC16.

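USTC takes both titles at the SC16 Student Cluster Competition

Caption: The USTC team on stage to receive its awards

On November 16, at the Student Cluster Competition of the 2016 Supercomputing Conference (SC16) in Salt Lake City, USA, the team from the University of Science and Technology of China (USTC) broke the world record for the LINPACK performance benchmark (HPL), setting a new high of 31.15 TFLOPS. On the strength of its performance the team went on to take first place in both the overall score and the LINPACK test. This is the first time in the event's history that one team has won both titles.

The Supercomputing Conference is the premier event of the global supercomputing industry and has a 28-year history. Since the international Student Cluster Competition (SCC) was first held at the conference in 2007, it has drawn enthusiastic participation from leading universities around the world and has become known as the student "Olympics" of high-performance computing.

The USTC team is named "Swan Geese" (鸿雁), and An Hong, a professor in the university's School of Computer Science, serves as head coach. Explaining the name, the team said that in traditional Chinese culture the swan goose stands for teamwork, perseverance and courage, which is exactly what they hoped to show at the competition.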
 
Well, the thing is, if the prize organisers had refused to grant it to the Chinese once more,
the whole prize would have become a total joke.



WASHINGTON, Nov. 17 (Xinhua) -- A 12-member team of Chinese researchers on Thursday won the 2016 ACM Gordon Bell prize, the top award in the field of supercomputing.

This is the first time that Chinese researchers have been awarded the honor.

"It's a historic breakthrough," said Haohuan Fu, deputy director of the National Supercomputing Center in Wuxi and one of the team members.

Fu and his colleagues were honored for developing a method for calculating atmospheric dynamics that could be used to improve global climate models as well as weather predictions.

The award was presented at the 2016 Supercomputing Conference in Salt Lake City, Utah.

The Gordon Bell Prize, awarded each year at the annual supercomputing conference, was established in 1987 by Gordon Bell, a pioneer in high-performance and parallel computing.

It tracks the progress of parallel computing and rewards innovation in applying high performance computing to challenges in science, engineering, and large-scale data analytics.

http://news.xinhuanet.com/english/2016-11/18/c_135838749.htm
 
Typical simulation work that has been completed by Sunway Taihu Light

  • Peta-Scale Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics with 8.5M Cores
This research won the 2016 Gordon Bell Prize in High Performance Computing.

An ultra-scalable fully implicit solver is developed for the stiff time-dependent problems frequently found in atmospheric dynamics. In the solver, a hybrid multigrid domain decomposition preconditioner is proposed to greatly accelerate convergence and to exploit coarse-grained parallelism. A physics-based multi-block asynchronous incomplete LU factorization method is customized to solve the subproblems on each overlapped subdomain and to gain further fine-grained concurrency. We perform systematic optimizations at different hardware levels to make the best use of the heterogeneous computing units and to substantially reduce the cost of data movement. The solver enables fast and accurate atmospheric simulations on the emerging heterogeneous Sunway TaihuLight supercomputer in China, scaling to over 8.5 million heterogeneous cores.
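
As a rough illustration of two ideas in the abstract above (a fully implicit time step, and overlapping subdomains that are each solved locally), here is a small single-process sketch. It is not the prize-winning solver: it takes one implicit Euler step of 1D diffusion, uses plain alternating (multiplicative) Schwarz sweeps in place of the multigrid-preconditioned solver, and solves each subdomain exactly with the Thomas algorithm rather than with an incomplete LU. The grid size, the stiffness ratio `r` and the subdomain bounds are made-up values.

```python
def thomas(sub, diag, sup, rhs):
    """Direct solve of a tridiagonal system (sub[0] and sup[-1] are ignored)."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_step_direct(u_old, r):
    """Solve (I - r * Laplacian) u_new = u_old on the whole grid at once."""
    n = len(u_old)
    return thomas([-r] * n, [1 + 2 * r] * n, [-r] * n, list(u_old))


def implicit_step_schwarz(u_old, r, subdomains, sweeps):
    """Same linear system, solved by sweeping over overlapping subdomains."""
    n = len(u_old)
    u = [0.0] * n
    for _ in range(sweeps):
        for lo, hi in subdomains:          # subdomain owns indices lo..hi-1
            rhs = list(u_old[lo:hi])
            if lo > 0:                     # neighbour value acts as boundary data
                rhs[0] += r * u[lo - 1]
            if hi < n:
                rhs[-1] += r * u[hi]
            m = hi - lo
            u[lo:hi] = thomas([-r] * m, [1 + 2 * r] * m, [-r] * m, rhs)
    return u


n, r = 40, 50.0                            # r = nu*dt/h^2; a large r means a stiff step
u_old = [1.0 if 15 <= i < 25 else 0.0 for i in range(n)]
subdomains = [(0, 24), (16, 40)]           # two subdomains overlapping on 16..23

u_direct = implicit_step_direct(u_old, r)
u_schwarz = implicit_step_schwarz(u_old, r, subdomains, sweeps=40)
max_diff = max(abs(a - b) for a, b in zip(u_direct, u_schwarz))
print(f"max difference after 40 sweeps: {max_diff:.2e}")
```

On a real machine each subdomain would live on a different group of cores, and the sweeps would be wrapped in a Krylov method; the sketch only shows why overlap lets local solves converge to the global answer.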

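  • Large Scale Phase Field Simulation for Coarsening Dynamics Based on Cahn-Hilliard Equation with Degenerated Mobility / Phase-field simulation of titanium alloy microstructure evolution

We present a large-scale phase field simulation on the new Sunway TaihuLight supercomputer. The highly nonlinear and severely stiff Cahn-Hilliard equations with degenerate mobility for microstructure evolution are solved at extreme scale, demonstrating that the latest high performance computing platforms and new advances in algorithm design now make it possible to simulate coarsening dynamics accurately at unprecedented spatial and time scales.

Titanium alloys are complex to produce, and the mechanisms and rules by which their microstructure forms are hard to obtain from experiments, so software simulation is often used instead. The phase-field method can simulate the evolution of microstructure and is widely used in the design of new materials. ScETD-PF is phase-field simulation software built on a scalable compact exponential time differencing algorithm library. It was developed in-house by the Computer Network Information Center of the Chinese Academy of Sciences and supports research simulations in computational materials science, computational physics, computational life science and other disciplines. This application achieved the largest phase-field simulation of titanium alloy microstructure coarsening in the world to date, significantly accelerating the design and process optimization of China's new titanium alloys.

This research was selected as a finalist for the 2016 Gordon Bell Prize.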


  • A Highly Effective Global Surface Wave Numerical Simulation with Ultra-high Resolution / High-resolution numerical simulation of ocean waves
This research was selected as a finalist for the 2016 Gordon Bell Prize.

Surface waves are among the most energetic motions in the global ocean, and they are crucially important to marine safety and climate change. A high-resolution global wave model plays a key role in accurate wave forecasting. So far, however, parallel efficiency at large computational scale has been a major barrier for this kind of model. This work achieves a breakthrough in the design and application of irregular quasi-rectangular grid decomposition, a master-slave cooperative computing workflow and pipelining schemes for a high-resolution global wave model. Based on these innovations, a global wave model with an ultra-high horizontal resolution of (1/60)° by (1/60)° is implemented on the new Sunway heterogeneous supercomputer with 100 PFlops peak performance. The results show that the model can reach a peak performance of 30.07 PFlops on the full-scale system of 8,519,680 cores. These innovations provide good scalability and high efficiency for ultra-high-resolution global wave models.

For ocean model simulation, higher resolution brings a large increase in computation. If the horizontal resolution is raised by a factor of 10, the computational cost of the model grows by a factor of several hundred to more than a thousand, which makes this a driving application for future exascale systems. The optimizations used to reach 8,519,680 cores and 30.07 PFlops include acceleration on the slave cores (CPEs), load balancing, communication overlap and instruction pipelining.
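
The rule of thumb that a 10x finer horizontal resolution costs several hundred to more than a thousand times as much can be reproduced with a back-of-the-envelope estimate. The sketch assumes the cost is proportional to the number of horizontal grid points times the number of time steps, and that the time step shrinks in proportion to the grid spacing (a CFL-type limit). Both are simplifying assumptions rather than statements from the paper, and the spectral dimensions of a wave model and land masking are ignored.

```python
def cost_factor(refinement, horizontal_dims=2, timestep_scales_with_grid=True):
    """Relative cost after refining the horizontal grid by `refinement`."""
    factor = refinement ** horizontal_dims      # more grid points
    if timestep_scales_with_grid:
        factor *= refinement                    # proportionally more time steps
    return factor


def global_grid_points(cells_per_degree):
    """Points on a regular longitude-latitude grid covering 360 x 180 degrees."""
    return (360 * cells_per_degree) * (180 * cells_per_degree)


print(cost_factor(10, timestep_scales_with_grid=False))  # grid points only
print(cost_factor(10))                                    # grid points and time steps
print(global_grid_points(60))                             # the (1/60)-degree grid
```

With these assumptions the two estimates bracket the range quoted in the post: a factor of 100 from the grid alone and 1,000 once the shorter time step is counted.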

  • Numerical Simulation of the Aerospace-craft Unification Algorithm
Tiangong-1 is China’s first space station, which serves as an experimental testbed for orbital rendezvous and docking. Tiangong-1 is also the prototype of China’s future space lab. In this work, we simulate the flow around a simplified two-cabin model (just over 10 meters long, with a cross-section diameter of nearly 3.5 meters) of the Tiangong-1 spacecraft during its fall from orbit (flight altitudes H = 65 km and 62 km, Ma = 13). Using 16,384 processors of the Sunway TaihuLight system, the computation, which normally takes 12 months, was finished in 20 days. The simulation results agree well with wind tunnel test results and provide important data support for the Tiangong-1 flight test.

This is a particularly interesting piece of work. China conducted the Tiangong-1 mission in 2011, and such simulations are usually carried out two to three years before the real mission. So does it mean that Sunway TaihuLight, or a prototype of it, already existed in 2008/09?

@cirr @TaiShang @AndrewJin @Shotgunner51 @ahojunk @JSCh

  • Refactoring and Optimizing the CAM on the New Sunway Many-core Supercomputer / A domestic Earth system model on a domestic platform

We refactor and optimize the Community Atmosphere Model (CAM) on the new Sunway supercomputer, which uses a many-core processor that consists of management processing elements (MPEs) and clusters of computing processing elements (CPEs). To map the large code base of CAM to millions of cores on the Sunway system, we take OpenACC-based refactoring, built on the OpenACC framework provided by the Sunway system, as the major tool, and apply source-to-source translator tools to generate the most suitable parallelism for the CPE cluster and to fit the intermediate variables into the limited on-chip fast buffer. For single kernels, comparing the originally ported version that uses only MPEs with the refactored version that uses both the MPE and CPE clusters, we achieve up to 22x speedup for the compute-intensive kernels. For the 25 km resolution CAM global model, we scale to 24,000 MPEs and 1,536,000 CPEs and achieve a simulation speed of 2.81 model years per day.
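
The core counts quoted in this post can be cross-checked with a little arithmetic. The sketch assumes the publicly described SW26010 layout, in which each core group pairs one MPE with 64 CPEs; that layout is not stated in the thread. `FULL_SYSTEM_CORES` is the 8,519,680-core figure from the wave-model item.

```python
CPES_PER_MPE = 64                 # assumption: one core group = 1 MPE + 64 CPEs
CORES_PER_GROUP = 1 + CPES_PER_MPE

cam_mpes = 24_000                 # from the CAM abstract
cam_cpes = 1_536_000              # from the CAM abstract
FULL_SYSTEM_CORES = 8_519_680     # from the wave-model abstract
model_years_per_day = 2.81        # from the CAM abstract

cam_cores = cam_mpes + cam_cpes
cam_fraction = cam_cores / FULL_SYSTEM_CORES
hours_per_model_year = 24 / model_years_per_day

print(cam_cpes // cam_mpes)                      # CPEs per MPE in the CAM run
print(FULL_SYSTEM_CORES // CORES_PER_GROUP)      # core groups in the 8.5M-core runs
print(f"CAM run used {cam_fraction:.1%} of those cores")
print(f"{hours_per_model_year:.2f} wall-clock hours per model year")
```

The CPE-to-MPE ratio in the CAM run comes out at exactly 64, and the 8,519,680-core figure divides evenly into 65-core groups, which is consistent with the assumed layout.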

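  • Porting and optimization of the floating platform for island and reef construction
The floating platform for island and reef construction is of the 100-meter class in overall length and can berth 10,000-ton-class ships. Its functions include unloading earth, stone and other building materials from ships, loading and moving heavy trucks on the platform, conveying material to the reef flat over a trestle bridge, stacking and hoisting machinery and cargo, supplying power and fuel, housing and feeding construction workers, producing fresh water and treating sewage. It can be towed to different islands and reefs awaiting construction and reused there. It meets the engineering needs of near-term construction in the Xisha Islands and future construction on the relevant islands and reefs of the Zhongsha and Nansha Islands: delivering supplies onto the islands, unloading ship-borne earth and stone efficiently, and building permanent bases on reef flats efficiently. With support from the Ministry of Science and Technology 973 project "Response of very large floating structures in complex ocean environments and structural safety" and the Ministry of Industry and Information Technology high-tech ship research project "Key technologies for medium-sized (300-meter-class overall length) floating structures at islands and reefs", the 702nd Research Institute of China Shipbuilding Industry Corporation further developed a three-dimensional hydroelastic analysis method that accounts for the influence of complex seabed topography.

The users worked with THAFTS, a mature software package with visualization that was developed from the three-dimensional hydroelasticity theory founded by Academician Wu Yousheng and that can account for forward speed, second-order nonlinearity in the frequency domain and the influence of seabed topography. With it they used millions of processor cores, for the first time, to carry out large-scale parallel computation of the three-dimensional hydroelastic problems of floating platforms near islands and reefs and of very large floating structures. The results accurately reveal the motion and load response of the floating platform under the influence of the changing seabed near islands and reefs and of wave inhomogeneity. The data are reliable and are cross-validated against test results, and they give a fairly accurate assessment of the structural stress level of the floating platform in the complex environment near islands and reefs, which is of significant theoretical value and practical engineering importance.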
 

China has the best supercomputers, the hardware. How about supercomputing?

Six outstanding research efforts in high performance technical computing have been selected as finalists in supercomputing’s most prestigious competition, the ACM Gordon Bell Prize in High Performance Computing. The prize will be presented to a single winner this week during SC16 in Salt Lake City.

Three out of six finalists are from China, one from each of Japan, US, UK. Congrats all!

So there goes the argument that TaihuLight was just sitting there empty and gathering rust.

Looks like TaihuLight was put to better use than most other supercomputers, which could not make it into the top six applications.
 
Supercomputer Tianhe-1 carries out 1,400 tasks per day

CGTN, February 6, 2017

China's supercomputer, "Tianhe-One" carries out 1,400 computing tasks per day. It is mainly used to serve universities, research institutions, small and medium-sized enterprises, and provide scientific computing services.

 
NCSA Facilitates Performance Comparisons with China’s #1 Supercomputer
William M. Tang

Paper describes ‘porting’ and running of a discovery-science-capable code from the plasma physics domain onto Sunway TaihuLight’s ‘home-grown’ architecture

China has topped supercomputer rankings on the international TOP500 list of fastest supercomputers for the past eight years. They have maintained this status with their newest supercomputer, Sunway TaihuLight, constructed entirely from Chinese processors.

While China’s hardware has “come into its own,” as Foreign Affairs wrote in August, no one can say objectively at present how fast this hardware can solve scientific problems compared to other leading systems around the world. This is because the computer is new—having made its debut in June, 2016.

Researchers were able to use seed funding provided through the Global Initiative to Enhance @scale and Distributed Computing and Analysis Technologies (GECAT) project administered by the National Center for Supercomputing Application’s (NCSA) Blue Waters Project to port and run codes on leading computers around the world. GECAT is funded by the National Science Foundation’s Science Across Virtual Institutes (SAVI) program, which focuses on fostering and strengthening interaction among scientists, engineers and educators around the globe. Shanghai Jiao Tong University and its NVIDIA Center of Excellence matched the NSF support for this seed project, and helped enable the collaboration to have unprecedented full access to Sunway TaihuLight and its system experts.

It takes time to transfer, or “port,” scientific codes built to run on other supercomputer architectures, but an international, collaborative project has already started porting one major code used in plasma particle-in-cell simulations, GTC-P. The accomplishments made and the road towards completion were laid out in a recent paper that won “best application paper” from the HPC China 2016 Conference in October.

“While LINPACK is a well-established measure of supercomputing performance based on a linear algebra calculation, real world scientific application problems are really the only way to show how well a computer produces scientific discoveries,” said Bill Tang, lead co-author of the study and head of the Intel Parallel Computing Center at Princeton University. “Real @scale scientific applications are much more difficult to deploy than LINPACK for the purpose of comparing how different supercomputers perform, but it’s worth the effort.”

The GTC-P code chosen for porting to TaihuLight is a well-traveled code in supercomputing, in that it has already been ported to seven leading systems around the world—a process that ran from 2011 to 2014 when Tang served as the U.S. principal investigator for the G8 Research Council’s “Exascale Computing for Global Scale Issues” Project in Fusion Energy, or “NuFuSE.” It was an international high-powered computing collaboration between the US, UK, France, Germany, Japan and Russia.

A major challenge that the Shanghai Jiao Tong and Princeton Universities collaborative team have already overcome is adapting the modern language (OpenACC-2) in which GTC-P was written, making it compatible with TaihuLight’s “homegrown” compiler, SWACC. An early result from the adaptation is that the new TaihuLight processors were found to be about three times faster than a standard CPU processor. Tang said the next step is to make the code work with a larger group of processors.

“If GTC-P can build on this promising start to engage a large fraction of the huge number of TaihuLight processors, we’ll be able to move forward to show objectively how this impressive, new, number-one-ranking supercomputer stacks up to the rest of the supercomputing world,” Tang said, adding that metrics like time to solution and associated energy to solution are key to the comparison.

“These are important metrics for policy makers engaged in deciding which kinds of architectures and associated hardware best merit significant investments,” Tang added.

The top seven supercomputers worldwide on which GTC-P can run well all have diverse hardware investments. For example, NCSA’s Blue Waters has more memory bandwidth than other U.S. systems, while TaihuLight has clearly invested most heavily in powerful new processors.

As Tang said recently in a technical program presentation at the SC16 conference in Salt Lake City, improvements in the GTC-P code have for the first time enabled delivery of new scientific insights. These insights show complex electron dynamics at the scale of the upcoming ITER device, the largest fusion energy facility ever constructed.

“In the process of producing these new findings, we focused on realistic cross-machine comparison metrics, time and energy to solution,” Tang said. “Moving into the future, it would be most interesting to be able to include TaihuLight in such studies.”


About the National Center for Supercomputing Applications

The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign provides supercomputing and advanced digital resources for the nation’s science enterprise. At NCSA, University of Illinois faculty, staff, students, and collaborators from around the globe use advanced digital resources to address research grand challenges for the benefit of science and society. NCSA has been advancing one third of the Fortune 50 for more than 30 years by bringing industry, researchers, and students together to solve grand challenges at rapid speed and scale.

NCSA Facilitates Performance Comparisons with China’s #1 Supercomputer – Global Initiative to Enhance @scale and Distributed Computing and Analysis Technologies
 
China to jump supercomputer barrier

2017-02-20 08:31

China Daily Editor: Mo Hong'e

China has started to build a new-generation supercomputer that is expected to be 10 times faster than the current world champion.

This year, China is aiming for breakthroughs in high-performance processors and other key technologies to build the world's first prototype exascale supercomputer, the Tianhe-3, said Meng Xiangfei, the director of application at the National Super Computer Tianjin Center. The prototype is expected to be completed in early 2018.

"Exascale" means it will be capable of making a quintillion (1 followed by 18 zeros) calculations per second. That is at least 10 times faster than the world's current speed champ, the Sunway TaihuLight, China's first supercomputer to use domestically designed processors. That computer has a peak speed of 125 quadrillion (1 followed by 15 zeros) calculations per second, he said.

"Its computing power is on the next level, cementing China as the world leader in supercomputer hardware," Meng said. It would be available for public use and "help us tackle some of the world's toughest scientific challenges with greater speed, precision and scope", he added.

Tianhe-3 will be made entirely in China, from processors to operating system. It will be stationed in Tianjin and fully operational by 2020, earlier than the US plan for its exascale supercomputer, he said.
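
A quick check of the speed comparison made in the article above. The 125-petaflops peak is the article's figure; the 93-petaflops LINPACK result is taken from the TOP500 list and does not appear in the article, so treat it as an outside assumption.

```python
EXA = 10**18     # quintillion
PETA = 10**15    # quadrillion

exascale = 1 * EXA
taihulight_peak = 125 * PETA       # from the article
taihulight_linpack = 93 * PETA     # assumption: rounded TOP500 LINPACK result

ratio_vs_peak = exascale / taihulight_peak
ratio_vs_linpack = exascale / taihulight_linpack

print(ratio_vs_peak)                    # exaflops relative to the quoted peak
print(round(ratio_vs_linpack, 2))       # exaflops relative to the LINPACK result
```

Against the quoted peak the factor is 8, so the "at least 10 times" figure appears to compare one exaflops with the measured LINPACK number rather than with the peak.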

China also likely has another exascale supercomputer in the works. "Such machines take years to make and typically are retired in six to eight years, so you always need a backup, especially when your older models are overworked."

Tianhe-1, China's first quadrillion-level supercomputer developed in 2009, is now working at full capacity, undertaking more than 1,400 assignments each day, solving problems "from stars to cells".

The exascale supercomputer will be able to analyze smog distribution on a national level, while current models can only handle a district. Tianhe-3 also could simulate earthquakes and epidemic outbreaks in more detail, allowing swifter and more effective government responses, Meng said.

The new machine also will be able to analyze gene sequence and protein structures in unprecedented scale and speed. That may lead to new discoveries and more potent medicine, he said.

Liu Guangming, director of the National Super Computer Tianjin Center, said Tianhe-3 will generate over 10 billion yuan ($1.49 billion) in economic benefits per year, according to The Paper, a Shanghai news organization.

http://www.ecns.cn/2017/02-20/245954.shtml
 