
May 28, 24

Slide overview

Feasibility Study (DAY-1 : Jan 29, 2024)

Takeshi Iwashita (RIKEN, R-CCS | Kyoto University)

"Introduction of application group activities of RIKEN FS team and a prospective on next-generation applications"

The 6th R-CCS International Symposium

https://www.r-ccs.riken.jp/R-CCS-Symposium/2024/

R-CCS Computational Science Research Promotion Office

1.

Feasibility Study at the Application Research Group
Takeshi Iwashita (Kyoto University/RIKEN R-CCS)
Jan. 29, 2024, The 6th R-CCS International Symposium

2.

Today’s talk
• Introduction of the research activities of the RIKEN application group (10 min.)
• Personal opinion on next-generation systems and applications (5 min.)
• Introduction of my research on a linear iterative solver for accelerators (5 min.)

3.

Organization
Group leaders: Iwashita, Shimokawabe, Takahashi, Fukazawa
Pure application subgroups (SGL: subgroup leader):
• Life Science [SGL: Terayama]
• Fundamental Science [SGL: Aoki]
• New Materials and Energy [SGL: Yamaji]
• Social Science [SGL: Umemoto]
• Weather and Climate [SGL: Kodama]
• Manufacturing [SGL: Onishi]
• Earthquake and Tsunami Disaster Prevention [SGL: Fujita]
• Digital Twin and Society 5.0 [SGL: Shimokawabe]
CS subgroups:
• Scientific Computing Algorithm [SGL: Takahashi]
• Benchmark Construction [SGL: Murai]
• ML Algorithm [SGL: Yokota]
• Performance Modeling [SGL: Domke]

4.

Application fields (1)
Life Science
• Multi-layered and complex phenomena from the atomic level to the level of cells, organs, and individuals
• MD simulations, docking simulations, quantum chemistry simulations, organ simulations, bioinformatics, etc.
• Collaboration with AI and ML, collaboration with experiments
New Materials and Energy
• Solving the electronic Schrödinger equation, given the positions of the nuclei and the number of electrons, to predict electronic properties
• Ab initio quantum chemistry calculations, quantum many-body computation, etc.
Fundamental Science
• Particle physics, nuclear physics, space and planetary fields
• Quantum chromodynamics, nuclear shell-model calculations, N-body simulations, etc.
Social Science
• Predicting automobile and pedestrian traffic volumes
• Agent-based simulation, AI-based analysis

5.

Application fields (2)
Weather and Climate
• Various weather and climate models for predicting meteorological phenomena, climate, and the Earth system
• Regional climate models, global climate models, multi-scale models, nonhydrostatic ocean models, etc.
• Ensemble data assimilation, digital Earth, AI surrogate models
Earthquake and Tsunami Disaster Prevention
• Various scientific and disaster prevention methods related to earthquakes and tsunamis
• Finite element method, particle method, boundary element method, finite difference method, etc.
• Ensemble calculations
Manufacturing
• Various numerical approaches for designing and evaluating physical products
• Phenomenon analysis: turbulent eddy structure, chemical reaction networks, interface transfer and deformation, fluid-structure interaction, chemical reactions and phase change, etc.
• Optimal design: dimensionality reduction of the design space, multi-view in-situ visualization, surrogate models, etc.
Digital Twin and Society 5.0
• Integrating cyberspace and physical space

6.

Results of Activities of the Application Research Group
• Survey of the current status of each application field
• Investigation of the achievements expected in each application area around 2030
• Investigation of the features of the next-generation computing system that are necessary to achieve these goals
  • System architecture: computational performance, memory (capacity/bandwidth), and inter-node communication performance
  • Programming models: languages, compilers, tools, libraries, frameworks
• Construction of benchmark sets that reflect the characteristics of typical applications (ongoing)
The developed benchmarks were provided to the Architecture Research Group and the System Software and Library Research Group. Each application subgroup was interviewed by members of the architecture group. Vendors use the benchmarks to evaluate future devices and system architectures.

7.

Life Science Subgroup
Breakthroughs expected on next-generation computers
• Building on analysis techniques that use multi-scale simulations from the molecular level to the organ and whole-body level, cell-level and other simulations are expected to enable the analysis and prediction of biological mechanisms that could not previously be captured experimentally.
• Integrating multi-level data and simulations of biological phenomena with AI and LLMs is expected to enable the computational elucidation of disease mechanisms, advanced precision medicine, and in-silico development of various drugs, including small molecules and antibody drugs.
Required computational environment and future challenges
(1) Computation by a single application is not sufficient; advanced integration of numerous simulations, AIs, and databases is essential. Although speeding up individual applications is necessary, a computational environment that allows different types of applications to be combined easily and flexibly is indispensable for solving future problems.
(2) Research and development cannot be done by computers alone. Based on a long-term perspective and budget, it is desirable to secure and train stable human resources and to promote collaboration among industry, government, and academia.
[Figure: multi-scale simulation and analysis for life science; foundation models by AI and LLMs; innovations in medical systems and drug discovery with supercomputers and AI (precision medicine, infection prevention, platform for new drug development)]

8.

Weather/Climate Science Subgroup
Expected breakthroughs on the next-generation supercomputer
・Development of realistic and detailed weather prediction systems that significantly improve forecast accuracy for the torrential rain and typhoons that cause disasters every year. Realization of tornado forecasts, lightning frequency forecasts, probabilistic predictions that cover possible scenarios, and high-frequency real-time forecasts with dense 3-D observations and data science.
・Realization of sophisticated Earth system models that qualitatively improve the understanding and projection of climate change: projections of tropical cyclones, extreme phenomena, and urban climates that can be used for climate adaptation; source and sink estimates of GHGs that contribute to the Paris Agreement; and understanding of the role of oceanic mesoscale eddies in the climate system.
・Comprehensive understanding of multi-scale phenomena from microscopic to planetary scales through massive experiments that have never been possible before. Development and demonstration of a next-generation dynamical core, cloud microphysics, and radiation schemes that are further refined in accordance with the laws of physics. Speedup with low-precision arithmetic and AI surrogate models.
Necessary specifications and issues for achieving a breakthrough
・Dozens of times faster than Fugaku in the effective performance of the overall workflow.
・Heavy cost of refactoring source code to use accelerators, and of the major algorithm modifications needed to fully benefit from them.
・A high-availability system for real-time forecasts and whole-system simulations.
・Fast, large-capacity global/local storage.
・Appropriate design of memory bandwidth, memory capacity, cache, I/O, inter-node communication, etc., that supports a wide variety of dynamics and physics schemes and target problems.
[Figure: data assimilation, global cloud-resolving model, large ensemble, bulk lightning model, super-droplet method]

9.

Social Science Subgroup
Breakthroughs expected on next-generation computers
• Instead of directly manipulating society, simulations (digital twins) can be used to generate various cases and events. As a means of understanding and handling society mathematically, we can reproduce a virtual society, analyze the causes and consequences of events, and classify them by phase.
• Simulation allows us to search the possibility space, including regions outside of reality. Extreme disasters or spontaneous disruption events can be extracted from a large number of simulation results, and the conditions that lead to them can then be identified to reduce unexpectedness.
Required computational environment and future challenges
1. Required environment: many high-performance cores for the parallel execution of multiple mutually independent scenarios, plus a virtual environment with short preparation time that can run existing or yet-to-be-developed open-source applications in the social sciences.
2. Challenges: the basic equations governing society, and established methods for computing them, are unknown. While improving performance with known algorithms, it is also necessary to try various algorithms.
[Figure: vehicle and pedestrian simulations spanning an event possibility search space (cases Case00 to CaseNN over parameter1, e.g., number or period of signals, and parameter2, e.g., number of shelters), some of which lead to disasters; phase classification in a very high-dimensional space (>100 dimensions in reality) with generative models separates a safe phase from a congested phase and yields applicable parameter sets for policy proposals]

10.

Plan for 2024
In 2024, the Architecture Research Group will determine some future directions for the next system. Following the Architecture Research Group’s decision, the Application Research Group will conduct its investigation:
• Modification of the science roadmap
• Investigation of new algorithms and implementation methods to fully utilize the next-generation computing system
• Efficient use of accelerators

11.

“Personal opinion” for applications on next-generation computers & some research results Jan. 29, 2024 The 6th R-CCS International symposium

12.

About AI applications
Human vs. computer vs. AI
• Recent AI technology is truly amazing, but computers have done amazing things for many years.
  • Computers have worked much more efficiently than humans in some areas (especially conventional simulation and HPC). A human cannot perform even 10 arithmetic operations per second.
  • Very accurate analysis
• Recent AI technology can work in areas where we thought humans did better than computers.
  • Judgment, prediction, estimation, and even the generation of something new
  • This is amazing, but from the viewpoint of accuracy, it cannot cover all conventional simulation areas.
• "適材適所 (Tekizai Tekisho)" is very important (the right people in the right place).
  • Both conventional computational science and AI/ML approaches are important, as, hopefully, is work by human beings.
  • Right computer, right AI, right people, all in the right place

13.

Application areas supported by the next-generation computing system
• Conventional computational science (simulations, conventional HPC workloads), AI/ML workloads, and AI for science, which bridges the two
• Low byte/flops ratio (computation intensive): matrix-matrix products (many, small, dense)
• High byte/flops ratio: sparse matrix computation, stencils
Both workloads should be supported, since each has different characteristics.
• Huge optimization problems: very low byte/flops ratio, huge computational effort; quantum computing devices
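The contrast between low-B/F matrix-matrix products and high-B/F sparse computation can be quantified with a back-of-the-envelope model. The Python sketch below is not from the talk; its cost model (8-byte values, 4-byte indices, every array moved exactly once, ideal caching of the vector x) is an assumption chosen for illustration. Under it, CSR sparse matrix-vector multiplication needs more than 6 bytes per flop, while dense GEMM needs 12/n bytes per flop, which shrinks as n grows.

```python
# Rough byte/flop (B/F) estimates under simplifying assumptions:
# FP64 values (8 bytes), 32-bit indices (4 bytes), every array touched exactly once.

def bf_csr_spmv(n, nnz, value_bytes=8, index_bytes=4):
    """B/F of y = A x for an n x n CSR matrix holding nnz nonzeros."""
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes
    vector_bytes = 2 * n * value_bytes  # read x once, write y once (ideal caching)
    flops = 2 * nnz                     # one multiply and one add per nonzero
    return (matrix_bytes + vector_bytes) / flops

def bf_gemm(n, value_bytes=8):
    """B/F of C = A B for n x n dense matrices (read A and B once, write C once)."""
    bytes_moved = 3 * n * n * value_bytes
    flops = 2 * n ** 3
    return bytes_moved / flops

if __name__ == "__main__":
    # About 7 nonzeros per row, as in a 3D 7-point stencil matrix.
    print(f"CSR SpMV: {bf_csr_spmv(10**6, 7 * 10**6):.2f} B/F")
    print(f"GEMM n=64: {bf_gemm(64):.3f} B/F, n=1000: {bf_gemm(1000):.3f} B/F")
```

The model deliberately ignores cache misses on x and the read of C in an accumulating GEMM; both would raise the numbers but not change the qualitative gap.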

14.

What should we do?
From the viewpoint of next-generation architecture:
• Accelerators (including special CPU instructions) are essential for improving system/node performance, so we should consider how to use accelerators effectively.
• Recent accelerators (GPUs) deliver higher performance in low-precision arithmetic than in 64-bit floating-point arithmetic.
• Very high degrees of parallelism are needed (higher is better).
• The parallelism should be exploited through SIMD-type operations.
• Some data layouts are preferable to others.
Application programmers should investigate new methods, algorithms, and implementation techniques with these aspects in mind.

15.

Takeshi Iwashita, Senxi Li, Takeshi Fukaya (Hokkaido University)

16.

Publication Takeshi Iwashita, Senxi Li, Takeshi Fukaya, “Hierarchical block multi-color ordering: a new parallel ordering method for vectorization and parallelization of the sparse triangular solver in the ICCG method”, CCF Transactions on High Performance Computing (Springer), volume 2, pages 84–97, (2020). https://doi.org/10.1007/s42514-020-00030-z (Open Access)

17.

Background
A sparse triangular solver is a main component of the Gauss-Seidel (GS) smoother, the SOR method, and IC/ILU preconditioning, which are used as building blocks in various computational science and engineering analyses. However, it is well known that the sparse triangular solver, which consists of forward and backward substitutions, cannot be parallelized straightforwardly, because each unknown depends on previously computed unknowns.
Parallel ordering is one of the most popular methods for parallelizing a sparse triangular solver. Multi-color ordering is a popular parallel ordering and is suitable for GPU implementation, but it entails a trade-off between the convergence rate and the number of synchronization points. One remedy for this trade-off is block multi-coloring.
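The sequential dependency that blocks straightforward parallelization is easy to see in code. The following is a minimal Python sketch (the talk's solver is written in C; this is illustrative only) of forward substitution for a lower-triangular matrix in CSR format, assuming each row stores its diagonal entry.

```python
def forward_substitution_csr(indptr, indices, data, r):
    """Solve L y = r, where L is lower triangular in CSR (diagonal stored)."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):  # rows must be visited in order ...
        s, diag = r[i], None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                s -= data[k] * y[j]  # ... because y[i] needs every earlier y[j] with l_ij != 0
            elif j == i:
                diag = data[k]
        if diag is None:
            raise ValueError(f"row {i} has no diagonal entry")
        y[i] = s / diag
    return y

# L = [[2, 0, 0], [1, 2, 0], [0, 1, 2]] in CSR form
print(forward_substitution_csr([0, 1, 3, 5], [0, 0, 1, 1, 2],
                               [2.0, 1.0, 2.0, 1.0, 2.0], [2.0, 3.0, 3.0]))
# prints [1.0, 1.0, 1.0]
```

Row i can start only after every row j < i that it references has finished; parallel orderings rearrange the unknowns so that many rows become mutually independent.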

18.

Background & Research Target
Block multi-coloring (BMC): multi-color ordering is applied to blocks of unknowns.
• In the context of parallel IC/ILU preconditioning, block coloring was investigated for a finite difference method (Iwashita et al., SISC 2005).
• The method was extended to general sparse linear systems (Iwashita et al., IPDPS 2012).
• This technique has been used in various applications because of its advantages in convergence, data locality, and the number of synchronizations (J. Park et al., ISC 2014; J. Park et al., SC14; J. Park et al., SC15; E. Vermij et al., ICPP17; Y. Ao et al., ACM TACO 2018; A. Elafrou et al., SC19; Q. Zhu et al., SC21; X. Yang et al., ICS23; Y. Zhang et al., IPDPS23).
However, the block multi-coloring method is difficult to implement with SIMD vectorization. In this research, we aimed to develop a new parallel ordering that has the same convergence rate as BMC and makes SIMD vectorization of the substitutions possible.
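To make "multi-color ordering applied to blocks of unknowns" concrete, here is a small Python sketch. It is not the algebraic BMC algorithm of the cited papers: the blocking is simply passed in as an input (a hypothetical assignment of consecutive unknowns), and blocks are colored greedily in natural order so that coupled blocks never share a color.

```python
def block_multicolor_order(adj, block_of):
    """Greedy block multi-coloring.

    adj[i]: neighbours of unknown i in the (symmetric) nonzero pattern
    block_of[i]: block id of unknown i (the blocking is an input in this sketch)
    Returns (color of each block, pi) where pi[i] is the new index of unknown i.
    """
    nblocks = max(block_of) + 1
    block_nbrs = [set() for _ in range(nblocks)]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            bi, bj = block_of[i], block_of[j]
            if bi != bj:
                block_nbrs[bi].add(bj)
                block_nbrs[bj].add(bi)
    # Greedy coloring: adjacent blocks never share a color.
    color = [-1] * nblocks
    for b in range(nblocks):
        used = {color[nb] for nb in block_nbrs[b] if color[nb] >= 0}
        c = 0
        while c in used:
            c += 1
        color[b] = c
    # New order: color first, then block, then the original index inside the block.
    order = sorted(range(len(adj)), key=lambda i: (color[block_of[i]], block_of[i], i))
    pi = [0] * len(adj)
    for new_index, old_index in enumerate(order):
        pi[old_index] = new_index
    return color, pi

# 1D chain of 8 unknowns, blocks of 2 consecutive unknowns (hypothetical blocking).
adj = [[j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)]
color, pi = block_multicolor_order(adj, [i // 2 for i in range(8)])
print(color, pi)  # [0, 1, 0, 1] [0, 1, 4, 5, 2, 3, 6, 7]
```

Blocks of the same color can be processed by different threads without synchronization, while unknowns inside a block keep their original relative order, which is where BMC retains more of the sequential ordering's convergence behavior than point-wise multi-coloring.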

19.

Problem
We consider an n-dimensional linear system of equations $Ax = b$ and focus on parallelizing the sparse triangular solver
$y = L^{-1} r, \quad z = U^{-1} y,$
where $L$ and $U$ are lower and upper triangular matrices, respectively, with the same nonzero pattern as $A$ (the Gauss-Seidel, SOR, and IC/ILU preconditioning cases). The sparse triangular solver (forward and backward substitutions) cannot be parallelized straightforwardly.
Reordering (parallel ordering) is one of the most popular techniques for parallelizing the sparse triangular solver.
• Reordering: a permutation $\pi$ of the index set $I = \{1, 2, \ldots, n\}$, whose elements index the unknowns. The $i$-th unknown is moved to the $\pi(i)$-th position in the new system.
• New (reordered) linear system: $\bar{A}\bar{x} = \bar{b}$, with $x = P^{T}\bar{x}$, $\bar{A} = PAP^{T}$, $\bar{b} = Pb$, where $P$ is the permutation matrix corresponding to $\pi$.
• $\bar{A}$ has a form suitable for parallel processing.
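The relations $\bar{A} = PAP^{T}$, $\bar{b} = Pb$, and $x = P^{T}\bar{x}$ can be checked numerically. This NumPy sketch builds $P$ from $\pi$ (0-based here, unlike the 1-based index set in the text) so that unknown $i$ lands at position $\pi(i)$; the helper names are illustrative only.

```python
import numpy as np

def permutation_matrix(pi):
    """P with (P @ x)[pi[i]] == x[i]: unknown i moves to position pi[i]."""
    n = len(pi)
    P = np.zeros((n, n))
    P[np.asarray(pi), np.arange(n)] = 1.0
    return P

def reorder_system(A, b, pi):
    """Return (A_bar, b_bar, P) with A_bar = P A P^T and b_bar = P b."""
    P = permutation_matrix(pi)
    return P @ A @ P.T, P @ b, P

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
pi = [2, 0, 1]
A_bar, b_bar, P = reorder_system(A, b, pi)
x = P.T @ np.linalg.solve(A_bar, b_bar)  # x = P^T x_bar recovers the original solution
print(np.allclose(A @ x, b))              # True
print(A_bar[pi[0], pi[1]] == A[0, 1])     # True: entry (i, j) moves to (pi(i), pi(j))
```

In practice the permutation is applied by index arithmetic rather than by forming $P$ explicitly; the dense matrix is used here only to mirror the formulas.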

20.

Equivalence of convergence
When $\bar{x}^{(j)} = P x^{(j)}$ holds at every $j$-th step under the initial setting $\bar{x}^{(0)} = P x^{(0)}$, we say the iterative solver has equivalence of convergence for the original and reordered linear systems. (The superscript denotes the iteration count.)
• The Jacobi method and most non-preconditioned Krylov subspace methods have equivalence of convergence.
• Ordering usually affects the convergence of iterative solvers involving (S)GS, (S)SOR, or IC/ILU preconditioning.
• However, when the condition shown on the next slide, called the "Equivalent Reordering (ER) condition", is satisfied, equivalence of convergence holds.
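The definition can be tested directly: run the same iteration on the original and reordered systems and compare $\bar{x}^{(j)}$ with $P x^{(j)}$ at every step. In this NumPy sketch (dense matrices, small sizes, illustrative only) the reordering is red-black, i.e., a two-color multi-color ordering of a 1D Poisson matrix. Jacobi passes the check; forward Gauss-Seidel does not.

```python
import numpy as np

def jacobi(A, b, x0, steps):
    """Jacobi iterates x^(0), ..., x^(steps)."""
    d = np.diag(A)
    x, hist = x0.copy(), [x0.copy()]
    for _ in range(steps):
        x = (b - A @ x + d * x) / d  # every unknown uses the previous iterate
        hist.append(x.copy())
    return hist

def gauss_seidel(A, b, x0, steps):
    """Forward Gauss-Seidel iterates; the updates depend on the unknown ordering."""
    x, hist = x0.copy(), [x0.copy()]
    for _ in range(steps):
        for i in range(len(b)):
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
        hist.append(x.copy())
    return hist

def equivalent(hist, hist_bar, pi):
    """Check x_bar^(j) == P x^(j), i.e. x_bar[pi[i]] == x[i], at every step j."""
    return all(np.allclose(xb[pi], x) for x, xb in zip(hist, hist_bar))

# 1D Poisson matrix, reordered red-black (even unknowns first).
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b, x0 = np.ones(n), np.zeros(n)
order = np.r_[0:n:2, 1:n:2]  # order[k] = original index placed at position k
pi = np.argsort(order)       # pi[i] = new position of original unknown i
A_bar, b_bar, x0_bar = A[np.ix_(order, order)], b[order], x0[order]

print(equivalent(jacobi(A, b, x0, 5), jacobi(A_bar, b_bar, x0_bar, 5), pi))              # True
print(equivalent(gauss_seidel(A, b, x0, 5), gauss_seidel(A_bar, b_bar, x0_bar, 5), pi))  # False
```

Gauss-Seidel differs because, after red-black reordering, unknown 1 is updated with the new values of both neighbors instead of the new value of unknown 0 and the old value of unknown 2.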

21.

Equivalent Reordering Condition
ER condition: $\forall\, i_1, i_2 \in I$ such that $a_{i_1,i_2} \neq 0$ or $a_{i_2,i_1} \neq 0$, $\operatorname{sgn}(i_1 - i_2) = \operatorname{sgn}(\pi(i_1) - \pi(i_2))$,
where $a_{i_1,i_2}$ is the element in row $i_1$, column $i_2$ of the original coefficient matrix.
In other words, the two orderings have an identical ordering graph (Doi, S., and Lichnewsky, A., Res. Report No. 1452, INRIA, France, 1991). A sketch of the proof is given in the appendix of our paper.
[Figure: a 7 × 7 coefficient matrix (colored elements: nonzero) and its ordering graph]
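The ER condition translates almost literally into a check over the nonzero pattern. This Python sketch (0-based indices; the function name is illustrative) returns whether a permutation preserves the direction of every edge of the ordering graph.

```python
def sgn(v):
    return (v > 0) - (v < 0)

def satisfies_er_condition(nonzeros, pi):
    """ER condition check.

    nonzeros: pairs (i1, i2) with a[i1][i2] != 0 (0-based indices)
    pi: pi[i] is the new index of unknown i
    Swapping i1 and i2 flips both signs, so testing each stored pair once
    also covers the 'a[i2][i1] != 0' half of the condition.
    """
    return all(sgn(i1 - i2) == sgn(pi[i1] - pi[i2]) for i1, i2 in nonzeros)

# 1D chain 0-1-2-3-4-5 (symmetric pattern, diagonal omitted).
chain = [(i, i + 1) for i in range(5)] + [(i + 1, i) for i in range(5)]
print(satisfies_er_condition(chain, list(range(6))))      # True: identity ordering
print(satisfies_er_condition(chain, [0, 3, 1, 4, 2, 5]))  # False: red-black ordering
# Two decoupled pairs {0, 1} and {2, 3}: swapping the pairs keeps every edge direction.
pairs = [(0, 1), (1, 0), (2, 3), (3, 2)]
print(satisfies_er_condition(pairs, [2, 3, 0, 1]))        # True
```

The last case shows the freedom the condition leaves: unknowns that are not coupled may be moved past each other arbitrarily.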

22.

Hierarchical block multi-color ordering
Design points for the new ordering:
• The same convergence rate as block multi-color ordering
• The same number of synchronization points for multiple threads
• SIMD vectorization available for forward and backward substitutions
We first apply (algebraic) block multi-color ordering to the linear system. Then, while keeping the ordering graph, we reorder it again (secondary reordering).
[Figure: coefficient matrix after BMC ordering with colors 1–3; nonzero entries exist only in the dotted parts]

23.

[Figure: the index set 1, 2, …, n divided into BMC blocks $b_1^{(1)}, b_2^{(1)}, \ldots, b_{n(1)}^{(1)}$ of color 1 through $b_{n(n_c)}^{(n_c)}$ of color $n_c$, distributed over threads 1, …, T; secondary reordering of HBMC ordering (w = 4)]
We generate level-1 blocks, each of which consists of w BMC blocks of the same color (w: SIMD length). We then reorder the unknowns within each level-1 block. This does not affect the ordering graph among unknowns that belong to different level-1 blocks.
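Forming level-1 blocks is a grouping step on top of the BMC result. The Python sketch below groups w consecutive BMC blocks of each color and splits the level-1 blocks of one color contiguously over T threads. Both helpers are hypothetical: how the actual implementation handles a color whose block count is not a multiple of w is not described here, so a trailing group may simply be shorter.

```python
def make_level1_blocks(bmc_colors, w):
    """Group the BMC blocks of each color into level-1 blocks of w BMC blocks.

    bmc_colors[c]: list of BMC blocks of color c (each a list of unknowns).
    In this sketch a trailing group may hold fewer than w BMC blocks.
    """
    return [[blocks[k:k + w] for k in range(0, len(blocks), w)] for blocks in bmc_colors]

def split_among_threads(items, T):
    """Contiguous, nearly equal partition of items over T threads."""
    q, rem = divmod(len(items), T)
    parts, start = [], 0
    for t in range(T):
        size = q + (1 if t < rem else 0)
        parts.append(items[start:start + size])
        start += size
    return parts

# Hypothetical BMC result: color 0 has 8 blocks of 2 unknowns, color 1 has 4.
bmc_colors = [[[2 * k, 2 * k + 1] for k in range(8)],
              [[16 + 2 * k, 17 + 2 * k] for k in range(4)]]
level1 = make_level1_blocks(bmc_colors, w=4)
print([len(l1) for l1 in level1])           # [2, 1]
print(split_among_threads(level1[0], T=2))  # one level-1 block per thread for color 0
```

Colors are still processed one after another, so the number of thread synchronizations per substitution stays equal to the number of colors, as in BMC.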

24.

Reordering inside a level-1 block
[Figure: a level-1 block consisting of 4 BMC blocks $b_{ks+1}^{(c)}, \ldots, b_{ks+4}^{(c)}$ (w = 4) and its new order, divided into level-2 blocks]
• The unknowns in the same level-2 block have no data dependencies on one another, so SIMD vectorization of the substitutions is possible (SIMD length: w).
• This reordering does not change the ordering graph of the unknowns in the level-1 block, so the new ordering is equivalent to BMC.
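One way to realize this secondary reordering is to interleave the w BMC blocks: level-2 block k collects the k-th unknown of each BMC block. The Python sketch below assumes equal BMC block sizes and checks, on a 1D chain, that no two unknowns in a level-2 block are coupled. Because each BMC block keeps its internal order and same-color BMC blocks are uncoupled, the ordering graph is unchanged.

```python
def secondary_reorder(level1_block):
    """Interleave the w BMC blocks of one level-1 block.

    Level-2 block k collects the k-th unknown of every BMC block, so the
    relative order inside each BMC block is unchanged.
    Assumes all BMC blocks have the same size.
    """
    bs = len(level1_block[0])
    if any(len(blk) != bs for blk in level1_block):
        raise ValueError("equal BMC block sizes assumed")
    return [[blk[k] for blk in level1_block] for k in range(bs)]

def level2_independent(level2_blocks, adj):
    """True if no two unknowns in the same level-2 block are coupled."""
    return all(not (set(blk) & set(adj[i])) for blk in level2_blocks for i in blk)

# 1D chain of 16 unknowns; BMC blocks of size 2; color 0 = {0,1}, {4,5}, {8,9}, {12,13}.
adj = [[j for j in (i - 1, i + 1) if 0 <= j < 16] for i in range(16)]
level2 = secondary_reorder([[0, 1], [4, 5], [8, 9], [12, 13]])
print(level2)                           # [[0, 4, 8, 12], [1, 5, 9, 13]]
print(level2_independent(level2, adj))  # True
```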

25.

Hierarchical block multi-color ordering
[Figure: the coefficient matrix arising from HBMC ordering (colors 1–3, with level-1 and level-2 blocks), and a multithreaded, vectorized implementation of forward substitution using intrinsic functions]
• The parallelism among level-1 blocks is exploited by multiple threads.
• The parallelism among unknowns in a level-2 block is exploited by SIMD instructions.
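The SIMD part of the substitution can be imitated with array operations: all w rows of a level-2 block are updated in one step because they only read unknowns from earlier blocks. This NumPy sketch is not the authors' C/intrinsics code. It packs each level-2 block into padded (w × m) column/value arrays (padding with column 0 and value 0 is a hypothetical choice) and does not model the thread-level parallelism across level-1 blocks.

```python
import numpy as np

def to_level2_slices(L, w):
    """Pack the strictly lower part of an HBMC-ordered L into padded slices.

    Slice s holds rows s*w ... s*w+w-1 (one level-2 block) as (w x m) arrays of
    column indices and values; short rows are padded with column 0, value 0.
    """
    n = L.shape[0]
    if n % w:
        raise ValueError("n must be a multiple of w in this sketch")
    slices = []
    for s in range(n // w):
        rows = range(s * w, (s + 1) * w)
        entries = [np.nonzero(L[i, :i])[0] for i in rows]
        # Level-2 property: rows of this block only reference earlier blocks.
        if any(e.size and e.max() >= s * w for e in entries):
            raise ValueError(f"level-2 block {s} is not independent")
        m = max(e.size for e in entries)
        cols, vals = np.zeros((w, m), dtype=int), np.zeros((w, m))
        for lane, (i, e) in enumerate(zip(rows, entries)):
            cols[lane, :e.size] = e
            vals[lane, :e.size] = L[i, e]
        slices.append((cols, vals))
    return slices, np.diag(L).copy()

def forward_substitution_level2(slices, diag, r, w):
    y = np.zeros(len(r))
    for s, (cols, vals) in enumerate(slices):
        rows = slice(s * w, (s + 1) * w)
        # All w lanes are computed in one array operation, the analogue of a SIMD step.
        y[rows] = (r[rows] - (vals * y[cols]).sum(axis=1)) / diag[rows]
    return y

# n = 8, w = 4: rows 4..7 depend only on rows 0..3.
L = 4.0 * np.eye(8)
L[4, 0] = L[5, 1] = L[6, 2] = L[7, 3] = 1.0
L[6, 1] = 0.5
r = np.arange(1.0, 9.0)
slices, diag = to_level2_slices(L, w=4)
print(np.allclose(forward_substitution_level2(slices, diag, r, w=4), np.linalg.solve(L, r)))  # True
```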

26.

Numerical tests
The program is written in C. We used three types of nodes:
• A node of Cray XC40 (Xeon Phi KNL processor)
• A node of Cray CS400 (2 Xeon Broadwell processors)
• A node of Fujitsu CX2550 (2 Xeon Skylake processors)
• NVIDIA A100 GPU
Three multi-threaded IC(0)-CG solvers based on MC, BMC, and HBMC orderings were tested.
• Storage format: CSR for the coefficient matrix; CSR (MC, BMC) or SELL (HBMC) for the preconditioner
• Convergence criterion: relative residual norm less than $10^{-7}$
• Test problems: 7 matrices from the SuiteSparse Matrix Collection and a linear system arising in finite edge-element eddy current analysis
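For orientation, the tested solver type, IC(0)-preconditioned CG stopped at a relative residual norm below $10^{-7}$, can be sketched in a few lines of NumPy. This dense-storage sketch is illustrative only: it uses the natural ordering, and applies the preconditioner with `np.linalg.solve` for brevity, whereas the C code performs the forward and backward substitutions that MC, BMC, and HBMC parallelize.

```python
import numpy as np

def ic0(A):
    """IC(0): Cholesky factor restricted to the nonzero pattern of tril(A) (dense storage)."""
    n = A.shape[0]
    L = np.tril(A).astype(float)
    pattern = L != 0
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        for i in range(k + 1, n):
            if pattern[i, k]:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if pattern[i, j]:  # no fill-in outside the pattern of A
                    L[i, j] -= L[i, k] * L[j, k]
    return L

def ic0_cg(A, b, tol=1e-7, maxiter=1000):
    """Preconditioned CG; stops when ||b - A x|| / ||b|| < tol (x0 = 0)."""
    L = ic0(A)

    def apply_preconditioner(r):
        # Forward then backward substitution (general solver used only for brevity).
        return np.linalg.solve(L.T, np.linalg.solve(L, r))

    x = np.zeros(len(b))
    r = b.astype(float)
    z = apply_preconditioner(r)
    p, rz, bnorm = z.copy(), r @ z, np.linalg.norm(b)
    for it in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) / bnorm < tol:
            return x, it
        z = apply_preconditioner(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x, maxiter

# 2D 5-point Laplacian on a 6 x 6 grid.
T = 2 * np.eye(6) - np.eye(6, k=1) - np.eye(6, k=-1)
A = np.kron(np.eye(6), T) + np.kron(T, np.eye(6))
b = np.ones(36)
x, iters = ic0_cg(A, b)
print(iters, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

For a tridiagonal matrix there is no fill-in, so IC(0) coincides with the exact Cholesky factor, which is a convenient sanity check.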

27.

Equivalence of convergence
We checked the convergence behaviors of BMC and HBMC.
[Figure: relative residual norm histories of BMC and HBMC for G3_circuit and Ieej]
The relative residual norm curves for BMC and HBMC overlap, which indicates that the two solvers followed an equivalent solution process. Equivalence of convergence was confirmed in all test cases.

28.

Numerical results: Xeon Phi (KNL)
[Figure: relative speedup vs. MC of MC, BMC(8), BMC(16), BMC(32), HBMC(8), HBMC(16), and HBMC(32) for Thermal2, Parabolic_fem, G3_circuit, Audikw_1, Tmt_sym, Apache2, and Ieej]
BMC and HBMC are superior to MC in all tests. HBMC outperforms BMC in 5 of the 7 datasets.

29.

Numerical results: Xeon Broadwell
[Figure: relative speedup vs. MC of MC, BMC(8), BMC(16), BMC(32), HBMC(8), HBMC(16), and HBMC(32) for Thermal2, Parabolic_fem, G3_circuit, Audikw_1, Tmt_sym, Apache2, and Ieej]
With an appropriate block size, BMC and HBMC are superior to MC in all tests. HBMC exhibits better performance than BMC.

30.

Numerical results: Xeon Skylake
[Figure: relative speedup vs. MC of MC, BMC(8), BMC(16), BMC(32), HBMC(8), HBMC(16), and HBMC(32) for Thermal2, Parabolic_fem, G3_circuit, Audikw_1, Tmt_sym, Apache2, and Ieej]
With an appropriate block size, BMC and HBMC are superior to MC in all tests. HBMC exhibits better performance than BMC except in the Audikw_1 test. (With the SIMD width set to 8, the SELL format requires many padding elements for Audikw_1.)
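Why padding matters can be seen from how a SELL-style layout is built: rows are grouped into slices of height w and every row in a slice is padded to the slice's longest row. The Python sketch below (no row sorting; whether the paper's SELL implementation sorts rows is not stated here) converts CSR to that layout and counts the padding entries, using a toy matrix in which one long row inflates its whole slice.

```python
def csr_to_sell(indptr, indices, data, w):
    """Convert CSR to a SELL-style layout with slice height w (no row sorting).

    Each slice stores its rows padded to the longest row of the slice, in
    column-major order (entry k of all lanes is contiguous), so one SIMD
    load can serve w rows. Returns (slices, number_of_padding_entries).
    """
    n = len(indptr) - 1
    slices, padding = [], 0
    for s in range(0, n, w):
        rows = list(range(s, min(s + w, n)))
        lengths = [indptr[i + 1] - indptr[i] for i in rows]
        width = max(lengths)
        cols = [[0] * len(rows) for _ in range(width)]    # cols[k][lane]
        vals = [[0.0] * len(rows) for _ in range(width)]  # padded entries stay 0.0
        for lane, i in enumerate(rows):
            for k, p in enumerate(range(indptr[i], indptr[i + 1])):
                cols[k][lane] = indices[p]
                vals[k][lane] = data[p]
        padding += width * len(rows) - sum(lengths)
        slices.append((cols, vals))
    return slices, padding

# Rows with 1, 1, 1 and 5 nonzeros: one long row inflates its whole slice.
indptr = [0, 1, 2, 3, 8]
indices = [0, 1, 2, 0, 1, 2, 3, 4]
data = [1.0] * 8
print(csr_to_sell(indptr, indices, data, w=4)[1])  # 12 padding entries
print(csr_to_sell(indptr, indices, data, w=2)[1])  # 4 padding entries
```

Padding entries cost memory traffic and arithmetic without contributing to the result, so matrices with highly variable row lengths benefit less from a wide slice.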

31.

Numerical results (by Mr. Kengo Suzuki, Hokkaido Univ.)
Performance of a parallel ILU-GMRES solver on a GPU (V100)
[Figure: relative speedup vs. MC of MC and HBMC for Thermal2, parabolic_fem, G3_circuit, tmt_sym, and apache2]
HBMC showed a 15–40% improvement in solver performance (time to solution) compared with MC on a GPU.

32.

Conclusions
• A new parallel ordering technique, hierarchical block multi-color ordering (HBMC), was proposed for vectorizing and multithreading the sparse triangular solver. HBMC was designed to maintain the advantages of block multi-color ordering (BMC) in terms of convergence and the number of synchronizations.
• In the method, the coefficient matrix is transformed into a matrix with a hierarchical block structure. The level-1 blocks are mutually independent within each color, which is exploited in multithreading. For each level-2 block, the substitution is converted into w (= SIMD width) independent steps, which are efficiently processed by SIMD instructions.
• Numerical tests were conducted to examine the proposed method using seven datasets on three types of computational nodes. The results confirmed the equivalence of convergence of HBMC and BMC. Moreover, HBMC outperformed BMC in 18 of 21 test cases (seven datasets × three systems), which confirms the effectiveness of the proposed method.
• Future work: we will examine our technique on other application problems, particularly a large-scale multigrid application.

33.

Summary of the talk
• FS activities of the Application Research Group
  • 12 subgroups: computational science, social science, data science, algorithms, performance models, benchmark construction
  • Investigation of the achievements expected in each application area around 2030
  • Investigation of the features of the next-generation computing system that are necessary to achieve these goals
  • Construction of benchmark sets that reflect the characteristics of typical applications
• Plan for 2024
  • Modification of the science roadmap following the Architecture Research Group’s report
  • Investigation of new algorithms and implementation methods to fully utilize the next-generation computing system
• Personal opinion on applications for next-generation supercomputers
  • HPC workloads, AI workloads, and AI for science are all important.

34.

Appendix: slides from other subgroups (expected achievement around 2030) Jan. 29, 2024 The 6th R-CCS International symposium

35.

Emergent and energy materials subgroup
Breakthroughs brought by the next-generation supercomputer
・Accelerated discovery of hard magnets and spintronics materials by constructing an exhaustive database of disordered materials
・Prediction of the degradation of solid-state batteries and the quantum efficiency of solar-cell materials by AI-assisted ab initio simulation
・Exploration of quantum device operating principles by ab initio simulation of ultrafast photo-induced dynamics in topological materials
・Development of novel microfabrication technology by ab initio simulation of the interaction between ultrashort laser pulses and solids
・Measurement and control of quantum entanglement in high-temperature superconductors and topological quantum spin liquids, augmented by many-body quantum simulations
Required resources and issues to be solved for the breakthroughs
① Computational resources that allow us to run a few tens of parallel jobs, each requiring 10³ nodes of Fugaku, every day; compilers that effectively translate codes written for CPUs to accelerators
② Utilization of accelerators: it is necessary to increase computational density by tuning the codes and refining the algorithms
[Figure: post-Fugaku simulation for high-performance electric motors, high-performance and reliable rechargeable batteries, power-efficient spin devices, and quantum devices and control of quantum entanglement]

36.

Subgroup: Manufacturing
Breakthroughs on next-generation computers
The practical application of HPC in the design and manufacturing processes of various industries is expected to lead to higher product performance and faster product design that meets diverse and changing needs.
• Replacement of physical performance tests with virtual ones based on quasi-direct numerical simulation of turbulent flows (Reynolds number of 10 or higher) in actual turbomachinery such as compressors, pumps, and marine propellers.
• Digital prototyping of hundreds of different designs by multi-objective optimization based on quasi-direct numerical simulation.
• Predictive evaluation of transonic buffet and flutter boundaries for high-speed aircraft flight conditions, and of aircraft noise during takeoff and landing.
• Multi-objective, multi-variable airframe design that integrally evaluates performance and safety requirements across the entire range of flight conditions.
[Figure: analysis of transonic buffet; cruising; numerical analysis covering the entire aircraft flight envelope (digital flight)]
Requirements and challenges for the breakthroughs
1. Computing resource requirements
An estimate for quasi-direct numerical simulation using 20 trillion grid points. NOTE: it is assumed that weak scaling to about 30 times Fugaku is achieved.
• Amount of computation: 2.4 ZFLOP / Amount of data: 3.6 PB / Amount of data transfer: 1.0 ZB / Amount of communication: 720 EB / Amount of I/O: 2.4 PB / Computation time per analysis: 24 hr
2. Challenges for future applications
• Efficient inter-process communication (e.g., optimization of rank maps)
• Faster pre- and post-processing (e.g., grid generation, visualization) by optimizing the data flow between processes (reduced file I/O)
• Effective exploration of the design space by applying data science (e.g., dimensionality reduction, surrogate models, uncertainty analysis)
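The resource estimate above implies average sustained rates. The arithmetic below is a rough illustration, not part of the subgroup's estimate: it assumes computation, data transfer, and communication are spread evenly over the 24-hour run, with Z = 10^21 and E = 10^18.

$$
\frac{2.4\times10^{21}\ \text{FLOP}}{24\times3600\ \text{s}} \approx 2.8\times10^{16}\ \text{FLOP/s} \approx 28\ \text{PFLOP/s (sustained)}
$$

$$
\frac{1.0\times10^{21}\ \text{B}}{86400\ \text{s}} \approx 1.2\times10^{16}\ \text{B/s},\qquad
\frac{7.2\times10^{20}\ \text{B}}{86400\ \text{s}} \approx 8.3\times10^{15}\ \text{B/s}
$$

$$
\text{workload B/F} \approx \frac{1.0\times10^{21}\ \text{B}}{2.4\times10^{21}\ \text{FLOP}} \approx 0.42
$$

The first line is sustained performance averaged over the run; the second gives the implied data-transfer and communication bandwidths; the third is the byte/flop ratio of the workload as a whole.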

37.

Fundamental Science Subgroup
Breakthroughs expected on the next-generation computer
Elementary particles, nuclei, planets, and the universe: physical phenomena from the micro to the super-macro scale
・LQCD explores the physics of the standard model (SM) and beyond, elucidating the origin of the universe and matter
 ・Ideal simulations using chiral fermions
 ・On-physical-point computation of u, d, s, c, b quarks → multifaceted precise tests of the SM
 ・Elucidation of the finite-temperature QCD phase structure, constraining dark matter scenarios
 ・Large-volume simulations for precise determination of hadron interactions
 ・2- and 3-body interactions → hypernuclear physics experiments, elucidation of heavy-element synthesis in binary neutron star mergers
・Nuclear shell model: systematic first-principles calculation of light nuclei up to calcium
 ・Revealing the properties of neutron-rich nuclei: limits of existence and cluster structure
・Galaxy simulation/galaxy formation
 ・Athena++: consistent simulation from an entire galaxy to the internal structure of molecular clouds
 ・ASURA-FDPS: formation and evolution histories of the various galaxies in the universe
・Sun and stars
 ・Consistent tracking from magnetic field generation inside the Sun to sunspot formation
・Neutrinos/dark matter
 ・Numerical simulation resolving dark matter halos on the scale of galaxy clusters/galaxy groups
Environment required for the breakthroughs and challenges to be addressed
・100 × "Fugaku" required as application × hardware performance (Computational Science Roadmap)
・Hardware demands:
 ・Wide memory bandwidth, and sufficient registers and out-of-order resources for effective utilization
 ・Hardware support for complex-number operations and stencil calculations
 ・Low-latency communication between adjacent nodes
 ・Rewriting the code is acceptable if it can be used as an asset in the future
・Software/applications:
 ・Performance improvement across the entire workflow
 ・LQCD: use of half-precision arithmetic, communication avoidance, R&D of various acceleration techniques (such as multigrid and AI)
[Figure: LQCD, nuclear shell model, galaxy simulation, solar simulation, neutrinos/dark matter]

38.

Earthquake/Tsunami Disaster Subgroup
Expected breakthroughs from the next-generation system
• Develop a multi-scale 3D crustal deformation and seismic wave propagation simulator that combines a global-scale macro crustal deformation simulator with a high-resolution micro crustal deformation and seismic wave propagation simulator focused on a specific region, and construct a multi-scale model of the Japanese Islands that is consistent with observations of both crustal deformation and seismic motion.
• This is expected to let us verify whether we can predict how earthquake occurrence in the surrounding area evolves after large earthquakes.
Required computer environment/topics to be considered
1. Computer environment required for the breakthroughs described above
Relative to Fugaku: a 5- to 10-fold increase in memory capacity, a 5- to 10-fold increase in memory bandwidth, and a 15- to 30-fold increase in arithmetic capability (B/F around 0.1). In addition, a computational mechanism whose performance does not degrade significantly due to random access or data recurrence is required (e.g., hardware-accelerated atomic add).
2. Topics to be considered/roadmap
To achieve the breakthroughs, we plan to develop methods for highly efficient assimilation of observation data and to design computational algorithms that can exploit the performance of future systems with small B/F. AI/data-analytics-based methods that learn from data of past time steps are being developed to accelerate simulation without degrading computational accuracy.
[Figure: examples of crustal deformation analysis and seismic wave propagation analysis; development of a multi-scale simulator combining crustal deformation analysis and seismic wave propagation analysis]
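The stated B/F of about 0.1 follows from the requested growth factors. In this worked arithmetic, the Fugaku node figures (A64FX: 1024 GB/s memory bandwidth, 3072 GFLOP/s FP64 peak at 2.0 GHz) are an added assumption, not taken from the slide, and the low and high ends of the two ranges are paired (5/15 = 10/30 = 1/3).

$$
\left(\frac{B}{F}\right)_{\text{next}} = \left(\frac{B}{F}\right)_{\text{Fugaku}} \times \frac{5\ \text{to}\ 10}{15\ \text{to}\ 30} = \frac{1}{3}\left(\frac{B}{F}\right)_{\text{Fugaku}}
$$

$$
\left(\frac{B}{F}\right)_{\text{Fugaku}} \approx \frac{1024\ \text{GB/s}}{3072\ \text{GFLOP/s}} \approx 0.33
\quad\Rightarrow\quad
\left(\frac{B}{F}\right)_{\text{next}} \approx 0.11
$$

Pairing the ranges differently (for example, 5-fold bandwidth with 30-fold arithmetic) would give a B/F as low as about 0.06, which is why algorithms suited to small B/F are listed as a topic.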

39.

Digital Twin/Society 5.0 Subgroup
Expected breakthroughs from the next-generation system
・Develop a data-driven parameterization model for weather simulation by integrating simulation and machine learning. By replacing part of the simulation with this model, we can increase the resolution and performance of the simulation to achieve global weather simulations over long periods, such as 100 to 1000 years, 100 to 1000 times faster than current simulations.
・Realize a real-time wind digital twin with 25 cm resolution for a 10 km square area in central Tokyo, using real-time assimilation of data from IoT devices.
Required computing environment/issues to consider
・The system must be able to run applications with different characteristics, such as weather modeling, machine learning, and data processing, with high performance. For heterogeneous systems, it is essential to increase the performance of the inter-system network.
・Data assimilation, which the digital twin requires, needs faster dense-matrix and eigenvalue computations.
・Fat nodes are preferred to reduce performance degradation from MPI communication.
・The ability to use AI will be essential in the future, and Python should be usable with a high degree of freedom. Virtualization technology should be used.
[Figure: accelerating weather simulation by integrating simulation and machine learning; comparison of simulation and ML/NN predictions of total air density, internal energy, and density of water vapor]