http://typhooncomputing.com/wp-content/uploads/2013/05/mountains.jpg

Next Generation Japanese Weather Model – Brought to the GPU using a Unified High Performance Codebase

Integrating the Physical Core of Japan’s next weather model on GPU was a challenge, especially since the codebase needed not only to be compatible with x86 CPU, but it needed to be as performant as the architecture specific codebase that was used before. After some experimentations the realization came that in order to achieve this feat for a complex codebase with deep callgraphs inside loops and over 25’000 lines of code, appropriate tools needed to be invented first. That’s how Hybrid Fortran came into existence. With the help of this framework, this large codebase was ported in only three months.

Performant, user- and maintenance friendly GPGPU and a large codebase ported in no time – Typhoon Computing’s Michel is your man for the job.

Prof. Takayuki Aoki,
Tokyo Institute of Technology

Execution Time Results

Lower is Better. Showing execution time results for the computationally most expensive radiation module (~ 80% execution time), using a grid size of 256 x 256 x 63 points.


The Hybrid Fortran framework is Open Source.

It worked out just as intended – the new unified codebase even performs slightly better on CPU than the original x86-only version. This is a significant achievement, since it allows a full integration of Japan’s weather prediction model on GPU clusters together with the already ported Dynamical Core (no memory transfer between CPU and GPU necessary anymore), while keeping a fully performant backwards compatible version for existing CPU clusters in a nicely maintainable, unified codebase.


Hybrid Fortran – The Most Maintainable and Easy to Use System for Unified CPU/GPU Codebases

You’d like your Fortran codebase go really fast, yet keep backwards compatibility with (multicore) CPU? With the Open Source solution Hybrid Fortran it’s as simple as

  1. wrapping your parallelizable code in @parallelRegion directives.
  2. letting the system know which of your arrays are defined in parallelizable data dimensions using @domainDependant directives.
  3. typing make cpu or make gpu in your favorite shell.

build_system

Hybrid Fortran creates separate code versions for CPU and GPU at compile time using a python based preprocessor. All this happens automatically when you call ‘make’ in your shell.