World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

Using Compiler Directives to Port Large Scientific Applications to GPUs: An Example from Atmospheric Science

    For many scientific applications, Graphics Processing Units (GPUs) can be an interesting alternative to conventional CPUs as they can deliver higher memory bandwidth and computing power. While it is conceivable to re-write the most execution time intensive parts using a low-level API for accelerator programming, it may not be feasible to do it for the entire application. But, having only selected parts of the application running on the GPU requires repetitively transferring data between the GPU and the host CPU, which may lead to a serious performance penalty. In this paper we assess the potential of compiler directives, based on the OpenACC standard, for porting large parts of code and thus achieving a full GPU implementation. As an illustrative and relevant example, we consider the climate and numerical weather prediction code COSMO (Consortium for Small Scale Modeling) and focus on the physical parametrizations, a part of the code which describes all physical processes not accounted for by the fundamental equations of atmospheric motion. We show, by porting three of the dominant parametrization schemes, the radiation, microphysics and turbulence parametrizations, that compiler directives are an efficient tool both in terms of final execution time as well as implementation effort. Compiler directives enable to port large sections of the existing code with minor modifications while still allowing for further optimizations for the most performance critical parts. With the example of the radiation parametrization, which contains the solution of a block tri-diagonal linear system, the required code modifications and key optimizations are discussed in detail. Performance tests for the three physical parametrizations show a speedup of between 3× and 7× for execution time obtained on a GPU and on a multi-core CPU of an equivalent generation.