Information

Users who would like early access to the 9304-node Cori Phase II system are required to fill out this form. In this application, they must show that their software is suitable for running on 2nd-generation Intel Xeon Phi Processor 7250 (Knights Landing). Below we discuss the reasons for establishing this gating procedure.

The Knights Landing architecture on Cori Phase II differs in a number of ways from previous platforms at NERSC such as Cori Phase I:

  1. More on-node parallelism: 68 cores/272 hardware threads on Knights Landing, vs. 32 cores/64 hardware threads on the Xeon Processor E5-2698 v3 (Haswell) on Cori Phase I
  2. Lower clock speeds and serial performance: 1.4 GHz per core on Knights Landing, vs. 2.3 GHz per core on Haswell
  3. Wider vector processing units: 512 bit-wide VPUs on Knights Landing, using AVX-512 instructions, vs. 256 bit-wide VPUs on Haswell, using AVX2 instructions
  4. Smaller caches: each core on Knights Landing has 64 KB of L1 cache (split evenly between instruction cache and data cache), and each pair of cores shares a 1 MB L2 cache. In contrast, each Haswell core has 64 KB of L1 cache and a private 256 KB L2 cache, and the 16 cores on each socket share 40 MB of L3 cache.
  5. Deeper memory hierarchy: Both Knights Landing and Haswell provide a large amount of DDR4 DRAM per compute node (96 GB and 128 GB, respectively), with peak bandwidths of around 123 GB/sec. In addition, Knights Landing provides 16 GB of high-bandwidth, multi-channel DRAM (MCDRAM). MCDRAM supports transfers at about 430 GB/sec.
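
The figures in the list above translate into a few back-of-the-envelope ratios. The sketch below only does arithmetic on the numbers quoted in the list; the choice of 64-bit (double-precision) elements and all variable names are assumptions made for illustration.

```python
# Per-node figures quoted in the list above.
knl = {"cores": 68, "threads": 272, "clock_ghz": 1.4, "vpu_bits": 512}
haswell = {"cores": 32, "threads": 64, "clock_ghz": 2.3, "vpu_bits": 256}

DOUBLE_BITS = 64  # assumption: double-precision elements


def lanes(arch):
    """Number of doubles that fit in one vector register."""
    return arch["vpu_bits"] // DOUBLE_BITS


def threads_per_core(arch):
    """Hardware threads available on each core."""
    return arch["threads"] // arch["cores"]


clock_ratio = knl["clock_ghz"] / haswell["clock_ghz"]
mcdram_vs_ddr = 430 / 123  # GB/sec figures from item 5

print(f"KNL: {lanes(knl)} doubles per vector, {threads_per_core(knl)} threads per core")
print(f"Haswell: {lanes(haswell)} doubles per vector, {threads_per_core(haswell)} threads per core")
print(f"KNL clock is {clock_ratio:.2f}x the Haswell clock")
print(f"MCDRAM bandwidth is about {mcdram_vs_ddr:.1f}x the DDR4 bandwidth")
```

Read together, these ratios suggest why unmodified code can behave differently on the two platforms: a serial, scalar code mostly sees the lower clock speed, while a threaded and vectorized code can draw on more threads per core and wider vectors.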

Applications that compile and run on Haswell will also compile and run on Knights Landing, but relative performance on the two platforms may differ significantly unless the code is modified to target the Xeon Phi features described above. This application thus serves two purposes:

  1. It helps NERSC staff identify which users' codes are suitably optimized for Xeon Phi
  2. It helps users identify optimization opportunities and provides guidance regarding how to identify and optimize hotspots in their applications so that they will perform optimally on Cori Phase II

In order to complete this form, you will be asked to run your application in several different ways on Knights Landing compute nodes. For this reason, all NERSC users will have access to the Cori Phase II debug queue, so that they can explore application performance and optimization before submitting this application. We have also drafted an example application for EMGeo, one of our Tier-1 NESAP codes.

Suggested Reading and Additional Information

For more information on profiling, please review the information on Performance and Debugging Tools. For information on how to employ and optimize Simultaneous Multithreading (SMT) in your application, please read our page on OpenMP parallelization. Our KNL optimization case studies may also serve as a good reference for what needs to be done to get a code ready for KNL.

For the vectorization comparison in task 9, you need to disable or reduce vectorization support. The table below lists the compile flags that achieve this for each NERSC-supported compiler:

Compiler | No vectorization | AVX2 | AVX-512
Intel | -no-vec -no-simd (and comment out OpenMP SIMD pragmas) | -xCORE-AVX2 | -xMIC-AVX512
GNU | -march=knl -fno-tree-vectorize -fno-tree-loop-vectorize -fno-tree-slp-vectorize | -march=knl -mavx2 | -march=knl
Cray | -h vector0 -h nopattern | -h cpu=haswell | -h cpu=mic-knl
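
When scripting the three builds for the vectorization comparison, it can help to keep the flags in one place. The following is a minimal sketch that simply stores the flags listed above in a lookup table; the dictionary layout, the level names (`novec`, `avx2`, `avx512`), and the helper `flags_for` are hypothetical and not part of any NERSC tool.

```python
# Flags copied from the list above. The keys and the helper name are
# illustrative only. For the Intel "novec" build, OpenMP SIMD pragmas
# must additionally be commented out in the source by hand.
VEC_FLAGS = {
    "intel": {
        "novec": "-no-vec -no-simd",
        "avx2": "-xCORE-AVX2",
        "avx512": "-xMIC-AVX512",
    },
    "gnu": {
        "novec": "-march=knl -fno-tree-vectorize -fno-tree-loop-vectorize "
                 "-fno-tree-slp-vectorize",
        "avx2": "-march=knl -mavx2",
        "avx512": "-march=knl",
    },
    "cray": {
        "novec": "-h vector0 -h nopattern",
        "avx2": "-h cpu=haswell",
        "avx512": "-h cpu=mic-knl",
    },
}


def flags_for(compiler, level):
    """Return the flag string for a compiler family and vectorization level."""
    try:
        return VEC_FLAGS[compiler.lower()][level]
    except KeyError:
        raise ValueError(f"unknown combination: {compiler!r}, {level!r}") from None


for level in ("novec", "avx2", "avx512"):
    print(f"gnu {level}: {flags_for('gnu', level)}")
```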

Application Form
General Information
Enter the name of your software/application here.
Please briefly describe the science case for your application (2000 words max).
Select the programming languages used in your application (multiple selections possible).
If an important language is not in the list, please list the additional programming languages used in your project, separated by commas.
Select the most important kernels or algorithms used in your application (multiple selections possible, only the major workhorses).
Application Performance

Paste your results from the thread scaling study for Haswell (Cori Phase I) and KNL (Cori Phase II) into the text field and click on "Plot Results" to draw a plot.

For more information, please consult our example page.
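
Before pasting thread-scaling results, it may be useful to check speedup and parallel efficiency yourself. The sketch below assumes you have recorded one wall-clock time per thread count; the function name, the input layout, and the timings are placeholders for illustration, not the format expected by the text field and not real measurements.

```python
def scaling_table(timings):
    """Compute (threads, speedup, efficiency) rows from wall-clock times.

    timings maps thread count -> wall time in seconds. Speedup and
    efficiency are measured relative to the smallest thread count present.
    """
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        speedup = base_time / timings[threads]
        # Ideal speedup grows linearly with the thread count.
        efficiency = speedup * base_threads / threads
        rows.append((threads, speedup, efficiency))
    return rows


# Placeholder timings, not measurements from any real system.
example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
for threads, speedup, efficiency in scaling_table(example):
    print(f"{threads:4d} threads: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

Running the same calculation on the Haswell and KNL timings side by side shows at which thread count each platform stops scaling.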

Paste your results from the MPI vs. thread scaling study for Haswell and KNL into this field and click on "Plot Results" to draw a plot.

For more information, please consult our example page.

Paste your results from the KNL memory mode study into this field and click on "Plot Results" to draw a plot.

For more information, please consult our example page.

Paste your results from the KNL vectorization experiment into this field and click on "Plot Results" to draw a plot.

For more information, please consult our example page.

Paste your results from the multi-node scaling experiment into this field and click on "Plot Results" to draw a plot. If you are planning to use the system mainly for capacity workloads, please provide a weak scaling study. If you plan to use it for capability workloads, please provide a strong scaling study. If you plan to do both, please provide both. In any case, please scale the problem up to as many nodes as you plan to use later, with a representative local problem size.

For more information, please consult our example page.
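
The two kinds of scaling study are evaluated differently: in a strong scaling study the total problem size is fixed, so the ideal run time falls in proportion to the node count, while in a weak scaling study the problem size per node is fixed, so the ideal run time stays constant. A minimal sketch of the corresponding efficiency calculations follows; the function names and the timings are assumptions for illustration only.

```python
def strong_scaling_efficiency(nodes, time, base_nodes, base_time):
    """Efficiency for a fixed total problem size.

    Ideal behaviour: time * nodes stays equal to base_time * base_nodes.
    """
    return (base_time * base_nodes) / (time * nodes)


def weak_scaling_efficiency(time, base_time):
    """Efficiency for a fixed problem size per node.

    Ideal behaviour: the run time does not change as nodes are added.
    """
    return base_time / time


# Placeholder timings in seconds, not measurements from any real system.
strong = strong_scaling_efficiency(nodes=8, time=15.0, base_nodes=1, base_time=100.0)
weak = weak_scaling_efficiency(time=110.0, base_time=100.0)
print(f"strong scaling efficiency on 8 nodes: {strong:.2f}")
print(f"weak scaling efficiency: {weak:.2f}")
```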

Here you can add additional comments which do not fit into the categories above or give us some feedback about the gating procedure.
I certify that I have the permission to submit this application on behalf of the selected repository. By submitting this application, I am aware that I will become the point of contact for revisions and all future applications for this repository.