Information

Users who would like early access to the 9304 KNL nodes on Cori are required to fill out this form.

The Knights Landing architecture on Cori differs in a number of ways from previous platforms at NERSC such as Cori Phase I:

  1. More on-node parallelism: 68 cores/272 hardware threads on Knights Landing, vs. 32 cores/64 hardware threads on the Xeon Processor E5-2698 v3 (Haswell) on Cori Phase I
  2. Lower clock speeds and serial performance: 1.4 GHz per core on Knights Landing, vs. 2.3 GHz per core on Haswell
  3. Wider vector processing units: 512 bit-wide VPUs on Knights Landing, using AVX-512 instructions, vs. 256 bit-wide VPUs on Haswell, using AVX2 instructions
  4. Smaller caches: each core on Knights Landing has 64 KB of L1 cache (split evenly between instruction cache and data cache), and each pair of cores shares a 1 MB L2 cache. In contrast, each Haswell core has 64 KB of L1 cache and a private 256 KB L2 cache, and all 16 cores on each socket share 40 MB of L3 cache.
  5. Deeper memory hierarchy: Both Knights Landing and Haswell provide a large amount of DDR4 DRAM per compute node (96 GB and 128 GB, respectively), with peak bandwidths of around 123 GB/sec. In addition, Knights Landing provides 16 GB of high-bandwidth, multi-channel DRAM (MCDRAM). MCDRAM supports transfers at about 430 GB/sec.
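The figures in the list above translate into a few ratios that can help when estimating how a code might behave on the new nodes. The following Python sketch only restates the numbers quoted above; the variable names and the dictionary layout are illustrative choices, not part of any NERSC tool.

```python
# Figures copied from the comparison list above (per compute node).
KNL = {"cores": 68, "hw_threads": 272, "clock_ghz": 1.4, "vpu_bits": 512}
HASWELL = {"cores": 32, "hw_threads": 64, "clock_ghz": 2.3, "vpu_bits": 256}
DDR_PEAK_GBS = 123     # approximate DDR4 peak bandwidth quoted above
MCDRAM_PEAK_GBS = 430  # approximate MCDRAM transfer rate quoted above


def doubles_per_vector(vpu_bits):
    """Number of 64-bit double-precision values that fit in one vector."""
    return vpu_bits // 64


ratios = {
    "KNL/Haswell cores": KNL["cores"] / HASWELL["cores"],
    "KNL/Haswell hardware threads": KNL["hw_threads"] / HASWELL["hw_threads"],
    "KNL/Haswell clock speed": KNL["clock_ghz"] / HASWELL["clock_ghz"],
    "KNL/Haswell doubles per vector": (
        doubles_per_vector(KNL["vpu_bits"])
        / doubles_per_vector(HASWELL["vpu_bits"])
    ),
    "MCDRAM/DDR4 bandwidth": MCDRAM_PEAK_GBS / DDR_PEAK_GBS,
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

With these numbers a KNL node exposes 4.25 times as many hardware threads as a Haswell node at roughly 0.6 times the clock speed, so codes that rely on serial performance are the ones most likely to need attention.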

The purpose of this application is to:

  1. Help NERSC understand how our workload uses the Xeon Phi features described above.
  2. Help users identify optimization opportunities, and provide guidance on how to find and optimize hotspots in their applications so that they perform well on Cori Phase II.

In order to complete this form, you will be asked to run your application (using the debug queue) in several different ways on Knights Landing compute nodes. Detailed instructions and example output can be found here.

For more information on profiling, please review the information on Performance and Debugging Tools. For information on how to employ and optimize Simultaneous Multithreading (SMT) in your application, please read our page on OpenMP parallelization. Our KNL optimization case studies might also serve as a good reference for what needs to be done to get an application ready for KNL.

Application Form
General Information
Enter the name of your software/application here.
Please briefly describe the science case for your application (2000 words max).
Select the programming languages used in your application (multiple selections possible).
If an important language is not in the list, please list the additional programming languages used in your project, separated by commas.
Select the most important kernels or algorithms used in your application (multiple selections possible, only the major workhorses).
Application Performance

Instructions and Example.

Paste your results from the thread scaling study for Haswell (Cori Phase I) and KNL (Cori Phase II) into the text field and click on "Plot Results" to draw a plot.
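Before pasting results, it can help to summarize a thread scaling run as speedup and parallel efficiency. The Python sketch below shows one way to do that. The input layout (a mapping from thread count to wall-clock seconds) and the sample timings are placeholders invented for illustration; they are not measurements, and the format the form actually expects is the one described in the linked instructions.

```python
def scaling_summary(timings):
    """Return (threads, speedup, efficiency) rows for a thread scaling run.

    timings maps a thread count to the measured wall-clock time in seconds.
    Speedup and efficiency are relative to the smallest thread count given.
    """
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        speedup = base_time / timings[threads]
        # Ideal speedup equals threads / base_threads; efficiency is the
        # fraction of that ideal which was actually achieved.
        efficiency = speedup * base_threads / threads
        rows.append((threads, speedup, efficiency))
    return rows


# Placeholder timings, NOT real measurements.
sample = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

for threads, speedup, efficiency in scaling_summary(sample):
    print(f"{threads:4d} threads  speedup {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

An efficiency that drops sharply as threads are added usually points to serial sections, load imbalance, or memory bandwidth limits worth profiling.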

Instructions and Example.

Paste your results from the MPI vs. thread scaling study for Haswell and KNL into this field and click on "Plot Results" to draw a plot.
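One way to set up such a study is to keep the number of cores in use fixed and trade MPI ranks against threads per rank. The Python sketch below enumerates the combinations that exactly fill a given core count, using the 68 KNL cores and 32 Haswell cores quoted in the information section. Whether the linked instructions prescribe these particular combinations is not stated here, so treat the function and its name as an illustration only.

```python
def rank_thread_combinations(cores, threads_per_core=1):
    """List (mpi_ranks, threads_per_rank) pairs that use every core.

    Only rank counts that divide the core count evenly are returned, so
    each rank owns the same number of cores. threads_per_core scales the
    thread count when hardware threads are used as well.
    """
    combos = []
    for ranks in range(1, cores + 1):
        if cores % ranks == 0:
            combos.append((ranks, (cores // ranks) * threads_per_core))
    return combos


for label, cores in (("Haswell", 32), ("KNL", 68)):
    print(label)
    for ranks, threads in rank_thread_combinations(cores):
        print(f"  {ranks:3d} MPI ranks x {threads:3d} threads per rank")
```

Because 68 has few divisors, the KNL list is short; running on a subset of the cores gives more balanced options, at the cost of leaving some cores idle.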

Instructions and Example.

Paste your results from the KNL memory mode study into this field and click on "Plot Results" to draw a plot.

Instructions and Example.

Use the following compiler options to disable or reduce vectorization support. The table below lists the necessary compile flags for each of the NERSC-supported compilers:

| Compiler | No vectorization | AVX-2 | AVX-512 |
|---|---|---|---|
| Intel | -no-vec -no-simd (and comment out OpenMP SIMD pragmas) | -xCORE-AVX2 | -xMIC-AVX512 |
| GNU | -march=knl -fno-tree-vectorize -fno-tree-loop-vectorize -fno-tree-slp-vectorize | -march=knl -mavx2 | -march=knl |
| Cray | -h vector0 -h nopattern | -h cpu=haswell | -h cpu=mic-knl |
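To build the three variants needed for the experiment, the flags listed above can be kept in one place and combined into compile commands. The Python sketch below does only that: it assembles command strings and runs nothing. The wrapper name `cc`, the source file `app.c`, and the output naming scheme are hypothetical placeholders; substitute your own build setup.

```python
# Flags copied from the table above, keyed by compiler and vectorization level.
VECTORIZATION_FLAGS = {
    "Intel": {
        # For Intel, OpenMP SIMD pragmas must also be commented out by hand.
        "none": "-no-vec -no-simd",
        "avx2": "-xCORE-AVX2",
        "avx512": "-xMIC-AVX512",
    },
    "GNU": {
        "none": ("-march=knl -fno-tree-vectorize "
                 "-fno-tree-loop-vectorize -fno-tree-slp-vectorize"),
        "avx2": "-march=knl -mavx2",
        "avx512": "-march=knl",
    },
    "Cray": {
        "none": "-h vector0 -h nopattern",
        "avx2": "-h cpu=haswell",
        "avx512": "-h cpu=mic-knl",
    },
}


def compile_command(compiler, level, source="app.c", wrapper="cc"):
    """Build a compile command string (wrapper and file names are placeholders)."""
    flags = VECTORIZATION_FLAGS[compiler][level]
    return f"{wrapper} {flags} -o app_{level} {source}"


for level in ("none", "avx2", "avx512"):
    print(compile_command("Intel", level))
```

Keeping one executable per vectorization level makes it straightforward to time all three under otherwise identical run conditions.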

Paste your results from the KNL vectorization experiment into this field and click on "Plot Results" to draw a plot.

Instructions and Example.

Paste your results from the multi-node scaling experiment into this field and click on "Plot Results" to draw a plot. If you are planning to use the system mainly for capacity workloads, please provide a weak scaling study. If you plan to use it for capability workloads, please provide a strong scaling study. If you plan to do both, please provide both. In any case, please scale the problem up to as many nodes as you expect to use later on, with a representative local problem size.
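The two kinds of study are judged against different ideals: in a strong scaling study the total problem size is fixed, so the run time should fall in proportion to the node count, while in a weak scaling study the problem size per node is fixed, so the run time should stay constant. The Python sketch below expresses the usual efficiency definitions for both cases. The function names are illustrative and the example numbers are placeholders, not measurements.

```python
def strong_scaling_efficiency(base_nodes, base_time, nodes, time):
    """Efficiency for a fixed total problem size.

    The ideal run time on `nodes` nodes is base_time * base_nodes / nodes.
    """
    return (base_time * base_nodes) / (time * nodes)


def weak_scaling_efficiency(base_time, time):
    """Efficiency for a fixed problem size per node.

    The ideal run time is unchanged as nodes are added.
    """
    return base_time / time


# Placeholder numbers, NOT real measurements.
print(f"strong: {strong_scaling_efficiency(1, 100.0, 4, 50.0):.2f}")
print(f"weak:   {weak_scaling_efficiency(100.0, 125.0):.2f}")
```

In both cases a value close to 1 at the largest node count you intend to use suggests the application is ready to run at that scale.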

Here you can add additional comments which do not fit into the categories above or give us some feedback about the gating procedure.
I certify that I have the permission to submit this application on behalf of the selected repository. By submitting this application, I am aware that I will become the point of contact for revisions and all future applications for this repository.