High Performance Linpack with CPU

Compile HPL

Copy the template Makefile:

cp setup/Make.Linux_Intel64 Make.Linux_Intel64

Edit the Makefile and change the following lines:

TOPdir = <hpl-2.3 top-level directory>

MPdir = <Open MPI installation directory>
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib -lmpi

LAdir = <OpenBLAS installation directory>
LAinc = -I$(LAdir)/include
LAlib = $(LAdir)/lib/libopenblas.a

CC = mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -O3 -w -z noexecstack -z relro -z now -Wall # adjust these flags for your CPU and compiler

LINKFLAGS = $(CCFLAGS) $(OMP_DEFS)

Compile:
```
make arch=Linux_Intel64
```
To clean the build:
```
make clean arch=Linux_Intel64
```

Run HPL

Edit the file bin/Linux_Intel64/HPL.dat inside the top folder.
Here is an example for a machine with 8 GB of RAM and a 4-core CPU:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any) 
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
29184        Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB
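
The values that matter most sit at fixed positions near the top of the file. As a minimal sketch (not part of HPL), the hypothetical helper `read_hpl_dat` below pulls N, NB and the P x Q grid out of the first twelve lines and estimates the matrix footprint. It assumes the standard line order shown above and 8 bytes per double-precision element; `SAMPLE` is just the start of the example file.

```python
SAMPLE = """\
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
29184        Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
"""


def read_hpl_dat(text):
    """Return (Ns, NBs, grids) parsed from the text of an HPL.dat file."""
    lines = text.splitlines()

    def ints(line_no, count):
        # Values come first on each line; the rest of the line is a comment.
        return [int(tok) for tok in lines[line_no - 1].split()[:count]]

    ns = ints(6, ints(5, 1)[0])
    nbs = ints(8, ints(7, 1)[0])
    grid_count = ints(10, 1)[0]
    grids = list(zip(ints(11, grid_count), ints(12, grid_count)))
    return ns, nbs, grids


def matrix_gib(n):
    """Memory for an n x n matrix of 8-byte doubles, in GiB."""
    return n * n * 8 / 2**30


ns, nbs, grids = read_hpl_dat(SAMPLE)
for n in ns:
    print(f"N={n}: about {matrix_gib(n):.2f} GiB for the matrix")
print("NBs:", nbs, "grids (P, Q):", grids)
```

For the example file the matrix alone takes roughly 6.3 GiB, which leaves some headroom for the operating system on an 8 GB machine.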

To tune the parameters, you can refer to the website here. The result is not guaranteed to be optimal for your machine, so experiment with the parameters yourself.

The parameters you will most likely need to tune are:
- Ps * Qs: the process grid; the product should equal the number of cores
- Ns: the problem size (the order of the matrix), limited by available RAM
- NBs: the block size
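
A common rule of thumb, offered here as a starting point rather than a guarantee, is to let the matrix fill about 80% of RAM and round N down to a multiple of NB. The sketch below applies that rule; the function name `suggest_n` and the 0.8 default are assumptions for illustration. With 8 GiB and NB = 192 it reproduces the 29184 used in the example file.

```python
import math


def suggest_n(mem_bytes, nb, fraction=0.8):
    """Largest multiple of nb whose n x n matrix of doubles fits in fraction * mem_bytes."""
    elements = int(mem_bytes * fraction) // 8   # 8 bytes per double
    n = math.isqrt(elements)                    # largest n with n * n <= elements
    return (n // nb) * nb                       # round down to a multiple of nb


print(suggest_n(8 * 2**30, 192))  # 29184
```

Lowering `fraction` is a reasonable first step if the run starts swapping.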

Run the benchmark from the bin/Linux_Intel64 directory:
```
mpirun -np <number of cores> ./xhpl
```
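
HPL needs at least P x Q processes, so the value given to `-np` should agree with Ps and Qs in HPL.dat. As a rough sketch, the hypothetical helper `pick_grid` below factors a process count into the most nearly square grid with P <= Q, a shape that is often suggested as a first try. Whether it is the fastest layout on a given machine still has to be measured.

```python
import math


def pick_grid(num_procs):
    """Return (P, Q) with P * Q == num_procs, P <= Q, and P as large as possible."""
    for p in range(math.isqrt(num_procs), 0, -1):
        if num_procs % p == 0:
            return p, num_procs // p
    raise ValueError("num_procs must be a positive integer")


for cores in (4, 6, 8, 12):
    p, q = pick_grid(cores)
    print(f"{cores} cores -> Ps={p}, Qs={q}")
```

For the 4-core example this gives Ps=2 and Qs=2, matching the HPL.dat shown earlier.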
