High Performance Linpack with CPU
- Compile HPL
- Copy the template Makefile:
cp setup/Make.Linux_Intel64 Make.Linux_Intel64
- Edit the Makefile and change the following lines:
TOPdir = <hpl-2.3 top folder directory>
MPdir = <openmpi file directory>
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib -lmpi
LAdir = <openblas file directory>
LAinc = -I$(LAdir)/include
LAlib = $(LAdir)/lib/libopenblas.a
CC = mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -O3 -w -z noexecstack -z relro -z now -Wall # modify this according to the CPU
LINKFLAGS = $(CCFLAGS) $(OMP_DEFS)
- Compile
make arch=Linux_Intel64
- If you want to clean:
make clean arch=Linux_Intel64
- Run HPL
- Edit the file bin/Linux_Intel64/HPL.dat inside the top folder.
Here is an example for a machine with 8 GB of RAM and a 4-core CPU:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
29184        Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0            Number of additional problem sizes for PTRANS
1200 10000 30000   values of N
0            number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64   values of NB
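The Ns value above follows from available memory: HPL factors a single N x N matrix of 8-byte doubles, and a common rule of thumb is to use roughly 80% of RAM and round N down to a multiple of NB. A minimal sketch of that calculation (the 80% fraction is a heuristic, not part of HPL itself):

```python
import math

def problem_size(ram_bytes: int, nb: int, fraction: float = 0.8) -> int:
    """Largest N, a multiple of nb, such that an N*N matrix of
    8-byte doubles fits in `fraction` of the given RAM."""
    n_max = math.isqrt(int(ram_bytes * fraction) // 8)  # doubles are 8 bytes
    return (n_max // nb) * nb

# 8 GiB of RAM and NB = 192, as in the HPL.dat above
print(problem_size(8 * 1024**3, 192))  # -> 29184
```

This reproduces the 29184 used in the example configuration.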
- To tune the parameters, refer to the HPL tuning documentation. The configuration above is not guaranteed to be optimal, so experiment yourself. The parameters you will most likely need to tune are:
- Ps, Qs: the process grid; P x Q must equal the number of MPI processes (usually the number of cores)
- Ns: the problem size; larger values give higher performance until the matrix no longer fits in memory
- NBs: the block size; values in the 96-256 range are common starting points
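For the process grid, HPL generally performs best when P x Q is as close to square as possible, with P <= Q. A small helper that picks such a grid from a core count (a sketch of this common heuristic, not an HPL API):

```python
def process_grid(cores: int) -> tuple[int, int]:
    """Pick the most nearly square factorization P * Q == cores, P <= Q."""
    p = int(cores ** 0.5)
    while cores % p:  # walk down to the nearest divisor
        p -= 1
    return p, cores // p

print(process_grid(4))   # -> (2, 2), as in the HPL.dat example
print(process_grid(12))  # -> (3, 4)
```

For prime core counts this degenerates to a 1 x N grid, which is usually slow; in that case consider using fewer cores.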
- Run the benchmark from inside bin/Linux_Intel64:
mpirun -np <number of cores> ./xhpl
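HPL prints one result line per run with columns T/V, N, NB, P, Q, Time, and Gflops. A quick way to pull the Gflops figure out of a captured output is a sketch like the following (the sample line is illustrative; exact spacing varies between runs and versions):

```python
import re

def parse_gflops(hpl_output: str) -> float:
    """Extract the Gflops column from an HPL result line
    (columns: T/V  N  NB  P  Q  Time  Gflops)."""
    pattern = re.compile(
        r"^\s*W[RC]\S+\s+\d+\s+\d+\s+\d+\s+\d+\s+[\d.]+\s+([\d.eE+-]+)\s*$",
        re.MULTILINE,
    )
    match = pattern.search(hpl_output)
    if not match:
        raise ValueError("no HPL result line found")
    return float(match.group(1))

sample = "WR11C2R4       29184   192     2     2     123.45     1.3860e+02"
print(parse_gflops(sample))  # -> 138.6
```

Also check that the residual checks near the end of the output report PASSED before trusting the Gflops number.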