# NPB-CPP-benchmark

**Repository Path**: went_forward/npb-cpp-benchmark

## Basic Information

- **Project Name**: NPB-CPP-benchmark
- **Description**: npb-cpp-benchmark
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-05-02
- **Last Updated**: 2024-09-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures

The NPB's Fortran codes were carefully ported to **C++** and are fully compliant with the **NPB3.4.1** version ([NPB official webpage](https://www.nas.nasa.gov/publications/npb.html)). Our [paper](https://doi.org/10.1016/j.future.2021.07.021) describes in detail how the porting was conducted and discusses the performance we obtained with **NPB-CPP** on different machines (Intel Xeon, AMD Epyc, and IBM Power8) and compilers (GCC, ICC, and Clang). The results show that **NPB-CPP** achieves performance similar to the original **NPB**. **You can use our paper, along with the official reports, as a guide to assess performance using the NPB suite**.

## How to cite this work
  
[[DOI]](https://doi.org/10.1016/j.future.2021.07.021) J. Löff, D. Griebler, G. Mencagli et al., **The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures**, *Future Generation Computer Systems (FGCS)* (2021)


*This repository provides parallel codes for the NAS Parallel Benchmarks (NPB) written with different C++ parallel programming APIs. You can also contribute to this project by opening issues and pull requests.*

The conventions we used in our porting can be found [here](notes-conventions.md).


    ===================================================================
      NAS Parallel Benchmarks in C++ using OpenMP, FastFlow and Intel TBB

        This project was conducted in the Parallel Applications
        Modelling Group (GMAP) at PUCRS - Brazil.

        GMAP Research Group leader:
            Luiz Gustavo Leão Fernandes

        Code contributors: 
            Dalvan Griebler (PUCRS)
            Gabriell Araujo (PUCRS)
            Júnior Löff (PUCRS)

      In case of questions or problems, please send an e-mail to us:	
        dalvan.griebler@acad.pucrs.br
        gabriell.araujo@edu.pucrs.br			
        junior.loff@edu.pucrs.br				

      We would like to thank the following researchers for the 
      fruitful discussions:
          Gabriele Mencagli	(UNIPI)
          Massimo Torquati	(UNIPI)
          Marco Danelutto (UNIPI)
    ===================================================================


### Folders inside the project:

**NPB-SER** - This directory contains the sequential version.

**NPB-OMP** - This directory contains the parallel version implemented with OpenMP (based on the original NPB version).

**NPB-TBB** - This directory contains the parallel version implemented with Threading Building Blocks.

**NPB-FF** - This directory contains the parallel version implemented with FastFlow.

# The Five Kernels and Three Pseudo-applications

Each directory is independent and contains its own implemented version of the kernels and pseudo-applications:

## Kernels

	EP - Embarrassingly Parallel, floating-point operation capacity
	MG - Multi-Grid, non-local memory accesses, short- and long-distance communication
	CG - Conjugate Gradient, irregular memory accesses and communication
	FT - discrete 3D fast Fourier Transform, intensive long-distance communication
	IS - Integer Sort, integer computation and communication

## Pseudo-applications

	BT - Block Tri-diagonal solver
	SP - Scalar Penta-diagonal solver
	LU - Lower-Upper Gauss-Seidel solver

*Tip: The pseudo-applications' performance is bounded by the sequential partial differential equation (PDE) solver.*

# Software Requirements

*Warning: our tests were run with GCC-9 and ICC-19.*

# How to Compile 

Enter the directory of the desired version and execute:

`$ make _BENCHMARK CLASS=_WORKLOAD`

_BENCHMARKs are: 
		
	EP, CG, MG, IS, FT, BT, SP and LU 
																										
_WORKLOADs are: 
	
	Class S: small for quick test purposes
	Class W: workstation size (a 1990s workstation; now likely too small)
	Classes A, B, C: standard test problems; ~4X size increase going from one class to the next	
	Classes D, E, F: large test problems; ~16X size increase from each of the previous Classes  
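The growth factors above compound from one class to the next. As a rough illustration only, the sketch below multiplies them out to estimate how large each class is relative to Class A. The function name `relative_size` is hypothetical and not part of the repository, and the numbers are only as precise as the approximate ~4X and ~16X factors.

```shell
# relative_size is a hypothetical helper, not part of the repository.
# It prints the approximate problem size of a class relative to Class A,
# using the ~4X (B, C) and ~16X (D, E, F) growth factors listed above.
relative_size() {
  local target="$1" size=1 class
  for class in A B C D E F; do
    case "$class" in
      B|C)   size=$((size * 4)) ;;
      D|E|F) size=$((size * 16)) ;;
    esac
    if [ "$class" = "$target" ]; then
      echo "$size"
      return 0
    fi
  done
  return 1  # unknown class (S and W are not covered by these factors)
}

# Usage:
#   relative_size C   # approximate size of Class C relative to Class A
```

Under these assumptions, Class C is about 16 times Class A and Class D about 256 times, which is why the larger classes need far more memory and run time.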


Command example:

`$ make ep CLASS=A`
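To build all eight benchmarks for one class, the command above can be repeated in a loop. The following is a minimal sketch, assuming it is run from inside a version directory such as `NPB-OMP` and that the lowercase target names work as in the example above. The function name `build_all` and the `MAKE` override are illustrative choices, not part of the repository.

```shell
# build_all is a hypothetical helper, not part of the repository.
# It builds every benchmark for one workload class and stops at the first failure.
build_all() {
  local class="${1:-S}"           # workload class; defaults to the small test class
  local make_cmd="${MAKE:-make}"  # set MAKE to use a different make binary
  local bench
  for bench in ep cg mg is ft bt sp lu; do
    "$make_cmd" "$bench" "CLASS=$class" || return 1
  done
}

# Usage, from inside a version directory:
#   build_all A
```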

# How to Execute

Binaries are generated inside the `bin` folder.

Command example:
	
`$ ./bin/ep.A`

# Compiler and Parallel Configurations

Each folder contains a default compiler configuration in the `config/make.def` file.
Edit this file to change the target compiler, flags, or linked libraries used to compile the applications.

## Parallel Execution

### Configuring the parallel programming frameworks

The repository already has an additional directory `libs` with the FastFlow and Intel TBB libraries.

For TBB, you need to compile the library and load its environment variables. To do so, enter `libs/tbb-2020.1` and execute the following command:

`$ make`

This command generates a folder inside `libs/tbb-2020.1/build` whose name depends on your system (compiler, libc, and kernel versions). Finally, load the TBB environment variables with the `tbbvars.sh` script in that folder, for example by executing the following command in your terminal:

`$ source libs/tbb-2020.1/build/linux_intel64_gcc_cc7.5.0_libc2.27_kernel4.15.0_release/tbbvars.sh`
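Because the build folder name in the path above encodes compiler, libc, and kernel versions, it will likely differ on your machine. The sketch below locates the script instead of hardcoding the folder name. It assumes the build produced a folder whose name ends in `_release`, as in the example path. The function name `find_tbbvars` is hypothetical and not part of the repository.

```shell
# find_tbbvars is a hypothetical helper, not part of the repository.
# It prints the path of the first tbbvars.sh found in a *_release build folder,
# or nothing if no such script exists.
find_tbbvars() {
  local build_dir="${1:-libs/tbb-2020.1/build}"
  find "$build_dir" -type f -path '*_release/tbbvars.sh' 2>/dev/null | head -n 1
}

# Usage, from the repository root:
#   script="$(find_tbbvars)"
#   [ -n "$script" ] && source "$script"
```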

### Setting the degree of parallelism (NUM_THREADS)

The degree of parallelism is set with the `<RUNTIME>_NUM_THREADS` environment variable, where `<RUNTIME>` is `OMP`, `TBB`, or `FF`.

Command example:
		
`$ export OMP_NUM_THREADS=32`
or, for the TBB and FastFlow versions, use `TBB_NUM_THREADS` and `FF_NUM_THREADS` respectively.
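
To compare several degrees of parallelism, setting the variable and running the binary can be wrapped in a loop. This is a minimal sketch, assuming the benchmark has already been built. The function name `thread_sweep` and the `NUM_THREADS_VAR` selector are illustrative and not part of the repository.

```shell
# thread_sweep is a hypothetical helper, not part of the repository.
# It runs one benchmark binary once per thread count.
# NUM_THREADS_VAR selects the variable to set (default: OMP_NUM_THREADS).
thread_sweep() {
  local binary="$1"
  shift
  local var="${NUM_THREADS_VAR:-OMP_NUM_THREADS}"
  local threads
  for threads in "$@"; do
    echo "== $var=$threads =="
    # env sets the variable only for this single run
    env "$var=$threads" "$binary" || return 1
  done
}

# Usage, from inside NPB-OMP after building ep for class A:
#   thread_sweep ./bin/ep.A 1 2 4 8
# For the TBB version:
#   NUM_THREADS_VAR=TBB_NUM_THREADS thread_sweep ./bin/ep.A 1 2 4 8
```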