HINT is a universal benchmark. It has been ported to numerous platforms,
almost all of which are different kinds of computers. In the following
guide, we describe how to port and use HINT for a specific target machine.
The simplest way is to use an executable,
if one is available. If none is available, or if you want the best
performance results, you may need to port the code to the target platform.
In about 90% of cases, setting the appropriate Makefile options is
sufficient for porting HINT. If the code needs to be changed, we recommend
starting with the version closest to your machine and then modifying it.
For best results, we recommend tuning with the available compiler
optimization flags and the adjustable defines in the code.
Remember we use C here!
In this document, we refer to the ANSI C version of HINT. Experiments
with HINT have revealed that the programming language used for the kernel
code (C or Fortran) has no effect on the performance numbers. C has an
added advantage over Fortran 77 in that it allows us to use the malloc()
and free() library calls to dynamically increase the size of the
problem. C code is also often easier to port to many platforms. We do have
a Fortran version of HINT, which we discuss briefly in a later section.
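The use of malloc() and free() mentioned above can be pictured with a small sketch. It is not taken from the HINT sources: the helper names try_workload() and grow_until(), the doubling step, and the dummy fill/sum work are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n values, touch them, and release them.
 * Returns 1 on success and 0 if malloc() could not provide the memory. */
int try_workload(size_t n)
{
    double *work;
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    work = malloc(n * sizeof *work);
    if (work == NULL)
        return 0;
    for (i = 0; i < n; i++)
        work[i] = 1.0;              /* stand-in for real work */
    for (i = 0; i < n; i++)
        sum += work[i];
    free(work);                     /* give the memory back before growing */
    return sum == (double)n;
}

/* Grow the problem (here by doubling, an assumption) until the limit is
 * passed or an allocation fails; return the largest size that worked. */
size_t grow_until(size_t limit)
{
    size_t n = 1, best = 0;

    while (n <= limit && try_workload(n)) {
        best = n;
        n *= 2;
    }
    return best;
}
```

Calling grow_until(1024) walks through the sizes 1, 2, 4, and so on up to 1024, allocating and freeing a fresh block for each one. Fortran 77 has no standard equivalent of this allocate, use, and release cycle.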
HINT Source Code and functions
HINT source code is organized into five files: Makefile, hint.h,
typedefs.h, hint.c, and hkernel.c.
Makefile is the project maintenance
file. It contains the machine types, preprocessor
directives, and the compiler optimization flags for compilation and linking.
hint.h is the header file which contains
the essential #include directives, adjustable
and non-adjustable defines, macros, type declarations, and
function prototypes. We discuss the adjustable defines in more detail in a
later section.
typedefs.h is the header file which
contains all the type definitions. In particular, it determines the
appropriate type of DSIZE (data type of computation) and ISIZE (data type
of indexing), depending upon the preprocessor directives typically
specified in the Makefile.
hint.c contains the main driver code.
It has three main functions:
main(...) : Depending upon the index and computation data
types, this function is responsible for determining the increasing workload
and invoking Run() for each workload.
double Run(...) : This function is responsible for allocating and
deallocating the workload. It is also responsible for timing and validating
the results. Usually, we comment out the validation code during actual
performance measurement to avoid any overhead.
double When() : This function returns the wall clock value.
hkernel.c contains the kernel code. It
has only one function: DSIZE Hint(...), which calculates the
first-order hierarchical integration of a monotonically decreasing function.
The refinement and accuracy of the result depend upon the
number of subintervals. The larger the number of subintervals, the more
accurate the results will be. However, more subintervals require more
calculations and hence take longer to compute.
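To see why more subintervals give a more accurate result for a decreasing function, consider the following sketch. It is not the code in hkernel.c: it uses a plain uniform subdivision rather than hierarchical refinement, and the integrand (1 - x)/(1 + x) on [0, 1] and the name bound_integral() are assumptions chosen for illustration.

```c
#include <assert.h>

/* Example integrand, decreasing on [0, 1]. Assumed for illustration. */
static double f(double x)
{
    return (1.0 - x) / (1.0 + x);
}

/* Bound the integral of f over [0, 1] using n equal subintervals.
 * Because f decreases, its value at the left end of a subinterval is the
 * largest on that subinterval and its value at the right end is the
 * smallest, so the two sums bracket the true area. */
void bound_integral(long n, double *lower, double *upper)
{
    double h, lo = 0.0, hi = 0.0;
    long i;

    assert(n > 0);
    h = 1.0 / (double)n;
    for (i = 0; i < n; i++) {
        hi += f((double)i * h) * h;         /* left endpoint  */
        lo += f((double)(i + 1) * h) * h;   /* right endpoint */
    }
    *lower = lo;
    *upper = hi;
}
```

For this integrand the gap between the two bounds is (f(0) - f(1)) / n = 1/n, so doubling the number of subintervals halves the uncertainty while doubling the work.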
In the following section, we describe what you need to do to port HINT
to any desired machine.
You must first choose the data type for computation. For the double, int,
long, long long, or float data type, the preprocessor define (see
typedefs.h) is DOUBLE, INT, LONG, LONGLONG, or FLOAT, respectively.
You must also choose the data type for indexing. For the int or long data
type, the preprocessor define (see typedefs.h) is IINT or ILONG.
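The selection described above could be arranged along the following lines. This is only a sketch of the idea, not the contents of typedefs.h: the fallback branches and the report_types() helper are assumptions added so that the fragment compiles on its own when no define is given.

```c
#include <assert.h>
#include <stdio.h>

/* Computation type, chosen by a define such as -DDOUBLE on the compile
 * line. Hypothetical layout; the real typedefs.h may differ. */
#if defined(DOUBLE)
typedef double DSIZE;
#elif defined(FLOAT)
typedef float DSIZE;
#elif defined(LONGLONG)
typedef long long DSIZE;
#elif defined(LONG)
typedef long DSIZE;
#elif defined(INT)
typedef int DSIZE;
#else
typedef double DSIZE;       /* assumed fallback, for this sketch only */
#endif

/* Index type, chosen by IINT or ILONG. */
#if defined(ILONG)
typedef long ISIZE;
#else
typedef int ISIZE;          /* IINT, and the assumed fallback */
#endif

/* Print the chosen widths so that a build can be checked at a glance. */
void report_types(void)
{
    assert(sizeof(ISIZE) >= sizeof(int));
    printf("DSIZE: %lu bytes, ISIZE: %lu bytes\n",
           (unsigned long)sizeof(DSIZE), (unsigned long)sizeof(ISIZE));
}
```

Keeping the choice in one header means the driver and kernel can be written once against DSIZE and ISIZE and rebuilt for each data type by changing only the compile-line defines.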
Make a directory named hint and change to it.
Download the HINT code to the current directory from the following URL
Check the following in the Makefile and set if required:
Set ARCH to unix.
Set CC to your favorite compiler. Usually it is cc or gcc.
Check the KERN_CFLAGS or DRIV_CFLAGS. Use the standard optimization level
that gives the best speed. For example, for SGI cc it is -Ofast, for Sun
it is -xO4, and for GNU gcc it is -O3.
Note that with this Makefile, five executables are produced depending upon
the data types.
Type make at the prompt. If there are no errors, executable
files named DOUBLE, SHORT, INT, LONGLONG,
and FLOAT will be created in your directory.
Create a directory called data. All the data files will be
produced inside this.
Run hint executables. Use either run.pl
to run a batch job.
(PowerPC/G3) : The CodeWarrior project and source code have been provided.
It is quite easy to compile using the project file on a Macintosh.
If you wish to use some other compiler, we recommend starting with
a new project file and then adding the HINT source code. Please check
the following URL:
: A Visual C++ 5.0 version of HINT has been provided. First,
open the HINT workspace, set the compiler optimization options, and
rebuild HINT. For any other compiler on Windows NT, Windows 95, or DOS, we
recommend starting with a new project file and then adding the Visual
C++ source code. Please check the following URL:
Version - The Java version of HINT has been contributed by Mark Millard.
Download the following file: jHINT.
Carefully read the instructions in README.java.
Note that this version currently performs poorly compared to the C
version due to the lack of a good Java environment that produces
Version - The HINT kernel has been written in Fortran. You must
link the given C driver code with the given Fortran kernel code following
your machine's conventions (please refer to the compiler user manual for
how to link C and Fortran code together). Compile this program in a
manner similar to the serial version described above.
For porting a vector version, remember the following. You can
find details for most of these points in the vector computer's user manual.
A number of paradigms exist in contemporary parallel computing.
Please follow the guidelines below for each version.
The data types used for computation should be vectorizable.
Remember to switch on the compiler switch that reports on vectorization.
Most of the inner loop of hkernel.c should be vectorizable.
A reduction operator (for summation) is used in hkernel.c.
Make sure that you use the right #pragma directive to indicate this.
In the worst case, the summation can be done serially.
It is important to set the vector length to the optimal size (it must
be a multiple of 2). For most machines, it is either 64 or 128.
Most vector computers have an optimized timer. After the initial port, you
may try rewriting When(). Make sure that the timer you use measures
wall clock time and not CPU or system time. See the timer tips in the
tuning questions later in this guide for how to make sure your timer is
right.
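The remarks above about the summation and the vector length can be illustrated without any vendor-specific pragma. The sketch below is not taken from hkernel.c. It uses strip mining, a general technique named here as a substitute for a compiler reduction directive: VLEN independent partial sums are accumulated in the inner loop, and only the final combination is serial. The value 64 for VLEN and the name strip_sum() are assumptions.

```c
#include <assert.h>

#define VLEN 64     /* assumed vector length for this sketch */

/* Sum n values using VLEN independent partial sums. The inner loop has
 * no dependence between different j, so it is a candidate for
 * vectorization; only the last loop is a serial reduction. */
double strip_sum(const double *a, long n)
{
    double partial[VLEN];
    double total = 0.0;
    long i, j;

    assert(a != 0 && n >= 0);
    for (j = 0; j < VLEN; j++)
        partial[j] = 0.0;
    for (i = 0; i + VLEN <= n; i += VLEN)   /* full strips */
        for (j = 0; j < VLEN; j++)
            partial[j] += a[i + j];
    for (; i < n; i++)                      /* leftover elements */
        total += a[i];
    for (j = 0; j < VLEN; j++)              /* serial reduction */
        total += partial[j];
    return total;
}
```

Whether a given compiler actually vectorizes the inner loop should be confirmed with the compiler switch that reports on vectorization. Note that the order of the additions differs from a plain serial sum, so floating-point results may differ in the last digits.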
Single Vector Processor: Cray
Machines (CRAYC90), Hitachi-SX4.
The difference between Hitachi-SX4 and Cray vector programming is that
the Hitachi C version doesn't allow an (ANSI) C structure assignment as
one vector operation. So, one needs to do a structure assignment element
by element.
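The difference between the two styles of structure copy looks like this in C. The Interval record and both function names are hypothetical and are not the structure used in the HINT sources.

```c
#include <assert.h>

/* Hypothetical record; the real HINT structure has its own members. */
typedef struct {
    double lo;
    double hi;
    long   width;
} Interval;

/* ANSI C structure assignment: one statement copies every member. */
void copy_whole(Interval *dst, const Interval *src, long n)
{
    long i;

    assert(dst != 0 && src != 0 && n >= 0);
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* The same copy written member by member, for compilers that cannot
 * treat a structure assignment as a vector operation. */
void copy_members(Interval *dst, const Interval *src, long n)
{
    long i;

    assert(dst != 0 && src != 0 && n >= 0);
    for (i = 0; i < n; i++) {
        dst[i].lo    = src[i].lo;
        dst[i].hi    = src[i].hi;
        dst[i].width = src[i].width;
    }
}
```

Both functions leave dst with the same contents. The second form only spells out each scalar assignment so that the compiler sees simple loads and stores.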
Parallel Vector: Cray
Machines (shared memory pragmas), Hitachi-SX4
Fujitsu and other vendors: Use either the Cray version or the Hitachi
version, whichever is closer to your machine's programming model. Use the
appropriate pragmas and compiler optimization flags specific to the
respective vendor.
Check Makefile for the following and set if necessary.
Select a compiler and set CC to it in the Makefile.
Set the KERN_CFLAGS or DRIV_CFLAGS. Use the standard optimization for the
best speed available.
Set the computation data types as described above in the general guidelines.
Now follow the guidelines given below for your specific version.
Version (For workstation/PC clusters, IBM-SP2 ): Set ARCH=MPI
in the Makefile. Make sure your MPI path and library are set to appropriate
Memory Version (SGI, SUN, Kendall Square): For SGI and SUN, corresponding
Makefile and batchfile have been provided. For Kendall Square, set ARCH=KSR
in the Makefile.
: MASPAR1, MASPAR2 : Set ARCH=MP1 for MASPAR1, and ARCH=MP2
for MASPAR2 in the Makefile.
: NCUBE, Intel Paragon : Set ARCH=NCUBE2 for NCUBE2, ARCH=NCUBE2S
for NCUBE2S, and ARCH=PARAGON for Intel Paragon.
We now present some ways of tuning HINT in a question and answer
format. To get results that may be fairly compared across
machines, it is necessary to fine tune your HINT run to get the best curve
possible. The tradeoff is the best curve versus running time.
The longer HINT is run, the closer one gets to the "perfect HINT curve".
However, do note that although a short run may not produce the finest
possible curve, it is often sufficient for practical purposes. We recommend
getting an initial version of HINT up and running before you attempt to
fine tune it.
Q: What are adjustable defines?
How, and why can I change them?
A: The header file hint.h contains a list of defines
which the user can change to get good performance numbers.
ADVANCE is the step size (the workload is multiplied by this step size at
each step). We use roughly 1 decibel as the step size. A step size closer
to 1.0 takes longer to run, but might produce a slightly higher net QUIPS.
NCHUNK is the number of chunks for scatter decomposition. It must
be a multiple of 2. Larger numbers increase the time needed to get
the first result (latency), but they scatter the domain more evenly.
NSAMP defines the size of the array used to store the QUIPS
measurements. Increase it only if required; for instance, if ADVANCE
is small, there will be more sample points (QUIPS measurements).
NTRIAL is the number of times a trial is run. Increase this if the computer
is prone to interruption or if your HINT curve is noisy (i.e., jittery
rather than smooth).
PATIENCE is the number of times a bogus trial is re-run.
RUNTM is the target time in seconds. Each workload is run for
approximately RUNTM seconds. Hence the number of iterations is large for a
smaller workload and small for a larger workload. We recommend reducing
RUNTM for high-resolution timers, since fewer iterations can still yield a
fairly accurate reading. Obviously, RUNTM should be much larger than the
timer resolution.
STOPRT is the ratio of current QUIPS to peak QUIPS at which the run
must terminate. Smaller numbers instruct HINT to keep running even if the
performance drop is huge. The run might then end up using virtual memory.
STOPTM is the longest acceptable running time in seconds.
MXPROC is the maximum number of processors. It is only valid in the
parallel and parallel-vector versions of the HINT code.
Q: I don't care about best results! How can I get results faster?
A: The best way to get faster results is to increase ADVANCE.
You can also reduce the number of trials (NTRIAL) and the number of
re-trials for bogus results (PATIENCE). The following configuration
in hint.h will produce faster results. You may change any one or all of
the following #defines.
#define ADVANCE 1.2589
#define PATIENCE 7
#define NTRIAL 5
The following configuration will give better results but will be slower
than the one above.
#define ADVANCE 1.1
#define PATIENCE 13
#define NTRIAL 20
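The effect of ADVANCE on the number of sample points can be estimated with a short calculation. One decibel corresponds to a factor of 10^(1/10), which is about 1.2589, so ten steps at that setting grow the workload by roughly a factor of ten. The sketch below is our own illustration and not part of hint.c: the function names and the workload range of one to one million are assumptions.

```c
#include <assert.h>

/* Count how many multiplications by advance are needed to grow a
 * workload of 1 to at least target. */
int count_steps(double advance, double target)
{
    double w = 1.0;
    int steps = 0;

    assert(advance > 1.0 && target >= 1.0);
    while (w < target) {
        w *= advance;
        steps++;
    }
    return steps;
}

/* Growth factor after ten steps; close to 10 when advance is 1 decibel. */
double ten_step_factor(double advance)
{
    double w = 1.0;
    int i;

    for (i = 0; i < 10; i++)
        w *= advance;
    return w;
}
```

For example, count_steps(1.2589, 1.0e6) gives about 60 steps, while count_steps(1.1, 1.0e6) gives roughly 145. The smaller step size therefore needs more than twice as many sample points over the same range, which is also why NSAMP may need to be increased when ADVANCE is reduced.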
Q: My timer resolution is not good. What should I do?
A: The easiest thing you can do is to increase RUNTM.
This will increase the number of iterations for a workload.
Q: I wrote my own timer. How do I check it?
A: Here are a few tips:
Check your timer resolution (tdelta). A value of tdelta like
0.0166 means the resolution is low. A value of tdelta like 0.0001666
means the resolution is probably high. Sample code to determine tdelta
is as follows:
t0 = clock(); /* dummy call */
for (t0 = clock(); (t1 = clock()) == t0; )
    ; /* spin until the timer value changes */
tdelta = t1 - t0; /* timer resolution */
Check whether the timer is a wall clock or not. One way to check this is
to time sleep(). Sample code is as follows:
t = When();
sleep(1); /* sleep for one second */
t2 = When();
diff = t2 - t; /* if diff is about 1.0, it
means that you are measuring wall clock */
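The two checks above can be combined into a small self-contained test. The sketch below assumes a POSIX system and uses gettimeofday() as the timer under test. The function wall_seconds() is a hypothetical stand-in for your own When() and is not the implementation shipped with HINT.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for When(): wall clock time in seconds. */
double wall_seconds(void)
{
    struct timeval tv;
    int rc = gettimeofday(&tv, NULL);

    assert(rc == 0);
    (void)rc;
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* Timer resolution: spin until the reported time changes. */
double timer_resolution(void)
{
    double t0, t1;

    t0 = wall_seconds();                    /* dummy call */
    for (t0 = wall_seconds(); (t1 = wall_seconds()) == t0; )
        ;
    return t1 - t0;
}

/* Time a one-second sleep. A wall clock reports about 1.0 seconds; a
 * CPU-time timer would report close to 0, since sleeping uses no CPU. */
double sleep_elapsed(void)
{
    double t = wall_seconds();

    sleep(1);
    return wall_seconds() - t;
}
```

To test your own timer, replace the body of wall_seconds() with a call to it and print the values returned by timer_resolution() and sleep_elapsed().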
The following links point to other documents which might be useful to the reader:
Hint Paper (html,
README Background (text,
Understanding HINT Graphs (html)
The HINT homepage is at http://www.scl.ameslab.gov/HINT/
which contains links to all the HINT related documents.
Please send any comments or suggestions to firstname.lastname@example.org.
If you write a new version of HINT or improve upon any of the
existing versions, please submit your code to us. We
will appropriately acknowledge your work.