Compiler flags for IBM XL compiler suites
Following is an overview of available compiler flags for IBM XL
compiler suites, with an emphasis on performance optimization and
compatibility with GNU compilers on Big Red at Indiana
University. This document assumes you have experience with compiling
programs at the command line and know how to use basic compiler
options (e.g., -I, -L, -g) to
compile and link C/C++ or Fortran programs.
Consider the following tips for optimizing:
- Start with -O0 (the default for IBM compilers) and gradually raise the optimization level. Starting at -O0 lets you verify that your code is algorithmically correct, preserves all debug information, and can expose problems in existing code. You can also specify the -qarch option to take advantage of architectural benefits, for example:
  xlc_r -c -O0 -qarch=ppc970 -g example.c
  If a problem occurs, -qdbg generates debug information for use by a symbolic debugger. The default is -qnodbg.
- Apply -O2. Compilation at -O2 opens your application to a set of comprehensive low-level transformations that apply to subprogram or compilation unit scopes and can include some inlining. Optionally, you can specify -qmaxmem=-1 to allow the optimizer to use as much memory as needed. You can also use the debugger by specifying the -g option at -O2.
- Apply -O3. Specifying -O3 initiates more intense low-level transformations that remove many of the limitations present at -O2. -O3 implies -qnostrict -qmaxmem=-1 -qhot=level=0. If you encounter problems at this optimization level, try using -qstrict or -qnohot along with -O3, for example:
  xlc_r -c -O3 -qstrict -qarch=ppc970 -qtune=ppc970 example.c
- Turn on selected -qhot suboptions. -O3 implies only -qhot=level=0, which includes minimal loop transformations to increase performance. You can specify -qhot along with -O3 to see if application performance improves. -qhot is designed to have no effect when no opportunities exist.
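The steps above can be sketched as a small dry-run script. It only prints one compile command per step, so you can review each command before running it yourself. The function name xl_opt_ladder is hypothetical and not part of the XL toolchain; the flags are the ones listed in the tips above.

```shell
#!/bin/sh
# Dry-run sketch: prints one compile command per optimization step.
# xl_opt_ladder is a hypothetical helper name, not an IBM tool.
xl_opt_ladder() {
    src=$1
    # Step 1: no optimization, full debug information.
    echo "xlc_r -c -O0 -qarch=ppc970 -g $src"
    # Step 2: low-level optimization with no optimizer memory limit.
    echo "xlc_r -c -O2 -qmaxmem=-1 -qarch=ppc970 -g $src"
    # Step 3: aggressive optimization, with -qstrict as a safety net.
    echo "xlc_r -c -O3 -qstrict -qarch=ppc970 -qtune=ppc970 $src"
    # Step 4: add the full set of high-order loop transformations.
    echo "xlc_r -c -O3 -qhot -qarch=ppc970 -qtune=ppc970 $src"
}

xl_opt_ladder example.c
```

Run and test your program after each step; if results change at step 3 or 4, fall back to the previous command.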
Performance optimization
To help you use the hardware on Big Red at IU effectively, the IBM XL compilers provide a series of flags that take advantage of specific hardware features:
- -q32 and -q64: Big Red can run both 32- and 64-bit binaries. -q32 instructs the compiler to compile the source code into 32-bit object code, while -q64 makes the compiler generate 64-bit object code. On Big Red, the environment variable OBJECT_MODE has the same effect as these options; however, UITS recommends specifying the bit mode explicitly at the command line to avoid confusion.
- -qarch: This lets the compiler target a specific architecture for compilation. For Big Red, you can use -qarch=ppc970 to generate instructions specific to Big Red's PPC970 processors. Note that code generated this way may not run on other architectures. Alternatively, you can use -qarch=ppc or -qarch=ppc64, which make the compiler generate code that runs on all PowerPC or all 64-bit PowerPC architectures, respectively. -qarch corresponds to the -march option in GNU compilers.
- -qtune: Using -qtune with -qarch will provide the best performance on the specified architecture. If -qarch is specified, the compiler will default to the matching -qtune option. If -qarch=ppc or -qarch=ppc64 is specified, you can use either -qtune=auto or -qtune=ppc970. This option corresponds to the -mtune option in GNU compilers.
- -qhot: This option instructs the compilers to perform high-order loop analysis and transformations during optimization. The suboptions for -qhot include:
  - arraypad: This option permits the compiler to increase the dimensions of arrays where doing so might improve the efficiency of array-processing loops, since array dimensions that are powers of two can lead to decreased cache utilization. Not all arrays will be padded, and different arrays might be padded by different amounts. Using -qhot=arraypad might be unsafe, since there are no checks for reshaping, which may cause code to break.
  - simd and nosimd: simd is useful for applications with significant vector processing needs, such as operations over images. It converts certain operations in a loop on successive elements of an array into VMX instructions that use the AltiVec unit on the PPC970 processor. This is effective only with -qarch set to an architecture that supports VMX instructions. nosimd disables this conversion.
  - vector and novector: vector causes the compiler to convert certain operations performed on successive array elements in a loop into calls to IBM MASS library routines, which calculate several results at a time. It supports both single- and double-precision floating-point operations. Note that use of this option could reduce the precision of operations. Specifying -qstrict together with -qhot turns it off.
  - level=0 and level=1: -qhot=level=0 is equivalent to novector, nosimd, and noarraypad. -qhot=level=1 is equivalent to -qhot, and has the same effect as a combination of -qhot=nosimd, -qhot=noarraypad, and -qhot=vector.
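As an illustration of how the architecture flags above combine, here is a minimal sketch that assembles them for Big Red's PPC970 processors. The helper name xl_arch_flags is hypothetical; it only prints flags and never calls the compiler. Whether -qhot=simd needs further options depends on your compiler version, so treat the final command as a starting point.

```shell
#!/bin/sh
# xl_arch_flags is a hypothetical helper: given a bit mode (32 or 64),
# it prints the matching bit-mode, -qarch, and -qtune flags for PPC970.
xl_arch_flags() {
    case $1 in
        32|64) echo "-q$1 -qarch=ppc970 -qtune=ppc970" ;;
        *)     echo "usage: xl_arch_flags 32|64" >&2; return 1 ;;
    esac
}

# Example: 64-bit object code with SIMD loop transformations at -O3.
echo "xlc_r -c -O3 $(xl_arch_flags 64) -qhot=simd example.c"
```

Stating the bit mode in the command itself, rather than relying on OBJECT_MODE, follows the recommendation given for -q32 and -q64.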
Following are flags you might consider using together with the above architecture-specific flags when compiling and optimizing your code:
- -qstrict and -qnostrict: -qstrict turns off aggressive optimizations that have the potential to alter program semantics; -qnostrict allows them.
- -qipa and -qnoipa: -qipa enables inter-procedural analysis and -qnoipa disables it. -qipa performs automatic inlining, limited alias analysis, and limited call-site tailoring. It is implied when you specify an aggressive optimization level such as -O4 or -O5. UITS recommends you specify -qipa options both when compiling and when linking.
- -qmaxmem: This limits the amount of memory that the compiler allocates while performing specific, memory-intensive optimizations to the specified number of kilobytes. A value of -1 allows optimization to take as much memory as it needs without checking for limits.
- -O0, -O (same as -O2), -O3, -O4, and -O5:
  - At -O0, the compiler performs only minimal optimizations, such as constant folding and elimination of local common subexpressions.
  - -O or -O2 specifies what the compiler's developers consider the best combination of compilation speed and run-time performance.
  - -O3 performs additional optimizations that are memory intensive, compile-time intensive, or both. UITS recommends you use it when run-time improvement is more important than minimizing compilation resources. -O3 could slightly alter the semantics of the program, because it may rewrite floating-point operations and relax conformance to IEEE rules. -O3 implies -qhot=level=0.
  - -O4 is more aggressive; it automatically sets the -qarch and -qtune options to the compiling architecture, and sets -qhot and -qipa.
  - -O5 raises the level of inter-procedural analysis.
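The recommendation to give -qipa at both compile time and link time can be sketched as follows. The helper xl_ipa_build and the file names main.c and solver.c are hypothetical; the script prints the commands instead of running them.

```shell
#!/bin/sh
# xl_ipa_build is a hypothetical helper: prints one compile command per
# source file and a final link command, all carrying -qipa.
# Usage: xl_ipa_build OUTPUT SOURCE.c...
xl_ipa_build() {
    out=$1
    shift
    objs=""
    for src in "$@"; do
        obj="${src%.c}.o"
        # Compile step: -qipa records the information the link step needs.
        echo "xlc_r -c -O3 -qipa $src -o $obj"
        objs="$objs $obj"
    done
    # Link step: repeat -qipa so whole-program analysis is performed.
    echo "xlc_r -O3 -qipa$objs -o $out"
}

xl_ipa_build app main.c solver.c
```

Omitting -qipa from either step would, as the recommendation above suggests, leave part of the inter-procedural analysis unused.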
Compatibility options
- -W: UITS recommends using the IBM compiler and linker to avoid turning off link-time optimization. The compiler uses the -W option to pass options specified at the command line to the linker and to other components of the compiler. For example, to pass the option --whole-archive to the linker, specify -Wl,--whole-archive at the command line. For more information, refer to the man page for IBM compilers on Big Red.
- -qlanglvl: Both the IBM Fortran and C/C++ compilers use -qlanglvl to specify the language standard to check against. For detailed language level specifications, refer to the man pages. This option corresponds to the -std option in the GNU compilers.
- -qpic: -qpic tells the compiler to generate position-independent code that can be used in shared libraries. -qpic=small and -qpic=large correspond to -fpic and -fPIC in GNU C/C++ compilers. -qpic=small assumes the Global Offset Table is no larger than 64KB, while -qpic=large allows it to be larger than 64KB.
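To illustrate the -W forwarding described above, the following sketch wraps each linker option in its own -Wl, prefix. The helper name xl_wl and the archive name libfoo.a are hypothetical, and the script only prints the resulting link command.

```shell
#!/bin/sh
# xl_wl is a hypothetical helper: prefixes every argument with "-Wl,"
# so the compiler driver forwards it to the linker.
xl_wl() {
    out=""
    for opt in "$@"; do
        out="$out${out:+ }-Wl,$opt"
    done
    printf '%s\n' "$out"
}

# Example: link all members of a static archive into the executable.
echo "xlc_r main.o $(xl_wl --whole-archive) libfoo.a $(xl_wl --no-whole-archive) -o app"
```

Keeping the link step under the compiler driver, rather than calling the linker directly, is what lets link-time optimization stay enabled.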
C/C++ compiler only
- -qmkshrobj: This option corresponds to the -shared option in GNU C/C++ compilers. It tells the linker to create a shared object from the generated object files.
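When porting a GNU build that produces a shared library, the option correspondences in this document can be applied mechanically. The sketch below translates only -fpic, -fPIC, and -shared and passes everything else through unchanged. The function gnu_to_xl and the file names are hypothetical; options such as -march or -std take values that differ between the compilers and are deliberately not translated here.

```shell
#!/bin/sh
# gnu_to_xl is a hypothetical helper: rewrites the GNU options that have
# a direct XL equivalent in this document; all other arguments are kept.
gnu_to_xl() {
    res=""
    for opt in "$@"; do
        case $opt in
            -fpic)   opt=-qpic=small ;;
            -fPIC)   opt=-qpic=large ;;
            -shared) opt=-qmkshrobj ;;
        esac
        res="$res${res:+ }$opt"
    done
    printf '%s\n' "$res"
}

# GNU originals: gcc -fPIC -c foo.c  and  gcc -shared -o libfoo.so foo.o
echo "xlc_r $(gnu_to_xl -fPIC -c foo.c)"
echo "xlc_r $(gnu_to_xl -shared -o libfoo.so foo.o)"
```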
Fortran compiler only
- -qextname and -qnoextname: -qextname adds a trailing underscore to the names of all global entities except the main program name; -qnoextname leaves the names unchanged.
- -qintsize, -qrealsize, and -qautodbl: -qintsize specifies the default size for INTEGER and LOGICAL values; -qrealsize specifies the default size for REAL, DOUBLE PRECISION, COMPLEX, and DOUBLE COMPLEX values. -qautodbl converts single-precision floating-point calculations to double precision, and double-precision calculations to extended precision. The default is -qautodbl=none.
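A Fortran compile line combining these options might look like the following sketch. The helper xl_fortran_size_flags and the file name solver.f are hypothetical. The accepted sizes (2, 4, or 8 bytes for -qintsize; 4 or 8 for -qrealsize) are an assumption based on the XL Fortran options as commonly documented, so check the man page on your system.

```shell
#!/bin/sh
# xl_fortran_size_flags is a hypothetical helper: validates the requested
# default INTEGER and REAL sizes (in bytes) and prints the matching flags.
# The accepted values are an assumption; confirm them in the xlf man page.
xl_fortran_size_flags() {
    case $1 in
        2|4|8) ;;
        *) echo "intsize must be 2, 4, or 8" >&2; return 1 ;;
    esac
    case $2 in
        4|8) ;;
        *) echo "realsize must be 4 or 8" >&2; return 1 ;;
    esac
    printf '%s\n' "-qintsize=$1 -qrealsize=$2"
}

# Example: trailing underscores on global names, 8-byte INTEGER and REAL.
echo "xlf_r -c -qextname $(xl_fortran_size_flags 8 8) solver.f"
```

With -qextname, a subroutine named solve would be exported as solve_, which is typically the naming that GNU Fortran compilers produce by default, so objects from both compilers can be linked together.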
For more complete information, see the following IBM compiler references:
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on June 19, 2009.






