Compiler flags for IBM XL compiler suites
Following is an overview of available compiler flags for IBM XL
compiler suites, with an emphasis on performance optimization and
compatibility with GNU compilers on Big Red at Indiana
University. This document assumes you have experience with compiling
programs at the command line and know how to use basic compiler
options (e.g., -I, -L, -g) to
compile and link C/C++ or Fortran programs.
Consider the following tips for optimizing:
- Start with -O0 (the default for IBM compilers) and raise the optimization level gradually. Compiling at -O0 lets you confirm that your code is algorithmically correct, preserves all debug information, and can expose problems in existing code. You can also specify the -qarch option to take advantage of architectural benefits, for example:
  xlc_r -c -O0 -qarch=ppc970 -g example.c
  If a problem occurs, -qdbg generates debug information for use by a symbolic debugger. The default is -qnodbg.
- Apply -O2. Compiling at -O2 opens your application to a comprehensive set of low-level transformations that apply to subprogram or compilation unit scopes and can include some inlining. Optionally, you can specify -qmaxmem=-1 to allow the optimizer to use as much memory as it needs. You can still use the debugger at -O2 by specifying the -g option.
- Apply -O3. Specifying -O3 initiates more intense low-level transformations that remove many of the limitations present at -O2. -O3 implies -qnostrict -qmaxmem=-1 -qhot=level=0. If you encounter problems at this optimization level, try using -qstrict or -qnohot along with -O3, for example:
  xlc_r -c -O3 -qstrict -qarch=ppc970 -qtune=ppc970 example.c
- Turn on selected -qhot suboptions. -O3 implies only -qhot=level=0, which includes minimal loop transformations to increase performance. You can specify -qhot along with -O3 to see whether the application gets an additional performance boost. -qhot is designed to have a neutral effect when no opportunities exist.
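The progression described in the tips above can be sketched as a small dry-run script. This is only an illustration: the helper name opt_steps is hypothetical, the file name example.c is taken from the examples above, and the script prints the commands rather than running the compiler, so you can review each step before trying it.

```shell
#!/bin/sh
# Dry-run sketch: print one compile command per tuning step.
# opt_steps is a hypothetical helper; the flag sets follow the tips above.

opt_steps() {
    # One flag set per line, in the order the tips suggest trying them.
    cat <<'EOF'
-O0 -qarch=ppc970 -g
-O2 -qmaxmem=-1 -g
-O3 -qarch=ppc970 -qtune=ppc970
-O3 -qstrict -qarch=ppc970 -qtune=ppc970
-O3 -qhot -qarch=ppc970 -qtune=ppc970
EOF
}

# Print (do not execute) the command for each step.
opt_steps | while read -r flags; do
    echo "xlc_r -c $flags example.c"
done
```

After each step, rebuild and rerun your own tests; if results change at -O3, the -qstrict line is the fallback suggested above.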
Performance optimization
To help you use the hardware available on Big Red at IU effectively, IBM XL compilers provide a series of flags that take advantage of specific hardware features:
- -q32 and -q64
  Big Red can run both 32-bit and 64-bit binaries. -q32 instructs the compiler to generate 32-bit object code, while -q64 makes the compiler generate 64-bit object code. On Big Red, the environment variable OBJECT_MODE has the same effect as these options; however, UITS recommends specifying the bit mode explicitly at the command line to avoid confusion.
- -qarch
  This lets the compiler target a specific architecture for compilation. For Big Red, you can use -qarch=ppc970, because Big Red has PPC970 processors. The compiler will then generate PPC970-specific instructions. Note that code generated this way may not run on other architectures. Alternatively, you can use -qarch=ppc or -qarch=ppc64, which make the compiler generate code that runs on all PowerPC or all 64-bit PowerPC architectures, respectively. -qarch corresponds to the -march option in GNU compilers (on PowerPC targets, GCC spells this option -mcpu).
- -qtune
  Matching -qtune with -qarch provides the best performance on the specified architecture. If -qarch is specified, the compiler defaults to the matching -qtune option. If -qarch=ppc or -qarch=ppc64 is specified, you can use either -qtune=auto or -qtune=ppc970. This option corresponds to the -mtune option in GNU compilers.
- -qhot
  This option instructs the compiler to perform high-order loop analysis and transformations during optimization. The suboptions for -qhot include the following:
  - arraypad
    This option permits the compiler to increase the dimensions of arrays where doing so might improve the efficiency of array-processing loops, since arrays whose dimensions are powers of two can lead to decreased cache utilization. Not all arrays will be padded, and different arrays might be padded by different amounts. Using -qhot=arraypad might be unsafe, because the compiler does not check for reshaping, which may cause code to break.
  - simd and nosimd
    simd is useful for applications with significant image processing needs. It converts certain operations performed in a loop on successive elements of an array into VMX instructions that can use the AltiVec unit on the PPC970 processor; nosimd disables this conversion. simd is effective only with -qarch set to an architecture that supports VMX instructions.
  - vector and novector
    vector causes the compiler to convert certain operations performed on successive array elements in a loop into a call to an IBM MASS library routine, which calculates several results at a time; novector disables this conversion. It supports both single- and double-precision floating-point operations. Note that using this option could reduce the precision of operations. Using -qstrict together with -qhot turns vectorization off.
  - level=0 and level=1
    -qhot=level=0 is equivalent to novector, nosimd, and noarraypad. -qhot=level=1 is equivalent to -qhot, which is the same as specifying -qhot=nosimd, -qhot=noarraypad, and -qhot=vector.
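As a rough illustration of how the hardware-specific flags above combine on one command line, the following sketch assembles them with a hypothetical helper, xl_hw_flags. The function name, its arguments, and the file name example.c are assumptions made for this example; only the flags themselves come from the descriptions above. The script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: assemble Big Red hardware flags.
# Usage: xl_hw_flags BITS [HOT_SUBOPTION]
#   BITS           32 or 64 (always explicit, rather than relying on OBJECT_MODE)
#   HOT_SUBOPTION  optional -qhot suboption, e.g. simd, vector, arraypad
xl_hw_flags() {
    bits=$1
    hot=${2:-}
    case "$bits" in
        32|64) ;;
        *) echo "bit mode must be 32 or 64" >&2; return 1 ;;
    esac
    # Matching -qtune with -qarch, as recommended above.
    flags="-q$bits -qarch=ppc970 -qtune=ppc970"
    if [ -n "$hot" ]; then
        flags="$flags -qhot=$hot"
    fi
    printf '%s\n' "$flags"
}

# Dry run: show the command without invoking the compiler.
echo "xlc_r -c -O3 $(xl_hw_flags 64 simd) example.c"
```

Keeping the bit mode as a required argument mirrors the recommendation to state it explicitly on the command line.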
Following are flags you might consider using together with the above architecture-specific flags in compiling and optimizing the code:
- -qstrict and -qnostrict
  -qstrict turns off aggressive optimizations that have the potential to alter program semantics; -qnostrict allows them.
- -qipa and -qnoipa
  -qipa enables inter-procedural analysis and -qnoipa disables it. -qipa performs automatic inlining, limited alias analysis, and limited call-site tailoring. It is implied when you specify an aggressive optimization level such as -O4 or -O5. UITS recommends you specify -qipa options both when compiling and when linking.
- -qmaxmem
  This limits the amount of memory that the compiler allocates while performing specific, memory-intensive optimizations to the specified number of kilobytes. A value of -1 allows optimization to take as much memory as it needs without checking for limits.
- -O0, -O (same as -O2), -O3, -O4, and -O5
  At -O0, the compiler performs only minimal optimizations, such as constant folding and elimination of local common subexpressions. -O or -O2 specifies what the compiler's developers considered the best combination of compilation speed and run-time performance. -O3 performs additional optimizations that are memory intensive, compile-time intensive, or both. UITS recommends you use it when run-time improvement is more important than minimizing compilation resources. -O3 could slightly alter the semantics of the program, because it may rewrite floating-point operations and relax conformance to IEEE rules. -O3 implies -qhot=level=0. -O4 is more aggressive; it automatically sets the -qarch and -qtune options to the compiling architecture, and sets -qhot and -qipa. -O5 raises the level of inter-procedural analysis.
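The recommendation to pass -qipa at both the compile step and the link step can be sketched as follows. The helper name ipa_build_cmds, the source files main.c and util.c, and the output name app are all hypothetical, and -O3 is just one possible optimization level. The script only prints the commands, so it is safe to run anywhere.

```shell
#!/bin/sh
# Dry-run sketch: use the same optimization flags when compiling and linking.
XLC=xlc_r
OPT="-O3 -qipa"   # assumed flag set for this illustration

# Usage: ipa_build_cmds OUTPUT SOURCE.c [SOURCE.c ...]
ipa_build_cmds() {
    out=$1
    shift
    objs=""
    for src in "$@"; do
        # Compile step: -qipa is given here...
        echo "$XLC -c $OPT $src"
        objs="$objs ${src%.c}.o"
    done
    # ...and again at the link step.
    echo "$XLC $OPT -o $out$objs"
}

ipa_build_cmds app main.c util.c
```

Holding the flags in a single variable makes it harder for the compile and link steps to drift apart.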
Compatibility options
- -W
  UITS recommends using the IBM compiler and linker to avoid turning off link-time optimization. The compiler uses the -W option to pass options specified at the command line to the linker and to other components of the compiler. For example, to pass the option --whole-archive to the linker, specify -Wl,--whole-archive at the command line. For more information, refer to the man page for IBM compilers on Big Red.
- -qextname and -qnoextname (Fortran compiler only)
  -qextname adds a trailing underscore to the names of all global entities except the main program name.
- -qlanglvl
  Both the IBM Fortran and C/C++ compilers use -qlanglvl to specify the language standard to check against. For detailed language level specifications, refer to the man pages. This option corresponds to the -std option in the GNU compilers.
- -qintsize / -qrealsize / -qautodbl (Fortran compiler only)
  -qintsize specifies the default size for INTEGER and LOGICAL values; -qrealsize specifies the default size for REAL, DOUBLE PRECISION, COMPLEX, and DOUBLE COMPLEX values. -qautodbl converts single-precision floating-point calculations to double precision, and double-precision calculations to extended precision. The default is -qautodbl=none.
- -qmkshrobj (C/C++ compiler only)
  This option corresponds to the -shared option in GNU C/C++ compilers. It tells the linker to create a shared object from the generated object files.
- -qpic
  -qpic=small and -qpic=large correspond to -fpic and -fPIC in GNU C/C++ compilers. -qpic tells the compiler to generate position-independent code that can be used in shared libraries. -qpic=small specifies that the Global Offset Table should be smaller than 64 KB, while -qpic=large allows the Global Offset Table to be larger than 64 KB.
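When porting a GNU-based build, the correspondences listed above can be collected into a small lookup. The function gnu_to_xl below is a hypothetical helper written for this illustration; it covers only the pairs named in this document and maps option names only, because option values (for example, language levels or architecture names) differ between the two compiler families.

```shell
#!/bin/sh
# Hypothetical lookup: GNU option name -> IBM XL counterpart,
# limited to the correspondences described in this document.
gnu_to_xl() {
    case "$1" in
        -shared) echo "-qmkshrobj" ;;
        -fpic)   echo "-qpic=small" ;;
        -fPIC)   echo "-qpic=large" ;;
        -march)  echo "-qarch" ;;      # name only; values differ
        -mtune)  echo "-qtune" ;;      # name only; values differ
        -std)    echo "-qlanglvl" ;;   # name only; values differ
        *) echo "no mapping listed for $1" >&2; return 1 ;;
    esac
}

# Print a few of the pairs.
for f in -shared -fPIC -std; do
    printf '%s -> %s\n' "$f" "$(gnu_to_xl "$f")"
done
```

Options without an entry return a non-zero status, so a calling script can stop and ask for a manual check against the man pages.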
For more complete information, see the following IBM compiler references:
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on January 15, 2008.






