======================
Using Polly with Clang
======================

This documentation discusses how Polly can be used in Clang to automatically
optimize C/C++ code during compilation.


.. warning::

  Clang, LLVM, and Polly need to be in sync (compiled from the same SVN
  revision).

Make Polly available from Clang
===============================

Polly is available through clang, opt, and bugpoint, if Polly was checked out
into tools/polly before compilation. No further configuration is needed.

Optimizing with Polly
=====================

Optimizing with Polly is as easy as adding -O3 -mllvm -polly to your compiler
flags (Polly is only available at -O3).

.. code-block:: console

  clang -O3 -mllvm -polly file.c
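
As an illustration, the following is a minimal sketch of the kind of loop nest
Polly is aimed at: constant loop bounds and array subscripts that are affine in
the loop counters. The function name ``matmul`` and the size ``N`` are
assumptions chosen for this example. Whether a given Polly build actually
transforms the code depends on its version and on the results of its analysis.

.. code-block:: c

  #include <stddef.h>

  #define N 64  /* illustrative problem size */

  /* C = A * B for fixed-size square matrices. All loop bounds are
     constants and all subscripts are affine in i, j and k. */
  void matmul(double C[N][N], double A[N][N], double B[N][N])
  {
      for (size_t i = 0; i < N; i++) {
          for (size_t j = 0; j < N; j++) {
              C[i][j] = 0.0;
              for (size_t k = 0; k < N; k++)
                  C[i][j] += A[i][k] * B[k][j];
          }
      }
  }

Saved as ``file.c``, the sketch compiles with the command shown above once
``-c`` is added, because it has no ``main`` function.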

Automatic OpenMP code generation
================================

To automatically detect parallel loops and generate OpenMP code for them, you
also need to add -mllvm -polly-parallel -lgomp to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c
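
A loop is a candidate for parallel execution when its iterations do not depend
on each other. The sketch below shows such a loop: every iteration of the outer
loop touches a different row. The names ``scale_rows``, ``ROWS`` and ``COLS``
are assumptions made for the example, and it is not guaranteed that Polly
parallelizes this exact code.

.. code-block:: c

  #include <stddef.h>

  #define ROWS 128  /* illustrative sizes */
  #define COLS 256

  /* Multiply every element by factor. Each outer iteration reads and
     writes only row i, so no iteration depends on another. */
  void scale_rows(float M[ROWS][COLS], float factor)
  {
      for (size_t i = 0; i < ROWS; i++)
          for (size_t j = 0; j < COLS; j++)
              M[i][j] = M[i][j] * factor;
  }

Because the sketch has no ``main`` function, it needs a caller elsewhere in the
program before the link step of the command above can succeed.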

Switching the OpenMP backend
----------------------------

The following command-line switch selects Polly's OpenMP backend:

       -polly-omp-backend[=BACKEND]
              choose the OpenMP backend; BACKEND can be 'GNU' (the default) or 'LLVM';

The OpenMP backends can be further tuned using the following command-line switches:


       -polly-num-threads[=NUM]
              set the number of threads to use; NUM may be any positive integer (default: 0, which equals automatic/OMP runtime);

       -polly-scheduling[=SCHED]
              set the OpenMP scheduling type; SCHED can be 'static', 'dynamic', 'guided' or 'runtime' (the default);

       -polly-scheduling-chunksize[=CHUNK]
              set the chunksize (for the selected scheduling type); CHUNK may be any strictly positive integer (otherwise it will default to 1);

Note that at the time of writing, the GNU backend only honors the
``-polly-num-threads`` and ``-polly-scheduling`` switches, and the latter has
to be set to "runtime".

Example: use the alternative (LLVM) backend with dynamic scheduling, four
threads, and a chunk size of one. These switches are added to the flags shown
earlier.

.. code-block:: console

  -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=4
  -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1
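
For orientation only, the settings in the example above (dynamic scheduling,
four threads, chunk size one) correspond roughly to what one would request by
hand with an OpenMP pragma. This is an analogy and an assumption of this
sketch: Polly generates its parallel code itself and does not insert pragmas
into the source. The function name ``add_vectors`` is hypothetical.

.. code-block:: c

  #include <stddef.h>

  /* Hand-written analogue of the switches: dynamic schedule with chunk
     size 1 on four threads. When compiled without OpenMP support the
     pragma is ignored and the loop simply runs serially. */
  void add_vectors(double *out, const double *a, const double *b, size_t n)
  {
      #pragma omp parallel for schedule(dynamic, 1) num_threads(4)
      for (size_t i = 0; i < n; i++)
          out[i] = a[i] + b[i];
  }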

Automatic Vector code generation
================================

Automatic vector code generation can be enabled by adding -mllvm
-polly-vectorizer=stripmine to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c
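
The option value refers to strip mining, a transformation that splits a loop
into fixed-width strips so that the inner loop has a constant trip count. The
sketch below applies the idea by hand to show the shape of the result. The
strip width ``VLEN`` and both function names are assumptions of this example;
the code Polly actually generates may look different.

.. code-block:: c

  #include <stddef.h>

  #define VLEN 4  /* illustrative strip width, not a Polly setting */

  /* Original form: one flat loop over all elements. */
  void axpy_flat(float *y, const float *x, float a, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          y[i] += a * x[i];
  }

  /* Strip-mined form: the outer loop advances by VLEN, the inner loop
     always runs exactly VLEN times, and a remainder loop handles the
     elements left over when n is not a multiple of VLEN. */
  void axpy_stripmined(float *y, const float *x, float a, size_t n)
  {
      size_t i = 0;
      for (; i + VLEN <= n; i += VLEN)
          for (size_t v = 0; v < VLEN; v++)
              y[i + v] += a * x[i + v];
      for (; i < n; i++)
          y[i] += a * x[i];
  }

Both functions compute the same result; the second only makes the fixed-width
inner loop explicit.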

Isolate the Polly passes
========================

Polly's analysis and transformation passes run alongside many other
passes in the pass manager's pipeline.  Some of the passes that run before
Polly are essential for it to work, for instance the canonicalization
of loops.  Therefore, Polly is unable to optimize code straight out of
clang's -O0 output.

To get the LLVM-IR that Polly sees in the optimization pipeline, use the
command:

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-dump-before-file=before-polly.ll

This writes a file 'before-polly.ll' containing the LLVM-IR as passed to
polly, after SSA transformation, loop canonicalization, inlining and
other passes.

Thereafter, any Polly pass can be run over 'before-polly.ll' using the
'opt' tool.  To find out which Polly passes are active in the standard
pipeline, see the output of

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -debug-pass=Arguments

Polly's passes are those between '-polly-detect' and
'-polly-codegen'. Analysis passes can be omitted.  At the time of this
writing, the default Polly pass pipeline is:

.. code-block:: console

  opt before-polly.ll -polly-simplify -polly-optree -polly-delicm -polly-simplify -polly-prune-unprofitable -polly-opt-isl -polly-codegen

Note that this uses LLVM's old/legacy pass manager.

For completeness, here are some other methods that generate IR
suitable for processing with Polly from C/C++/Objective C source code.
The previous method is the recommended one.

The following generates unoptimized LLVM-IR ('-O0', which is the
default) and runs the canonicalizing passes on it
('-polly-canonicalize'). This does *not* include all the passes that run
before Polly in the default pass pipeline.  The '-disable-O0-optnone'
option is required because otherwise clang adds an 'optnone' attribute
to all functions, so that they are skipped by most optimization passes.
The attribute is meant to stop LTO builds from optimizing these functions
in the linking phase anyway.

.. code-block:: console

  clang file.c -c -O0 -Xclang -disable-O0-optnone -emit-llvm -S -o - | opt -polly-canonicalize -S

The option '-disable-llvm-passes' disables all LLVM passes, even those
that run at -O0.  Passing -O1 (or any optimization level other than -O0)
prevents the 'optnone' attribute from being added.

.. code-block:: console

  clang file.c -c -O1 -Xclang -disable-llvm-passes -emit-llvm -S -o - | opt -polly-canonicalize -S

As another alternative, Polly can be pushed in front of the pass
pipeline, and then its output dumped.  This implicitly runs the
'-polly-canonicalize' passes.

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-position=early -mllvm -polly-dump-before-file=before-polly.ll

Further options
===============
Polly supports further options that are mainly useful for the development or the
analysis of Polly. The relevant options can be added to clang by appending
-mllvm -option-name to the CFLAGS or the clang command line.

Limit Polly to a single function
--------------------------------

To limit the execution of Polly to a single function, use the option
-polly-only-func=functionname.
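
As a sketch of when this is useful, consider a source file with two functions
that both contain loops. Passing ``-mllvm -polly-only-func=smooth`` would
restrict Polly to the first one. Both function names and the size ``SIZE`` are
hypothetical and chosen only for this example.

.. code-block:: c

  #include <stddef.h>

  #define SIZE 512  /* illustrative array length */

  /* The function one would name in -polly-only-func: a three-point
     average over the interior elements. */
  void smooth(double out[SIZE], double in[SIZE])
  {
      for (size_t i = 1; i + 1 < SIZE; i++)
          out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
  }

  /* A second loop in the same file that Polly would not process when
     execution is limited to smooth. */
  void fill(double buf[SIZE], double value)
  {
      for (size_t i = 0; i < SIZE; i++)
          buf[i] = value;
  }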

Disable LLVM-IR generation
--------------------------

Polly normally regenerates LLVM-IR from the polyhedral representation. To see
only the effects of the preparatory transformations and disable Polly's code
generation, add the option -polly-no-codegen.

Graphical view of the SCoPs
---------------------------
Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are -polly-show, -polly-show-only, -polly-dot and -polly-dot-only. The
'show' options automatically run dotty or another graphviz viewer to show the
SCoPs graphically. The 'dot' options store, for each function, a dot file that
highlights the detected SCoPs. If 'only' is appended at the end of the option,
the basic blocks are shown without the statements they contain.

Change/Disable the Optimizer
----------------------------

By default, Polly uses the isl scheduling optimizer. The isl optimizer optimizes
for data locality and parallelism using the Pluto algorithm.
To disable the optimizer entirely, use the option -polly-optimizer=none.

Disable tiling in the optimizer
-------------------------------

By default, the optimizer performs tiling where possible. If this is not
wanted, the option -polly-tiling=false can be used to disable it.

Import / Export
---------------

The flags -polly-import and -polly-export allow the export and reimport of the
polyhedral representation. By exporting, modifying, and reimporting the
polyhedral representation, externally calculated transformations can be
applied. This enables the use of external optimizers or the manual
optimization of specific SCoPs.

Viewing Polly Diagnostics with opt-viewer
-----------------------------------------

The flag -fsave-optimization-record generates .opt.yaml files when compiling
your program. These YAML files contain information about each emitted remark.
Ensure that you have Python 2.7 with the PyYAML and Pygments packages installed.
To run opt-viewer:

.. code-block:: console

   llvm/tools/opt-viewer/opt-viewer.py -source-dir /path/to/program/src/ \
      /path/to/program/src/foo.opt.yaml \
      /path/to/program/src/bar.opt.yaml \
      -o ./output

Include all yaml files (use \*.opt.yaml when specifying which yaml files to view)
to view all diagnostics from your program in opt-viewer. Compile with `PGO
<https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation>`_ to view
Hotness information in opt-viewer. The resulting HTML files can be viewed in a web browser.