Blogging site
You’ll use LCSSA to make loop exits “well-formed” so any value defined inside a loop
and used outside flows through a PHI
in an exit block. This isolates loop internals
and unlocks/strengthens LICM
, loop-rotate, vectorization, SCEV
, etc
It’s also very handy around OpenMP parallel for
and lastprivate
/reduction
lowering.
Below is a practical, LLVM-20-style tutorial that goes from concept ⇒ IR ⇒ algorithms ⇒ opt pipelines ⇒ OpenMP specifics ⇒ integration into your own pass.
LCSSA property: For every instruction I
defined in loop L
,
if I
has a use outside L
, then each such use must be dominated by a PHI
in some exit block of L
, and that PHI
is the only cross-loop use of I
.
Intuition: Replace “random uses after the loop” by “exactly one PHI on the loop exit path,” so later passes reason locally at exits.
phi
instruction: %x = phi i32 [ %inc, %then ], [ %dec, %else ]
In this example, if the control flow came from the %then
basic block,
%x
will be assigned the value of %inc
. If the control flow came
from the %else
basic block, %x
will be assigned the value of %dec
.
Placement: phi instructions must always be placed at the very beginning of a basic block, before any other instructions in that block.
Purpose: phi instructions enable the compiler to perform various optimizations by ensuring a clear and unambiguous definition for every variable, even across control flow merges. They are essential for accurate dataflow analysis and transformations.
int f(int *a, int n) {
int s = 0;
for (int i = 0; i < n; ++i)
s += a[i]; // value defined inside loop
return s; // used outside loop
}
Pre-LCSSA
for.body:
%s.cur = phi i32 [0, %entry], [%s.next, %latch]
%val = load i32, ptr %aptr
%s.next = add i32 %s.cur, %val
%cond = icmp slt i32 %i, %n
br i1 %cond, label %latch, label %exit
exit:
ret i32 %s.next ; cross-loop use of %s.next (bad for LCSSA)
Post-LCSSA
exit:
%s.next.lcssa = phi i32 [ %s.next, %for.body ] ; single cross-loop use is the PHI
ret i32 %s.next.lcssa
Given a loop L
, DT
, LI
:
I
is a member of L
that have at least one use outside L
.E
of L
and each predecessor P
is a member of L
of E
,
if I
is available at P
→ E
:
%I.lcssa = phi [I, P]
in E
(one PHI
per exiting predecessor set)I
outside of L
to %I.lcssa
.A loop L
is in LCSSA form if: “no instruction from L
has an out-of-loop use except
LCSSA PHIs directly in exit blocks.” LLVM has isLCSSAForm(DT)
on a loop and a verifier pass.
opt
(new PassManager style)Common pipelines used in the wild:
# Minimal: canonicalize loops, then LCSSA
opt -passes='loop-simplify,lcssa' -S in.ll -o out.ll
# With verification (handy while debugging)
opt -passes='loop-simplify,lcssa,verify' -S in.ll -o out.ll
# A loop-friendly starter pack
opt -passes='mem2reg,instcombine,simplifycfg,loop-simplify,lcssa,indvars,licm' -S in.ll -o out.ll
Notes:
If you want to rely on LCSSA inside your loop/function pass, either:
#include "llvm/IR/Dominators.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Transforms/Utils/LoopUtils.h" // formLCSSA
using namespace llvm;
struct MyLoopXform {
PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
auto &LI = FAM.getResult<LoopAnalysis>(F);
auto &DT = FAM.getResult<DominatorTreeAnalysis>(F);
// Ensure LCSSA where needed
for (Loop *L : LI) {
if (!L->isLCSSAForm(DT)) {
formLCSSA(*L, DT, LI);
}
// ... your loop optimization that relies on LCSSA ...
}
return PreservedAnalyses::none();
}
};
If you write a LoopPass, you can also request/assume simplified form and call
formLCSSA(*L, DT, LI)
inside run.
Clang lowers #pragma omp parallel for
by outlining the loop body into a helper
function and calling into the OpenMP runtime (e.g., __kmpc_fork_call
,
OpenMPIRBuilder on the LLVM side). Consequences:
lastprivate
and reduction
create exactly the kind of “value defined in
loop, used after region” scenarios LCSSA excels at (per-thread partials flow to
a merge point).lastprivate
int g(int *a, int n) {
int last = -1;
#pragma omp parallel for lastprivate(last)
for (int i = 0; i < n; ++i)
last = a[i] * 2;
return last; // needs the "last iteration’s" value after region
}
Lowering will:
last
is loop-local; at region exit, a
runtime-controller merge assigns the last iteration result to the shared last
.LoopSimplify
+ LCSSA
in the outlined function ensures the loop’s
exiting edges produce clean PHI
s; this makes downstream opts (e.g., LICM
,
indvars
, vectorization for omp simd
) reliable.clang -O0 -fopenmp -emit-llvm -S omp_lastprivate.c -o omp_lastprivate.ll
opt -passes='loop-simplify,lcssa,verify' -S omp_lastprivate.ll -o omp_lastprivate.lcssa.ll
opt -passes='print<loops>' -disable-output omp_lastprivate.lcssa.ll
-fopenmp-targets=...
)You will get additional device modules; ensure your pipeline runs on those as well (same passes, same reasoning).
opt -passes='mem2reg,sroa,instcombine,simplifycfg,
loop-simplify,lcssa,indvars,licm,
loop-rotate,gvn,simplifycfg' \
-S omp.ll -o omp.opt.ll
opt -passes='mem2reg,sroa,instcombine,simplifycfg,
loop-simplify,lcssa,indvars,licm,
loop-rotate,gvn,simplifycfg' \
-S omp.ll -o omp.opt.ll
lastprivate
models an actual data flow from the loop’s “last iteration”
to the region’s continuation. LCSSA at loop exits keeps that clean.Source (omp_sum.c
)
#include <omp.h>
int sum_last(int *a, int n) {
int s = 0, last = -1;
#pragma omp parallel for reduction(+:s) lastprivate(last)
for (int i = 0; i < n; ++i) {
s += a[i];
last = a[i] * 2;
}
return s + last;
}
IR + LCSSA
clang -O0 -fopenmp -emit-llvm -S omp_sum.c -o omp_sum.ll
opt -passes='loop-simplify,lcssa,verify' -S omp_sum.ll -o omp_sum.lcssa.ll
Open omp_sum.lcssa.ll
; locate the outlined worker (often a mangled helper).
You should see .lcssa
suffixed PHIs in loop exit blocks feeding the region
merge.
Some transforms break LCSSA (e.g., aggressive CFG changes). If your pass needs LCSSA after such transforms, simply call:
formLCSSA(*L, DT, LI); // for one loop
// or
formLCSSA(LI, DT); // for all loops in a function (utility overloads exist)
Then continue ……
mem2reg
/ sroa
).loop-simplify
(preheaders, single backedge, dedicated exits).lcssa
(or formLCSSA
in your pass).verify
while developing.formLCSSA
before analyses/opts that assume it.⸻