Loop-Closed SSA (LCSSA)

Blogging site

Loop-Closed SSA (LCSSA)

You’ll use LCSSA to make loop exits “well-formed” so any value defined inside a loop and used outside flows through a PHI in an exit block. This isolates loop internals and unlocks/strengthens LICM, loop-rotate, vectorization, SCEV, etc It’s also very handy around OpenMP parallel for and lastprivate/reduction lowering.

Below is a practical, LLVM-20-style tutorial that goes from concept ⇒ IR ⇒ algorithms ⇒ opt pipelines ⇒ OpenMP specifics ⇒ integration into your own pass.

0) What LCSSA is

LCSSA property: For every instruction I defined in loop L, if I has a use outside L, then each such use must be dominated by a PHI in some exit block of L, and that PHI is the only cross-loop use of I.

Intuition: Replace “random uses after the loop” by “exactly one PHI on the loop exit path,” so later passes reason locally at exits.

Key aspects of the phi instruction:

1) Before/After example (by hand)

int f(int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i];         // value defined inside loop
  return s;            // used outside loop
}

Pre-LCSSA

for.body:
  %s.cur = phi i32 [0, %entry], [%s.next, %latch]
  %val   = load i32, ptr %aptr
  %s.next = add i32 %s.cur, %val
  %cond  = icmp slt i32 %i, %n
  br i1 %cond, label %latch, label %exit

exit:
  ret i32 %s.next          ; cross-loop use of %s.next (bad for LCSSA)

Post-LCSSA

exit:
  %s.next.lcssa = phi i32 [ %s.next, %for.body ] ; single cross-loop use is the PHI
  ret i32 %s.next.lcssa

2) The algorithm (the minimal version)

Given a loop L, DT, LI:

  1. Collect all instructions I is a member of L that have at least one use outside L.
  2. For each exit block E of L and each predecessor P is a member of L of E, if I is available at PE:
    • Create %I.lcssa = phi [I, P] in E (one PHI per exiting predecessor set)
  3. Redirect every non-LCSSA use of I outside of L to %I.lcssa.

3) Verifying LCSSA

A loop L is in LCSSA form if: “no instruction from L has an out-of-loop use except LCSSA PHIs directly in exit blocks.” LLVM has isLCSSAForm(DT) on a loop and a verifier pass.

4) Using opt (new PassManager style)

Common pipelines used in the wild:

# Minimal: canonicalize loops, then LCSSA
opt -passes='loop-simplify,lcssa' -S in.ll -o out.ll

# With verification (handy while debugging)
opt -passes='loop-simplify,lcssa,verify' -S in.ll -o out.ll

# A loop-friendly starter pack
opt -passes='mem2reg,instcombine,simplifycfg,loop-simplify,lcssa,indvars,licm' -S in.ll -o out.ll

Notes:

5) Writing a pass that assumes LCSSA (LLVM 20, new PassManager)

If you want to rely on LCSSA inside your loop/function pass, either:

#include "llvm/IR/Dominators.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Transforms/Utils/LoopUtils.h"   // formLCSSA
using namespace llvm;

struct MyLoopXform {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    auto &LI = FAM.getResult<LoopAnalysis>(F);
    auto &DT = FAM.getResult<DominatorTreeAnalysis>(F);

    // Ensure LCSSA where needed
    for (Loop *L : LI) {
      if (!L->isLCSSAForm(DT)) {
        formLCSSA(*L, DT, LI);
      }
      // ... your loop optimization that relies on LCSSA ...
    }
    return PreservedAnalyses::none();
  }
};

If you write a LoopPass, you can also request/assume simplified form and call formLCSSA(*L, DT, LI) inside run.

6) OpenMP constructs: what changes and what to do

Clang lowers #pragma omp parallel for by outlining the loop body into a helper function and calling into the OpenMP runtime (e.g., __kmpc_fork_call, OpenMPIRBuilder on the LLVM side). Consequences:

OpenMP example: lastprivate

int g(int *a, int n) {
  int last = -1;
  #pragma omp parallel for lastprivate(last)
  for (int i = 0; i < n; ++i)
    last = a[i] * 2;

  return last; // needs the "last iteration’s" value after region
}

Lowering will:

Practical steps to apply LCSSA to OpenMP code:

1. Produce IR with OpenMP (host example):

clang -O0 -fopenmp -emit-llvm -S omp_lastprivate.c -o omp_lastprivate.ll

2. Run on all functions (incluing outlined workers):

opt -passes='loop-simplify,lcssa,verify' -S omp_lastprivate.ll -o omp_lastprivate.lcssa.ll

3. Optional: inspect loops to confirm:

opt -passes='print<loops>' -disable-output omp_lastprivate.lcssa.ll

4. For device/offload builds (-fopenmp-targets=...)

You will get additional device modules; ensure your pipeline runs on those as well (same passes, same reasoning).

opt -passes='mem2reg,sroa,instcombine,simplifycfg,
             loop-simplify,lcssa,indvars,licm,
             loop-rotate,gvn,simplifycfg' \
    -S omp.ll -o omp.opt.ll

5. In an O2-ish pipeline that keeps OpenMP transforms friendly:

opt -passes='mem2reg,sroa,instcombine,simplifycfg,
             loop-simplify,lcssa,indvars,licm,
             loop-rotate,gvn,simplifycfg' \
    -S omp.ll -o omp.opt.ll

6. Common pitfalls (esp. with OpenMP)

7. Minimal Test You Can Run Today

Source (omp_sum.c)

#include <omp.h>
int sum_last(int *a, int n) {
  int s = 0, last = -1;
  #pragma omp parallel for reduction(+:s) lastprivate(last)
  for (int i = 0; i < n; ++i) {
    s    += a[i];
    last  = a[i] * 2;
  }
  return s + last;
}

IR + LCSSA

clang -O0 -fopenmp -emit-llvm -S omp_sum.c -o omp_sum.ll
opt -passes='loop-simplify,lcssa,verify' -S omp_sum.ll -o omp_sum.lcssa.ll

Open omp_sum.lcssa.ll; locate the outlined worker (often a mangled helper). You should see .lcssa suffixed PHIs in loop exit blocks feeding the region merge.

8. Dropping LCSSA on the floor (and fixing it)

Some transforms break LCSSA (e.g., aggressive CFG changes). If your pass needs LCSSA after such transforms, simply call:

formLCSSA(*L, DT, LI);        // for one loop
// or
formLCSSA(LI, DT);            // for all loops in a function (utility overloads exist)

Then continue ……

9. Quick checklist