MachineFunctionSplitter

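MachineFunctionSplitter is LLVM's machine function splitting pass. Using configured thresholds together with instrumentation-based or sample-based profile data, it moves cold (rarely executed) basic blocks into a separate cold code section to improve instruction cache and TLB utilization.
The pass is mostly about justifying a split, that is, deciding which blocks deserve to be split out and why. It builds on the following pieces: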

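  • ProfileSummaryInfo (PSI) and MachineBlockFrequencyInfo (MBFI): provide the function-level/block-level profile summary and the concrete execution frequencies, respectively; together they are used to decide which basic blocks are "cold".
  • BasicBlockSections: LLVM can group basic blocks into different sections, and this pass assigns cold blocks to a dedicated cold section (for example .text.unlikely.*).
  • Threshold checks: a block is judged cold either by a percentile (PercentileCutoff) or by a minimum execution count (ColdCountThreshold); the rule differs by profile type (instrumentation vs. sample).
    This makes it a clearly heuristic-driven pass, and it exposes the following options: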
// FIXME: This cutoff value is CPU dependent and should be moved to  
// TargetTransformInfo once we consider enabling this on other platforms.  
// The value is expressed as a ProfileSummaryInfo integer percentile cutoff.  
// Defaults to 999950, i.e. all blocks colder than 99.995 percentile are split.  
// The default was empirically determined to be optimal when considering cutoff  
// values between 99%-ile to 100%-ile with respect to iTLB and icache metrics on  
// Intel CPUs.  
static cl::opt<unsigned>  
    PercentileCutoff("mfs-psi-cutoff",  
                     cl::desc("Percentile profile summary cutoff used to "  
                              "determine cold blocks. Unused if set to zero."),  
                     cl::init(999950), cl::Hidden);  
  
static cl::opt<unsigned> ColdCountThreshold(  
    "mfs-count-threshold",  
    cl::desc(  
        "Minimum number of times a block must be executed to be retained."),  
    cl::init(1), cl::Hidden);  
  
static cl::opt<bool> SplitAllEHCode(  
    "mfs-split-ehcode",  
    cl::desc("Splits all EH code and it's descendants by default."),  
    cl::init(false), cl::Hidden);
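  • -mfs-psi-cutoff=<N>: cut at a ProfileSummaryInfo percentile (default 999950, i.e. 99.995%).
  • -mfs-count-threshold=<N>: minimum execution count a block needs in order to be retained (default 1).
  • -mfs-split-ehcode: whether to unconditionally split all exception-handling code and its descendants (boolean switch, off by default).
    Even with these knobs, the pass itself is only about 200 lines long.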
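
The threshold rules above can be sketched as a small standalone model. This is a minimal sketch, not the pass's code: `ProfileKind`, `SplitThresholds`, `ProfileSummaryModel`, `cutoffToPercent`, and `isColdBlockModel` are hypothetical stand-ins that need no LLVM headers, and the per-profile-type behavior (a missing count means cold for instrumentation profiles, but is inconclusive for sample profiles) reflects my reading of the upstream `isColdBlock` helper, so treat it as an assumption and check it against your LLVM version.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the two profile kinds the text distinguishes.
enum class ProfileKind { Instrumentation, Sample };

// Hypothetical bundle of the two thresholds; defaults mirror the options above.
struct SplitThresholds {
  unsigned PercentileCutoff = 999950; // expressed in units of 1/10000 percent
  uint64_t ColdCountThreshold = 1;    // blocks executed fewer times are cold
};

// Hypothetical stand-in for a profile-summary percentile query. The count
// that corresponds to the chosen cutoff is assumed to be precomputed.
struct ProfileSummaryModel {
  uint64_t ColdCountAtCutoff;
  bool isColdAtCutoff(uint64_t Count) const {
    return Count <= ColdCountAtCutoff;
  }
};

// 999950 -> 99.995 (percent).
double cutoffToPercent(unsigned Cutoff) { return Cutoff / 10000.0; }

// Assumed decision rule, modeled on the description in the text.
bool isColdBlockModel(ProfileKind Kind, std::optional<uint64_t> Count,
                      const ProfileSummaryModel &PSI,
                      const SplitThresholds &T) {
  if (Kind == ProfileKind::Instrumentation) {
    // Instrumentation counts are treated as accurate: no count means cold.
    if (!Count)
      return true;
    // A non-zero cutoff takes precedence over the raw count threshold.
    if (T.PercentileCutoff > 0)
      return PSI.isColdAtCutoff(*Count);
  } else {
    // Sampled counts are approximate: a missing count proves nothing.
    if (!Count)
      return false;
  }
  return *Count < T.ColdCountThreshold;
}
```

With the default thresholds, a sampled block whose count is 0 is classified as cold, while a sampled block with no count at all is left in place. Setting `PercentileCutoff` to 0 makes the instrumentation path fall back to the plain count comparison, which matches the "Unused if set to zero" wording in the option description.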

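The pass conflicts with -basic-block-sections=all, and it depends on BasicBlockSectionsProfileReaderWrapperPass, MachineBlockFrequencyInfoWrapperPass, and ProfileSummaryInfoWrapperPass (in other words, it relies on PGO as its profiling input).
The core logic then walks the basic blocks and marks the "cold" ones: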

for (auto &MBB : MF) {
  if (MBB.isEntryBlock()) continue;
  if (MBB.isEHPad())
    LandingPads.push_back(&MBB);
  else if (UseProfileData && isColdBlock(MBB, MBFI, PSI)
           && TII.isMBBSafeToSplitToCold(MBB) && !SplitAllEHCode)
    MBB.setSectionID(MBBSectionID::ColdSectionID);
}
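
To make the branch structure of this loop easier to follow, here is a toy model of it that runs without LLVM. `ToyBlock`, `ToySection`, and `markColdBlocks` are hypothetical names; the `IsCold` and `SafeToSplit` flags stand in for the results of `isColdBlock` and `TII.isMBBSafeToSplitToCold`, which the real pass computes per block.

```cpp
#include <vector>

// Hypothetical stand-in for the section a block is assigned to.
enum class ToySection { Default, Cold };

// Hypothetical stand-in for a machine basic block.
struct ToyBlock {
  bool IsEntry = false;
  bool IsEHPad = false;
  bool IsCold = false;      // assumed result of the coldness check
  bool SafeToSplit = true;  // assumed result of the target safety check
  ToySection Section = ToySection::Default;
};

// Mirrors the loop above: the entry block is never moved, EH pads are only
// collected for a later decision, and every other block is marked cold when
// profile data is in use, the block is cold and safe to split, and the
// "split all EH code" mode is off.
std::vector<ToyBlock *> markColdBlocks(std::vector<ToyBlock> &Blocks,
                                       bool UseProfileData,
                                       bool SplitAllEHCode) {
  std::vector<ToyBlock *> LandingPads;
  for (ToyBlock &B : Blocks) {
    if (B.IsEntry)
      continue;
    if (B.IsEHPad)
      LandingPads.push_back(&B);
    else if (UseProfileData && B.IsCold && B.SafeToSplit && !SplitAllEHCode)
      B.Section = ToySection::Cold;
  }
  return LandingPads;
}
```

Note that a landing pad is not marked inside this loop even if it is cold; it is only set aside, so whatever happens to EH pads is decided in a separate step.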

The final step reorders the basic blocks and updates the branches:

finishAdjustingBasicBlocksAndLandingPads(MF);

This step relies entirely on existing APIs:

static void finishAdjustingBasicBlocksAndLandingPads(MachineFunction &MF) {  
  auto Comparator = [](const MachineBasicBlock &X, const MachineBasicBlock &Y) {  
    return X.getSectionID().Type < Y.getSectionID().Type;  
  };  
  llvm::sortBasicBlocksAndUpdateBranches(MF, Comparator);  
  llvm::avoidZeroOffsetLandingPad(MF);  
}
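
The effect of a comparator that looks only at the section type can be illustrated with a standalone sketch. `LayoutBlock`, `SectionType`, and `sortBlocksBySection` are hypothetical stand-ins rather than LLVM types, the enumerator order (default before cold) is an assumption, and `std::stable_sort` is used on the assumption that blocks within the same section should keep their earlier relative order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the section type; smaller values sort first.
enum class SectionType { Default = 0, Exception, Cold };

// Hypothetical stand-in for a basic block in layout order.
struct LayoutBlock {
  std::string Name;
  SectionType Type;
};

// Groups blocks by section type. Because the sort is stable, blocks that
// share a section type stay in the order an earlier layout step chose.
void sortBlocksBySection(std::vector<LayoutBlock> &Blocks) {
  std::stable_sort(Blocks.begin(), Blocks.end(),
                   [](const LayoutBlock &X, const LayoutBlock &Y) {
                     return X.Type < Y.Type;
                   });
}
```

For a layout of entry, a (cold), b, c (cold), d, the sort yields entry, b, d, a, c: the cold blocks move to the end as a group, and neither group is internally reordered. The real helper additionally has to fix up branches and fallthroughs after the move, which this sketch does not model.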