reference, declarationdefinition
definition → references, declarations, derived classes, virtual overrides
reference to multiple definitions → definitions
unreferenced
    1
    2
    3
    4
    5
    6
    7
    8
    9
   10
   11
   12
   13
   14
   15
   16
   17
   18
   19
   20
   21
   22
   23
   24
   25
   26
   27
   28
   29
   30
   31
   32
   33
   34
   35
   36
   37
   38
   39
   40
   41
   42
   43
   44
   45
   46
   47
   48
   49
   50
   51
   52
   53
   54
   55
   56
   57
   58
   59
   60
   61
   62
   63
   64
   65
   66
   67
   68
   69
   70
   71
   72
   73
   74
   75
   76
   77
   78
   79
   80
   81
   82
   83
   84
   85
   86
   87
   88
   89
   90
   91
   92
//===- Threads.h ------------------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// LLD supports threads to distribute workloads to multiple cores. Using
// multicore is most effective when more than one core are idle. At the
// last step of a build, it is often the case that a linker is the only
// active process on a computer. So, we are naturally interested in using
// threads wisely to reduce latency to deliver results to users.
//
// That said, we don't want to do "too clever" things using threads.
// Complex multi-threaded algorithms are sometimes extremely hard to
// reason about and can easily mess up the entire design.
//
// Fortunately, when a linker links large programs (when the link time is
// most critical), it spends most of the time to work on massive number of
// small pieces of data of the same kind, and there are opportunities for
// large parallelism there. Here are examples:
//
//  - We have hundreds of thousands of input sections that need to be
//    copied to a result file at the last step of link. Once we fix a file
//    layout, each section can be copied to its destination and its
//    relocations can be applied independently.
//
//  - We have tens of millions of small strings when constructing a
//    mergeable string section.
//
// For the cases such as the former, we can just use parallelForEach
// instead of std::for_each (or a plain for loop). Because tasks are
// completely independent from each other, we can run them in parallel
// without any coordination between them. That's very easy to understand
// and reason about.
//
// For the cases such as the latter, we can use parallel algorithms to
// deal with massive data. We have to write code for a tailored algorithm
// for each problem, but the complexity of multi-threading is isolated in
// a single pass and doesn't affect the linker's overall design.
//
// The above approach seems to be working fairly well. As an example, when
// linking Chromium (output size 1.6 GB), using 4 cores reduces latency to
// 75% compared to single core (from 12.66 seconds to 9.55 seconds) on my
// Ivy Bridge Xeon 2.8 GHz machine. Using 40 cores reduces it to 63% (from
// 12.66 seconds to 7.95 seconds). Because of the Amdahl's law, the
// speedup is not linear, but as you add more cores, it gets faster.
//
// On a final note, if you are trying to optimize, keep the axiom "don't
// guess, measure!" in mind. Some important passes of the linker are not
// that slow. For example, resolving all symbols is not a very heavy pass,
// although it would be very hard to parallelize it. You want to first
// identify a slow pass and then optimize it.
//
//===----------------------------------------------------------------------===//

#ifndef LLD_COMMON_THREADS_H
#define LLD_COMMON_THREADS_H

#include "llvm/Support/Parallel.h"
#include <functional>

namespace lld {

extern bool threadsEnabled;

template <typename R, class FuncTy> void parallelForEach(R &&range, FuncTy fn) {
  if (threadsEnabled)
    for_each(llvm::parallel::par, std::begin(range), std::end(range), fn);
  else
    for_each(llvm::parallel::seq, std::begin(range), std::end(range), fn);
}

inline void parallelForEachN(size_t begin, size_t end,
                             llvm::function_ref<void(size_t)> fn) {
  if (threadsEnabled)
    for_each_n(llvm::parallel::par, begin, end, fn);
  else
    for_each_n(llvm::parallel::seq, begin, end, fn);
}

template <typename R, class FuncTy> void parallelSort(R &&range, FuncTy fn) {
  if (threadsEnabled)
    sort(llvm::parallel::par, std::begin(range), std::end(range), fn);
  else
    sort(llvm::parallel::seq, std::begin(range), std::end(range), fn);
}

} // namespace lld

#endif