Commit Graph

312163 Commits

Author SHA1 Message Date
Heejin Ahn a41250c7be [WebAssembly] Irreducible control flow rewrite
Summary:
Rewrite WebAssemblyFixIrreducibleControlFlow to a simpler and cleaner
design, which directly computes reachability and other properties
itself. This avoids previous complexity and bugs. (The new graph
analyses are very similar to how the Relooper algorithm would find loop
entries and so forth.)

This fixes a few bugs, including where we had a false positive and
thought fannkuch was irreducible when it was not, which made us much
larger and slower there, and a reverse bug where we missed
irreducibility. On fannkuch, we used to be 44% slower than asm2wasm and
are now 4% faster.

Reviewers: aheejin

Subscribers: jdoerfert, mgrang, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D58919

Patch by Alon Zakai (kripken)

llvm-svn: 356313
2019-03-16 03:00:19 +00:00
Fangrui Song a957f47e0a [ADT] Make SmallVector emplace_back return a reference
This follows the C++17 std::vector change and can simplify immediate
back() calls.

llvm-svn: 356312
2019-03-16 02:41:45 +00:00
Julian Lettner 769b73738d [TSan][libdispatch] Configure libdispatch lit tests
llvm-svn: 356311
2019-03-16 02:07:50 +00:00
Sam Clegg 632c217921 [WebAssembly] Error on R_WASM_MEMORY_ADDR relocations against undefined data symbols.
For these types of relocations an absolute memory address is
required which is not possible for undefined data symbols.  For symbols
that can be undefined at link time (i.e. external data symbols in
shared libraries) a different type of relocation (i.e. via a GOT) will
be needed.

Differential Revision: https://reviews.llvm.org/D59337

llvm-svn: 356310
2019-03-16 01:18:12 +00:00
Amara Emerson 7097e83dab [GlobalISel] Make isel verification checks of vregs run under NDEBUG only.
llvm-svn: 356309
2019-03-16 01:02:10 +00:00
Devin Coughlin a61641ef40 [analyzer] Teach scan-build to find clang when installed in /usr/local/bin/
Change scan-build to support the scenario where scan-build is installed in
$TOOLCHAIN/usr/local/bin/ but clang itself is installed in $TOOLCHAIN/usr/bin/.

This is restricted to when 'xcrun' is present; that is, on the Mac.

rdar://problem/48914634

Differential Revision: https://reviews.llvm.org/D59406

llvm-svn: 356308
2019-03-16 01:01:29 +00:00
Csaba Dabis 49e978f780 hello, clang
Test commit with head and body.

llvm-svn: 356307
2019-03-15 23:44:35 +00:00
Peter Collingbourne 3dea548017 gn build: Add missing dependency to check-clang target.
llvm-svn: 356306
2019-03-15 22:47:34 +00:00
Fedor Sergeev 6a9c2f4f98 [TimePasses] allow -time-passes reporting into a custom stream
TimePassesHandler object (implementation of time-passes for new pass manager)
gains ability to report into a stream customizable per-instance (per pipeline).

Intended use is to specify separate time-passes output stream per each compilation,
setting up TimePasses member of StandardInstrumentation during PassBuilder setup.
That allows to get independent non-overlapping pass-times reports for parallel
independent compilations (in JIT-like setups).

By default it still puts timing reports into the info-output-file stream
(created by CreateInfoOutputFile every time report is requested).

Unit-test added for non-default case, and it also allowed to discover that print() does not work
as declared - it did not reset the timers, leading to yet another report being printed into the default stream.
Fixed print() to actually reset timers according to what was declared in print's comments before.

Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D59366

llvm-svn: 356305
2019-03-15 22:15:23 +00:00
Amara Emerson 3739a20875 [GlobalISel] Allow MachineIRBuilder to build subregister copies.
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().

Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.

Differential Revision: https://reviews.llvm.org/D59434

llvm-svn: 356304
2019-03-15 21:59:50 +00:00
Eli Friedman 68d9a60573 [ARM] Add MachineVerifier logic for some Thumb1 instructions.
tMOVr and tPUSH/tPOP/tPOP_RET have register constraints which can't be
expressed in TableGen, so check them explicitly. I've unfortunately run
into issues with both of these recently; hopefully this saves some time
for someone else in the future.

Differential Revision: https://reviews.llvm.org/D59383

llvm-svn: 356303
2019-03-15 21:44:49 +00:00
Jonathan Peyton 6622732d9a [OpenMP] Fix OMPT cancellation test for GOMP
The GOMP sections interface uses schedule(dynamic) dispatch so it cannot
be assumed which thread executes the cancel and which thread executes
the cancellation point.  This patch allows either thread to execute either
section.

llvm-svn: 356302
2019-03-15 21:24:45 +00:00
Roman Lebedev 9f37790608 [X86] X86ISelLowering::combineSextInRegCmov(): also handle i8 CMOV's
Summary:
As noted by @andreadb in https://reviews.llvm.org/D59035#inline-525780

If we have `sext (trunc (cmov C0, C1) to i8)`,
we can instead do `cmov (sext (trunc C0 to i8)), (sext (trunc C1 to i8))`

Reviewers: craig.topper, andreadb, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits, andreadb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59412

llvm-svn: 356301
2019-03-15 21:18:05 +00:00
Roman Lebedev b6e376ddfa [X86] Promote i8 CMOV's (PR40965)
Summary:
@mclow.lists brought up this issue up in IRC, it came up during
implementation of libc++ `std::midpoint()` implementation (D59099)
https://godbolt.org/z/oLrHBP

Currently LLVM X86 backend only promotes i8 CMOV if it came from 2x`trunc`.
This differential proposes to always promote i8 CMOV.

There are several concerns here:
* Is this actually more performant, or is it just the ASM that looks cuter?
* Does this result in partial register stalls?
* What about branch predictor?

# Indeed, performance should be the main point here.
Let's look at a simple microbenchmark: {F8412076}
```
#include "benchmark/benchmark.h"

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

// Future preliminary libc++ code, from Marshall Clow.
namespace std {
template <class _Tp>
__inline _Tp midpoint(_Tp __a, _Tp __b) noexcept {
  using _Up = typename std::make_unsigned<typename remove_cv<_Tp>::type>::type;

  int __sign = 1;
  _Up __m = __a;
  _Up __M = __b;
  if (__a > __b) {
    __sign = -1;
    __m = __b;
    __M = __a;
  }
  return __a + __sign * _Tp(_Up(__M - __m) >> 1);
}
}  // namespace std

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct RandRand {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    return std::make_pair(getVectorOfRandomNumbers<T>(count),
                          getVectorOfRandomNumbers<T>(count));
  }
};
struct ZeroRand {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    return std::make_pair(std::vector<T>(count, T(0)),
                          getVectorOfRandomNumbers<T>(count));
  }
};

template <class T, class Gen>
void BM_StdMidpoint(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    for (size_t i = 0; i < Length; i++) {
      const auto calculated = std::midpoint(a[i], b[i]);
      benchmark::DoNotOptimize(calculated);
    }
  }
  state.SetComplexityN(Length);
  state.counters["midpoints"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["midpoints/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = 2 * 1024 * 1024;
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

// Both of the values are random.
// The comparison is unpredictable.
BENCHMARK_TEMPLATE(BM_StdMidpoint, int32_t, RandRand)
    ->Apply(CustomArguments<int32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, RandRand)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int64_t, RandRand)
    ->Apply(CustomArguments<int64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, RandRand)
    ->Apply(CustomArguments<uint64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int16_t, RandRand)
    ->Apply(CustomArguments<int16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, RandRand)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int8_t, RandRand)
    ->Apply(CustomArguments<int8_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, RandRand)
    ->Apply(CustomArguments<uint8_t>);

// One value is always zero, and another is bigger or equal than zero.
// The comparison is predictable.
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, ZeroRand)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, ZeroRand)
    ->Apply(CustomArguments<uint64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, ZeroRand)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, ZeroRand)
    ->Apply(CustomArguments<uint8_t>);
```

```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks ./llvm-cmov-bench-OLD ./llvm-cmov-bench-NEW
RUNNING: ./llvm-cmov-bench-OLD --benchmark_out=/tmp/tmp5a5qjm
2019-03-06 21:53:31
Running ./llvm-cmov-bench-OLD
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.78, 1.81, 1.36
----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters<...>
----------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072      300398 ns       300404 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.25083G/s midpoints=305.398M midpoints/sec=436.319M/s
BM_StdMidpoint<int32_t, RandRand>_BigO          2.29 N          2.29 N
BM_StdMidpoint<int32_t, RandRand>_RMS              2 %             2 %
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072     300433 ns       300433 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.25052G/s midpoints=305.398M midpoints/sec=436.278M/s
BM_StdMidpoint<uint32_t, RandRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, RandRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<int64_t, RandRand>/65536       169857 ns       169858 ns         4121 bytes_read/iteration=1024k bytes_read/sec=5.74929G/s midpoints=270.074M midpoints/sec=385.828M/s
BM_StdMidpoint<int64_t, RandRand>_BigO          2.59 N          2.59 N
BM_StdMidpoint<int64_t, RandRand>_RMS              3 %             3 %
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536      169770 ns       169771 ns         4125 bytes_read/iteration=1024k bytes_read/sec=5.75223G/s midpoints=270.336M midpoints/sec=386.026M/s
BM_StdMidpoint<uint64_t, RandRand>_BigO         2.59 N          2.59 N
BM_StdMidpoint<uint64_t, RandRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<int16_t, RandRand>/262144      591169 ns       591179 ns         1182 bytes_read/iteration=1024k bytes_read/sec=1.65189G/s midpoints=309.854M midpoints/sec=443.426M/s
BM_StdMidpoint<int16_t, RandRand>_BigO          2.25 N          2.25 N
BM_StdMidpoint<int16_t, RandRand>_RMS              1 %             1 %
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144     591264 ns       591274 ns         1184 bytes_read/iteration=1024k bytes_read/sec=1.65162G/s midpoints=310.378M midpoints/sec=443.354M/s
BM_StdMidpoint<uint16_t, RandRand>_BigO         2.25 N          2.25 N
BM_StdMidpoint<uint16_t, RandRand>_RMS             1 %             1 %
<...>
BM_StdMidpoint<int8_t, RandRand>/524288      2983669 ns      2983689 ns          235 bytes_read/iteration=1024k bytes_read/sec=335.156M/s midpoints=123.208M midpoints/sec=175.718M/s
BM_StdMidpoint<int8_t, RandRand>_BigO           5.69 N          5.69 N
BM_StdMidpoint<int8_t, RandRand>_RMS               0 %             0 %
<...>
BM_StdMidpoint<uint8_t, RandRand>/524288     2668398 ns      2668419 ns          262 bytes_read/iteration=1024k bytes_read/sec=374.754M/s midpoints=137.363M midpoints/sec=196.479M/s
BM_StdMidpoint<uint8_t, RandRand>_BigO          5.09 N          5.09 N
BM_StdMidpoint<uint8_t, RandRand>_RMS              0 %             0 %
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072     300887 ns       300887 ns         2331 bytes_read/iteration=1024k bytes_read/sec=3.24561G/s midpoints=305.529M midpoints/sec=435.619M/s
BM_StdMidpoint<uint32_t, ZeroRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, ZeroRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536      169634 ns       169634 ns         4102 bytes_read/iteration=1024k bytes_read/sec=5.75688G/s midpoints=268.829M midpoints/sec=386.338M/s
BM_StdMidpoint<uint64_t, ZeroRand>_BigO         2.59 N          2.59 N
BM_StdMidpoint<uint64_t, ZeroRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144     592252 ns       592255 ns         1182 bytes_read/iteration=1024k bytes_read/sec=1.64889G/s midpoints=309.854M midpoints/sec=442.62M/s
BM_StdMidpoint<uint16_t, ZeroRand>_BigO         2.26 N          2.26 N
BM_StdMidpoint<uint16_t, ZeroRand>_RMS             1 %             1 %
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288      987295 ns       987309 ns          711 bytes_read/iteration=1024k bytes_read/sec=1012.85M/s midpoints=372.769M midpoints/sec=531.028M/s
BM_StdMidpoint<uint8_t, ZeroRand>_BigO          1.88 N          1.88 N
BM_StdMidpoint<uint8_t, ZeroRand>_RMS              1 %             1 %
RUNNING: ./llvm-cmov-bench-NEW --benchmark_out=/tmp/tmpPvwpfW
2019-03-06 21:56:58
Running ./llvm-cmov-bench-NEW
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.17, 1.46, 1.30
----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters<...>
----------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072      300878 ns       300880 ns         2324 bytes_read/iteration=1024k bytes_read/sec=3.24569G/s midpoints=304.611M midpoints/sec=435.629M/s
BM_StdMidpoint<int32_t, RandRand>_BigO          2.29 N          2.29 N
BM_StdMidpoint<int32_t, RandRand>_RMS              2 %             2 %
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072     300231 ns       300226 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.25276G/s midpoints=305.398M midpoints/sec=436.578M/s
BM_StdMidpoint<uint32_t, RandRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, RandRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<int64_t, RandRand>/65536       170819 ns       170777 ns         4115 bytes_read/iteration=1024k bytes_read/sec=5.71835G/s midpoints=269.681M midpoints/sec=383.752M/s
BM_StdMidpoint<int64_t, RandRand>_BigO          2.60 N          2.60 N
BM_StdMidpoint<int64_t, RandRand>_RMS              3 %             3 %
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536      171705 ns       171708 ns         4106 bytes_read/iteration=1024k bytes_read/sec=5.68733G/s midpoints=269.091M midpoints/sec=381.671M/s
BM_StdMidpoint<uint64_t, RandRand>_BigO         2.62 N          2.62 N
BM_StdMidpoint<uint64_t, RandRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<int16_t, RandRand>/262144      592510 ns       592516 ns         1182 bytes_read/iteration=1024k bytes_read/sec=1.64816G/s midpoints=309.854M midpoints/sec=442.425M/s
BM_StdMidpoint<int16_t, RandRand>_BigO          2.26 N          2.26 N
BM_StdMidpoint<int16_t, RandRand>_RMS              1 %             1 %
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144     614823 ns       614823 ns         1180 bytes_read/iteration=1024k bytes_read/sec=1.58836G/s midpoints=309.33M midpoints/sec=426.373M/s
BM_StdMidpoint<uint16_t, RandRand>_BigO         2.33 N          2.33 N
BM_StdMidpoint<uint16_t, RandRand>_RMS             4 %             4 %
<...>
BM_StdMidpoint<int8_t, RandRand>/524288      1073181 ns      1073201 ns          650 bytes_read/iteration=1024k bytes_read/sec=931.791M/s midpoints=340.787M midpoints/sec=488.527M/s
BM_StdMidpoint<int8_t, RandRand>_BigO           2.05 N          2.05 N
BM_StdMidpoint<int8_t, RandRand>_RMS               1 %             1 %
BM_StdMidpoint<uint8_t, RandRand>/524288     1071010 ns      1071020 ns          653 bytes_read/iteration=1024k bytes_read/sec=933.689M/s midpoints=342.36M midpoints/sec=489.522M/s
BM_StdMidpoint<uint8_t, RandRand>_BigO          2.05 N          2.05 N
BM_StdMidpoint<uint8_t, RandRand>_RMS              1 %             1 %
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072     300413 ns       300416 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.2507G/s midpoints=305.398M midpoints/sec=436.302M/s
BM_StdMidpoint<uint32_t, ZeroRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, ZeroRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536      169667 ns       169669 ns         4123 bytes_read/iteration=1024k bytes_read/sec=5.75568G/s midpoints=270.205M midpoints/sec=386.257M/s
BM_StdMidpoint<uint64_t, ZeroRand>_BigO         2.59 N          2.59 N
BM_StdMidpoint<uint64_t, ZeroRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144     591396 ns       591404 ns         1184 bytes_read/iteration=1024k bytes_read/sec=1.65126G/s midpoints=310.378M midpoints/sec=443.257M/s
BM_StdMidpoint<uint16_t, ZeroRand>_BigO         2.26 N          2.26 N
BM_StdMidpoint<uint16_t, ZeroRand>_RMS             1 %             1 %
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288     1069421 ns      1069413 ns          655 bytes_read/iteration=1024k bytes_read/sec=935.092M/s midpoints=343.409M midpoints/sec=490.258M/s
BM_StdMidpoint<uint8_t, ZeroRand>_BigO          2.04 N          2.04 N
BM_StdMidpoint<uint8_t, ZeroRand>_RMS              0 %             0 %
Comparing ./llvm-cmov-bench-OLD to ./llvm-cmov-bench-NEW
Benchmark                                                   Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072                 +0.0016         +0.0016        300398        300878        300404        300880
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072                -0.0007         -0.0007        300433        300231        300433        300226
<...>
BM_StdMidpoint<int64_t, RandRand>/65536                  +0.0057         +0.0054        169857        170819        169858        170777
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536                 +0.0114         +0.0114        169770        171705        169771        171708
<...>
BM_StdMidpoint<int16_t, RandRand>/262144                 +0.0023         +0.0023        591169        592510        591179        592516
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144                +0.0398         +0.0398        591264        614823        591274        614823
<...>
BM_StdMidpoint<int8_t, RandRand>/524288                  -0.6403         -0.6403       2983669       1073181       2983689       1073201
<...>
BM_StdMidpoint<uint8_t, RandRand>/524288                 -0.5986         -0.5986       2668398       1071010       2668419       1071020
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072                -0.0016         -0.0016        300887        300413        300887        300416
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536                 +0.0002         +0.0002        169634        169667        169634        169669
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144                -0.0014         -0.0014        592252        591396        592255        591404
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288                 +0.0832         +0.0832        987295       1069421        987309       1069413
```

What can we tell from the benchmark?
* `BM_StdMidpoint<[u]int8_t, RandRand>` indeed has the worst performance.
* All `BM_StdMidpoint<uint{8,16,32}_t, ZeroRand>` are all performant, even the 8-bit case.
  That is because there we are computing mid point between zero and some random number,
  thus if the branch predictor is in use, it is in optimal situation.
* Promoting 8-bit CMOV did improve performance of `BM_StdMidpoint<[u]int8_t, RandRand>`, by -59%..-64%.

# What about branch predictor?
* `BM_StdMidpoint<uint8_t, ZeroRand>` was faster than `BM_StdMidpoint<uint{16,32,64}_t, ZeroRand>`,
  which may mean that well-predicted branch is better than `cmov`.
* Promoting 8-bit CMOV degraded performance of `BM_StdMidpoint<uint8_t, ZeroRand>`,
  `cmov` is up to +10% worse than well-predicted branch.
* However, i do not believe this is a concern. If the branch is well predicted,  then the PGO
  will also say that it is well predicted, and LLVM will happily expand cmov back into branch:
  https://godbolt.org/z/P5ufig

# What about partial register stalls?
I'm not really able to answer that.
What i can say is that if the branch is unpredictable (if it is predictable, then use PGO and you'll have branch)
in ~50% of cases you will have to pay branch misprediction penalty.
```
$ grep -i MispredictPenalty X86Sched*.td
X86SchedBroadwell.td:  let MispredictPenalty = 16;
X86SchedHaswell.td:  let MispredictPenalty = 16;
X86SchedSandyBridge.td:  let MispredictPenalty = 16;
X86SchedSkylakeClient.td:  let MispredictPenalty = 14;
X86SchedSkylakeServer.td:  let MispredictPenalty = 14;
X86ScheduleBdVer2.td:  let MispredictPenalty = 20; // Minimum branch misdirection penalty.
X86ScheduleBtVer2.td:  let MispredictPenalty = 14; // Minimum branch misdirection penalty
X86ScheduleSLM.td:  let MispredictPenalty = 10;
X86ScheduleZnver1.td:  let MispredictPenalty = 17;
```
.. which it can be as small as 10 cycles and as large as 20 cycles.
Partial register stalls do not seem to be an issue for AMD CPU's.
For intel CPU's, they should be around ~5 cycles?
Is that actually an issue here? I'm not sure.

In short, i'd say this is an improvement, at least on this microbenchmark.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40965 | PR40965 ]].

Reviewers: craig.topper, RKSimon, spatel, andreadb, nikic

Reviewed By: craig.topper, andreadb

Subscribers: jfb, jdoerfert, llvm-commits, mclow.lists

Tags: #llvm, #libc

Differential Revision: https://reviews.llvm.org/D59035

llvm-svn: 356300
2019-03-15 21:17:53 +00:00
Nikita Popov 1a26144ff5 [AArch64] Turn BIC immediate creation into a DAG combine
Switch BIC immediate creation for vector ANDs from custom lowering
to a DAG combine, which gives generic DAG combines a change to
apply first. In particular this avoids (and x, -1) being turned into
a (bic x, 0) instead of being eliminated entirely.

Differential Revision: https://reviews.llvm.org/D59187

llvm-svn: 356299
2019-03-15 21:04:34 +00:00
Changpeng Fang 989ec59c9f AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges.
Summary:
At the exit of the loop, the compiler uses a register to remember and accumulate
the number of threads that have already exited. When all active threads exit the
loop, this register is used to restore the exec mask, and the execution continues
for the post loop code.

When there is a "continue" in the loop, the compiler made a mistake to reset the
register to 0 when the "continue" backedge is taken. This will result in some
threads not executing the post loop code as they are supposed to.

This patch fixed the issue.

Reviewers:
  nhaehnle, arsenm

Differential Revision:
  https://reviews.llvm.org/D59312

llvm-svn: 356298
2019-03-15 21:02:48 +00:00
Alex Langford 5ac90b8ba1 [CMake] Correct CMake message mode
Summary:
This wasn't actually printing out a CMake warning, it was prepending
"WARN" to the message.

Reviewers: zturner

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59432

llvm-svn: 356297
2019-03-15 20:43:53 +00:00
Brian Gesiak 9db9b1a175 [coroutines][PR40978] Emit error for co_yield within catch block
Summary:
As reported in https://bugs.llvm.org/show_bug.cgi?id=40978, it's an
error to use the `co_yield` or `co_await` keywords outside of a valid
"suspension context" as defined by [expr.await]p2 of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4775.pdf.

Whether or not the current scope was in a function-try-block's
(https://en.cppreference.com/w/cpp/language/function-try-block) handler
could be determined using scope flag `Scope::FnTryCatchScope`. No
such flag existed for a simple C++ catch statement, so this commit adds
one.

Reviewers: GorNishanov, tks2103, rsmith

Reviewed By: GorNishanov

Subscribers: EricWF, jdoerfert, cfe-commits, lewissbaker

Tags: #clang

Differential Revision: https://reviews.llvm.org/D59076

llvm-svn: 356296
2019-03-15 20:25:49 +00:00
Dan Liew 0bb9b5b481 [CMake] Fix broken uses of `try_compile_only()` and improve the function.
Summary:
There were existing calls to `try_compile_only()` with arguments not
prefixed by `SOURCE` or `FLAGS`. These were silently being ignored.
It looks like the `SOURCE` and `FLAGS` arguments were first introduced
in r278454.

One implication of this is that for a builtins only build for Darwin
(see `darwin_test_archs()`) it would mean we weren't actually passing
`-arch <arch>` to the compiler). This would result in compiler-rt
claiming all supplied architectures could be targetted provided
the compiler could build for Clang's default architecture.

This patch fixes this in several ways.

* Fixes all incorrect calls to `try_compile_only()`.
* Adds code to `try_compile_only()` to check for unhandled arguments
  and raises a fatal error if this occurs. This should stop any
  incorrect calls in the future.
* Improve the documentation on `try_compile_only()` which seemed
  completely wrong.

rdar://problem/48928526

Reviewers: beanz, fjricci, dsanders, kubamracek, yln, dcoughlin

Subscribers: mgorny, jdoerfert, #sanitizers, llvm-commits

Tags: #llvm, #sanitizers

Differential Revision: https://reviews.llvm.org/D59429

llvm-svn: 356295
2019-03-15 20:14:46 +00:00
Craig Topper af856db961 [X86] Strip the SAE bit from the rounding mode passed to the _RND opcodes. Use TargetConstant to save a conversion in the isel table.
The asm parser generates the immediate without the SAE bit. So for consistency we should generate the MCInst the same way from CodeGen.

Since they are now both the same, remove the masking from the printer and replace with an llvm_unreachable.

Use a target constant since we're rebuilding the node anyway. Then we don't have to have isel convert it. Saves about 500 bytes from the isel table.

llvm-svn: 356294
2019-03-15 19:59:35 +00:00
Philip Reames 68a2e4d48b [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep with at least one undef in each lane with a undef.  We can also do the same for vector intrinsics.  Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.

Differential Revision: https://reviews.llvm.org/D57468

llvm-svn: 356293
2019-03-15 19:54:06 +00:00
Simon Pilgrim d33e62c826 [X86][SSE] Fold scalar_to_vector(i64 anyext(x)) -> bitcast(scalar_to_vector(i32 anyext(x)))
Reduce the size of an any-extended i64 scalar_to_vector source to i32 - the any_extend nodes are often introduced by SimplifyDemandedBits.

llvm-svn: 356292
2019-03-15 19:14:28 +00:00
Evgeny Mankov 177301f048 [CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3)
Partial fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows".

[Synopsis]
__sptr is a new Microsoft specific modifier (https://docs.microsoft.com/en-us/cpp/cpp/sptr-uptr?view=vs-2017).

[Solution]
Replace all `__sptr` occurrences with `__s` (and all `__cptr` with `__c` as well) to eliminate the below clang compilation error on Windows.

In file included from C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:162:
C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_device_functions.h:524:33: error: expected expression
  return __nv_fast_sincosf(__a, __sptr, __cptr);
                                ^
Reviewed by: Artem Belevich

Differential Revision: http://reviews.llvm.org/D59423

llvm-svn: 356291
2019-03-15 19:04:46 +00:00
Nikita Popov 614b1bea97 [ValueTracking] Use ConstantRange overflow checks for unsigned add/sub; NFC
Use the methods introduced in rL356276 to implement the
computeOverflowForUnsigned(Add|Sub) functions in ValueTracking, by
converting the KnownBits into a ConstantRange.

This is NFC: The existing KnownBits based implementation uses the same
logic as the the ConstantRange based one. This is not the case for the
signed equivalents, so I'm only changing unsigned here.

This is in preparation for D59386, which will also intersect the
computeConstantRange() result into the range determined from KnownBits.

llvm-svn: 356290
2019-03-15 18:37:45 +00:00
Jonathan Peyton 5af1c22d0b [OpenMP] Add missing parenthesis in Perl module
llvm-svn: 356289
2019-03-15 18:27:14 +00:00
Jonathan Peyton 44b476c141 [OpenMP] Remove deprecated taskq
Remove very old, unused, and deprecated taskq code.

Patch by Terry Wilmarth

Differential Revision: https://reviews.llvm.org/D58989

llvm-svn: 356288
2019-03-15 18:24:59 +00:00
Sanjay Patel 052d1b7b66 [InstCombine] add tests for logic of NaN fcmps; NFC
llvm-svn: 356287
2019-03-15 18:14:25 +00:00
Nikita Popov 8d92b8e905 [ConstantRange] Try to fix compiler warnings; NFC
Try to fix "ignoring return value" and "default label" errors on
clang-with-thin-lto-ubuntu buildbot.

llvm-svn: 356286
2019-03-15 18:08:06 +00:00
Philip Reames 6867b1f7de [tests] Add a test for constexpr mask as requested in D57372
llvm-svn: 356285
2019-03-15 18:06:32 +00:00
Zachary Turner 98661d0221 Abbreviation declarations are required to have non-null tags.
Treat a null tag as an error.

llvm-svn: 356284
2019-03-15 18:00:43 +00:00
Sanjay Patel a70c9d49af [InstCombine] add tests for masked store/scatter; NFC
Baseline tests for D57247

llvm-svn: 356283
2019-03-15 18:00:28 +00:00
Amara Emerson d55016b276 [AArch64][GlobalISel] Regbankselect: Fix G_BUILD_VECTOR trying to use s16 gpr sources.
Since we can't insert s16 gprs as we don't have 16 bit GPR registers, we need to
teach RBS to assign them to the FPR bank so our selector works.

llvm-svn: 356282
2019-03-15 18:00:01 +00:00
Julian Lettner 764c2165e8 [TSan][libdispatch] Enable linking and running of tests on Linux
When COMPILER_RT_INTERCEPT_LIBDISPATCH is ON the TSan runtime library
now has a dependency on the blocks runtime and libdispatch. Make sure we
set all the required linking options.

Also add cmake options for specifying additional library paths to
instruct the linker where to search for libdispatch and the blocks
runtime. This allows us to build TSan runtime with libdispatch support
without installing those libraries into default linker library paths.

`CMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY` is necessary to avoid
aborting the build due to failing the link step in CMake's
check_c_compiler test.

Reviewed By: dvyukov, kubamracek

Differential Revision: https://reviews.llvm.org/D59334

llvm-svn: 356281
2019-03-15 17:52:27 +00:00
Philip Reames d238bf7855 [X86][GlobalISEL] Support lowering aligned unordered atomics
The existing lowering code is accidentally correct for unordered atomics as far as I can tell. An unordered atomic has no memory ordering, and simply requires the actual load or store to be done as a single well aligned instruction. As such, relax the restriction while adding tests to ensure the lowering remains correct in the future.

Differential Revision: https://reviews.llvm.org/D57803

llvm-svn: 356280
2019-03-15 17:50:30 +00:00
Yonghong Song 44ed286a2f [BPF] handle external global properly
Previous commit 6bc58e6d3dbd ("[BPF] do not generate unused local/global types")
tried to exclude global variable from type generation. The condition is:
     if (Global.hasExternalLinkage())
       continue;
This is not right. It also excluded initialized globals.

The correct condition (from AssemblyWriter::printGlobal()) is:
  if (!GV->hasInitializer() && GV->hasExternalLinkage())
    Out << "external ";

Let us do the same in BTF type generation. Also added a test for it.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356279
2019-03-15 17:39:10 +00:00
Zachary Turner 1cbbab9277 Return Error and Expected from more DWARF interfaces.
This continues the work of introducing Error and Expected into
the DWARF parsing interfaces, this time for the DWARFCompileUnit
and DWARFDebugAranges classes.

Differential Revision: https://reviews.llvm.org/D59381

llvm-svn: 356278
2019-03-15 17:32:05 +00:00
Aaron Enye Shi 04fddc9b27 [HIP-Clang] propagate -mllvm options to opt and llc
Change the HIP Toolchain to pass the OPT_mllvm options into OPT and LLC stages. Added a lit test to verify the command args.

Reviewers: yaxunl

Differential Revision: https://reviews.llvm.org/D59316

llvm-svn: 356277
2019-03-15 17:31:51 +00:00
Nikita Popov e3d3e862de [ConstantRange] Add overflow check helpers
Add functions to ConstantRange that determine whether the
unsigned/signed addition/subtraction of two ConstantRanges
may/always/never overflows. This will allow checking overflow
conditions based on known constant ranges in addition to known bits.

I'm implementing these methods on ConstantRange to allow them to be
unit tested independently of any ValueTracking machinery. The tests
include exhaustive testing on 4-bit ranges, to make sure the result
is both conservatively correct and maximally precise.

The OverflowResult enum is redeclared on ConstantRange, because
I wanted to avoid a dependency in either direction between
ValueTracking.h and ConstantRange.h.

Differential Revision: https://reviews.llvm.org/D59193

llvm-svn: 356276
2019-03-15 17:29:05 +00:00
Adrian Prantl 2ebbb88960 Implement a better way of not passing the sanitizer environment on to tests.
rdar://problem/48889580

llvm-svn: 356275
2019-03-15 17:22:00 +00:00
Simon Pilgrim 90b700daf1 [AArch64] Regenerate build vector tests
llvm-svn: 356274
2019-03-15 17:17:37 +00:00
Simon Pilgrim 8fbe439345 [SelectionDAG] Add SimplifyDemandedBits handling for ISD::SCALAR_TO_VECTOR
Fixes a lot of constant folding mismatches between i686 and x86_64

llvm-svn: 356273
2019-03-15 17:00:55 +00:00
Robert Widmann 2f1ebe6ee8 [LLVM-C] Expose the "Add Discriminators" Pass To LLVM-C
Summary: Add bindings to create a wrapped "Add Discriminators" pass.  Now that we have debug info support, this is a handy transform to have.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: dblaikie, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58624

llvm-svn: 356272
2019-03-15 16:57:23 +00:00
Davide Italiano bbcda82e21 [DataFormatters] Remove LLDB_DISABLE_PYTHON from TypeCategory.
llvm-svn: 356271
2019-03-15 16:55:51 +00:00
Simon Pilgrim 65165d54bb [X86] Add SimplifyDemandedBitsForTargetNode support for PINSRB/PINSRW
llvm-svn: 356270
2019-03-15 16:16:49 +00:00
Pavel Labath 230837c662 YAMLIO: Improve endian type support
Summary:
Now that endian types support enumerations (D59141), the existing yaml
support for them is somewhat insufficient. The current solution was to
define the ScalarTraits class for these types, which always forwards to
the ScalarTraits of the underlying type. However, the enum types will
usually have ScalarEnumerationTraits of ScalarBitsetTraits.

In this patch I add the two extra Traits types to the endian types. In
order to properly SFINAE-ize them, I've also added an extra "Enable"
template argument to the Traits template classes.

Reviewers: zturner, sammccall

Subscribers: kristina, Bigcheese, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59289

llvm-svn: 356269
2019-03-15 15:34:10 +00:00
Teresa Johnson 70ec64cb72 [ThinLTO] Restructure AliasSummary to contain ValueInfo of Aliasee
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).

As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.

By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.

An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.

Reviewers: pcc, mehdi_amini

Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D57470

llvm-svn: 356268
2019-03-15 15:11:38 +00:00
Simon Pilgrim 55e1330eda [Hexagon] Remove icmp undef from reduced tests
Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @kparzysz (Krzysztof Parzyszek)

llvm-svn: 356267
2019-03-15 15:07:44 +00:00
Marshall Clow 2fa901c471 Update a deque test with more assertions. NFC
llvm-svn: 356266
2019-03-15 15:00:41 +00:00
Mircea Trofin 2c3ab66539 [llvm] Skip over empty line table entries.
Summary:
This is similar to how addr2line handles consecutive entries with the
same address - pick the last one.

Reviewers: dblaikie, friss, JDevlieghere

Reviewed By: dblaikie

Subscribers: eugenis, vitalybuka, echristo, JDevlieghere, probinson, aprantl, hiraditya, rupprecht, jdoerfert, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D58952

llvm-svn: 356265
2019-03-15 15:00:12 +00:00
Ilya Biryukov a06b467ddc [clangd] Remove includes of "gmock-matchers.h". NFC
For consistency with most of the test code.

llvm-svn: 356264
2019-03-15 14:30:07 +00:00