Commit Graph

184229 Commits

Author SHA1 Message Date
Simon Atanasyan 68f73bf262 [mips] Merge common checkings under the same check prefix. NFC
llvm-svn: 370467
2019-08-30 12:15:12 +00:00
Luis Marques c2b3d527fa [RISCV] Fix a couple of tests' CHECKs
llvm-svn: 370466
2019-08-30 12:11:47 +00:00
Haojian Wu ed170c9bf9 Remove an extra ";", NFC.
llvm-svn: 370465
2019-08-30 12:09:31 +00:00
Amaury Sechet 485760f4c0 [X86] Add tests for rotate matching. NFC
llvm-svn: 370464
2019-08-30 11:35:28 +00:00
Bjorn Pettersson 227145924a [CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC
Summary:
Found a couple of places in the code where all the PHI nodes
of a MBB is updated, replacing references to one MBB by
reference to another MBB instead.

This patch simply refactors the code to use a common helper
(MachineBasicBlock::replacePhiUsesWith) for such PHI node
updates.

Reviewers: t.p.northover, arsenm, uabelho

Subscribers: wdng, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66750

llvm-svn: 370463
2019-08-30 11:23:10 +00:00
Simon Pilgrim 7cbf823f93 [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros
Return a proper zero vector, just in case some elements are undef.

Noticed by inspection after dealing with a similar issue in PR43159.

llvm-svn: 370460
2019-08-30 10:42:14 +00:00
Simon Pilgrim 01a3c25c27 Fix Wdocumentation warning. NFCI.
llvm-svn: 370459
2019-08-30 10:25:52 +00:00
Chris Jackson fa1fe93789 [llvm-objcopy] Allow the visibility of symbols created by --binary and
--add-symbol to be specified with --new-symbol-visibility

llvm-svn: 370458
2019-08-30 10:17:16 +00:00
Hideto Ueno 6381b143f6 [Attributor] Implement AANoAliasCallSiteArgument initialization
Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66927

llvm-svn: 370456
2019-08-30 10:00:32 +00:00
Roman Lebedev 5c9f3cfec7 [LoopIdiomRecognize] BCmp loop idiom recognition
Summary:
@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

llvm-svn: 370454
2019-08-30 09:51:23 +00:00
Roman Lebedev 09e4ac1a4d [NFC] SCEVExpander: add SetCurrentDebugLocation() / getCurrentDebugLocation() wrappers
Summary:
The internal `Builder` is private, which means there is
currently no way to set the debuginfo locations for `SCEVExpander`.
This only adds the wrappers, but does not use them anywhere.

Reviewers: mkazantsev, sanjoy, gberry, jyknight, dneilson

Reviewed By: sanjoy

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61007

llvm-svn: 370453
2019-08-30 09:51:02 +00:00
David Stenberg b35d4699d0 [LiveDebugValues] Insert entry values after bundles
Summary:
Change LiveDebugValues so that it inserts entry values after the bundle
which contains the clobbering instruction. Previously it would insert
the debug value after the bundle head using insertAfter(), breaking the
bundle.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D66888

llvm-svn: 370448
2019-08-30 09:06:50 +00:00
Sven van Haastregt fd66c8bf07 vim: add `immarg` keyword
The `immarg` attribute was added in r355981.

llvm-svn: 370443
2019-08-30 08:52:55 +00:00
Nico Weber 629f921568 gn build: Merge r370441
llvm-svn: 370442
2019-08-30 08:26:37 +00:00
Dmitri Gribenko 4fc0d3bd09 [ADT] Removed VariadicFunction
Summary:
It is not used. It uses macro-based unrolling instead of variadic
templates, so it is not idiomatic anymore, and therefore it is a
questionable API to keep "just in case".

Subscribers: mgorny, dmgreen, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66961

llvm-svn: 370441
2019-08-30 08:21:55 +00:00
Martin Storsjo 3d3a9b3b41 [LLD] [COFF] Support merging resource object files
Extend WindowsResourceParser to support using a ResourceSectionRef for
loading resources from an object file.

Only allow merging resource object files in mingw mode; keep the
existing error on multiple resource objects in link mode.

If there only is one resource object file and no .res resources,
don't parse and recreate the .rsrc section, but just link it in without
inspecting it. This allows users to produce any .rsrc section (outside
of what the parser supports), just like before. (I don't have a specific
need for this, but it reduces the risk of this new feature.)

Separate out the .rsrc section chunks in InputFiles.cpp, and only include
them in the list of section chunks to link if we've determined that there
only was one single resource object. (We need to keep other chunks from
those object files, as they can legitimately contain other sections as
well, in addition to .rsrc section chunks.)

Differential Revision: https://reviews.llvm.org/D66824

llvm-svn: 370436
2019-08-30 06:56:33 +00:00
Martin Storsjo d8d63ff24b [WindowsResource] Remove use of global variables in WindowsResourceParser
Instead of updating a global variable counter for the next index of
strings and data blobs, pass along a reference to actual data/string
vectors and let the TreeNode insertion methods add their data/strings to
the vectors when a new entry is needed.

Additionally, if the resource tree had duplicates, that were ignored
with -force:multipleres in lld, we no longer store all versions of the
duplicated resource data, now we only keep the one that actually ends
up referenced.

Differential Revision: https://reviews.llvm.org/D66823

llvm-svn: 370435
2019-08-30 06:56:02 +00:00
Martin Storsjo e62d5682fb [WindowsResource] Avoid duplicating the input filenames for each resource. NFC.
Differential Revision: https://reviews.llvm.org/D66821

llvm-svn: 370434
2019-08-30 06:55:54 +00:00
Martin Storsjo 9438221785 [COFF] Add a ResourceSectionRef method for getting resource contents
This allows llvm-readobj to print the contents of each resource
when printing resources from an object file or executable, like it
already does for plain .res files.

This requires providing the whole COFFObjectFile to ResourceSectionRef.

This supports both object files and executables. For executables,
the DataRVA field is used as is to look up the right section.

For object files, ideally we would need to complete linking of them
and fix up all relocations to know what the DataRVA field would end up
being. In practice, the only thing that makes sense for an RVA field
is an ADDR32NB relocation. Thus, find a relocation pointing at this
field, verify that it has the expected type, locate the symbol it
points at, look up the section the symbol points at, and read from the
right offset in that section.

This works both for GNU windres object files (which use one single
.rsrc section, with all relocations against the base of the .rsrc
section, with the original value of the DataRVA field being the
offset of the data from the beginning of the .rsrc section) and
cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections,
and one symbol per data entry, with the original pre-relocated DataRVA
field being set to zero).

Differential Revision: https://reviews.llvm.org/D66820

llvm-svn: 370433
2019-08-30 06:55:49 +00:00
Petar Avramovic e96892a8aa [MIPS GlobalISel] Lower uitofp
Add custom lowering for G_UITOFP for MIPS32.

Differential Revision: https://reviews.llvm.org/D66930

llvm-svn: 370432
2019-08-30 05:51:12 +00:00
Petar Avramovic 6412b56513 [MIPS GlobalISel] Lower fptoui
Add lower for G_FPTOUI. Algorithm is similar to the SDAG version
in TargetLowering::expandFP_TO_UINT.
Lower G_FPTOUI for MIPS32.

Differential Revision: https://reviews.llvm.org/D66929

llvm-svn: 370431
2019-08-30 05:44:02 +00:00
Dan Gohman 8cfeeaf9de [CodeGen] Fix lowering for returning the result of an extractvalue
When the number of return values exceeds the number of registers available,
SelectionDAGBuilder::visitRet transforms a function's return to use a
pointer to a buffer to hold return values. When the returned value is an
operator such as extractvalue, the value may have a non-zero result number.
Add that number to the indexing when obtaining the values to store.

This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.

Differential Revision: https://reviews.llvm.org/D66978

llvm-svn: 370430
2019-08-30 04:33:22 +00:00
Jinsong Ji a070f12e57 [PowerPC][NFC] Use inline Subtarget->isPPC64()
To be consistent with all the other instances.

llvm-svn: 370428
2019-08-30 03:16:41 +00:00
Jinsong Ji 54a1ad5bd7 [PowerPC][NFC] Use -mtriple in RUN line, remove target triple in tls.ll
To avoid confusion, especially when -mtriple are also added for PPC32.

llvm-svn: 370427
2019-08-30 02:57:33 +00:00
Fangrui Song 7704b54389 [PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO
Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs,
ppc32 just uses LDgotTprelL32, so it does not make lots of sense to use
_LO without a paired _HA.

Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO to match GCC, and
get better linker relocation check. Note, R_PPC_GOT_TPREL16_{HA,LO}
don't have good linker support:

(a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}.
(b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation:

  // a.o
  addis 3, 3, tsd_tls@got@tprel@ha
  lwz 3, tsd_tls@got@tprel@l(3)
  add 3, 3, tsd_tls@tls
  // b.o
  .section .tdata,"awT"; .globl tsd_tls; tsd_tls:

  // ld/ld-new a.o b.o
  internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section

Reviewed By: adalava

Differential Revision: https://reviews.llvm.org/D66925

llvm-svn: 370426
2019-08-30 02:20:49 +00:00
Craig Topper 160ed4cab4 [X86] Explicitly list all the always trivially rematerializable instructions.
Add a default with an llvm_unreachable for anything we don't expect.

This seems safer that just blindly returning true for anything
missing from the switch.

llvm-svn: 370424
2019-08-30 00:54:36 +00:00
Saleem Abdulrasool be638099a4 DebugInfo: add CodeView register mapping for ARM NT
Add the core registers and NEON registers mapping to the CodeView
register ID.  This is sufficient to compile a basic C program with debug
info using CodeView debug info.

llvm-svn: 370423
2019-08-30 00:16:02 +00:00
Dan Gohman da84b688f9 [WebAssembly] Make __attribute__((used)) not imply export.
Add an WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't
need to imply exporting. When targeting Emscripten, have
WASM_SYMBOL_NO_STRIP imply exporting.

Differential Revision: https://reviews.llvm.org/D62542

llvm-svn: 370415
2019-08-29 22:40:00 +00:00
Philip Reames 452e5647a5 [Tests] Precommit a few cases where we're missing oppurtunities for block local simplications off assumes.
llvm-svn: 370414
2019-08-29 22:08:17 +00:00
Jinsong Ji 1ed7d2119e [PowerPC] Support extended mnemonics mffprwz etc.
Summary:
Reported in https://github.com/opencv/opencv/issues/15413.

We have serveral extended mnemonics for Move To/From Vector-Scalar Register Instructions
eg: mffprd,mtfprd etc.

We only support one of them, this patch add the others.

Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc

Reviewed By: hfinkel

Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66963

llvm-svn: 370411
2019-08-29 21:53:59 +00:00
Jessica Paquette 04e657be28 [AArch64][GlobalISel] Select arithmetic extended register patterns
This teaches GISel to select patterns which fold an extend plus optional shift
into the addressing mode. In particular, adds and subs.

Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td
and create GISel equivalents.

Add some equivalent functions to the ones in AArch64ISelDAGToDAG:

- `selectArithExtendedRegister`
- `narrowExtendRegIfNeeded`
- `getExtendTypeForInst`

`getExtendTypeForInst` includes the checks for loads and stores. This will be
used for WRO addressing modes in loads + stores.

Teach selectCopy to properly handle subregister copies on the same bank in
order to support `narrowExtendRegIfNeeded`. The extended register must be a
GPR32, so we need to support same-bank subregister copies.

Fix a bug in getSubRegForClass which would cause registers on things like
GPR32common to end up getting ssub. Just change the check to look for FPR32
rather than GPR32.

For tests:

- Add select-arith-extended-reg.mir
- Update addsub_ext.ll to include GlobalISel checks

Differential Revision: https://reviews.llvm.org/D66835

llvm-svn: 370410
2019-08-29 21:53:58 +00:00
Reid Kleckner 5b79e603d3 [X86] Don't emit unreachable stack adjustments
Summary:
This is a minor improvement on our past attempts to do this. Fixes
PR43155.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66905

llvm-svn: 370409
2019-08-29 21:24:41 +00:00
Reid Kleckner 81e458d001 Allow '@' to appear in x86 mingw symbols
Summary:
There is no reason to differ in assembler behavior here between -msvc
and -gnu targets. Without this setting, the text after the '@' is
interpreted as a symbol variable, like foo@IMGREL.

Reviewers: mstorsjo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66974

llvm-svn: 370408
2019-08-29 21:15:02 +00:00
Sanjay Patel 33541fafde [InstCombine] add possible bswap as widening shuffle test; NFC
Goes with the proposal in D66965.

llvm-svn: 370407
2019-08-29 20:57:50 +00:00
Reid Kleckner fe47ed67fc Fix the build for MSVC builds using M_PI
llvm-svn: 370405
2019-08-29 20:32:53 +00:00
Simon Pilgrim 3d705a1fa4 [X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159)
ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element.

llvm-svn: 370404
2019-08-29 20:22:08 +00:00
Julian Lettner af78899457 [ASan] Version mismatch check follow-up
Follow-up for:
[ASan] Make insertion of version mismatch guard configurable
3ae9b9d5e4

This tiny change makes sure that this test passes on our internal bots
as well.

llvm-svn: 370403
2019-08-29 20:20:05 +00:00
Matt Arsenault cbd1782c79 AMDGPU/GlobalISel: Legalize sin/cos
llvm-svn: 370402
2019-08-29 20:06:48 +00:00
Sanjay Patel 65f1c04000 [InstCombine] reduce duplicated code; NFC
llvm-svn: 370399
2019-08-29 19:36:18 +00:00
Jordan Rupprecht f9f81289e6 Revert [MBP] Disable aggressive loop rotate in plain mode
This reverts r369664 (git commit 51f48295cb)

It causes many benchmark regressions, internally and in llvm's benchmark suite.

llvm-svn: 370398
2019-08-29 19:03:58 +00:00
Alina Sbirlea 4b87023bae Revert enabling MemorySSA.
Breaks sanitizers bots.

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370397
2019-08-29 19:01:23 +00:00
Craig Topper 5a43fdd313 [X86] Remove what little support we had for MPX
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define

I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.

gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX

Differential Revision: https://reviews.llvm.org/D66669

llvm-svn: 370393
2019-08-29 18:09:02 +00:00
Matt Arsenault 093ebf9275 GlobalISel: Don't compute known bits for non-integral GEP
llvm-svn: 370392
2019-08-29 17:55:05 +00:00
Florian Hahn f9cdb98f40 [LoopUnrollAndJam] Use Lazy strategy for DTU.
We can also apply the earlier updates to the lazy DTU, instead of
applying them directly.

Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer

Reviewed By: brzycki, asbirlea, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D66918

llvm-svn: 370391
2019-08-29 17:47:58 +00:00
Matt Arsenault b2b9a23758 GlobalISel: Add maskedValueIsZero and signBitIsZero to known bits
I dropped the DemandedElts since it seems to be missing from some of
the new interfaces, but not others.

llvm-svn: 370389
2019-08-29 17:24:36 +00:00
Matt Arsenault caff0a88dd GlobalISel: Add known bits to InstructionSelector
AMDGPU uses this for some addressing mode selection patterns. The
analysis run itself doesn't do anything so it seems easier to just
always require this than adding a way to opt in.

llvm-svn: 370388
2019-08-29 17:24:32 +00:00
Alina Sbirlea 6289ee941d [MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests.
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.

This patch enables the use of MemorySSA in the loop pass manager.

Passes that currently use MemorySSA:
 - EarlyCSE
Passes that use MemorySSA after this patch:
 - EarlyCSE
 - LICM
 - SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
 - LoopInstSimplify
 - LoopSimplifyCFG
 - LoopUnswitch
 - LoopRotate
 - LoopSimplify
 - LCSSA
Loop passes that do *not* update MemorySSA:
 - IndVarSimplify
 - LoopDelete
 - LoopIdiom
 - LoopSink
 - LoopUnroll
 - LoopInterchange
 - LoopUnrollAndJam
 - LoopVectorize
 - LoopReroll
 - IRCE

Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry

Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370384
2019-08-29 17:08:13 +00:00
Jessica Paquette ba04f5fac1 [GlobalISel][AArch64] Select llvm.aarch64.stxr* intrinsics.
Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td.

This allows us to select these intrinsics.

Differential Revision: https://reviews.llvm.org/D65779

llvm-svn: 370382
2019-08-29 16:55:55 +00:00
Sanjay Patel 63411910a2 [InstCombine] add tests for bswap disguised as shuffle; NFC
Somewhat motivating case In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146

But that's a lot more complicated.

llvm-svn: 370381
2019-08-29 16:48:00 +00:00
Jessica Paquette b8b23a1648 [GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.*
Remove manual selection code for this intrinsic and use a GISelPredicateCode
instead.

This allows us to fully select this intrinsic without any tricky custom C++
matching.

Differential Revision: https://reviews.llvm.org/D65780

llvm-svn: 370380
2019-08-29 16:45:19 +00:00
Jessica Paquette 87720ac8c8 [AArch64][GlobalISel] Select @llvm.aarch64.ldxr.* intrinsics
Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the
ldxr_* definitions, which allows us to import them.

Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D66898

llvm-svn: 370378
2019-08-29 16:33:01 +00:00
Jessica Paquette c327daeea5 [AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics
Add a GISelPredicateCode to ldaxr_*. This allows us to import the patterns for
@llvm.aarch64.ldaxr.*, and thus select them.

Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these
intrinsics involves the same check.

Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D66897

llvm-svn: 370377
2019-08-29 16:16:38 +00:00
Michael Liao 001871dee8 [SimplifyCFG] Skip sinking common lifetime markers of `alloca`.
Summary:
- Similar to the workaround in fix of PR30188, skip sinking common
  lifetime markers of `alloca`. They are mostly left there after
  inlining functions in branches.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66950

llvm-svn: 370376
2019-08-29 16:12:05 +00:00
Jinsong Ji 8b0317ad7d [PowerPC][NFC] Update fp-int-conversions-direct-moves.ll using script
Also add -ppc-asm-full-reg-names,-ppc-vsr-nums-as-vr.

llvm-svn: 370375
2019-08-29 15:38:02 +00:00
Roman Lebedev 05ef49515e [NFC][SimplifyCFG] 'Safely extract low bits' pattern will also benefit from -phi-node-folding-threshold=3
This is the naive implementation of x86 BZHI/BEXTR instruction:
it takes input and bit count, and extracts low nbits up to bit width.
I.e. unlike shift it does not have any UB when nbits >= bitwidth.
Which means we don't need a while PHI here, simple select will do.
And if it's a select, it should then be trivial to fix codegen
to select it to BEXTR/BZHI.

See https://bugs.llvm.org/show_bug.cgi?id=34704

llvm-svn: 370369
2019-08-29 14:46:49 +00:00
Simon Pilgrim ea67741899 [DAGCombine] Fix shadow variable warnings. NFCI.
llvm-svn: 370365
2019-08-29 14:34:07 +00:00
Pavel Labath bd546e5902 DWARFDebugLoc: Make parsing and error reporting more robust
Summary:
While examining this class for possible use in lldb, I noticed two
things:
- it spits out parsing errors directly to stderr
- the loclists parser can incorrectly return valid location lists when
  parsing malformed (truncated) data

I improve the stderr situation by making the parseOneLocationList
functions return Expected<T>s. The errors are still dumped to stderr by
their callers, so this is only a partial fix, but it is enough for my
use case, as I intend to parse the locations lists one by one.

I fix the behavior in the truncated scenario by using the newly
introduced DataExtractor Cursor API.

I also add tests for handling the error cases, as they currently have no
coverage.

Reviewers: dblaikie, JDevlieghere, probinson

Subscribers: lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63591

llvm-svn: 370363
2019-08-29 14:26:05 +00:00
Luis Marques cf3b39391e [RISCV] Fix callee-saved-gprs.ll test ABIs
llvm-svn: 370359
2019-08-29 14:05:59 +00:00
Joerg Sonnenberger 799c96693f Allow replaceAndRecursivelySimplify to list unsimplified visitees.
This is part of D65280 and split it to avoid ABI changes on the 9.0
release branch.

llvm-svn: 370355
2019-08-29 13:22:30 +00:00
Simon Atanasyan b23857c149 [mips] Inline emitStoreWithSymOffset and emitLoadWithSymOffset methods. NFC
Both methods `MipsTargetStreamer::emitStoreWithSymOffset` and
`MipsTargetStreamer::emitLoadWithSymOffset` are almost the same and
differ argument names only. These methods are used in the single place
so it's better to inline their code and remove original methods.

llvm-svn: 370354
2019-08-29 13:19:50 +00:00
Simon Atanasyan 3464b91ef7 [mips] Fix expanding `lw/sw $reg1, symbol($reg2)` instruction
When a "base" in the `lw/sw $reg1, symbol($reg2)` instruction is
a register and generated code is position independent, backend
does not add the "base" value to the symbol address.
```
lw     $reg1, %got(symbol)($gp)
lw/sw  $reg1, 0($reg1)
```

This patch fixes the bug and adds the missed `addu` instruction by
passing `BaseReg` into the `loadAndAddSymbolAddress` routine and handles
the case when the `BaseReg` is the zero register to escape redundant
`move reg, reg` instruction:
```
lw     $reg1, %got(symbol)($gp)
addu   $reg1, $reg1, $reg2
lw/sw  $reg1, 0($reg1)
```

Differential Revision: https://reviews.llvm.org/D66894

llvm-svn: 370353
2019-08-29 13:19:38 +00:00
Roman Lebedev c584786854 [InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` inverted overflow bit
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow or zero
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %retval.0
=>
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %umul.ov.not

Done: 1
Optimization is correct!
```
Note that this is inverted from what we have in a previous patch,
here we are looking for the inverted overflow bit.
And that inversion is kinda problematic - given this particular
pattern we neither hoist that `not` closer to `ret` (then the pattern
would have been identical to the one without inversion,
and would have been handled by the previous patch), neither
do the opposite transform. But regardless, we should handle this too.
I've filled [[ https://bugs.llvm.org/show_bug.cgi?id=42720 | PR42720 ]].

Reviewers: nikic, spatel, xbolva00, RKSimon

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65151

llvm-svn: 370351
2019-08-29 12:48:04 +00:00
Roman Lebedev aaf6ab4410 [InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` overflow bit
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow and not zero
  %iszero = icmp ne i4 %y, 0
  %umul = umul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %retval.0 = and i1 %iszero, %umul.ov
  ret i1 %retval.0
=>
  %iszero = icmp ne i4 %y, 0
  %umul = umul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %retval.0 = and i1 %iszero, %umul.ov
  ret %umul.ov

Done: 1
Optimization is correct!
```

Reviewers: nikic, spatel, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65150

llvm-svn: 370350
2019-08-29 12:47:50 +00:00
Roman Lebedev 9f35d2b564 [SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values
Summary:
As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow'
and got rid of potential for division-by-zero, the control flow remains, we still have that branch.

We have this condition:
```
  // Don't fold i1 branches on PHIs which contain binary operators
  // These can often be turned into switches and other things.
  if (PN->getType()->isIntegerTy(1) &&
      (isa<BinaryOperator>(PN->getIncomingValue(0)) ||
       isa<BinaryOperator>(PN->getIncomingValue(1)) ||
       isa<BinaryOperator>(IfCond)))
    return false;
```
which was added back in rL121764 to help with `select` formation i think?

That check prevents us to flatten the CFG here, even though we know
we no longer need that guard and will be able to drop everything
but the '@llvm.umul.with.overflow' + `not`.

As it can be seen from tests, we end here because the `not` is being
sinked into the PHI's incoming values by InstCombine,
so we can't workaround this by hoisting it to after PHI.

Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`.

Reviewers: craig.topper, spatel, fhahn, nikic

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65147

llvm-svn: 370349
2019-08-29 12:47:34 +00:00
Roman Lebedev 473a063a5e [InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
$ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll
Processing /home/lebedevri/llvm-patch1.ll..

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp ne i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %r = extractvalue {i4, i1} %n0, 1
  ret %r

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp eq i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1
  ret i1 %r

Done: 1
Optimization is correct!

```

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65144

llvm-svn: 370348
2019-08-29 12:47:20 +00:00
Roman Lebedev fb38b7aab3 [InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`(-1 u/ %x) u< %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
----------------------------------------
Name: no overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ult i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow, swapped
  %o0 = udiv i4 -1, %x
  %r = icmp ugt i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp uge i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ule i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!
```

As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow`
is easy, if we were looking for the inverted answer, then more work needs to be done
to cleanup the now-pointless control-flow that was guarding against division-by-zero.
This is being addressed in follow-up patches.

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65143

llvm-svn: 370347
2019-08-29 12:47:08 +00:00
Roman Lebedev cc95a45f8a [CostModel] Model all `extractvalue`s as free.
Summary:
As disscussed in https://reviews.llvm.org/D65148#1606412,
`extractvalue` don't actually generate any code,
so we should treat them as free.

Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen

Reviewed By: jmolloy

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66098

llvm-svn: 370339
2019-08-29 11:50:30 +00:00
Jeremy Morse ca0e4b3689 [DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations
The missing line added by this patch ensures that only spilt variable
locations are candidates for being restored from the stack. Otherwise,
register or constant-value information can be interpreted as a spill
location, through a union.

The added regression test replicates a scenario where this occurs: the
stack load from [rsp] causes the register-location DBG_VALUE to be
"restored" to rsi, when it should be left alone. See PR43058 for details.

Un x-fail a test that was suffering from this from a previous patch.

Differential Revision: https://reviews.llvm.org/D66895

llvm-svn: 370334
2019-08-29 11:20:54 +00:00
Simon Pilgrim 6c2fc64edc Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 370333
2019-08-29 11:18:53 +00:00
Simon Pilgrim 27f43e6b1a Fix shadow variable warning. NFCI.
llvm-svn: 370332
2019-08-29 11:16:32 +00:00
George Rimar de0bc44883 [yaml2obj] - Allow placing local symbols after globals.
This allows us to produce broken binaries with local
symbols placed after global in '.dynsym'/'.symtab'

Also, simplifies the code.

Differential revision: https://reviews.llvm.org/D66799

llvm-svn: 370331
2019-08-29 10:58:47 +00:00
George Rimar 72e9584698 [llvm-readobj/llvm-readelf] - Report a proper warning when dumping a broken dynamic relocation.
When we have a dynamic relocation with a broken symbol's st_name,
tools report a useless error: "Invalid data was encountered while parsing the file".

After this change we report a warning + "<corrupt>" as a symbol name.

Differential revision: https://reviews.llvm.org/D66734

llvm-svn: 370330
2019-08-29 10:55:57 +00:00
David Green 942c2e3795 [ARM] MVE Masked loads and stores
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.

We also need to do something with unaligned loads/stores. Currently this uses a
similar method used in big endian, using an VLDRB.8 (and potentially a VREV in
BE). This does mean that the predicate mask is converted from, for example, a
v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of
the relevant mask lane, so this could potentially load different results if the
predicate is little odd. As the input is a v4i1 however, I believe this is OK
and all the bits required should be set in the predicate, making the VLDRB.8
load the same data.

Differential Revision: https://reviews.llvm.org/D66534

llvm-svn: 370329
2019-08-29 10:54:35 +00:00
Jeremy Morse 313d2ce999 [DebugInfo] LiveDebugValues should always revisit backedges if it skips them
The "join" method in LiveDebugValues does not attempt to join unseen
predecessor blocks if their out-locations aren't yet initialized, instead
the block should be re-visited later to see if any locations have changed
validity. However, because the set of blocks were all being "process"'d
once before "join" saw them, that logic in "join" was actually ignoring
legitimate out-locations on the first pass through. This meant that some
invalidated locations were not removed from the head of loops, allowing
illegal locations to persist.

Fix this by removing the run of "process" before the main join/process loop
in ExtendRanges. Now the unseen predecessors that "join" skips truly are
uninitialized, and we come back to the block at a later time to re-run
"join", see the @baz function added.

This also fixes another fault where stack/register transfers in the entry
block (or any other before-any-loop-block) had their tranfers initially
ignored, and were then never revisited. The MIR test added tests for this
behaviour.

XFail a test that exposes another bug; a fix for this is coming in D66895.

Differential Revision: https://reviews.llvm.org/D66663

llvm-svn: 370328
2019-08-29 10:53:29 +00:00
Roman Lebedev cc7495a355 [X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel
Summary:
We were previously doing it in DAGCombine.
But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine.
So if we had `sub %x, -1`, we'll transform it to `add %x, 1`,
which `combineIncDecVector()` will immediately transform back into `sub %x, -1`,
and here we go again...

I've marked this as NFC since not a single test changes,
but since that 'changes' DAGCombine, probably this isn't fully NFC.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62327

llvm-svn: 370327
2019-08-29 10:50:09 +00:00
Amaury Sechet 8365e42010 [DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y)
Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66718

llvm-svn: 370326
2019-08-29 10:35:51 +00:00
David Green e9211b764c [ARM] Masked load and store and predicate tests. NFC
llvm-svn: 370325
2019-08-29 10:32:12 +00:00
Roman Lebedev f13b0e3ed8 [InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399)
Summary:
Finally, the fold i was looking forward to :)

The legality check is muddy, i doubt  i've groked the full generalization,
but it handles all the cases i care about, and can come up with:
https://rise4fun.com/Alive/26j

I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one

I strongly suspect there is some better generalization, but i'm not aware of it as of right now.
For now i also avoided using actual `computeKnownBits()`, but restricted it to constants.

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66383

llvm-svn: 370324
2019-08-29 10:26:23 +00:00
Simon Pilgrim dfb2a19ac2 LegalizeSetCCCondCode - Reduce scope of NeedSwap to fix cppcheck warning. NFCI.
No need for this to be defined outside the only switch case its used in.

llvm-svn: 370320
2019-08-29 10:11:34 +00:00
Simon Pilgrim 920b04011b Fix variable set but no used warnings on NDEBUG builds. NFCI.
llvm-svn: 370319
2019-08-29 10:08:45 +00:00
Simon Pilgrim ef9c6a7077 Fix variable set but no used warning on NDEBUG builds. NFCI.
llvm-svn: 370317
2019-08-29 09:58:47 +00:00
Martin Storsjo 7ba81d95d5 [COFF] Add a ResourceSectionRef method for getting the data entry, print it in llvm-readobj
Differential Revision: https://reviews.llvm.org/D66819

llvm-svn: 370311
2019-08-29 09:00:14 +00:00
Martin Storsjo edb6ab9ba6 [COFF] Add a bounds checking helper for iterating a coff_resource_dir_table
Instead of blindly incrementing pointers in llvm-readobj, use this
helper, which does bounds checking against the available section
data.

Differential Revision: https://reviews.llvm.org/D66818

llvm-svn: 370310
2019-08-29 08:59:56 +00:00
Martin Storsjo 357a40ec7c [COFF] Fix error handling in ResourceSectionRef
Previously, the expression (Reader.readFoo()) was expanded twice,
triggering asserts as one of the Error types ends up not checked
(and as it was expanded twice, the method would end up called twice
if it failed first).

Differential Revision: https://reviews.llvm.org/D66817

llvm-svn: 370309
2019-08-29 08:59:41 +00:00
Martin Storsjo e3e8874b89 [llvm-readobj] Print the resource type textually for .res files
This already is done when dumping resources from coff objects.

Differential Revision: https://reviews.llvm.org/D66816

llvm-svn: 370308
2019-08-29 08:59:31 +00:00
Martin Storsjo cdb9aa6339 [llvm-readobj] Remove a leftover string trim operation. NFC.
This became unnecessary in SVN r359153.

Differential Revision: https://reviews.llvm.org/D66815

llvm-svn: 370307
2019-08-29 08:59:05 +00:00
Craig Topper c96284002e [X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load.
The DAG should have these as X86VBroadcast+load.

llvm-svn: 370299
2019-08-29 06:36:16 +00:00
Craig Topper 231e628d69 [X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128 bit input types.
We should always be shrinking the input to 128 bits or smaller
when the node is created.

llvm-svn: 370296
2019-08-29 06:02:11 +00:00
Hideto Ueno cbab334e40 [Attributor] Deduce "noalias" attribute
Summary:
This patch adds very basic deduction for noalias.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D66207

llvm-svn: 370295
2019-08-29 05:52:00 +00:00
Craig Topper 1ec5c204b8 [X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns.
We had an isel pattern to perform this, but its better to
do it in DAG combine as a simplification. This also fixes the lack
of patterns for AVX512 targets.

llvm-svn: 370294
2019-08-29 05:48:48 +00:00
Craig Topper 1aadf6f39f [X86] Make inline assembly 'x' and 'v' constraints work for f128.
Including a type legalizer fix to make bitcast operand promotion
work correctly when getSoftenedFloat returns f128 instead of i128.

Fixes PR43157

llvm-svn: 370293
2019-08-29 05:13:56 +00:00
Florian Hahn 3177b92231 [LoopUnroll] Use Lazy strategy for DTU used for MergeBlockIntoPredecessor.
We do not access the DT in the loop, so we do not have to apply updates
eagerly. We can apply them lazyly and flush them after we are done
merging blocks.

As follow-up work, we might be able to use the DTU above as well,
instead of manually updating the DT.

This brings the example from PR43134 from ~100s to ~4s for a relase +
assertions build on my machine.

Reviewers: efriedma, kuhar, asbirlea, brzycki

Reviewed By: kuhar, brzycki

Differential Revision: https://reviews.llvm.org/D66911

llvm-svn: 370292
2019-08-29 04:26:29 +00:00
Vitaly Buka db751c3778 [ObjectYAML] Fix lifetime issue in dumpDebugLines
Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66901

llvm-svn: 370289
2019-08-29 02:36:48 +00:00
Johannes Doerfert bf112139ac [Attributor] Improve messages in iteration verify mode
When we now verify the iteration count we will see the actual count
and the expected count before the assertion is triggered.

llvm-svn: 370285
2019-08-29 01:29:44 +00:00
Johannes Doerfert a283125ef2 [Attributor][NFC] Add const to map key
llvm-svn: 370284
2019-08-29 01:28:30 +00:00
Johannes Doerfert 62a9c1da78 [Attributor][Fix] Indicate change correctly
llvm-svn: 370283
2019-08-29 01:26:58 +00:00
Johannes Doerfert 1aac182f31 [Attributor] Fix typo
llvm-svn: 370282
2019-08-29 01:26:09 +00:00
Matt Arsenault 216d8ff60b AMDGPU: Don't use frame virtual registers
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
increment and restored.

This does present another possible issue if spilling is needed for the
unused carry out if an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).

llvm-svn: 370281
2019-08-29 01:13:47 +00:00
Matt Arsenault 8ec5c10042 GlobalISel/TableGen: Handle setcc patterns
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.

This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.

llvm-svn: 370280
2019-08-29 01:13:41 +00:00
Richard Trieu e4a7f0182d Add requirement to test.
-debug-only option for llc is only available in debug builds so
"REQUIRES: asserts" is needed in the tes.

llvm-svn: 370279
2019-08-29 00:46:57 +00:00
Craig Topper af364131af [X86] Fix a couple isel patterns to not shrink a volatile load.
Also add a FIXME because I'm not sure why these patterns exist. Looks like a missing combine.

And another FIXME because the AVX512 equivalent one of the patterns is missing.

llvm-svn: 370276
2019-08-28 23:45:10 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Heejin Ahn d85fd5a3f4 [WebAssembly] Add atomic.fence instruction
Summary:
This adds `atomic.fence` instruction:
https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#fence-operator

And we now emit the new `atomic.fence` instruction for multithread
fences, rather than the prevous `atomic.rmw` hack.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, tlively, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66794

llvm-svn: 370272
2019-08-28 23:13:43 +00:00
Tom Stellard 5be949e3d0 [LLVM-C] Fix omission of INSTALL_WITH_TOOLCHAIN to llvm_add_library()
Due to a misstake with r365902 that tried to simplify the install with
toolchain logic LLVM-C.dll was no longer being installed.

Patch By: Jakob Bornecrantz

llvm-svn: 370271
2019-08-28 22:59:04 +00:00
Simon Atanasyan 027f1da010 [mips] Add an empty line to separate different patterns. NFC
llvm-svn: 370269
2019-08-28 22:32:16 +00:00
Simon Atanasyan 59bb3609fa [mips] Fix 64-bit address loading in case of applying 32-bit mask to the result
If result of 64-bit address loading combines with 32-bit mask, LLVM
tries to optimize the code and remove "redundant" loading of upper
32-bits of the address. It leads to incorrect code on MIPS64 targets.

MIPS backend creates the following chain of commands to load 64-bit
address in the `MipsTargetLowering::getAddrNonPICSym64` method:
```
(add (shl (add (shl (add %highest(sym), %higher(sym)),
                    16),
               %hi(sym)),
          16),
     %lo(%sym))
```

If the mask presents, LLVM decides to optimize the chain of commands. It
really does not make sense to load upper 32-bits because the 0x0fffffff
mask anyway clears them. After removing redundant commands we get this
chain:
```
(add (shl (%hi(sym), 16), %lo(%sym))
```

There is no patterns matched `(MipsHi (i64 symbol))`. Due a bug in `SYM_32`
predicate definition, backend incorrectly selects a pattern for a 32-bit
symbols and uses the `lui` instruction for loading `%hi(sym)`.

As a result we get incorrect set of instructions with unnecessary 16-bit
left shifting:
```
lui     at,0x0
    R_MIPS_HI16     foo
dsll    at,at,0x10
daddiu  at,at,0
    R_MIPS_LO16     foo
```

This patch resolves two problems:
- Fix `SYM_32/SYM_64` predicates to prevent selection of patterns dedicated
  to 32-bit symbols in case of using N64 ABI.
- Add missed patterns for 64-bit symbols for `%hi/%lo`.

Fix PR42736.

Differential Revision: https://reviews.llvm.org/D66228

llvm-svn: 370268
2019-08-28 22:32:10 +00:00
Jessica Paquette 01cd91aaea Add tie-breaker for register class sorting in getSuperRegForSubReg
llvm::stable_sort is apparently not sufficient.

Use the same tie-breaker/sorting style as TopoOrderRC fix bot failures.

E.g.

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/19401/steps/test-check-all/logs/stdio

llvm-svn: 370267
2019-08-28 22:03:05 +00:00
Artur Pilipenko 925afc1ce7 Fix for "DICompileUnit not listed in llvm.dbg.cu" verification error after ...
...cloning a function from a different module

Currently when a function with debug info is cloned from a different module, the 
cloned function may have hanging DICompileUnits, so that the module with the 
cloned function fails debug info verification.

The proposed fix inserts all DICompileUnits reachable from the cloned function 
to "llvm.dbg.cu" metadata operands of the cloned function module. 

Reviewed By: aprantl, efriedma

Differential Revision: https://reviews.llvm.org/D66510

Patch by Oleg Pliss (Oleg.Pliss@azul.com)

llvm-svn: 370265
2019-08-28 21:27:50 +00:00
Jason Liu a1178b862a [llvm-readobj][XCOFF][NFC] Add return statement to avoid -Wimplicit-fallthrough warning
This is to fix the commit in r370097.

llvm-svn: 370260
2019-08-28 20:59:17 +00:00
Julian Lettner 3ae9b9d5e4 [ASan] Make insertion of version mismatch guard configurable
By default ASan calls a versioned function
`__asan_version_mismatch_check_vXXX` from the ASan module constructor to
check that the compiler ABI version and runtime ABI version are
compatible. This ensures that we get a predictable linker error instead
of hard-to-debug runtime errors.

Sometimes, however, we want to skip this safety guard. This new command
line option allows us to do just that.

rdar://47891956

Reviewed By: kubamracek

Differential Revision: https://reviews.llvm.org/D66826

llvm-svn: 370258
2019-08-28 20:40:55 +00:00
James Y Knight f025968bcc Ignore object files that lack coverage information.
Before this change, if multiple binary files were presented, all of them must have been instrumented or the load would fail with coverage_map_error::no_data_found.

Patch by Dean Sturtevant.

Differential Revision: https://reviews.llvm.org/D66763

llvm-svn: 370257
2019-08-28 20:35:50 +00:00
Philip Reames 0b62951e1d Use the handle --check-prefixes mechanism to de-verbosify a couple atomics tests [NFC]
llvm-svn: 370256
2019-08-28 20:27:39 +00:00
Jessica Paquette 7080ffa21a [GlobalISel] Import patterns containing SUBREG_TO_REG
Reuse the logic for INSERT_SUBREG to also import SUBREG_TO_REG patterns.

- Split `inferSuperRegisterClass` into two functions, one which tries to use
  an existing TreePatternNode (`inferSuperRegisterClassForNode`), and one that
  doesn't. SUBREG_TO_REG doesn't have a node to leverage, which is the cause
  for the split.

- Rename GlobalISelEmitterInsertSubreg.td to GlobalISelEmitterSubreg.td and
  update it.

- Update impacted tests in the AArch64 and X86 backends.

This is kind of a hit/miss for code size improvements/regressions. E.g. in
add-ext.ll, we now get some identity copies. This isn't really anything the
importer can handle, since it's caused by a later pass introducing the copy for
the sake of correctness.

Differential Revision: https://reviews.llvm.org/D66769

llvm-svn: 370254
2019-08-28 20:12:31 +00:00
Nico Weber 6acfc7c587 gn build: Merge r370249
llvm-svn: 370251
2019-08-28 19:38:59 +00:00
Scott Linder 04f6f25421 [AMDGPU] Fix bug when calculating user_spgr_count for Code Object V3 assembler
Stop counting explicitly disabled user_spgr's in the user_sgpr_count field of the kernel descriptor.

Differential Revision: https://reviews.llvm.org/D66900

llvm-svn: 370250
2019-08-28 19:38:15 +00:00
Sanjay Patel 2d4b6777c4 [InstCombine] clean up wrap propagation for reassociated ops; NFCI
Always true/false checks were flagged by static analysis;
https://bugs.llvm.org/show_bug.cgi?id=43143

I have not confirmed the logic difference in propagating nsw vs. nuw,
but presumably we would have noticed a bug by now if that was wrong.

llvm-svn: 370248
2019-08-28 18:58:06 +00:00
Pirama Arumuga Nainar 19205abaaa [ValueMapper] NFC: Remove dead code to pause metadata mapping
Summary:
This functionality was added when Mapper::mapMetadata was recursive.  It
is no longer needed after r265456, which switched it to be iterative.

Reviewers: dexonsmith, srhines

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66860

llvm-svn: 370236
2019-08-28 17:43:14 +00:00
Craig Topper a47db7110d [X86][ReleaseNotes] Add a note about the switch to widening legalization for narrow vectors.
llvm-svn: 370233
2019-08-28 17:18:56 +00:00
Johannes Doerfert f7ca0fe1c8 [Attributor] Regularly clear dependences to remove spurious ones
As dependences between abstract attributes can become stale, e.g., if
one was sufficient to imply another one at some point but it has since
been wakened to the point it is not usable for the formerly implied one.
To weed out spurious dependences, and thereby eliminate unneeded
updates, we introduce an option to determine how often the dependence
cache is cleared and recomputed during the fixpoint iteration.

Note that the initial value was determined such that we see a positive
result on our tests.

Differential Revision: https://reviews.llvm.org/D63315

llvm-svn: 370230
2019-08-28 16:58:52 +00:00
Kevin P. Neal ddf13c00ed [FPEnv] Add fptosi and fptoui constrained intrinsics.
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.

Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.

Reviewed by:	Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by:	Craig Topper
Differential Revision:	http://reviews.llvm.org/D63782

llvm-svn: 370228
2019-08-28 16:33:36 +00:00
Jessica Paquette af0bd41e06 [AArch64][GlobalISel] Fall back when translating musttail calls
These are currently translated as normal functions calls in AArch64.

Until we have proper tail call lowering, we shouldn't translate these.

Differential Revision: https://reviews.llvm.org/D66842

llvm-svn: 370225
2019-08-28 16:19:01 +00:00
Simon Pilgrim 1d8a886c59 Reduce scope of variable only used in a local pattern match. NFCI.
llvm-svn: 370224
2019-08-28 16:06:33 +00:00
David Bolvansky 420327269e [NFC] Added more tests for D66651
llvm-svn: 370222
2019-08-28 16:00:15 +00:00
Craig Topper f79d8a064c [InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs
Due to missing vector support in this function, recursion can
generate worse code in some cases.

llvm-svn: 370221
2019-08-28 15:40:34 +00:00
Simon Pilgrim b569624049 Fix uninitialized variable warning in cppcheck. NFCI.
InstCombiner::MaxArraySizeForCombine is set outside the constructor so we need to ensure it has a default initialization value.

llvm-svn: 370220
2019-08-28 15:19:49 +00:00
David Bolvansky af118bb6d0 [NFC] Added a comment to avoid possible confusion
llvm-svn: 370217
2019-08-28 15:04:48 +00:00
Ryan Taylor 3b1459ed7c [AMDGPU] Adjust number of SGPRs available in Calling Convention
This reduces the number of SGPRs due to some concerns about running
out of SGPRs if you make all the SGPRs that aren't reserved available
for the calling convention.

Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
2019-08-28 15:00:45 +00:00
Simon Pilgrim 316bfb0f48 Remove duplicate 'BitWidth' variable. NFCI.
llvm-svn: 370212
2019-08-28 14:37:44 +00:00
Johannes Doerfert 07a5c129c6 [Attributor] Restrict liveness and return information to functions
Summary:
Until we have proper call-site information we should not recompute
liveness and return information for each call site. This patch directly
uses the function versions and introduces TODOs at the usage sites.

The required iterations to get to the fixpoint are most of the time
reduced by this change and we always avoid work duplication.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66562

llvm-svn: 370208
2019-08-28 14:09:14 +00:00
Simon Pilgrim 284118ce3b InstCombiner::visitSelectInst - rename Pred to MinMaxPred to stop shadow variable warning. NFCI.
We have a lot of Predicate variables, all similarly named....

llvm-svn: 370207
2019-08-28 14:05:38 +00:00
Vlad Tsyrklevich b8a96f4bf5 Reland "[yaml2obj] - Don't allow setting StOther and Other/Visibility at the same time."
This relands this commit, I mistakenly reverted the original change
thinking it was the cause of the observed MSan failures but it was not.

llvm-svn: 370206
2019-08-28 14:04:09 +00:00
Hans Wennborg cff90f07cb [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711)
Neither libgcc or compiler-rt are usually used on Windows, so these
functions can't be called.

Differential revision: https://reviews.llvm.org/D66880

llvm-svn: 370204
2019-08-28 13:55:10 +00:00
Amaury Sechet 3b44c36b29 [X86] Add test for rotate combining when add X, X is used instead of shl X, 1. NFC
llvm-svn: 370203
2019-08-28 13:52:32 +00:00
Vlad Tsyrklevich aba62e9c00 Revert "[yaml2obj] - Don't allow setting StOther and Other/Visibility at the same time."
This reverts commit r370032, it was causing check-llvm failures on
sanitizer-x86_64-linux-bootstrap-msan

llvm-svn: 370198
2019-08-28 13:15:08 +00:00
Simon Pilgrim 14e07d7f4b [DAGCombine] Fix cppcheck shadow variable warning. NFCI.
We already have an outer Ops variable.

llvm-svn: 370197
2019-08-28 12:48:41 +00:00
Simon Atanasyan f46ba4f077 [mips] Use less registers to load address of TargetExternalSymbol
There is no pattern matched `add hi, (MipsLo texternalsym)`. As a result,
loading an address of 32-bit symbol requires two registers and one more
additional instruction:
```
addiu $1, $zero, %lo(foo)
lui   $2, %hi(foo)
addu  $25, $2, $1
```

This patch adds the missed pattern and enables generation more effective
set of instructions:
```
lui   $1, %hi(foo)
addiu $25, $1, %lo(foo)
```

Differential Revision: https://reviews.llvm.org/D66771

llvm-svn: 370196
2019-08-28 12:35:53 +00:00
Amaury Sechet 4f4387dd12 [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

llvm-svn: 370190
2019-08-28 12:00:06 +00:00
Simon Pilgrim c5b38e2869 [DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI.
These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors.

llvm-svn: 370189
2019-08-28 11:50:36 +00:00
Nico Weber d2f5854567 gn build: Merge r370187
llvm-svn: 370188
2019-08-28 11:42:20 +00:00
David Green 379f6186dd [ARM] Move MVEVPTBlockPass to a separate file. NFC
This just pulls the MVEVPTBlockPass into a separate file, as opposed to being
wrapped up in Thumb2ITBlockPass.

Differential revision: https://reviews.llvm.org/D66579

llvm-svn: 370187
2019-08-28 11:37:31 +00:00
David Green 1c5b143c99 [MVE] VMOVX patterns
This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some
adjustments for MVE. It allows us to move fp16 registers without going into and
out of gprs.

VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits
of another register, zeroing the rest. This can be used for odd MVE register
lanes. The top bits are not read by fp16 instructions, so no move is required
there if we are dealing with even lanes.

Differential revision: https://reviews.llvm.org/D66793

llvm-svn: 370184
2019-08-28 10:13:23 +00:00
Hans Wennborg 0af82068a8 [LLVM-C] Fix ByVal Attribute crashing
With the introduction of the typed byval attribute change there was no
way that the LLVM-C API could create the correct class Attribute. If a
program that uses the C API creates a ByVal attribute and annotates a
function with that attribute LLVM will crash when it assembles or write
that module containing the function out as bitcode.

This change is a minimal fix to at least allow code to work, this is
because the byval change is on the 9.0 and I don't want to introduce new
LLVM-C API this late in the release cycle.

By Jakob Bornecrantz!

Differential revision: https://reviews.llvm.org/D66144

llvm-svn: 370176
2019-08-28 09:21:56 +00:00
Ayal Zaks d15df0ede5 [LV] Fold tail by masking - handle reductions
Allow vectorizing loops that have reductions when tail is folded by masking.
A select is introduced in VPlan, choosing between the last value carried by the
loop-exit/live-out instruction of the reduction, and the penultimate value
carried by the reduction phi, according to the "i < n" mask of fold-tail.
This select replaces the last value as the live-out value of the loop.

Differential Revision: https://reviews.llvm.org/D66720

llvm-svn: 370173
2019-08-28 09:02:23 +00:00
Sam Parker a761ba0f2d [ARM][ParallelDSP] Change search for muls
rL369567 reverted a couple of recent changes made to ARMParallelDSP
because of a miscompilation error: PR43073.

The issue stemmed from an underlying bug that was caused by adding
muls into a reduction before it was proved that they could be executed
in parallel with another mul.

Most of the changes here are from the previously reverted commits.
The additional changes have been made area:
1) The Search function now doesn't insert any muls into the Reduction
   object. That now happens once the search has successfully finished.
2) For any muls added into the reduction but that weren't paired, we
   accumulate their values as an input into the smlad.

Differential Revision: https://reviews.llvm.org/D66660

llvm-svn: 370171
2019-08-28 08:51:13 +00:00
David Bolvansky 207c653965 [NFC] Unbreak tests
llvm-svn: 370170
2019-08-28 08:42:40 +00:00
David Bolvansky a0a8dd225d [NFC] Updated test
llvm-svn: 370169
2019-08-28 08:40:45 +00:00
David Bolvansky 05bda8b4e5 Annotate return values of allocation functions with dereferenceable_or_null
Summary:
Example
define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
  ret i8* %call
}


Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66651

llvm-svn: 370168
2019-08-28 08:28:20 +00:00
Yi Kong b9d87b9528 [llvm-objdump] Add the missing ARMv8 subarch detection
Differential Revision: https://reviews.llvm.org/D66849

llvm-svn: 370163
2019-08-28 06:37:22 +00:00
Fangrui Song 6964027315 [LoopFusion] Fix another -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build
llvm-svn: 370156
2019-08-28 03:12:40 +00:00
Matt Arsenault a8bbcbd006 AMDGPU/GlobalISel: Fix constraining scalar and/or/xor
If the result register already had a register class assigned, the
sources may not have been properly constrained.

llvm-svn: 370150
2019-08-28 02:11:03 +00:00