Commit Graph

255 Commits

Author SHA1 Message Date
serge_sans_paille 39f50da2a3 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

Differential Revision: https://reviews.llvm.org/D68720
2020-02-07 10:56:15 +01:00
Hans Wennborg 5852475e2c Bump the trunk major version to 11
and clear the release notes.
2020-01-15 13:38:01 +01:00
Fangrui Song 03b9f0a5e1 Ignore "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" in favor of "frame-pointer"
D56351 (included in LLVM 8.0.0) introduced "frame-pointer".  All tests
which use "no-frame-pointer-elim" or "no-frame-pointer-elim-non-leaf"
have been migrated to use "frame-pointer".

Implement UpgradeFramePointerAttributes to upgrade the two obsoleted
function attributes for bitcode. Their semantics are ignored.

Differential Revision: https://reviews.llvm.org/D71863
2019-12-30 09:46:19 -08:00
Florian Hahn 5762648c46 [Docs] Fix sphinx build errors. 2019-12-23 21:53:30 +01:00
Don Hinton 6555995a6d [CommandLine] Add callbacks to Options
Summary:
Add a new cl::callback attribute to Option.

This attribute specifies a callback function that is called when
an option is seen, and can be used to set other options, as in
option A implies option B.  If the option is a `cl::list`, and
`cl::CommaSeparated` is also specified, the callback will fire
once for each value.  This could be used to validate combinations
or selectively set other options.

Reviewers: beanz, thomasfinch, MaskRay, thopre, serge-sans-paille

Reviewed By: beanz

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70620
2019-12-06 15:16:45 -08:00
Sourabh Singh Tomar f1e3988aa6 Recommit "[DWARF5]Addition of alignment atrribute in typedef DIE."
This revision is revised to update Go-bindings and Release Notes.

The original commit message follows.

This patch, adds support for DW_AT_alignment[DWARF5] attribute, to be emitted with typdef DIE.
When explicit alignment is specified.

Patch by Awanish Pandey <Awanish.Pandey@amd.com>

Reviewers: aprantl, dblaikie, jini.susan.george, SouraVX, alok,
deadalinx

Differential Revision: https://reviews.llvm.org/D70111
2019-12-03 09:51:43 +05:30
Tom Stellard 3ffbf9720f [cmake] Remove LLVM_{BUILD,LINK}_LLVM_DYLIB options on Windows
Summary: The options aren't supported so they can be removed.

Reviewers: beanz, smeenai, compnerd

Reviewed By: compnerd

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69877
2019-11-08 10:37:16 -08:00
Craig Topper b2b6a54f84 [X86] Add support for -mvzeroupper and -mno-vzeroupper to match gcc
-mvzeroupper will force the vzeroupper insertion pass to run on
CPUs that normally wouldn't. -mno-vzeroupper disables it on CPUs
where it normally runs.

To support this with the default feature handling in clang, we
need a vzeroupper feature flag in X86.td. Since this flag has
the opposite polarity of the fast-partial-ymm-or-zmm-write we
used to use to disable the pass, we now need to add this new
flag to every CPU except KNL/KNM and BTVER2 to keep identical
behavior.

Remove -fast-partial-ymm-or-zmm-write which is no longer used.

Differential Revision: https://reviews.llvm.org/D69786
2019-11-04 11:03:54 -08:00
Roman Lebedev c4b757be02
Revert BCmp Loop Idiom recognition transform (PR43870)
As discussed in https://bugs.llvm.org/show_bug.cgi?id=43870,
this transform is missing a crucial legality check:
the old (non-countable) loop would early-return upon first mismatch,
but there is no such guarantee for bcmp/memcmp.

We'd need to ensure that [PtrA, PtrA+NBytes) and [PtrB, PtrB+NBytes)
are fully dereferenceable memory regions. But that would limit
the transform to constant loop trip counts and would further
cripple it because dereferenceability analysis is *very* partial.

Furthermore, even if all that is done, every single test
would need to be rewritten from scratch.

So let's just give up.
2019-11-02 12:48:03 +03:00
Alina Sbirlea bbb43df011 [ReleaseNotes] Add item on deleting the BasicBlockPass(Manager). 2019-10-30 14:26:46 -07:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
Alina Sbirlea c0e6a92e34 Update ReleaseNotes: expand the section on enabling MemorySSA
llvm-svn: 375045
2019-10-16 21:52:09 +00:00
Roman Lebedev 76cdcf25b8 [LoopIdiomRecognize] Recommit: BCmp loop idiom recognition
Summary:
This is a recommit, this originally landed in rL370454 but was
subsequently reverted in  rL370788 due to
https://bugs.llvm.org/show_bug.cgi?id=43206
The reduced testcase was added to bcmp-negative-tests.ll
as @pr43206_different_loops - we must ensure that the SCEV's
we got are both for the same loop we are currently investigating.

Original commit message:

@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: miyuki, hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

llvm-svn: 374662
2019-10-12 15:35:32 +00:00
Roman Lebedev 536b0ee40a [UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour
Summary:
Quote from http://eel.is/c++draft/expr.add#4:
```
4     When an expression J that has integral type is added to or subtracted
      from an expression P of pointer type, the result has the type of P.
(4.1) If P evaluates to a null pointer value and J evaluates to 0,
      the result is a null pointer value.
(4.2) Otherwise, if P points to an array element i of an array object x with n
      elements ([dcl.array]), the expressions P + J and J + P
      (where J has the value j) point to the (possibly-hypothetical) array
      element i+j of x if 0≤i+j≤n and the expression P - J points to the
      (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
(4.3) Otherwise, the behavior is undefined.
```

Therefore, as per the standard, applying non-zero offset to `nullptr`
(or making non-`nullptr` a `nullptr`, by subtracting pointer's integral value
from the pointer itself) is undefined behavior. (*if* `nullptr` is not defined,
i.e. e.g. `-fno-delete-null-pointer-checks` was *not* specified.)

To make things more fun, in C (6.5.6p8), applying *any* offset to null pointer
is undefined, although Clang front-end pessimizes the code by not lowering
that info, so this UB is "harmless".

Since rL369789 (D66608 `[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null`)
LLVM middle-end uses those guarantees for transformations.
If the source contains such UB's, said code may now be miscompiled.
Such miscompilations were already observed:
* https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190826/687838.html
* https://github.com/google/filament/pull/1566

Surprisingly, UBSan does not catch those issues
... until now. This diff teaches UBSan about these UB's.

`getelementpointer inbounds` is a pretty frequent instruction,
so this does have a measurable impact on performance;
I've addressed most of the obvious missing folds (and thus decreased the performance impact by ~5%),
and then re-performed some performance measurements using my [[ https://github.com/darktable-org/rawspeed | RawSpeed ]] benchmark:
(all measurements done with LLVM ToT, the sanitizer never fired.)
* no sanitization vs. existing check: average `+21.62%` slowdown
* existing check vs. check after this patch: average `22.04%` slowdown
* no sanitization vs. this patch: average `48.42%` slowdown

Reviewers: vsk, filcab, rsmith, aaron.ballman, vitalybuka, rjmccall, #sanitizers

Reviewed By: rsmith

Subscribers: kristof.beyls, nickdesaulniers, nikic, ychen, dtzWill, xbolva00, dberris, arphaman, rupprecht, reames, regehr, llvm-commits, cfe-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D67122

llvm-svn: 374293
2019-10-10 09:25:02 +00:00
Craig Topper 635d383fad [X86] Enable -mprefer-vector-width=256 by default for Skylake-avx512 and later Intel CPUs.
AVX512 instructions can cause a frequency drop on these CPUs. This
can negate the performance gains from using wider vectors. Enabling
prefer-vector-width=256 will prevent generation of zmm registers
unless explicit 512 bit operations are used in the original source
code.

I believe gcc and icc both do something similar to this by default.

Differential Revision: https://reviews.llvm.org/D67259

llvm-svn: 371694
2019-09-11 23:54:36 +00:00
Alina Sbirlea a6e0bef312 Update ReleaseNotes: add enabling of MemorySSA.
llvm-svn: 371569
2019-09-10 23:22:37 +00:00
Craig Topper 5ebd0a6e88 [SelectionDAG] Remove ISD::FP_ROUND_INREG
I don't think anything in tree creates this node. So all of this
code appears to be dead.

Code coverage agrees
http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html

Differential Revision: https://reviews.llvm.org/D67312

llvm-svn: 371431
2019-09-09 17:54:44 +00:00
Roman Lebedev bdd65351d3 Revert r370454 "[LoopIdiomRecognize] BCmp loop idiom recognition"
https://bugs.llvm.org/show_bug.cgi?id=43206 was filed,
claiming that there is a miscompilation.
Reverting until i investigate.

This reverts commit r370454

llvm-svn: 370788
2019-09-03 17:14:56 +00:00
Craig Topper 18e8d02e8c [X86] Pass v32i16/v64i8 in zmm registers on KNL target.
gcc and icc pass these types in zmm registers in zmm registers.

This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.

Fixes PR42957

Differential Revision: https://reviews.llvm.org/D66708

llvm-svn: 370495
2019-08-30 17:35:08 +00:00
Roman Lebedev 5c9f3cfec7 [LoopIdiomRecognize] BCmp loop idiom recognition
Summary:
@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

llvm-svn: 370454
2019-08-30 09:51:23 +00:00
Craig Topper 5a43fdd313 [X86] Remove what little support we had for MPX
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define

I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.

gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX

Differential Revision: https://reviews.llvm.org/D66669

llvm-svn: 370393
2019-08-29 18:09:02 +00:00
Craig Topper a47db7110d [X86][ReleaseNotes] Add a note about the switch to widening legalization for narrow vectors.
llvm-svn: 370233
2019-08-28 17:18:56 +00:00
Hans Wennborg dba4dd1e8d Revert r367941 "Add a note to the release not about a potentially breaking optimization"
The note was moved to the release_90 branch in r367997.

llvm-svn: 367998
2019-08-06 08:32:33 +00:00
Philip Reames e39e79358f Add a note to the release not about a potentially breaking optimization
This has come up twice already (once in pr42763 and once in the commit thread), so give warning of a new way in which UB can result in unexpected program behavior.

llvm-svn: 367941
2019-08-05 22:34:59 +00:00
Tim Northover a009a60a91 IR: print value numbers for unnamed function arguments
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.

Also modifies the parser to accept IR in that form for obvious reasons.

llvm-svn: 367755
2019-08-03 14:28:34 +00:00
Hans Wennborg 8f5b44aead Bump the trunk version to 10.0.0svn
and clear the release notes.

llvm-svn: 366427
2019-07-18 11:51:05 +00:00
Matt Arsenault 269e4e1b60 Add some release notes for 9.0 release
llvm-svn: 366093
2019-07-15 17:50:28 +00:00
Pavel Labath 9eb4b96be0 Add lldb type unit support to the release notes
Reviewers: JDevlieghere, teemperor

Subscribers: llvm-commits, lldb-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64366

llvm-svn: 365568
2019-07-09 22:36:43 +00:00
Simon Pilgrim 456fc4fa6d Retire VS2015 Support
As proposed here: https://lists.llvm.org/pipermail/llvm-dev/2019-June/133147.html

This patch raises the minimum supported version to build LLVM/Clang to Visual Studio 2017.

Differential Revision: https://reviews.llvm.org/D64326

llvm-svn: 365452
2019-07-09 10:10:48 +00:00
Jonas Devlieghere 7626e1e504 Add lldb-mi deprecation to the release notes
Differential revision: https://reviews.llvm.org/D64254

llvm-svn: 365231
2019-07-05 18:23:52 +00:00
Jonas Devlieghere bb65a38b56 Add LLDB section to the release notes
llvm-svn: 365228
2019-07-05 17:58:30 +00:00
Serge Guelton 85fc597f26 Document legacy pass manager extension points
Differential Revision: https://reviews.llvm.org/D64093

llvm-svn: 365142
2019-07-04 14:03:11 +00:00
Tim Northover b7141207a4 Reapply: IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362128
2019-05-30 18:48:23 +00:00
Fangrui Song f4dfd63c74 [IR] Disallow llvm.global_ctors and llvm.global_dtors of the 2-field form in textual format
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0

For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.

For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).

Reviewed By: rnk, dexonsmith

Differential Revision: https://reviews.llvm.org/D61547

llvm-svn: 360742
2019-05-15 02:35:32 +00:00
Roman Lebedev a6be919c92 [Docs] ReleaseNotes: fixup markup in memcmp()->bcmp() entry
llvm-svn: 358986
2019-04-23 13:46:18 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
Clement Courbet 8e16d73346 [SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672
2019-03-08 09:07:45 +00:00
Hans Wennborg 1fe469ae6c Bump the trunk version to 9.0.0svn
llvm-svn: 351320
2019-01-16 10:57:02 +00:00
Derek Schuff 5e54bc18e2 [WebAssembly] Update release notes
Summary:
Explicitly note that multithreading support is not included in the stable
ABI.

Differential Revision: https://reviews.llvm.org/D56681

llvm-svn: 351213
2019-01-15 17:54:42 +00:00
Dan Gohman 220dcdb997 [WebAssembly] Add a release notes blurb
Bid farewell to LLVM_EXPERIMENTAL_TARGETS_TO_BUILD!

Differential Revision: https://reviews.llvm.org/D56648

llvm-svn: 351083
2019-01-14 18:20:30 +00:00
Roman Lebedev 629d9804e3 ReleaseNotes: X86 Target: bdver2 sched model was added (D52779)
llvm-svn: 350053
2018-12-24 12:12:26 +00:00
Tom Stellard ed713c55b2 ReleaseNotes: Document removal of add_llvm_loadable_module CMake macro
This was removed in r349839.

llvm-svn: 349921
2018-12-21 16:20:37 +00:00
Max Moroz b2091c930b [llvm-cov] Add lcov tracefile export format.
Summary:
lcov tracefiles are used by various coverage reporting tools and build
systems (e.g., Bazel). It is a simple text-based format to parse and
more convenient to use than the JSON export format, which needs
additional processing to map regions/segments back to line numbers.

It's a little unfortunate that "text" format is now overloaded to refer
specifically to JSON for export, but I wanted to avoid making any
breaking changes to the UI of the llvm-cov tool at this time.

Patch by Tony Allevato (@allevato).

Reviewers: Dor1s, vsk

Reviewed By: Dor1s, vsk

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D54266

llvm-svn: 346506
2018-11-09 16:10:44 +00:00
Hans Wennborg 7c890242cb ReleaseNotes: update links to use https
llvm-svn: 341785
2018-09-10 08:50:31 +00:00
Hans Wennborg 67500081a2 Clear release notes and update version
llvm-svn: 338556
2018-08-01 13:58:00 +00:00
Duncan P. N. Exon Smith 0f81faed05 ADT: Document advantages of SmallVector<T,0> over std::vector
In light of the recent changes to SmallVector in r335421, r337514, and
r337820, document its advantages over std::vector (see r175906 and
r266909).

Also add a release note.

https://reviews.llvm.org/D49748

llvm-svn: 338071
2018-07-26 21:29:54 +00:00
David Carlier 6887aa8adc [docs] add various sanitisers support for FreeBSD/OpenBSD
since couple of months, supports had been enabled for FreeBSD and OpenBSD.

Reviewers: thakis, spatel, dim

Reviewed By: dim

Differential Revision: https://reviews.llvm.org/D47322

llvm-svn: 334207
2018-06-07 16:33:48 +00:00
Amaury Sechet 93a7d2aa3c Get rid of SETCCE
Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in tree backend.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47685

llvm-svn: 333939
2018-06-04 18:36:22 +00:00
Nicola Zaghen 771e3beea6 [ReleaseNotes] Formatting fixes.
llvm-svn: 333902
2018-06-04 14:40:34 +00:00
Nicola Zaghen 9438b15946 [ReleaseNotes] Add release note for the new LLVM_DEBUG macro.
This is to provide a way to migrate from the old DEBUG macro to the new one.

Differential Revision: https://reviews.llvm.org/D47528

llvm-svn: 333898
2018-06-04 13:55:09 +00:00
Amaury Sechet 8467411dad Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Target that uses these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

llvm-svn: 333748
2018-06-01 13:21:33 +00:00
Fangrui Song afa95ee03d [LLVM-C] [OCaml] Remove LLVMAddBBVectorizePass
Summary: It was fully replaced back in 2014, and the implementation was removed 11 months ago by r306797.

Reviewers: hfinkel, chandlerc, whitequark, deadalnix

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47436

llvm-svn: 333378
2018-05-28 16:58:10 +00:00
Piotr Padlewski ce358262eb Dissallow non-empty metadata for invariant.group
Summary:
This feature is not needed, but it might be usefull in the future
to use metadata to mark what which function should support it
(and strip it when not).

Reviewers: rsmith, sanjoy, amharc, kuhar

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45419

llvm-svn: 332787
2018-05-18 23:53:46 +00:00
Piotr Padlewski 5dde809404 Rename invariant.group.barrier to launder.invariant.group
Summary:
This is one of the initial commit of "RFC: Devirtualization v2" proposal:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing

Reviewers: rsmith, amharc, kuhar, sanjoy

Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45111

llvm-svn: 331448
2018-05-03 11:03:01 +00:00
Nico Weber dd3c75a067 Stop setting LLVM_ON_WIN32 in config.h and llvm-config.h.
See thread "Replacing LLVM_ON_WIN32 with just _WIN32" on llvm-dev and cfe-dev.

I replaced all uses of LLVM_ON_WIN32 with _WIN32 in r331127 (llvm),
r331069 (clang), r329697 (lldb), r329696 (lld), r329696 (clang-tools-extra).

If your out-of-tree program used LLVM_ON_WIN32, just use _WIN32 instead, which
is set at exactly the same time to exactly the same value.

https://reviews.llvm.org/D46264

llvm-svn: 331224
2018-04-30 20:19:48 +00:00
Sanjay Patel 1babf5ff32 [DAGCombiner] rename function attribute for disabling ftrunc transform
This is the matching name change for the Clang patch at:
D46236
rL331209

Differential Revision: https://reviews.llvm.org/D46237 

llvm-svn: 331210
2018-04-30 18:20:33 +00:00
Sanjay Patel f6d595bd44 [docs] add fp-cast-overflow-workaround options to release notes
llvm-svn: 331059
2018-04-27 16:33:35 +00:00
Sanjay Patel c5ded68077 [docs] provide the specific sanitizer option to detect junk-in-the-ftrunc
llvm-svn: 330958
2018-04-26 17:04:07 +00:00
Sanjay Patel 3d453ad711 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

  Optimization of floating-point casts is improved. This may cause surprising
  results for code that is relying on undefined behavior. Code sanitizers can
  be used to detect affected patterns such as this:

    int main() {
      float x = 4294967296.0f;
      x = (float)((int)x);
      printf("junk in the ftrunc: %f\n", x);
      return 0;
    }

    $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
    ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                   representable values of type 'int'
    junk in the ftrunc: 0.000000


Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 330437
2018-04-20 15:07:55 +00:00
Manoj Gupta 431d8c39ba [Release Notes] Add release note for "-fmerge-all-constants"
Summary:
Add note that "-fmerge-all-constants" is not applied as default
anymore.

Reviewers: rjmccall, rsmith, chandlerc

Subscribers: llvm-commits, thakis

Differential Revision: https://reviews.llvm.org/D45388

llvm-svn: 329457
2018-04-06 21:11:09 +00:00
Clement Courbet ac74acdefe Re-land r329156 "Add llvm-exegesis tool."
Fixed to depend on and initialize the native target instead of X86.

llvm-svn: 329169
2018-04-04 11:37:06 +00:00
Clement Courbet 7949b3b1dc Revert r329156 "Add llvm-exegesis tool."
Breaks a bunch of bots.

llvm-svn: 329157
2018-04-04 08:22:54 +00:00
Clement Courbet 7287b2c1ec Add llvm-exegesis tool.
Summary:
[llvm-exegesis][RFC] Automatic Measurement of Instruction Latency/Uops

This is the code corresponding to the RFC "llvm-exegesis Automatic Measurement of Instruction Latency/Uops".

The RFC is available on the LLVM mailing lists as well as the following document
for easier reading:
https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?usp=sharing

Subscribers: mgorny, gchatelet, orwant, llvm-commits

Differential Revision: https://reviews.llvm.org/D44519

llvm-svn: 329156
2018-04-04 08:13:32 +00:00
Sylvestre Ledru f22ebb7599 Rename llvm library from libLLVM-X.Y to libLLVM-X
Summary:
As we are only doing X.0.Z releases (not using the minor version), there is no need to keep -X.Y in the version.

Like patch https://reviews.llvm.org/D41808, I propose that we rename libLLVM-7.0svn.so to libLLVM-7svn.so 
This patch will also rename downstream libraries like liblldb-7.0 to liblldb-7

Reviewers: axw, beanz, dim, hans

Reviewed By: dim, hans

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D41869

llvm-svn: 328768
2018-03-29 09:44:09 +00:00
Andrea Di Biagio ce63362ef8 [Release Notes] Add release note for llvm-mca.
Differential Revision: https://reviews.llvm.org/D44636

llvm-svn: 327965
2018-03-20 10:25:36 +00:00
Reid Kleckner f8b51c5f90 [IR] Avoid the need to prefix MS C++ symbols with '\01'
Now the Windows mangling modes ('w' and 'x') do not do any mangling for
symbols starting with '?'. This means that clang can stop adding the
hideous '\01' leading escape. This means LLVM debug logs are less likely
to contain ASCII escape characters and it will be easier to copy and
paste MS symbol names from IR.

Finally.

For non-Windows platforms, names starting with '?' still get IR
mangling, so once clang stops escaping MS C++ names, we will get extra
'_' prefixing on MachO. That's fine, since it is currently impossible to
construct a triple that uses the MS C++ ABI in clang and emits macho
object files.

Differential Revision: https://reviews.llvm.org/D7775

llvm-svn: 327734
2018-03-16 20:13:32 +00:00
Vedant Kumar 3a408538f0 Remove the LoopInstSimplify pass (-loop-instsimplify)
LoopInstSimplify is unused and untested. Reading through the commit
history the pass also seems to have a high maintenance burden.

It would be best to retire the pass for now. It should be easy to
recover if we need something similar in the future.

Differential Revision: https://reviews.llvm.org/D44053

llvm-svn: 327329
2018-03-12 20:49:42 +00:00
Daniel Neilson 01fb57e7a0 Add release note on change to memcpy/memmove/memset builtin signatures
Summary:
The signatures for the builtins @llvm.memcpy, @llvm.memmove, and @llvm.memset
where changed in rL322965. The number of arguments has decreased from five to
four with the removal of the alignment argument. Alignment is now conveyed
by supplying the align parameter attribute on the destination and/or source of
the cpy/move/set.

llvm-svn: 324265
2018-02-05 19:39:38 +00:00
Hans Wennborg 5bd4a66873 Clear release notes for 7.0.0
llvm-svn: 321727
2018-01-03 15:45:32 +00:00
Vedant Kumar 195dfd10a6 [Debugify] Add a pass to test debug info preservation
The Debugify pass synthesizes debug info for IR. It's paired with a
CheckDebugify pass which determines how much of the original debug info
is preserved. These passes make it easier to create targeted tests for
debug info preservation.

Here is the Debugify algorithm:

  NextLine = 1
  for (Instruction &I : M)
    attach DebugLoc(NextLine++) to I

  NextVar = 1
  for (Instruction &I : M)
    if (canAttachDebugValue(I))
      attach dbg.value(NextVar++) to I

The CheckDebugify pass expects contiguous ranges of DILocations and
DILocalVariables. If it fails to find all of the expected debug info, it
prints a specific error to stderr which can be FileChecked.

This was discussed on llvm-dev in the thread:
"Passes to add/validate synthetic debug info"

Differential Revision: https://reviews.llvm.org/D40512

llvm-svn: 320202
2017-12-08 21:57:28 +00:00
Daniel Sanders 7c3a89231c Add release note about TargetRegistry change from r318352
llvm-svn: 319093
2017-11-27 21:12:55 +00:00
Alexander Kornienko 3cd2ce1e0b Add a ReleaseNotes blurb for Execute.*Wait API change
... in r313155, r313156.

llvm-svn: 313356
2017-09-15 11:45:30 +00:00
Hans Wennborg 3739be2af4 Clear release notes for 6.0.0
llvm-svn: 308474
2017-07-19 14:09:16 +00:00
Chandler Carruth 3545a9e1f9 Remove the BBVectorize pass.
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html

Time to actually let go and move forward. =]

I've updated the release notes both about the removal and the
deprecation of the corresponding C API.

llvm-svn: 306797
2017-06-30 07:09:08 +00:00
Eric Christopher bc02ef17ca Add a C API section to the release notes.
llvm-svn: 306777
2017-06-30 01:17:45 +00:00
Zachary Turner bd336e44d8 Rename llvm-pdbdump -> llvm-pdbutil.
This is to reflect the evolving nature of the tool as being
useful for more than just dumping PDBs, as it can do many other
things.

Differential Revision: https://reviews.llvm.org/D34062

llvm-svn: 305106
2017-06-09 20:46:17 +00:00
Zachary Turner 6976b7fad5 Update release notes for BinaryFormat library.
Differential Revision: https://reviews.llvm.org/D34001

llvm-svn: 304995
2017-06-08 17:47:22 +00:00
Sanjoy Das b9e6ebe348 Add a blurb to the release notes about the WeakVH -> WeakTrackingVH transition
llvm-svn: 302456
2017-05-08 19:15:06 +00:00
Hans Wennborg f1e773cab5 Don't try to link to the 4.0 release notes
llvm-svn: 294647
2017-02-09 23:03:34 +00:00
Hans Wennborg 669f0d7582 Clear the release notes for 5.0.0
llvm-svn: 291836
2017-01-12 21:50:22 +00:00
Matthias Braun 9f15a79e5d Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.

Also removed a unused TimerGroup from Hexxagon.

Differential Revision: https://reviews.llvm.org/D25583

llvm-svn: 287369
2016-11-18 19:43:18 +00:00
Dylan McKay 7293f9f7cc [ReleaseNotes] Mention the completion of the upstreaming of the AVR backend
llvm-svn: 287273
2016-11-17 22:26:09 +00:00
Sylvestre Ledru 8b59bb4da7 As we released 3.9, from the 4.0 release notes, points to version 3.9 instead of 3.8
llvm-svn: 286719
2016-11-12 10:39:09 +00:00
whitequark 4dcf92a27e [OCaml] Adapt to the new attribute C API.
llvm-svn: 286705
2016-11-12 03:38:30 +00:00
Amaury Sechet 74084c44ca Kill deprecated attribute API
Summary:
This kill various depreacated API related to attribute :
 - The deprecated C API attribute based on LLVMAttribute enum.
 - The Raw attribute set format (planned to be removed in 4.0).

Reviewers: bkramer, echristo, mehdi_amini, void

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23039

llvm-svn: 286062
2016-11-06 07:48:46 +00:00
Hans Wennborg 47f2616b6a ReleaseNotes: mention new compiler requirements
llvm-svn: 284994
2016-10-24 17:29:52 +00:00
Justin Bogner 73fc15a989 Support: Drop LLVM_ATTRIBUTE_UNUSED_RESULT
Uses of this have all been updated to use LLVM_NODISCARD, which
matches the C++17 [[nodiscard]] semantics rather than those of GCC's
__attribute__((warn_unused_result)).

llvm-svn: 284367
2016-10-17 07:37:11 +00:00
Hans Wennborg 8745289899 Trunk release notes now refer to 4.0.0
llvm-svn: 275842
2016-07-18 18:02:23 +00:00
Amaury Sechet 76c78a057a Fix list of deprecated C API attribute functions
llvm-svn: 272727
2016-06-14 22:23:29 +00:00
Amaury Sechet 5db224e1f0 Make sure we have a Add/Remove/Has function for various thing that can have attribute.
Summary: This also deprecated the get attribute function familly.

Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael, jyknight

Subscribers: axw, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19181

llvm-svn: 272504
2016-06-12 06:17:24 +00:00
Chris Bieneman 71ef6178e1 Updating release notes for CMake version bump
CMake 3.4.3 is now required for building LLVM-based projects.

llvm-svn: 271945
2016-06-06 22:02:16 +00:00
Mehdi Amini 448dd8c5c6 Add a FIXME note in the release notes about documenting ThinLTO
llvm-svn: 271742
2016-06-03 21:45:34 +00:00
Amaury Sechet 447831acae Extract renaming from D19181
Summary: This needs to get in before anything is released concerning attribute. If the old name gets in the wild, then we are stuck with it forever. Putting it in its own diff should getting that part at least in fast.

Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael, jyknight

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D20417

llvm-svn: 270452
2016-05-23 16:38:25 +00:00
Justin Bogner b012699741 SDAG: Rename Select->SelectImpl and repurpose Select as returning void
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.

We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.

Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.

llvm-svn: 268693
2016-05-05 23:19:08 +00:00
Chuang-Yu Cheng 0600e8d759 [ppc64] Reenable sibling call optimization on ppc64 since fixed tsan library tail-call issue
print-stack-trace.cc test failure of compiler-rt has been fixed by
r266869 (http://reviews.llvm.org/D19148), so reenable sibling call
optimization on ppc64

Reviewers: nemanjai kbarton
llvm-svn: 267527
2016-04-26 07:38:24 +00:00
Sanjoy Das 784ec12a3c Add some release notes about the fix for PR26774
As suggested by Chandler on the review thread for D18634.

llvm-svn: 267239
2016-04-22 22:45:23 +00:00
Amaury Sechet 60b31453ac Add LLVMGetAttrKindID in the C API in order to facilitate migration away from LLVMAttribute
Summary:
LLVMAttribute has outlived its utility and is becoming a problem for C API users that what to use all the LLVM attributes. In order to help moving away from LLVMAttribute in a smooth manner, this diff introduce LLVMGetAttrKindIDInContext, which can be used instead of the enum values.

See D18749 for reference.

Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19081

llvm-svn: 266842
2016-04-20 01:02:12 +00:00
Mehdi Amini 03b42e41bf Remove every uses of getGlobalContext() in LLVM (but the C API)
At the same time, fixes InstructionsTest::CastInst unittest: yes
you can leave the IR in an invalid state and exit when you don't
destroy the context (like the global one), no longer now.

This is the first part of http://reviews.llvm.org/D19094

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266379
2016-04-14 21:59:01 +00:00
Ehsan Amiri ec63c92916 [PPC] Added a note to release notes
A draft line added to release notes for PPC, to keep a record of changes.
This is just a draft and will be rewritten towards the end of release.

llvm-svn: 265694
2016-04-07 16:47:35 +00:00
Hans Wennborg e1a2e90ffa Change eliminateCallFramePseudoInstr() to return an iterator
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.

It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.

Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265036
2016-03-31 18:33:38 +00:00