llvm-project

Commit Graph

Author	SHA1	Message	Date
Johannes Doerfert	0cc2b61943	[Attributor] Shortcut no-return through will-return No-return and will-return are mutually exclusive; assuming the latter is more prominent, we can avoid updating the former unless will-return is not known for sure. llvm-svn: 374739	2019-10-13 21:25:53 +00:00
Johannes Doerfert	d82385b049	[Attributor][FIX] NullPointerIsDefined needs the pointer AS (AANonNull) Also includes a shortcut via AADereferenceable if possible. llvm-svn: 374737	2019-10-13 20:48:26 +00:00
Johannes Doerfert	8ee410c75e	[Attributor][MemBehavior] Fallback to the function state for arguments Even if an argument is captured, we cannot have an effect the function does not have. This is fine except for the special case of `inalloca` as it does not behave by the rules. TODO: Maybe the special rule for `inalloca` is wrong after all. llvm-svn: 374736	2019-10-13 20:47:16 +00:00
Johannes Doerfert	db6efb017f	[Attributor][FIX] Use check prefix that is actually tested Summary: This changes "CHECK" check lines to "ATTRIBUTOR" check lines where necessary and also fixes the now exposed, mostly minor, problems. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68929 llvm-svn: 374735	2019-10-13 20:40:10 +00:00
Craig Topper	25eb219959	[X86] Enable use of avx512 saturating truncate instructions in more cases. This enables use of the saturating truncate instructions when the result type is less than 128 bits. It also enables the use of saturating truncate instructions on KNL when the input is less than 512 bits. We can do this by widening the input and then extracting the result. llvm-svn: 374731	2019-10-13 19:07:28 +00:00
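For readers unfamiliar with saturating truncation, a minimal scalar sketch of the per-lane semantics may help. This is not LLVM code. The names `truncSSat`, `truncUSat`, and `truncWrap` are hypothetical, and the sketch models only i32 to i16 lanes, assuming the behavior of the AVX-512 `VPMOVSDW` (signed saturation) and `VPMOVUSDW` (unsigned saturation) instructions. It does not model the widen-then-extract trick the commit uses for results narrower than 128 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed saturating truncation i32 -> i16 (per-lane model of VPMOVSDW):
// out-of-range values clamp to the nearest i16 bound instead of wrapping.
int16_t truncSSat(int32_t x) {
  constexpr int32_t kMax = std::numeric_limits<int16_t>::max();
  constexpr int32_t kMin = std::numeric_limits<int16_t>::min();
  if (x > kMax) return static_cast<int16_t>(kMax);
  if (x < kMin) return static_cast<int16_t>(kMin);
  return static_cast<int16_t>(x);
}

// Unsigned saturating truncation u32 -> u16 (per-lane model of VPMOVUSDW):
// the source is treated as unsigned and anything above 0xFFFF becomes 0xFFFF.
uint16_t truncUSat(uint32_t x) {
  return x > 0xFFFFu ? static_cast<uint16_t>(0xFFFFu) : static_cast<uint16_t>(x);
}

// Ordinary (wrapping) truncation for comparison: keeps only the low 16 bits.
uint16_t truncWrap(uint32_t x) { return static_cast<uint16_t>(x); }
```

For example, 70000 truncates to 4464 with plain truncation but saturates to 65535 with unsigned saturation.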
Sanjay Patel	b32e4664a7	[ConstantFold] fix inconsistent handling of extractelement with undef index (PR42689) Any constant other than zero was already folded to undef if the index is undef. https://bugs.llvm.org/show_bug.cgi?id=42689 llvm-svn: 374729	2019-10-13 17:34:08 +00:00
Sanjay Patel	f90728c322	[InstCombine] don't assume 'inbounds' for bitcast deref or null pointer in non-default address space Follow-up to D68244 to account for a corner case discussed in: https://bugs.llvm.org/show_bug.cgi?id=43501 Add one more restriction: if the pointer is deref-or-null and in a non-default (non-zero) address space, we can't assume inbounds. Differential Revision: https://reviews.llvm.org/D68706 llvm-svn: 374728	2019-10-13 17:19:08 +00:00
Roman Lebedev	8e2561974d	[NFC][InstCombine] More tests for "sign bit test via shifts" pattern (PR43595) While that pattern is indirectly handled via reassociateShiftAmtsOfTwoSameDirectionShifts(), that incurs a one-use restriction on truncation, which is pointless since we know that we'll produce a single instruction. Additionally, if we are only looking for the sign bit, we don't need the shifts to be identical, which isn't the case in general, and is the blocker for me in the bug in question: https://bugs.llvm.org/show_bug.cgi?id=43595 llvm-svn: 374726	2019-10-13 17:11:16 +00:00
Simon Pilgrim	e84916d891	[X86][AVX] Add i686 avx splat tests llvm-svn: 374719	2019-10-13 13:18:07 +00:00
Craig Topper	d50cb9ac8c	[X86] Add a one use check on the setcc to the min/max canonicalization code in combineSelect. This seems to improve std::midpoint code where we have a min and a max with the same condition. If we split the setcc, we can end up with two compares if one of the operands is a constant, since we aggressively canonicalize compares with constants. For non-constants, it can interfere with our ability to share control flow if we need to expand cmovs into control flow. I'm also not sure I understand this min/max canonicalization code. The motivating case talks about comparing with 0, but we don't check for 0 explicitly. Removes one instruction from the codegen for PR43658. llvm-svn: 374706	2019-10-13 06:48:05 +00:00
Craig Topper	bf57aa2b25	[X86] Enable v4i32->v4i16 and v8i16->v8i8 saturating truncates to use pack instructions with avx512. llvm-svn: 374705	2019-10-13 05:47:47 +00:00
Craig Topper	8fe8adb9f1	[X86] Add v2i64->v2i32/v2i16/v2i8 test cases to the trunc packus/ssat/usat tests. NFC llvm-svn: 374704	2019-10-13 05:47:42 +00:00
Johannes Doerfert	4056e7f02a	[Attributor][FIX] Avoid splitting blocks if possible Before, we eagerly split blocks even if it was not necessary, e.g., they had a single unreachable instruction and only a single predecessor. llvm-svn: 374703	2019-10-13 05:27:09 +00:00
Johannes Doerfert	af6e479733	[Attributor][FIX] Ensure h2s doesn't trigger on escaped pointers We do not (yet) perform h2s because we know something is freed, but because we know the pointer does not escape. Storing the pointer allows it to escape, so we have to prevent that. llvm-svn: 374699	2019-10-13 04:14:15 +00:00
Johannes Doerfert	d20f80780e	[Attributor][FIX] Do not apply h2s for arbitrary mallocs H2S did apply to mallocs of non-constant sizes if the uses were OK. This is now forbidden through reordering of the "good" and "bad" cases in the conditional. llvm-svn: 374698	2019-10-13 03:54:08 +00:00
Johannes Doerfert	9daf51910b	[Attributor][FIX] Add missing function declaration in test case llvm-svn: 374696	2019-10-13 02:42:09 +00:00
Johannes Doerfert	ea1e81f54b	[Attributor][FIX] Avoid modifying naked/optnone functions The check for naked/optnone was insufficient for different reasons. We now check before we initialize an abstract attribute and we do it for all abstract attributes. llvm-svn: 374694	2019-10-13 02:24:02 +00:00
Johannes Doerfert	92694eba93	[SROA] Reuse existing lifetime markers if possible Summary: If the underlying alloca did not change, we do not necessarily need new lifetime markers. This patch adds a check and reuses the old ones if possible. Reviewers: reames, ssarda, t.p.northover, hfinkel Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68900 llvm-svn: 374692	2019-10-13 02:21:23 +00:00
Joel E. Denny	57046e8fd9	Revert r374652: "[lit] Fix internal diff's --strip-trailing-cr and use it" This series of patches still breaks a Windows bot. llvm-svn: 374679	2019-10-12 18:51:51 +00:00
Joel E. Denny	9abfa58171	Revert r374653: "[lit] Fix a few oversights in r374651 that broke some bots" This series of patches still breaks a Windows bot. llvm-svn: 374678	2019-10-12 18:51:34 +00:00
Roman Lebedev	76cdcf25b8	[LoopIdiomRecognize] Recommit: BCmp loop idiom recognition Summary: This is a recommit; this originally landed in rL370454 but was subsequently reverted in rL370788 due to https://bugs.llvm.org/show_bug.cgi?id=43206 The reduced testcase was added to bcmp-negative-tests.ll as @pr43206_different_loops: we must ensure that the SCEVs we got are both for the same loop we are currently investigating. Original commit message: @mclow.lists brought this issue up in IRC. It is a reasonably common problem to compare two values for equality. Those may be just integers, strings or arrays of integers. In C, there are the `memcmp()` and `bcmp()` functions. In C++, there is the `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not for `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that; it simply relies on C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So there are likely some performance opportunities. Let's compare the performance of a naive `std::equal()` (no `memcmp()`) with one that uses `memcmp()` (in this case, compiled with a modified compiler).
{F8768213}
```
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N
BM_bcmp<uint8_t, Identical>_RMS 8 % 8 %
<...>
BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N
BM_bcmp<uint16_t, Identical>_RMS 25 % 25 %
<...>
BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N
BM_bcmp<uint32_t, Identical>_RMS 42 % 42 %
<...>
BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N
BM_bcmp<uint64_t, Identical>_RMS 27 % 27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N
BM_bcmp<uint8_t, Identical>_RMS 37 % 37 %
<...>
BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N
BM_bcmp<uint16_t, Identical>_RMS 34 % 34 %
<...>
BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N
BM_bcmp<uint32_t, Identical>_RMS 35 % 35 %
<...>
BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N
BM_bcmp<uint64_t, Identical>_RMS 33 % 33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark Time CPU Time Old Time New CPU Old CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590
<...>
BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948
<...>
BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627
<...>
BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498
```
What can we tell from the benchmark?
* Performance of the naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think that instability implies performance problems.
* Performance of the `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant!
* The eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second.
* The call is obviously more pricey than the loop for small element counts. As can be seen from the full output {F8768210}, `memcmp()` is almost universally worse, independent of the element size (and thus buffer size), when the element count is less than 8.
So all in all, the bcmp idiom does indeed offer untapped performance headroom. This diff implements said idiom recognition. I think reasonable test coverage is present, but do tell if there is anything obvious missing.
Now, quality. This builds and passes the test-suite, at least without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp
Program result-new
MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00
MultiSource/Applications/d/make_dparser 3.00
SingleSource/UnitTests/vla 2.00
MultiSource/Applications/Burg/burg 1.00
MultiSourc.../Applications/JM/lencod/lencod 1.00
MultiSource/Applications/lemon/lemon 1.00
MultiSource/Benchmarks/Bullet/bullet 1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00
MultiSourc...Prolangs-C/simulator/simulator 1.00
```
The size changes are as follows (I'm not sure what's going on with SingleSource/UnitTests/vla.test yet; I did not look):
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text
Program result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6%
test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5%
test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1%
test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1%
test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1%
test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1%
test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0%
test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0%
test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0%
test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan%
Geomean difference nan%
      result-old result-new diff
count 1.000000e+01 10.00000 10.000000
mean 3.152090e+05 311695.40000 0.006749
std 3.790398e+05 372091.42232 0.036605
min 7.530000e+02 833.00000 -0.034981
25% 4.243300e+04 42401.00000 -0.000866
50% 1.197370e+05 119689.00000 -0.000392
75% 6.397050e+05 639705.00000 -0.000005
max 1.001697e+06 966657.00000 0.106242
```
I don't have timings though.
And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there are a few TODOs that I have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???
Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: miyuki, hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 374662	2019-10-12 15:35:32 +00:00
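The idiom this commit recognizes can be illustrated in plain C++. This is a sketch under stated assumptions, not the pass itself: `equalLoop` mirrors the naive loop from the benchmark above, and the hypothetical `equalBcmp` shows the single whole-range comparison such a loop is conceptually replaced with. It uses `std::memcmp` because `bcmp()` is not part of standard C++, and it is only a valid rewrite when element equality coincides with byte-wise equality (true for the unsigned integers in the benchmark, not for floating point or padded structs).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// The loop shape from the benchmark: element-wise equality over [a, a_end).
template <class T>
bool equalLoop(const T* a, const T* a_end, const T* b) noexcept {
  for (; a != a_end; ++a, ++b)
    if (*a != *b) return false;  // early exit on the first mismatch
  return true;
}

// Hypothetical whole-range form: one byte-wise comparison of both buffers.
// The static_assert restricts this sketch to integer element types, where
// value equality and byte-wise equality agree.
template <class T>
bool equalBcmp(const T* a, const T* a_end, const T* b) noexcept {
  static_assert(std::is_integral<T>::value, "sketch assumes integer elements");
  const std::size_t bytes = static_cast<std::size_t>(a_end - a) * sizeof(T);
  // Skip the call for an empty range, where the pointers may be null.
  return bytes == 0 || std::memcmp(a, b, bytes) == 0;
}
```

Only an equal/not-equal answer is needed here, not an ordering, which is why a `bcmp()`-style call (zero if identical, non-zero otherwise) is enough for this loop.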
Roman Lebedev	45539737dd	[NFC][LoopIdiom] Add bcmp loop idiom miscompile test from PR43206. The transform forgot to check SCEV loop scopes. https://bugs.llvm.org/show_bug.cgi?id=43206 llvm-svn: 374661	2019-10-12 15:35:16 +00:00
Roman Lebedev	c41e9f6bbf	[NFC][LoopIdiom] Move one bcmp test into the proper place llvm-svn: 374660	2019-10-12 15:35:09 +00:00
Simon Pilgrim	9f0885d38d	[X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reduction This should go away once D66004 has landed and we can simplify shuffle chains using demanded elts. llvm-svn: 374658	2019-10-12 15:19:13 +00:00
Simon Pilgrim	1b59a16c0b	[CostModel][X86] Improve sum reduction costs. I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674. llvm-svn: 374655	2019-10-12 13:21:50 +00:00
Joel E. Denny	648875bbcf	[lit] Fix a few oversights in r374651 that broke some bots llvm-svn: 374653	2019-10-12 12:32:00 +00:00
Joel E. Denny	0f80927316	[lit] Fix internal diff's --strip-trailing-cr and use it Using GNU diff, `--strip-trailing-cr` removes a `\r` appearing before a `\n` at the end of a line. Without this patch, lit's internal diff only removes `\r` if it appears as the last character. That seems useless. This patch fixes that. This patch also adds `--strip-trailing-cr` to some tests that fail on Windows bots when D68664 is applied. Based on what I see in the bot logs, I think the following is happening. In each test there, lit diff is comparing a file with `\r\n` line endings to a file with `\n` line endings. Without D68664, lit diff reads those files with Python's universal newlines support activated, causing `\r` to be dropped. However, with D68664, lit diff reads the files in binary mode instead and thus reports that every line is different, just as GNU diff does (at least under Ubuntu). Adding `--strip-trailing-cr` to those tests restores the previous behavior while permitting the behavior of lit diff to be more like GNU diff. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D68839 llvm-svn: 374652	2019-10-12 11:58:30 +00:00
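lit itself is written in Python; to keep one language for the examples here, the following is a C++ sketch of the line-splitting semantics the commit describes, not lit's implementation. `splitLines` is a hypothetical helper: with `stripTrailingCR`, a `\r` immediately before a `\n` is dropped, while a `\r` anywhere else is kept. A `\r` on a final line that has no `\n` is also kept by this sketch; the commit does not say how GNU diff treats that corner case.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits text on '\n'. With stripTrailingCR, drops a '\r' only when it sits
// immediately before a '\n' (i.e. the line was actually '\n'-terminated).
std::vector<std::string> splitLines(const std::string& text,
                                    bool stripTrailingCR) {
  std::vector<std::string> lines;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    // getline sets eofbit only if it ran out of input before finding '\n',
    // so !in.eof() means this line was terminated by '\n'.
    const bool terminated = !in.eof();
    if (stripTrailingCR && terminated && !line.empty() && line.back() == '\r')
      line.pop_back();
    lines.push_back(line);
  }
  return lines;
}
```

With stripping enabled, a file with `\r\n` endings and a file with `\n` endings split into identical lines, which is the behavior the affected tests relied on.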
Craig Topper	9bd542dcd5	[X86] Use pack instructions for packus/ssat truncate patterns when 256-bit is the largest legal vector and the result type is at least 256 bits. Since the input type is larger than 256 bits, we'll need to do some concatenating to reassemble the results. The pack instructions' ability to concatenate while packing makes this a shorter/faster sequence. llvm-svn: 374643	2019-10-12 07:59:29 +00:00
Craig Topper	80a4feed7c	[X86] Test SKX cpu in the vector-trunc-packus/ssat/usat.ll tests instead of min-legal-vector-width.ll This adds "min-legal-vector-width"="256" function attributes to all the tests for a larger than 256-bit input. Also switch any larger than 512-bit inputs to use a load. This makes the arguments consistent with the min-legal-vector-width attribute, which should usually be at least as large as the arguments. The SKX configuration will avoid using zmm registers on the modified test cases. For many of them we should use something closer to the AVX2 codegen with pack instructions instead of the avx512 saturating truncates. llvm-svn: 374642	2019-10-12 07:59:24 +00:00
Simon Atanasyan	4a46af845f	[mips] Fix `loadImmediate` calls when loading non-address values. llvm-svn: 374640	2019-10-12 07:42:44 +00:00
Vitaly Buka	ec6bfa81b7	Revert 374629 "[sancov] Accommodate sancov and coverage report server for use under Windows" http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/27650/steps/ninja%20check%201/logs/stdio http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/31759 http://lab.llvm.org:8011/builders/clang-s390x-linux-lnt/builds/15095 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/21075 llvm-svn: 374636	2019-10-12 05:23:43 +00:00
Zi Xuan Wu	9802268ad3	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on the target register number. Currently, it does not estimate register pressure separately for different register classes (in particular, for scalar types, float should not be lumped together with int), so it's not accurate. Specifically, it causes too much interleaving/unrolling, resulting in too many register spills in the loop body and hurting performance. So we need to classify the register classes at the IR level; importantly, these are abstract register classes, not the target register classes the backend provides in the td file. They are used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types. For example, on the POWER target, the register number is special when VSX is enabled. When VSX is enabled, the number of int scalar registers is 32 (GPR) and of float registers is 64 (VSR), but for int and float vector registers both are 64 (VSR). So there should be 2 kinds of register class when VSX is enabled, and 3 kinds of register class when VSX is NOT enabled. On the POWER target, it yields a big (+~30%) performance improvement in one specific benchmark (503.bwaves_r) of SPEC2017 and no other obvious regressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634	2019-10-12 02:53:04 +00:00
Vitaly Buka	23aa2aec78	[sancov] Accommodate sancov and coverage report server for use under Windows Summary: This patch makes the following changes to SanCov and its complementary Python script in order to resolve issues pertaining to non-UNIX file paths in JSON symbolization information: * Convert all paths to use forward slash. * Update `coverage-report-server.py` to correctly handle paths to sources which contain spaces. * Remove Linux platform restriction for all SanCov unit tests. All SanCov tests passed when run on my local Windows machine. Patch by Douglas Gliner. Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman Reviewed By: vitalybuka Subscribers: vsk, Dor1s, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D51018 llvm-svn: 374629	2019-10-12 02:29:26 +00:00
Vitaly Buka	e8a462a019	[sancov] Use LLVM Support library JSON writer in favor of individual implementation Summary: In this diff, I've replaced the individual implementation of `JSONWriter` with `json::OStream` provided by `llvm/Support/JSON.h`. Important Note: The output format of the JSON is considerably different compared to the original implementation. Important differences include: * New line for each entry in an array (should make diffs cleaner) * No space between keys and colon in attributed object entries. * Attributes with empty strings will now print the attribute name and a quote pair rather than excluding the attribute altogether Examples of these differences can be seen in the changes to the sancov tests which compare the JSON output. Patch by Douglas Gliner. Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman Subscribers: mehdi_amini, dexonsmith, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D68752 llvm-svn: 374628	2019-10-12 02:29:24 +00:00
Vedant Kumar	852e3b2076	[llvm-profdata] Make "malformed-ptr-to-counter-array.test" textual As pointed out in https://reviews.llvm.org/D66979 post-commit, making this test textual would make it more maintainable. Differential Revision: https://reviews.llvm.org/D68718 llvm-svn: 374617	2019-10-12 00:23:15 +00:00
Craig Topper	3472feb94c	[X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store. We already did this for VTRUNCUS with a specific combination of types. This extends this to VTRUNCS and handles any types where a truncating store is legal. llvm-svn: 374615	2019-10-12 00:01:08 +00:00
Craig Topper	7dcd440d44	[X86] Add test case showing missing opportunity to fold vpmovsdb into a store after type legalization. NFC llvm-svn: 374614	2019-10-12 00:00:59 +00:00
Stanislav Mekhanoshin	f87fe45d5c	[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC. llvm-svn: 374607	2019-10-11 22:28:04 +00:00
Stanislav Mekhanoshin	e2d104f64c	[AMDGPU] link dpp pseudos and real instructions on gfx10 This defaults the fi operand to zero, but we do not expose it anyway. Should we expose it later, it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604	2019-10-11 22:03:36 +00:00
David Blaikie	289c45cc62	DebugInfo: Use base address selection entries for debug_loc Unify the range and loc emission (for both DWARFv4 and DWARFv5 style lists) and take advantage of that unification to use strategic base addresses for loclists. Differential Revision: https://reviews.llvm.org/D68620 llvm-svn: 374600	2019-10-11 21:52:41 +00:00
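As background for the base-address change, a sketch of how a consumer resolves DWARF v4 `.debug_loc` entries may help. Assumptions: 64-bit addresses, location expressions omitted, and the hypothetical names `LocEntry` and `resolveRanges`; this is not LLVM's or llvm-dwarfdump's code. Per DWARF v4, each entry's begin/end offsets are relative to the applicable base address (by default the compilation unit's base), an entry whose begin is the largest representable address is a base address selection entry whose second value becomes the new base, and a (0, 0) entry ends the list.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One DWARF v4 .debug_loc entry; the location expression bytes are omitted.
struct LocEntry {
  uint64_t begin;
  uint64_t end;
};

// Resolves entries to absolute [begin, end) address ranges.
std::vector<std::pair<uint64_t, uint64_t>> resolveRanges(
    const std::vector<LocEntry>& list, uint64_t cuBase) {
  const uint64_t kBaseSelection = ~uint64_t{0};  // largest 64-bit address
  std::vector<std::pair<uint64_t, uint64_t>> out;
  uint64_t base = cuBase;  // default applicable base: the CU's base address
  for (const LocEntry& e : list) {
    if (e.begin == 0 && e.end == 0) break;  // end-of-list entry
    if (e.begin == kBaseSelection) {        // base address selection entry
      base = e.end;
      continue;
    }
    out.emplace_back(base + e.begin, base + e.end);
  }
  return out;
}
```

Emitting a base selection entry before a run of entries lets those entries be encoded as small offsets from a nearby address rather than from the CU base.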
Simon Atanasyan	66048fed82	[mips] Store 64-bit `li.d` operand as a single 8-byte value Currently, the assembler generates two consecutive `.4byte` directives to store the 64-bit `li.d` operand. The first directive stores the high 4 bytes of the value. The second directive stores the low 4 bytes of the value. But on a 64-bit system we load this value at once and get the wrong result if the system is little-endian. This patch fixes the bug. It stores the `li.d` operand as a single 8-byte value. Differential Revision: https://reviews.llvm.org/D68778 llvm-svn: 374598	2019-10-11 21:51:33 +00:00
Simon Atanasyan	5ebe3511b3	[mips] Use fewer instructions to load zero into FPR by li.s / li.d pseudos If `li.s` or `li.d` loads zero into an FPR, it's not necessary to load zero into the `at` GPR and then move its value into a floating point register. We can use the `zero` / `$0` register as the source register instead. Differential Revision: https://reviews.llvm.org/D68777 llvm-svn: 374597	2019-10-11 21:51:23 +00:00
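A small C++ model of why the old `li.d` encoding breaks on little-endian targets. Assumptions: the byte buffer stands in for the data the assembler emits, each `.4byte` is laid out in target (little-endian) byte order, and the 64-bit load reads the 8 bytes as one little-endian value. `emit4LE`, `load8LE`, `oldScheme`, and `newScheme` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Writes a 32-bit word the way a little-endian target lays out `.4byte`.
void emit4LE(uint8_t* buf, uint32_t v) {
  for (int i = 0; i < 4; ++i) buf[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Reads 8 bytes as a single little-endian 64-bit value (one 8-byte load).
uint64_t load8LE(const uint8_t* buf) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i) v = (v << 8) | buf[i];
  return v;
}

// Old scheme: high word first, then low word, read back with one load.
// The high word lands in the low half, so the halves come back swapped.
uint64_t oldScheme(uint64_t value) {
  uint8_t buf[8];
  emit4LE(buf, static_cast<uint32_t>(value >> 32));  // high 4 bytes first
  emit4LE(buf + 4, static_cast<uint32_t>(value));    // then low 4 bytes
  return load8LE(buf);
}

// New scheme: a single 8-byte value in target byte order round-trips.
uint64_t newScheme(uint64_t value) {
  uint8_t buf[8];
  for (int i = 0; i < 8; ++i) buf[i] = static_cast<uint8_t>(value >> (8 * i));
  return load8LE(buf);
}
```

On a big-endian target the old high-word-first order happens to match what a 64-bit load expects, which is why the bug only shows up on little-endian systems.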
David Green	7c30af8e65	Revert 374373: [Codegen] Alter the default promotion for saturating adds and subs This commit is not extending the promoted integers as it should. Reverting whilst I look into the details. llvm-svn: 374592	2019-10-11 20:33:03 +00:00
Quentin Colombet	9c36ec5941	[GISel][CallLowering] Enable vector support in argument lowering The existing code is actually already enough to handle the splitting of vector arguments, but we were lacking a test case. This commit adds a test case for vector argument lowering involving splitting and enables the related support in call lowering. llvm-svn: 374589	2019-10-11 20:22:57 +00:00
David Blaikie	f358c3d371	llvm-dwarfdump: Add verbose printing for debug_loclists llvm-svn: 374582	2019-10-11 19:06:35 +00:00
Simon Pilgrim	af6c15f679	[X86][SSE] Add support for v4i8 add reduction llvm-svn: 374579	2019-10-11 17:54:15 +00:00
Sanjay Patel	781c49de9c	[AArch64] add tests for (v)select-of-constants; NFC These are copied from existing test files in x86/PPC. llvm-svn: 374568	2019-10-11 16:10:23 +00:00
Kerry McLaughlin	ee0a0a3464	[AArch64][SVE] Implement sdot and udot (lane) intrinsics Summary: Implements the following arithmetic intrinsics: - int_aarch64_sve_sdot - int_aarch64_sve_sdot_lane - int_aarch64_sve_udot - int_aarch64_sve_udot_lane This patch includes tests for the Subdivide4Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67551 llvm-svn: 374566	2019-10-11 15:53:41 +00:00
David Tenty	033d16cedc	[AIX] Use .space instead of .zero in assembly Summary: The AIX system assembler does not understand .zero, so we should prefer emitting .space. Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68815 llvm-svn: 374564	2019-10-11 15:07:28 +00:00
Dmitry Preobrazhensky	c4995076c6	[AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ds_[read/write]_addtid_b32 See https://bugs.llvm.org/show_bug.cgi?id=37941 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68787 llvm-svn: 374561	2019-10-11 14:53:26 +00:00
Dmitry Preobrazhensky	b82fae01ea	[AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions buffer_atomic_[fcmpswap/fmin/fmax]* See https://bugs.llvm.org/show_bug.cgi?id=28232 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68788 llvm-svn: 374559	2019-10-11 14:44:51 +00:00
Dmitry Preobrazhensky	472c6b0aa0	[AMDGPU][MC][GFX10] Enabled null for 64-bit dst operands See https://bugs.llvm.org/show_bug.cgi?id=43524 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68785 llvm-svn: 374557	2019-10-11 14:35:11 +00:00
Sanjay Patel	3b581ac80f	[DAGCombiner] fold vselect-of-constants to shift The diffs suggest that we are missing some more basic analysis/transforms, but this keeps the vector path in sync with the scalar (rL374397). This is again a preliminary step for introducing the reverse transform in IR as proposed in D63382. llvm-svn: 374555	2019-10-11 14:17:56 +00:00
Dmitry Preobrazhensky	882c3e3db5	[AMDGPU][MC] Corrected parsing of optional operands See https://bugs.llvm.org/show_bug.cgi?id=43486 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D68350 llvm-svn: 374553	2019-10-11 14:05:09 +00:00
Simon Atanasyan	d18e56db6b	[mips] Follow-up to r374544. Fix test case. llvm-svn: 374548	2019-10-11 12:58:37 +00:00
Kai Nacke	42b7cd5830	[Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj). The command `od -t x` is used to dump data in hex format. The LIT tests assume that the hex characters are in lowercase. However, there are also platforms which use uppercase letters. To solve this issue, the tests are updated to use the new `--ignore-case` option of FileCheck. Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson Differential Revision: https://reviews.llvm.org/D68693 llvm-svn: 374547	2019-10-11 12:50:57 +00:00
Simon Atanasyan	b051a19aa0	[mips] Fix loading "double" immediate into a GPR and FPR If a "double" (64-bit) value has zero low 32 bits, it's possible to load such a value into GP/FP registers as an instruction immediate. But currently the assembler loads only the high 32 bits of the value. For example, if the target register is a GPR, the `li.d $4, 1.0` instruction is converted into `lui $4, 16368`. As a result, we get `0x3FF00000` in the register, while the correct representation of the `1.0` value is `0x3FF0000000000000`. The patch fixes that. Differential Revision: https://reviews.llvm.org/D68776 llvm-svn: 374544	2019-10-11 12:33:12 +00:00
George Rimar	e6e26339ff	[llvm-readobj] - Remove excessive fields when dumping "Version symbols". This removes a few fields that are not useful: "Section Name", "Address", "Offset" and "Link" (they duplicated the information available under the "Sections [" tag). Differential revision: https://reviews.llvm.org/D68704 llvm-svn: 374541	2019-10-11 12:27:11 +00:00
Oliver Stannard	9f6a873268	Dead Virtual Function Elimination Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. 
It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to always be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing a ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 374539	2019-10-11 11:59:55 +00:00
Kai Nacke	5b5b2fd2b8	[FileCheck] Implement --ignore-case option. The FileCheck utility is enhanced to support a `--ignore-case` option. This is useful in cases where the output of Unix tools differs in case (e.g. case not specified by Posix). Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D68146 llvm-svn: 374538	2019-10-11 11:59:14 +00:00
Clement Courbet	c8eb0547ef	[llvm-exegesis] Show noise cluster in analysis output. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68780 llvm-svn: 374533	2019-10-11 11:33:18 +00:00
Vitaly Buka	b46dd6e92a	Insert module constructors in a module pass Summary: If we insert them from a function pass, some analyses may be missing or invalid. Fixes PR42877. Reviewers: eugenis, leonardchan Reviewed By: leonardchan Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68832 > llvm-svn: 374481 Signed-off-by: Vitaly Buka <vitalybuka@google.com> llvm-svn: 374527	2019-10-11 08:47:03 +00:00
QingShan Zhang	bb8d540010	[TableGen] Fix a bug that MCSchedClassDesc is interfered between different SchedModel Assume that ModelA has a scheduling resource for InstA and ModelB has a scheduling resource for InstB. This is what the llvm::MCSchedClassDesc should look like: llvm::MCSchedClassDesc ModelASchedClasses[] = { ... InstA, 0, ... InstB, -1,... }; llvm::MCSchedClassDesc ModelBSchedClasses[] = { ... InstA, -1,... InstB, 0,... }; The -1 means an invalid number of macro ops; the value is valid if it is >= 0. This is what it looks like now: llvm::MCSchedClassDesc ModelASchedClasses[] = { ... InstA, 0, ... InstB, 0,... }; llvm::MCSchedClassDesc ModelBSchedClasses[] = { ... InstA, 0,... InstB, 0,... }; And the compiler hits the assertion because the SCDesc is now valid for both InstA and InstB. Differential Revision: https://reviews.llvm.org/D67950 llvm-svn: 374524	2019-10-11 08:36:54 +00:00
Craig Topper	e0cb1cf7e3	[X86] Add v8i64->v8i8 ssat/usat/packus truncate tests to min-legal-vector-width.ll I wonder if we should split the v8i8 stores in order to form two v4i8 saturating truncating stores. This would remove the unpckl needed to concatenate the v4i8 results to make a single store. llvm-svn: 374519	2019-10-11 07:24:36 +00:00
Yi-Hong Lyu	2fbfb04ffe	[PowerPC] Remove assertion "Shouldn't overwrite a register before it is killed" The assertion is overzealous and fails tests like: renamable $x3 = LI8 0 STD renamable $x3, 16, $x1 renamable $x3 = LI8 0 Remove the assertion since the killed flag of $x3 is not mandatory. Differential Revision: https://reviews.llvm.org/D68344 llvm-svn: 374515	2019-10-11 05:32:29 +00:00
Chen Zheng	c6c6f717af	[NFC] run specific pass instead of whole -O3 pipeline for popcount recognition testcase. llvm-svn: 374514	2019-10-11 05:30:18 +00:00
Chen Zheng	c17c5864ff	[InstCombine] recognize popcount. This patch recognizes popcount intrinsic according to algorithm from website http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel Differential Revision: https://reviews.llvm.org/D68189 llvm-svn: 374512	2019-10-11 05:13:56 +00:00
Craig Topper	ccc85ac855	[X86] Add a DAG combine to turn v16i16->v16i8 VTRUNCUS+store into a saturating truncating store. llvm-svn: 374509	2019-10-11 04:16:49 +00:00
Craig Topper	4b9947e2e7	[X86] Add test case for trunc_packus_v16i32_v16i8_store to min-legal-vector-width.ll We aren't folding the vpmovuswb into the store. llvm-svn: 374507	2019-10-11 04:02:04 +00:00
Philip Reames	2d5820cd72	[CVP] Remove a masking operation if range information implies it's a noop This is really a known bits style transformation, but known bits isn't context sensitive. The particular case which comes up happens to involve a range which allows range based reasoning to eliminate the mask pattern, so handle that case specifically in CVP. InstCombine likes to generate the mask-by-low-bits pattern when widening an arithmetic expression which includes a zext in the middle. Differential Revision: https://reviews.llvm.org/D68811 llvm-svn: 374506	2019-10-11 03:48:56 +00:00
Craig Topper	32097c2696	[X86] Add more packus/ssat/usat truncate tests from legal vectors to less than 128-bit vectors. Some of these have sub-optimal codegen for avx512 relative to avx2. llvm-svn: 374505	2019-10-11 03:46:39 +00:00
Nico Weber	d38332981f	Revert 374481 "[tsan,msan] Insert module constructors in a module pass" CodeGen/sanitizer-module-constructor.c fails on mac and windows, see e.g. http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/11424 llvm-svn: 374503	2019-10-11 02:44:20 +00:00
Lang Hames	e5c61cee44	[JITLink] Disable the MachO/AArch64 testcase while investigating bot failures. The windows bots are failing due to a memory layout error. Temporarily disabling while I investigate whether this can be worked around, or whether the test should be disabled on Windows. llvm-svn: 374500	2019-10-11 01:58:12 +00:00
Johannes Doerfert	8fa56c49df	[Attributor][FIX] Do not replace musttail calls with constant llvm-svn: 374498	2019-10-11 01:45:32 +00:00
Craig Topper	b560fd6c52	[X86] Improve the AVX512 bailout in combineTruncateWithSat to allow pack instructions in more situations. If we don't have VLX we won't end up selecting a saturating truncate for 256-bit or smaller vectors so we should just use the pack lowering. llvm-svn: 374487	2019-10-11 00:38:51 +00:00
Craig Topper	4dc27c69b6	[X86] Update trunc_packus_v32i32_v32i8 test in min-legal-vector-width.ll to use a load for the large type and add the min-legal-vector-width attribute. The attribute is needed to avoid zmm registers. Using memory avoids argument splitting for large vectors. llvm-svn: 374486	2019-10-11 00:38:41 +00:00
Vitaly Buka	5c72aa232e	[tsan,msan] Insert module constructors in a module pass Summary: If we insert them from a function pass, some analyses may be missing or invalid. Fixes PR42877. Reviewers: eugenis, leonardchan Reviewed By: leonardchan Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68832 llvm-svn: 374481	2019-10-10 23:49:10 +00:00
Lang Hames	3cc04f6a41	[JITLink] Add an initial implementation of JITLink for MachO/AArch64. This implementation has support for all relocation types except TLV. Compact unwind sections are not yet supported, so exceptions/unwinding will not work. llvm-svn: 374476	2019-10-10 23:37:51 +00:00
Lang Hames	96cd736c2d	[JITLink] Move MachO/x86 got test further down in the data section. 'named_data' should be the first symbol in the data section. llvm-svn: 374475	2019-10-10 23:37:49 +00:00
Alina Sbirlea	6442b56974	[MemorySSA] Update Phi simplification. When simplifying a Phi to the unique value found incoming, check that there wasn't a Phi already created to break a cycle. If so, remove it. Resolves PR43541. Some additional nits included. llvm-svn: 374471	2019-10-10 23:27:21 +00:00
Craig Topper	a0df8b72f2	[X86] Add test cases for packus/ssat/usat 32i32->v32i8 test cases. NFC llvm-svn: 374459	2019-10-10 21:46:44 +00:00
Marcello Maggioni	0112123eea	[GISel] Allow getConstantVRegVal() to return G_FCONSTANT values. In GISel we have both G_CONSTANT and G_FCONSTANT, but because in GISel we don't really have a concept of Float vs Int value the only difference between the two is where the data originates from. What both G_CONSTANT and G_FCONSTANT return is just a bag of bits with the constant representation in it. By making getConstantVRegVal() return the G_FCONSTANT's bit representation as well, we allow ConstantFold and other things to operate with G_FCONSTANT. Adds tests that show ConstantFolding working on mixed G_CONSTANT and G_FCONSTANT sources. Differential Revision: https://reviews.llvm.org/D68739 llvm-svn: 374458	2019-10-10 21:46:26 +00:00
Stanislav Mekhanoshin	19a1a739b1	[AMDGPU] Handle undef old operand in DPP combine It was missing an undef flag. Differential Revision: https://reviews.llvm.org/D68813 llvm-svn: 374455	2019-10-10 21:32:41 +00:00
Rong Xu	686fa4bbfb	[ValueTracking] Improve pointer offset computation for cases of same base This patch improves the handling of pointer offset in GEP expressions where one argument is the base pointer. isPointerOffset() is being used by memcpyopt where current code synthesizes consecutive 32 bytes stores to one store and two memset intrinsic calls. With this patch, we convert the stores to one memset intrinsic. Differential Revision: https://reviews.llvm.org/D67989 llvm-svn: 374454	2019-10-10 21:30:43 +00:00
Evandro Menezes	8bd4276981	[InstCombine] Add test case for PR43617 (NFC) Also, refactor check in `LibCallSimplifier::optimizeLog()`. llvm-svn: 374453	2019-10-10 21:29:10 +00:00
Alina Sbirlea	67f0c5c085	[MemorySSA] Additional handling of unreachable blocks. Summary: Whenever we get the previous definition, the assumption is that the recursion starts in a reachable block. If the recursion starts in an unreachable block, we may recurse indefinitely. Handle this case by returning LoE if the block is unreachable. Resolves PR43426. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68809 llvm-svn: 374447	2019-10-10 20:43:06 +00:00
Sanjay Patel	8dd16ed0c8	[x86] reduce duplicate test assertions; NFC llvm-svn: 374436	2019-10-10 19:52:27 +00:00
Craig Topper	0e561437c5	[X86] Use packusdw+vpmovuswb to implement v16i32->v16i8 that clamps signed inputs to be between 0 and 255 when zmm registers are disabled on SKX. If we've disabled zmm registers, the v16i32 will need to be split. This split will propagate through the min/max and the truncate. This creates two sequences that need to be concatenated back to v16i8. We can instead use packusdw to do part of the clamping, truncating, and concatenating all at once. Then we can use a vpmovuswb to finish off the clamp. Differential Revision: https://reviews.llvm.org/D68763 llvm-svn: 374431	2019-10-10 19:40:44 +00:00
Reid Kleckner	67d440b949	Print quoted backslashes in LLVM IR as \\ instead of \5C This improves readability of Windows path string literals in LLVM IR. The LLVM assembler has supported \\ in IR strings for a long time, but the lexer doesn't tolerate escaped quotes, so they have to be printed as \22 for now. llvm-svn: 374415	2019-10-10 18:31:57 +00:00
Reid Kleckner	e80a2616c8	Fix test to avoid check-not matching the temp file absolute path Fix for PR43636 llvm-svn: 374404	2019-10-10 18:01:27 +00:00
Sanjay Patel	7b904ce724	[DAGCombiner] fold select-of-constants to shift This reverses the scalar canonicalization proposed in D63382. Pre: isPowerOf2(C1) %r = select i1 %cond, i32 C1, i32 0 => %z = zext i1 %cond to i32 %r = shl i32 %z, log2(C1) https://rise4fun.com/Alive/Z50 x86 already tries to fold this pattern, but it isn't done uniformly, so we still see a diff. AArch64 probably should enable the TLI hook to benefit too, but that's a follow-on. llvm-svn: 374397	2019-10-10 17:52:02 +00:00
David Green	8628bb0491	[ARM] VQSUB instruction Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68567 llvm-svn: 374377	2019-10-10 16:34:30 +00:00
David Green	94d379095a	[Codegen] Alter the default promotion for saturating adds and subs The default promotion for the add_sat/sub_sat nodes currently does: 1. ANY_EXTEND iN to iM 2. SHL by M-N 3. [US][ADD|SUB]SAT 4. L/ASHR by M-N If the promoted add_sat or sub_sat node is not legal, this can produce code that effectively does a lot of shifting (and requires large constants to be materialised) just to use the overflow flag. It is simpler to just do the saturation manually, using the higher bitwidth addition and a min/max against the saturating bounds. That is what this patch attempts to do. Differential Revision: https://reviews.llvm.org/D68643 llvm-svn: 374373	2019-10-10 16:04:49 +00:00
Yonghong Song	d46a6a9e68	[BPF] Remove relocation for patchable externs Previously, patchable extern relocations were introduced to patch external variables used for multi-versioning in the compile once, run everywhere use case. The load instruction was converted into a move with a patchable immediate which could be changed by the bpf loader on the host. The kernel verifier has evolved and is able to load and propagate constant values, so the compiler relocation has become unnecessary. This patch removes the code related to this. Differential Revision: https://reviews.llvm.org/D68760 llvm-svn: 374367	2019-10-10 15:33:09 +00:00
Stanislav Mekhanoshin	cbe55c7caf	[AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC. llvm-svn: 374365	2019-10-10 15:28:52 +00:00
Roman Lebedev	a5e65c1cf7	[MCA] Show aggregate over Average Wait times for the whole snippet (PR43219) Summary: As discussed in https://bugs.llvm.org/show_bug.cgi?id=43219, I believe it may be somewhat useful to show //some// aggregates over all the sea of statistics provided. Example: ``` Average Wait times (based on the timeline view): [0]: Executions [1]: Average time spent waiting in a scheduler's queue [2]: Average time spent waiting in a scheduler's queue while ready [3]: Average time elapsed from WB until retire stage [0] [1] [2] [3] 0. 3 1.0 1.0 4.7 vmulps %xmm0, %xmm1, %xmm2 1. 3 2.7 0.0 2.3 vhaddps %xmm2, %xmm2, %xmm3 2. 3 6.0 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4 3 3.2 0.3 2.3 <total> ``` I.e. we average the averages. Reviewers: andreadb, mattd, RKSimon Reviewed By: andreadb Subscribers: gbedwell, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68714 llvm-svn: 374361	2019-10-10 14:46:21 +00:00
Dmitri Gribenko	d3aed7fc79	Revert "[FileCheck] Implement --ignore-case option." This reverts commit r374339. It broke tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19066 llvm-svn: 374359	2019-10-10 14:27:14 +00:00
Dmitri Gribenko	a89e5a41ec	Revert "[Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj)." This reverts commit r374343. It broke tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19066 llvm-svn: 374358	2019-10-10 14:26:54 +00:00
Dmitri Gribenko	43fcbcb4e8	Revert "Fix OCaml/core.ml fneg check" This reverts commit r374346. It attempted to fix OCaml tests, but it does not actually fix them. llvm-svn: 374357	2019-10-10 14:16:58 +00:00
Simon Pilgrim	6a38474f77	[X86] combineFMA - Convert to use isNegatibleForFree/GetNegatedExpression. Split off from D67557. llvm-svn: 374356	2019-10-10 14:14:12 +00:00
Simon Pilgrim	fdc0917b46	Fix OCaml/core.ml fneg check (try 2) llvm-svn: 374355	2019-10-10 14:13:55 +00:00
Dmitri Gribenko	eaf6dd482b	Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator" This reverts commit r374240. It broke OCaml tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014 llvm-svn: 374354	2019-10-10 14:13:54 +00:00
Thomas Preud'homme	48edae336b	Revert "[test] Use system locale for mri-utf8.test" This reverts commit r374318 / `b6f1d1fa0e`. llvm-svn: 374349	2019-10-10 13:39:12 +00:00
Simon Pilgrim	fbf8b0bc0d	Fix OCaml/core.ml fneg check llvm-svn: 374346	2019-10-10 13:29:47 +00:00
George Rimar	55f1be0996	[llvm-readelf] - Do not enter an infinite loop when printing histogram. This is similar to D68086. We are entering an infinite loop when dumping a histogram for a specially crafted .hash section with a loop in a chain. Differential revision: https://reviews.llvm.org/D68771 llvm-svn: 374344	2019-10-10 13:26:26 +00:00
Kai Nacke	819f01d917	[Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj). The command `od -t x` is used to dump data in hex format. The LIT tests assume that the hex characters are in lowercase. However, there are also platforms which use uppercase letters. To solve this issue, the tests are updated to use the new `--ignore-case` option of FileCheck. Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson Differential Revision: https://reviews.llvm.org/D68693 llvm-svn: 374343	2019-10-10 13:24:00 +00:00
Amaury Sechet	aaf0507896	[DAGCombine] Match more patterns for half word bswap Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 374340	2019-10-10 13:20:10 +00:00
Kai Nacke	dfd2b6f07f	[FileCheck] Implement --ignore-case option. The FileCheck utility is enhanced to support a `--ignore-case` option. This is useful in cases where the output of Unix tools differs in case (e.g. case not specified by Posix). Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D68146 llvm-svn: 374339	2019-10-10 13:15:41 +00:00
Pavel Labath	3aa7e76677	MinidumpYAML: Add support for the memory info list stream Summary: The implementation is fairly straight-forward and uses the same patterns as the existing streams. The yaml form does not attempt to preserve the data in the "gaps" that can be created by setting a larger-than-required header or entry size in the stream header, because the existing consumer (lldb) does not make use of the information in the gap in any way, and attempting to preserve that would make the implementation more complicated. Reviewers: amccarth, jhenderson, clayborg Subscribers: llvm-commits, lldb-commits, markmentovai, zturner, JosephTremoulet Tags: #llvm Differential Revision: https://reviews.llvm.org/D68645 llvm-svn: 374337	2019-10-10 13:05:46 +00:00
David Green	39596ec2fe	[ARM] VQADD instructions This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68566 llvm-svn: 374336	2019-10-10 13:05:04 +00:00
Sanjay Patel	3370d4d2b7	[AArch64][x86] add tests for (v)select bit magic; NFC llvm-svn: 374334	2019-10-10 12:53:24 +00:00
Mirko Brkusanin	c2e481679b	[Mips] Fix 374055 EXPENSIVE_CHECKS build was failing on new test. This is fixed by marking $ra register as undef. Test now has -verify-machineinstrs to check for operand flags. llvm-svn: 374320	2019-10-10 12:02:14 +00:00
Thomas Preud'homme	b6f1d1fa0e	[test] Use system locale for mri-utf8.test Summary: llvm-ar's mri-utf8.test test relies on the en_US.UTF-8 locale being installed for its last RUN line to work. If it is not installed, the unicode string gets encoded (interpreted) as ascii, which fails since the most significant byte is non-zero. This commit changes the test to only rely on the system being able to encode the pound sign in its default encoding (e.g. UTF-16 for Microsoft Windows) by always opening the file via input/output redirection. This avoids forcing a given locale to be present and supported. A Byte Order Mark is also added to help recognize the encoding of the file and its endianness. Reviewers: gbreynoo, MaskRay, rupprecht, JamesNagurne, jfb Subscribers: dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68472 llvm-svn: 374318	2019-10-10 11:48:30 +00:00
Oliver Stannard	4f454b2275	[IfCvt][ARM] Optimise diamond if-conversion for code size Currently, the heuristics the if-conversion pass uses for diamond if-conversion are based on execution time, with no consideration for code size. This adds a new set of heuristics to be used when optimising for code size. This is mostly target-independent, because the if-conversion pass can see the code size of the instructions which it is removing. For thumb, there are a few passes (insertion of IT instructions, selection of narrow branches, and selection of CBZ instructions) which are run after if conversion and affect these heuristics, so I've added target hooks to better predict the code-size effect of a proposed if-conversion. Differential revision: https://reviews.llvm.org/D67350 llvm-svn: 374301	2019-10-10 09:58:28 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Craig Topper	0a84576262	[X86] Add test case for trunc_packus_v16i32_v16i8 with avx512vl+avx512bw and prefer-vector-width=256 and min-legal-vector-width=256. NFC llvm-svn: 374283	2019-10-10 06:25:00 +00:00
Johannes Doerfert	72adda1740	[Attributor] Handle `null` differently in capture and alias logic Summary: `null` in the default address space (=AS 0) cannot be captured nor can it alias anything. We make this clear now as it can be important for callbacks and other cases later on. In addition, this patch improves the debug output for noalias deduction. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68624 llvm-svn: 374280	2019-10-10 05:33:21 +00:00
Chen Zheng	92e00293fd	[PowerPC] add testcase for ppc loop instr form prep - NFC llvm-svn: 374273	2019-10-10 03:00:15 +00:00
Reid Kleckner	9d8f0b3519	[codeview] Try to avoid emitting .cv_loc with line zero Summary: Visual Studio doesn't like it while stepping. It kicks you out of the source view of the file being stepped through and tries to fall back to the disassembly view. Fixes PR43530. The fix is incomplete, because it's possible to have a basic block with no source locations at all. In this case, we don't emit a .cv_loc, but that will result in wrong stepping behavior in the debugger if the layout predecessor of the location-less BB has an unrelated source location. We could try harder to find a valid location that dominates or post-dominates the current BB, but in general it's a dataflow problem, and one still might not exist. I left a FIXME about this. As an alternative, we might want to consider having the middle-end check if it's emitting codeview and get it to stop using line zero. Reviewers: akhuang Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68747 llvm-svn: 374267	2019-10-10 01:06:01 +00:00
Thomas Lively	3414bce07a	[WebAssembly] Fix tests missed in rL374235 llvm-svn: 374259	2019-10-09 23:06:38 +00:00
Matt Arsenault	f8bf7d7f42	AMDGPU: Don't fold copies to physregs In a future patch, this will help clean up m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there ends up being no real copy. llvm-svn: 374257	2019-10-09 22:51:42 +00:00
Matt Arsenault	85dfa82302	AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255	2019-10-09 22:44:49 +00:00
Matt Arsenault	3cd3959fe2	GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252	2019-10-09 22:44:43 +00:00
Stanislav Mekhanoshin	c6dec1d828	[AMDGPU] Fixed dpp combine of VOP1 If original instruction did not have source modifiers they were not added to the new DPP instruction as well, even if needed. Differential Revision: https://reviews.llvm.org/D68729 llvm-svn: 374241	2019-10-09 22:02:58 +00:00
Cameron McInally	47363a148f	[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374240	2019-10-09 21:52:15 +00:00
Thomas Lively	00f9e5aa76	[WebAssembly] Make returns variadic Summary: This is necessary and sufficient to get simple cases of multiple return working with multivalue enabled. More complex cases will require block and loop signatures to be generalized to potentially be type indices as well. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68684 llvm-svn: 374235	2019-10-09 21:42:08 +00:00
Wei Mi	09dcfe6805	[SampleFDO] Add indexing for function profiles so they can be loaded on demand in ExtBinary format Currently, for Text, Binary and ExtBinary format profiles, when we compile a module with SampleFDO, we have to load all the function profiles from the profile input, even if no function in the module shows up in the profile. That is a waste of compile time. The CompactBinary format already supports loading function profiles on demand. In this patch, we add support for loading profiles on demand to the ExtBinary format. It works regardless of whether the sections in the ExtBinary format profile are compressed. An experiment shows it reduces the compile time of a server benchmark by 30%. When profile remapping and loading function profiles on demand are both used, extra work is needed so that the on-demand loading process takes the name remapping into account. This will be addressed in a follow-up patch. Differential Revision: https://reviews.llvm.org/D68601 llvm-svn: 374233	2019-10-09 21:36:03 +00:00
David Blaikie	411497c6c7	llvm-dwarfdump: Support multiple debug_loclists contributions Also fixing the incorrect "offset" field being computed/printed for each location list. llvm-svn: 374232	2019-10-09 21:25:28 +00:00
Sanjay Patel	232b9dc46a	[ConstProp] add tests for extractelement with undef index; NFC llvm-svn: 374210	2019-10-09 20:14:17 +00:00
Sanjay Patel	0845ac7331	[InstCombine] add another test for gep inbounds; NFC llvm-svn: 374190	2019-10-09 17:52:26 +00:00
Thomas Lively	3419e90dc1	[WebAssembly] Add builtin and intrinsic for v8x16.swizzle Summary: This clang builtin and corresponding LLVM intrinsic are necessary to expose the exact semantics of the underlying WebAssembly instruction to users. LLVM produces a poison value if a dynamic swizzle index is out of range (16 or greater), but the WebAssembly instruction sets the corresponding output lane to zero. Users who depend on this behavior can safely use this builtin. Depends on D68527. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68531 llvm-svn: 374189	2019-10-09 17:45:47 +00:00
Thomas Lively	d5b7a4e2e8	[WebAssembly] v8x16.swizzle and rewrite BUILD_VECTOR lowering Summary: Adds the new v8x16.swizzle SIMD instruction as specified at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices. In addition to adding swizzles as a candidate lowering in LowerBUILD_VECTOR, also rewrites and simplifies the lowering to minimize the number of replace_lanes necessary rather than trying to minimize code size. This leads to more uses of v128.const instead of splats, which is expected to increase performance. The new code will be easier to tune once V8 implements all the vector construction operations, and it will also be easier to add new candidate instructions in the future if necessary. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68527 llvm-svn: 374188	2019-10-09 17:39:19 +00:00
Kevin P. Neal	44e988ab14	[FPEnv][NFC] Change test to conform to strictfp attribute rules. In particular, the function definition is not marked strictfp despite containing a call marked strictfp. Also, if any call in a function is marked strictfp, then all calls in that function must be marked. Moving the one strictfp call to a new, properly marked function satisfies all the new rules. Tested with a stricter version of D68233. Reviewed by: spatel Approved by: spatel Differential Revision: https://reviews.llvm.org/D68713 llvm-svn: 374186	2019-10-09 17:24:56 +00:00
Sanjay Patel	df14bd315d	[SLP] respect target register width for GEP vectorization (PR43578) We failed to account for the target register width (max vector factor) when vectorizing starting from GEPs. This causes vectorization to proceed to obviously illegal widths as in: https://bugs.llvm.org/show_bug.cgi?id=43578 For x86, this also means that SLP can produce rogue AVX or AVX512 code even when the user specifies a narrower vector width. The AArch64 test in ext-trunc.ll appears to be better using the narrower width. I'm not exactly sure what getelementptr.ll is trying to do, but it's testing with "-slp-threshold=-18", so I'm not worried about those diffs. The x86 test is an over-reduction from SPEC h264; this patch appears to restore the perf loss caused by SLP when using -march=haswell. Differential Revision: https://reviews.llvm.org/D68667 llvm-svn: 374183	2019-10-09 16:32:49 +00:00
Momchil Velikov	d037a5f065	[AArch64] Ensure no tagged memory is left in the unallocated portion of the stack This patch makes sure that if we tag some memory, we untag that memory before the function returns or throws via any exit reachable from the tag operation. To do so, we place the untag operation either: a) at the lifetime end call for the alloca, if that call post-dominates the lifetime start call (where the tag operation is placed) or if it (the lifetime end call) dominates all reachable exits; otherwise b) at the reachable exits. Differential Revision: https://reviews.llvm.org/D68469 llvm-svn: 374182	2019-10-09 16:31:50 +00:00
Jonas Devlieghere	e7affcdbd2	Re-land "[dsymutil] Fix handling of common symbols in multiple object files." The original patch got reverted because it hit a long-standing legacy issue on Windows that prevents files from being named `com`. Thanks Kristina & Jeremy for pointing this out. llvm-svn: 374178	2019-10-09 16:19:13 +00:00
Alina Sbirlea	7faa14a98b	[MemorySSA] Make the use of moveAllAfterMergeBlocks consistent. Summary: The rule for the moveAllAfterMergeBlocks API is for all instructions from `From` to have been moved to `To`, while keeping the CFG edges (and block terminators) unchanged. Update all the callsites for moveAllAfterMergeBlocks to follow this. Pending follow-up: since the same behavior is needed every time, merge all callsites into one. The common denominator may be the call to `MergeBlockIntoPredecessor`. Resolves PR43569. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68659 llvm-svn: 374177	2019-10-09 15:54:24 +00:00
David Green	fcc9c4627e	Add and adjust saturating tests. NFC This adds some extra testing to the existing [su][add/sub]_sat X86 and AArch64 tests and adds equivalent tests for ARM. llvm-svn: 374169	2019-10-09 14:17:38 +00:00
Sjoerd Meijer	d1170dbe58	[LV] Emitting SCEV checks with OptForSize When optimising for size and SCEV runtime checks need to be emitted to check overflow behaviour, the loop vectorizer can run into this assert: LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(llvm::Loop*, llvm::BasicBlock*): Assertion `!BB->getParent()->hasOptSize() && "Cannot SCEV check stride or overflow when opt We should not generate predicates while optimising for size, because code will be generated for predicates such as these SCEV overflow runtime checks. This should fix PR43371. Differential Revision: https://reviews.llvm.org/D68082 llvm-svn: 374166	2019-10-09 13:19:41 +00:00
Simon Pilgrim	d7ac255325	[CostModel][X86] Add tests for insertelement to non-immediate vector element indices llvm-svn: 374161	2019-10-09 12:36:34 +00:00
Simon Pilgrim	a21176ffb1	[CostModel][X86] Add tests for extractelement from non-immediate vector element indices llvm-svn: 374160	2019-10-09 12:36:22 +00:00
David Green	e2c72929c8	[ARM] Add saturating arithmetic tests for MVE. NFC llvm-svn: 374159	2019-10-09 12:29:51 +00:00
James Molloy	9948fe6997	[TableGen] Fix crash when using HwModes in CodeEmitterGen When an instruction has an encoding definition for only a subset of the available HwModes, ensure we just avoid generating an encoding rather than crashing. llvm-svn: 374150	2019-10-09 09:15:34 +00:00
Clement Courbet	c3a7fb7599	[llvm-exegesis] Explore LEA addressing modes. Summary: This will help for PR32326. This shows the well-known issue with `RBP` and `R13` as base registers. Reviewers: gchatelet Subscribers: tschuett, llvm-commits, RKSimon, andreadb Tags: #llvm Differential Revision: https://reviews.llvm.org/D68646 llvm-svn: 374146	2019-10-09 08:49:13 +00:00
Jeremy Morse	e9c8f6fea6	Revert r374139, "[dsymutil] Fix handling of common symbols in multiple object files." The added test files ("com", "com1.o", "com2.o") are reserved names on Windows, and make 'git checkout' fail with a filesystem error. llvm-svn: 374144	2019-10-09 08:27:48 +00:00
Jonas Devlieghere	4ac388f7ca	[dsymutil] Fix handling of common symbols in multiple object files. For common symbols the linker emits only a single symbol entry in the debug map. This caused dsymutil to not relocate common symbols when linking DWARF coming from object files that did not have this entry. This patch fixes that by keeping track of common symbols in the object files and synthesizing a debug map entry for them using the address from the main binary. Differential revision: https://reviews.llvm.org/D68680 llvm-svn: 374139	2019-10-09 04:16:18 +00:00
Bill Wendling	4d69ca8c67	[IA] Add tests for a few other edge cases Test with the last eight bits within the range [7F, FF] and with lower-case hex letters. llvm-svn: 374124	2019-10-08 22:06:09 +00:00
Jonas Devlieghere	a3f794e9b4	[dsymutil] Improve verbose output (NFC) The verbose output for finding relocations assumed that we'd always dump the DIE after (which starts with a newline) and therefore didn't include one itself. However, this isn't always true, leading to garbled output. This patch adds a newline to the verbose output and adds a line that says that the DIE is being kept (which isn't obvious otherwise). It also adds a 0x prefix to the relocations. llvm-svn: 374123	2019-10-08 22:03:13 +00:00
Roman Lebedev	354ba6985c	[CVP] Replace SExt with ZExt if the input is known-non-negative Summary: Zero-extension is far friendlier to further analysis. While this doesn't directly help with the shift-by-signext problem, it is not unrelated. This has the following effect on test-suite (numbers collected after the middle-end module pass manager finishes):

| Statistic | old | new | delta | percent change |
| --- | ---: | ---: | ---: | ---: |
| correlated-value-propagation.NumSExt | 0 | 6026 | 6026 | +100.00% |
| instcount.NumAddInst | 272860 | 271283 | -1577 | -0.58% |
| instcount.NumAllocaInst | 27227 | 27226 | -1 | 0.00% |
| instcount.NumAndInst | 63502 | 63320 | -182 | -0.29% |
| instcount.NumAShrInst | 13498 | 13407 | -91 | -0.67% |
| instcount.NumAtomicCmpXchgInst | 1159 | 1159 | 0 | 0.00% |
| instcount.NumAtomicRMWInst | 5036 | 5036 | 0 | 0.00% |
| instcount.NumBitCastInst | 672482 | 672353 | -129 | -0.02% |
| instcount.NumBrInst | 702768 | 702195 | -573 | -0.08% |
| instcount.NumCallInst | 518285 | 518205 | -80 | -0.02% |
| instcount.NumExtractElementInst | 18481 | 18482 | 1 | 0.01% |
| instcount.NumExtractValueInst | 18290 | 18288 | -2 | -0.01% |
| instcount.NumFAddInst | 139035 | 138963 | -72 | -0.05% |
| instcount.NumFCmpInst | 10358 | 10348 | -10 | -0.10% |
| instcount.NumFDivInst | 30310 | 30302 | -8 | -0.03% |
| instcount.NumFenceInst | 387 | 387 | 0 | 0.00% |
| instcount.NumFMulInst | 93873 | 93806 | -67 | -0.07% |
| instcount.NumFPExtInst | 7148 | 7144 | -4 | -0.06% |
| instcount.NumFPToSIInst | 2823 | 2838 | 15 | 0.53% |
| instcount.NumFPToUIInst | 1251 | 1251 | 0 | 0.00% |
| instcount.NumFPTruncInst | 2195 | 2191 | -4 | -0.18% |
| instcount.NumFSubInst | 92109 | 92103 | -6 | -0.01% |
| instcount.NumGetElementPtrInst | 1221423 | 1219157 | -2266 | -0.19% |
| instcount.NumICmpInst | 479140 | 478929 | -211 | -0.04% |
| instcount.NumIndirectBrInst | 2 | 2 | 0 | 0.00% |
| instcount.NumInsertElementInst | 66089 | 66094 | 5 | 0.01% |
| instcount.NumInsertValueInst | 2032 | 2030 | -2 | -0.10% |
| instcount.NumIntToPtrInst | 19641 | 19641 | 0 | 0.00% |
| instcount.NumInvokeInst | 21789 | 21788 | -1 | 0.00% |
| instcount.NumLandingPadInst | 12051 | 12051 | 0 | 0.00% |
| instcount.NumLoadInst | 880079 | 878673 | -1406 | -0.16% |
| instcount.NumLShrInst | 25919 | 25921 | 2 | 0.01% |
| instcount.NumMulInst | 42416 | 42417 | 1 | 0.00% |
| instcount.NumOrInst | 100826 | 100576 | -250 | -0.25% |
| instcount.NumPHIInst | 315118 | 314092 | -1026 | -0.33% |
| instcount.NumPtrToIntInst | 15933 | 15939 | 6 | 0.04% |
| instcount.NumResumeInst | 2156 | 2156 | 0 | 0.00% |
| instcount.NumRetInst | 84485 | 84484 | -1 | 0.00% |
| instcount.NumSDivInst | 8599 | 8597 | -2 | -0.02% |
| instcount.NumSelectInst | 45577 | 45913 | 336 | 0.74% |
| instcount.NumSExtInst | 84026 | 78278 | -5748 | -6.84% |
| instcount.NumShlInst | 39796 | 39726 | -70 | -0.18% |
| instcount.NumShuffleVectorInst | 100272 | 100292 | 20 | 0.02% |
| instcount.NumSIToFPInst | 29131 | 29113 | -18 | -0.06% |
| instcount.NumSRemInst | 1543 | 1543 | 0 | 0.00% |
| instcount.NumStoreInst | 805394 | 804351 | -1043 | -0.13% |
| instcount.NumSubInst | 61337 | 61414 | 77 | 0.13% |
| instcount.NumSwitchInst | 8527 | 8524 | -3 | -0.04% |
| instcount.NumTruncInst | 60523 | 60484 | -39 | -0.06% |
| instcount.NumUDivInst | 2381 | 2381 | 0 | 0.00% |
| instcount.NumUIToFPInst | 5549 | 5549 | 0 | 0.00% |
| instcount.NumUnreachableInst | 9855 | 9855 | 0 | 0.00% |
| instcount.NumURemInst | 1305 | 1305 | 0 | 0.00% |
| instcount.NumXorInst | 10230 | 10081 | -149 | -1.46% |
| instcount.NumZExtInst | 60353 | 66840 | 6487 | 10.75% |
| instcount.TotalBlocks | 829582 | 829004 | -578 | -0.07% |
| instcount.TotalFuncs | 83818 | 83817 | -1 | 0.00% |
| instcount.TotalInsts | 7316574 | 7308483 | -8091 | -0.11% |

TLDR: we produce 0.11% fewer instructions, 6.84% fewer `sext`s, and 10.75% more `zext`s. Note that not all of the new `zext`s are produced by this fold. (And now I guess it might have been interesting to measure this for D68103 :S) Reviewers: nikic, spatel, reames, dberlin Reviewed By: nikic Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68654 llvm-svn: 374112	2019-10-08 20:29:48 +00:00
Roman Lebedev	347f6a770b	[CVP][NFC] Revisit sext vs. zext test llvm-svn: 374111	2019-10-08 20:29:36 +00:00
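Commit 354ba6985c rests on a simple bit-level fact: when the input's sign bit is clear, `sext` and `zext` produce the same result, so the zero-extension (which is friendlier to later analysis) can be substituted. The Python sketch here checks this exhaustively for `i8` to `i32`. It models LLVM integers as masked Python ints, which is an assumption for illustration, and the helpers `sext`/`zext` are hypothetical names, not LLVM APIs. It does not show how CVP proves an input non-negative, only why the replacement is sound once it has.

```python
def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Model `sext iN to iM`: copy the sign bit into the new high bits.

    Returns the result as an unsigned bit pattern of width to_bits.
    """
    v = value & ((1 << from_bits) - 1)   # truncate input to from_bits
    if v >> (from_bits - 1):             # sign bit set: reinterpret as negative
        v -= 1 << from_bits
    return v & ((1 << to_bits) - 1)      # two's-complement pattern in to_bits


def zext(value: int, from_bits: int, to_bits: int) -> int:
    """Model `zext iN to iM`: fill the new high bits with zeros."""
    assert to_bits >= from_bits
    return value & ((1 << from_bits) - 1)


# Exhaustively compare both extensions over every i8 bit pattern.
mismatches = [p for p in range(1 << 8) if sext(p, 8, 32) != zext(p, 8, 32)]

# Only patterns with the sign bit set (0x80..0xFF, i.e. negative i8 values) differ.
print(len(mismatches), hex(mismatches[0]), hex(mismatches[-1]))  # 128 0x80 0xff
```

Every pattern in 0x00..0x7F (the non-negative `i8` values) extends identically either way, which is exactly the precondition the commit title names.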


65849 Commits
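Commits 3419e90dc1 and d5b7a4e2e8 hinge on the lane semantics of `v8x16.swizzle`: each output lane selects the input byte named by the corresponding index byte, and an index of 16 or more yields zero, whereas LLVM treats such an index as poison. The Python model here is a sketch of those semantics as described in the commit messages and the linked SIMD proposal; the function name `v8x16_swizzle` and the list-of-ints representation of a 128-bit vector are assumptions for illustration, not a WebAssembly or LLVM API.

```python
def v8x16_swizzle(vec: list, idx: list) -> list:
    """Model the v8x16.swizzle lane semantics on 16-byte vectors.

    Output lane i is vec[idx[i]] when idx[i] < 16; any out-of-range
    index (16..255) produces 0 rather than an undefined value.
    """
    if len(vec) != 16 or len(idx) != 16:
        raise ValueError("both operands must have 16 byte lanes")
    out = []
    for i in idx:
        i &= 0xFF                      # indices are unsigned bytes
        out.append(vec[i] if i < 16 else 0)
    return out


vec = list(range(100, 116))            # lanes hold 100..115
idx = [15, 0, 16, 255] + [1] * 12      # two in-range, two out-of-range, then lane 1
print(v8x16_swizzle(vec, idx)[:4])     # [115, 100, 0, 0]
```

Code that relies on the zeroing behavior for out-of-range indices is the case the dedicated builtin is meant for, since a generic LLVM shuffle of a dynamic index gives no such guarantee.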