llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Atanasyan	68f73bf262	[mips] Merge common checkings under the same check prefix. NFC llvm-svn: 370467	2019-08-30 12:15:12 +00:00
Luis Marques	c2b3d527fa	[RISCV] Fix a couple of tests' CHECKs llvm-svn: 370466	2019-08-30 12:11:47 +00:00
Haojian Wu	ed170c9bf9	Remove an extra ";", NFC. llvm-svn: 370465	2019-08-30 12:09:31 +00:00
Amaury Sechet	485760f4c0	[X86] Add tests for rotate matching. NFC llvm-svn: 370464	2019-08-30 11:35:28 +00:00
Bjorn Pettersson	227145924a	[CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC Summary: Found a couple of places in the code where all the PHI nodes of a MBB is updated, replacing references to one MBB by reference to another MBB instead. This patch simply refactors the code to use a common helper (MachineBasicBlock::replacePhiUsesWith) for such PHI node updates. Reviewers: t.p.northover, arsenm, uabelho Subscribers: wdng, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66750 llvm-svn: 370463	2019-08-30 11:23:10 +00:00
Simon Pilgrim	7cbf823f93	[DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros Return a proper zero vector, just in case some elements are undef. Noticed by inspection after dealing with a similar issue in PR43159. llvm-svn: 370460	2019-08-30 10:42:14 +00:00
Simon Pilgrim	01a3c25c27	Fix Wdocumentation warning. NFCI. llvm-svn: 370459	2019-08-30 10:25:52 +00:00
Chris Jackson	fa1fe93789	[llvm-objcopy] Allow the visibility of symbols created by --binary and --add-symbol to be specified with --new-symbol-visibility llvm-svn: 370458	2019-08-30 10:17:16 +00:00
Hideto Ueno	6381b143f6	[Attributor] Implement AANoAliasCallSiteArgument initialization Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66927 llvm-svn: 370456	2019-08-30 10:00:32 +00:00
Roman Lebedev	5c9f3cfec7	[LoopIdiomRecognize] BCmp loop idiom recognition Summary: @mclow.lists brought up this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there is `memcmp()`, `bcmp()` functions. In C++, there exists `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exists a certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler). {F8768213} ``` #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> #include "benchmark/benchmark.h" template <class T> bool equal(T* a, T* a_end, T* b) noexcept { for (; a != a_end; ++a, ++b) { if (a != b) return false; } return true; } template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct Identical { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto Tmp = getVectorOfRandomNumbers<T>(count); return std::make_pair(Tmp, std::move(Tmp)); } }; struct InequalHalfway { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto V0 = getVectorOfRandomNumbers<T>(count); auto V1 = V0; V1[V1.size() / size_t(2)]++; // just change the value. return std::make_pair(std::move(V0), std::move(V1)); } }; template <class T, class Gen> void BM_bcmp(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { const bool is_equal = equal(a.data(), a.data() + a.size(), b.data()); benchmark::DoNotOptimize(is_equal); } state.SetComplexityN(Length); state.counters["eltcnt"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["eltcnt/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = []() { for (const benchmark::CPUInfo::CacheInfo& I : benchmark::CPUInfo::Get().caches) { if (I.level == 2) return I.size; } return 0; }(); // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /total bufs/ 2 / /maximal elt size/ sizeof(T) / /safety margin/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway) ->Apply(CustomArguments<uint64_t>); ``` {F8768210} ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx 2019-04-25 21:17:11 Running build-old/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 0.65, 3.90, 4.14 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N BM_bcmp<uint8_t, Identical>_RMS 8 % 8 % <...> BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N BM_bcmp<uint16_t, Identical>_RMS 25 % 25 % <...> BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N BM_bcmp<uint32_t, Identical>_RMS 42 % 42 % <...> BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N BM_bcmp<uint64_t, Identical>_RMS 27 % 27 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 % RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0 2019-04-25 21:19:29 Running build-new/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.01, 2.85, 3.71 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N BM_bcmp<uint8_t, Identical>_RMS 37 % 37 % <...> BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N BM_bcmp<uint16_t, Identical>_RMS 34 % 34 % <...> BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N BM_bcmp<uint32_t, Identical>_RMS 35 % 35 % <...> BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N BM_bcmp<uint64_t, Identical>_RMS 33 % 33 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 % Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench Benchmark Time CPU Time Old Time New CPU Old CPU New --------------------------------------------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590 <...> BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948 <...> BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627 <...> BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855 <...> BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569 <...> BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547 <...> BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394 <...> BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498 ``` What can we tell from the benchmark? * Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think, that instability implies performance problems. * Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant! * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second. * The call obvious is more pricey than the loop, with small element count. As it can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size) when element count is less than 8. So all in all, bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing. Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217} This transform fires 91 times: ``` $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json Tests: 1149 Metric: loop-idiom.NumBCmp Program result-new MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00 MultiSource/Applications/d/make_dparser 3.00 SingleSource/UnitTests/vla 2.00 MultiSource/Applications/Burg/burg 1.00 MultiSourc.../Applications/JM/lencod/lencod 1.00 MultiSource/Applications/lemon/lemon 1.00 MultiSource/Benchmarks/Bullet/bullet 1.00 MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00 MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00 MultiSourc...Prolangs-C/simulator/simulator 1.00 ``` The size changes are: I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look. ``` $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash Tests: 1149 Same hash: 907 (filtered out) Remaining: 242 Metric: size..text Program result-old result-new diff test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6% test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5% test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1% test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1% test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1% test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1% test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0% test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0% test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0% test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0% test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan% test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan% test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan% Geomean difference nan% result-old result-new diff count 1.000000e+01 10.00000 10.000000 mean 3.152090e+05 311695.40000 0.006749 std 3.790398e+05 372091.42232 0.036605 min 7.530000e+02 833.00000 -0.034981 25% 4.243300e+04 42401.00000 -0.000866 50% 1.197370e+05 119689.00000 -0.000392 75% 6.397050e+05 639705.00000 -0.000005 max 1.001697e+06 966657.00000 0.106242 ``` I don't have timings though. And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there is a few TODO's that i have left for follow-ups: * widening of `memcmp()`/`bcmp()` * step smaller than the comparison size * Metadata propagation * more than two blocks as long as there is still a single backedge? * ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 370454	2019-08-30 09:51:23 +00:00
Roman Lebedev	09e4ac1a4d	[NFC] SCEVExpander: add SetCurrentDebugLocation() / getCurrentDebugLocation() wrappers Summary: The internal `Builder` is private, which means there is currently no way to set the debuginfo locations for `SCEVExpander`. This only adds the wrappers, but does not use them anywhere. Reviewers: mkazantsev, sanjoy, gberry, jyknight, dneilson Reviewed By: sanjoy Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61007 llvm-svn: 370453	2019-08-30 09:51:02 +00:00
David Stenberg	b35d4699d0	[LiveDebugValues] Insert entry values after bundles Summary: Change LiveDebugValues so that it inserts entry values after the bundle which contains the clobbering instruction. Previously it would insert the debug value after the bundle head using insertAfter(), breaking the bundle. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D66888 llvm-svn: 370448	2019-08-30 09:06:50 +00:00
Sven van Haastregt	fd66c8bf07	vim: add `immarg` keyword The `immarg` attribute was added in r355981. llvm-svn: 370443	2019-08-30 08:52:55 +00:00
Nico Weber	629f921568	gn build: Merge r370441 llvm-svn: 370442	2019-08-30 08:26:37 +00:00
Dmitri Gribenko	4fc0d3bd09	[ADT] Removed VariadicFunction Summary: It is not used. It uses macro-based unrolling instead of variadic templates, so it is not idiomatic anymore, and therefore it is a questionable API to keep "just in case". Subscribers: mgorny, dmgreen, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66961 llvm-svn: 370441	2019-08-30 08:21:55 +00:00
Martin Storsjo	3d3a9b3b41	[LLD] [COFF] Support merging resource object files Extend WindowsResourceParser to support using a ResourceSectionRef for loading resources from an object file. Only allow merging resource object files in mingw mode; keep the existing error on multiple resource objects in link mode. If there only is one resource object file and no .res resources, don't parse and recreate the .rsrc section, but just link it in without inspecting it. This allows users to produce any .rsrc section (outside of what the parser supports), just like before. (I don't have a specific need for this, but it reduces the risk of this new feature.) Separate out the .rsrc section chunks in InputFiles.cpp, and only include them in the list of section chunks to link if we've determined that there only was one single resource object. (We need to keep other chunks from those object files, as they can legitimately contain other sections as well, in addition to .rsrc section chunks.) Differential Revision: https://reviews.llvm.org/D66824 llvm-svn: 370436	2019-08-30 06:56:33 +00:00
Martin Storsjo	d8d63ff24b	[WindowsResource] Remove use of global variables in WindowsResourceParser Instead of updating a global variable counter for the next index of strings and data blobs, pass along a reference to actual data/string vectors and let the TreeNode insertion methods add their data/strings to the vectors when a new entry is needed. Additionally, if the resource tree had duplicates, that were ignored with -force:multipleres in lld, we no longer store all versions of the duplicated resource data, now we only keep the one that actually ends up referenced. Differential Revision: https://reviews.llvm.org/D66823 llvm-svn: 370435	2019-08-30 06:56:02 +00:00
Martin Storsjo	e62d5682fb	[WindowsResource] Avoid duplicating the input filenames for each resource. NFC. Differential Revision: https://reviews.llvm.org/D66821 llvm-svn: 370434	2019-08-30 06:55:54 +00:00
Martin Storsjo	9438221785	[COFF] Add a ResourceSectionRef method for getting resource contents This allows llvm-readobj to print the contents of each resource when printing resources from an object file or executable, like it already does for plain .res files. This requires providing the whole COFFObjectFile to ResourceSectionRef. This supports both object files and executables. For executables, the DataRVA field is used as is to look up the right section. For object files, ideally we would need to complete linking of them and fix up all relocations to know what the DataRVA field would end up being. In practice, the only thing that makes sense for an RVA field is an ADDR32NB relocation. Thus, find a relocation pointing at this field, verify that it has the expected type, locate the symbol it points at, look up the section the symbol points at, and read from the right offset in that section. This works both for GNU windres object files (which use one single .rsrc section, with all relocations against the base of the .rsrc section, with the original value of the DataRVA field being the offset of the data from the beginning of the .rsrc section) and cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections, and one symbol per data entry, with the original pre-relocated DataRVA field being set to zero). Differential Revision: https://reviews.llvm.org/D66820 llvm-svn: 370433	2019-08-30 06:55:49 +00:00
Petar Avramovic	e96892a8aa	[MIPS GlobalISel] Lower uitofp Add custom lowering for G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D66930 llvm-svn: 370432	2019-08-30 05:51:12 +00:00
Petar Avramovic	6412b56513	[MIPS GlobalISel] Lower fptoui Add lower for G_FPTOUI. Algorithm is similar to the SDAG version in TargetLowering::expandFP_TO_UINT. Lower G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D66929 llvm-svn: 370431	2019-08-30 05:44:02 +00:00
Dan Gohman	8cfeeaf9de	[CodeGen] Fix lowering for returning the result of an extractvalue When the number of return values exceeds the number of registers available, SelectionDAGBuilder::visitRet transforms a function's return to use a pointer to a buffer to hold return values. When the returned value is an operator such as extractvalue, the value may have a non-zero result number. Add that number to the indexing when obtaining the values to store. This fixes https://bugs.llvm.org/show_bug.cgi?id=43132. Differential Revision: https://reviews.llvm.org/D66978 llvm-svn: 370430	2019-08-30 04:33:22 +00:00
Jinsong Ji	a070f12e57	[PowerPC][NFC] Use inline Subtarget->isPPC64() To be consistent with all the other instances. llvm-svn: 370428	2019-08-30 03:16:41 +00:00
Jinsong Ji	54a1ad5bd7	[PowerPC][NFC] Use -mtriple in RUN line, remove target triple in tls.ll To avoid confusion, especially when -mtriple are also added for PPC32. llvm-svn: 370427	2019-08-30 02:57:33 +00:00
Fangrui Song	7704b54389	[PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs, ppc32 just uses LDgotTprelL32, so it does not make lots of sense to use _LO without a paired _HA. Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO to match GCC, and get better linker relocation check. Note, R_PPC_GOT_TPREL16_{HA,LO} don't have good linker support: (a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}. (b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation: // a.o addis 3, 3, tsd_tls@got@tprel@ha lwz 3, tsd_tls@got@tprel@l(3) add 3, 3, tsd_tls@tls // b.o .section .tdata,"awT"; .globl tsd_tls; tsd_tls: // ld/ld-new a.o b.o internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section Reviewed By: adalava Differential Revision: https://reviews.llvm.org/D66925 llvm-svn: 370426	2019-08-30 02:20:49 +00:00
Craig Topper	160ed4cab4	[X86] Explicitly list all the always trivially rematerializable instructions. Add a default with an llvm_unreachable for anything we don't expect. This seems safer that just blindly returning true for anything missing from the switch. llvm-svn: 370424	2019-08-30 00:54:36 +00:00
Saleem Abdulrasool	be638099a4	DebugInfo: add CodeView register mapping for ARM NT Add the core registers and NEON registers mapping to the CodeView register ID. This is sufficient to compile a basic C program with debug info using CodeView debug info. llvm-svn: 370423	2019-08-30 00:16:02 +00:00
Dan Gohman	da84b688f9	[WebAssembly] Make __attribute__((used)) not imply export. Add an WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't need to imply exporting. When targeting Emscripten, have WASM_SYMBOL_NO_STRIP imply exporting. Differential Revision: https://reviews.llvm.org/D62542 llvm-svn: 370415	2019-08-29 22:40:00 +00:00
Philip Reames	452e5647a5	[Tests] Precommit a few cases where we're missing oppurtunities for block local simplications off assumes. llvm-svn: 370414	2019-08-29 22:08:17 +00:00
Jinsong Ji	1ed7d2119e	[PowerPC] Support extended mnemonics mffprwz etc. Summary: Reported in https://github.com/opencv/opencv/issues/15413. We have serveral extended mnemonics for Move To/From Vector-Scalar Register Instructions eg: mffprd,mtfprd etc. We only support one of them, this patch add the others. Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc Reviewed By: hfinkel Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66963 llvm-svn: 370411	2019-08-29 21:53:59 +00:00
Jessica Paquette	04e657be28	[AArch64][GlobalISel] Select arithmetic extended register patterns This teaches GISel to select patterns which fold an extend plus optional shift into the addressing mode. In particular, adds and subs. Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td and create GISel equivalents. Add some equivalent functions to the ones in AArch64ISelDAGToDAG: - `selectArithExtendedRegister` - `narrowExtendRegIfNeeded` - `getExtendTypeForInst` `getExtendTypeForInst` includes the checks for loads and stores. This will be used for WRO addressing modes in loads + stores. Teach selectCopy to properly handle subregister copies on the same bank in order to support `narrowExtendRegIfNeeded`. The extended register must be a GPR32, so we need to support same-bank subregister copies. Fix a bug in getSubRegForClass which would cause registers on things like GPR32common to end up getting ssub. Just change the check to look for FPR32 rather than GPR32. For tests: - Add select-arith-extended-reg.mir - Update addsub_ext.ll to include GlobalISel checks Differential Revision: https://reviews.llvm.org/D66835 llvm-svn: 370410	2019-08-29 21:53:58 +00:00
Reid Kleckner	5b79e603d3	[X86] Don't emit unreachable stack adjustments Summary: This is a minor improvement on our past attempts to do this. Fixes PR43155. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66905 llvm-svn: 370409	2019-08-29 21:24:41 +00:00
Reid Kleckner	81e458d001	Allow '@' to appear in x86 mingw symbols Summary: There is no reason to differ in assembler behavior here between -msvc and -gnu targets. Without this setting, the text after the '@' is interpreted as a symbol variable, like foo@IMGREL. Reviewers: mstorsjo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66974 llvm-svn: 370408	2019-08-29 21:15:02 +00:00
Sanjay Patel	33541fafde	[InstCombine] add possible bswap as widening shuffle test; NFC Goes with the proposal in D66965. llvm-svn: 370407	2019-08-29 20:57:50 +00:00
Reid Kleckner	fe47ed67fc	Fix the build for MSVC builds using M_PI llvm-svn: 370405	2019-08-29 20:32:53 +00:00
Simon Pilgrim	3d705a1fa4	[X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159) ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element. llvm-svn: 370404	2019-08-29 20:22:08 +00:00
Julian Lettner	af78899457	[ASan] Version mismatch check follow-up Follow-up for: [ASan] Make insertion of version mismatch guard configurable `3ae9b9d5e4` This tiny change makes sure that this test passes on our internal bots as well. llvm-svn: 370403	2019-08-29 20:20:05 +00:00
Matt Arsenault	cbd1782c79	AMDGPU/GlobalISel: Legalize sin/cos llvm-svn: 370402	2019-08-29 20:06:48 +00:00
Sanjay Patel	65f1c04000	[InstCombine] reduce duplicated code; NFC llvm-svn: 370399	2019-08-29 19:36:18 +00:00
Jordan Rupprecht	f9f81289e6	Revert [MBP] Disable aggressive loop rotate in plain mode This reverts r369664 (git commit `51f48295cb`) It causes many benchmark regressions, internally and in llvm's benchmark suite. llvm-svn: 370398	2019-08-29 19:03:58 +00:00
Alina Sbirlea	4b87023bae	Revert enabling MemorySSA. Breaks sanitizers bots. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370397	2019-08-29 19:01:23 +00:00
Craig Topper	5a43fdd313	[X86] Remove what little support we had for MPX -Deprecate -mmpx and -mno-mpx command line options -Remove CPUID detection of mpx for -march=native -Remove MPX from all CPUs -Remove MPX preprocessor define I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything. gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX Differential Revision: https://reviews.llvm.org/D66669 llvm-svn: 370393	2019-08-29 18:09:02 +00:00
Matt Arsenault	093ebf9275	GlobalISel: Don't compute known bits for non-integral GEP llvm-svn: 370392	2019-08-29 17:55:05 +00:00
Florian Hahn	f9cdb98f40	[LoopUnrollAndJam] Use Lazy strategy for DTU. We can also apply the earlier updates to the lazy DTU, instead of applying them directly. Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer Reviewed By: brzycki, asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D66918 llvm-svn: 370391	2019-08-29 17:47:58 +00:00
Matt Arsenault	b2b9a23758	GlobalISel: Add maskedValueIsZero and signBitIsZero to known bits I dropped the DemandedElts since it seems to be missing from some of the new interfaces, but not others. llvm-svn: 370389	2019-08-29 17:24:36 +00:00
Matt Arsenault	caff0a88dd	GlobalISel: Add known bits to InstructionSelector AMDGPU uses this for some addressing mode selection patterns. The analysis run itself doesn't do anything so it seems easier to just always require this than adding a way to opt in. llvm-svn: 370388	2019-08-29 17:24:32 +00:00
Alina Sbirlea	6289ee941d	[MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests. Summary: I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance. The feedback obtains here will guide the next steps towards enabling this. This patch enables the use of MemorySSA in the loop pass manager. Passes that currently use MemorySSA: - EarlyCSE Passes that use MemorySSA after this patch: - EarlyCSE - LICM - SimpleLoopUnswitch Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch): - LoopInstSimplify - LoopSimplifyCFG - LoopUnswitch - LoopRotate - LoopSimplify - LCSSA Loop passes that do not update MemorySSA: - IndVarSimplify - LoopDelete - LoopIdiom - LoopSink - LoopUnroll - LoopInterchange - LoopUnrollAndJam - LoopVectorize - LoopReroll - IRCE Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370384	2019-08-29 17:08:13 +00:00
Jessica Paquette	ba04f5fac1	[GlobalISel][AArch64] Select llvm.aarch64.stxr* intrinsics. Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td. This allows us to select these intrinsics. Differential Revision: https://reviews.llvm.org/D65779 llvm-svn: 370382	2019-08-29 16:55:55 +00:00
Sanjay Patel	63411910a2	[InstCombine] add tests for bswap disguised as shuffle; NFC Somewhat motivating case In PR43146: https://bugs.llvm.org/show_bug.cgi?id=43146 But that's a lot more complicated. llvm-svn: 370381	2019-08-29 16:48:00 +00:00
Jessica Paquette	b8b23a1648	[GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.* Remove manual selection code for this intrinsic and use a GISelPredicateCode instead. This allows us to fully select this intrinsic without any tricky custom C++ matching. Differential Revision: https://reviews.llvm.org/D65780 llvm-svn: 370380	2019-08-29 16:45:19 +00:00
Jessica Paquette	87720ac8c8	[AArch64][GlobalISel] Select @llvm.aarch64.ldxr.* intrinsics Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the ldxr_* definitions, which allows us to import them. Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66898 llvm-svn: 370378	2019-08-29 16:33:01 +00:00
Jessica Paquette	c327daeea5	[AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics Add a GISelPredicateCode to ldaxr_. This allows us to import the patterns for @llvm.aarch64.ldaxr., and thus select them. Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these intrinsics involves the same check. Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66897 llvm-svn: 370377	2019-08-29 16:16:38 +00:00
Michael Liao	001871dee8	[SimplifyCFG] Skip sinking common lifetime markers of `alloca`. Summary: - Similar to the workaround in fix of PR30188, skip sinking common lifetime markers of `alloca`. They are mostly left there after inlining functions in branches. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66950 llvm-svn: 370376	2019-08-29 16:12:05 +00:00
Jinsong Ji	8b0317ad7d	[PowerPC][NFC] Update fp-int-conversions-direct-moves.ll using script Also add -ppc-asm-full-reg-names,-ppc-vsr-nums-as-vr. llvm-svn: 370375	2019-08-29 15:38:02 +00:00
Roman Lebedev	05ef49515e	[NFC][SimplifyCFG] 'Safely extract low bits' pattern will also benefit from -phi-node-folding-threshold=3 This is the naive implementation of x86 BZHI/BEXTR instruction: it takes input and bit count, and extracts low nbits up to bit width. I.e. unlike shift it does not have any UB when nbits >= bitwidth. Which means we don't need a while PHI here, simple select will do. And if it's a select, it should then be trivial to fix codegen to select it to BEXTR/BZHI. See https://bugs.llvm.org/show_bug.cgi?id=34704 llvm-svn: 370369	2019-08-29 14:46:49 +00:00
Simon Pilgrim	ea67741899	[DAGCombine] Fix shadow variable warnings. NFCI. llvm-svn: 370365	2019-08-29 14:34:07 +00:00
Pavel Labath	bd546e5902	DWARFDebugLoc: Make parsing and error reporting more robust Summary: While examining this class for possible use in lldb, I noticed two things: - it spits out parsing errors directly to stderr - the loclists parser can incorrectly return valid location lists when parsing malformed (truncated) data I improve the stderr situation by making the parseOneLocationList functions return Expected<T>s. The errors are still dumped to stderr by their callers, so this is only a partial fix, but it is enough for my use case, as I intend to parse the locations lists one by one. I fix the behavior in the truncated scenario by using the newly introduced DataExtractor Cursor API. I also add tests for handling the error cases, as they currently have no coverage. Reviewers: dblaikie, JDevlieghere, probinson Subscribers: lldb-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63591 llvm-svn: 370363	2019-08-29 14:26:05 +00:00
Luis Marques	cf3b39391e	[RISCV] Fix callee-saved-gprs.ll test ABIs llvm-svn: 370359	2019-08-29 14:05:59 +00:00
Joerg Sonnenberger	799c96693f	Allow replaceAndRecursivelySimplify to list unsimplified visitees. This is part of D65280 and split it to avoid ABI changes on the 9.0 release branch. llvm-svn: 370355	2019-08-29 13:22:30 +00:00
Simon Atanasyan	b23857c149	[mips] Inline emitStoreWithSymOffset and emitLoadWithSymOffset methods. NFC Both methods `MipsTargetStreamer::emitStoreWithSymOffset` and `MipsTargetStreamer::emitLoadWithSymOffset` are almost the same and differ argument names only. These methods are used in the single place so it's better to inline their code and remove original methods. llvm-svn: 370354	2019-08-29 13:19:50 +00:00
Simon Atanasyan	3464b91ef7	[mips] Fix expanding `lw/sw $reg1, symbol($reg2)` instruction When a "base" in the `lw/sw $reg1, symbol($reg2)` instruction is a register and generated code is position independent, backend does not add the "base" value to the symbol address. ``` lw $reg1, %got(symbol)($gp) lw/sw $reg1, 0($reg1) ``` This patch fixes the bug and adds the missed `addu` instruction by passing `BaseReg` into the `loadAndAddSymbolAddress` routine and handles the case when the `BaseReg` is the zero register to escape redundant `move reg, reg` instruction: ``` lw $reg1, %got(symbol)($gp) addu $reg1, $reg1, $reg2 lw/sw $reg1, 0($reg1) ``` Differential Revision: https://reviews.llvm.org/D66894 llvm-svn: 370353	2019-08-29 13:19:38 +00:00
Roman Lebedev	c584786854	[InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` inverted overflow bit Summary: Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`, and with D65147 we've flattened the CFG, we now can see that the guard may have been there to prevent division by zero is redundant. We can simply drop it: ``` ---------------------------------------- Name: no overflow or zero %iszero = icmp eq i4 %y, 0 %umul = smul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %umul.ov.not = xor %umul.ov, -1 %retval.0 = or i1 %iszero, %umul.ov.not ret i1 %retval.0 => %iszero = icmp eq i4 %y, 0 %umul = smul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %umul.ov.not = xor %umul.ov, -1 %retval.0 = or i1 %iszero, %umul.ov.not ret i1 %umul.ov.not Done: 1 Optimization is correct! ``` Note that this is inverted from what we have in a previous patch, here we are looking for the inverted overflow bit. And that inversion is kinda problematic - given this particular pattern we neither hoist that `not` closer to `ret` (then the pattern would have been identical to the one without inversion, and would have been handled by the previous patch), neither do the opposite transform. But regardless, we should handle this too. I've filled [[ https://bugs.llvm.org/show_bug.cgi?id=42720 \| PR42720 ]]. Reviewers: nikic, spatel, xbolva00, RKSimon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65151 llvm-svn: 370351	2019-08-29 12:48:04 +00:00
Roman Lebedev	aaf6ab4410	[InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` overflow bit Summary: Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`, and with D65147 we've flattened the CFG, we now can see that the guard may have been there to prevent division by zero is redundant. We can simply drop it: ``` ---------------------------------------- Name: no overflow and not zero %iszero = icmp ne i4 %y, 0 %umul = umul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %retval.0 = and i1 %iszero, %umul.ov ret i1 %retval.0 => %iszero = icmp ne i4 %y, 0 %umul = umul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %retval.0 = and i1 %iszero, %umul.ov ret %umul.ov Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65150 llvm-svn: 370350	2019-08-29 12:47:50 +00:00
Roman Lebedev	9f35d2b564	[SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values Summary: As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow' and got rid of potential for division-by-zero, the control flow remains, we still have that branch. We have this condition: ``` // Don't fold i1 branches on PHIs which contain binary operators // These can often be turned into switches and other things. if (PN->getType()->isIntegerTy(1) && (isa<BinaryOperator>(PN->getIncomingValue(0)) \|\| isa<BinaryOperator>(PN->getIncomingValue(1)) \|\| isa<BinaryOperator>(IfCond))) return false; ``` which was added back in rL121764 to help with `select` formation i think? That check prevents us to flatten the CFG here, even though we know we no longer need that guard and will be able to drop everything but the '@llvm.umul.with.overflow' + `not`. As it can be seen from tests, we end here because the `not` is being sinked into the PHI's incoming values by InstCombine, so we can't workaround this by hoisting it to after PHI. Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`. Reviewers: craig.topper, spatel, fhahn, nikic Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65147 llvm-svn: 370349	2019-08-29 12:47:34 +00:00
Roman Lebedev	473a063a5e	[InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` $ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll Processing /home/lebedevri/llvm-patch1.ll.. ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp ne i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %r = extractvalue {i4, i1} %n0, 1 ret %r Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp eq i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 ret i1 %r Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65144 llvm-svn: 370348	2019-08-29 12:47:20 +00:00
Roman Lebedev	fb38b7aab3	[InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `(-1 u/ %x) u< %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` ---------------------------------------- Name: no overflow %o0 = udiv i4 -1, %x %r = icmp ult i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow, swapped %o0 = udiv i4 -1, %x %r = icmp ugt i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp uge i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp ule i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ``` As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow` is easy, if we were looking for the inverted answer, then more work needs to be done to cleanup the now-pointless control-flow that was guarding against division-by-zero. This is being addressed in follow-up patches. Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic, xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65143 llvm-svn: 370347	2019-08-29 12:47:08 +00:00
Roman Lebedev	cc95a45f8a	[CostModel] Model all `extractvalue`s as free. Summary: As disscussed in https://reviews.llvm.org/D65148#1606412, `extractvalue` don't actually generate any code, so we should treat them as free. Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen Reviewed By: jmolloy Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66098 llvm-svn: 370339	2019-08-29 11:50:30 +00:00
Jeremy Morse	ca0e4b3689	[DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations The missing line added by this patch ensures that only spilt variable locations are candidates for being restored from the stack. Otherwise, register or constant-value information can be interpreted as a spill location, through a union. The added regression test replicates a scenario where this occurs: the stack load from [rsp] causes the register-location DBG_VALUE to be "restored" to rsi, when it should be left alone. See PR43058 for details. Un x-fail a test that was suffering from this from a previous patch. Differential Revision: https://reviews.llvm.org/D66895 llvm-svn: 370334	2019-08-29 11:20:54 +00:00
Simon Pilgrim	6c2fc64edc	Fix signed/unsigned comparison warning. NFCI. llvm-svn: 370333	2019-08-29 11:18:53 +00:00
Simon Pilgrim	27f43e6b1a	Fix shadow variable warning. NFCI. llvm-svn: 370332	2019-08-29 11:16:32 +00:00
George Rimar	de0bc44883	[yaml2obj] - Allow placing local symbols after globals. This allows us to produce broken binaries with local symbols placed after global in '.dynsym'/'.symtab' Also, simplifies the code. Differential revision: https://reviews.llvm.org/D66799 llvm-svn: 370331	2019-08-29 10:58:47 +00:00
George Rimar	72e9584698	[llvm-readobj/llvm-readelf] - Report a proper warning when dumping a broken dynamic relocation. When we have a dynamic relocation with a broken symbol's st_name, tools report a useless error: "Invalid data was encountered while parsing the file". After this change we report a warning + "<corrupt>" as a symbol name. Differential revision: https://reviews.llvm.org/D66734 llvm-svn: 370330	2019-08-29 10:55:57 +00:00
David Green	942c2e3795	[ARM] MVE Masked loads and stores Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. We also need to do something with unaligned loads/stores. Currently this uses a similar method used in big endian, using an VLDRB.8 (and potentially a VREV in BE). This does mean that the predicate mask is converted from, for example, a v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of the relevant mask lane, so this could potentially load different results if the predicate is little odd. As the input is a v4i1 however, I believe this is OK and all the bits required should be set in the predicate, making the VLDRB.8 load the same data. Differential Revision: https://reviews.llvm.org/D66534 llvm-svn: 370329	2019-08-29 10:54:35 +00:00
Jeremy Morse	313d2ce999	[DebugInfo] LiveDebugValues should always revisit backedges if it skips them The "join" method in LiveDebugValues does not attempt to join unseen predecessor blocks if their out-locations aren't yet initialized, instead the block should be re-visited later to see if any locations have changed validity. However, because the set of blocks were all being "process"'d once before "join" saw them, that logic in "join" was actually ignoring legitimate out-locations on the first pass through. This meant that some invalidated locations were not removed from the head of loops, allowing illegal locations to persist. Fix this by removing the run of "process" before the main join/process loop in ExtendRanges. Now the unseen predecessors that "join" skips truly are uninitialized, and we come back to the block at a later time to re-run "join", see the @baz function added. This also fixes another fault where stack/register transfers in the entry block (or any other before-any-loop-block) had their tranfers initially ignored, and were then never revisited. The MIR test added tests for this behaviour. XFail a test that exposes another bug; a fix for this is coming in D66895. Differential Revision: https://reviews.llvm.org/D66663 llvm-svn: 370328	2019-08-29 10:53:29 +00:00
Roman Lebedev	cc7495a355	[X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again... I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC. Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62327 llvm-svn: 370327	2019-08-29 10:50:09 +00:00
Amaury Sechet	8365e42010	[DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y) Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66718 llvm-svn: 370326	2019-08-29 10:35:51 +00:00
David Green	e9211b764c	[ARM] Masked load and store and predicate tests. NFC llvm-svn: 370325	2019-08-29 10:32:12 +00:00
Roman Lebedev	f13b0e3ed8	[InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399) Summary: Finally, the fold i was looking forward to :) The legality check is muddy, i doubt i've groked the full generalization, but it handles all the cases i care about, and can come up with: https://rise4fun.com/Alive/26j I.e. we can perform the fold if any of the following is true: * The shift amount is either zero or one less than widest bitwidth * Either of the values being shifted has at most lowest bit set * The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount; * The value that is being shifted by `lshr` (which is truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one I strongly suspect there is some better generalization, but i'm not aware of it as of right now. For now i also avoided using actual `computeKnownBits()`, but restricted it to constants. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66383 llvm-svn: 370324	2019-08-29 10:26:23 +00:00
Simon Pilgrim	dfb2a19ac2	LegalizeSetCCCondCode - Reduce scope of NeedSwap to fix cppcheck warning. NFCI. No need for this to be defined outside the only switch case its used in. llvm-svn: 370320	2019-08-29 10:11:34 +00:00
Simon Pilgrim	920b04011b	Fix variable set but no used warnings on NDEBUG builds. NFCI. llvm-svn: 370319	2019-08-29 10:08:45 +00:00
Simon Pilgrim	ef9c6a7077	Fix variable set but no used warning on NDEBUG builds. NFCI. llvm-svn: 370317	2019-08-29 09:58:47 +00:00
Martin Storsjo	7ba81d95d5	[COFF] Add a ResourceSectionRef method for getting the data entry, print it in llvm-readobj Differential Revision: https://reviews.llvm.org/D66819 llvm-svn: 370311	2019-08-29 09:00:14 +00:00
Martin Storsjo	edb6ab9ba6	[COFF] Add a bounds checking helper for iterating a coff_resource_dir_table Instead of blindly incrementing pointers in llvm-readobj, use this helper, which does bounds checking against the available section data. Differential Revision: https://reviews.llvm.org/D66818 llvm-svn: 370310	2019-08-29 08:59:56 +00:00
Martin Storsjo	357a40ec7c	[COFF] Fix error handling in ResourceSectionRef Previously, the expression (Reader.readFoo()) was expanded twice, triggering asserts as one of the Error types ends up not checked (and as it was expanded twice, the method would end up called twice if it failed first). Differential Revision: https://reviews.llvm.org/D66817 llvm-svn: 370309	2019-08-29 08:59:41 +00:00
Martin Storsjo	e3e8874b89	[llvm-readobj] Print the resource type textually for .res files This already is done when dumping resources from coff objects. Differential Revision: https://reviews.llvm.org/D66816 llvm-svn: 370308	2019-08-29 08:59:31 +00:00
Martin Storsjo	cdb9aa6339	[llvm-readobj] Remove a leftover string trim operation. NFC. This became unnecessary in SVN r359153. Differential Revision: https://reviews.llvm.org/D66815 llvm-svn: 370307	2019-08-29 08:59:05 +00:00
Craig Topper	c96284002e	[X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load. The DAG should have these as X86VBroadcast+load. llvm-svn: 370299	2019-08-29 06:36:16 +00:00
Craig Topper	231e628d69	[X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128 bit input types. We should always be shrinking the input to 128 bits or smaller when the node is created. llvm-svn: 370296	2019-08-29 06:02:11 +00:00
Hideto Ueno	cbab334e40	[Attributor] Deduce "noalias" attribute Summary: This patch adds very basic deduction for noalias. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Tags: LLVM Differential Revision: https://reviews.llvm.org/D66207 llvm-svn: 370295	2019-08-29 05:52:00 +00:00
Craig Topper	1ec5c204b8	[X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns. We had an isel pattern to perform this, but its better to do it in DAG combine as a simplification. This also fixes the lack of patterns for AVX512 targets. llvm-svn: 370294	2019-08-29 05:48:48 +00:00
Craig Topper	1aadf6f39f	[X86] Make inline assembly 'x' and 'v' constraints work for f128. Including a type legalizer fix to make bitcast operand promotion work correctly when getSoftenedFloat returns f128 instead of i128. Fixes PR43157 llvm-svn: 370293	2019-08-29 05:13:56 +00:00
Florian Hahn	3177b92231	[LoopUnroll] Use Lazy strategy for DTU used for MergeBlockIntoPredecessor. We do not access the DT in the loop, so we do not have to apply updates eagerly. We can apply them lazyly and flush them after we are done merging blocks. As follow-up work, we might be able to use the DTU above as well, instead of manually updating the DT. This brings the example from PR43134 from ~100s to ~4s for a relase + assertions build on my machine. Reviewers: efriedma, kuhar, asbirlea, brzycki Reviewed By: kuhar, brzycki Differential Revision: https://reviews.llvm.org/D66911 llvm-svn: 370292	2019-08-29 04:26:29 +00:00
Vitaly Buka	db751c3778	[ObjectYAML] Fix lifetime issue in dumpDebugLines Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66901 llvm-svn: 370289	2019-08-29 02:36:48 +00:00
Johannes Doerfert	bf112139ac	[Attributor] Improve messages in iteration verify mode When we now verify the iteration count we will see the actual count and the expected count before the assertion is triggered. llvm-svn: 370285	2019-08-29 01:29:44 +00:00
Johannes Doerfert	a283125ef2	[Attributor][NFC] Add const to map key llvm-svn: 370284	2019-08-29 01:28:30 +00:00
Johannes Doerfert	62a9c1da78	[Attributor][Fix] Indicate change correctly llvm-svn: 370283	2019-08-29 01:26:58 +00:00
Johannes Doerfert	1aac182f31	[Attributor] Fix typo llvm-svn: 370282	2019-08-29 01:26:09 +00:00
Matt Arsenault	216d8ff60b	AMDGPU: Don't use frame virtual registers SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be increment and restored. This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway). llvm-svn: 370281	2019-08-29 01:13:47 +00:00
Matt Arsenault	8ec5c10042	GlobalISel/TableGen: Handle setcc patterns This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually selected for now since it has the SALU and VALU complication to deal with. llvm-svn: 370280	2019-08-29 01:13:41 +00:00
Richard Trieu	e4a7f0182d	Add requirement to test. -debug-only option for llc is only available in debug builds so "REQUIRES: asserts" is needed in the tes. llvm-svn: 370279	2019-08-29 00:46:57 +00:00
Craig Topper	af364131af	[X86] Fix a couple isel patterns to not shrink a volatile load. Also add a FIXME because I'm not sure why these patterns exist. Looks like a missing combine. And another FIXME because the AVX512 equivalent one of the patterns is missing. llvm-svn: 370276	2019-08-28 23:45:10 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Heejin Ahn	d85fd5a3f4	[WebAssembly] Add atomic.fence instruction Summary: This adds `atomic.fence` instruction: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#fence-operator And we now emit the new `atomic.fence` instruction for multithread fences, rather than the prevous `atomic.rmw` hack. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, tlively, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66794 llvm-svn: 370272	2019-08-28 23:13:43 +00:00
Tom Stellard	5be949e3d0	[LLVM-C] Fix omission of INSTALL_WITH_TOOLCHAIN to llvm_add_library() Due to a misstake with r365902 that tried to simplify the install with toolchain logic LLVM-C.dll was no longer being installed. Patch By: Jakob Bornecrantz llvm-svn: 370271	2019-08-28 22:59:04 +00:00
Simon Atanasyan	027f1da010	[mips] Add an empty line to separate different patterns. NFC llvm-svn: 370269	2019-08-28 22:32:16 +00:00
Simon Atanasyan	59bb3609fa	[mips] Fix 64-bit address loading in case of applying 32-bit mask to the result If result of 64-bit address loading combines with 32-bit mask, LLVM tries to optimize the code and remove "redundant" loading of upper 32-bits of the address. It leads to incorrect code on MIPS64 targets. MIPS backend creates the following chain of commands to load 64-bit address in the `MipsTargetLowering::getAddrNonPICSym64` method: ``` (add (shl (add (shl (add %highest(sym), %higher(sym)), 16), %hi(sym)), 16), %lo(%sym)) ``` If the mask presents, LLVM decides to optimize the chain of commands. It really does not make sense to load upper 32-bits because the 0x0fffffff mask anyway clears them. After removing redundant commands we get this chain: ``` (add (shl (%hi(sym), 16), %lo(%sym)) ``` There is no patterns matched `(MipsHi (i64 symbol))`. Due a bug in `SYM_32` predicate definition, backend incorrectly selects a pattern for a 32-bit symbols and uses the `lui` instruction for loading `%hi(sym)`. As a result we get incorrect set of instructions with unnecessary 16-bit left shifting: ``` lui at,0x0 R_MIPS_HI16 foo dsll at,at,0x10 daddiu at,at,0 R_MIPS_LO16 foo ``` This patch resolves two problems: - Fix `SYM_32/SYM_64` predicates to prevent selection of patterns dedicated to 32-bit symbols in case of using N64 ABI. - Add missed patterns for 64-bit symbols for `%hi/%lo`. Fix PR42736. Differential Revision: https://reviews.llvm.org/D66228 llvm-svn: 370268	2019-08-28 22:32:10 +00:00
Jessica Paquette	01cd91aaea	Add tie-breaker for register class sorting in getSuperRegForSubReg llvm::stable_sort is apparently not sufficient. Use the same tie-breaker/sorting style as TopoOrderRC fix bot failures. E.g. http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/19401/steps/test-check-all/logs/stdio llvm-svn: 370267	2019-08-28 22:03:05 +00:00
Artur Pilipenko	925afc1ce7	Fix for "DICompileUnit not listed in llvm.dbg.cu" verification error after ... ...cloning a function from a different module Currently when a function with debug info is cloned from a different module, the cloned function may have hanging DICompileUnits, so that the module with the cloned function fails debug info verification. The proposed fix inserts all DICompileUnits reachable from the cloned function to "llvm.dbg.cu" metadata operands of the cloned function module. Reviewed By: aprantl, efriedma Differential Revision: https://reviews.llvm.org/D66510 Patch by Oleg Pliss (Oleg.Pliss@azul.com) llvm-svn: 370265	2019-08-28 21:27:50 +00:00
Jason Liu	a1178b862a	[llvm-readobj][XCOFF][NFC] Add return statement to avoid -Wimplicit-fallthrough warning This is to fix the commit in r370097. llvm-svn: 370260	2019-08-28 20:59:17 +00:00
Julian Lettner	3ae9b9d5e4	[ASan] Make insertion of version mismatch guard configurable By default ASan calls a versioned function `__asan_version_mismatch_check_vXXX` from the ASan module constructor to check that the compiler ABI version and runtime ABI version are compatible. This ensures that we get a predictable linker error instead of hard-to-debug runtime errors. Sometimes, however, we want to skip this safety guard. This new command line option allows us to do just that. rdar://47891956 Reviewed By: kubamracek Differential Revision: https://reviews.llvm.org/D66826 llvm-svn: 370258	2019-08-28 20:40:55 +00:00
James Y Knight	f025968bcc	Ignore object files that lack coverage information. Before this change, if multiple binary files were presented, all of them must have been instrumented or the load would fail with coverage_map_error::no_data_found. Patch by Dean Sturtevant. Differential Revision: https://reviews.llvm.org/D66763 llvm-svn: 370257	2019-08-28 20:35:50 +00:00
Philip Reames	0b62951e1d	Use the handle --check-prefixes mechanism to de-verbosify a couple atomics tests [NFC] llvm-svn: 370256	2019-08-28 20:27:39 +00:00
Jessica Paquette	7080ffa21a	[GlobalISel] Import patterns containing SUBREG_TO_REG Reuse the logic for INSERT_SUBREG to also import SUBREG_TO_REG patterns. - Split `inferSuperRegisterClass` into two functions, one which tries to use an existing TreePatternNode (`inferSuperRegisterClassForNode`), and one that doesn't. SUBREG_TO_REG doesn't have a node to leverage, which is the cause for the split. - Rename GlobalISelEmitterInsertSubreg.td to GlobalISelEmitterSubreg.td and update it. - Update impacted tests in the AArch64 and X86 backends. This is kind of a hit/miss for code size improvements/regressions. E.g. in add-ext.ll, we now get some identity copies. This isn't really anything the importer can handle, since it's caused by a later pass introducing the copy for the sake of correctness. Differential Revision: https://reviews.llvm.org/D66769 llvm-svn: 370254	2019-08-28 20:12:31 +00:00
Nico Weber	6acfc7c587	gn build: Merge r370249 llvm-svn: 370251	2019-08-28 19:38:59 +00:00
Scott Linder	04f6f25421	[AMDGPU] Fix bug when calculating user_spgr_count for Code Object V3 assembler Stop counting explicitly disabled user_spgr's in the user_sgpr_count field of the kernel descriptor. Differential Revision: https://reviews.llvm.org/D66900 llvm-svn: 370250	2019-08-28 19:38:15 +00:00
Sanjay Patel	2d4b6777c4	[InstCombine] clean up wrap propagation for reassociated ops; NFCI Always true/false checks were flagged by static analysis; https://bugs.llvm.org/show_bug.cgi?id=43143 I have not confirmed the logic difference in propagating nsw vs. nuw, but presumably we would have noticed a bug by now if that was wrong. llvm-svn: 370248	2019-08-28 18:58:06 +00:00
Pirama Arumuga Nainar	19205abaaa	[ValueMapper] NFC: Remove dead code to pause metadata mapping Summary: This functionality was added when Mapper::mapMetadata was recursive. It is no longer needed after r265456, which switched it to be iterative. Reviewers: dexonsmith, srhines Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66860 llvm-svn: 370236	2019-08-28 17:43:14 +00:00
Craig Topper	a47db7110d	[X86][ReleaseNotes] Add a note about the switch to widening legalization for narrow vectors. llvm-svn: 370233	2019-08-28 17:18:56 +00:00
Johannes Doerfert	f7ca0fe1c8	[Attributor] Regularly clear dependences to remove spurious ones As dependences between abstract attributes can become stale, e.g., if one was sufficient to imply another one at some point but it has since been wakened to the point it is not usable for the formerly implied one. To weed out spurious dependences, and thereby eliminate unneeded updates, we introduce an option to determine how often the dependence cache is cleared and recomputed during the fixpoint iteration. Note that the initial value was determined such that we see a positive result on our tests. Differential Revision: https://reviews.llvm.org/D63315 llvm-svn: 370230	2019-08-28 16:58:52 +00:00
Kevin P. Neal	ddf13c00ed	[FPEnv] Add fptosi and fptoui constrained intrinsics. This implements constrained floating point intrinsics for FP to signed and unsigned integers. Quoting from D32319: The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed. Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton Approved by: Craig Topper Differential Revision: http://reviews.llvm.org/D63782 llvm-svn: 370228	2019-08-28 16:33:36 +00:00
Jessica Paquette	af0bd41e06	[AArch64][GlobalISel] Fall back when translating musttail calls These are currently translated as normal functions calls in AArch64. Until we have proper tail call lowering, we shouldn't translate these. Differential Revision: https://reviews.llvm.org/D66842 llvm-svn: 370225	2019-08-28 16:19:01 +00:00
Simon Pilgrim	1d8a886c59	Reduce scope of variable only used in a local pattern match. NFCI. llvm-svn: 370224	2019-08-28 16:06:33 +00:00
David Bolvansky	420327269e	[NFC] Added more tests for D66651 llvm-svn: 370222	2019-08-28 16:00:15 +00:00
Craig Topper	f79d8a064c	[InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs Due to missing vector support in this function, recursion can generate worse code in some cases. llvm-svn: 370221	2019-08-28 15:40:34 +00:00
Simon Pilgrim	b569624049	Fix uninitialized variable warning in cppcheck. NFCI. InstCombiner::MaxArraySizeForCombine is set outside the constructor so we need to ensure it has a default initialization value. llvm-svn: 370220	2019-08-28 15:19:49 +00:00
David Bolvansky	af118bb6d0	[NFC] Added a comment to avoid possible confusion llvm-svn: 370217	2019-08-28 15:04:48 +00:00
Ryan Taylor	3b1459ed7c	[AMDGPU] Adjust number of SGPRs available in Calling Convention This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved available for the calling convention. Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1 llvm-svn: 370215	2019-08-28 15:00:45 +00:00
Simon Pilgrim	316bfb0f48	Remove duplicate 'BitWidth' variable. NFCI. llvm-svn: 370212	2019-08-28 14:37:44 +00:00
Johannes Doerfert	07a5c129c6	[Attributor] Restrict liveness and return information to functions Summary: Until we have proper call-site information we should not recompute liveness and return information for each call site. This patch directly uses the function versions and introduces TODOs at the usage sites. The required iterations to get to the fixpoint are most of the time reduced by this change and we always avoid work duplication. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66562 llvm-svn: 370208	2019-08-28 14:09:14 +00:00
Simon Pilgrim	284118ce3b	InstCombiner::visitSelectInst - rename Pred to MinMaxPred to stop shadow variable warning. NFCI. We have a lot of Predicate variables, all similarly named.... llvm-svn: 370207	2019-08-28 14:05:38 +00:00
Vlad Tsyrklevich	b8a96f4bf5	Reland "[yaml2obj] - Don't allow setting StOther and Other/Visibility at the same time." This relands this commit, I mistakenly reverted the original change thinking it was the cause of the observed MSan failures but it was not. llvm-svn: 370206	2019-08-28 14:04:09 +00:00
Hans Wennborg	cff90f07cb	[SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) Neither libgcc or compiler-rt are usually used on Windows, so these functions can't be called. Differential revision: https://reviews.llvm.org/D66880 llvm-svn: 370204	2019-08-28 13:55:10 +00:00
Amaury Sechet	3b44c36b29	[X86] Add test for rotate combining when add X, X is used instead of shl X, 1. NFC llvm-svn: 370203	2019-08-28 13:52:32 +00:00
Vlad Tsyrklevich	aba62e9c00	Revert "[yaml2obj] - Don't allow setting StOther and Other/Visibility at the same time." This reverts commit r370032, it was causing check-llvm failures on sanitizer-x86_64-linux-bootstrap-msan llvm-svn: 370198	2019-08-28 13:15:08 +00:00
Simon Pilgrim	14e07d7f4b	[DAGCombine] Fix cppcheck shadow variable warning. NFCI. We already have an outer Ops variable. llvm-svn: 370197	2019-08-28 12:48:41 +00:00
Simon Atanasyan	f46ba4f077	[mips] Use less registers to load address of TargetExternalSymbol There is no pattern matched `add hi, (MipsLo texternalsym)`. As a result, loading an address of 32-bit symbol requires two registers and one more additional instruction: ``` addiu $1, $zero, %lo(foo) lui $2, %hi(foo) addu $25, $2, $1 ``` This patch adds the missed pattern and enables generation more effective set of instructions: ``` lui $1, %hi(foo) addiu $25, $1, %lo(foo) ``` Differential Revision: https://reviews.llvm.org/D66771 llvm-svn: 370196	2019-08-28 12:35:53 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
Simon Pilgrim	c5b38e2869	[DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI. These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors. llvm-svn: 370189	2019-08-28 11:50:36 +00:00
Nico Weber	d2f5854567	gn build: Merge r370187 llvm-svn: 370188	2019-08-28 11:42:20 +00:00
David Green	379f6186dd	[ARM] Move MVEVPTBlockPass to a separate file. NFC This just pulls the MVEVPTBlockPass into a separate file, as opposed to being wrapped up in Thumb2ITBlockPass. Differential revision: https://reviews.llvm.org/D66579 llvm-svn: 370187	2019-08-28 11:37:31 +00:00
David Green	1c5b143c99	[MVE] VMOVX patterns This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some adjustments for MVE. It allows us to move fp16 registers without going into and out of gprs. VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits of another register, zeroing the rest. This can be used for odd MVE register lanes. The top bits are not read by fp16 instructions, so no move is required there if we are dealing with even lanes. Differential revision: https://reviews.llvm.org/D66793 llvm-svn: 370184	2019-08-28 10:13:23 +00:00
Hans Wennborg	0af82068a8	[LLVM-C] Fix ByVal Attribute crashing With the introduction of the typed byval attribute change there was no way that the LLVM-C API could create the correct class Attribute. If a program that uses the C API creates a ByVal attribute and annotates a function with that attribute LLVM will crash when it assembles or write that module containing the function out as bitcode. This change is a minimal fix to at least allow code to work, this is because the byval change is on the 9.0 and I don't want to introduce new LLVM-C API this late in the release cycle. By Jakob Bornecrantz! Differential revision: https://reviews.llvm.org/D66144 llvm-svn: 370176	2019-08-28 09:21:56 +00:00
Ayal Zaks	d15df0ede5	[LV] Fold tail by masking - handle reductions Allow vectorizing loops that have reductions when tail is folded by masking. A select is introduced in VPlan, choosing between the last value carried by the loop-exit/live-out instruction of the reduction, and the penultimate value carried by the reduction phi, according to the "i < n" mask of fold-tail. This select replaces the last value as the live-out value of the loop. Differential Revision: https://reviews.llvm.org/D66720 llvm-svn: 370173	2019-08-28 09:02:23 +00:00
Sam Parker	a761ba0f2d	[ARM][ParallelDSP] Change search for muls rL369567 reverted a couple of recent changes made to ARMParallelDSP because of a miscompilation error: PR43073. The issue stemmed from an underlying bug that was caused by adding muls into a reduction before it was proved that they could be executed in parallel with another mul. Most of the changes here are from the previously reverted commits. The additional changes have been made area: 1) The Search function now doesn't insert any muls into the Reduction object. That now happens once the search has successfully finished. 2) For any muls added into the reduction but that weren't paired, we accumulate their values as an input into the smlad. Differential Revision: https://reviews.llvm.org/D66660 llvm-svn: 370171	2019-08-28 08:51:13 +00:00
David Bolvansky	207c653965	[NFC] Unbreak tests llvm-svn: 370170	2019-08-28 08:42:40 +00:00
David Bolvansky	a0a8dd225d	[NFC] Updated test llvm-svn: 370169	2019-08-28 08:40:45 +00:00
David Bolvansky	05bda8b4e5	Annotate return values of allocation functions with dereferenceable_or_null Summary: Example define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6 ret i8* %call } Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: aaron.ballman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66651 llvm-svn: 370168	2019-08-28 08:28:20 +00:00
Yi Kong	b9d87b9528	[llvm-objdump] Add the missing ARMv8 subarch detection Differential Revision: https://reviews.llvm.org/D66849 llvm-svn: 370163	2019-08-28 06:37:22 +00:00
Fangrui Song	6964027315	[LoopFusion] Fix another -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build llvm-svn: 370156	2019-08-28 03:12:40 +00:00
Matt Arsenault	a8bbcbd006	AMDGPU/GlobalISel: Fix constraining scalar and/or/xor If the result register already had a register class assigned, the sources may not have been properly constrained. llvm-svn: 370150	2019-08-28 02:11:03 +00:00

1 2 3 4 5 ...

184229 Commits