llvm-project

Commit Graph

Author	SHA1	Message	Date
Thomas Lively	d0d9317061	[WebAssembly] Add SIMD QFMA/QFMS Summary: Adds clang builtins and LLVM intrinsics for these experimental instructions. They are not implemented in engines yet, but that is ok because the user must opt into using them by calling the builtins. Reviewers: aheejin, dschuff Reviewed By: aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67020 llvm-svn: 370556	2019-08-31 00:12:29 +00:00
Alina Sbirlea	3d03769ba0	[MemorySSA] Rename all phi entries. When renaming Phis incoming values, there may be multiple edges incoming from the same block (switch). Rename all. llvm-svn: 370548	2019-08-30 23:02:53 +00:00
Wei Mi	5ef5829fb0	[GVN] Verify value equality before doing phi translation for call instruction This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605. Basically, current phi translatation translates an old value number to an new value number for a call instruction based on the literal equality of call expression, without verifying there is no clobber in between. This is incorrect. To get a finegrain check, use MachineDependence analysis to do the job. However, this is still not ideal. Although given a call instruction, `MemoryDependenceResults::getCallDependencyFrom` returns identical call instructions without clobber in between using MemDepResult with its DepType to be `Def`. However, identical is too strict here and we want it to be relaxed a little to consider phi-translation -- callee is the same, param operands can be different. That means changing the semantic of `MemDepResult::Def` and I don't know the potential impact. So currently the patch is still conservative to only handle MemDepResult::NonFuncLocal, which means the current call has no function local clobber. If there is clobber, even if the clobber doesn't stand in between the current call and the call with the new value, we won't do phi-translate. Differential Revision: https://reviews.llvm.org/D67013 llvm-svn: 370547	2019-08-30 23:01:22 +00:00
Reid Kleckner	185ddc08ee	Fix SEH_NoReturn machine verifier error llvm-svn: 370543	2019-08-30 22:40:51 +00:00
Reid Kleckner	657a06c619	[MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives llvm-svn: 370540	2019-08-30 22:25:55 +00:00
Reid Kleckner	a33474d595	[X86] Print register names in .seh_* directives Also improve assembler parser register validation for .seh_ directives. This requires moving X86-specific seh directive handling into the x86 backend, which addresses some assembler FIXMEs. Differential Revision: https://reviews.llvm.org/D66625 llvm-svn: 370533	2019-08-30 21:23:05 +00:00
Reid Kleckner	0bb1630685	[Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn Users have complained llvm.trap produce two ud2 instructions on Win64, one for the trap, and one for unreachable. This change fixes that. TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play: - the unwinder - C++ EH, _CxxFrameHandler3 & co The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function. The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet. I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added. MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet. I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change. Differential Revision: https://reviews.llvm.org/D66980 llvm-svn: 370525	2019-08-30 20:46:39 +00:00
James Molloy	790a779f06	[MachinePipeliner] Separate schedule emission, NFC This is the first stage in refactoring the pipeliner and making it more accessible for backends to override and control. This separates the logic and state required to emit a scheudule from the logic that computes and validates a schedule. This will enable (a) new schedule emitters and (b) new modulo scheduling implementations to coexist. NFC. Differential Revision: https://reviews.llvm.org/D67006 llvm-svn: 370500	2019-08-30 18:49:50 +00:00
Simon Pilgrim	3be7081aa1	[DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI. SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent. llvm-svn: 370498	2019-08-30 18:19:02 +00:00
Simon Pilgrim	2d1e0899e9	[TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node. Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case. Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests llvm-svn: 370497	2019-08-30 17:58:55 +00:00
Matt Arsenault	466ec2d552	GlobalISel: Fix missing pass dependency llvm-svn: 370496	2019-08-30 17:41:58 +00:00
Craig Topper	18e8d02e8c	[X86] Pass v32i16/v64i8 in zmm registers on KNL target. gcc and icc pass these types in zmm registers in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations. Fixes PR42957 Differential Revision: https://reviews.llvm.org/D66708 llvm-svn: 370495	2019-08-30 17:35:08 +00:00
Craig Topper	30ddd2ab6c	[ValueTypes] Add v16f16 and v32f16 to EVT::getEVTString and Tablegen's getEnumName Missed these when I hadded the enum entries llvm-svn: 370494	2019-08-30 17:34:29 +00:00
Evgeniy Stepanov	04647f5e22	MemTag: unchecked load/store optimization. Summary: MTE allows memory access to bypass tag check iff the address argument is [SP, #imm]. This change takes advantage of this to demote uses of tagged addresses to regular FrameIndex operands, reducing register pressure in large functions. MO_TAGGED target flag is used to signal that the FrameIndex operand refers to memory that might be tagged, and needs to be handled with care. Such operand must be lowered to [SP, #imm] directly, without a scratch register. The transformation pass attempts to predict when the offset will be out of range and disable the optimization. AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case this prediction has been wrong, but it is quite inefficient and should be avoided. Reviewers: pcc, vitalybuka, ostannard Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66457 llvm-svn: 370490	2019-08-30 17:23:02 +00:00
Simon Pilgrim	ab8cb1a3c5	[DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI. llvm-svn: 370489	2019-08-30 17:21:20 +00:00
Johannes Doerfert	659a8707d6	[Attributor] Fix: do not pretend to preserve the CFG llvm-svn: 370485	2019-08-30 16:35:10 +00:00
Craig Topper	66f03ba17d	[X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site. I'm looking at unfolding broadcast loads on AVX512 which will require refactoring this code to select broadcast opcodes instead of regular load/stores in some cases. Merging them to avoid further complicating their interfaces. llvm-svn: 370484	2019-08-30 16:05:57 +00:00
Johannes Doerfert	3fac668d83	[Attributor] Use existing function information for the call site Summary: Instead of recomputing information for call sites we now use the function information directly. This is always valid and once we have call site specific information we can improve here. This patch also bootstraps attributes that are created on-demand through an initial update call. Information that is known will then directly be available in the new attribute without causing an iteration delay. The tests show how this improves the iteration count. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66781 llvm-svn: 370480	2019-08-30 15:24:52 +00:00
Johannes Doerfert	81df452d82	[Attributor] Manifest load/store alignment generally Summary: Any pointer could have load/store users not only floating ones so we move the manifest logic for alignment into the AAAlignImpl class. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66922 llvm-svn: 370479	2019-08-30 15:22:28 +00:00
Simon Pilgrim	c2fed1dc8a	[DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI. llvm-svn: 370478	2019-08-30 15:17:37 +00:00
Piotr Sobczak	67b979466a	[InstCombine][AMDGPU] Simplify tbuffer loads Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66926 llvm-svn: 370475	2019-08-30 14:20:04 +00:00
George Rimar	4e71702cd4	[yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther". Currenly we can encode the 'st_other' field of symbol using 3 fields. 'Visibility' is used to encode STV_* values. 'Other' is used to encode everything except the visibility, but it can't handle arbitrary values. 'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpfull enough. 'st_other' field is used to encode symbol visibility and platform-dependent flags and values. Problem to encode it is that it consists of Visibility part (STV_* values) which are enumeration values and the Other part, which is different and inconsistent. For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16. (Like comment in ELFDumper says: "Someones in their infinite wisdom decided to make STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...) And for PPC64 the Other part might actually encode any value. This patch implements custom logic for handling the st_other and removes 'Visibility' and 'StOther' fields. Here is an example of a new YAML style this patch allows: - Name: foo Other: [ 0x4 ] - Name: bar Other: [ STV_PROTECTED, 4 ] - Name: zed Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ] Differential revision: https://reviews.llvm.org/D66886 llvm-svn: 370472	2019-08-30 13:39:22 +00:00
Simon Pilgrim	3367669668	[DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts. llvm-svn: 370471	2019-08-30 13:30:37 +00:00
Simon Pilgrim	8e1989e79a	[DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types. This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total. llvm-svn: 370468	2019-08-30 12:22:06 +00:00
Haojian Wu	ed170c9bf9	Remove an extra ";", NFC. llvm-svn: 370465	2019-08-30 12:09:31 +00:00
Bjorn Pettersson	227145924a	[CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC Summary: Found a couple of places in the code where all the PHI nodes of a MBB is updated, replacing references to one MBB by reference to another MBB instead. This patch simply refactors the code to use a common helper (MachineBasicBlock::replacePhiUsesWith) for such PHI node updates. Reviewers: t.p.northover, arsenm, uabelho Subscribers: wdng, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66750 llvm-svn: 370463	2019-08-30 11:23:10 +00:00
Simon Pilgrim	7cbf823f93	[DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros Return a proper zero vector, just in case some elements are undef. Noticed by inspection after dealing with a similar issue in PR43159. llvm-svn: 370460	2019-08-30 10:42:14 +00:00
Hideto Ueno	6381b143f6	[Attributor] Implement AANoAliasCallSiteArgument initialization Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66927 llvm-svn: 370456	2019-08-30 10:00:32 +00:00
Roman Lebedev	5c9f3cfec7	[LoopIdiomRecognize] BCmp loop idiom recognition Summary: @mclow.lists brought up this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there is `memcmp()`, `bcmp()` functions. In C++, there exists `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exists a certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler). {F8768213} ``` #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> #include "benchmark/benchmark.h" template <class T> bool equal(T* a, T* a_end, T* b) noexcept { for (; a != a_end; ++a, ++b) { if (a != b) return false; } return true; } template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct Identical { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto Tmp = getVectorOfRandomNumbers<T>(count); return std::make_pair(Tmp, std::move(Tmp)); } }; struct InequalHalfway { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto V0 = getVectorOfRandomNumbers<T>(count); auto V1 = V0; V1[V1.size() / size_t(2)]++; // just change the value. return std::make_pair(std::move(V0), std::move(V1)); } }; template <class T, class Gen> void BM_bcmp(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { const bool is_equal = equal(a.data(), a.data() + a.size(), b.data()); benchmark::DoNotOptimize(is_equal); } state.SetComplexityN(Length); state.counters["eltcnt"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["eltcnt/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = []() { for (const benchmark::CPUInfo::CacheInfo& I : benchmark::CPUInfo::Get().caches) { if (I.level == 2) return I.size; } return 0; }(); // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /total bufs/ 2 / /maximal elt size/ sizeof(T) / /safety margin/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway) ->Apply(CustomArguments<uint64_t>); ``` {F8768210} ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx 2019-04-25 21:17:11 Running build-old/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 0.65, 3.90, 4.14 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N BM_bcmp<uint8_t, Identical>_RMS 8 % 8 % <...> BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N BM_bcmp<uint16_t, Identical>_RMS 25 % 25 % <...> BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N BM_bcmp<uint32_t, Identical>_RMS 42 % 42 % <...> BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N BM_bcmp<uint64_t, Identical>_RMS 27 % 27 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 % RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0 2019-04-25 21:19:29 Running build-new/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.01, 2.85, 3.71 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N BM_bcmp<uint8_t, Identical>_RMS 37 % 37 % <...> BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N BM_bcmp<uint16_t, Identical>_RMS 34 % 34 % <...> BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N BM_bcmp<uint32_t, Identical>_RMS 35 % 35 % <...> BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N BM_bcmp<uint64_t, Identical>_RMS 33 % 33 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 % Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench Benchmark Time CPU Time Old Time New CPU Old CPU New --------------------------------------------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590 <...> BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948 <...> BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627 <...> BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855 <...> BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569 <...> BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547 <...> BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394 <...> BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498 ``` What can we tell from the benchmark? * Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think, that instability implies performance problems. * Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant! * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second. * The call obvious is more pricey than the loop, with small element count. As it can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size) when element count is less than 8. So all in all, bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing. Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217} This transform fires 91 times: ``` $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json Tests: 1149 Metric: loop-idiom.NumBCmp Program result-new MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00 MultiSource/Applications/d/make_dparser 3.00 SingleSource/UnitTests/vla 2.00 MultiSource/Applications/Burg/burg 1.00 MultiSourc.../Applications/JM/lencod/lencod 1.00 MultiSource/Applications/lemon/lemon 1.00 MultiSource/Benchmarks/Bullet/bullet 1.00 MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00 MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00 MultiSourc...Prolangs-C/simulator/simulator 1.00 ``` The size changes are: I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look. ``` $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash Tests: 1149 Same hash: 907 (filtered out) Remaining: 242 Metric: size..text Program result-old result-new diff test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6% test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5% test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1% test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1% test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1% test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1% test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0% test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0% test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0% test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0% test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan% test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan% test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan% Geomean difference nan% result-old result-new diff count 1.000000e+01 10.00000 10.000000 mean 3.152090e+05 311695.40000 0.006749 std 3.790398e+05 372091.42232 0.036605 min 7.530000e+02 833.00000 -0.034981 25% 4.243300e+04 42401.00000 -0.000866 50% 1.197370e+05 119689.00000 -0.000392 75% 6.397050e+05 639705.00000 -0.000005 max 1.001697e+06 966657.00000 0.106242 ``` I don't have timings though. And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there is a few TODO's that i have left for follow-ups: * widening of `memcmp()`/`bcmp()` * step smaller than the comparison size * Metadata propagation * more than two blocks as long as there is still a single backedge? * ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 370454	2019-08-30 09:51:23 +00:00
David Stenberg	b35d4699d0	[LiveDebugValues] Insert entry values after bundles Summary: Change LiveDebugValues so that it inserts entry values after the bundle which contains the clobbering instruction. Previously it would insert the debug value after the bundle head using insertAfter(), breaking the bundle. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D66888 llvm-svn: 370448	2019-08-30 09:06:50 +00:00
Martin Storsjo	3d3a9b3b41	[LLD] [COFF] Support merging resource object files Extend WindowsResourceParser to support using a ResourceSectionRef for loading resources from an object file. Only allow merging resource object files in mingw mode; keep the existing error on multiple resource objects in link mode. If there only is one resource object file and no .res resources, don't parse and recreate the .rsrc section, but just link it in without inspecting it. This allows users to produce any .rsrc section (outside of what the parser supports), just like before. (I don't have a specific need for this, but it reduces the risk of this new feature.) Separate out the .rsrc section chunks in InputFiles.cpp, and only include them in the list of section chunks to link if we've determined that there only was one single resource object. (We need to keep other chunks from those object files, as they can legitimately contain other sections as well, in addition to .rsrc section chunks.) Differential Revision: https://reviews.llvm.org/D66824 llvm-svn: 370436	2019-08-30 06:56:33 +00:00
Martin Storsjo	d8d63ff24b	[WindowsResource] Remove use of global variables in WindowsResourceParser Instead of updating a global variable counter for the next index of strings and data blobs, pass along a reference to actual data/string vectors and let the TreeNode insertion methods add their data/strings to the vectors when a new entry is needed. Additionally, if the resource tree had duplicates, that were ignored with -force:multipleres in lld, we no longer store all versions of the duplicated resource data, now we only keep the one that actually ends up referenced. Differential Revision: https://reviews.llvm.org/D66823 llvm-svn: 370435	2019-08-30 06:56:02 +00:00
Martin Storsjo	e62d5682fb	[WindowsResource] Avoid duplicating the input filenames for each resource. NFC. Differential Revision: https://reviews.llvm.org/D66821 llvm-svn: 370434	2019-08-30 06:55:54 +00:00
Martin Storsjo	9438221785	[COFF] Add a ResourceSectionRef method for getting resource contents This allows llvm-readobj to print the contents of each resource when printing resources from an object file or executable, like it already does for plain .res files. This requires providing the whole COFFObjectFile to ResourceSectionRef. This supports both object files and executables. For executables, the DataRVA field is used as is to look up the right section. For object files, ideally we would need to complete linking of them and fix up all relocations to know what the DataRVA field would end up being. In practice, the only thing that makes sense for an RVA field is an ADDR32NB relocation. Thus, find a relocation pointing at this field, verify that it has the expected type, locate the symbol it points at, look up the section the symbol points at, and read from the right offset in that section. This works both for GNU windres object files (which use one single .rsrc section, with all relocations against the base of the .rsrc section, with the original value of the DataRVA field being the offset of the data from the beginning of the .rsrc section) and cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections, and one symbol per data entry, with the original pre-relocated DataRVA field being set to zero). Differential Revision: https://reviews.llvm.org/D66820 llvm-svn: 370433	2019-08-30 06:55:49 +00:00
Petar Avramovic	e96892a8aa	[MIPS GlobalISel] Lower uitofp Add custom lowering for G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D66930 llvm-svn: 370432	2019-08-30 05:51:12 +00:00
Petar Avramovic	6412b56513	[MIPS GlobalISel] Lower fptoui Add lower for G_FPTOUI. Algorithm is similar to the SDAG version in TargetLowering::expandFP_TO_UINT. Lower G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D66929 llvm-svn: 370431	2019-08-30 05:44:02 +00:00
Dan Gohman	8cfeeaf9de	[CodeGen] Fix lowering for returning the result of an extractvalue When the number of return values exceeds the number of registers available, SelectionDAGBuilder::visitRet transforms a function's return to use a pointer to a buffer to hold return values. When the returned value is an operator such as extractvalue, the value may have a non-zero result number. Add that number to the indexing when obtaining the values to store. This fixes https://bugs.llvm.org/show_bug.cgi?id=43132. Differential Revision: https://reviews.llvm.org/D66978 llvm-svn: 370430	2019-08-30 04:33:22 +00:00
Jinsong Ji	a070f12e57	[PowerPC][NFC] Use inline Subtarget->isPPC64() To be consistent with all the other instances. llvm-svn: 370428	2019-08-30 03:16:41 +00:00
Fangrui Song	7704b54389	[PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs, ppc32 just uses LDgotTprelL32, so it does not make lots of sense to use _LO without a paired _HA. Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO to match GCC, and get better linker relocation check. Note, R_PPC_GOT_TPREL16_{HA,LO} don't have good linker support: (a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}. (b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation: // a.o addis 3, 3, tsd_tls@got@tprel@ha lwz 3, tsd_tls@got@tprel@l(3) add 3, 3, tsd_tls@tls // b.o .section .tdata,"awT"; .globl tsd_tls; tsd_tls: // ld/ld-new a.o b.o internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section Reviewed By: adalava Differential Revision: https://reviews.llvm.org/D66925 llvm-svn: 370426	2019-08-30 02:20:49 +00:00
Craig Topper	160ed4cab4	[X86] Explicitly list all the always trivially rematerializable instructions. Add a default with an llvm_unreachable for anything we don't expect. This seems safer that just blindly returning true for anything missing from the switch. llvm-svn: 370424	2019-08-30 00:54:36 +00:00
Dan Gohman	da84b688f9	[WebAssembly] Make __attribute__((used)) not imply export. Add an WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't need to imply exporting. When targeting Emscripten, have WASM_SYMBOL_NO_STRIP imply exporting. Differential Revision: https://reviews.llvm.org/D62542 llvm-svn: 370415	2019-08-29 22:40:00 +00:00
Jinsong Ji	1ed7d2119e	[PowerPC] Support extended mnemonics mffprwz etc. Summary: Reported in https://github.com/opencv/opencv/issues/15413. We have serveral extended mnemonics for Move To/From Vector-Scalar Register Instructions eg: mffprd,mtfprd etc. We only support one of them, this patch add the others. Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc Reviewed By: hfinkel Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66963 llvm-svn: 370411	2019-08-29 21:53:59 +00:00
Jessica Paquette	04e657be28	[AArch64][GlobalISel] Select arithmetic extended register patterns This teaches GISel to select patterns which fold an extend plus optional shift into the addressing mode. In particular, adds and subs. Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td and create GISel equivalents. Add some equivalent functions to the ones in AArch64ISelDAGToDAG: - `selectArithExtendedRegister` - `narrowExtendRegIfNeeded` - `getExtendTypeForInst` `getExtendTypeForInst` includes the checks for loads and stores. This will be used for WRO addressing modes in loads + stores. Teach selectCopy to properly handle subregister copies on the same bank in order to support `narrowExtendRegIfNeeded`. The extended register must be a GPR32, so we need to support same-bank subregister copies. Fix a bug in getSubRegForClass which would cause registers on things like GPR32common to end up getting ssub. Just change the check to look for FPR32 rather than GPR32. For tests: - Add select-arith-extended-reg.mir - Update addsub_ext.ll to include GlobalISel checks Differential Revision: https://reviews.llvm.org/D66835 llvm-svn: 370410	2019-08-29 21:53:58 +00:00
Reid Kleckner	5b79e603d3	[X86] Don't emit unreachable stack adjustments Summary: This is a minor improvement on our past attempts to do this. Fixes PR43155. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66905 llvm-svn: 370409	2019-08-29 21:24:41 +00:00
Reid Kleckner	81e458d001	Allow '@' to appear in x86 mingw symbols Summary: There is no reason to differ in assembler behavior here between -msvc and -gnu targets. Without this setting, the text after the '@' is interpreted as a symbol variable, like foo@IMGREL. Reviewers: mstorsjo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66974 llvm-svn: 370408	2019-08-29 21:15:02 +00:00
Reid Kleckner	fe47ed67fc	Fix the build for MSVC builds using M_PI llvm-svn: 370405	2019-08-29 20:32:53 +00:00
Simon Pilgrim	3d705a1fa4	[X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159) ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element. llvm-svn: 370404	2019-08-29 20:22:08 +00:00
Matt Arsenault	cbd1782c79	AMDGPU/GlobalISel: Legalize sin/cos llvm-svn: 370402	2019-08-29 20:06:48 +00:00
Sanjay Patel	65f1c04000	[InstCombine] reduce duplicated code; NFC llvm-svn: 370399	2019-08-29 19:36:18 +00:00
Jordan Rupprecht	f9f81289e6	Revert [MBP] Disable aggressive loop rotate in plain mode This reverts r369664 (git commit `51f48295cb`) It causes many benchmark regressions, internally and in llvm's benchmark suite. llvm-svn: 370398	2019-08-29 19:03:58 +00:00
Alina Sbirlea	4b87023bae	Revert enabling MemorySSA. Breaks sanitizers bots. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370397	2019-08-29 19:01:23 +00:00
Craig Topper	5a43fdd313	[X86] Remove what little support we had for MPX -Deprecate -mmpx and -mno-mpx command line options -Remove CPUID detection of mpx for -march=native -Remove MPX from all CPUs -Remove MPX preprocessor define I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything. gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX Differential Revision: https://reviews.llvm.org/D66669 llvm-svn: 370393	2019-08-29 18:09:02 +00:00
Matt Arsenault	093ebf9275	GlobalISel: Don't compute known bits for non-integral GEP llvm-svn: 370392	2019-08-29 17:55:05 +00:00
Florian Hahn	f9cdb98f40	[LoopUnrollAndJam] Use Lazy strategy for DTU. We can also apply the earlier updates to the lazy DTU, instead of applying them directly. Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer Reviewed By: brzycki, asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D66918 llvm-svn: 370391	2019-08-29 17:47:58 +00:00
Matt Arsenault	b2b9a23758	GlobalISel: Add maskedValueIsZero and signBitIsZero to known bits I dropped the DemandedElts since it seems to be missing from some of the new interfaces, but not others. llvm-svn: 370389	2019-08-29 17:24:36 +00:00
Matt Arsenault	caff0a88dd	GlobalISel: Add known bits to InstructionSelector AMDGPU uses this for some addressing mode selection patterns. The analysis run itself doesn't do anything so it seems easier to just always require this than adding a way to opt in. llvm-svn: 370388	2019-08-29 17:24:32 +00:00
Alina Sbirlea	6289ee941d	[MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests. Summary: I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance. The feedback obtains here will guide the next steps towards enabling this. This patch enables the use of MemorySSA in the loop pass manager. Passes that currently use MemorySSA: - EarlyCSE Passes that use MemorySSA after this patch: - EarlyCSE - LICM - SimpleLoopUnswitch Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch): - LoopInstSimplify - LoopSimplifyCFG - LoopUnswitch - LoopRotate - LoopSimplify - LCSSA Loop passes that do not update MemorySSA: - IndVarSimplify - LoopDelete - LoopIdiom - LoopSink - LoopUnroll - LoopInterchange - LoopUnrollAndJam - LoopVectorize - LoopReroll - IRCE Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370384	2019-08-29 17:08:13 +00:00
Jessica Paquette	ba04f5fac1	[GlobalISel][AArch64] Select llvm.aarch64.stxr* intrinsics. Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td. This allows us to select these intrinsics. Differential Revision: https://reviews.llvm.org/D65779 llvm-svn: 370382	2019-08-29 16:55:55 +00:00
Jessica Paquette	b8b23a1648	[GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.* Remove manual selection code for this intrinsic and use a GISelPredicateCode instead. This allows us to fully select this intrinsic without any tricky custom C++ matching. Differential Revision: https://reviews.llvm.org/D65780 llvm-svn: 370380	2019-08-29 16:45:19 +00:00
Jessica Paquette	87720ac8c8	[AArch64][GlobalISel] Select @llvm.aarch64.ldxr.* intrinsics Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the ldxr_* definitions, which allows us to import them. Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66898 llvm-svn: 370378	2019-08-29 16:33:01 +00:00
Jessica Paquette	c327daeea5	[AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics Add a GISelPredicateCode to ldaxr_. This allows us to import the patterns for @llvm.aarch64.ldaxr., and thus select them. Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these intrinsics involves the same check. Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66897 llvm-svn: 370377	2019-08-29 16:16:38 +00:00
Michael Liao	001871dee8	[SimplifyCFG] Skip sinking common lifetime markers of `alloca`. Summary: - Similar to the workaround in fix of PR30188, skip sinking common lifetime markers of `alloca`. They are mostly left there after inlining functions in branches. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66950 llvm-svn: 370376	2019-08-29 16:12:05 +00:00
Simon Pilgrim	ea67741899	[DAGCombine] Fix shadow variable warnings. NFCI. llvm-svn: 370365	2019-08-29 14:34:07 +00:00
Pavel Labath	bd546e5902	DWARFDebugLoc: Make parsing and error reporting more robust Summary: While examining this class for possible use in lldb, I noticed two things: - it spits out parsing errors directly to stderr - the loclists parser can incorrectly return valid location lists when parsing malformed (truncated) data I improve the stderr situation by making the parseOneLocationList functions return Expected<T>s. The errors are still dumped to stderr by their callers, so this is only a partial fix, but it is enough for my use case, as I intend to parse the locations lists one by one. I fix the behavior in the truncated scenario by using the newly introduced DataExtractor Cursor API. I also add tests for handling the error cases, as they currently have no coverage. Reviewers: dblaikie, JDevlieghere, probinson Subscribers: lldb-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63591 llvm-svn: 370363	2019-08-29 14:26:05 +00:00
Joerg Sonnenberger	799c96693f	Allow replaceAndRecursivelySimplify to list unsimplified visitees. This is part of D65280 and split it to avoid ABI changes on the 9.0 release branch. llvm-svn: 370355	2019-08-29 13:22:30 +00:00
Simon Atanasyan	b23857c149	[mips] Inline emitStoreWithSymOffset and emitLoadWithSymOffset methods. NFC Both methods `MipsTargetStreamer::emitStoreWithSymOffset` and `MipsTargetStreamer::emitLoadWithSymOffset` are almost the same and differ argument names only. These methods are used in the single place so it's better to inline their code and remove original methods. llvm-svn: 370354	2019-08-29 13:19:50 +00:00
Simon Atanasyan	3464b91ef7	[mips] Fix expanding `lw/sw $reg1, symbol($reg2)` instruction When a "base" in the `lw/sw $reg1, symbol($reg2)` instruction is a register and generated code is position independent, backend does not add the "base" value to the symbol address. ``` lw $reg1, %got(symbol)($gp) lw/sw $reg1, 0($reg1) ``` This patch fixes the bug and adds the missed `addu` instruction by passing `BaseReg` into the `loadAndAddSymbolAddress` routine and handles the case when the `BaseReg` is the zero register to escape redundant `move reg, reg` instruction: ``` lw $reg1, %got(symbol)($gp) addu $reg1, $reg1, $reg2 lw/sw $reg1, 0($reg1) ``` Differential Revision: https://reviews.llvm.org/D66894 llvm-svn: 370353	2019-08-29 13:19:38 +00:00
Roman Lebedev	c584786854	[InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` inverted overflow bit Summary: Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`, and with D65147 we've flattened the CFG, we now can see that the guard may have been there to prevent division by zero is redundant. We can simply drop it: ``` ---------------------------------------- Name: no overflow or zero %iszero = icmp eq i4 %y, 0 %umul = smul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %umul.ov.not = xor %umul.ov, -1 %retval.0 = or i1 %iszero, %umul.ov.not ret i1 %retval.0 => %iszero = icmp eq i4 %y, 0 %umul = smul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %umul.ov.not = xor %umul.ov, -1 %retval.0 = or i1 %iszero, %umul.ov.not ret i1 %umul.ov.not Done: 1 Optimization is correct! ``` Note that this is inverted from what we have in a previous patch, here we are looking for the inverted overflow bit. And that inversion is kinda problematic - given this particular pattern we neither hoist that `not` closer to `ret` (then the pattern would have been identical to the one without inversion, and would have been handled by the previous patch), neither do the opposite transform. But regardless, we should handle this too. I've filled [[ https://bugs.llvm.org/show_bug.cgi?id=42720 \| PR42720 ]]. Reviewers: nikic, spatel, xbolva00, RKSimon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65151 llvm-svn: 370351	2019-08-29 12:48:04 +00:00
Roman Lebedev	aaf6ab4410	[InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` overflow bit Summary: Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`, and with D65147 we've flattened the CFG, we now can see that the guard may have been there to prevent division by zero is redundant. We can simply drop it: ``` ---------------------------------------- Name: no overflow and not zero %iszero = icmp ne i4 %y, 0 %umul = umul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %retval.0 = and i1 %iszero, %umul.ov ret i1 %retval.0 => %iszero = icmp ne i4 %y, 0 %umul = umul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %retval.0 = and i1 %iszero, %umul.ov ret %umul.ov Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65150 llvm-svn: 370350	2019-08-29 12:47:50 +00:00
Roman Lebedev	9f35d2b564	[SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values Summary: As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow' and got rid of potential for division-by-zero, the control flow remains, we still have that branch. We have this condition: ``` // Don't fold i1 branches on PHIs which contain binary operators // These can often be turned into switches and other things. if (PN->getType()->isIntegerTy(1) && (isa<BinaryOperator>(PN->getIncomingValue(0)) \|\| isa<BinaryOperator>(PN->getIncomingValue(1)) \|\| isa<BinaryOperator>(IfCond))) return false; ``` which was added back in rL121764 to help with `select` formation i think? That check prevents us to flatten the CFG here, even though we know we no longer need that guard and will be able to drop everything but the '@llvm.umul.with.overflow' + `not`. As it can be seen from tests, we end here because the `not` is being sinked into the PHI's incoming values by InstCombine, so we can't workaround this by hoisting it to after PHI. Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`. Reviewers: craig.topper, spatel, fhahn, nikic Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65147 llvm-svn: 370349	2019-08-29 12:47:34 +00:00
Roman Lebedev	473a063a5e	[InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` $ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll Processing /home/lebedevri/llvm-patch1.ll.. ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp ne i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %r = extractvalue {i4, i1} %n0, 1 ret %r Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp eq i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 ret i1 %r Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65144 llvm-svn: 370348	2019-08-29 12:47:20 +00:00
Roman Lebedev	fb38b7aab3	[InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `(-1 u/ %x) u< %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` ---------------------------------------- Name: no overflow %o0 = udiv i4 -1, %x %r = icmp ult i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow, swapped %o0 = udiv i4 -1, %x %r = icmp ugt i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp uge i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp ule i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ``` As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow` is easy, if we were looking for the inverted answer, then more work needs to be done to cleanup the now-pointless control-flow that was guarding against division-by-zero. This is being addressed in follow-up patches. Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic, xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65143 llvm-svn: 370347	2019-08-29 12:47:08 +00:00
Roman Lebedev	cc95a45f8a	[CostModel] Model all `extractvalue`s as free. Summary: As disscussed in https://reviews.llvm.org/D65148#1606412, `extractvalue` don't actually generate any code, so we should treat them as free. Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen Reviewed By: jmolloy Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66098 llvm-svn: 370339	2019-08-29 11:50:30 +00:00
Jeremy Morse	ca0e4b3689	[DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations The missing line added by this patch ensures that only spilt variable locations are candidates for being restored from the stack. Otherwise, register or constant-value information can be interpreted as a spill location, through a union. The added regression test replicates a scenario where this occurs: the stack load from [rsp] causes the register-location DBG_VALUE to be "restored" to rsi, when it should be left alone. See PR43058 for details. Un x-fail a test that was suffering from this from a previous patch. Differential Revision: https://reviews.llvm.org/D66895 llvm-svn: 370334	2019-08-29 11:20:54 +00:00
Simon Pilgrim	6c2fc64edc	Fix signed/unsigned comparison warning. NFCI. llvm-svn: 370333	2019-08-29 11:18:53 +00:00
Simon Pilgrim	27f43e6b1a	Fix shadow variable warning. NFCI. llvm-svn: 370332	2019-08-29 11:16:32 +00:00
George Rimar	de0bc44883	[yaml2obj] - Allow placing local symbols after globals. This allows us to produce broken binaries with local symbols placed after global in '.dynsym'/'.symtab' Also, simplifies the code. Differential revision: https://reviews.llvm.org/D66799 llvm-svn: 370331	2019-08-29 10:58:47 +00:00
David Green	942c2e3795	[ARM] MVE Masked loads and stores Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. We also need to do something with unaligned loads/stores. Currently this uses a similar method used in big endian, using an VLDRB.8 (and potentially a VREV in BE). This does mean that the predicate mask is converted from, for example, a v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of the relevant mask lane, so this could potentially load different results if the predicate is little odd. As the input is a v4i1 however, I believe this is OK and all the bits required should be set in the predicate, making the VLDRB.8 load the same data. Differential Revision: https://reviews.llvm.org/D66534 llvm-svn: 370329	2019-08-29 10:54:35 +00:00
Jeremy Morse	313d2ce999	[DebugInfo] LiveDebugValues should always revisit backedges if it skips them The "join" method in LiveDebugValues does not attempt to join unseen predecessor blocks if their out-locations aren't yet initialized, instead the block should be re-visited later to see if any locations have changed validity. However, because the set of blocks were all being "process"'d once before "join" saw them, that logic in "join" was actually ignoring legitimate out-locations on the first pass through. This meant that some invalidated locations were not removed from the head of loops, allowing illegal locations to persist. Fix this by removing the run of "process" before the main join/process loop in ExtendRanges. Now the unseen predecessors that "join" skips truly are uninitialized, and we come back to the block at a later time to re-run "join", see the @baz function added. This also fixes another fault where stack/register transfers in the entry block (or any other before-any-loop-block) had their tranfers initially ignored, and were then never revisited. The MIR test added tests for this behaviour. XFail a test that exposes another bug; a fix for this is coming in D66895. Differential Revision: https://reviews.llvm.org/D66663 llvm-svn: 370328	2019-08-29 10:53:29 +00:00
Roman Lebedev	cc7495a355	[X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again... I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC. Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62327 llvm-svn: 370327	2019-08-29 10:50:09 +00:00
Amaury Sechet	8365e42010	[DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y) Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66718 llvm-svn: 370326	2019-08-29 10:35:51 +00:00
Roman Lebedev	f13b0e3ed8	[InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399) Summary: Finally, the fold i was looking forward to :) The legality check is muddy, i doubt i've groked the full generalization, but it handles all the cases i care about, and can come up with: https://rise4fun.com/Alive/26j I.e. we can perform the fold if any of the following is true: * The shift amount is either zero or one less than widest bitwidth * Either of the values being shifted has at most lowest bit set * The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount; * The value that is being shifted by `lshr` (which is truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one I strongly suspect there is some better generalization, but i'm not aware of it as of right now. For now i also avoided using actual `computeKnownBits()`, but restricted it to constants. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66383 llvm-svn: 370324	2019-08-29 10:26:23 +00:00
Simon Pilgrim	dfb2a19ac2	LegalizeSetCCCondCode - Reduce scope of NeedSwap to fix cppcheck warning. NFCI. No need for this to be defined outside the only switch case its used in. llvm-svn: 370320	2019-08-29 10:11:34 +00:00
Simon Pilgrim	920b04011b	Fix variable set but no used warnings on NDEBUG builds. NFCI. llvm-svn: 370319	2019-08-29 10:08:45 +00:00
Simon Pilgrim	ef9c6a7077	Fix variable set but no used warning on NDEBUG builds. NFCI. llvm-svn: 370317	2019-08-29 09:58:47 +00:00
Martin Storsjo	7ba81d95d5	[COFF] Add a ResourceSectionRef method for getting the data entry, print it in llvm-readobj Differential Revision: https://reviews.llvm.org/D66819 llvm-svn: 370311	2019-08-29 09:00:14 +00:00
Martin Storsjo	edb6ab9ba6	[COFF] Add a bounds checking helper for iterating a coff_resource_dir_table Instead of blindly incrementing pointers in llvm-readobj, use this helper, which does bounds checking against the available section data. Differential Revision: https://reviews.llvm.org/D66818 llvm-svn: 370310	2019-08-29 08:59:56 +00:00
Martin Storsjo	357a40ec7c	[COFF] Fix error handling in ResourceSectionRef Previously, the expression (Reader.readFoo()) was expanded twice, triggering asserts as one of the Error types ends up not checked (and as it was expanded twice, the method would end up called twice if it failed first). Differential Revision: https://reviews.llvm.org/D66817 llvm-svn: 370309	2019-08-29 08:59:41 +00:00
Craig Topper	c96284002e	[X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load. The DAG should have these as X86VBroadcast+load. llvm-svn: 370299	2019-08-29 06:36:16 +00:00
Craig Topper	231e628d69	[X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128 bit input types. We should always be shrinking the input to 128 bits or smaller when the node is created. llvm-svn: 370296	2019-08-29 06:02:11 +00:00
Hideto Ueno	cbab334e40	[Attributor] Deduce "noalias" attribute Summary: This patch adds very basic deduction for noalias. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Tags: LLVM Differential Revision: https://reviews.llvm.org/D66207 llvm-svn: 370295	2019-08-29 05:52:00 +00:00
Craig Topper	1ec5c204b8	[X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns. We had an isel pattern to perform this, but its better to do it in DAG combine as a simplification. This also fixes the lack of patterns for AVX512 targets. llvm-svn: 370294	2019-08-29 05:48:48 +00:00
Craig Topper	1aadf6f39f	[X86] Make inline assembly 'x' and 'v' constraints work for f128. Including a type legalizer fix to make bitcast operand promotion work correctly when getSoftenedFloat returns f128 instead of i128. Fixes PR43157 llvm-svn: 370293	2019-08-29 05:13:56 +00:00
Florian Hahn	3177b92231	[LoopUnroll] Use Lazy strategy for DTU used for MergeBlockIntoPredecessor. We do not access the DT in the loop, so we do not have to apply updates eagerly. We can apply them lazyly and flush them after we are done merging blocks. As follow-up work, we might be able to use the DTU above as well, instead of manually updating the DT. This brings the example from PR43134 from ~100s to ~4s for a relase + assertions build on my machine. Reviewers: efriedma, kuhar, asbirlea, brzycki Reviewed By: kuhar, brzycki Differential Revision: https://reviews.llvm.org/D66911 llvm-svn: 370292	2019-08-29 04:26:29 +00:00
Johannes Doerfert	bf112139ac	[Attributor] Improve messages in iteration verify mode When we now verify the iteration count we will see the actual count and the expected count before the assertion is triggered. llvm-svn: 370285	2019-08-29 01:29:44 +00:00
Johannes Doerfert	62a9c1da78	[Attributor][Fix] Indicate change correctly llvm-svn: 370283	2019-08-29 01:26:58 +00:00
Johannes Doerfert	1aac182f31	[Attributor] Fix typo llvm-svn: 370282	2019-08-29 01:26:09 +00:00
Matt Arsenault	216d8ff60b	AMDGPU: Don't use frame virtual registers SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be increment and restored. This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway). llvm-svn: 370281	2019-08-29 01:13:47 +00:00
Matt Arsenault	8ec5c10042	GlobalISel/TableGen: Handle setcc patterns This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually selected for now since it has the SALU and VALU complication to deal with. llvm-svn: 370280	2019-08-29 01:13:41 +00:00
Craig Topper	af364131af	[X86] Fix a couple isel patterns to not shrink a volatile load. Also add a FIXME because I'm not sure why these patterns exist. Looks like a missing combine. And another FIXME because the AVX512 equivalent one of the patterns is missing. llvm-svn: 370276	2019-08-28 23:45:10 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Heejin Ahn	d85fd5a3f4	[WebAssembly] Add atomic.fence instruction Summary: This adds `atomic.fence` instruction: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#fence-operator And we now emit the new `atomic.fence` instruction for multithread fences, rather than the prevous `atomic.rmw` hack. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, tlively, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66794 llvm-svn: 370272	2019-08-28 23:13:43 +00:00
Simon Atanasyan	027f1da010	[mips] Add an empty line to separate different patterns. NFC llvm-svn: 370269	2019-08-28 22:32:16 +00:00
Simon Atanasyan	59bb3609fa	[mips] Fix 64-bit address loading in case of applying 32-bit mask to the result If result of 64-bit address loading combines with 32-bit mask, LLVM tries to optimize the code and remove "redundant" loading of upper 32-bits of the address. It leads to incorrect code on MIPS64 targets. MIPS backend creates the following chain of commands to load 64-bit address in the `MipsTargetLowering::getAddrNonPICSym64` method: ``` (add (shl (add (shl (add %highest(sym), %higher(sym)), 16), %hi(sym)), 16), %lo(%sym)) ``` If the mask presents, LLVM decides to optimize the chain of commands. It really does not make sense to load upper 32-bits because the 0x0fffffff mask anyway clears them. After removing redundant commands we get this chain: ``` (add (shl (%hi(sym), 16), %lo(%sym)) ``` There is no patterns matched `(MipsHi (i64 symbol))`. Due a bug in `SYM_32` predicate definition, backend incorrectly selects a pattern for a 32-bit symbols and uses the `lui` instruction for loading `%hi(sym)`. As a result we get incorrect set of instructions with unnecessary 16-bit left shifting: ``` lui at,0x0 R_MIPS_HI16 foo dsll at,at,0x10 daddiu at,at,0 R_MIPS_LO16 foo ``` This patch resolves two problems: - Fix `SYM_32/SYM_64` predicates to prevent selection of patterns dedicated to 32-bit symbols in case of using N64 ABI. - Add missed patterns for 64-bit symbols for `%hi/%lo`. Fix PR42736. Differential Revision: https://reviews.llvm.org/D66228 llvm-svn: 370268	2019-08-28 22:32:10 +00:00
Artur Pilipenko	925afc1ce7	Fix for "DICompileUnit not listed in llvm.dbg.cu" verification error after ... ...cloning a function from a different module Currently when a function with debug info is cloned from a different module, the cloned function may have hanging DICompileUnits, so that the module with the cloned function fails debug info verification. The proposed fix inserts all DICompileUnits reachable from the cloned function to "llvm.dbg.cu" metadata operands of the cloned function module. Reviewed By: aprantl, efriedma Differential Revision: https://reviews.llvm.org/D66510 Patch by Oleg Pliss (Oleg.Pliss@azul.com) llvm-svn: 370265	2019-08-28 21:27:50 +00:00
Julian Lettner	3ae9b9d5e4	[ASan] Make insertion of version mismatch guard configurable By default ASan calls a versioned function `__asan_version_mismatch_check_vXXX` from the ASan module constructor to check that the compiler ABI version and runtime ABI version are compatible. This ensures that we get a predictable linker error instead of hard-to-debug runtime errors. Sometimes, however, we want to skip this safety guard. This new command line option allows us to do just that. rdar://47891956 Reviewed By: kubamracek Differential Revision: https://reviews.llvm.org/D66826 llvm-svn: 370258	2019-08-28 20:40:55 +00:00
James Y Knight	f025968bcc	Ignore object files that lack coverage information. Before this change, if multiple binary files were presented, all of them must have been instrumented or the load would fail with coverage_map_error::no_data_found. Patch by Dean Sturtevant. Differential Revision: https://reviews.llvm.org/D66763 llvm-svn: 370257	2019-08-28 20:35:50 +00:00
Scott Linder	04f6f25421	[AMDGPU] Fix bug when calculating user_spgr_count for Code Object V3 assembler Stop counting explicitly disabled user_spgr's in the user_sgpr_count field of the kernel descriptor. Differential Revision: https://reviews.llvm.org/D66900 llvm-svn: 370250	2019-08-28 19:38:15 +00:00
Sanjay Patel	2d4b6777c4	[InstCombine] clean up wrap propagation for reassociated ops; NFCI Always true/false checks were flagged by static analysis; https://bugs.llvm.org/show_bug.cgi?id=43143 I have not confirmed the logic difference in propagating nsw vs. nuw, but presumably we would have noticed a bug by now if that was wrong. llvm-svn: 370248	2019-08-28 18:58:06 +00:00
Pirama Arumuga Nainar	19205abaaa	[ValueMapper] NFC: Remove dead code to pause metadata mapping Summary: This functionality was added when Mapper::mapMetadata was recursive. It is no longer needed after r265456, which switched it to be iterative. Reviewers: dexonsmith, srhines Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66860 llvm-svn: 370236	2019-08-28 17:43:14 +00:00
Johannes Doerfert	f7ca0fe1c8	[Attributor] Regularly clear dependences to remove spurious ones As dependences between abstract attributes can become stale, e.g., if one was sufficient to imply another one at some point but it has since been wakened to the point it is not usable for the formerly implied one. To weed out spurious dependences, and thereby eliminate unneeded updates, we introduce an option to determine how often the dependence cache is cleared and recomputed during the fixpoint iteration. Note that the initial value was determined such that we see a positive result on our tests. Differential Revision: https://reviews.llvm.org/D63315 llvm-svn: 370230	2019-08-28 16:58:52 +00:00
Kevin P. Neal	ddf13c00ed	[FPEnv] Add fptosi and fptoui constrained intrinsics. This implements constrained floating point intrinsics for FP to signed and unsigned integers. Quoting from D32319: The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed. Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton Approved by: Craig Topper Differential Revision: http://reviews.llvm.org/D63782 llvm-svn: 370228	2019-08-28 16:33:36 +00:00
Jessica Paquette	af0bd41e06	[AArch64][GlobalISel] Fall back when translating musttail calls These are currently translated as normal functions calls in AArch64. Until we have proper tail call lowering, we shouldn't translate these. Differential Revision: https://reviews.llvm.org/D66842 llvm-svn: 370225	2019-08-28 16:19:01 +00:00
Simon Pilgrim	1d8a886c59	Reduce scope of variable only used in a local pattern match. NFCI. llvm-svn: 370224	2019-08-28 16:06:33 +00:00
Craig Topper	f79d8a064c	[InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs Due to missing vector support in this function, recursion can generate worse code in some cases. llvm-svn: 370221	2019-08-28 15:40:34 +00:00
Simon Pilgrim	b569624049	Fix uninitialized variable warning in cppcheck. NFCI. InstCombiner::MaxArraySizeForCombine is set outside the constructor so we need to ensure it has a default initialization value. llvm-svn: 370220	2019-08-28 15:19:49 +00:00
David Bolvansky	af118bb6d0	[NFC] Added a comment to avoid possible confusion llvm-svn: 370217	2019-08-28 15:04:48 +00:00
Ryan Taylor	3b1459ed7c	[AMDGPU] Adjust number of SGPRs available in Calling Convention This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved available for the calling convention. Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1 llvm-svn: 370215	2019-08-28 15:00:45 +00:00
Simon Pilgrim	316bfb0f48	Remove duplicate 'BitWidth' variable. NFCI. llvm-svn: 370212	2019-08-28 14:37:44 +00:00
Johannes Doerfert	07a5c129c6	[Attributor] Restrict liveness and return information to functions Summary: Until we have proper call-site information we should not recompute liveness and return information for each call site. This patch directly uses the function versions and introduces TODOs at the usage sites. The required iterations to get to the fixpoint are most of the time reduced by this change and we always avoid work duplication. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66562 llvm-svn: 370208	2019-08-28 14:09:14 +00:00
Simon Pilgrim	284118ce3b	InstCombiner::visitSelectInst - rename Pred to MinMaxPred to stop shadow variable warning. NFCI. We have a lot of Predicate variables, all similarly named.... llvm-svn: 370207	2019-08-28 14:05:38 +00:00
Vlad Tsyrklevich	b8a96f4bf5	Reland "[yaml2obj] - Don't allow setting StOther and Other/Visibility at the same time." This relands this commit, I mistakenly reverted the original change thinking it was the cause of the observed MSan failures but it was not. llvm-svn: 370206	2019-08-28 14:04:09 +00:00
Hans Wennborg	cff90f07cb	[SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) Neither libgcc or compiler-rt are usually used on Windows, so these functions can't be called. Differential revision: https://reviews.llvm.org/D66880 llvm-svn: 370204	2019-08-28 13:55:10 +00:00
Vlad Tsyrklevich	aba62e9c00	Revert "[yaml2obj] - Don't allow setting StOther and Other/Visibility at the same time." This reverts commit r370032, it was causing check-llvm failures on sanitizer-x86_64-linux-bootstrap-msan llvm-svn: 370198	2019-08-28 13:15:08 +00:00
Simon Pilgrim	14e07d7f4b	[DAGCombine] Fix cppcheck shadow variable warning. NFCI. We already have an outer Ops variable. llvm-svn: 370197	2019-08-28 12:48:41 +00:00
Simon Atanasyan	f46ba4f077	[mips] Use less registers to load address of TargetExternalSymbol There is no pattern matched `add hi, (MipsLo texternalsym)`. As a result, loading an address of 32-bit symbol requires two registers and one more additional instruction: ``` addiu $1, $zero, %lo(foo) lui $2, %hi(foo) addu $25, $2, $1 ``` This patch adds the missed pattern and enables generation more effective set of instructions: ``` lui $1, %hi(foo) addiu $25, $1, %lo(foo) ``` Differential Revision: https://reviews.llvm.org/D66771 llvm-svn: 370196	2019-08-28 12:35:53 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
Simon Pilgrim	c5b38e2869	[DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI. These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors. llvm-svn: 370189	2019-08-28 11:50:36 +00:00
David Green	379f6186dd	[ARM] Move MVEVPTBlockPass to a separate file. NFC This just pulls the MVEVPTBlockPass into a separate file, as opposed to being wrapped up in Thumb2ITBlockPass. Differential revision: https://reviews.llvm.org/D66579 llvm-svn: 370187	2019-08-28 11:37:31 +00:00
David Green	1c5b143c99	[MVE] VMOVX patterns This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some adjustments for MVE. It allows us to move fp16 registers without going into and out of gprs. VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits of another register, zeroing the rest. This can be used for odd MVE register lanes. The top bits are not read by fp16 instructions, so no move is required there if we are dealing with even lanes. Differential revision: https://reviews.llvm.org/D66793 llvm-svn: 370184	2019-08-28 10:13:23 +00:00
Hans Wennborg	0af82068a8	[LLVM-C] Fix ByVal Attribute crashing With the introduction of the typed byval attribute change there was no way that the LLVM-C API could create the correct class Attribute. If a program that uses the C API creates a ByVal attribute and annotates a function with that attribute LLVM will crash when it assembles or write that module containing the function out as bitcode. This change is a minimal fix to at least allow code to work, this is because the byval change is on the 9.0 and I don't want to introduce new LLVM-C API this late in the release cycle. By Jakob Bornecrantz! Differential revision: https://reviews.llvm.org/D66144 llvm-svn: 370176	2019-08-28 09:21:56 +00:00
Ayal Zaks	d15df0ede5	[LV] Fold tail by masking - handle reductions Allow vectorizing loops that have reductions when tail is folded by masking. A select is introduced in VPlan, choosing between the last value carried by the loop-exit/live-out instruction of the reduction, and the penultimate value carried by the reduction phi, according to the "i < n" mask of fold-tail. This select replaces the last value as the live-out value of the loop. Differential Revision: https://reviews.llvm.org/D66720 llvm-svn: 370173	2019-08-28 09:02:23 +00:00
Sam Parker	a761ba0f2d	[ARM][ParallelDSP] Change search for muls rL369567 reverted a couple of recent changes made to ARMParallelDSP because of a miscompilation error: PR43073. The issue stemmed from an underlying bug that was caused by adding muls into a reduction before it was proved that they could be executed in parallel with another mul. Most of the changes here are from the previously reverted commits. The additional changes have been made area: 1) The Search function now doesn't insert any muls into the Reduction object. That now happens once the search has successfully finished. 2) For any muls added into the reduction but that weren't paired, we accumulate their values as an input into the smlad. Differential Revision: https://reviews.llvm.org/D66660 llvm-svn: 370171	2019-08-28 08:51:13 +00:00
David Bolvansky	05bda8b4e5	Annotate return values of allocation functions with dereferenceable_or_null Summary: Example define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6 ret i8* %call } Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: aaron.ballman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66651 llvm-svn: 370168	2019-08-28 08:28:20 +00:00
Yi Kong	b9d87b9528	[llvm-objdump] Add the missing ARMv8 subarch detection Differential Revision: https://reviews.llvm.org/D66849 llvm-svn: 370163	2019-08-28 06:37:22 +00:00
Fangrui Song	6964027315	[LoopFusion] Fix another -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build llvm-svn: 370156	2019-08-28 03:12:40 +00:00
Matt Arsenault	a8bbcbd006	AMDGPU/GlobalISel: Fix constraining scalar and/or/xor If the result register already had a register class assigned, the sources may not have been properly constrained. llvm-svn: 370150	2019-08-28 02:11:03 +00:00
Vlad Tsyrklevich	57076d3199	Revert "Change the X86 datalayout to add three address spaces for 32 bit signed," This reverts commit r370083 because it caused check-lld failures on sanitizer-x86_64-linux-fast. llvm-svn: 370142	2019-08-28 01:08:54 +00:00
Matt Arsenault	5c7e96dc26	AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace llvm-svn: 370140	2019-08-28 00:58:24 +00:00
Philip Reames	93a26ec98d	[NFC] Assert preconditions and merge all users into one codepath in Loads.cpp llvm-svn: 370128	2019-08-27 23:36:31 +00:00
Craig Topper	5bbb604bb5	[InstCombine] Disable some portions of foldGEPICmp for GEPs that return a vector of pointers. Fix other portions. llvm-svn: 370114	2019-08-27 21:38:56 +00:00
Luis Marques	c894c6c983	[RISCV] Implement RISCVRegisterInfo::getPointerRegClass Fixes bug 43041 Differential Revision: https://reviews.llvm.org/D66752 llvm-svn: 370113	2019-08-27 21:37:57 +00:00
Amara Emerson	e20b91c265	[GlobalISel] Replace hard coded dynamic alloca handling with G_DYN_STACKALLOC. This change moves the actual stack pointer manipulation into the legalizer, available to targets via lower(). The codegen is slightly different because we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than adding a negative amount via G_GEP. Differential Revision: https://reviews.llvm.org/D66678 llvm-svn: 370104	2019-08-27 19:54:27 +00:00
Philip Reames	2694522f13	[Loads/SROA] Remove blatantly incorrect code and fix a bug revealed in the process The code we had isSafeToLoadUnconditionally was blatantly wrong. This function takes a "Size" argument which is supposed to describe the span loaded from. Instead, the code use the size of the pointer passed (which may be unrelated!) and only checks that span. For any Size > LoadSize, this can and does lead to miscompiles. Worse, the generic code just a few lines above correctly handles the cases which are valid. So, let's delete said code. Removing this code revealed two issues: 1) As noted by jdoerfert the removed code incorrectly handled external globals. The test update in SROA is to stop testing incorrect behavior. 2) SROA was confusing bytes and bits, but this wasn't obvious as the Size parameter was being essentially ignored anyway. Fixed. Differential Revision: https://reviews.llvm.org/D66778 llvm-svn: 370102	2019-08-27 19:34:43 +00:00
Matt Arsenault	2910184936	DAG: computeNumSignBits for MUL Copied directly from the IR version. Most of the testcases I've added for this are somewhat problematic because they really end up testing the yet to be implemented version for MUL_I24/MUL_U24. llvm-svn: 370099	2019-08-27 19:05:33 +00:00
Jason Liu	7c72e82b25	[XCOFF][AIX] Generate symbol table entries with llvm-readobj Summary: This patch implements main entry and auxiliary entries of symbol table generation for llvm-readobj on AIX. The source code of aix_xcoff_xlc_test8.o (compile with xlc) is: -bash-4.2$ cat test8.c extern int i; extern int TestforXcoff; extern int fun(int i); static int static_i; char* p="abcd"; int fun1(int j) { static_i++; j++; j=j+*p; return j; } int main() { i++; fun(i); return fun1(i); } Patch provided by DiggerLin Differential Revision: https://reviews.llvm.org/D65240 llvm-svn: 370097	2019-08-27 18:54:46 +00:00
Praveen Velliengiri	3b1b56d3fb	[ORCv2] - New Speculate Query Implementation Summary: This patch introduces, SequenceBBQuery - new heuristic to find likely next callable functions it tries to find the blocks with calls in order of execution sequence of Blocks. It still uses BlockFrequencyAnalysis to find high frequency blocks. For a handful of hottest blocks (plan to customize), the algorithm traverse and discovered the caller blocks along the way to Entry Basic Block and Exit Basic Block. It uses Block Hint, to stop traversing the already visited blocks in both direction. It implicitly assumes that once the block is visited during discovering entry or exit nodes, revisiting them again does not add much. It also branch probability info (cached result) to traverse only hot edges (planned to customize) from hot blocks. Without BPI, the algorithm mostly return's all the blocks in the CFG with calls. It also changes the heuristic queries, so they don't maintain states. Hence it is safe to call from multiple threads. It also implements, new instrumentation to avoid jumping into JIT on every call to the function with the help _orc_speculate.decision.block and _orc_speculate.block. "Speculator Registration Mechanism is also changed" - kudos to @lhames Open to review, mostly looking to change implementation of SequeceBBQuery heuristics with good data structure choices. Reviewers: lhames, dblaikie Reviewed By: lhames Subscribers: mgorny, hiraditya, mgrang, llvm-commits, lhames Tags: #speculative_compilation_in_orc, #llvm Differential Revision: https://reviews.llvm.org/D66399 llvm-svn: 370092	2019-08-27 18:23:36 +00:00
Andrea Di Biagio	2f51a43f8c	[Tblgen][MCA] Add the ability to mark groups as LoadQueue and StoreQueue. NFCI Before this patch, users were not allowed to optionally mark processor resource groups as load/store queues. That is because tablegen class MemoryQueue was originally declared as expecting a ProcResource template argument (instead of a more generic ProcResourceKind). That was an oversight, since the original intention from D54957 was to let user mark any processor resource as either load/store queue. This patch adds the ability to use processor resource groups in MemoryQueue definitions. This is not a user visible change. Differential Revision: https://reviews.llvm.org/D66810 llvm-svn: 370091	2019-08-27 18:20:34 +00:00
Matt Arsenault	ff07631b48	AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serialization llvm-svn: 370089	2019-08-27 18:18:38 +00:00
Matt Arsenault	0c096da02f	AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16 This is something of a workaround since computeRegisterProperties seems to be doing the wrong thing. llvm-svn: 370086	2019-08-27 17:51:56 +00:00

1 2 3 4 5 ...

126325 Commits