llvm-project

Commit Graph

Author	SHA1	Message	Date
Guy Blank	31b37297de	[X86][AVX512] Refine some avx512er intrinsics tests. NFC. The modified tests should test the masked intrinsics. Currently the mask is constant, which with a future patch (https://reviews.llvm.org/D32805) will cause the intrinsics to be replaced with an unmasked version. This patch changes the constant mask to be a variable one. llvm-svn: 302529	2017-05-09 14:03:51 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame sizes and the difference is treated by MachineVerifier as a stack error. Currently there is no way to distinguish this case from actual errors. This patch adds an additional argument to CALLSEQ_START and its target-specific counterparts to keep the size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate the actual frame size associated with the frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties is implemented using special instructions rather than calls to getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using a method that reports the sum of frame parts initialized inside the frame instruction pair and outside it. The patch implements the approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests that failed with the machine verifier enabled and are listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Simon Dardis	659c43f11a	Revert "[MIPS] Add support to match more patterns for DINS instruction" This reverts commit rL302512. This broke the mips buildbots. llvm-svn: 302526	2017-05-09 13:18:48 +00:00
Simon Pilgrim	ca3a63a849	[X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X) Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD. Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal. Differential Revision: https://reviews.llvm.org/D32973 llvm-svn: 302525	2017-05-09 13:14:40 +00:00
Guy Blank	5995802911	[X86][AVX512] Add test for masking of scalar instructions. llvm-svn: 302519	2017-05-09 12:32:48 +00:00
Nikolai Bozhenov	b7bf386e80	[X86] Clang option -fuse-init-array has no effect when generating for MCU target Reviewers: Eugene.Zelenko, dschuff, craig.topper Reviewed By: craig.topper Subscribers: ahatanak, aaboud, DavidKreitzer, llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D32543 Patch by AndreiGrischenko <andrei.l.grischenko@intel.com> llvm-svn: 302513	2017-05-09 10:14:03 +00:00
Strahinja Petrovic	27ae4c3259	[MIPS] Add support to match more patterns for DINS instruction This patch adds support for recognizing patterns to match DINS instruction. Differential Revision: https://reviews.llvm.org/D31465 llvm-svn: 302512	2017-05-09 10:02:00 +00:00
Reid Kleckner	41bb94233b	Revert "Don't add DBG_VALUE instructions for static allocas in dbg.declare" This reverts commit r302461. It appears to be causing failures compiling gtest with debug info on the Linux sanitizer bot. I was unable to reproduce the failure locally, however. llvm-svn: 302504	2017-05-09 01:57:44 +00:00
Reid Kleckner	9f29914d40	Revert "Use the frame index side table for byval and inalloca arguments" This reverts r302483 and its follow-up fix. llvm-svn: 302493	2017-05-09 01:14:39 +00:00
Evgeniy Stepanov	f7e8acf0fc	Ignore !associated metadata with null argument. Fixes PR32577 (comment 10). Such metadata may legitimately appear in LTO. llvm-svn: 302485	2017-05-08 23:46:20 +00:00
Reid Kleckner	918e8157d8	Relax Dwarf filecheck test for 32-bit hosts llvm-svn: 302484	2017-05-08 23:27:52 +00:00
Reid Kleckner	45efcf0c96	Use the frame index side table for byval and inalloca arguments Summary: For inalloca functions, this is a very common code pattern: %argpack = type <{ i32, i32, i32 }> define void @f(%argpack* inalloca %args) { entry: %a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0 %b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1 %c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2 tail call void @llvm.dbg.declare(metadata i32* %a, ... "a") tail call void @llvm.dbg.declare(metadata i32* %c, ... "b") tail call void @llvm.dbg.declare(metadata i32* %b, ... "c") Even though these GEPs can be simplified to a constant offset from EBP or RSP, we don't do that at -O0, and each GEP is computed into a register. Registers used to compute argument addresses are typically spilled and clobbered very quickly after the initial computation, so live debug variable tracking loses information very quickly if we use DBG_VALUE instructions. This change moves processing of dbg.declare between argument lowering and basic block isel, so that we can ask if an argument has a frame index or not. If the argument lives in a register as is the case for byval arguments on some targets, then we don't put it in the side table and during ISel we emit DBG_VALUE instructions. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32980 llvm-svn: 302483	2017-05-08 23:20:27 +00:00
Tim Northover	c48c993b75	ARM: use divmod libcalls on embedded MachO platforms too. The separated libcalls are implemented in terms of __divmodsi4 and __udivmodsi4 anyway, so we should always use them if possible. llvm-svn: 302462	2017-05-08 20:00:14 +00:00
Reid Kleckner	bf828eedb4	Don't add DBG_VALUE instructions for static allocas in dbg.declare Summary: An llvm.dbg.declare of a static alloca is always added to the MachineFunction dbg variable map, so these values are entirely redundant. They survive all the way through codegen to be ignored by DWARF emission. Effectively revert r113967 Two bugpoint-reduced test cases from 2012 broke as a result of this change. Despite my best efforts, I haven't been able to rewrite the test case using dbg.value. I'm not too concerned about the lost coverage because these were reduced from the test-suite, which we still run. Reviewers: aprantl, dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32920 llvm-svn: 302461	2017-05-08 19:58:15 +00:00
Quentin Colombet	55a72b3b05	[AArch64][RegisterBankInfo] Change the default mapping of fp loads. This fixes PR32550, in a way that does not imply running the greedy mode at O0. The fix consists in checking if a load is used by any floating point instruction and if yes, we return a default mapping with FPR instead of GPR. llvm-svn: 302453	2017-05-08 18:16:31 +00:00
Zvi Rackover	0f1ffb6cab	[X86] Split test configurations. NFC. Split test that includes reproducer for pr32967 to KNL and SKX. llvm-svn: 302442	2017-05-08 16:54:25 +00:00
Zvi Rackover	7fa777fb74	Adding reproducer for pr32967. NFC. llvm-svn: 302426	2017-05-08 14:47:32 +00:00
Simon Pilgrim	df39b03f29	[X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks. Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead. This is a first step in several things we can do to improve PBLENDV support: * Better matching of X86ISD::ANDNP patterns. * Handle floating point cases. * Better vector and bitcast support in computeNumSignBits. * Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.). Differential Revision: https://reviews.llvm.org/D32953 llvm-svn: 302424	2017-05-08 14:16:39 +00:00
Simon Pilgrim	8a3b9c7401	Normalize line endings. NFCI, llvm-svn: 302422	2017-05-08 13:32:34 +00:00
Simon Pilgrim	f5ca255d18	[ARM][NEON] Add support for ISD::ABS lowering Update NEON int_arm_neon_vabs intrinsic to use the ISD::ABS opcode directly Added constant folding tests. Differential Revision: https://reviews.llvm.org/D32938 llvm-svn: 302417	2017-05-08 10:37:34 +00:00
Igor Breger	810c6257f1	[GlobalISel][X86] G_GEP selection support. Summary: [GlobalISel][X86] G_GEP selection support. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32396 llvm-svn: 302412	2017-05-08 09:40:43 +00:00
Igor Breger	605b965ae5	[GlobalISel][X86] G_MUL legalizer/selector support. Summary: G_MUL legalizer/selector/regbank support. Use only Tablegen-erated instruction selection. This patch deals with legal operations only. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: krytarowski, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32698 llvm-svn: 302410	2017-05-08 09:03:37 +00:00
Dean Michael Berris	9bcaed867a	[XRay] Custom event logging intrinsic This patch introduces an LLVM intrinsic and a target opcode for custom event logging in XRay. Initially, its use case will be to allow users of XRay to log some type of string ("poor man's printf"). The target opcode compiles to a noop sled large enough to enable calling through to a runtime-determined relative function call. At runtime, when XRay is enabled, the sled is replaced by compiler-rt with a trampoline to the logic for creating the custom log entries. Future patches will implement the compiler-rt parts and clang-side support for emitting the IR corresponding to this intrinsic. Reviewers: timshen, dberris Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D27503 llvm-svn: 302405	2017-05-08 05:45:21 +00:00
Simon Pilgrim	33f7397cc0	[X86][AVX512] Relax assertion and just exit combine for unsupported types (PR32907) llvm-svn: 302361	2017-05-06 20:53:52 +00:00
Simon Pilgrim	fea153f341	[X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegen Extend NoVLX targets to use the 512-bit versions llvm-svn: 302359	2017-05-06 19:11:59 +00:00
Simon Pilgrim	781cb10104	[X86][SSE] Break register dependencies on v16i8/v8i16 BUILD_VECTOR on SSE41 rL294581 broke unnecessary register dependencies on partial v16i8/v8i16 BUILD_VECTORs, but on SSE41 we (currently) use insertion for full BUILD_VECTORs as well. By allowing full insertion to occur on SSE41 targets we can break register dependencies here as well. llvm-svn: 302355	2017-05-06 17:30:39 +00:00
Simon Pilgrim	946f08c618	[X86][AVX2] Add scheduling latency/throughput tests for some AVX2 instructions Many more to come... llvm-svn: 302338	2017-05-06 13:46:09 +00:00
Simon Pilgrim	2c15447f99	[DAGCombiner] If ISD::ABS is legal/custom, use it directly instead of canonicalizing first. Remove an extra canonicalization step if ISD::ABS is going to be used anyway. Updated x86 abs combine to check that we are lowering from both canonicalizations. llvm-svn: 302337	2017-05-06 13:44:42 +00:00
Krzysztof Parzyszek	d0c71ef8ab	[RDF] Remove covered parts of reached uses for phi and use in same block llvm-svn: 302305	2017-05-05 22:10:32 +00:00
Matthias Braun	4682ac6c83	ARM: Compute MaxCallFrame size early This exposes a method in MachineFrameInfo that calculates MaxCallFrameSize and calls it after instruction selection in the ARM target. This avoids ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame() giving different answers in early/late phases of codegen. The testcase shows a particular nasty example result of that where we would fail to properly align an alloca. Differential Revision: https://reviews.llvm.org/D32622 llvm-svn: 302303	2017-05-05 22:04:05 +00:00
Matthias Braun	c1c5691686	Add missing target triple to test llvm-svn: 302301	2017-05-05 21:50:26 +00:00
Kannan Narayanan	5e73b04b84	[AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header. Differential Revision: https://reviews.llvm.org/D32831 llvm-svn: 302290	2017-05-05 21:10:17 +00:00
Matthias Braun	8940114f61	MIParser/MIRPrinter: Compute block successors if not explicitly specified - MIParser: If the successor list is not specified, successors will be added based on basic block operands in the block and possible fallthrough. - MIRPrinter: Adds a new `simplify-mir` option; with that option set, skip printing of block successor lists in cases where the parser is guaranteed to reconstruct it. This means we still print the list if some successor cannot be determined (happens for example for jump tables), if the successor order changes, or if branch probabilities are unequal. Differential Revision: https://reviews.llvm.org/D31262 llvm-svn: 302289	2017-05-05 21:09:30 +00:00
Konstantin Zhuravlyov	6ccb076aeb	AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0 This field is populated by the CP Differential Revision: https://reviews.llvm.org/D32619 llvm-svn: 302277	2017-05-05 20:13:55 +00:00
Alexei Starovoitov	7bab73b1f8	[bpf] fix a bug which causes incorrect big endian reloc fixup o Add bpfeb support in BPF dwarfdump unit test case Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@fb.com> llvm-svn: 302265	2017-05-05 18:05:00 +00:00
Amaury Sechet	841b0907c4	Add more variations of addcarry in the tests. NFC. llvm-svn: 302252	2017-05-05 16:27:55 +00:00
Simon Pilgrim	3f8d8f5f43	[X86][SSE] Add 128/256/512 bit vector build vector from register tests llvm-svn: 302243	2017-05-05 15:36:31 +00:00
Simon Pilgrim	ac3c4b6da4	[X86][AVX512] Improve support and testing for CTLZ of 512-bit vectors without CDI llvm-svn: 302233	2017-05-05 13:31:52 +00:00
Krzysztof Parzyszek	31d4b3b247	Remove stale live-ins in the branch folder Hoisting common code can cause registers that are live-in in the successor blocks to no longer be live-in. The live-in information needs to be updated to reflect this, or otherwise incorrect code can be generated later on. Differential Revision: https://reviews.llvm.org/D32661 llvm-svn: 302228	2017-05-05 12:20:07 +00:00
Marek Olsak	584d2c05d4	AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200	2017-05-04 22:25:20 +00:00
Aditya Nandakumar	117b667bd9	[GISel]: Add support to translate ConstantVectors Reviewed by Quentin https://reviews.llvm.org/D32814 llvm-svn: 302196	2017-05-04 21:43:12 +00:00
Krzysztof Parzyszek	038a0546db	[PPC] When restoring R30 (PIC base pointer), mark it as <def> This happened on the PPC32/SVR4 path and was discovered when building FreeBSD on PPC32. It was a typo-class error in the frame lowering code. This fixes PR26519. llvm-svn: 302183	2017-05-04 19:14:54 +00:00
Reid Kleckner	6d2ea6ec80	[ms-inline-asm] Use the frontend size only for ambiguous instructions This avoids problems on code like this: char buf[16]; __asm { movups xmm0, [buf] mov [buf], eax } The frontend size in this case (1) is wrong, and the register makes the instruction matching unambiguous. There are also enough bytes available that we shouldn't complain to the user that they are potentially using an incorrectly sized instruction to access the variable. Supersedes D32636 and D26586 and fixes PR28266 llvm-svn: 302179	2017-05-04 18:19:52 +00:00
Adrian Prantl	defc99a94e	Cleanup tests to not share a DISubprogram between multiple Functions. rdar://problem/31926379 llvm-svn: 302166	2017-05-04 16:24:31 +00:00
Chad Rosier	84a238dd62	[DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)). Differential Revision: http://reviews.llvm.org/D32596 llvm-svn: 302153	2017-05-04 14:14:44 +00:00
Simon Pilgrim	66af84bfc0	[X86][AVX512] Fix VPABSD file checks Fix capitalization and string matching llvm-svn: 302150	2017-05-04 13:42:57 +00:00
Simon Pilgrim	960a8e71e0	[X86][SSE] Add i686 triple tests for partial vector and re-association llvm-svn: 302149	2017-05-04 13:35:40 +00:00
Jonas Paulsson	4fd156261e	[SystemZ] Make copyPhysReg() add impl-use operands of super reg. When a 128 bit COPY is lowered into two instructions, an impl-use operand of the super-reg should be added to each new instruction in case one of the sub-regs is undefined. Review: Ulrich Weigand llvm-svn: 302146	2017-05-04 13:33:30 +00:00
Simon Pilgrim	5127dbbb23	[X86][SSE] Add i686 triple tests for PBLENDW commutation llvm-svn: 302145	2017-05-04 13:08:09 +00:00
Simon Pilgrim	fbaaf25739	[X86][AVX1] Regenerate checks and add i686 triple tests for folded logical ops llvm-svn: 302144	2017-05-04 13:00:30 +00:00
Igor Breger	70583606b1	[X86][AVX-512] Allow EVEX encoded instruction selection when available for mul v8i32. Differential Revision: https://reviews.llvm.org/D32679 llvm-svn: 302127	2017-05-04 07:34:58 +00:00
Sam Parker	df337704f0	[ARM] ACLE Chapter 9 intrinsics Added the integer data processing intrinsics from ACLE v2.1 Chapter 9 but I have missed out the saturation_occurred intrinsics for now. For the instructions that read and write the GE bits, a chain is included and the only instruction that reads these flags (sel) is only selectable via the implemented intrinsic. Differential Revision: https://reviews.llvm.org/D32281 llvm-svn: 302126	2017-05-04 07:31:28 +00:00
Oren Ben Simhon	51de0330eb	[X86] Disabling PLT in Regcall CC Functions According to psABI, PLT stub clobbers XMM8-XMM15. In Regcall calling convention those registers are used for passing parameters. Thus we need to prevent lazy binding in Regcall. Differential Revision: https://reviews.llvm.org/D32430 llvm-svn: 302124	2017-05-04 07:22:49 +00:00
Igor Breger	0d5949e366	[AVX-512VL] Autogenerate checks. Add --show-mc-encoding to check instruction predicate. llvm-svn: 302123	2017-05-04 06:53:31 +00:00
Craig Topper	d4d09fd73d	[SelectionDAG] Improve known bits support for CTPOP. This is based on the same concept from ValueTracking's version of computeKnownBits. llvm-svn: 302110	2017-05-04 04:33:27 +00:00
Dean Michael Berris	bdfe90050b	[XRay] Create an Index of sleds per function Summary: This change adds a new section to the xray-instrumented binary that stores an index into ranges of the instrumentation map, where sleds associated with the same function can be accessed as an array. At runtime, we can get access to this index by function ID offset allowing for selective patching and unpatching by function ID. Each entry in this new section (xray_fn_idx) will include two pointers indicating the start and one past the end of the sleds associated with the same function. These entries will be 16 bytes long on x86 and aarch64. On arm, we align to 16 bytes anyway so the runtime has to take that into consideration. __{start,stop}_xray_fn_idx will be the symbols that the runtime will look for when we implement the selective patching/unpatching by function id APIs. Because XRay synthesizes the function id's in a monotonically increasing manner at runtime now, implementations (and users) can use this table to look up the sleds associated with a specific function. This is useful in implementations that want to do things like: - Implement coverage mode for functions by patching everything pre-main, then as functions are encountered, the installed handler can unpatch the function that's been encountered after recording that it's been called. - Do "learning mode", so that the implementation can figure out some statistical information about function calls by function id for the time being, and then determine which functions are worth uninstrumenting at runtime. - Do "selective instrumentation" where an implementation can specifically instrument only certain function id's at runtime (either based on some external data, or through some other heuristics) instead of patching all the instrumented functions at runtime. Reviewers: dblaikie, echristo, chandlerc, javed.absar Subscribers: pelikan, aemerson, kpw, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D32693 llvm-svn: 302109	2017-05-04 03:37:57 +00:00
Dean Michael Berris	22f2bcf4b9	[XRay] Detect loops in functions being lowered Summary: This is an implementation of the loop detection logic that XRay needs to determine whether a function might take time at runtime. Without this heuristic, XRay will tend to not instrument short functions that have loops that might have runtime dependent on inputs or external values. While this implementation doesn't do any further analysis than just figuring out whether there is a loop in the MachineFunction being code-gen'ed, we're paving the way for being able to perform more sophisticated analysis of the function in the future (for example to determine whether the trip count for the loop might be constant, and make a decision on that instead). This enables us to cover more functions with the default heuristics, and potentially identify ones that have variable runtime latency just by looking for the presence of loops. Reviewers: chandlerc, rnk, pelikan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32274 llvm-svn: 302103	2017-05-04 01:24:26 +00:00
Michael Zolotukhin	37162adf3e	[SCEV] createAddRecFromPHI: Optimize for the most common case. Summary: The existing implementation creates a symbolic SCEV expression every time we analyze a phi node and then has to remove it, when the analysis is finished. This is very expensive, and in most of the cases it's also unnecessary. According to the data I collected, ~60-70% of analyzed phi nodes (measured on SPEC) have the following form: PN = phi(Start, OP(Self, Constant)) Handling such cases separately significantly speeds this up. Reviewers: sanjoy, pete Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32663 llvm-svn: 302096	2017-05-03 23:53:38 +00:00
Reid Kleckner	5c0bdef5aa	Mark functions as not having CFI once we finalize an x86 stack frame We'll set it back to true in emitPrologue if it gets called. It doesn't get called for naked functions. Fixes PR32912 llvm-svn: 302092	2017-05-03 23:13:42 +00:00
Krzysztof Parzyszek	2af5037d34	[Hexagon] Use automatically-generated scheduling information for HVX Patch by Jyotsna Verma. llvm-svn: 302073	2017-05-03 20:10:36 +00:00
Alexei Starovoitov	4198f2a702	[bpf] add relocation support . there should be no runtime relocation inside the bpf function. . relocation supported here mostly for debugging. . a test case is added. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 302055	2017-05-03 17:30:56 +00:00
Tim Northover	761bcdaf06	ARM: add extra test for addrmode folding. I was worried we might replace a mul with a mul+shift even if there were later uses. Turns out to be unfounded but I'd just as well add an actual test for it. llvm-svn: 302051	2017-05-03 16:54:30 +00:00
Simon Pilgrim	03ccf91d85	[X86][LWP] Add stack folding mappings and tests for LWPINS/LWPVAL instructions llvm-svn: 302049	2017-05-03 16:46:30 +00:00
Amaury Sechet	666c705953	[DAGCombine] (addcarry (add|uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) Summary: Do the transform when the carry isn't used. It's a pattern exposed when legalizing large integers. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32755 llvm-svn: 302047	2017-05-03 16:28:10 +00:00
Simon Pilgrim	99b925bdf3	[X86][LWP] Add llvm support for LWP instructions (reapplied). This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041	2017-05-03 15:51:39 +00:00
Simon Pilgrim	a271c54324	Revert rL302028 due to accidental line ending changes. llvm-svn: 302038	2017-05-03 15:42:29 +00:00
Krzysztof Parzyszek	4763c2d999	[Hexagon] Adjust latency between allocframe and the first store on stack Allocframe and the following stores on the stack have a latency of 2 cycles when not in the same packet. This happens because R29 is needed early by the store instruction. Since one of such stores can be packetized along with allocframe and use old value of R29, we can assign it 0 cycle latency while leaving latency of other stores to the default value of 2 cycles. Patch by Jyotsna Verma. llvm-svn: 302034	2017-05-03 15:33:09 +00:00
Simon Pilgrim	b2e0464fde	[X86][LWP] Add llvm support for LWP instructions. This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028	2017-05-03 15:18:34 +00:00
Oren Ben Simhon	dbd4bba1ec	[X86] Support of no_caller_saved_registers attribute This patch implements the LLVM part for no_caller_saved_registers attribute as appears here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5ed3cc7b66af4758f7849ed6f65f4365be8223be. In order to implement the attribute, we use the dynamic CSR mechanism to remove returned/passed arguments from the function regmask/CSR list. Differential Revision: https://reviews.llvm.org/D31876 llvm-svn: 302020	2017-05-03 13:07:19 +00:00
Elad Cohen	ef5798acf5	Support arbitrary address space pointers in masked gather/scatter intrinsics. Fixes PR31789 - When loop-vectorize tries to use these intrinsics for a non-default address space pointer we fail with a "Calling a function with a bad signature!" assertion. This patch solves this by adding the 'vector of pointers' argument as an overloaded type which will determine the address space. Differential revision: https://reviews.llvm.org/D31490 llvm-svn: 302018	2017-05-03 12:28:54 +00:00
Dylan McKay	4aedb8a6b7	[AVR] Reserve the Y register in all functions llvm-svn: 302017	2017-05-03 11:56:01 +00:00
Alex Lorenz	c748d7b57b	[Triple] Add a "macos" OS type that acts as a synonym for "macosx" The "macosx" OS type is still the canonical type. In the future "macos" will become the canonical OS type (but we will still support "macosx"). rdar://27043820 Differential Revision: https://reviews.llvm.org/D32748 llvm-svn: 302011	2017-05-03 10:42:35 +00:00
Tim Shen	e59d06fe78	[PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction Summary: This is the corresponding llvm change to D28037 to ensure no performance regression. Reviewers: bogner, kbarton, hfinkel, iteratee, echristo Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28329 llvm-svn: 301990	2017-05-03 00:07:02 +00:00
Tim Northover	4a01ffbd6a	ARM: avoid handing a deleted node back to TableGen during ISel. When we replaced the multiplicand the destination node might already exist. When that happens the original gets CSEd and deleted. However, it's actually used as the offset so nonsense is produced. Should fix PR32726. llvm-svn: 301983	2017-05-02 22:45:19 +00:00
Joel Jones	6513405735	[AArch64] ILP32 Backend Relocation Support Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and TLSDESC_ADD_LO12 relocations Rearrange ordering in AArch64.def to follow relocation encoding Fix name: R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC Add support for several "TLS", "TLSGD", and "TLSLD" relocations for ILP32 Fix return values from isNonILP32reloc Add implementations for R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC, R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC, R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC, TLSLD_LDST128_DTPREL_LO12, TLSLD_LDST128_DTPREL_LO12_NC, TLSLE_LDST128_TPREL_LO12, TLSLE_LDST128_TPREL_LO12_NC Modify error messages to give name of equivalent relocation in the ABI not being used, along with better checking for non-existent requested relocations. Added assembler support for "pg_hi21_nc" Relocation definitions added without implementations: R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21, R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19, R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL, R_AARCH64_P32_TLSDESC Fix encoding: R_AARCH64_P32_TLSDESC_ADR_PAGE21 Reviewers: Peter Smith Patch by: Joel Jones (jjones@cavium.com) Differential Revision: https://reviews.llvm.org/D32072 llvm-svn: 301980	2017-05-02 22:01:48 +00:00
Tim Northover	f9d8eee3db	ARM: add arm1176j-f processor I doubt anyone actually uses it, and I'm not even entirely convinced it exists myself; but it is our default for "clang -arch armv6". Functionally, if it does exist it's identical to the arm1176jz-f from LLVM's point of view (the difference is apparently in the "Security Extensions"). llvm-svn: 301962	2017-05-02 19:06:13 +00:00
Matt Arsenault	5c80618fb7	AMDGPU: Don't promote alloca to LDS for leaf functions LDS use in leaf functions not currently handled. llvm-svn: 301958	2017-05-02 18:33:18 +00:00
Krzysztof Parzyszek	a750383d0f	[Hexagon] Add extenders for GD_PLT_B22_PCREL and LD_PLT_B22_PCREL Patch by Sid Manning. llvm-svn: 301955	2017-05-02 18:15:33 +00:00
Krzysztof Parzyszek	9aaf923376	[Hexagon] Don't ignore multi-cycle latency information The compiler was generating code that ends up ignoring a multiple latency dependence between two instructions by scheduling the instructions in back-to-back packets. The packetizer needs to end a packet if the latency of the current instruction and the source in the previous packet is greater than 1 cycle. This case occurs when there is still room in the current packet, but scheduling the instruction causes a stall. Instead, the packetizer should start a new packet. Also, if the current packet already contains a stall, then it is okay to add another instruction to the packet that also causes a stall. This occurs when there are no instructions that can be scheduled in between the producer and consumer instructions. This patch changes the latency for loads to 2 cycles from 3 cycles. This change reflects that a load only needs to be separated by one extra packet to eliminate the stall. Patch by Ikhlas Ajbar. llvm-svn: 301954	2017-05-02 18:12:19 +00:00
Zachary Turner	a0aae2757d	Revert "Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and" This reverts commit c08155afc5d3230792da2ad30a046a8617735a73. This is causing undefined symbol errors with some of the constants. llvm-svn: 301944	2017-05-02 17:51:27 +00:00
Joel Jones	705103e523	Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and TLSDESC_ADD_LO12 relocations Rearrange ordering in AArch64.def to follow relocation encoding Fix name: R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC Add support for several "TLS", "TLSGD", and "TLSLD" relocations for ILP32 Fix return values from isNonILP32reloc Add implementations for R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC, R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC, R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC, TLSLD_LDST128_DTPREL_LO12, TLSLD_LDST128_DTPREL_LO12_NC, TLSLE_LDST128_TPREL_LO12, TLSLE_LDST128_TPREL_LO12_NC Modify error messages to give name of equivalent relocation in the ABI not being used, along with better checking for non-existent requested relocations. Added assembler support for "pg_hi21_nc" Relocation definitions added without implementations: R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21, R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19, R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL, R_AARCH64_P32_TLSDESC Fix encoding: R_AARCH64_P32_TLSDESC_ADR_PAGE21 Reviewers: Peter Smith Patch by: Joel Jones (jjones@cavium.com) Differential Revision: https://reviews.llvm.org/D32072 llvm-svn: 301939	2017-05-02 17:14:31 +00:00
Matt Arsenault	7b82b4bddb	AMDGPU: Make intrinsics speculatable llvm-svn: 301937	2017-05-02 16:57:44 +00:00
Amaury Sechet	3847996d74	Add new test case for addcarry. NFC. llvm-svn: 301932	2017-05-02 16:07:32 +00:00
Amaury Sechet	106a7eab84	[DAGCombine] (uaddo X, (addcarry Y, 0, Carry)) -> (addcarry X, Y, Carry) Summary: This is a common pattern that arises when legalizing large integer operations. Only do it when Y + 1 cannot overflow as this would change the carry behavior of uaddo. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32687 llvm-svn: 301922	2017-05-02 14:15:48 +00:00
Amaury Sechet	153911f71d	[DAGCombine] (add X, (addcarry Y, 0, Carry)) -> (addcarry X, Y, Carry) Summary: Common pattern when legalizing large integer operations. Similar to D32687, when the carry isn't used. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Differential Revision: https://reviews.llvm.org/D32738 llvm-svn: 301919	2017-05-02 13:34:25 +00:00
Simon Pilgrim	6615bde93b	[X86][SSE] Add test for PR30264 (combining multiple constants inputs in a shuffle) llvm-svn: 301915	2017-05-02 12:25:17 +00:00
Simon Pilgrim	89ad89cc73	[SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088) PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats. This patch adds support for extension/truncation for both integer and float types. Differential Revision: https://reviews.llvm.org/D32391 llvm-svn: 301910	2017-05-02 10:33:08 +00:00
Simon Pilgrim	8deb87a6c0	[DAGCombiner] Improve MatchBswapHword logic (PR31357) The existing code only looks at half of the tree when matching bswap + rol patterns ending in an OR tree (as opposed to a cascade). Patch originally introduced by Jim Lewis. Submitted on the behalf of Dinar Temirbulatov. Differential Revision: https://reviews.llvm.org/D32039 llvm-svn: 301907	2017-05-02 10:16:19 +00:00
Dylan McKay	28355efdad	[AVR] Save/restore the frame pointer for all functions A recent commit I made made it so that we only did this for signal or interrupt handlers. This broke normal functions. llvm-svn: 301893	2017-05-02 01:57:48 +00:00
Nemanja Ivanovic	b89c27f515	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE Fixes PR30730. This is a re-commit of a pulled commit. The commit was pulled because some software projects contained uses of Altivec vectors that violated alignment requirements. Known issues have now been fixed. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D26861 llvm-svn: 301892	2017-05-02 01:47:34 +00:00
Matthias Braun	ab9438cb03	MachineFrameInfo: Track whether MaxCallFrameSize is computed yet; NFC This tracks whether MaxCallFrameSize is computed yet. Ideally we would assert and fail when the value is queried before it is computed, however this fails various targets that need to be fixed first. Differential Revision: https://reviews.llvm.org/D32570 llvm-svn: 301851	2017-05-01 22:32:25 +00:00
Craig Topper	6b1b630a98	[SelectionDAG] Use known ones to provide a better bound for the known zeros for CTTZ/CTLZ operations. This is the SelectionDAG version of D32521. If we know where at least one 1 is located in the input to these intrinsics we can place an upper bound on the number of bits needed to represent the count and thus increase the number of known zeros in the output. I think we can also refine this further for CTTZ_UNDEF/CTLZ_UNDEF by assuming that the answer will never be BitWidth. I've left this out for now because it caused other test failures across multiple targets, usually because of turning ADD into OR based on this new information. I'll fix CTPOP in a future patch. Differential Revision: https://reviews.llvm.org/D32692 llvm-svn: 301806	2017-05-01 16:08:06 +00:00
Dylan McKay	59e7fe3da8	[AVR] Implement non-constant bit rotations This lets us do bit rotations of variable amount. llvm-svn: 301794	2017-05-01 09:48:55 +00:00
Igor Breger	4064dc76c5	[GlobalISel][X86] rename test file. NFC. llvm-svn: 301793	2017-05-01 08:11:02 +00:00
Craig Topper	c8b5693948	[X86] Add tests for opportunities to improve known bits for CTTZ and CTLZ. llvm-svn: 301791	2017-05-01 06:33:17 +00:00
Igor Breger	c08a783521	[GlobalISel][X86] G_SEXT/G_ZEXT support. Reviewers: zvi, guyblank Reviewed By: zvi Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32591 llvm-svn: 301790	2017-05-01 06:30:16 +00:00
Igor Breger	a9edb88d46	[GlobalISel][X86] G_LOAD/G_STORE pointer selection support. Summary: [GlobalISel][X86] G_LOAD/G_STORE pointer selection support. Reviewers: zvi, guyblank Reviewed By: zvi, guyblank Subscribers: dberris, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32217 llvm-svn: 301788	2017-05-01 06:08:32 +00:00
Sanjay Patel	ad13826aea	[DAGCombiner] shrink/widen a vselect to match its condition operand size (PR14657) We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check legality of the select that we want to produce. A few things to note: 1. We can't wait until after legalization and do this generically because (at least in the x86 tests from PR14657), we'll have PACKSS and bitcasts in the pattern. 2. This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but that requires a closer look to make sure we don't end up worse. 3. There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases. That should be fixed next. 4. I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple legal vector sizes, but if there are other targets like that, we should add more tests. 5. There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests): despite IR that is terrible for the target, this patch allows us to generate the optimal loop code because something post-ISEL is hoisting the splat extends above the vector loops. Differential Revision: https://reviews.llvm.org/D32620 llvm-svn: 301781	2017-04-30 22:44:51 +00:00
Amaury Sechet	8ac81f3924	Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry Summary: As per the discussion on how to get better codegen for large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allows for more flexibility. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29872 llvm-svn: 301775	2017-04-30 19:24:09 +00:00
Daniel Sanders	e9fdba39e0	[globalisel][tablegen] Compute available feature bits correctly. Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750	2017-04-29 17:30:09 +00:00
Simon Pilgrim	694cb2c838	[X86][AVX] Added codegen tests for _mm256_zext* helper intrinsics (PR32839) Not great codegen, especially as VEX moves support implicit zeroing of upper bits.... llvm-svn: 301748	2017-04-29 17:15:12 +00:00
Simon Pilgrim	ac7f3e24d3	[X86][SSE] Add initial <2 x half> tests for PR31088 As discussed on D32391, test X86/X64 SSE2 and X64 F16C. llvm-svn: 301744	2017-04-29 14:29:06 +00:00
Matt Arsenault	2a80369ae4	AMDGPU: Fix copies from physical registers in SIFixSGPRCopies This would assert when there were multiple defs of a physical register. We just need to move all of the users of it. llvm-svn: 301730	2017-04-29 01:26:34 +00:00
Adrian Prantl	fed4f399d3	Remove line and file from DINamespace. Fixes the issue highlighted in http://lists.llvm.org/pipermail/cfe-dev/2014-June/037500.html. The DW_AT_decl_file and DW_AT_decl_line attributes on namespaces can prevent LLVM from uniquing types that are in the same namespace. They also don't carry any meaningful information. rdar://problem/17484998 Differential Revision: https://reviews.llvm.org/D32648 llvm-svn: 301706	2017-04-28 22:25:46 +00:00
Krzysztof Parzyszek	072ddb383c	[RDF] Correctly calculate lane masks for defs llvm-svn: 301700	2017-04-28 21:57:53 +00:00
Krzysztof Parzyszek	2065a2f4e6	Properly handle PHIs with subregisters in UnreachableBlockElim When a PHI operand has a subregister, create a COPY instead of simply replacing the PHI output with the input. Differential Revision: https://reviews.llvm.org/D32650 llvm-svn: 301699	2017-04-28 21:56:33 +00:00
Krzysztof Parzyszek	0b3acbb1dd	[Hexagon] Do not move a block if it is on a fall-through path llvm-svn: 301698	2017-04-28 21:54:11 +00:00
Marek Olsak	2d82590f64	AMDGPU: Add new amdgcn.init.exec intrinsics v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677	2017-04-28 20:21:58 +00:00
Alexei Starovoitov	f7bd5ebd3b	[bpf] add bigendian support to disassembler . swap 4-bit register encoding, 16-bit offset and 32-bit imm to support big endian archs . add a test Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 301653	2017-04-28 16:51:01 +00:00
Simon Pilgrim	7ae9419dc0	[DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT (reapplied) Reapplied r299221 after fix for nondeterminism in ThinLTO builder (rL301599), with extra check for implicit truncation of inserted element. llvm-svn: 301644	2017-04-28 13:21:18 +00:00
Simon Pilgrim	ec93334317	[X86][SSE] Added new tests from D32416 to show codegen delta llvm-svn: 301641	2017-04-28 11:53:08 +00:00
Simon Pilgrim	04928fd021	[X86][SSE] Renames all ones test to better match type. Added 8f32/4f64 optsize tests discussed on D32416 llvm-svn: 301639	2017-04-28 11:12:30 +00:00
Simon Pilgrim	67b1a79985	[X86][SSE] Add codegen test for _mm_set_pd1 (PR32827) llvm-svn: 301638	2017-04-28 10:31:42 +00:00
Andrew Ng	03e35b6bc0	[DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values. This is a follow up to the fix in r298360 to improve the handling of debug values when redundant LEAs are removed. The fix in r298360 effectively discarded the debug values. This patch now attempts to preserve the debug values by using the DWARF DW_OP_stack_value operation via prependDIExpr. Moved functions appendOffset and prependDIExpr from Local.cpp to DebugInfoMetadata.cpp and made them available as static member functions of DIExpression. Differential Revision: https://reviews.llvm.org/D31604 llvm-svn: 301630	2017-04-28 08:44:30 +00:00
Diana Picus	0674a3ce97	[ARM] GlobalISel: Tighten test. NFC Explicitly check types and load sizes in the IRTranslator test. llvm-svn: 301627	2017-04-28 07:50:47 +00:00
Sanjoy Das	ba0daee6b2	[StackMaps] Increase the size of the "location size" field Summary: In some cases LLVM (especially the SLP vectorizer) will create vectors that are 256 bytes (or larger). Given that this is intentional[0] and is likely to get more common, this patch updates the StackMap binary format to deal with the spill locations for said vectors. This change also bumps the stack map version from 2 to 3. [0]: https://reviews.llvm.org/D32533#738350 Reviewers: reames, kavon, skatkov, javed.absar Subscribers: mcrosier, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D32629 llvm-svn: 301615	2017-04-28 04:48:42 +00:00
Simon Pilgrim	9a08ad8abd	[X86][SSE] Add tests for broadcast from larger vector loads llvm-svn: 301583	2017-04-27 20:19:00 +00:00
Sanjoy Das	40c32dd9a0	Use a pointer type for target frame indices during statepoint lowering Summary: The type of the target frame index is intptr, not the type of the value we're going to store into it. Without this change we crash in the attached test case when trying to type-legalize a TargetFrameIndex. Patchpoint lowering types the target frame index as intptr as well. Reviewers: reames, bogner, arsenm Subscribers: arsenm, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32256 llvm-svn: 301566	2017-04-27 17:17:16 +00:00
Sanjay Patel	c3e00fcadd	[x86] add minimal tests for potential size-changing vsel transforms; NFC llvm-svn: 301554	2017-04-27 16:10:20 +00:00
Zoran Jovanovic	ffef3e3c6a	[mips][microMIPS] Adding code size reduction pass for MicroMIPS Author: milena.vujosevic.janicic Reviewers: sdardis The code implements a size reduction pass for MicroMIPS. Load and store instructions are examined and transformed, if possible. The lw32 instruction is transformed into the 16-bit instruction lwsp. The sw32 instruction is transformed into the 16-bit instruction swsp. Arithmetic instructions are examined and transformed, if possible. The addu32 instruction is transformed into the 16-bit instruction addu16. The subu32 instruction is transformed into the 16-bit instruction subu16. Differential Revision: https://reviews.llvm.org/D15144 llvm-svn: 301540	2017-04-27 13:10:48 +00:00
Diana Picus	4f46be327c	[ARM] GlobalISel: Fix extended stack operands Fix a crash when trying to extend a value passed as a sign- or zero-extended stack parameter. The cause of the crash was that we were setting the size of the loaded value to 32 bits, and then trying to extend again to 32 bits. This patch addresses the issue by also introducing a G_TRUNC after the load. This will leave the unused bits to their original values set by the caller, while being consistent about the types. For values that are not extended, we just use a smaller load. llvm-svn: 301531	2017-04-27 10:23:30 +00:00
Andrew V. Tischenko	9108ae2b50	2 tests that were lost in rL301390 llvm-svn: 301529	2017-04-27 10:20:35 +00:00
Igor Breger	360d0f23ee	[GlobalISel][X86] handle not symmetric G_COPY Summary: handle not symmetric G_COPY Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32420 llvm-svn: 301523	2017-04-27 08:02:03 +00:00
Sanjay Patel	a0547c3d9f	[DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X) Besides better codegen, the motivation is to be able to canonicalize this pattern in IR (currently we don't) knowing that the backend is prepared for that. This may also allow removing code for special constant cases in DAGCombiner::foldSelectOfConstants() that was added in D30180. Differential Revision: https://reviews.llvm.org/D31944 llvm-svn: 301457	2017-04-26 20:26:46 +00:00
Sanjay Patel	3603e3f22d	[x86] change tests to use sext, not zext; NFC These are intended to exercise D31944, so we need sexts. llvm-svn: 301412	2017-04-26 14:35:54 +00:00
Sanjay Patel	e2ec05a62a	[TargetLowering] fix isConstTrueVal to account for build vector truncation Build vectors have magical truncation powers, so we have things like this: v4i1 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1> v4i16 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1> If we don't truncate the splat node returned by getConstantSplatNode(), then we won't find truth when ZeroOrNegativeOneBooleanContent is the rule. Differential Revision: https://reviews.llvm.org/D32505 llvm-svn: 301408	2017-04-26 14:05:42 +00:00
Ranjeet Singh	acbd4e141f	Fix signed multiplication with overflow fallback. For targets that don't have ISD::MULHS or ISD::SMUL_LOHI for the type and the double width type is illegal, then the two operands are sign extended to twice their size then multiplied to check for overflow. The extended upper halves were mismatched causing an incorrect result. This fixes the mismatch. A test was added for ARM V6-M where the bug was detected. Patch by James Duley. Differential Revision: https://reviews.llvm.org/D31807 llvm-svn: 301404	2017-04-26 13:41:43 +00:00
Simon Pilgrim	e093594074	[X86] Added pointer math zext test case (PR22970) llvm-svn: 301401	2017-04-26 13:03:00 +00:00
Simon Pilgrim	e6a7708448	[X86][SSE] Add test case for repeated vector insertions of the same element (PR15298) llvm-svn: 301396	2017-04-26 12:23:32 +00:00
Sagar Thakur	b458b468a2	[mips] Fix test mips64fpldst.ll with machine verifier enabled Removed micro mips register classes for gp initialization because gp initialization uses pure mips64 instruction. Even when compiling for micro mips, gp initialization can be done with pure mips64 instructions. Reviewed by Simon Dardis Differential: D32286 llvm-svn: 301394	2017-04-26 11:40:12 +00:00
Ayman Musa	d9fb157845	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction. Replace "mov{d\|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 llvm-svn: 301386	2017-04-26 07:08:44 +00:00
Vadzim Dambrouski	d91fb8c367	[MSP430] Fix PR32769: Select8 and Select16 need to have SR in Uses. If the Select pseudo instruction doesn't have SR in its Uses, then CMP instructions are marked as dead and can later be removed by the MachineCSE pass. This leads to incorrect code generation. Differential Revision: https://reviews.llvm.org/D32473 llvm-svn: 301372	2017-04-26 00:33:59 +00:00
Sanjay Patel	227c901dd8	[x86] add more tests for potential change in bool math folding; NFC Also, use AVX2 to show a potential difference for 256-bit vectors. llvm-svn: 301362	2017-04-25 20:56:14 +00:00
Konstantin Zhuravlyov	54ba4312a3	AMDGPU: Fix ValueKind code object metadata for images Differential Revision: https://reviews.llvm.org/D32504 llvm-svn: 301360	2017-04-25 20:38:26 +00:00
Sanjay Patel	7e6ee7c00d	[x86] regenerate checks; NFC llvm-svn: 301359	2017-04-25 20:30:08 +00:00
Simon Pilgrim	58641e4529	[X86][AVX2] Add shuffle test for PR27320 showing current codegen. llvm-svn: 301342	2017-04-25 18:00:04 +00:00
Simon Pilgrim	6f775ba188	[X86][SSE] Add tests for PR14657 showing current codegen. llvm-svn: 301334	2017-04-25 17:22:34 +00:00
Andrew Ng	10ebfe0684	Resubmit r301309: [DebugInfo][X86] Fix handling of DBG_VALUE's in post-RA scheduler. This patch reapplies r301309 with the fix to the MIR test to fix the assertion triggered by r301309. Had trimmed a little bit too much from the MIR! llvm-svn: 301317	2017-04-25 15:39:57 +00:00
Dylan McKay	8f515b1ef7	[AVR] Support the LDWRdPtr instruction with the same Src+Dst register llvm-svn: 301313	2017-04-25 15:09:04 +00:00
Andrew Ng	049ed153af	Revert "[DebugInfo][X86] Fix handling of DBG_VALUE's in post-RA scheduler." This reverts commit r301309 which is causing buildbot assertion failures. llvm-svn: 301312	2017-04-25 14:36:01 +00:00
Andrew Ng	178c369456	[DebugInfo][X86] Fix handling of DBG_VALUE's in post-RA scheduler. This patch fixes a bug with the updating of DBG_VALUE's in BreakAntiDependencies. Previously, it would only attempt to update the first DBG_VALUE following the instruction whose register is being changed, potentially leaving DBG_VALUE's referring to the wrong register. Now the code will update all DBG_VALUE's that immediately follow the instruction. This issue was detected as a result of an optimized codegen difference with "-g" where an X86 byte/word fixup was not performed due to a DBG_VALUE referencing the wrong register. Differential Revision: https://reviews.llvm.org/D31755 llvm-svn: 301309	2017-04-25 13:39:49 +00:00
Simon Pilgrim	7d65b66962	[DAGCombiner] Add vector support for (srl (trunc (srl x, c1)), c2) combine. llvm-svn: 301305	2017-04-25 12:40:45 +00:00
Simon Pilgrim	ab0446332e	[SelectionDAG] Recognise splat vector isKnownToBeAPowerOfTwo one/sign bit shift cases. llvm-svn: 301303	2017-04-25 12:29:07 +00:00
Sanjay Patel	6b01b4f5a6	[ARM, x86] add more vector tests for bool math; NFC I'm proposing a fold for increment-of-sexted-bool in: https://reviews.llvm.org/D31944 ...so we need to know what happens in more cases like these. llvm-svn: 301269	2017-04-24 22:42:34 +00:00
Matt Arsenault	4474652c95	Revert "StructurizeCFG: Directly invert cmp instructions" This reverts commit r300732. This breaks a few tests. I think the problem is related to adding more uses of the condition that don't yet exist at this point. llvm-svn: 301242	2017-04-24 20:25:01 +00:00
Matt Arsenault	0774ea267a	AMDGPU: Select scratch mubuf offsets when pointer is a constant In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register. llvm-svn: 301230	2017-04-24 19:40:59 +00:00
Stanislav Mekhanoshin	bd5394be3d	[AMDGPU] Merge M0 initializations Merges equivalent initializations of M0 and hoists them into a common dominator block. Technically the same code can be used with any register, physical or virtual. Differential Revision: https://reviews.llvm.org/D32279 llvm-svn: 301228	2017-04-24 19:37:54 +00:00
Adrian Prantl	083e6a5b5c	Don't emit CFI instructions at the end of a function When functions are terminated by unreachable instructions, the last instruction might trigger a CFI instruction to be generated. However, emitting it would be illegal since the function (and thus the FDE the CFI is in) has already ended with the previous instruction. Darwin's dwarfdump --verify --eh-frame complains about this and the specification supports this. Relevant bits from the DWARF 5 standard (6.4 Call Frame Information): "[The] address_range [field in an FDE]: The number of bytes of program instructions described by this entry." "Row creation instructions: [...] The new location value is always greater than the current one." The first quotation implies that a CFI cannot describe a target address outside of the enclosing FDE's range. rdar://problem/26244988 Differential Revision: https://reviews.llvm.org/D32246 llvm-svn: 301219	2017-04-24 18:45:59 +00:00
Yaxun Liu	fd23a0c095	CodeGen: Add a hook for getFenceOperandTy Currently the operand type for ATOMIC_FENCE assumes the value type of a pointer in address space 0. This is fine for most targets. However, for the amdgcn target, the size of a pointer in address space 0 depends on the triple environment. For the amdgiz environment, it is 64 bit, but for other environments it is 32 bit. On the other hand, the amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is needed in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215	2017-04-24 18:26:27 +00:00
Matt Arsenault	3e02538a02	AMDGPU: Move trap lowering to DAG Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206	2017-04-24 17:49:13 +00:00
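
The fold named in commit a0547c3d9f above, add (sext i1 X), 1 --> zext (not i1 X), can be checked by hand for both values of a 1-bit X. The following is only a minimal sketch in Python using fixed-width modular arithmetic. It is not LLVM code, and the helper names (sext_i1, zext_i1, not_i1, add_wrapping, fold_holds) are hypothetical and chosen for illustration.

```python
def sext_i1(x: int, bits: int) -> int:
    """Sign-extend a 1-bit value to `bits` bits: 0 -> 0, 1 -> all ones."""
    return (1 << bits) - 1 if x & 1 else 0


def zext_i1(x: int, bits: int) -> int:
    """Zero-extend a 1-bit value to `bits` bits: 0 -> 0, 1 -> 1."""
    return x & 1


def not_i1(x: int) -> int:
    """Bitwise NOT on a 1-bit value."""
    return (x ^ 1) & 1


def add_wrapping(a: int, b: int, bits: int) -> int:
    """Addition modulo 2**bits, modelling a fixed-width integer add."""
    return (a + b) & ((1 << bits) - 1)


def fold_holds(bits: int) -> bool:
    """Check add(sext(X), 1) == zext(not X) for both possible i1 inputs."""
    return all(
        add_wrapping(sext_i1(x, bits), 1, bits) == zext_i1(not_i1(x), bits)
        for x in (0, 1)
    )


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, fold_holds(width))
```

For X = 1 the sign extension is all ones, so adding 1 wraps to 0, which matches zext(not 1). For X = 0 both sides are 1.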

20221 Commits