llvm-project

Commit Graph

Author	SHA1	Message	Date
Yonghong Song	ac2e25026f	bpf: avoid load from read-only sections If users tried to have a structure decl/init code like below struct test_t t = { .member1 = 45 }; It is very likely that the compiler will generate a readonly section to hold the init values for variable t. Later load of t members, e.g., t.member1 will result in a read from readonly section. BPF program cannot handle relocation. This will force users to write: struct test_t t = {}; t.member1 = 45; This is just inconvenient and unintuitive. This patch addresses this issue by implementing BPF PreprocessISelDAG. For any load from a global constant structure or a global array of constant structs, it attempts to translate it into a constant directly. The traversal of the constant struct and other constant data structures is similar to where the assembler emits read-only sections. Four different unit test cases are also added to cover different scenarios. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 305560	2017-06-16 15:41:16 +00:00
Daniel Neilson	3faabbbe85	[Atomics] Rename and change prototype for atomic memcpy intrinsic Summary: Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy that is unordered atomic. Reviewers: reames, sanjoy, efriedma Reviewed By: reames Subscribers: mzolotukhin, anna, llvm-commits, skatkov Differential Revision: https://reviews.llvm.org/D33240 llvm-svn: 305558	2017-06-16 14:43:59 +00:00
Simon Dardis	5852c4c108	Revert "[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP" This reverts commit r305455. This commit was reported as breaking one of the sanitizer buildbots. Reverting until lab.llvm.org comes back online. llvm-svn: 305557	2017-06-16 14:00:33 +00:00
Krzysztof Parzyszek	3a40b34123	[Hexagon] Don't kill live registers when creating mux out of tfr The second part of r305300: when placing the mux at the later location, make sure that it won't use any register that was killed between the two original instructions. Remove any such kills and transfer them to the mux. llvm-svn: 305553	2017-06-16 12:24:03 +00:00
Matthias Braun	a42c537912	RegScavenging: Add scavengeRegisterBackwards() Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64 problems reported in the stage2 build last time, which I cannot reproduce right now. This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 305516	2017-06-15 22:14:55 +00:00
Alexander Timofeev	0f9c84cd93	DivergencyAnalysis patch for review llvm-svn: 305494	2017-06-15 19:33:10 +00:00
Lei Huang	b4733ca8c5	[MachineLICM] Hoist TOC-based address instructions Add condition for MachineLICM to safely hoist instructions that utilize non constant registers that are reserved. On PPC, global variable access is done through the table of contents (TOC) which is always in register X2. The ABI reserves this register in any functions that have calls or access global variables. A call through a function pointer involves saving, changing and restoring this register around the call and thus MachineLICM does not consider it to be invariant. We can however guarantee the register is preserved across the call and thus is invariant. Differential Revision: https://reviews.llvm.org/D33562 llvm-svn: 305490	2017-06-15 18:29:59 +00:00
Arnold Schwaighofer	ae9312c487	ISel: Fix FastISel of swifterror values The code assumed that we process instructions in basic block order. FastISel processes instructions in reverse basic block order. We need to pre-assign virtual registers before selecting otherwise we get def-use relationships wrong. This only affects code with swifterror registers. rdar://32659327 llvm-svn: 305484	2017-06-15 17:34:42 +00:00
Hiroshi Inoue	7a08bb1458	[PowerPC] fix potential verification errors on CFENCE8 This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs. Differential Revision: https://reviews.llvm.org/D34208 llvm-svn: 305479	2017-06-15 16:51:28 +00:00
Simon Pilgrim	4d432b2c6b	[X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors We can use this with v16i16/v32i16 as well. Found during fuzz testing. llvm-svn: 305472	2017-06-15 14:52:30 +00:00
Simon Pilgrim	b98cb3808c	Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). This is causing windows buildbot failures llvm-svn: 305470	2017-06-15 14:39:34 +00:00
Ayman Musa	56912cda71	[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 305465	2017-06-15 13:02:37 +00:00
Diana Picus	02e11010b2	[ARM] GlobalISel: Add support for i32 modulo Add support for modulo for targets that have hardware division and for those that don't. When hardware division is not available, we have to choose the correct libcall to use. This is generally straightforward, except for AEABI. The AEABI variant is trickier than the other libcalls because it returns { quotient, remainder }, instead of just one value like the other libcalls that we've seen so far. Therefore, we need to use custom lowering for it. However, we don't want to have too much special code, so we refactor the target-independent code in the legalizer by adding a helper for replacing an instruction with a libcall. This helper is used by the legalizer itself when dealing with simple calls, and also by the custom ARM legalization for the more complicated AEABI divmod calls. llvm-svn: 305459	2017-06-15 10:53:31 +00:00
Diana Picus	8fd1601d32	[ARM] GlobalISel: Lower only homogeneous struct args Lowering mixed struct args, params and returns used G_INSERT, which is a bit more convoluted to support through the entire pipeline. Since they don't occur that often in practice, it's probably wiser to leave them out until later. Meanwhile, we can lower homogeneous structs using G_MERGE_VALUES, which has good support in the legalizer. These occur e.g. as the return of __aeabi_idivmod, so it's nice to be able to support them. llvm-svn: 305458	2017-06-15 09:42:02 +00:00
Florian Hahn	0a26d2c298	[AArch64] Enable FeatureFuseAES for the generic processor model. Summary: Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back gives a double digit speedup on benchmarks using those instructions on Cortex-A processors. In GCC, this optimization is part of the generic processor model as well. This change should not have a major performance impact on processors that do not optimize AES instruction pairs, although I only had access to Cortex-A processors for benchmarking. Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover Reviewed By: evandro Subscribers: sbaranga, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D33836 llvm-svn: 305457	2017-06-15 09:31:23 +00:00
Zoran Jovanovic	d9299293ad	[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: ADDIU instruction is transformed into 16-bit instruction ADDIUSP ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP Differential Revision: https://reviews.llvm.org/D33887 llvm-svn: 305455	2017-06-15 09:14:33 +00:00
Alexandros Lamprineas	1c15ee2631	Revert "[ARM] Support constant pools in data when generating execute-only code." This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0. I've upset a buildbot which runs the address sanitizer: ERROR: AddressSanitizer: stack-use-after-scope lib/Target/ARM/ARMISelLowering.cpp:2690 That Twine variable is used illegally. llvm-svn: 305390	2017-06-14 15:00:08 +00:00
Simon Dardis	9790e39f45	[mips] Fix multiprecision arithmetic. For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC, get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs. For MIPS, only the DSP ASE has a carry flag, so in the general case it is not useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes. Also improve the generation code in such cases for targets with TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the comparison node rather than using it in selects. Similarly for ISD::SUBE / ISD::SUBC. Address optimization breakage by moving the generation of MIPS specific integer multiply-accumulate nodes to before legalization. This resolves PR32713 and PR33424. Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D33494 llvm-svn: 305389	2017-06-14 14:46:30 +00:00
Alexandros Lamprineas	c582d6e133	[ARM] Support constant pools in data when generating execute-only code. The ARM backend asserts against constant pool lowering when it generates execute-only code in order to prevent the generation of constant pools in the text section. It appears that target independent optimizations might generate DAG nodes that represent constant pools. By lowering such nodes as global addresses we don't violate the semantics of execute-only code and also it is guaranteed that execute-only behaves correct with the position-independent addressing modes that support execute-only code. Differential Revision: https://reviews.llvm.org/D33773 llvm-svn: 305387	2017-06-14 13:22:41 +00:00
Florian Hahn	ffc498dfcc	Align definition of DW_OP_plus with DWARF spec [3/3] Summary: This patch is part of 3 patches that together form a single patch, but must be introduced in stages in order not to break things. The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack. This is done in three stages: • The first patch (LLVM) adds support for DW_OP_plus_uconst. • The second patch (Clang) contains changes all its uses from DW_OP_plus to DW_OP_plus_uconst. • The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with its DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions. Patch by Sander de Smalen. Reviewers: echristo, pcc, aprantl Reviewed By: aprantl Subscribers: fhahn, javed.absar, aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D33894 llvm-svn: 305386	2017-06-14 13:14:38 +00:00
Simon Dardis	941a49b6d6	[mips] Fix machine verifier errors in the long branch pass This patch fixes two systemic machine verifier errors in the long branch pass. The first is the incorrect basic block successors and the second was the incorrect construction of several jump instructions. This partially resolves PR27458 and the associated PR32146. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D33378 llvm-svn: 305382	2017-06-14 12:16:47 +00:00
Nemanja Ivanovic	7855185bbb	Revert r304907 as it is causing some failures that I cannot reproduce. Reverting this until a test case can be provided to aid the investigation. llvm-svn: 305372	2017-06-14 07:05:42 +00:00
Daniel Sanders	4e52366c2a	[globalisel][legalizer] G_LOAD/G_STORE NarrowScalar should not emit G_GEP x, 0. Summary: When legalizing G_LOAD/G_STORE using NarrowScalar, we should avoid emitting %0 = G_CONSTANT ty 0 %1 = G_GEP %x, %0 since it's cheaper to not emit the redundant instructions than it is to fold them away later. Reviewers: qcolombet, t.p.northover, ab, rovka, aditya_nandakumar, kristof.beyls Reviewed By: qcolombet Subscribers: javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D32746 llvm-svn: 305340	2017-06-13 23:42:32 +00:00
Krzysztof Parzyszek	b3a8d20e27	[Hexagon] Generate store-immediate instructions for stack objects Store-immediate instructions have a non-extendable offset. Since the actual offset for a stack object is not known until much later, only generate these stores when the stack size (at the time of instruction selection) is small. llvm-svn: 305305	2017-06-13 17:10:16 +00:00
Krzysztof Parzyszek	c83c267b84	[Hexagon] Generate multiply-high instruction in isel llvm-svn: 305302	2017-06-13 16:21:57 +00:00
Krzysztof Parzyszek	de2ac17b7b	[Hexagon] Don't kill live registers when creating mux out of tfr When a mux instruction is created from a pair of complementary conditional transfers, it can be placed at the location of either the earlier or the later of the transfers. Since it will use the operands of the original transfers, putting it in the earlier location may hoist a kill of a source register that was originally further down. Make sure the kill flag is removed if the register is still used afterwards. llvm-svn: 305300	2017-06-13 16:07:36 +00:00
Simon Dardis	c38d391f56	[MIPS] BuildCondBr should preserve MO flags While simplifying branches in the MachineInstr representation, the routine BuildCondBr must preserve flags on register MachineOperands. In particular, it must preserve the <undef> flag. This fixes a bug that is unlikely to occur in any real scenario, but which bugpoint is likely to introduce. Patch By Nick Johnson! Reviewers: ahatanak, sdardis Differential Revision: https://reviews.llvm.org/D34041 llvm-svn: 305290	2017-06-13 14:11:29 +00:00
Krzysztof Parzyszek	9bd4d91037	[Hexagon] Stop pmpy recognition when shift conversion fails The conversion of shifts from right shifts to left shifts may fail. In such case, the pmpy recognition cannot proceed. llvm-svn: 305289	2017-06-13 13:51:49 +00:00
Oliver Stannard	852fbd2fea	[ARM] Add scheduling classes for VFNM[AS] The VFNM[AS] instructions did not have scheduling information attached, which was causing assertion failures with the Cortex-A57 scheduling model and -fp-contract=fast, because the Cortex-A57 sched model claims to be complete. Differential Revision: https://reviews.llvm.org/D34139 llvm-svn: 305288	2017-06-13 13:04:32 +00:00
Craig Topper	8b8767662c	[AVX-512] Mark masked VPCMP instructions as commutable. llvm-svn: 305276	2017-06-13 07:13:50 +00:00
Craig Topper	e1d8103d8f	[AVX-512] Mark masked version of vpcmpeq as being commutable. llvm-svn: 305275	2017-06-13 07:13:47 +00:00
Craig Topper	42d0339257	[X86] Add masked integer compare instructions to load folding tables. llvm-svn: 305274	2017-06-13 07:13:44 +00:00
Tom Stellard	ee6e6452df	AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33992 llvm-svn: 305232	2017-06-12 20:54:56 +00:00
Tim Northover	7a61316e89	AArch64: don't try to emit an add (shifted reg) for SP. The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr rather than the SP, so we need to use different variants. Situations where this actually comes up are rare enough (see test-case) that I think falling back to DAG is fine. llvm-svn: 305230	2017-06-12 20:49:53 +00:00
Tony Jiang	1a8eec141a	[PowerPC] Match vec_revb builtins to P9 instructions. Power9 has instructions that will reverse the bytes within an element for all sizes (half-word, word, double-word and quad-word). These can be used for the vec_revb builtins in altivec.h. However, we implement these to match vector shuffle nodes as that will cover both the builtins and vector shuffles that occur in the SDAG through other means. Differential Revision: https://reviews.llvm.org/D33690 llvm-svn: 305214	2017-06-12 18:24:36 +00:00
Tony Jiang	30a49d1a3d	[Power9] Added support for the modsw, moduw, modsd, modud hardware instructions. Note that if we need the result of both the divide and the modulo then we compute the modulo based on the result of the divide and not using the new hardware instruction. Commit on behalf of STEFAN PINTILIE. Differential Revision: https://reviews.llvm.org/D33940 llvm-svn: 305210	2017-06-12 17:58:42 +00:00
Sanjay Patel	5e7b7b7503	[x86] regenerate checks with update_llc_test_checks.py The dream of a unified check-line auto-generator for all phases of compilation is dead. The llc script has already diverged to be better at its goal, so having 2 scripts that do almost the same thing is just causing confusion. We can rip out the llc ability in update_test_checks.py next and rename it, so it will be clear that we have one script for llc check auto-generation and another for opt. llvm-svn: 305206	2017-06-12 17:31:36 +00:00
Geoff Berry	06c9dc3d9c	[SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno Summary: This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU target triples. This optimization was being inhibited when -ffast-math wasn't set because sincos in GLibC does not set errno, while sin and cos do. However, this optimization will only run if the attributes on the sin/cos calls include readnone, which is how clang represents the fact that it doesn't care about the errno values set by these functions (via the -fno-math-errno flag). Reviewers: hfinkel, bogner Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond Differential Revision: https://reviews.llvm.org/D32921 llvm-svn: 305204	2017-06-12 17:15:41 +00:00
Matt Arsenault	d9b77848f2	AMDGPU: Teach isLegalAddressingMode about flat offsets Also fix reporting r+r as a valid addressing mode without offsets. llvm-svn: 305203	2017-06-12 17:06:35 +00:00
Sanjay Patel	9d13a18845	[x86] regenerate checks with update_llc_test_checks.py The dream of a unified check-line auto-generator for all phases of compilation is dead. The llc script has already diverged to be better at its goal, so having 2 scripts that do almost the same thing is just causing confusion for newcomers. I plan to fix up more x86 tests in a next commit. We can rip out the llc ability in update_test_checks.py after that. llvm-svn: 305202	2017-06-12 17:05:43 +00:00
Matt Arsenault	db7c6a8731	AMDGPU: Start selecting flat instruction offsets llvm-svn: 305201	2017-06-12 16:53:51 +00:00
Matt Arsenault	fd02314113	AMDGPU: Start adding offset fields to flat instructions llvm-svn: 305194	2017-06-12 15:55:58 +00:00
Than McIntosh	14d61436c0	StackColoring: smarter check for slot overlap Summary: The old check for slot overlap treated 2 slots `S` and `T` as overlapping if there existed a CFG node in which both of the slots could possibly be active. That is overly conservative and caused stack blowups in Rust programs. Instead, check whether there is a single CFG node in which both of the slots are possibly active together. Fixes PR32488. Patch by Ariel Ben-Yehuda <ariel.byd@gmail.com> Reviewers: thanm, nagisa, llvm-commits, efriedma, rnk Reviewed By: thanm Subscribers: dotdash Differential Revision: https://reviews.llvm.org/D31583 llvm-svn: 305193	2017-06-12 14:56:02 +00:00
Craig Topper	69fead95c7	[AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables. llvm-svn: 305180	2017-06-12 04:57:31 +00:00
Sanjay Patel	dcbfbb11d9	[x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load I was looking closer at the x86 test diffs in D33866, and the first change seems like it shouldn't happen in the first place. So this patch will resolve that. Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for any given CPU model, so we should be able to interchange those without affecting perf. But as we can see in some of the diffs here, using vperm2f128 allows load folding, so we should take that opportunity to reduce code size and register pressure. A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 was introduced with AVX1, we should be selecting it in all of the same situations that we would with AVX2. If there's some reason that an AVX1 CPU would not want to use this instruction, that should be fixed up in a later pass. Differential Revision: https://reviews.llvm.org/D33938 llvm-svn: 305171	2017-06-11 21:18:58 +00:00
Amaury Sechet	2127452ff7	[DAGCombine] Make sure we check the ResNo from UADDO before combining Summary: UADDO has 2 results, and one must check the result number before doing any kind of combine. Without that check, the transform is invalid. Reviewers: joerg Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34088 llvm-svn: 305162	2017-06-11 11:36:38 +00:00
Simon Pilgrim	8622f51e94	[X86][SSE] Extended PR32368 to SSE/AVX1/AVX2 llvm-svn: 305154	2017-06-10 21:13:01 +00:00
Simon Pilgrim	46619359db	[X86][AVX512] Added test case for PR32368 llvm-svn: 305153	2017-06-10 20:58:43 +00:00
Wei Ding	7c3e5115a5	AMDGPU : Fix ISA Version Definitions. Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137	2017-06-10 03:53:19 +00:00
Sanjay Patel	dd96270472	[PowerPC] add memcmp test with one constant operand and equality cmp; NFC llvm-svn: 305131	2017-06-09 23:15:14 +00:00
I-Jui (Ray) Sung	21fde385fa	[AArch64] Add fallback in FastISel fp16 conversions Summary: - Fix assertion failures on F16 to/from int types in FastISel by falling back to regular ISel - Add a testcase of various conversion cases with FastISel (-O0) Reviewers: kristof.beyls, jmolloy, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D33734 llvm-svn: 305127	2017-06-09 22:40:50 +00:00
Stanislav Mekhanoshin	1a61ab8172	[AMDGPU] Add intrinsics for alignbit and alignbyte instructions Differential Revision: https://reviews.llvm.org/D34046 llvm-svn: 305098	2017-06-09 19:03:00 +00:00
Simon Pilgrim	3d37b1a277	[X86][SSE] Add support for PACKSS nodes to faux shuffle extraction If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle llvm-svn: 305091	2017-06-09 17:29:52 +00:00
Simon Dardis	212cccb2f4	Reland "[SelectionDAG] Enable target specific vector scalarization of calls and returns" By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown, backends can request that LLVM scalarize vector types for calls and returns. The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'. Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128bit vector argument would be assigned to a single 32 bit / 64 bit integer register. By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI which requires a particular method of scalarizing vectors. Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a)" into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)". This patch enables the MIPS backend to take either form for vector types. The previous version of this patch had a "conditional move or jump depends on uninitialized value". Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D27845 llvm-svn: 305083	2017-06-09 14:37:08 +00:00
David Stuttard	82618baa0f	[AMDGPU] Fix for issue in alloca to vector promotion pass Summary: Alloca promotion pass not dealing with non-canonical input Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical) Also added some test cases in non-canonical form to check that it no longer crashes Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31710 llvm-svn: 305079	2017-06-09 14:16:22 +00:00
Nirav Dave	43a4d8122f	Prevent RemoveDeadNodes from deleting an already deleted node. This protects against assertion errors like PR32659, which occur when a replacement deletes a node after it's been added to the list argument of RemoveDeadNodes. The specific failure from PR32659 does not currently happen, but it is still potentially possible. The underlying cause is that the callers of the changed function build up a list of nodes to delete after having moved their uses, and it is possible that a move of a later node will cause a previously deleted node to be deleted. Reviewers: bkramer, spatel, davide Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33731 llvm-svn: 305070	2017-06-09 12:57:35 +00:00
Oliver Stannard	ad0973557c	[ARM] Add scheduling info for VFMS The scalar VFMS instructions did not have scheduling information attached (but VFMA did), which was causing assertion failures with the Cortex-A57 scheduling model and -fp-contract=fast. Differential Revision: https://reviews.llvm.org/D34040 llvm-svn: 305064	2017-06-09 09:19:09 +00:00
Matthias Braun	1ee25e0c3f	RegAllocPBQP: Do not assign reserved physical register (0) RegAllocPBQP: Since getRawAllocationOrder() may return a collection that includes reserved physical registers, iterate to find an un-reserved physical register. (1) VirtRegMap: Enforce the invariant: "no reserved physical registers" in assignVirt2Phys(). Previously, this was checked only after the fact in VirtRegRewriter::rewrite. (2) MachineVerifier: updated the test per MatzeB's review. (3) +testcase Patch by Nick Johnson<Nicholas.Paul.Johnson@deshawresearch.com>! Differential Revision: https://reviews.llvm.org/D33947 llvm-svn: 305016	2017-06-08 21:30:54 +00:00
Krzysztof Parzyszek	8a7fb0fe51	[Hexagon] Skip mux generation when predicate register is undefined llvm-svn: 305014	2017-06-08 20:56:36 +00:00
Sanjay Patel	5e370850d4	[CGP] don't expand a memcmp with nobuiltin attribute This matches the behavior used in the SDAG when expanding memcmp. For reference, we're intentionally treating the earlier fortified call transforms differently after: https://bugs.llvm.org/show_bug.cgi?id=23093 https://reviews.llvm.org/rL233776 One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers: https://reviews.llvm.org/D19781 https://reviews.llvm.org/D19801 Differential Revision: https://reviews.llvm.org/D34043 llvm-svn: 305007	2017-06-08 19:47:25 +00:00
Matt Arsenault	3c7581bbeb	AMDGPU: Use correct register names in inline assembly Fixes using physical registers in inline asm from clang. llvm-svn: 305004	2017-06-08 19:03:20 +00:00
Guozhi Wei	f31c56df2a	[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64 In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64. This patch fixed PR32442. Differential Revision: https://reviews.llvm.org/D31407 llvm-svn: 305001	2017-06-08 18:27:24 +00:00
Mark Searles	e5c7832311	[AMDGPU] Force qsads instrs to use different dest register than source registers The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes. Differential Revision: https://reviews.llvm.org/D33783 llvm-svn: 304998	2017-06-08 18:21:19 +00:00
Zaara Syeda	79acbbe513	[Power9] Exploit vector integer extend instructions This patch adds build vector patterns to exploit the vector integer extend instructions: vextsb2w - Vector Extend Sign Byte To Word vextsb2d - Vector Extend Sign Byte To Doubleword vextsh2w - Vector Extend Sign Halfword To Word vextsh2d - Vector Extend Sign Halfword To Doubleword vextsw2d - Vector Extend Sign Word To Doubleword Differential Revision: https://reviews.llvm.org/D33510 llvm-svn: 304992	2017-06-08 17:14:36 +00:00
Sanjay Patel	0edcd1d717	[PowerPC] add memcmp test with nobuiltin attr; NFC In SDAG, we don't expand libcalls with a nobuiltin attribute. It's not clear if that's correct from the existing code comment: "Don't do the check if marked as nobuiltin for some reason." ...adding a test here either way to show that there is currently a different behavior implemented in the CGP-based expansion. llvm-svn: 304991	2017-06-08 17:09:18 +00:00
Sanjay Patel	b5a03797df	[x86] remove unused param from tests; NFC llvm-svn: 304989	2017-06-08 17:02:39 +00:00
Sanjay Patel	e7c5041c2a	[CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion The test diff for PowerPC shows we can better optimize if this case is one block. For x86, there would be a substantial difference if CGP expansion was enabled because branches are assumed cheap and SDAG can't optimize across blocks. Instead of this: _cmp_eq8: movq (%rdi), %rax cmpq (%rsi), %rax je LBB23_1 ## BB#2: ## %res_block movl $1, %ecx jmp LBB23_3 LBB23_1: xorl %ecx, %ecx LBB23_3: ## %endblock xorl %eax, %eax testl %ecx, %ecx sete %al retq We get this: cmp_eq8: movq (%rdi), %rcx xorl %eax, %eax cmpq (%rsi), %rcx sete %al retq And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable CGP memcmp() expansion for x86. Ie, we'll bypass the power-of-2 special cases currently optimized in SDAG because we can lower the IR produced here optimally. Differential Revision: https://reviews.llvm.org/D34005 llvm-svn: 304987	2017-06-08 16:53:18 +00:00
Andrew V. Tischenko	8cb1d0931f	Add scheduler classes to integer/float horizontal operations. This patch will close PR32801. Differential Revision: https://reviews.llvm.org/D33203 llvm-svn: 304986	2017-06-08 16:44:13 +00:00
Sanjay Patel	2ab6ee0dc4	[x86] add tests for memcmp expansion; NFC We already had a test to demonstrate PR33325: https://bugs.llvm.org/show_bug.cgi?id=33325 I'm adding tests for general memcmp expansion (see D34005 / D33963) and: https://bugs.llvm.org/show_bug.cgi?id=33329 ...plus non-power-of-2 sizes, so we can see what that looks like currently or if expanded. llvm-svn: 304979	2017-06-08 15:01:29 +00:00
Andrew V. Tischenko	e0531025f8	This patch closes PR28513: an optimization of multiplication by different constants. The initial patch was rejected: I fixed the issue and re-apply it. llvm-svn: 304972	2017-06-08 10:20:13 +00:00
Diana Picus	dbd4589042	[ARM] GlobalISel: Add more tests. NFC Add a couple of tests to increase coverage for the TableGen'erated code, in particular for rules where 2 generic instructions may be combined into a single machine instruction. llvm-svn: 304971	2017-06-08 09:47:30 +00:00
Krzysztof Parzyszek	5ba13825f0	[Hexagon] Generate 'inbounds' GEPs in HexagonCommonGEP llvm-svn: 304937	2017-06-07 20:04:33 +00:00
Sanjay Patel	8ce1e3b759	[CGP] avoid zext/trunc of a memcmp expansion compare This could be viewed as another shortcoming of the DAGCombiner: when both operands of a compare are zexted from the same source type, we should be able to compare the original types. The effect on PowerPC perf is likely unnoticeable, but there's a visible regression for x86 if we feed the suboptimal IR for memcmp expansion to the DAG: _cmp_eq4_zexted_to_i64: movl (%rdi), %ecx movl (%rsi), %edx xorl %eax, %eax cmpq %rdx, %rcx sete %al _cmp_eq4_better: movl (%rdi), %ecx xorl %eax, %eax cmpl (%rsi), %ecx sete %al llvm-svn: 304923	2017-06-07 16:16:45 +00:00
Petar Jovanovic	2f5f8e947a	[mips][dsp] Modify repl.ph to accept signed immediate values Changed immediate type for repl.ph from uimm10 to simm10 as per the specs. Repl.qb still accepts uimm8. Both instructions now mimic the behaviour of GNU as. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D33594 llvm-svn: 304918	2017-06-07 14:48:46 +00:00
Guy Blank	e1888d4388	[X86] Add test to demonstrate inefficient lowering of v48i8 shuffle. llvm-svn: 304915	2017-06-07 14:29:10 +00:00
Tom Stellard	2860a428f7	AMDGPU/GlobalISel: Mark 32-bit G_SELECT as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33949 llvm-svn: 304910	2017-06-07 13:54:51 +00:00
Sanjay Patel	6e8e7cc70e	[x86] avoid flipping sign bits for vector icmp by using known bits If we know that both operands of an unsigned integer vector comparison are non-negative, then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality integer vector compare predicate provided by SSE/AVX). We're intentionally not changing the condition code to signed in order to preserve the existing transforms that use min/max/psubus below here. This should solve PR33276: https://bugs.llvm.org/show_bug.cgi?id=33276 Differential Revision: https://reviews.llvm.org/D33862 llvm-svn: 304909	2017-06-07 13:46:34 +00:00
Nemanja Ivanovic	d8623f0825	[PowerPC] Eliminate integer compare instructions - vol. 5 Adds handling for i64 SETNE comparison (both sign and zero extended). Differential Revision: https://reviews.llvm.org/D33720 llvm-svn: 304907	2017-06-07 13:18:06 +00:00
Petar Jovanovic	3c039d968e	[mips] do not use FastISel when -mxgot is present The clang compiler by default uses FastISel when invoked with -O0, which is also the default. In that case, passing of -mxgot does not get honored, i.e. the code path that is to deal with large got is not taken. Clang produces same output regardless of -mxgot being present or not. This change checks whether -mxgot is passed as an option, and turns off FastISel if it is. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D33593 llvm-svn: 304906	2017-06-07 12:59:53 +00:00
Diana Picus	0b4190a9d6	[ARM] GlobalISel: Purge G_SEQUENCE According to the commit message from r296921, G_MERGE_VALUES and G_INSERT are to be preferred over G_SEQUENCE. Therefore, stop generating G_SEQUENCE in the ARM backend and remove the code dealing with it. This boils down to the code breaking up double values for the soft float calling convention. Use G_MERGE_VALUES + G_UNMERGE_VALUES instead of G_SEQUENCE + G_EXTRACT for it. This maps very nicely to VMOVDRR + VMOVRRD and simplifies the code in the instruction selector. There's one occurrence of G_SEQUENCE left in arm-irtranslator.ll, but that is part of the target-independent code for translating constant structs. Therefore, it is beyond the scope of this commit. llvm-svn: 304902	2017-06-07 12:35:05 +00:00
Nemanja Ivanovic	bb67f847d6	[PowerPC] Eliminate integer compare instructions - vol. 3 Adds handling for i32 SETNE comparison (both sign and zero extended). Differential Revision: https://reviews.llvm.org/D33718 llvm-svn: 304901	2017-06-07 12:23:41 +00:00
Diana Picus	0196427b03	[ARM] GlobalISel: Support G_XOR Same as the other binary operators: - legalize to 32 bits - map to GPRs - select to EORrr via TableGen'erated code llvm-svn: 304898	2017-06-07 11:57:30 +00:00
Simon Dardis	7c96ba1920	Revert "[mips] Fix test mips64fpldst.ll with machine verifier enabled" This reverts commit r301394. It broke some internal buildbots, reverting while the issue is being investigated. llvm-svn: 304896	2017-06-07 11:21:37 +00:00
Simon Pilgrim	58f5be2771	[X86][SSE] Fix an issue with PEXTRW/PEXTRB indices during shuffle combining We were checking that the index was in range of the destination vector type, not the (larger) source vector type llvm-svn: 304894	2017-06-07 10:30:35 +00:00
Diana Picus	eeb0aad8e4	[ARM] GlobalISel: Support G_OR Same as the other binary operators: - legalize to 32 bits - map to GPRs - select ORRrr thanks to TableGen'erated code llvm-svn: 304890	2017-06-07 10:14:23 +00:00
Diana Picus	8445858a93	[ARM] GlobalISel: Support G_AND This is identical to the support for the other binary operators: - widen to s32 - map into GPR - select ANDrr (via TableGen'erated code) llvm-svn: 304885	2017-06-07 09:17:41 +00:00
Sanjay Patel	f57015d4cc	[CGP / PowerPC] use direct compares if there's only one load per block in memcmp() expansion I'd like to enable CGP memcmp expansion for x86, but the output from CGP would regress the special cases (memcmp(x,y,N) != 0 for N=1,2,4,8,16,32 bytes) that we already handle. I'm not sure if we'll actually be able to produce the optimal code given the block-at-a-time limitation in the DAG. We might have to just avoid those special-cases here in CGP. But regardless of that, I think this is a win for the more general cases. http://rise4fun.com/Alive/cbQ Differential Revision: https://reviews.llvm.org/D33963 llvm-svn: 304849	2017-06-07 00:17:08 +00:00
Sanjay Patel	7a52296f1f	[PowerPC] auto-generate full checks and increase test coverage 3 of the tests were testing exactly the same thing: memcmp(x, y, 16) != 0. I changed that to test 4, 7, and 16 bytes, so we can see how those differ. llvm-svn: 304838	2017-06-06 22:06:07 +00:00
Evgeny Stupachenko	ff9d80cc7a	Added tests for X86InterleavedStore. Reviewers: RKSimon, DavidKreitzer Differential Revision: https://reviews.llvm.org/D33684 Patch by: Aleen Farhana <Farhana.aleen@gmail.com> llvm-svn: 304834	2017-06-06 21:08:00 +00:00
Matthias Braun	7e23fc05c1	llc: Add ability to parse mir from stdin - Add -x <language> option to switch between IR and MIR inputs. - Change MIR parser to read from stdin when filename is '-'. - Add a simple mir roundtrip test. llvm-svn: 304825	2017-06-06 20:06:57 +00:00
Evgeny Stupachenko	3b88291581	Fix PR23384 (part 3 of 3) Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824	2017-06-06 20:04:16 +00:00
Matthias Braun	8b5f9e4438	MIRPrinter: Avoid assert() when printing empty INLINEASM strings. CodeGen uses MO_ExternalSymbol to represent the inline assembly strings. Empty strings for symbol names appear to be invalid. For now just special case the output code to avoid hitting an `assert()` in `printLLVMNameWithoutPrefix()`. This fixes https://llvm.org/PR33317 llvm-svn: 304815	2017-06-06 19:00:58 +00:00
Petar Jovanovic	64fb7a8ebd	[mips] Add madd4 subtarget feature Addition of a feature and a predicate used to control generation of madd.fmt and similar instructions. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D33400 llvm-svn: 304801	2017-06-06 15:33:01 +00:00
Simon Pilgrim	f7113fd270	[X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it non-temporal (PR32744) Extension to D33728 llvm-svn: 304798	2017-06-06 14:18:39 +00:00
Tom Stellard	8cd60a5067	AMDGPU/GlobalISel: Mark 32-bit G_ICMP as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33890 llvm-svn: 304797	2017-06-06 14:16:50 +00:00
Vivek Pandya	56d87ef5d7	[Improve CodeGen Testing] This patch re-enables MIRPrinter to print fields which have a value equal to their default. If the -simplify-mir option is passed then MIRPrinter will not print such fields. This change also required some lit test cases in CodeGen directory to be changed. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D32304 llvm-svn: 304779	2017-06-06 08:16:19 +00:00
Chandler Carruth	11134628e2	[x86] Stop this test from dirtying the source tree when run. The output isn't used anyways. llvm-svn: 304766	2017-06-06 03:24:22 +00:00
Chandler Carruth	d7120758ba	[x86] Add the test for folding stack spills into pextrw. This is a negative test as pextrw doesn't write to all 32-bits of the spilled GPR. This fold ended up happening when D32684 was landed and covers the regression that motivated reverting it in r304762. llvm-svn: 304763	2017-06-06 02:16:01 +00:00
Chandler Carruth	41ed4034dd	[x86] Revert the X86FoldTablesEmitter due to more miscompiles. In testing, we've found yet another miscompile caused by the new tables. And this one is even less clear how to fix (we could teach it to fold a 16-bit load instead of the 32-bit load it wants, or block folding entirely). Also, the approach to excluding instructions seems increasingly to not scale well. I have left a more detailed analysis on the review log for the original patch (https://reviews.llvm.org/D32684) along with suggested path forward. I will land an additional test case that I wrote which covers the code that was miscompiling (folding into the output of `pextrw`) in a subsequent commit to keep this a pure revert. For each commit reverted here, I've restricted the revert to the non-test code touching the x86 fold table emission until the last commit where I did revert the test updates. This means the new test cases added for `insertps` and `xchg` remain untouched (and continue to pass). Reverted commits: r304540: [X86] Don't fold into memory operands into insertps in the ... r304347: [TableGen] Adapt more places to getValueAsString now ... r304163: [X86] Don't fold away the memory operand of an xchg. r304123: Don't capture a temporary std::string in a StringRef. r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..." Original commit was in r304088, and after a string of fixes was reverted previously in r304121 to fix build bots, and then re-landed in r304122. llvm-svn: 304762	2017-06-06 02:15:31 +00:00
Matthias Braun	7bda195812	CodeGen: Refactor MIR parsing When parsing .mir files immediately construct the MachineFunctions and put them into MachineModuleInfo. This allows us to get rid of the delayed construction (and delayed error reporting) through the MachineFunctionInitialzier interface. Differential Revision: https://reviews.llvm.org/D33809 llvm-svn: 304758	2017-06-06 00:44:35 +00:00
Matthias Braun	c7c06f158c	CodeGen/LLVMTargetMachine: Refactor ISel pass construction; NFCI - Move ISel (and pre-isel) pass construction into TargetPassConfig - Extract AsmPrinter construction into a helper function Putting the ISel code into TargetPassConfig seems a lot more natural and both changes together make it easier to build custom pipelines involving .mir in an upcoming commit. This moves MachineModuleInfo to an earlier place in the pass pipeline which shouldn't have any effect. llvm-svn: 304754	2017-06-06 00:26:13 +00:00
Sanjay Patel	d47e64398e	[x86] fix over-specific triple; NFC There's nothing darwin-specific in these tests, and using that setting causes extra phantom diffs when the auto-generated check lines are regenerated today. llvm-svn: 304753	2017-06-06 00:18:11 +00:00
Quentin Colombet	c668935d85	[InlineSpiller] Don't spill fully undef values Although it is not wrong to spill undef values, it is useless and harms both code size and runtime. Before spilling a value, check that its content actually matters. http://www.llvm.org/PR33311 llvm-svn: 304752	2017-06-05 23:51:27 +00:00
Matt Arsenault	9e5b5053d1	RenameIndependentSubregs: Fix handling of undef tied operands If a tied source operand was undef, it would be replaced but not update the other tied operand, which would end up using different virtual registers. llvm-svn: 304747	2017-06-05 22:58:57 +00:00
Volkan Keles	ebe6bb9006	[GlobalISel] IRTranslator: Add MachineMemOperand to target memory intrinsics Reviewers: qcolombet, ab, t.p.northover, aditya_nandakumar, dsanders Reviewed By: qcolombet Subscribers: rovka, kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D33724 llvm-svn: 304743	2017-06-05 22:17:17 +00:00
Davide Italiano	fb4d5c095b	[SelectionDAG] Update the dominator after splitting critical edges. Running `llc -verify-dom-info` on the attached testcase results in a crash in the verifier, due to a stale dominator tree. i.e. DominatorTree is not up to date! Computed: =============================-------------------------------- Inorder Dominator Tree: [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,7} [2] %lor.lhs.false.i61.i.i.i {1,2} [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,6} [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5} Actual: =============================-------------------------------- Inorder Dominator Tree: [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,9} [2] %lor.lhs.false.i61.i.i.i {1,2} [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,8} [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5} [3] %safe_mod_func_int8_t_s_s.exit.i.i.i.lor.lhs.false.i61.i.i.i_crit_edge {6,7} This is because in `SelectionDAGIsel` we split critical edges without updating the corresponding dominator for the function (and we claim in `MachineFunctionPass::getAnalysisUsage()` that the domtree is preserved). We could either stop preserving the domtree in `getAnalysisUsage` or tell `splitCriticalEdge()` to update it. As the second option is easy to implement, that's the one I chose. Differential Revision: https://reviews.llvm.org/D33800 llvm-svn: 304742	2017-06-05 22:16:41 +00:00
Simon Pilgrim	807b708d13	[X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided (PR32743) Missed SSE41 non-temporal load case in previous commit Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304722	2017-06-05 16:45:32 +00:00
Simon Pilgrim	b2ef948628	[X86][AVX1] Split 256-bit vector non-temporal loads to keep it non-temporal (PR32744) Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304718	2017-06-05 16:02:01 +00:00
Simon Pilgrim	a25bf0b6b9	[X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743) Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304717	2017-06-05 15:43:03 +00:00
Diana Picus	0091cc3528	[ARM] GlobalISel: Constrain callee register on indirect calls When lowering calls, we generate instructions with machine opcodes rather than generic ones. Therefore, we need to constrain the register classes of the operands. Also enable the machine verifier on the arm-irtranslator.ll test, since that would've caught this issue. Fixes (part of) PR32146. llvm-svn: 304712	2017-06-05 12:54:53 +00:00
Javed Absar	b16d146838	Add support for #pragma clang section This patch provides a means to specify section-names for global variables, functions and static variables, using #pragma directives. This feature is only defined to work sensibly for ELF targets. One can specify section names as: #pragma clang section bss="myBSS" data="myData" rodata="myRodata" text="myText" One can "unspecify" a section name with empty string e.g. #pragma clang section bss="" data="" text="" rodata="" Reviewers: Roger Ferrer, Jonathan Roelofs, Reid Kleckner Differential Revision: https://reviews.llvm.org/D33413 llvm-svn: 304704	2017-06-05 10:09:13 +00:00
Stanislav Mekhanoshin	286a4225b9	[AMDGPU] Fix SIFoldOperands crash with clamp Fixes bug #33302. Pass did not account that Src1 of max instruction can be an immediate. Differential Revision: https://reviews.llvm.org/D33884 llvm-svn: 304696	2017-06-05 01:03:04 +00:00
Simon Pilgrim	46dd55f1e1	[X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 1> Step 2: unpcklps X, Y ==> <3, 2, 1, 0> The issue is because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc. Instead, this patch unpacks progressively larger sequential vector elements together: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 1, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 2> Step 2: unpcklpd X, Y ==> <3, 2, 1, 0> This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree. Differential Revision: https://reviews.llvm.org/D33864 llvm-svn: 304688	2017-06-04 20:12:04 +00:00
Igor Breger	3bfba2c569	[GlobalISel][X86] merge irtranslator-call test files. NFC llvm-svn: 304683	2017-06-04 12:41:10 +00:00
Stanislav Mekhanoshin	0330660403	[AMDGPU] Untangle SDWA pass from SIShrinkInstructions Remove dependency of SDWA pass on SIShrinkInstructions. The goal is to move SDWA even higher in the stack to avoid second run of MachineLICM, MachineCSE and SIFoldOperands. Also added handling to preserve original src modifiers. Differential Revision: https://reviews.llvm.org/D33860 llvm-svn: 304665	2017-06-03 17:39:47 +00:00
Amaury Sechet	39fbe3bb60	Regenerate expectations for trunc-to-bool.ll . NFC llvm-svn: 304660	2017-06-03 11:35:40 +00:00
Simon Pilgrim	f93debb40c	[X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combining Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. llvm-svn: 304659	2017-06-03 11:12:57 +00:00
Tom Stellard	e042412ef1	AMDGPU/GlobalISel: Mark 1-bit integer constants as legal Summary: These are mostly legal, but will probably need special lowering for some cases. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33791 llvm-svn: 304628	2017-06-03 01:13:33 +00:00
Stanislav Mekhanoshin	f154b4f52c	[AMDGPU] Preserve operand order in SIFoldOperands SIFoldOperands can commute operands even if no folding was done. This change is to preserve IR if no folding was done. Differential Revision: https://reviews.llvm.org/D33802 llvm-svn: 304625	2017-06-03 00:41:52 +00:00
Quentin Colombet	1ee8616ca0	[SystemZ] Simplify test case. NFC Remove useless successors information. llvm-svn: 304615	2017-06-02 23:40:58 +00:00
Sanjay Patel	56641ac497	[x86] fix over-specific triple; NFC There's nothing darwin-specific in these tests, and using that setting causes extra phantom diffs when the auto-generated check lines are regenerated today. llvm-svn: 304614	2017-06-02 23:40:46 +00:00
Philip Reames	80135bdf9e	Canonicalize a test via utils/update_test_checks.py Turns out I might not have further changes to make here, but with the way I'd written the tests, even I couldn't tell that. :( llvm-svn: 304613	2017-06-02 23:27:36 +00:00
Sanjay Patel	4cad0f0477	[x86] add tests for unsigned vector compares with known signbits; NFC (PR33276) llvm-svn: 304612	2017-06-02 23:24:28 +00:00
Matthias Braun	0021d46a1c	RegisterScavenging: Add ScavengerTest pass This pass allows to run the register scavenging independently of PrologEpilogInserter to allow targeted testing. Also adds some basic register scavenging tests. llvm-svn: 304606	2017-06-02 23:01:42 +00:00
Quentin Colombet	2145cf3f07	[RABasic] Properly update the LiveRegMatrix when LR splitting occur Prior to this patch we used to not touch the LiveRegMatrix while doing live-range splitting. In other words, when live-range splitting was occurring, the LiveRegMatrix was not reflecting the changes. This is generally fine because it means the query to the LiveRegMatrix will be conservatively correct. However, when decisions are taken based on what is going to happen on the interferences (e.g., when we spill a register and know that it is going to be available for another one), we might hit an assertion that the color used for the assignment is still in use. This patch makes sure the changes on the live-ranges are properly reflected in the LiveRegMatrix, so the assertions don't break. An alternative could have been to remove the assertion, but it would make the invariants of the code and the general reasoning more complicated in my opinion. http://llvm.org/PR33057 llvm-svn: 304603	2017-06-02 22:46:31 +00:00
Quentin Colombet	ebbaed6d3c	[RABasic] Properly initialize the pass Use the initializeXXX method to initialize the RABasic pass in the pipeline. This enables us to take advantage of the .mir infrastructure. llvm-svn: 304602	2017-06-02 22:46:26 +00:00
Ahmed Bougacha	018a68f9e4	[X86] Correctly broadcast NaN-like integers as float on AVX. Since r288804, we try to lower build_vectors on AVX using broadcasts of float/double. However, when we broadcast integer values that happen to have a NaN float bitpattern, we lose the NaN payload, thereby changing the integer value being broadcast. This is caused by ConstantFP::get, to which we pass the splat i32 as a float (by bitcasting it using bitsToFloat). ConstantFP::get takes a double parameter, so we end up lossily converting a single-precision NaN to double-precision. Instead, avoid any kinds of conversions by directly building an APFloat from the splatted APInt. Note that this also fixes another piece of code (broadcast of subvectors), that currently isn't susceptible to the same problem. Also note that we could really just use APInt and ConstantInt throughout: the constant pool type doesn't matter much. Still, for consistency, use the appropriate type. llvm-svn: 304590	2017-06-02 20:02:59 +00:00
Amaury Sechet	04ffaca604	Regenerate expectation for wide-fma-contraction.ll . NFC llvm-svn: 304586	2017-06-02 19:15:04 +00:00
Konstantin Zhuravlyov	be6c0ca5e2	AMDGPU: Make auto waitcnt before barrier a feature Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571	2017-06-02 17:40:26 +00:00
Philip Reames	94cc4a29ed	Add placeholder for more extensive verification of pseudo ops This initial patch doesn't actually do much useful. It's just to show where the new code goes. Once this is in, I'll extend the verification logic to check more useful properties. For those curious, the more complicated version of this patch already found one very suspicious thing. Differential Revision: https://reviews.llvm.org/D33819 llvm-svn: 304564	2017-06-02 16:36:37 +00:00
Amaury Sechet	5746e7356a	Update select.ll expected results. NFC llvm-svn: 304557	2017-06-02 16:07:43 +00:00
Alexander Timofeev	3f70b619a9	AMDGPUAnnotateUniformValue should always treat volatile loads as divergent llvm-svn: 304554	2017-06-02 15:25:52 +00:00
Mark Searles	70359ac60d	[AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests. -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551	2017-06-02 14:19:25 +00:00
Zoran Jovanovic	2aae0649a1	[mips][microMIPS] Extending size reduction pass with LBU16, LHU16, SB16 and SH16 Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: LBU instruction is transformed into 16-bit instruction LBU16 LHU instruction is transformed into 16-bit instruction LHU16 SB instruction is transformed into 16-bit instruction SB16 SH instruction is transformed into 16-bit instruction SH16 Differential Revision: https://reviews.llvm.org/D33091 llvm-svn: 304550	2017-06-02 14:14:21 +00:00
Krzysztof Parzyszek	066e8b56a0	[Hexagon] Return 0 from getDotNewPredOp when .new opcode does not exist This allows using this function to test if an instruction can be converted to a .new form. llvm-svn: 304549	2017-06-02 14:07:06 +00:00
Amaury Sechet	2e1fed9ef8	Regenerate sse3.ll test results. NFC llvm-svn: 304548	2017-06-02 14:02:49 +00:00
Amaury Sechet	8e370f14cb	Regenerate and-sink.ll test results. NFC llvm-svn: 304547	2017-06-02 14:02:46 +00:00
Amaury Sechet	f0c066f140	Regenerate shrink-compare.ll test results. NFC llvm-svn: 304546	2017-06-02 14:02:43 +00:00
Benjamin Kramer	19092d783c	[X86] Don't fold into memory operands into insertps in the generated folding tables. insertps behaves differently, the register form selects from an input register based on the immediate operand while the memory form just loads the given address. We have custom code to change the immediate in cases where that's legal, so completely remove insertps from the generated tables. llvm-svn: 304540	2017-06-02 10:50:22 +00:00
John Brawn	6671616cde	[GlobalMerge] Don't merge globals that may be preempted When a global may be preempted it needs to be accessed directly, instead of indirectly through a MergedGlobals symbol, for the preemption to work. This fixes PR33136. Differential Revision: https://reviews.llvm.org/D33727 llvm-svn: 304537	2017-06-02 10:24:14 +00:00
Diana Picus	e7aa90987d	[ARM] GlobalISel: Support struct params/returns Very very similar to the support for arrays. As with arrays, we don't support returning large structs that wouldn't fit in R0-R3. Most front-ends would likely use sret arguments for that anyway. The only significant difference is that when splitting a struct, we need to make sure we set the correct original alignment on each member, otherwise it may get split incorrectly between stack and registers. llvm-svn: 304536	2017-06-02 10:16:48 +00:00
Javed Absar	4ae7e81233	[ARM] Cortex-A57 scheduling model for ARM backend (AArch32) This patch implements the Cortex-A57 scheduling model. The main code is in ARMScheduleA57.td, ARMScheduleA57WriteRes.td. Small changes in cpp,.h files to support required scheduling predicates. Scheduling model implemented according to: http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf. Patch by : Andrew Zhogin (submitted on his behalf, as requested). Reviewed by: Renato Golin, Diana Picus, Javed Absar, Kristof Beyls. Differential Revision: https://reviews.llvm.org/D28152 llvm-svn: 304530	2017-06-02 08:53:19 +00:00
Amaury Sechet	9a6fdc0bd5	Specify triple for xor-icmp.ll . llvm-svn: 304526	2017-06-02 07:45:22 +00:00
Amaury Sechet	968dda7f81	Regenerate expectations for xor-icmp.ll . NFC llvm-svn: 304525	2017-06-02 07:25:02 +00:00
Yaxun Liu	a618acf923	[AMDGPU] Fix kernel arg segment size for amdgizcl Differential Revision: https://reviews.llvm.org/D33307 llvm-svn: 304482	2017-06-01 21:31:53 +00:00
Nirav Dave	4952871630	[SDAG] Fix CombineTo ordering in visitZERO_EXTEND and visitSIGN_EXTEND Reorder CombineTo Calls to prevent references to stale/deleted SDNodes which caused undue assertions. Reviewers: dbabokin Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D31625 llvm-svn: 304460	2017-06-01 19:33:50 +00:00
Krzysztof Parzyszek	3cf16576d5	[Hexagon] Fix dependence check in the packetizer An incorrect check in the packetizer lead to an attempt to convert an unconditional branch to a .new (conditional) form. llvm-svn: 304442	2017-06-01 18:02:40 +00:00
Krzysztof Parzyszek	51fd5405d5	[Hexagon] Handle long-running simplification loop in idiom recognition The initial assumption was that the simplification would converge to a fixed point relatively quickly. Turns out that there are legitimate situations where the complexity of the code causes it to take a large number of iterations. Two main changes: - Instead of aborting upon hitting the limit, simply return nullptr. - Reduce the limit to 10,000 from 100,000. llvm-svn: 304441	2017-06-01 18:00:47 +00:00
Amaury Sechet	94eb633dd2	Fix addcarry-crash.ll llvm-svn: 304415	2017-06-01 14:24:31 +00:00
Amaury Sechet	b761959993	Add regression test for the addcarry crash. See D33770 for context. llvm-svn: 304414	2017-06-01 14:09:56 +00:00

20569 Commits