llvm-project

Commit Graph

Author	SHA1	Message	Date
Anna Welker	1e413a8c36	[ARM][MVE] Add support for incrementing gathers Enables the MVEGatherScatterLowering pass to build pre-incrementing gathers. Incrementing writeback gathers are built when it is possible to replace the loop increment instruction. Differential Revision: https://reviews.llvm.org/D76786	2020-05-07 12:33:50 +01:00
Kerry McLaughlin	3bcd3dd473	[CodeGen][SVE] Lowering of shift operations with scalable types Summary: Adds AArch64ISD nodes for: - SHL_PRED (logical shift left) - SHR_PRED (logical shift right) - SRA_PRED (arithmetic shift right) Existing patterns for unpredicated left shift by immediate have also been moved into the appropriate multiclasses in SVEInstrFormats.td. Reviewers: sdesmalen, efriedma, ctetreau, huihuiz, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79478	2020-05-07 11:43:49 +01:00
Jay Foad	17e13da29d	[AMDGPU] Re-auto-generate test checks	2020-05-07 11:08:11 +01:00
Carl Ritson	e3ffe7269b	[AMDGPU] Cluster shader exports Summary: Add DAG scheduling mutation to cluster export instructions. This avoids unnecessary waitcnts being added when computation ends up interspersed with exports. Reviewers: foad, arsenm, rampitec, nhaehnle Reviewed By: foad Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79481	2020-05-07 19:05:38 +09:00
Carl Ritson	0d4d86cbd1	[AMDGPU] Precommit test for D79481. NFC Test shows unnecessary s_waitcnt between shader exports.	2020-05-07 19:01:51 +09:00
Kerry McLaughlin	a31f4c52bf	[SVE][CodeGen] Fix legalisation for scalable types Summary: This patch handles illegal scalable types when lowering IR operations, addressing several places where the value of isScalableVector() is ignored. For types such as <vscale x 8 x i32>, this means splitting the operations. In this example, we would split it into two operations of type <vscale x 4 x i32> for the low and high halves. In cases such as <vscale x 2 x i32>, the elements in the vector will be promoted. In this case they will be promoted to i64 (with a vector of type <vscale x 2 x i64>) Reviewers: sdesmalen, efriedma, huntergr Reviewed By: efriedma Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78812	2020-05-07 10:01:31 +01:00
Sam Parker	3c9b6dfa54	[NFC][ARM] Add tail predication test	2020-05-07 08:19:32 +01:00
Craig Topper	350645594e	[X86] Enable combinePMULH to match multiplies with elements larger than i32. We're truncating so the extra bits will be discarded.	2020-05-06 23:13:59 -07:00
Craig Topper	1796cfd837	[X86] Add test cases for missed opportunity to match pmulh from multiplies with elements larger than i32. We currently look for vXi32 sext/zext to match PMULH, but it doesn't matter how many extra bits above i32 there are.	2020-05-06 23:13:58 -07:00
Eli Friedman	2c8546107a	[AArch64][SVE] Implement lowering for SIGN_EXTEND etc. of SVE predicates. Now using patterns, since there's a single-instruction lowering. (We could convert to VSELECT and pattern-match that, but there doesn't seem to be much point.) I think this might be the first instruction to use nested multiclasses this way? It seems like a good way to reduce duplication between different integer widths. Let me know if it seems like an improvement. Also, while I'm here, fix the return type of SETCC so we don't try to merge a sign-extend with a SETCC. Differential Revision: https://reviews.llvm.org/D79193	2020-05-06 17:56:32 -07:00
Ulrich Weigand	947f78ac27	[SystemZ] Fix/optimize vec_load_len and related intrinsics When using vec_load/store_len_r with an immediate length operand of 16 or larger, LLVM will currently emit an VLRL/VSTRL instruction with that immediate. This creates a valid encoding (which should be supported by the assembler), but always traps at runtime. This patch fixes this by not creating VLRL/VSTRL in those cases. This would result in loading the length into a register and calling VLRLR/VSTRLR instead. However, these operations with a length of 15 or larger are in fact simply equivalent to a full vector load or store. And in fact the same holds true for vec_load/store_len as well. Therefore, add a DAGCombine rule to replace those operations with plain vector loads or stores if the length is known at compile time and equal or larger to 15.	2020-05-06 21:15:58 +02:00
LemonBoy	7fa5abd343	[SelectionDAG] Fix assertion failure with big shift amounts Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for integer type it's applied to. Fixes the regression introduced by D79096 Differential Revision: https://reviews.llvm.org/D79405	2020-05-06 11:58:37 -07:00
Sanjay Patel	1b678ee8a6	[x86] add test of shift+cast+concat for PR45794; NFC Depends on D79360 / rG2f1fe1864d25 for the transform.	2020-05-06 14:18:04 -04:00
Michael Liao	4ee5a04187	[amdgpu] Fix check of VCC. Summary: - Need to include checking on the new 16-bit subregs. Reviewers: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79498	2020-05-06 14:16:37 -04:00
Simon Pilgrim	fe6f5ba0bf	[X86][AVX] Add PR45808 test case for badly promoted comparison mask arithmetic	2020-05-06 19:09:57 +01:00
Simon Pilgrim	8817334ce3	[X86] getShuffleScalarElt - add CONCAT_VECTORS/INSERT_VECTOR_ELT support. This helped fix some i686 vXi64 broadcast folds that were becoming v2Xi32 broadcasts because we didn't match the broadcast until after SimplifyDemandedBits worked out we only used the bottom 32-bits in PMUL(U)DQ and type legalization had split the original i64 load. A couple of regressions occurred which required some fixups - adding concat_vectors(broadcast_load,broadcast_load) splat support and recognising (unnecessary) unary shuffles of already broadcasted vectors. This came about as part of the work investigating vector load combining from shuffles for PR42550.	2020-05-06 18:13:33 +01:00
Stanislav Mekhanoshin	54d6dfe996	[AMDGPU] Drop 16 bit subreg suffixes on print We do not want to break asm syntax. These suffixes are quite useful for debugging, so add an option to print them. Right now it is NFC. Differential Revision: https://reviews.llvm.org/D79435	2020-05-06 08:14:10 -07:00
Luís Marques	a3e6e624c7	[RISCV][NFC] Add more constant materialization tests This patch adds more constant materialization tests, focusing on cases where we could improve our materialization instruction sequences (particularly for RV64). Various of these cases will be improved upon in follow-up patches. Differential Revision: https://reviews.llvm.org/D79453	2020-05-06 16:06:16 +01:00
David Green	f5f83cf4df	[ARM] VMOVhr load -> vldr Much like the similar combine added recently for VMOVrh load, this adds a fold for VMOVhr load turning it into a vldr.f16 as opposed to a vldrh and vmov.f16. Differential Revision: https://reviews.llvm.org/D78714	2020-05-06 15:45:56 +01:00
Ram Nalamothu	f7060f4f88	For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer Since SRSRC has alignment requirements, first find non GIT pointer clobbered registers for SRSRC and then if those registers clobber preloaded Scratch Wave Offset register, copy the Scratch Wave Offset register to a free SGPR.	2020-05-06 10:31:15 -04:00
Sanjay Patel	2f1fe1864d	[DAGCombiner] sink target-supported FP<->int cast op after concat vectors Try to combine N short vector cast ops into 1 wide vector cast op: concat (cast X), (cast Y)... -> cast (concat X, Y...) This is part of solving PR45794: https://bugs.llvm.org/show_bug.cgi?id=45794 As noted in the code comment, this is uglier than I was hoping because the opcode determines whether we pass the source or destination type to isOperationLegalOrCustom(). Also IIUC, there's no way to validate what the other (dest or src) type is. Without the extra legality check on that, there's an ARM regression test in: test/CodeGen/ARM/isel-v8i32-crash.ll ...that will crash trying to lower an unsupported v8f32 to v8i16. Differential Revision: https://reviews.llvm.org/D79360	2020-05-06 10:25:58 -04:00
Matt Arsenault	074c371a48	AMDGPU: Insert kernarg code after allocas This produces more normal looking IR by keeping all the allocas clustered at the start of the block.	2020-05-06 10:19:56 -04:00
David Green	d05f8a38c5	[ARM] VMOVrh of VMOVhr A VMOVhr of a VMOVrh can be simply folded to the original HPR value. Differential Revision: https://reviews.llvm.org/D78710	2020-05-06 15:10:01 +01:00
David Green	a349949f8a	[ARM] Extract from a VDUP If we get into the situation where we are extracting from a VDUP, the extracted value is just the origin, so long as the types match or we can bitcast between the two. Differential Revision: https://reviews.llvm.org/D78708	2020-05-06 14:51:25 +01:00
David Green	ed7db68c35	[ARM] Convert a bitcast VDUP to a VDUP The idea, under MVE, is to introduce more bitcasts around VDUP's in an attempt to get the type correct across basic block boundaries. In order to do that without other regressions we need a few fixups, of which this is the first. If the code is a bitcast of a VDUP, we can convert that straight into a VDUP of the new type, so long as they have the same size. Differential Revision: https://reviews.llvm.org/D78706	2020-05-06 14:14:21 +01:00
Dmitry Preobrazhensky	5998baccb9	[AMDGPU][MC][GFX9+] Enabled 21-bit signed offsets for SMEM instructions Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D79288	2020-05-06 14:13:10 +03:00
Konstantin Schwarz	e82b0e9a8e	[GlobalISel][InlineAsm] Add support for basic output operand constraints Reviewers: arsenm, dsanders, aemerson, volkan, t.p.northover, paquette Reviewed By: arsenm Subscribers: gargaroff, wdng, rovka, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78318	2020-05-06 10:06:13 +02:00
Craig Topper	0fac1c1912	[X86] Allow Yz inline assembly constraint to choose ymm0 or zmm0 when avx/avx512 are enabled and type is 256 or 512 bits gcc supports selecting ymm0/zmm0 for the Yz constraint when used with 256 or 512 bit vector types. Fixes PR45806 Differential Revision: https://reviews.llvm.org/D79448	2020-05-05 21:12:30 -07:00
Jessica Paquette	b1b86d1c28	[AArch64][GlobalISel] Fold shifts into G_ICMP Since G_ICMP can be selected to a SUBS, we can fold shifts into such compares. E.g. ``` cmp w1, w0, lsl #3 cmp w1, w0, lsr #3 cmp w1, w0, asr #3 ``` This is done the same way as for adds and subtracts, using `selectShiftedRegister`. This gives some minor code size savings on CTMark. https://reviews.llvm.org/D79365	2020-05-05 18:35:39 -07:00
Craig Topper	a4286fc952	[X86] Fix usage of Align constructing MachineMemOperands. Similar to D77687, but for the X86 specific code. Differential Revision: https://reviews.llvm.org/D79381	2020-05-05 15:25:02 -07:00
Christudasan Devadasan	b8a616ec59	[AMDGPU] Fixed the test by adding the triple.	2020-05-06 00:14:01 +05:30
Momchil Velikov	fb18dffaeb	Revert "[ARM] CMSE code generation" This reverts commit `7cbbf89d23`. The regression tests fail with the expensive checks.	2020-05-05 19:05:40 +01:00
Christudasan Devadasan	375cec4b6c	[AMDGPU] Introduce more scratch registers in the ABI. The AMDGPU target has a convention that defined all VGPRs (execept the initial 32 argument registers) as callee-saved. This convention is not efficient always, esp. when the callee requiring more registers, ended up emitting a large number of spills, even though its caller requires only a few. This patch revises the ABI by introducing more scratch registers that a callee can freely use. The 256 vgpr registers now become: 32 argument registers 112 scratch registers and 112 callee saved registers. The scratch registers and the CSRs are intermixed at regular intervals (a split boundary of 8) to obtain a better occupancy. Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr Reviewed By: arsenm, t-tye Differential Revision: https://reviews.llvm.org/D76356	2020-05-05 23:02:58 +05:30
Momchil Velikov	7cbbf89d23	[ARM] CMSE code generation This patch implements the final bits of CMSE code generation: * emit special linker symbols * restrict parameter passing to not use memory * emit BXNS and BLXNS instructions for returns from non-secure entry functions, and non-secure function calls, respectively * emit code to save/restore secure floating-point state around calls to non-secure functions * emit code to save/restore non-secure floating-pointy state upon entry to non-secure entry function, and return to non-secure state * emit code to clobber registers not used for arguments and returns when switching to no-secure state Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green, possibly others. Differential Revision: https://reviews.llvm.org/D76518	2020-05-05 18:23:28 +01:00
Stanislav Mekhanoshin	9ef166e657	[AMDGPU] Fix FoldImmediate for 16 bit operand Differential Revision: https://reviews.llvm.org/D79362	2020-05-05 10:19:14 -07:00
Jinsong Ji	80b78a47e5	[MachinePipeliner] Add ORE for MachinePipeliner This patch adds ORE for MachinePipeliner, so that people can anaylyze their code using opt-viewer or other tools, then optimize the code to catch more piplining opportunities. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D79368	2020-05-05 16:04:53 +00:00
David Green	146d44c251	[LSR] Don't require register reuse under postinc LSR has some logic that tries to aggressively reuse registers in formula. This can lead to sub-optimal decision in complex loops where the backend it trying to use shouldFavorPostInc. This disables the re-use in those situations. Differential Revision: https://reviews.llvm.org/D79301	2020-05-05 16:04:50 +01:00
Jay Foad	3d76824b7f	[AMDGPU] Better support for VMEM soft clauses in GCNHazardRecognizer VMEM soft clauses only contain VMEM and FLAT instructions. Teaching GCNHazardRecognizer::checkSoftClauseHazards that other kinds of instructions will naturally break the clause means there are far fewer cases where it has to insert an s_nop instruction to forcibly break the clause. Differential Revision: https://reviews.llvm.org/D79353	2020-05-05 15:49:09 +01:00
Sebastian Neubauer	1de4e56933	[AMDGPU] Don't mark the .note section as ALLOC Marking a section as ALLOC tells the ELF loader to load the section into memory. As we do not want to load the notes into VRAM, the flag should not be there. On AMDHSA, .note is still marked as ALLOC, apparently this is currently needed for OpenCL (see https://reviews.llvm.org/D74995). Differential Revision: https://reviews.llvm.org/D76278	2020-05-05 14:21:45 +02:00
Sander de Smalen	8cb5663abd	[AArch64][SVE] Guard bitcast patterns under IsLE predicate Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D79352	2020-05-05 13:18:35 +01:00
David Green	f85acb1915	[ARM] Correct the type on a predicate cast A PREDICATE_CAST(PREDICATE_CAST(X)) can be converted to a PREDICATE_CAST(X) as the operation can convert between any forms of predicates (v4i1/v8i1/v16i1/i32). Unfortunately I got the type wrong on one of the rarer converts, which would lead to invalid nodes during isel. This fixes it up to use the correct type. Differential Revision: https://reviews.llvm.org/D79402	2020-05-05 13:15:10 +01:00
Simon Pilgrim	e53d4869a0	[X86][AVX] combineVectorSignBitsTruncation - avoid complex vXi64->vXi32 PACKSS truncations (PR45794) Unless we're truncating an 'all-bits' result, using PACKSS for vXi64->vXi32 truncation causes problems with later combines as ComputeNumSignBits struggles to see through BITCASTs to smaller types. If we don't use PACKSS in these cases then we fallback to shuffles which are usually just as good.	2020-05-05 11:57:25 +01:00
Simon Pilgrim	371a69ac9a	[X86][AVX] Add PR45794 sitofp v4i64-v4f32 test case	2020-05-05 11:32:48 +01:00
Heejin Ahn	834debfffd	[WebAssembly] Fix block marker placing after fixUnwindMismatches Summary: This fixes a few things that are connected. It is very hard to provide an independent test case for each of those fixes, because they are interconnected and sometimes one masks another. The provided test case triggers some of those bugs below but not all. --- 1. Background: `placeBlockMarker` takes a BB, and if the BB is a destination of some branch, it places `end_block` marker there, and computes the nearest common dominator of all predecessors (what we call 'header') and places a `block` marker there. When we first place markers, we traverse BBs from top to bottom. For example, when there are 5 BBs A, B, C, D, and E and B, D, and E are branch destinations, if mark the BB given to `placeBlockMarker` with `` and draw a rectangle representing the border of `block` and `end_block` markers, the process is going to look like ``` ------- ----- \|-----\| --- \|---\| \|\|---\|\| \|A\| \|\|A\|\| \|\|\|A\|\|\| --- --> \|---\| --> \|\|---\|\| B \| B \| \|\| B \|\| C \| C \| \|\| C \|\| D ----- \|-----\| E D \| D \| E ------- E ``` which means when we first place markers, we go from inner to outer scopes. So when we place a `block` marker, if the header already contains other `block` or `try` marker, it has to belong to an inner scope, so the existing `block`/`try` markers should go _after_ the new marker. This was the assumption we had. But after placing all markers we run `fixUnwindMismatches` function. There we do some control flow transformation and create some branches, and we call `placeBlockMarker` again to place `block`/`end_block` markers for those newly created branches. We can't assume that we are traversing branch destination BBs from top to bottom now because we are basically inserting some new markers in the middle of existing markers. Fix: In `placeBlockMarker`, we don't have the assumption that the BB given is in the order of top to bottom, and when placing `block` markers, calculates whether existing `block` or `try` markers are inner or outer scopes with respect to the current scope. --- 2. Background: In `fixUnwindMismatches`, when there is a call whose correct unwind destination mismatches the current destination after initially placing `try` markers, we wrap that with a new nested `try`/`catch`/`end` and jump to the correct handler within the new `catch`. The correct handler code is split as a separate BB from its original EH pad so it can be branched to. Here's an example: - Before ``` mbb: call @foo <- Unwind destination mismatch! wrong-ehpad: catch ... cont: end_try ... correct-ehpad: catch [handler code] ``` - After ``` mbb: try (new) call @foo nested-ehpad: (new) catch (new) local.set n / drop (new) br %handleri (new) nested-end: (new) end_try (new) wrong-ehpad: catch ... cont: end_try ... correct-ehpad: catch local.set n / drop (new) handler: (new) end_try [handler code] ``` Note that after this transformation, it is possible there are no calls to actually unwind to `correct-ehpad` here. `call @foo` now branches to `handler`, and there can be no other calls to unwind to `correct-ehpad`. In this case `correct-ehpad` does not have any predecessors anymore. This can cause a bug in `placeBlockMarker`, because we may need to place `end_block` marker in `handler`, and `placeBlockMarker` computes the nearest common dominator of all predecessors. If one of `handler`'s predecessor (here `correct-ehpad`) does not have any predecessors, i.e., no way of reaching it, we cannot correctly compute the common dominator of predecessors of `handler`, and end up placing no `block`/`end` markers. This bug actually sometimes masks the bug 1. Fix: When we have an EH pad that does not have any predecessors after this transformation, deletes all its successors, so that its successors don't have any dangling predecessors. --- 3. Background: Actually the `handler` BB in the example shown in bug 2 doesn't need `end_block` marker, despite it being a new branch destination, because it already has `end_try` marker which can serve the same purpose. I just put that example there for an illustration purpose. There is a case we actually need to place `end_block` marker: when the branch dest is the appendix BB. The appendix BB is created when there is a call that is supposed to unwind to the caller ends up unwinding to a wrong EH pad. In this case we also wrap the call with a nested `try`/`catch`/`end`, create an 'appendix' BB at the very end of the function, and branch to that BB, where we rethrow the exception to the caller. Fix: When we don't actually need to place block markers, we don't. --- 4. In case we fall through to the continuation BB after the catch block, after extracting handler code in `fixUnwindMismatches` (refer to bug 2 for an example), we now have to add a branch to it to bypass the handler. - Before ``` try ... (falls through to 'cont') catch handler body end <-- cont ``` - After ``` try ... br %cont (new) catch end handler body <-- cont ``` The problem is, we haven't been placing a new `end_block` marker in the `cont` BB in this case. We should, and this fixes it. But it is hard to provide a test case that triggers this bug, because the current compilation pipeline from .ll to .s does not generate this kind of code; we always have a `br` after `invoke`. But code without `br` is still valid, and we can have that kind of code if we have some pipeline changes or optimizations later. Even mir test cases cannot trigger this part for now, because we don't encode auxiliary EH-related data structures (such as `WasmEHFuncInfo`) in mir now. Those functionalities can be added later, but I don't think we should block this fix on that. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79324	2020-05-05 02:06:47 -07:00
Pierre-vh	d5eb7ffa33	[Target][ARM] Fold or(A, B) more aggressively for I1 vectors This patch makes the folding of or(A, B) into not(and(not(A), not(B))) more agressive for I1 vector. This only affects Thumb2 MVE and improves codegen, because it removes a lot of msr/mrs instructions on VPR.P0. This patch also adds a xor(vcmp) -> !vcmp fold for MVE. Differential Revision: https://reviews.llvm.org/D77202	2020-05-05 10:03:02 +01:00
Pierre-vh	ffdda495f7	[Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops This patch adds an implementation of PerformVSELECTCombine in the ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into vselect(cond, rhs, lhs). Normally, this should be done by the target-independent DAG Combiner, but it doesn't handle the kind of constants that we generate, so we have to reimplement it here. Differential Revision: https://reviews.llvm.org/D77712	2020-05-05 10:03:02 +01:00
David Green	09767af848	[ARM] MVE predcast with const test. NFC	2020-05-05 09:53:42 +01:00
Krzysztof Parzyszek	156092bbcc	[RegisterCoalescer] Extend a subrange if needed when filling range gap Register live ranges may have had gaps that after coalescing should be removed. This is done by adding a new segment to the range, and merging it with neighboring segments. When doing so, do not assume that each subrange of the register ended at the same index. If a subrange ended earlier, adding this segment could make the live range invalid. Instead, if the subrange is not live at the start of the segment, extend it first.	2020-05-04 16:49:59 -05:00
Sanjay Patel	58c1770b8f	[x86] add test for shift+op+concat; NFC D79360 could change this kind of sequence.	2020-05-04 17:31:45 -04:00
David Green	8430141578	[ARM] Complex LSR test showing inefficient codegen. NFC	2020-05-04 21:50:10 +01:00
Eli Friedman	1eb160fe8d	[ARM] Fix tail call validity checking for varargs calls. If a varargs function is calling a non-varargs function, or vice versa, make sure we use the correct "varargs" bit for each. Fixes https://bugs.llvm.org/show_bug.cgi?id=45234 Differential Revision: https://reviews.llvm.org/D79199	2020-05-04 12:34:14 -07:00
Sanjay Patel	f1d083ab45	[x86] add tests for concat of casts; NFC	2020-05-04 15:22:31 -04:00
Snehasish Kumar	c8ac29ab1d	Descriptive symbol names for machine basic block sections. Today symbol names generated for machine basic block sections use a unary encoding to reduce bloat. This is essential when every basic block in the binary is assigned a symbol however with basic block clusters (rG05192e585ce175b55f2a26b83b4ed7882785c8e6) when we only need to generate a few non-temporary symbols we can assign more descriptive names making them more user friendly. With this change - Cold cluster section for function foo is named "foo.cold" Exception cluster section for function foo is named "foo.eh" Other cluster sections identified by their ids are named "foo.ID" Using this format works well with existing tools. It will demangle as expected and works with existing symbolizers, profilers and debuggers out of the box. $ c++filt _Z3foov.cold foo() [clone .cold] $ c++filt _Z3foov.eh foo() [clone .eh] $c++filt _Z3foov.1234 foo() [clone 1234] Tests for basicblock-sections are updated with some cleanup where appropriate. Differential Revision: https://reviews.llvm.org/D79221	2020-05-04 19:06:43 +00:00
Sean Fertile	47f9e71ac7	[PowerPC][AIX][NFC] Remove spills and reloads from arg passing test.	2020-05-04 14:26:33 -04:00
Stanislav Mekhanoshin	c85eda74b8	[AMDGPU] fix copies between 32 and 16 bit This a hack to fix illegal 32 to 16 bit copies. The problem is when we make 16 bit subregs legal it creates a huge amount of failures which can only be resolved at once without a temporary hack like this. The next step is to change operands, instruction definitions and patterns until this hack is not needed. Differential Revision: https://reviews.llvm.org/D79119	2020-05-04 08:54:22 -07:00
Alex Richardson	d1ff003fbb	[SelectionDAGBuilder] Stop setting alignment to one for hidden sret values We allocated a suitably aligned frame index so we know that all the values have ABI alignment. For MIPS this avoids using pair of lwl + lwr instructions instead of a single lw. I found this when compiling CHERI pure capability code where we can't use the lwl/lwr unaligned loads/stores and and were to falling back to a byte load + shift + or sequence. This should save a few instructions for MIPS and possibly other backends that don't have fast unaligned loads/stores. It also improves code generation for CodeGen/X86/pr34653.ll and CodeGen/WebAssembly/offset.ll since they can now use aligned loads. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D78999	2020-05-04 14:44:39 +01:00
Alex Richardson	3fc738846e	[MIPS] Add a baseline test showing current inefficient hidden sret lowering SelectionDAGBuilder currently doesn't propagate the known alignment of the sret parameter. This is inefficient for MIPS and highly inefficient for our out-of-tree CHERI-extended MIPS since we don't have lwl/lwr so fall back to byte loads for align == 1.	2020-05-04 14:44:39 +01:00
alex-t	5b898bddff	[AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection. Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence. Reviewers: rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78091	2020-05-04 16:42:25 +03:00
Raul Tambre	0863e94ebd	[AArch64] Add NVIDIA Carmel support Summary: NVIDIA's Carmel ARM64 cores are used in Tegra194 chips found in Jetson AGX Xavier, DRIVE AGX Xavier and DRIVE AGX Pegasus. References: * https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/#h.huq9xtg75a5e * NVIDIA Xavier Series System-on-Chip Technical Reference Manual 1.3 (https://developer.nvidia.com/embedded/downloads#?search=Xavier%20Series%20SoC%20Technical%20Reference%20Manual) Reviewers: sdesmalen, paquette Reviewed By: sdesmalen Subscribers: llvm-commits, ianshmean, kristof.beyls, hiraditya, jfb, danielkiss, cfe-commits, t.p.northover Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D77940	2020-05-04 13:52:30 +01:00
Kerry McLaughlin	19f5da9c1d	[SVE][Codegen] Lower legal min & max operations Summary: This patch adds AArch64ISD nodes for [S\|U]MIN_PRED and [S\|U]MAX_PRED, and lowers both SVE intrinsics and IR operations for min and max to these nodes. There are two forms of these instructions for SVE: a predicated form and an immediate (unpredicated) form. The patterns which existed for the latter have been updated to match a predicated node with an immediate and map this to the immediate instruction. Reviewers: sdesmalen, efriedma, dancgr, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79087	2020-05-04 11:19:19 +01:00
Craig Topper	8b53fdd3b6	[X86] Custom legalize v16i64->v16i8 truncate with avx512. Default legalization will create two v8i64 truncs to v8i32, concat them to v16i32, and then truncate the rest of the way to v16i8. Instead we can truncate directly from v8i64 to v8i8 in the lower half of an xmm. Then concat the two halves to use vpunpcklqdq. This is the same number of uops, but the dependency chain through the uops is better since the halves are merged at the end. I had to had SimplifyDemandedBits support for VTRUNC to prevent a regression on vector-trunc-math.ll. combineTruncatedArithmetic no longer gets a chance to shrink vXi64 mul so we were producing the v8i64 multiply sequence using multiple PMULUDQs. With the demanded bits fix we are able to prune out the extra ops leaving just two PMULUDQs, one for each v8i64 half. This is twice the width of the 2 v8i32 PMULLDs we had before, but PMULUDQ is 1 uop and PMULLD is 2. We also save some truncates. It's probably worth using PMULUDQ even when PMULLQ is available since the latter is 3 uops, but that will require a different change. Differential Revision: https://reviews.llvm.org/D79231	2020-05-03 23:26:04 -07:00
Simon Pilgrim	ff5094c03f	[X86] Add tests showing failure to fold mul(abs(x),abs(x)) -> mul(x,x) (PR39476)	2020-05-03 17:39:48 +01:00
Craig Topper	cd75b74073	[X86] Fix a few issues in the evex-to-vex-compress.mir test. Don't use $noreg for instructions that take register inputs. Only allow $noreg for parts of memory operands. Don't use index register with $rip base. Use RETQ instead of the RET pseudo. This pass is after the ExpandPseudo pass that converts RET to RETQ.	2020-05-02 18:02:12 -07:00
LemonBoy	6d103ca855	[SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces. The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines. Differential Revision: https://reviews.llvm.org/D79096	2020-05-02 15:18:10 -07:00
Sam Elliott	fe4245a4c1	[RISCV] Implement convertSelectOfConstantsToMath Summary: The current lowering of `select` on RISC-V uses a branch instruction to load a register with one or other value. This is inefficient, especially in the case of small constants that can be computed easily. By implementing the TargetLowering::convertSelectOfConstantsToMath hook, some of the simpler cases are covered that let us avoid introducing a branch in these cases. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D79260	2020-05-02 15:05:57 +01:00
Sam Elliott	bf552d29ee	[RISCV][NFC] Tests for (select (const), (const)) Summary: This just adds some simple cases for testing select of constants. There will be a follow-up patch that improves code generation in some of these cases. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D79259	2020-05-02 15:05:45 +01:00
Sam Elliott	a4a9a1f671	[RISCV] Add patterns for checking isnan Summary: This patch addresses some weird assembly sequences we were seeing during comparing floats. In particular, comparing a float to itself tells you whether it is NaN or not, which we were doing correctly, but with an extra unneeded `and` instruction. This patch specialises the existing patterns to remove the `and` instructions when both their operands are the same. Reviewed By: luismarques, asb Differential Revision: https://reviews.llvm.org/D78908	2020-05-02 15:01:04 +01:00
Sam Elliott	910ca0e435	[RISCV][NFC] Add tests for checking isnan patterns Summary: I worked on adding some SelectionDag patterns to address code generated by these examples, which came out of some differential testing against GCC. The pattern additions will be in a follow-up patch. Reviewers: luismarques, asb Reviewed By: luismarques, asb Differential Revision: https://reviews.llvm.org/D78907	2020-05-02 14:57:18 +01:00
Craig Topper	8555c91337	[X86] Use more accurate increments for the induction variables in sad.ll. NFC I think some copy/pasting was used to create loops of different VFs. But the increment of the induction variable wasn't updated to match the VF. This has no effect on the pattern matching we're testing, it just helps the test make sense to the reader.	2020-05-01 18:55:22 -07:00
Thomas Lively	e0f52842c8	[WebAssembly] Renumber SIMD opcodes Summary: As described in https://github.com/WebAssembly/simd/pull/209. This is the final reorganization of the SIMD opcode space before standardization. It has been landed in concert with corresponding changes in other projects in the WebAssembly SIMD ecosystem. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79224	2020-05-01 17:20:49 -07:00
Simon Pilgrim	7cb5a51f38	[DAG] SimplifyDemandedVectorElts - add INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling	2020-05-01 16:20:51 +01:00
Simon Pilgrim	65d32a9892	[DAG] SimplifyDemandedVectorElts - remove INSERT_SUBVECTOR if we don't demand the subvector	2020-05-01 16:20:51 +01:00
Simon Pilgrim	e3c0be596c	[DAG] SimplifyDemandedVectorElts - add EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling	2020-05-01 13:48:07 +01:00
Simon Pilgrim	8cbd8194c1	[X86] Improving folding of concat_vectors of subvectors from the same broadcast Handle concat_vectors(extract_subvector(broadcast(x)), extract_subvector(broadcast(x))) -> broadcast(x) To expose this we also need collectConcatOps to recognise the insert_subvector(x, extract_subvector(x, lo), hi) subvector splat pattern	2020-05-01 11:23:10 +01:00
Jay Foad	5f7ea85e78	[AMDGPU] Remove unnecessary s_waitcnt between VMEM loads VMEM loads of the same type (sampler vs no sampler) are guaranteed to write their result registers in order, so there is no need for an s_waitcnt even if they write to overlapping vgprs. Differential Revision: https://reviews.llvm.org/D79176	2020-05-01 10:10:23 +01:00
Hubert Tong	0e8608b3c3	[tests] Revert unhelpful change from `d73eed42d1`	2020-04-30 22:18:54 -04:00
Hubert Tong	d73eed42d1	[tests] Speculative fix for buildbot breakage from `c5f7c039ef`	2020-04-30 22:04:53 -04:00
Suyog Sarda	ea093f6481	Handle cases for subregisters. While restoring latency, check if any of the registers of source instruction is a subregister of the successor instructions apart from being same register.	2020-04-30 20:32:33 -05:00
Craig Topper	c5f7c039ef	[X86] Add x, t and g modifiers for inline asm This patch adds the x, t and g modifiers for inline asm from GCC. These will print a vector register as xmm, ymm or zmm* respectively. I also fixed register names with modifiers with inteldialect so they are no longer printed with a leading %. Patch by Amanieu d'Antras Differential Revision: https://reviews.llvm.org/D78977	2020-04-30 17:45:45 -07:00
Craig Topper	6a1ad76dab	[X86] Don't return true from isTruncateFree for vectors Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to. Differential Revision: https://reviews.llvm.org/D79045	2020-04-30 16:43:35 -07:00
Simon Pilgrim	bf468f4349	[X86][SSE] Canonicalize UNARYSHUFFLE(XOR(X,-1) -> XOR(UNARYSHUFFLE(X),-1) This pushes the NOT pattern up the DAG to help expose it for further combines (AND->ANDN in particular). The PSHUFD/MOVDDUP 'splat' cases are the only ones I've seen in the wild so far, we can further generalize if/when we need to.	2020-04-30 19:18:51 +01:00
Simon Pilgrim	51308ee30c	[X86] Extend combine-bitselect tests Add AVX512VL tests and v2i64/v8i64 mask broadcast tests that match the existing v4i64 versions	2020-04-30 16:24:17 +01:00
Simon Pilgrim	30211c4783	[X86] combineANDXORWithAllOnesIntoANDNP - add BROADCAST handling Fold BROADCAST(NOT(Y)) -> BROADCAST(Y) as part of finding a NOT inversion.	2020-04-30 16:24:17 +01:00
diggerlin	a2c8cd1812	[AIX] emit .extern and .weak directive linkage SUMMARY: emit .extern and .weak directive linkage Reviewers: hubert.reinterpretcast, Jason Liu Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D76932	2020-04-30 09:54:10 -04:00
Simon Pilgrim	f6cdcb0a5a	[X86][SSE] Add bitselect tests where the mask is a broadcasted scalar Shows issue that the IsNot() test can't see through shuffles/broadcasts	2020-04-30 14:45:21 +01:00
Simon Pilgrim	96238486ed	[DAGCombine] Move the remaining X86 funnel shift patterns to DAGCombine X86 matches several 'shift+xor' funnel shift patterns: fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y)) -> (fshl x0, x1, y) fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y) fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y) These patterns are also what we end up with the proposed expansion changes in D77301. This patch moves these to DAGCombine's generic MatchFunnelPosNeg. All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll. Reviewed By: @spatel Differential Revision: https://reviews.llvm.org/D78935	2020-04-30 12:57:17 +01:00
Cullen Rhodes	672b62ea21	[AArch64][SVE] Custom lowering of floating-point reductions Summary: This patch implements custom floating-point reduction ISD nodes that have vector results, which are used to lower the following intrinsics: * llvm.aarch64.sve.fadda * llvm.aarch64.sve.faddv * llvm.aarch64.sve.fmaxv * llvm.aarch64.sve.fmaxnmv * llvm.aarch64.sve.fminv * llvm.aarch64.sve.fminnmv SVE reduction instructions keep their result within a vector register, with all other bits set to zero. Changes in this patch were implemented by Paul Walker and Sander de Smalen. Reviewers: sdesmalen, efriedma, rengolin Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D78723	2020-04-30 10:18:40 +00:00
David Sherwood	058cd8c5be	[CodeGen] Add support for inserting elements into scalable vectors Summary: This patch tries to ensure that we do something sensible when generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating on scalable vectors. Previously we always returned 'undef' when inserting an element into an out-of-bounds lane index, whereas now we only do this for fixed length vectors. For scalable vectors it is assumed that the backend will do the right thing in the same way that we have to deal with variable lane indices. In this patch I have permitted a few basic combinations for scalable vector types where it makes sense, but in general avoided most cases for now as they currently require the use of BUILD_VECTOR nodes. This patch includes tests for all scalable vector types when inserting into lane 0, but I've only included one or two vector types for other cases such as variable lane inserts. Differential Revision: https://reviews.llvm.org/D78992	2020-04-30 11:14:04 +01:00
Evgeniy Brevnov	3e68a66704	[BPI][NFC] Reuse post dominantor tree from analysis manager when available Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager. Reviewers: skatkov, taewookoh, yrouban Reviewed By: skatkov Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78987	2020-04-30 11:31:03 +07:00
Puyan Lotfi	9b16ece6ca	[test][MachineOutliner] REQUIRES: asserts This new test checks some of the debug output to ensure what iteration the outliner reached a fixed point. For now I am making it REQUIRES: asserts so that it wont break any bots that have asserts disabled.	2020-04-29 19:43:17 -04:00
Puyan Lotfi	ffd5e121d7	[NFCi] Iterative Outliner + clang-format refactoring. Prior to D69446 I had done some NFC cleanup to make landing an iterative outliner a cleaner more straight-forward patch. Since then, it seems that has landed but I noticed some ways it could be cleaned up. Specifically: 1) doOutline was meant to be the re-runable function, but instead runOnceOnModule was created that just calls doOutline. 2) In D69446 we discussed that the flag allowing the re-run of the outliner should be a flag to tell how many additional times to run the outliner again, not the total number of times. I don't think it makes sense to introduce a flag, but print an error if the flag is set to 0. This is an NFCi, the i being that I get rid of the way that the machine-outline-runs flag could be used to tell the outliner to not run at all, and because I renamed the flag to '-machine-outliner-reruns'. Differential Revision: https://reviews.llvm.org/D79070	2020-04-29 18:36:47 -04:00
Sanjay Patel	cecee111e4	[x86] add tests for awkward 'icmp eq i1'; NFC	2020-04-29 14:39:47 -04:00
Simon Pilgrim	f0903de1aa	[x86] Enable bypassing 64-bit division on generic x86-64 This is currently enabled for Intel big cores from Sandy Bridge onward, as well as Atom, Silvermont, and KNL, due to 64-bit division being so slow on these cores. AMD cores can do this in hardware (use 32-bit division based on input operand width), so it's not a win there. But since the majority of x86 CPUs benefit from this optimization, and since the potential upside is significantly greater than the downside, we should enable this for the generic x86-64 target. Patch By: @atdt Reviewed By: @craig.topper, @RKSimon Differential Revision: https://reviews.llvm.org/D75567	2020-04-29 16:55:48 +01:00
Chen Zheng	957c5dd78b	[PowerPC-QPX] add more test for QPX madd/msub operands order - NFC	2020-04-29 01:17:14 -04:00
QingShan Zhang	b5f89744cc	[DAGCombine] Checking the cost directly to improve the code readability Call getNegatedExpression(Cost) and check the Cost to make the code more clear. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D78347	2020-04-29 01:49:39 +00:00
Stanislav Mekhanoshin	26777ad7a0	[AMDGPU] Adapt GCNRegBankReassign for 16 bit subregs It allows it not to crash and analyze 16 bit subregs if those appear in the instructions. At the same time it does not attempt to reassign these. It still can correctly identify register banks to let larger registers to be reassigned. More work will be needed here when real instructions will use these registers and more tests as well. Differential Revision: https://reviews.llvm.org/D78772	2020-04-28 16:16:04 -07:00
Stanislav Mekhanoshin	8a30460697	[AMDGPU] Define AGPR subregs These are only needed as VGPR counterpart. Differential Revision: https://reviews.llvm.org/D78597	2020-04-28 15:30:43 -07:00
Craig Topper	446a3be8f1	[X86] Add PACK instructions to hasUndefRegUpdate so the BreakFalseDeps pass will reassign an undef second source to match the first source We generate PACK instructions with an undef second source when we are truncating from a 128-bit vector to something narrower and we don't care about the upper bits of the vector register. The register allocation process will always assign untied undef uses to xmm0. This creates a false dependency on xmm0. By adding these instructions to hasUndefRegUpdate, we can get the BreakFalseDeps pass to reassign the source to match the other input. Normally this interface is used for instructions that might need an xor inserted to break the dependency. But the pass also has a heuristic that tries to use the same register as other sources. That should always be possible for these instructions so we'll never trigger the xor dependency break. Differential Revision: https://reviews.llvm.org/D79032	2020-04-28 15:11:32 -07:00
Stanislav Mekhanoshin	46a75436f8	[AMDGPU] Define special SGPR subregs These are used in SReg_32 and when we start to use SGPR_LO16 there will be compaints that not all registers in RC support all subreg indexes. For now it is NFC. Unused regunits are reserved so that verifier does not complain about missing phys reg live-ins. Differential Revision: https://reviews.llvm.org/D78591	2020-04-28 14:57:46 -07:00
Davide Italiano	0ed276bb08	[GlobalISel] Assign the correct debug location when combining G_ANYEXT/G_ZEXT <rdar://problem/62535712>	2020-04-28 14:12:33 -07:00
Stanislav Mekhanoshin	395d93358e	Revert "[AMDGPU] Define special SGPR subregs" This reverts commit `1baaa080e0`.	2020-04-28 13:53:15 -07:00
Stanislav Mekhanoshin	1baaa080e0	[AMDGPU] Define special SGPR subregs These are used in SReg_32 and when we start to use SGPR_LO16 there will be compaints that not all registers in RC support all subreg indexes. For now it is NFC. Unused regunits are reserved so that verifier does not complain about missing phys reg live-ins. Differential Revision: https://reviews.llvm.org/D78591	2020-04-28 13:34:24 -07:00
Sean Fertile	2a3cf5e583	[PowerPC][AIX] Pass ByVal formal args that span registers and stack. Implement passing of ByVal formal arguments when the argument is passed partly in the argument registers, with the remainder of the argument passed on the stack. Differential Revision: https://reviews.llvm.org/D78515	2020-04-28 14:57:14 -04:00
Jessica Paquette	2af31b3b65	[AArch64][GlobalISel] Select immediate forms of compares by wiggling constants Similar to code in `getAArch64Cmp` in AArch64ISelLowering. When we get a compare against a constant, sometimes, that constant isn't valid for selecting an immediate form. However, sometimes, you can get a valid constant by adding 1 or subtracting 1, and updating the condition code. This implements the following transformations when valid: - x slt c => x sle c - 1 - x sge c => x sgt c - 1 - x ult c => x ule c - 1 - x uge c => x ugt c - 1 - x sle c => x slt c + 1 - x sgt c => s sge c + 1 - x ule c => x ult c + 1 - x ugt c => s uge c + 1 Valid meaning the constant doesn't wrap around when we fudge it, and the result gives us a compare which can be selected into an immediate form. This also moves `getImmedFromMO` higher up in the file so we can use it. Differential Revision: https://reviews.llvm.org/D78769	2020-04-28 11:35:01 -07:00
Craig Topper	0de7ddbfb0	[X86] Handle more cases in combineAddOrSubToADCOrSBB. This adds support for X + SETAE --> sbb X, -1 X - SETAE --> adc X, -1 Fixes PR45700 Differential Revision: https://reviews.llvm.org/D78984	2020-04-28 10:39:39 -07:00
Craig Topper	c480dc6b47	[X86] Pre-commit tests for D78984. NFC These tests show some missed opportunities to use sbb/adc.	2020-04-28 10:39:37 -07:00
Francis Visoiu Mistrih	e770153865	[AArch64] Add support for -ffixed-x30 Add support for reserving LR in: * the driver through `-ffixed-x30` * cc1 through `-target-feature +reserve-x30` * the backend through `-mattr=+reserve-x30` * a subtarget feature `reserve-x30` the same way we're doing for the other registers.	2020-04-28 08:48:28 -07:00
David Green	1084b32339	[ARM] Always replace FP16 bitcasts with VMOVhr or VMOVrh This changes the logic with lowering fp16 bitcasts to always produce either a VMOVhr or a VMOVrh, instead of only trying to do it with certain surrounding nodes. To perform the same optimisations demand bits and known bits information has been added for them. Differential Revision: https://reviews.llvm.org/D78587	2020-04-28 16:12:53 +01:00
Krzysztof Parzyszek	25a4b1904c	Handle part-word LL/SC in atomic expansion pass Differential Revision: https://reviews.llvm.org/D77213	2020-04-28 10:07:39 -05:00
Chen Zheng	22fdbd01a3	[Powerpc] add triple for new added qpx test case - NFC	2020-04-28 05:32:10 -04:00
Ng Zhi An	500b4ad5f4	[PowerPC] Fix downcast from nullptr for target streamer getTargetStreamer() might return null (e.g. when running inlined-strings.ll test), downcasting to a reference will be wrong. This is detectable with -fsanitize=null. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D78686	2020-04-28 09:20:10 +00:00
Chen Zheng	949018cc27	[PowerPC] add test case for reorder operands of qpx fma instr - nfc.	2020-04-28 04:43:32 -04:00
Jonas Paulsson	c84461ba8d	[SystemZ] Fix test case. Remove bad kill flags fom load-and-test.mir as discovered by https://reviews.llvm.org/D78586: "[MachineVerifier] Add more checks for registers in live-in lists". Review: Ulrich Weigand	2020-04-28 09:43:03 +02:00
Kazushi (Jam) Marukawa	3c80478d73	[VE] Update branch instructions Summary: Changing all mnemonic to match assembly instructions to simplify mnemonic naming rules. This time update all branch instructions. This also change to use %s10 register consistently. Differential Revision: https://reviews.llvm.org/D78889	2020-04-28 09:41:01 +02:00
Kazushi (Jam) Marukawa	0314e8980f	[VE] Support floating point immediate values Summary: Add simm7fp/mimmfp to represent floating point immediate values. Also clean multiclasses to define floating point arithmetic instructions to handle simm7fp/mimmfp operands. Also add several regression tests for new operands. Differential Revision: https://reviews.llvm.org/D78887	2020-04-28 09:36:10 +02:00
Chen Zheng	45d92806ea	[PowerPC] use inst-level fast-math-flags to drive MachineCombiner Currently, on PowerPC target, it uses function scope UnsafeFPMath option to drive Machine Combiner pass. This is not accurate in two ways: 1: the scope is not accurate. Machine Combiner pass only requires instruction-level flags instead of the function scope. 2: the float point flag is not accurate. Machine Combiner pass only requires float point flags reassoc and nsz. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D78183	2020-04-28 03:31:12 -04:00
Kang Zhang	4bb0a1cb70	[PowerPC] Fix the liveins for ppc-expand-isel pass Summary: In the ppc-expand-isel pass, we use stepForward() to update the liveins, this function is not recommended, because it needs the accurate kill info. This patch uses the function computeAndAddLiveIns() to update the liveins, it's the recommended method and can fix the liveins bug for ppc-expand-isel pass.. Reviewed By: efriedma, lkail Differential Revision: https://reviews.llvm.org/D78657	2020-04-28 03:22:48 +00:00
LemonBoy	f30416fdde	[AsmPrinter] Fix emission of non-standard integer constants for BE targets The code assumed that zero-extending the integer constant to the designated alloc size would be fine even for BE targets, but that's not the case as that pulls in zeros from the MSB side while we actually expect the padding zeros to go after the LSB. I've changed the codepath handling the constant integers to use the store size for both small(er than u64) and big constants and then add zero padding right after that. Differential Revision: https://reviews.llvm.org/D78011	2020-04-27 14:57:29 -07:00
Wei Mi	68d2301e12	Recommit "Generate Callee Saved Register (CSR) related cfi directives like .cfi_restore" Insert .cfi_offset/.cfi_register when IncomingCSRSaved of current block is larger than OutgoingCSRSaved of its previous block. Original commit message: https://reviews.llvm.org/D42848 only handled CFA related cfi directives but didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses the framework created in D42848. For each basicblock, the patch tracks which CSR set have been saved at its CFG predecessors's exits, and compare the CSR set with the set at its previous basicblock's exit (The previous block is the block laid before the current block). If the saved CSR set at its previous basicblock's exit is larger, .cfi_restore will be inserted. The patch also generates proper .cfi_restore in epilogue to make sure the saved CSR set is consistent for the incoming edges of each block. Differential Revision: https://reviews.llvm.org/D74303	2020-04-27 12:46:58 -07:00
Davide Italiano	c8433a5b1b	[GlobalISel] Remove debug locations when emitting constants. The tl;dr story is that this causes jumps in the emitted line tables, even at `-O0`. We could at some point consider more fancy solutions to preserve locations, but it doesn't seem to be worth the effort for now. <rdar://problem/62460788> Differential Revision: https://reviews.llvm.org/D78947	2020-04-27 11:27:08 -07:00
Stefan Pintilie	1354a03e74	[PowerPC][Future] Implement PC Relative Tail Calls Tail Calls were initially disabled for PC Relative code because it was not safe to make certain assumptions about the tail calls (namely that all compiled functions no longer used the TOC pointer in R2). However, once all of the TOC pointer references have been removed it is safe to tail call everything that was tail called prior to the PC relative additions as well as a number of new cases. For example, it is now possible to tail call indirect functions as there is no need to save and restore the TOC pointer for indirect functions if the caller is marked as may clobber R2 (st_other=1). For the same reason it is now also possible to tail call functions that are external. Differential Revision: https://reviews.llvm.org/D77788	2020-04-27 12:55:08 -05:00
Simon Pilgrim	bd60b2983e	[X86][SSE] Regenerate oddsubvector.ll test checks Fixes some missed address symbol regexs	2020-04-27 18:48:02 +01:00
David Sherwood	121ca44c19	[CodeGen] Use SPLAT_VECTOR for zeroinitialiser with scalable types Adding tests that I forgot to add as part of a previous change: https://reviews.llvm.org/D78636	2020-04-27 16:16:48 +01:00
David Green	61b8af0375	[ARM] Allow fma in tail predicated loops There are some intrinsics like this that currently block tail predication, but should be fine. This allows fma through, as the one that I ran into. There may be others that need the same treatment but I've only done this one here. Differential Revision: https://reviews.llvm.org/D78385	2020-04-27 15:32:47 +01:00
Simon Pilgrim	d9e174dbf7	[X86][SSE] getFauxShuffle - account for PEXTW/PEXTB implicit zero-extension The insert(truncate/extend(extract(vec0,c0)),vec1,c1) case in rGacbc5ede99 wasn't combining the 'mineltsize' with the src vector elt size which may be smaller due to implicit extension during extraction. Reduced from test case provided by @mstorsjo	2020-04-27 12:46:50 +01:00
David Green	8807139026	[ARM] Only produce qadd8b under hasV6Ops When compiling for a arm5te cpu from clang, the +dsp attribute is set. This meant we could try and generate qadd8 instructions where we would end up having no pattern. I've changed the condition here to be hasV6Ops && hasDSP, which is what other parts of ARMISelLowering seem to use for similar instructions. Fixed PR45677. Differential Revision: https://reviews.llvm.org/D78877	2020-04-27 10:13:29 +01:00
Matt Arsenault	4cef9812eb	AMDGPU: Add some missing atomics tests We had no FP atomic load/store coverage.	2020-04-26 15:09:35 -04:00
Simon Pilgrim	acbc5ede99	[X86][SSE] getFauxShuffle - support insert(truncate/extend(extract(vec0,c0)),vec1,c1) shuffle patterns at the byte level Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases. By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.	2020-04-26 15:31:01 +01:00
Kang Zhang	f85e35d2a3	[NFC][PowerPC] Add the killed flag for the case expand-isel-liveness.mir	2020-04-26 04:40:20 +00:00
Kang Zhang	fe2a522533	[NFC][PowerPC] Add a new test case in expand-isel-liveness.mir	2020-04-26 03:15:54 +00:00
Craig Topper	c1cb733db6	[X86] Improve lowering of v16i8->v16i1 truncate under prefer-vector-width=256.	2020-04-25 15:20:33 -07:00
Sanjay Patel	7f4ff782d4	[x86] use vector instructions to lower even more FP->int->FP casts This is another enhancement to D77895/D78362 to avoid a round-trip from XMM->GPR->XMM. This time we handle the case of starting/ending with different FP types but always with signed i32 as the intermediate value. I think this covers all of the faux vector optimization possibilities for pre-AVX512. There is at least 1 other transform mentioned in PR36617: https://bugs.llvm.org/show_bug.cgi?id=36617#c19 ...where we fold an 'fpext' into a preceding 'sitofp'. I think we will want to handle that earlier (DAGCombiner or instcombine) because that's a target-independent optimization. Differential Revision: https://reviews.llvm.org/D78758	2020-04-25 11:38:54 -04:00
Snehasish Kumar	0cc063a8ff	Use .text.unlikely and .text.eh prefixes for MachineBasicBlock sections. Summary: Instead of adding a ".unlikely" or ".eh" suffix for machine basic blocks, this change updates the behaviour to use an appropriate prefix instead. This allows lld to group basic block sections together when -z,keep-text-section-prefix is specified and matches the behaviour observed in gcc. Reviewers: tmsriram, mtrofin, efriedma Reviewed By: tmsriram, efriedma Subscribers: eli.friedman, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78742	2020-04-24 15:07:38 -07:00
Fangrui Song	10bc12588d	[XRay] Change Sled.Function to PC-relative for sled version 2 and make llvm-xray support sled version 2 addresses Follow-up of D78082 and D78590. Otherwise, because xray_instr_map is now read-only, the absolute relocation used for Sled.Function will cause a text relocation.	2020-04-24 14:41:56 -07:00
Amara Emerson	dbb0356771	[AArch64][GlobalISel] Fix sub-64b stack parameter passing on Darwin. A previous bug fix for varargs introduced a regression where we would incorrectly widen some stores to memory when passing i8/i16 parameters on the stack. This didn't show up seemingly because it only happens when there is no signext/zeroext parameter attribute, which I think for Darwin clang adds. Swift however seems to be a different story, and a plain anyext on the parameter triggered the bug. To fix this, I've added a new ValueHandler::assignValueToAddress type override which lets us distiguish between varargs and fixed args (we still need this widening behaviour for varargs to fix the original bug in 2018). rdar://61353552	2020-04-24 13:56:43 -07:00
Matt Arsenault	35e6a9c839	AMDGPU: Break read2/write2 search range on a memory fence This is to fix performance regressions introduced by `86c944d790`. The old search would collect all potentially mergeable instructions in the entire block. In this case, the same address is written in multiple places in the block on the other side of a fence. When sorted by offset, the two unmergeable, identical addresses would be next to each other and the merge would give up. Break the search space when we encounter an instruction we won't be able to merge across. This will keep the identical addresses in different merge attempts. This may also improve compile time by reducing the merge list size.	2020-04-24 15:53:30 -04:00
Vedant Kumar	c0fa447e02	AArch64: Remove reversedInstructionsWithoutDebug helper When using reversedInstructionsWithoutDebug to construct a range from a pair of MachineInstrBundleIterators, the range unexpectedly leaves out an element. This results in mis-optimization as @mstorsjo points out in https://reviews.llvm.org/D78157. The problem is that when we convert a MachineInstrBundleIterator to a reverse iterator, the result gets incremented: MachineInstrBundleIterator(++I.getReverse()) The comment there explains that the "resulting iterator will dereference ... to the previous node, which is somewhat unexpected; but converting the two endpoints in a range will give the same range in reverse". This makes it hard to understand what reversedInstructionsWithoutDebug will do: I've removed the helper to prevent similar mistakes in the future.	2020-04-24 11:28:17 -07:00
Fangrui Song	25e22613df	[XRay] Change ARM/AArch64/powerpc64le to use version 2 sled (PC-relative address) Follow-up of D78082 (x86-64). This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le. MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined. Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well. Tested on AArch64 Linux and ppc64le Linux. Reviewed By: ianlevesque Differential Revision: https://reviews.llvm.org/D78590	2020-04-24 08:35:43 -07:00
Luke Geeson	7da1905125	[AArch32] Armv8.6-a Matrix Mult Assembly + Intrinsics This patch upstreams support for the Armv8.6-a Matrix Multiplication Extension. A summary of the features can be found here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a This patch includes: - Assembly support for AArch32 - Intrinsics Support for AArch32 Neon Intrinsics for Matrix Multiplication Note: these extensions are optional in the 8.6a architecture and so have to be enabled by default No additional IR types or C Types are needed for this extension. This is part of a patch series, starting with BFloat16 support and the other components in the armv8.6a extension (in previous patches linked in phabricator) Based on work by: - Luke Geeson - Oliver Stannard - Luke Cheeseman Reviewers: t.p.northover, miyuki Reviewed By: miyuki Subscribers: miyuki, ostannard, kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77872	2020-04-24 15:54:06 +01:00
Luke Geeson	832cd74913	[AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics This patch upstreams support for the Armv8.6-a Matrix Multiplication Extension. A summary of the features can be found here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a This patch includes: - Assembly support for AArch64 only (no SVE or Neon) - Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication) No IR types or C Types are needed for this extension. This is part of a patch series, starting with BFloat16 support and the other components in the armv8.6a extension (in previous patches linked in phabricator) Based on work by: - Luke Geeson - Oliver Stannard - Luke Cheeseman Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin Reviewed By: kmclaughlin Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77871	2020-04-24 15:54:06 +01:00
Piotr Sobczak	7631af3af2	[AMDGPU] Skip generating cache invalidating instructions on AMDPAL Summary: Frontend guarantees that coherent accesses have corresponding cache policy bits set (glc, dlc). Therefore there is no need for extra instructions that invalidate cache. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78800	2020-04-24 13:53:44 +02:00
David Green	a947be51bd	[ARM] Various tests for MVE and FP16 codegen. NFC	2020-04-24 12:11:46 +01:00
Kerry McLaughlin	53dd72a87a	[SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics Summary: This patch maps IR operations for sdiv & udiv to the @llvm.aarch64.sve.[s\|u]div intrinsics. A ptrue must be created during lowering as the div instructions have only a predicated form. Patch contains changes by Andrzej Warzynski. Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78569	2020-04-24 11:38:20 +01:00
Kang Zhang	302e11cd97	[NFC][PowerPC] Fix the liveins for 3 mir test cases	2020-04-24 08:03:02 +00:00
Christudasan Devadasan	207cd5f68f	[AMDGPU] Add the SGPR used for FP copy to block livein lists. The temporary register used for FP copy should be live throughout the function.	2020-04-24 11:47:38 +05:30
Krzysztof Parzyszek	5c7a2cfac1	[Hexagon] Fix result word order when bitcasting vector pred to int64/128	2020-04-23 19:15:11 -05:00
James Y Knight	248a5db3f2	Change callbr to only define its output SSA variable on the normal path, not the indirect targets. Fixes: PR45565. Differential Revision: https://reviews.llvm.org/D78341	2020-04-23 19:36:44 -04:00
aartbik	907871d9ad	[llvm] [CodeGen] Fixed vector halving bug for masked load Summary: Given a VL=14 that is enveloped by a proper VL=16, splitting the masked load using the enveloping halving VL=8/8 should yields should eventually yield V=8/5. This fixes various assert failures in getHalfNumVectorElementsVT() and IncrementMemoryAddress(). Note, I suspect similar fixes will be needed for other masked operations, but for now I send out a fix for masked load only. Bugzilla issue 45563 https://bugs.llvm.org/show_bug.cgi?id=45563 Reviewers: craig.topper, mehdi_amini, nicolasvasilache Reviewed By: craig.topper Subscribers: hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78608	2020-04-23 15:12:44 -07:00
aartbik	e4e187d203	[llvm] [X86] Processed test with update_llc_test_checks Summary: As requested in another review for a similar regression test, I updated this test with the same utility. Reviewers: dmgreen, craig.topper, mehdi_amini, nicolasvasilache Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78739	2020-04-23 15:09:53 -07:00
Sanjay Patel	b53fd70b9e	[x86] add tests for FP->int->FP with different FP types; NFC	2020-04-23 17:07:16 -04:00
Jay Foad	479145a5c2	[AMDGPU] Avoid hard-coded line numbers in error message checks This makes it easier for us to maintain downstream changes to some of these tests. NFC. Differential Revision: https://reviews.llvm.org/D78716	2020-04-23 21:06:09 +01:00
Matt Arsenault	89c8c80bd5	AMDGPU: Change pre-gfx9 implementation of fcanonicalize to mul If f32 denormals were enabled pre-gfx9, we would still try to implement this with v_max_f32. Pre-gfx9, these instructions ignored the denormal mode and did not flush. Switch to the multiply form for f32 as a workaround which should always work in any case. This fixes conformance failures when the library implementation of fmin/fmax were accidentally not inlined, forcing the assumption of no flushing on targets where denormals are not enabled by default. This is a workaround, since really we should not be mixing code with different FP mode expectations, but prefer the lowering that will work in any mode. Now this will always use max to implement canonicalize on gfx9+. This is only really beneficial for f64. For f32/f16 it's a neutral choice (and worse in terms of code size in 1 case), but possibly worse for the compiler since it does add an extra register use operand. Leave this change for later.	2020-04-23 15:24:13 -04:00
Matt Arsenault	5fe3f06596	AMDGPU/GlobalISel: Add new baseline checks for canonicalize	2020-04-23 15:04:32 -04:00
Simon Pilgrim	757c7c244b	[X86][SSE] Add SSE2 extract-concat tests Check pre-SSE41 codegen where we have less PEXTR/PINSR instructions	2020-04-23 19:40:34 +01:00
Krzysztof Parzyszek	d8e1dd8b9b	[Hexagon] Add missing live-in registers in some codegen tests	2020-04-23 10:28:04 -05:00
Jay Foad	0337017a9f	[AMDGPU] Use SGPR instead of SReg classes `12994a70cf` did this for 128-bit classes: SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. This patch extends it to all classes > 64 bits, for consistency. Differential Revision: https://reviews.llvm.org/D78622	2020-04-23 11:45:22 +01:00
Sander de Smalen	a5e0389b2a	[AArch64] Define ACLE FP conversion intrinsics with more specific predicate. This patch changes the FP conversion intrinsics to take a predicate that matches the number of lanes for the vector with the widest element type as opposed to using <vscale x 16 x i1>. For example: ```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f16(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 8 x half>)``` now uses <vscale x 4 x i1> instead of <vscale x 16 x i1> And similar for: ```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f64(<vscale x 4 x float>, <vscale x 2 x i1>, <vscale x 2 x double>)``` where the predicate now matches the wider type, so <vscale x 2 x i1>. Reviewers: efriedma, SjoerdMeijer, paulwalker-arm, rengolin Reviewed By: efriedma Tags: #clang Differential Revision: https://reviews.llvm.org/D78402	2020-04-23 10:53:23 +01:00
Amara Emerson	613f12dd8e	[AArch64][GlobalISel] Set the current debug loc when missing in some cases.	2020-04-23 01:34:57 -07:00
Vedant Kumar	e0b60c6df2	[AArch64CollectLOH] Debug insts should not break LOH collection [14/14] Fix an issue where the presence of debug instructions could break collection of linker optimization hints.	2020-04-22 17:03:41 -07:00
Vedant Kumar	ff8c417d31	[AArch64PreLegalizerCombiner] Fix debug invariance issue in matchFConstantToConstant [13/14] Fix an issue where the FConstantToConstant combine could fail if debug instructions were present.	2020-04-22 17:03:41 -07:00
Vedant Kumar	c2c2dc526a	[AArch64LoadStoreOptimizer] Skip debug insts during pattern matching [12/14] Do not count the presence of debug insts against the limit set by LdStLimit, and allow the optimizer to find matching insts by skipping over debug insts. Differential Revision: https://reviews.llvm.org/D78411	2020-04-22 17:03:40 -07:00
Vedant Kumar	bf4c70b355	[AArch64ConditionOptimizer] Fix missed optimization due to debug insts [11/14] Summary: The findSuitableCompare method can fail if debug instructions are present in the MBB -- fix this by using helpers to skip over debug insts. Reviewers: aemerson, paquette Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78265	2020-04-22 17:03:40 -07:00
Vedant Kumar	78d69e97cc	[AArch64CondBrTuning] Ignore debug insts when scanning for NZCV clobbers [10/14] Summary: This fixes several instances in which condbr optimization was missed due to a debug instruction appearing as a bogus NZCV clobber. Reviewers: aemerson, paquette Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78264	2020-04-22 17:03:40 -07:00
Vedant Kumar	4a51b61cb3	[AArch64] Clean up assorted usage of hasOneUse/use_instructions [9/14] Summary: Use the variants of these APIs which skip over debug instructions. This is mostly a cleanup, but it does fix a debug-variance issue which causes addsub-shifted.ll and addsub_ext.ll to fail when debug info is inserted by -mir-debugify. Reviewers: aemerson, paquette Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, llvm-commits, aprantl Tags: #llvm Differential Revision: https://reviews.llvm.org/D78262	2020-04-22 17:03:40 -07:00
Vedant Kumar	b157974ab3	[AArch64ConditionalCompares] Ignore debug insts in findConvertibleCompare [8/14] Summary: Fix an issue where the presence of debug info could disable the ccmp optimization due to findConvertibleCompare failing too early (the error is "Can't create ccmp with multiple uses", where the "use" is a DBG_VALUE inst). Depends on D78151. Reviewers: t.p.northover, paquette, aemerson Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78156	2020-04-22 17:03:40 -07:00
Vedant Kumar	f0b52beef3	[AArch64InstrInfo] Ignore debug insts in areCFlagsAccessedBetweenInstrs [7/14] Summary: Fix an issue where the presence of debug info could disable a peephole optimization due to areCFlagsAccessedBetweenInstrs returning the wrong result. In test/CodeGen/AArch64/arm64-csel.ll, the issue was found in the function @foo5, in which the first compare could successfully be optimized but not the second. Reviewers: t.p.northover, eastig, paquette Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, dsanders, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78157	2020-04-22 17:03:40 -07:00
Vedant Kumar	26271c8384	[AArch64InstrInfo] Ignore debug insts in canInstrSubstituteCmpInstr [6/14] Summary: Fix an issue where the presence of debug info could disable a peephole optimization in optimizeCompareInstr due to canInstrSubstituteCmpInstr returning the wrong result. Depends on D78137. Reviewers: t.p.northover, eastig, paquette Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits, dsanders Tags: #llvm Differential Revision: https://reviews.llvm.org/D78151	2020-04-22 17:03:40 -07:00
Vedant Kumar	f1a71b5949	[GIsel][LegalizerHelper] Account for debug insts when creating mem libcalls [5/14] Summary: While lowering memory intrinsics, GIsel attempts to form a tail call to a library routine. There might be a DBG_LABEL or something after the intrinsic call, though: in that case, GIsel should still be able to form the tail call, and should also delete the debug insts after the tail call as the transform makes them invalid. Reviewers: dsanders, aemerson Subscribers: hiraditya, aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78335	2020-04-22 17:03:40 -07:00
Vedant Kumar	ba9db54505	[GIsel][CombinerHelper] Fix for missed ElideBrByInvertingCond/CombineIndexedLoadStore combines [4/14] Summary: Fix an issue which could result in ElideBrByInvertingCond or CombineIndexedLoadStore being missed when debug info is present. In both cases the fix is s/hasOneUse/hasOneNonDbgUse/. Reviewers: aemerson, dsanders Subscribers: hiraditya, aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78254	2020-04-22 17:03:40 -07:00
Vedant Kumar	5c04274dab	[GIsel][CombinerHelper] Don't consider debug insts in dominance queries [3/14] Summary: This fixes several issues where the presence of debug instructions could disable certain combines, due to dominance queries finding uses/defs that don't actually exist. Reviewers: dsanders, fhahn, paquette, aemerson Subscribers: hiraditya, arphaman, aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78253	2020-04-22 17:03:40 -07:00
Vedant Kumar	5bae277584	[GISel][RegBankSelect] Hide assertion failure from LLT::getScalarSizeInBits [2/14] Summary: It looks like RegBankSelect can try to assign a bank based on a DBG_VALUE instead of ignoring it. This eventually leads to an assert in AArch64RegisterBankInfo::getInstrMapping because there is some info missing from the DBG_VALUE MachineOperand (I see: `Assertion failed: (RawData != 0 && "Invalid Type"), function getScalarSizeInBits`). I'm not 100% sure it's safe to insert DBG_VALUE instructions right before RegBankSelect (that's what -debugify-and-strip-all-safe is doing). Any advice appreciated. Depends on D78135. Reviewers: ab, qcolombet, dsanders, aprantl Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78137	2020-04-22 17:03:39 -07:00
Vedant Kumar	6b58018c05	[ARM] Mark some tests as not safe for -debugify-and-strip-all, NFC These tests contain debug instructions which get checked, so we can't insert synthetic debug info and expect the tests to pass. The rest of the ARM backend tests appear to be fair game.	2020-04-22 17:03:39 -07:00
Vedant Kumar	2a5675f11d	[MachineDebugify] Insert synthetic DBG_VALUE instructions Summary: Teach MachineDebugify how to insert DBG_VALUE instructions. This can help find bugs causing CodeGen differences when debug info is present. DBG_VALUE instructions are only emitted when -debugify-level is set to locations+variables. There is essentially no attempt made to match up DBG_VALUE register operands with the local variables they ought to correspond to. I'm not sure how to improve the situation. In some cases (MachineMemOperand?) it's possible to find the IR instruction a MachineInstr corresponds to, but in general this seems to call for "undoing" the work done by ISel. Reviewers: dsanders, aprantl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78135	2020-04-22 17:03:39 -07:00
Eli Friedman	1a78b0bd38	[MachineOutliner] Teach outliner to set live-ins Preserving liveness can be useful even late in the pipeline, if we're doing substantial optimization work afterwards. (See, for example, D76065.) Teach MachineOutliner how to correctly set live-ins on the basic block in outlined functions. Differential Revision: https://reviews.llvm.org/D78605	2020-04-22 14:19:26 -07:00
Victor Huang	a60ca4b4e9	[PowerPC][Future] Initial support for PCRel addressing to get block address Add initial support for PCRelative addressing to get block address instead of using TOC. Differential Revision: https://reviews.llvm.org/D76294	2020-04-22 15:01:29 -05:00
Puyan Lotfi	264c07ef77	[llvm][MIRVRegNamer] Avoid collisions across jump table indices. Hash Jump Table Indices uniquely within a basic block for MIR Canonicalizer / MIR VReg Renamer passes. Differential Revision: https://reviews.llvm.org/D77966	2020-04-22 14:58:44 -04:00
David Green	eecba95067	[ARM] Replace arm vendor with none. NFC	2020-04-22 18:19:35 +01:00
Victor Huang	02141a17ae	[PowerPC][Future] Remove redundant r2 save and restore for indirect call Currently an indirect call produces the following sequence on PCRelative mode: extern void function( ); extern void (ptrfunc) ( ); void g() { ptrfunc=function; } void f() { (ptrfunc) ( ); } Producing paddi 3, 0, .LC0@PCREL, 1 ld 3, 0(3) std 2, 24(1) ld 12, 0(3) mtctr 12 bctrl ld 2, 24(1) Though the caller does not use or preserve r2, it is still saved and restored across a function call. This patch is added to remove these redundant save and restores for indirect calls. Differential Revision: https://reviews.llvm.org/D77749	2020-04-22 12:05:51 -05:00
Victor Huang	43abef06f4	[PowerPC][Future] Initial support for PCRel addressing for jump tables. Add initial support for PC Relative addressing to get jump table base address instead of using TOC. Differential Revision: https://reviews.llvm.org/D75931	2020-04-22 10:45:01 -05:00
John Brawn	8211cfb7c8	[ARM] Don't shrink STM if it would cause an unknown base register store If a 16-bit thumb STM with writeback stores the base register but it isn't the first register in the list, then an unknown value is stored. The load/store optimizer knows this and generates a 32-bit STM without writeback instead, but thumb2 size reduction converts it into a 16-bit STM. Fix this by having thumb2 size reduction notice such STMs and leave them as they are. Differential Revision: https://reviews.llvm.org/D78493	2020-04-22 14:50:42 +01:00
David Green	892af45c86	[ARM] Distribute MVE post-increments This adds some extra processing into the Pre-RA ARM load/store optimizer to detect and merge MVE loads/stores and adds of the same base. This we don't always turn into a post-inc during ISel, and due to the nature of it being a graph we don't always know an order to use for the nodes, not knowing which nodes to make post-inc and which to use the new post-inc of. After ISel, we have an order that we can use to post-inc the following instructions. So this looks for a loads/store with a starting offset of 0, and an add/sub from the same base, plus a number of other loads/stores. We then do some checks and convert the zero offset load/store into a postinc variant. Any loads/stores after it have the offset subtracted from their immediates. For example: LDR #4 LDR #4 LDR #0 LDR_POSTINC #16 LDR #8 LDR #-8 LDR #12 LDR #-4 ADD #16 It only handles MVE loads/stores at the moment. Normal loads/store will be added in a followup patch, they just have some extra details to ensure that we keep generating LDRD/LDM successfully. Differential Revision: https://reviews.llvm.org/D77813	2020-04-22 14:16:51 +01:00
David Green	48ac4e6938	[ARM] MVE FMA loop tests. NFC	2020-04-22 13:27:40 +01:00
Jay Foad	dbdffe3ee9	[AMDGPU] Add 192-bit register classes Differential Revision: https://reviews.llvm.org/D78312	2020-04-22 13:10:37 +01:00
Lucas Prates	727e6fb84a	[NFC][llvm][X86] Adding missing -mtiple to X86 test. The modified test was missing the specification of the intended triple in its run line, assuming X86 is the default.	2020-04-22 11:55:57 +01:00
Kerry McLaughlin	17f6e18acf	[AArch64][SVE] Add SVE intrinsic for LD1RQ Summary: Adds the following intrinsic for contiguous load & replicate: - @llvm.aarch64.sve.ld1rq The LD1RQ intrinsic only needs the SImmS16XForm added by this patch. The others (SImmS2XForm, SImmS3XForm & SImmS4XForm) were added for consistency. Reviewers: andwar, sdesmalen, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76929	2020-04-22 11:29:27 +01:00
aartbik	5397f29087	[llvm] [X86] Make test more robust against different builds Summary: Rationale: Using the --debug-only flag requires a debug build. Also, the debug output is not always consistent over different builds. This change avoids all problems by just testing the generated assembly for AVX. Reviewers: craig.topper, mehdi_amini, nicolasvasilache Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78609	2020-04-22 00:23:46 -07:00
Qiu Chaofan	c12722cde8	[PowerPC] Exploit RLDIMI for OR with large immediates This patch exploits rldimi instruction for patterns like `or %a, 0b000011110000`, which saves number of instructions when the operand has only one use, compared with `li-ori-sldi-or`. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D77850	2020-04-22 14:16:52 +08:00
Eli Friedman	704293b168	[ARM] Fix MIR tests with invalid live-ins. A register can't be live if it isn't defined; fix issues in various testcases. Differential Revision: https://reviews.llvm.org/D78529	2020-04-21 12:13:35 -07:00
Eli Friedman	b4b9faa120	[AArch64] Fix MIR tests with invalid live-ins. A register can't be live if it isn't defined; fix issues in various testcases. Differential Revision: https://reviews.llvm.org/D78531	2020-04-21 12:13:32 -07:00
aartbik	8387bee94d	[llvm] [X86] Fixed type bug in vselect for AVX masked load Summary: Bugzilla issue 45563 https://bugs.llvm.org/show_bug.cgi?id=45563 Reviewers: nicolasvasilache, mehdi_amini, craig.topper Reviewed By: craig.topper Subscribers: RKSimon, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78527	2020-04-21 11:11:35 -07:00
Pavel Iliin	be881e2831	[AArch64] FMLA/FMLS patterns improvement. FMLA/FMLS f16 indexed patterns added. Fixes https://bugs.llvm.org/show_bug.cgi?id=45467 Removed redundant v2f32 vector_extract indexed pattern since Instruction Selection is able to match v4f32 instead.	2020-04-21 18:23:21 +01:00
Fangrui Song	5771c98562	[XRay] Change xray_instr_map sled addresses from absolute to PC relative for x86-64 xray_instr_map contains absolute addresses of sleds, which are relocated by `R_*_RELATIVE` when linked in -pie or -shared mode. By making these addresses relative to PC, we can avoid the dynamic relocations and remove the SHF_WRITE flag from xray_instr_map. We can thus save VM pages containg xray_instr_map (because they are not modified). This patch changes x86-64 and bumps the sled version to 2. Subsequent changes will change powerpc64le and AArch64. Reviewed By: dberris, ianlevesque Differential Revision: https://reviews.llvm.org/D78082	2020-04-21 09:36:09 -07:00
Stefan Pintilie	a92ee77d85	[PowerPC][Future] Add offsets to PC Relative relocations. This is an optimization that applies to global addresses and allows for the following transformation: Convert this: paddi r3, 0, symbol@PCREL, 1 ld r4, 8(r3) To this: pld r4, symbol@PCREL+8(0), 1 An instruction is saved and the linker can do the addition when the symbol is resolved. Differential Revision: https://reviews.llvm.org/D76160	2020-04-21 11:08:19 -05:00
Kang Zhang	e477915bfe	[PowerPC] Add a new test case expand-isel-liveness.mir	2020-04-21 16:00:34 +00:00
Sean Fertile	cd8e9e8fcd	[PowerPC][AIX][NFC] Fix use of FileCheck variable in lit test.	2020-04-21 10:56:46 -04:00
Pavel Iliin	c2dd38f1cb	[AArch64][NFC] One more intrinsic test.	2020-04-21 15:20:07 +01:00
Kerry McLaughlin	0df40d6ef8	[AArch64][SVE] Add addressing mode for contiguous loads & stores Summary: This patch adds the register + register addressing mode for SVE contiguous load and store intrinsics (LD1 & ST1) Reviewers: sdesmalen, fpetrogalli, efriedma, rengolin Reviewed By: fpetrogalli Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78509	2020-04-21 12:04:43 +01:00
Sam Parker	27d19101e9	[ARM][ParallelDSP] Handle squaring multiplies The logic in ARMParallelDSP is setup to merge two 16-bits loads into a 32-bit load and feed them into the smlads. This requires that four loads are combined for the four inputs, but there wasn't actually a check for this. Differential Revision: https://reviews.llvm.org/D78492	2020-04-21 08:39:56 +01:00
Yonghong Song	3cb7e7bf95	BPF: fix a CORE optimization bug For the test case in this patch like below struct t { int a; } __attribute__((preserve_access_index)); int foo(void ); int test(struct t arg) { long param[1]; param[0] = (long)&arg->a; return foo(param); } The IR right before BPF SimplifyPatchable phase: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %2:gpr = LDD killed %1:gpr, 0 %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr STD killed %3:gpr, %stack.0.param, 0 After SimplifyPatchable phase, the incorrect IR is generated: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0" Note that CORE_MEM pseudo op is introduced to encode memory operations related to CORE. In the above, we intend to check whether we have a store like (%3:gpr + 0) = ... and if this is the case, we could replace it with (%0:gpr + @"llvm.t:0:0$0:0"_ = ... Unfortunately, in the above, IR for the store is *(%stack.0.param + 0) = %3:gpr and transformation should not happen. Note that we won't have problem if the actual CORE dereference (arg->a) happens. This patch fixed the problem by skip CORE optimization if the use of ADD_rr result is not the base address of the store operation. Differential Revision: https://reviews.llvm.org/D78466	2020-04-20 19:54:51 -07:00
Pavel Iliin	6e22a1e5c4	[AArch64][NFC] More intrinsic tests.	2020-04-21 00:00:26 +01:00
David Green	ce1840a90a	[ARM] MVE and scalar postinc mir tests. NFC	2020-04-20 22:00:07 +01:00
Piotr Sobczak	c48ceaf37b	Revert "[AMDGPU] Set the CostPerUse value for vgpr registers." This reverts commit `728b878de6`. D76417 has caused vgpr count to go up significantly in real-world graphics content.	2020-04-20 22:47:31 +02:00
Andrew Litteken	1488bef8fc	[MachineOutliner] Annotation for outlined functions in AArch64 - Adding changes to support comments on outlined functions with outlining for the conditions through which it was outlined (e.g. Thunks, Tail calls) - Adapts the emitFunctionHeader to print out a comment next to the header if the target specifies it based on information in MachineFunctionInfo - Adds mir test for function annotiation Differential Revision: https://reviews.llvm.org/D78062	2020-04-20 13:33:31 -07:00
Chris Bowler	ff048af2e3	[NFC] [AIX] [PowerPC] Add missing instruction to AIX byval test	2020-04-20 15:00:59 -04:00
David Tenty	0098324947	[AIX] Return the correct set of callee saved regs Summary: r13 isn't reserved on 32-bit AIX, which is reflected in our calling convention but not callee saved regs. Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu Reviewed By: sfertile Subscribers: thakis, lei, wuzish, nemanjai, hiraditya, kbarton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77101	2020-04-20 14:31:08 -04:00
Nemanja Ivanovic	64b31d96df	[PowerPC] Do not attempt to reuse load for 64-bit FP_TO_UINT without FPCVT We call the function that attempts to reuse the conversion without checking whether the target matches the constraints that the callee expects. This patch adds the check prior to the call. Fixes: https://bugs.llvm.org/show_bug.cgi?id=43976 Differential revision: https://reviews.llvm.org/D77564	2020-04-20 13:00:06 -05:00
David Green	460202b464	[ARM] Add an low overhead sibling loop test. NFC	2020-04-20 18:46:38 +01:00
Sean Fertile	8541a3cc9d	[PowerPC][AIX] Use a file check variable for register used in addressing.	2020-04-20 13:08:09 -04:00
David Tenty	28ae1969dc	Revert "[AIX] Return the correct set of callee saved regs" This reverts commit `6c881bf1fe`.	2020-04-20 13:06:37 -04:00
Sean Fertile	d52bb6d099	[PowerPC][AIX] ByVal formal argument support: passing on the stack. Adds support for passing a ByVal formal argument completely on the stack (ie after all argument registers are exhausted). Differential Revision: https://reviews.llvm.org/D78263	2020-04-20 12:04:59 -04:00
David Tenty	6c881bf1fe	[AIX] Return the correct set of callee saved regs Summary: r13 isn't reserved on 32-bit AIX, which is reflected in our calling convention but not callee saved regs. Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu Reviewed By: sfertile Subscribers: lei, wuzish, nemanjai, hiraditya, kbarton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77101	2020-04-20 11:22:17 -04:00
Petre-Ionut Tudor	52474992b1	Revert "[ARM] Fix conditions for lowering to S[LR]I" This reverts commit `cabfcf840a`. The patch introduced another bug in the optimization.	2020-04-20 16:11:04 +01:00
Ayke van Laethem	8aad119d93	[AVR] Do not place functions in .progmem.data Previously, the AVR backend would put functions in .progmem.data. This is probably a regression from when functions still lived in address space 0. With this change, only global constants are placed in .progmem.data. This is not complete: avr-gcc additionally respects -fdata-sections for progmem global constants, which LLVM doesn't yet do. But fixing that is a bit more complicated (and I believe other backends such as RISC-V might also have similar issues). Differential Revision: https://reviews.llvm.org/D78212	2020-04-20 13:56:38 +02:00
Ayke van Laethem	9505b5cb66	[AVR] Do not use divmod calls for bigger integers The avr-libc provides divmodqi4, divmodhi4, and divmodsi4 functions, but does not provide a divmoddi4. Instead it provides regular divdi3 and moddi3 functions. Note that avr-libc doesn't support divti3 or modti3 for 128-bit integer manipulation. Source: https://github.com/gcc-mirror/gcc/blob/releases/gcc-5.4.0/libgcc/config/avr/lib1funcs.S Differential Revision: https://reviews.llvm.org/D78437	2020-04-20 13:56:38 +02:00
Kerry McLaughlin	33ffce5414	[AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store Summary: The SVE masked load and store intrinsics introduced in D76688 rely on common llvm.masked.load/store nodes. This patch creates new ISD nodes for LD1(S) & ST1 to remove this dependency. Additionally, this adds support for sign & zero extending loads and truncating stores. Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78204	2020-04-20 11:08:11 +01:00
Sam Parker	62f97123fb	[ARM][MVE] Add patterns for VRHADD Add patterns which use standard add nodes along with arm vshr imm nodes. Differential Revision: https://reviews.llvm.org/D77069	2020-04-20 10:05:21 +01:00
Kang Zhang	a8e15ee04a	[CodeGen] Support freeze expand for ppc_fp128 Summary: The patch D29014 has added the new ISD::FREEZE and can deal with the integer. The patch D76980 has added SoftenFloatRes_FREEZE for float point. But we still lack of expand for ppc_fp128, this will cause assertion for some cases. This patch is to support freeze expand for ppc_fp128. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D78278	2020-04-20 07:27:41 +00:00
Xiang1 Zhang	0980038a5e	Handle CET for -exception-model sjlj Summary: In SjLj exception mode, the old landingpad BB will create a new landingpad BB and use indirect branch jump to the old landingpad BB in lowering. So we should add 2 endbr for this exception model. Reviewers: hjl.tools, craig.topper, annita.zhang, LuoYuanke, pengfei, efriedma Reviewed By: LuoYuanke Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77124	2020-04-20 11:13:40 +08:00
David Green	a0b1616359	[ARM] Regenerate tests. NFC	2020-04-19 13:45:39 +01:00
Simon Pilgrim	e71dd7c011	[X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604) getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain. PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zero'd which we don't currently do. To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.	2020-04-19 13:35:22 +01:00
Sanjay Patel	cceb630a07	[x86] use vector instructions to lower more FP->int->FP casts This is an enhancement to D77895 to avoid another round-trip from XMM->GPR->XMM. This time we handle the case of starting/ending with an f64 and casting to signed i32 as the intermediate value. It's a bit more involved than I initially assumed because we need to use target-specific opcodes to represent the non-standard cast ops. Differential Revision: https://reviews.llvm.org/D78362	2020-04-19 08:33:17 -04:00
Simon Pilgrim	d6db919bee	[X86][SSE] Add test case for PR45604	2020-04-19 13:13:54 +01:00
LemonBoy	a5d161c119	[PowerPC] Don't use rldicl for PPC32 According to https://www.ibm.com/support/knowledgecenter/ssw_aix_72/assembler/idalangref_rldicl_rletdw_instrs.html rldicl should not be used when targeting 32bit CPUs. Reviewed By: #powerpc, nemanjai, MaskRay Differential Revision: https://reviews.llvm.org/D77946	2020-04-18 17:24:25 -07:00
Andrew Litteken	8d5024f7fe	fix to outline cfi instruction when can be grouped in a tail call [MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining New test to check that we only outline CFI instruction if all CFI Instructions in the function would be captured by the outlining adding x86 tests analagous to AARCH64 cfi tests Revision: https://reviews.llvm.org/D77852	2020-04-17 22:26:34 -07:00
Craig Topper	31a166e4cb	[X86] Clean up some mir tests with INLINEASM to avoid regdef or to correct the immediate for the regdef. The immediate used for the regdef is the encoding for the register class in the enum generated by tablegen. This encoding will change any time a new register class is added. Since the number is part of the input, this means it can become stale. This change modifies some test to avoid this kind of immediate all together. And updates one test to use the current encoding of GR64.	2020-04-17 21:55:44 -07:00
Jessica Paquette	66037b84cf	MachineFunctionInfo for AArch64 in MIR Starting with hasRedZone adding MachineFunctionInfo to be put in the YAML for MIR files. Split out of: D78062 Based on implementation for MachineFunctionInfo for WebAssembly Differential Revision: https://reviews.llvm.org/D78173 Patch by Andrew Litteken! (AndrewLitteken)	2020-04-17 15:16:59 -07:00
Craig Topper	5f69e53e55	[X86] Remove single incoming value phis from tests for the loop SAD pattern. NFC InstCombine should ensure these don't exist. I'm looking at making some changes to how we detect these patterns and not having to worry about these phis will help.	2020-04-17 13:39:47 -07:00
Francesco Petrogalli	897fdec586	[llvm][CodeGen] Addressing modes for SVE stN. This reverts commit `17b1869b72`. It is an attempt to fix the failure reported at The patch differs from the original one reviwed at https://reviews.llvm.org/D77435 only for the use of the std::make_tuple in building the return value of `findAddrModeSVELoadStore`: - return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; + return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase, the original patch submitted at `fc4e954ed5` was failing the following build: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/ with error: /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ 1 error generated.	2020-04-17 20:35:35 +01:00
Francesco Petrogalli	17b1869b72	Revert "[llvm][CodeGen] Addressing modes for SVE stN." This reverts commit `fc4e954ed5`. The commit reported the following failure: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420 FAILED: lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o /usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64 -I/usr/include/libxml2 -Iinclude -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/include -mthumb -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -fvisibility=hidden -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -c /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements)	2020-04-17 20:03:11 +01:00
Stanislav Mekhanoshin	992fbce4e9	[AMDGPU] copyPhysReg() for 16 bit SGPR subregs Differential Revision: https://reviews.llvm.org/D78255	2020-04-17 11:59:39 -07:00
Stanislav Mekhanoshin	fde2aefa22	[AMDGPU] Use SDWA for 16 bit subreg copy This simplifies the logic and allows to use it on GFX8. Differential Revision: https://reviews.llvm.org/D78150	2020-04-17 11:45:44 -07:00
Francesco Petrogalli	fc4e954ed5	[llvm][CodeGen] Addressing modes for SVE stN. Reviewers: efriedma, sdesmalen, c-rhodes, ctetreau Reviewed By: c-rhodes Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77435	2020-04-17 19:31:44 +01:00
Francesco Petrogalli	48879c02bf	[llvm][CodeGen] Fix issue for SVE gather prefetch. Summary: This change is fixing an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices), instead of unscaled offsets. Those addressing modes do not exist for `prfh` , `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction. FWIW, GCC also emits a `prfb` for these cases. Reviewers: sdesmalen, andwar, rengolin Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78069	2020-04-17 19:23:28 +01:00
Petre-Ionut Tudor	cabfcf840a	[ARM] Fix conditions for lowering to S[LR]I Summary: Fixed wrong conditions for generating (S[LR]I X, Y, C2) from (or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The optimisation is also enabled by default now. Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77387	2020-04-17 17:19:24 +01:00
Stefan Pintilie	b771c4a842	[PowerPC][Future] More support for PCRel addressing for global values Add initial support for PC Relative addressing for global values that require GOT indirect addressing. This patch adds PCRelative support for global addresses that may not be known at link time and may require access through the GOT. Differential Revision: https://reviews.llvm.org/D76064	2020-04-17 11:06:13 -05:00
Dominik Montada	55e3a7c6b2	[GlobalISel][AMDGPU] add legalization for G_FREEZE Summary: Copy the legalization rules from SelectionDAG: -widenScalar using anyext -narrowScalar using intermediate merges -scalarize/fewerElements using unmerge -moreElements using G_IMPLICIT_DEF and insert Add G_FREEZE legalization actions to AMDGPULegalizerInfo. Use the same legalization actions as G_IMPLICIT_DEF. Depends on D77795. Reviewers: dsanders, arsenm, aqjune, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson Reviewed By: arsenm Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, volkan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78092	2020-04-17 16:44:46 +02:00
jasonliu	77618cc237	[XCOFF][AIX] Fix getSymbol to return the correct qualname when necessary Summary: AIX symbol have qualname and unqualified name. The stock getSymbol could only return unqualified name, which leads us to patch many caller side(lowerConstant, getMCSymbolForTOCPseudoMO). So we should try to address this problem in the callee side(getSymbol) and clean up the caller side instead. Note: this is a "mostly" NFC patch, with a fix for the original lowerConstant behavior. Differential Revision: https://reviews.llvm.org/D78045	2020-04-17 13:45:14 +00:00
Sanjay Patel	a6fc687e34	[x86] add/adjust tests for FP<->int casts; NFC	2020-04-17 08:22:42 -04:00
Roger Ferrer Ibanez	5f23686412	[RISCV][AsmParser] Implement .option (no)pic Differential Revision: https://reviews.llvm.org/D77867	2020-04-17 12:08:30 +00:00
Sam Parker	f88000a4b5	[ARM][MVE] Add VHADD and VHSUB patterns Add patterns that use a normal, non-wrapping, add and sub nodes along with an arm vshr imm node. Differential Revision: https://reviews.llvm.org/D77065	2020-04-17 07:45:15 +01:00
QingShan Zhang	4bd186c0ff	[PowerPC] Exploit the rldicl + rldicl when and with mask If we are and the constant like 0xFFFFFFC00000, for now, we are using several instructions to generate this 48bit constant and final an "and". However, we could exploit it with two rotate instructions. MB ME MB+63-ME +----------------------+ +----------------------+ \|0000001111111111111000\| -> \|0000000001111111111111\| +----------------------+ +----------------------+ 0 63 0 63 Rotate left ME + 1 bit first, and then, mask it with (MB + 63 - ME, 63), finally, rotate back. Notice that, we need to round it with 64 bit for the wrapping case. Reviewed by: ChenZheng, Nemanjai Differential Revision: https://reviews.llvm.org/D71831	2020-04-17 05:24:00 +00:00
Craig Topper	944cc5e0ab	[SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP. I've always found the "findValue" a little odd and inconsistent with other things in SDB. This simplfifies the code in SDB to just handle a splat constant address or a 2 operand GEP in the same BB. This removes the need for "findValue" since the operands to the GEP are guaranteed to be available. The splat constant handling is new, but was needed to avoid regressions due to constant folding combining GEPs created in CGP. CGP is now responsible for canonicalizing gather/scatters into this form. The pattern I'm using for scalarizing, a scalar GEP followed by a GEP with an all zeroes index, seems to be subject to constant folding that the insertelement+shufflevector was not. Differential Revision: https://reviews.llvm.org/D76947	2020-04-16 17:49:22 -07:00
Wouter van Oortmerssen	48139ebc3a	[WebAssembly] Add int32 DW_OP_WASM_location variant This to allow us to add reloctable global indices as a symbol. Also adds R_WASM_GLOBAL_INDEX_I32 relocation type to support it. See discussion in https://github.com/WebAssembly/debugging/issues/12	2020-04-16 16:32:17 -07:00
Sanjay Patel	b29fca30fa	[x86] auto-generate complete test checks; NFC	2020-04-16 17:16:51 -04:00
David Green	94052da929	[ARM] MVE postinc tests. NFC	2020-04-16 22:05:28 +01:00
David Green	8e8c3c3408	[ARM] Mir test for machine sinking multiple def instructions. NFC	2020-04-16 20:58:14 +01:00
bd1976llvm	86478d3de9	[MC][ELF] Put explicit section name symbols into entry size compatible sections Ensure that symbols explicitly* assigned a section name are placed into a section with a compatible entry size. This is done by creating multiple sections with the same name** if incompatible symbols are explicitly given the name of an incompatible section, whilst: - Avoiding using uniqued sections where possible (for readability and to maximize compatibly with assemblers). - Creating as few SHF_MERGE sections as possible (for efficiency). Given that each symbol is assigned to a section in a single pass, we must decide which section each symbol is assigned to without seeing the properties of all symbols. A stable and easy to understand assignment is desirable. The following rules facilitate this: The "generic" section for a given section name will be mergeable if the name is a mergeable "default" section name (such as .debug_str), a mergeable "implicit" section name (such as .rodata.str2.2), or MC has already created a mergeable "generic" section for the given section name (e.g. in response to a section directive in inline assembly). Otherwise, the "generic" section for a given name is non-mergeable; and, non-mergeable symbols are assigned to the "generic" section, while mergeable symbols are assigned to uniqued sections. Terminology: "default" sections are those always created by MC initially, e.g. .text or .debug_str. "implicit" sections are those created normally by MC in response to the symbols that it encounters, i.e. in the absence of an explicit section name assignment on the symbol, e.g. a function foo might be placed into a .text.foo section. "generic" sections are those that are referred to when a unique section ID is not supplied, e.g. if there are multiple unique .bob sections then ".quad .bob" will reference the generic .bob section. Typically, the generic section is just the first section of a given name to be created. Default sections are always generic. * Typically, section names might be explicitly assigned in source code using a language extension e.g. a section attribute: _attribute_ ((section ("section-name"))) - https://clang.llvm.org/docs/AttributeReference.html ** I refer to such sections as unique/uniqued sections. In assembly the ", unique," assembly syntax is used to express such sections. Fixes https://bugs.llvm.org/show_bug.cgi?id=43457. See https://reviews.llvm.org/D68101 for previous discussions leading to this patch. Some minor fixes were required to LLVM's tests, for tests had been using the old behavior - which allowed for explicitly assigning globals with incompatible entry sizes to a section. This fix relies on the ",unique ," assembly feature. This feature is not available until bintuils version 2.35 (https://sourceware.org/bugzilla/show_bug.cgi?id=25380). If the integrated assembler is not being used then we avoid using this feature for compatibility and instead try to place mergeable symbols into non-mergeable sections or issue an error otherwise. Differential Revision: https://reviews.llvm.org/D72194	2020-04-16 19:12:49 +00:00
Jaydeep Chauhan	561cb14e74	[LLVM] Remove wrong DBG_VALUE instruction with one operand in AArch64 test case Summary: AArch64 test case llvm/test/CodeGen/AArch64/branch-target-enforcement.mir is checking for invalid DBG_VALUE instruction with one operand(`DBG_VALUE $lr`). And this DBG_VALUE instruction is echoed from test case it self only. Correct format of DBG_VALUE is given in below link: https://llvm.org/docs/SourceLevelDebugging.html#variable-locations-in-instruction-selection-and-mir Reviewers: dsanders, eli.friedman, jmorse, vsk Reviewed By: dsanders Subscribers: kristof.beyls, danielkiss, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78309	2020-04-16 11:58:07 -07:00
Cameron McInally	1223255c2d	[AArch64][SVE] Add DestructiveBinaryImm SQSHLU patterns. Add DestructiveBinaryImm SQSHLU patterns and tests. These patterns allow the SQSHLU instruction to match with a MOVPRFX. Differential Revision: https://reviews.llvm.org/D76728	2020-04-16 13:48:08 -05:00
Stefan Pintilie	18b6050324	[PowerPC][Future] Initial support for PC Relative addressing for global values This patch adds PC Relative support for global values that are known at link time. If a global value requires access through the global offset table (GOT) it is not covered in this patch. Differential Revision: https://reviews.llvm.org/D75280	2020-04-16 12:45:22 -05:00

... 3 4 5 6 7 ...

34024 Commits