llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	4214ca9614	[X86][AVX] Attempt to fold vpermf128(op(x,i),op(y,i)) -> op(vpermf128(x,y),i) If vpermf128/vpermi128 is acting on 2 similar 'inlane' ops, then try to perform the vpermf128 first which will allow us to merge the ops. This will help us fix one of the regressions in D56387	2021-01-11 16:59:25 +00:00
Paul Robinson	c161775dec	[FastISel] Flush local value map on every instruction Local values are constants or addresses that can't be folded into the instruction that uses them. FastISel materializes these in a "local value" area that always dominates the current insertion point, to try to avoid materializing these values more than once (per block). https://reviews.llvm.org/D43093 added code to sink these local value instructions to their first use, which has two beneficial effects. One, it is likely to avoid some unnecessary spills and reloads; two, it allows us to attach the debug location of the user to the local value instruction. The latter effect can improve the debugging experience for debuggers with a "set next statement" feature, such as the Visual Studio debugger and PS4 debugger, because instructions to set up constants for a given statement will be associated with the appropriate source line. There are also some constants (primarily addresses) that could be produced by no-op casts or GEP instructions; the main difference from "local value" instructions is that these are values from separate IR instructions, and therefore could have multiple users across multiple basic blocks. D43093 avoided sinking these, even though they were emitted to the same "local value" area as the other instructions. The patch comment for D43093 states: Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. This patch undoes most of D43093, and instead flushes the local value map after(*) every IR instruction, using that instruction's debug location. This avoids sometimes incorrect locations used previously, and emits instructions in a more natural order. In addition, constants materialized due to PHI instructions are not assigned a debug location immediately; instead, when the local value map is flushed, if the first local value instruction has no debug location, it is given the same location as the first non-local-value-map instruction. This prevents PHIs from introducing unattributed instructions, which would either be implicitly attributed to the location for the preceding IR instruction, or given line 0 if they are at the beginning of a machine basic block. Neither of those consequences is good for debugging. This does mean materialized values are not re-used across IR instruction boundaries; however, only about 5% of those values were reused in an experimental self-build of clang. (*) Actually, just prior to the next instruction. It seems like it would be cleaner the other way, but I was having trouble getting that to work. This reapplies commits `cf1c774d` and `dc35368c`, and adds the modification to PHI handling, which should avoid problems with debugging under gdb. Differential Revision: https://reviews.llvm.org/D91734	2021-01-11 08:32:36 -08:00
Paul Robinson	e5eb5c8a7f	NFC: Use -LABEL more There were a number of tests needing updates for D91734, and I added a bunch of LABEL directives to help track down where those had to go. These directives are an improvement independent of the functional patch, so I'm committing them as their own separate patch.	2021-01-11 08:14:58 -08:00
Simon Pilgrim	a0f82749f4	[X86] Extend lzcnt-cmp tests to test on non-lzcnt targets	2021-01-11 15:27:08 +00:00
Simon Pilgrim	a46982a255	[X86] Add nounwind to lzcnt-cmp tests Remove unnecessary cfi markup	2021-01-11 15:06:38 +00:00
Joe Ellis	007358239d	[DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR This avoids TypeSize-/ElementCount-related warnings. Differential Revision: https://reviews.llvm.org/D92747	2021-01-11 14:15:11 +00:00
Jay Foad	6dcf9207df	[AMDGPU] Fix a urem combine test to test what it was supposed to	2021-01-11 13:32:34 +00:00
Simon Pilgrim	8112a2598c	[X86][SSE] Add 'vectorized sum' test patterns These are often generated when building a vector from the reduction sums of independent vectors. I've implemented some typical patterns from various v4f32/v4i32 based off current codegen emitted from the vectorizers, although these tests are more about tweaking some hadd style backend folds to handle whatever the vectorizers/vectorcombine throws at us...	2021-01-11 12:51:18 +00:00
Kazushi (Jam) Marukawa	d02de13932	[VE] Support additional VMRGW and VMV intrinsic instructions Support missing VMRGW and VMV intrinsic instructions and add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D94300	2021-01-11 20:50:31 +09:00
Kazushi (Jam) Marukawa	b72ca79982	[VE] Support intrinsic to insert/extract_subreg of v512i1 Support insert/extract_subreg intrinsic instructions for v512i1 registers and add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D94298	2021-01-11 20:40:10 +09:00
Simon Pilgrim	5963229266	[X86][SSE] Add missing SSE test coverage for permute(hop,hop) folds Should help avoid bugs like reported in rG80dee7965dff	2021-01-11 11:29:04 +00:00
Simon Pilgrim	41bf338dd1	Revert rGd43a264a5dd3 "Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())"" This reapplies commit rG80dee7965dffdfb866afa9d74f3a4a97453708b2. [X86][SSE] Fold unpack(hop(),hop()) -> permute(hop()) UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result. REAPPLIED with a fix for unary unpacks of HOP.	2021-01-11 11:29:04 +00:00
Kerry McLaughlin	c37f68a888	[SVE][CodeGen] Fix legalisation of floating-point masked gathers Changes in this patch: - When lowering floating-point masked gathers, cast the result of the gather back to the original type with reinterpret_cast before returning. - Added patterns for reinterpret_casts from integer to floating point, and concat_vector patterns for bfloat16. - Tests for various legalisation scenarios with floating point types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D94171	2021-01-11 10:57:46 +00:00
Luo, Yuanke	c5be0e0cc0	[X86] Fix tile register spill issue. The tile register spill needs 2 instructions. %46:gr64_nosp = MOV64ri 64 TILESTORED %stack.2, 1, killed %46:gr64_nosp, 0, $noreg, %43:tile The first instruction loads the stride into a GPR, and the second instruction stores the tile register to the stack slot. The optimization that merges spill instructions is done after register allocation. Spilling a tile register needs to create a new virtual register for the stride, so we can't hoist the tile spill instruction in postOptimization() of register allocation. We can't hoist TILESTORED alone and we can't hoist the 2 instructions together because MOV64ri will clobber some GPR. This patch disables the spill merge for any spill which needs 2 instructions. Differential Revision: https://reviews.llvm.org/D93898	2021-01-11 18:35:09 +08:00
QingShan Zhang	7539c75bb4	[DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN We are checking the unsafe-fp-math for sqrt but not for fpow, which is inconsistent. As the direction is to remove this global option, we need to remove the unsafe-fp-math check for sqrt and update the test with afn fast-math flags. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D93891	2021-01-11 02:25:53 +00:00
Nico Weber	d43a264a5d	Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())" This reverts commit `80dee7965d`. Makes clang sometimes hang forever. See https://bugs.chromium.org/p/chromium/issues/detail?id=1164786#c6 for a stand-alone repro.	2021-01-10 20:22:53 -05:00
Fraser Cormack	b02eab9058	[RISCV] Add scalable vector icmp ISel patterns Original patch by @rogfer01. The RVV integer comparison instructions are defined in such a way that many LLVM operations are defined by using the "opposite" comparison instruction and swapping the operands. This is done in this patch in most cases, except for the mappings where the immediate range must be adjusted to accommodate: va < i --> vmsle{u}.vi vd, va, i-1, vm va >= i --> vmsgt{u}.vi vd, va, i-1, vm That is left for future optimization; this patch supports all operations but in the case of the missing mappings the immediate will be moved to a scalar register first. Since there are so many condition codes and operand cases to check, it was decided to reduce the test burden by only testing the "vscale x 8" vector types. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Fraser Cormack <fraser@codeplay.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94168	2021-01-09 20:54:34 +00:00
Fraser Cormack	41d06095b0	[SelectionDAG] Teach isConstOrConstSplat about ISD::SPLAT_VECTOR This improves llvm::isConstOrConstSplat by allowing it to analyze ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of operations using scalable vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94168	2021-01-09 20:54:34 +00:00
Mircea Trofin	75c04327a5	[NFC] Disallow unused prefixes in CodeGen/X86 tests. Also fixed remaining tests that featured unused prefixes. Differential Revision: https://reviews.llvm.org/D94330	2021-01-09 11:43:32 -08:00
Roger Ferrer Ibanez	524d8fa9a5	[RISCV] Do not grow the stack a second time when we need to realign the stack This is a first change needed to fix a crash in which the emergency spill slot ends up being out of reach. This happens when we run the register scavenger after we have eliminated the frame indexes. The fix for the actual crash will come in a later change. This change removes an extra stack size increase we do in RISCVFrameLowering::determineFrameLayout. We don't have to change the size of the stack here as PEI::calculateFrameObjectOffsets is already doing this with the right size accounting for the extra alignment. Differential Revision: https://reviews.llvm.org/D89237	2021-01-09 16:51:09 +00:00
Heejin Ahn	4e4df1e38d	[WebAssembly] Remove unreachable EH pads This removes unreachable EH pads in LateEHPrepare. This is not for optimization but for preparation for CFGStackify. In CFGStackify, we determine where to place `try` marker by computing the nearest common dominator of all predecessors of an EH pad, but when an EH pad does not have a predecessor, it becomes tricky. We can insert an empty dummy BB before the EH pad and place the `try` there, but removing unreachable EH pads is simpler. This moves an existing exception label test from eh-label.mir to exception.mir and adds a new test there. This also adds some comments to existing methods. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94044	2021-01-09 03:42:38 -08:00
Fraser Cormack	2c442629f0	[RISCV] Add tests for scalable constant-folding (NFC)	2021-01-09 11:31:22 +00:00
Heejin Ahn	0d8dfbb42a	[WebAssembly] Update InstPrinter support for EH - Updates InstPrinter to handle `catch_all`. - Makes `rethrow` condition an early exit from the function to make the rest simpler. - Unify label and catch counters. They don't need to be counted separately and this will help `delegate` instruction later. - Removes `LastSeenEHInst` field. This was first introduced to handle when there are more than one `catch` blocks per `try`, but this was not implemented correctly and not being used at the moment anyway. - Reenables all tests in cfg-stackify-eh.ll that don't deal with unwind destination mismatches, which will be handled in a later CL. Reviewed By: dschuff, tlively, aardappel Differential Revision: https://reviews.llvm.org/D94043	2021-01-09 02:42:35 -08:00
Heejin Ahn	52e240a072	[WebAssembly] Remove exnref and br_on_exn This removes `exnref` type and `br_on_exn` instruction. This is effectively NFC because most uses of these were already removed in the previous CLs. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94041	2021-01-09 02:02:54 -08:00
Heejin Ahn	9e4eadeb13	[WebAssembly] Update basic EH instructions for the new spec This implements basic instructions for the new spec. - Adds new versions of instructions: `catch`, `catch_all`, and `rethrow` - Adds support for instruction selection for the new instructions - `catch` needs a custom routine for the same reason `throw` needs one, to encode `__cpp_exception` tag symbol. - Updates `WebAssembly::isCatch` utility function to include `catch_all` and changes code that compares an instruction's opcode with `catch` to use that function. - LateEHPrepare - Previously in LateEHPrepare we added `catch` instruction to both `catchpad`s (for user catches) and `cleanuppad`s (for destructors). In the new version `catch` is generated from `llvm.catch` intrinsic in instruction selection phase, so we only need to add `catch_all` to the beginning of cleanup pads. - `catch` is generated from instruction selection, but we need to hoist the `catch` instruction to the beginning of every EH pad, because `catch` can be in the middle of the EH pad or even in a split BB from it after various code transformations. - Removes `addExceptionExtraction` function, which was used to generate `br_on_exn` before. - CFGStackify: Deletes `fixUnwindMismatches` function. Running this function on the new instruction causes crashes, and the new version will be added in a later CL, whose contents will be completely different. So deleting the whole function will make the diff easier to read. - Reenables all disabled tests in exception.ll and eh-lsda.ll and a single basic test in cfg-stackify-eh.ll. - Updates existing tests to use the new assembly format. And deletes `br_on_exn` instructions from the tests and FileCheck lines. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94040	2021-01-09 01:48:06 -08:00
Heejin Ahn	9724c3cff4	[WebAssembly] Update WasmEHPrepare for the new spec Clang generates `wasm.get.exception` and `wasm.get.ehselector` intrinsics, which respectively return a caught exception value (a pointer to some C++ exception struct) and a selector (an integer value that tells which C++ `catch` clause the current exception matches, or does not match any). WasmEHPrepare is a pass that does some IR-level preparation before instruction selection. Previously one of the things we did in this pass was to convert `wasm.get.exception` intrinsic calls to `wasm.extract.exception` intrinsics. Their semantics were the same except `wasm.extract.exception` did not have a token argument. We maintained these two separate intrinsics with the same semantics because instruction selection couldn't handle token arguments. This `wasm.extract.exception` intrinsic was later converted to `extract_exception` instruction in instruction selection, which was a pseudo instruction to implement `br_on_exn`. `br_on_exn` pushed an extracted value onto the value stack after the `end` instruction of a `block`, but LLVM does not have a way of modeling that kind of behavior, so this pseudo instruction was used to pull an extracted value out of thin air, like this: ``` block $l0 ... br_on_exn $cpp_exception $l0 ... end extract_exception ;; pushes values onto the stack ``` In the new spec, we don't need this pseudo instruction anymore because `catch` itself returns a value and we don't have `br_on_exn` anymore. In the spec `catch` returns multiple values (like `br_on_exn`), but here we assume it only returns a single i32, which is sufficient to support C++. So this renames `wasm.get.exception` intrinsic to `wasm.catch`. Because this CL does not yet contain instruction selection for `wasm.catch` intrinsic, all `RUN` lines in exception.ll, eh-lsda.ll, and cfg-stackify-eh.ll, and a single `RUN` line in wasm-eh.cpp (which is an end-to-end test from C++ source to assembly) fail. So this CL temporarily disables those `RUN` lines, and for those test files without any valid remaining `RUN` lines, adds a dummy `RUN` line to make them pass. These tests will be reenabled in later CLs. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94039	2021-01-08 23:38:26 -08:00
Ben Shi	55f0a1b066	[RISCV] Optimize multiplication with constant 1. Break MUL with specific constant to a SLLI and an ADD/SUB on riscv32 with the M extension. 2. Break MUL with specific constant to two SLLI and an ADD/SUB, if the constant needs a pair of LUI/ADDI to construct. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D93619	2021-01-09 10:37:21 +08:00
Tony	2f499b9aff	[AMDGPU] Add volatile support to SIMemoryLegalizer Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed. A volatile atomic is not changed and still only bypasses caches up to the level specified by the SyncScope operand. Differential Revision: https://reviews.llvm.org/D94214	2021-01-09 00:52:33 +00:00
Mircea Trofin	a8bda3df42	[NFC] Disallow unused prefixes in CodeGen/AMDGPU This adds the lit config, and cleans up remaining tests. Differential Revision: https://reviews.llvm.org/D94245	2021-01-08 11:49:23 -08:00
David Green	024af42c60	[ARM] Custom lower i1 vector truncates The ISel patterns we have for truncating to i1's under MVE do not seem to be correct. Instead custom lower to icmp(ne, and(x, 1), 0). Differential Revision: https://reviews.llvm.org/D94226	2021-01-08 18:21:00 +00:00
Simon Pilgrim	80dee7965d	[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop()) UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result.	2021-01-08 15:22:17 +00:00
Heejin Ahn	7be271537e	[WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin `wasm_rethrow_in_catch` intrinsic and builtin are used in order to rethrow an exception when the exception is caught but there is no matching clause within the current `catch`. For example, ``` try { foo(); } catch (int n) { ... } ``` If the caught exception does not correspond to C++ `int` type, it should be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch` because at the time I thought there would be another intrinsic for C++'s `throw` keyword, which rethrows an exception. It turned out that `throw` keyword doesn't require wasm's `rethrow` instruction, so we rename `rethrow_in_catch` to just `rethrow` here. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94038	2021-01-08 06:55:04 -08:00
David Green	1ae762469f	[ARM] Update and regenerate test checks. NFC	2021-01-08 14:54:16 +00:00
Simon Pilgrim	4a582d766a	[X86][SSE] Add vphaddd/vphsubd unpack(hop(),hop()) tests	2021-01-08 14:39:37 +00:00
Simon Pilgrim	7b9f541c1e	[X86][SSE] Add tests for unpack(hop(),hop()) We should be able to convert these to permute(hop()) as we only ever use one of the ops from each hop.	2021-01-08 14:11:37 +00:00
Kazushi (Jam) Marukawa	5ead757f1d	[VE] Support pack_f32p and pack_f32a intrinsic instructions Support pack_f32p and pack_f32a intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D94296	2021-01-08 22:59:11 +09:00
Heejin Ahn	d012430eee	[WebAssembly] Change label numbers to variables in test cfg-stackify-eh.ll contains many `CHECK` lines specifying label / catch comments with numbers. These numbers are subject to change every time any block/loop/try is added in the middle of existing functions or any other function is added in the middle of the file, generating a large number of lines in diffs. This change converts them to variables so they can be more resistant to future changes. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94037	2021-01-08 05:49:59 -08:00
Simon Moll	611d3c63f3	[VP] ISD helper functions [VE] isel for vp_add, vp_and This implements vp_add, vp_and for the VE target by lowering them to the VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode, getVPMaskIdx, getVPExplicitVectorLengthIdx). Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D93766	2021-01-08 14:29:45 +01:00
Nicholas Guy	ed23229a64	[AArch64] Fix crash caused by invalid vector element type Fixes a crash caused by D91255, when LLVMTy is null when calling changeExtendedVectorElementType. Differential Revision: https://reviews.llvm.org/D94234	2021-01-08 12:02:54 +00:00
Simon Moll	eeba70a463	[VE] Expand single-element BUILD_VECTOR to INSERT_VECTOR_ELT We do this mostly to be able to test the insert_vector_elt isel patterns. As long as we don't, most single element insertions show up as `BUILD_VECTOR` in the backend. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D93759	2021-01-08 11:48:01 +01:00
Simon Moll	d1b606f897	[VE] Extract & insert vector element isel Isel and tests for extract_vector_elt and insert_vector_elt. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D93687	2021-01-08 11:46:59 +01:00
Qiu Chaofan	6175fcf01f	[NFC] Update some PPC tests marked as auto-generated Update CodeGen regression tests with marker at first line telling it's auto-generated by the script, under PowerPC directory. For some reason, these tests are generated but manually written, which makes things unclear when someone's change affects them. However, some tests only show simple changes after being regenerated, like extra blank lines, disappearing '.localentry', etc. Besides, some tests are generated but have added checks for debug output. This commit doesn't try updating them.	2021-01-08 17:59:13 +08:00
Kazushi (Jam) Marukawa	12167632bc	[VE] Add SVOB intrinsic instruction Add SVOB intrinsic instruction and a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D94279	2021-01-08 18:49:17 +09:00
David Sherwood	d1bf26fd94	[AArch64][SVE] Add lowering for llvm abs intrinsic Add functionality to permit lowering of the abs and neg intrinsics using the passthru variants. Differential Revision: https://reviews.llvm.org/D94160	2021-01-08 08:55:25 +00:00
Christudasan Devadasan	ae25a397e9	AMDGPU/GlobalISel: Enable sret demotion	2021-01-08 10:56:35 +05:30
Evandro Menezes	946bc50e4c	[RISCV] Define the vfsqrt RVV intrinsics Define the `vfsqrt` IR intrinsics for the respective V instructions. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com> Differential Revision: https://reviews.llvm.org/D93745	2021-01-07 17:29:29 -06:00
Arthur Eubanks	9ccf13c36d	[NewPM][NVPTX] Port NVPTX opt passes There are only two used in the IR optimization pipeline. Port these and add them to the default pipeline. Similar to https://reviews.llvm.org/D93863. I added -mtriple to some tests since under the new PM, the passes are only available when the TargetMachine is specified. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93930	2021-01-07 15:12:35 -08:00
Matt Arsenault	2cbbc6e87c	GlobalISel: Fail legalization on narrowing extload below memory size	2021-01-07 17:40:34 -05:00
Matt Arsenault	1f9b6ef91f	GlobalISel: Add combine for G_UREM by power of 2 Really I want this in the legalizer, but this is a start.	2021-01-07 16:36:35 -05:00
Wouter van Oortmerssen	5c38ae36c5	[WebAssembly] Fixed byval args missing DWARF DW_AT_LOCATION A struct in C passed by value did not get debug information. Such values are currently lowered to a Wasm local even in -O0 (not to an alloca like on other archs), which becomes a Target Index operand (TI_LOCAL). The DWARF writing code was not emitting locations for TIs specifically if the location is a single range (not a list). In addition, the ExplicitLocals pass which removes the ARGUMENT pseudo instructions did not update the associated DBG_VALUEs, and couldn't even find these values since the code assumed such instructions are adjacent, which is not the case here. Also fixed asm printing of TIs needed by a test. Differential Revision: https://reviews.llvm.org/D94140	2021-01-07 10:31:38 -08:00


37149 Commits