llvm-project

Commit Graph

Author	SHA1	Message	Date
Cameron McInally	c126eb7529	[SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247. Differential Revision: https://reviews.llvm.org/D90644	2020-11-04 14:20:31 -06:00
Mircea Trofin	5dc47541f9	[NFC] Use Register/MCRegister Differential Revision: https://reviews.llvm.org/D90724	2020-11-04 12:20:17 -08:00
Craig Topper	cc3bf27077	[RISCV] Remove assertsexti32 from fslw/fsrw isel patterns. The operations in these patterns shouldn't be effected by sign bits. And the pattern is starting from a sign_extend_inreg so we aren't expecting sign bits to be passed through either. Differential Revision: https://reviews.llvm.org/D90739	2020-11-04 11:37:58 -08:00
Craig Topper	d47300f503	[RISCV] Correct the operand order for fshl/fshr to fsl/fsr instructions. fsl/fsr take their shift amount in $rs2 or an immediate. The sources are $rs1 and $rs3. fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and operand 1 in the lower bits. fshl returns the high bits after shifting and fshr returns the low bits. So a shift amount of 0 returns operand 0 for fshl and operand 1 for fshr. fsl/fsr concatenate their operands in different orders such that $rs1 will be returned for a shift amount of 0. So $rs1 needs to come from operand 0 of fshl and operand 1 of fshr. Differential Revision: https://reviews.llvm.org/D90735	2020-11-04 11:13:25 -08:00
Craig Topper	0122a4ea66	[RISCV] Remove assertsexti32 from inputs to riscv_sllw/srlw nodes in B extension isel patterns. riscv_sllw/srlw only reads the lower 32 bits of the first operand. And the lower 5 bits of the second operands. Whether the upper 32 bits of the input are sign bits or not doesn't matter. Also use ineg and not to shorten the patterns. Differential Revision: https://reviews.llvm.org/D90668	2020-11-04 10:35:05 -08:00
Craig Topper	857563eaf0	[RISCV] Check all 64-bits of the mask in SelectRORIW. We need to ensure the upper 32 bits of the mask are zero. So that the srl shifts zeroes into the lower 32 bits. Differential Revision: https://reviews.llvm.org/D90585	2020-11-04 10:15:30 -08:00
Christopher Tetreault	900ec97bbe	[UBSan] Cannot negate smallest negative signed integer Silence warning Undefined Behavior Sanitzer warning: runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D90710	2020-11-04 10:07:52 -08:00
Craig Topper	3701e33a22	[RISCV] Remove custom isel for (srl (shl val, 32), imm). Use pattern instead. NFCI We don't need custom matching, we just a need a predicate to check the immediate is greater than 32. We can use the existing ImmSub32 to adjust the immediate. I've also used the new predicate in the other location that used ImmSub32. I tried to create a test case where we would break without the greater than 32 check on that pattern, but DAG combine defeated me. Still seemed safer to have it. Differential Revision: https://reviews.llvm.org/D90546	2020-11-04 09:59:14 -08:00
Joe Nash	58adab34c4	[AMDGPU] Resolve pseudo registers at encoding uses Pseudo-registers allow different register encodings between gpu generations. Make sure we resolve the pseudo regs to real regs whenever we get their hardware encoding. Using the correct encodings revealed a register bank conflict and an unnecessary write dependency. Tests have been updated to match. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90721 Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca	2020-11-04 12:52:32 -05:00
Sebastian Neubauer	31a0b2834f	[AMDGPU] Fix iterating in SIFixSGPRCopies The insertion of waterfall loops splits the current basic block into three blocks. So the basic block that we iterate over must be updated. This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for divergent calls in branches before. Differential Revision: https://reviews.llvm.org/D90596	2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos	d56cd4291e	[TableGen] Add !interleave operator to concatenate a list of values with delimiters Add a test. Use it in some TableGen files. Differential Revision: https://reviews.llvm.org/D90469	2020-11-04 09:23:54 -05:00
Simon Moll	351c10cc72	[VE] Add +vpu attribute `+vpu` controls whether VEISelLowering adds any vregs. This defaults to `-vpu` to have scalar code generation out of the box. We bring up vector isel under the `+vpu` flag. Once vector isel is stable we switch to `+vpu` and advertise vregs and vops in TTI. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D90465	2020-11-04 12:42:00 +01:00
Kerry McLaughlin	f2412d372d	[SVE][CodeGen] Lower scalable integer vector reductions This patch uses the existing LowerFixedLengthReductionToSVE function to also lower scalable vector reductions. A separate function has been added to lower VECREDUCE_AND & VECREDUCE_OR operations with predicate types using ptest. Lowering scalable floating-point reductions will be addressed in a follow up patch, for now these will hit the assertion added to expandVecReduce() in TargetLowering. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D89382	2020-11-04 11:38:49 +00:00
Sebastian Neubauer	1124bf4ab7	[AMDGPU] Set rsrc1 flags for graphics shaders Before they were only set for compute kernels and compute shaders but not for other shaders. Differential Revision: https://reviews.llvm.org/D89399	2020-11-04 12:25:41 +01:00
Sebastian Neubauer	76313288cd	[AMDGPU] Fix ieee mode default value Previously, the default value for ieee mode was - on for compute kernels and compute shaders, - off for all shaders except compute shaders. This commit changes the default to be - on for compute kernels, - off for shaders. This aligns the default value with the settings that are actually in use. To my knowledge, all users of shader calling conventions (mesa and llpc) disable the ieee mode by default. Differential Revision: https://reviews.llvm.org/D89388	2020-11-04 12:25:38 +01:00
David Green	eb611930b6	[ARM] Remove unused variable. NFC	2020-11-04 09:00:03 +00:00
Sander de Smalen	73b6cb67dc	[NFCI] Replace AArch64StackOffset by StackOffset. This patch replaces the AArch64StackOffset class by the generic one defined in TypeSize.h. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D88983	2020-11-04 08:49:00 +00:00
Amara Emerson	393b55380a	[AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD. For the <2 x float> case, instead of adding another combine or legalization to get it into a <4 x float> form, I'm just adding a GISel specific selection pattern to cover it. Differential Revision: https://reviews.llvm.org/D90699	2020-11-03 17:25:14 -08:00
Julien Jorge	0fca651711	[WebAssembly] Don't fold frame offset for global addresses When machine instructions are in the form of ``` %0 = CONST_I32 @str %1 = ADD_I32 %stack.0, %0 %2 = LOAD 0, 0, %1 ``` In the `ADD_I32` instruction, it is possible to fold it if `%0` is a `CONST_I32` from an immediate number. But in this case it is a global address, so we shouldn't do that. But we haven't checked if the operand of `ADD` is an immediate so far. This fixes the problem. (The case applies the same for `ADD_I64` and `CONST_I64` instructions.) Fixes https://bugs.llvm.org/show_bug.cgi?id=47944. Patch by Julien Jorge (jjorge@quarkslab.com) Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D90577	2020-11-03 14:56:25 -08:00
Sanjay Patel	c40126e740	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop)). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
Jordan Rupprecht	980bf1d5d1	[NFC] Inline wasm assertion-only variable	2020-11-03 13:06:59 -08:00
Andy Wingo	107c3a12d6	[WebAssembly] Implement ref.null This patch adds a new "heap type" operand kind to the WebAssembly MC layer, used by ref.null. Currently the possible values are "extern" and "func"; when typed function references come, though, this operand may be a type index. Note that the "heap type" production is still known as "refedtype" in the draft proposal; changing its name in the spec is ongoing (https://github.com/WebAssembly/reference-types/issues/123). The register form of ref.null is still untested. Differential Revision: https://reviews.llvm.org/D90608	2020-11-03 10:46:23 -08:00
Craig Topper	00eff96e1d	[RISCV] Add missing patterns for rotr with immediate for Zbb/Zbp extensions. DAGCombine doesn't canonicalize rotl/rotr with immediate so we need patterns for both. Remove the custom matcher for rotl to RORI and just use a SDNodeXForm to convert the immediate instead. Doing this gives priority to the rev32/rev16 versions of grevi over rori since an explicit immediate is more precise than any immediate. I also added rotr patterns for rev32/rev16. And removed the (or (shl), (shr)) patterns that should be combined to rotl by DAG combine. There is at least one other grev pattern that probably needs a another rotr pattern, but we need more test coverage first. Differential Revision: https://reviews.llvm.org/D90575	2020-11-03 10:04:52 -08:00
Esme-Yi	5053eab890	Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA." This reverts commit `119ab2181e`.	2020-11-03 16:34:02 +00:00
Tim Renouf	89d41f3a2b	[AMDGPU] Add gfx1033 target Differential Revision: https://reviews.llvm.org/D90447 Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761	2020-11-03 16:27:48 +00:00
Tim Renouf	ee3e642627	[AMDGPU] Add gfx90c target This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were previously included in gfx909. Differential Revision: https://reviews.llvm.org/D90419 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-11-03 16:27:43 +00:00
Jay Foad	040c50278c	[AMDGPU] Fix ds_read2/write2 with unaligned offsets These instructions use a scaled offset. We were wrongly selecting them even when the required offset was not a multiple of the scale factor. Differential Revision: https://reviews.llvm.org/D90607	2020-11-03 15:16:10 +00:00
Jameson Nash	a0ad066ce4	make the AsmPrinterHandler array public This lets external consumers customize the output, similar to how AssemblyAnnotationWriter lets the caller define callbacks when printing IR. The array of handlers already existed, this just cleans up the code so that it can be exposed publically. Replaces https://reviews.llvm.org/D74158 Differential Revision: https://reviews.llvm.org/D89613	2020-11-03 10:02:09 -05:00
Sanjay Patel	9af561ec99	[x86] update cost table comments for maxnum; NFC Follow-up suggested in D90613.	2020-11-03 08:09:59 -05:00
David Green	bd32386410	[ARM] Remove unused variable. NFC	2020-11-03 12:58:10 +00:00
David Green	e474499402	[ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops If an instruction will be lowered to a call there is no advantage of using a low overhead loop as the LR register will need to be spilled and reloaded around the call, and the low overhead will end up being reverted. This teaches our hardware loop lowering that these memory intrinsics will be calls under certain situations. Differential Revision: https://reviews.llvm.org/D90439	2020-11-03 11:53:09 +00:00
Nicholas Guy	54d8627852	[AArch64] Redundant masks in downcast long multiply Adds patterns to catch masks preceeding a long multiply, and generating a single umull/smull instruction instead. Differential revision: https://reviews.llvm.org/D89956	2020-11-03 10:12:28 +00:00
Petar Avramovic	0031418dce	AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner Change match/apply functions into methods of new target specific combiner helper class. Use reference to MachineIRBuilder from helper instead of constructing new MachineIRBuilder each time new instruction needs to made. Allows correct tracking of newly created instructions. Differential Revision: https://reviews.llvm.org/D90623	2020-11-03 09:24:50 +01:00
Esme-Yi	119ab2181e	[PowerPC] Extend folding RLWINM + RLWINM to post-RA. Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too. Reviewed By: shchenz, steven.zhang Differential Revision: https://reviews.llvm.org/D89855	2020-11-03 07:44:11 +00:00
Craig Topper	46e91f6701	[RISCV] Remove isel patterns for fshl/fshr with same inputs. NFC These were being selected to ROL/ROR, but DAG combine should canonicalize fshl/fshr with same inputs to rotl/rotr which we also have patterns for.	2020-11-02 23:12:18 -08:00
Esme-Yi	b969dfe26f	[NFC][PowerPC] Move the folding RLWINMs from ppc-mi-peephole to PPCInstrInfo. Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example D88274. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too. This is a NFC patch to move the folding patterns to PPCInstrInfo, and the follow-up works will be calling it in pre-emit-peephole and expand the patterns to handle more cases. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D89846	2020-11-03 06:28:56 +00:00
Jessica Clarke	7601a21738	[RISCV] Only return DestSourcePair from isCopyInstrImpl for registers ADDI often has a frameindex in operand 1, but consumers of this interface, such as MachineSink, tend to call getReg() on the Destination and Source operands, leading to the following crash when building FreeBSD after this implementation was added in 8cf6778d30: ``` clang: llvm/include/llvm/CodeGen/MachineOperand.h:359: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: #0 0x00007f4286f9b4d0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm/lib/Support/Unix/Signals.inc:563:0 #1 0x00007f4286f9b587 PrintStackTraceSignalHandler(void) llvm/lib/Support/Unix/Signals.inc:630:0 #2 0x00007f4286f9926b llvm::sys::RunSignalHandlers() llvm/lib/Support/Signals.cpp:71:0 #3 0x00007f4286f9ae52 SignalHandler(int) llvm/lib/Support/Unix/Signals.inc:405:0 #4 0x00007f428646ffd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0) #5 0x00007f428646ff47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0 #6 0x00007f42864718b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0 #7 0x00007f428646142a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0 #8 0x00007f42864614a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2) #9 0x00007f428d4078e2 llvm::MachineOperand::getReg() const llvm/include/llvm/CodeGen/MachineOperand.h:359:0 #10 0x00007f428d8260e7 attemptDebugCopyProp(llvm::MachineInstr&, llvm::MachineInstr&) llvm/lib/CodeGen/MachineSink.cpp:862:0 #11 0x00007f428d826442 performSink(llvm::MachineInstr&, llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::SmallVectorImpl<llvm::MachineInstr>&) llvm/lib/CodeGen/MachineSink.cpp:918:0 #12 0x00007f428d826e27 (anonymous namespace)::MachineSinking::SinkInstruction(llvm::MachineInstr&, bool&, std::map<llvm::MachineBasicBlock, llvm::SmallVector<llvm::MachineBasicBlock, 4u>, std::less<llvm::MachineBasicBlock>, std::allocator<std::pair<llvm::MachineBasicBlock const, llvm::SmallVector<llvm::MachineBasicBlock*, 4u> > > >&) llvm/lib/CodeGen/MachineSink.cpp:1073:0 #13 0x00007f428d824a2c (anonymous namespace)::MachineSinking::ProcessBlock(llvm::MachineBasicBlock&) llvm/lib/CodeGen/MachineSink.cpp:410:0 #14 0x00007f428d824513 (anonymous namespace)::MachineSinking::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/MachineSink.cpp:340:0 ``` Thus, check that operand 1 is also a register in the condition. Reviewed By: arichardson, luismarques Differential Revision: https://reviews.llvm.org/D89090	2020-11-03 03:55:47 +00:00
Qiu Chaofan	d14e51806b	[PowerPC] Skip IEEE 128-bit FP type in FastISel Vector types, quadword integers and f128 currently cannot be handled in FastISel. We did not skip f128 type in lowering arguments, which causes a crash. This patch will fix it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D90206	2020-11-03 11:17:11 +08:00
Qiu Chaofan	3204ffeade	[PowerPC] [NFC] Rename VCMPo to VCMP_rec Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D90581	2020-11-03 11:10:59 +08:00
Fangrui Song	ca01a6b3ac	[PowerPC] Parse and ignore .machine ppc64 In the wild, kexec-tools purgatory/arch/ppc64/v2wrap.S and hvcall.S use this directive.	2020-11-02 16:49:57 -08:00
Krzysztof Parzyszek	b26a2755dc	[Hexagon] Move isTypeForHVX from Hexagon TTI to HexagonSubtarget, NFC It's useful outside of Hexagon TTI, and with how TTI is implemented, it is not accessible outside of TTI.	2020-11-02 14:00:45 -06:00
Stanislav Mekhanoshin	c9d6fe6f7d	[AMDGPU] Improve FLAT scratch detection We were useing too broad check for isFLATScratch() which also includes FLAT global. Differential Revision: https://reviews.llvm.org/D90505	2020-11-02 11:37:33 -08:00
Craig Topper	9ac2910093	[RISCV] Make SelectRORIW handle the commutability of OR. The SHL and SRL could be in opposite order so account for that. Differential Revision: https://reviews.llvm.org/D90586	2020-11-02 09:32:54 -08:00
Sanjay Patel	35fa3c474f	[x86] add AVX2 cost model entries for maxnum of 256-bit vectors As noticed in D90554 , the AVX2 costs for 256-bit vectors did not include FMAXNUM entries, so we fell back to AVX1 which assumes those ops will be split into 128-bit halves or something close to that. Differential Revision: https://reviews.llvm.org/D90613	2020-11-02 12:20:17 -05:00
Craig Topper	7142ec3aaf	[RISCV] When matching RORIW, make sure the same input is given to both shifts. The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))). We need to ensure X and Y are the same. Differential Revision: https://reviews.llvm.org/D90580	2020-11-02 09:12:40 -08:00
Momchil Velikov	7360d6d921	[ARM][MachineOutliner] Do not overestimate LR liveness in return block The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers callee-saved registers to be alive at the point after the return instruction in a block. In the ARM backend, the `LR` register is classified as callee-saved, which is not really correct (from an ARM eABI or just common sense point of view). These two conditions cause the `MachineOutliner` to overestimate the liveness of `LR`, which results in unnecessary saves/restores of `LR` around calls to outlined sequences. It also causes the `MachineVerifer` to crash in some cases, because the save instruction reads a dead `LR`, for example when the following program: int h(int, int); int f(int a, int b, int c, int d) { a = h(a + 1, b - 1); b = b + c; return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d); } int g(int a, int b, int c, int d) { a = h(a - 1, b + 1); b = b + c; return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d); } is compiled with `-target arm-eabi -march=armv7-m -Oz`. This patch computes the liveness of `LR` in return blocks only, while taking into account the few ARM instructions, which read `LR`, but nevertheless the register is not mentioned (explicitly or implicitly) in the instruction operands. Differential Revision: https://reviews.llvm.org/D89189	2020-11-02 16:47:22 +00:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Matt Arsenault	86b8f6919b	AMDGPU: Reorder checks	2020-11-02 10:21:48 -05:00
Evgeny Leviant	cc96a82291	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes case when sched class has write and read variants belonging to different processor models. Differential revision: https://reviews.llvm.org/D89777	2020-11-02 17:39:04 +03:00
Jay Foad	0892d2a311	Revert "Fix ds_read2/write2 unaligned offsets" This reverts commit `2e7e898c8f`. It was committed by mistake.	2020-11-02 14:01:33 +00:00
Jay Foad	2e7e898c8f	Fix ds_read2/write2 unaligned offsets	2020-11-02 13:57:13 +00:00
Simon Pilgrim	36920d5f9d	[RISCV] Avoid std::pair<> in FPReg StringSwitch to avoid MSVC compile failures. NFCI. As discussed on D90322, some MSVC builds are failing with is_trivially_copyable static asserts (see D86126) - we can avoid this by not using the std::pair<unsigned,unsigned> which held both the FP+DP Registers, just handle the FP register and convert to DP on the fly.	2020-11-02 11:30:57 +00:00
Caroline Concatto	71038788ce	Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register." This reverts commit `8b281bfaf3`.	2020-11-02 08:15:50 +00:00
Caroline Concatto	8b281bfaf3	[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register. Only the aliases 'xzr' and 'sp' exist for the physical register x31. The reason for wanting to remove the alias 'x31' is because it allows users to write invalid asm that is not accepted by the GNU assembler. Is there any objection to removing this alias? Or do we want to keep this for compatibility with existing code that uses w31/x31? Differential Revision: https://reviews.llvm.org/D90153	2020-11-02 07:57:05 +00:00
Qiu Chaofan	2762e6734f	[PowerPC] Fix a crash in POWER 9 setb peephole Variable InnerIsSel references FalseRes, while FalseRes might be zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash will happen. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D90142	2020-11-02 14:29:43 +08:00
Craig Topper	e57237f198	Recommit "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h. NFCI" This reverts `781917254d` and recommits `781917254d`. I've changed getRegForInlineAsmConstraint to not use a std::pair of Register in a previous commit. Hopefully that fixes the reported issue with expensive checks on Windows. I'm still not sure exactly why this commit removing an include affected a different file. Original message: RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library is intended to be shared with the MC layer so shouldn't use files from the CodeGen layer. The register enum names are already available from RISCVMCTargetDesc.h. It appears what was coming from this include was a transitive include of the Register class which I've replaced with MCRegister. Register has a constructor from MCRegister so it should be convertible.	2020-11-01 10:35:37 -08:00
Craig Topper	a76cd10fcd	[RISCV] Use 'unsigned' instead of Register in getRegForInlineAsmConstraint. NFC The return value of this interface still uses an 'unsigned' on all targets. So we convert Register back to unsigned at the end. I'm hoping this will prevent the issue that caused the revert of D90322.	2020-11-01 10:16:52 -08:00
Christudasan Devadasan	d6aa4aa29a	[AMDGPU] Some refactoring after D90404. NFC.	2020-11-01 13:18:53 +05:30
Christudasan Devadasan	9bb2b4f0aa	[AMDGPU] Add alignment check for v3 to v4 load type promotion It should be enabled only when the load alignment is at least 8-byte. Fixes: SWDEV-256824 Reviewed By: foad Differential Revision: https://reviews.llvm.org/D90404	2020-11-01 12:05:34 +05:30
Ayke van Laethem	e03ba2198d	[AVR] Improve inline rotate/shift expansions These expansions were rather inefficient and were done with more code than necessary. This change optimizes them to use expansions more similar to GCC. The code size is the same (when optimizing for code size) but somehow LLVM reorders blocks in a non-optimal way. Still, this should be an improvement with a reduction in code size of around 0.12% (when building compiler-rt). Differential Revision: https://reviews.llvm.org/D86418	2020-10-31 23:15:49 +01:00
Paul C. Anagnostopoulos	ef6f6d1c1a	[TableGen] Eliminate uses of true and false in .td files. They occurred in one NVPTX file and some test files. Differential Revision: https://reviews.llvm.org/D90513	2020-10-31 10:54:33 -04:00
David Green	30ad742644	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Simon Pilgrim	9e406ee808	[X86] Make some basic VarArgsLoweringHelper helper methods const. NFCI. Fixes a number of cppcheck remarks.	2020-10-31 12:16:49 +00:00
Simon Pilgrim	e0cbcf96ce	[X86] Make the X86FrameSortingComparator operator const. NFCI. Fixes a cppcheck remark.	2020-10-31 12:16:49 +00:00
Simon Pilgrim	55dbb7d823	[X86] X86MCTargetDesc - ensure the declaration/definition variable names match. NFCI. Silences cppcheck mismatch warnings.	2020-10-31 11:50:00 +00:00
Simon Pilgrim	30a1d91127	[X86] Reduce scope of DestReg and use specific Register type not unsigned. NFCI.	2020-10-31 11:46:07 +00:00
Simon Pilgrim	ae80ac6db2	[X86] printAsmMRegister - make the X86AsmPrinter arg a const reference. NFC. Fixes cppcheck warning.	2020-10-31 11:41:14 +00:00
Simon Pilgrim	39f77b3224	[X86] assignValueToReg - fix Wshadow warning. NFCI. X86OutgoingValueHandler already has a MIB member	2020-10-31 11:39:26 +00:00
Simon Pilgrim	33e20008d1	[X86] printAsmVRegister - remove unused argument. NFC.	2020-10-31 11:34:28 +00:00
Simon Pilgrim	ec547a7517	[X86] X86AsmPrinter - ensure the declaration/definition variable names match. NFCI. Silences cppcheck mismatch warnings.	2020-10-31 11:31:46 +00:00
Simon Pilgrim	5eec049689	[X86] No need to determine pointer when the type is already a MachineInstr. NFCI. Caught by cppcheck - appears to be a copy+paste typo as the other var is an iterator that does need the & pointer operation.	2020-10-31 11:26:25 +00:00
Liu, Chen3	756f597841	[X86] Support Intel avxvnni This patch mainly made the following changes: 1. Support AVX-VNNI instructions; 2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix. Differential Revision: https://reviews.llvm.org/D89105	2020-10-31 12:39:51 +08:00
Thomas Lively	a787e09779	[WebAssembly] Prototype i64x2.bitmask As proposed in https://github.com/WebAssembly/simd/pull/368. Differential Revision: https://reviews.llvm.org/D90514	2020-10-30 17:23:30 -07:00
Wouter van Oortmerssen	86cd2332ce	[WebAssembly] Fixed DWARF DW_AT_low_pc encoded as 64-bit in wasm64 Also added general wasm64 DWARF test Also added asserts for unsupported reloc combinations that triggered this bug. Differential Revision: https://reviews.llvm.org/D90503	2020-10-30 16:42:48 -07:00
Thomas Lively	0a512a555a	[WebAssembly] Prototype i64x2.eq As proposed in https://github.com/WebAssembly/simd/pull/381. Since it is still in the prototyping phase, it is only accessible via a target builtin function and a target intrinsic. Depends on D90504. Differential Revision: https://reviews.llvm.org/D90508	2020-10-30 16:38:15 -07:00
Thomas Lively	1cb0b56607	[WebAssembly] Prototype i64x2.widen_{low,high}_i32x4_{s,u} As proposed in https://github.com/WebAssembly/simd/pull/290. As usual, these instructions are available only via builtin functions and intrinsics while they are in the prototyping stage. Differential Revision: https://reviews.llvm.org/D90504	2020-10-30 15:44:04 -07:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Peter Collingbourne	3d049bce98	hwasan: Support for outlined checks in the Linux kernel. Add support for match-all tags and GOT-free runtime calls, which are both required for the kernel to be able to support outlined checks. This requires extending the access info to let the backend know when to enable these features. To make the code easier to maintain introduce an enum with the bit field positions for the access info. Allow outlined checks to be enabled with -mllvm -hwasan-inline-all-checks=0. Kernels that contain runtime support for outlined checks may pass this flag. Kernels lacking runtime support will continue to link because they do not pass the flag. Old versions of LLVM will ignore the flag and continue to use inline checks. With a separate kernel patch [1] I measured the code size of defconfig + tag-based KASAN, as well as boot time (i.e. time to init launch) on a DragonBoard 845c with an Android arm64 GKI kernel. The results are below: code size boot time before 92824064 6.18s after 38822400 6.65s [1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 Depends on D90425 Differential Revision: https://reviews.llvm.org/D90426	2020-10-30 14:25:40 -07:00
Cameron McInally	dda1e74b58	[Legalize] Add legalizations for VECREDUCE_SEQ_FADD Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass. Differential Revision: https://reviews.llvm.org/D90247	2020-10-30 16:02:55 -05:00
Peter Collingbourne	c9b1a2b41d	AArch64: Use SBFX instead of UBFX to extract address granule in outlined HWASan checks. In a kernel (or in general in environments where bit 55 of the address is set) the shadow base needs to point to the end of the shadow region, not the beginning. Bit 55 needs to be sign extended into bits 52-63 of the shadow base offset, otherwise we end up loading from an invalid address. We can do this by using SBFX instead of UBFX. Using SBFX should have no effect in the userspace case where bit 55 of the address is clear so we do so unconditionally. I don't think we need a ABI version bump for this (but one will come anyway when we switch to x20 for the shadow base register). Differential Revision: https://reviews.llvm.org/D90424	2020-10-30 12:53:15 -07:00
Peter Collingbourne	3859fc653f	AArch64: Switch to x20 as the shadow base register for outlined HWASan checks. From a code size perspective it turns out to be better to use a callee-saved register to pass the shadow base. For non-leaf functions it avoids the need to reload the shadow base into x9 after each function call, at the cost of an additional stack slot to save the caller's x20. But with x9 there is also a stack size cost, either as a result of copying x9 to a callee-saved register across calls or by spilling it to stack, so for the non-leaf functions the change to stack usage is largely neutral. It is also code size (and stack size) neutral for many leaf functions. Although they now need to save/restore x20 this can typically be combined via LDP/STP into the x30 save/restore. In the case where the function needs callee-saved registers or stack spills we end up needing, on average, 8 more bytes of stack and 1 more instruction but given the improvements to other functions this seems like the right tradeoff. Unfortunately we cannot change the register for the v1 (non short granules) check because the runtime assumes that the shadow base register is stored in x9, so the v1 check still uses x9. Aside from that there is no change to the ABI because the choice of shadow base register is a contract between the caller and the outlined check function, both of which are compiler generated. We do need to rename the v2 check functions though because the functions are deduplicated based on their names, not on their contents, and we need to make sure that when object files from old and new compilers are linked together we don't end up with a function that uses x9 calling an outlined check that uses x20 or vice versa. With this change code size of /system/lib64/*.so in an Android build with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3% decrease. Differential Revision: https://reviews.llvm.org/D90422	2020-10-30 12:51:30 -07:00
Craig Topper	6915c76e10	[RISCV] Don't use DCI.CombineTo to replace a single result. NFCI Just return the new node, which is the standard practice. I also noticed what appeared to be an unnecessary attempt at creating an ANY_EXTEND where the type should already be correct. I replace with an assert to verify the type. Differential Revision: https://reviews.llvm.org/D90444	2020-10-30 10:46:32 -07:00
Sanjay Patel	251dd7c0f9	[x86] add cost overrides for mul with overflow I'm assuming the standard size integer instructions for this end up as something like: mulq %rsi seto %al And the 'mul' generally has reciprocal throughput of 1 on typical implementations (higher latency, but that's not handled here). The default costs may end up much higher than that, and that's what we see in the test diffs. Vector types are left as a 'TODO'. Differential Revision: https://reviews.llvm.org/D90431	2020-10-30 12:38:16 -04:00
Simon Moll	4474d4d49c	[VE][NFC] Split up lowering init Split up the monolithic VETargetLowering ctor into three initialization phases: 1. initRegisterClasses() 2. initSPUActions() 3. // TODO initVPUActions() Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D90463	2020-10-30 16:18:27 +01:00
Matt Arsenault	790f5771fd	AMDGPU: Fix missing writelane cases to skip with exec=0	2020-10-30 11:15:11 -04:00
serge-sans-paille	0f60bcc36c	[stack-clash] Fix probing of dynamic alloca - Perform the probing in the correct direction. Related to https://github.com/rust-lang/rust/pull/77885#issuecomment-711062924 - The first touch on a dynamic alloca cannot use a mov because it clobbers existing space. Use a xor 0 instead Differential Revision: https://reviews.llvm.org/D90216	2020-10-30 15:34:00 +01:00
Simon Pilgrim	0ff1ab42f2	Use cast<> instead of dyn_cast<> as we dereference the pointer immediately. NFCI. Fix clang static analyzer warning - we know that the arg should be ConstantInt and we're better off relying on cast<> asserting on failure rather than a null dereference crash.	2020-10-30 14:33:20 +00:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
David Sherwood	cea69fa4dc	[SVE] Add fatal error for unnamed SVE variadic arguments We don't currently support passing unnamed variadic SVE arguments so I've added a fatal error if we hit such cases to prevent any silent ABI issues in future. Differential Revision: https://reviews.llvm.org/D90230	2020-10-30 13:35:47 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
Simon Pilgrim	781917254d	Revert rG22c383763456 "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h" This reverts commit `22c3837634`. This is causing a build failure with MSVC - reported on D90322	2020-10-30 11:59:37 +00:00
alex-t	a4f7e4264c	[AMDGPU] SILowerControlFlow::removeMBBifRedundant. Refactoring plus fix for the null MBB pointer in MF->splice Detailed description: This change addresses the refactoring adviced by foad. It also contain the fix for the case when getNextNode is null if the successor block is the last in MachineFunction. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D90314	2020-10-30 14:46:08 +03:00
Michael Roe	fc0892c1f9	[mips] Implement add.ps, mul.ps and sub.ps Differential revision: https://reviews.llvm.org/D90321	2020-10-30 10:59:15 +03:00
Krzysztof Parzyszek	db60e64036	[Hexagon] Handle additional shuffles that can be made perfect	2020-10-29 19:09:00 -05:00
Craig Topper	74b078294f	[RISCV] Improve worklist management in the DAG combine for SLLW/SRLW/SRAW This combine makes two calls to SimplifyDemandedBits, one for the LHS and one for the RHS. If the LHS call returns true, we don't make the RHS call. When SimplifyDemandedBits makes a change, it will add the nodes around the change to the DAG combiner worklist. If the simplification happens on the first recursion step, the N will get added to the worklist. But if the simplification happens deeper in the recursion, then N will not be revisited until the next time the DAG combiner runs. This patch explicitly addes N to the worklist anytime a Simplification is made. Without this we might miss additional simplifications on the LHS or never simplify the RHS. Special care also needs to be taken to not add N if it has been CSEd by the simplification. There are similar examples in DAGCombiner and the X86 target, but I don't have a test for it for RISC-V. I've also returned SDValue(N, 0) instead of SDValue() so DAGCombiner knows a change was made and will update its Statistic variable. The test here was constructed so that 2 simplifications happen to the LHS. Without this fix one happens in the post type legalization DAG combine and the other happens after LegalizeDAG. This prevents the RHS from ever being simplified causing the left and right shift to clear the upper 32 bits of the RHS to be left behind. Differential Revision: https://reviews.llvm.org/D90339	2020-10-29 14:52:53 -07:00
Craig Topper	22c3837634	[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library is intended to be shared with the MC layer so shouldn't use files from the CodeGen layer. The register enum names are already available from RISCVMCTargetDesc.h. It appears what was coming from this include was a transitive include of the Register class which I've replaced with MCRegister. Register has a constructor from MCRegister so it should be convertible.	2020-10-29 11:39:19 -07:00
Thomas Lively	be6f50798e	[WebAssembly] Implement SIMD signselect instructions As proposed in https://github.com/WebAssembly/simd/pull/124, using the opcodes adopted by V8 in https://chromium-review.googlesource.com/c/v8/v8/+/2486235/2/src/wasm/wasm-opcodes.h. Uses new builtin functions and a new target intrinsic exclusively to ensure that the new instructions are only emitted when a user explicitly opts in to using them since they are still in the prototyping and evaluation phase. Differential Revision: https://reviews.llvm.org/D90357	2020-10-29 11:06:20 -07:00
Jay Foad	9cee87d72a	[AMDGPU] Fix double space in disassembly of ds_gws_sema_* with gds By setting up the AsmStrings correctly we can remove some special cases from AMDGPUInstPrinter::printOffset. Differential Revision: https://reviews.llvm.org/D90307	2020-10-29 17:31:59 +00:00
Jay Foad	58de4b2053	[AMDGPU] Use pseudo instructions for readlane/writelane This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2". All the codegen changes are caused by the post-RA scheduler no longer treating readlane/writelane as scheduling barriers due to having unmodelled side effects. (The pseudos are hasSideEffects = 0, but the real instructions are hasSideEffects = ? which TableGen conservatively treats as 1.) Differential Revision: https://reviews.llvm.org/D90401	2020-10-29 16:00:53 +00:00
Nicholas Guy	eb9fe24eaf	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. Differential Revision: https://reviews.llvm.org/D88496	2020-10-29 15:17:31 +00:00
Jay Foad	7a79921edd	[AMDGPU] Remove gds operand from ds_gws_* MachineInstrs The operand value was always 1 (except in some bad MIR tests) so it was redundant. Differential Revision: https://reviews.llvm.org/D90378	2020-10-29 15:04:23 +00:00
Jay Foad	a442fad911	[AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode Differential Revision: https://reviews.llvm.org/D90374	2020-10-29 14:54:33 +00:00
Jay Foad	e9dd2c4fe2	[AMDGPU] Fix double space in disassembly of some DPP instructions Differential Revision: https://reviews.llvm.org/D90373	2020-10-29 14:54:33 +00:00
Kazushi (Jam) Marukawa	58a6b7bcde	[VE] Add missing BCR format Add missing "BCR %sy, 0, target" format instruction and a regression test for this format. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90387	2020-10-29 23:30:49 +09:00
Kazushi (Jam) Marukawa	07d1996601	[VE] Support register aliases in llvm-mc Support register aliases in MC layer to compile existing assembly files with clang and integrated assembler. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90383	2020-10-29 23:28:32 +09:00
Jay Foad	69f5105f5c	[AMDGPU] Simplify insertNoops functions. NFC.	2020-10-29 10:55:20 +00:00
Kazushi (Jam) Marukawa	9c82944b2d	[VE] Add vector control instructions Add LVL/SVL/SMVL/LVIX isntructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90355	2020-10-29 19:24:31 +09:00
Ben Shi	076a8d915b	[NFC][AVR] Improve device list Reviewed By: dylanmckay https://reviews.llvm.org/D87968	2020-10-29 10:54:17 +08:00
Kazushi (Jam) Marukawa	7942960199	[VE] Add vector mask operation instructions Add VFMK/VFMS/VFMF/ANDM/ORM/XORM/EQVM/NNDM/NEGM/PCVM/LZVM/TOVM isntructions. Add regression tests too. Also add new patterns to parse VFMK/VFMS/VFMF mnemonics. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90297	2020-10-29 08:42:41 +09:00
Austin Kerbow	de51867343	[AMDGPU] Add Reset function to GCNHazardRecognizer Reset the tracked emitted instructions when starting scheduling on a new region. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90347	2020-10-28 16:32:32 -07:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Jay Foad	50ee22d791	[AMDGPU] Fix double space in disassembly of SDWA instructions with vcc Differential Revision: https://reviews.llvm.org/D90317	2020-10-28 21:39:39 +00:00
Florian Hahn	772aaa6023	[AArch64] Improve lowering of insert_vector_elt with 0.0 consts. When moving +0.0 into a float vector, we can use to vi*gpr variants of INS. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90176	2020-10-28 21:35:33 +00:00
Austin Kerbow	8b127a8661	[AMDGPU] Fix inserting combined s_nop in bundles Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90334	2020-10-28 14:34:04 -07:00
Florian Hahn	ba78cae20f	[AArch64] Use DUP for BUILD_VECTOR with few different elements. If most elements of BUILD_VECTOR are the same, with a few different elements, it is better to use DUP for the common elements and INSERT_VECTOR_ELT for the different elements. Currently this transform is guarded quite restrictively to only trigger in clearly beneficial cases. With D90176, the lowering for patterns originating from code like ` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even better (unnecessary fmov is removed). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90233	2020-10-28 19:48:20 +00:00
Sanjay Patel	7c395f31a6	[CostModel][x86] remove cost-kind predicate for intrinsic costs We model cost as number of instructions / uops, so it does not make sense to treat size/blended costs any differently than throughput.	2020-10-28 14:33:37 -04:00
Thomas Lively	31e944556f	[WebAssembly] Prototype extending multiplication SIMD instructions As proposed in https://github.com/WebAssembly/simd/pull/376. This commit implements new builtin functions and intrinsics for these instructions, but does not yet add them to wasm_simd128.h because they have not yet been merged to the proposal. These are the first instructions with opcodes greater than 0xff, so this commit updates the MC layer and disassembler to handle that correctly. Differential Revision: https://reviews.llvm.org/D90253	2020-10-28 09:38:59 -07:00
Paul C. Anagnostopoulos	9d72065cf6	[TableGen] [AMDGPU] Add !sub operator for subtraction Use it in the AMDGPU target to eliminate !add(value1, !mul(value2, -1)) Differential Revision: https://reviews.llvm.org/D90107	2020-10-28 12:27:53 -04:00
Jay Foad	9e634bc22f	[AMDGPU] Omit needless string concatenations. NFC.	2020-10-28 12:56:52 +00:00
Kazushi (Jam) Marukawa	cbdee7df06	[VE] Add vector merger operation instructions Add VMRG/VSHF/VCP/VEX isntructions. Add regression tests too. Also add new patterns to parse new UImm4 oeprand. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90292	2020-10-28 19:57:10 +09:00
Kazushi (Jam) Marukawa	7ce2b93cbe	[VE] Add vector iterative operation instructions Add VFIA/VFIS/VFIM/VFIAM/VFISM/VFIMA/VFIMS isntructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90252	2020-10-28 19:06:46 +09:00
Kazushi (Jam) Marukawa	15f6250bed	[VE][NFC] Fix typo in comment	2020-10-28 18:51:07 +09:00
Kazushi (Jam) Marukawa	b22e32a9c8	[VE] Specify to expand BRIND and BR_JT BRIND and BR_JT are not implmented yet, so expand them atm. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90283	2020-10-28 18:50:20 +09:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Carl Ritson	057934a6d7	[AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc SIPreAllocateWWMRegs was being inserted after RegisterCoalescer but this pass does not exist during FastAlloc so pre-allocation pass was never being run. Insert pre-allocation after TwoAddressInstructionPass instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90236	2020-10-28 12:15:15 +09:00
Nemanja Ivanovic	5459d08795	[PowerPC] Fix single-use check and update chain users for ld-splat When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load as of `1461fb6e78`, we inaccurately check for a single user of the load and neglect to update the users of the output chain of the original load. As a result, we can emit a new load when the original load is kept and the new load can be reordered after a dependent store. This patch fixes those two issues. Fixes https://bugs.llvm.org/show_bug.cgi?id=47891	2020-10-27 16:49:38 -05:00
Stanislav Mekhanoshin	78ae1f6c90	[AMDGPU] Change predicate for fma/fmac legacy I do not exactly like the use of a negative predicate to enable instructions' support. Change HasNoMadMacF32Insts with HasFmaLegacy32. Differential Revision: https://reviews.llvm.org/D90250	2020-10-27 12:03:52 -07:00
Victor Huang	2e1a737f46	[PowerPC][PCRelative] Turn on TLS support for PCRel by default Turn on TLS support for PCRel by default and update the test cases. Differential Revision: https://reviews.llvm.org/D88738 Reviewed by: stefanp, kamaub	2020-10-27 13:58:44 -05:00
Michael Liao	46c3d5cb05	[amdgpu] Add the late codegen preparation pass. Summary: - Teach that pass to widen naturally aligned but not DWORD aligned sub-DWORD loads. Reviewers: rampitec, arsenm Subscribers: Tags: #llvm Differential Revision: https://reviews.llvm.org/D80364	2020-10-27 14:07:59 -04:00
Kazushi (Jam) Marukawa	a65883a78a	[VE] Add vector reduction instructions Add VSUMS/VSUMX/VFSUM/VMAXS/VMAXX/VFMAX/VRAND/VROR/VRXOR isntructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90227	2020-10-28 02:33:21 +09:00
Michael Liao	0d092303b4	[amdgpu] Enable use of AA during codegen. - Add an internal option `-amdgpu-use-aa-in-codegen` to enable or disable this feature. By Default, it's enabled. Differential Revision: https://reviews.llvm.org/D89320	2020-10-27 09:46:23 -04:00
Benjamin Kramer	35f7cbf9df	[X86] Don't crash on CVTPS2PH with wide vector inputs.	2020-10-27 14:42:02 +01:00
Kazushi (Jam) Marukawa	c5fa6bae12	[VE] Add vector float instructions Add VFAD/VFSB/VFMP/VFDV/VFSQRT/VFCP/VFCM/VFMAD/VFMSB/VFNMAD/VFNMSB/ VRCP/VRSQRT/VRSQRTNEX/VFIX/VFIXX/VFLT/VFLTX/VCVS/VCVD instructions. Add regression tests too. Also add additional AsmParser for VFIX and VFIXX instructions to parse their mnemonic. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90166	2020-10-27 20:42:24 +09:00
Jay Foad	6539ebe97d	[AMDGPU] Use DPP instead of Ext in a couple of class names. NFC.	2020-10-27 10:22:30 +00:00
Craig Topper	f385823e04	[X86] Alternate implementation of D88194. This uses PreprocessISelDAG to replace the constant before instruction selection instead of matching opcodes after. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D89178	2020-10-27 00:20:03 -07:00
Wei Wang	d602e79a81	[X86] Encode global address in small code model In small code model, program and its symbols are linked in the lower 2 GB of the address space. Try encoding global address even when the range is unknown in such case. Differential Revision: https://reviews.llvm.org/D89341	2020-10-26 23:14:06 -07:00
Bing1 Yu	2c08f1b4b6	[CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded... In each 128-lane, if there is at least one index is demanded and not all indices are demanded and this 128-lane is not the first 128-lane of the legalized-vector, then this 128-lane needs a extracti128; If in each 128-lane, there is at least one index is demanded, this 128-lane needs a inserti128. The following cases will help you build a better understanding: Assume we insert several elements into a v8i32 vector in avx2, Case#1: inserting into 1th index needs vpinsrd + inserti128 Case#2: inserting into 5th index needs extracti128 + vpinsrd + inserti128 Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D89767	2020-10-27 11:21:13 +08:00
Chen Zheng	00e573cadb	[LSR] fix typo in comments and rename for a new added hook.	2020-10-26 22:29:22 -04:00
Carl Ritson	7a880ab388	[AMDGPU] Move WQM Pass after MI Scheduler Exec mask manipulation inserted by SIWholeQuadMode barriers to instruction scheduling. Move the entire pass after the machine instruction scheduler and make changes so pass is correct for non-SSA operation. These changes should leave the pass still usable pre-scheduler, although tests have be updated to reflect post-scheduler results. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D88081	2020-10-27 10:25:53 +09:00
Amy Kwan	803cc3aff2	[PowerPC] Implement Set Boolean Condition Instructions This patch implements the set boolean condition instructions introduced in POWER10. The set boolean condition instructions (set[n]bc[r]) are used during the following situations: - sign/zero/any extending i1 to an i32 or i64, - reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64, - spilling CR bits (using the setnbc instruction) Differential Revision: https://reviews.llvm.org/D87705	2020-10-26 18:42:51 -05:00
Stanislav Mekhanoshin	d176e13ca5	Fixed release build after D89170	2020-10-26 16:00:57 -07:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Evgeny Leviant	a28388f95b	[ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC This predicate is not specific to cortex-a57 and can be used in other processor models as well.	2020-10-26 22:31:41 +03:00
Stanislav Mekhanoshin	ad8131bb03	[AMDGPU] Fix VC warning about singed/unsigned comparison. NFC. This is the warning reported in https://reviews.llvm.org/D89599	2020-10-26 11:55:57 -07:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Benjamin Kramer	b777d30496	[AMDGPU] Avoid unused variable warning in Release builds. NFC. SIRegisterInfo.cpp:480:19: error: unused variable 'SOffset'	2020-10-26 18:11:57 +01:00
Kazushi (Jam) Marukawa	9d0db405b5	[VE] Add vector shift instructions Add VSLL/VSLD/VSRL/VSLA/VSLAX/VSRA/VSRAX/VSFA instructionss. Add additonal AsmParser for VSLD special operand. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90143	2020-10-27 00:30:27 +09:00
Kazushi (Jam) Marukawa	83cb423c6e	[VE] Add vector logical instructions Add VAND/VOR/VXOE/VEQV/VLDZ/VPCNT/VBRV/VSEQ instrucitons and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90141	2020-10-27 00:29:33 +09:00
Kazushi (Jam) Marukawa	cfefef50c1	[VE] Support atomic store Support atomic store instructions and add a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90137	2020-10-27 00:28:11 +09:00
Jay Foad	0ca4124798	[AMDGPU] Make more use of printNamedBit in AMDGPUInstPrinter. NFC.	2020-10-26 14:03:35 +00:00
Kazushi (Jam) Marukawa	8aa60f67dc	[VE] Add vector comparison and min/max Add VCMP/VCPS/VCPX/VCMS/VCMX vector instructions. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89643	2020-10-26 18:32:04 +09:00
Kazushi (Jam) Marukawa	0acf700243	[VE] Add integer arithmetic vector instructions Add VADD/VADS/VADX/VSUB/VSBS/VSBX/VMPY/VMPS/VMPX/VMPD/VDIV/VDVS/VDVX instructions. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89642	2020-10-26 18:30:11 +09:00
Sebastian Neubauer	a094b4fa4b	[AMDGPU] Emit new pal metadata by default If no pal metadata is given, default to the msgpack format instead of the legacy metadata. This makes tests better readable. Differential Revision: https://reviews.llvm.org/D90035	2020-10-26 10:16:17 +01:00
Evgeny Leviant	a95ce5f65f	[ARM][SchedModels] Rename and generalize predicate. NFC	2020-10-26 12:14:55 +03:00
Kazushi (Jam) Marukawa	f32992ad24	[VE] Support atomic load Support atomic load instruction and add a regression test. VE uses release consitency, so need to insert fence around atomic instructions. This patch enable AtomicExpandPass and use emitLeadingFence and emitTrailingFence mechanism for such purpose. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90135	2020-10-26 18:02:45 +09:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00
Evgeny Leviant	d613e39d52	[ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90045	2020-10-26 11:43:02 +03:00
David Green	61bc18de0b	[Schedule] Add a MultiHazardRecognizer This adds a MultiHazardRecognizer and starts to make use of it in the ARM backend. The idea of the class is to allow multiple independent hazard recognizers to be added to a single base MultiHazardRecognizer, allowing them to all work in parallel without requiring them to be chained into subclasses. They can then be added or not based on cpu or subtarget features, which will become useful in the ARM backend once more hazard recognizers are being used for various things. This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the process, to more clearly explain what that recognizer is designed for. Differential Revision: https://reviews.llvm.org/D72939	2020-10-26 08:06:17 +00:00
Kazushi (Jam) Marukawa	52f03fe115	[VE] Support atomic fence Support atomic fence instruction and add a regression test. Add MEMBARRIER pseudo insturction also to use it as a barrier against to the compiler optimizations. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90112	2020-10-26 17:03:09 +09:00
Christudasan Devadasan	5a061041ec	[AMDGPU] Avoid offset register in MUBUF for direct stack object accesses We use an absolute address for stack objects and it would be necessary to have a constant 0 for soffset field. Fixes: SWDEV-228562 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89234	2020-10-26 11:08:37 +05:30
Craig Topper	82974e0114	[X86] Don't disassemble wbinvd with 0xf2 or 0x66 prefix. The 0xf3 prefix has been defined as wbnoinvd on Icelake Server. So the prefix isn't ignored by the CPU. AMD documentation suggests that wbnoinvd is treated as wbinvd on older processors. Intel documentation is not clear. Perhaps 0xf2 and 0x66 are treated the same, but its not documented. This patch changes TB to PS in the td file so 0xf2 and 0x66 will be treated as errors. This matches versions of objdump after wbnoinvd was added.	2020-10-25 20:56:01 -07:00
Liu, Chen3	180548c5c7	[X86] VEX/EVEX prefix doesn't work for inline assembly. For now, we lost the encoding information if we using inline assembly. The encoding for the inline assembly will keep default even if we add the vex/evex prefix. Differential Revision: https://reviews.llvm.org/D90009	2020-10-26 08:37:45 +08:00
Craig Topper	63ba82ed00	[X86] Use TargetConstant for immediates for VASTART_SAVE_XMM_REGS.	2020-10-25 12:52:56 -07:00
Craig Topper	2ed16aa66f	[X86] Use TargetConstant instead of Constant for operands to X86vaarg64.	2020-10-25 12:24:59 -07:00
Craig Topper	a222d832d5	[X86] Use TargetConstant for FPDiff with X86::TC_RETURN. It's required to be a constant and can never be in a register so make it explicit.	2020-10-25 00:29:11 -07:00
Fangrui Song	f04d92af94	[X86] Produce R_X86_64_GOTPCRELX for test/binop instructions (MOV32rm/TEST32rm/...) when -Wa,-mrelax-relocations=yes is enabled We have been producing R_X86_64_REX_GOTPCRELX (MOV64rm/TEST64rm/...) and R_X86_64_GOTPCRELX for CALL64m/JMP64m without the REX prefix since 2016 (to be consistent with GNU as), but not for MOV32rm/TEST32rm/...	2020-10-24 15:14:17 -07:00
Fangrui Song	d5adadb3a5	[AArch64][GlobalISel] Fix -Wunused-variable. NFC	2020-10-24 12:47:11 -07:00
Benjamin Kramer	39a0d6889d	[X86] Add a stub for Intel's alderlake. No scheduling, no autodetection.	2020-10-24 19:01:22 +02:00
Benjamin Kramer	bd2cf96c09	[X86] Add a stub for znver3 based on the little public information there is in AMD's manuals No scheduling, no autodetection. Just enough so -march=znver3 works.	2020-10-24 19:01:22 +02:00
dfukalov	9068c20965	[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions. 1. Throughput and codesize costs estimations was separated and updated. 2. Updated fdiv cost estimation for different cases. 3. Added scalarization processing for types that are treated as !isSimple() to improve codesize estimation in getArithmeticInstrCost() and getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path of base implementation. Next step is unify scalarization part in base class that is currently works for TCK_RecipThroughput path only. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89973	2020-10-24 19:53:08 +03:00
David Green	92205bf122	[ARM] Remove some dead code. NFC	2020-10-24 17:22:49 +01:00
Simon Pilgrim	ce356e1546	[DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method. I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns. Differential Revision: https://reviews.llvm.org/D87930	2020-10-24 12:23:09 +01:00
Jonas Paulsson	7c026a83ee	[SystemZ] Define MaxInstLength to have the value of 6. This value had the default value of 4 which caused branch relaxation to fail. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D90065	2020-10-24 09:19:34 +02:00
Krzysztof Parzyszek	1b5baa42bc	[Hexagon] Handle selection between HVX vector predicates Make sure that (select i1 q0 q1) is handled properly.	2020-10-23 18:22:03 -05:00
Evandro Menezes	fe9a7d9627	[RISCV] Use the commercial name for scheduling model (NFC) Use the commercial name for the scheduling model for the SiFive 7 Series.	2020-10-23 16:33:27 -05:00
Cameron McInally	a1cc274cb3	[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation Differential Revision: https://reviews.llvm.org/D89162	2020-10-23 16:24:02 -05:00
Stanislav Mekhanoshin	2e64ad9494	[AMDGPU] Fixed isLegalRegOperand() with physregs This does not change anything at the moment, but needed for D89170. In that change I am probing a physical SGPR to see if it is legal. RC is SReg_32, but DRC for scratch instructions is SReg_32_XEXEC_HI and test fails. That is sufficient just to check if DRC contains a register here in case of physreg. Physregs also do not use subregs so the subreg handling below is irrelevant for these. Differential Revision: https://reviews.llvm.org/D90064	2020-10-23 11:33:34 -07:00
Baptiste Saleil	edb27912a3	[PowerPC] Add intrinsics for MMA This patch adds support for MMA intrinsics. Authored by: Baptiste Saleil Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D89345	2020-10-23 13:16:02 -05:00
Amara Emerson	0f0fd383b4	[AArch64][GlobalISel] Introduce a new post-isel optimization pass. There are two optimizations here: 1. Consider the following code: FCMPSrr %0, %1, implicit-def $nzcv %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv FCMPSrr %0, %1, implicit-def $nzcv %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv This kind of code where we have 2 FCMPs each feeding a CSEL can happen when we have a single IR fcmp being used by two selects. During selection, to ensure that there can be no clobbering of nzcv between the fcmp and the csel, we have to generate an fcmp immediately before each csel is selected. However, often we can essentially CSE these together later in MachineCSE. This doesn't work though if there are unrelated flag-setting instructions in between the two FCMPs. In this case, the SUBS defines NZCV but it doesn't have any users, being overwritten by the second FCMP. Our solution here is to try to convert flag setting operations between a interval of identical FCMPs, so that CSE will be able to eliminate one. 2. SelectionDAG imported patterns for arithmetic ops currently select the flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand to those instructions. However if those impdef operands are not marked as dead, the peephole optimizations are not able to optimize them into non-flag setting variants. The optimization here is to find these dead imp-defs and mark them as such. This pass is only enabled when optimizations are enabled. Differential Revision: https://reviews.llvm.org/D89415	2020-10-23 10:18:36 -07:00
Huihui Zhang	1e113c078a	[AArch64][SVE] Fix umin/umax lowering to handle out of range imm. Immediate must be in an integer range [0,255] for umin/umax instruction. Extend pattern matching helper SelectSVEArithImm() to take in value type bitwidth when checking immediate value is in range or not. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D89831	2020-10-23 09:42:56 -07:00
Victor Huang	7a74bb899a	[PowerPC] Fix the Predicates for enabling pcrelative-memops and PLXVP/PSTXVP definitions In this patch, Predicates fix added for the following: * disable prefix-instrs will disable pcrelative-memops * set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions Differential Revision: https://reviews.llvm.org/D89727 Reviewed by: amyk, steven.zhang	2020-10-23 11:33:20 -05:00
vpykhtin	00255f4192	[AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse. I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393. First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter. I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among. modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary. Differential Revision: https://reviews.llvm.org/D89386	2020-10-23 19:17:48 +03:00
Paulo Matos	69e2797eae	[WebAssembly] Implementation of (most) table instructions Implementation of instructions table.get, table.set, table.grow, table.size, table.fill, table.copy. Missing instructions are table.init and elem.drop as they deal with element sections which are not yet implemented. Added more tests to tables.s Differential Revision: https://reviews.llvm.org/D89797	2020-10-23 08:42:54 -07:00
Jay Foad	958130dfda	[AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy This follows on from D89558 which added the new intrinsic and D88955 which added similar combines for llvm.amdgcn.fmul.legacy. Differential Revision: https://reviews.llvm.org/D90028	2020-10-23 16:16:13 +01:00
Paul C. Anagnostopoulos	876af264c1	[TableGen] Change !getop and !setop to !getdagop and !setdagop. Differential Revision: https://reviews.llvm.org/D89814	2020-10-23 10:36:05 -04:00
Matt Arsenault	8a59d4b654	AMDGPU: Don't query for TII in TII	2020-10-23 10:34:24 -04:00
Matt Arsenault	d61996473d	AMDGPU: Increase branch size estimate with offset bug This will be relaxed to insert a nop if the offset hits the bad value, so over estimate branch instruction sizes.	2020-10-23 10:34:24 -04:00
Chen Zheng	1e0b6c1df0	[LSR] ignore profitable chain when reg num is not major cost. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D89665	2020-10-23 09:35:48 -04:00
Simon Pilgrim	936ef89ebe	[X86] lowerShuffleWithPERMV - use MVT::changeTypeToInteger helper. NFCI.	2020-10-23 12:35:27 +01:00
Evgeny Leviant	cb86522c94	[ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC Differential revision: https://reviews.llvm.org/D90017	2020-10-23 14:27:49 +03:00
Florian Hahn	0fcc6f7a76	[AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics. This patch adds a specialized implementation of getIntrinsicInstrCost and add initial cost-modeling for min/max vector intrinsics. AArch64 NEON support umin/smin/umax/smax for vectors <8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>. Notably, it does not support vectors with i64 elements. This change by itself should have very little impact on codegen, but in follow-up patches I plan to teach the vectorizers to consider using those intrinsics on platforms where it is profitable, e.g. because there is no general 'select'-like instruction. The current cost returned should be better for throughput, latency and size. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D89953	2020-10-23 11:32:42 +01:00
Jay Foad	86a480e9ce	[AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy Differential Revision: https://reviews.llvm.org/D88955	2020-10-23 09:31:00 +01:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Jessica Paquette	19dc9c9780	[AArch64][GlobalISel] Move imm adjustment for G_ICMP to post-legalizer lowering Move the code which adjusts the immediate/predicate on a G_ICMP to AArch64PostLegalizerLowering. This - Reduces the number of places we need to test for optimized compares in the selector. We know that the compare should have been simplified by the time it hits the selector, so we can avoid testing this in selects, brconds, etc. - Allows us to potentially fold more compares (previously, this optimization was only done after calling `tryFoldCompare`, this may allow us to hit some more TST cases) - Simplifies the selection code in `emitIntegerCompare` significantly; we can just use an emitSUBS function. - Allows us to avoid checking that the predicate has been updated after `emitIntegerCompare`. Also add a utility header file for things that may be useful in the selector and various combiners. No need for an implementation file at this point, since it's just one constexpr function for now. I've run into a couple cases where having one of these would be handy, so might as well add it here. There are a couple functions in the selector that can probably be factored out into here. Differential Revision: https://reviews.llvm.org/D89823	2020-10-22 15:27:36 -07:00
Jessica Paquette	147b9497e7	[AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0 There are a lot of combines in AArch64PostLegalizerCombiner which exist to facilitate instruction matching in the selector. (E.g. matching for G_ZIP and other shuffle vector pseudos) It still makes sense to select these instructions at -O0. Matching earlier in a combiner can reduce complexity in the selector significantly. For example, a good portion of our selection code for compares would be a lot easier to represent in a combine. This patch moves matching combines into a "AArch64PostLegalizerLowering" combiner which runs at all optimization levels. Also, while we're here, improve the documentation for the AArch64PostLegalizerCombiner, and fix up the filepath in its file comment. And also add a 'r' which somehow got dropped from a bunch of function names. https://reviews.llvm.org/D89820	2020-10-22 14:43:25 -07:00
Tim Corringham	3c1273d737	[AMDGPU] Add amdgpu specific loop threshold metadata Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU specific unroll threshold value to be specified on a loop by loop basis. The intention is to be able to to allow more nuanced hints, e.g. specifying a low threshold value to indicate that a loop may be unrolled if cheap enough rather than using the all or nothing llvm.loop.unroll.disable metadata. Differential Revision: https://reviews.llvm.org/D84779	2020-10-22 17:21:32 +01:00
Mircea Trofin	e24537d48f	[NFC][MC] Use MCRegister for ReachingDefAnalysis APIs Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp Differential Revision: https://reviews.llvm.org/D89912	2020-10-22 08:47:35 -07:00
Piotr Sobczak	7ae0033ca8	[AMDGPU] Fix expansion of i16 MULH This commit marks i16 MULH as expand in AMDGPU backend, which is necessary after the refactoring in D80485. Differential Revision: https://reviews.llvm.org/D89965	2020-10-22 17:05:06 +02:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Simon Pilgrim	2692978050	[X86] X86AsmParser - make methods const where possible. NFCI. Reported by cppcheck	2020-10-22 15:55:06 +01:00
Simon Pilgrim	091b18ba81	[X86] Return const& in IntelExprStateMachine::getIdentifierInfo(). NFCI. Avoid unnecessary copy in X86AsmParser::ParseIntelOperand	2020-10-22 15:55:06 +01:00
Matt Arsenault	d5c0561667	AMDGPU: Fix not always reserving VGPRs used for SGPR spilling The VGPRs used for SGPR spills need to be reserved, even if we aren't speculatively reserving one. This was broken by `117e5609e9`.	2020-10-22 10:19:19 -04:00
Matt Arsenault	d3bcfe2a36	AMDGPU: Implement getNoPreservedMask We don't support funclets for exception handling and I hit this when manually reducing MIR.	2020-10-22 10:17:31 -04:00
Simon Pilgrim	794dc7ad26	[CodeGen] Split MVT::changeTypeToInteger() functionality from EVT::changeTypeToInteger(). Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger. All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.	2020-10-22 14:27:42 +01:00
Jeremy Morse	d73275993b	[DebugInstrRef] Substitute debug value numbers to handle optimizations This patch touches two optimizations, TwoAddressInstruction and X86's FixupLEAs pass, both of which optimize by re-creating instructions. For LEAs, various bits of arithmetic are better represented as LEAs on X86, while TwoAddressInstruction sometimes converts instrs into three address instructions if it's profitable. For debug instruction referencing, both of these require substitutions to be created -- the old instruction number must be pointed to the new instruction number, as illustrated in the added test. If this isn't done, any variable locations based on the optimized instruction are conservatively dropped. Differential Revision: https://reviews.llvm.org/D85756	2020-10-22 13:01:03 +01:00
Tianqing Wang	be39a6fe6f	[X86] Add User Interrupts(UINTR) instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89301	2020-10-22 17:33:07 +08:00
Tony	8e8cc587a5	[NFC][AMDGPU] Reorder SIMemoryLegalizer functions to be consistent - Make the SIMemoryLegalizer insertAcquire function be in the same order for each target to be consistent. Differential Revision: https://reviews.llvm.org/D89880	2020-10-22 05:39:18 +00:00
Xiang1 Zhang	7c3fea7721	[X86] Support customizing stack protector guard Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D88631	2020-10-22 10:08:14 +08:00
Craig Topper	9e884169a2	[FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes. There are still problems with unsigned i32 conversions too which I'll try to fix in another patch. Fixes part of PR47393 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87115	2020-10-21 18:12:54 -07:00
Gaurav Jain	4634ad6c0b	[NFC] Set return type of getStackPointerRegisterToSaveRestore to Register Differential Revision: https://reviews.llvm.org/D89858	2020-10-21 16:19:38 -07:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Stanislav Mekhanoshin	611959f004	[AMDGPU] Fixed v_swap_b32 match 1. Fixed liveness issue with implicit kills. 2. Fixed potential problem with an indirect mov. Fixes: SWDEV-256848 Differential Revision: https://reviews.llvm.org/D89599	2020-10-21 10:14:24 -07:00
Joe Nash	f6d7832f4c	[AMDGPU] Refactor SOPC & SOPP .td for extension We use the Real vs Pseudo instruction abstraction for other types of instructions to facilitate changes in opcode between gpu generations. This patch introduces that abstraction to SOPC and SOPP. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89738 Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138	2020-10-21 12:35:52 -04:00
Matt Arsenault	1ed4caff1d	AMDGPU: Lower the threshold reported for maximum stack size exceeded Check the actual maximum supported stack size for a kernel.	2020-10-21 12:06:27 -04:00
Matt Arsenault	53c43431bc	AMDGPU: Propagate amdgpu-flat-work-group-size attributes Fixes being overly conservative with the register counts in called functions. This should try to do a conservative range merge, but for now just clone. Also fix not being able to functionally run the pass standalone.	2020-10-21 12:06:24 -04:00
Paul C. Anagnostopoulos	dfd6b69e01	[ARM] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89822	2020-10-21 09:52:45 -04:00
Jonas Paulsson	1606755da0	[SystemZ] Mark unsaved argument R6 as live throughout function. For historical reasons, the R6 register is a callee-saved argument register. This means that if it is used to pass an argument to a function that does not clobber it, it is live throughout the function. This patch makes sure that in this special case any kill flags of it are removed. Review: Ulrich Weigand, Eli Friedman Differential Revision: https://reviews.llvm.org/D89451	2020-10-21 14:38:59 +02:00
Nicholas Guy	9a2d2bedb7	Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate Some instructions may be removable through processes such as IfConversion, however DefinesPredicate can not be made aware of when this should be considered. This parameter allows DefinesPredicate to distinguish these removable instructions on a per-call basis, allowing for more fine-grained control from processes like ifConversion. Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose Differential Revision: https://reviews.llvm.org/D88494	2020-10-21 11:52:47 +01:00
Sebastian Neubauer	5290f50e44	[AMDGPU] Fix off by one in assert D89217 did not subtract one when accessing SubRegFromChannelTable in one place. Differential Revision: https://reviews.llvm.org/D89804	2020-10-21 12:37:52 +02:00
Jay Foad	f6a5699c6c	[AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC.	2020-10-21 09:56:43 +01:00
Craig Topper	d4d0b41a82	[X86] Remove period from end of error message in assembler Addresses post-commit feedback from D89837.	2020-10-21 00:43:23 -07:00
Craig Topper	79a69f558f	[X86] Error on using h-registers with REX prefix in the assembler instead of leaving it to a fatal error in the encoder. Using a fatal error is bad for user experience. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D89837	2020-10-20 21:35:44 -07:00
Carl Ritson	324a15cead	[AMDGPU][NFC] Fix missing size in comment	2020-10-21 11:38:21 +09:00
Austin Kerbow	ebdcef20ce	[AMDGPU] Avoid inserting noops during scheduling Passes that are run after the post-RA scheduler may insert instructions like waitcnt which eliminate the need for certain noops. After this patch the scheduler is still aware of possible latency from hazards but noops will not be inserted until the dedicated hazard recognizer pass is run. Depends on D89753. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89754	2020-10-20 17:11:36 -07:00
Austin Kerbow	37d907899f	[HazardRec] Allow inserting multiple wait-states simultaneously If a target can encode multiple wait-states into a noop allow emitting such instructions directly. Reviewed By: rampitec, dmgreen Differential Revision: https://reviews.llvm.org/D89753	2020-10-20 17:03:47 -07:00
Tony	1bc7bfffdb	[AMDGPU] Optimize waitcnt insertion for flat memory operations Change waitcnt insertion to check the memory operand tokens to see if flat memory operations access VMEM in the same way it does to check if accessing LDS. This avoids adding waitcnt for counters for address spaces that are not accessed. In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS. This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL. Differential Revision: https://reviews.llvm.org/D89618	2020-10-20 22:55:12 +00:00
Craig Topper	1298252f80	[X86] Move 'int $3' -> 'int3' handling in the assembler to processInstruction. Instead of handling before parsing, just fix it after parsing.	2020-10-20 15:22:00 -07:00
Craig Topper	702aae368a	[X86] Move 's{hr,ar,hl} , <op>' to 'shift <op>' optimization in the assembler into processInstruction. Instead of detecting the mnemonic and hacking the operands before parsing. Just fix it up after parsing.	2020-10-20 15:20:46 -07:00
Paul C. Anagnostopoulos	332ff48dee	[AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89796	2020-10-20 16:23:15 -04:00
vnalamot	89f7ccea6f	[AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly Remove getAllVGPR32() interface and update the SGPR spill code to use a proper method to get the relevant VGPR registers list. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89806	2020-10-20 23:15:24 +05:30
Jay Foad	4913e3627c	[AMDGPU] Remove unused declaration. NFC. The implementation of this method was removed in D89706.	2020-10-20 16:31:42 +01:00
Michael Liao	2a0e4d1c01	[amdgpu] Enhance AMDGPU AA. - In general, a generic point may alias to pointers in all other address spaces. However, for certain cases enforced by the programming model, we may found a generic point won't alias to pointers to local objects. * When a generic pointer is loaded from the constant address space, it could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it won't alias to pointers to the PRIVATE or LOCAL address space. * When a generic pointer is passed as a kernel argument, it also could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it also won't alias to pointers to the PRIVATE or LOCAL address space. Differential Revision: https://reviews.llvm.org/D89525	2020-10-20 09:54:12 -04:00
Carl Ritson	be2afbd019	[AMDGPU] Remove fix up operand from SI_ELSE Remove immediate operand from SI_ELSE which indicates if EXEC has been modified. Instead always emit code that handles EXEC and remove unnecessary instructions during pre-RA optimisation. This facilitates passes (i.e. SIWholeQuadMode) adding exec mask manipulation post control flow lowering, and pre control flow lower passes do not need to be aware of SI_ELSE handling. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D89644	2020-10-20 19:15:21 +09:00
Carl Ritson	6aabbeadae	[AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility Remove duplicate code and move things around to make it easier to add additional optimisations to the pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89619	2020-10-20 17:22:43 +09:00
Ulrich Weigand	c299f3555d	[SystemZ] Fix disassembler crashes The "Size" value returned by SystemZDisassembler::getInstruction is used by common code even in the case where the routine returns failure. If that Size value exceeds the number of bytes remaining in the section, that could cause disassembler crashes. Fixed by never returning more than the number of bytes remaining.	2020-10-20 10:21:42 +02:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was commited. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Luqman Aden	51892a42da	[COFF][ARM] Fix CodeView for Windows on 32bit ARM targets. Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D89622	2020-10-19 22:16:16 -07:00
Wang, Pengfei	3a85472af2	[X86] Fix assert fail when element type is i1. extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail. Since we don't really handle vxi1 cases in below code, we can just return from here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89096	2020-10-20 09:26:32 +08:00
Stanislav Mekhanoshin	6ddadf9901	[AMDGPU] flat scratch ST addressing mode on gfx10 GFX10 enables third addressing mode for flat scratch instructions, an ST mode. In that mode both register operands are omitted and only swizzled offset is used in addition to flat_scratch base. Differential Revision: https://reviews.llvm.org/D89501	2020-10-19 15:29:52 -07:00
Sergei Trofimovich	1eb812e06d	[VE] Fix initializer visibility Before the change attempt to link libLTO.so against shared LLVM library failed as: ``` [ 76%] Linking CXX shared library ../../lib/libLTO.so ... /usr/bin/cmake -E cmake_link_script CMakeFiles/LTO.dir/link.txt --verbose=1 c++ -o ...libLTO.so.12git ...ibLLVM-12git.so ld: CMakeFiles/LTO.dir/lto.cpp.o: in function `llvm::InitializeAllTargetInfos()': include/llvm/Config/Targets.def:31: undefined reference to `LLVMInitializeVETargetInfo' ``` It happens because on linux llvm build system sets default symbol visibility to "hidden". The fix is to set visibility back to "default" for exported APIs with LLVM_EXTERNAL_VISIBILITY. Bug: https://bugs.llvm.org/show_bug.cgi?id=47847 Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89633	2020-10-19 22:54:41 +01:00
Evgenii Stepanov	188a7d6710	Add alloca size threshold for StackTagging initializer merging. Summary: Initializer merging generates pretty inefficient code for large allocas that also happens to trigger an exponential algorithm somewhere in Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867. This change adds an upper limit for the alloca size. The default limit is selected such that worst case size of memtag-generated code is similar to non-memtag (but because of the ISA quirks, this case is realized at the different value of alloca size, ex. memset inlining triggers at sizes below 512, but stack tagging instructions are 2x shorter, so limit is approx. 256). We could try harder to emit more compact code with initializer merging, but that would only affect large, sparsely initialized allocas, and those are doing fine already. Reviewers: vitalybuka, pcc Subscribers: llvm-commits	2020-10-19 13:44:07 -07:00
Craig Topper	e28376ec28	[X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table. We have pseudo instructions we use for bitcasts between these types. We have them in the load folding table, but not the store folding table. This adds them there so they can be used for stack spills. I added an exact size check so that we don't fold when the stack slot is larger than the GPR. Otherwise the upper bits in the stack slot would be garbage. That would be fine for Eli's test case in PR47874, but I'm not sure its safe in general. A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int pseudo instructions to use FR32 as the second source register class instead of VR128. That will keep the coalescer from promoting the register class of the bitcast instruction which will make the stack slot 4 bytes instead of 16 bytes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89656	2020-10-19 12:53:14 -07:00
Tony	6be9c7d2dc	[AMDGPU] Correct comment typo in SIMemoryLegaliizer.cpp	2020-10-19 18:50:28 +00:00
Jay Foad	56f6bf1a8d	[AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24 These were introduced in r279902 on the grounds that using separate MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple uses of the operands, which would prevent SimplifyDemandedBits from simplifying the operands. This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations" No functional change intended. At least it has no effect on lit tests. Differential Revision: https://reviews.llvm.org/D89706	2020-10-19 19:15:34 +01:00
Amy Kwan	6a946fd06f	[DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead. MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and uses isOperationLegalOrCustom instead. Differential Revision: https://reviews.llvm.org/D80485	2020-10-19 12:23:04 -05:00
Tony	151e297034	[AMDGPU] Simplify cumode handling in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D89663	2020-10-19 17:13:45 +00:00
Paul C. Anagnostopoulos	2871c6c93f	[Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans. Differential Revision: https://reviews.llvm.org/D89551	2020-10-19 10:33:55 -04:00

... 3 4 5 6 7 ...

60174 Commits