llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Matt Arsenault	75fcdfa1fc	AMDGPU: Cleanup SMRD buffer selection The usage of the Imm out argument from SelectSMRDOffset is pretty confusing. Stop trying to reject CI immediates in the case where the offset field can be used. It's not an illegal way to encode the immediate, so just prefer the better encoding pattern with AddedComplexity. We probably don't even really need the different opcodes for the different offset types anymore, but that will be more work to cleanup. The SMRD non-buffer load patterns could also use a cleanup to be done separately.	2020-02-04 10:28:08 -08:00
Simon Pilgrim	f25a2a3de5	[X86] Fix missing load latencies (PR36894) We weren't accounting for load latencies in the SSE42/AES/CLMUL schedule classes	2020-02-04 18:18:29 +00:00
Matt Arsenault	a3c814d234	Separately track input and output denormal mode AMDGPU and x86 at least both have separate controls for whether denormal results are flushed on output, and for whether denormals are implicitly treated as 0 as an input. The current DAGCombiner use only really cares about the input treatment of denormals.	2020-02-04 12:59:21 -05:00
Fangrui Song	8ff86fcf4c	[X86] -fpatchable-function-entry=N,0: place patch label after ENDBR{32,64} Similar to D73680 (AArch64 BTI). A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR. Also, add 32-bit tests and test that .cfi_startproc is at the function entry. The line information has a general implementation and is tested by AArch64/patchable-function-entry-empty.mir Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D73760	2020-02-04 09:42:36 -08:00
David Spickett	a05566c994	[ARM] Correct missing newline after outputting .tlsdescseq directive. Differential Revision: https://reviews.llvm.org/D73972	2020-02-04 17:38:09 +00:00
Yonghong Song	6d07802d63	[BPF] handle typedef of struct/union for CO-RE relocations Linux commit `1cf5b23988 (diff-289313b9fec99c6f0acfea19d9cfd949)` uses "#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)" to apply CO-RE relocations to all records including the following pattern: #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record) typedef struct { int a; } __t; #pragma clang attribute pop int test(__t *arg) { return arg->a; } The current approach to use struct/union type in the relocation record will result in an anonymous struct, which makes later type matching difficult in bpf loader. In fact, current BPF backend will fail the above program with assertion: clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:796: ... Assertion `TypeName.size()' failed. clang will change to use the type of the base of the member access which will preserve the typedef modifier for the preserve_{struct,union}_access_index intrinsics in the above example. Here we adjust BPF backend to accept that the debuginfo type metadata may be 'typedef' and handle them properly. Differential Revision: https://reviews.llvm.org/D73902	2020-02-04 08:53:03 -08:00
Justin Hibbits	b8dc54cf39	PowerPC: Remove redundancy in ternary for predicate selection rG2c4620ad57b8 inadvertently added redundancies in selection of GT and LE predicates for SPE. Correct this. Partially addresses PR 44768.	2020-02-04 10:38:21 -06:00
David Spickett	95c95a94d7	[ARM][AsmParser] Make assembly directives case insensitive Differential Revision: https://reviews.llvm.org/D73469	2020-02-04 16:34:39 +00:00
Kazushi (Jam) Marukawa	3ed12232b0	[VE] half fptrunc+store&load+fpext Summary: fp16 (half) load+fpext and fptrunc+store isel legalization and tests. Also, ExternalSymbolSDNode operand printing (tested by fp16 lowering). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73899	2020-02-04 17:16:09 +01:00
Jonas Paulsson	563e84790f	[SystemZ] Support -msoft-float This is needed when building the Linux kernel. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D72189	2020-02-04 10:32:45 -05:00
Mikhail Maltsev	65b3b6c0ac	[ARM] Make ARM::ArchExtKind use 64-bit underlying type (part 2), NFCI Summary: After following Simon's suggestion about additional testing posted at https://reviews.llvm.org/D73906, I found several more places that need to be updated. Reviewers: simon_tatham, dmgreen, ostannard, eli.friedman Reviewed By: simon_tatham Subscribers: merge_guards_bot, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73963	2020-02-04 14:48:10 +00:00
Mikhail Maltsev	7128aace60	[ARM] Make ARM::ArchExtKind use 64-bit underlying type, NFCI Summary: This patch changes the underlying type of the ARM::ArchExtKind enumeration to uint64_t and adjusts the related code. The goal of the patch is to prepare the code base for a new architecture extension. Reviewers: simon_tatham, eli.friedman, ostannard, dmgreen Reviewed By: dmgreen Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits, pbarrio Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73906	2020-02-04 11:24:18 +00:00
Filipe Cabecinhas	abada5036e	[NFC] Fix some spelling mistakes to test pushing to GH.	2020-02-04 11:07:31 +00:00
Kadir Cetinkaya	d2b6ac6ccd	Revert "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places." This reverts commit `8413116bf1`. this seems to be causing crashes while compiling ncurses. ``` $ ./bin/llc bugpoint-reduced-simplified.ll LLVM ERROR: Cannot emit physreg copy instruction ``` Here are the crashers: https://gist.github.com/kadircet/918f5bb97a2afe048cb875490edba46e executing with an llc compiled at `904d54de9b` works fine.	2020-02-04 11:22:53 +01:00
David Green	362d00e051	[ARM][VecReduce] Force expand vector_reduce_fmin Under MVE, we do not have any lowering for fminimum, which a vector_reduce_fmin without NoNan will be expanded into. As with the other recent patches, force this to expand in the pre-isel pass. Note that Neon lowering would be OK because the scalar fminimum uses the vector VMIN instruction, but it is probably better to just rely on the scalar operations, which is what is done here. Also fixes what appears to be the reversal of INF vs -INF in the vector_reduce_fmin widening code.	2020-02-04 09:36:59 +00:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Craig Topper	cd14b4a62b	[X86] Remove unneeded code that looks for (and (i8 (X86setcc_c)) I don't believe we use this construct anymore so I don't think we need to look for it.	2020-02-03 23:18:11 -08:00
Craig Topper	4581d97416	[X86] Remove some uncovered and possibly broken code from combineZext. This code matches (zext (trunc (setcc_carry))) -> (and (setcc_carry), 1) but the code never checks what type we're truncating to. An and mask of 1 would only make sense if the trunc was to MVT::i1, but we didn't check for that. I believe this code is a leftover from when i1 was a legal type.	2020-02-03 22:59:39 -08:00
Craig Topper	8413116bf1	[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places. Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable CSE unless the RHS is 0. optimizeCompareInstr called by the peephole pass can turn subs with unused results into cmps to clean this up. This commit makes other places that create X86ISD::CMP have the same behavior.	2020-02-03 21:01:11 -08:00
Craig Topper	c3a47221e0	[X86] Don't emit two X86ISD::COMI/UCOMI nodes when handling comi/ucomi intrinsics. We were creating two with different operand orders, and then only using one of them. Instead just swap the operands when needed and create a single node.	2020-02-03 20:08:01 -08:00
Craig Topper	c7768ce522	[X86] Update the haswell and broadwell scheduler information for gather instructions Broadwell was missing half the gather instructions. Both models had some mixups in the resource costs and number of uops. I've updated here based on what I think the original IACA source says with some cross checking against the microcode. I'm not sure about latency as the IACA source I have doesn't have that information. So I'm using the latency from uops.info. I plan to update Skylake models as well, but I'll do that in a separate patch. Differential Revision: https://reviews.llvm.org/D73844	2020-02-03 17:57:48 -08:00
Huihui Zhang	9a40670a0a	Revert "Reland "[AArch64] Fix data race on RegisterBank initialization."" This reverts commit `9c726e9d90`. There is still a buildbot failure: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/25749	2020-02-03 16:58:58 -08:00
Huihui Zhang	9c726e9d90	Reland "[AArch64] Fix data race on RegisterBank initialization." Minor fix, lambda function should capture all automatic variables by reference. Harbormaster pass with: https://reviews.llvm.org/B45640	2020-02-03 16:48:18 -08:00
Jessica Paquette	9effe38b22	[AArch64][GlobalISel] Fold G_XOR into TB(N)Z bit calculation This ports the existing case for G_XOR from `getTestBitOperand` in AArch64ISelLowering into GlobalISel. The idea is to flip between TBZ and TBNZ while walking through G_XORs. Let's say we have ``` tbz (xor x, c), b ``` Let's say the `b`-th bit in `c` is 1. Then - If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 0. - If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 1. So, then ``` tbz (xor x, c), b == tbnz x, b ``` Let's say the `b`-th bit in `c` is 0. Then - If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 1. - If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 0. So, then ``` tbz (xor x, c), b == tbz x, b ``` Differential Revision: https://reviews.llvm.org/D73929	2020-02-03 15:22:24 -08:00
Jay Foad	2252cac694	[AMDGPU] getMemOperandsWithOffset: support BUF non-stack-access instructions with resource but no vaddr Summary: This enables clustering for many more BUF instructions. Reviewers: rampitec, arsenm, nhaehnle Subscribers: jvesely, wdng, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73868	2020-02-03 22:49:30 +00:00
Jessica Paquette	37910fd0e1	[AArch64][GlobalISel] Fold G_SHL into TB(N)Z bit calculation This implements the following optimization: ``` (tbz (shl x, c), b) -> (tbz x, b-c) ``` Which appears in `getTestBitOperand` in AArch64ISelLowering.cpp. If we test bit `b` of `shl x, c`, we can fold away the `shl` by looking `c` bits to the right of `b` in `x` when this fits in the type. So, we can just test the `b-c`th bit. Differential Revision: https://reviews.llvm.org/D73924	2020-02-03 14:27:08 -08:00
Matt Arsenault	7d3aace3f5	AMDGPU: Add flag to control mem intrinsic expansion GlobalISel doesn't implement the expansion for these yet, so add a flag to force expanding these so it's possible to avoid these for a while.	2020-02-03 14:26:01 -08:00
Matt Arsenault	cb7b661d3d	AMDGPU: Analyze divergence of inline asm	2020-02-03 12:42:16 -08:00
Matt Arsenault	2758ae41ae	AMDGPU/GlobalISel: Allow selecting s128 load/stores	2020-02-03 12:28:08 -08:00
Matt Arsenault	726446a009	AMDGPU: Fix splitting wide f32 s.buffer.load intrinsics This would switch f32 to i32, and produce an invalid concat_vectors from i32 pieces to an f32 vector.	2020-02-03 12:28:08 -08:00
David Tenty	77e71c5217	[AIX] Don't use a zero fill with a second parameter Summary: The AIX assembler .space directive can't take a second non-zero argument to fill with. But LLVM emitFill currently assumes it can. We add a flag to the AsmInfo to check if non-zero fill is supported, and if we can't zerofill non-zero values we just splat the .byte directives. Reviewers: stevewan, sfertile, DiggerLin, jasonliu, Xiangling_L Reviewed By: jasonliu Subscribers: Xiangling_L, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73554	2020-02-03 15:16:08 -05:00
Jessica Paquette	2bd46444d7	[AArch64][GlobalISel] Walk through G_AND in TB(N)Z bit calculation Given ``` tb(n)z (and x, m), b ``` Where the `b`-th bit of `m` is 1, ``` tb(n)z (and x, m), b == tb(n)z x, b ``` So, we can walk past a `G_AND` in this case. Also add test/CodeGen/AArch64/GlobalISel/opt-fold-and-tbz-tbnz.mir to test this. Differential Revision: https://reviews.llvm.org/D73790	2020-02-03 11:53:47 -08:00
Amara Emerson	b911b99052	[AArch64][GlobalISel] Don't reconvert to p0 in convertPtrAddToAdd(). convertPtrAddToAdd improved overall code size and quality by a significant amount, but on -O0 we generate some cross-class copies due to the fact that we emitted G_PTRTOINT and G_INTTOPTR around the G_ADD. Unfortunately at -O0 we don't run any register coalescing, so these cross class copies end up escaping as moves, and we ended up regressing 3 benchmarks on CTMark (though still a winner overall). This patch changes the lowering to instead directly emit the G_ADD into the destination register, and then force changes the dest LLT to s64 from p0. This should be ok, as all uses of the register should now be selected and therefore the LLT doesn't matter for the users. It does however matter for the importer patterns, which will fail to select a G_ADD if there's a p0 LLT. I'm not able to get rid of the G_PTRTOINT on the source yet however. We can't use the same trick of breaking the type system since that could break the selection of the defining instruction. Thus with -O0 we still end up with a cross class copy on source. Code size improvements on -O0: Program baseline new diff test-suite :: CTMark/Bullet/bullet.test 965520 949164 -1.7% test-suite...TMark/7zip/7zip-benchmark.test 1069456 1052600 -1.6% test-suite...ark/tramp3d-v4/tramp3d-v4.test 1213692 1199804 -1.1% test-suite...:: CTMark/sqlite3/sqlite3.test 421680 419736 -0.5% test-suite...-typeset/consumer-typeset.test 837076 833380 -0.4% test-suite :: CTMark/lencod/lencod.test 799712 796976 -0.3% test-suite...:: CTMark/ClamAV/clamscan.test 688264 686132 -0.3% test-suite :: CTMark/kimwitu++/kc.test 1002344 999648 -0.3% test-suite...Mark/mafft/pairlocalalign.test 422296 421768 -0.1% test-suite :: CTMark/SPASS/SPASS.test 656792 656532 -0.0% Geomean difference -0.6% Differential Revision: https://reviews.llvm.org/D73910	2020-02-03 11:50:22 -08:00
Matt Arsenault	cd7650c186	GlobalISel: Implement fewerElementsVector for G_SEXT_INREG Start using a new strategy with a combination of merge and unmerges. This allows scalarizing before lowering, which in cases like <2 x s128> avoids producing giant illegal shifts.	2020-02-03 11:47:33 -08:00
Simon Pilgrim	3ece5a23bd	[X86] getTargetShuffleMask - use getConstantOperandVal helper. NFCI.	2020-02-03 18:06:47 +00:00
Nikita Popov	1cc4f8d172	[ARM] Expand vector reduction intrinsics on soft float Followup to D73135. If the target doesn't have hard float (default for ARM), then we assert when trying to soften the result of vector reduction intrinsics. This patch marks these for expansion as well. (A bit odd to use vectors on a target without hard float ... but that's where you end up if you expose target-independent vector types.) Differential Revision: https://reviews.llvm.org/D73854	2020-02-03 18:49:12 +01:00
Jay Foad	05297b7cbe	[AMDGPU] getMemOperandsWithOffset: add resource operand for BUF instructions Summary: This prevents unwanted clustering of BUF instructions with the same vaddr but different resource descriptors. Reviewers: rampitec, arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73867	2020-02-03 17:06:09 +00:00
Simon Pilgrim	bdfcdb1fb3	HexagonOptAddrMode::changeStore - fix null dereference warning (PR43463) As detailed on PR43463, this fixes a static analyzer null dereference warning by sinking Changed = true into the if() blocks where the MIB is actually created. I did a quick check that suggested that one of those if() blocks is always guaranteed to be hit (so we could change it to if-else), but this seems like a safer approach Differential Revision: https://reviews.llvm.org/D73883	2020-02-03 16:50:04 +00:00
Simon Pilgrim	8c0e715eb2	[X86] BEXTR SimplifyDemandedBitsForTargetNode - length == 0 -> result = 0	2020-02-03 16:50:03 +00:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
Kazushi (Jam) Marukawa	be9fe6aa8b	[VE] (fp)trunc+store & load+(fp)ext isel Summary: load+sext/zext/fpext and (fp)trunc+store isel legalization and tests Reviewers: arsenm, craig.topper, rengolin, k-ishizaka Reviewed By: arsenm Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits Tags: #ve, #llvm Differential Revision: https://reviews.llvm.org/D73774	2020-02-03 16:55:44 +01:00
Simon Pilgrim	8ead5df0b1	[X86] computeKnownBitsForTargetNode - add BEXTR support (PR39153) Add a KnownBits::extractBits helper	2020-02-03 15:43:59 +00:00
Craig Topper	028579b51e	[X86] FUCOMI/FCOMI instructions should Def FPSW not FPCW. These instructions can set the exception in FPSW. But I don't think they can change FPCW. So this looks like a typo. Differential Revision: https://reviews.llvm.org/D73864	2020-02-03 07:39:00 -08:00
Kazushi (Jam) Marukawa	07c9f7574d	[VE] vaarg functions callers and callees Summary: Isel patterns and tests for vaarg functions as callers and callees. Reviewers: arsenm, rengolin, k-ishizaka Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits Tags: #ve, #llvm Differential Revision: https://reviews.llvm.org/D73710	2020-02-03 16:26:44 +01:00
Simon Pilgrim	a9ee3ffbc0	[X86] Move BEXTR DemandedBits handling inside SimplifyDemandedBitsForTargetNode Some prep work for PR39153.	2020-02-03 15:16:40 +00:00
Matt Arsenault	00b22df71d	AMDGPU: Fix extra type mangling on llvm.amdgcn.if.break These have to be the same mask type.	2020-02-03 07:02:05 -08:00
John Brawn	68cf574857	[FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC These get lowered to function calls, like the non-strict versions. Differential Revision: https://reviews.llvm.org/D73784	2020-02-03 14:39:16 +00:00
Krzysztof Parzyszek	b99ed5c0b4	[Hexagon] Rename FeatureHasPreV65 to FeaturePreV65	2020-02-03 08:20:59 -06:00
Matt Arsenault	e4bc55bd94	AMDGPU/GlobalISel: Reduce indentation	2020-02-03 05:41:14 -08:00
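
The three TB(N)Z folds described in the AArch64 GlobalISel commits above (`9effe38b22` for G_XOR, `37910fd0e1` for G_SHL, `2bd46444d7` for G_AND) can be checked on plain integers. The following is only a sketch of the bit identities, not the code from those patches: `tbz`, `tbnz`, `foldXor`, `foldShl`, `foldAnd` and `checkFolds` are hypothetical names introduced here, and 64-bit unsigned values stand in for the registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM API): model the branch condition of
// TBZ (bit b is zero) and TBNZ (bit b is non-zero) on a 64-bit value.
static bool tbz(uint64_t v, unsigned b) { return ((v >> b) & 1) == 0; }
static bool tbnz(uint64_t v, unsigned b) { return !tbz(v, b); }

// G_XOR fold: if bit b of c is 1, the xor inverts bit b of x, so
// tbz (xor x, c), b == tbnz x, b; otherwise it is just tbz x, b.
static bool foldXor(uint64_t x, uint64_t c, unsigned b) {
  return ((c >> b) & 1) ? tbnz(x, b) : tbz(x, b);
}

// G_SHL fold: bit b of (x << c) is bit (b - c) of x, valid when c <= b.
static bool foldShl(uint64_t x, unsigned c, unsigned b) {
  assert(c <= b && "fold only applies when b - c fits in the type");
  return tbz(x, b - c);
}

// G_AND walk: when bit b of the mask m is 1, the and leaves bit b of x alone.
static bool foldAnd(uint64_t x, uint64_t m, unsigned b) {
  assert(((m >> b) & 1) && "fold only applies when bit b of the mask is set");
  return tbz(x, b);
}

// Compare each folded form against the unfolded expression over sample values.
static bool checkFolds() {
  const uint64_t xs[] = {0, 1, 0x5a5a5a5a5a5a5a5aULL, ~0ULL,
                         0x8000000000000001ULL};
  const uint64_t cs[] = {0, 1, 0xff00ULL, 0xdeadbeefULL, ~0ULL};
  for (uint64_t x : xs) {
    for (unsigned b = 0; b < 64; ++b) {
      for (uint64_t c : cs) {
        if (tbz(x ^ c, b) != foldXor(x, c, b))
          return false;
        // The G_AND walk is only legal when the tested mask bit is set.
        if (((c >> b) & 1) && tbz(x & c, b) != foldAnd(x, c, b))
          return false;
      }
      // Shift amounts up to b keep the tested bit inside the value.
      for (unsigned s = 0; s <= b; ++s)
        if (tbz(x << s, b) != foldShl(x, s, b))
          return false;
    }
  }
  return true;
}
```

Calling `checkFolds()` compares each fold against the unfolded expression for a handful of sample values and every bit position. The preconditions in `foldShl` and `foldAnd` mirror the "when this fits in the type" and "where the `b`-th bit of `m` is 1" conditions stated in the commit messages.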


55835 Commits