llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	e95d04f4f1	[X86][AVX] lowerV4X128Shuffle - attempt to widen to 2x256 to simplify shuffles If we are lowering to X86ISD::SHUF128 we are going to lose track of individual 128-bit lanes that are UNDEF, so if we can widen these to guarantee that they are sequential with their neighbour we should. This helps with later shuffle combines.	2020-03-30 12:22:26 +01:00
Florian Hahn	9e81249d76	[Matrix] Rename emitChainedMatrixMultiply to emitMatrixMultiply (NFC). The Chained in the name potentially leads to confusion. Also updated the comment to drop the unnecessary mention of tile-sized.	2020-03-30 11:17:25 +01:00
Florian Hahn	c3b03f3d0c	[AMDGPU] Drop const for value that is copied (NFC). This fixes warning: loop variable 'Def' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis] llvm::Register just contains a single unsigned and should be copied. Reviewers: rampitec Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D77011	2020-03-30 10:59:59 +01:00
Sam Parker	94b195ff12	[ARM][LowOverheadLoops] Add horizontal reduction support Add a bit more logic into the 'FalseLaneZeros' tracking to enable horizontal reductions and also make the VADDV variants validForTailPredication. Differential Revision: https://reviews.llvm.org/D76708	2020-03-30 09:55:41 +01:00
Guillaume Chatelet	b91535f6c7	[Alignment][NFC] Return Align for SelectionDAGNodes::getOriginalAlignment/getAlignment Summary: Also deprecate getOriginalAlignment; getAlignment will take much more time as it is pervasive through the codebase (including TableGened files). This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76933	2020-03-30 07:26:48 +00:00
David Green	c9eaed5149	[ARM] MVE VMOV.i64 In the original batch of MVE VMOVimm code generation VMOV.i64 was left out due to the way it was done downstream. It turns out that it's fairly simple though. This adds the codegen for it, similar to NEON. Bigendian is technically incorrect in this version, which John is fixing in a Neon patch.	2020-03-30 07:44:23 +01:00
Max Kazantsev	4e0d9925d6	[NFC] Remove obsolete checks followed by fix of isGuaranteedToTransferExecutionToSuccessor In the past, isGuaranteedToTransferExecutionToSuccessor contained some weird logic for volatile loads/stores that was ultimately removed by patch D65375. It's time to remove a piece of dependent logic that used to be a workaround for the code which is now deleted. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D76918	2020-03-30 12:24:41 +07:00
Jun Ma	31a1d85c53	[Coroutines 2/2] Improve symmetric control transfer feature Differential Revision: https://reviews.llvm.org/D76913	2020-03-30 09:53:09 +08:00
Jun Ma	a94fa2c049	[Coroutines 1/2] Improve symmetric control transfer feature Differential Revision: https://reviews.llvm.org/D76911	2020-03-30 09:53:09 +08:00
Benjamin Kramer	854f268ca6	[MC] Move deprecation infos from MCTargetDesc to MCInstrInfo This allows emitting it only when the feature is used by a target. Shrinks Release+Asserts clang by 900k.	2020-03-29 21:20:40 +02:00
Simon Pilgrim	9c8ec99c80	[X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.	2020-03-29 19:51:38 +01:00
Simon Pilgrim	8206c50cde	[X86] Add isAnyZero shuffle mask helper	2020-03-29 19:51:37 +01:00
Nikita Popov	8253a86b65	[InstCombine] Erase old mul when creating umulo As we don't return the result of replaceInstUsesWith(), we are responsible for erasing the instruction. There is a small subtlety here in that we need to do this after the other uses of Builder, which uses the original multiply as the insertion point. NFC apart from worklist order changes.	2020-03-29 20:46:08 +02:00
Nikita Popov	53d209076a	[InstCombine] Use replaceOperand() in demanded elements simplification To make sure that dead operands get DCEd. This fixes the largest source of leftover dead operands we see in tests. NFC apart from worklist changes.	2020-03-29 20:43:19 +02:00
Nikita Popov	0c87140065	[InstCombine] Use replaceOperand() in assoc cast simplification To make sure the old operands are DCEd. NFC apart from worklist order.	2020-03-29 20:28:37 +02:00
Nikita Popov	a9ddcd6411	[InstCombine] Erase old add when optimizing add overflow We don't return the replaceInstUsesWith() result, so we're responsible for cleaning up. NFC apart from worklist order changes.	2020-03-29 20:20:14 +02:00
Uday Bondhugula	c0955edfd6	Introduce support for lib function aligned_alloc in TLI / memory builtins Aligned_alloc is a standard lib function and has been in glibc since 2.16 and in the C11 standard. It has semantics similar to malloc/calloc for several analyses/transforms. This patch introduces aligned_alloc in target library info and memory builtins. Subsequent ones will make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062 This change will also be useful to LLVM generators that need to allocate buffers of vector elements larger than 16 bytes (e.g. 256-bit ones), element boundary alignment for which is not typically provided by glibc malloc. Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D76970	2020-03-29 23:36:24 +05:30
Matt Arsenault	d15723ef06	AMDGPU/GlobalISel: Remove redundant virtual	2020-03-29 14:03:07 -04:00
Matt Arsenault	ab7a41069e	AMDGPU: Fix using wrong instruction for FP conversion This was never actually hit, but FTRUNC was clearly not the intent here.	2020-03-29 14:03:07 -04:00
Matt Arsenault	97bbe7ad2a	AMDGPU: Fix typo	2020-03-29 14:03:06 -04:00
Sanjay Patel	fc3cc8a4b0	[VectorCombine] skip debug intrinsics first for efficiency	2020-03-29 13:58:04 -04:00
Nikita Popov	26fa33755f	[InstCombine] Simplify select of cmpxchg transform Rather than converting to a dummy select with equal true and false ops, just directly return the resulting value. As a side-effect, this fixes missing DCE of the previously replaced operand.	2020-03-29 18:57:32 +02:00
Florian Hahn	99913ef3d1	[OpenMP] set_bits iterator yields unsigned elements, no reference (NFC). BitVector::set_bits() returns an iterator range yielding unsigned elements, which will always be copied, while const & gives the impression that there will be no copy. Newer versions of clang complain: warning: loop variable 'SetBitsIt' is always a copy because the range of type 'iterator_range<llvm::BitVector::const_set_bits_iterator>' (aka 'iterator_range<const_set_bits_iterator_impl<llvm::BitVector> >') does not return a reference [-Wrange-loop-analysis] Reviewers: jdoerfert, rnk Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D77010	2020-03-29 17:08:13 +01:00
Nikita Popov	28f67bd5c5	[InstCombine] Fix worklist management in varargs transform Add a replaceUse() helper to mirror replaceOperand() for the rare cases where we're working directly on uses. NFC apart from worklist order changes.	2020-03-29 18:04:12 +02:00
Nikita Popov	6f07a9e80a	[InstCombine] Erase original add when creating saddo Usually when we replaceInstUsesWith() we also return the original instruction, and InstCombine will take care of erasing it. Here we don't do that, so we need to manually erase it. NFC apart from worklist order changes.	2020-03-29 18:01:32 +02:00
Nikita Popov	1e363023b8	[InstCombine] Use replaceOperand() in a few more places To make sure the old operands get DCEd. NFC apart from worklist order changes.	2020-03-29 18:01:00 +02:00
Simon Pilgrim	7734e4b3a3	[X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half. I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.	2020-03-29 16:41:59 +01:00
Simon Pilgrim	da4c7db793	[X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC. This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering which doesn't work at the byte level.	2020-03-29 16:41:58 +01:00
Simon Pilgrim	10439f9e32	[X86][AVX] Add X86ISD::VALIGN target shuffle decode support Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.	2020-03-29 16:41:58 +01:00
Florian Hahn	49d00824bb	[VPlan] Use one VPWidenRecipe per original IR instruction. (NFC). This patch changes VPWidenRecipe to only store a single original IR instruction. This is the first required step towards modeling its operands as VPValues and also towards breaking it up into a VPInstruction. Discussed as part of D74695. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D76988	2020-03-29 13:47:28 +01:00
Simon Pilgrim	a7115d51be	[X86] X86CallFrameOptimization - generalize slow push code path Replace the explicit isAtom() || isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push from memory case. This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use. Differential Revision: https://reviews.llvm.org/D76239	2020-03-29 11:01:59 +01:00
Richard Diamond	4bf015c035	[AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences. Summary: On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue. In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64. This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type. Reviewers: hfinkel, jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75471	2020-03-29 01:26:31 -05:00
Fangrui Song	fc93787d7e	[MC][PowerPC] Make .reloc support arbitrary relocation types Generalizes `ad7199f3e6` (R_PPC_NONE/R_PPC64_NONE).	2020-03-28 17:04:31 -07:00
Matt Arsenault	9564f46766	AMDGPU: Make use of default operands	2020-03-28 17:33:29 -04:00
Benjamin Kramer	ba2e72c54e	[MDBuilder] Don't use stable sort for sorting integers.	2020-03-28 21:19:46 +01:00
Nikita Popov	2215dcf1d7	[InstCombine] Remove unreachable blocks before DCE Dropping unreachable code may reduce use counts on other instructions, so it's better to do this earlier rather than later. NFC-ish, may only impact worklist order.	2020-03-28 21:19:16 +01:00
Nikita Popov	97cc1275c7	[InstCombine] Merge two functions; NFC Merge AddReachableCodeToWorklist() into prepareICWorklistFromFunction(). It's one logical step, and this makes it easier to move code.	2020-03-28 21:19:16 +01:00
Benjamin Kramer	2d24d74b85	[AMDGPU] Stabilize sort order Found by the expensive checks in llvm::sort.	2020-03-28 20:20:14 +01:00
Yonghong Song	ced0d1f42b	[BPF] support 128bit int explicitly in layout spec Currently, bpf does not specify 128bit alignment in its layout spec. So for a structure like struct ipv6_key_t { unsigned pid; unsigned __int128 saddr; unsigned short lport; }; clang will generate IR type %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] } Additional padding is to ensure later IR->MIR can generate correct stack layout with target layout spec. But it is common practice for a tracing program to be first compiled with target flag (e.g., x86_64 or aarch64) through clang to generate IR and then go through llc to generate bpf byte code. Tracing programs often refer to kernel internal data structures which need to be compiled with a non-bpf target. But such a compilation model may cause a problem on aarch64. The bcc issue https://github.com/iovisor/bcc/issues/2827 reported such a problem. For the above structure, since aarch64 has "i128:128" in its layout string, the generated IR will have %struct.ipv6_key_t = type { i32, i128, i16 } Since bpf does not have "i128:128" in its spec string, the selectionDAG assumes alignment 8 for i128 and computes the stack storage size for the above as 32 bytes, which leads to incorrect code later. The x86_64 target does not have this issue as it does not have "i128:128" in its layout spec and permits i128 to be aligned at 8 bytes on the stack. Its IR type looks like %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] } The fix here is to add i128 support in the layout spec, the same as aarch64. The only downside is we may have less optimal stack allocation in certain cases since we require 16byte alignment for i128 instead of 8. But this is probably fine as i128 is not used widely and in most cases users should already have proper alignment. Differential Revision: https://reviews.llvm.org/D76587	2020-03-28 11:46:29 -07:00
Benjamin Kramer	4065e92195	Upgrade some instances of std::sort to llvm::sort. NFC.	2020-03-28 19:23:29 +01:00
Reid Kleckner	e5bf5037d8	[CodeGen] Fix sinking local values in lpads with phis There was already a test case for landingpads to handle this case, but I had forgotten to consider PHI instructions preceding the EH_LABEL in the landingpad. PR45261	2020-03-28 11:10:33 -07:00
Nikita Popov	30d712103f	[InstCombine] Use replaceOperand() API in GEP transforms To make sure that replaced operands get DCEd. This drops one iteration from gepphigep.ll, which is still not optimal. This was the last test case performing more than 3 iterations. NFC-ish, only worklist order should change.	2020-03-28 19:07:25 +01:00
Nikita Popov	b1f78baeaa	[InstCombine] Reduce code duplication in GEP of PHI transform; NFC The `NewGEP->setOperand(DI, NewPN)` call was duplicated, and the insertion of NewGEP is the same in both if/else, so we can extract it.	2020-03-28 19:07:25 +01:00
Alexandre Ganea	3ab3f3c5d5	After `09158252f7`, fix build when -DLLVM_ENABLE_THREADS=OFF Tested on Linux with Clang 9, and on Windows with Visual Studio 2019 16.5.1 with -DLLVM_ENABLE_THREADS=ON and OFF.	2020-03-28 13:54:58 -04:00
Nikita Popov	672e8bfbfc	[InstCombine] Fix worklist management in foldXorOfICmps() Because this code does not use the IC-aware replaceInstUsesWith() helper, we need to manually push users to the worklist. This is NFC-ish, in that it may only change worklist order.	2020-03-28 18:25:21 +01:00
Enna1	03bc311a16	[CorrelatedValuePropagation] Remove redundant if statement in processSelect() This statement if (ReplaceWith == S) ReplaceWith = UndefValue::get(S->getType()); was introduced in https://reviews.llvm.org/rG35609d97ae89b8e13f40f4e6b9b056954f8baa83 to fix a case where unreachable code can cause select instruction simplification to fail. In https://reviews.llvm.org/rGd10480657527ffb44ea213460fb3676a6b1300aa, we began to perform a depth-first walk of basic blocks. This means we will not visit unreachable blocks. So we do not need this special check any more. Differential Revision: https://reviews.llvm.org/D76753	2020-03-28 18:01:17 +01:00
Martin Storsjö	e6112a56dd	[AsmPrinter] Emit .weak directive for weak linkage on COFF for symbols without a comdat MC already knows how to emulate the .weak directive (with its ELF semantics; i.e., an undefined weak symbol resolves to 0, and a defined weak symbol has lower link precedence than a strong symbol of the same name) using COFF weak externals. Plumb this through the ASM printer too, so that definitions marked with __attribute__((weak)) at the language level (which gets translated to weak linkage at the IR level) have the corresponding .weak directive emitted. Note that declarations marked with __attribute__((weak)) at the language level (which translates to extern_weak at the IR level) already have .weak directives emitted. Weak/linkonce symbols without an associated comdat (in particular, ones generated with __attribute__((weak)) in C/C++) were earlier emitted as normal unique globals, as the comdat is required to provide the linkonce semantics. This change makes sure they are emitted as .weak instead, allowing other symbols to override them. Rename the existing coff-weak.ll test to coff-linkonce.ll. I'm not quite sure what that test covers, since the behavior being tested in it (the emission of a one_only section) is just a result of passing -function-sections to llc; the linkonce_odr makes no difference. Add a new coff-weak.ll which tests the new directive emission. Based on a previous patch by Shoaib Meenai. Differential Revision: https://reviews.llvm.org/D44543	2020-03-28 18:48:58 +02:00
Florian Hahn	81f173ed0e	[SCCP] Remove LatticeVal alias now that transition is done (NFC). The LatticeVal alias was introduced to reduce the diff size for the transition to ValueLatticeElement, which is done now. This patch removes the unnecessary alias and updates some very verbose type uses with auto.	2020-03-28 15:40:24 +00:00
Florian Hahn	a44bf59c93	[SCCP] Remove unused toLatticeValue helper (NFC). LatticeVal is an alias for ValueLatticeElement and the function is not used any longer.	2020-03-28 15:40:24 +00:00
Michael Liao	d2dd0fac48	Fix `-Wsign-compare` warning. NFC.	2020-03-28 10:20:27 -04:00
Uday Bondhugula	06066c4003	[NFC] Attributor comment updates / cast cleanup Minor update/fixes to comments for the Attributor pass, and dyn_cast -> cast. Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D76972	2020-03-28 13:36:43 +05:30
Serge Pavlov	f398739152	[FEnv] Constfold some unary constrained operations This change implements constant folding to constrained versions of intrinsics, implementing rounding: floor, ceil, trunc, round, rint and nearbyint. Differential Revision: https://reviews.llvm.org/D72930	2020-03-28 12:28:33 +07:00
Jonas Devlieghere	190df4a5bc	Revert "[FileCollector] Add a method to add a whole directory and it contents." This reverts commit `8913769e35` because the unit test is failing on the Windows bot.	2020-03-27 19:21:48 -07:00
Jessica Paquette	98d05f88d5	[GlobalISel] Fix equality for copies from physregs in matchEqualDefs When we see this: ``` %a = COPY $physreg ... SOMETHING implicit-def $physreg ... %b = COPY $physreg ``` The two copies are not equivalent, and so we shouldn't perform any folding on them. When we have two instructions which use a physical register check that they define the same virtual register(s) as well. e.g., if we run into this case ``` %a = COPY $physreg ... %b = COPY %a ``` we can say that the two copies are the same, and can be folded. Differential Revision: https://reviews.llvm.org/D76890	2020-03-27 17:52:21 -07:00
Jonas Devlieghere	8913769e35	[FileCollector] Add a method to add a whole directory and it contents. Extend the FileCollector's API with addDirectory which adds a directory and its contents to the VFS mapping. Differential revision: https://reviews.llvm.org/D76671	2020-03-27 17:38:24 -07:00
Kamlesh Kumar	aabc24acf0	[RISCV] Support llvm.thread.pointer Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer) Reviewed By: lenary, MaskRay, luismarques Differential Revision: https://reviews.llvm.org/D76828	2020-03-27 17:30:12 -07:00
Nemanja Ivanovic	4821411347	[DAGCombine] Fix splitting indexed loads in ForwardStoreValueToDirectLoad() In DAGCombiner::visitLOAD() we perform some checks before breaking up an indexed load. However, we don't do the same checking in ForwardStoreValueToDirectLoad() which can lead to failures later during combining (see: https://bugs.llvm.org/show_bug.cgi?id=45301). This patch just adds the same checks to this function as well. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45301 Differential revision: https://reviews.llvm.org/D76778	2020-03-27 18:03:47 -05:00
Francesco Petrogalli	4b3d94051c	[llvm][Type] Return fixed size for scalar types. [NFC] Summary: It is safe to assume that the TypeSize associated to scalar types has a fixed size. This avoids an implicit cast of TypeSize to integer inside `Type::getScalarSizeInBits()`, as such implicit cast is deprecated. Reviewers: efriedma, sdesmalen Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76892	2020-03-27 22:23:46 +00:00
Florian Hahn	9ce198d6ed	[Darwin] Respect -fno-unroll-loops during LTO. Currently -fno-unroll-loops is ignored when doing LTO on Darwin. This patch adds a new -lto-no-unroll-loops option to the LTO code generator and forwards it to the linker if -fno-unroll-loops is passed. Reviewers: thegameg, steven_wu Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D76916	2020-03-27 22:19:03 +00:00
Jonas Devlieghere	3ef33e69de	[VirtualFileSystem] Support directory entries in the YAMLVFSWriter The current implementation of the JSONWriter does not support writing out directory entries. Earlier today I added a unit test to illustrate the problem. When an entry is added to the YAMLVFSWriter and the path is a directory, it will incorrectly emit the directory as a file, and any files inside that directory will not be found by the VFS. It's possible to partially work around the issue by only adding "leaf nodes" (files) to the YAMLVFSWriter. However, this doesn't work for representing empty directories. This is a problem for clients of the VFS that want to iterate over a directory. The directory not being there is not the same as the directory being empty. This is not just a hypothetical problem. The FileCollector for example does not differentiate between file and directory paths. I temporarily worked around the issue for LLDB by ignoring directories, but I suspect this will prove problematic sooner rather than later. This patch fixes the issue by extending the JSONWriter to support writing out directory entries. We store whether an entry should be emitted as a file or directory. Differential revision: https://reviews.llvm.org/D76670	2020-03-27 15:16:52 -07:00
Sanjay Patel	0f56bbc1a5	[InstCombine] reduce FP-casted and bitcasted signbit check PR45305: https://bugs.llvm.org/show_bug.cgi?id=45305 Alive2 proofs: http://volta.cs.utah.edu:8080/z/bVyrko http://volta.cs.utah.edu:8080/z/Vxpz9q	2020-03-27 17:33:59 -04:00
Francesco Petrogalli	c66d1f38f6	[llvm][Support] Add isZero method for TypeSize. [NFC] Summary: The method is used where TypeSize is implicitly cast to integer for being checked against 0. Reviewers: sdesmalen, efriedma Reviewed By: sdesmalen, efriedma Subscribers: efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76748	2020-03-27 21:03:44 +00:00
Matt Arsenault	a8cc9047de	CodeGen: Add -denormal-fp-math-f32 flag Make the set of FP related attributes and command flags closer.	2020-03-27 14:00:39 -07:00
Jay Foad	a6dfd827e5	[AMDGPU] Fix getEUsPerCU for gfx10 in CU mode Summary: "Per CU" is a bit simplistic for gfx10, but I couldn't think of a better name. Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76861	2020-03-27 20:36:49 +00:00
Fangrui Song	152d14da64	[MC][X86] Make .reloc support arbitrary relocation types Generalizes D62014 (R_386_NONE/R_X86_64_NONE). Unlike ARM (D76746) and AArch64 (D76754), we cannot delete FK_NONE from getFixupKindSize because FK_NONE is still used by R_386_TLS_DESC_CALL/R_X86_64_TLSDESC_CALL.	2020-03-27 13:33:15 -07:00
diggerlin	9c20f09985	[AIX] Address comment https://reviews.llvm.org/D76162#inline-701237 SUMMARY: Address clang format issue: "clang format this block, I don't think the spaces are aligned correctly." Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D76162	2020-03-27 16:21:53 -04:00
Matt Arsenault	348735b723	AMDGPU: Stop setting attributes based on TargetOptions Having arbitrary passes looking at the TargetOptions is pretty messy. This was also disregarding if a function already had an explicit attribute setting on it. opt/llc now add the attributes to functions that don't specify the attribute. clang and lld do not call the function to do this, which they maybe should. This was also treating unsafe-fp-math as implying the others, and setting the other attributes based on it. This is not done anywhere else, and I'm not sure is correct based on the current description of the option bit. Effectively reverts `1d8cf2be89`	2020-03-27 13:13:43 -07:00
Matt Arsenault	0ab5b5b858	Fix denormal-fp-math flag and attribute interaction Make these behave the same way unsafe-fp-math and co. The command line flag should add the attribute to functions that do not already have it, and leave existing attributes. The attribute is the actual implementation, but the flag is useful in some testing situations. AMDGPU has a variety of tests with denormals enabled/disabled that would require a painful level of test duplication without a flag. This doesn't expose setting the separate input/output modes, or add a flag for the f32 version yet. Tests will be included in future patch.	2020-03-27 12:48:58 -07:00
Fangrui Song	34d77516b8	[MC][AArch64] Make .reloc support arbitrary relocation types Depends on D76746. Generalizes D61973. Differential Revision: https://reviews.llvm.org/D76754	2020-03-27 12:30:52 -07:00
Fangrui Song	c389526171	[MC][ARM] Make .reloc support arbitrary relocation types Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types. A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind is used to represent the relocation type whose number is V-FirstLiteralRelocationKind. This is useful for linker tests. Without the feature the assembler cannot produce certain relocation records (e.g. R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0) This helps move forward D75349 and D76575. Differential Revision: https://reviews.llvm.org/D76746	2020-03-27 12:29:49 -07:00
Lang Hames	cb84e4827e	[ORC] Introduce JITSymbolFlags::HasMaterializeSideEffectsOnly flag. This flag can be used to mark a symbol as existing only for the purpose of enabling materialization. Such a symbol can be looked up to trigger materialization with the lookup returning only once materialization is complete. Symbols with this flag will never resolve however (to avoid permanently polluting the symbol table), and should only be looked up using the SymbolLookupFlags::WeaklyReferencedSymbol flag. The primary use case for this flag is initialization symbols.	2020-03-27 11:02:54 -07:00
Lang Hames	d38d06e649	[ORC] Don't create MaterializingInfo entries unnecessarily.	2020-03-27 11:02:54 -07:00
Craig Topper	cdd1cd7120	[X86] Don't form masked instructions if the operation has an additional user. This will cause the operation to be repeated in both a mask and another masked or unmasked form. This can be a waste of execution resources. Differential Revision: https://reviews.llvm.org/D60940	2020-03-27 10:44:22 -07:00
Simon Pilgrim	950ea61653	[X86] Remove orphan LowerSTRICT_FSETCC declaration. NFCI. LowerSETCC handles strict cases as well, we don't have a separate function.	2020-03-27 17:03:19 +00:00
jasonliu	d60d7d69de	[llvm-objdump][XCOFF][AIX] Implement -r option Summary: Implement several XCOFF hooks to get '-r' option working for llvm-objdump -r. Reviewer: DiggerLin, hubert.reinterpretcast, jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D75131	2020-03-27 16:05:42 +00:00
Guillaume Chatelet	74eac9031a	[Alignment][NFC] MachineMemOperand::getAlign/getBaseAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76925	2020-03-27 15:49:13 +00:00
Sam Parker	d7084fa34a	[ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros Given that some instructions generate wider result elements than their inputs, flag them as being able to generate non zeros in the false lanes. Differential Revision: https://reviews.llvm.org/D76766	2020-03-27 15:26:13 +00:00
Alexandre Ganea	09158252f7	[ThinLTO] Allow usage of all hardware threads in the system Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled. One can now say in LLD: /opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified. /opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency(). /opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask. When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows). When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows). Differential Revision: https://reviews.llvm.org/D75153	2020-03-27 10:20:58 -04:00
Sam Parker	0e6aa08381	[ARM][MVE] Add DoubleWidthResult flag Add a flag for those instructions which read from the top/bottom halves of their inputs and produce a vector of results with double width elements. Differential Revision: https://reviews.llvm.org/D76762	2020-03-27 13:44:04 +00:00
Sjoerd Meijer	401a324c51	[LV] Refactor widenIntOrFpInduction. NFC. This untangles the logic in widenIntOrFpInduction in order to make more explicit and visible how exactly the induction variable is lowered. Differential Revision: https://reviews.llvm.org/D76686	2020-03-27 12:58:50 +00:00
Guillaume Chatelet	e2ef6127d9	[Alignment] Fix overaligning bug Summary: This was discovered while converting to Align type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76914	2020-03-27 12:57:50 +00:00
Simon Pilgrim	d6ddabd7ef	Revert rG6ff1ea3244c543ad24fc99c7f4979db2f2078593 "Fix "use of uninitialized variable" static analyzer warning. NFCI." @dblaikie noticed that this may interfere with msan analysis	2020-03-27 11:44:03 +00:00
Jonas Paulsson	35173dddd1	[SystemZ] Fix typos in comments.	2020-03-27 12:31:48 +01:00
David Green	8689f98e9b	[ARM] Fix MVE VCMPr f16 pattern This pattern seemed to be using the f32 instruction, not f16. Fix it to use the correct one. Differential Revision: https://reviews.llvm.org/D76841	2020-03-27 11:18:24 +00:00
Georgii Rymar	30c1f9a558	[llvm-readobj] - Fix a crash when DT_STRTAB is broken. We might have a crash scenario when we have an invalid DT_STRTAB value that is larger than the file size. I've added a test case to demonstrate. Differential revision: https://reviews.llvm.org/D76706	2020-03-27 13:18:08 +03:00
Guillaume Chatelet	a98662f4c1	[Alignment][NFC] Update MachineMemOperand implementation to use MaybeAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76625	2020-03-27 08:06:10 +00:00
Fangrui Song	6728a9ae19	[MCInstPrinter] Add parameter `Address` to printCustomAliasOperand. NFC Follow-up of D72172 and llvmorg-11-init-6896-gb3cc5dcef0f.	2020-03-27 00:38:20 -07:00
Johannes Doerfert	095cecbe0d	[OpenMP] `omp begin/end declare variant` - part 1, parsing This is the first part extracted from D71179 and cleaned up. This patch provides parsing support for `omp begin/end declare variant`, as defined in OpenMP technical report 8 (TR8) [0]. A major purpose of this patch is to provide proper math.h/cmath support for OpenMP target offloading. See PR42061, PR42798, PR42799. The current code was developed with this feature in mind, see [1]. [0] https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf [1] https://reviews.llvm.org/D61399#change-496lQkg0mhRN Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D74941	2020-03-27 02:30:58 -05:00
Fangrui Song	b3cc5dcef0	[MCInstPrinter] Add parameter `Address` to MCInstPrinter::printAliasInstr. NFC Follow-up of D72172.	2020-03-27 00:03:32 -07:00
Shengchen Kan	1fb4f99a21	[X86][MC] Fix the bug for prefix padding support Summary: There is a tiny logic error in D75300 that causes branches to be incorrectly aligned with option -x86-pad-max-prefix-size Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: reames Subscribers: hiraditya, llvm-commits, annita.zhang Tags: #llvm Differential Revision: https://reviews.llvm.org/D76285	2020-03-27 14:16:09 +08:00
Juneyoung Lee	1bcc500b48	[DAGCombine] Add basic optimizations for FREEZE in SelDag Summary: This patch is a first effort at adding basic optimizations for FREEZE in SelDag. Reviewers: spatel, lebedev.ri Reviewed By: spatel Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76707	2020-03-27 12:20:39 +09:00
Zakk Chen	64fe841856	Fix typo, targetFeature should be lowercase. this fixing also enable llc -mattr=+cpuhelp Reviewers: ziangwan, kongyi Reviewed By: kongyi Tags: #llvm Differential Revision: https://reviews.llvm.org/D76757	2020-03-26 19:40:04 -07:00
Kai Wang	1a6b7318dd	[NFC] Clang format for the ELF header and ARM build attributes. Differential Revision: https://reviews.llvm.org/D76819	2020-03-27 09:53:12 +08:00
Leonard Chan	5d929e6646	Move setBugReportMsg() out from under a conditional Fixes a build break with LLVM_ENABLE_BACKTRACES=OFF. Differential Revision: https://reviews.llvm.org/D76893	2020-03-26 16:39:03 -07:00
Dan Gohman	d865437d9c	[WebAssembly] Fix the order of destructors in the LowerGlobalDtors pass. Fix the LowerGlobalDtors pass to run destructors in the same order as the regular LLVM destructor lowering -- in reverse order. Adjacent destructors with the same associated object are grouped, but destructors are not reordered based on associated objects. Differential Revision: https://reviews.llvm.org/D70685	2020-03-26 16:19:02 -07:00
Stanislav Mekhanoshin	4c4b71843b	[AMDGPU] Propagate amdgpu-waves-per-eu to callees Differential Revision: https://reviews.llvm.org/D76868	2020-03-26 14:43:44 -07:00
Craig Topper	9f7d4150b9	[X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelectionDAG. These transforms rely on a vector reduction flag on the SDNode set by SelectionDAGBuilder. This flag exists because SelectionDAG can't see across basic blocks so SelectionDAGBuilder is looking across and saving the info. X86 is the only target that uses this flag currently. By removing the X86 code we can remove the flag and the SelectionDAGBuilder code. This patch adds a dedicated IR pass for X86 that looks across the blocks and transforms the IR into a form that the X86 SelectionDAG can finish. An advantage of this new approach is that we can enhance it to shrink the phi nodes and final reduction tree based on the zeroes that we need to concatenate to bring the partially reduced reduction back up to the original width. Differential Revision: https://reviews.llvm.org/D76649	2020-03-26 14:10:20 -07:00
Simon Pilgrim	ad36491ebb	[X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets Extends rG9d1721ce3926 to support AVX2+ targets.	2020-03-26 20:46:24 +00:00
Jay Foad	0fe096c4e9	[AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup Summary: I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76860	2020-03-26 20:21:04 +00:00
Jay Foad	bb9c4fd7ea	[AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup. Summary: These methods were identical. I chose to remove getMaxWavesPerCU because I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76859	2020-03-26 20:21:04 +00:00
Derek Schuff	e110897e28	[WebAssembly] Clear frame base vreg in explicit-locals when stack pointer is dead Having an alloca in a function causes the stack pointer to be generated in the prolog, but if it's unused other than for debug info, explicit-locals will drop it and not allocate a local. In this case we need to reset the FrameBaseVreg. Differential Revision: https://reviews.llvm.org/D76784	2020-03-26 13:07:32 -07:00
Simon Pilgrim	39a52a19ed	[X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns. We can improve computeKnownBits results by avoiding excess bitcasts. For this pattern we were doing: (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK)))) By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly we can help computeKnownBits see that the mask is clearing the upper bits and allows shuffle combining to peek through later on. This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.	2020-03-26 19:59:57 +00:00
diggerlin	fdfe411e7c	[AIX] discard the label in the csect of function description and use qualname for linkage SUMMARY: for a source file "test.c" void foo() {}; llc will generate assembly code as (assembly patch) .globl foo .globl .foo .csect foo[DS] foo: .long .foo .long TOC[TC0] .long 0 and symbol table as (xcoff object file) [4] m 0x00000004 .data 1 unamex foo [5] a4 0x0000000c 0 0 SD DS 0 0 [6] m 0x00000004 .data 1 extern foo [7] a4 0x00000004 0 0 LD DS 0 0 After first patch, the assembly will be as .globl foo[DS] # -- Begin function foo .globl .foo .align 2 .csect foo[DS] .long .foo .long TOC[TC0] .long 0 and symbol table will be as [6] m 0x00000004 .data 1 extern foo [7] a4 0x00000004 0 0 DS DS 0 0 Change the code for the assembly path and xcoff objectfile patch for llc. Reviewers: Jason Liu Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D76162	2020-03-26 15:46:52 -04:00
Scott Linder	bd12ecb88f	[AMDGPU] Fix PC register mapping in wave32 mode Summary: The PC_32 DWARF register is for a 32-bit process address space which we don't implement in AMDGCN; another way of putting this is that the size of the PC register is not a function of the wavefront size. If we ever implement a 32-bit process address space we will need to add two more DwarfFlavours i.e. we will need to represent the product of (wave32, wave64) x (64-bit address space, 32-bit address space). Tags: #llvm Differential Revision: https://reviews.llvm.org/D76732	2020-03-26 14:43:25 -04:00
David Blaikie	9002db05a2	Roll otherwise unused subexpressions into an assertion	2020-03-26 11:32:33 -07:00
Guillaume Chatelet	b727aabcb8	[Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76613	2020-03-26 18:15:53 +00:00
Jonathan Roelofs	7a89a5d81b	[InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors Fixes https://bugs.llvm.org/show_bug.cgi?id=43665	2020-03-26 12:09:36 -06:00
Jay Foad	0602c20b1b	[AMDGPU] Make use of divideCeil. NFC.	2020-03-26 16:11:35 +00:00
Jay Foad	596bed3fd3	[AMDGPU] Remove unused methods. NFC.	2020-03-26 16:11:35 +00:00
Justin Hibbits	459e8e9488	[PowerPC]: Don't allow r0 as a target for LD_GOT_TPREL_L/32 Summary: The linker is free to relax this (relocation R_PPC_GOT_TPREL16) against R_PPC_TLS, if it sees fit (initial exec to local exec). If r0 is used, this can generate execution-invalid code (converts to 'addi %rX, %r0, FOO, which translates in PPC-lingo to li %rX, FOO). Forbid this instead. This fixes static binaries using locales on FreeBSD/powerpc (tested on FreeBSD/powerpcspe). Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D76662	2020-03-26 10:59:28 -05:00
Simon Pilgrim	9d1721ce39	[X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles. The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern. We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.	2020-03-26 15:47:43 +00:00
Fangrui Song	3eef47407b	[PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 0: bl .-4 4: bl .+0 8: bl .+4 // llvm-objdump -d output (after) ; GNU objdump -d 0: bl 0xfffffffc / bl 0xfffffffffffffffc 4: bl 0x4 8: bl 0xc ``` Many Operand's are not annotated as OPERAND_PCREL. They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches. Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test address space wraparound for powerpc32 and powerpc64. Reviewed By: sfertile, jhenderson Differential Revision: https://reviews.llvm.org/D76591	2020-03-26 08:32:29 -07:00
Fangrui Song	87de9a0786	[X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 400000: e8 0b 00 00 00 callq 11 400005: e8 0b 00 00 00 callq 11 // llvm-objdump -d output (after) 400000: e8 0b 00 00 00 callq 0x400010 400005: e8 0b 00 00 00 callq 0x400015 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled 400000: e8 0b 00 00 00 callq 400010 400005: e8 0b 00 00 00 callq 400015 ``` In llvm-objdump, we pass the address of the next MCInst. Ideally we should just thread the address of the current address, unfortunately we cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter requires MCInstrInfo and MCContext) to get the length of the MCInst. MCInstPrinter::printInst has other callers (e.g llvm-mc -filetype=asm, llvm-mca) which set Address to 0. They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76580	2020-03-26 08:28:59 -07:00
Fangrui Song	5fad05e80d	[MCInstPrinter] Pass `Address` parameter to MCOI::OPERAND_PCREL typed operands. NFC Follow-up of D72172 and D72180 This patch passes `uint64_t Address` to print methods of PC-relative operands so that subsequent target specific patches can change `*InstPrinter::print{Operand,PCRelImm,...}` to customize the output. Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by llvm-objdump. ``` // Current llvm-objdump -d output aarch64: 20000: bl #0 ppc: 20000: bl .+4 x86: 20000: callq 0 // Ideal output aarch64: 20000: bl 0x20000 ppc: 20000: bl 0x20004 x86: 20000: callq 0x20005 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled aarch64: 20000: bl 20000 ppc: 20000: bl 0x20004 x86: 20000: callq 20005 ``` In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`): ``` case 12: // CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J... - printPCRelImm(MI, 0, O); + printPCRelImm(MI, Address, 0, O); return; ``` Some targets have 2 `printOperand` overloads, one without `Address` and one with `Address`. They should annotate derived `Operand` properly with `let OperandType = "OPERAND_PCREL"`. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76574	2020-03-26 08:21:15 -07:00
Dominik Montada	9fedb6900d	[GlobalISel] add helper function to create arbitrary libcalls Summary: The existing helper function can only create a libcall to functions available in RTLIB. Add a helper function that can create a libcall to a given function name using the provided calling convention. Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders Reviewed By: arsenm Subscribers: wdng, hiraditya, volkan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76845	2020-03-26 16:11:13 +01:00
Qiu Chaofan	172456c775	[Legalizer] Fix some flags miss in vector results In some scalarize/split result methods (unary, binary, ...), flags in SDNode were not passed down, which may lead to unexpected results in unsafe floating-point optimization. This patch fixes them. (maybe not complete) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D76832	2020-03-26 22:01:19 +08:00
Simon Pilgrim	e30d29ebc1	[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT()) As long as we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.	2020-03-26 11:57:45 +00:00
Jonas Paulsson	8bf9e317e4	[SystemZ] Bugfix in tieOpsIfNeeded() This function did a check which was broken to see if an opcode requires op0 and op1 to be tied. By chance this is NFC. Review: Ulrich Weigand	2020-03-26 12:22:14 +01:00
gbreynoo	a945037e8f	Tools emit the bug report URL on crash When Clang crashes a useful message is output: "PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script." A similar message is now output for all tools. Differential Revision: https://reviews.llvm.org/D74324	2020-03-26 10:26:59 +00:00
Kang Zhang	4673699a47	[PowerPC] Remove the repeated definition for some InstAlias for mtspr/mfspr Summary: Below InstAlias have been redefined, this patch is to remove the repeated definition. mtdec/mfdec mtsdr1/mfsdr1 mtsrr0/mfsrr0 mtsrr1/mfsrr1 mtasr Reviewed By: nemanjai, steven.zhang Differential Revision: https://reviews.llvm.org/D75821	2020-03-26 09:58:30 +00:00
Cullen Rhodes	9086db707d	[AArch64][SVE] Implement structured store intrinsics Summary: This patch adds initial support for the following intrinsics: * llvm.aarch64.sve.st2 * llvm.aarch64.sve.st3 * llvm.aarch64.sve.st4 For storing two, three and four vectors worth of data. Basic codegen for reg+immediate forms are implemented. Reg+reg addressing modes will be addressed in a later patch. These intrinsics are intended for use in the Arm C Language Extension (ACLE). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75947	2020-03-26 09:34:51 +00:00
Ties Stuij	71ae267d1f	[PATCH] [ARM] ARMv8.6-a command-line + BFloat16 Asm Support Summary: This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found at https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a in addition to the GCC patch for the 8.6-a CLI: https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html In detail this patch - march options for armv8.6-a - BFloat16 assembly This is part of a patch series, starting with command-line and Bfloat16 assembly support. The subsequent patches will upstream intrinsics support for BFloat16, followed by Matrix Multiplication and the remaining Virtualization features of the armv8.6-a architecture. Based on work by: - labrinea - MarkMurrayARM - Luke Cheeseman - Javed Asbar - Mikhail Maltsev - Luke Geeson Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson Reviewed By: SjoerdMeijer Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D76062	2020-03-26 09:17:20 +00:00
David Green	37b9cc8f29	[ARM] Sink splats to vector float instructions Some MVE floating point instructions have gpr register variants that take the scalar gpr value and splat them to all lanes. In order to accept them in loops, the shuffle_vector and insert need to be sunk down into the loop, next to the instruction so that ISel can see the whole pattern. This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for mul are slightly more constrained as there are no fms variants taking register arguments. Differential Revision: https://reviews.llvm.org/D76023	2020-03-26 09:02:18 +00:00
Fangrui Song	4c52d51e78	[InstCombine] Fix a code-sinking bug after D73832/f1a9efabcb9b - UserParent = PN->getIncomingBlock(I->use_begin()); + UserParent = PN->getIncomingBlock(SingleUse); The first use of I may be droppable (llvm.assume). When compiling llvm/lib/IR/AutoUpgrade.cpp with a bootstrapped clang with ThinLTO with minimized bitcode files, I see such a case in the function _ZN4llvm20UpgradeIntrinsicCallEPNS_8CallInstEPNS_8FunctionE clang -c -fthinlto-index=AutoUpgrade.o.thinlto.bc AutoUpgrade.bc -O3 Unfortunately it is really difficult to get a minimized reproducer.	2020-03-25 22:50:53 -07:00
John McCall	9514c048d8	Use optimal layout and preserve alloca alignment in coroutine frames. Previously, we would ignore alloca alignment when building the frame and just use the natural alignment of the allocated type. If an alloca is over-aligned for its IR type, this could lead to a frame entry with inadequate alignment for the downstream uses of the alloca. Since highly-aligned fields also tend to produce poor layouts under a naive layout algorithm, I've also switched coroutine frames to use the new optimal struct layout algorithm. In order to communicate the frame size and alignment to later passes, I needed to set align+dereferenceable attributes on the frame-pointer parameter of the resume function. This is clearly the right thing to do, but the align attribute currently seems to result in assumptions being added during inlining that the optimizer cannot easily remove.	2020-03-26 00:51:09 -04:00
QingShan Zhang	1ef7bf4121	[PowerPC] Improve the way legalize mul for v8i16 and add pattern to match mul + add We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0) if altivec enabled. Now, it is set as custom and expand it later, which is not the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C) Reviewed By: Nemanjai Differential Revision: https://reviews.llvm.org/D76751	2020-03-26 04:46:49 +00:00
Stanislav Mekhanoshin	e06d707aa2	[AMDGPU] Fixed function traversal in attribute propagation AMDGPUPropagateAttributes pass was skipping some of the functions when cloning. Functions were added to root set and then skipped on the next iteration because they are already in the root set, while they were meant to be processed with different features. Differential Revision: https://reviews.llvm.org/D76815	2020-03-25 18:47:09 -07:00
Stanislav Mekhanoshin	6e00e3fcb0	[AMDGPU] Preserve original symbol during attribute propagation AMDGPUPropagateAttributes can swap names while cloning a function. Only do it if original symbol was not externally visible. Differential Revision: https://reviews.llvm.org/D76789	2020-03-25 15:26:30 -07:00
Tyker	f1a9efabcb	Ignore/Drop droppable uses for code-sinking in InstCombine Summary: This patch allows code-sinking in InstCombine to be performed when instructions have uses in llvm.assume. Uses are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use. For now, uses are considered droppable if they are in an llvm.assume. Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73832	2020-03-25 20:42:52 +01:00
Alina Sbirlea	3abcbf9903	[CFG/BasicBlock] Rename succ_const to const_succ. [NFC] Summary: Rename `succ_const_iterator` to `const_succ_iterator` and `succ_const_range` to `const_succ_range` for consistency with the predecessor iterators, and the corresponding iterators in MachineBasicBlock. Reviewers: nicholas, dblaikie, nlewycky Subscribers: hiraditya, bmahjour, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75952	2020-03-25 12:40:55 -07:00
Heejin Ahn	f93426c5b9	[WebAssembly] Move event section before global section Summary: https://github.com/WebAssembly/exception-handling/issues/98 Also this moves many parts of code to make code align with the section order, even if they don't affect the output. Reviewers: tlively Subscribers: dschuff, sbc100, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76752	2020-03-25 11:49:03 -07:00
Nico Weber	d7888149aa	Suppress a few -Wunreachable-code warnings. No behavior change. Also fix a comment to match reality.	2020-03-25 13:55:42 -04:00
Simon Pilgrim	c6e5531f9b	[X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443. We could probably extend this further to handle non-VLX truncation cases.	2020-03-25 17:41:51 +00:00
Gil Rapaport	078c863305	[LV] Replace stored value with a VPValue (NFCI) InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations ability to modify def-use relations and still have ILV generate the vector code. This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as the stored value. The recipe is generated with a VPValue wrapping the stored value of the scalar store. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Differential Revision: https://reviews.llvm.org/D76373	2020-03-25 19:36:55 +02:00
Tyker	d72c586aeb	[NFC] Rename function to match Coding Convention and fix typo in KnowledgeRetention	2020-03-25 18:31:13 +01:00
Mikhail Maltsev	bb4da94e5b	[ARM,CDE] Implement predicated Q-register CDE intrinsics Summary: This patch implements the following CDE intrinsics: T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p); T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p); T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p); T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p); T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p); T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p); The intrinsics are not part of the released ACLE spec, but internally at Arm we have reached consensus to add them to the next ACLE release. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76610	2020-03-25 17:08:19 +00:00
Yvan Roux	bd069ad39c	[ARM] Move ConstantIsland and LowOverheadLoops Passes. Move ARM ConstantIsland and LowOverheadLoops passes later in the pipeline such that they will be run after the upcoming Machine Outlining pass. Differential Revision: https://reviews.llvm.org/D76065	2020-03-25 16:49:21 +01:00
cdevadas	ce984129ea	[AMDGPU] Add SIPreEmitPeephole pass. This pass can handle all the optimization opportunities found just before code emission. Presently it includes the handling of vcc branch optimization that was handled earlier in SIInsertSkips. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D76712	2020-03-25 15:35:35 +00:00
Jonas Paulsson	f09b891d4a	[SystemZ] Improve foldMemoryOperandImpl() A spilled load of an immediate can use MVHI/MVGHI instead. A compare of a spilled register against an immediate can use CHSI/CGHSI. A logical compare can use CLFHSI/CLGHSI. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76055	2020-03-25 16:21:08 +01:00
Sean Fertile	3282d875d6	[PowerPC][AIX] ByVal formal arguments in a single register. Adds support for passing ByVal formal arguments as long as they fit in a single register. Differential Revision: https://reviews.llvm.org/D76401	2020-03-25 11:09:40 -04:00
Kerry McLaughlin	05606329e2	[AArch64][SVE] Add SVE intrinsics for masked loads & stores Summary: Implements the following intrinsics for contiguous loads & stores: - @llvm.aarch64.sve.ld1 - @llvm.aarch64.sve.st1 Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: cameron.mcinally Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76688	2020-03-25 11:48:40 +00:00
Sam Parker	e87250202d	[ARM][MVE] Add HorizontalReduction flag Add a target flag for instructions that reduce into one, or more, scalar reg(s), including variants of: - VADDV - VABAV - VMINV/VMAXV - VMLADAV Differential Revision: https://reviews.llvm.org/D76683	2020-03-25 11:12:03 +00:00
Kazushi (Jam) Marukawa	28a42dd1b9	[VE] Change name of enum to CondCode Summary: Change enum name for condition codes from CondCodes to CondCode. Reviewers: arsenm, simoll, k-ishizaka Reviewed By: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm, #ve Differential Revision: https://reviews.llvm.org/D76747	2020-03-25 09:20:05 +01:00
Juneyoung Lee	453eac3f77	Minor fixes to a comment in CodeGenPrepare	2020-03-25 16:34:43 +09:00
Adrian Prantl	ed8ad6ec15	Add an -object-path-prefix option to dsymutil to remap object file paths (but no source paths) before processing. This is meant to be used for Clang objects where the module cache location was remapped using ``-fdebug-prefix-map``; to help dsymutil find the Clang module cache. <rdar://problem/55685132> Differential Revision: https://reviews.llvm.org/D76391	2020-03-24 17:13:42 -07:00
Matt Arsenault	39c55cef21	GlobalISel: Introduce bitcast legalize action For some operations, the type is unimportant and only the number of bits matters. For example I don't want to treat <4 x s8> as a legal type, but I also don't want to decompose loads of this into smaller pieces to get legal register types. On AMDGPU in SelectionDAG, we legalize a number of operations (most notably load and store) by coercing all types to vectors of i32. For GlobalISel, I'm trying very hard to avoid doing this for every type, but I don't think this strategy can be completely avoided. I'm trying to avoid bitcasts for any legitimately legal type we can operate on, since the intervening bitcasts have proven to be a hassle. For loads, I think I can get away without ever casting the result type, and handling any arbitrary bitwidth during selection (I will eventually want new tablegen support to help with this, rather than having to add every possible type as legal). The unmerge required to do anything with the value should expand to the expected shifts. This is trickier for stores, since it would now require handling a wide array of truncates during selection which I don't want. Future potentially interesting cases are for vector indexing, where sub-dword types should be indexed in s32 pieces.	2020-03-24 19:33:33 -04:00
Nikita Popov	ec184dd548	[LVI] Convert some checks to assertions; NFC solveBlockValue() should only be called if the value isn't cached yet. Similarly, it does not make sense to "solve" a constant.	2020-03-24 23:11:13 +01:00
Amara Emerson	472d282046	[AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin. On Darwin these need to be selected into a function call for the TLS address lookup. As a result, they can't be moved below a physreg write, which happens in call sequences. In the long term, we should have some mechanism in the localizer to prevent localizing into target-specific atomic instruction sequences. rdar://60056248 Differential Revision: https://reviews.llvm.org/D76652	2020-03-24 13:35:50 -07:00
Johannes Doerfert	5699d08b79	[Attributor] Use knowledge retained in llvm.assume (operand bundles) This patch integrates operand bundle llvm.assumes [0] with the Attributor. Most IRAttributes will now look at uses of the associated value and if there are llvm.assume operand bundle uses with the right tag we will check if they are in the must-be-executed-context (around the context instruction). Droppable users, which is currently only llvm::assume, are handled specially in some places now as well. [0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D74888	2020-03-24 15:33:40 -05:00
Craig Topper	e8d67ada2d	[X86] Disable autoupgrade support for avx512.mask.broadcasti32x2.* and avx512.mask.broadcastf32x2.*. These intrinsics take a v4i32/v4f32 input and are supposed to broadcast elements 0 and 1. Instead the autoupgrade code was broadcasting elements 0, 1, 2, and 3. I could fix the autoupgrade, but since it's been broken for years it seemed better just to steer anyone still trying to use it away completely.	2020-03-24 12:35:24 -07:00


132762 Commits