llvm-project

Commit Graph

Author	SHA1	Message	Date
Evgeniy Stepanov	d5a6fdbe95	[safestack] Inline safestack pointer access when possible. Summary: This adds an -mllvm flag that forces the use of a runtime function call to get the unsafe stack pointer, the same that is currently used on non-x86, non-aarch64 android. The call may be inlined. Reviewers: pcc Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D37405 llvm-svn: 323259	2018-01-23 21:27:07 +00:00
Krzysztof Parzyszek	d5e8a260bb	[Hexagon] Add patterns for sext_inreg of HVX vector types llvm-svn: 323250	2018-01-23 19:56:16 +00:00
Krzysztof Parzyszek	3780a0e1fa	[Hexagon] Implement basic vector operations on vectors vNi1 In addition to that, make sure that there are no boolean vector types that are associated with multiple register classes. Specifically, remove v32i1 and v64i1 from integer register classes. These types will correspond to results of vector comparisons, and as such should belong to the vector predicate class. Having them in scalar registers as well makes legalization ambiguous. llvm-svn: 323229	2018-01-23 17:53:59 +00:00
Simon Pilgrim	6ff241fc99	[X86][SSE] LowerBUILD_VECTORAsVariablePermute - extract subvector from oversized index vectors llvm-svn: 323223	2018-01-23 17:02:15 +00:00
Dan Gohman	5464941a6a	[WebAssembly] Add mem.* intrinsics. The grow_memory and current_memory instructions are expected to be officially renamed to mem.grow and mem.size. Introduce new intrinsics with the new names. These new names aren't yet official, so for now, use them at your own risk. Also, take this opportunity to add arguments for the currently unused immediate field in those instructions. llvm-svn: 323222	2018-01-23 17:02:02 +00:00
Dan Gohman	f2c1cae5cb	[WebAssembly] Switch to *-wasm as the default target triple. This makes wasm32-unknown-unknown-wasm the default, which supports the .o file writer and the new linking ABI. To enable s2wasm-compatible output, use the wasm32-unknown-unknown-elf triple. llvm-svn: 323220	2018-01-23 16:55:44 +00:00
Alexander Ivchenko	e642231bfc	[x86] Reautogenerate a bunch of tests for D42287. NFC llvm-svn: 323215	2018-01-23 16:08:15 +00:00
Yaxun Liu	8b7454a8dd	CodeGen: Fix assertion in ScheduleDAGMILive::scheduleMI due to llvm.dbg.value Fix a bug in ScheduleDAGMILive::scheduleMI which causes BotRPTracker to not track CurrentBottom in some rare cases involving llvm.dbg.value. This issue causes the amdgcn target to assert when compiling some user code with -g. Differential Revision: https://reviews.llvm.org/D42394 llvm-svn: 323214	2018-01-23 16:04:53 +00:00
Craig Topper	c58c2b5c9b	[X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector. The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code. This patch is a little larger than it should be due to differences between the DQI handling between the two today. llvm-svn: 323212	2018-01-23 15:56:36 +00:00
Simon Pilgrim	0c9f77a9f9	[X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the source vector is not larger than the destination We might be able to support this in the future with VPERMV3, OR(PSHUFB, PSHUFB) etc. llvm-svn: 323210	2018-01-23 15:51:03 +00:00
Alexander Ivchenko	347921a281	[x86] Mostly reautogenerate a bunch of tests that affect D37775. NFC Tests required minor manual tweaks: CodeGen/MIR/X86/generic-instr-type.mir CodeGen/X86/GlobalISel/select-copy.mir CodeGen/X86/GlobalISel/select-ext.mir CodeGen/X86/GlobalISel/select-intrinsic-x86-flags-read-u32.mir CodeGen/X86/GlobalISel/select-phi.mir CodeGen/X86/GlobalISel/select-trunc.mir CodeGen/X86/GlobalISel/select-frameIndex.mir And following tests are split into 32/64 versions: CodeGen/X86/GlobalISel/legalize-GV.mir CodeGen/X86/GlobalISel/select-frameIndex.mir llvm-svn: 323209	2018-01-23 15:48:50 +00:00
Simon Pilgrim	e2905c8a0c	[X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the index vector has the correct number of elements llvm-svn: 323206	2018-01-23 15:13:37 +00:00
Tim Northover	f9b560aa8e	AArch64: get type from correct result when forming BFX Some nodes produce multiple values so when obtaining the type of an ISD::OR we need to make sure we ask for the correct one. Hopefully that's all of them. llvm-svn: 323205	2018-01-23 15:11:27 +00:00
Tim Northover	9f3003d08f	AArch64: get type from correct result when forming BFI/BFM Some nodes produce multiple values so when obtaining the type of an ISD::OR we need to make sure we ask for the correct one. llvm-svn: 323202	2018-01-23 14:37:03 +00:00
Craig Topper	76adcc86cd	[X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8. Summary: For the most part it's better to keep v32i1 as a mask type of a narrower width than trying to promote it to a ymm register. I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes. There are still some regressions in here. I definitely saw some around shuffles. I think we probably should move vXi1 shuffle from lowering to a DAG combine where I think the extend and truncate we have to emit would be better combined. I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc)). Overall this removes something like 13000 CHECK lines from lit tests. Reviewers: zvi, RKSimon, delena, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42031 llvm-svn: 323201	2018-01-23 14:25:39 +00:00
Simon Pilgrim	8ea1a0c690	[X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector, VPERMV works in reverse which is probably where the confusion comes from. Differential Revision: https://reviews.llvm.org/D42380 llvm-svn: 323190	2018-01-23 11:39:06 +00:00
MinSeong Kim	27f77b4300	[Analysis] Disable exp/exp2/pow finite lib calls on Android with -ffast-math. Summary: Since r322087, glibc's finite lib calls are generated when possible. However, glibc is not supported on Android. Therefore this change enables llvm to finely distinguish between linux and Android for unsupported library calls. The change also includes some regression tests. Reviewers: srhines, pirama Reviewed By: srhines Subscribers: kongyi, chh, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42288 llvm-svn: 323187	2018-01-23 11:11:36 +00:00
Stefan Maksimovic	98749e0249	[mips] Properly select abs and sqrt instructions - Alter abs for micromips to have both AFGR64 and FGR64 variants, same as sqrt - Remove sqrt and abs from MicroMips32r6InstrInfo.td, use micromips FGR64 variants - Restrict non-micromips abs/sqrt with NotInMicroMips predicate Differential revision: https://reviews.llvm.org/D41439 llvm-svn: 323184	2018-01-23 10:09:39 +00:00
Craig Topper	c92edd994e	[X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx Summary: If we can match as a zero extend there's no need to flip the order to get an encoding benefit, as movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes unless we get lucky in the register allocator and it's on AL/AX/EAX which have a 2 byte encoding. This patch was more impressive before r322957 went in. It removed some of the same Ands that got deleted by that patch. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42313 llvm-svn: 323175	2018-01-23 05:45:52 +00:00
Craig Topper	e5aea25980	[X86] Remove 'NOREX' comment from the printing of _NOREX instructions. Some of the NOREX instructions are used in 32-bit mode making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction. llvm-svn: 323174	2018-01-23 05:37:00 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves to Spectre. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description; please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processor's cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. 
It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 the thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. 
These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually applying transformations similar to `-mretpoline` to the Linux kernel, we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. 
If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00
Mark Searles	7687d42052	[AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I\|U}32_e64 - Change inserted add ( V_ADD_{I\|U}32_e32 ) to _e64 version ( V_ADD_{I\|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register. - Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts. Differential Revision: https://reviews.llvm.org/D42124 llvm-svn: 323153	2018-01-22 21:46:43 +00:00
Petar Jovanovic	29aced1bae	[mips] add warnings for using dsp and msa flags with inappropriate revisions Dsp and dspr2 require MIPS revision 2, while msa requires revision 5. Adding warnings for cases when these flags are used with earlier revision. Patch by Milos Stojanovic. Differential Revision: https://reviews.llvm.org/D40490 llvm-svn: 323131	2018-01-22 16:43:30 +00:00
Carey Williams	da15b5b116	[AArch64] optimise v4f16 fcmps to utilise vector instructions Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported, generating FCVTL(s) rather than a longer series of FCVTs. Differential Revision: https://reviews.llvm.org/D41772 llvm-svn: 323118	2018-01-22 14:16:11 +00:00
Simon Pilgrim	a2b157bde4	[X86][AVX] Add test case for PR34370 llvm-svn: 323106	2018-01-22 12:27:22 +00:00
Simon Pilgrim	17682a86da	[X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied) Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering. Reapplied after rL322279 was reverted at rL322335 due to PR35918, underlying issue was fixed at rL322644. llvm-svn: 323104	2018-01-22 12:05:17 +00:00
Marina Yatsina	77a21dbad4	Break false dependencies for POPCNT, LZCNT, TZCNT Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency. Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions. Update affected tests. This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869 This is the final of multiple patches that fix this bugzilla. Most of the patches are intended at refactoring the existent code. Reviews of the refactoring done to enable this change: https://reviews.llvm.org/D40330 https://reviews.llvm.org/D40331 https://reviews.llvm.org/D40332 https://reviews.llvm.org/D40333 Differential Revision: https://reviews.llvm.org/D40334 Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d llvm-svn: 323096	2018-01-22 10:07:01 +00:00
Marina Yatsina	6fc2aaae8d	Separate ExecutionDepsFix into 4 parts: 1. ReachingDefsAnalysis - Identifies, for each instruction, the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains). 2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings. 3. BreakFalseDeps - Breaks false dependencies. 4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks. This also included the following changes to ExecutionDepsFix original logic: 1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class. 2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class. Additional changes in affected files: 1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate. 2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate. Additional refactoring changes will follow. This commit is (almost) NFC. The only functional change is that now BreakFalseDeps will break dependency for all register classes. Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet. In a future commit several instructions (and tests) will be added. This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869 Most of the patches are aimed at refactoring the existing code. 
Additional relevant reviews: https://reviews.llvm.org/D40331 https://reviews.llvm.org/D40332 https://reviews.llvm.org/D40333 https://reviews.llvm.org/D40334 Differential Revision: https://reviews.llvm.org/D40330 Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275 llvm-svn: 323087	2018-01-22 10:05:23 +00:00
Craig Topper	7fddf2bfef	[X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations Summary: This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx. We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42265 llvm-svn: 323048	2018-01-20 18:50:09 +00:00
Jonas Paulsson	9cee52732f	Move new test from Generic to SystemZ. A few build bots failed with r323042 because they are not configured to build the SystemZ target. llvm-svn: 323044	2018-01-20 16:57:06 +00:00
Jonas Paulsson	7ad28863fb	[SelectionDAG] Fix codegen of vector stores with non byte-sized elements. This was completely broken, but hopefully fixed by this patch. In cases where it is needed, a vector with non byte-sized elements is stored by extracting, zero-extending, shift:ing and or:ing the elements into an integer of the same width as the vector, which is then stored. Review: Eli Friedman, Ulrich Weigand https://reviews.llvm.org/D42100#inline-369520 https://bugs.llvm.org/show_bug.cgi?id=35520 llvm-svn: 323042	2018-01-20 16:05:10 +00:00
Craig Topper	a6c6d65b72	[X86] Add some more v32i1 shuffle tests with shuffles between mask creation and mask usage rather than being just shuffling input arguments. The existing tests just tested shuffles of v32i1 inputs, but arguments are promoted to v32i8. So it wasn't a good demonstration of v32i1 shuffle handling. The new test cases use compares and selects to get k-register operations around the shuffle. This is prep work for demonstrating changes from D42031. llvm-svn: 323031	2018-01-20 08:13:35 +00:00
Craig Topper	4f14153494	[X86] Add test cases for failures to use movzx due to various issues with demanded bits. D42265 and D42313 should help with some of these. llvm-svn: 323030	2018-01-20 07:50:57 +00:00
Saleem Abdulrasool	d3169b478c	test: fix ARM tests harder Remove the missed check update for the removal of the x86 specific vector call on ARM. llvm-svn: 323023	2018-01-20 01:26:46 +00:00
Saleem Abdulrasool	6f831d54f8	test: move ARM test from x86 The ARM backend is not guaranteed to be present on x86, move the test to the ARM tests. llvm-svn: 323021	2018-01-20 01:03:11 +00:00
Saleem Abdulrasool	99f479abcf	CodeGen: handle llvm.used properly for COFF `llvm.used` contains a list of pointers to named values which the compiler, assembler, and linker are required to treat as if there is a reference that they cannot see. Ensure that the symbols are preserved by adding an explicit `-include` reference to the linker command. llvm-svn: 323017	2018-01-20 00:28:02 +00:00
Craig Topper	08bd14803c	[X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size. This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way. The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet. If the preference is lower than 256 we may still use a 256 bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping element count constant which is easiest done by switching between 256 and 128 bit. The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation. Differential Revision: https://reviews.llvm.org/D41895 llvm-svn: 323016	2018-01-20 00:26:12 +00:00
Sanjay Patel	4127e77e13	[x86] add tests for sqrt estimate that should respect denorms; NFC (PR34994) llvm-svn: 323003	2018-01-19 22:47:49 +00:00
Craig Topper	b0959fce1b	[X86] Autogenerate complete checks on a couple tests. NFC llvm-svn: 322997	2018-01-19 22:04:20 +00:00
Jessica Paquette	a499c3c29d	Add optional DICompileUnit to DIBuilder + make outliner debug info use it Previously, the DIBuilder didn't expose functionality to set its compile unit in any other way than calling createCompileUnit. This meant that the outliner, which creates new functions, had to create a new compile unit for its debug info. This commit adds an optional parameter in the DIBuilder's constructor which lets you set its CU at construction. It also changes the MachineOutliner so that it keeps track of the DISubprograms for each outlined sequence. If debugging information is requested, then it uses one of the outlined sequence's DISubprograms to grab a CU. It then uses that CU to construct the DISubprogram for the new outlined function. The test has also been updated to reflect this change. See https://reviews.llvm.org/D42254 for more information. Also see the e-mail discussion on D42254 in llvm-commits for more context. llvm-svn: 322992	2018-01-19 21:21:49 +00:00
Ulrich Weigand	426f6bef44	[SystemZ] Prefer LOCHI over generating IPM sequences On current machines we have load-on-condition instructions that can be used to directly implement the SETCC semantics. If we have those, it is always preferable to use them instead of generating the IPM sequence. llvm-svn: 322989	2018-01-19 20:56:04 +00:00
Ulrich Weigand	31112895d9	[SystemZ] Directly use CC result of compare-and-swap In order to implement a test whether a compare-and-swap succeeded, the SystemZ back-end currently emits a rather inefficient sequence of first converting the CC result into an integer, and then testing that integer against zero. This commit changes the back-end to simply directly test the CC value set by the compare-and-swap instruction. llvm-svn: 322988	2018-01-19 20:54:18 +00:00
Ulrich Weigand	849a59fd4b	[SystemZ] Rework IPM sequence generation The SystemZ back-end uses a sequence of IPM followed by arithmetic operations to implement the SETCC primitive. This is currently done early during SelectionDAG. This patch moves generating those sequences to much later in SelectionDAG (during PreprocessISelDAG). This doesn't change much in generated code by itself, but it allows further enhancements that will be checked-in as follow-on commits. llvm-svn: 322987	2018-01-19 20:52:04 +00:00
Ulrich Weigand	ac04d9b8e5	[SystemZ] Run branch-12.ll test only if long tests enabled This avoids excessive test run times e.g. with expensive checks enabled. llvm-svn: 322983	2018-01-19 19:51:38 +00:00
Simon Pilgrim	969a432b18	[X86][SSE] Add SSE2 gather tests Check codegen without PEXTRD llvm-svn: 322974	2018-01-19 17:50:25 +00:00
Joel Galenson	dbc724f764	[ARM] Fix perf regression in compare optimization. Fix a performance regression caused by r322737. While trying to make it easier to replace compares with existing adds and subtracts, I accidentally stopped it from doing so in some cases. This should fix that. I'm also fixing another potential bug in that commit. Differential Revision: https://reviews.llvm.org/D42263 llvm-svn: 322972	2018-01-19 17:46:27 +00:00
Derek Schuff	bfb02aec5a	[WebAssembly] Fix libcall signature lookup RuntimeLibcallSignatures previously manually initialized all the libcall names into an array and searched it linearly for the first match to lookup the corresponding index. r322802 switched that to initializing a map keyed by the libcall name. Neither of these approaches works correctly because some libcall numbers use the same name on different platforms (e.g. the "l" suffixed functions use f80 or f128 or ppcf128). This change fixes that by ensuring that each name only goes into the map once. It also adds tests. Differential Revision: https://reviews.llvm.org/D42271 llvm-svn: 322971	2018-01-19 17:45:54 +00:00
Dan Gohman	5d2b9354b1	[WebAssembly] Make sign-extension opcodes a distinct feature. Sign-extension opcodes have been split into a separate proposal from the main threads proposal, so switch them to their own target feature. See: https://github.com/WebAssembly/sign-extension-ops llvm-svn: 322966	2018-01-19 17:16:24 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. 
s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void 
@llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. 
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Sanjay Patel	4c8382eec6	[x86] add RUN line and auto-generate checks There were checks for a 32-bit target here, but no RUN line corresponding to that prefix. I don't know what the intent of these tests is, but at least now we can see what happens for both targets. llvm-svn: 322961	2018-01-19 17:09:28 +00:00
Sanjay Patel	6b0fd436da	[x86] regenerate complete checks; NFC D42265 will improve something here, but it's not obvious how without more checks. llvm-svn: 322960	2018-01-19 17:05:16 +00:00
Sanjay Patel	74a1eef7c4	[x86] shrink 'and' immediate values by setting the high bits (PR35907) Try to reverse the constant-shrinking that happens in SimplifyDemandedBits() for 'and' masks when it results in a smaller sign-extended immediate. We are also able to detect dead 'and' ops here (the mask is all ones). In that case, we replace and return without selecting the 'and'. Other targets might want to share some of this logic by enabling this under a target hook, but I didn't see diffs for simple cases with PowerPC or AArch64, so they may already have some specialized logic for this kind of thing or have different needs. This should solve PR35907: https://bugs.llvm.org/show_bug.cgi?id=35907 Differential Revision: https://reviews.llvm.org/D42088 llvm-svn: 322957	2018-01-19 16:37:25 +00:00
Nirav Dave	72d32f24f5	[X86] Extend load-op-store fusion merge to ADC/SBB. Summary: Add handling of EFLAG input to X86 Load-op-store fusion checking. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42128 llvm-svn: 322952	2018-01-19 15:37:57 +00:00
Simon Pilgrim	586b31b870	[X86][AVX] Add more variable permute tests for source vectors smaller than destination llvm-svn: 322948	2018-01-19 14:55:22 +00:00
Simon Pilgrim	37d977bc68	Fix line endings. NFCI. llvm-svn: 322940	2018-01-19 12:09:17 +00:00
Simon Pilgrim	65a565bf21	[X86] Add KNL target to slow PMULLD tests llvm-svn: 322939	2018-01-19 12:07:44 +00:00
Simon Pilgrim	852abd1ab6	[X86] Add RDPID schedule test llvm-svn: 322938	2018-01-19 12:06:49 +00:00
Simon Pilgrim	9b839ef354	[X86] Regenerate RDPMC intrinsic test llvm-svn: 322937	2018-01-19 12:05:58 +00:00
Matthias Braun	3ab9fcb98e	Split TailDuplicatePass into pre- and post-RA variant; NFC Split TailDuplicatePass into EarlyTailDuplicate and TailDuplicate. This avoids playing games with fake pass IDs and using MRI::isSSA() to determine pre-/post-RA state. llvm-svn: 322926	2018-01-19 06:08:17 +00:00
Matthias Braun	8bb5228db9	Move tests to the correct place test/CodeGen/MIR is for testing the MIR parser/printer. Tests for passes and targets belong to test/CodeGen/TARGETNAME. llvm-svn: 322925	2018-01-19 06:08:15 +00:00
Serguei Katkov	22bb1c0e17	Revert [CGP] Re-enable Select in complex addressing mode One of the buildbots failed. Revert for now until the issue is fixed. llvm-svn: 322923	2018-01-19 04:52:39 +00:00
Matthias Braun	5c290dc206	AArch64: Fix emergency spillslot being out of reach for large callframes Re-commit of r322200: The testcase shouldn't hit machineverifiers anymore with r322917 in place. Large callframes (calls with several hundreds or thousands of parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322919	2018-01-19 03:16:36 +00:00
Matthias Braun	dc4b3e87f4	AArch64: Omit callframe setup/destroy when not necessary Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to setup and the callframe size is 0. - Fixes an invalid callframe nesting for byval arguments, which would look like this before this patch (as in `big-byval.ll`): ... ADJCALLSTACKDOWN 32768, 0, ... # Setup for extfunc ... ADJCALLSTACKDOWN 0, 0, ... # setup for memcpy ... BL &memcpy ... ADJCALLSTACKUP 0, 0, ... # destroy for memcpy ... BL &extfunc ADJCALLSTACKUP 32768, 0, ... # destroy for extfunc - Saves us two instructions in the common case of zero-sized stackframes. - Remove an unnecessary scheduling barrier (hence the small unittest changes). Differential Revision: https://reviews.llvm.org/D42006 llvm-svn: 322917	2018-01-19 02:45:38 +00:00
Craig Topper	84b26b90d1	[X86] Add intrinsic support for the RDPID instruction This adds a new intrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg. Differential Revision: https://reviews.llvm.org/D42205 llvm-svn: 322910	2018-01-18 23:52:31 +00:00
Changpeng Fang	4737e892de	AMDGPU/SI: Add d16 support for image intrinsics. Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 llvm-svn: 322903	2018-01-18 22:08:53 +00:00
Martin Storsjo	d96be854e5	[test] Actually check the common parts in CodeGen/ARM/global-merge-external.ll. NFC. Previously, these parts weren't ever checked. The label patterns need to be extended to match successfully on macho. Differential Revision: https://reviews.llvm.org/D42126 llvm-svn: 322900	2018-01-18 21:21:48 +00:00
Amara Emerson	d5785775f8	[AArch64][GlobalISel] Add isel support for global values in the large code model. Fixes PR35958. Differential Revision: https://reviews.llvm.org/D42175 llvm-svn: 322878	2018-01-18 19:21:27 +00:00
Simon Pilgrim	f7ca8ff071	[X86][SSE] Regenerate vector promotion tests llvm-svn: 322877	2018-01-18 19:17:26 +00:00
Simon Pilgrim	3c8e2bf830	[X86][AVX] Add 256/512-bit slow PMULLD tests llvm-svn: 322874	2018-01-18 18:38:32 +00:00
Francis Visoiu Mistrih	378b5f3de6	[CodeGen] Print RegClasses on MI in verbose mode r322086 removed the trailing information describing reg classes for each register. This patch adds printing reg classes next to every register when individual operands/instructions/basic blocks are printed. In the case of dumping MIR or printing a full function, by default don't print it. Differential Revision: https://reviews.llvm.org/D42239 llvm-svn: 322867	2018-01-18 17:59:06 +00:00
Sam McCall	768efb5379	[MachineOutliner] Fix r322788 - don't write to working directory llvm-svn: 322850	2018-01-18 15:02:28 +00:00
Simon Pilgrim	dc25e1d8e2	[X86] Add PR35918 test case llvm-svn: 322846	2018-01-18 13:42:02 +00:00
Alex Bradbury	921383828e	[RISCV] Codegen support for the standard RV32M instruction set extension llvm-svn: 322843	2018-01-18 12:36:38 +00:00
Alex Bradbury	7d6aa1f7ae	[RISCV] Implement frame pointer elimination llvm-svn: 322839	2018-01-18 11:34:02 +00:00
Andrew V. Tischenko	360974a559	A new test to demonstrate the current SHLD/SHRD code generation. llvm-svn: 322828	2018-01-18 10:40:48 +00:00
Alex Bradbury	d3263aa1df	[RISCV][NFC] Add nounwind to functions in div.ll and mul.ll Committing this separately to minimise irrelevant changes for an upcoming patch. llvm-svn: 322825	2018-01-18 09:41:14 +00:00
Craig Topper	83b0a98902	[X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like we do for loads. llvm-svn: 322820	2018-01-18 07:44:09 +00:00
Craig Topper	21c8a8fa49	[X86] Remove isel patterns for using unmasked vmovdqa32/vmovdqu32 for integer vector loads. These patterns were just looking for a vXi64 bitcasted to vXi32, but there is no advantage to using vmovdqa32 over vmovdqa64. llvm-svn: 322819	2018-01-18 07:44:06 +00:00
Craig Topper	6620a69f18	[X86] Remove windows line endings from a test file. NFC llvm-svn: 322817	2018-01-18 06:47:09 +00:00
Craig Topper	7f0d85ec1e	[DAGCombiner] Add a DAG combine to turn a splat build_vector where the splat element is a bitcast from a vector type into a concat_vector For example, a build_vector of i64 bitcasted from v2i32 can be turned into a concat_vectors of the v2i32 vectors with a bitcast to a vXi64 type Differential Revision: https://reviews.llvm.org/D42090 llvm-svn: 322811	2018-01-18 04:17:06 +00:00
Justin Bogner	a9346e050f	GlobalISel: Make MachineCSE runnable in the middle of the GlobalISel Right now, it is not possible to run MachineCSE in the middle of the GlobalISel pipeline. Being able to run generic optimizations between the core passes of GlobalISel was one of the goals of the new ISel framework. This is the first attempt to do it. The problem is that MachineCSE pass assumes all register operands have a register class, which, in GlobalISel context, won't be true until after the InstructionSelect pass. The reason for this behaviour is that before replacing one virtual register with another, MachineCSE pass (and most of the other optimization machine passes) must check if the virtual registers' constraints have a (sufficiently large) intersection, and constrain the resulting register appropriately if such intersection exists. GlobalISel extends the representation of such constraints from just a register class to a triple (low-level type, register bank, register class). This commit adds MachineRegisterInfo::constrainRegAttrs method that extends MachineRegisterInfo::constrainRegClass to such a triple. The idea is that going forward we should use: - RegisterBankInfo::constrainGenericRegister within GlobalISel's InstructionSelect pass - MachineRegisterInfo::constrainRegClass within SelectionDAG ISel - MachineRegisterInfo::constrainRegAttrs everywhere else regardless the target and instruction selector it uses. Patch by Roman Tereshin. Thanks! llvm-svn: 322805	2018-01-18 02:06:56 +00:00
Volkan Keles	4aa73a649a	Fix the failure caused by r322773 Do not run GlobalISel if `-fast-isel=0 -global-isel=false`. llvm-svn: 322800	2018-01-18 01:10:30 +00:00
Jessica Paquette	729e68693f	[MachineOutliner] Add DISubprograms to outlined functions. Before, it wasn't possible to get backtraces inside outlined functions. This commit adds DISubprograms to the IR functions created by the outliner which makes this possible. Also attached a test that ensures that the produced debug information is correct. This is useful to users that want to debug outlined code. llvm-svn: 322789	2018-01-18 00:00:58 +00:00
Reid Kleckner	1aa9061c5f	[CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64 Every known PE COFF target emits /EXPORT: linker flags into a .drectve section. The AsmPrinter should handle this. While we're at it, use global_values() and emit each export flag with its own .ascii directive. This should make the .s file output more readable. llvm-svn: 322788	2018-01-17 23:55:23 +00:00
Benjamin Kramer	8b1986b5cb	Add support for emitting libcalls for x86_fp80 -> fp128 and vice-versa compiler_rt doesn't provide them (yet), but libgcc does. PR34076. llvm-svn: 322772	2018-01-17 22:29:16 +00:00
Simon Pilgrim	d109b5e027	[X86][MMX] Add PR35982 test cases FEMMS has the same problem as EMMS llvm-svn: 322770	2018-01-17 22:19:31 +00:00
Eli Friedman	c60a23a6af	[LegalizeDAG] Fix ATOMIC_CMP_SWAP_WITH_SUCCESS legalization. The code wasn't zero-extending correctly, so the comparison could spuriously fail. Adds some AArch64 tests to cover this case. Inspired by D41791. Differential Revision: https://reviews.llvm.org/D41798 llvm-svn: 322767	2018-01-17 22:04:36 +00:00
Daniel Sanders	12e6e709e9	[globalisel][tablegen] Honour priority order within nested instructions. It appears that we haven't been prioritizing rules that contain nested instructions properly. InstructionOperandMatcher didn't override isHigherPriorityThan so it never compared the instructions/operands/predicates inside nested instructions. Fixes PR35926. Thanks to Diana Picus for the bug report. llvm-svn: 322754	2018-01-17 20:34:29 +00:00
Zaara Syeda	c9dc7b451b	Revert [PowerPC] This reverts commit rL322721 Failing build bots. Revert the commit now. llvm-svn: 322748	2018-01-17 20:00:15 +00:00
Rafael Espindola	d700869235	Use a got to access a hidden weak undefined on MachO. Trying to link __attribute__((weak, visibility("hidden"))) extern int foo; int *main(void) { return &foo; } on OS X fails with ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64 The problem being that 0 cannot be computed as a fixed difference from %rip. Exactly the same issue exists on ELF and we can use the same solution. llvm-svn: 322739	2018-01-17 19:19:55 +00:00
Joel Galenson	bbcaf4ac5c	[ARM] Optimize {s,u}mul.with.overflow. This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG. Differential revision: https://reviews.llvm.org/D40922 llvm-svn: 322738	2018-01-17 19:19:05 +00:00
Joel Galenson	fe7fa40869	[ARM] Optimize {s,u}{add,sub}.with.overflow. The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other). Differential revision: https://reviews.llvm.org/D38378 llvm-svn: 322737	2018-01-17 19:19:05 +00:00
Craig Topper	b70ca5060f	[X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done. We could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement. I've also restricted this to AVX2 only because we can only broadcast loads under AVX. Differential Revision: https://reviews.llvm.org/D42086 llvm-svn: 322730	2018-01-17 18:58:22 +00:00
Craig Topper	279ace187a	[X86] When legalizing (v64i1 select i8, v64i1, v64i1) make sure not to introduce bitcasts to i64 in 32-bit mode We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine. The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast. This fixes pr35972. llvm-svn: 322724	2018-01-17 18:46:01 +00:00
Simon Pilgrim	3274d35a0d	[X86][SSE] Add v4i16 PMULLD tests llvm-svn: 322723	2018-01-17 18:41:27 +00:00
Zaara Syeda	8e951fd2f6	[PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute. This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 322721	2018-01-17 18:22:55 +00:00
Jonas Paulsson	ef785694f2	[SystemZ] Handle BRCTH branches correctly in SystemZLongBranch.cpp. BRCTH is capable of a long branch which needs to be recognized during branch relaxation. This is done by checking for ExtraRelaxSize == 0. Review: Ulrich Weigand llvm-svn: 322688	2018-01-17 17:16:07 +00:00
Diana Picus	4652e25030	[ARM GlobalISel] Add instselect tests for G_FPEXT and G_FPTRUNC G_FPEXT and G_FPTRUNC are handled by TableGen'erated code, just add tests. llvm-svn: 322665	2018-01-17 15:01:19 +00:00
Pablo Barrio	f2c29571da	[AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian Summary: Loading a vector of 4 half-precision FP sometimes results in an LD1 of 2 single-precision FP + a reversal. This results in an incorrect byte swap due to the conversion from little endian to big endian. In order to generate the correct byte swap, it is easier to generate the correct LD1 of 4 half-precision FP, thus avoiding the subsequent reversal. Reviewers: craig.topper, jmolloy, olista01 Reviewed By: olista01 Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41863 llvm-svn: 322663	2018-01-17 14:39:29 +00:00
Diana Picus	c62a16234b	[ARM GlobalISel] Map G_FPEXT and G_FPTRUNC to FPR llvm-svn: 322657	2018-01-17 14:14:14 +00:00
Daniil Fukalov	d5fca554e2	[AMDGPU] add LDS f32 intrinsics Added llvm.amdgcn.atomic.{add\|min\|max}.f32 intrinsics to allow generating the ds_{add\|min\|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656	2018-01-17 14:05:05 +00:00
Diana Picus	65ed364fac	[ARM GlobalISel] Legalize G_FPEXT and G_FPTRUNC Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware support, but only for conversions between float and double. Also add the necessary boilerplate so that the LegalizerHelper can introduce the required libcalls. This also works only for float and double, but isn't too difficult to extend when the need arises. llvm-svn: 322651	2018-01-17 13:34:10 +00:00
Benjamin Kramer	8d073a2c2d	[X86] Don't mutate shuffle arguments after early-out for AVX512 The match* functions have the annoying behavior of modifying their inputs. Save and restore the inputs, just in case the early out for AVX512 is hit. This is still not great and it's only a matter of time before this kind of bug happens again, but I couldn't come up with a better pattern without rewriting significant chunks of this code. Fixes PR35977. llvm-svn: 322644	2018-01-17 13:01:06 +00:00
Simon Pilgrim	1bea16f5d2	[X86][AVX] Add extra 'interleaved+lanepermute' shuffle test Possible missed opportunity to use 64-bit lane permute on AVX1 in lowerShuffleAsRepeatedMaskAndLanePermute llvm-svn: 322628	2018-01-17 10:56:54 +00:00
Andrew V. Tischenko	f7706994a6	Allow usage of X86-prefixes as separate instrs. Differential Revision: https://reviews.llvm.org/D42102 llvm-svn: 322623	2018-01-17 10:12:06 +00:00
Sean Eveson	2ae6037dd1	[MC] Fix -stack-size-section on ARM Change symbol values in the stack_size section from being 8 bytes, to being a target dependent size. Differential Revision: https://reviews.llvm.org/D42108 llvm-svn: 322619	2018-01-17 09:01:29 +00:00
Simon Pilgrim	a8e6b885bd	[X86][BTVER2] Fix scheduling of VCMPSD/VCMPSS instructions For some reason they don't have a trailing i like the packed equivalents. llvm-svn: 322600	2018-01-16 22:15:41 +00:00
Volkan Keles	f7f2568613	[GlobalISel][TableGen] Add support for SDNodeXForm Summary: This patch adds CustomRenderer which renders the matched operands to the specified instruction. Targets can enable the matching of SDNodeXForm by adding a definition that inherits from GICustomOperandRenderer and GISDNodeXFormEquiv as follows. def gi_imm8 : GICustomOperandRenderer<"renderImm8">, GISDNodeXFormEquiv<imm8_xform>; Custom renderer functions should be of the form: void render(MachineInstrBuilder &MIB, const MachineInstr &I); Reviewers: dsanders, ab, rovka Reviewed By: dsanders Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet Differential Revision: https://reviews.llvm.org/D42012 llvm-svn: 322582	2018-01-16 18:44:05 +00:00
Simon Pilgrim	3e0aafbfcc	[X86][MMX] Accept UNDEF upper bits for MOVD GR32->MMX llvm-svn: 322574	2018-01-16 17:01:31 +00:00
Simon Pilgrim	85e6139633	[X86][MMX] Improve MMX constant generation Extend the MMX zero code to take any constant with zero'd upper 32-bits llvm-svn: 322553	2018-01-16 14:21:28 +00:00
Jonas Devlieghere	6f24c8778c	[DebugInfo] Unify dumping of address ranges Summary: This patch unifies the printing of address ranges as [0x0, 0x1). rdar://34822059 Reviewers: aprantl, dblaikie Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D42056 llvm-svn: 322543	2018-01-16 11:17:57 +00:00
Yonghong Song	b42c7c7863	[BPF] Teach DAG2DAG AND elimination about load intrinsics As commented on the existing code: // The Reg operand should be a virtual register, which is defined // outside the current basic block. DAG combiner has done a pretty // good job in removing truncating inside a single basic block. However, when the Reg operand comes from bpf_load_[byte \| half \| word] intrinsics, the generic optimizer doesn't understand their results are zero extended, so these single basic block elimination opportunities were missed. Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 322534	2018-01-16 07:27:19 +00:00
Simon Pilgrim	85bd9141ca	[X86][MMX] Add support for MMX zero vector creation As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a XMM zero register. This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well. Differential Revision: https://reviews.llvm.org/D41908 llvm-svn: 322525	2018-01-15 22:32:40 +00:00
Simon Pilgrim	940eae3cc1	[X86][SSE] Add custom execution domain fixing for BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873) Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW. Differential Revision: https://reviews.llvm.org/D42042 llvm-svn: 322524	2018-01-15 22:18:45 +00:00
Sanjay Patel	30265d0a47	[x86] add tests to show missed constant shrinking (PR35907); NFC llvm-svn: 322523	2018-01-15 21:57:41 +00:00
Sanjay Patel	fc74f71400	[x86] regenerate test checks; NFC llvm-svn: 322522	2018-01-15 21:32:39 +00:00
Sanjay Patel	b885f04695	[x86] regenerate test checks; NFC llvm-svn: 322521	2018-01-15 21:28:52 +00:00
Sanjay Patel	0d0cec879b	[x86] regenerate test checks; NFC llvm-svn: 322519	2018-01-15 21:22:46 +00:00
Stanislav Mekhanoshin	62875fcd6c	[AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32 Differential Revision: https://reviews.llvm.org/D41617 llvm-svn: 322500	2018-01-15 18:49:15 +00:00
Krzysztof Parzyszek	b8f2a1e7b7	[Hexagon] Rewrite LowerVECTOR_SHUFFLE for 32-/64-bit vectors The old implementation was not always correct. The new one recognizes more shuffles that match specific instructions. llvm-svn: 322498	2018-01-15 18:33:33 +00:00
Jonas Paulsson	776a81a483	[SystemZ] Check for legality before doing LOAD AND TEST transformations. Since a load and test instruction treats its operands as signed, it can only replace a logical compare for EQ/NE uses. Review: Ulrich Weigand https://bugs.llvm.org/show_bug.cgi?id=35662 llvm-svn: 322488	2018-01-15 15:41:26 +00:00
Andrew V. Tischenko	e58c0c96b2	Update BTVER2 sched numbers for some AVX instructions (xmm version). Differential Revision: https://reviews.llvm.org/D40067 llvm-svn: 322485	2018-01-15 14:21:11 +00:00
Benjamin Kramer	736a343e97	Revert "[DAG] Elide overlapping stores" This reverts commit r322085. Internal PPC testing is still showing the same symptoms as when this patch landed the last time. llvm-svn: 322474	2018-01-15 10:57:24 +00:00
Simon Pilgrim	700552dd78	[X86][SSE] Tag PR21137 test case The test was added ages ago, but we didn't comment where it came from. llvm-svn: 322465	2018-01-14 21:59:43 +00:00
Craig Topper	6c2dee0c8e	[X86] Add test cases for D41794. llvm-svn: 322464	2018-01-14 20:53:49 +00:00
Simon Pilgrim	1b6440ff22	[X86][SSE] Add PR22391 test case llvm-svn: 322463	2018-01-14 19:57:50 +00:00
Craig Topper	7197a452fc	[X86] Autoupgrade kunpck intrinsics using vector operations instead of scalar operations Summary: This patch changes the kunpck intrinsic autoupgrade to use vXi1 shufflevector operations to perform vector extracts and concats. This more closely matches the definition of the kunpck instructions. Currently we rely on a DAG combine to turn the scalar shift/and/or code into a concat vectors operation. By doing it in the IR we get this for free. Reviewers: spatel, RKSimon, zvi, jina.nahias Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42018 llvm-svn: 322462	2018-01-14 19:24:10 +00:00
Simon Pilgrim	7c3088e5c0	[X86] Regenerate fp128 test llvm-svn: 322460	2018-01-14 19:07:41 +00:00
Simon Pilgrim	9904fe77a0	[X86][SSE] Support combining MOVLHPS undef inputs llvm-svn: 322459	2018-01-14 18:50:34 +00:00
Simon Pilgrim	73cebe807b	[X86][SSE] Add v2f64 3u shuffle test Shows a missed opportunity to remove an unnecessary move compared to 31 shuffle mask. llvm-svn: 322458	2018-01-14 18:38:21 +00:00
Sanjay Patel	527bf920c6	[x86] auto-generate complete checks; NFC llvm-svn: 322457	2018-01-14 17:47:40 +00:00
Craig Topper	b2868233b7	[X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output types have the same number of elements. llvm-svn: 322455	2018-01-14 08:11:36 +00:00
Craig Topper	57d58051bb	[X86] Add X86ISD::VTRUNC to computeKnownBitsForTargetNode. We have to take special care to avoid the cases where the result of the truncate would be padded with zero elements. Ideally we'd just use ISD::TRUNCATE for these cases instead. llvm-svn: 322454	2018-01-14 08:11:33 +00:00
Craig Topper	e9fc0cd920	[X86] Improve legalization of vXi16/vXi8 selects. Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization. Don't widen condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already. llvm-svn: 322450	2018-01-14 02:05:51 +00:00
Craig Topper	7a3b10184b	[X86] Add an avx512bw command line to the avx512-vec-cmp.ll test. Add some additional test cases. Additional test cases cover selects with i16/i8 conditions that are only 128/256-bits wide, but the compares are 512-bits wide and can only produce k-registers. We should be able to artificially widen the selects to avoid moving the k-register to an xmm/ymm register. llvm-svn: 322449	2018-01-14 02:05:49 +00:00
Zvi Rackover	652f9a1896	X86: Add pattern matching for PMADDWD In addition to the existing match as part of a loop-reduction, add a straightforward pattern match for DAG-contained patterns. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D41811 llvm-svn: 322446	2018-01-13 17:42:19 +00:00
Simon Pilgrim	f408745306	[X86] Regenerate double shift tests llvm-svn: 322444	2018-01-13 16:55:28 +00:00
Simon Pilgrim	20acf939ef	[X86][MMX] Add test for MMX zero folding As discussed in D41908 llvm-svn: 322436	2018-01-13 12:29:06 +00:00
Zvi Rackover	63f1f322c9	X86 Tests: add more pmaddwd cases. NFC Improve coverage of D41811 llvm-svn: 322434	2018-01-13 08:21:29 +00:00
Craig Topper	6f109f8c6c	[X86] Add DAG combine to promote vXi1 result of a vXi8/vXi16 setcc when we have AVX512 but not BWI. This avoids having the result type stick around until lowering where we have to extend the setcc and insert a truncate. If we get the types converted early we can do more to optimize it. llvm-svn: 322432	2018-01-13 06:24:46 +00:00
Paul Robinson	b22d170caf	XFAIL a test on Darwin, line-table stuck on DWARF 2 llvm-svn: 322430	2018-01-13 01:39:30 +00:00
Jessica Paquette	757e120379	[MachineOutliner] Move hasAddressTaken check to MachineOutliner.cpp Mostly NFC. Still updating the test though just for completeness. This moves the hasAddressTaken check to MachineOutliner.cpp and replaces it with a per-basic block test rather than a per-function test. The old test was too conservative and was preventing functions in C programs from being outlined even though they were safe to outline. This was mostly a problem in C sources. llvm-svn: 322425	2018-01-13 00:42:28 +00:00
Tim Renouf	75ced9d5b8	[AMDGPU] stop image_store being moved illegally Summary: A recent change 321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores can allow the machine instruction scheduler to move an image store past an image load using the same descriptor. V2: Fixed by marking image ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. V3: Reverted test change done on 321556. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D41969 llvm-svn: 322419	2018-01-12 22:57:24 +00:00
Changpeng Fang	44dfa1de3b	AMDGPU/SI: Add d16 support for buffer intrinsics. Differential Revision: https://reviews.llvm.org/D38906 Reviewers: Matt and Brian. llvm-svn: 322402	2018-01-12 21:12:19 +00:00
Paul Robinson	1879cb0b42	Try to fix more bots after r322391 llvm-svn: 322400	2018-01-12 20:54:45 +00:00
Paul Robinson	d138088c2f	Add toothpicks to test from r322391 llvm-svn: 322394	2018-01-12 19:58:35 +00:00
Paul Robinson	612e89d74f	[DWARFv5] CodeGen support for MD5 file checksums Pass MD5 checksums through from IR to assembly/object files. After this, getting Clang to compute the MD5 should be the last step to supporting MD5 in the DWARF v5 line table header. Differential Revision: https://reviews.llvm.org/D41926 llvm-svn: 322391	2018-01-12 19:17:50 +00:00
Simon Pilgrim	edff13b9de	[X86][SSE] Force blend domains on stack folding tests llvm-svn: 322385	2018-01-12 18:05:29 +00:00
Simon Pilgrim	b8bc537923	[X86][AVX] Regenerate element insertion tests llvm-svn: 322384	2018-01-12 18:02:52 +00:00
Benjamin Kramer	309124e0b1	[PowerPC] Don't miscompile rotate+mask into an ANDIo if it can't recreate the immediate I'm not even sure if this transform is ever worth it, but this at least stops the bleeding. llvm-svn: 322373	2018-01-12 15:03:24 +00:00
Nemanja Ivanovic	ebb23078e9	[PowerPC] Zero-extend the compare operand for ATOMIC_CMP_SWAP Part of the fix for https://bugs.llvm.org/show_bug.cgi?id=35812. This patch ensures that the compare operand for the atomic compare and swap is properly zero-extended to 32 bits if applicable. A follow-up commit will fix the extension for the SETCC node generated when expanding an ATOMIC_CMP_SWAP_WITH_SUCCESS. That will complete the bug fix. Differential Revision: https://reviews.llvm.org/D41856 llvm-svn: 322372	2018-01-12 14:58:41 +00:00
Stefan Pintilie	70bfe66111	Revert "[PowerPC] Manually schedule the prologue and epilogue" This reverts commit r322124 since some tests were broken by that patch. Will recommit once the patch is fixed. llvm-svn: 322369	2018-01-12 13:12:49 +00:00
Diana Picus	cf044647c4	[ARM GlobalISel] Add inst selector tests for G_FMA We don't yet match all the patterns involving G_FMA. Add tests for some of the ones that we do match. llvm-svn: 322368	2018-01-12 12:44:36 +00:00
Diana Picus	2dc5405693	[ARM GlobalISel] Map G_FMA to FPR llvm-svn: 322367	2018-01-12 12:06:01 +00:00
Diana Picus	e74243d473	[ARM GlobalISel] Legalize G_FMA For hard float with VFP4, it is legal. Otherwise, we use libcalls. This needs a bit of support in the LegalizerHelper for soft float because we didn't handle G_FMA libcalls yet. The support is trivial, as the only difference between G_FMA and other libcalls that we already handle is that it has 3 input operands rather than just 2. llvm-svn: 322366	2018-01-12 11:30:45 +00:00
Andre Vieira	5627c218e1	[ARM] Add codegen for SMMULR, SMMLAR and SMMLSR This patch teaches the Arm back-end to generate the SMMULR, SMMLAR and SMMLSR instructions from equivalent IR patterns. Differential Revision: https://reviews.llvm.org/D41775 llvm-svn: 322361	2018-01-12 09:24:41 +00:00
Andre Vieira	26b9de9ebb	[ARM] Fix erroneous availability of SMMLS for Armv7-M Differential Revision: https://reviews.llvm.org/D41855 llvm-svn: 322360	2018-01-12 09:21:09 +00:00
Serguei Katkov	76a1de3cd5	[CGP] Re-enable Select in complex addressing mode Re-enable Select after a couple of fixes. Differential Revision: https://reviews.llvm.org/D40634 llvm-svn: 322358	2018-01-12 08:33:34 +00:00
Craig Topper	b1623321af	[X86] Add 'l' and 'q' suffixes to the tbm instruction mnemonics. While the suffix isn't required to disambiguate the instructions, it is required in order to parse the instructions when the suffix is specified in order to match the GNU assembler. llvm-svn: 322354	2018-01-12 06:21:36 +00:00
David L. Jones	8c87213c26	Revert r322279 due to Skylake miscompile. Summary: This revision causes Skylake (and apparently, only Skylake) codegen to fail in certain cases. Details: https://bugs.llvm.org/show_bug.cgi?id=35918 Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D41972 llvm-svn: 322335	2018-01-12 00:17:38 +00:00
Matthias Braun	ea4359e922	PeepholeOptimizer: Fix for vregs without defs The PeepholeOptimizer would fail for vregs without a definition. If this was caused by an undef operand, abort to keep the code simple (so we don't need to add logic everywhere to replicate the undef flag). Differential Revision: https://reviews.llvm.org/D40763 llvm-svn: 322319	2018-01-11 22:30:43 +00:00
Rafael Espindola	e4b0231c63	Make internal/private GVs implicitly dso_local. While updating clang tests for having clang set dso_local I noticed that: - There are a lot of tests to update. - Many of the updates are redundant. They are redundant because a GV is "obviously dso_local". This patch starts formalizing that a bit by requiring that internal and private GVs be dso_local too. Since they all are, we don't have to print dso_local to the textual representation, making it a bit more compact and easier to read. llvm-svn: 322317	2018-01-11 22:15:05 +00:00
Matthias Braun	08abcac9dc	PeepholeOptimizer: Do not form PHI with subreg arguments When replacing a PHI the PeepholeOptimizer currently takes the register class of the register at the first operand. This however is not correct if this argument has a subregister index. As there is currently no API to query the register class resulting from applying a subregister index to all registers in a class, we can only abort in these cases and not perform the transformation. This changes findNextSource() to require the end of all copy chains to not use a subregister if there is any PHI in the chain. I had to rewrite the overly complicated inner loop there to have a good place to insert the new check. This fixes https://llvm.org/PR33071 (aka rdar://32262041) Differential Revision: https://reviews.llvm.org/D40758 llvm-svn: 322313	2018-01-11 21:57:03 +00:00
Evgeniy Stepanov	5223b5d9d6	[arm] Implement Target Operand Flag MIR serialization. Reviewers: efriedma, pcc Subscribers: aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D39975 llvm-svn: 322312	2018-01-11 21:37:58 +00:00
Craig Topper	2aac3ee5bc	[X86] Legalize 128/256 gathers/scatters on KNL by using widening rather than sign extending the index. We can just widen the vectors with undef and zero extend the mask. llvm-svn: 322308	2018-01-11 19:38:30 +00:00
Krzysztof Parzyszek	240df6faa4	[Hexagon] Fix building 64-bit vector from constant values The constants were aggregated in a reverse order. llvm-svn: 322303	2018-01-11 18:30:41 +00:00
Krzysztof Parzyszek	4ef6cfff6a	[Hexagon] Cast elements to correct type when creating constant vector llvm-svn: 322301	2018-01-11 18:03:23 +00:00
Zvi Rackover	999e6c2967	DAGCombine: Let truncates negate extension through extract-subvector Summary: Fold cases such as: (v8i8 truncate (v8i32 extract_subvector (v16i32 sext (v16i8 V), Idx))) -> (v8i8 extract_subvector (v16i8 V), Idx) This can be generalized to cases where the truncate and extend do not fully cancel each other out, but it may require querying the target about profitability. Reviewers: RKSimon, craig.topper, spatel, efriedma Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41927 llvm-svn: 322300	2018-01-11 18:02:33 +00:00
Zvi Rackover	cf0999887a	X86 Tests: Add zext cases in (trunc (subvector)) test. NFC Cases were missing as observed in D41927 llvm-svn: 322297	2018-01-11 17:50:34 +00:00
Simon Pilgrim	8de035670e	[X86][SSE] Drop old insertps stack folding test Broken test from old attempt for folding tables - we don't peek through extract_subvector spills at all (which is why it doesn't fold), and we already have foldMemoryOperandCustom to handle insertps immediate correction anyway. llvm-svn: 322292	2018-01-11 16:57:58 +00:00
Sanjay Patel	e63d8dda5a	[ValueTracking] recognize min/max-of-min/max with notted ops (PR35875) This was originally planned as the fix for: https://bugs.llvm.org/show_bug.cgi?id=35834 ...but simpler transforms handled that case, so I implemented a lesser solution. It turns out we need to handle the case with 'not' ops too because the real code example that we are trying to solve: https://bugs.llvm.org/show_bug.cgi?id=35875 ...has extra uses of the intermediate values, so we can't rely on smaller canonicalizations to get us to the goal. As with rL321672, I've tried to show every possibility in the codegen tests because that's the simplest way to prove we're doing the right thing in the wide variety of permutations of this pattern. We can also show an InstCombine win because we added a fold for this case in: rL321998 / D41603 An Alive proof for one variant of the pattern to show that the InstCombine and codegen results are correct: https://rise4fun.com/Alive/vd1 Name: min3_nots %nx = xor i8 %x, -1 %ny = xor i8 %y, -1 %nz = xor i8 %z, -1 %cmpxz = icmp slt i8 %nx, %nz %minxz = select i1 %cmpxz, i8 %nx, i8 %nz %cmpyz = icmp slt i8 %ny, %nz %minyz = select i1 %cmpyz, i8 %ny, i8 %nz %cmpyx = icmp slt i8 %y, %x %r = select i1 %cmpyx, i8 %minxz, i8 %minyz => %cmpxyz = icmp slt i8 %minxz, %ny %r = select i1 %cmpxyz, i8 %minxz, i8 %ny Name: min3_nots_alt %nx = xor i8 %x, -1 %ny = xor i8 %y, -1 %nz = xor i8 %z, -1 %cmpxz = icmp slt i8 %nx, %nz %minxz = select i1 %cmpxz, i8 %nx, i8 %nz %cmpyz = icmp slt i8 %ny, %nz %minyz = select i1 %cmpyz, i8 %ny, i8 %nz %cmpyx = icmp slt i8 %y, %x %r = select i1 %cmpyx, i8 %minxz, i8 %minyz => %xz = icmp sgt i8 %x, %z %maxxz = select i1 %xz, i8 %x, i8 %z %xyz = icmp sgt i8 %maxxz, %y %maxxyz = select i1 %xyz, i8 %maxxz, i8 %y %r = xor i8 %maxxyz, -1 llvm-svn: 322283	2018-01-11 15:13:47 +00:00
Simon Pilgrim	6e6da3f449	[X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering. llvm-svn: 322279	2018-01-11 14:25:18 +00:00
Zvi Rackover	3ee66d9cd1	X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices Summary: As RKSimon suggested in pr35820, in the case that Src is smaller in bit-size than Indices, need to widen Src to avoid type mismatch. Fixes pr35820 Reviewers: RKSimon, craig.topper Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41865 llvm-svn: 322272	2018-01-11 12:26:52 +00:00
Alex Bradbury	0715d35ed5	[RISCV] Reserve an emergency spill slot for the register scavenger when necessary Although the register scavenger can often find a spare register, an emergency spill slot is needed to guarantee success. Reserve this slot in cases where the function is known to have a large stack (meaning the scavenger may be needed when forming stack addresses). llvm-svn: 322269	2018-01-11 11:17:19 +00:00
Craig Topper	0b59034b15	[X86] Optimize v2i32/v2f32 scatters. If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather. Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering. llvm-svn: 322254	2018-01-11 06:31:28 +00:00
Sanjay Patel	f16fe0f205	[AArch64] add tests for notted variants of min/max; NFC Like rL321668 / rL321672, the planned optimizer change to fix these will be in ValueTracking, but we can test the changes cleanly here with AArch64 codegen. llvm-svn: 322238	2018-01-10 23:31:42 +00:00
Matthias Braun	e3a8db7ba1	Revert "AArch64: Fix emergency spillslot being out of reach for large callframes" Revert for now as the testcase is hitting a pre-existing verifier error that manifests as a failure when expensive checks are enabled (or -verify-machineinstrs is used). This reverts commit r322200. llvm-svn: 322231	2018-01-10 22:36:28 +00:00
Alex Bradbury	315cd3ace4	[RISCV] Implement support for the BranchRelaxation pass Branch relaxation is needed to support branch displacements that overflow the instruction's immediate field. Differential Revision: https://reviews.llvm.org/D40830 llvm-svn: 322224	2018-01-10 21:05:07 +00:00
Matthias Braun	725ad0eee0	TargetLoweringBase: The ios simulator has no bzero function. Make sure I really get back to the behavior before my rewrite in r321035 which turned out not to be completely NFC as I changed the behavior for the ios simulator environment. llvm-svn: 322223	2018-01-10 20:49:57 +00:00
Alex Bradbury	e027c93ac2	[RISCV] Implement branch analysis This is a prerequisite for the branch relaxation pass, and allows a number of optimisation passes (e.g. BranchFolding and MachineBlockPlacement) to work. Differential Revision: https://reviews.llvm.org/D40808 llvm-svn: 322222	2018-01-10 20:47:00 +00:00
Alex Bradbury	70f137b6bf	[RISCV] Add support for llvm.{frameaddress,returnaddress} intrinsics llvm-svn: 322218	2018-01-10 20:12:00 +00:00
Alex Bradbury	9330e64485	[RISCV] Add basic support for inline asm constraints llvm-svn: 322217	2018-01-10 20:05:09 +00:00
Alex Bradbury	9fea4881d0	[RISCV] Support stack frames and offsets up to 32-bits Differential Revision: https://reviews.llvm.org/D40807 llvm-svn: 322216	2018-01-10 19:53:46 +00:00
Alex Bradbury	c85be0de56	[RISCV] Support for varargs Includes support for expanding va_copy. Also adds support for using 'aligned' registers when necessary for vararg calls, and ensure the frame pointer always points to the bottom of the vararg spill region. This is necessary to ensure that the saved return address and stack pointer are always available at fixed known offsets of the frame pointer. Differential Revision: https://reviews.llvm.org/D40805 llvm-svn: 322215	2018-01-10 19:41:03 +00:00
Craig Topper	af4eb17223	[SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend. Most of this patch is just making sure we copy the scale around everywhere. Differential Revision: https://reviews.llvm.org/D40055 llvm-svn: 322210	2018-01-10 19:16:05 +00:00
Jessica Paquette	c191f1097c	[MachineOutliner] Outline ADRPs ADRP instructions weren't being outlined because they're PC-relative and thus fail the LR checks. This patch adds a special case for ADRPs to getOutliningType to make sure that ADRPs can be outlined and updates the MIR test. llvm-svn: 322207	2018-01-10 18:49:57 +00:00
Matthias Braun	b42ffa1283	AArch64: Fix emergency spillslot being out of reach for large callframes Large callframes (calls with several hundreds or thousands of parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322200	2018-01-10 18:16:24 +00:00
Simon Pilgrim	f74e3f45dc	[X86][MMX] Add test for PR35869 llvm-svn: 322197	2018-01-10 17:05:03 +00:00
Zvi Rackover	a27442f4f4	X86 Tests: Add isel tests for truncate-extract_vector-extend. NFC. To be improved in a future patch llvm-svn: 322192	2018-01-10 14:56:15 +00:00
Simon Pilgrim	a0c59cce0e	[X86][SSE] Add some basic FABS combine tests llvm-svn: 322182	2018-01-10 13:28:34 +00:00
Simon Pilgrim	a330a407c4	[X86][SSE] Add v2f64 u2 shuffle test Adds missing coverage for SHUFPD undef argument lowering, and also shows a missed opportunity to remove an unnecessary move compared to 02 shuffle mask. llvm-svn: 322175	2018-01-10 12:23:39 +00:00
Diana Picus	e3591f3a17	[ARM GlobalISel] Add inst selector tests for G_FNEG s32 and s64 G_FNEG is already handled by the TableGen'erated code. Just add a few tests to make sure everything works as expected. llvm-svn: 322170	2018-01-10 11:13:36 +00:00
Diana Picus	0ed7513c83	[ARM GlobalISel] Map G_FNEG to the FPR bank llvm-svn: 322169	2018-01-10 11:13:31 +00:00
Diana Picus	f949a0abac	[ARM GlobalISel] Legalize G_FNEG for s32 and s64 For hard float, it is legal. For soft float, we need to lower to 0 - x first, and then we can use the libcall for G_FSUB. This is undoing some of the canonicalization performed by the IRTranslator (which introduces G_FNEG when it sees a 0 - x). Ideally, that canonicalization would be performed by a pre-legalizer pass that would allow targets to opt out of this behaviour rather than dance around it in the legalizer. llvm-svn: 322168	2018-01-10 10:45:34 +00:00
Jonas Paulsson	1a76f3a2c2	Temporarily revert "[SystemZ] Check for legality before doing LOAD AND TEST transformations." , due to test failures. llvm-svn: 322165	2018-01-10 10:05:55 +00:00
Diana Picus	8f14886630	[ARM GlobalISel] Legalize s32/s64 G_FCONSTANT Legal for hard float. Change to G_CONSTANT for soft float (but preserve the binary representation). llvm-svn: 322164	2018-01-10 10:01:49 +00:00
Jonas Paulsson	9222b91e24	[SelectionDAGBuilder] Chain prefetches less aggressively. Prefetches used to always be chained between any previous and following memory accesses. The problem with this was that later optimizations, such as folding of a load into the user instruction, got disrupted. This patch relaxes the chaining of prefetches in order to remedy this. Review: Hal Finkel https://reviews.llvm.org/D38886 llvm-svn: 322163	2018-01-10 09:33:00 +00:00
Diana Picus	734a5e8912	[ARM GlobalISel] Legalize G_CONSTANT for scalars > 32 bits Make G_CONSTANT narrow for any scalars larger than 32 bits. llvm-svn: 322162	2018-01-10 09:32:01 +00:00
Jonas Paulsson	d9dde1ac56	[SystemZ] Check for legality before doing LOAD AND TEST transformations. Since a load and test instruction treats its operands as signed, it can only replace a logical compare for EQ/NE uses. Review: Ulrich Weigand https://bugs.llvm.org/show_bug.cgi?id=35662 llvm-svn: 322161	2018-01-10 09:18:17 +00:00
Puyan Lotfi	fe6c9cbb24	[MIR] Repurposing '$' sigil used by external symbols. Replacing with '&'. Planning to add support for named vregs. This puts us in a conundrum since physregs are named as well. To rectify this we need to use a sigil other than '%' for physregs in MIR. We've settled on using '$' for physregs but first we must repurpose it from external symbols using it, which is what this commit is all about. We think '&' will have familiar semantics for C/C++ users. llvm-svn: 322146	2018-01-10 00:56:48 +00:00


23240 Commits