llvm-project

Commit Graph

Author	SHA1	Message	Date
James Henderson	6135b0f88c	[llvm-readobj] Don't print '@' at end of unversioned dynsym names This fixes https://bugs.llvm.org/show_bug.cgi?id=40097. The problem was caused by a regression in r188022. See also r350614. Reviewed by: rupprecht, mstorsjo, Higuoxing, jakehehrlich Differential Revision: https://reviews.llvm.org/D56319 llvm-svn: 350615	2019-01-08 10:58:05 +00:00
Sam Parker	53000a74a5	[ARM] Add missing patterns for DSP muls Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb instructions once the operands had been sign extended. But we also need to use sext_inreg operands along with sext_16_node to catch a few more cases that enable us to remove the unnecessary sxth. Differential Revision: https://reviews.llvm.org/D55992 llvm-svn: 350613	2019-01-08 10:12:36 +00:00
Matt Arsenault	c765240060	AMDGPU/GlobalISel: Introduce vcc reg bank I'm not entirely sure this is the correct thing to do with the global isel philosophy, but I think this is necessary to handle how differently SGPRs are used normally vs. from a condition. For example, it makes sense to allow a copy from a VGPR to an SGPR, but it makes no sense to allow a copy from VGPRs to SGPRs used as a select mask. This avoids regbankselecting strange code with a truncate feeding directly into a condition field. Now a copy is forced from sgpr(s1) to vcc, which is more sensible to handle. Some of these issues could probably be avoided by making enough operations resulting in i1 illegal. I think we can't avoid this register bank for legality. For example, an i1 and where one source is from a truncate, and one source is a compare needs some kind of copy inserted to make sure both are in condition registers. llvm-svn: 350611	2019-01-08 06:30:53 +00:00
Thomas Lively	6a87ddac9a	[WebAssembly] Massive instruction renaming Summary: An automated renaming of all the instructions listed at https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329 as well as some similarly-named identifiers. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, eraman, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56338 llvm-svn: 350609	2019-01-08 06:25:55 +00:00
Mandeep Singh Grang	f286bee9fe	[MC] [AArch64] Support resolving signed fixups for :abs_g0_s: etc. Summary: This patch is a follow-up to D55896. Reviewers: efriedma, mstorsjo Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56029 llvm-svn: 350606	2019-01-08 04:48:00 +00:00
Matt Arsenault	a1515d2d33	AMDGPU/GlobalISel: Legalize concat_vectors llvm-svn: 350598	2019-01-08 01:30:02 +00:00
Matt Arsenault	adc40baa29	RegBankSelect: Fix copy insertion point for terminators If a copy was needed to handle the condition of brcond, it was being inserted before the defining instruction. Add tests for iterator edge cases. I find the existing code here suspect for the case where it's looking for terminators that modify the register. It's going to insert a copy in the middle of the terminators, which isn't allowed (it might be necessary to have a COPY_terminator if anybody actually needs this). Also legalize brcond for AMDGPU. llvm-svn: 350595	2019-01-08 01:22:47 +00:00
Matt Arsenault	ae6f1e07fc	AMDGPU/GlobalISel: Disallow VGPR->SCC copies This fixes using scalar adds when only the carry in is a VGPR using greedy regbankselect. llvm-svn: 350593	2019-01-08 01:13:20 +00:00
Matt Arsenault	68c668a5f3	AMDGPU/GlobalISel: RegBankSelect for carry-in I'm not sure we should be allowing the truncate to s1 for the inputs. It may be necessary to create a new VCC reg bank. llvm-svn: 350592	2019-01-08 01:09:09 +00:00
Matt Arsenault	2cc15b67b7	AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out llvm-svn: 350589	2019-01-08 01:03:58 +00:00
Matt Arsenault	299302fbe7	AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUES llvm-svn: 350588	2019-01-08 00:46:19 +00:00
Chen Zheng	33a61d719c	fix comment typo - NFC llvm-svn: 350587	2019-01-08 00:40:01 +00:00
Wei Mi	2645fd0ece	[RegisterCoalescer] dst register's live interval needs to be updated when merging a src register in ToBeUpdated set. This is to fix PR40061 related with https://reviews.llvm.org/rL339035. In https://reviews.llvm.org/rL339035, live interval of source pseudo register in rematerialized copy may be saved in ToBeUpdated set and its update may be postponed. In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set to postpone its live interval update. After the rematerialization, the live interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1 gets removed. After the merge, %t3 contains live interval larger than necessary. Because %t3 is not in toBeUpdated set, its live interval is not updated after register coalescing and it will break some assumption in regalloc. The patch requires the live interval of destination register in a merge to be updated if the source register is in ToBeUpdated. Differential revision: https://reviews.llvm.org/D55867 llvm-svn: 350586	2019-01-08 00:26:11 +00:00
Jonas Devlieghere	91b43adb69	[dsymutil] Upstream unobfuscation logic. The unobfuscation support for BCSymbolMaps was the last piece of code that hasn't been upstreamed yet. This patch contains a reworked version of the existing code and relevant tests. Differential revision: https://reviews.llvm.org/D56346 llvm-svn: 350580	2019-01-07 23:27:25 +00:00
Rong Xu	6f366c3a04	[PGO] Use SourceFileName rather than module name in PGOFuncName In LTO or ThinLTO mode (through linker plugin), the module names are temp file names, which differ between compilations. Using SourceFileName avoids the issue. This should not change any functionality for current PGO as all the current callers of getPGOFuncName() are before LTO. llvm-svn: 350579	2019-01-07 23:25:56 +00:00
Davide Italiano	bf1fdb852f	[Verifier] Reject invalid type for DILocalVariable. Reviewers: aprantl Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56414 llvm-svn: 350578	2019-01-07 23:09:09 +00:00
Michael Ferguson	e39b614d1d	[ValueTracking] Adjust comment in test Adjusts a comment in this test to verify commit access. llvm-svn: 350569	2019-01-07 21:02:22 +00:00
Craig Topper	486313b5f7	Recommit r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics." The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now. llvm-svn: 350567	2019-01-07 21:00:32 +00:00
Martin Storsjo	93a7137c0a	[ObjectYAML] [COFF] Support multiple symbols with the same name Differential Revision: https://reviews.llvm.org/D56294 llvm-svn: 350566	2019-01-07 20:55:33 +00:00
Craig Topper	fad1589f39	Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics." The AutoUpgrade.cpp if/else cascade hit an MSVC limit again. llvm-svn: 350562	2019-01-07 19:39:05 +00:00
Craig Topper	826f44b550	[TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24 to use GetDemandedBits to handle the multiple use simplifications, and then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560	2019-01-07 19:30:43 +00:00
Craig Topper	9c4f7e9147	[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics. Differential Revision: https://reviews.llvm.org/D56377 llvm-svn: 350554	2019-01-07 19:10:12 +00:00
Diogo N. Sampaio	f192cdb5c9	[ARM] ComputeKnownBits to handle extract vectors This patch adds the sign/zero extension done by vgetlane to ARM computeKnownBitsForTargetNode. Differential revision: https://reviews.llvm.org/D56098 llvm-svn: 350553	2019-01-07 19:01:47 +00:00
Simon Pilgrim	32f77f2b52	[X86] Add OR(AND(X,C),AND(Y,~C)) bit select tests Based off work for D55935 llvm-svn: 350548	2019-01-07 18:07:56 +00:00
Armando Montanez	488545ef15	[elfabi] Add option to manually specify file read format Although llvm-elfabi will attempt to read input files without needing the format to be manually specified, doing so has the potential to introduce extraneous errors that can hinder debugging (since multiple readers may fail in attempts to read the file). This change allows the input file format to be manually specified to force elfabi to use a single reader. This makes it easier to test and debug errors specific to a given reader. llvm-svn: 350545	2019-01-07 17:33:10 +00:00
Jordan Rupprecht	70038e01c8	[llvm-objcopy] Handle -O <format> flag. Summary: The -O flag is currently being mostly ignored; it's only checked whether or not the output format is "binary". This adds support for a few formats (e.g. elf64-x86-64), so that when specified, the output can change between 32/64 bit and sizes/alignments are updated accordingly. This fixes PR39135 Reviewers: jakehehrlich, jhenderson, alexshap, espindola Reviewed By: jhenderson Subscribers: emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D53667 llvm-svn: 350541	2019-01-07 16:59:12 +00:00
Sanjay Patel	47f92d3270	[x86] add more tests for LowerToHorizontalOp(); NFC These tests show missed optimizations and a miscompile similar to PR40243 - https://bugs.llvm.org/show_bug.cgi?id=40243 llvm-svn: 350533	2019-01-07 16:10:14 +00:00
Rhys Perry	f77e2e8406	AMDGPU: test for uniformity of branch instruction, not its condition Summary: If a divergent branch instruction is marked as divergent by propagation rule 2 in DivergencePropagator::exploreSyncDependency() and its condition is uniform, that branch would incorrectly be assumed to be uniform. Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56331 llvm-svn: 350532	2019-01-07 15:52:28 +00:00
James Henderson	9e014b6c3d	[llvm-nm] Add --portability as alias for --format=posix GNU nm supports this alias, so supporting it in llvm-nm makes it easier to transition between the two. Fixes https://bugs.llvm.org/show_bug.cgi?id=40002 Reviewed by: mstorsjo, rupprecht Differential Revision: https://reviews.llvm.org/D56312 llvm-svn: 350522	2019-01-07 14:12:51 +00:00
Matt Arsenault	369acb8470	AMDGPU: Remove VS/SV mappings from select These would violate the constant bus restriction llvm-svn: 350517	2019-01-07 13:21:36 +00:00
Simon Pilgrim	6aac0ec21f	Regenerate test. Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118. llvm-svn: 350514	2019-01-07 12:21:13 +00:00
Simon Pilgrim	09bf22862a	Regenerate test. Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118. llvm-svn: 350513	2019-01-07 12:20:35 +00:00
Craig Topper	1ac0839098	[X86] Update VBMI2 vshld/vshrd tests to use an immediate that doesn't require a modulo. Planning to replace these with funnel shift intrinsics which would mask out the extra bits. This will help minimize test diffs. llvm-svn: 350504	2019-01-07 05:58:53 +00:00
Craig Topper	6ffeeb705f	[X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions. Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56361 llvm-svn: 350498	2019-01-06 18:10:18 +00:00
Craig Topper	d0ba531a0c	[X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL. llvm-svn: 350481	2019-01-05 22:42:58 +00:00
Craig Topper	46f8b4a11e	[X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8. This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both. llvm-svn: 350480	2019-01-05 21:40:07 +00:00
Stanislav Mekhanoshin	35a3a3bd11	Added single use check to ShrinkDemandedConstant Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink source node to demanded bits only even if there are other uses. Differential Revision: https://reviews.llvm.org/D56289 llvm-svn: 350475	2019-01-05 19:20:00 +00:00
Craig Topper	27406e1f9e	[X86] Regenerate test to merge 32-bit and 64-bit check lines. NFC llvm-svn: 350474	2019-01-05 19:19:37 +00:00
Craig Topper	3f48dbf72e	[X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available. llvm-svn: 350473	2019-01-05 18:48:11 +00:00
Nikita Popov	25a02c12f1	[InstCombine] Improve cttz/ctlz + icmp tests; NFC Change part of the tests to use vectors (I'm using scalar for ugt and vector for ult), add multiuse variations, rename %lz to %tz for the cttz tests. llvm-svn: 350471	2019-01-05 17:36:05 +00:00
Nikita Popov	b46680407d	[InstCombine] Add cttz/ctlz + icmp ugt/ult tests; NFC llvm-svn: 350468	2019-01-05 15:51:59 +00:00
Nikita Popov	65038515ee	[InstCombine] Relax cttz/ctlz with select on zero The cttz/ctlz intrinsics have a parameter specifying whether the result is undefined for zero. cttz(x, false) can be relaxed to cttz(x, true) if x is known non-zero, and in fact such an optimization is already performed. However, this currently doesn't work if x is non-zero as a result of a select rather than an explicit branch. This patch adds handling for this case, thus allowing x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y. Differential Revision: https://reviews.llvm.org/D55786 llvm-svn: 350463	2019-01-05 09:48:16 +00:00
Nikita Popov	7bd4900ba0	[InstCombine] Add vector tests for select + ctlz/cttz; NFC llvm-svn: 350462	2019-01-05 09:48:05 +00:00
Evgeniy Stepanov	0184c53cbd	Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)"" This reapplies commit r348983. llvm-svn: 350448	2019-01-05 00:44:58 +00:00
Vyacheslav Zakharin	0a6f86c54b	Update the pr_datasz of .note.gnu.property section. Patch by Xiang Zhang. Differential Revision: https://reviews.llvm.org/D56080 llvm-svn: 350436	2019-01-04 21:25:01 +00:00
Nikita Popov	6658fce4fc	[BDCE] Remove dead uses of arguments In addition to finding dead uses of instructions, also find dead uses of function arguments, and replace them with zero as well. I'm changing the way the known bits are computed here to remove the coupling between the transfer function and the algorithm. It previously relied on the first op being visited first and computing known bits -- unless the first op is not an instruction, in which case they're computed on the second op. I could have adjusted this to check for "instruction or argument", but I think it's better to avoid the repeated calculation with an explicit flag. Differential Revision: https://reviews.llvm.org/D56247 llvm-svn: 350435	2019-01-04 21:21:43 +00:00
Craig Topper	cfeb1cf9af	[X86] Add INSERT_SUBVECTOR to ComputeNumSignBits This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits. My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common. Differential Revision: https://reviews.llvm.org/D56283 llvm-svn: 350432	2019-01-04 20:50:59 +00:00
Sanjay Patel	6a5656703e	[x86] add tests for potential horizontal vector ops; NFC These are modified versions of the FP tests from rL349923. llvm-svn: 350430	2019-01-04 20:14:53 +00:00
Peter Collingbourne	87f477b5e4	hwasan: Implement lazy thread initialization for the interceptor ABI. The problem is similar to D55986 but for threads: a process with the interceptor hwasan library loaded might have some threads started by instrumented libraries and some by uninstrumented libraries, and we need to be able to run instrumented code on the latter. The solution is to perform per-thread initialization lazily. If a function needs to access shadow memory or add itself to the per-thread ring buffer its prologue checks to see whether the value in the sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter and reloads from the TLS slot. The runtime does the same thing if it needs to access this data structure. This change means that the code generator needs to know whether we are targeting the interceptor runtime, since we don't want to pay the cost of lazy initialization when targeting a platform with native hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform} has been introduced for selecting the runtime ABI to target. The default ABI is set to interceptor since it's assumed that it will be more common that users will be compiling application code than platform code. Because we can no longer assume that the TLS slot is initialized, the pthread_create interceptor is no longer necessary, so it has been removed. Ideally, lazy initialization should only cost one instruction in the hot path, but at present the call may cause us to spill arguments to the stack, which means more instructions in the hot path (or theoretically in the cold path if the spills are moved with shrink wrapping). With an appropriately chosen calling convention for the per-thread initialization function (TODO) the hot path should always need just one instruction and the cold path should need two instructions with no spilling required. Differential Revision: https://reviews.llvm.org/D56038 llvm-svn: 350429	2019-01-04 19:27:04 +00:00
Teresa Johnson	853b962416	[ThinLTO] Handle chains of aliases At -O0, globalopt is not run during the compile step, and we can have a chain of an alias having an immediate aliasee of another alias. The summaries are constructed assuming aliases in a canonical form (flattened chains), and as a result only the base object but no intermediate aliases were preserved. Fix by adding a pass that canonicalizes aliases, which ensures each alias is a direct alias of the base object. Reviewers: pcc, davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54507 llvm-svn: 350423	2019-01-04 19:04:54 +00:00
Sanjay Patel	6153565511	[x86] lower extracted fadd/fsub to horizontal vector math; 2nd try The 1st try for this was at rL350369, but it caused IR-level diffs because our cost models differentiate custom vs. legal/promote lowering. So that was reverted at rL350373. The cost models were fixed independently at rL350403, so this is effectively the same patch as last time. Original commit message: This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350421	2019-01-04 17:48:13 +00:00
Vedant Kumar	a1778df474	[CodeExtractor] Do not extract unsafe lifetime markers Lifetime markers which reference inputs to the extraction region are not safe to extract. Example ('rhs' will be extracted): ``` entry: +------------+ \| x = alloca \| \| y = alloca \| +------------+ / \ lhs: rhs: +-------------------+ +-------------------+ \| lifetime_start(x) \| \| lifetime_start(x) \| \| use(x) \| \| lifetime_start(y) \| \| lifetime_end(x) \| \| use(x, y) \| \| lifetime_start(y) \| \| lifetime_end(y) \| \| use(y) \| \| lifetime_end(x) \| \| lifetime_end(y) \| +-------------------+ +-------------------+ ``` Prior to extraction, the stack coloring pass sees that the slots for 'x' and 'y' are in-use at the same time. After extraction, the coloring pass infers that 'x' and 'y' are not in-use concurrently, because markers from 'rhs' are no longer available to help decide otherwise. This leads to a miscompile, because the stack slots actually are in-use concurrently in the extracted function. Fix this by moving lifetime start/end markers for memory regions defined in the calling function around the call to the extracted function. Fixes llvm.org/PR39671 (rdar://45939472). Differential Revision: https://reviews.llvm.org/D55967 llvm-svn: 350420	2019-01-04 17:43:22 +00:00
Sanjay Patel	722466e1f1	[InstCombine] reduce raw IR narrowing rotate patterns to funnel shift Similar to rL350199 - there are no known analysis/codegen holes for funnel shift intrinsics now, so we can canonicalize the 6+ regular instructions to funnel shift to improve vectorization, inlining, unrolling, etc. llvm-svn: 350419	2019-01-04 17:38:12 +00:00
Nico Weber	c9141fc99f	[gn build] Commit change that should have been in r350410. llvm-svn: 350416	2019-01-04 17:26:05 +00:00
John Brawn	39ac159c24	[LICM] Adjust how moving the re-hoist point works In some cases the order that we hoist instructions in means that when rehoisting (which uses the same order as hoisting) we can rehoist to a block A, then a block B, then block A again. This currently causes an assertion failure as it expects that when changing the hoist point it only ever moves to a block that dominates the hoist point being moved from. Fix this by moving the re-hoist point when it doesn't dominate the dominator of hoisted instruction, or in other words when it wouldn't dominate the uses of the instruction being rehoisted. Differential Revision: https://reviews.llvm.org/D55266 llvm-svn: 350408	2019-01-04 17:12:09 +00:00
Simon Pilgrim	c2054144ee	[CostModel][X86] Fix SSE1 FADD/FSUB costs Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4 Add the other costs so that we're not relying on the default "is legal/custom" cost logic. llvm-svn: 350403	2019-01-04 16:55:57 +00:00
Ranjeet Singh	107dd2565c	Revert patches 348835 and 348571 because they're causing code size performance regressions. llvm-svn: 350402	2019-01-04 16:39:10 +00:00
Simon Pilgrim	71d61567c0	[CostModel][X86] Add SSE1 fp cost tests llvm-svn: 350401	2019-01-04 16:37:01 +00:00
Simon Pilgrim	9f4dea8c06	[X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine Repeat of the generic SimplifyDemandedBits shift combine llvm-svn: 350399	2019-01-04 15:43:43 +00:00
Simon Pilgrim	7ee2285625	[X86] Split immediate shifts tests. NFCI. A future patch will combine logical shifts more aggressively. llvm-svn: 350396	2019-01-04 14:56:10 +00:00
Florian Hahn	7902405c42	[ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset GetPointerBaseWithConstantOffset includes this code, where ByteOffset and GEPOffset are both of type llvm::APInt : ByteOffset += GEPOffset.getSExtValue(); The problem with this line is that getSExtValue() returns an int64_t, but the += matches an overload for uint64_t, so the resulting APInt is no longer considered to be signed. That in turn causes assertion failures later on if the relevant pointer type is > 64 bits in width and the GEPOffset was negative. Changing it to ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth()); resolves the issue and explicitly performs the sign-extending or truncation. Additionally, instead of asserting later if the result is > 64 bits, it breaks out of the loop in that case. See also https://reviews.llvm.org/D24729 https://reviews.llvm.org/D24772 This commit must be merged after D38662 in order for the test to pass. Patch by Michael Ferguson <mpfergu@gmail.com>. Reviewers: reames, sanjoy, hfinkel Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D38501 llvm-svn: 350395	2019-01-04 14:53:22 +00:00
Craig Topper	6265a15f2e	[X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used. Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register. Differential Revision: https://reviews.llvm.org/D56246 llvm-svn: 350374	2019-01-04 00:10:58 +00:00
Sanjay Patel	26ce9c38a7	revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math There are non-codegen tests that need to be updated with this code change. llvm-svn: 350373	2019-01-04 00:02:02 +00:00
Sanjay Patel	ef4afca2ad	[x86] lower extracted fadd/fsub to horizontal vector math This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350369	2019-01-03 23:16:19 +00:00
Heejin Ahn	777d01c756	[WebAssembly] Optimize Irreducible Control Flow Summary: Irreducible control flow is not that rare, e.g. it happens in malloc and 3 other places in the libc portions linked in to a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output. Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other). Reviewers: aheejin, sunfish Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits Differential Revision: https://reviews.llvm.org/D55467 Patch by Alon Zakai (kripken) llvm-svn: 350367	2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen	820c6263d9	[WebAssembly] Fixed disassembler not knowing about new brlist operand Summary: The previously introduced new operand type for br_table didn't have a disassembler implementation, causing an assert. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56227 llvm-svn: 350366	2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen	9843295608	[WebAssembly] Made InstPrinter more robust Summary: Instead of asserting on certain kinds of malformed instructions, it now still prints, but adds an annotation indicating the problem, and/or indicates invalid_type etc. We're using the InstPrinter from many contexts that can't always guarantee values are within range (e.g. the disassembler), where having output is more valuable than asserting. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56223 llvm-svn: 350365	2019-01-03 22:59:59 +00:00
Sanjay Patel	b8687c2168	[x86] add 512-bit vector tests for horizontal ops; NFC llvm-svn: 350364	2019-01-03 22:55:18 +00:00
Sanjay Patel	ac23c46883	[x86] add AVX512 runs for horizontal ops; NFC llvm-svn: 350362	2019-01-03 22:42:32 +00:00
Craig Topper	58c61dce1d	[X86] Add test case for D56283. This tests a case where we need to be able to compute sign bits for two insert_subvectors that is a liveout of a basic block. The result is then used as a boolean vector in another basic block. llvm-svn: 350359	2019-01-03 22:31:07 +00:00
Sanjay Patel	6b8a9dbfc4	[x86] remove dead CHECK lines from test file; NFC llvm-svn: 350358	2019-01-03 22:30:36 +00:00
Sanjay Patel	fd58d623ff	[x86] split tests for FP and integer horizontal math These are similar patterns, but when you throw AVX512 onto the pile, the number of variations explodes. For FP, we really don't care about AVX1 vs. AVX2 for FP ops. There may be some superficial shuffle diffs, but that's not what we're testing for here, so I removed those RUNs. Separating by type also lets us specify 'sse3' for the FP file vs. 'ssse3' for the integer file...because x86. llvm-svn: 350357	2019-01-03 22:26:51 +00:00
Sanjay Patel	8db27b31ac	[x86] add common FileCheck prefix to reduce assert duplication; NFC llvm-svn: 350356	2019-01-03 22:11:14 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Nirav Dave	667838f034	[AVR] Update integration/blink.ll as we now generate sbi/cbi instructions. Silence long standing test failure. llvm-svn: 350353	2019-01-03 21:25:39 +00:00
Alexander Timofeev	993e2798fd	[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression. Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses, calling a function that changes def-use iterators on the way. As a result, the loop exits immediately when the def-use iterator is changed. Hence, the operand is folded into the very first use instruction only. This keeps the VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in the SHOC DeviceMemory test is caused by this bug. Proposed fix: collect uses into a separate container for further processing in another loop. Testing: make check-llvm SHOC performance test. Reviewers: rampitec, ronlieb Differential Revision: https://reviews.llvm.org/D56161 llvm-svn: 350350	2019-01-03 19:55:32 +00:00
Nico Weber	6f06ce641e	Remove unused %host_cc lit pattern It was added in r257236 but then the one use was removed in r309517. Since no test should call %host_cc, remove the pattern. Differential Revision: https://reviews.llvm.org/D56200 llvm-svn: 350348	2019-01-03 19:31:53 +00:00
Armando Montanez	31f0f659a8	[elfabi] Introduce tool for ELF TextAPI Follow up for D53051 This patch introduces the tool associated with the ELF implementation of TextAPI (previously llvm-tapi, renamed for better distinction). This tool will house a number of features related to analysis and manipulation of shared objects' exposed interfaces. The first major feature for this tool is support for producing binary stubs that are useful for compile-time linking of shared objects. This patch introduces beginnings of support for reading binary ELF objects to work towards that goal. Added: - elfabi tool. - support for reading architecture from a binary ELF file into an ELFStub. - Support for writing .tbe files. Differential Revision: https://reviews.llvm.org/D55352 llvm-svn: 350341	2019-01-03 18:32:36 +00:00
Sanjay Patel	4e71ff234e	[x86] add tests for buildvector with extracted element; NFC llvm-svn: 350338	2019-01-03 17:55:32 +00:00
Jordan Rupprecht	1f82176f7d	[llvm-objcopy][ELF] Implement a mutable section visitor that updates size-related fields (Size, EntrySize, Align) before layout. Summary: Fix EntrySize, Size, and Align before doing layout calculation. As a side cleanup, this removes a dependence on sizeof(Elf_Sym) within BinaryReader, so we can untemplatize that. This unblocks a cleaner implementation of handling the -O<format> flag. See D53667 for a previous attempt. Actual implementation of the -O<format> flag will come in an upcoming commit, this is largely a NFC (although not _totally_ one, because alignment on binary input was actually wrong before). Reviewers: jakehehrlich, jhenderson, alexshap, espindola Reviewed By: jhenderson Subscribers: emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D56211 llvm-svn: 350336	2019-01-03 17:45:30 +00:00
Simon Pilgrim	f0c533b7db	[CostModel][X86] Add truncate cost tests to cover all legal destination types We were only testing costs for legal source vector element counts llvm-svn: 350323	2019-01-03 14:49:39 +00:00
Alex Bradbury	2ba76be882	[RISCV][MC] Accept %lo and %pcrel_lo on operands to li This matches GNU assembler behaviour. llvm-svn: 350321	2019-01-03 14:41:41 +00:00
Serge Guelton	873cba17b2	Python compat - iteritems() vs. items() Always use `items()` and introduce extra `list(...)` call when needed. Differential Revision: https://reviews.llvm.org/D56257 llvm-svn: 350312	2019-01-03 14:12:23 +00:00
Serge Guelton	beb6fee542	Python compat - portable way of raising exceptions Differential Revision: https://reviews.llvm.org/D56256 llvm-svn: 350311	2019-01-03 14:12:13 +00:00
Serge Guelton	7d0174c558	[NFC] Remove unused Python import Differential Revision: https://reviews.llvm.org/D56254 llvm-svn: 350310	2019-01-03 14:12:07 +00:00
Serge Guelton	07ccb4b81d	Python compat - range vs. xrange Use range instead of xrange whenever possible. The extra list creation in Python2 is generally not a performance bottleneck. Differential Revision: https://reviews.llvm.org/D56253 llvm-svn: 350309	2019-01-03 14:11:58 +00:00
Serge Guelton	4a27478a5b	Python compat - print statement Make sure all print statements are compatible with Python 2 and Python3 using the `from __future__ import print_function` statement. Differential Revision: https://reviews.llvm.org/D56249 llvm-svn: 350307	2019-01-03 14:11:33 +00:00
Philip Pfaffe	b39a97c8f6	[NewPM] Port Msan Summary: Keeping msan a function pass requires replacing the module level initialization: That means, don't define a ctor function which calls __msan_init, instead just declare the init function at the first access, and add that to the global ctors list. Changes: - Pull the actual sanitizer and the wrapper pass apart. - Add a newpm msan pass. The function pass inserts calls to runtime library functions, for which it inserts declarations as necessary. - Update tests. Caveats: - There is one test that I dropped, because it specifically tested the definition of the ctor. Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji Differential Revision: https://reviews.llvm.org/D55647 llvm-svn: 350305	2019-01-03 13:42:44 +00:00
Diogo N. Sampaio	25ae9a84c3	[NFC] Fix missing testfile change of rL350299 This file was missing on the patch llvm-svn: 350302	2019-01-03 12:48:06 +00:00
Simon Pilgrim	44d6b25d2c	[X86] Cleanup saturated add/sub tests Use X86/X64 check prefixes Use nounwind to reduce cfi noise llvm-svn: 350301	2019-01-03 12:31:13 +00:00
Simon Pilgrim	c2aadfaaad	[SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123) Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics llvm-svn: 350300	2019-01-03 12:18:23 +00:00
Diogo N. Sampaio	8786a946d8	[ARM] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also renames FeatureSpecRestrict to FeatureSB. Reviewed By: olista01, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55990 llvm-svn: 350299	2019-01-03 12:09:12 +00:00
Simon Pilgrim	c5e22b29e4	[SLPVectorizer][X86] Add ADD/SUB SSAT/USAT tests (PR40123) llvm-svn: 350297	2019-01-03 12:02:14 +00:00
Simon Pilgrim	d824f99a6c	[X86] Add ADD/SUB SSAT/USAT vector costs (PR40123) Costs for real SSE2 instructions llvm-svn: 350295	2019-01-03 11:38:42 +00:00
Simon Pilgrim	55ea89305d	[X86] Add ADD/SUB SSAT/USAT cost tests (PR40123) llvm-svn: 350293	2019-01-03 11:29:24 +00:00
Piotr Sobczak	3abef8f9ea	[AMDGPU] Change section name with metadata access Summary: The commit rL348922 introduced a means to set Metadata section kind for a global variable, if its explicit section name was prefixed with ".AMDGPU.metadata.". This patch changes that prefix to ".AMDGPU.comment.", as "metadata" in the section name might lead to ambiguity with metadata used by AMD PAL runtime. Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668 Reviewers: kzhuravl Reviewed By: kzhuravl Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56197 llvm-svn: 350292	2019-01-03 11:22:58 +00:00
Markus Lavin	72b9deb21f	[CodeGen] Skip over dbg-instr in twoaddr pass A DBG_VALUE between a two-address instruction and a following COPY would prevent rescheduleMIBelowKill optimization inside TwoAddressInstructionPass. Differential Revision: https://reviews.llvm.org/D55987 llvm-svn: 350289	2019-01-03 08:36:06 +00:00
Martin Storsjo	74e7d26090	[llvm-readobj] [COFF] Print the symbol index for relocations There can be multiple local symbols with the same name (for e.g. comdat sections), and thus the symbol name itself isn't enough to disambiguate symbols. Differential Revision: https://reviews.llvm.org/D56140 llvm-svn: 350288	2019-01-03 08:08:23 +00:00
Craig Topper	5ef47ad82e	[X86] Add test cases for opportunities to use KTEST when check if the result of ANDing two mask registers is zero. The test cases are constructed to avoid folding the AND into a masked compare operation. Currently we emit a KAND and a KORTEST for these cases. llvm-svn: 350287	2019-01-03 07:12:54 +00:00
QingShan Zhang	f24ec7bdd0	[Power9] Enable the Out-of-Order scheduling model for P9 hw When switched to the MI scheduler for P9, the hardware is modeled as out of order. However, inside the MI Scheduler algorithm, we still use the in-order scheduling model as the MicroOpBufferSize isn't set. The MI scheduler takes this to mean the hw cannot buffer the op. So the pending instruction can be scheduled only once all the available instructions have issued. That is not true for our P9 hw in fact. This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is picked from the P9 hw spec, and the perf tests indicate that this value won't hurt cpu2017. With this patch, there are 3 specs improved over 3% and 1 spec degraded over 3%. The detail is as follows: x264_r: +6.95% cactuBSSN_r: +6.94% lbm_r: +4.11% xz_r: -3.85% And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved. Reviewer: Nemanjai Differential Revision: https://reviews.llvm.org/D55810 llvm-svn: 350285	2019-01-03 05:04:18 +00:00
Pete Cooper	697281df42	Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have equivalent arguments (either identical or equivalent PHIs). It then assumes that ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead. Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue. This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence. rdar://problem/47005143 Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D56235 llvm-svn: 350284	2019-01-03 01:38:08 +00:00
Daniel Sanders	157c43f823	[tblgen][disasm] Emit record names again when decoder conflicts occur. And add a test for it. llvm-svn: 350277	2019-01-03 00:14:33 +00:00
Teresa Johnson	0aa09c62cb	[gold] emit assembly listing from gold plugin on LTO stage Summary: Sometimes it's useful to emit assembly after LTO stage to modify it manually. Emitting precodegen bitcode file (via save-temps plugin option) and then feeding it to llc doesn't always give the same binary as original. This patch is simpler alternative to https://reviews.llvm.org/D24020. Patch by Denis Bakhvalov. Reviewers: mehdi_amini, tejohnson Reviewed By: tejohnson Subscribers: MaskRay, inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D56114 llvm-svn: 350276	2019-01-02 23:48:00 +00:00
Craig Topper	df5304d8de	[X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL. The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX. llvm-svn: 350272	2019-01-02 23:24:08 +00:00
Craig Topper	ce46bfa848	[X86] Add test cases to show that we fail to fold loads into i8 smulo and i8/i16/i32/i64 umulo lowering without the assistance of the peephole pass. NFC llvm-svn: 350271	2019-01-02 23:24:03 +00:00
Wouter van Oortmerssen	ad72f68501	[WebAssembly] made assembler parse block_type Summary: This was previously ignored and an incorrect value generated. Also fixed Disassembler's handling of block_type. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56092 llvm-svn: 350270	2019-01-02 23:23:51 +00:00
Xin Tong	33e3b4b9b3	[ThinLTO] Scan all variants of vague symbol for reachability. Summary: Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable from somewhere else. Reviewers: tejohnson, grimar Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D56117 llvm-svn: 350269	2019-01-02 23:18:20 +00:00
Nikita Popov	41f5710328	[BDCE] Fix typo in test; NFC shl by 32 is undefined. This was intended to be a shl by 31 as part of a rotate sequence. llvm-svn: 350265	2019-01-02 22:34:32 +00:00
Pete Cooper	8d58048024	Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef. The caller to EraseInstruction had this conditional: // ARC calls with null are no-ops. Delete them. if (IsNullOrUndef(Arg)) but the assert inside EraseInstruction only allowed ConstantPointerNull and not undef or bitcasts. This adds support for both of these cases. rdar://problem/47003805 llvm-svn: 350261	2019-01-02 21:00:02 +00:00
Thomas Lively	88590e99f2	[WebAssembly][NFC] Elaborate on simd-noopt test comment llvm-svn: 350260	2019-01-02 20:43:08 +00:00
Nikita Popov	cc6ef7f153	[BDCE] Remove instructions without demanded bits If an instruction has no demanded bits, remove it directly during BDCE, instead of leaving it for something else to clean up. Differential Revision: https://reviews.llvm.org/D56185 llvm-svn: 350257	2019-01-02 20:02:14 +00:00
Craig Topper	9d4860ec4e	[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC if the result isn't used. This will become a SUB immediate which will be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag. This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes. Differential Revision: https://reviews.llvm.org/D55975 llvm-svn: 350245	2019-01-02 19:01:05 +00:00
Craig Topper	8dd7bd2cd7	[DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists. Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239	2019-01-02 18:19:07 +00:00
Craig Topper	44bcc824d3	[X86] Adding full coverage of MC encoding for the XOP and LWP ISAs. Adding MC regressions tests to cover the XOP isa set. This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952 Differential Revision: https://reviews.llvm.org/D41392 llvm-svn: 350237	2019-01-02 18:09:41 +00:00
Craig Topper	3109f3a4ab	[LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined. By creating the extract with the correct scalar type when we legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend. This fixes the regression on X86 in D56156. Differential Revision: https://reviews.llvm.org/D56176 llvm-svn: 350236	2019-01-02 17:58:30 +00:00
Craig Topper	c562fae02b	[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits then it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235	2019-01-02 17:58:27 +00:00
Wei Mi	ecc89b76cb	[PowerPC] Remove SeenUse check when optimizing conditional branch in PPCPreEmitPeephole pass. PPCPreEmitPeephole will convert a BC to B when the conditional branch is based on a constant CR by CRSET or CRUNSET. This is added in https://reviews.llvm.org/rL343100. When the conditional branch is known to be always taken, all branches will be removed and a new unconditional branch will be inserted. However, when SeenUse is false the original patch will not remove the branches, but still insert the new unconditional branch, update the successors and create inconsistent IR. Compiling the synthetic testcase included can show the problem we run into. The patch simply removes the SeenUse condition when adding branches into InstrsToErase set. Differential Revision: https://reviews.llvm.org/D56041 llvm-svn: 350223	2019-01-02 17:07:23 +00:00
Simon Pilgrim	d8125726d5	[X86] Support SHLD/SHRD masked shift-counts (PR34641) Peek through shift modulo masks while matching double shift patterns. I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now. Differential Revision: https://reviews.llvm.org/D56199 llvm-svn: 350222	2019-01-02 17:05:37 +00:00
Sanjay Patel	eafd481aad	[x86] add more tests for potential horizontal ops; NFC As discussed in D56011 - add runs for AVX512 and tests with extra uses. llvm-svn: 350221	2019-01-02 16:36:04 +00:00
Hal Finkel	4f2381440d	[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug) Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string). Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow). After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy). 
As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work. Patch by me with contributions from Michael Ferguson (mferguson@cray.com). Differential Revision: https://reviews.llvm.org/D38662 llvm-svn: 350220	2019-01-02 16:28:09 +00:00
Piotr Sobczak	378131bae0	[AMDGPU] Handle OR as operand of raw load/store Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208	2019-01-02 09:47:41 +00:00
Craig Topper	8969720787	[X86] Add i8/i16 smulo/umulo test cases where the overflow indication is used by a mask. llvm-svn: 350204	2019-01-02 05:46:02 +00:00
Craig Topper	6f2feb8293	[X86] Remove KNL specific check prefix from xmulo.ll test. NFC This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference. llvm-svn: 350203	2019-01-02 05:46:00 +00:00
Sanjay Patel	654e6aabb9	[InstCombine] canonicalize raw IR rotate patterns to funnel shift The final piece of IR-level analysis to allow this was committed with: rL350188 Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal or better to the original code even if the target does not have an actual rotate instruction. llvm-svn: 350199	2019-01-01 21:51:39 +00:00
Craig Topper	00b390a000	[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code. This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions. This is inspired by the helper functions used by ARM and AArch64 for the same purpose. The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO. llvm-svn: 350198	2019-01-01 19:34:11 +00:00
Craig Topper	a728214203	[X86] Remove KNL specific check prefix from xaluo.ll test. NFC This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference. llvm-svn: 350195	2019-01-01 18:44:44 +00:00
Craig Topper	9478492a80	[X86] Add test cases to show where LowerSELECT doesn't select SADDO/SSUBO to INC/DEC, but LowerXALUOOp does. Leading to duplicate code. When SADDO/SSUBO is used as a part of a condition, the X86 backend has to lower the instruction twice. One for the flags use and then once for the data use. These two selections should be kept in sync so they end up with one node providing the data and the flags. This doesn't seem to be happening for INC/DEC. llvm-svn: 350194	2019-01-01 18:44:42 +00:00
Nikita Popov	c5a023b624	[BDCE] Regenerate test checks; NFC llvm-svn: 350190	2019-01-01 12:27:23 +00:00
Nikita Popov	d4bf57be6b	[BDCE] Remove -instsimplify from BDCE test; NFC To make it more obvious which part of the transformation is carried out by BDCE. Also drop the CHECK-IO lines which only run -instsimplify as they don't really seem meaningful if the main check doesn't run -instsimplify either. llvm-svn: 350189	2019-01-01 10:17:35 +00:00
Nikita Popov	bc9986e9ad	Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions" This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses whose user is also dead. This case is checked separately instead. The previous attempt to land this led to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now and the added test case demonstrates such an instance. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 350188	2019-01-01 10:05:26 +00:00
Ayonam Ray	e00606a1b2	Reversing the commit in revision 350186. Revision causes regression in 4 tests. llvm-svn: 350187	2019-01-01 07:28:55 +00:00
Ayonam Ray	c471bb2e67	Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review Reference: D52002 llvm-svn: 350186	2019-01-01 06:37:50 +00:00
Chen Zheng	4952e668f8	[InstCombine] canonicalize MUL with NEG operand -X * Y --> -(X * Y) X * -Y --> -(X * Y) Differential Revision: https://reviews.llvm.org/D55961 llvm-svn: 350185	2019-01-01 01:09:20 +00:00
Simon Pilgrim	8b503c795e	[X86] Add PR34641 masked shld/shrd test cases llvm-svn: 350181	2018-12-31 19:46:18 +00:00
Craig Topper	c25f1f8f17	[X86] Add additional RUN lines to prepare for D56156. NFC llvm-svn: 350180	2018-12-31 19:09:32 +00:00
Craig Topper	ed3ffae4a4	[SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits. Differential Revision: https://reviews.llvm.org/D56168 llvm-svn: 350179	2018-12-31 19:09:30 +00:00
Craig Topper	bb0873cf46	[X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode. Differential Revision: https://reviews.llvm.org/D56169 llvm-svn: 350178	2018-12-31 19:09:27 +00:00
Michal Gorny	7343e24f78	[test] Fix propagating HOME envvar to unittests Propagate HOME environment variable to unittests. This is necessary to fix test failures resulting from pw_home pointing to a non-existing directory while being overridden with HOME. Apparently Gentoo users hit this sometimes when they override build directory for Portage. Original bug report: https://bugs.gentoo.org/674088 Differential Revision: https://reviews.llvm.org/D56162 llvm-svn: 350175	2018-12-31 13:48:12 +00:00
Martin Storsjo	74d93f9b24	[AArch64] Accept "sve" as arch feature in assembler Differential Revision: https://reviews.llvm.org/D56128 llvm-svn: 350174	2018-12-31 10:22:04 +00:00
Alexander Potapenko	cea4f83371	[MSan] Handle llvm.is.constant intrinsic MSan used to report false positives in the case the argument of llvm.is.constant intrinsic was uninitialized. In fact checking this argument is unnecessary, as the intrinsic is only used at compile time, and its value doesn't depend on the value of the argument. llvm-svn: 350173	2018-12-31 09:42:23 +00:00
Martin Storsjo	2018777836	[AArch64] Implement the .arch_extension directive Differential Revision: https://reviews.llvm.org/D56131 llvm-svn: 350169	2018-12-30 21:06:32 +00:00
Kang Zhang	9d78c60bf4	[PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code Summary: For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo. Then the verifier runs and it seems like we have a use of an undefined register (the register will be reserved later, but the verifier doesn't know that). So this patch calls setUsesTOCBasePtr before emitting the code for the pseudo, so the verifier can know X2 is a reserved register. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D56148 llvm-svn: 350165	2018-12-30 15:13:51 +00:00
Kang Zhang	4aa6453767	[PowerPC] Fix ADDE, SUBE do not know how to promote operator Summary: This patch is created to fix the Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815 This patch is to support promotion integer result for the instruction ADDE, SUBE. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56119 llvm-svn: 350161	2018-12-30 07:48:09 +00:00
Craig Topper	a32e353afa	[X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1. This seems to be getting in the way more than it's helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better. Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it. llvm-svn: 350159	2018-12-30 03:05:07 +00:00
Craig Topper	f237ce159e	[X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from v16i16/v32i8 to v4i64 when v4i64 needs splitting. This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh. We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization. llvm-svn: 350158	2018-12-30 02:30:34 +00:00
Nemanja Ivanovic	0f7715afe1	[PowerPC] Complete the custom legalization of vector int to fp conversion A recent patch has added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions: v2i8 -> v2f64 v4i8 -> v4f32 v4i16 -> v4f32 Differential revision: https://reviews.llvm.org/D54663 llvm-svn: 350155	2018-12-29 13:40:48 +00:00
Chen Zheng	763c8973bf	[InstCombine] [NFC] update testcases for canonicalize MUL with NEG operand llvm-svn: 350154	2018-12-29 12:18:15 +00:00
Nemanja Ivanovic	3c7ac649ec	[PowerPC] Fix CR Bit spill pseudo expansion The current CRBIT spill pseudo-op expansion creates a KILL instruction that kills the CRBIT and defines the enclosing CR field. However, this paints a false picture to the register allocator that all bits in the CR field are killed so copies of other bits out of the field become dead and removable. This changes the expansion to preserve the KILL flag on the CRBIT as an implicit use and to treat the CR field as an undef input. Thanks to Hal Finkel for the review and Uli Weigand for implementation input. Differential revision: https://reviews.llvm.org/D55996 llvm-svn: 350153	2018-12-29 11:43:54 +00:00
Simon Atanasyan	a6424e7c4e	[mips] Show an error on attempt to use 64-bit PC-relative relocation The following code requests 64-bit PC-relative relocations unsupported by MIPS ABI. Now it triggers an assertion. It's better to show an error message. ``` foo: .quad bar - foo ``` llvm-svn: 350152	2018-12-29 10:10:02 +00:00
Simon Atanasyan	b243d8d42a	[mips] Show a regular error message on attempt to use one byte relocation llvm-svn: 350151	2018-12-29 10:09:55 +00:00
Craig Topper	7bb1d50455	[X86] Add test case from PR38217. NFC llvm-svn: 350150	2018-12-29 07:14:30 +00:00
Craig Topper	0a6cec6f9f	[X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization. This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better. llvm-svn: 350141	2018-12-29 01:17:11 +00:00
Anna Thomas	bae11e7999	[UnrollRuntime] NFC: Updated exiting tests and added more tests Added more tests for multiple exiting blocks to the LatchExit. Today these cases are not supported. Patch to follow soon. llvm-svn: 350135	2018-12-28 19:21:50 +00:00
Craig Topper	f814d28eb3	[X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply. Previously we emitted a multiply and some masking that was supposed to be matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly. Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs. I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization llvm-svn: 350134	2018-12-28 19:19:39 +00:00
Anna Thomas	98743fa77a	[UnrollRuntime] NFC: Add comment and verify LCSSA Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog. llvm-svn: 350131	2018-12-28 18:52:16 +00:00
Diogo N. Sampaio	9123f82cc4	[AArch64] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also moves to FeatureSB the old FeatureSpecRestrict. Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55921 llvm-svn: 350126	2018-12-28 17:14:58 +00:00
Max Kazantsev	8f70c9bf49	[NFC] Add failing test on LCSSA form preservation of LoopSimplifyCFG llvm-svn: 350119	2018-12-28 10:43:37 +00:00
Hiroshi Inoue	1ea98f040e	[PowerPC] handle ISD::TRUNCATE in BitPermutationSelector This is the last one in a series of patches to support better code generation for bitfield insert. BitPermutationSelector already supports ISD::ZERO_EXTEND but not TRUNCATE. This patch adds support for ISD::TRUNCATE in BitPermutationSelector. For example, for this test case, struct s64b { int a:4; int b:16; int c:24; }; void bitfieldinsert64b(struct s64b *p, unsigned char v) { p->b = v; } the selection DAG looks like: t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64 t18: i32 = and t14, Constant:i32<-1048561> t4: i64,ch = CopyFromReg t0, Register:i64 %1 t22: i64 = AssertZext t4, ValueType:ch:i8 t23: i32 = truncate t22 t16: i32 = shl nuw nsw t23, Constant:i32<4> t19: i32 = or t18, t16 t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64 By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18. So the generated sequences with and without this patch are: without this patch: rlwinm 5, 5, 0, 28, 11 # corresponding to t18 rlwimi 5, 4, 4, 20, 27 with this patch: rlwimi 5, 4, 4, 12, 27 Differential Revision: https://reviews.llvm.org/D49076 llvm-svn: 350118	2018-12-28 08:00:39 +00:00
Max Kazantsev	530ff8f3cc	Temporarily disable term folding in LoopSimplifyCFG, add tests llvm-svn: 350117	2018-12-28 06:22:39 +00:00
QingShan Zhang	f2d9df61c7	[PowerPC] Remove the implicit use of the register if it is replaced by Imm If we are changing the MI operand from Reg to Imm, we also need to handle its implicit use, if it has one. Differential Revision: https://reviews.llvm.org/D56078 llvm-svn: 350115	2018-12-28 03:38:09 +00:00
Zi Xuan Wu	a02a3feecf	[PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class Atomic value operands smaller than 4 bytes need to be masked, and the related operations to calculate the new value can be done in 32-bit gprc. So just use gprc for the mask and value calculation. Differential Revision: https://reviews.llvm.org/D56077 llvm-svn: 350113	2018-12-28 02:12:55 +00:00
Chen Zheng	5ede950df9	[PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction Differential Revision: https://reviews.llvm.org/D55806 llvm-svn: 350111	2018-12-28 01:02:35 +00:00
Craig Topper	787ad92bf6	[X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it. Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2. This seems to give some improvements in our ability to use SimplifyDemandedBits. llvm-svn: 350084	2018-12-27 03:37:04 +00:00
Wouter van Oortmerssen	f227621036	[WebAssembly] Added basic support for if/else/end_if in MC layer. Summary: These instructions are currently unused in our backend, but for completeness it is good to support them, so they can be used with the assembler in hand-written code. Tests are very basic, signature support missing much like other blocks. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55973 llvm-svn: 350079	2018-12-26 22:55:26 +00:00
Wouter van Oortmerssen	29c6ce5879	[WebAssembly] Make assembler check for proper nesting of control flow. Summary: It does so using a simple nesting stack, and gives clear errors upon violation. This is unique to wasm, since most CPUs do not have any nested constructs. Had to add an end of file check to the general assembler for this. Note: if/else/end instructions are not currently supported in our tablegen defs, so these tests will be enabled in a follow-up. They already pass the nesting check. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55797 llvm-svn: 350078	2018-12-26 22:46:18 +00:00
Craig Topper	c9a6000755	[LoopIdiomRecognize] Add CTTZ support Summary: Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom) This commit: 1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left. 2. Prepare for BitScan idiom recognition. Patch by Yuanfang Chen (tabloid.adroit) Reviewers: craig.topper, evstupac Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55876 llvm-svn: 350074	2018-12-26 21:59:48 +00:00
Reid Kleckner	c168c6f86f	[codeview] Check if the 'this' type of a method is a pointer Fixes crash reported after r347354 for frontends that don't always emit 'this' pointers for methods. Now we will silently produce debug info that makes functions like this look like static methods, which seems reasonable. llvm-svn: 350073	2018-12-26 21:52:17 +00:00
Justin Lebar	49fac56ea3	[NVPTX] Allow libcalls that are defined in the current module. The patch makes it possible to make library calls on NVPTX. An important constraint on library functions: they must be defined within the current module. This should guarantee that we produce valid PTX assembly (without calls to undefined functions). Anyone who wants to use the libcalls will probably have to link against compiler-rt or another implementation. Currently, it's impossible to make library calls because of the error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can lower ExternalSymbol to TargetExternalSymbol and verify that the function definition is available. Also, there was an issue with the DAG during legalisation. When we expand an instruction into a libcall, the inner call chain isn't "integrated" into the outer chain. Since the last "data-flow" (call retval load) node is located in the call chain earlier than the CALLSEQ_END node, the latter becomes a leaf and therefore a dead node (and is removed quite quickly). The solution proposed here relies on additional data-flow pseudo nodes (ProxyReg) whose only purpose is to keep CALLSEQ_END alive during the legalisation and instruction selection phases; we remove the pseudo instructions before the register scheduling phase. Patch by Denys Zariaiev! Differential Revision: https://reviews.llvm.org/D34708 llvm-svn: 350069	2018-12-26 19:12:31 +00:00
Simon Pilgrim	a8ff77bb34	[AMDGPU] Regenerate i64 shift tests. To show codegen diff due to a future SimplifyDemandedBits patch. llvm-svn: 350065	2018-12-26 12:09:10 +00:00
Petar Avramovic	09dff33349	[MIPS GlobalISel] Select G_SELECT Add widen scalar for type index 1 (i1 condition) for G_SELECT. Select G_SELECT for pointer, s32(integer) and smaller low level types on MIPS32. Differential Revision: https://reviews.llvm.org/D56001 llvm-svn: 350063	2018-12-25 14:42:30 +00:00
Kang Zhang	d501a1e596	[PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue Summary: This patch fixes the bug introduced by rL341634. In that commit, the return type of ISD::ADDE is 14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64), but in fact the second return type of ISD::ADDE should be MVT::Glue, not MVT::i64. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D55977 llvm-svn: 350061	2018-12-25 03:29:51 +00:00
Craig Topper	0229da8f07	[X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ. This is an alternative to what I attempted in D56057. GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users. I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits. Based on a patch that Simon Pilgrim sent me in email. Fixes PR40142. llvm-svn: 350059	2018-12-24 19:40:20 +00:00
Craig Topper	6356ad940b	[X86] Add test cases for PR40142. NFC llvm-svn: 350058	2018-12-24 19:40:17 +00:00
Eugene Leviant	4dc3a3f746	[HWASAN] Instrument memory intrinsics by default Differential revision: https://reviews.llvm.org/D55926 llvm-svn: 350055	2018-12-24 16:02:48 +00:00
Max Kazantsev	0b455c2b71	Revert rL350048 and rL350050 These patches have broken almost all buildbots on test DebugInfo/X86/addr_comments.ll. Reverting to green. llvm-svn: 350052	2018-12-24 10:30:04 +00:00
Max Kazantsev	edabb9ae56	[LoopSimplifyCFG] Delete dead exiting edges This patch teaches LoopSimplifyCFG to remove dead exiting edges from loops. Differential Revision: https://reviews.llvm.org/D54025 Reviewed By: fedor.sergeev llvm-svn: 350049	2018-12-24 07:41:33 +00:00
David Blaikie	e20bf9ab91	DebugInfo: Use assembly label arithmetic for address pool size for easier reading/editing llvm-svn: 350048	2018-12-24 07:35:10 +00:00
David Blaikie	d671eb7e7c	DebugInfo: Add assembly comments for debug_addr contribution header fields llvm-svn: 350047	2018-12-24 07:09:50 +00:00
David Blaikie	b917c3a41a	llvm-dwarfdump: Skip address index info (and dump only the address, if found) when non-verbose dumping addrx forms There's a few bugs here still - demonstrated with FIXITs in the test. llvm-svn: 350046	2018-12-24 06:52:31 +00:00
Max Kazantsev	347c583772	Return "[LoopSimplifyCFG] Delete dead in-loop blocks" The underlying bug that caused the revert should be fixed by rL348567. Differential Revision: https://reviews.llvm.org/D54023 llvm-svn: 350045	2018-12-24 06:06:17 +00:00
Craig Topper	5eb5e2bc89	[X86] Autogenerate complete checks. NFC llvm-svn: 350039	2018-12-24 01:59:31 +00:00
Sanjay Patel	93f1074677	[DAGCombiner] limit shuffle to extend transform (PR40146) It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034	2018-12-23 20:48:31 +00:00
Sanjay Patel	9e5588e1df	[x86] add test for vector shuffle --> extend transform (PR40146); NFC llvm-svn: 350033	2018-12-23 20:36:52 +00:00
Sanjay Patel	9933574ac3	[DAGCombiner] allow hoisting vector bitwise logic ahead of extends llvm-svn: 350032	2018-12-23 19:58:16 +00:00
Sanjay Patel	8bc612f63b	[x86] add tests for vector extend + logic ops; NFC llvm-svn: 350031	2018-12-23 18:37:44 +00:00
Craig Topper	006bac6880	[X86] Return false from hasAndNotCompare if the comparison value is a constant. We won't end up using an ANDN instruction in this case so we should generate the same code we do for pre-BMI targets. llvm-svn: 350018	2018-12-23 05:52:55 +00:00
Craig Topper	3cc92a28ce	[X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2. llvm-svn: 350013	2018-12-23 01:54:43 +00:00
Craig Topper	dfb8a427ff	[X86] Autogenerate complete checks. NFC llvm-svn: 350012	2018-12-23 01:54:41 +00:00
David Blaikie	25179613f6	llvm-dwarfdump: Dump the section name/number for addr attributes llvm-svn: 350009	2018-12-22 20:34:58 +00:00
Sanjay Patel	4b537aaf6d	[DAGCombiner] allow narrowing of add followed by truncate trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006	2018-12-22 17:10:31 +00:00
Sanjay Patel	52c02d70e2	[x86] add load fold patterns for movddup with vzext_load The missed load folding noticed in D55898 is visible independent of that change either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build vector becomes a broadcast first; movddup is not produced until we get into isel via tablegen patterns). Differential Revision: https://reviews.llvm.org/D55936 llvm-svn: 350005	2018-12-22 16:59:02 +00:00
Roman Lebedev	da1df56e5d	[NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. a/c/d) with trunc (PR36419) llvm-svn: 350000	2018-12-22 10:38:05 +00:00
Roman Lebedev	c90611db06	[NFC][CodeGen][X86][AArch64] Bit extract: add nounwind attr to drop .cfi noise Forgot about that. llvm-svn: 349999	2018-12-22 09:58:13 +00:00
Roman Lebedev	29d8af283a	[NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. b) with trunc (PR36419) @bextr64_32_b1 is extracted from hotpath of real-world code (RawSpeed BitStream<>::peekBitsNoFill()) after `clang -O3`. @bextr64_32_b2/@bextr64_32_b0 is the same pattern, but with trunc done last, showing how i think it can be handled: https://rise4fun.com/Alive/K4B https://rise4fun.com/Alive/qC9 It is possible that middle-end should do some of this, too. https://bugs.llvm.org/show_bug.cgi?id=36419 llvm-svn: 349998	2018-12-22 09:40:14 +00:00
David Blaikie	9efb0153f0	llvm-dwarfdump: Remove extraneous space between '(' and 'indexed' When dumping string or address indexes llvm-svn: 349997	2018-12-22 08:43:08 +00:00
David Blaikie	c04d2bf22a	llvm-dwarfdump: Print the section name/number for addr_index attributes (addr attributes coming shortly) llvm-svn: 349996	2018-12-22 08:33:55 +00:00
Reid Kleckner	98bbd07cc3	[MC] Enable .file support on COFF and diagnose it on unsupported targets Summary: The "single parameter" .file directive appears to be an ELF-only feature that is intended to insert the main source filename into the string table. I noticed that if you assemble an ELF .s file for COFF, typically it will assert right away on a .file directive near the top of the file. My first change was to make this emit a proper error in the asm parser so that we don't assert so easily. However, COFF actually does have some support for this directive, and if you emit an object file, llvm-mc does not assert. When emitting a COFF object, MC will take those file names and create "debug" symbol table entries for them. I'm not familiar with these kinds of symbol table entries, and I'm not aware of any users of them, but @compnerd added them a while ago. They don't introduce absolute paths, and most main source file paths are short enough that this extra entry shouldn't cause any problems, so I enabled the flag in MCAsmInfoCOFF that indicates that it's supported. This has the side effect of adding an extra debug symbol to every object produced by clang, which is a pretty big functional change. My question is, should we keep the functionality or remove it in the name of symbol table minimalism? Reviewers: mstorsjo, compnerd Subscribers: hiraditya, compnerd, llvm-commits Differential Revision: https://reviews.llvm.org/D55900 llvm-svn: 349976	2018-12-21 23:35:48 +00:00
David Blaikie	c3f30a7fc6	Reapply: DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses) Originally committed in r349333, reverted in r349353. GCC emitted these unconditionally on/before 4.4/March 2012 Clang emitted these unconditionally on/before 3.5/March 2014 This improves performance when parsing CUs (especially those using split DWARF) that contain no code ranges (such as the mini CUs that may be created by ThinLTO importing - though generally they should be/are avoided, especially for Split DWARF because it produces a lot of very small CUs, which don't scale well in a bunch of other ways too (including size)). The revert was due to a (Google internal) test that had some checked in old object files missing DW_AT_ranges. That's since been fixed. llvm-svn: 349968	2018-12-21 22:25:01 +00:00
Craig Topper	e58cd9cbc6	[X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops. This fixes the patterns that have or/and as a root. 'and' is handled differently since they usually have a CMP wrapped around them. I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know. Differential Revision: https://reviews.llvm.org/D55813 llvm-svn: 349962	2018-12-21 21:42:43 +00:00
Craig Topper	62ec024d3b	[X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used. The BEXTR instruction documents the SF bit as undefined. The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed. Fixes PR40060 Differential Revision: https://reviews.llvm.org/D55807 llvm-svn: 349956	2018-12-21 21:16:26 +00:00

58528 Commits