Commit Graph

67607 Commits

Author SHA1 Message Date
Anna Welker 346f6b54bd [ARM][MVE] Enable masked gathers from vector of pointers
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.
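
(Illustration, not part of the commit message: a minimal IR sketch of the kind of v4i32 gather the new pass picks up; the function name is made up.)

declare <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*>, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @gather_v4i32(<4 x i32*> %ptrs, <4 x i1> %mask) {
  ; a gather from a vector of pointers that the pass can turn into a call
  ; to MVE's masked gather intrinsics
  %v = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> %ptrs, i32 4, <4 x i1> %mask, <4 x i32> undef)
  ret <4 x i32> %v
}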

Differential Revision: https://reviews.llvm.org/D71743
2020-01-08 13:43:12 +00:00
Sam Parker 96d2d96b03 [NFC][ARM] Update tests
Run the update_mir_test on some of the low-overhead loop tests.
2020-01-08 06:08:30 -05:00
Xuanda Yang dfeb8730e2 [llvm-symbolizer] Fix printing of malformed address values not passed via stdin
Summary:
Relates to https://bugs.llvm.org/show_bug.cgi?id=44443

Add a missing newline when printing bad input values.

Fix the test case.

Reviewers: jhenderson

Reviewed By: jhenderson

Subscribers: rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72313
2020-01-08 18:37:41 +08:00
Kadir Cetinkaya b212eb7159 Revert "[InstCombine] fold zext of masked bit set/clear"
This reverts commit a041c4ec6f.

This looks like a non-trivial change and there have been no code
reviews (at least there were no phabricator revisions attached to the
commit description). It is also causing a regression in one of our
downstream integration tests; we haven't been able to come up with a
minimal reproducer yet.
2020-01-08 11:21:21 +01:00
Tim Northover 903e5c3028 AArch64: add missing Apple CPU names and use them by default.
Apple's CPUs are called A7-A13 in official communication, occasionally with
weird suffixes which we probably don't need to care about. This adds each one
and describes its features. It also switches the default CPU to the canonical
name for Cyclone, but leaves legacy support in so that existing bitcode still
compiles.
2020-01-08 09:24:06 +00:00
QingShan Zhang 44f78f368c [NFC][Test] Add the option -enable-no-signed-zeros-fp-math for test fma-combine.ll
2020-01-08 06:48:51 +00:00
Wang, Pengfei 9a621de1ec [X86] Adding fp128 support for strict fcmp
Summary: Adding fp128 support for strict fcmp
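
(Illustration, not part of the commit message: a minimal sketch of a strict fp128 compare via the constrained-FP intrinsic; the function name is made up.)

declare i1 @llvm.experimental.constrained.fcmp.f128(fp128, fp128, metadata, metadata)

define i1 @strict_cmp(fp128 %a, fp128 %b) {
  ; strict (exception-preserving) fp128 compare that now has X86 lowering
  %r = call i1 @llvm.experimental.constrained.fcmp.f128(fp128 %a, fp128 %b, metadata !"oeq", metadata !"fpexcept.strict")
  ret i1 %r
}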

Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71897
2020-01-08 12:59:31 +08:00
James Clarke 917f46db04 [RISCV] Fix evaluatePCRelLo for symbols at the end of a fragment
Summary:
This is analogous to D58943, which correctly finds the corresponding
fixup. However, when linker relaxations are disabled and we evaluate the
fixup, we need to also ensure we use an offset of 0 rather than the size
of the previous fragment.

Reviewers: asb, efriedma, lenary

Reviewed By: efriedma

Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71978
2020-01-08 04:32:06 +00:00
czhengsz 8b8ba44047 [SCEV] get more accurate range for AddExpr with wrap flag.
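
(Illustration, not from the commit: a small sketch of the kind of fact a wrap flag exposes to SCEV's range computation; the function name is made up.)

define i1 @nuw_add_nonzero(i32 %x) {
  ; with the nuw flag, %a cannot wrap around to 0, so its unsigned range
  ; excludes 0 and the compare can fold to true
  %a = add nuw i32 %x, 1
  %c = icmp uge i32 %a, 1
  ret i1 %c
}
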
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D64869
2020-01-07 20:58:04 -05:00
Eric Christopher c23c8746d0 XFAIL load_extension.ll for all targets currently - it's failing on
more platforms than just darwin.
2020-01-07 17:00:55 -08:00
Philip Reames 312a532dc0 [GVN/FP] Consolidate logic for reasoning about equality vs equivalence for floats
Factor out common logic into some reasonably commented helper functions. In the process, ensure that the in-block vs cross-block cases are handled the same. They previously weren't.

Differential Revision: https://reviews.llvm.org/D67126
2020-01-07 16:05:04 -08:00
Amara Emerson b6598bcf4b [AArch64][GlobalISel] Fold a chain of two G_PTR_ADDs of constant offsets.
E.g.
%addr1 = G_PTR_ADD %base, G_CONSTANT 20
%addr2 = G_PTR_ADD %addr1, G_CONSTANT 8
  -->
%addr2 = G_PTR_ADD %base, G_CONSTANT 28

Differential Revision: https://reviews.llvm.org/D72351
2020-01-07 14:12:42 -08:00
Craig Topper eee89cd5a8 [X86] Add SSE4.1 command lines to vec-strict-inttofp-128.ll to cover the v2i64->v2f32 strict_uitofp codegen. NFC 2020-01-07 13:53:38 -08:00
Bill Wendling e886e762dd Revert "Allow output constraints on "asm goto""
This reverts commit 52366088a8.

I accidentally pushed this before the supporting changes.
2020-01-07 13:44:08 -08:00
Bill Wendling 52366088a8 Allow output constraints on "asm goto"
Summary:
Remove the restrictions that prevented "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.

Reviewers: jyknight, nickdesaulniers, hfinkel

Reviewed By: jyknight, nickdesaulniers

Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69876
2020-01-07 13:40:26 -08:00
Matt Arsenault 6652cc0cf7 AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers
4e85ca9562 missed updating the legal
condition type set for pointers with any unrecognized address space.
2020-01-07 16:36:31 -05:00
Matt Arsenault a2d54fc534 AMDGPU/GlobalISel: Add some missing G_SELECT testcases 2020-01-07 16:36:31 -05:00
Matt Arsenault 577b0b5f54 AMDGPU/GlobalISel: Fix missing test for s16 icmp 2020-01-07 16:36:31 -05:00
Matt Arsenault 4844bf0fe2 AMDGPU: Apply i16 add->sub pattern with zext to i32
This was only applying the deeper nested zext pattern, and missing the
special case code size fold.
2020-01-07 16:36:31 -05:00
Craig Topper 9685cf709f [X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 target
Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-SSE4.2 targets, I think we can use this.
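
(Illustration, not part of the commit message: the IR-level conversion affected; the function name is made up.)

define <2 x float> @u64_to_f32(<2 x i64> %x) {
  ; v2i64 -> v2f32 unsigned conversion now handled in ReplaceNodeResults
  ; on SSE4.1 targets
  %f = uitofp <2 x i64> %x to <2 x float>
  ret <2 x float> %f
}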

Differential Revision: https://reviews.llvm.org/D72354
2020-01-07 13:25:29 -08:00
Daniel Sanders 39c05703a6 [gicombiner] Correct 64f1bb5cd2 to account for MSVC's %p format 2020-01-07 12:50:05 -08:00
Sanjay Patel 6d52edebc9 [x86] add tests for extract-of-concat; NFC 2020-01-07 15:48:54 -05:00
Matt Arsenault 449ab10509 AMDGPU: Add baseline test for missing pattern
The optimization to turn an add into a sub isn't triggering when the
pattern to use the zeroed high bits is used.
2020-01-07 15:10:08 -05:00
Matt Arsenault 68e70fb098 AMDGPU: Fix not using v_cvt_f16_[iu]16
We weren't treating i16->f16 casts as legal on targets with these
instructions, and were always using a pair of casts through i32.
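
(Illustration, not from the commit: a minimal sketch of the cast in question; the function name is made up.)

define half @i16_to_f16(i16 %x) {
  ; i16 -> f16 conversion that can now select v_cvt_f16_i16 directly
  ; instead of going through i32
  %f = sitofp i16 %x to half
  ret half %f
}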
2020-01-07 15:10:07 -05:00
Fangrui Song 8edf759ca7 [PowerPC][Triple] Use elfv2 on freebsd>=13 and linux-musl
Summary:
Every powerpc64le platform uses elfv2.

For powerpc64, the environments "elfv1" and "elfv2" were added for
FreeBSD ELFv1->ELFv2 migration in D61950.  FreeBSD developers have
decided to use OS versions to select ABI, and no one is relying on the
environments.

Also use elfv2 on powerpc64-linux-musl.

Users can always use -mabi=elfv1 and -mabi=elfv2 to override the default
ABI.

Reviewed By: adalava

Differential Revision: https://reviews.llvm.org/D72352
2020-01-07 11:40:56 -08:00
Jessica Paquette acd2580824 [MachineOutliner][AArch64] Save + restore LR in noreturn functions
Conservatively always save + restore LR in noreturn functions.

These functions do not end in a RET, and so they aren't guaranteed to have an
instruction which uses LR in any way. So, as a result, you can end up in
unfortunate situations where you can't backtrace out of these functions in a
debugger.

Remove the old noreturn test, and add a new one which is more descriptive.

Remove the restriction that we can't outline from noreturn functions as well
since we now do the right thing.
2020-01-07 11:27:25 -08:00
Craig Topper afa8211e97 [X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE2 targets. Enable v2i64 in foldVectorXorShiftIntoCmp.
Similar to D72302 but for the canonical form for the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now and enabled it for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence.
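
(Illustration, not part of the commit message: the canonical "non-negative" test in IR; the function name is made up.)

define <2 x i64> @is_nonneg(<2 x i64> %x) {
  ; canonical form of (v2i64 (setgt X, -1)): all-ones where the element is non-negative
  %c = icmp sgt <2 x i64> %x, <i64 -1, i64 -1>
  %r = sext <2 x i1> %c to <2 x i64>
  ret <2 x i64> %r
}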

Differential Revision: https://reviews.llvm.org/D72318
2020-01-07 11:22:04 -08:00
Craig Topper b9376690a0 [X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets
Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements.
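
(Illustration, not from the commit: a minimal sketch of a sign-bit-only v2i64 test; the function name is made up.)

define <2 x i64> @is_negative(<2 x i64> %x) {
  ; only the sign bit of each i64 element matters, so one pcmpgtd plus a
  ; shuffle of the odd elements to the even elements suffices
  %c = icmp slt <2 x i64> %x, zeroinitializer
  %r = sext <2 x i1> %c to <2 x i64>
  ret <2 x i64> %r
}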

Differential Revision: https://reviews.llvm.org/D72302
2020-01-07 11:22:03 -08:00
Daniel Sanders 1d94fb2111 [gicombiner] Add GIMatchTree and use it for the code generation
Summary:
GIMatchTree's job is to build a decision tree by zipping all the
GIMatchDag's together.

Each DAG is added to the tree builder as a leaf and partitioners are used
to subdivide each node until there are no more partitioners to apply. At
this point, the code generator is responsible for testing any untested
predicates and following any unvisited traversals (there shouldn't be any
of the latter as the getVRegDef partitioner handles them all).

Note that the leaves don't always fit into partitions cleanly and the
partitions may overlap as a result. This is resolved by cloning the leaf
into every partition it belongs to. One example of this is a rule that can
match one of N opcodes. The leaf for this rule would end up in N partitions
when processed by the opcode partitioner. A similar example is the
getVRegDef partitioner: with the rules (add $a, $b) and (add ($a, $b), $c),
the former ends up in both the partition for successfully following the
vreg-def and the partition for failing to do so, since it doesn't care
which happens.

Depends on D69151

Fixed the issues with the Windows bots which were caused by stdout/stderr
interleaving.

Reviewers: bogner, volkan

Reviewed By: volkan

Subscribers: lkail, mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69152
2020-01-07 11:12:53 -08:00
Simon Pilgrim 55de6fc0b6 [ARM] Regenerate bfi.ll test cases 2020-01-07 16:51:11 +00:00
diggerlin a3832f33d9 [AIX][XCOFF] Implement mergeable const
SUMMARY:
In this patch, we map mergeable const objects to the read-only section in the same manner as const objects that are not mergeable.

Reviewers: hubert.reinterpretcast, jasonliu
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D71551
2020-01-07 11:20:51 -05:00
Sjoerd Meijer ee811808a9 [ARM][MVE] Renamed VPT Block tests and files to something more informative. NFC 2020-01-07 16:16:54 +00:00
Matt Arsenault 78b30a54c9 AMDGPU/GlobalISel: Fix readfirstlane pattern import
The imm folding optimization pattern failed to import. The instruction
pattern was already working, but it was not failing on SGPR inputs as it should.
2020-01-07 11:07:08 -05:00
Sanjay Patel f8962571f7 [InstCombine] try to pull 'not' of select into compare operands
not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?)) -->
     select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?)

If both sides of the select are cmps, we can remove an instruction.
The case where only one side is a cmp is deferred to a possible
follow-on patch.

We have a more general 'isFreeToInvert' analysis, but I'm not seeing
a way to use that more widely without inducing infinite looping
(opposing transforms).
Here, we flip the compare predicates directly, so we should not have
any danger by creating extra intermediate 'not' ops.

Alive proofs:
https://rise4fun.com/Alive/jKa

Name: both select values are compares - invert predicates
  %tcmp = icmp sle i32 %x, %y
  %fcmp = icmp ugt i32 %z, %w
  %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
  %not = xor i1 %sel, true
=>
  %tcmp_not = icmp sgt i32 %x, %y
  %fcmp_not = icmp ule i32 %z, %w
  %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

Name: false val is compare - invert/not
  %fcmp = icmp ugt i32 %z, %w
  %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
  %not = xor i1 %sel, true
=>
  %tcmp_not = xor i1 %tcmp, -1
  %fcmp_not = icmp ule i32 %z, %w
  %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

Differential Revision: https://reviews.llvm.org/D72007
2020-01-07 10:44:23 -05:00
Matt Arsenault e699c03c9b AMDGPU/GlobalISel: Fix import of s_abs_i32 pattern 2020-01-07 10:32:07 -05:00
Matt Arsenault 9150d6bd73 AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.vote 2020-01-07 10:15:29 -05:00
Tim Northover e130eef588 OpaquePtr: print byval types containing anonymous types correctly.
Attribute::getAsString doesn't have enough information to print anonymous
Module-level types correctly, so they come back as "%type 0xabcd". This results
in broken IR when printing as text.

Instead, print type-attributes (currently just byval) using the TypePrinting
infrastructure available in AsmWriter. This only applies to function argument
attributes.
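
(Illustration, not part of the commit message: a sketch of a byval argument whose type is an anonymous module-level struct; names are made up.)

%0 = type { i32, i32 }   ; anonymous (unnamed) module-level type

define void @callee(%0* byval(%0) align 4 %arg) {
  ; the byval type attribute is now printed through AsmWriter's TypePrinting,
  ; so the textual IR stays valid instead of "%type 0xabcd"
  ret void
}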
2020-01-07 15:11:43 +00:00
Matt Arsenault f26ed6e47c llc: Change behavior of -mcpu with existing attribute
Don't overwrite existing target-cpu attributes.

I've often found the replacement behavior annoying, and this is
inconsistent with how the fast math command line flags interact with
the function attributes.

Does not yet change target-features, since I think that should behave
as a concatenation.
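
(Illustration, not from the commit: a small sketch of the situation described; the CPU name is just an example.)

define void @f() #0 {
  ret void
}

; llc -mcpu=<other-cpu> now keeps this existing attribute instead of replacing it
attributes #0 = { "target-cpu"="cortex-a53" }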
2020-01-07 10:10:25 -05:00
Sanjay Patel 58e2e92a57 [DAGCombiner] reduce shuffle of concat of same vector
This is possibly a small part towards solving PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

The vectorizer is creating shuffles of concat like this:

%63 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
%64 = shufflevector <8 x i64> %63, <8 x i64> undef, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>

That might be fixable in the vectorizers, but we're not allowed to fold that into a single shuffle in instcombine,
so we should have a backend backstop to convert that into the likely simpler form:

%64 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>

Differential Revision: https://reviews.llvm.org/D72300
2020-01-07 09:48:59 -05:00
Sjoerd Meijer e34801c8e6 [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS
This is a recommit of D71330, but with a few things fixed and changed:

1) ReachingDefAnalysis: this was not running with optnone as it was checking
skipFunction(), which other analysis passes don't do. I guess this is a
copy-paste from a codegen pass.
2) VPTBlockPass: here I've added skipFunction(), because like most/all
optimisations, we don't want to run this with optnone.

This fixes the issues with the initial/previous commit: the VPTBlockPass was
running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was
crashing when querying ReachingDefAnalysis.

I've added test case mve-vpt-block-optnone.mir to check that we don't run
VPTBlock with optnone.

Differential Revision: https://reviews.llvm.org/D71470
2020-01-07 13:54:47 +00:00
Victor Campos 60e0120c91 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.
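
(Illustration, not part of the commit message: the IR-level access affected; the function name is made up.)

define i64 @load_counter(i64* %p) {
  ; volatile i64 load: now a single LDRD on ARMv5TE/Thumb-2 rather than two LDRs
  %v = load volatile i64, i64* %p, align 8
  ret i64 %v
}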

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-01-07 13:16:18 +00:00
Ulrich Weigand 14cd4a5b32 [SystemZ] Extend fp-strict-alias test case
Explicitly add test for fpexcept.maytrap intrinsics.
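
(Illustration, not from the commit: a minimal sketch of a "maytrap" constrained operation; the function name is made up.)

declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

define double @maytrap_add(double %a, double %b) {
  ; fpexcept.maytrap: no new traps may be introduced, but the exact
  ; exception status flags need not be preserved
  %r = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
  ret double %r
}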
2020-01-07 12:44:51 +01:00
Ulrich Weigand 4814b68b7a [SystemZ] Fix python failure in test case
With recent Python the Large/spill-02.py test failed with an error:
TypeError: can't multiply sequence by non-int of type 'float'
2020-01-07 10:26:37 +01:00
QingShan Zhang d877229b5b [NFC][Test] Add a test to verify the DAGCombine of fma 2020-01-07 03:13:39 +00:00
Matt Arsenault e8d9d202bc AMDGPU: Add run line to int_to_fp tests
This wasn't catching a regression on targets with legal i16 triggered
in a future commit.
2020-01-06 21:38:50 -05:00
Heejin Ahn 21f7b36209 [WebAssembly] Fix landingpad-only case in Emscripten EH
Summary:
Previously we didn't set `Changed` to true when there were only landing
pads but no invokes. This fixes it, and we set `Changed` to true
whenever we have landing pads. (There can't be invokes without landing
pads, so that case is covered too)

The test case for this has to be a separate file because this pass is a
`ModulePass` and `Changed` is computed based on the whole module.

Reviewers: tlively

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72308
2020-01-06 17:02:32 -08:00
Matt Arsenault 52afc93c38 AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER 2020-01-06 19:16:32 -05:00
Matt Arsenault d4c9e13324 AMDGPU/GlobalISel: Select G_UADDE/G_USUBE 2020-01-06 18:27:52 -05:00
Matt Arsenault 4e85ca9562 AMDGPU/GlobalISel: Replace handling of boolean values
This solves selection failures with generated selection patterns,
which would fail due to inferring the SGPR reg bank for virtual
registers that had a register class set instead of the VCC bank.
Selecting the use instruction would constrain the virtual register to a
specific class, so when the def was selected later the bank was no
longer set to VCC.

Remove the SCC reg bank. SCC isn't directly addressable, so it
requires copying from SCC to an allocatable 32-bit register during
selection, so these might as well be treated as 32-bit SGPR values.

Now any scalar boolean value that will produce an output in SCC should
be widened during RegBankSelect to s32. Any s1 value should be a
vector boolean during selection. This makes the vcc register bank
unambiguous with a normal SGPR during selection.

Summary of how this should now work:

- G_TRUNC is always a no-op, and never should use a vcc bank result.

- SALU boolean operations should be promoted to s32 in RegBankSelect
  apply mapping

- An s1 value means vcc bank at selection. The exception is for
  legalization artifacts that use s1, which are never VCC. All other
  contexts should infer the VCC register classes for s1 typed
  registers. The LLT for the register is now needed to infer the
  correct register class. Extensions with vcc sources should be
  legalized to a select of constants during RegBankSelect.

- Copy from non-vcc to vcc ensures high bits of the input value are
  cleared during selection.

- SALU boolean inputs should ensure the inputs are 0/1. This includes
  select, conditional branches, and carry-ins.

There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT
selection ignores the usual register-bank from register class
functions, and can't handle truncates with VCC result banks. I think
this is OK, since the artifacts are specially treated anyway. This
does require some care to avoid producing cases with vcc. There will
also be no 100% reliable way to verify this rule is followed in
selection in the case of register classes, and violations manifest
themselves as invalid copy instructions much later.

Standard phi handling also only considers the bank of the result
register, and doesn't insert copies to make the source banks
match. This doesn't work for vcc, so we have to manually correct phi
inputs in this case. We should add a verifier check to make sure there
are no phis with mixed vcc and non-vcc register bank inputs.

There's also some duplication with the LegalizerHelper, and some code
which should live in the helper. I don't see a good way to share
special knowledge about what types to use for intermediate operations
depending on the bank for example. Using the helper to replace
extensions with selects also seems somewhat awkward to me.

Another issue is there are some contexts calling
getRegBankFromRegClass that apparently don't have the LLT type for the
register, but I haven't yet run into a real issue from this.

This also introduces new unnecessary instructions in most cases, since
we don't yet try to optimize out the zext when the source is known to
come from a compare.
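
(Illustration, not part of the commit message: a sketch of the zext-of-compare pattern mentioned above; the function name is made up.)

define i32 @bool_to_i32(i32 %a, i32 %b) {
  ; the s1 compare result is zero-extended; this zext is not yet optimized
  ; out even though the source is known to be 0 or 1
  %c = icmp eq i32 %a, %b
  %z = zext i1 %c to i32
  ret i32 %z
}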
2020-01-06 18:26:42 -05:00
Matt Arsenault 26f714ff43 TableGen/GlobalISel: Handle default operands that are used
Copy the logic from the existing handling in the DAG matcher emitter.

This will enable some AMDGPU pattern cleanups without breaking
GlobalISel tests, and eventually handle importing more patterns.

The test is a bit annoying since the sections seem to randomly sort
themselves if anything else is added in the future.
2020-01-06 18:26:42 -05:00