Commit Graph

130312 Commits

Author SHA1 Message Date
Matt Arsenault e47965bf64 AMDGPU/GlobalISel: Merge trivial legalize rules
Also move constant-like rules together
2020-01-21 17:37:19 -05:00
Roman Lebedev a6492e2271
[IR] Value::getPointerAlignment(): handle pointer constants
Summary:
New `@test13` in `Attributor/align.ll` is the main motivation - `null` pointer
really does not limit our alignment knowledge, in fact it is fully aligned
since it has no bits set.

Here we don't special-case `null` pointer because it is somewhat controversial
to add one more place where we enforce that `null` pointer is zero,
but instead we do the more general thing of trying to constant-fold
the pointer constant to an integer, and perform alignment inference on that.

Reviewers: jdoerfert, gchatelet, courbet, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, arphaman, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73131
2020-01-22 01:32:46 +03:00
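The reasoning in the Value::getPointerAlignment() commit above (a pointer constant folded to an integer is aligned to its lowest set bit, and `null` has no bits set) can be illustrated with a small standalone sketch. The helper name `alignmentOfConstantAddress` and the `maxAlign` cap are hypothetical and are not LLVM's API; this only shows the arithmetic, not the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM function): returns the largest power of
// two that divides `address`, capped at `maxAlign`. Address 0 has no bits
// set, so it is treated as aligned to the cap.
std::uint64_t alignmentOfConstantAddress(std::uint64_t address,
                                         std::uint64_t maxAlign) {
  // The cap must itself be a power of two.
  assert(maxAlign != 0 && (maxAlign & (maxAlign - 1)) == 0);
  if (address == 0)
    return maxAlign;
  // Isolate the lowest set bit: x & -x in two's complement.
  std::uint64_t lowestSetBit = address & (~address + 1);
  return lowestSetBit < maxAlign ? lowestSetBit : maxAlign;
}
```

Under these assumptions, an address of 24 (binary 11000) yields an alignment of 8, while address 0 yields whatever cap the caller chose.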
Florian Hahn f42994f228 [Matrix] Hide and describe matrix-propagate-shape option. 2020-01-21 14:28:47 -08:00
Matt Arsenault 9a5a6e9465 AMDGPU/GlobalISel: Merge G_PTR_ADD/G_PTR_MASK rules 2020-01-21 16:57:01 -05:00
Matt Arsenault fd109308a7 AMDGPU/GlobalISel: Legalize G_PTR_ADD for arbitrary pointers
Pointers of unrecognized address spaces should be treated as
global-like pointers. Even if loads and stores of them aren't handled,
dumb operations that just operate on the bits should work.
2020-01-21 16:35:36 -05:00
Quentin Colombet ff1f3cc1a1 [GISelKnownBits] Make the max depth a parameter of the analysis
Allow users of that analysis to define the cut off depth of the
analysis instead of hardcoding 6.

NFC as the default parameter is 6.
2020-01-21 11:35:31 -08:00
Thomas Lively 28857d14a8 [WebAssembly] Split and recombine multivalue calls for ISel
Summary:
Multivalue calls both take and return an arbitrary number of
arguments, but ISel only supports one or the other in a single
instruction. To get around this, calls are modeled as two pseudo
instructions during ISel. These pseudo instructions, CALL_PARAMS and
CALL_RESULTS, are recombined into a single CALL MachineInstr in a
custom emit hook.

RegStackification and the MC layer will additionally need to be made
aware of multivalue calls before the tests will produce correct
output.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71496
2020-01-21 11:31:33 -08:00
Thomas Lively 3ef169e586 [WebAssembly][InstrEmitter] Foundation for multivalue call lowering
Summary:
WebAssembly is unique among upstream targets in that it does not at
any point use physical registers to store values. Instead, it uses
virtual registers to model positions in its value stack. This means
that some target-independent lowering activities that would use
physical registers need to use virtual registers instead for
WebAssembly and similar downstream targets. This CL generalizes the
existing `usesPhysRegsForPEI` lowering hook to
`usesPhysRegsForValues` in preparation for using it in more places.

One such place is in InstrEmitter for instructions that have variadic
defs. On register machines, it only makes sense for these defs to be
physical registers, but for WebAssembly they must be virtual registers
like any other values. This CL changes InstrEmitter to check the new
target lowering hook to determine whether variadic defs should be
physical or virtual registers.

These changes are necessary to support a generalized CALL instruction
for WebAssembly that is capable of returning an arbitrary number of
arguments. Fully implementing that instruction will require additional
changes that are described in comments here but left for a follow up
commit.

Reviewers: aheejin, dschuff, qcolombet

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71484
2020-01-21 11:13:46 -08:00
Ehud Katz 0b336b6048 [APFloat] Add support for operations on Signaling NaN
Fix PR30781

Differential Revision: https://reviews.llvm.org/D69774
2020-01-21 21:02:00 +02:00
Ehud Katz 68122b5826 [APFloat] Extend conversion from special strings
Add support for converting Signaling NaN, and a NaN Payload from string.

The NaNs (the string "nan" or "NaN") may be prefixed with 's' or 'S' for defining a Signaling NaN.

A payload for a NaN can be specified as a suffix.
It may be an octal/decimal/hexadecimal number, with or without parentheses.

Differential Revision: https://reviews.llvm.org/D69773
2020-01-21 20:22:27 +02:00
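The string format described in the APFloat commit above (an optional 's' or 'S' prefix, then "nan" or "NaN", then an optional payload with or without parentheses) could be parsed roughly as follows. This is a minimal standalone sketch, not APFloat's implementation: `ParsedNaN` and `parseNaNString` are hypothetical names, and the use of C-style base prefixes (`0x` for hexadecimal, leading `0` for octal) is an assumption, since the commit message does not spell out the payload syntax.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical result type for this sketch.
struct ParsedNaN {
  bool signaling;        // true if the string had an 's' or 'S' prefix
  std::uint64_t payload; // 0 if no payload suffix was given
};

// Parses strings such as "nan", "SNaN", "nan(0x10)" or "snan42".
// Returns std::nullopt for anything that does not fit that shape.
std::optional<ParsedNaN> parseNaNString(const std::string &text) {
  std::size_t pos = 0;
  ParsedNaN result{false, 0};
  if (pos < text.size() && (text[pos] == 's' || text[pos] == 'S')) {
    result.signaling = true;
    ++pos;
  }
  const std::string word = text.substr(pos, 3);
  if (word != "nan" && word != "NaN")
    return std::nullopt;
  pos += 3;
  if (pos >= text.size())
    return result; // no payload suffix

  std::string digits = text.substr(pos);
  // Strip one matching pair of parentheses, if present.
  if (digits.size() >= 2 && digits.front() == '(' && digits.back() == ')')
    digits = digits.substr(1, digits.size() - 2);
  // Require a leading digit so that signs and whitespace are rejected.
  if (digits.empty() || !std::isdigit(static_cast<unsigned char>(digits[0])))
    return std::nullopt;
  try {
    std::size_t consumed = 0;
    // Base 0 lets std::stoull pick octal, decimal or hexadecimal.
    result.payload = std::stoull(digits, &consumed, 0);
    if (consumed != digits.size())
      return std::nullopt; // trailing garbage after the number
  } catch (const std::exception &) {
    return std::nullopt; // out of range or otherwise unparsable
  }
  return result;
}
```

For instance, `parseNaNString("SNaN(0x10)")` would report a signaling NaN with payload 16 in this sketch.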
Fangrui Song 8e1f0974c2 [PowerPC] Delete PPCSubtarget::isDarwin and isDarwinABI
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125614.html developers have agreed to remove Darwin support from POWER backends.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D72067
2020-01-21 09:54:44 -08:00
Jonas Devlieghere 72b8bad150 [lldb/Hexagon] Include <mutex>
Fixes compiler error on macOS: error: no type named 'mutex' in namespace
'std'.
2020-01-21 09:51:30 -08:00
Fangrui Song 7a8b0b1595 [StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()
Reviewed By: dantrushin

Differential Revision: https://reviews.llvm.org/D73063
2020-01-21 09:46:27 -08:00
Krzysztof Parzyszek 305bf5b21d [Hexagon] Add support for Hexagon v67t microarchitecture (tiny core) 2020-01-21 11:35:10 -06:00
Krzysztof Parzyszek 020041d99b Update spelling of {analyze,insert,remove}Branch in strings and comments
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.

This change is NFC.
2020-01-21 10:15:38 -06:00
Zakk Chen 1256d68093 [RISCV] Check the target-abi module flag matches the option
Reviewers: lenary, asb

Reviewed By: lenary

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72768
2020-01-21 07:32:12 -08:00
Simon Pilgrim f04284cf1d [TargetLowering] SimplifyDemandedBits ISD::SRA multi-use handling
Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses
2020-01-21 15:12:07 +00:00
Benjamin Kramer 81f385b0c6 Make dropTriviallyDeadConstantArrays not quadratic
Only look at the operands of dead constant arrays instead of all
constant arrays again.
2020-01-21 16:06:46 +01:00
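The idea in the dropTriviallyDeadConstantArrays commit above (after erasing a dead array, revisit only its operands rather than rescanning every constant array) is a standard worklist pattern. The following is a toy model under stated assumptions: `ConstantNode` and `dropDeadConstantArrays` are hypothetical and stand in for LLVM's constant uniquing tables, which are not modelled here.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a uniqued constant. `operands` holds indices of other
// nodes in the same table; `useCount` counts how many users refer to it.
struct ConstantNode {
  bool isArray;
  int useCount;
  std::vector<int> operands;
  bool erased;
};

// Erases every constant array with no users, then follows only the operands
// of erased arrays to find arrays that became dead as a result.
// Returns the number of erased nodes.
int dropDeadConstantArrays(std::vector<ConstantNode> &nodes) {
  std::vector<int> worklist;
  for (int i = 0; i < static_cast<int>(nodes.size()); ++i)
    if (nodes[i].isArray && nodes[i].useCount == 0)
      worklist.push_back(i);

  int erasedCount = 0;
  while (!worklist.empty()) {
    const int id = worklist.back();
    worklist.pop_back();
    if (nodes[id].erased)
      continue;
    nodes[id].erased = true;
    ++erasedCount;
    for (int op : nodes[id].operands) {
      assert(nodes[op].useCount > 0 && "operand must have at least this use");
      --nodes[op].useCount;
      // Only operands of a just-erased array can newly become dead.
      if (nodes[op].isArray && nodes[op].useCount == 0 && !nodes[op].erased)
        worklist.push_back(op);
    }
  }
  return erasedCount;
}
```

Each node enters the worklist at most once, so the work is proportional to the number of dead arrays plus their operand edges, instead of one full scan per erased array.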
Jinsong Ji d7032bc3c0 [PowerPC][NFC] Reclaim TSFlags bit 6
We removed the UseVSXReg flag in https://reviews.llvm.org/D58685
but did not reclaim bit 6, which it was assigned.
This will become a confusing hole later.
We should reclaim it as early as possible, before new bits are added.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D72649
2020-01-21 15:04:05 +00:00
Simon Pilgrim 47f99d2ca8 [SelectionDAG] GetDemandedBits - remove ANY_EXTEND handling
Rely on SimplifyMultipleUseDemandedBits fallback instead.
2020-01-21 14:39:00 +00:00
Simon Pilgrim b065902ed4 [X86] combineBT - use SimplifyDemandedBits instead of GetDemandedBits
Another step towards removing SelectionDAG::GetDemandedBits entirely
2020-01-21 14:24:46 +00:00
Simon Pilgrim 651fa669a2 [TargetLowering] SimplifyDemandedBits ANY_EXTEND/ANY_EXTEND_VECTOR_INREG multi-use handling
Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses
2020-01-21 14:07:19 +00:00
Guillaume Chatelet 139771f8b0 [Alignment][NFC] Use Align with CreateElementUnorderedAtomicMemMove
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73050
2020-01-21 14:16:50 +01:00
Guillaume Chatelet bc8a1ab26f [Alignment][NFC] Use Align with CreateMaskedLoad
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73087
2020-01-21 14:13:22 +01:00
Simon Pilgrim 5f5f478564 [DAG] Fold extract_vector_elt (scalar_to_vector), K to undef (K != 0)
This was unconditionally folding this to the source operand, even if the access was out of bounds. Use undef instead if the extract is not of the first element.

This helps with some cases where 3-vectors are legalized and avoids processing the 4th component.

Original Patch by: arsenm (Matt Arsenault)

Differential Revision: https://reviews.llvm.org/D51589
2020-01-21 10:58:30 +00:00
Simon Pilgrim 8d2e6bdbe1 [TargetLowering] SimplifyDemandedBits - Pull out InDemandedMask variable to ISD::SHL. NFCI.
Matches ISD::SRA + ISD::SRL variants.
2020-01-21 10:40:18 +00:00
Anna Welker ff9877ce34 [ARM][MVE] Enable masked scatter
Extends the gather/scatter pass in MVEGatherScatterLowering.cpp to
enable the transformation of masked scatters into calls to MVE's masked
scatter intrinsic.

Differential Revision: https://reviews.llvm.org/D72856
2020-01-21 09:46:26 +00:00
Nicolai Hähnle a80291ce10 Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bf.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.
2020-01-21 09:17:25 +01:00
Fangrui Song 02c1321139 [MC] Improve a report_fatal_error 2020-01-20 23:13:18 -08:00
Fangrui Song 5721483b64 [AMDGPU] Fix -Wunused-variable after e5823bf806 2020-01-20 22:41:13 -08:00
Matt Arsenault c72aa27f91 AMDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.ps.live 2020-01-20 23:21:53 -05:00
Matt Arsenault e5823bf806 AMDGPU: Don't create weird sized integers
There's no reason to introduce a new, unnaturally sized value
here. This has a chance to produce worse code with
legalization. Avoids regression in a future patch.
2020-01-20 20:02:54 -05:00
Fangrui Song d232c21566 [AsmPrinter] Don't emit __patchable_function_entries entry if "patchable-function-entry"="0"
Also improve tests
2020-01-20 16:13:48 -08:00
Matt Arsenault 9b13b4a0e3 AMDGPU: Prepare to use scalar register indexing
Define pseudos mirroring the VGPR indexing ones, and adjust the
operands in the s_movrel* instructions to avoid the result def.
2020-01-20 17:19:16 -05:00
Matt Arsenault 8615eeb455 AMDGPU: Partially merge indirect register write handling
a785209bc2 switched to using pseudos instead of manually tying
operands on the regular instruction. The VGPR indexing mode path
should have the same problems that change attempted to avoid, so these
should use the same strategy.

Use a single pseudo for the VGPR indexing mode and movreld paths, and
expand it based on the subtarget later. These have essentially the
same constraints, reading the index from m0.

Switch from using an offset to the subregister index directly, instead
of computing an offset and re-adding it back. Also add missing pseudos
for existing register class sizes.
2020-01-20 17:19:16 -05:00
Krzysztof Parzyszek c12a5917d2 [Hexagon] Add support for Hexagon/HVX v67 ISA 2020-01-20 16:16:49 -06:00
Mircea Trofin 2e42cc7a50 [NFC] small rename of private member in InlineCost.cpp
Summary:
Follow-up from https://reviews.llvm.org/D71733. Also moved an
initialization to the base class, where it belonged in the first place.

Reviewers: eraman, davidxl

Reviewed By: davidxl

Subscribers: hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72949
2020-01-20 13:03:15 -08:00
Matt Arsenault f6418d72f5 AMDGPU/GlobalISel: Add documentation for RegisterBankInfo
Document some high level strategies that should be used for register
bank selection. The constant bus restriction section hasn't actually
been implemented yet.
2020-01-20 15:41:25 -05:00
Simon Pilgrim 9c06c10fba [SelectionDAG] GetDemandedBits - fallback to SimplifyMultipleUseDemandedBits by default.
First step towards removing SelectionDAG::GetDemandedBits entirely since it is so similar to SimplifyMultipleUseDemandedBits anyhow.
2020-01-20 16:51:52 +00:00
Sid Manning 7fee4fed4c Add support for Linux/Musl ABI
Differential revision: https://reviews.llvm.org/D72701

The patch adds a new ABI option for Hexagon. It primarily deals with
the way variable arguments are passed and is used in the Hexagon Linux Musl
environment.

If a callee function has a variable argument list, it must perform the
following operations to set up its function prologue:

  1. Determine the number of registers which could have been used for passing
     unnamed arguments. This can be calculated by counting the number of
     registers used for passing named arguments. For example, if the callee
     function is as follows:

         int foo(int a, ...){ ... }

     ... then register R0 is used to access the argument ' a '. The registers
     available for passing unnamed arguments are R1, R2, R3, R4, and R5.

  2. Determine the number and size of the named arguments on the stack.

  3. If the callee has named arguments on the stack, it should copy all of these
     arguments to a location below the current position on the stack, and the
     difference should be the size of the register-saved area plus padding
     (if any is necessary).

     The register-saved area constitutes all the registers that could have
     been used to pass unnamed arguments. If the number of registers forming
     the register-saved area is odd, it requires 4 bytes of padding; if the
     number is even, no padding is required. This is done to ensure an 8-byte
     alignment on the stack.  For example, if the callee is as follows:

       int foo(int a, ...){ ... }

     ... then the named arguments should be copied to the following location:

       current_position - 5 (for R1-R5) * 4 (bytes) - 4 (bytes of padding)

     If the callee is as follows:

        int foo(int a, int b, ...){ ... }

     ... then the named arguments should be copied to the following location:

        current_position - 4 (for R2-R5) * 4 (bytes) - 0 (bytes of padding)

  4. After any named arguments have been copied, copy all the registers that
     could have been used to pass unnamed arguments on the stack. If the number
     of registers is odd, leave 4 bytes of padding and then start copying them
     on the stack; if the number is even, no padding is required. This
     constitutes the register-saved area. If padding is required, ensure
     that the start location of padding is 8-byte aligned.  If no padding is
     required, ensure that the start location of the on-stack copy of the
     first register which might have a variable argument is 8-byte aligned.

  5. Decrement the stack pointer by the size of register saved area plus the
     padding.  For example, if the callee is as follows:

        int foo(int a, ...){ ... } ;

     ... then the decrement value should be the following:

        5 (for R1-R5) * 4 (bytes) + 4 (bytes of padding) = 24 bytes

     The decrement should be performed before the allocframe instruction.
     Increment the stack-pointer back by the same amount before returning
     from the function.
2020-01-20 09:59:56 -06:00
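The register-saved-area arithmetic described in the Linux/Musl ABI commit above can be checked with a short sketch. It assumes, as in the message's examples, six argument registers R0-R5 of 4 bytes each and named arguments that each occupy exactly one register; `VarargSaveArea` and `computeVarargSaveArea` are hypothetical names and not part of the Hexagon backend.

```cpp
#include <cassert>

// Hypothetical description of the register-saved area for a variadic callee.
struct VarargSaveArea {
  unsigned unnamedRegs;  // registers that could carry unnamed arguments
  unsigned paddingBytes; // 4 if unnamedRegs is odd, otherwise 0
  unsigned totalBytes;   // stack-pointer decrement before allocframe
};

// Computes the save-area size from the number of registers consumed by named
// arguments. Assumes each named argument takes exactly one 32-bit register.
VarargSaveArea computeVarargSaveArea(unsigned namedArgRegs) {
  constexpr unsigned NumArgRegs = 6; // R0 through R5
  constexpr unsigned RegBytes = 4;
  assert(namedArgRegs <= NumArgRegs);
  const unsigned unnamed = NumArgRegs - namedArgRegs;
  // An odd register count needs 4 bytes of padding to keep 8-byte alignment.
  const unsigned padding = (unnamed % 2 != 0) ? 4 : 0;
  return {unnamed, padding, unnamed * RegBytes + padding};
}
```

With one named register, as in `int foo(int a, ...)`, this gives 5 * 4 + 4 = 24 bytes; with two, as in `int foo(int a, int b, ...)`, it gives 4 * 4 + 0 = 16 bytes, matching the figures in the commit message.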
Sanjay Patel 7bee94410c [InstCombine] form copysign from select of FP constants (PR44153)
This should be the last step needed to solve the problem in the
description of PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

If we're casting an FP value to int, testing its signbit, and then
choosing between a value and its negated value, that's a
complicated way of saying "copysign":

(bitcast X) <  0 ? -TC :  TC --> copysign(TC,  X)

Differential Revision: https://reviews.llvm.org/D72643
2020-01-20 10:51:14 -05:00
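The equivalence claimed in the copysign commit above can be demonstrated outside the compiler. The sketch below compares the two forms on doubles; it assumes the constant TC is positive (for a negative TC the two forms differ), and the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// The "complicated" form: bitcast X to a signed integer, test the sign bit,
// and choose between TC and its negation.
double selectOnSignBit(double x, double tc) {
  std::int64_t bits;
  static_assert(sizeof(bits) == sizeof(x), "double must be 64 bits wide");
  std::memcpy(&bits, &x, sizeof(bits)); // well-defined bitcast
  return bits < 0 ? -tc : tc;
}

// The simplified form the commit message describes.
double viaCopysign(double x, double tc) {
  assert(tc > 0.0 && "this sketch assumes a positive constant");
  return std::copysign(tc, x);
}
```

Because the integer comparison looks only at the sign bit, the two forms also agree for -0.0, which a floating-point `x < 0` test would not distinguish from +0.0.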
Guillaume Chatelet 46b9563cf6 [Alignment][NFC] Use Align with CreateElementUnorderedAtomicMemCpy
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, nicolasvasilache

Subscribers: hiraditya, jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73041
2020-01-20 15:39:45 +01:00
Mark Murray b10a0eb04a [ARM][MVE][Intrinsics] Take abs() of VMINNMAQ, VMAXNMAQ intrinsics' first arguments.
Summary: Fix VMINNMAQ, VMAXNMAQ intrinsics; BOTH arguments have the absolute values taken.

Reviewers: dmgreen, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72830
2020-01-20 14:33:26 +00:00
Sanjay Patel da9c93f330 [InstSimplify] fold select of vector constants that include undef elements
As mentioned in D72643, we'd like to be able to assert that any select
of equivalent constants has been removed before we're deep into InstCombine.

But there's a loophole in that assertion for vectors with undef elements
that don't match exactly.

This patch should close that gap. If we have undefs, we can't safely
propagate those unless both constants elements for that lane are undef.

Differential Revision: https://reviews.llvm.org/D72958
2020-01-20 08:48:32 -05:00
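The lane rule stated in the InstSimplify commit above (an undef lane may only be kept when both constants are undef in that lane) can be sketched on plain vectors. This is a toy model, not InstSimplify's code: `Lane` uses `std::nullopt` to stand for undef, and `foldSelectOfConstants` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One vector lane: an integer constant, or std::nullopt for undef.
using Lane = std::optional<int>;

// If the two select arms are equivalent lane by lane (treating undef as a
// wildcard), returns the merged constant. The result is undef in a lane only
// when both inputs are undef there. Returns std::nullopt when the arms differ.
std::optional<std::vector<Lane>>
foldSelectOfConstants(const std::vector<Lane> &trueVal,
                      const std::vector<Lane> &falseVal) {
  if (trueVal.size() != falseVal.size())
    return std::nullopt;
  std::vector<Lane> merged;
  merged.reserve(trueVal.size());
  for (std::size_t i = 0; i < trueVal.size(); ++i) {
    if (trueVal[i] && falseVal[i]) {
      if (*trueVal[i] != *falseVal[i])
        return std::nullopt; // defined lanes disagree: not equivalent
      merged.push_back(trueVal[i]);
    } else if (trueVal[i]) {
      merged.push_back(trueVal[i]); // prefer the defined element
    } else {
      merged.push_back(falseVal[i]); // defined element, or undef if both are
    }
  }
  return merged;
}
```

In this model, selecting between <1, undef, undef, 4> and <1, 2, undef, undef> folds to <1, 2, undef, 4>.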
dfukalov de34b54edc [SCEV] Swap guards estimation sequence. NFC
Summary:
Loop unroll spends a lot of time in SCEVs processing in case when a function
contains hundreds of simple 'for' loops with quite complex array indexes like

  for (int i = 0; i < 8; ++i) {
    for (int j = 0; j < 32; ++j) {
      C[j*8+i] = B[j*32+i+128] + A[i*64+128];
    }
  }
  for (int i = 0; i < 8; ++i) {
    for (int j = 0; j < 8; ++j) {
      for (int k = 0; k < 32; ++k) {
        D[k*64+i*8+j] = D[k*64+i*8+j] + E[i+16] * C[k*8+j+256];
      }
    }
  }

The patch improves loop unroll speed since isLoopBackedgeGuardedByCond takes
much less time than isLoopEntryGuardedByCond in the edge case.

Reviewers: skatkov, sanjoy, mkazantsev

Reviewed By: sanjoy

Subscribers: fhahn, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72929
2020-01-20 16:41:16 +03:00
Simon Tatham f3e73e88fd [ARM,MVE] Fix confusing MC names for MVE VMINA/VMAXA insns.
Summary:
A recent commit accidentally defined names like `MVE_VMAXAs8` as
instances of the multiclass `MVE_VMINA`, and vice versa. This has no
effect on the test suite, because nothing directly refers to those
instruction names (the isel patterns are generated in Tablegen using
`!cast<Instruction>(NAME)` inside a lower-level multiclass). But it
means that `llvm-mc -show-inst` was listing VMAXA as VMINA, and it
would also affect any further draft code gen patches that use those
instruction ids.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73034
2020-01-20 13:25:52 +00:00
Andrzej Warzynski 7e717b3990 [AArch64][SVE] Extend int_aarch64_sve_ld1_gather_imm
The ACLE distinguishes between the following addressing modes for gather
loads:
  * "scalar base, vector offset", and
  * "vector base, scalar offset".
For the "vector base, scalar offset" case, the
`int_aarch64_sve_ld1_gather_imm` intrinsic was added in 79f2422d.
Currently, that intrinsic assumes that the scalar offset is passed as an
immediate.  As a result, it does not cater for cases where scalar offset
is stored in a register.

In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all
cases are covered:
* `int_aarch64_sve_ld1_gather_imm` is renamed as
  `int_aarch64_sve_ld1_gather_scalar_offset`
* new DAG combine rules are added for GLD1_IMM for scenarios where the
  offset is a non-immediate scalar or an out-of-range immediate
* sve-intrinsics-gather-loads-vector-base.ll is renamed as
  sve-intrinsics-gather-loads-vector-base-imm-offset.ll
* sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test
  non-immediate offsets

Similar changes are made for scatter store intrinsics.

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D71773
2020-01-20 12:19:18 +00:00
Evgeniy Brevnov af7e158872 [LV] Vectorizer should adjust trip count in profile information
Summary: A vectorized loop processes VFxUF elements in one iteration, so the total number of iterations decreases proportionally. In addition, the epilog loop cannot have more than VFxUF - 1 iterations. This patch updates the profile information accordingly.

Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov

Reviewed By: Ayal, DaniilSuchkov

Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67905
2020-01-20 18:36:28 +07:00
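The trip-count relationship in the LV commit above can be made concrete with a small calculation. This sketch assumes the simplest possible model, where the vector loop runs floor(N / (VF * UF)) times and the epilog loop handles the remainder; the actual patch adjusts branch-weight metadata, which is not modelled here, and the names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of how an estimated scalar trip count splits up.
struct VectorizedTripCounts {
  std::uint64_t vectorLoop; // iterations of the vectorized loop
  std::uint64_t epilogLoop; // leftover scalar iterations (< VF * UF)
};

// Splits an estimated scalar trip count between the vector loop and the
// epilog loop for a given vectorization factor (VF) and unroll factor (UF).
VectorizedTripCounts splitTripCount(std::uint64_t scalarTripCount, unsigned vf,
                                    unsigned uf) {
  const std::uint64_t step = static_cast<std::uint64_t>(vf) * uf;
  assert(step > 0 && "VF and UF must both be non-zero");
  return {scalarTripCount / step, scalarTripCount % step};
}
```

For an estimated 100 scalar iterations with VF = 4 and UF = 2, this gives 12 vector iterations and 4 epilog iterations, and the epilog count can never exceed VF * UF - 1 = 7.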
Simon Pilgrim eaa4548459 [X86][SSE] Add PACKSS SimplifyMultipleUseDemandedBits 'sign bit' handling.
Attempt to use SimplifyMultipleUseDemandedBits to simplify PACKSS if we're only after the sign bit.
2020-01-20 10:48:54 +00:00
Sjoerd Meijer 8cba99e2aa [ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.

Differential Revision: https://reviews.llvm.org/D72714
2020-01-20 10:26:36 +00:00
Evgeniy Brevnov cfe97681cd [NFC][LoopUtils] Minor change in comment according to review D71990. 2020-01-20 17:10:10 +07:00
Evgeniy Brevnov 10357e1c89 [LoopUtils] Better accuracy for getLoopEstimatedTripCount.
Summary: The current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in a bottom-tested loop, the first iteration is executed before the first back branch is taken. For example, for a loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata, getLoopEstimatedTripCount gives 1 while the actual number of iterations is 2.

Reviewers: Ayal, fhahn

Reviewed By: Ayal

Subscribers: mgorny, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71990
2020-01-20 16:58:07 +07:00
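The off-by-one described in the getLoopEstimatedTripCount commit above follows from simple arithmetic on the branch weights: in a bottom-tested loop, the body runs once more than the backedge is taken. The sketch below is an illustration of that arithmetic only; `estimateTripCount` is a hypothetical function, and the round-to-nearest division is an assumption rather than something the commit message states.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Estimates the trip count of a bottom-tested loop from the weights on its
// latch branch. Returns std::nullopt when the exit weight is zero, since no
// estimate can be made in that case.
std::optional<std::uint64_t> estimateTripCount(std::uint64_t backedgeTakenWeight,
                                               std::uint64_t exitWeight) {
  if (exitWeight == 0)
    return std::nullopt;
  // Average number of backedges taken per loop entry, rounded to nearest.
  const std::uint64_t backedgesPerEntry =
      (backedgeTakenWeight + exitWeight / 2) / exitWeight;
  // The first iteration runs before any backedge is taken, hence the + 1.
  return backedgesPerEntry + 1;
}
```

With weights of 1 (taken) and 1 (exit), as in the commit message's example, this yields 2 iterations rather than 1.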
Awanish Pandey 84c4c87e04 Recommit "[DWARF5][DebugInfo]: Added support for DebugInfo generation for auto return type for C++ member functions."
Summary:
This was reverted in 328e0f3dca due to
chromium bot failure. This revision addresses that case.

Original commit message:
Summary:
    This patch will provide support for auto return type for the C++ member
    functions. Before this patch, the return type of the member function is deduced and
    stored in the DIE.
    This patch includes llvm side implementation of this feature.

    Patch by: Awanish Pandey <Awanish.Pandey@amd.com>

    Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george

    Reviewed by: dblaikie

    Differential Revision: https://reviews.llvm.org/D70524
2020-01-20 15:13:13 +05:30
Sjoerd Meijer 93175a5caa [IndVarSimplify][LoopUtils] rewriteLoopExitValues. NFCI
This moves `rewriteLoopExitValues()` from IndVarSimplify to LoopUtils thus
making it a generic loop utility function.  This allows to rewrite loop exit
values by just calling this function without running the whole IndVarSimplify
pass.

We use this in D72714 to rematerialise the iteration count in exit blocks, so
that we can clean-up loop update expressions inside the hardware-loops later.

Differential Revision: https://reviews.llvm.org/D72602
2020-01-20 09:05:00 +00:00
Georgii Rymar 11e8e32444 [llvm-mc] - Produce R_X86_64_PLT32 relocation for branches with JCC opcodes too.
The idea is to produce R_X86_64_PLT32 instead of
R_X86_64_PC32 for branches.

It fixes https://bugs.llvm.org/show_bug.cgi?id=44397.

This patch teaches MC to do that for JCC (jump if condition is met)
instructions. The new behavior matches modern GNU as.
It is similar to D43383, which did the same for "call/jmp foo",
but missed JCC cases.

Differential revision: https://reviews.llvm.org/D72831
2020-01-20 11:42:19 +03:00
David Green ff2e67a4f7 [ARM] MVE VLDn postinc
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as fewer variants are needed.

It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
intrinsics, which allow the nodes to have MMOs, calculated as the full
length to the memory being loaded/stored.

Differential Revision: https://reviews.llvm.org/D71194
2020-01-20 06:57:07 +00:00
David Green 5e51f75542 [ARM] Favour post inc for MVE loops
We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and for certain instructions like the VLD2's only post-inc
versions are available.

Differential Revision: https://reviews.llvm.org/D70790
2020-01-20 06:57:07 +00:00
Fangrui Song eaab1bf21e [StackColoring] Remap FixedStackPseudoSourceValue frame index referenced by MachineMemOperand
StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0)
but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i)
referenced by MachineMemOperand.

This can cause an assertion failure when LiveDebugValues references a dead stack object.

It is difficult to craft a test case. -g, va_copy and stack-coloring are required.
I can only reproduce it on ppc32.
2020-01-19 22:53:45 -08:00
Michael Liao 819421745c Reorder targets in alphabetical order. NFC. 2020-01-19 21:11:54 -05:00
Florian Hahn 0ee1db2d1d [X86] Try to avoid casts around logical vector ops recursively.
Currently PromoteMaskArithmetic only looks at a single operation to
skip casts. This means we miss cases where we combine multiple masks.

This patch updates PromoteMaskArithmetic to try to recursively promote
AND/OR/XOR nodes that terminate in truncates of the right size or
constant vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D72524
2020-01-19 17:22:43 -08:00
Fangrui Song 886d2c2ca7 [BranchRelaxation] Simplify offset computation and fix a bug in adjustBlockOffsets()
If Start!=0, adjustBlockOffsets() may unnecessarily adjust the offset of
Start. There is no correctness issue, but it can create more block
splits.
2020-01-19 16:02:16 -08:00
Fangrui Song 8e8a75ad50 [TargetRegisterInfo] Default trackLivenessAfterRegAlloc() to true
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
2020-01-19 14:20:37 -08:00
Lang Hames 84217ad661 [ORC] Add weak symbol support to defineMaterializing, fix for PR40074.
The MaterializationResponsibility::defineMaterializing method allows clients to
add new definitions that are in the process of being materialized to the JIT.
This patch adds support to defineMaterializing for symbols with weak linkage
where the new definitions may be rejected if another materializer concurrently
defines the same symbol. If a weak symbol is rejected it will not be added to
the MaterializationResponsibility's responsibility set. Clients can check for
membership in the responsibility set via the
MaterializationResponsibility::getSymbols() method before resolving any
such weak symbols.

This patch also adds code to RTDyldObjectLinkingLayer to tag COFF comdat symbols
introduced during codegen as weak, on the assumption that these are COFF comdat
constants. This fixes http://llvm.org/PR40074.
2020-01-19 10:46:07 -08:00
Fangrui Song 9a24488cb6 [CodeGen] Move fentry-insert, xray-instrumentation and patchable-function before addPreEmitPass()
The intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).

This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.

Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
2020-01-19 00:09:46 -08:00
Craig Topper 5fa2022ec0 [X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.
Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.

I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect on the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other places.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72805
2020-01-18 23:44:05 -06:00
Fangrui Song 0cb415c189 [X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr
The x86-64 General Dynamic TLS code sequence uses prefixes to allow
linker relaxation.  Adding segment override prefix or NOPs can break
linker relaxation (ld -pie/-no-pie).

i386 General Dynamic and x86-64 Local Dynamic do not use prefixes, but
for simplicity, just disable auto padding consistently.

Reviewed By: skan, LuoYuanke

Differential Revision: https://reviews.llvm.org/D72878
2020-01-18 18:14:51 -08:00
Fangrui Song 9583a3f262 [AsmPrinter] Delete dead takeDeletedSymbsForFunction()
The code added in r98579 is dead now.
2020-01-18 17:08:00 -08:00
Reid Kleckner ff6be0ca25 Revert "[Support] Explicitly instantiate BumpPtrAllocatorImpl"
This reverts commit add9599050.

Buildbots don't seem to like it.
2020-01-18 09:33:00 -08:00
Reid Kleckner add9599050 [Support] Explicitly instantiate BumpPtrAllocatorImpl
Most clients only ever use the default BumpPtrAllocator.
2020-01-18 09:21:53 -08:00
Simon Pilgrim 69bc450882 [X86] Rename lowerShuffleAsRotate -> lowerShuffleAsVALIGN
Since it can only ever create VALIGN nodes.
2020-01-18 11:29:14 +00:00
Michael Liao 6d0d86a64d [DAG] Add helper for creating constant vector index with correct type. NFC. 2020-01-18 01:23:36 -05:00
David Blaikie 58b10df54f DebugInfo: Move SectionLabel tracking into CU's addRange
This makes the SectionLabel handling more resilient - specifically for
future PROPELLER work which will have more CU ranges (rather than just
one per function).

Ultimately it might be nice to make this more general/resilient to
arbitrary labels (rather than relying on the labels being created for CU
ranges & then being reused by ranges, loclists, and possibly other
addresses). It's possible that other (non-rnglist/loclist) uses of
addresses will need the addresses to be in SectionLabels earlier (eg:
move the CU.addRange to be done on function begin, rather than function
end, so during function emission they are already populated for other
use).
2020-01-17 18:12:34 -08:00
David Blaikie 46ed93315f [IR] Remove some unnecessary cleanup in Module's dtor, and use a unique_ptr to simplify some
Follow on from D72812, based on Mehdi Amini's feedback.
2020-01-17 17:30:24 -08:00
Derek Schuff ff171acf84 [WebAssembly] Track frame registers through VReg and local allocation
This change has 2 components:

Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded.  That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptor.

WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run.  Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase

The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.

This is a reland of rG3a05c3969c18 with fixes for the expensive-checks
and Windows builds

Differential Revision: https://reviews.llvm.org/D71681
2020-01-17 17:23:56 -08:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
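
As a rough illustration of the three flush modes named above (ieee,
preserve-sign, positive-zero), here is a minimal Python sketch that models
how an f32 denormal would be treated under each mode. The helper names
to_f32 and flush_f32 are hypothetical and exist only for this example; this
is not LLVM code, just a model of the arithmetic.

```python
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal f32 value


def to_f32(x: float) -> float:
    """Round a Python float (an f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_f32(x: float, mode: str) -> float:
    """Model denormal handling for one f32 value (hypothetical helper)."""
    x = to_f32(x)
    is_denormal = x != 0.0 and abs(x) < F32_MIN_NORMAL
    if mode == "ieee" or not is_denormal:
        return x                      # denormals are kept as-is
    if mode == "preserve-sign":
        return math.copysign(0.0, x)  # flush to zero, keep the sign
    if mode == "positive-zero":
        return 0.0                    # flush to +0.0 regardless of sign
    raise ValueError(f"unknown mode: {mode}")
```

Only the last two modes differ in the sign of the flushed result, which is
the distinction the message says only ARM seems to support.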

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fdenormal-fp-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attributes are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Matt Arsenault 592de0009f AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp
The existing test is overly reliant on -mattr=-flat-for-global, and
some missing optimizations to re-use.
2020-01-17 20:09:53 -05:00
Matt Arsenault ec9628318d AMDGPU/GlobalISel: Select DS append/consume 2020-01-17 20:09:53 -05:00
Reid Kleckner 423e3db6a8 Remove unneeded FoldingSet.h include from Attributes.h
Avoids 637 extra FoldingSet.h and Allocator.h includes. FoldingSet.h
needs Allocator.h, which is relatively expensive.
2020-01-17 16:36:09 -08:00
Evgenii Stepanov d081962dea Merge memtag instructions with adjacent stack slots.
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
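
The core idea of the two replacements above, coalescing tagging operations
that cover adjacent stack slots, can be sketched in Python roughly as
follows. The (offset, size) representation and the merge_adjacent name are
assumptions made for illustration; the actual pass works on machine
instructions and then picks a concrete instruction sequence for each run.

```python
def merge_adjacent(ranges):
    """Coalesce (offset, size) ranges whose end touches the next start."""
    merged = []
    for offset, size in sorted(ranges):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # The previous range ends exactly where this one begins.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged
```

Two 16-byte ranges at offsets 0 and 16 collapse into a single 32-byte
range, while a range at offset 64 stays separate.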

This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.

This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as a FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.

This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).

Reviewers: pcc, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70286
2020-01-17 15:19:29 -08:00
Alina Sbirlea 9f6c6ee6b9 [MemDepAnalysis/VNCoercion] Move static method to its only use. [NFCI]
Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize
does not have or use any info specific to MemoryDependenceResults.
Move it to its only user: VNCoercion.
2020-01-17 15:18:42 -08:00
Petr Hosek d3db13af7e [profile] Support counter relocation at runtime
This is an alternative to the continuous mode that was implemented in
D68351. This mode relies on padding and the ability to mmap a file over
the existing mapping which is generally only available on POSIX systems
and isn't suitable for other platforms.

This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at an arbitrary
location, and set bias to the offset between the original and the new
counter location, at which point every subsequent counter access will be
to the new location, which allows updating profile directly akin to the
continuous mode.

The advantage of this implementation is that it doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).
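
The bias indirection described above can be modelled with a small Python
sketch. The CounterSpace class, its flat memory list, and the decision to
copy existing counts on relocation are all assumptions made for
illustration; they are not taken from the profile runtime.

```python
class CounterSpace:
    """Toy model: counters live in one flat memory, accessed via a bias."""

    def __init__(self, num_counters, total_slots):
        self.num_counters = num_counters
        self.memory = [0] * total_slots
        self.bias = 0  # initially zero, meaning no relocation

    def increment(self, counter_index):
        # Every counter access adds the bias to the counter address.
        self.memory[counter_index + self.bias] += 1

    def relocate(self, new_base):
        # Assumption: carry the current counts over to the new location.
        for i in range(self.num_counters):
            self.memory[new_base + i] = self.memory[i + self.bias]
        # From here on, every access lands in the relocated copy.
        self.bias = new_base
```

After relocate(), the original slots are no longer updated, which mirrors
the duplication cost mentioned above: one copy stays in the binary and the
other lives in the mapped profile.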

Differential Revision: https://reviews.llvm.org/D69740
2020-01-17 15:02:23 -08:00
Peter Collingbourne cd40bd0a32 hwasan: Move .note.hwasan.globals note to hwasan.module_ctor comdat.
As of D70146 lld GCs comdats as a group and no longer considers notes in
comdats to be GC roots, so we need to move the note to a comdat with a GC root
section (.init_array) in order to prevent lld from discarding the note.

Differential Revision: https://reviews.llvm.org/D72936
2020-01-17 13:40:52 -08:00
Ian Levesque 97ba483026 [xray] Allow instrumenting only function entry and/or only function exit
Extend -fxray-instrumentation-bundle to split function-entry and
function-exit into two separate options, so that it is possible to
instrument only function entry or only function exit.  For use cases
that only care about one or the other this will save significant overhead
and code size.

Differential Revision: https://reviews.llvm.org/D72890
2020-01-17 13:32:34 -08:00
Ian Levesque 7628e474a5 [xray] Add xray-ignore-loops option
XRay allows tuning by minimum function size, but also always instruments
functions with loops in them.  If the minimum function size is set to a
large value the loop instrumentation ends up causing most functions to be
instrumented anyway.  This adds a new flag, xray-ignore-loops, to disable
the loop detection logic.

Differential Revision: https://reviews.llvm.org/D72659
2020-01-17 13:32:17 -08:00
Adrian Prantl 7b30370e5b Move the sysroot attribute from DIModule to DICompileUnit
[this re-applies c0176916a4
 with the correct commit message and phabricator link]

This addresses point 1 of PR44213.
https://bugs.llvm.org/show_bug.cgi?id=44213

The DW_AT_LLVM_sysroot attribute is used for Clang module debug info,
to allow LLDB to import a Clang module from source. Currently it is
part of each DW_TAG_module, however, it is the same for all modules in
a compile unit. It is more efficient and less ambiguous to store it
once in the DW_TAG_compile_unit.

This should have no effect on DWARF consumers other than LLDB.

Differential Revision: https://reviews.llvm.org/D71732
2020-01-17 12:55:40 -08:00
Adrian Prantl c17aee67f1 Revert "Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysroot"
This reverts commit 12e479475a.

I accidentally landed this patch with the wrong commit message ...
2020-01-17 12:52:36 -08:00
Eli Friedman 447dcef790 Revert "[SVE] Pass Scalable argument to VectorType::get in Bitcode Reader"
This reverts commit 5df53a2259.

Caused test failures.
2020-01-17 12:13:49 -08:00
Christopher Tetreault 5df53a2259 [SVE] Pass Scalable argument to VectorType::get in Bitcode Reader
Summary:
* Pass the Scalability test to VectorType::get in order to be
able to deserialize bitcode that contains scalable vector operations

Change-Id: I37fe5b1c0c237a9153130deefdc1a6d595c7f12e

Reviewers: efriedma, pcc, sdesmalen, apazos, huihuiz, chrisj

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72792
2020-01-17 11:34:08 -08:00
Mike Lambert fe085be125 [Hexagon] Use itinerary for assembler HVX resource checking 2020-01-17 13:14:04 -06:00
Alina Sbirlea 78d4096d03 [LazyCallGraph] Add invalidate method.
Summary: Add invalidate method in LazyCallGraph.

Reviewers: chandlerc, silvas

Subscribers: hiraditya, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72817
2020-01-17 10:47:51 -08:00
Alina Sbirlea 630a8011e4 [CallGraph] Add invalidate method.
Summary: Add invalidate method in CallGraph.

Reviewers: Eugene.Zelenko, chandlerc

Subscribers: hiraditya, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72816
2020-01-17 10:47:51 -08:00
Alina Sbirlea 62a50a95fc [BranchProbabilityInfo] Add invalidate method.
Summary: Add invalidate method for BranchProbabilityInfo.

Reviewers: Eugene.Zelenko, chandlerc

Subscribers: hiraditya, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72815
2020-01-17 10:47:51 -08:00
Stanislav Mekhanoshin eebdd85e7d [AMDGPU] allow multi-dword flat scratch access since GFX9
This is supported starting with GFX9.

Differential Revision: https://reviews.llvm.org/D72865
2020-01-17 10:47:03 -08:00
Alina Sbirlea 5cc99d05f5 [GlobalsModRef] Add invalidate method
Summary: Add invalidate method to GlobalsAA.

Reviewers: tejohnson, chandlerc

Subscribers: hiraditya, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72818
2020-01-17 10:33:54 -08:00
Brian Cain c1873631d0 [Hexagon] Refactor HexagonShuffle
The check() in HexagonShuffle has been decomposed into smaller steps.
No functionality change is intended with this commit.
2020-01-17 12:22:07 -06:00
Adrian Prantl 12e479475a Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysroot
This is a purely cosmetic change that is NFC in terms of the binary
output. It bugs me that I called the attribute DW_AT_LLVM_isysroot
since the "i" is an artifact of GCC command line option syntax
(-isysroot is in the category of -i options) and doesn't carry any
useful information otherwise.

This attribute only appears in Clang module debug info.

Differential Revision: https://reviews.llvm.org/D71722
2020-01-17 09:36:48 -08:00
Drew Wock 0bcfafc5e7 [SeparateConstOffsetFromGEP] Fix: sext(a) + sext(b) -> sext(a + b) matches add and sub instructions with one another
During the SeparateConstOffsetFromGEP pass, signed extensions are distributed
to the values that feed into them and then later recombined. The recombination
stage is somewhat problematic: it doesn't distinguish add and sub instructions
from one another when matching the sext(a) +/- sext(b) -> sext(a +/- b) pattern
in some instances.

An example: the IR contains:
%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extA + extB

The problematic optimization will transform that into:

%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extend subuAuB ; Obviously not semantically equivalent to the IR input.

This patch fixes that.
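
To make the mismatch concrete, here is a small Python sketch with 8-bit
inputs. The sext helper and the two wrapper functions are hypothetical
names used only to model the arithmetic; they are not part of the pass.

```python
def sext(value: int, from_bits: int) -> int:
    """Sign-extend the low from_bits bits of value to a Python int."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def add_of_extends(a: int, b: int, bits: int = 8) -> int:
    """What the input IR computes: sext(a) + sext(b)."""
    return sext(a, bits) + sext(b, bits)


def extend_of_sub(a: int, b: int, bits: int = 8) -> int:
    """What the faulty match produced: sext(a - b)."""
    return sext(a - b, bits)
```

With a = 5 and b = 3 the first function yields 8 and the second yields 2,
so reusing the existing sub for the extended add changes the result.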

Patch by Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D65967
2020-01-17 12:22:52 -05:00
Nikita Popov 522c030aa9 [InstCombine] Fix worklist management in DSE (PR44552)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make
sure that the store is reprocessed, because performing DSE may
expose more DSE opportunities.

There is a slight caveat here though: We need to make sure that we
add the store back to the worklist first, because that means it will
be processed after the operands of the removed store have been
processed. This is a general bug in InstCombine worklist management
that I hope to address at some point, but for now it means we need
to do this manually rather than just returning the instruction as
changed.

Differential Revision: https://reviews.llvm.org/D72807
2020-01-17 18:10:56 +01:00
Nikita Popov 77befe54f7 [InstCombine] Fix worklist management in return combine
There are two related bugs here: First, we don't add the operand
we're replacing to the worklist, which means it may not get DCEd
(see test change). Second, usually this would just get picked up
in the next iteration, but we also do not report the instruction
as changed. This means that we do not get that extra instcombine
iteration, and more importantly, may break the pass pipeline, as
the function is not marked as changed.

Differential Revision: https://reviews.llvm.org/D72864
2020-01-17 17:59:23 +01:00
Nikita Popov 2ca092f320 [InstCombine] Support disabling expensive combines in opt
Currently, there is no way to disable ExpensiveCombines when doing
a standalone opt -instcombine run, as that's the default, and the
opt option can currently only be used to force enable, not to force
disable. The only way to disable expensive combines is via -O1 or -O2,
but that of course also runs the rest of the kitchen sink...

This patch allows using opt -instcombine -expensive-combines=0 to
run InstCombine without ExpensiveCombines.

Differential Revision: https://reviews.llvm.org/D72861
2020-01-17 17:56:20 +01:00
David Spickett 398dc06ad0 [AArch64] Make AArch64 specific assembly directives case insensitive
Differential Revision: https://reviews.llvm.org/D72923
2020-01-17 16:16:18 +00:00
Matt Arsenault 886f9071c6 AMDGPU: Don't assert on a16 images on targets without FeatureR128A16
Currently the lowering for i16 image coordinates asserts on gfx10. I'm
somewhat confused by this though. The feature is missing from the
gfx10 feature lists, but the a16 bit appears to be present in the
manual for MIMG instructions.
2020-01-17 11:07:00 -05:00
Sanjay Patel 43f60e614a [x86] try harder to form 256-bit unpck*
This is another part of a problem noted in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

The AVX2 code may use awkward 256-bit shuffles vs. the AVX code that gets split
into the expected 128-bit unpack instructions. We have to be selective in
matching the types where we try to do this though. Otherwise, we can end up
with more instructions (in the case of v8x32/v4x64).

Differential Revision: https://reviews.llvm.org/D72575
2020-01-17 10:42:39 -05:00
Krzysztof Parzyszek 2d5bfc6eb1 [Hexagon] Improve HVX version checks 2020-01-17 09:40:26 -06:00
Krzysztof Parzyszek 60aed6a4e5 [Hexagon] Add prev65 subtarget feature
There was a change to trap1 instruction between v62 and v65. This
feature will allow the assembler/disassembler to handle different
variants depending on the CPU version.
2020-01-17 09:27:27 -06:00
Simon Pilgrim 8eb4d25a09 [X86] Split X87/SSE compare classes into WriteFCom + WriteFComX
Most X87 compare instructions write to the X87 status word, while the SSE (U)COMI compares write to rFLAGS. These are often handled very differently on CPUs (e.g. rFLAGS outputs typically involve a fpu2gpr transfer), and we shouldn't be grouping all these instructions behind a single class - so this patch splits off the SSE compares into a new WriteFComX class (and currently keeps the same behaviours). If there's a need to distinguish between X87 instructions more closely we can investigate that in the future, but as we don't handle any of the X87 side effects at the moment it's unlikely to have any notable effect.
2020-01-17 13:53:58 +00:00
Simon Pilgrim 1dc2f25790 [SelectionDAG] ComputeKnownBits - assert we're computing the 0'th (difference) result for the SUB/SUBC cases
Matches what we already do for the ADD/ADDC/ADDE case.
2020-01-17 13:53:57 +00:00
Sanjay Patel c1e159ef6e [IR] fix Constant::isElementWiseEqual() to allow for all undef elements compare
We could argue that match() should be more flexible here,
but I'm not sure what impact that would have on existing code.
2020-01-17 08:31:16 -05:00
Sam Parker 42350cd893 [ARM][MVE] Tail Predicate IsSafeToRemove
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.

As well as being able to remove chains of instructions, we now also
check that the sub feeding the vctp is producing the expected value.

Differential Revision: https://reviews.llvm.org/D71837
2020-01-17 13:19:14 +00:00
Fedor Sergeev cc7cb05e9d [BasicBlock] fix looping in getPostdominatingDeoptimizeCall
Blindly following unique-successors chain appeared to be a bad idea.
In the degenerate case where a block jumps to itself, that results in an endless loop.

Discovered this problem when playing with additional changes,
managed to reproduce it on existing LoopPredication code.

Fix by checking a "visited" set while iterating through unique successors.
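
The shape of the fix can be sketched in Python as below. The
unique_successor dictionary and the function name are assumptions made for
illustration; the real code walks BasicBlock objects.

```python
def follow_unique_successors(start, unique_successor):
    """Walk a unique-successor chain, stopping at the first repeated block.

    unique_successor maps a block name to its single successor; blocks
    without a unique successor are simply absent from the mapping.
    """
    visited = set()
    path = []
    block = start
    while block is not None and block not in visited:
        visited.add(block)
        path.append(block)
        block = unique_successor.get(block)
    return path
```

For a block that jumps to itself, the walk terminates after visiting that
block once instead of spinning forever.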

Reviewed By: skatkov

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72908
2020-01-17 15:40:02 +03:00
Cullen Rhodes 49edf9a509 [AArch64][SVE] Add break intrinsics
Summary:
Implements the following intrinsics:

    * @llvm.aarch64.sve.brka
    * @llvm.aarch64.sve.brka.z
    * @llvm.aarch64.sve.brkb
    * @llvm.aarch64.sve.brkb.z
    * @llvm.aarch64.sve.brkn.z
    * @llvm.aarch64.sve.brkpa.z
    * @llvm.aarch64.sve.brkpb.z

Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72393
2020-01-17 11:47:08 +00:00
Simon Pilgrim f611158350 [SelectionDAG] Better ISD::ANY_EXTEND/ISD::ANY_EXTEND_VECTOR_INREG ComputeKnownBits support
Add DemandedElts handling to ISD::ANY_EXTEND and add missing ISD::ANY_EXTEND_VECTOR_INREG handling. Despite the lack of test changes this code IS being used - it's just that the ANY_EXTEND ops are legalized later on (typically to ZERO_EXTEND equivalents) so we typically manage to combine later on.
2020-01-17 11:37:58 +00:00
David Spickett 37fb3b3363 [AsmParser] Make generic directives and aliases case insensitive.
GCC will accept any case for assembler directives.
For example ".abort" and ".ABORT" (even ".aBoRt")
are equivalent.

https://sourceware.org/binutils/docs/as/Pseudo-Ops.html#Pseudo-Ops
"The names are case insensitive for most targets,
and usually written in lower case."

Change llvm-mc to accept any case for generic directives
or aliases of those directives.
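
One simple way to get this behaviour is to normalize the directive name
before the table lookup, sketched here in Python. The DIRECTIVES table and
lookup_directive are hypothetical and only model the idea; they do not
reflect how llvm-mc stores its directive map.

```python
# Hypothetical directive table, keyed by lower-case name.
DIRECTIVES = {
    ".abort": "abort",
    ".align": "align",
}


def lookup_directive(name: str):
    """Return the handler id for a directive, ignoring letter case."""
    return DIRECTIVES.get(name.lower())
```

With this, ".abort", ".ABORT" and ".aBoRt" all resolve to the same entry,
and unknown names still return None.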

This is for Bugzilla #39527.

Differential Revision: https://reviews.llvm.org/D72686
2020-01-17 11:02:56 +00:00
Kerry McLaughlin fe3bb8ec96 [AArch64][SVE] Add ImmArg property to intrinsics with immediates
Summary:
Several SVE intrinsics with immediate arguments (including those
added by D70253 & D70437) do not use the ImmArg property.
This patch adds ImmArg<Op> where required and changes
the appropriate patterns which match the immediates.

Reviewers: efriedma, sdesmalen, andwar, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72612
2020-01-17 10:47:55 +00:00
Craig Topper caee96031d [Transforms][RISCV] Remove a "using namespace llvm" from an include file. Fix a place that became dependent on it.
This include file was created in October and has a "using namespace llvm". This seems to get exposed to other include files and finally onto cpp files. While this is somewhat okay for llvm itself, it's bad for other projects that use llvm as a library and include a header file that picks this up. This was found by ISPC, which has some class names at global scope with the same names as LLVM.

It looks like RISCV accidentally became dependent on this. I fixed it by reordering some includes in the RISCV code, but maybe we want to change the TableGenEmitter to put "namespace llvm {" in the generated file instead? But we probably want to do the simplest thing first so we can merge it to 10.0.

Differential Revision: https://reviews.llvm.org/D72895
2020-01-16 20:50:41 -08:00
Matt Arsenault 117d4f1900 AMDGPU: Add register classes to MUBUF load patterns 2020-01-16 22:00:44 -05:00
Zakk Chen cef838e65f Revert "[RISCV] Support ABI checking with per function target-features"
This reverts commit 7bc58a779a.
It breaks EXPENSIVE_CHECKS on Windows
2020-01-16 18:01:07 -08:00
Davide Italiano 30a8865142 [FastISel] Lower `llvm.dbg.value(undef, ...` correctly.
Summary:
Instead of just dropping them.

<rdar://problem/58657146>

Reviewers: aprantl, vsk, ab, paquette, echristo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72877
2020-01-16 16:22:20 -08:00
David Blaikie 65eb74e94b PointerLikeTypeTraits: Standardize NumLowBitsAvailable on static constexpr rather than anonymous enum
This is (more?) usable by GDB pretty printers and seems nicer to write.

There's one tricky caveat that in C++14 (LLVM's codebase today) the
static constexpr member declaration is not a definition - so odr use of
this constant requires an out of line definition, which won't be
provided (that'd make all these trait classes more annoying/expensive
to maintain). But the use of this constant in the library implementation
is/should always be in a non-odr context - only two unit tests needed to
be touched to cope with this/avoid odr using these constants.

Based on/expanded from D72590 by Christian Sigg.
2020-01-16 15:30:50 -08:00
Eric Christopher de022a8824 [NFC] Fold isHugeExpression into hasHugeExpression and update callers
accordingly.
2020-01-16 15:28:54 -08:00
Jessica Paquette b82d18e1e8 [AArch64][GlobalISel] Change G_FCONSTANTs feeding into stores into G_CONSTANTS
Given the following situation:

x = G_FCONSTANT (something that can't be materialized)
G_STORE x, some_addr

We know that x must be materialized as at least a single mov. However, at the
time of selection, the G_STORE will have been regbankselected to a FPR store.

So, as a result, you'll get an unnecessary fmov into the G_STORE.

Storing a constant value in a GPR and a constant value in a FPR are the same.
So, whenever you see a G_FCONSTANT that feeds into only G_STOREs, you might as
well make it a G_CONSTANT.

This adds a target-specific combine which changes G_FCONSTANTs feeding into
G_STOREs into G_CONSTANTs.
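
The claim that storing the constant from a GPR or from an FPR writes the
same bytes can be checked with a short Python sketch. The helper names are
hypothetical, and a bytearray stands in for memory; this models only the
bit-level equivalence, not instruction selection.

```python
import struct


def f64_bits(x: float) -> int:
    """Return the 64-bit integer with the same bit pattern as x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def store_fp(memory: bytearray, offset: int, x: float) -> None:
    """Model a store of a floating-point constant."""
    memory[offset:offset + 8] = struct.pack("<d", x)


def store_int(memory: bytearray, offset: int, bits: int) -> None:
    """Model a store of the equivalent integer constant."""
    memory[offset:offset + 8] = struct.pack("<Q", bits)
```

Storing 3.25 with store_fp and storing f64_bits(3.25) with store_int leave
identical bytes in memory, which is why the constant's register bank does
not matter to the store.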

Differential Revision: https://reviews.llvm.org/D72814
2020-01-16 15:18:44 -08:00
Derek Schuff 80906d9d16 Revert "[WebAssembly] Track frame registers through VReg and local allocation"
This reverts commit 3a05c3969c.
It breaks under expensive-checks and on Windows
2020-01-16 14:38:00 -08:00
Matt Arsenault 3ef8cdf666 AMDGPU: Do permlane16 vdst_in discard optimization in InstCombine
There's more potential value to discarding the source value earlier,
since we always know the value of the fi/bc bits.
2020-01-16 17:27:53 -05:00
Matt Arsenault 91e758b732 AMDGPU: Move permlane discard vdst_in optimization
This case can be handled as a regular selection pattern, so move it
out of the weird post-isel folding code which doesn't have an exactly
equivalent place in GlobalISel.

I think it doesn't make much sense to do this optimization here
though, and it would be more useful in instcombine. There's not really
any new information that will be gained during lowering since these
inputs were known from the beginning.
2020-01-16 17:27:53 -05:00
Derek Schuff 3a05c3969c [WebAssembly] Track frame registers through VReg and local allocation
This change has 2 components:

Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded.  That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptor.

WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run.  Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase

The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.

Differential Revision: https://reviews.llvm.org/D71681
2020-01-16 13:51:17 -08:00
Sanjay Patel 52b44902d0 [IR] fix crash in Constant::isElementWiseEqual() with FP types
We lifted this code from InstCombine for general usage in:
rL369842
...but it's not safe as-is. There are no existing users that can
trigger this bug, but I discovered it via crashing several
regression tests when trying to use it for select folding in
InstSimplify.

ICmp requires (vector) integer types, so give up on anything that's
not integer or FP (pointers and ?) then bitcast the constants
before trying the match. That matches the definition of "equal or
undef" that I was looking for. If someone wants an FP-aware version
of equality (deal with NaN, -0.0), that could be a different mode
or different function.
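
The "bitcast, then compare as integers" definition of equality, and how it
differs from an FP-aware one, can be illustrated with a Python sketch. Here
None stands in for undef and the function names are hypothetical; this is a
model of the semantics, not the LLVM implementation.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the 32-bit integer with the same bit pattern as f32(x)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def elementwise_equal_or_undef(lhs, rhs) -> bool:
    """Compare two constant vectors bitwise; None (undef) matches anything."""
    if len(lhs) != len(rhs):
        return False
    return all(
        a is None or b is None or f32_bits(a) == f32_bits(b)
        for a, b in zip(lhs, rhs)
    )
```

Under this definition 0.0 and -0.0 are not equal because their bit patterns
differ, whereas two NaNs with the same bit pattern are equal. An FP-aware
comparison would answer both cases the other way round.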

Differential Revision: https://reviews.llvm.org/D72784
2020-01-16 16:49:16 -05:00
Krzysztof Parzyszek ecf0766cf1 [Hexagon] Add ELF flags for Hexagon v66 to ELFYAML.cpp 2020-01-16 15:01:00 -06:00
Fedor Sergeev 1f2dad1fd5 [GVN] add GVN parameters parsing to new pass manager
Introduce parsing, add a few instances of parameter use into GVN-PRE tests.

Reviewers: skatkov, asbirlea
Reviewed By: skatkov

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72752
2020-01-16 23:53:46 +03:00
Kazu Hirata 53b68e676f Resubmit: [JumpThreading] Thread jumps through two basic blocks
This reverts commit 2d258ed931.  This
revision fixes the Windows build and adds a testcase for it, namely
thread-two-bbs3.ll.  My original patch improperly copied EH pads on
Windows.  This patch disregards jump threading opportunities having to
do with EH pads.

[JumpThreading] Thread jumps through two basic blocks

Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

  bb3:
    %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above.  Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

  bb3:
    %var = phi i32* [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb3.dup:
    %var = phi i32* [ null, %bb1 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70247
2020-01-16 12:33:37 -08:00
Matt Arsenault f5d98543b8 AMDGPU: Remove outdated comment 2020-01-16 14:54:27 -05:00
Matt Arsenault e12b840abf AMDGPU/GlobalISel: Improve lowering of G_SEXT_INREG
Clamping the scalar is much better than lowering with superwide shifts
for types > s64.
2020-01-16 14:29:37 -05:00
Matt Arsenault a66d2817ca GlobalISel: Don't ignore requested ext narrowing type
This was assuming the narrow target was the source type. Respect the
requested type when these don't match by using intermediate
merges. This avoids producing very wide, illegal shift expansions.
2020-01-16 14:29:37 -05:00
Matt Arsenault be31a7b7ee GlobalISel: Move extension scalar narrowing to separate function
Also rename a few things. Handling a different requested type will
require this to become much more complex.
2020-01-16 14:29:37 -05:00
Krzysztof Parzyszek 5f65065437 [Hexagon] Update autogenerated intrinsic information in LLVM 2020-01-16 13:11:18 -06:00
Craig Topper 61a89e17df [LegalizeDAG][Mips] Add an assert to protect a uint_to_fp implementation from double rounding. Add an i32->f32 uint_to_fp implementation that avoids this code.
The algorithm here only works if the sint_to_fp doesn't do any
rounding. Otherwise it can round before the offset fixup is
applied. Add an assert to protect this.

To avoid breaking the one test in tree that tested this code
with a set of types that fail the assert, I've enabled i32->f32
to use the i64->f32 algorithm. This only occurs when f64 isn't
a legal type. If f64 is legal then we do i32->f64->f32 instead.

Differential Revision: https://reviews.llvm.org/D72794
2020-01-16 11:08:16 -08:00
Matt Arsenault d0943537e1 GlobalISel: Apply target MMO flags to atomics
Unify MMO flag handling with SelectionDAG like with loads and stores.
2020-01-16 13:49:43 -05:00
Matt Arsenault 0d0fce42b0 GlobalISel: Preserve load/store metadata in IRTranslator
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.

Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferenceable which should also be fixed.
2020-01-16 13:49:43 -05:00
Krzysztof Parzyszek 8ee2d16896 [Hexagon] Add a target feature to disable compound instructions
This affects the following instructions:
Tag: M4_mpyrr_addr     Syntax: Ry32 = add(Ru32,mpyi(Ry32,Rs32))
Tag: M4_mpyri_addr_u2  Syntax: Rd32 = add(Ru32,mpyi(#u6:2,Rs32))
Tag: M4_mpyri_addr     Syntax: Rd32 = add(Ru32,mpyi(Rs32,#u6))
Tag: M4_mpyri_addi     Syntax: Rd32 = add(#u6,mpyi(Rs32,#U6))
Tag: M4_mpyrr_addi     Syntax: Rd32 = add(#u6,mpyi(Rs32,Rt32))
Tag: S4_addaddi        Syntax: Rd32 = add(Rs32,add(Ru32,#s6))
Tag: S4_subaddi        Syntax: Rd32 = add(Rs32,sub(#s6,Ru32))
Tag: S4_or_andix       Syntax: Rx32 = or(Ru32,and(Rx32,#s10))
Tag: S4_andi_asl_ri    Syntax: Rx32 = and(#u8,asl(Rx32,#U5))
Tag: S4_ori_asl_ri     Syntax: Rx32 = or(#u8,asl(Rx32,#U5))
Tag: S4_addi_asl_ri    Syntax: Rx32 = add(#u8,asl(Rx32,#U5))
Tag: S4_subi_asl_ri    Syntax: Rx32 = sub(#u8,asl(Rx32,#U5))
Tag: S4_andi_lsr_ri    Syntax: Rx32 = and(#u8,lsr(Rx32,#U5))
Tag: S4_ori_lsr_ri     Syntax: Rx32 = or(#u8,lsr(Rx32,#U5))
Tag: S4_addi_lsr_ri    Syntax: Rx32 = add(#u8,lsr(Rx32,#U5))
Tag: S4_subi_lsr_ri    Syntax: Rx32 = sub(#u8,lsr(Rx32,#U5))
2020-01-16 12:37:30 -06:00
Arkady Shlykov c87982b467 Revert "[Loop Peeling] Add possibility to enable peeling on loop nests."
This reverts commit 3f3017e because there's a failure on peel-loop-nests.ll
with LLVM_ENABLE_EXPENSIVE_CHECKS on.

Differential Revision: https://reviews.llvm.org/D70304
2020-01-16 10:33:38 -08:00
stevewan bed7626f04 [PowerPC][AIX] Make PIC the default relocation model for AIX
Summary:
The `llc` tool currently defaults to the Static relocation model and generates non-relocatable code for 32-bit Power.
This is not desirable on AIX, where we always generate Position Independent Code (PIC). This patch makes PIC the default relocation model for AIX.

Reviewers: daltenty, hubert.reinterpretcast, DiggerLin, Xiangling_L, sfertile

Reviewed By: hubert.reinterpretcast

Subscribers: mgorny, wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72479
2020-01-16 13:07:36 -05:00
Nico Weber 81c67da0f2 remove an include that's unused after r347592 2020-01-16 12:49:54 -05:00
Fedor Sergeev 3478551bf3 [GVN] introduce GVNOptions to control GVN pass behavior
There are a few global (cl::opt) controls that enable optional
behavior in GVN. Introduce GVNOptions that provide corresponding
per-pass instance controls.

This allows GVN to be used multiple times in a pipeline, each time
with different settings.
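
As a rough illustration of the per-instance idea described above, the sketch
below shows an options object whose unset fields fall back to a global
default. All names here (`GlobalEnablePRE`, `PassOptions`, `ExamplePass`,
`pipelineExample`) are hypothetical stand-ins chosen for the example, and
`std::optional` is an assumption; none of this is the actual GVN interface.

```cpp
#include <cassert>
#include <optional>

// Hypothetical global default, standing in for a cl::opt-style flag.
static bool GlobalEnablePRE = true;

// Hypothetical per-instance options: an unset field means "use the global".
struct PassOptions {
  std::optional<bool> AllowPRE;

  // Chainable setter so options can be built inline at the call site.
  PassOptions &setPRE(bool Value) {
    AllowPRE = Value;
    return *this;
  }
};

// Hypothetical pass that consults its own options before the global flag.
class ExamplePass {
  PassOptions Opts;

public:
  explicit ExamplePass(PassOptions O = {}) : Opts(O) {}

  bool isPREEnabled() const { return Opts.AllowPRE.value_or(GlobalEnablePRE); }
};

// The same pass appears twice in one pipeline with different settings.
inline void pipelineExample() {
  ExamplePass Early;                             // follows the global default
  ExamplePass Late(PassOptions().setPRE(false)); // overrides it locally
  assert(Early.isPREEnabled() && !Late.isPREEnabled());
}
```

Keeping the fields optional means an instance that sets nothing still tracks
the global flag, so existing command-line behavior is unchanged in this sketch.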

Reviewers: asbirlea, rnk, reames, skatkov, fhahn
Reviewed By: fhahn

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72732
2020-01-16 20:21:08 +03:00
Mircea Trofin 7acfda633f [llvm] Make new pass manager's OptimizationLevel a class
Summary:
The old pass manager separated speed optimization and size optimization
levels into two unsigned values. Coalescing both into an enum in the new
pass manager may lead to unintentional casts and comparisons.

In particular, looking at how the loop unroll passes were constructed
previously, Os/Oz are now (i.e., in the new pass manager) treated just
like O3, likely unintentionally.

This change disallows raw comparisons between optimization levels to
avoid such unintended effects. As a result, the O{s|z} behavior changes
for loop unrolling and loop unroll and jam, matching O2 rather than O3.

The change also parameterizes the threshold values used for loop
unrolling, primarily to aid testing.
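
The comparison pitfall described above can be sketched as follows. This is a
minimal illustration, not the class introduced by the change: `LegacyLevel`,
`OptLevel`, `legacyIsAggressive`, and `isAggressive` are hypothetical names,
and giving Os/Oz a speed level of 2 is an assumption chosen to mirror the
"matching O2" behavior mentioned in the message.

```cpp
#include <cassert>

// Pitfall: one enum holds both speed and size levels, so ordering is accidental.
enum class LegacyLevel { O0, O1, O2, O3, Os, Oz };

// Meant to select O3 only, but Os and Oz also compare greater than O2.
inline bool legacyIsAggressive(LegacyLevel L) { return L > LegacyLevel::O2; }

// Hypothetical replacement: speed and size are stored separately, and the
// class offers no relational operators and no conversion to an integer.
class OptLevel {
  unsigned Speed; // 0..3
  unsigned Size;  // 0 = not optimizing for size, 1 = Os, 2 = Oz

  OptLevel(unsigned Speed, unsigned Size) : Speed(Speed), Size(Size) {
    assert(Speed <= 3 && Size <= 2 && "level out of range");
  }

public:
  static OptLevel O0() { return OptLevel(0, 0); }
  static OptLevel O1() { return OptLevel(1, 0); }
  static OptLevel O2() { return OptLevel(2, 0); }
  static OptLevel O3() { return OptLevel(3, 0); }
  static OptLevel Os() { return OptLevel(2, 1); } // assumed: O2 speed + size
  static OptLevel Oz() { return OptLevel(2, 2); } // assumed: O2 speed + size

  unsigned speedLevel() const { return Speed; }
  unsigned sizeLevel() const { return Size; }

  bool operator==(const OptLevel &Other) const {
    return Speed == Other.Speed && Size == Other.Size;
  }
};

// Callers must name the dimension they care about, so Os/Oz no longer
// slip into the O3 bucket by accident.
inline bool isAggressive(OptLevel L) { return L.speedLevel() == 3; }
```

With the class, an expression such as `Level > OptLevel::O2()` does not
compile, which forces each call site to state whether it is asking about
speed or size.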

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: zzheng, ychen, mehdi_amini, hiraditya, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72547
2020-01-16 09:00:56 -08:00
Matt Arsenault 4ca1ad85b7 AMDGPU/GlobalISel: Don't handle legacy buffer intrinsic 2020-01-16 11:31:12 -05:00
Matt Arsenault 9b2f3532c7 AMDGPU/GlobalISel: Select DS GWS intrinsics 2020-01-16 11:25:10 -05:00
Jay Foad 885260d5d8 [GlobalISel] Don't arbitrarily limit a mask to 64 bits
Reviewers: arsenm

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72853
2020-01-16 16:13:20 +00:00
Jay Foad 63f73545dd [GlobalISel] Pass MachineOperands into MachineIRBuilder helper methods
Reviewers: arsenm, aditya_nandakumar, aemerson

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72849
2020-01-16 16:04:21 +00:00
Sam Parker 760b175109 [ARM][LowOverheadLoops] Update liveness info
Recommitting e93e0d413f after reverting due to test failures, which
will hopefully now be fixed. Original commit message:

After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).

Differential Revision: https://reviews.llvm.org/D72131
2020-01-16 15:44:25 +00:00
Jay Foad 28bb43bdf8 [GlobalISel] Use more MachineIRBuilder helper methods
Reviewers: arsenm, nhaehnle

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72833
2020-01-16 15:34:51 +00:00
Anna Welker c24cf97960 [ARM][MVE] Enable extending gathers
Enables the masked gather pass to
create extending masked gathers.

Differential Revision: https://reviews.llvm.org/D72451
2020-01-16 15:24:54 +00:00