154064 Commits

Author SHA1 Message Date
Craig Topper 1e04923d21 [MachineValueType] Don't allow MVT::getVectorNumElements() to be called for scalable vectors.
Migrate the one caller that failed lit tests to use
MVT::getVectorMinNumElements directly.
2022-01-13 09:16:25 -08:00
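The distinction behind the MVT::getVectorNumElements() change above can be sketched in standalone C++. This is only an illustration, not LLVM's implementation: `VectorTypeSketch`, `minNumElements`, and `numElements` are hypothetical names, and the sketch assumes a scalable vector holds a known minimum element count multiplied by a factor that is only known at run time.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine vector type (not LLVM's MVT).
struct VectorTypeSketch {
  unsigned MinElements; // known minimum number of elements
  bool Scalable;        // true if the real count is MinElements * a runtime factor
};

// Always meaningful: the known minimum element count.
inline unsigned minNumElements(const VectorTypeSketch &VT) {
  return VT.MinElements;
}

// Only meaningful for fixed-length vectors, so scalable inputs trip the assert.
inline unsigned numElements(const VectorTypeSketch &VT) {
  assert(!VT.Scalable && "exact element count is unknown for scalable vectors");
  return VT.MinElements;
}
```

Callers that can see scalable vectors would use the minimum-count query, which is the kind of migration the commit message describes for its one failing caller.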
Simon Pilgrim 55029f017d [X86] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFHW/PSHUFLW handling 2022-01-13 17:08:59 +00:00
Matt Arsenault 59994c25f9 AMDGPU: Select workitem ID intrinsics to 0 with reqd_work_group_size
Shockingly we weren't doing this already. We should probably have this
be done earlier in the IR too, but it's still helpful to have the
lowering guarantee it so that we can modify the ABI implicit inputs
based on it.
2022-01-13 12:08:18 -05:00
Matt Arsenault a6f49423c1 AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size
If we know we aren't using a component from the kernel, we can save
a few bit packing instructions.

We're still enabling the VGPR input to the kernel though.
2022-01-13 12:08:18 -05:00
David Sherwood ba471ba8d2 Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants"
This reverts commit 31009f0b5a.

It seems to be causing SVE VLA buildbot failures and has introduced a
genuine regression. Reverting for now.
2022-01-13 15:59:43 +00:00
Eugene Zhulenev 764e52f0d4 [DebugInfo][InstrRef] Short-circuit unnecessary preferred location map construction
Reviewed By: cota

Differential Revision: https://reviews.llvm.org/D117162
2022-01-13 06:24:52 -08:00
Nikita Popov aba7c3c033 [ConstantFold] Check uniform value in ConstantFoldLoadFromConst()
This case is automatically handled if ConstantFoldLoadFromConstPtr()
is used. Make sure that ConstantFoldLoadFromConst() also handles it.
2022-01-13 14:40:19 +01:00
Petar Avramovic 235886e174 AMDGPU/GlobalISel: Fix custom legalization for fceil 2022-01-13 14:29:30 +01:00
Sander de Smalen b92102a6d7 [AArch64] Add native CPU detection for Neoverse-V1.
Map Main ID part number 0xd40 to neoverse-v1, as described in the
Neoverse-V1 Technical Reference Manual:

https://developer.arm.com/documentation/101427/0101/Register-descriptions/AArch64-system-registers/MIDR-EL1--Main-ID-Register--EL1

Differential Revision: https://reviews.llvm.org/D117207
2022-01-13 12:58:54 +00:00
Simon Pilgrim 57a551a8df [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - don't split element rotate patterns
Partial element rotate patterns (e.g. for element insertion on Issue #53124) were being split if every lane wasn't crossing, but really there's a good repeated mask hiding in there.
2022-01-13 11:59:08 +00:00
David Green 61888d97f6 [AArch64] Basic demand elements for some intrinsics
A lot of neon intrinsics work lane-wise, meaning that elements not
demanded in the output are not demanded in the input. This teaches that to
AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple
single-input truncate intrinsics, which can help remove unnecessary
instructions.

Differential Revision: https://reviews.llvm.org/D117097
2022-01-13 11:53:12 +00:00
Florian Hahn 3f2fb767e3 [VPlan] Make IV operand explicit for VPWidenCanonicalIVRecipe (NFC).
This makes the def-use relationship between VPCanonicalIVPHIRecipe and
VPWidenCanonicalIVRecipe explicit. Needed for D117140.
2022-01-13 11:13:05 +00:00
Simon Pilgrim 4f414af6a7 Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC. 2022-01-13 11:10:50 +00:00
Simon Pilgrim 37ebec68a8 [MIPS] Mips16DAGToDAGISel::selectAddr - Use cast<> instead of dyn_cast<> to avoid dereference of nullptr
The pointer is always dereferenced immediately below, so assert the cast is correct instead of returning nullptr
2022-01-13 11:10:49 +00:00
Hans Wennborg 2bc57d85eb Don't override __attribute__((no_stack_protector)) by inlining (PR52886)
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This led to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).

  extern void h(int* p);

  inline __attribute__((always_inline)) int g() {
    return 0;
  }

  int __attribute__((__no_stack_protector__)) f() {
    int a[1];
    h(a);
    return g();
  }

LLVM will inline g() into f(), and f() would get a stack protector, against the
user's explicit wishes, potentially breaking the program e.g. if h() changes the
value of the stack cookie. That's a miscompile.

More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.

One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.

However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear, it
seems counter-intuitive, and causes real problems as we've seen.

This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.

Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):

  extern void h(int* p);

  inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
    return 0;
  }

  int f() {
    int a[1];
    h(a);
    return g();
  }

I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.

See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722

Differential revision: https://reviews.llvm.org/D116589
2022-01-13 12:04:49 +01:00
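The change in inlining policy described in the stack protector commit above can be sketched as a small merge function. This is a simplified model, not the inliner's code: the `StackProtect` enum, its ordering, and the names `mergeOld` and `mergeNew` are hypothetical. It also assumes that a caller which already has a stack protector may still be strengthened by the callee, which the commit message does not state.

```cpp
#include <algorithm>

// Hypothetical ordering of protection levels, weakest to strongest.
enum class StackProtect { None = 0, Basic = 1, Strong = 2, Required = 3 };

// Old behaviour as described in the message: the callee's level can
// "upgrade" the caller, even if the caller asked for no protector.
constexpr StackProtect mergeOld(StackProtect Caller, StackProtect Callee) {
  return std::max(Caller, Callee);
}

// New behaviour: a caller without a stack protector never gains one
// through inlining. Assumption (not in the message): an already
// protected caller can still be raised to the callee's level.
constexpr StackProtect mergeNew(StackProtect Caller, StackProtect Callee) {
  return Caller == StackProtect::None ? StackProtect::None
                                      : std::max(Caller, Callee);
}
```

In the first example from the message, f() corresponds to `StackProtect::None` and g() to a protected callee, so `mergeNew` leaves f() unprotected. In the second example the roles are swapped and f() keeps its protector.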
Sebastian Neubauer f4139440f1 [Docs] Fix IR and TableGen grammar inconsistencies
IR:
- globals (and functions, ifuncs, aliases) can have a partition
- catchret has a `to` before the label
- the sint/int types do not exist
- signext comes after the type
- a variable was missing its type

TableGen:
- The second value after a `#` concatenation is optional
  See e.g. llvm/lib/Target/X86/X86InstrAVX512.td:L3351
- IncludeDirective and PreprocessorDirective were never referenced in
  the grammar
- Add some missing ;
- Parent classes of multiclasses can have generic arguments.
  Reuse the `ParentClassList` that is already used in other places.

MIR:
- liveins only allows physical registers, which start with a $

Differential Revision: https://reviews.llvm.org/D116674
2022-01-13 11:55:13 +01:00
Ties Stuij 7c70f96a91 [ARM] fix bug causing shrinkwrapping not always being off using PAC
If you want to check for all uses of PAC, the SpillsLR argument to
shouldSignReturnAddress should be true instead of false, as that value will be
returned from the function if the other checks fall through.

Reviewed By: miyuki

Differential Revision: https://reviews.llvm.org/D116213
2022-01-13 10:37:00 +00:00
Nikita Popov 1cbb456123 [GlobalOpt] Fix global to select transform under opaque pointers
We need to check that the load/store type is also the same, as this
is no longer implicitly checked through the pointer type.
2022-01-13 11:13:06 +01:00
Paulo Matos 97ef15ad76 [WebAssembly] Fix reftype load/store match with idx from call
Implement support for matching an index from a WebAssembly CALL
instruction. Add test.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D115327
2022-01-13 11:04:22 +01:00
Jay Foad 821dd3b0e5 [FileCheck] Allow literal '['s before "[[var...]]"
Change FileCheck to accept patterns like "[[[var...]]" and treat the
excess open brackets at the start as literals.

This makes the patterns for matching assembler output with literal
brackets much cleaner. For example an AMDGPU pattern that used to be
written like:

  buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]{{\]}}

can now be:

  buffer_store_dwordx2 v[[[LO]]:[[HI]]]

(Even before this patch the final close bracket did not need to be
wrapped in {{}}, but people tended to do it anyway for symmetry.)

This does not introduce any ambiguity since "[[" was always followed by
an identifier or '@' or '#', so "[[[" was always an error.

I've included a few test updates in this patch just for illustration and
testing. There are a couple of hundred tests that could be updated as a
follow up, mostly in test/CodeGen/.

Differential Revision: https://reviews.llvm.org/D117117

Change-Id: Ia6bc6f65cb69734821c911f54a43fe1c673bcca7
2022-01-13 09:47:37 +00:00
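The bracket rule from the FileCheck commit above can be illustrated with a small helper. This is a sketch of the stated rule only, not FileCheck's parser: `literalOpenBrackets` is a hypothetical name, and the function only looks at a run of '[' at the very start of the text it is given.

```cpp
#include <cstddef>
#include <string_view>

// Counts how many '[' at the start of Text are literals under the rule
// "the last two brackets of a run open a variable, the rest are literal".
// A run shorter than two brackets cannot open a variable, so it is all literal.
constexpr std::size_t literalOpenBrackets(std::string_view Text) {
  std::size_t Run = 0;
  while (Run < Text.size() && Text[Run] == '[')
    ++Run;
  return Run >= 2 ? Run - 2 : Run;
}
```

For the AMDGPU pattern in the message, the text after `v` is `[[[LO]]:[[HI]]]`, which starts with a run of three brackets: one literal followed by the `[[` that opens `LO`.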
David Sherwood 31009f0b5a [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.

New tests added here:

  CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

  CodeGen/AArch64/dag-numsignbits.ll
  CodeGen/AArch64/reduce-and.ll
  CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Differential Revision: https://reviews.llvm.org/D114357
2022-01-13 09:43:07 +00:00
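The arithmetic in the isSExtCheaperThanZExt commit above, (i32 -1) extended to i64, can be checked with plain integer code. The sketch below is independent of LLVM. The function names are hypothetical, and a 32-bit bit pattern stands in for the i32 constant.

```cpp
#include <cstdint>

// Zero-extension: the upper 32 bits of the result are always 0.
constexpr std::uint64_t zeroExtend32To64(std::uint32_t Bits) {
  return static_cast<std::uint64_t>(Bits);
}

// Sign-extension: the upper 32 bits copy bit 31 of the input.
// Flipping bit 31 and then subtracting 2^31 avoids any
// implementation-defined signed conversion.
constexpr std::uint64_t signExtend32To64(std::uint32_t Bits) {
  const std::int64_t Flipped = static_cast<std::int64_t>(Bits) ^ 0x80000000LL;
  return static_cast<std::uint64_t>(Flipped - 0x80000000LL);
}

// The i32 value -1 has the bit pattern 0xFFFFFFFF.
static_assert(zeroExtend32To64(0xFFFFFFFFu) == 0x00000000FFFFFFFFull,
              "zero-extension gives (i64 0xFFFFFFFF)");
static_assert(signExtend32To64(0xFFFFFFFFu) == 0xFFFFFFFFFFFFFFFFull,
              "sign-extension gives (i64 -1)");
```

The sign-extended result is the all-ones value, which is the case the message says can be materialised with a single mov immediate.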
Florian Hahn 7ce48be0fd [LV] Inline CreateSplatIV call for scalar VFs (NFC).
This is a NFC change split off from D116123, as suggested there.
D116123 will remove the last user of CreateSplatIV.
2022-01-13 09:34:31 +00:00
David Sherwood ef1ca4d3e9 [AArch64] Fix incorrect use of MVT::getVectorNumElements in AArch64TTIImpl::getVectorInstrCost
If we are inserting into or extracting from a scalable vector we do
not know the number of elements at runtime, so we can only let the
index wrap for fixed-length vectors.

Tests added here:

  Analysis/CostModel/AArch64/sve-insert-extract.ll

Differential Revision: https://reviews.llvm.org/D117099
2022-01-13 09:27:14 +00:00
Vladislav Khmelevsky 6b22c370c8 RuntimeDyldELF: Don't abort on R_AARCH64_NONE relocation
Do nothing on R_AARCH64_NONE relocation. The relocation is used by BOLT when re-linking the final binary. It is used as a dummy relocation hack to stop RuntimeDyld from skipping the allocation of the section.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D117066
2022-01-13 11:54:48 +03:00
luxufan 0ef5aa69e7 [JITLink] Add fixup value range check
This patch makes JITLink report an out-of-range error when the fixup value is out of range.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D107328
2022-01-13 16:32:49 +08:00
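A range check of the kind the JITLink commit above describes might look like the following. This is a generic sketch, not JITLink's code: `fitsInSignedBits` is a hypothetical helper, and the message does not say which fixup widths are checked, so the 32-bit width in the usage note is an assumption.

```cpp
#include <cstdint>

// Returns true if Value is representable as a signed integer of the
// given width. Widths of 0 are rejected; widths of 64 or more accept
// every int64_t value.
constexpr bool fitsInSignedBits(std::int64_t Value, unsigned Bits) {
  if (Bits == 0)
    return false;
  if (Bits >= 64)
    return true;
  const std::int64_t Max = (std::int64_t{1} << (Bits - 1)) - 1;
  const std::int64_t Min = -Max - 1;
  return Value >= Min && Value <= Max;
}
```

A linker writing a 32-bit signed fixup would call `fitsInSignedBits(Value, 32)` first and report an error when it returns false, rather than silently truncating the value.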
Jim Lin bb13036483 [M68k][NFC] Use Register instead of unsigned int 2022-01-13 15:49:39 +08:00
Christian Sigg cc1b9acf55 [NVPTX] Lower fp16 fminnum, fmaxnum to native on sm_80.
Reviewed By: bkramer, tra

Differential Revision: https://reviews.llvm.org/D117122
2022-01-13 08:52:31 +01:00
Kazu Hirata cd772844d8 [CSKY] Ensure a newline at the end of a file (NFC) 2022-01-12 22:11:57 -08:00
James Y Knight 55fcbf0a84 Revert "[Inline] Attempt to delete any discardable if unused functions"
Somehow this ends up causing an infinite loop in the inliner.

This reverts commit d5be48c66d.
2022-01-13 03:06:47 +00:00
Lian Wang 16877c5d2c [RISCV] Add bfp and bfpw intrinsic in zbf extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116994
2022-01-13 02:53:00 +00:00
Philip Reames 9979299705 [Attributor] Simplify how we handle required alignment during heap-to-stack [NFC]
The existing code duplicated the same concern in two places, and (weirdly) changed the inference of the allocation size based on whether we could meet the alignment requirement.  Instead, just directly check the allocation requirement.
2022-01-12 17:34:17 -08:00
Philip Reames d1f4c6a611 [Attributor] Generalize calloc handling in heap-to-stack for any init value [NFC]
Rewrite the calloc specific handling in heap-to-stack to allow arbitrary init values.  The basic problem being solved is that if an allocation is initialized to anything other than zero, this must be explicitly done for the formed alloca as well.

This covers the calloc case today, but once a couple of earlier guards are removed in this code, downstream allocators with other init values could also be handled.

Inspired by discussion on D116971
2022-01-12 16:58:39 -08:00
Philip Reames 8e76720cf2 [Attributor] Reuse object size evaluation code [NFC] 2022-01-12 16:58:39 -08:00
Philip Reames db57065b36 [Attributor] Use getAllocAlignment where possible [NFC]
Inspired by D116971.
2022-01-12 16:58:39 -08:00
Matt Arsenault 1adeebc2cf AMDGPU: Fix assert on function argument as loop condition 2022-01-12 19:44:26 -05:00
Stanislav Mekhanoshin d043822daa [AMDGPU] Fixed physreg asm constraint parsing
We are always failing parsing of the physreg constraint because
we do not drop trailing brace, thus getAsInteger() returns a
non-empty string and we delegate reparsing to the TargetLowering.

In addition it did not parse register tuples.

Fixing this allowed removing the workaround in the two places where we call it.

Differential Revision: https://reviews.llvm.org/D117055
2022-01-12 16:37:08 -08:00
Matt Arsenault 5a16306c09 GlobalISel: Always enable GISelKnownBits for InstructionSelect
This wasn't running at -O0, and was causing crashes for AMDGPU. AMDGPU
needs this to match the addressing modes of stack access instructions,
which is even more important at -O0 than with optimizations.

It currently costs nothing to run ahead of time, so just always enable
it.
2022-01-12 18:57:24 -05:00
Matt Arsenault 5f39a02ea9 RegScavenger: Remove used regs from scavenge candidates
In a future change, AMDGPU will have 2 emergency scavenging indexes in
some situations. The secondary scavenging index ends up being used
recursively when the scavenger calls eliminateFrameIndex for the
emergency spill slot. Without this, it would end up seeing the same
register which was just scavenged in the parent call as free, insert
a second emergency spill to the same location and return the same
register when 2 unique free registers are required.

We need to only do this if the register is used. SystemZ uses 2
scavenging slots, but calls the scavenger twice in sequence and not
recursively. In this case the previously scavenged register can be
re-clobbered, but is still tracked in the scavenger until it sees the
deferred restore instruction.
2022-01-12 18:56:52 -05:00
Matt Arsenault 4515c24bbc AMDGPU/GlobalISel: Fix assertions on legalize queries with huge align
For some reason we pass around the alignment in bits as uint64_t. Two
places were truncating it to unsigned, and losing bits in extreme
cases.
2022-01-12 18:21:44 -05:00
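The loss of bits described in the huge-align commit above is easy to reproduce in isolation. The sketch below assumes `unsigned` is 32 bits wide, as on the common targets, and models it with `std::uint32_t`. The function names are hypothetical.

```cpp
#include <cstdint>

// Models "truncating it to unsigned", assuming a 32-bit unsigned.
constexpr std::uint32_t truncateTo32(std::uint64_t AlignInBits) {
  return static_cast<std::uint32_t>(AlignInBits);
}

// True if the 64-bit value survives the round trip through 32 bits.
constexpr bool fitsIn32(std::uint64_t AlignInBits) {
  return AlignInBits <= 0xFFFFFFFFull;
}

// An alignment of 2^29 bytes is 2^32 bits, which truncates to 0.
static_assert(truncateTo32(std::uint64_t{8} << 29) == 0u,
              "the only set bit is lost");
static_assert(!fitsIn32(std::uint64_t{8} << 29), "value needs 33 bits");
```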
Matt Arsenault 07ddfa95e3 GlobalISel: Add G_ASSERT_ALIGN hint instruction
Insert it for call return values only for now, which is the only case
the DAG also handles.
2022-01-12 18:20:58 -05:00
Tomas Matheson 2db4cf5962 clang support for Armv8.8/9.3 HBC
This introduces clang command line support for the new Armv8.8-A and
Armv9.3-A Hinted Conditional Branches feature, previously introduced
into LLVM in https://reviews.llvm.org/D116156.

Patch by Tomas Matheson and Son Tuan Vu.

Differential Revision: https://reviews.llvm.org/D116939
2022-01-12 22:07:35 +00:00
Luís Ferreira 6983968e83 [Demangle] Pass Ret parameter from decodeNumber by reference
Since Ret parameter is never meant to be nullptr, let's pass it by reference instead of a raw pointer.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D117046
2022-01-12 21:57:31 +00:00
Luís Ferreira b21ea1c270 [Demangle] Add support for D types back referencing
This patch adds support for type back referencing, allowing demangling of
compressed mangled symbols with repetitive types.

Signed-off-by: Luís Ferreira <contact@lsferreira.net>

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111419
2022-01-12 21:57:31 +00:00
Luís Ferreira bec08795db [Demangle] Add support for D symbols back referencing
This patch adds support for identifier back referencing allowing compressed
mangled names by avoiding repetitiveness.

Signed-off-by: Luís Ferreira <contact@lsferreira.net>

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111417
2022-01-12 21:57:31 +00:00
Luís Ferreira 669bfcf036 [Demangle] Add minimal support for D simple basic types
This patch implements simple demangling of two basic types to add minimal type functionality. This will later be used in function type parsing. Once that is implemented, we can add the rest of the types and test the result of the type name.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111416
2022-01-12 21:57:30 +00:00
Sanjay Patel 6bd127b079 [InstSimplify] use knownbits to fold more udiv/urem
We could use knownbits on both operands for even more folds (and there are
already tests in place for that), but this is enough to recover the example
from:
https://github.com/llvm/llvm-project/issues/51934
(the tests are derived from the code in that example)

I am assuming no noticeable compile-time impact from this because udiv/urem
are rare opcodes.

Differential Revision: https://reviews.llvm.org/D116616
2022-01-12 14:59:43 -05:00
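One way known bits can justify a udiv/urem fold, as in the InstSimplify commit above, is sketched below. It assumes the fold compares the largest value the dividend could take against a constant divisor, and the message itself does not spell out the exact condition. `KnownBits32`, `maxValue`, and `dividendBelowDivisor` are hypothetical names, not LLVM's KnownBits API.

```cpp
#include <cstdint>

// Hypothetical 32-bit known-bits record: a set bit in Zero means that
// bit of the value is known to be 0; a set bit in One means known 1.
struct KnownBits32 {
  std::uint32_t Zero;
  std::uint32_t One;
};

// Largest value consistent with the known bits: every bit that is not
// known to be zero is assumed to be one.
constexpr std::uint32_t maxValue(KnownBits32 Known) {
  return static_cast<std::uint32_t>(~Known.Zero);
}

// If X can never reach Divisor, then X udiv Divisor == 0 and
// X urem Divisor == X.
constexpr bool dividendBelowDivisor(KnownBits32 X, std::uint32_t Divisor) {
  return maxValue(X) < Divisor;
}
```

For example, a dividend whose top 28 bits are known to be zero is at most 15, so dividing it by 16 would fold, while dividing it by 15 would not.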
Nico Weber 66b2ed477f Revert "[JITLink][AArch64] Add support for splitting eh-frames on AArch64."
This reverts commit 253ce92844.
Breaks tests on Windows, see
https://github.com/llvm/llvm-project/issues/52921#issuecomment-1011118896
2022-01-12 14:40:09 -05:00
Alex Bradbury 33d008b169 [RISCV] Update recently ratified Zb{a,b,c,s} extensions to no longer be experimental
Agreed policy is that RISC-V extensions that have not yet been ratified
should be marked as experimental, and enabling them requires the use of
the -menable-experimental-extensions flag when using clang alongside the
version number. These extensions have now been ratified, so this is no
longer necessary, and the target feature names can be renamed to no
longer be prefixed with "experimental-".

Differential Revision: https://reviews.llvm.org/D117131
2022-01-12 19:33:44 +00:00
Matt Arsenault bd2c01e937 AMDGPU/GlobalISel: Do not use terminator copy before waterfall loops
Stop using the _term variants of the mov to save the initial exec
value before the waterfall loop. This cannot be glued to the bottom of
the block because we may need to spill the result register. Just use a
regular mov, like the loops produced on the DAG path. Fixes some
verification errors with regalloc fast.
2022-01-12 13:44:05 -05:00
Matt Arsenault 8a16201a0b GlobalISel: Fix insert point in localizer
This was inserting the new G_CONSTANT after the use, and the later
block scan would run off the end. Fix calling SkipPHIsAndLabels for no
apparent reason.
2022-01-12 13:44:05 -05:00