Commit Graph

31319 Commits

Author SHA1 Message Date
Konstantin Schwarz d2e66d7fa4 [GlobalISel] Add a combine for and(load , mask) -> zextload
This only handles simple masks, not shifted masks, for now.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D109357
2021-09-16 10:42:46 +02:00
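A small sketch of why the and(load, mask) -> zextload combine above is sound. It assumes a little-endian layout and a mask covering exactly the low 8 bits. The helper names are hypothetical and are not LLVM APIs.

```python
import struct


def wide_load_then_mask(memory: bytes, offset: int) -> int:
    """Load 32 bits (little-endian), then keep the low 8 bits with an AND."""
    (value,) = struct.unpack_from("<I", memory, offset)
    return value & 0xFF


def zext_load(memory: bytes, offset: int) -> int:
    """Load a single byte; a Python int is implicitly zero-extended."""
    return memory[offset]


memory = bytes([0xEF, 0xBE, 0xAD, 0xDE, 0x01])
```

Both helpers read the same byte here; a shifted mask such as 0xFF00 would select a different byte, which is the case the commit says is not handled yet.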
Sam Parker c98a8a09b5 [HardwareLoops] Loop guard intrinsic to recognise zext
If a loop count was initially represented by a 32b unsigned int in C
then the hardware-loop pass can recognise the loop guard and insert
the llvm.test.set.loop.iterations intrinsic. If this was instead an
unsigned short/char then clang inserts a zext instruction to expand
the loop count to an i32. This patch adds the necessary pattern
matching to enable the use of llvm.test.set.loop.iterations in those
cases.

Patch by: sherwin-dc

Differential Revision: https://reviews.llvm.org/D109631
2021-09-16 08:33:16 +01:00
Alok Kumar Sharma a5b72abc9e [DebugInfo] Enhance DIImportedEntity to accept children entities
New field `elements` is added to '!DIImportedEntity', representing
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D109343
2021-09-16 10:41:55 +05:30
Ahmed Bougacha 94a2f9cdb6 [GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI.
The doc comment for isPredecessor says:
  Returns true if \p DefMI precedes \p UseMI or they are the same
  instruction.
And dominates relies on that behavior for its own:
  Returns true if \p DefMI dominates \p UseMI. By definition an
  instruction dominates itself.

Make both statements correct by fixing isPredecessor.
Found by inspection.
2021-09-15 16:45:27 -07:00
Matt Arsenault 87c00878d3 SplitKit: Remove decade old live interval hack
This was trying to fixup broken live intervals coming out of the
coalescer. The verifier is more complete now and no tests seem to fail
without this.
2021-09-15 17:35:59 -04:00
Amara Emerson 5ec1845cad [AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs.
(G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD X, Y), C)

Improves CTMark -Os on AArch64:

Program            before after  diff
           sqlite3 286932 287024  0.0%
                kc 432512 432508 -0.0%
             SPASS 412788 412764 -0.0%
    pairlocalalign 249460 249416 -0.0%
            bullet 475740 475512 -0.0%
    7zip-benchmark 568864 568356 -0.1%
  consumer-typeset 419088 418648 -0.1%
        tramp3d-v4 367628 367224 -0.1%
          clamscan 383184 382732 -0.1%
            lencod 430028 429284 -0.2%
Geomean difference               -0.1%

Differential Revision: https://reviews.llvm.org/D109528
2021-09-14 23:57:41 -07:00
Matt Arsenault 54d755a034 DAG: Fix incorrect folding of fmul -1 to fneg
The fmul is a canonicalizing operation and fneg is not, so this would
break denormals that need flushing and also would not quiet signaling
NaNs. Fold to fsub instead, which is also canonicalizing.
2021-09-14 21:25:02 -04:00
Matt Arsenault 4a36e96c3f RegAllocGreedy: Account for reserved registers in num regs heuristic
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.

The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
2021-09-14 21:00:29 -04:00
Bjorn Pettersson cd2bff1ef1 [StackColoring] Fix a debug invariance problem
Ignore dbg instructions when collecting stack slot markers. This is
to make sure the coloring is invariant regarding presence of dbg
instructions (even in cases when the dbg instructions might be
badly placed in the input).

Differential Revision: https://reviews.llvm.org/D109758
2021-09-14 19:21:56 +02:00
vnalamot 726b5d3416 [RegScavenger][NFC] Refer to the already initialized local variable for spill slot index
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109501
2021-09-13 21:55:33 +05:30
Simon Pilgrim 9db20822f7 [APInt] Add APIntOps::ScaleBitMask helper
APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions.

When traversing through dst/src operands, we have a number of places where these masks need to be widened/narrowed to translate through bitcasts, reductions etc. to a different type.

This patch adds an APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the helper instead of their own implementation.

This came up on D109065, where we currently have to add yet another implementation of the same code.

Differential Revision: https://reviews.llvm.org/D109683
2021-09-13 16:27:12 +01:00
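The widening/narrowing of bit masks described for APIntOps::ScaleBitMask above can be modelled roughly as follows. This is a Python sketch, not LLVM's implementation. The function name and signature are hypothetical, and the narrowing rule (an output bit is set if any bit in its group is set) is an assumption of this sketch.

```python
def scale_bit_mask(mask: int, old_width: int, new_width: int) -> int:
    """Widen or narrow a bit mask; one width must be a multiple of the other."""
    if new_width == old_width:
        return mask
    result = 0
    if new_width > old_width:
        # Widening: splat every source bit across `scale` neighbouring bits.
        assert new_width % old_width == 0
        scale = new_width // old_width
        for i in range(old_width):
            if (mask >> i) & 1:
                result |= ((1 << scale) - 1) << (i * scale)
    else:
        # Narrowing: merge each group of `scale` source bits into one bit
        # (assumed rule: the bit is set if any bit of the group is set).
        assert old_width % new_width == 0
        scale = old_width // new_width
        group = (1 << scale) - 1
        for i in range(new_width):
            if (mask >> (i * scale)) & group:
                result |= 1 << i
    return result
```

For instance, widening the 4-bit mask 0b0101 to 8 bits duplicates each bit, and narrowing an 8-bit mask to 4 bits merges adjacent pairs.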
vnalamot 0fc3ebb70a [SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109674
2021-09-13 20:48:04 +05:30
David Truby 915e9e76bf [llvm][sve] Lowering for VLS masked extending loads
This extends the custom lowering for extending loads on
fixed length vectors in SVE to support masked extending loads.

The existing tests for correct behaviour of masked extending loads
exhibit bad code generation due to the legalisation of i1 vectors.
They have been left as-is and new tests have been added that do not
exhibit this behaviour.

Differential Revision: https://reviews.llvm.org/D108200
2021-09-13 11:13:25 +01:00
Nikita Popov 4189e5fe12 [CGP] Support opaque pointers in address mode fold
Rather than inspecting the pointer element type, use the access
type of the load/store/atomicrmw/cmpxchg.

In the process of doing this, simplify the logic by storing the
address + type in MemoryUses, rather than an Instruction + Operand
pair (which was then used to fetch the address).
2021-09-12 17:43:37 +02:00
Kazu Hirata c9fca53af1 [CodeGen, Target] Use pred_empty and succ_empty (NFC) 2021-09-10 11:11:31 -07:00
Nikita Popov 14afbe9448 [CallLowering] Support opaque pointers
Always use the byval/inalloca/preallocated type (which is required
nowadays), don't fall back on the pointer element type.

This requires adding Function::getParamPreallocatedType() to
mirror the CallBase API, so that the templated code can work with
both.
2021-09-10 18:32:12 +02:00
Sander de Smalen ec7d8d5069 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors (widening).
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type requires widening.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109509
2021-09-10 13:29:26 +01:00
Sander de Smalen 801a745dd2 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors.
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type is either legal
or requires splitting.

The idea is that the operation is broken down into simpler steps,
by first extracting a smaller subvector until the input vector
becomes legal or requires promotion.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109313
2021-09-10 13:29:26 +01:00
Zequan Wu 12f80c0bbd [DebugInfo] Emit DW_AT_inline under -g1/-gmlt
Differential Revision: https://reviews.llvm.org/D109554
2021-09-09 18:59:50 -07:00
Craig Topper 9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecate isNullValue/isAllOnesValue and update in-tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Craig Topper 517728fe1e [SelectionDAG] Use DAG.getNOT to further simplify some code. NFC
Followup to D109483
2021-09-09 10:53:39 -07:00
Nick Desaulniers e69d402088 [NFC] rename member of BitTestBlock and JumpTableHeader
Follow up to suggestions in D109103 via hans:
  I think UnreachableDefault (or UnreachableFallthrough) would be a
  better name now, since it doesn't just omit the range check, it also
  omits the last bit test.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109455
2021-09-09 10:43:00 -07:00
Chris Lattner d51da74889 [CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC. 2021-09-09 10:22:51 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on key constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Chris Lattner 9e46dd965a [APInt.h] Reduce the APInt header file interface a bit. NFC
This moves one mid-size function out of line, inlines the
trivial tcAnd/tcOr/tcXor/tcComplement methods into their only
caller, and moves the magic/umagic functions into SelectionDAG
since they are implementation details of its algorithm. This
also removes the unit tests for magic, but these are already
tested in the divide lowering logic for various targets.

This also upgrades some C style comments to C++.

Differential Revision: https://reviews.llvm.org/D109476
2021-09-08 18:17:07 -07:00
Amara Emerson eae44c8a86 [GlobalISel] Implement merging of stores of truncates.
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the target supports it.

Assuming little endian target:
 i8 *p = ...
 i32 val = ...
 p[0] = (val >> 0) & 0xFF;
 p[1] = (val >> 8) & 0xFF;
 p[2] = (val >> 16) & 0xFF;
 p[3] = (val >> 24) & 0xFF;
=>
 *((i32 *)p) = val;

On CTMark AArch64 -Os this results in a good amount of savings:

Program            before        after       diff
             SPASS 412792       412788       -0.0%
                kc 432528       432512       -0.0%
            lencod 430112       430096       -0.0%
  consumer-typeset 419156       419128       -0.0%
            bullet 475840       475752       -0.0%
        tramp3d-v4 367760       367628       -0.0%
          clamscan 383388       383204       -0.0%
    pairlocalalign 249764       249476       -0.1%
    7zip-benchmark 570100       568860       -0.2%
           sqlite3 287628       286920       -0.2%
Geomean difference                           -0.1%

Differential Revision: https://reviews.llvm.org/D109419
2021-09-08 17:06:33 -07:00
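The store-merging pattern above can be checked with a short sketch. It is a Python model that assumes a little-endian target and a 32-bit value. The function names are hypothetical and only illustrate the equivalence, not the GlobalISel combine itself.

```python
import struct


def narrow_stores(val: int) -> bytes:
    """Four byte-sized stores: p[i] = (val >> 8*i) & 0xFF."""
    buf = bytearray(4)
    for i in range(4):
        buf[i] = (val >> (8 * i)) & 0xFF
    return bytes(buf)


def narrow_stores_reversed(val: int) -> bytes:
    """The same stores in the opposite byte order: p[i] = (val >> (24 - 8*i)) & 0xFF."""
    buf = bytearray(4)
    for i in range(4):
        buf[i] = (val >> (24 - 8 * i)) & 0xFF
    return bytes(buf)


def merged_store(val: int) -> bytes:
    """A single little-endian 32-bit store."""
    return struct.pack("<I", val)


def bswap32(val: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(val.to_bytes(4, "little"), "big")
```

The first pattern matches a plain wide store; the reversed pattern matches a BSWAP followed by a wide store.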
Nick Desaulniers 4331f19d8b [ISEL][BitTestBlock] omit additional bit test when default destination is unreachable
Otherwise we end up with an extra conditional jump, followed by an
unconditional jump off the end of a function, i.e.

  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
  bb.1:
    BT32rr ..
    JCC_1 %bb.2 ...
    JMP_1 %bb.3
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

  Should be equivalent to:
  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
    JMP_1 %bb.2
  bb.1:
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

This can occur since at the higher level IR (Instruction) SwitchInsts
are required to have BBs for default destinations, even when it can be
deduced that such BBs are unreachable.

For most programs, this isn't an issue, just wasted instructions since the
default destination has been statically proven unreachable.

The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to
boot though once D106056 is re-applied.  D106056 makes it more likely
that correlated-value-propagation (CVP) can deduce that the default case
of a SwitchInst is unreachable. The x86_64 kernel uses a binary post
processor called objtool, which emits this warning:

vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't
find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b

I haven't debugged precisely why this causes a failure at boot time, but
fixing this very obvious jump off the end of the function fixes the
warning and boot problem.

Link: https://bugs.llvm.org/show_bug.cgi?id=50080
Fixes: https://github.com/ClangBuiltLinux/linux/issues/679
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109103
2021-09-08 11:03:47 -07:00
David Green d8d24c64fe [DAG] Fix GT -> GE condition when creating SetCC
79845ed6df folded some setcc(ashr) conditions to setcc, but got
the condition for NE incorrect, using GT where it should be using GE.
2021-09-08 12:41:51 +01:00
Evgeny Leviant 93b09a2a5d [LiveDebugValues] Handle spills of indirect debug values correctly
When handling a register spill for an indirect debug value, the LiveDebugValues
pass doesn't add a DW_OP_deref operator. While the machine register holding the
address is spilled, this may in some cases cause the debugger to return the
address of the value instead of the value itself.

Differential revision: https://reviews.llvm.org/D109142
2021-09-08 14:06:08 +03:00
Fraser Cormack 2c5568a6a9 [LegalizeTypes][VP] Add promotion support for binary VP ops
This patch extends the preliminary support for vector-predicated (VP)
operation legalization to include promotion of illegal integer vector
types.

Integer promotion of binary VP operations is relatively simple and
piggy-backs on the non-VP logic, but passes the two extra mask and VP
operands through to the promoted operation.

Tests have been added to the RISC-V target to cover the basic scenarios
for integer promotion for both fixed- and scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108288
2021-09-08 10:22:57 +01:00
Peter Smith 5e71839f77 [MC] Add MCSubtargetInfo to MCAlignFragment
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.

There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.

For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.

This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI, which can differ per fragment.

Note that this involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.

Differential Revision: https://reviews.llvm.org/D45961
2021-09-07 15:46:19 +01:00
Mirko Brkusanin 6c4b634da6 [AMDGPU][GlobalISel] Legalize G_MUL for non-standard types
Legalizing G_MUL for non-standard types (like i33) generated an error. Putting
minScalar and maxScalar instead of clampScalar. Also using new rule, instead
of widening to the next power of 2, widen to the next multiple of the passed
argument (32 in this case), so instead of widening i65 to i128, we widen it to
i96.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D109228
2021-09-07 16:33:24 +02:00
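The difference between the two widening rules mentioned in the G_MUL commit above can be sketched as plain arithmetic. This is Python, and both helper names are hypothetical rather than LegalizerInfo APIs.

```python
def next_power_of_two(bits: int) -> int:
    """Smallest power of two that is >= bits."""
    return 1 << (bits - 1).bit_length()


def next_multiple_of(bits: int, size: int) -> int:
    """Smallest multiple of `size` that is >= bits."""
    return ((bits + size - 1) // size) * size
```

With a size of 32, a 65-bit scalar widens to 96 bits instead of 128, while a 33-bit scalar widens to 64 bits under either rule.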
Mirko Brkusanin 5263bf583a [AMDGPU][GlobalISel] Legalization of G_ROTL and G_ROTR
Add implementation for the legalization of G_ROTL and G_ROTR machine
instructions. They are very similar to funnel shift instructions, the only
difference is funnel shifts have 3 operands, whereas rotate instructions have
two operands, the first being the register that is being rotated and the second
being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering
them into funnel shift instructions if they are legal.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D105347
2021-09-07 16:33:24 +02:00
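The relationship between rotates and funnel shifts described in the G_ROTL/G_ROTR commit above can be sketched like this. It is a Python model over fixed-width unsigned integers; the function names mirror the generic opcodes but are otherwise hypothetical.

```python
def fshl(hi: int, lo: int, amount: int, width: int) -> int:
    """Funnel shift left: shift hi:lo left and keep the high `width` bits."""
    amount %= width
    mask = (1 << width) - 1
    concat = ((hi & mask) << width) | (lo & mask)
    return (concat >> (width - amount)) & mask


def fshr(hi: int, lo: int, amount: int, width: int) -> int:
    """Funnel shift right: shift hi:lo right and keep the low `width` bits."""
    amount %= width
    mask = (1 << width) - 1
    concat = ((hi & mask) << width) | (lo & mask)
    return (concat >> amount) & mask


def rotl(x: int, amount: int, width: int) -> int:
    """Rotate left, expressed as a funnel shift with both inputs equal."""
    return fshl(x, x, amount, width)


def rotr(x: int, amount: int, width: int) -> int:
    """Rotate right, expressed as a funnel shift with both inputs equal."""
    return fshr(x, x, amount, width)
```

Passing the same register as both funnel-shift inputs is what lets the two-operand rotate be lowered to the three-operand funnel shift.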
Mirko Brkusanin 36527cbe02 [AMDGPU][GlobalISel] Legalize memcpy family of intrinsics
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.

Corresponding intrinsics are replaced by a loop that uses loads/stores in the
AMDGPULowerIntrinsics pass unless their length is a constant lower than
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches the legalizer should have a const length argument and should be expanded
into an appropriate number of loads + stores.

Differential Revision: https://reviews.llvm.org/D108357
2021-09-07 12:24:07 +02:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Sanjay Patel e1e4bf174b [DAGCombine] Prevent the transform of combine for multi-use operand
The test is based on a miscompile example in:
https://llvm.org/PR51321

Differential Revision: https://reviews.llvm.org/D107692
2021-09-06 15:30:32 -04:00
Jonas Paulsson 118997d8e9 [SelectionDAGBuilder] Bugfix in visitInlineAsm()
In case of a virtual register tied to a phys-def, the register class needs to
be computed. Make sure that this works generally also with fast regalloc by
using TLI.getRegClassFor() whenever possible, and make only the case of
'Untyped' use getMinimalPhysRegClass().

Fixes https://bugs.llvm.org/show_bug.cgi?id=51699.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109291
2021-09-06 17:46:31 +02:00
David Green 1b83aaaefa [DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold
This appears to produce better code, even if the condition may need to
be replicated.
2021-09-05 16:18:31 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and an inversion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clears all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

This includes a OneUseCheck on the Cmp, which seems to make things a
little worse and will be removed in a followup.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
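The select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C fold above can be checked exhaustively at a small bit width. The sketch below is a Python model assuming BW = 8; the helper names are hypothetical.

```python
BW = 8
MASK = (1 << BW) - 1


def to_signed(v: int) -> int:
    """Interpret the low BW bits of v as a two's-complement value."""
    v &= MASK
    return v - (1 << BW) if v & (1 << (BW - 1)) else v


def ashr(v: int, amount: int) -> int:
    """Arithmetic shift right on a BW-bit value."""
    return (to_signed(v) >> amount) & MASK


def select_form(x: int, c: int) -> int:
    """select_cc setgt X, -1, C, ~C"""
    return (c if to_signed(x) > -1 else ~c) & MASK


def xor_form(x: int, c: int) -> int:
    """xor (ashr X, BW-1), C"""
    return ashr(x, BW - 1) ^ (c & MASK)
```

The ashr by BW-1 yields all-zeros for non-negative X and all-ones for negative X, so the xor either leaves C unchanged or inverts every bit of it.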
David Green 79845ed6df [DAG] Fold setcc eq with ashr to compare to zero.
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C

Differential Revision: https://reviews.llvm.org/D109214
2021-09-05 14:06:47 +01:00
Fangrui Song e03c8d309a [AsmPrinter] Remove unneeded MCSubtargetInfo temporary after D14346. NFC
The temporary object was used as a workaround when the target parser may
change STI. D14346 made the MCSubtargetInfo argument to
createMCAsmParser const, so we no longer need the temporary object.
2021-09-04 10:50:10 -07:00
Konstantin Schwarz 90d5298759 [GlobalISel] Add convenience constructors to MemDesc
This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161
2021-09-03 12:52:18 +02:00
Chen Zheng 34badc409c Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."
This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not the right patch according to comments in D91724

This reverts commit 42eaf4fe0a.
2021-09-03 02:55:43 +00:00
Jessica Paquette 844d8e0337 [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1
This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130
2021-09-02 15:05:31 -07:00
Heejin Ahn 28780e59f6 [WebAssembly] Add Wasm SjLj support
This adds support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md

This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.

This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D108960
2021-09-02 10:51:02 -07:00
David Green 9cb8f4d1ad [ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3        // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638
2021-09-02 13:42:58 +01:00
Roman Lebedev 3f1f08f0ed Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore I take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Fraser Cormack ef78f2106c [LegalizeTypes][VP] Add splitting support for binary VP ops
This patch extends D107904's introduction of vector-predicated (VP)
operation legalization to include vector splitting.

When the result of a binary VP operation needs splitting, all of its
operands are split in kind. The two operands and the mask are split as
usual, and the vector-length parameter EVL is "split" such that the low
and high halves each execute the correct number of elements.

Tests have been added to the RISC-V target to show splitting several
scenarios for fixed- and scalable-vector types. Without support for
`umax` (e.g. in the `B` extension) the generated code starts to branch.
Ideally a cost model would prevent their insertion in the first place.

Through these tests many opportunities for better codegen can be seen:
combining known-undef VP operations and for constant-folding operations
on `ISD::VSCALE`, to name but a few.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107957
2021-09-02 10:15:53 +01:00
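One way to picture the EVL "split" described in the VP splitting commit above is the following Python sketch. It assumes the low half takes min(EVL, half) elements and the high half takes a saturating EVL - half, computed here via a max, which would match the commit's remark about `umax`. The patch's exact node sequence is not shown in this log, so treat the formula and the function name as assumptions.

```python
def split_evl(evl: int, num_elts: int) -> tuple:
    """Split an explicit vector length across the low and high vector halves."""
    half = num_elts // 2
    evl_lo = min(evl, half)
    # Saturating subtraction written as umax(evl, half) - half.
    evl_hi = max(evl, half) - half
    return evl_lo, evl_hi
```

For an 8-element operation with EVL = 5, the low half processes 4 elements and the high half processes 1; with EVL = 3, the high half processes none.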
Abinav Puthan Purayil 0baace5379 [DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine().
Differential Revision: https://reviews.llvm.org/D107551
2021-09-02 11:33:14 +05:30
Roman Lebedev f5753125f0 [Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier
I believe the profitability reasoning here is correct: the
"sub"reg is already located within the 0'th subreg of the wider reg,
so if we have a subvector insertion at index 0 into undef,
then it's always free to do.

After this, D109065 finally avoids the regression in D108382.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109074
2021-09-02 00:54:05 +03:00
Arthur Eubanks 52e6d70c40 [NFC] Use newly introduced *AtIndex methods
Introduced in D108788. These are clearer.
2021-09-01 11:18:41 -07:00
Fraser Cormack 85fd44d7fe [SelectionDAG][NFC] Fix typo in assertion message
s/Uexpected/Unexpected.
2021-09-01 08:55:06 +01:00
Yonghong Song 89424a829f [DWARF] Support new TAG DW_TAG_LLVM_annotation
A new LLVM specific TAG DW_TAG_LLVM_annotation is added.
The name is suggested by Paul Robinson ([1]).
Currently, this tag is used to output __attribute__((btf_tag("string")))
annotations in dwarf. The following is an example for a global
variable with two btf_tag attributes:
  0x0000002a:   DW_TAG_variable
                  DW_AT_name      ("g1")
                  DW_AT_type      (0x00000052 "int")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c")
                  DW_AT_decl_line (8)
                  DW_AT_location  (DW_OP_addr 0x0)

  0x0000003f:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag1")

  0x00000048:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag2")

  0x00000051:     NULL

In the future, DW_TAG_LLVM_annotation may encode other types
of non-string const values.

 [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html

Differential Revision: https://reviews.llvm.org/D106621
2021-08-31 19:22:17 -07:00
Stanislav Mekhanoshin d170945bb2 [RegAlloc] Immediately delete dead instructions with live uses
When RA eliminates a dead def it can either immediately delete
the instruction itself or replace it with KILL to defer the
actual removal. If this instruction has a virtual register use
killing the register it will shrink the LI of the use. However,
if the LI covers the instruction and extends beyond it the
shrink will not happen. In fact it is impossible to shrink
such a use because the KILL is still using it.

If the LI of the use is later split at the KILL and the
KILL itself is eliminated after that point, the new live segment
ends up at an invalid slot index.

This extremely rare condition was hit after D106408 which has
enabled rematerialization of such instructions. The replacement
with KILL is only done for rematerialized defs which became dead
and such rematerialization did not generally happen before.

The patch deletes an instruction immediately if it is a result
of rematerialization and has such a use. An alternative would be
to prohibit a split at a KILL instruction, but it looks like it
is better to split a live range rather than keep a killed
instruction just in case it can be rematerialized further.

Fixes PR51655.

Differential Revision: https://reviews.llvm.org/D108951
2021-08-31 13:46:00 -07:00
Jessica Paquette 94d3ff09cf [GlobalISel] Don't use G_FPTOSI in G_ISNAN legalization
As noted in the comments in D108227, using G_FPTOSI produces wrong results for
G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types.

Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG).
GlobalISel does not distinguish between integer and FP types, so a bitcast would
be meaningless here.
2021-08-31 10:26:42 -07:00
Hussain Kadhem 524ded7d01 [VP] implementation of sdag support for VP memory intrinsics
Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105871
2021-08-31 17:01:50 +02:00
Nemanja Ivanovic 84d4ed1761 Revert "[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item"
This reverts commit 0a6fad754e.
It caused failures on a number of PowerPC bots.
2021-08-31 09:24:50 -05:00
Craig Topper 201f6446da [LegalizeTypes][X86] Improve ExpandIntRes_FP_TO_SINT/ExpandIntRes_FP_TO_UINT when input is SoftPromoteHalf.
Instead of splitting off the fp16 to float conversion and generating
a libcall, we should split the operation into fp16 to float and float
to integer operations. This will allow the float to integer conversion
to go through any custom handling the target has. If the target doesn't
have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/
ExpandIntRes_FP_TO_UINT automatically to create the libcall.

This avoids generating libcalls on 32-bit X86. These library functions may
not exist in 32-bit libgcc. At least for LLVM, we never generate them when
hardware floating point instructions are available.

Differential Revision: https://reviews.llvm.org/D108933
2021-08-30 13:12:59 -07:00
Bjorn Pettersson 789f01283d [SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero
When expanding a SMULFIXSAT ISD node (usually originating from
a smul.fix.sat intrinsic) we've applied some optimizations for
the special case when the scale is zero. The idea has been that
it would be cheaper to use an SMULO instruction (if legal) to
perform the multiplication and at the same time detect any overflow.
And in case of overflow we could use some SELECT:s to replace the
result with the saturated min/max value. The only tricky part
is to know if we overflowed on the min or max value, i.e. if the
product is positive or negative. Unfortunately the implementation
has been incorrect as it has looked at the product returned by the
SMULO to determine the sign of the product. In case of overflow that
product is truncated and won't give us the correct sign bit.

This patch is adding an extra XOR of the multiplication operands,
which is used to determine the sign of the non truncated product.
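
The sign problem can be sketched on 32-bit operands as follows. This is only a model under stated assumptions: the 64-bit product stands in for the SMULO result pair, and the function names `smulSat32` and `smulSat32Buggy` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of smul.fix.sat with scale 0: on overflow, the saturation direction
// comes from the XOR of the operand signs, not from the truncated product.
std::int32_t smulSat32(std::int32_t a, std::int32_t b) {
  const std::int64_t wide = static_cast<std::int64_t>(a) * b;
  if (wide >= INT32_MIN && wide <= INT32_MAX)
    return static_cast<std::int32_t>(wide); // no overflow, product fits
  const bool negative = (a ^ b) < 0;        // sign of the untruncated product
  return negative ? INT32_MIN : INT32_MAX;
}

// Model of the old behaviour: the sign is read from bit 31 of the truncated
// product, which is unrelated to the real sign once overflow has happened.
std::int32_t smulSat32Buggy(std::int32_t a, std::int32_t b) {
  const std::int64_t wide = static_cast<std::int64_t>(a) * b;
  if (wide >= INT32_MIN && wide <= INT32_MAX)
    return static_cast<std::int32_t>(wide);
  const bool truncatedNegative =
      ((static_cast<std::uint64_t>(wide) >> 31) & 1u) != 0;
  return truncatedNegative ? INT32_MIN : INT32_MAX;
}
```

For 65536 * 32768 the true product is 2^31, so the result should saturate to the maximum, while the truncated product has its sign bit set and the old logic picks the minimum.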

This patch fixes PR51677.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D108938
2021-08-30 22:08:26 +02:00
Chih-Ping Chen 070090cfa5 [DebugInfo] Remove the restriction on the size of DIStringType
in DebugHandlerBase::isUnsignedDIType.

Differential Revision: https://reviews.llvm.org/D108559
2021-08-30 15:36:54 -04:00
Nikita Popov 0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger than 32 bits. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.
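
A small sketch of the truncation hazard, assuming a compare whose immediate is squeezed into 32 bits before use. Both function names are hypothetical and do not correspond to the analyzeCompare() interface.

```cpp
#include <cassert>
#include <cstdint>

// Compare against an immediate kept at full 64-bit width.
bool equalsImm64(std::int64_t value, std::int64_t imm) { return value == imm; }

// Compare against an immediate that was first narrowed to its low 32 bits
// (zero-extended back here), which makes distinct immediates collide.
bool equalsImm32(std::int64_t value, std::int64_t imm) {
  const std::uint32_t low = static_cast<std::uint32_t>(imm);
  return value == static_cast<std::int64_t>(low);
}
```

With the immediate 0x100000005, the narrowed compare treats the value 5 as a match while the 64-bit compare does not.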

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
Hongtao Yu f39256e3a5 [CSSPGO] Avoid repeatedly computing md5 hash code for pseudo probe inline contexts.
MD5 hashing is expensive. Use a hash map to look up already computed GUIDs for DWARF names. Saw a 2% build time improvement on an internal large application.
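
The caching idea can be sketched as a plain memoization table. Everything here is an assumption for illustration: the class name `GuidCache` is hypothetical, and `std::hash` is only a stand-in for the MD5-based GUID computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical cache: compute a name's GUID once, then serve it from a map.
class GuidCache {
public:
  std::uint64_t lookup(const std::string &name) {
    auto it = cache_.find(name);
    if (it != cache_.end())
      return it->second;       // already computed, skip the hash
    ++hashComputations_;
    const std::uint64_t guid = expensiveHash(name);
    cache_.emplace(name, guid);
    return guid;
  }

  std::size_t hashComputations() const { return hashComputations_; }

private:
  // Stand-in for the costly hash; NOT MD5.
  static std::uint64_t expensiveHash(const std::string &name) {
    return static_cast<std::uint64_t>(std::hash<std::string>{}(name));
  }

  std::unordered_map<std::string, std::uint64_t> cache_;
  std::size_t hashComputations_ = 0;
};
```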

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D108722
2021-08-30 10:11:47 -07:00
Kazu Hirata c50faffb4e [llvm] Remove redundant calls to str() and c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-08-30 09:05:05 -07:00
Craig Topper 705d005781 [DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate.
The check for whether a rotate is possible occurs before the
memory legality checks for the integer type. So it's possible we
decide we can use a rotate, but then fail the legality checks. If
that happens we should not fall back to a vector type. This triggers
an assertion in the rotate handling when it finds a vector type
instead of an integer type.

In theory we could use a shufflevector in place of the rotate, but
right now I'd just like to fix the crash.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108839
2021-08-30 08:47:15 -07:00
Djordje Todorovic 86f5288eae [LiveDebugValues] Cleanup Transfers when removing Entry Value
If we encounter a new debug value, describing the same parameter,
we should stop tracking the parameter's Entry Value. At that point,
in some cases, the Transfer which uses the parameter's Entry Value,
is already emitted. Thanks to the RemoveRedundantDebugValues pass,
many problems with incorrect instruction order and number of DBG_VALUEs
are fixed. However, we still cannot rely on the rule that each new
debug value is set by the previous non-debug instruction in Machine
Basic Block.

When new parameter debug value triggers removal of Backup Entry Value
for the same parameter, do the cleanup of Transfers emitted from Backup
Entry Values. Get the Transfer Instruction which created the new debug
value and search for debug values already emitted from the to-be-deleted
Backup Entry Value and attached to the Transfer Instruction. If found,
delete the Transfer and remove "primary" Entry Value Var Loc from
OpenRanges.

This patch fixes PR47628.

Patch by Nikola Tesic.

Differential revision: https://reviews.llvm.org/D106856
2021-08-30 14:00:41 +02:00
Simon Pilgrim 7c25a32840 Fix MSVC "signed/unsigned mismatch" comparison warning. NFCI. 2021-08-30 12:11:09 +01:00
“bhkumarn” 0a6fad754e [DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item
This patch emits DW_TAG_namelist and DW_TAG_namelist_item for fortran
namelist variables. DICompositeType is extended to support this fortran
feature.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D108553
2021-08-30 13:40:39 +05:30
Matt Arsenault 1494298b51 GlobalISel: Remove check for empty functions as these are invalid IR 2021-08-27 09:27:06 -04:00
Carl Ritson 5d9de3ea18 [DAGCombine] Allow FMA combine with both FMA and FMAD
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108619
2021-08-27 19:49:35 +09:00
Matt Arsenault 3fdcd9bb13 GlobalISel: Add CallBase to CallLoweringInfo
The DAG version has this, and it is necessary for call lowering to take
advantage of any attributes at the call site.
2021-08-26 21:09:11 -04:00
Craig Topper 8bb24289f3 [SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants.
We can halve the number of mask constants by masking before shl
and after srl.
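
A minimal sketch of the masking order on a 32-bit scalar, written as ordinary C++ rather than as DAG nodes. The function name `bitreverse32` is hypothetical. Each swap stage masks after the right shift and before the left shift, so a single constant serves both halves of the stage.

```cpp
#include <cassert>
#include <cstdint>

// Bit reversal where each stage reuses one mask constant:
// (x >> k) & M handles the high groups, (x & M) << k handles the low groups.
std::uint32_t bitreverse32(std::uint32_t x) {
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);   // swap bits
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);   // swap bit pairs
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);   // swap nibbles
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);   // swap bytes
  x = (x >> 16) | (x << 16);                                 // swap halves
  return x;
}
```

Masking after shl instead would need the complementary constant (for example 0xAAAAAAAA alongside 0x55555555), doubling the number of constants to materialize.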

This can reduce the number of mov immediate or constant
materializations. Or reduce the number of constant pool loads
for X86 vectors.

I think we might be able to do something similar for bswap. I'll
look at it next.

Differential Revision: https://reviews.llvm.org/D108738
2021-08-26 09:33:24 -07:00
Andrew Wei c9066c5d37 [CGP] Fix the crash for combining address mode when having cyclic dependency
In the combination of addressing modes, when replacing the matched phi nodes,
sometimes the phi node to be replaced has been modified. For example,
given matcher sets [A, B] and [C, A], there is a cyclic dependency:
A is replaced by B and C is replaced by A. Because we tried to match a new phi node
to another new phi node, we should ignore new phi nodes when mapping a new phi node to an old one.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D108635
2021-08-26 22:52:42 +08:00
Jay Foad 985eb25546 [MachineScheduler] Fix tracing
Consistently print a newline before "RegionInstrs:".
2021-08-26 09:27:01 +01:00
Heejin Ahn 2f88a30ca6 [WebAssembly] Extract longjmp handling in EmSjLj to a function (NFC)
Emscripten SjLj and (soon-to-be-added) Wasm SjLj transformation share
many steps:
1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB
2. Handle `setjmp` callsites
3. Handle `longjmp` callsites
4. Cleanup and update SSA

1, 3, and 4 are identical for Emscripten SjLj and Wasm SjLj. Only the
step 2 is different. This CL extracts the current Emscripten SjLj's
longjmp callsites handling into a function. The reason to make this a
separate CL is, without this, the diff tool cannot compare things well
in the presence of moved code and added code in the followup Wasm SjLj
CL, and it ends up mixing them together, making the diff unreadable.

Also fixes some typos and variable names. So far we've been calling the
buffer argument to `setjmp` and `longjmp` `jmpbuf`, but the name used in
the man page for those functions is `env`, so updated them to be
consistent.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108728
2021-08-25 15:45:38 -07:00
Heejin Ahn c2c9a3fd9c [WebAssembly] Rename wasm.catch.exn intrinsic back to wasm.catch
The plan was to use `wasm.catch.exn` intrinsic to catch exceptions and
add `wasm.catch.longjmp` intrinsic, that returns two values (setjmp
buffer and return value), later to catch longjmps. But because we
decided not to use multivalue support at the moment, we are going to use
one intrinsic that returns a single value for both exceptions and
longjmps. And even if it's not for that, I now think the naming of
`wasm.catch.exn` is a little weird, because the intrinsic can still take
a tag immediate, which means it can be used for anything, not only
exceptions, as long as that returns a single value.

This partially reverts D107405.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108683
2021-08-25 14:19:22 -07:00
Sanjay Patel e728d1a3e8 [DAGCombiner] create binop nodes with all of expected values
This is another bug exposed by https://llvm.org/PR51612
(and the one that triggered the initial assertion) in the report.

That example was suppressed with:
985b48f183

...but these would still crash because we created nodes
like UADDO without the expected 2 output values.
2021-08-25 16:14:22 -04:00
Sanjay Patel 985b48f183 [DAGCombiner] check uses more strictly on select-of-binop fold
There are 2 bugs here:
1. We were not checking uses of operand 2 (the false value of the select).
2. We were not checking for multiple uses of nodes that produce >1 result.

Correcting those is enough to avoid the crash in the reduced test based on:
https://llvm.org/PR51612

The additional use check on operand 0 (the condition value of the select)
should not strictly be necessary because we are only replacing one use
with another (whether it makes performance sense to do the transform with
that pattern is not clear). But as noted in the TODO, changing that
uncovers another bug.

Note: there's at least one more bug here - we aren't propagating EVTs
correctly, but I plan to fix that in another patch.
2021-08-25 14:14:41 -04:00
Nick Desaulniers 846e562dcc [Clang] add support for error+warning fn attrs
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions that should not be called.

They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.

While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able to use inline asm .err
directives.

These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.

To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr).  Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.

The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.

The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.

Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106030
2021-08-25 10:34:18 -07:00
Jeremy Morse 0116ed0069 [DebugInfo][InstrRef] Don't use instr-ref for unoptimised functions
InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing
optimised code -- however, it's much slower when analysing code compiled
-O0.

To avoid this: don't use instruction referencing for -O0 functions. In the
"pure" case of unoptimised code, this won't really harm the debugging
experience because most variables won't have been promoted off the stack,
so can't go missing. It becomes more complicated when optimised code is
inlined into functions marked optnone; however these are rare, and as -O0
doesn't run many optimisations there should be little damage to the debug
experience as a result.

I've taken the opportunity to refactor testing for instruction-referencing
into a MachineFunction method, which seems the most appropriate place to
put it.

Differential Revision: https://reviews.llvm.org/D108585
2021-08-25 15:10:36 +01:00
Peilin Guo 4c4dbeeeea [DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.
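
The constraint can be written as a one-line predicate. This is a sketch with a hypothetical function name, not the check added by the patch; `resultMinNumElts` stands for the known-minimum element count of the result vector type.

```cpp
#include <cassert>
#include <cstdint>

// An extract index is acceptable only if it is a multiple of the result
// type's known-minimum number of elements.
bool isLegalExtractSubvectorIndex(std::uint64_t index,
                                  unsigned resultMinNumElts) {
  assert(resultMinNumElts != 0 && "result vector must have elements");
  return index % resultMinNumElts == 0;
}
```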

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107795
2021-08-25 19:33:39 +08:00
Jeremy Morse cc1e87bf55 [DebugInfo][InstrRef] Avoid stack-slot-coloring changing codegen due to DI
Stack slot colouring adds "weight" to slots if a non-dbg-value instruction
refers to it. This, unfortunately, means that DBG_PHI instructions can have
an effect on codegen. The fix is very simple, replace isDebugValue with
isDebugInstr.

The regression test contains a scenario that reproduces this problem; I've
represented both normal-debug mode and instr-ref debug mode instructions
in comment lines prefixed with AAAAAA and BBBBBB, and un-comment them with
sed to test that the two different modes produce the same behaviour.

Differential Revision: https://reviews.llvm.org/D108627
2021-08-25 12:04:59 +01:00
Konstantin Schwarz 4b4bc1ea16 [GlobalISel] Do not generate illegal G_SEXTLOADs after legalization
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108626
2021-08-25 10:13:39 +02:00
Vang Thao 549f6a819a [MachineCopyPropagation] Check CrossCopyRegClass for cross-class copys
On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. An intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.

To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.

Issue seen while attempting to optimize another AGPR to AGPR issue:

Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0

After machine-cp:

$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0

Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copies. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.

Reviewed By: lkail, rampitec

Differential Revision: https://reviews.llvm.org/D108011
2021-08-24 21:22:36 -07:00
Stanislav Mekhanoshin 92c1fd19ab Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that it is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and are not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-24 11:09:02 -07:00
Simon Pilgrim 194b08000c [DAG] LoadedSlice::canMergeExpensiveCrossRegisterBankCopy - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
2021-08-24 15:28:30 +01:00
Simon Pilgrim 6de0b55188 [DAG] TransformFPLoadStorePair - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

Differential Revision: https://reviews.llvm.org/D108318
2021-08-24 13:11:27 +01:00
Simon Pilgrim e431b280c9 [DAG] CombineConsecutiveLoads - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64.

Differential Revision: https://reviews.llvm.org/D108307
2021-08-24 12:31:22 +01:00
Stanislav Mekhanoshin 401a45c61b Fix late rematerialization operands check
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered the bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.

In the majority of cases canRematerializeAt() is called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug it has attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX but that is undetected by the check
because it adjusts passed UseIdx to the reg slot, before the kill.
The value is dead at the index passed to the check however.

This change uses the later of the passed UseIdx and its reg slot. This
shall be correct because if we are checking the availability of operands
before an instruction that instruction cannot be the one defining
these operands. If we are checking for late rematerialization we
are really interested if operands live past the instruction.

The bug is not exploitable without D106408, but the fix is needed to reland
the reverted D106408.

Differential Revision: https://reviews.llvm.org/D108475
2021-08-23 12:23:58 -07:00
Jessica Paquette 6760e2a7bc [GlobalISel] Translate @llvm.llround.* -> G_LLROUND
Translate it using `IRTranslator::translateSimpleIntrinsic`.

Differential Revision: https://reviews.llvm.org/D108563
2021-08-23 09:42:53 -07:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
Fangrui Song 1dfb30e54c [TargetCallingConv] Change OutputArg ctor to match its members
This avoids unneeded MVT->EVT conversion.
2021-08-21 16:41:48 -07:00
Jessica Paquette af8e09d4bb [GlobalISel] Add G_LLROUND
Basically the same as G_LROUND. Handles the llvm.llround family of intrinsics.

Also add a helper function to the MachineVerifier for checking if all of the
(virtual register) operands of an instruction are scalars. Seems like a useful
thing to have.

Differential Revision: https://reviews.llvm.org/D108429
2021-08-20 14:07:21 -07:00
Daniel Paoliello 8ecce69594 Fix SEH table addresses for Windows
Issue Details:
The addresses for SEH tables for Windows are incorrect as 1 was unconditionally being added to all addresses. +1 is required for the SEH end address (as it is exclusive), but the SEH start address is inclusive and so should be used as-is.

In the IP2State tables, the addresses are +1 for AMD64 to handle the return address for a call being after the actual call instruction but are as-is for ARM and ARM64 as the `StateFromIp` function in the VC runtime automatically takes this into account and adjusts the address that it is looking up.

Fix Details:
* Split the `getLabel` function into two: `getLabel` (used for the SEH start address and ARM+ARM64 IP2State addresses) and `getLabelPlusOne` (for the SEH end address, and AMD64 IP2State addresses).

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107784
2021-08-20 22:32:12 +03:00
Craig Topper 10020d41ee [TypePromotion] Remove unused IRBuilder object. NFC 2021-08-20 12:20:09 -07:00
Jeremy Morse ce8254d096 [DebugInfo][InstrRef] Correctly ignore DBG_VALUE_LIST in InstrRef mode
This patch makes InstrRefBasedLDV "safe" to work with DBG_VALUE_LISTs. It
doesn't actually interpret them, but it recognises that they specify
variable locations and avoids propagating false locations, which is better
than the current state. Observe the attached test:

 * We avoid propagating DBG_VALUE_LISTs into successor blocks, as they're
   not "currently" supported,
 * We don't propagate other variable locations across DBG_VALUE_LISTs,
   because we know that the variable location is terminated by the
   DBG_VALUE_LIST.

Differential Revision: https://reviews.llvm.org/D108143
2021-08-20 14:51:02 +01:00
Jeremy Morse c76c24e40b [DebugInfo][InstrRef] Remove a faulty assertion
This patch removes an assertion, and adds a regression test showing why the
assertion is broken.

For context, LocIdx is a key/index number for machine locations, so that we
can describe locations as a single integer and ignore whether they're on
the stack, in registers or otherwise. Back when InstrRefBasedLDV was added,
I happened to bake in a "special" zero number for various reasons, which
Vedant identified as undesirable in this review comment:
https://reviews.llvm.org/D83047#inline-765495 . I subsequently removed that
special zero number, but it looks like I didn't delete this assertion at
the time, which assumes that a zero LocIdx is invalid.

The attached test shows that this assertion is reachable on valid code --
on x86 $rsp always gets the LocIdx number zero, and if you transfer a
variable value into it, InstrRefBasedLDV crashes on that assertion. The
code might be a bit wild to be storing variables to $rsp like that, however
we shouldn't crash on it.

Differential Revision: https://reviews.llvm.org/D108134
2021-08-20 14:23:32 +01:00
Anshil Gandhi 508b06699a [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions
Produce remarks when atomic instructions are expanded into hardware instructions
in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd
instructions.

Differential Revision: https://reviews.llvm.org/D108150
2021-08-19 20:51:19 -06:00
Jessica Paquette 3207ed196c [GlobalISel] Add IRTranslator support for @llvm.lround.* -> G_LROUND
Translate the `@llvm.lround.*` family to G_LROUND via
`IRTranslator::translateSimpleIntrinsic`.

Differential Revision: https://reviews.llvm.org/D108418
2021-08-19 17:08:08 -07:00
Jessica Paquette 3118926483 [GlobalISel] Add a G_LROUND instruction
Meant to represent the `@llvm.lround.*` family.

Add the opcode, docs, and verification.

Differential Revision: https://reviews.llvm.org/D108417
2021-08-19 17:06:24 -07:00
Amara Emerson 95ac3d15e9 [AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization.
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a pow-2 num elements, then we can do
a tree reduction using the scalar operation in the individual elements.
Otherwise, we just create a sequential chain of operations.
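
The two scalarization shapes can be sketched on a plain array of scalars. This is an illustration only: `reduceOr` is a hypothetical name, and the real legalizer builds generic machine instructions rather than looping at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// OR-reduce fully scalarized elements: pairwise tree for power-of-two counts,
// sequential chain otherwise.
std::uint32_t reduceOr(std::vector<std::uint32_t> elts) {
  assert(!elts.empty() && "cannot reduce an empty vector");
  const std::size_t n = elts.size();
  if ((n & (n - 1)) == 0) {
    // Tree reduction: halve the number of live values at every level.
    for (std::size_t width = n / 2; width >= 1; width /= 2)
      for (std::size_t i = 0; i < width; ++i)
        elts[i] |= elts[i + width];
    return elts[0];
  }
  // Sequential chain: fold one element at a time into the accumulator.
  std::uint32_t acc = elts[0];
  for (std::size_t i = 1; i < n; ++i)
    acc |= elts[i];
  return acc;
}
```

The tree form has a dependency depth of log2(n) operations, while the chain has depth n - 1, which is why the tree is preferred when the element count allows it.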

For AArch64, we only need to scalarize if the input is <64b. If it's greater than
64b then we can first do a fewerElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.

I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.

Differential Revision: https://reviews.llvm.org/D108276
2021-08-19 16:38:52 -07:00
Adrian Prantl 1e586bcc3e Move function definition out-of-line to fix the modularized build (NFC) 2021-08-19 12:26:23 -07:00
Craig Topper 84cea602f9 Revert "[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand."
This reverts commit add08c8741.

There was a compile time jump on tramp3d-v4 on https://llvm-compile-time-tracker.com/
Want to see if it goes away with this reverted.
2021-08-19 08:42:05 -07:00
David Green d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat. Instead of
using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
we now use asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD
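
The new sequence can be modelled on 32-bit scalars as below. This is a sketch under stated assumptions: `saddSat32` is a hypothetical name, the arithmetic is done on unsigned values to avoid signed-overflow undefined behaviour, and the final conversion back to a signed type assumes two's complement (guaranteed from C++20).

```cpp
#include <cassert>
#include <cstdint>

// saddsat via "ashr r, BW-1" and "xor s, INTMIN".
std::int32_t saddSat32(std::int32_t x, std::int32_t y) {
  const std::uint32_t ux = static_cast<std::uint32_t>(x);
  const std::uint32_t uy = static_cast<std::uint32_t>(y);
  const std::uint32_t wrapped = ux + uy; // r: the wrapped sum from saddo
  // o: overflow iff x and y share a sign that the wrapped sum does not have.
  const bool overflow = ((~(ux ^ uy) & (ux ^ wrapped)) >> 31) != 0;
  // s = ashr r, 31: all ones when r is negative, zero otherwise.
  const std::uint32_t signMask = (wrapped >> 31) != 0 ? 0xFFFFFFFFu : 0u;
  // xor s, INTMIN: yields INTMAX when r wrapped negative, INTMIN otherwise.
  const std::uint32_t saturated = signMask ^ 0x80000000u;
  return static_cast<std::int32_t>(overflow ? saturated : wrapped);
}
```

On overflow the wrapped sum always has the opposite sign of the true result, which is why its sign bit alone is enough to pick the saturation constant.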

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Craig Topper add08c8741 [SelectionDAGBuilder] Compute and cache PreferredExtendType on demand.
Previously we pre-calculated this and cached it for every
instruction in the function. Most of the calculated results will
never be used. So instead calculate it only on the first use, and
then cache it.

The cache was originally added to fix a compile time issue which
caused r216066 to be reverted.

This change exposed that we weren't pre-computing the Value for
Arguments. I've explicitly disabled that for now as it seemed to
regress some tests on AArch64 which has sext built into its compare
instructions.

Spotted while investigating how to improve heuristics to work better
with RISCV preferring sign extend for unsigned compares for i32 on RV64.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107976
2021-08-19 07:18:33 -07:00
Craig Topper c60a4c1ba5 [TypePromotion] Use Instruction* instead of Value* for a couple functions. NFC
This matches how they are called and allows some isa/cast/dyn_cast
to be removed.

Differential Revision: https://reviews.llvm.org/D108333
2021-08-19 07:09:38 -07:00
Fraser Cormack e6b1ac8546 [LegalizeTypes][VP] Add widening support for binary VP ops
This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.

The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't has not been handled here. That would require a lot of code
without an in-tree target on which to test it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107904
2021-08-19 13:08:47 +01:00
Rong Xu 5fdaaf7fd8 [SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loader
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.

To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
     -disable-ra-fsprofile-loader=false \
     -disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.

Differential Revision: https://reviews.llvm.org/D107878
2021-08-18 18:37:35 -07:00
Kyungwoo Lee 829616c241 [NFC][DebugInfo] getDwarfCompileUnitID
This is a refactoring for the use in https://reviews.llvm.org/D108261

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D108271
2021-08-18 17:35:03 -07:00
Arthur Eubanks 2fc075948c [NFC] Remove some unnecessary AttributeList methods
These rely on methods I'm trying to cleanup.
2021-08-18 11:15:20 -07:00
Jessica Paquette 791006fb8c [GlobalISel] Implement lowering for G_ISNAN + use it in AArch64
GlobalISel equivalent to `TargetLowering::expandISNAN`.

Use it in AArch64 and add a testcase.

Differential Revision: https://reviews.llvm.org/D108227
2021-08-18 10:54:25 -07:00
Jessica Paquette d9873711cb [GlobalISel] Add IRTranslator support for G_ISNAN
Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it.

This is pretty much the same as the associated SelectionDAGBuilder code. Main
difference is that we don't expand it here. It makes more sense to do that
during legalization in GlobalISel. GlobalISel will just legalize the generated
illegal types.

Differential Revision: https://reviews.llvm.org/D108226
2021-08-18 10:48:10 -07:00
Jessica Paquette 0a2b1ba33a [GlobalISel] Add G_ISNAN
Add a generic opcode equivalent to the `llvm.isnan` intrinsic +
MachineVerifier support for it.

We need an opcode here because we may want target-specific lowering later on.

Differential Revision: https://reviews.llvm.org/D108222
2021-08-18 10:42:05 -07:00
Petr Hosek 2d4470ab89 Revert "Allow rematerialization of virtual reg uses"
This reverts commit 877572cc19 which
introduced PR51516.
2021-08-18 00:12:41 -07:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Arthur Eubanks de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Qiu Chaofan 5ca250a03d [RegAlloc] Remove addAllocPriorityToGlobalRanges hook
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D108010
2021-08-18 10:21:27 +08:00
jacquesguan a7ebc4d145 [DAGCombiner] Teach isKnownToBeAPowerOfTwo handle SPLAT_VECTOR
Make DAGCombine turn mul by power of 2 into shl for scalable vectors.
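
A sketch of the underlying rewrite on a fixed-size stand-in for a vector whose multiplier is a splat, meaning the same constant in every lane. All names here are hypothetical, and the loop only mimics what a single vector shift would do.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

bool isPowerOfTwo(std::uint64_t c) { return c != 0 && (c & (c - 1)) == 0; }

// Exact base-2 logarithm of a power of two.
unsigned log2OfPowerOfTwo(std::uint64_t c) {
  assert(isPowerOfTwo(c) && "expects a power of two");
  unsigned shift = 0;
  while ((c >>= 1) != 0)
    ++shift;
  return shift;
}

// Multiply every lane by the splat constant, using a shift when the
// constant is known to be a power of two.
std::vector<std::uint64_t> mulBySplat(std::vector<std::uint64_t> lanes,
                                      std::uint64_t splat) {
  if (isPowerOfTwo(splat)) {
    const unsigned shift = log2OfPowerOfTwo(splat);
    for (auto &lane : lanes)
      lane <<= shift; // mul x, 2^k  ==>  shl x, k
  } else {
    for (auto &lane : lanes)
      lane *= splat;  // no rewrite possible, keep the multiply
  }
  return lanes;
}
```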

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107883
2021-08-18 10:10:40 +08:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Simon Pilgrim d7f288502f SelectionDAGBuilder::visitInlineAsm - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warning.
2021-08-17 18:40:59 +01:00
Fraser Cormack f3e9047249 [VP] Add vector-predicated reduction intrinsics
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when no vector element is active.

Support for expansion on targets without native vector-predication support is
included.

This patch is based on the ["reduction
slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104308
2021-08-17 17:56:35 +01:00
Sebastian Neubauer fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
Tiehu Zhang 9cfa9b44a5 [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
In the current implementation, the instruction to be sunk is inserted before the target instruction without considering the def-use tree,
which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable insertion point according to the use chain.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107262
2021-08-17 18:58:15 +08:00
Jeremy Morse 708cbda577 [DebugInfo][InstrRef] Honour too-much-debug-info cutouts
This reapplies 54a61c94f9 and its follow-up in 547b712500, which were
reverted in 95fe61e639. Original commit message:

VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.

Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.

Differential Revision: https://reviews.llvm.org/D107823
2021-08-17 11:34:49 +01:00
Arthur Eubanks 0d822da2bd [NFC] Remove/replace some confusing attribute getters on Function
2021-08-16 16:12:37 -07:00
Afanasyev Ivan 913b5d2f7a [AsmPrinter] fix nullptr dereference for MBBs with hasAddressTaken property without BB
Basic block pointer is dereferenced unconditionally for MBBs with
hasAddressTaken property.

MBBs might have hasAddressTaken property without reference to BB.
Backend developers must assign a fake BB to the MBB to work around this issue,
and it should be fixed.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108092
2021-08-16 15:32:09 -07:00
Anshil Gandhi f22ba51873 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating a
compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-16 14:56:01 -06:00
Stanislav Mekhanoshin 877572cc19 Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-16 12:42:42 -07:00
Stanislav Mekhanoshin b9e433b02a Prevent machine licm if remattable with a vreg use
Check if a rematerializable instruction does not have any virtual
register uses. Even though it is rematerializable, RA might not actually
rematerialize it in this scenario. In that case we do not want to
hoist such an instruction out of the loop in the belief that RA will sink
it back if needed.

This already has an impact on the AMDGPU target, which does not check for
this condition in its isTriviallyReMaterializable implementation
and has instructions with virtual register uses enabled. The
other targets are not impacted at this point, although they will be when
D106408 lands.

Differential Revision: https://reviews.llvm.org/D107677
2021-08-16 12:09:00 -07:00
Craig Topper 92abb1cf90 [TypePromotion] Don't mutate the result type of SwitchInst.
SwitchInst should have a void result type.

Add a check to the verifier to catch this error.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D108084
2021-08-16 08:54:34 -07:00
Simon Pilgrim d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
Jeremy Morse 95fe61e639 Revert 54a61c94f9 and its follow up in 547b712500
These were part of D107823, however asan has found something excitingly
wrong happening:

https://lab.llvm.org/buildbot/#/builders/5/builds/10543/steps/13/logs/stdio
2021-08-16 15:48:56 +01:00
Jeremy Morse 547b712500 Suppress signedness-comparison warning
This is a follow-up to 54a61c94f9.
2021-08-16 15:29:43 +01:00
Jeremy Morse 54a61c94f9 [DebugInfo][InstrRef] Honour too-much-debug-info cutouts
VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.

Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.

Differential Revision: https://reviews.llvm.org/D107823
2021-08-16 15:06:40 +01:00
Paul Walker cd0e196413 [DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation.
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations.  This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITCAST is itself legal.

Differential Revision: https://reviews.llvm.org/D108086
2021-08-15 18:25:49 +01:00
Qiu Chaofan a240b29f21 [NFC] Simply update a FIXME comment
X86's overridden LowerOperationWrapper was moved to the common implementation
in a7eae62.
2021-08-15 22:43:46 +08:00
Dávid Bolvanský 49de6070a2 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit 435785214f. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.
2021-08-15 11:44:13 +02:00
Anshil Gandhi 435785214f [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating
a compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-14 23:37:23 -06:00
Anshil Gandhi 29e11a1aa3 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit c4e5425aa5.
2021-08-13 23:58:04 -06:00
Anshil Gandhi c4e5425aa5 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpandPass to report atomics generating a compare
and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-13 22:44:08 -06:00
Jessica Paquette 50efbf9cbe [GlobalISel] Narrow binops feeding into G_AND with a mask
This is a fairly common pattern:

```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```

We have combines to eliminate G_AND with a mask that does nothing.

If we combined the above to this:

```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```

We'd be able to take advantage of those combines using the trunc + zext.

For this to work (or be beneficial in the best case)

- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
  value than performing it at the original width *after masking.*
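
The third condition can be checked by hand for a concrete case. The sketch below assumes a 32-bit wide type, an 8-bit narrow type, and a mask of `0xFF`; the function names are made up for illustration. It shows that addition survives narrowing under the mask, while a logical right shift does not, because bits above the narrow width can move down into the masked range.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: add at 32 bits, then mask to the low 8 bits.
uint32_t wideAddMasked(uint32_t lhs, uint32_t rhs) {
  return (lhs + rhs) & 0xFFu;
}

// Narrow form: truncate, add at 8 bits, zero-extend, then mask.
uint32_t narrowAddMasked(uint32_t lhs, uint32_t rhs) {
  uint8_t narrowLhs = static_cast<uint8_t>(lhs);                   // G_TRUNC
  uint8_t narrowRhs = static_cast<uint8_t>(rhs);                   // G_TRUNC
  uint8_t narrowAdd = static_cast<uint8_t>(narrowLhs + narrowRhs); // G_ADD
  uint32_t ext = narrowAdd;                                        // G_ZEXT
  return ext & 0xFFu; // G_AND, which no longer changes the value
}

// Counterexample: a logical right shift by one is NOT safe to narrow.
uint32_t wideShrMasked(uint32_t x) { return (x >> 1) & 0xFFu; }

uint32_t narrowShrMasked(uint32_t x) {
  uint8_t narrow = static_cast<uint8_t>(x);
  uint32_t ext = static_cast<uint8_t>(narrow >> 1);
  return ext & 0xFFu;
}

// Aborts if the two add forms ever disagree for the given inputs.
void checkAddNarrowing(uint32_t lhs, uint32_t rhs) {
  assert(wideAddMasked(lhs, rhs) == narrowAddMasked(lhs, rhs));
}
```

For example, `wideShrMasked(0x100)` yields `0x80`, whereas `narrowShrMasked(0x100)` yields `0`, so a shift like this would fail the "same value after masking" requirement.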

Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj

At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.

Differential Revision: https://reviews.llvm.org/D107929
2021-08-13 18:31:13 -07:00
Matt Arsenault cc56152f83 GlobalISel: Add helper function for getting EVT from LLT
This can only give an imperfect approximation, but is enough to avoid
crashing in places where we call into EVT functions starting from LLTs.
2021-08-13 21:10:13 -04:00
Arthur Eubanks f80ae58068 [NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex)
getAttribute() is confusing, use a clearer method.
2021-08-13 16:27:11 -07:00
Arthur Eubanks d7593ebaee [NFC] Clean up users of AttributeList::hasAttribute()
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().

Add hasRetAttr() since it was missing from AttributeList.
2021-08-13 11:59:18 -07:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Ruiling Song e1beebbac5 SplitKit: Don't further split subrange mask in buildCopy
We may use several COPY instructions to copy the needed sub-registers
during split. But the way we split the lanes during the COPYs may be
different from the subranges of the old register. This would fail when we
extend the subranges of the new register because the LaneMasks do not
match exactly between subranges of new register and old register.
Since we are bundling the COPYs, I think there is no need to further refine the
subranges of the new register based on the set of LaneMasks of the inserted COPYs.

I am not sure if there will be further breaking cases. But as the subranges of
new register are created based on the LaneMasks of the subranges of old register,
it will be highly possible we will always find an exact LaneMask match.
We can think about how to make the extendPHIKillRanges() work for
subrange mask mismatch case if we meet more such cases in the future.

The test case was from D105065 by @arsenm.

Differential Revision: https://reviews.llvm.org/D107829
2021-08-13 07:36:38 +08:00
Rong Xu 4c5909ba83 [SampleFDO] Add two passes of MIRAddFSDiscriminatorsPass
This patch adds Pass1 of MIRAddFSDiscriminatorsPass before register
allocation, and Pass2 of MIRAddFSDiscriminatorsPass before
Block-Placement. This is still under the --enable-fs-discriminator
option (default false).

This would reduce the turn-around time for FSAFDO transition.

Differential Revision: https://reviews.llvm.org/D104579
2021-08-11 11:11:04 -07:00
Fraser Cormack 885be620f9 [LegalizeTypes][NFC] Remove else-after-return
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107890
2021-08-11 16:48:28 +01:00
Rainer Orth 7bbbf29561 [ELF] Don't emit SHF_GNU_RETAIN on Solaris
The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris.

Initially, as reported in Bug 49437, it caused dozens of testsuite failures
on both sparc and x86.  The objects were marked as `ELFOSABI_NONE`, but
`SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag
(in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a
completely different semantics, which confuses Solaris `ld` very much.

Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris
`ld` doesn't support, causing it to SEGV and break the build.  The linker
is currently being hardened to not accept non-native OS ABIs to avoid this.

The need for linker support is already documented in
`clang/include/clang/Basic/AttrDocs.td`, but not currently checked.

This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all.

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D107747
2021-08-11 09:27:51 +02:00
madhur13490 61526b1262 [DAG] Reword comment for EnforceNodeIdInvariant and InvalidateNodeId. NFC.
Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D107845
2021-08-11 12:14:28 +05:30
Craig Topper a8ae41fb51 [SelectionDAGBuilder] Save iterator to avoid second DenseMap lookup. NFC
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.
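
The pattern is the same for any associative container. A minimal sketch, using `std::unordered_map` as a stand-in for `DenseMap` (the function names and the `-1` sentinel are assumptions for illustration):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Map = std::unordered_map<std::string, int>;

// Before: find() hashes the key once, operator[] hashes it again.
int lookupTwice(Map &m, const std::string &key) {
  if (m.find(key) != m.end())
    return m[key];
  return -1;
}

// After: keep the iterator returned by find() and read through it.
int lookupOnce(const Map &m, const std::string &key) {
  auto it = m.find(key);
  if (it != m.end())
    return it->second;
  return -1;
}

// Aborts if the two forms disagree for the given key.
void checkLookupsAgree(Map &m, const std::string &key) {
  assert(lookupTwice(m, key) == lookupOnce(m, key));
}
```

The single-lookup form also works on a `const` map, since it never calls `operator[]`.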

Just happened to notice while investigating how we decide what extends
to use between basic blocks.
2021-08-10 22:37:48 -07:00
Christopher Di Bella c874dd5362 [llvm][clang][NFC] updates inline licence info
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.

Differential Revision: https://reviews.llvm.org/D107528
2021-08-11 02:48:53 +00:00
Amara Emerson 7ec4ce157b [AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality.
With contributions by Sebastian Neubauer

Differential Revision: https://reviews.llvm.org/D105676
2021-08-10 16:41:18 -07:00
Adrian Prantl d6b6880172 Streamline the API of salvageDebugInfoImpl (NFC)
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.

  1. Change the return value to I.getOperand(0). Currently users of
     salvageDebugInfoImpl() assume that the first operand is
     I.getOperand(0). This patch makes this information explicit. A
     nice side-effect of this change is that it allows us to salvage
     expressions such as add i8 1, %a in the future.

  2. Factor out the creation of a DIExpression and return an array of
     DIExpression operations instead. This change allows users that
     call salvageDebugInfoImpl() in a loop to avoid the costly
     creation of temporary DIExpressions and to defer the creation of
     a DIExpression until the end.

This patch does not change any functionality.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D107383
2021-08-10 15:21:18 -07:00
Jinsong Ji 2cfd427626 [AIX] Don't crash on unimplemented lowerRelativeReference
We may call lowerRelativeReference in MC to determine whether target
supports this lowering. We should return nullptr instead of crashing
when we haven't implemented the real lowering.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D107830
2021-08-10 17:43:06 +00:00
Matt Arsenault 1b41945da0 RegAllocGreedy: Add spaces between registers in debug message
2021-08-10 13:12:34 -04:00
Konstantin Schwarz 64bef13f08 [GlobalISel] Look through truncs and extends in narrowScalarShift
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.

However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.

This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.
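
To illustrate why a known constant shift amount helps, here is a sketch of a 64-bit left shift expressed on two 32-bit halves. It is an analogy in plain C++, not the legalizer code; `Halves`, `shl64ViaHalves`, and `join` are hypothetical names. With a constant amount, only one of the branches survives, so no variable-amount intermediate shifts are needed.

```cpp
#include <cassert>
#include <cstdint>

struct Halves {
  uint32_t lo;
  uint32_t hi;
};

// Shifts the 64-bit value (hi:lo) left by amt using only 32-bit operations.
Halves shl64ViaHalves(uint32_t lo, uint32_t hi, unsigned amt) {
  assert(amt < 64 && "shift amount must be smaller than the wide type");
  if (amt == 0)
    return {lo, hi};
  if (amt < 32) // bits shifted out of lo move into the bottom of hi
    return {lo << amt, (hi << amt) | (lo >> (32 - amt))};
  // amt in [32, 63]: the low half becomes zero, the high half comes from lo
  return {0u, lo << (amt - 32)};
}

// Reassembles the two halves, for comparison with a native 64-bit shift.
uint64_t join(Halves h) {
  return (static_cast<uint64_t>(h.hi) << 32) | h.lo;
}
```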

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D89100
2021-08-10 13:49:22 +02:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Jeremy Morse d4ce9e463d [DWARF] Revert sharing subprograms across CUs
This patch is a revert of e08f205f5c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.

There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.

Three tests change in addition to the reversion, but they're all very
light alterations.

Differential Revision: https://reviews.llvm.org/D107076
2021-08-09 12:43:43 +01:00
Luo, Yuanke 53642d5b80 [NFC] Fix the formula for reciprocal calculation.
Differential Revision: https://reviews.llvm.org/D107713
2021-08-09 16:03:56 +08:00
Amara Emerson 4c2e01232c [GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs.
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().

We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
2021-08-07 01:41:02 -07:00
Nemanja Ivanovic 62fe3dcf98 Fix PPC buildbot break caused by 4c4093e6e3
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
  not an IEEE-compliant type
- Does not have a representation that allows for straightforward
  reinterpreting as an integer and use of integer operations

The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.

This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is in line with previous semantics.
2021-08-06 22:10:20 -05:00
Amara Emerson 2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Jon Roelofs eae4a44c1d [GlobalISel][KnownBits] Implement G_CTPOP
Implementation copied almost verbatim from ValueTracking.

Differential revision: https://reviews.llvm.org/D107606
2021-08-06 09:48:39 -07:00
Craig Topper b2ca4dc935 [LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available.
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.
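
The overflow check described above can be sketched for a 32-bit multiply. This version assumes a 64-bit type is available to hold the full product, which the real expansion cannot assume (there each piece is legalized separately); `MulResult` and `smulo32` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

struct MulResult {
  int32_t value;  // low half of the product
  bool overflow;  // true if the product does not fit in 32 signed bits
};

MulResult smulo32(int32_t a, int32_t b) {
  // Full 64-bit product of the sign-extended operands.
  const int64_t full = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  const uint64_t bits = static_cast<uint64_t>(full);
  const uint32_t lo = static_cast<uint32_t>(bits);
  const uint32_t hi = static_cast<uint32_t>(bits >> 32);

  // The high half must equal the sign of the low half replicated 32 times.
  const uint32_t signOfLo = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  const bool overflow = hi != signOfLo;

  const int32_t value = static_cast<int32_t>(lo);
  // Sanity check: without overflow the truncated value is the full product.
  assert(overflow || static_cast<int64_t>(value) == full);
  return {value, overflow};
}
```

For instance, `smulo32(65536, 65536)` reports overflow because the high half is `1` while the low half is non-negative.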

This can make D107420 unnecessary.

Needs tests, but I wanted to start discussion about D107420.

Reviewed By: FreddyYe

Differential Revision: https://reviews.llvm.org/D107581
2021-08-06 09:43:01 -07:00
Kazu Hirata 276be84d0a [CodeGen] Remove computeDefOperandLatency (NFC)
The last use was removed on Oct 9, 2016 in commit
5c924d7117.
2021-08-06 08:26:55 -07:00
Jay Foad 57b9107e3f [GlobalISel] Improve widening of cttz/cttz_zero_undef
Differential Revision: https://reviews.llvm.org/D107631
2021-08-06 14:25:56 +01:00
Jay Foad cd2594e1c6 [GlobalISel] Improve legalization of narrow CTTZ
Differential Revision: https://reviews.llvm.org/D107457
2021-08-06 09:40:48 +01:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is a recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually do not have such a
      property; they raise an 'invalid' exception in such a case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.
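
    The second mechanism (integer operations) can be sketched as follows
    for a 32-bit float. This is an illustration under the assumption of
    the IEEE 754 binary32 layout, not the code from D95948, and the names
    `fromBits` and `isNaNViaIntegerOps` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

// Builds a float from a raw bit pattern (assumes IEEE 754 binary32).
float fromBits(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// NaN check using only integer operations on the bit pattern: after
// dropping the sign bit, a NaN has an all-ones exponent and a non-zero
// mantissa, so its magnitude compares greater than that of infinity.
bool isNaNViaIntegerOps(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  const bool result = (bits & 0x7FFFFFFFu) > 0x7F800000u;
  // Cross-check against the comparison-based form ('fcmp uno' style).
  assert(result == (f != f));
  return result;
}
```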

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Sean Fertile 23651c5ae0 [PowerPC][AIX] Create multiple constant sections.
Fixes issue where late materialized constants can be more strictly
aligned than their containing csect.

Differential Revision: https://reviews.llvm.org/D103103
2021-08-05 21:19:16 -04:00
Jon Roelofs 5fc7b1a260 Revert "[GlobalISel][KnownBits] Implement G_CTPOP"
This reverts commit ce6eb4f15a.

It's broken on the windows bots: https://reviews.llvm.org/D107606#2930121
2021-08-05 17:47:47 -07:00
Jon Roelofs ce6eb4f15a [GlobalISel][KnownBits] Implement G_CTPOP
Implementation copied almost verbatim from ValueTracking.

Differential revision: https://reviews.llvm.org/D107606
2021-08-05 17:17:29 -07:00
Craig Topper f7076cfd3a [DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits constant folding.
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.
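
The leading-zeros argument can be checked with concrete values. In the sketch below (names are illustrative, and concrete operands stand in for known-bits information), if the operands have `lzA` and `lzB` leading zeros then the product is below `2^(64 - lzA - lzB)`, so the high 32 bits are zero whenever `lzA + lzB >= 32`.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits in a 32-bit value (32 for zero).
unsigned countLeadingZeros32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// High 32 bits of the unsigned 64-bit product, i.e. what MULHU computes.
uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// True when the leading zeros alone prove that mulhu32(a, b) is zero.
bool highHalfKnownZero(uint32_t a, uint32_t b) {
  const bool known = countLeadingZeros32(a) + countLeadingZeros32(b) >= 32;
  assert(!known || mulhu32(a, b) == 0);
  return known;
}
```

The check is conservative: it may return false for operands whose high product happens to be zero, but it never returns true incorrectly.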

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106471
2021-08-05 08:31:26 -07:00
Simon Pilgrim 2cbf9fd402 [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Differential Revision: https://reviews.llvm.org/D107068
2021-08-05 15:40:48 +01:00
Paul Robinson 75aa3d520d Add a DIExpression const-folder to prevent silly expressions.
It's entirely possible (because it actually happened) for a bool
variable to end up with a 256-bit DW_AT_const_value.  This came about
when a local bool variable was initialized from a bitfield in a
32-byte struct of bitfields, and after inlining and constant
propagation, the variable did have a constant value. The sequence of
optimizations had it carrying "i256" values around, but once the
constant made it into the llvm.dbg.value, no further IR changes could
affect it.

Technically the llvm.dbg.value did have a DIExpression to reduce it
back down to 8 bits, but the compiler is in no way ready to emit an
oversized constant *and* a DWARF expression to manipulate it.
Depending on the circumstances, we had either just the very fat bool
value, or an expression with no starting value.

The sequence of optimizations that led to this state did seem pretty
reasonable, so the solution I came up with was to invent a DWARF
constant expression folder.  Currently it only does convert ops, but
there's no reason it couldn't do other ops if that became useful.

This broke three tests that depended on having convert ops survive
into the DWARF, so I added an operator that would abort the folder to
each of those tests.

Differential Revision: https://reviews.llvm.org/D106915
2021-08-05 06:14:40 -07:00
Petar Avramovic 66de26b1f9 GlobalISel: Fix matchEqualDefs for instructions with multiple defs
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.

Differential Revision: https://reviews.llvm.org/D107362
2021-08-05 15:05:45 +02:00
Dominik Montada cc947e29ea [GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107330
2021-08-05 13:52:10 +02:00
Fraser Cormack 0b8471e91b [SelectionDAG] Correctly determine the VECREDUCE_SEQ_FMUL action
The LegalizeAction for this node should follow the logic for
`VECREDUCE_SEQ_FADD` and be determined using the vector operand's type.

There isn't an in-tree target that makes use of this, but I think it's safe to
say this is how it should behave, should a target want to customize the action
for this node.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107478
2021-08-05 09:42:33 +01:00
Fangrui Song a194438615 [CodeGen] Add -align-loops
to `lib/CodeGen/CommandFlags.cpp`. It can replace
-x86-experimental-pref-loop-alignment=.

The loop alignment is only used by MachineBlockPlacement.
The implementation uses a new `llvm::TargetOptions` for now, as
an IR function attribute/module flags metadata may be overkill.

This is the llvm part of D106701.
2021-08-04 12:45:18 -07:00
Craig Topper c23405174a [DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS.
This allows special constants like 0 to be recognized. It's also
expected by isel patterns if a target had mulh-with-immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107486
2021-08-04 11:39:23 -07:00
David Green eeddcba525 [RDA] Attempt to make RDA subreg aware
This attempts to make more of RDA aware of potentially overlapping
subregisters. Some of this was already in place, with it iterating
through MCRegUnitIterators. This also replaces calls to
LiveRegs.contains(..) with !LiveRegs.available(..), and updates the
isValidRegUseOf and isValidRegDefOf to search subregs.

Differential Revision: https://reviews.llvm.org/D107351
2021-08-04 14:21:32 +01:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported, mainly in test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has a builtin function '__builtin_isnan', which implements the C
library function 'isnan'. This function is currently implemented entirely
in clang codegen, which expands it into a set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. The corresponding IEEE 754 operation and C
  function must never raise an FP exception, even if the argument is a
  signaling NaN. Compare instructions usually do not have this
  property; they raise the 'invalid' exception in such a case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular, it could result in unexpected trapping if the argument is
  an SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers a single solution for
  all targets, although some can do the check in a more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.
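
As an illustration of the second mechanism, an integer-only check might look like the sketch below. This is not the code from D95948; the function names are hypothetical and the sketch assumes 'float' is IEEE-754 binary32. A NaN has an all-ones exponent and a non-zero mantissa, so once the sign bit is cleared its bit pattern compares greater than the pattern of infinity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Returns true if X is a NaN, using only integer operations on its bits.
// Assumption: float is IEEE-754 binary32 (checked at compile time).
bool isNaNBits(float X) {
  static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
                "sketch assumes IEEE-754 binary32");
  uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits)); // bit copy, no FP comparison
  // Clear the sign bit, then compare against the encoding of +infinity.
  return (Bits & 0x7FFFFFFFu) > 0x7F800000u;
}

// A few representative inputs.
void isNaNBitsExamples() {
  assert(isNaNBits(std::numeric_limits<float>::quiet_NaN()));
  assert(!isNaNBits(std::numeric_limits<float>::infinity()));
  assert(!isNaNBits(0.0f));
}
```

Because no floating point comparison is executed, the check cannot raise the 'invalid' exception.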

Although these mechanisms allow 'isnan' to be implemented efficiently
enough, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang; in this case the treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such an option is
specified, 'fcmp uno' may be optimized to 'false'. It is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such a function returns the assumed result instead of actually making
the check, it becomes useless in many cases. The option '-ffast-math' is
often used for performance-critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
the assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by the IEEE-754
and C standards in a target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where it is lowered in a target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Heejin Ahn 9bd02c433b [WebAssembly] Misc. cosmetic changes in EH (NFC)
- Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are
  planning to add a separate `wasm.catch.longjmp` intrinsic which
  returns two values.
- Rename several variables
- Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall`
  from LowerEmscriptenEHSjLj pass
- Add `-verify-machineinstrs` in a test as a safety measure
- Add more comments + fix some errors in comments
- Replace `std::vector` with `SmallVector` for cases likely with small
  number of elements
- Rename `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are
  soon going to add `EnableWasmSjLj`, so this makes the distinction
  clearer

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D107405
2021-08-03 21:03:46 -07:00
Arthur Eubanks ad25344620 [MC][CodeGen] Emit constant pools earlier
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.

We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.

Fixes PR51208

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107314
2021-08-03 20:55:31 -07:00
Simon Pilgrim 11396641e4 [DAG] Cleanup DAGCombiner::CombineConsecutiveLoads early-outs. NFCI.
We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together.

Noticed while triaging PR45116
2021-08-03 13:47:55 +01:00
Eli Friedman 1f62af6346 [AArch64][SelectionDAG] Support passing/returning scalable vectors with unusual types.
This adds handling for two cases:

1. A scalable vector where the element type is promoted.
2. A scalable vector where the element count is odd (or more generally,
   not divisible by the element count of the part type).

(Some element types still don't work; for example, <vscale x 2 x i128>,
or <vscale x 2 x fp128>.)

Differential Revision: https://reviews.llvm.org/D105591
2021-08-02 15:53:16 -07:00
Max Kazantsev c5b63714b5 [GC][NFC] Make getGCStrategy by name available in IR
We might want to use info from GC strategy in middle end analysis.
The motivation for this is provided in D99135: we may want to ask
a GC if it's going to work with a given pointer (currently this code
makes a naive check by the method name).

Differential Revision: https://reviews.llvm.org/D100559
Reviewed By: reames
2021-08-02 14:26:04 +07:00
Matt Arsenault ebc17a0d68 GlobalISel: Scalarize unaligned vector stores
This has the same problems and limitations as the load path.
2021-07-31 10:37:15 -04:00
Simon Pilgrim 3a7c82efb8 [DAG] isGuaranteedNotToBeUndefOrPoison - handle ISD::BUILD_VECTOR nodes
If all demanded elements of the BUILD_VECTOR pass an isGuaranteedNotToBeUndefOrPoison check, then we can treat this specific demanded use of the BUILD_VECTOR as guaranteed not to be undef or poison either.
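
The rule can be sketched as a small standalone model. This is not the SelectionDAG implementation: `Element` and `buildVectorNotUndefOrPoison` are hypothetical names, and the demanded lanes are assumed to be given as a 64-bit mask rather than LLVM's APInt. Lanes that are not demanded are skipped, so an undef or poison value in an unused lane does not affect the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one BUILD_VECTOR operand (not an LLVM type).
struct Element {
  bool NotUndefOrPoison; // outcome of the per-element check
};

// Bit I of DemandedElts is set when lane I of the vector is actually used.
// Returns true when every demanded lane passes its per-element check.
bool buildVectorNotUndefOrPoison(const std::vector<Element> &Elts,
                                 uint64_t DemandedElts) {
  assert(Elts.size() <= 64 && "sketch supports at most 64 lanes");
  for (std::size_t I = 0; I < Elts.size(); ++I) {
    bool Demanded = ((DemandedElts >> I) & 1u) != 0;
    if (Demanded && !Elts[I].NotUndefOrPoison)
      return false; // a demanded lane may be undef or poison
  }
  return true;
}
```

For a four-lane vector whose lane 1 may be poison, the mask 0xD (lanes 0, 2 and 3) yields true while the mask 0xF yields false.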

Differential Revision: https://reviews.llvm.org/D107174
2021-07-31 15:08:25 +01:00
Matt Arsenault bc2cb91a20 GlobalISel: Have lowerStore handle some unaligned stores
This is NFC until some of the AMDGPU legalization rules are ripped
out.
2021-07-31 10:01:42 -04:00
Alexandros Lamprineas 7d940432c4 [AArch64] Legalize MVT::i64x8 in DAG isel lowering
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.

Differential Revision: https://reviews.llvm.org/D94097
2021-07-31 09:51:28 +01:00
Alexandros Lamprineas 3094e5389b [AArch64] Add a Machine Value Type for 8 consecutive registers
Adds MVT::i64x8, a Machine Value Type needed for lowering inline assembly
operands which materialize a sequence of eight general purpose registers.

Differential Revision: https://reviews.llvm.org/D94096
2021-07-31 09:51:28 +01:00
Rahman Lavaee 2256b359d7 Explain the symbols of basic block clusters with an example in the header comments.
This prevents confusion with the ``labels`` option.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D107128
2021-07-30 12:08:04 -07:00
Simon Pilgrim 3c0b596ecc SelectionDAGDumper.cpp - remove nested if-else return chain. NFCI.
Match style and don't use an else after a return.
2021-07-30 19:23:05 +01:00
Simon Pilgrim 986841cca2 SelectionDAGDumper.cpp - printrWithDepthHelper - remove dead code. NFCI.
Fixes coverity warning - we have an early-out for unsigned depth == 0, so the depth < 1 early-out later on is dead code.
2021-07-30 19:23:04 +01:00
Matt Arsenault e46badd4e9 GlobalISel: Have lowerLoad scalarize unaligned vectors
This could be smarter by picking an ideal type, or at least splitting
the vector in half first. Also handles lowering for non-power-of-2,
non-extending vector loads.

Currently this just avoids failing to legalize some odd vector AMDGPU
tests, but is a step towards removing the split logic from the
NarrowScalar logic.
2021-07-30 13:23:29 -04:00
Matt Arsenault f19226dda5 GlobalISel: Have load lowering handle some unaligned accesses
The code for splitting an unaligned access into 2 pieces is
essentially the same as for splitting a non-power-of-2 load for
scalars. It would be better to pick an optimal memory access size and
directly use it, but splitting in half is what the DAG does.
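
The split-in-half strategy can be illustrated in plain C++. This is only a sketch and not the GlobalISel lowering: the function names are hypothetical and a little-endian byte order is assumed. A 32-bit load is performed as two 16-bit loads whose results are merged, and each 16-bit load is in turn assembled from two bytes.

```cpp
#include <cassert>
#include <cstdint>

// Load a 16-bit little-endian value one byte at a time, so no alignment
// is required of P.
uint32_t loadLE16(const unsigned char *P) {
  assert(P && "null buffer");
  return static_cast<uint32_t>(P[0]) | (static_cast<uint32_t>(P[1]) << 8);
}

// Load a 32-bit little-endian value as two halves and merge them: the low
// half comes from the lower address, the high half from two bytes later.
uint32_t loadLE32Split(const unsigned char *P) {
  uint32_t Lo = loadLE16(P);
  uint32_t Hi = loadLE16(P + 2);
  return Lo | (Hi << 16);
}
```

Reading from an odd offset into a byte buffer exercises the unaligned case, since neither half needs more than byte alignment.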

As-is this fixes handling of some unaligned sextload/zextloads for
AMDGPU. In the future this will help drop the ugly abuse of
narrowScalar to handle splitting unaligned accesses.
2021-07-30 12:55:58 -04:00
Adrian Prantl c5d84d2eb3 GlobalISel/AArch64: don't optimize away redundant branches at -O0
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.

The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:

    public func f(b: Bool) {
      guard b else {
        return // I would like to set a breakpoint here.
      }
      ...
    }

The patch modifies two places in GlobalISel: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.

Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.

rdar://79515454

This tentatively reapplies the patch without modifications; the LLDB
test that has blocked this from landing previously has since been
modified to hopefully no longer be sensitive to this change.

Differential Revision: https://reviews.llvm.org/D105238
2021-07-29 16:04:22 -07:00
Amara Emerson c54d5c9756 [GlobalISel] Use GMergeLikeOp to simplify a combine. NFC.
2021-07-29 13:53:16 -07:00