Commit Graph

29920 Commits

Author SHA1 Message Date
Amy Huang 7156910d85 [CodeView] Encode signed int values correctly when emitting S_CONSTANTs
Differential Revision: https://reviews.llvm.org/D90199
2020-10-30 09:28:41 -07:00
Nikita Popov 6c2ad4cf87 [SDAG] Extract helper to determine neutral element (NFC)
Make the existing VECREDUCE based code more generic, but expressing
it in terms of the neutral value of the base opcode instead.
2020-10-29 22:05:06 +01:00
Nikita Popov a5f172927d [SDAG] Fix neutral value for vecreduce_fadd
The neutral value for FADD is -0.0, not 0.0, so this is what we
need to pad vectors with.
2020-10-29 21:27:59 +01:00
Nikita Popov 91bf172088 [SDAG] Extract helper to get vecreduce base opcode (NFC) 2020-10-29 20:22:22 +01:00
dfukalov b3cdaef518 [MIR] Fix out of bounds access in MIRPrinter.
Fixes: SWDEV-256460

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90239
2020-10-29 14:35:06 +03:00
Alok Kumar Sharma 930a8c60b6 [DebugInfo] [NFCI] Adding a missed out line in support for DW_TAG_generic_subrange.
This commit adds a missed out line in earlier commit for DW_TAG_generic_subrange.
Previous commit ID: a6dd01afa3
Differential Revision: https://reviews.llvm.org/D89218
Thanks markus for pointing this out.
2020-10-29 16:18:20 +05:30
David Green a4b6b1e1c8 [InterleaveAccess] Recognise Interleave loads through binary operations
Instcombine will currently sink identical shuffles though vector binary
operations. This is probably generally useful, but can break up the code
pattern we use to represent an interleaving load group. This patch
reverses that in the InterleaveAccessPass to re-recognise the pattern of
shuffles sunk past binary operations and folds them back if an
interleave group can be created.

Differential Revision: https://reviews.llvm.org/D89489
2020-10-29 09:13:23 +00:00
Vedant Kumar ffba94a9ac Revert "[DebugInfo] Fix legacy ZExt emission when FromBits >= 64 (PR47927)"
This reverts commit 9905346221.

It breaks the compiler-rt build, see https://reviews.llvm.org/D89838
2020-10-28 18:57:17 -07:00
Vedant Kumar 4fe81b6b6a Revert "[DebugInfo] Shorten legacy [s|z]ext dwarf expressions"
This reverts commit 2ce36ebca5. It depends
on https://reviews.llvm.org/D89838, which needs to be reverted.
2020-10-28 18:57:17 -07:00
Derek Schuff 77973f8dee [WebAssembly] Add support for DWARF type units
Since Wasm comdat sections work similarly to ELF, we can use that mechanism
    to eliminate duplicate dwarf type information in the same way.

    Differential Revision: https://reviews.llvm.org/D88603
2020-10-28 17:41:22 -07:00
Amy Huang 7669f3c0f6 Recommit "[CodeView] Emit static data members as S_CONSTANTs."
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.

This changes CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47580

Differential Revision: https://reviews.llvm.org/D89072

This reverts commit 504615353f.
2020-10-28 16:35:59 -07:00
Gaurav Jain f719fd7ade [NFC] Use [MC]Register in CSE & LICM
Differential Revision: https://reviews.llvm.org/D90327
2020-10-28 15:53:26 -07:00
Alok Kumar Sharma a6dd01afa3 [DebugInfo] Support for DW_TAG_generic_subrange
This is needed to support fortran assumed rank arrays which
have runtime rank.

  Summary:
Fortran assumed rank arrays have dynamic rank. DWARF TAG
DW_TAG_generic_subrange is needed to support that.

  Testing:
unit test cases added (hand-written)
check llvm
check debug-info

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D89218
2020-10-29 01:34:15 +05:30
Aditya Nandakumar bed8394047 [GISel]: Few InsertVecElt combines
https://reviews.llvm.org/D88060

This adds the following combines
1) build_vector formation from insert_vec_elts
2) insert_vec_elts (build_vector) -> build_vector
2020-10-28 12:27:07 -07:00
Mircea Trofin f0a98ad820 [NFC] Use Register in RegisterPressure APIs
Some related changes as well.

Differential Revision: https://reviews.llvm.org/D90268
2020-10-28 12:14:08 -07:00
Vedant Kumar 2ce36ebca5 [DebugInfo] Shorten legacy [s|z]ext dwarf expressions
Take advantage of the emitConstu helper to emit slightly shorter dwarf
expressions to implement legacy [s|z]ext operations.
2020-10-28 12:06:02 -07:00
Vedant Kumar 9905346221 [DebugInfo] Fix legacy ZExt emission when FromBits >= 64 (PR47927)
Fix an out-of-bounds shift in emitLegacyZExt by using a slightly more
complicated dwarf expression to create the zext mask.

This addresses a UBSan diagnostic seen when compiling compiler-rt
(llvm.org/PR47927).

rdar://70307714

Differential Revision: https://reviews.llvm.org/D89838
2020-10-28 12:06:02 -07:00
Matt Arsenault b9c21d43bb RegAlloc: Clear isSSA
The MIR parser may infer SSA, so -run-pass=regallocgreedy would hit a
verifier error after multiple vreg defs are added.
2020-10-28 12:02:16 -04:00
Djordje Todorovic 6384378582 [NFC][IntrRefLDV] Improve the Value printing
Basically, this just improves the dump of the Value stored within a location.

If the defining instruction number is zero, it means it is "live-in".

Before the patch:

ESI --> bb 0 inst 0 loc ESI
After:

ESI --> Value{bb: 0, inst: live-in, loc: ESI}
This is an NFC.

Differential Revision: https://reviews.llvm.org/D90309
2020-10-28 07:39:08 -07:00
Simon Pilgrim f53d7f55f1 [DAG] Move canFoldInAddressingMode before foldBinOpIntoSelect. NFC.
Reduces the diff in D90113.
2020-10-28 12:16:05 +00:00
Luqman Aden 4c0a016927 Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC.
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.

Reviewed By: compnerd, rnk

Differential Revision: https://reviews.llvm.org/D90117
2020-10-27 23:22:13 -07:00
Derek Schuff 44eea0b1a7 Revert "[WebAssembly] Add support for DWARF type units"
This reverts commit bcb8a119df.
2020-10-27 17:57:32 -07:00
Derek Schuff bcb8a119df [WebAssembly] Add support for DWARF type units
Since Wasm comdat sections work similarly to ELF, we can use that mechanism
to eliminate duplicate dwarf type information in the same way.

Differential Revision: https://reviews.llvm.org/D88603
2020-10-27 17:13:41 -07:00
Nicolai Hähnle e025d09b21 Revert multiple patches based on "Introduce CfgTraits abstraction"
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.

The patches are:

Revert "CfgInterface: rename interface() to getInterface()"

This reverts commit a74fc48158.

Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"

This reverts commit f2a06875b6.

Revert "Try to make GCC5 happy about the CfgTraits thing"

This reverts commit 03a5f7ce12.

Revert "Introduce CfgTraits abstraction"

This reverts commit c0cdd22c72.
2020-10-27 20:33:30 +01:00
Amy Huang 504615353f Revert "[CodeView] Emit static data members as S_CONSTANTs."
Seems like there's an assert in here that we shouldn't be running into.

This reverts commit 515973222e.
2020-10-27 11:29:58 -07:00
Vedant Kumar 5a3ef55a52 [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746)
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.

The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).

It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).

So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.

Differential Revision: https://reviews.llvm.org/D88928
2020-10-27 10:12:59 -07:00
Djordje Todorovic cca049ad2b [NFC][IntrRefLDV] Some code clean up
As reading the source code, I've found some minor nits:
  -Use using instead of typedef
  -Fix a comment
  -Refactor

Differential Revision: https://reviews.llvm.org/D90155
2020-10-27 05:31:24 -07:00
Sven van Haastregt 5d03080092 [TargetLowering] Add i1 condition for bit comparison fold
For i1 types, boolean false is represented identically regardless of
the boolean content, so we can allow optimizations that otherwise
would not be correct for booleans with false represented as a negative
one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D90145
2020-10-27 12:22:20 +00:00
Med Ismail Bennani a3aea0193d [llvm/DebugInfo] Simplify DW_OP_implicit_value condition (NFC)
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
2020-10-27 11:25:19 +01:00
Gaurav Jain 17cdba61d4 [NFC] Use [MC]Register in RegAllocPBQP & RegisterCoalescer
Differential Revision: https://reviews.llvm.org/D90008
2020-10-26 17:13:32 -07:00
Rahman Lavaee 0b2f4cdf2b Explicitly check for entry basic block, rather than relying on MachineBasicBlock::pred_empty.
Sometimes in unoptimized code, we have dangling unreachable basic blocks with no predecessors. Basic block sections should be emitted for those as well. Without this patch, the included test fails with a fatal error in `AsmPrinter::emitBasicBlockEnd`.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D89423
2020-10-26 16:15:56 -07:00
Amy Huang 515973222e [CodeView] Emit static data members as S_CONSTANTs.
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.

I changed CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47580

Differential Revision: https://reviews.llvm.org/D89072
2020-10-26 15:30:35 -07:00
Peter Waller 5b742a0c10 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in sve-redundant-store.ll that the warning is not triggered.

Differential Revision: https://reviews.llvm.org/D89701
2020-10-26 16:37:48 +00:00
Peter Waller 6536d6040f Revert "[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination"
This reverts commit 4604441386.

Reverting because it was not the intended version of the patch, which
follows this patch.
2020-10-26 16:37:00 +00:00
Peter Waller 4604441386 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in Redundantstores.ll that the warning is not triggered.
2020-10-26 16:23:42 +00:00
Djordje Todorovic a64b2c9366 [NFC][InstrRefLDV] Fix a typo 2020-10-26 04:04:16 -07:00
Florian Hahn b2bec7cece [AsmPrinter] Add per BB instruction mix remark.
This patch adds a remarks that provides counts for each opcode per basic block.

An snippet of the generated information can be seen below.

The current implementation uses the target specific opcode for the counts. For example, on AArch64 this means we currently get 2 entries for `add` instructions if the block contains 32 and 64 bit adds. Similarly, immediate version are treated differently.

Unfortunately there seems to be no convenient way to get only the mnemonic part of the instruction as a string AFAIK. This could be improved in the future.

```
--- !Analysis
Pass:            asm-printer
Name:            InstructionMix
DebugLoc:        { File: arm64-instruction-mix-remarks.ll, Line: 30, Column: 30 }
Function:        foo
Args:
  - String:          'BasicBlock: '
  - BasicBlock:      else
  - String:          "\n"
  - String:          INST_MADDWrrr
  - String:          ': '
  - INST_MADDWrrr:   '2'
  - String:          "\n"
  - String:          INST_MOVZWi
  - String:          ': '
  - INST_MOVZWi:     '1'
```

Reviewed By: anemet, thegameg, paquette

Differential Revision: https://reviews.llvm.org/D89892
2020-10-26 09:25:45 +00:00
David Green 61bc18de0b [Schedule] Add a MultiHazardRecognizer
This adds a MultiHazardRecognizer and starts to make use of it in the
ARM backend. The idea of the class is to allow multiple independent
hazard recognizers to be added to a single base MultiHazardRecognizer,
allowing them to all work in parallel without requiring them to be
chained into subclasses. They can then be added or not based on cpu or
subtarget features, which will become useful in the ARM backend once
more hazard recognizers are being used for various things.

This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the
process, to more clearly explain what that recognizer is designed for.

Differential Revision: https://reviews.llvm.org/D72939
2020-10-26 08:06:17 +00:00
Simon Pilgrim ce356e1546 [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.

I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.

Differential Revision: https://reviews.llvm.org/D87930
2020-10-24 12:23:09 +01:00
Simon Pilgrim 62b17a7697 [LegalizeTypes] Legalize vector rotate operations
Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47320

Patch By: @rsanthir.quic (Ryan Santhirarajan)

Differential Revision: https://reviews.llvm.org/D89497
2020-10-24 11:30:32 +01:00
Med Ismail Bennani 64c4dac60e
[llvm/DebugInfo] Emit DW_OP_implicit_value when tuning for LLDB
This patch enables emitting DWARF `DW_OP_implicit_value` opcode when
tuning debug information for LLDB (`-debugger-tune=lldb`).

This will also propagate to Darwin platforms, since they use LLDB tuning
as a default.

rdar://67406059

Differential Revision: https://reviews.llvm.org/D90001

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
2020-10-24 06:45:33 +02:00
Mehdi Amini 8f492f6467 Remove unused verifyRegStateMapping() function in RegAllocFast (NFC)
This fixes compiler warning when building with assertions.
2020-10-24 00:36:51 +00:00
Cameron McInally a1cc274cb3 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Nick Desaulniers b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Mircea Trofin 819044ad2d [NFC] Use [MC]Register in RegAllocGreedy
This was initiated from the uses of MCRegUnitIterator, so while likely
not exhaustive, it's a step forward.

Differential Revision: https://reviews.llvm.org/D89975
2020-10-23 11:30:53 -07:00
Paulo Matos 69e2797eae [WebAssembly] Implementation of (most) table instructions
Implementation of instructions table.get, table.set, table.grow,
table.size, table.fill, table.copy.

Missing instructions are table.init and elem.drop as they deal with
element sections which are not yet implemented.

Added more tests to tables.s

Differential Revision: https://reviews.llvm.org/D89797
2020-10-23 08:42:54 -07:00
Jeremy Morse b1b2c6ab66 [DebugInstrRef] Handle DBG_INSTR_REFs use-before-defs in LiveDebugValues
Deciding where to place debugging instructions when normal instructions
sink between blocks is difficult -- see PR44117. Dealing with this with
instruction-referencing variable locations is simple: we just tolerate
DBG_INSTR_REFs referring to values that haven't been computed yet. This
patch adds support into InstrRefBasedLDV to record when a variable value
appears in the middle of a block, and should have a DBG_VALUE added when it
appears (a debug use before def).

While described simply, this relies heavily on the value-propagation
algorithm in InstrRefBasedLDV. The implementation doesn't attempt to verify
the location of a value unless something non-trivial occurs to merge
variable values in vlocJoin. This means that a variable with a value that
has no location can retain it across all control flow (including loops).
It's only when another debug instruction specifies a different variable
value that we have to check, and find there's no location.

This property means that if a machine value is defined in a block dominated
by a DBG_INSTR_REF that refers to it, all the successor blocks can
automatically find a location for that value (if it's not clobbered). Thus
in a sense, InstrRefBasedLDV is already supporting and implementing
use-before-defs. This patch allows us to specify a variable location in the
block where it's defined.

When loading live-in variable locations, TransferTracker currently discards
those where it can't find a location for the variable value. However, we
can tell from the machine value number whether the value is defined in this
block. If it is, add it to a set of use-before-def records. Then, once the
relevant instruction has been processed, emit a DBG_VALUE immediately after
it.

Differential Revision: https://reviews.llvm.org/D85775
2020-10-23 16:33:23 +01:00
Denis Antrushin 4f7ee55971 Revert "[Statepoints] Allow deopt GC pointer on VReg if gc-live bundle is empty."
Downstream testing revealed some problems with this patch.
Reverting while investigating.
This reverts commit 2b96dcebfa.
2020-10-23 21:55:06 +07:00
Jeremy Morse 68f4715716 [DebugInstrRef] Convert DBG_INSTR_REFs into variable locations
Handle DBG_INSTR_REF instructions in LiveDebugValues, to determine and
propagate variable locations. The logic is fairly straight forwards:
Collect a map of debug-instruction-number to the machine value numbers
generated in the first walk through the function. When building the
variable value transfer function and we see a DBG_INSTR_REF, look up the
instruction it refers to, and pick the machine value number it generates,
That's it; the rest of LiveDebugValues continues as normal.

Awkwardly, there are two kinds of instruction numbering happening here: the
offset into the block (which is how machine value numbers are determined),
and the numbers that we label instructions with when generating
DBG_INSTR_REFs.

I've also restructured the TransferTracker redefVar code a little, to
separate some DBG_VALUE specific operations into its own method. The
changes around redefVar should be largely NFC, while allowing
DBG_INSTR_REFs to specify a value number rather than just a location.

Differential Revision: https://reviews.llvm.org/D85771
2020-10-23 14:50:02 +01:00
Jeremy Morse ab93e71065 [DebugInstrRef] NFC: Separate collection of machine/variable values
This patch adjusts _when_ something happens in LiveDebugValues /
InstrRefBasedLDV, to make it more amenable to dealing with DBG_INSTR_REF
instructions. There's no functional change.

In the current InstrRefBasedLDV implementation, we collect the machine
value-number transfer function for blocks at the same time as the
variable-value transfer function. After solving machine value numbers, the
variable-value transfer function is updated so that DBG_VALUEs of live-in
registers have the correct value. The same would need to be done for
DBG_INSTR_REFs, to connect instruction-references with machine value
numbers.

Rather than writing more code for that, this patch separates the two: we
collect the (machine-value-number) transfer function and solve for
machine value numbers, then step through the MachineInstrs again collecting
the variable value transfer function. This simplifies things for the new
few patches.

Differential Revision: https://reviews.llvm.org/D85760
2020-10-23 11:13:20 +01:00
David Blaikie 4437df8eed DebugInfo: Hash DIE referevences (DW_OP_convert) when computing Split DWARF signatures 2020-10-22 20:09:33 -07:00
Han Shen e42f6c0ac0 Revert "[MBP] Add whole chain to BlockFilterSet instead of individual BB"
This reverts commit adfb541501.

This is reverted because it caused an chrome error: https://crbug.com/1140168
2020-10-22 17:31:01 -07:00
David Blaikie a66311277a DWARFv5: Disable DW_OP_convert for configurations that don't yet support it
Testing reveals that lldb and gdb have some problems with supporting
DW_OP_convert - gdb with Split DWARF tries to resolve the CU-relative
DIE offset relative to the skeleton DIE. lldb tries to treat the offset
as absolute, which judging by the llvm-dsymutil support for
DW_OP_convert, I guess works OK in MachO? (though probably llvm-dsymutil
is producing invalid DWARF by resolving the relative reference to an
absolute one?).

Specifically this disables DW_OP_convert usage in DWARFv5 if:
* Tuning for GDB and using Split DWARF
* Tuning for LLDB and not targeting MachO
2020-10-22 12:02:33 -07:00
Mircea Trofin e24537d48f [NFC][MC] Use MCRegister for ReachingDefAnalysis APIs
Also updated the users of the APIs; and a drive-by small change to
RDFRegister.cpp

Differential Revision: https://reviews.llvm.org/D89912
2020-10-22 08:47:35 -07:00
Jeremy Morse 68ac02c0dd [DebugInstrRef] Pass DBG_INSTR_REFs through register allocation
Both FastRegAlloc and LiveDebugVariables/greedy need to cope with
DBG_INSTR_REFs. None of them actually need to take any action, other than
passing DBG_INSTR_REFs through: variable location information doesn't refer
to any registers at this stage.

LiveDebugVariables stashes the instruction information in a tuple, then
re-creates it later. This is only necessary as the register allocator
doesn't expect to see any debug instructions while it's working. No
equivalence classes or interval splitting is required at all!

No changes are needed for the fast register allocator, as it just ignores
debug instructions. The test added checks that both of them preserve
DBG_INSTR_REFs.

This also expands ScheduleDAGInstrs.cpp to treat DBG_INSTR_REFs the same as
DBG_VALUEs when rescheduling instructions around. The current movement of
DBG_VALUEs around is less than ideal, but it's not a regression to make
DBG_INSTR_REFs subject to the same movement.

Differential Revision: https://reviews.llvm.org/D85757
2020-10-22 15:51:22 +01:00
Matt Arsenault 188df17420 ScheduleDAGInstrs: Skip debug instructions at end of scheduling region
If the end instruction of the scheduling region was a DBG_VALUE, the
uses of the debug instruction were tracked as if they were real
uses. This would then hit the deadDefHasNoUse assertion in
addVRegDefDeps if the only use was the debug instruction.
2020-10-22 10:16:45 -04:00
Jeremy Morse d73275993b [DebugInstrRef] Substitute debug value numbers to handle optimizations
This patch touches two optimizations, TwoAddressInstruction and X86's
FixupLEAs pass, both of which optimize by re-creating instructions. For
LEAs, various bits of arithmetic are better represented as LEAs on X86,
while TwoAddressInstruction sometimes converts instrs into three address
instructions if it's profitable.

For debug instruction referencing, both of these require substitutions to
be created -- the old instruction number must be pointed to the new
instruction number, as illustrated in the added test. If this isn't done,
any variable locations based on the optimized instruction are
conservatively dropped.

Differential Revision: https://reviews.llvm.org/D85756
2020-10-22 13:01:03 +01:00
Fangrui Song b0c12474ed [ShrinkWrap] Delete unneeded nullptr checks for the save point. NFC
findNearestCommonDominator never returns nullptr.
2020-10-22 00:27:01 -07:00
Xiang1 Zhang 7c3fea7721 [X86] Support customizing stack protector guard
Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D88631
2020-10-22 10:08:14 +08:00
Craig Topper 9e884169a2 [FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.

There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.

Fixes part of PR47393

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87115
2020-10-21 18:12:54 -07:00
Gaurav Jain 4634ad6c0b [NFC] Set return type of getStackPointerRegisterToSaveRestore to Register
Differential Revision: https://reviews.llvm.org/D89858
2020-10-21 16:19:38 -07:00
Jeremy Morse 537f0fbe82 [DebugInfo] Follow up c521e44def with an API improvement
As mentioned post-commit in D85749, the 'substituteDebugValuesForInst'
method added in c521e44def would be better off with a limit on the
number of operands to substitute. This handles the common case of
"substitute the first operand between these two differing instructions",
or possibly up to N first operands.
2020-10-21 14:45:55 +01:00
Simon Pilgrim 88523f6f4b [DAG] getNode(ISD::EXTRACT_SUBVECTOR) Drop unnecessary N2C null check - we assert that this isn't null and have already used the pointer. NFCI.
Fixes cppcheck + null dereference warning.
2020-10-21 11:53:44 +01:00
Nicholas Guy 9a2d2bedb7 Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.

Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose

Differential Revision: https://reviews.llvm.org/D88494
2020-10-21 11:52:47 +01:00
Sven van Haastregt bfc961aeb2 [TargetLowering] Check boolean content when folding bit compare
Updates an optimization that relies on boolean contents being either 0
or 1 to properly check for this before triggering.

The following:
  (X & 8) != 0 --> (X & 8) >> 3
Produces unexpected results when a boolean 'true' value is represented
by negative one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D89390
2020-10-21 11:46:55 +01:00
David Sherwood 5b17b323a6 [SVE][CodeGen] Replace use of TypeSize comparison operator in CreateStackTemporary
We were previously relying upon the TypeSize comparison operators to
obtain the maximum size of two types, however use of such operators is
being deprecated in favour of making the caller aware that it could
be dealing with scalable vector types. I have changed the code to assert
that the two types have the same scalable property and thus we can
simply take the maximum of the known minimum sizes instead.

Differential Revision: https://reviews.llvm.org/D88563
2020-10-21 08:31:36 +01:00
Mircea Trofin 5e731625f3 [NFC][MC] Use [MC]Register in MachineVerifier
Differential Revision: https://reviews.llvm.org/D89815
2020-10-20 20:42:35 -07:00
Hubert Tong 134ffa8138 NFC: Fix -Wsign-compare warnings on 32-bit builds
Comparing 32-bit `ptrdiff_t` against 32-bit `unsigned` results in
`-Wsign-compare` warnings for both GCC and Clang.

The warning for the cases in question appear to identify an issue
where the `ptrdiff_t` value would be mutated via conversion to an
unsigned type.

The warning is resolved by using the usual arithmetic conversions to
safely preserve the value of the `unsigned` operand while trying to
convert to a signed type. Host platforms where `unsigned` has the same
width as `unsigned long long` will need to make a different change, but
using an explicit cast has disadvantages that can be avoided for now.

Reviewed By: dantrushin

Differential Revision: https://reviews.llvm.org/D89612
2020-10-20 20:52:10 -04:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Nicolai Hähnle c0cdd22c72 Introduce CfgTraits abstraction
The CfgTraits abstraction simplfies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.

Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.

CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.

Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in same cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.

This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.

v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
  the instructions in a bundle
- use MachineBasicBlock::printName

v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
  that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef

v7:
- std::forward fix in wrapping_iterator
- fix typos

v8:
- cleanup operators on CfgOpaqueType
- address other review comments

Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d

Differential Revision: https://reviews.llvm.org/D83088
2020-10-20 13:50:52 +02:00
Luqman Aden 51892a42da [COFF][ARM] Fix CodeView for Windows on 32bit ARM targets.
Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D89622
2020-10-19 22:16:16 -07:00
Qiu Chaofan 1b2fe71ecf [DAGCombiner] Tighten reasscociation of visitFMA
From LangRef, FMF contract should not enable reassociating to form
arbitrary contractions. So it should not help rearrange nodes like
(fma (fmul x, c1), c2, y) into (fma x, c1*c2, y).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89527
2020-10-20 10:13:01 +08:00
Craig Topper edd0cb11bd [SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats
This enables these transforms for vectors:
(ctpop x) u< 2 -> (x & x-1) == 0
(ctpop x) u> 1 -> (x & x-1) != 0
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

All enabled if CTPOP isn't Legal. This differs from the scalar
behavior where the first two are done unconditionally and the
last two are done if CTPOP isn't Legal or Custom. The Legal
check produced better results for vectors based on X86's
custom handling. Might be worth re-visiting scalars here.

I disabled the looking through truncate for vectors. The
code that creates new setcc can use the same result VT as the
original setcc even if we truncated the input. That may work
work for most scalars, but definitely wouldn't work for vectors
unless it was a vector of i1.

Fixes or at least improves PR47825

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89346
2020-10-19 12:56:59 -07:00
Amy Kwan 6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
Mircea Trofin 225065b9a8 [NFC][MC] Type [MC]Register uses in MachineTraceMetrics
Differential Revision: https://reviews.llvm.org/D89710
2020-10-19 09:49:52 -07:00
David Sherwood 3945b69e81 [SVE][CodeGen] Replace more TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. This patch changes a few
functions that were always expecting to work on scalar or fixed width
types:

1. DAGCombiner::mergeTruncStores - deals with scalar integers only.
2. DAGCombiner::ReduceLoadWidth - not valid for vectors.
3. DAGCombiner::createBuildVecShuffle - should only be used for
   fixed width vectors.
4. SelectionDAGLegalize::ExpandFCOPYSIGN and
   SelectionDAGLegalize::getSignAsIntValue - only work on scalars.

Differential Revision: https://reviews.llvm.org/D88562
2020-10-19 08:38:50 +01:00
David Sherwood 35a531fb45 [SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. I've changed some of these
places to use the equivalent scalar operator.

Differential Revision: https://reviews.llvm.org/D88482
2020-10-19 08:30:31 +01:00
David Sherwood f693f915a0 [SVE][CodeGen] Replace uses of TypeSize comparison operators
In certain places in the code we can never end up in a situation where
we're mixing fixed width and scalable vector types. For example,
we can't have truncations and extends that change the lane count. Also,
in other places such as GenWidenVectorStores and GenWidenVectorLoads we
know from the behaviour of FindMemType that we can never choose a vector
type with a different scalable property.

In various places I have used EVT::bitsXY functions instead of
TypeSize::isKnownXY, where it probably makes sense to keep an assert
that scalable properties match.

Differential Revision: https://reviews.llvm.org/D88654
2020-10-19 08:08:41 +01:00
Alok Kumar Sharma 0538353b3b [DebugInfo] Support for DWARF operator DW_OP_over
LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed
for Flang to support assumed rank array.

  Summary:
Currently LLVM rejects DWARF operator DW_OP_over. Below error is
produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6)
warning: ignoring invalid debug info in over.ll
[..]
There were some parts missing in support of this operator, which are
now completed.

  Testing
-added a unit testcase
-check-debuginfo
-check-llvm

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D89208
2020-10-17 08:42:28 +05:30
Craig Topper 278bd06891 [TargetLowering] Extract simplifySetCCs ctpop into a separate function. NFCI
As requested in D89346. This allows us to add some early outs.

I reordered some checks a little bit to make the more common bail outs happen earlier. Like checking opcode before checking hasOneUse. And I moved the bit width check to make sure it was safe to look through a truncate to the spot where we look through truncates instead of after.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89494
2020-10-16 19:47:56 -07:00
Jameson Nash 4242df1470 Revert "make the AsmPrinterHandler array public"
I messed up one of the tests.
2020-10-16 17:22:07 -04:00
Jameson Nash ac2def2d8d make the AsmPrinterHandler array public
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.

Differential Revision: https://reviews.llvm.org/D74158
2020-10-16 16:27:31 -04:00
Amara Emerson 6042c25b0a [GlobalISel] Add translation support for vector reduction intrinsics.
In order to prevent the ExpandReductions pass from expanding some intrinsics
before they get to codegen, I had to add a -disable-expand-reductions flag
for testing purposes.

Differential Revision: https://reviews.llvm.org/D89028
2020-10-16 10:17:53 -07:00
Jay Foad 0c1381d795 [llc] Use -filetype=null to disable MIR printing
If you use -stop-after or similar options, llc will normally print MIR.
This patch checks for -filetype=null as a special case to disable MIR
printing. As the comment says, "The Null output is intended for use for
performance analysis ...", and I found this useful for timing a subset
of the passes that llc runs without the significant overhead of printing
MIR just to send it to /dev/null.

Differential Revision: https://reviews.llvm.org/D89476
2020-10-16 16:51:56 +01:00
Amara Emerson c2551c1f40 [GlobalISel] Remove scalar src from non-sequential fadd/fmul reductions.
It's probably better to split these into separate G_FADD/G_FMUL + G_VECREDUCE
operations in the translator rather than carrying the scalar around. The
majority of the time it'll get simplified away as the scalars are probably
identity values.

Differential Revision: https://reviews.llvm.org/D89150
2020-10-15 15:51:44 -07:00
Denis Antrushin 8f0ddd4a1a [Statepoints] Remove MI limit on number of tied operands.
After D87915 statepoint can have more than 15 tied operands.
Remove this restriction from statepoint lowering code.
2020-10-15 19:02:38 +07:00
Adrian Kuegel ead2aa7098 Fix unused variable warning when compiling with asserts disabled.
Differential Revision: https://reviews.llvm.org/D89454
2020-10-15 12:50:19 +02:00
Jeremy Morse c521e44def [DebugInstrRef] Support recording of instruction reference substitutions
Add a table recording "substitutions" between pairs of <instruction,
operand> numbers, from old pairs to new pairs. Post-isel optimizations are
able to record the outcome of an optimization in this way. For example, if
there were a divide instruction that generated the quotient and remainder,
and it were replaced by one that only generated the quotient:

  $rax, $rcx = DIV-AND-REMAINDER $rdx, $rsi, debug-instr-num 1
  DBG_INSTR_REF 1, 0
  DBG_INSTR_REF 1, 1

Became:

  $rax = DIV $rdx, $rsi, debug-instr-num 2
  DBG_INSTR_REF 1, 0
  DBG_INSTR_REF 1, 1

We could enter a substitution from <1, 0> to <2, 0>, and no substitution
for <1, 1> as it's no longer generated.

This approach means that if an instruction or value is deleted once we've
left SSA form, all variables that used the value implicitly become
"optimized out", something that isn't true of the current DBG_VALUE
approach.

Differential Revision: https://reviews.llvm.org/D85749
2020-10-15 11:30:14 +01:00
Denis Antrushin 8c2b69d53a [Statepoints] Unlimited tied operands.
Current limit on amount of tied operands (15) sometimes is too low
for statepoint. We may get couple dozens of gc pointer operands on
statepoint.
Review D87154 changed format of statepoint to list every gc pointer
only once, which makes it trivial to find tiedness relation between
statepoint operands: defs are mapped 1-1 to gc pointer operands passed
on registers.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D87915
2020-10-15 16:16:11 +07:00
Craig Topper 50c9f1e11d [TargetLowering] Replace Log2_32_Ceil with Log2_32 in SimplifySetCC ctpop combine.
This combine can look through (trunc (ctpop X)). When doing this
it tries to make sure the trunc doesn't lose any information
from the ctpop. It does this by checking that the truncated type
has more bits that Log2_32_Ceil of the ctpop type. The Ceil is
unnecessary and pessimizes non-power of 2 types.

For example, ctpop of i256 requires 9 bits to represent the max
value of 256. But ctpop of i255 only requires 8 bits to represent
the max result of 255. Log2_32_Ceil of 256 and 255 both return 8
while Log2_32 returns 8 for 256 and 7 for 255

The code with popcnt enabled is a regression for this test case,
but it does match what already happens with i256 truncated to i9.
Since power of 2 is more likely, I don't think it should block
this change.

Differential Revision: https://reviews.llvm.org/D89412
2020-10-15 01:05:07 -07:00
Snehasish Kumar 24bf6ff4e0 [llvm] Update default cutoff threshold for machine function splitter.
Based on internal testing at Google we found that setting the profile
summary cutoff threshold to 999950 yields the best results in terms of
itlb and icache metrics (as observed on Intel CPUs).

*default* = Split out code if no profile count available for block
*size-%*  = The fraction of bytes split out of .text and .text.hot
*itlb*    = Misses per kilo instructions (MPKI) for itlb
*icache*  = Misses per kilo instructions (MPKI) for L1 icache

Search1

| cutoff  | size-%  | itlb      | icache  |
|---------|---------|-----------|---------|
| default | 42.5861 | 0.0822151 | 2.46363 |
|  999999 | 44.9350 | 0.0767194 | 2.44416 |
|  999950 | 50.0660 |  0.075744 |  2.4091 |
|  999500 | 56.9158 |  0.082564 |  2.4188 |
|  995000 | 63.8625 | 0.0814927 | 2.42832 |
|  990000 | 71.7314 |  0.106906 | 2.57785 |

Search2

| cutoff  | size-% | itlb     | icache  |
|---------|--------|----------|---------|
| default | 2.8845 | 0.626712 | 4.73245 |
|  999999 | 3.3291 | 0.602309 | 4.70045 |
|  999950 | 3.8577 | 0.587842 | 4.71632 |
|  999500 | 4.4170 |  0.63577 | 4.68351 |
|  995000 | 5.1020 | 0.657969 | 4.82272 |
|  990000 | 5.7153 | 0.719122 | 5.39496 |

Differential Revision: https://reviews.llvm.org/D89085
2020-10-14 12:48:10 -07:00
Snehasish Kumar 77638a5343 [llvm] Set the default for -bbsections-cold-text-prefix to .text.split.
After using this for a while, we find that it is generally useful to
have it set to .text.split. by default, removing the need for an
additional -mllvm option.

Differential Revision: https://reviews.llvm.org/D88997
2020-10-14 12:16:36 -07:00
Guozhi Wei adfb541501 [MBP] Add whole chain to BlockFilterSet instead of individual BB
Currently we add individual BB to BlockFilterSet if its frequency satisfies

LoopFreq / Freq <= LoopToColdBlockRatio

LoopFreq is edge frequency from outside to loop header.
LoopToColdBlockRatio is a command line parameter.

It doesn't make sense since we always layout whole chain, not individual BBs.

It may also cause a tricky problem. Sometimes it is possible that the LoopFreq
of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in
BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop,
like .cold in the test case. So it is added to the chain of inner loop. When
work on the outer loop, .cold is not added to BlockFilterSet, so the edge to
successor .problem is not counted in UnscheduledPredecessors of .problem chain.
But other blocks in the inner loop are added BlockFilterSet, so the whole inner
loop chain can be layout, and markChainSuccessors is called to decrease
UnscheduledPredecessors of following chains. markChainSuccessors calls
markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold,
so .problem chain's UnscheduledPredecessors is decreased, but this edge was not
counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors
becomes 0 when it still has an unscheduled predecessor .pred! And it causes
problems in following various successor BB selection algorithms.

Differential Revision: https://reviews.llvm.org/D89088
2020-10-14 11:55:10 -07:00
jasonliu f85bcc21dd [AIX] Turn -fdata-sections on by default in Clang
Summary:

This patch does the following:
1. Make InitTargetOptionsFromCodeGenFlags() accepts Triple as a
 parameter, because some options' default value is triple dependant.
2. DataSections is turned on by default on AIX for llc.
3. Test cases change accordingly because of the default behaviour change.
4. Clang Driver passes in -fdata-sections by default on AIX.

Reviewed By: MaskRay, DiggerLin

Differential Revision: https://reviews.llvm.org/D88737
2020-10-14 15:58:31 +00:00
Mircea Trofin c8fcffe775 [NFC][MC] Use MCRegister in Machine{Sink|Pipeliner}.cpp
Differential Revision: https://reviews.llvm.org/D89328
2020-10-14 08:42:17 -07:00
Michael Liao ae40d2858e Fix an apparent typo. `assert()` must not contain side-effects. NFC. 2020-10-14 11:33:34 -04:00
Jeremy Morse c4e7857d4e [DebugInstrRef] Create DBG_INSTR_REFs in SelectionDAG
When given the -experimental-debug-variable-locations option (via -Xclang
or to llc), have SelectionDAG generate DBG_INSTR_REF instructions instead
of DBG_VALUE. For now, this only happens in a limited circumstance: when
the value referred to is not a PHI and is defined in the current block.
Other situations introduce interesting problems, addresed in later patches.

Practically, this patch hooks into InstrEmitter and if it can find a
defining instruction for a value, gives it an instruction number, and
points the DBG_INSTR_REF at that <instr, operand> pair.

Differential Revision: https://reviews.llvm.org/D85747
2020-10-14 14:24:08 +01:00
Jeremy Morse 2c5f3d54c5 [DebugInstrRef] Parse debug instruction-references from/to MIR
This patch defines the MIR format for debug instruction references: it's an
integer trailing an instruction, marked out by "debug-instr-number", much
like how "debug-location" identifies the DebugLoc metadata of an
instruction. The instruction number is stored directly in a MachineInstr.

Actually referring to an instruction comes in a later patch, but is done
using one of these instruction numbers.

I've added a round-trip test and two verifier checks: that we don't label
meta-instructions as generating values, and that there are no duplicates.

Differential Revision: https://reviews.llvm.org/D85746
2020-10-14 10:57:09 +01:00
Aditya Nandakumar ef3d17482f [GISel] Add combine for constant G_PTR_ADD offsets.
https://reviews.llvm.org/D88865

This adds a single combine for GlobalISel to fold:

ptradd (inttoptr C1) C2
Into:

C1 + C2
Additionally, a small test for AArch64 is added.

Patch by pnappa.
2020-10-13 17:26:12 -07:00
Andrew Paverd cfd8481da1 Reland [CFGuard] Add address-taken IAT tables and delay-load support
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D87544
2020-10-13 13:20:52 -07:00
Mircea Trofin 08097fc6a9 [NFC][Regalloc] Use MCRegister in MachineCopyPropagation
Differential Revision: https://reviews.llvm.org/D89250
2020-10-13 09:05:08 -07:00
Mirko Brkusanin 52ba4fa6aa [GlobalISel] Avoid making G_PTR_ADD with nullptr
When the first operand is a null pointer we can avoid making a G_PTR_ADD and
make a G_INTTOPTR with the offset operand.
This helps us avoid making add with 0 later on for targets such as AMDGPU.

Differential Revision: https://reviews.llvm.org/D87140
2020-10-13 13:02:55 +02:00
Craig Topper 1687a8d83b [X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization.
This passes existing X86 test but I'm not sure if it handles all type
legalization cases it needs to.

Alternative to D89200

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D89222
2020-10-12 23:18:29 -07:00
Konstantin Schwarz 7341123439 [GlobalISel][KnownBits] Early return on out of bound shift amounts
If the known shift amount is bigger than or equal to the bitwidth of the type of the value to be shifted,
the result is target dependent, so don't try to infer any bits.

This fixes a crash we've seen in one of our internal test suites.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89232
2020-10-12 18:39:19 +02:00
Mircea Trofin 43d347995c [NFC][MC] Use MCRegister in LiveRangeMatrix
The change starts from LiveRangeMatrix and also checks the users of the
APIs are typed accordingly.

Differential Revision: https://reviews.llvm.org/D89145
2020-10-12 08:54:36 -07:00
Mircea Trofin 596a9f6b89 [NFC][Regalloc] Pass VirtRegMap by reference.
It's never null - the reason it's modeled as a pointer is because the
pass can't init it in its ctor. Passing by ref simplifies the code, too,
as the null checks were unnecessary complexity.

Differential Revision: https://reviews.llvm.org/D89171
2020-10-12 08:32:30 -07:00
Simon Pilgrim c252200e4d [DAG][ARM][MIPS][RISCV] Improve funnel shift promotion to use 'double shift' patterns
Based on a discussion on D88783, if we're promoting a funnel shift to a width at least twice the size as the original type, then we can use the 'double shift' patterns (shifting the concatenated sources).

Differential Revision: https://reviews.llvm.org/D89139
2020-10-12 14:11:02 +01:00
David Sherwood c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of it's own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
Fangrui Song cddb49bcc0 [SchedDAGInstrs] Delete redundant contains(). NFC 2020-10-11 20:58:30 -07:00
Krzysztof Parzyszek 61eaa2e14a [SDAG] Remember to set UndefElts in isSplatValue for SPLAT_VECTOR 2020-10-10 19:42:24 -05:00
David Green cb27006a94 [ARM] Attempt to make Tail predication / RDA more resilient to empty blocks
There are a number of places in RDA where we assume the block will not
be empty. This isn't necessarily true for tail predicated loops where we
have removed instructions. This attempt to make the pass more resilient
to empty blocks, not casting pointers to machine instructions where they
would be invalid.

The test contains a case that was previously failing, but recently been
hidden on trunk. It contains an empty block to begin with to show a
similar error.

Differential Revision: https://reviews.llvm.org/D88926
2020-10-10 14:50:25 +01:00
Alok Kumar Sharma 96bd4d34a2 [DebugInfo] Support for DWARF attribute DW_AT_rank
This patch adds support for DWARF attribute DW_AT_rank.

  Summary:
Fortran assumed rank arrays have dynamic rank. DWARF attribute
DW_AT_rank is needed to support that.

  Testing:
unit test cases added (hand-written)
check llvm
check debug-info

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D89141
2020-10-10 17:51:12 +05:30
Denis Antrushin 2b96dcebfa [Statepoints] Allow deopt GC pointer on VReg if gc-live bundle is empty.
Currently we allow passing pointers from deopt bundle on VReg only if
they were seen in list of gc-live pointers passed on VRegs.
This means that for the case of empty gc-live bundle we spill deopt
bundle's pointers. This change allows lowering deopt pointers to VRegs
in case of empty gc-live bundle. In case of non-empty gc-live bundle,
behavior does not change.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D88999
2020-10-10 14:58:08 +07:00
Mircea Trofin c11c20fb00 [NFC][Regalloc] VirtRegAuxInfo::Hint does not need to be a field
It is only used in weightCalcHelper, and cleared upon its finishing its
job there.

The patch further cleans up style guide discrepancies, and simplifies
CopyHint by removing duplicate 'IsPhys' information (it's what the Reg
field would report).
2020-10-09 13:42:23 -07:00
Mircea Trofin 62e2ac6461 [NFC][Regalloc] Fix coding style in CalcSpillWeights 2020-10-09 12:22:12 -07:00
Scott Linder 4a98cf7867 [NFC] Reformat MILexer.cpp:getIdentifierKind
Reformat to avoid unrelated changes in diff of future patch.
Committed as obvious.
2020-10-09 15:21:24 +00:00
Esme-Yi e9fd8823ba [DAGCombiner] Add decomposition patterns for Mul-by-Imm.
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
  mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
  mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.

```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88201
2020-10-09 08:51:40 +00:00
Fangrui Song 2c4c2dc2d9 [MCRegister] Simplify isStackSlot & isPhysicalRegister and delete isPhysical. NFC 2020-10-08 22:08:33 -07:00
Fangrui Song c3de9a9e69 Fix incorect Register -> MCRegister conversion
getReg returns a Register which may represent a virtual register.
2020-10-08 21:40:48 -07:00
Kazushi (Jam) Marukawa 1d1c1f8ff2 [VE] Add new MVT types for NEC SX Aurora VE vector
This patch adds entries for:
    v64i64
    v128i64
    v256i64
    v64f64
    v128f64
    v256f64

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D88776
2020-10-09 12:07:41 +09:00
Kai Luo 8a5858c8fd [TwoAddressInstruction][PowerPC] Call `regOverlapsSet` to find out real clobbers and uses
In `rescheduleKillAboveMI`, current implementation uses `SmallSet` to track reg's defs and uses. When comparing, use `SmallSet.count` to find out if it's clobbered or used. It's not correct if involving subregisters. This patch uses `regOverlapsSet` already used by `rescheduleMIBelowKill` to fix the issue.

Fixed https://bugs.llvm.org/show_bug.cgi?id=47707.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D88716
2020-10-09 02:34:54 +00:00
Mircea Trofin 4cfc4025cc [NFC][MC] MCRegister API typing.
Mostly LiveIntervals, with their effects (users).

Differential Revision: https://reviews.llvm.org/D89018
2020-10-08 15:08:34 -07:00
Quentin Colombet fd8275e04a [GlobalISel] Add missing pass dependencies for IRTranslator
The IRTranslator depends on the branch probability info pass when the
optimization level is different than None and it depends all the time on
the StackProtector pass.

We have to explicitly call out pass dependencies otherwise the pass manager
may not be able to schedule the IRTranslator.

Before this patch, we were lucky because previous passes depend on the branch
probability info pass (like the Global Variable Optimization) and the stack
protector pass is initialized in initializeCodeGen.
However, if the target has a custom pipeline without any passes like Global
Variable Optimization, the pipeline creation will fail, at least because of
the branch probability info pass dependency (it is unlikely that
initializeCodeGen is not called).

This patch adds the missing dependencies to the IRTranslator.

Differential Revision: https://reviews.llvm.org/D89063
2020-10-08 13:57:21 -07:00
Rahman Lavaee 2b0c5d76a6 Introduce and use a new section type for the bb_addr_map section.
This patch lets the bb_addr_map (renamed to __llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help parsers, dumpers and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D88199
2020-10-08 11:13:19 -07:00
Amara Emerson 283b4d6ba3 [GlobalISel] Add G_VECREDUCE_* opcodes for vector reductions.
These mirror the IR and SelectionDAG intrinsics & nodes.

Opcodes added:
G_VECREDUCE_SEQ_FADD
G_VECREDUCE_SEQ_FMUL
G_VECREDUCE_FADD
G_VECREDUCE_FMUL
G_VECREDUCE_FMAX
G_VECREDUCE_FMIN
G_VECREDUCE_ADD
G_VECREDUCE_MUL
G_VECREDUCE_AND
G_VECREDUCE_OR
G_VECREDUCE_XOR
G_VECREDUCE_SMAX
G_VECREDUCE_SMIN
G_VECREDUCE_UMAX
G_VECREDUCE_UMIN

Differential Revision: https://reviews.llvm.org/D88750
2020-10-08 10:33:19 -07:00
diggerlin 92bca12843 [AIX] add new option -mignore-xcoff-visibility
SUMMARY:

In IBM compiler xlclang , there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.

We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to ASM or XCOFF object file.

The option only work on the AIX OS, for other non-AIX OS using the option will report an unsupported options error.

In AIX OS:

1.1  the option -mignore-xcoff-visibility is enabled by default , if there is not -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command .

1.2 if there is -fvisibility=* explicitly but not -mignore-xcoff-visibility  explicitly in the clang command.  it will generate visibility attributes.

1.3 if there are  both  -fvisibility=* and  -mignore-xcoff-visibility  explicitly in the clang command. The option  "-mignore-xcoff-visibility" wins , it do not emit the visibility attribute.

The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR.

Reviewer: daltenty,Jason Liu

Differential Revision: https://reviews.llvm.org/D87451
2020-10-08 09:34:58 -04:00
Geoffrey Martin-Noble 93db4a8ce6 Remove unused variables
These are unused since
https://reviews.llvm.org/rG35cb45c533fb76dcfc9f44b4e8bbd5d8a855ed2a
causing `-Wunused` warnings.

Differential Revision: https://reviews.llvm.org/D89022
2020-10-07 18:30:12 -07:00
Anna Thomas 35cb45c533 [ImplicitNullChecks] Support complex addressing mode
The pass is updated to handle loads through complex addressing mode,
specifically, when we have a scaled register and a scale.
It requires two API updates in TII which have been implemented for X86.

See added IR and MIR testcases.

Tests-Run: make check
Reviewed-By: reames, danstrushin
Differential Revision: https://reviews.llvm.org/D87148
2020-10-07 20:55:38 -04:00
Mircea Trofin 297655c123 [NFC][regalloc] Use MCRegister instead of unsigned in InterferenceCache
Also changed users of APIs.

Differential Revision: https://reviews.llvm.org/D88930
2020-10-07 14:48:43 -07:00
Rahman Lavaee 34cd06a9b3 [BasicBlockSections] Make sure that the labels for address-taken blocks are emitted after switching the seciton.
Currently, AsmPrinter code is organized in a way in which the labels of address-taken blocks are emitted in the previous section, which makes the relocation incorrect.
This patch reorganizes the code to switch to the basic block section before handling address-taken blocks.

Reviewed By: snehasish, MaskRay

Differential Revision: https://reviews.llvm.org/D88517
2020-10-07 13:22:38 -07:00
Amara Emerson e72cfd938f Rename the VECREDUCE_STRICT_{FADD,FMUL} SDNodes to VECREDUCE_SEQ_{FADD,FMUL}.
The STRICT was causing unnecessary confusion. I think SEQ is a more accurate
name for what they actually do, and the other obvious option of "ORDERED"
has the issue of already having a meaning in FP contexts.

Differential Revision: https://reviews.llvm.org/D88791
2020-10-07 10:45:09 -07:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Jay Foad 1aa8e6a51a [SDag] SimplifyDemandedBits: simplify to FP constant if all bits known
We were already doing this for integer constants. This patch implements
the same thing for floating point constants.

Differential Revision: https://reviews.llvm.org/D88570
2020-10-07 09:24:38 +01:00
Chen Zheng ed46e84c7a [MachineInstr] exclude call instruction in mayAlias
we now get noAlias result for a call instruction and other
load/store/call instructions if we query mayAlias.
This is not right as call instruction is not with mayloadorstore,
but it may alter the memory.

This patch fixes this wrong alias query.

Differential Revision: https://reviews.llvm.org/D87490
2020-10-07 00:12:21 -04:00
Bill Wendling d2c61d2bf9 [CodeGen][TailDuplicator] Don't duplicate blocks with INLINEASM_BR
Tail duplication of a block with an INLINEASM_BR may result in a PHI
node on the indirect branch. This is okay, but it also introduces a copy
for that PHI node *after* the INLINEASM_BR, which is not okay.

See: https://github.com/ClangBuiltLinux/linux/issues/1125

Differential Revision: https://reviews.llvm.org/D88823
2020-10-06 18:44:59 -07:00
Mircea Trofin d85b845cb2 [NFC][MC] Type uses of MCRegUnitIterator as MCRegister
This is one of many subsequent similar changes. Note that we're ok with
the parameter being typed as MCPhysReg, as MCPhysReg -> MCRegister is a
correct conversion; Register -> MCRegister assumes the former is indeed
physical, so we stop relying on the implicit conversion and use the
explicit, value-asserting asMCReg().

Differential Revision: https://reviews.llvm.org/D88862
2020-10-06 12:09:56 -07:00
Dmitri Gribenko b3876ef490 Silence -Wunused-variable in NDEBUG mode 2020-10-06 16:02:17 +02:00
Denis Antrushin c08d48fc2d [Statepoints] Change statepoint machine instr format to better suit VReg lowering.
Current Statepoint MI format is this:

   STATEPOINT
   <id>, <num patch bytes >, <num call arguments>, <call target>,
   [call arguments...],
   <StackMaps::ConstantOp>, <calling convention>,
   <StackMaps::ConstantOp>, <statepoint flags>,
   <StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
   <gc base/derived pairs...> <gc allocas...>

Note that GC pointers are listed in pairs <base,derived>.
This causes base pointers to appear many times (at least twice) in
instruction, which is bad for us when VReg lowering is ON.
The problem is that machine operand tiedness is 1-1 relation, so
it might look like this:

  %vr2 = STATEPOINT ... %vr1, %vr1(tied-def0)

Since only one instance of %vr1 is tied, that may lead to incorrect
codegen (see PR46917 for more details), so we have to always spill
base pointers. This mostly defeats new VReg lowering scheme.

This patch changes statepoint instruction format so that every
gc pointer appears only once in operand list. That way they all can
be tied. Additional set of operands is added to preserve base-derived
relation required to build stackmap.
New statepoint has following format:

  STATEPOINT
  <id>, <num patch bytes>, <num call arguments>, <call target>,
  [call arguments...],
  <StackMaps::ConstantOp>, <calling convention>,
  <StackMaps::ConstantOp>, <statepoint flags>,
  <StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
  <StackMaps::ConstantOp>, <num gc pointers>, [gc pointers...],
  <StackMaps::ConstantOp>, <num gc allocas>,  [gc allocas...]
  <StackMaps::ConstantOp>, <num entries in gc map>, [base/derived indices...]

Changes are:
  - every gc pointer is listed only once in a flat length-prefixed list;
  - alloca list is prefixed with its length too;
  - following alloca list is length-prefixed list of base-derived
    indices of pointers from gc pointer list. Note that indices are
    logical (number of pointer), not absolute (index of machine operand).

Differential Revision: https://reviews.llvm.org/D87154
2020-10-06 17:40:29 +07:00
David Sherwood 4ed47d50ea [SVE][CodeGen] Fix DAGCombiner::ForwardStoreValueToDirectLoad for scalable vectors
In DAGCombiner::ForwardStoreValueToDirectLoad I have fixed up some
implicit casts from TypeSize -> uint64_t and replaced calls to
getVectorNumElements() with getVectorElementCount(). There are some
simple cases of forwarding that we can definitely support for
scalable vectors, i.e. when the store and load are both scalable
vectors and have the same size. I have added tests for the new
code paths here:

  CodeGen/AArch64/sve-forward-st-to-ld.ll

Differential Revision: https://reviews.llvm.org/D87098
2020-10-06 08:04:03 +01:00
Carl Ritson ea9d6392f4 Fix reordering of instructions during VirtRegRewriter unbundling
When unbundling COPY bundles in VirtRegRewriter the start of the
bundle is not correctly referenced in the unbundling loop.

The effect of this is that unbundled instructions are sometimes
inserted out-of-order, particular in cases where multiple
reordering have been applied to avoid clobbering dependencies.
The resulting instruction sequence clobbers dependencies.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D88821
2020-10-06 09:43:02 +09:00
Mircea Trofin b268e24d43 [NFC][regalloc] Separate iteration from AllocationOrder
This separates the two concerns - encapsulation of traversal order; and
iteration.

Differential Revision: https://reviews.llvm.org/D88256
2020-10-05 16:13:18 -07:00
Craig Topper 1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
Fangrui Song 27e1cc6f39 Cleanup CodeGen/CallingConvLower.cpp
Patch by pi1024e (email unavailable)

Differential Revision: https://reviews.llvm.org/D82593
2020-10-05 14:47:46 -07:00
Jon Roelofs db80cc397e [CodeGen][MachineSched] Fixup function name typo. NFC 2020-10-05 12:43:50 -07:00
Mircea Trofin 82ebbcfb05 [NFC][regalloc] Model weight normalization as a virtual
Continuing from D88499, we can now model the normalization function as a
virtual member of VirtRegAuxInfo. Note that the default
(normalizeSpillWeight) is also used stand-alone in RAGreedy.

Differential Revision: https://reviews.llvm.org/D88713
2020-10-05 11:33:07 -07:00
David Sherwood fa0293081d [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-store.ll
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MSTORE I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.

I have added a CHECK line to the following test:

 CodeGen/AArch64/sve-split-store.ll

that ensures no new warnings are added.

Differential Revision: https://reviews.llvm.org/D86928
2020-10-05 19:27:00 +01:00
Amara Emerson c2bce848ec [GlobalISel] Fix CSEMIRBuilder silently allowing use-before-def.
If a CSEMIRBuilder query hits the instruction at the current insert point,
move insert point ahead one so that subsequent uses of the builder don't end up with
uses before defs.

This fix also shows that AMDGPU was also affected by this bug often, but got away
with it because it was using a G_IMPLICIT_DEF before the use.

Differential Revision: https://reviews.llvm.org/D88605
2020-10-05 11:00:00 -07:00
Qiu Chaofan b326d4ff94 [SelectionDAG] Don't remove unused negated constant immediately
This reverts partial of a2fb5446 (actually, 2508ef01) about removing
negated FP constant immediately if it has no uses. However, as discussed
in bug 47517, there're cases when NegX is folded into constant from
other places while NegY is removed by that line of code and NegX is
equal to NegY. In these cases, NegX is deleted before used and crash
happens. So revert the code and add necessary test case.
2020-10-06 01:16:45 +08:00
Sanjay Patel 2ccbf3dbd5 [SDAG] fold x * 0.0 at node creation time
In the motivating case from https://llvm.org/PR47517
we create a node that does not get constant folded
before getNegatedExpression is attempted from some
other node, and we crash.

By moving the fold into SelectionDAG::simplifyFPBinop(),
we get the constant fold sooner and avoid the problem.
2020-10-04 11:31:57 -04:00
Denis Antrushin 7b19cd06d7 [Statepoints][ISEL] visitGCRelocate: chain to current DAG root.
This is similar to D87251, but for CopyFromRegs nodes.
Even for local statepoint uses we generate CopyToRegs/CopyFromRegs
nodes.  When generating CopyFromRegs in visitGCRelocate, we must chain
to current DAG root, not EntryNode, to ensure proper ordering of copy
w.r.t. statepoint node producing result for it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88639
2020-10-02 21:41:22 +07:00
David Sherwood b0ce9f0f4c [SVE][CodeGen] Fix implicit TypeSize->uint64_t casts in TypePromotion
The TypePromotion pass only operates on scalar types so I've fixed up
all places where we were relying upon the implicit cast from
TypeSize->uint64_t.

Differential Revision: https://reviews.llvm.org/D88575
2020-10-02 08:12:11 +01:00
David Sherwood b8ce6a6756 [SVE][CodeGen] Add new EVT/MVT getFixedSizeInBits() functions
When we know that a particular type is always going to be fixed
width we have so far been writing code like this:

  getSizeInBits().getFixedSize()

Since we are doing this in quite a few places now it seems to make
sense to add a new helper function that allows us to replace
these calls with a single getFixedSizeInBits() call.

Differential Revision: https://reviews.llvm.org/D88649
2020-10-02 07:47:31 +01:00
Carl Ritson 5136f4748a CodeGen: Fix livein calculation in MachineBasicBlock splitAt
Fix and simplify computation of liveins for new block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88535
2020-10-02 10:45:04 +09:00
jasonliu 78a9e62aa6 [XCOFF] Enable -fdata-sections on AIX
Summary:
Some design decision worth noting about:

I've noticed a recent mailing discussing about why string literal is
not affected by -fdata-sections for ELF target:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145121.html

But on AIX, our linker could not split the mergeable string like other target.
So I think it would make more sense for us to emit separate csect for
every mergeable string in -fdata-sections mode,
as there might not be other ways for linker to do garbage collection
on unused mergeable string.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88339
2020-10-02 00:16:24 +00:00
Arthur Eubanks 499260c03b Revert "[CFGuard] Add address-taken IAT tables and delay-load support"
This reverts commit ef4e971e5e.
2020-10-01 11:29:54 -07:00
David Sherwood 15474d7691 [SVE][CodeGen] Replace use of TypeSize operator< in GlobalMerge::doMerge
We don't support global variables with scalable vector types so I've
changed the code to compare the fixed sizes instead.

Differential Revision: https://reviews.llvm.org/D88564
2020-10-01 14:06:59 +01:00
Andrew Paverd ef4e971e5e [CFGuard] Add address-taken IAT tables and delay-load support
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D87544
2020-10-01 12:45:07 +01:00
Kerry McLaughlin fcf70e1e3b [SVE][CodeGen] Lower scalable fp_extend & fp_round operations
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.

This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88321
2020-10-01 12:17:37 +01:00
Rahman Lavaee 8955950c12 Exception support for basic block sections
This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables.

The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a **nop** operation as the first non-CFI instruction in the exception section.

**Background on Exception Handling in C++ ABI**
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf

Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function.

The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows:

| CallSite                       |  -->  Position of the call site (relative to the function entry)
| CallSite length           |  -->  Length of the call site.
| Landing Pad               |  -->  Position of the landing pad (relative to the landing pad fragment’s begin label)
| Action record offset  |  -->  Position of the first action record

The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table).

The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted.

The action record offset is an index into the action table which includes information about which exception types are caught.

**C++ Exceptions with Basic Block Sections**
Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section.
This patch introduces a new **CallSiteRange** structure which specifies the range of call-sites which correspond to every section:

  `struct CallSiteRange {
    // Symbol marking the beginning of the precedure fragment.
    MCSymbol *FragmentBeginLabel = nullptr;
    // Symbol marking the end of the procedure fragment.
    MCSymbol *FragmentEndLabel = nullptr;
    // LSDA symbol for this call-site range.
    MCSymbol *ExceptionLabel = nullptr;
    // Index of the first call-site entry in the call-site table which
    // belongs to this range.
    size_t CallSiteBeginIdx = 0;
    // Index just after the last call-site entry in the call-site table which
    // belongs to this range.
    size_t CallSiteEndIdx = 0;
    // Whether this is the call-site range containing all the landing pads.
    bool IsLPRange = false;
  };`

With N basic-block-sections, the call-site table is partitioned into N call-site ranges.

Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed.

Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange).

|  @LPStart                   |  --->  Landing pad fragment     ( LSDA1 points here)
| CallSite Table Length | ---> Used to find the action table.
| CallSites                     |
| …                                 |
| …                                 |
| @LPStart                    |  --->  Landing pad fragment ( LSDA2 points here)
| CallSite Table Length |
| CallSites                     |
| …                                 |
| …                                 |
…
…
|      Action Table          |
|      Types Table           |

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D73739
2020-09-30 11:05:55 -07:00
Mircea Trofin d6de40f886 [NFC][regalloc] Make VirtRegAuxInfo part of allocator state
All the state of VRAI is allocator-wide, so we can avoid creating it
every time we need it. In addition, the normalization function is
allocator-specific. In a next change, we can simplify that design in
favor of just having it as a virtual member.

Differential Revision: https://reviews.llvm.org/D88499
2020-09-30 08:13:05 -07:00
Matt Arsenault 5aa1119537 GlobalISel: Assert if MoreElements uses a non-vector type 2020-09-30 10:36:00 -04:00
Matt Arsenault d93459992e LiveDebugValues: Fix typos and indentation 2020-09-30 10:35:25 -04:00
Matt Arsenault a66fca44ac RegAllocFast: Add extra DBG_VALUE for live out spills
This allows LiveDebugValues to insert the proper DBG_VALUEs in live
out blocks if a spill is inserted before the use of a
register. Previously, this would see the register use as the last
DBG_VALUE, even though the stack slot should be treated as the live
out value.

This avoids an lldb test regression when D52010 is re-applied.
2020-09-30 10:35:25 -04:00
Matt Arsenault 89baeaef2f Reapply "RegAllocFast: Rewrite and improve"
This reverts commit 73a6a164b8.
2020-09-30 10:35:25 -04:00
Gabriel Hjort Åkerlund 43d239d0fa [GlobalISel] Fix incorrect setting of ValNo when splitting
Before, for each original argument i, ValNo was set to i + PartIdx, but
ValNo is intended to reflect the index of the value before splitting.
Hence, ValNo should always be set to i and not consider the PartIdx.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86511
2020-09-30 16:08:51 +02:00
Sam Parker 3f88c10a6b [RDA] isSafeToDefRegAt: Look at global uses
We weren't looking at global uses of a value, so we could happily
overwrite the register incorrectly.

Differential Revision: https://reviews.llvm.org/D88554
2020-09-30 14:06:45 +01:00
Jay Foad cdac4492b4 [SplitKit] Cope with no live subranges in defFromParent
Following on from D87757 "[SplitKit] Only copy live lanes", it is
possible to split a live range at a point when none of its subranges
are live. This patch handles that case by inserting an implicit def
of the superreg.

Patch by Quentin Colombet!

Differential Revision: https://reviews.llvm.org/D88397
2020-09-30 10:16:25 +01:00
Sam Parker 779a8a028f [ARM][LowOverheadLoops] TryRemove helper.
Make a helper function that wraps around RDA::isSafeToRemove and
utilises the existing DCE IT block checks.
2020-09-30 09:37:24 +01:00
Sam Parker 700f93e92b [RDA] Switch isSafeToMove iterators
So forwards is forwards and backwards is reverse. Also add a check
so that we know the instructions are in the expected order.

Differential Revision: https://reviews.llvm.org/D88419
2020-09-30 08:10:48 +01:00
Amara Emerson 1d54e75cf2 [GlobalISel] Fix multiply with overflow intrinsics legalization generating invalid MIR.
During lowering of G_UMULO and friends, the previous code moved the builder's
insertion point to be after the legalizing instruction. When that happened, if
there happened to be a "G_CONSTANT i32 0" immediately after, the CSEMIRBuilder
would try to find that constant during the buildConstant(zero) call, and since
it dominates itself would return the iterator unchanged, even though the def
of the constant was *after* the current insertion point. This resulted in the
compare being generated *before* the constant which it was using.

There's no need to modify the insertion point before building the mul-hi or
constant. Delaying moving the insert point ensures those are built/CSEd before
the G_ICMP is built.

Fixes PR47679

Differential Revision: https://reviews.llvm.org/D88514
2020-09-29 18:40:58 -07:00
Zequan Wu 6c91e623e5 [CodeGen] emit CG profile for COFF object file
Differential Revision: https://reviews.llvm.org/D87811
2020-09-29 12:03:30 -07:00
Mircea Trofin 6d193ba333 [NFC][regalloc] Unit test for AllocationOrder iteration.
Added unittests. In the process, separated core construction - which just
needs the hits, order, and 'HardHints' values - from construction from
current register allocation state, to simplify testing.

Differential Revision: https://reviews.llvm.org/D88455
2020-09-29 10:48:07 -07:00
Krzysztof Parzyszek db04bec5f1 [SDAG] Do not convert undef to 0 when folding CONCAT/BUILD_VECTOR
Differential Revision: https://reviews.llvm.org/D88273
2020-09-29 09:12:26 -05:00
Dominik Montada 113114a5da [GlobalISel] fix widenScalarUnmerge if widen type is not a multiple of destination type
Fix creation of illegal unmerge when widen was requested to a type which
is not a multiple of the destination type. E.g. when trying to widen
an s48 unmerge to s64 the existing code would create an illegal unmerge
from s64 to s48.

Instead, create further unmerges to a GCD type, then use this to remerge
these intermediate results to the actual destinations.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88422
2020-09-29 15:52:20 +02:00
Jay Foad 781edd501c [SDag] Verify DAG divergence after dumping. NFC.
When debugging, it's useful to be able to see the DAG that has just
failed divergence verification.
2020-09-29 14:05:07 +01:00
Jay Foad d6b04f3937 [SDag] Refactor and simplify divergence calculation and checking. NFC. 2020-09-29 14:05:07 +01:00
Ruiling Song 73805329ba [RegisterCoalescer] Pass Undefs to extendToIndices()
When extending the subranges, the reaching-def may be an undefs. When
extending such kind of subrange, it will try to search for the reaching
def first. If the reaching def is an undef and we did not provide 'Undefs',
The findReachingDefs() will fail with message:
"Use of $noreg does not have a corresponding definition on every path:
 LLVM ERROR: Use not jointly dominated by defs."
So we computeSubRangeUndefs() and pass the result to extendToIndices().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87744
2020-09-29 08:14:24 +08:00
Fangrui Song bd08a87cfe [EHStreamer] Simplify sharedTypeIDs with std::mismatch
(Note that EMStreamer.cpp is largely under tested. The only test checking the prefix sharing is CodeGen/WebAssembly/eh-lsda.ll)
2020-09-28 15:05:59 -07:00
Amara Emerson 082321909e [GlobalISel] Add support for lowering of vector G_SELECT and use for AArch64.
The lowering is a port of the SDAG expansion.

Differential Revision: https://reviews.llvm.org/D88364
2020-09-28 14:00:46 -07:00
Jessica Paquette a52e78012a [GlobalISel] Combine (xor (and x, y), y) -> (and (not x), y)
When we see this:

```
%and = G_AND %x, %y
%xor = G_XOR %and, %y
```

Produce this:

```
%not = G_XOR %x, -1
%new_and = G_AND %not, %y
```

as long as we are guaranteed to eliminate the original G_AND.

Also matches all commuted forms. E.g.

```
%and = G_AND %y, %x
%xor = G_XOR %y, %and
```

will be matched as well.

Differential Revision: https://reviews.llvm.org/D88104
2020-09-28 10:08:14 -07:00
David Sherwood bafdd11326 [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700
2020-09-28 08:03:00 +01:00
Arthur Eubanks a2578e92e2 Revert "Reland [CodeGen] emit CG profile for COFF object file"
This reverts commit 506b6170cb.

This still causes link errors, see https://crbug.com/1130780.
2020-09-27 22:43:14 -07:00
Nikita Popov f229bf2e12 [Legalize][X86] Improve nnan fmin/fmax vector reduction
Use +/-Inf or +/-Largest as neutral element for nnan fmin/fmax
reductions. This avoids dropping any FMF flags. Preserving the
nnan flag in particular is important to get a good lowering on X86.

Differential Revision: https://reviews.llvm.org/D87586
2020-09-27 10:47:35 +02:00
Chen Zheng c8f6c0f961 [Machinesink] add one more profitable loop related pattern
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86925
2020-09-26 21:02:21 -04:00
Simon Pilgrim a61272a900 [DAG] Fold vector mul(x,0)/mul(x,1) to a clearing mask
If we're multiplying all elements of a vector by '0' or '1' then we can more efficiently perform this as a clearing mask (that is likely to further simplify to a shuffle blend).

This was noticed when reviewing D87502 but seems to help idiv/irem by constant cases even more as '0'/'1' values are often used for 'passthrough' cases.

Differential Revision: https://reviews.llvm.org/D88225
2020-09-26 14:31:57 +01:00
Simon Pilgrim decc1944f3 MachineCSE.cpp - use auto const& iterators in for-range loops to avoid copies. NFCI. 2020-09-26 14:31:57 +01:00
Simon Atanasyan c6c5629f2f [CodeGen] Do not call `emitGlobalConstantLargeInt` for constant requires 8 bytes to store
This is a fix for PR47630. The regression is caused by the D78011. After
this change the code starts to call the `emitGlobalConstantLargeInt` even
for constants which requires eight bytes to store.

Differential revision: https://reviews.llvm.org/D88261
2020-09-26 08:58:46 +03:00
Qiu Chaofan c0f8e4c06c [SelectionDAG] Add guard to automatically insert flags
This is like FastMathFlagGuard in IR. Since we use SDAG instance to get
values, it's with SelectionDAG. By creating a FlagInserter in current
scope, all values created by getNode will get the flags if no Flags
argument provided.

In this patch, I applied it to floating point operations folding part in
DAG combiner, and removed Flags passing to getNode to show its effect.
Other places in DAG combiner and other helper methods similar to getNode
also need this. They can be done in follow-up patches.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87361
2020-09-26 13:57:52 +08:00
Dávid Bolvanský 179e15d53a [SystemZ] Optimize bcmp calls (PR47420)
Solves https://bugs.llvm.org/show_bug.cgi?id=47420

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87988
2020-09-25 17:55:39 +02:00
Jay Foad b34ddfcc76 [SplitKit] In addDeadDef tolerate parent range that defines more lanes
Following on from D87757 "[SplitKit] Only copy live lanes", in
SplitEditor::addDeadDef, when we're checking whether the parent live
interval has a subrange defining the same lanes, tolerate the case
where the parent subrange defines a superset of the lanes. This can
happen when the child subrange comes from SplitEditor::buildCopy
decomposing a partial copy into a sequence of subreg copies that cover
the required lanes.

Differential Revision: https://reviews.llvm.org/D88020
2020-09-25 11:31:56 +01:00
Sam Parker a399d1880b [ARM] Find VPT implicitly predicated by VCTP
On failing to find a VCTP in the list of instructions that explicitly
predicate the entry of a VPT block, inspect whether the block is
controlled via VPT which is implicitly predicated due to it's
predicated operand(s).

Differential Revision: https://reviews.llvm.org/D87819
2020-09-25 08:50:53 +01:00
Snehasish Kumar d2696dec45 [llvm] Add -bbsections-cold-text-prefix to emit cold clusters to a different section.
This change adds an option to basic block sections to allow cold
clusters to be assigned a custom text prefix. With a custom prefix such
as ".text.split." (D87840), lld can place them in a separate output section.
The benefits are -

* Empirically shown to improve icache and itlb metrics by 3-5%
(absolute) compared to placing split parts in .text.unlikely.
* Mitigates against poor profiles, eg samplePGO profiles used with the
machine function splitter. Optimizations such as hugepage remapping can
make different decisions at the section granularity.
* Enables section granularity hotness monitoring (checking on the
decisions made during compilation vs sample data from production).

Differential Revision: https://reviews.llvm.org/D87813
2020-09-24 15:26:15 -07:00
Sriraman Tallam e39286510d Temporary fix for D85085 debug_loc bug with basic block sections.
Until then, this one line fix removes the assert fail with basic block sections
with debug info. Bug tracking this: #47549
This fix does not generate loc list or DW_AT_const_value if the argument is
mentioned in a different section than the start of the function.

Temporarily fixes bugzilla : https://bugs.llvm.org/show_bug.cgi?id=47549

Differential Revision: https://reviews.llvm.org/D87787
2020-09-24 14:41:49 -07:00
Zequan Wu 506b6170cb Reland [CodeGen] emit CG profile for COFF object file
This reverts commit 90242caca2.

Error fixed at f5435399e8

Differential Revision: https://reviews.llvm.org/D87811
2020-09-24 14:38:53 -07:00
Bill Wendling 0c0c57f7b2 Revert "[CodeGen] Postprocess PHI nodes for callbr"
Accidental commit.

This reverts commit 7f4c940bd0.
2020-09-24 14:35:23 -07:00
Bill Wendling 7f4c940bd0 [CodeGen] Postprocess PHI nodes for callbr
When processing PHI nodes after a callbr, we need to make sure that the
PHI nodes on the default branch are resolved after the callbr
(inserted after INLINEASM_BR). The PHI node values on the indirect
branches are processed before the INLINEASM_BR.

Differential Revision: https://reviews.llvm.org/D86260
2020-09-24 14:34:28 -07:00
Mircea Trofin 89aad892a5 [NFC][regalloc] Remove unused API in AllocationOrder
Differential Revision: https://reviews.llvm.org/D88197
2020-09-24 12:25:56 -07:00
Matt Arsenault e75afc9acf GlobalISel: Use unmerge when copying wide vectors to result registers
Avoid using G_EXTRACT and move towards a more consistent vector
legalization strategy.
2020-09-24 15:19:51 -04:00
vpykhtin d9beff04a3 [RegisterCoalescer] Fix IMPLICIT_DEF init removal for a register on joining
This patch removes redundant IMPLICIT_DEF for subregs which was leading to
incorrect register initialization on joining in some cases.

Reviewed by: qcolombet

Differential revision: https://reviews.llvm.org/D82258
2020-09-24 17:37:03 +03:00
Alexandre Ganea 4b64ce7428 Improve 723fea2307 - Silence 'warning: unused variable' when compiling with Clang 10.0 2020-09-24 09:07:22 -04:00
Pushpinder Singh 41d6669f1f [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH
Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D85653
2020-09-23 22:25:29 -04:00
Eli Friedman 3f739f736b [SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops.
Previously, if a floating-point type was legal, but FNEG wasn't legal,
we would use FSUB.  Instead, we should use integer ops, to preserve the
semantics.  (Alternatively, there's a compiler-rt call we could use, but
there isn't much reason to use that.)

It turns out we actually are still using this obscure codepath in a few
cases: on some targets, we have "legal" floating-point types that don't
actually support any floating-point operations.  In particular, ARM and
AArch64 are using this path.

The implementation for SelectionDAG is pretty simple because we can
reuse the infrastructure from FCOPYSIGN.

See also 9a3dc3e, the corresponding change to type legalization.

Also includes a "bonus" change to STRICT_FSUB legalization, so we can
lower a STRICT_FSUB to a float libcall.

Includes the changes to both LegalizeDAG and GlobalISel so we don't have
inconsistent results in the future.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46792 .

Differential Revision: https://reviews.llvm.org/D84287
2020-09-23 14:10:33 -07:00
Guozhi Wei fd75ad8662 [MBFIWrapper] Add a new function getBlockProfileCount
MBFIWrapper keeps track of block frequencies of newly created blocks and
modified blocks, modified block frequencies should also impact block profile
count. This class doesn't provide interface getBlockProfileCount, users can only
use the underlying MBFI to query profile count, the underlying MBFI doesn't know
the modifications made in MBFIWrapper, so it either provides stale profile count
for modified block or simply crashes on new blocks.

So this patch add function getBlockProfileCount to class MBFIWrapper to handle
new blocks or modified blocks.

Differential Revision: https://reviews.llvm.org/D87802
2020-09-23 09:31:45 -07:00
Matt Arsenault c463fd136e GlobalISel: Fix truncating shift amount in trunc (shl) combine
The shift amount type does not necessarily match the result type. This
was inserting a trunc from s32 to s32, which asserted. Just preserve
the original shift amount type which can be legalized later.
2020-09-23 09:07:50 -04:00
David Sherwood e077367a28 [SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.

Differential revision: https://reviews.llvm.org/D87889
2020-09-23 09:20:08 +01:00
Fangrui Song bee68b2956 [EHStreamer] Ensure CallSiteEntry::{BeginLabel,EndLabel} are non-null. NFC
... to simplify the code a bit.

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D87999
2020-09-22 17:34:43 -07:00
Reid Kleckner 90242caca2 Revert "[CodeGen] emit CG profile for COFF object file"
This reverts commit 91aed9bf97, it is
causing link errors.
2020-09-22 13:47:39 -07:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Mircea Trofin d1e0f9f3cf [NFC][regalloc] Simplify/conform to style guide indvars in Greedy
Differential Revision: https://reviews.llvm.org/D88055
2020-09-22 10:55:52 -07:00
Simon Pilgrim 4dada8d617 [DAG] Remove DAGTypeLegalizer::GenWidenVectorTruncStores (PR42046)
Just scalarize trunc stores - GenWidenVectorTruncStores does the same thing but is flawed (PR42046) and unused.

Differential Revision: https://reviews.llvm.org/D87708
2020-09-22 17:24:45 +01:00
Alexandre Ganea 723fea2307 Silence 'warning: unused variable' when compiling with Clang 10.0 2020-09-22 12:17:40 -04:00
Michael Liao 534f6e1718 [PeepholeOptimizer] Enhance the redundant COPY elimination.
- Eliminate redundant COPYs from the same register & subregister pair.

Differential Revision: https://reviews.llvm.org/D87939
2020-09-22 10:11:37 -04:00
Muhammad Omair Javaid 73a6a164b8 Revert "Reapply Revert "RegAllocFast: Rewrite and improve""
This reverts commit 55f9f87da2.

Breaks following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306
http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154
2020-09-22 14:40:06 +05:00
Mircea Trofin 6a6b06f526 [NFC][regalloc] Use reverse iterator ranges for improved readability
Differential Revision: https://reviews.llvm.org/D88047
2020-09-21 14:58:37 -07:00
Martin Storsjö 36c64af9d7 [CodeGen] [WinException] Only produce handler data at the end of the function if needed
If we are going to write handler data (that is written as variable
length data following after the unwind info in .xdata), we need to
emit the handler data immediately, but for cases where no such
info is going to be written, skip emitting it right away. (Unwind
info for all remaining functions that hasn't gotten it emitted
directly is emitted at the end.)

This does slightly change the ordering of sections (triggering a
bunch of updates to DebugInfo/COFF tests), but the change should be
benign.

This also matches GCC's assembly output, which doesn't output
.seh_handlerdata unless it actually is needed.

For ARM64, the unwind info can be packed into the runtime function
entry itself (leaving no data in the .xdata section at all), but
that can only be done if there's no follow-on data in the .xdata
section. If emission of the unwind info is triggered via
EmitWinEHHandlerData (or the .seh_handlerdata directive), which
implicitly switches to the .xdata section, there's a chance of the
caller wanting to pass further data there, so the packed format
can't be used in that case.

Differential Revision: https://reviews.llvm.org/D87448
2020-09-21 23:42:59 +03:00
Matt Arsenault 55f9f87da2 Reapply Revert "RegAllocFast: Rewrite and improve"
This reverts commit dbd53a1f0c.

Needed lldb test updates
2020-09-21 15:45:27 -04:00
Simon Pilgrim 6a0ed57a22 ImplicitNullChecks.cpp - use auto const& iterators in for-range loops to avoid copies. NFCI. 2020-09-21 17:42:57 +01:00
Simon Pilgrim 3ae07b2a33 TargetPassConfig.cpp - use auto const& iterator in for-range loop to avoid copies. NFCI. 2020-09-21 17:17:11 +01:00
Simon Pilgrim ce294ff8cd MachineCSE.cpp - use auto const& iterator in for-range loop to avoid copies. NFCI. 2020-09-21 16:54:26 +01:00
Denis Antrushin ee86688b81 [Statepoints][ISEL] gc.relocate uniquification should be based on SDValue, not IR Value.
When exporting statepoint results to virtual registers we try to avoid
generating exports for duplicated inputs. But we erroneously use
IR Value* to check if inputs are duplicated. Instead, we should use
SDValue, because even different IR values can get lowered to the same
SDValue.
I'm adding a (degenerate) test case which emphasizes importance of this
feature for invoke statepoints.
If we fail to export only unique values we will end up with something
like that:

  %0 = STATEPOINT
  %1 = COPY %0

landing_pad:
  <use of %1>

And when exceptional path is taken, %1 is left uninitialized (COPY is never
execute).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87695
2020-09-21 19:44:46 +07:00
Alexander Belyaev 17dc729bd4 Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de6.

Google internal backend uses EntrySU, we are looking into removing
dependency on it.

Differential Revision: https://reviews.llvm.org/D88018
2020-09-21 13:33:05 +02:00
Lucas Prates 53d238a961 [CodeGen] Fixing inconsistent ABI mangling of vlaues in SelectionDAGBuilder
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type convertions for
function arguments. The checking for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
inspite of, the regular Calling Convention Lowering.

The issue could be observed in a scenario such as:

```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper convertion had taken place when lowering the return
; value of the first call.
```

This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contanined in the Calling Convention Lowering.

This fixes Bugzilla #47454.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87844
2020-09-21 10:05:34 +01:00
Fangrui Song dbc616e982 [EHStreamer] Fix a "Continue to action" -fverbose-asm comment when multi-byte LEB128 encoding is needed
This only happens with more than 64 action records and it is difficult to construct a test.
2020-09-20 21:41:48 -07:00
Qiu Chaofan 1d782c2987 [PowerPC] Pass nofpexcept flag to custom lowered constrained ops
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.

This is also seen on X86 target.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87390
2020-09-21 10:44:25 +08:00
Fangrui Song d06485685d [XRay] Change mips to use version 2 sled (PC-relative address)
Follow-up to D78590. All targets use PC-relative addresses now.

Reviewed By: atanasyan, dberris

Differential Revision: https://reviews.llvm.org/D87977
2020-09-20 17:59:57 -07:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Eric Christopher dbd53a1f0c Temporarily Revert "RegAllocFast: Rewrite and improve"
as it's breaking a few tests in the lldb test suite.

Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio

This reverts commit c8757ff3aa.
2020-09-18 18:11:21 -07:00
Fangrui Song 2ac06241d2 [LiveDebugValues] Add `#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` to suppress -Wunused-function 2020-09-18 17:25:37 -07:00
Amara Emerson 5d34d7f1a0 [GlobalISel] Add lowering support for G_ABS and use for AArch64.
Differential Revision: https://reviews.llvm.org/D87952
2020-09-18 16:17:18 -07:00
Reid Kleckner 9932561b48 [COFF] Move per-global .drective emission from AsmPrinter to TLOFCOFF
This changes the order of output sections and the output assembly, but
is otherwise NFC.

It simplifies the TLOF interface by removing two COFF-only methods.
2020-09-18 14:31:01 -07:00
James Y Knight f7a53d82c0 PR47468: Fix findPHICopyInsertPoint, so that copies aren't incorrectly inserted after an INLINEASM_BR.
findPHICopyInsertPoint special cases placement in a block with a
callbr or invoke in it. In that case, we must ensure that the copy is
placed before the INLINEASM_BR or call instruction, if the register is
defined prior to that instruction, because it may jump out of the
block.

Previously, the code placed it immediately after the last def _or
use_. This is wrong, if the use is the instruction which may jump.  We
could correctly place it immediately after the last def (ignoring
uses), but that is non-optimal for register pressure.

Instead, place the copy after the last def, or before the
call/inlineasm_br, whichever is later.

Differential Revision: https://reviews.llvm.org/D87865
2020-09-18 14:14:04 -04:00
Matt Arsenault 3105d0f84b CodeGen: Move split block utility to MachineBasicBlock
AMDGPU needs this in several places, so consolidate them here.
2020-09-18 14:05:18 -04:00
Matt Arsenault c8757ff3aa RegAllocFast: Rewrite and improve
This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:

Track register state on register units instead of physical
registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: Definitions are known to end
register livetimes when walking backwards (contrary when walking
forward then uses may or may not be a kill so we need heuristics).

Check register mask operands (calls) instead of conservatively
assuming everything is clobbered.  Enhance heuristics to detect
killing uses: In case of a small number of defs/uses check if they are
all in the same basic block and if so the last one is a killing use.
Enhance heuristic for copy-coalescing through hinting: We check the
first k defs of a register for COPYs rather than relying on there just
being a single definition.  When testing this on the full llvm
test-suite including SPEC externals I measured:

average 5.1% reduction in code size for X86, 4.9% reduction in code on
aarch64. (ranging between 0% and 20% depending on the test) 0.5%
faster compiletime (some analysis suggests the pass is slightly slower
than before, but we more than make up for it because later passes are
faster with the reduced instruction count)

Also adds a few testcases that were broken without this patch, in
particular bug 47278.

Patch mostly by Matthias Braun
2020-09-18 14:05:18 -04:00
Matt Arsenault 870fd53e4f Reapply "RegAllocFast: Record internal state based on register units"
The regressions this caused should be fixed when
https://reviews.llvm.org/D52010 is applied.

This reverts commit a21387c654.
2020-09-18 14:05:18 -04:00
Zequan Wu 91aed9bf97 [CodeGen] emit CG profile for COFF object file
I forgot to add emission of CG profile for COFF object file, when adding the support (https://reviews.llvm.org/D81775)

Differential Revision: https://reviews.llvm.org/D87811
2020-09-18 10:57:54 -07:00
Francis Visoiu Mistrih 0345d88de6 [NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.

Differential Revision: https://reviews.llvm.org/D87867
2020-09-18 09:50:47 -07:00
Simon Pilgrim d967aaa8fa [DAG] BuildVectorSDNode::getSplatValue - pull out repeated getNumOperands() calls. NFCI. 2020-09-18 16:10:23 +01:00
Matt Arsenault 751a6c5760 IR: Move denormal mode parsing from MachineFunction to Function
This was just inspecting the IR to begin with, and is useful to check
in some places in the IR.
2020-09-18 09:55:47 -04:00
Philip Reames b04c181ed7 [AArch64] Enable implicit null check transformation
This change enables the generic implicit null transformation for the AArch64 target. As background for those unfamiliar with our implicit null check support:

    An implicit null check is the use of a signal handler to catch and redirect to a handler a null pointer. Specifically, it's replacing an explicit conditional branch with such a redirect. This is only done for very cold branches under frontend control w/appropriate metadata.
    FAULTING_OP is used to wrap the faulting instruction. It is modelled as being a conditional branch to reflect the fact it can transfer control in the CFG.
    FAULTING_OP does not need to be an analyzable branch to achieve it's purpose. (Or at least, that's the x86 model. I find this slightly questionable.)
    When lowering to MC, we convert the FAULTING_OP back into the actual instruction, record the labels, and lower the original instruction.

As can be seen in the test changes, currently the AArch64 backend does not eliminate the unconditional branch to the fallthrough block. I've tried two approaches, neither of which worked. I plan to return to this in a separate change set once I've wrapped my head around the interactions a bit better. (X86 handles this via AllowModify on analyzeBranch, but adding the obvious code causing BranchFolding to crash. I haven't yet figured out if it's a latent bug in BranchFolding, or something I'm doing wrong.)

Differential Revision: https://reviews.llvm.org/D87851
2020-09-17 16:00:19 -07:00
Quentin Colombet 99e865b618 [TargetRegisterInfo] Add a couple of target hooks for the greedy register allocator
Before this patch, the last chance recoloring and deferred spilling
techniques were solely controled by command line options.
This patch adds target hooks for these two techniques so that it
is easier for backend writers to override the default behavior.

The default behavior of the hooks preserves the default values of
the related command line options.

NFC
2020-09-17 15:23:15 -07:00
Derek Schuff 0ff28fa6a7 Support dwarf fission for wasm object files
Initial support for dwarf fission sections (-gsplit-dwarf) on wasm.
The most interesting change is support for writing 2 files (.o and .dwo) in the
wasm object writer. My approach moves object-writing logic into its own function
and calls it twice, swapping out the endian::Writer (W) in between calls.
It also splits the import-preparation step into its own function (and skips it when writing a dwo).

Differential Revision: https://reviews.llvm.org/D85685
2020-09-17 14:42:41 -07:00
Victor Huang a4bb71b1c0 Disable hoisting MI to hotter basic blocks when using pgo
This is a follow up patch for https://reviews.llvm.org/D63676 to
enable the feature when using pgo.

Differential Revision: https://reviews.llvm.org/D85240
2020-09-17 14:17:00 -05:00
Amara Emerson 79b21fc187 [AArch64][GlobalISel] Fix bug in fewVectorElts action while legalizing oversize G_FPTRUNC vectors.
For <8 x s32> = fptrunc <8 x s64> the fewerElementsVector action tries to break
down the source vector into the final source vectors of <2 x s64> using unmerge.
This fixes a crash due to using the wrong number of elements for the breakdown
type.

Also add some legalizer tests for explicitly G_FPTRUNC which we didn't have.

Differential Revision: https://reviews.llvm.org/D87814
2020-09-17 08:56:26 -07:00
Simon Pilgrim 2a56a0ba08 ModuloSchedule.cpp - remove unnecessary includes. NFCI.
Already included in ModuloSchedule.h
2020-09-17 16:47:48 +01:00
Simon Pilgrim 85ba2f1663 LiveDebugVariables.cpp - remove unnecessary Compiler.h include. NFCI.
Already included in LiveDebugVariables.h
2020-09-17 15:06:02 +01:00
Simon Pilgrim 46e59062a0 DwarfExpression.cpp - remove unnecessary includes. NFCI.
Already included in DwarfExpression.h
2020-09-17 15:06:02 +01:00
Simon Pilgrim 67ae46c820 SafeStackLayout.cpp - remove unnecessary StackLifetime.h include. NFCI.
Already included in SafeStackLayout.h
2020-09-17 14:56:46 +01:00
Simon Pilgrim 572e542c5e DwarfStringPool.cpp - remove unnecessary StringRef include. NFCI.
Already included in DwarfStringPool.h
2020-09-17 12:18:27 +01:00
Simon Pilgrim 71f237506b DwarfFile.h - remove unnecessary includes. NFCI.
Use forward declarations where possible, move includes down to DwarfFile.cpp and avoid duplicate includes.
2020-09-17 12:12:18 +01:00
Simon Pilgrim 550b1a6fd4 [AsmPrinter] DwarfDebug - use DebugLoc const references where possible. NFC.
Avoid unnecessary copies.
2020-09-17 10:45:54 +01:00
Simon Pilgrim 4ae1bb193a [AsmPrinter] Remove orphan DwarfUnit::shareAcrossDWOCUs declaration. NFCI.
Method implementation no longer exists.
2020-09-17 10:45:52 +01:00
Jay Foad 6f6d389da5 [SplitKit] Only copy live lanes
When splitting a live interval with subranges, only insert copies for
the lanes that are live at the point of the split. This avoids some
unnecessary copies and fixes a problem where copying dead lanes was
generating MIR that failed verification. The test case for this is
test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.

Without this fix, some earlier live range splitting would create %430:

%430 [256r,848r:0)[848r,2584r:1)  0@256r 1@848r L0000000000000003 [848r,2584r:0)  0@848r L0000000000000030 [256r,2584r:0)  0@256r weight:1.480938e-03
...
256B     undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
848B     %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %430:vreg_128

Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just
before 848B giving:

%432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
%433 [844r,848r:0)[848r,2584r:1)  0@844r 1@848r L0000000000000030 [844r,2584r:0)  0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1)  0@844r 1@848r weight:2.831776e-03
...
256B     undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
844B     undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {
           internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128
848B     }
  %433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %433:vreg_128

Note that the copy from %432 to %433 at 844B is a curious
bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately,
and it includes a copy of .sub0 which is not live at this point, and
that causes it to fail verification:

*** Bad machine code: No live subrange at use ***
- function:    zextload_global_v64i16_to_v64i64
- basic block: %bb.0  (0x7faed48) [0B;2848B)
- instruction: 844B    undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
- operand 1:   %432.sub0:vreg_128
- interval:    %432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
- at:          844B

Using real bundles with a BUNDLE instruction might also fix this
problem, but the current fix is less invasive and also avoids some
unnecessary copies.

https://bugs.llvm.org/show_bug.cgi?id=47492

Differential Revision: https://reviews.llvm.org/D87757
2020-09-17 09:26:11 +01:00
Qiu Chaofan a2fb5446be [SelectionDAG] Check any use of negation result before removal
2508ef01 fixed a bug about constant removal in negation. But after
sanitizing check I found there's still some issue about it so it's
reverted.

Temporary nodes will be removed if useless in negation. Before the
removal, they'd be checked if any other nodes used it. So the removal
was moved after getNode. However in rare cases the node to be removed is
the same as result of getNode. We missed that and will be fixed by this
patch.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
2020-09-17 16:00:54 +08:00
Igor Kudrin 027d47d1c7 [DebugInfo] Simplify DIEInteger::SizeOf().
An AsmPrinter should always be provided to the method because some forms
depend on its parameters. The only place in the codebase which passed
a nullptr value was found in the unit tests, so the patch updates it to
use some dummy AsmPrinter instead.

Differential Revision: https://reviews.llvm.org/D85293
2020-09-17 12:47:38 +07:00
Craig Topper e30371d99d [DAGCombiner] Teach visitMSTORE to replace an all ones mask with an unmasked store.
Similar to what done in D87788 for MLOAD.

Again I've skipped indexed, truncating, and compressing stores.
2020-09-16 16:42:22 -07:00
Craig Topper 89ee4c0314 [DAGCombiner] Teach visitMLOAD to replace an all ones mask with an unmasked load
If we have an all ones mask, we can just a regular masked load. InstCombine already gets this in IR. But the all ones mask can appear after type legalization.

Only avx512 test cases are affected because X86 backend already looks for element 0 and the last element being 1. It replaces this with an unmasked load and blend. The all ones mask is a special case of that where the blend will be removed. That transform is only enabled on avx2 targets. I believe that's because a non-zero passthru on avx2 already requires a separate blend so its more profitable to handle mixed constant masks.

This patch adds a dedicated all ones handling to the target independent DAG combiner. I've skipped extending, expanding, and index loads for now. X86 doesn't use index so I don't know much about it. Extending made me nervous because I wasn't sure I could trust the memory VT had the right element count due to some weirdness in vector splitting. For expanding I wasn't sure if we needed different undef handling.

Differential Revision: https://reviews.llvm.org/D87788
2020-09-16 13:21:16 -07:00
Matt Arsenault 88bdcbbf1a GlobalISel: Lift store value widening restriction
This doesn't change the memory size and doesn't need to worry about
non-power-of-2 sizes.
2020-09-16 14:25:07 -04:00
Michael Kitzan c4e589b795 [GISel] Add new combines for unary FP instrs with constant operand
https://reviews.llvm.org/D86393

Patch adds five new `GICombinerRules`, one for each of the following unary
FP instrs: `G_FNEG`, `G_FABS`, `G_FPTRUNC`, `G_FSQRT`, and `G_FLOG2`. The
combine rules perform the FP operation on the constant operand and replace
the original instr with the result. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
2020-09-16 10:34:15 -07:00
Simon Pilgrim 8f7d6b2375 DwarfUnit.h - remove unnecessary includes. NFCI. 2020-09-16 18:32:29 +01:00
Simon Pilgrim 69682f993c InterferenceCache.cpp - remove duplicate includes. NFCI.
Remove headers already included in InterferenceCache.h
2020-09-16 18:32:28 +01:00
Matt Arsenault 738c73a454 RegAllocFast: Make self loop live-out heuristic more aggressive
This currently has no impact on code, but prevents sizeable code size
regressions after D52010. This prevents spilling and reloading all
values inside blocks that loop back. Add a baseline test which would
regress without this patch.
2020-09-16 13:12:38 -04:00
Matt Arsenault 8d8a496356 LocalStackSlotAllocation: Swap order of check 2020-09-16 12:56:40 -04:00
Francesco Petrogalli 15e9a6c211 [llvm][CodeGen] Do not scalarize `llvm.masked.[gather|scatter]` operating on scalable vectors.
This patch prevents the `llvm.masked.gather` and `llvm.masked.scatter` intrinsics to be scalarized when invoked on scalable vectors.

The change in `Function.cpp` is needed to prevent the warning that is raised when `getNumElements` is used in place of `getElementCount` on `VectorType` instances. The tests guards for regressions on this change.

The tests makes sure that calls to `llvm.masked.[gather|scatter]` are still scalarized when:

  # the intrinsics are operating on fixed size vectors, and
  # the compiler is not targeting fixed length SVE code generation.

Reviewed By: efriedma, sdesmalen

Differential Revision: https://reviews.llvm.org/D86249
2020-09-16 16:00:28 +00:00
Mircea Trofin 6e85c3d5c7 [NFC][Regalloc] accessors for 'reg' and 'weight'
Also renamed the fields to follow style guidelines.

Accessors help with readability - weight mutation, in particular,
is easier to follow this way.

Differential Revision: https://reviews.llvm.org/D87725
2020-09-16 08:28:57 -07:00
Sebastian Neubauer 833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Sam Parker 1c421046d7 [RDA] Fix getUniqueReachingDef for self loops
We've fixed the case where this could return an instruction after the
given instruction, but also means that we can falsely return a
'unique' def when they could be one coming from the backedge of a
loop.

Differential Revision: https://reviews.llvm.org/D87751
2020-09-16 12:44:23 +01:00
Simon Pilgrim 3f682611ab [DAG] Remover getOperand() call. NFCI. 2020-09-16 11:18:58 +01:00
Volkan Keles 79378b1b75 GlobalISel: Fix a failing combiner test
test/CodeGen/AArch64/GlobalISel/combine-trunc.mir was failing
due to the different order for evaluating function arguments.
This patch updates the related code to fix the issue.
2020-09-15 16:40:38 -07:00
Aditya Nandakumar 97203cfd6b [GISel] Add new GISel combiners for G_MUL
https://reviews.llvm.org/D87668

Patch adds two new GICombinerRules, one for G_MUL(X, 1) and another for G_MUL(X, -1).
G_MUL(X, 1) is an identity combine, and G_MUL(X, -1) gets replaced with G_SUB(0, X).
Patch additionally adds new combiner tests for the AArch64 target to test these
new combiner rules, as well as updates AMDGPU GISel tests.

Patch by mkitzan
2020-09-15 16:08:47 -07:00
Volkan Keles a4e35cc2ec GlobalISel: Add combines for G_TRUNC
https://reviews.llvm.org/D87050
2020-09-15 15:50:34 -07:00
Guozhi Wei 243ffd0cad [MachineBasicBlock] Fix a typo in function copySuccessor
The condition used to decide if need to copy probability should be reversed.

Differential Revision: https://reviews.llvm.org/D87417
2020-09-15 09:18:18 -07:00
Qiu Chaofan e1669843f2 Revert "[SelectionDAG] Remove unused FP constant in getNegatedExpression"
2508ef01 doesn't totally fix the issue since we did not handle the case
when unused temporary negated result is the same with the result, which
is found by address sanitizer.
2020-09-15 22:03:50 +08:00
Hans Wennborg a21387c654 Revert "RegAllocFast: Record internal state based on register units"
This seems to have caused incorrect register allocation in some cases,
breaking tests in the Zig standard library (PR47278).

As discussed on the bug, revert back to green for now.

> Record internal state based on register units. This is often more
> efficient as there are typically fewer register units to update
> compared to iterating over all the aliases of a register.
>
> Original patch by Matthias Braun, but I've been rebasing and fixing it
> for almost 2 years and fixed a few bugs causing intermediate failures
> to make this patch independent of the changes in
> https://reviews.llvm.org/D52010.

This reverts commit 66251f7e1d, and
follow-ups 931a68f26b
and 0671a4c508. It also adjust some
test expectations.
2020-09-15 13:25:41 +02:00
Simon Pilgrim 6c1f2a34fb SpillPlacement.cpp - remove unnecessary includes. NFCI.
These are all directly included in SpillPlacement.h
2020-09-15 12:18:24 +01:00
Simon Pilgrim 1abb4461ea StatepointLowering.cpp - remove unnecessary includes. NFCI.
These are all directly included in StatepointLowering.h
2020-09-15 12:18:23 +01:00
Simon Pilgrim bee79cdcc6 SelectionDAGBuilder.h - remove unnecessary includes. NFCI.
Reduce to forward declarations and move implicit dependencies down to the cpp files.
2020-09-15 12:18:22 +01:00
Qiu Chaofan 2508ef014e [SelectionDAG] Remove unused FP constant in getNegatedExpression
960cbc53 immediately removes nodes that won't be used to avoid
compilation time explosion. This patch adds the removal to constants to
fix PR47517.

Reviewed By: RKSimon, steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
2020-09-15 17:59:10 +08:00
Petar Avramovic 9b4fa85434 GlobalISel/IRTranslator resetTargetOptions based on function attributes
Update TargetMachine.Options with function attributes before we start
to generate MIR instructions. This allows access to correct function
attributes via TargetMachine.Options (it used to access attributes of
the function that was translated first).
This affects some existing tests with "no-nans-fp-math" attribute.
Follow-up on D87456.

Differential Revision: https://reviews.llvm.org/D87511
2020-09-15 10:26:09 +02:00
Igor Kudrin a845ebd633 [DebugInfo] Make offsets of dwarf units 64-bit (19/19).
In the case of LTO, several DWARF units can be emitted in one section.
For an extremely large application, they may exceed the limit of 4GiB
for 32-bit offsets. As it is now possible to emit 64-bit debugging info,
the patch enables storing the larger offsets.

Differential Revision: https://reviews.llvm.org/D87026
2020-09-15 12:23:32 +07:00
Igor Kudrin 8c19ac23bd [DebugInfo] Make the offset of string pool entries 64-bit (18/19).
The string pool is shared among several units in the case of LTO,
and it potentially can exceed the limit of 4GiB for an extremely
large application. As it is now possible to emit 64-bit debugging
info, the limitation can be removed.

Differential Revision: https://reviews.llvm.org/D87025
2020-09-15 12:23:32 +07:00
Igor Kudrin 7e1e4e81cb [DebugInfo] Fix emitting DWARF64 .debug_macro[.dwo] sections (17/19).
The patch fixes emitting flags and the debug_line_offset field in
the header, as well as the reference to the macro string for
a pre-standard GNU .debug_macro extension.

Differential Revision: https://reviews.llvm.org/D87024
2020-09-15 12:23:31 +07:00
Igor Kudrin a93dd26d8c [DebugInfo] Fix emitting DWARF64 .debug_names sections (16/19).
The patch fixes emitting the unit length field in the header of
the table and offsets to the entry pool. Note that while the patch
changes the common method to emit offsets, in fact, nothing is changed
for Apple accelerator tables, because we do not yet support DWARF64 for
those targets.

Differential Revision: https://reviews.llvm.org/D87023
2020-09-15 12:23:31 +07:00
Igor Kudrin 00ce54689d [DebugInfo] Fix emitting DWARF64 .debug_addr sections (15/19).
The patch fixes emitting the header of the table. The content is
independent of the DWARF format.

Differential Revision: https://reviews.llvm.org/D87022
2020-09-15 12:23:31 +07:00
Igor Kudrin 3158d3dd4b [DebugInfo] Fix emitting DWARF64 .debug_loclists sections (14/19).
The size of the offsets in the table depends on the DWARF format.

Differential Revision: https://reviews.llvm.org/D87020
2020-09-15 12:23:31 +07:00
Igor Kudrin f9b242fe24 [DebugInfo] Fix emitting DWARF64 .debug_rnglists sections (13/19).
The size of the offsets in the table depends on the DWARF format.

Differential Revision: https://reviews.llvm.org/D87019
2020-09-15 12:23:31 +07:00
Igor Kudrin 03b09c6b68 [DebugInfo] Fix emitting pre-v5 name lookup tables in the DWARF64 format (12/19).
The transition is done by using methods of AsmPrinter which
automatically emit values in compliance with the selected DWARF format.

Differential Revision: https://reviews.llvm.org/D87013
2020-09-15 12:23:30 +07:00
Igor Kudrin b118030f3f [DebugInfo] Fix emitting DWARF64 .debug_aranges sections (11/19).
The patch fixes calculating the size of the table and emitting
the fields which depend on the DWARF format by using methods that
choose appropriate sizes automatically.

Differential Revision: https://reviews.llvm.org/D87012
2020-09-15 12:23:30 +07:00
Igor Kudrin 18f23b3ecc [DebugInfo] Fix emitting DWARF64 type units (10/19).
The patch fixes emitting the offset to the type DIE. All other fields
are already fixed in previous patches.

Differential Revision: https://reviews.llvm.org/D87021
2020-09-15 11:31:07 +07:00
Igor Kudrin 924dc58076 [DebugInfo] Fix emitting DWARF64 DWO compilation units and string offset tables (9/19).
These two fixes are better to go together because llvm-dwarfdump is
unable to dump a table when another one is malformed.

Differential Revision: https://reviews.llvm.org/D87018
2020-09-15 11:31:00 +07:00
Igor Kudrin 383d34c077 [DebugInfo] Fix emitting DWARF64 .debug_str_offsets sections (8/19).
The patch fixes calculating the size of the table and emitting the unit
length field.

Differential Revision: https://reviews.llvm.org/D87017
2020-09-15 11:30:53 +07:00
Igor Kudrin 26f1f18831 [DebugInfo] Fix emitting the DW_AT_location attribute for 64-bit DWARFv3 (7/19).
The patch uses a common method to determine the appropriate form for
the value of the attribute.

Differential Revision: https://reviews.llvm.org/D87016
2020-09-15 11:30:46 +07:00
Igor Kudrin cae7c1eb78 [DebugInfo] Use a common method to determine a suitable form for section offsts (6/19).
This is mostly an NFC patch because the involved methods are used when
emitting DWO files, which is incompatible with DWARFv3, or for platforms
where DWARF64 is not supported yet.

Differential Revision: https://reviews.llvm.org/D87015
2020-09-15 11:30:38 +07:00
Igor Kudrin 5dd1c59188 [DebugInfo] Fix emitting DWARF64 compilation units (5/19).
The patch also adds a method to choose an appropriate DWARF form
to represent section offsets according to the version and the format
of producing debug info.

Differential Revision: https://reviews.llvm.org/D87014
2020-09-15 11:30:30 +07:00
Igor Kudrin 982b31fad2 [DebugInfo] Add the -dwarf64 switch to llc and other internal tools (4/19).
The patch adds a switch to enable emitting debug info in the 64-bit
DWARF format. Most emitter for sections will be updated in the subsequent
patches, whereas for .debug_line and .debug_frame the emitters are in
the MC library, which is already updated.

For now, the switch is enabled only for 64-bit ELF targets.

Differential Revision: https://reviews.llvm.org/D87011
2020-09-15 11:30:18 +07:00
Igor Kudrin c3c501f5d7 [DebugInfo] Add new emitting methods for values which depend on the DWARF format (3/19).
These methods are going to be used in subsequent patches.

Differential Revision: https://reviews.llvm.org/D87010
2020-09-15 11:30:10 +07:00
Igor Kudrin a8058c6f8d [DebugInfo] Fix DIE value emitters to be compatible with DWARF64 (2/19).
DW_FORM_sec_offset and DW_FORM_strp imply values of different sizes with
DWARF32 and DWARF64. The patch fixes DIE value classes to use correct
sizes when emitting their values. For DIELocList it ensures that the
requested DWARF form matches the current DWARF format because that class
uses a method that selects the size automatically.

Differential Revision: https://reviews.llvm.org/D87009
2020-09-15 11:30:02 +07:00
Igor Kudrin 380e746bcc [DebugInfo] Fix methods of AsmPrinter to emit values corresponding to the DWARF format (1/19).
These methods are used to emit values which are 32-bit in DWARF32 and
64-bit in DWARF64. The patch fixes them so that they choose the length
automatically, depending on the DWARF format set in the Context.

Differential Revision: https://reviews.llvm.org/D87008
2020-09-15 11:29:48 +07:00
Quentin Colombet b3afad0463 [GlobalISel] Add a `X, Y = G_UNMERGE(G_ZEXT Z)` -> X = G_ZEXT Z; Y = 0 combine
Add a combiner helper to transform unmerge of zext into one zext and
a constant 0

Differential Revision: https://reviews.llvm.org/D87427
2020-09-14 17:27:23 -07:00
Quentin Colombet d2321129bd [GlobalISel] Add `X,Y<dead> = G_UNMERGE Z` -> X = G_TRUNC Z
Add a combiner helper that replaces G_UNMERGE where all the destination lanes
are dead except the first one with a G_TRUNC.

Differential Revision: https://reviews.llvm.org/D87174
2020-09-14 17:27:23 -07:00
Quentin Colombet a36278c2f8 [GlobalISel] Add G_UNMERGE(Cst) -> Cst1, Cst2, ... combine
Add a combiner helper that replaces G_UNMERGE of big constants into direct
use of smaller constants.

Differential Revision: https://reviews.llvm.org/D87166
2020-09-14 16:30:18 -07:00