Commit Graph

27514 Commits

Author SHA1 Message Date
David Green b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
Luís Marques 6fd4c42fa8 [LegalizeTypes][RISCV] Soften FCOPYSIGN operand
Summary: Adds support for softening FCOPYSIGN operands.
Adds RISC-V tests that exercise the new softening code.

Reviewers: asb, lenary, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70679
2019-11-26 15:22:55 +00:00
Sam Parker 28166816b0 [ARM][ReachingDefs] Remove dead code in loloops.
Add some more helper functions to ReachingDefs to query the uses of
a given MachineInstr and also to query whether two MachineInstrs use
the same def of a register.

For Arm, while tail-predicating, these helpers are used in the
low-overhead loops to remove the dead code that calculates the number
of loop iterations.

Differential Revision: https://reviews.llvm.org/D70240
2019-11-26 10:27:46 +00:00
Sam Parker cced971fd3 [ARM][ReachingDefs] RDA in LoLoops
Add several new methods to ReachingDefAnalysis:
- getReachingMIDef, instead of returning an integer, return the
  MachineInstr that produces the def.
- getInstFromId, return a MachineInstr for which the given integer
  corresponds to.
- hasSameReachingDef, return whether two MachineInstr use the same
  def of a register.
- isRegUsedAfter, return whether a register is used after a given
  MachineInstr.

These methods have been used in ARMLowOverhead to replace searching
for uses/defs.

Differential Revision: https://reviews.llvm.org/D70009
2019-11-26 10:13:46 +00:00
Craig Topper 3dc7c5f7d8 [LegalizeTypes] Remove code to create ISD::FP_TO_FP16 from SoftenFloatRes_FTRUNC.
There seems to have been a misunderstanding of what ISD::FTRUNC
represents. ISD::FTRUNC is equivalent to llvm.trunc which takes
a floating point value, truncates it without changing the size
of the value and returns it.

Despite its similar name, its different than the fptrunc instruction
in IR which changes a floating point value to a smaller floating
point value. fptrunc is represented by ISD::FP_ROUND in SelectionDAG.

Since the ISD::FP_TO_FP16 node takes a floating point value and
converts it to f16 its more similar to ISD::FP_ROUND. In fact there
is identical code to what is being removed here in SoftenFloatRes_FP_ROUND.

I assume this bug was never encountered because it would require
f16 to be legalized by softening rather than the default of
promoting.
2019-11-25 18:18:40 -08:00
Sanjay Patel 214683f3b2 [DAGCombiner] avoid crash on out-of-bounds insert index (PR44139)
We already have this simplification at node-creation-time, but
the test from:
https://bugs.llvm.org/show_bug.cgi?id=44139
...shows that we can combine our way to an assert/crash too.
2019-11-25 16:24:06 -05:00
Craig Topper d6ec6e4bf6 [TargetLowering] Merge ExpandChainLibCall with makeLibCall
I need to be able to drop an operand for STRICT_FP_ROUND handling on X86. Merging these functions gives me the ArrayRef interface that passes the return type, operands, and debugloc instead of the Node.

Differential Revision: https://reviews.llvm.org/D70503
2019-11-25 10:52:49 -08:00
Jeremy Morse d9c9a4e48d [DebugInfo] Avoid register coalesing unsoundly changing DBG_VALUE locations
This is a re-land of D56151 / r364515 with a completely new implementation.

Once MIR code leaves SSA form and the liveness of a vreg is considered,
DBG_VALUE insts are able to refer to non-live vregs, because their
debug-uses do not contribute to liveness. This non-liveness becomes
problematic for optimizations like register coalescing, as they can't
``see'' the debug uses in the liveness analyses.

As a result registers get coalesced regardless of debug uses, and that can
lead to invalid variable locations containing unexpected values. In the
added test case, the first vreg operand of ADD32rr is merged with various
copies of the vreg (great for performance), but a DBG_VALUE of the
unmodified operand is blindly updated to the modified operand. This changes
what value the variable will appear to have in a debugger.

Fix this by changing any DBG_VALUE whose operand will be resurrected by
register coalescing to be a $noreg DBG_VALUE, i.e. give the variable no
location. This is an overapproximation as some coalesced locations are safe
(others are not) -- an extra domination analysis would be required to work
out which, and it would be better if we just don't generate non-live
DBG_VALUEs.

Differential Revision: https://reviews.llvm.org/D64630
2019-11-25 13:47:06 +00:00
Thomas Raoux e0297a8bee [ModuloSchedule] Fix a bug in experimental expander
Fix two problems that popped up after my last patch. One is that the
stiching of prologue/epilogue can be wrong when reading a value from a
previsou stage. Also changed how we duplicate phi instructions to avoid
generating extra phi that we delete later.

Differential Revision: https://reviews.llvm.org/D70213
2019-11-23 16:01:47 -08:00
Sourabh Singh Tomar 0e02977b6e Recommit "[DWARF] Support for loclist.dwo section in llvm and llvm-dwarfdump."
The original commit message follows.

This patch adds support for debug_loclists.dwo section in llvm and llvm-dwarfdump.
Also Fixes PR43622, PR43623.

Reviewers: dblaikie, probinson, labath, aprantl, jini.susan.george

Differential Revision: https://reviews.llvm.org/D69462
2019-11-23 20:10:23 +05:30
Sourabh Singh Tomar 02cb4b2fd6 Revert "[DWARF] Support for loclist.dwo section in llvm and llvm-dwarfdump."
This reverts commit 81b0a3284a.
Will Re-apply, with updated Differtial Revision, for automatic closure of
Phabricator review.
2019-11-23 19:46:07 +05:30
Sourabh Singh Tomar 81b0a3284a [DWARF] Support for loclist.dwo section in llvm and llvm-dwarfdump.
This patch adds support for debug_loclists.dwo section in llvm and llvm-dwarfdump.
Also Fixes PR43622, PR43623.

Reviewers: dblaikie, probinson, labath, aprantl, jini.susan.george

https://reviews.llvm.org/D69462
2019-11-23 10:25:11 +05:30
Clement Courbet cb15ba84fe Reland "[DAGCombiner] Allow zextended load combines."
Check that the generated type is simple.
2019-11-22 14:47:18 +01:00
Roman Lebedev 96cf5c8d47
[Codegen] TargetLowering::prepareUREMEqFold(): `x u% C1 ==/!= C2` (PR35479)
Summary:
The current lowering is:
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n3, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/2xC
https://rise4fun.com/Alive/jpb5

However, we can support non-tautological cases `C1 u> C2` too.
Said handling consists of two parts:
* `C2 u<= (-1 %u C1)`. It just works. We only have to change `(X % C1) == C2` into `((X - C2) % C1) == 0`
```
Name: (X % C1) == C2 -> (X - C2) * C3 <= C4   iff C2 u<= (-1 %u C1)
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1 && C2 u<= (-1 %u C1)
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = (-1 /u C1)
%n0 = sub i8 %x, C2
%n1 = mul i8 %n0, C3
%n2 = lshr i8 %n1, countTrailingZeros(C1) ; rotate right
%n3 = shl i8 %n1, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n4 = or i8 %n2, %n3 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n4, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/m4P
https://rise4fun.com/Alive/SKrx
* `C2 u> (-1 %u C1)`. We also have to change `(X % C1) == C2` into `((X - C2) % C1) == 0`,
  and we have to decrement C4:
```
Name: (X % C1) == C2 -> (X - C2) * C3 <= C4   iff C2 u> (-1 %u C1)
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1 && C2 u> (-1 %u C1)
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = (-1 /u C1)-1
%n0 = sub i8 %x, C2
%n1 = mul i8 %n0, C3
%n2 = lshr i8 %n1, countTrailingZeros(C1) ; rotate right
%n3 = shl i8 %n1, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n4 = or i8 %n2, %n3 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n4, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/d40
https://rise4fun.com/Alive/8cF

I believe this concludes `x u% C1 ==/!= C2` lowering.
In fact, clang is may now be better in this regard than gcc:
as it can be seen from `@t32_6_4` test, we do lower `x % 6 == 4`
via this pattern, while gcc does not: https://godbolt.org/z/XNU2z9
And all the general alive proofs say this is legal.
And manual checking agrees: https://rise4fun.com/Alive/WA2

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=35479 | PR35479 ]].

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: nick, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70053
2019-11-22 15:22:42 +03:00
Roman Lebedev 3f46022e33
[Codegen] TargetLowering::prepareUREMEqFold(): `x u% C1 ==/!= C2` with tautological C1 u<= C2 (PR35479)
Summary:
This is a preparatory cleanup before i add more
of this fold to deal with comparisons with non-zero.

In essence, the current lowering is:
```
Name: (X % C1) == 0 -> X * C3 <= C4
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, 0
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%r = icmp ule i8 %n3, %C4
```
https://rise4fun.com/Alive/oqd

It kinda just works, really no weird edge-cases.
But it isn't all that great for when comparing with non-zero.
In particular, given `(X % C1) == C2`, there will be problems
in the always-false tautological case where `C2 u>= C1`:
https://rise4fun.com/Alive/pH3

That case is tautological, always-false:
```
Name: (X % Y) u>= Y
%o0 = urem i8 %x, %y
%r = icmp uge i8 %o0, %y
  =>
%r = false
```
https://rise4fun.com/Alive/ofu

While we can't/shouldn't get such tautological case normally,
we do deal with non-splat vectors, so unless we want to give up
in this case, we need to fixup/short-circuit such lanes.

There are two lowering variants:
1. We can blend between whatever computed result and the correct tautological result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%res = icmp ule i8 %n3, %C4
%r = select i1 %is_tautologically_false, i1 0, i1 %res
```
https://rise4fun.com/Alive/PjT5
https://rise4fun.com/Alive/1KV

2. We can invert the comparison result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n3, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/2xC
https://rise4fun.com/Alive/jpb5

3. We can expand into `and`/`or`:
https://rise4fun.com/Alive/WGn
https://rise4fun.com/Alive/lcb5

Blend-one is likely better since we avoid having to load the
replacement from constant pool. `xor` is second best since
it's still pretty general. I'm not adding `and`/`or` variants.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: nick, hiraditya, xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70051
2019-11-22 15:16:03 +03:00
Clement Courbet 88e205525c Revert "[DAGCombiner] Allow zextended load combines."
Breaks some bots.
2019-11-22 09:01:08 +01:00
Clement Courbet 036790f988 [DAGCombiner] Allow zextended load combines.
Summary: or(zext(load8(base)), zext(load8(base+1)) -> zext(load16 base)

Reviewers: apilipenko, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70487
2019-11-22 08:40:19 +01:00
Pengfei Wang 22a0edd070 [FPEnv] Add an option to disable strict float node mutating to an normal
float node

This patch add an option 'disable-strictnode-mutation' to prevent strict
node mutating to an normal node.
So we can make sure that the patch which sets strict-node as legal works
correctly.

Patch by Chen Liu(LiuChen3)

Differential Revision: https://reviews.llvm.org/D70226
2019-11-21 18:07:11 -08:00
Craig Topper 7696b99258 [LegalizeDAG][X86] Add support for turning STRICT_FADD/SUB/MUL/DIV into libcalls. Use it for fp128 on x86-64.
This requires a minor hack for f32/f64 strict fadd/fsub to avoid
turning those into libcalls.
2019-11-21 16:19:25 -08:00
Hiroshi Yamauchi 52e377497d [PGO][PGSO] DAG.shouldOptForSize part.
Summary:
(Split of off D67120)

SelectionDAG::shouldOptForSize changes for profile guided size optimization.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70095
2019-11-21 14:16:00 -08:00
Tom Stellard ab411801b8 [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO.  I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so.  Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:

1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so.  This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.

With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.

2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set.  This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.

I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:

- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON

Reviewers: beanz, smeenai, compnerd, phosek

Reviewed By: beanz

Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70179
2019-11-21 10:48:08 -08:00
Bjorn Pettersson 898de30291 [BranchFolding] Fix PR43964 about branch folder not being debug invariant
Summary:
The fix in BranchFolder related to non debug invariant problems
done in commit ec32dff0b0 actually introduced some new
problems with debug invariance.

Before that patch ComputeCommonTailLength would move iterators
back, past debug instructions, in order to make ProfitableToMerge
make consistent answers "when one block differs from the other
only by whether debugging pseudos are present at the beginning".
But the changes in ec32dff0b0 undid that by moving the iterators
forward again.

This patch refactors ComputeCommonTailLength. The function was
really complex, considering that the SkipTopCFIAndReturn part
always moved the iterators forward to the first "real" instruction
in the found tail after ec32dff0b0.

The patch also restores the logic to "back past possible debugging
pseudos at beginning of block" to make sure ProfitableToMerge
gives consistent answers independent of DBG_VALUE instructions
before the tail. That is now done by ProfitableToMerge instead of
being hidden as a side-effect in ComputeCommonTailLength.

Reviewers: probinson, yechunliang, jmorse

Reviewed By: jmorse

Subscribers: Orlando, mehdi_amini, dexonsmith, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70091
2019-11-21 18:13:32 +01:00
Clement Courbet 252567377c [DAGCombine][NFC] Use ArrayRef and correctly size SmallVectors.
In preparation for D70487.
2019-11-21 08:53:37 +01:00
Adrian Prantl 5da385fb56 Fix an offset underflow bug in DwarfExpression when describing small values with subregisters
DwarfExpression::addMachineReg() knows how to build a larger register
that isn't expressible in DWARF by combining multiple
subregisters. However, if the entire value fits into just one
subregister, it would still emit the other subregisters, leading to
all sorts of inconsistencies down the line.

This patch fixes that by moving an already existing(!) check whether
the subregister's offset is before the end of the value to the right
place.

rdar://problem/57294211

Differential Revision: https://reviews.llvm.org/D70508
2019-11-20 17:07:54 -08:00
Craig Topper c9e8e808cf [SelectionDAG][X86] Mutate strictFP nodes to non-strict in DoInstructionSelection when the node is marked Expand rather than when it is not Legal.
This allows operations that are marked Custom, but have some type
combinations that are legal to get past this code.

Add custom mutation code to X86's Select function for the nodes
that don't have isel patterns yet.
2019-11-20 10:36:02 -08:00
Xiangling Liao 750e855641 A fix of the bug introduced by previous lowering in asm patch.
Differential Revision: https://reviews.llvm.org/D70243
2019-11-20 11:29:10 -05:00
Xing Xue 5665fc91fe [AIX][XCOFF] Add support for generating assembly code for one-byte mergable strings
This patch adds support for generating assembly code for one-byte mergeable strings.

Generating assembly code for multi-byte mergeable strings and the `XCOFF` object code for mergeable strings will be supported later.

Reviewers: hubert.reinterpretcast, jasonliu, daltenty, sfertile, DiggerLin, Xiangling_L

Reviewed by: daltenty

Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70310
2019-11-20 11:26:49 -05:00
Xiangling Liao ca33727abe [AIX] Lowering jump table, constant pool and block address in asm
This patch lowering jump table, constant pool and block address in assembly.
1. On AIX, jump table index is always relative;
2. Put CPI and JTI into ReadOnlySection until we support unique data sections;
3. Create the temp symbol for block address symbol;
4. Update MIR testcases and add related assembly part;

Differential Revision: https://reviews.llvm.org/D70243
2019-11-20 10:27:15 -05:00
David Zarzycki 257acbf6ae
[SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRY
Summary:
Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.

Work remaining: match ADD, etc.

Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast

Reviewed By: lebedev.ri

Subscribers: chfast, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70079
2019-11-20 16:25:42 +02:00
Djordje Todorovic 979592a6f7 [DebugInfo] Remove the DIFlagArgumentNotModified debug info flag
Due to changes in D68206, we remove the DIFlagArgumentNotModified
and its usage.

Differential Revision: https://reviews.llvm.org/D68207
2019-11-20 13:18:40 +01:00
Serge Pavlov ea8678d1c7 Move floating point related entities to namespace level
This is recommit of commit e6584b2b7b, which was reverted in
30e7ee3c4b together with af57dbf12e.
Original message is below.

Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. It was however inconvenient as required to
include constrained intrinsics definitions even if they were not needed.
Also using long scope prefix reduced readability.

This change moves these definitioins to the namespace llvm::fp.
No functional changes.

Differential Revision: https://reviews.llvm.org/D69552
2019-11-20 19:05:46 +07:00
Serge Pavlov 0c50c0b055 [FEnv] File with properties of constrained intrinsics
Summary
In several places we need to enumerate all constrained intrinsics or IR
nodes that should be represented by them. It is easy to miss some of
the cases. To make working with these intrinsics more convenient and
robust, this change introduces file containing definitions of all
constrained intrinsics and some of their properties. This file can be
included to generate constrained intrinsics processing code.

Reviewers: kpn, andrew.w.kaylor, cameron.mcinally, uweigand

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69887
2019-11-20 13:30:07 +07:00
Craig Topper c4b41e8d1d [LegalizeDAG][X86] Enable STRICT_FP_TO_SINT/UINT to be promoted
Differential Revision: https://reviews.llvm.org/D70220
2019-11-19 16:14:37 -08:00
Vedant Kumar ba71ca3720 [DebugInfo] Describe size of spilled values in call site params
A call site parameter description of a memory operand needs to
unambiguously convey the size of the operand to prevent incorrect entry
value evaluation.

Thanks for David Stenberg for pointing this issue out!
2019-11-19 12:03:52 -08:00
Matt Arsenault 7fe9435dc8 Work on cleaning up denormal mode handling
Cleanup handling of the denormal-fp-math attribute. Consolidate places
checking the allowed names in one place.

This is in preparation for introducing FP type specific variants of
the denormal-fp-mode attribute. AMDGPU will switch to using this in
place of the current hacky use of subtarget features for the denormal
mode.

Introduce a new header for dealing with FP modes. The constrained
intrinsic classes define related enums that should also be moved into
this header for uses in other contexts.

The verifier could use a check to make sure the denorm-fp-mode
attribute is sane, but there currently isn't one.

Currently, DAGCombiner incorrectly asssumes non-IEEE behavior by
default in the one current user. Clang must be taught to start
emitting this attribute by default to avoid regressions when this is
switched to assume ieee behavior if the attribute isn't present.
2019-11-19 22:01:14 +05:30
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Thomas Preud'homme a89ca4ae17 Fix PR44001: assert failure in getFunctionLocalOffsetAfterInsn
Summary:
Assert in getFunctionLocalOffsetAfterInsn() fails when processing a call
MachineInstr inside a bundle and compiling with debug info. This is
because labels are added by DwarfDebug::beginInstruction() which is
called for each top-level MI by EmitFunctionBody()'s for-loop iteration
but constructCallSiteEntryDIEs() which calls
getFunctionLocalOffsetAfterInsn() iterates over all MIs.

This commit modifies constructCallSiteEntryDIEs() to get the associated
bundle MI for call MIs inside a bundle and use that to when calling
getFunctionLocalOffsetAfterInsn() and getLabelAfterInsn(). It also skips
loop iterations for bundle MIs since the loop statements are concerned
with debug info for each physical instructions and bundles represent a
group of instructions. It also fix the comment about PCAddr since the
code is getting the return address and not the call address.

Reviewers: dstenb, vsk, aprantl, djtodoro, dblaikie, NikolaPrica

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70293
2019-11-19 11:23:11 +00:00
Craig Topper dc02eb1909 [SelectionDAG] Merge the two identical ExpandChainLibCall methods from LegalizeTypes and LegalizeDAG to one version in TaretLowering.
Reviewers: RKSimon, efriedma, spatel

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70354
2019-11-18 20:22:33 -08:00
Craig Topper 6e20d70a69 [LegalizeDAG] Convert strict fp nodes to libcalls without losing the chain.
Previously we mutated the node and then converted it to a libcall. But this loses the chain information.

This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But correct ordering seems more important to be right.

Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better.

Differential Revision: https://reviews.llvm.org/D70334
2019-11-18 11:24:08 -08:00
Eric Christopher 30e7ee3c4b Temporarily Revert "Add support for options -frounding-math, ftrapping-math, -ffp-model=, and -ffp-exception-behavior="
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.

This reverts commits af57dbf12e and e6584b2b7b
2019-11-18 10:46:48 -08:00
Sam McCall d27a16eb39 Revert "[DWARF5]Addition of alignment atrribute in typedef DIE."
This reverts commit 423f541c1a, which
breaks llvm-c ABI.
2019-11-18 15:53:22 +01:00
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Craig Topper bfbbf0aba8 [LegalizeTypes] Remove SoftenFloat handling from ExpandIntRes_LLROUND_LLRINT and remove assert from the strict fp path.
These were both recently added. While the call to GetSoftenedFloat
is a little more optimal, we don't do it in the expand for
FP_TO_SINT/UINT so there's no real reason to do it here. This
avoids a FIXME for strict fp.
2019-11-17 23:48:31 -08:00
Craig Topper 5a56d2aa33 [LegalizeTypes] Remove unnecessary conversion from EVT to MVT to MVT::SimpleValueType just to assign back to EVT. NFC 2019-11-17 23:48:31 -08:00
Craig Topper af435286e5 [LegalizeTypes][X86] Add support for expanding the result type of STRICT_LLROUND and STRICT_LLRINT.
This doesn't handle softening the input type, but we don't handle
softening any of the strict nodes yet. Skipping that made it easy
to reuse an existing function for creating a libcall from a node
with a chain.
2019-11-17 20:03:05 -08:00
Craig Topper 1b0efe2b17 [LegalizeTypes] When expanding the integer result of LLROUND/LLRINT, also call GetSoftenedFloat if the floating point input needs to be softened.
Before this we were emitting a bitcast to integer from the lowering
code that itself will need to be legalized. By calling
GetSoftenedFloat we get the integer conversion in one step without
needing to relegalize a bitcast.
2019-11-17 13:31:30 -08:00
Craig Topper 9b515b6dd9 [LegalizeTypes] Remove PromoteFloat support form ExpandIntRes_LLROUND_LLRINT.
This code isn't exercised, and was in the wrong place. If we need
this, we would need to promote the type before figuring out which
libcall to use.

I'm choosing to remove it rather than fixing since we don't
support PromoteFloat for LRINT/LROUND/LLRINT/LLROUND when the
result type is legal so I don't see much reason to support it
for the case where the result type isn't legal.
2019-11-17 13:31:30 -08:00
Craig Topper d4ba11ae32 [LegalizeTypes] Merge ExpandIntRes_LLROUND and ExpandIntRes_LLRINT into one function that handles both. NFC
These too functions are were the same except for which libcall gets
emitted. Just merge them into one.

This is prep work for some other work including strict fp support.
2019-11-17 13:31:30 -08:00
Sourabh Singh Tomar 423f541c1a [DWARF5]Addition of alignment atrribute in typedef DIE.
This patch, adds support for DW_AT_alignment[DWARF5] attribute, to be emitted with typdef DIE.
When explicit alignment is specified.

Patch by Awanish Pandey <Awanish.Pandey@amd.com>

Reviewers: aprantl, dblaikie, jini.susan.george, SouraVX, alok,
deadalinx

Differential Revision: https://reviews.llvm.org/D70111
2019-11-16 21:56:53 +05:30
David Blaikie 77cfcd7509 DebugInfo: Use loclistx for DWARFv5 location lists to reduce the number of relocations
This only implements the non-dwo part, but loclistx is necessary to use
location lists in DWARFv5, so it's a precursor to that work - and
generally reduces relocations (only using one reloc, then
indexes/relative offsets for all location list references) in non-split
DWARF.
2019-11-15 18:51:13 -08:00
Quentin Colombet 98ceac4981 [GISel][CombinerHelper] Use uses() instead of operands() when traversing use operands.
NFC
2019-11-15 13:54:33 -08:00
Quentin Colombet 304abde077 [GISel][CombinerHelper] Add support for scalar type for the result of shuffle vector
LLVM IR of 1-element vectors get lower into scalar in GISel. As a
result, shuffle vector may also produce a scalar.

This patch teaches the shuffle combiner how to deal with scalars when
they are in the destination type of a shuffle vector.

For now, we just support the easy case where this can be lowered to
a plain copy. For other cases, we leave the shuffle vector as is.

This type of IR are seen in O0 pipelines. E.g., as produced with
SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c.

rdar://problem/57198904
2019-11-15 13:54:33 -08:00
Aditya Nandakumar 7276868556 [MirNamer][Canonicalizer]: Perform instruction semantic based renaming
https://reviews.llvm.org/D70210

Previously:

Due to sensitivity of the algorithm with gaps, and extra instructions,
when diffing, often we see naming being off by a few. Makes the diff
unreadable even for tests with 7 and 8 instructions respectively.
Naming can change depending on candidates (and order of picking
candidates). Suddenly if there's one extra instruction somewhere, the
entire subtree would be named completely differently.
No consistent naming of similar instructions which occur in different
functions. If we try to do something like count the frequency
distribution of various differences across suite, then the above
sensitivity issues are going to result in poor results.
Instead:

Name instruction based on semantics of the instruction (hash of the
opcode and operands). Essentially for a given instruction that occurs in
any module/function it'll be named similarly (ie semantic). This has
some nice properties
Can easily look at many instructions and just check the hash and if
they're named similarly, then it's the same instruction. Makes it very
easy to spot the same instruction both multiple times, as well as across
many functions (useful for frequency distribution).
Independent of traversal/candidates/depth of graph. No need to keep
track of last index/gaps/skip count etc.
No off by few issues with diffs. I've tried the old vs new
implementation in files ranging from 30 to 700 instructions. In both
cases with the old algorithm, diffs are a sea of red, where as for the
semantic version, in both cases, the diffs line up beautifully.
Simplified implementation of the main loop (simple iteration) , no keep
track of what's visited and not.
Handle collision just by incrementing a counter. Roughly
bb[N]_hash_[CollisionCount].
Additionally with the new implementation, we can probably avoid doing
the hoisting of instructions to various places, as they'll likely be
named the same resulting in differences only based on collision (ie
regardless of whether the instruction is hoisted or not/close to use or
not, it'll be named the same hash which should result in use of the
instruction be identical with the only change being the collision count)
which is very easy to spot visually.
2019-11-15 08:38:54 -08:00
diggerlin 3dfa975fb3 Add read-only data assembly writing for aix
SUMMARY:
The patch will emit read-only variable assembly code for aix.

Reviewers: daltenty,Xiangling_Liao
Subscribers: rupprecht, seiyai,hiraditya

Differential Revision: https://reviews.llvm.org/D70182
2019-11-15 11:30:19 -05:00
Serge Pavlov e6584b2b7b Move floating point related entities to namespace level
Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. It was however inconvenient as required to
include constrained intrinsics definitions even if they were not needed.
Also using long scope prefix reduced readability.

This change moves these definitioins to the namespace llvm::fp.
No functional changes.

Differential Revision: https://reviews.llvm.org/D69552
2019-11-15 19:56:33 +07:00
Jay Foad c953e061b4 [CodeGen] Increase the size of a SmallVector
The SmallVector reserve() call in
MachineInstrExpressionTrait::getHashValue accounted for over 3% of all
calls to malloc() when I compiled a bunch of graphics shaders for the
AMDGPU target. Its initial size was only enough for machine instructions
with up to 7 operands, but for AMDGPU 8 and 10 operands are very common.
Here's a histogram of number of operands for each call to getHashValue,
gathered from the same collection of shaders:

1  13503
2  254273
3  135781
4  422508
5  614997
6  194953
7  287248
8  1517255
9  31218
10 1191269
11 70731
12 24
13 77
15 84
17 4692
27 16
33 705
49 6

Typical instructions with 8 and 10 operands are floating point
arithmetic and multiply-accumulate instructions like:

%83:vgpr_32 = V_MUL_F32_e64 0, killed %82:vgpr_32, 0, killed %81:vgpr_32, 0, 0, implicit $exec
%330:vgpr_32 = V_MAC_F32_e64 0, killed %327:vgpr_32, 0, killed %329:sgpr_32, 0, %328:vgpr_32(tied-def 0), 0, 0, implicit $exec

Differential Revision: https://reviews.llvm.org/D70301
2019-11-15 11:32:11 +00:00
Matt Arsenault bc276c6379 GlobalISel: Lower s1 source G_SITOFP/G_UITOFP 2019-11-15 13:37:20 +05:30
Reid Kleckner 4c1a1d3cf9 Add missing includes needed to prune LLVMContext.h include, NFC
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
2019-11-14 15:23:15 -08:00
Vedant Kumar 1ee84e5ab2 [DebugInfo] Allow spill slots in call site parameter descriptions
Allow call site paramter descriptions to reference spill slots. Spill
slots are not visible to high-level LLVM IR, so they can safely be
referenced during entry value evaluation (as they cannot be clobbered by
some other function).

This gives a 5% increase in the number of call site parameter DIEs in an
LTO x86_64 build of the xnu kernel.

This reverts commit eb4c98ca3d (
[DebugInfo] Exclude memory location values as parameter entry values),
effectively reintroducing the portion of D60716 which dealt with memory
locations (authored by Djordje, Nikola, Ananth, and Ivan).

This partially addresses llvm.org/PR43343. However, not all memory
operands forwarded to callees live in spill slots. In the xnu build, it
may be possible to use an escape analysis to increase the number of call
site parameter by another 15% (more details in PR43343).

Differential Revision: https://reviews.llvm.org/D70254
2019-11-14 12:48:51 -08:00
Daniel Sanders b2839c442e [globalisel][irtanslator] The IRTranslator should preserve TBAA information 2019-11-14 12:11:27 -08:00
Sumanth Gundapaneni 7c7e368a7f [Pipeliner] Fix an assertion caused by iterator invalidation. 2019-11-14 13:08:06 -06:00
Craig Topper 17bb2d7c80 [ExpandReductions] Don't push all intrinsics to the worklist. Just push reductions.
We were previously pushing all intrinsics used in a function to the
worklist. This is wasteful for memory in a function with a lot of
intrinsics.

We also ask TTI if we should expand every intrinsic, but we only
have expansion support for the reduction intrinsics. This just
wastes time for the non-reduction intrinsics.

This patch only pushes reduction intrinsics into the worklist and
skips other intrinsics.

Differential Revision: https://reviews.llvm.org/D69470
2019-11-14 10:26:53 -08:00
Reid Kleckner 5fe3f00ae2 Replace wrongly deleted header banner, fix formatting
I reviewed the diff hunks of 05da2fe521 that don't contain
'#include' lines, and found two unintended changes. I deleted a header
banner inadvertently while inserting a header, and changed the
indentation of a constructor in an odd way. Add back the banner, and
reformat the constructor.
2019-11-14 10:21:42 -08:00
Paweł Bylica 1c247dd028
[DAGCombiner] Drop redundant DAG method param. NFC 2019-11-14 14:02:53 +01:00
Paweł Bylica 9b89bda517
[DAGCombiner] Use TLI field already available. NFC 2019-11-14 14:02:52 +01:00
Reid Kleckner 1dfede3122 Move CodeGenFileType enum to Support/CodeGen.h
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
2019-11-13 16:39:34 -08:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Reid Kleckner 364d1785a6 Sink MachineFunction private method out of line
This method is private and only called from this file and doesn't need
to be inline. Saves a TargetMachine.h include in MachineFunction.h, a
popular header. The include was introduced in 98603a8153 despite the
forward decl of LLVMTargetMachine.
2019-11-13 15:36:58 -08:00
Craig Topper 84e83b54bd [TargetLowering] Increase the storage size of NumRegistersForVT to allow the type break down for v256i1 and other types to be stored correctly
v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.

This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.

Differential Revision: https://reviews.llvm.org/D70138
2019-11-13 12:09:35 -08:00
Quentin Colombet de94cda81b [LiveInterval] Allow updating subranges with slightly out-dated IR
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done with updating the live-intervals and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information on how the live-ranges that are being updated look like.

This is problematic for updates that rely on the IR to accurately
represents the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about how the code should
look like after the coalescer will have rewritten the registers.
Essentially this captures how a subregister index with be offseted
to match its position in a new register class.

E.g., let say we want to merge:
    V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>

We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
    V2: sub0 sub1 sub2 sub3
    V1: <offset>  sub0 sub1

This offset will look like a composed subregidx in the the class:
     V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
 =>  V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>

Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.

Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompile as we would drop some live-ranges and
thus, miss some interferences.

For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)

This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
   subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.

looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.

None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and
and #2, so this patch also artificial enables subreg liveness for ARM,
so that a nice test case can be attached.
2019-11-13 11:17:56 -08:00
David Stenberg 7417cc149b Fix typo in DwarfDebug [NFC] 2019-11-13 18:06:16 +01:00
David Stenberg 5e646ff530 [DebugInfo] Avoid creating entry values for clobbered registers
Summary:
Entry values are considered for parameters that have register-described
DBG_VALUEs in the entry block (along with other conditions).

If a parameter's value has been propagated from the caller to the
callee, then the parameter's DBG_VALUE in the entry block may be
described using a register defined by some instruction, and entry values
should not be emitted for the parameter, which can currently occur.
One such case was seen in the attached test case, in which the second
parameter, which is described by a redefinition of the first parameter's
register, would incorrectly get an entry value using the first
parameter's register. This commit intends to solve such cases by keeping
track of register defines, and ignoring DBG_VALUEs in the entry block
that are described by such registers.

In a RelWithDebInfo build of clang-8, the average size of the set was
27, and in a RelWithDebInfo+ASan build it was 30.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D69889
2019-11-13 11:10:47 +01:00
David Stenberg 4fec44cd61 [DebugInfo] Add helper for finding entry value candidates [NFC]
Summary:
The conditions that are used to determine if entry values should be
emitted for a parameter are quite many, and will grow slightly
in a follow-up commit, so move those to a helper function, as was
suggested in the code review for D69889.

Reviewers: djtodoro, NikolaPrica

Reviewed By: djtodoro

Subscribers: probinson, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69955
2019-11-13 11:10:47 +01:00
Sander de Smalen 9a1c243aa5 [AArch64][SVE] Allocate locals that are scalable vectors.
This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.

Reviewers: ostannard, efriedma, rengolin, cameron.mcinally

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70080
2019-11-13 09:45:24 +00:00
joanlluch d384ad6b63 [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4)
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by

```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.

Updates the MSP430 target with a custom implementation.

This continues  D69116, D69120, D69326 and updates them, so all of them must be committed before this.

Existing tests apply, a few more have been added.

Reviewers: asl, spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70042
2019-11-13 09:23:08 +01:00
Tim Renouf 07ebd74154 MCP: Fixed bug with dest overlapping copy source
In MachineCopyPropagation, when propagating the source of a copy into
the operand of a later instruction, bail if a destination overlaps
(partly defines) the copy source. If the instruction where the
substitution is happening is also a copy, allowing the propagation
confuses the tracking mechanism.

Differential Revision: https://reviews.llvm.org/D69953

Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
2019-11-12 08:18:11 +00:00
aqjune e87d71668e [IR] Redefine Freeze instruction
Summary:
This patch redefines freeze instruction from being UnaryOperator to a subclass of UnaryInstruction.

ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
`freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor has visitFreeze now because freeze is not unaryop anymore.

Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri

Reviewed By: craig.topper, lebedev.ri

Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69932
2019-11-12 10:49:00 +09:00
Sean Fertile e5e2e0a66b [PowerPC][XCOFF] Add support for zero initialized global values.
For XCOFF, globals mapped into the .bss section are linked as COMMON
definitions. This behaviour is incorrect for zero initialized data, so
emit those to the .data section instead.

Differential Revision: https://reviews.llvm.org/D69528
2019-11-11 18:52:10 -05:00
Victor Huang edab7dd426 Disable hoisting MI to hotter basic blocks
In current Hoist() function of machine licm pass, it will not check the source and destination basic block frequencies that a instruction is hoisted from/to.
There is a chance that instruction is hoisted from a cold to a hot basic block.

In this patch, we add options to disable machine instruction hoisting if destination block is hotter.

Differential Revision: https://reviews.llvm.org/D63676
2019-11-11 21:32:56 +00:00
Thomas Raoux e0f1d9d872 [ModuloSchedule] Fix modulo expansion for data loop carried dependencies.
The new experimental expansion has a problem when a value has a data
dependency with an instruction from a previous stage. This is due to
the way we peel out the kernel. To fix that I'm changing the way we
peel out the kernel. We now peel the kernel NumberStage - 1 times.
The code would be correct at this point if we didn't have to handle
cases where the loop iteration is smaller than the number of stages.
To handle this case we move instructions between different epilogues
based on their stage and remap the PHI instructions correctly.

Differential Revision: https://reviews.llvm.org/D69538
2019-11-11 12:09:27 -08:00
Thomas Raoux 03da6e8c00 [ModuloSchedule] Do target loop analysis before peeling.
Simple change to call target hook analyzeLoopForPipelining before
    changing the loop. After peeling analyzing the loop may be more
    complicated for target that don't have a loop instruction. This doesn't
    affect Hexagone and PPC as they have hardware loop instructions.

    Differential Revision: https://reviews.llvm.org/D69912
2019-11-11 09:35:39 -08:00
Yi-Hong Lyu 6bbfafd037 [CGP] Make ICMP_EQ use CR result of ICMP_S(L|G)T dominators
For example:

long long test(long long a, long long b) {
  if (a << b > 0)
    return b;
  if (a << b < 0)
    return a;
  return a*b;
}

Produces:

        sld. 5, 3, 4
        ble 0, .LBB0_2
        mr 3, 4
        blr
.LBB0_2:                                # %if.end
        cmpldi  5, 0
        li 5, 1
        isel 4, 4, 5, 2
        mulld 3, 4, 3
        blr

But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already
contains the result of that comparison).

The root cause of this is that LLVM converts signed comparisons into equality
comparison based on dominance. Equality comparisons are unsigned by default, so
we get either a record-form or cmp (without the l for logical) feeding a cmpl.
That is the situation we want to avoid here.

Differential Revision: https://reviews.llvm.org/D60506
2019-11-11 17:28:50 +00:00
Francis Visoiu Mistrih a9a3781df8 [ObjC] Override TailCallKind when lowering objc intrinsics
The tail-call-kind-ness is known by the ObjCARC analysis and can be
enforced while lowering the intrinsics to calls.

This allows us to get the requested tail calls at -O0 without trying to
preserve the attributes throughout passes that change code even at -O0
,like the Always Inliner, where the ObjCOpt pass doesn't run.

Differential Revision: https://reviews.llvm.org/D69980
2019-11-11 08:30:06 -08:00
joanlluch e0012c5d6a [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (3)
Summary:
Additional filtering of undesired shifts for targets that do not support them efficiently.

Related with  D69116 and  D69120

Applies the TLI.getShiftAmountThreshold hook to prevent undesired generation of shifts for the following IR code:

```
define i16 @testShiftBits(i16 %a) {
entry:
  %and = and i16 %a, -64
  %cmp = icmp eq i16 %and, 64
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}

define i16 @testShiftBits_11(i16 %a) {
entry:
  %cmp = icmp ugt i16 %a, 63
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}

define i16 @testShiftBits_12(i16 %a) {
entry:
  %cmp = icmp ult i16 %a, 64
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}
```
The attached diff file shows the piece code in TargetLowering that is responsible for the generation of shifts in relation to the IR above.

Before applying this patch, shifts will be generated to replace non-legal icmp immediates. However, shifts may be undesired if they are even more expensive for the target.

For all my previous patches in this series (cited above) I added test cases for the MSP430 target. However, in this case, the target is not suitable for showing improvements related with this patch, because the MSP430 does not implement "isLegalICmpImmediate". The default implementation returns always true, therefore the patched code in TargetLowering is never reached for that target. Targets implementing both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit from this.

The differential effect of this patch can only be shown for the MSP430 by temporarily implementing "isLegalICmpImmediate" to return false for large immediates. This is simulated with the implementation of a command line flag that was incorporated in D69975

This patch belongs to a initiative to "relax" the generation of shifts by LLVM for targets requiring it

Reviewers: spatel, lebedev.ri, asl

Reviewed By: spatel

Subscribers: lenary, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69326
2019-11-11 10:18:25 +01:00
Simon Pilgrim 6976a0e826 RegisterCoalescer - remove duplicate variable to fix Wshadow warning. NFCI. 2019-11-09 20:10:12 +00:00
Simon Pilgrim f092e80939 RegisterCoalescer - fix uninitialized variables. NFCI. 2019-11-09 20:10:11 +00:00
Simon Pilgrim 7f8488eeb4 Fix operator precedence warning. NFC. 2019-11-09 17:03:21 +00:00
David Blaikie db797bfb2b DebugInfo: Remove redundant conditionals/checks from macro info emission
These checks fall out naturally from the current implementation without
needing to be explicitly considered anymore.
2019-11-08 15:31:15 -08:00
David Blaikie 736273c7fe DebugInfo: Do not create a debug_macinfo section if no CUs have associated macros
Patch based on Sourabh Singh's D69839 patch.
2019-11-08 15:30:11 -08:00
David Blaikie 3951245c38 NVPTX: Don't insert an extra empty line at the end of the last section.
This was arbitrarily appearing in only the last section emitted - which
made tests more sensitive than they needed to be (removing the last
section - like the macinfo section change that's coming after this)
would, surprisingly, move the blank line to the previous section.
2019-11-08 15:16:04 -08:00
David Blaikie 39c308f6b8 DebugInfo: Use separate macinfo contributions for each CU
The macinfo support was broken for LTO situations, by terminating
macinfo lists only once - multiple macinfo contributions were correctly
labeled, but they all continued/flowed into later contributions until
only one terminator appeared at the end of the section.

Correctly terminate each contribution & fix the parsing to handle this
situation too. The parsing fix is also necessary for dumping linked
binaries - the previous code would stop at the end of the first
contribution - missing all later contributions in a linked binary.

It'd be nice to improve the dumping to print the offsets of each
contribution so it'd be easier to know which CU AT_macro_info refers to
which macinfo contribution.
2019-11-08 13:27:00 -08:00
Eli Friedman 5df3a87224 [AArch64][X86] Don't assume __powidf2 is available on Windows.
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.

(I think this started showing up recently because we added an
optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013
2019-11-08 12:43:21 -08:00
Djordje Todorovic 8d2ccd1ac3 Reland: [TII] Use optional destination and source pair as a return value; NFC
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.

Patch by Nikola Prica

Differential Revision: https://reviews.llvm.org/D69622
2019-11-08 13:00:39 +01:00
Hans Wennborg ff3b513495 Revert d91ed80 "[codeview] Reference types in type parent scopes"
This triggered asserts in the Chromium build, see https://crbug.com/1022729 for
details and reproducer.

> Without this change, when a nested tag type of any kind (enum, class,
> struct, union) is used as a variable type, it is emitted without
> emitting the parent type. In CodeView, parent types point to their inner
> types, and inner types do not point back to their parents. We already
> walk over all of the parent scopes to build the fully qualified name.
> This change simply requests their type indices as we go along to enusre
> they are all emitted.
>
> Fixes PR43905
>
> Reviewers: akhuang, amccarth
>
> Differential Revision: https://reviews.llvm.org/D69924
2019-11-08 11:30:33 +01:00
Sanne Wouda f649f24d38 [RAGreedy] Enable -consider-local-interval-cost for AArch64
Summary:
The greedy register allocator occasionally decides to insert a large number of
unnecessary copies, see below for an example.  The -consider-local-interval-cost
option (which X86 already enables by default) fixes this.  We enable this option
for AArch64 only after receiving feedback that this change is not beneficial for
PowerPC.

We evaluated the impact of this change on compile time, code size and
performance benchmarks.

This option has a small impact on compile time, measured on CTMark. A 0.1%
geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5%
on individual benchmarks.

The effect on both code size and performance on AArch64 for the LLVM test suite
is nil on the geomean with individual outliers (ignoring short exec_times)
between:

                 best     worst
  size..text     -3.3%    +0.0%
  exec_time      -5.8%    +2.3%

On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
geomean.  Neither intrate nor fprate show any change in performance.

This patch makes the following changes.

- For the AArch64 target, enableAdvancedRASplitCost() now returns true.

- Ensures that -consider-local-interval-cost=false can disable the new
  behaviour if necessary.

This matrix multiply example:

   $ cat test.c
   long A[8][8];
   long B[8][8];
   long C[8][8];

   void run_test() {
     for (int k = 0; k < 8; k++) {
       for (int i = 0; i < 8; i++) {
	 for (int j = 0; j < 8; j++) {
	   C[i][j] += A[i][k] * B[k][j];
	 }
       }
     }
   }

results in the following generated code on AArch64:

  $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
  [...]
                                        // %for.cond1.preheader
                                        // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        str     q0, [sp, #16]           // 16-byte Folded Spill
        ldr     q0, [x14]
        mov     v2.16b, v15.16b
        mov     v15.16b, v14.16b
        mov     v14.16b, v13.16b
        mov     v13.16b, v12.16b
        mov     v12.16b, v11.16b
        mov     v11.16b, v10.16b
        mov     v10.16b, v9.16b
        mov     v9.16b, v8.16b
        mov     v8.16b, v31.16b
        mov     v31.16b, v30.16b
        mov     v30.16b, v29.16b
        mov     v29.16b, v28.16b
        mov     v28.16b, v27.16b
        mov     v27.16b, v26.16b
        mov     v26.16b, v25.16b
        mov     v25.16b, v24.16b
        mov     v24.16b, v23.16b
        mov     v23.16b, v22.16b
        mov     v22.16b, v21.16b
        mov     v21.16b, v20.16b
        mov     v20.16b, v19.16b
        mov     v19.16b, v18.16b
        mov     v18.16b, v17.16b
        mov     v17.16b, v16.16b
        mov     v16.16b, v7.16b
        mov     v7.16b, v6.16b
        mov     v6.16b, v5.16b
        mov     v5.16b, v4.16b
        mov     v4.16b, v3.16b
        mov     v3.16b, v1.16b
        mov     x12, v0.d[1]
        fmov    x15, d0
        ldp     q1, q0, [x14, #16]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64             // =64
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldr     q1, [x14, #48]
        mul     x3, x15, x1
        mov     x14, v0.d[1]
        fmov    x17, d0
        mov     x18, v1.d[1]
        fmov    x0, d1
        mov     v1.16b, v3.16b
        mov     v3.16b, v4.16b
        mov     v4.16b, v5.16b
        mov     v5.16b, v6.16b
        mov     v6.16b, v7.16b
        mov     v7.16b, v16.16b
        mov     v16.16b, v17.16b
        mov     v17.16b, v18.16b
        mov     v18.16b, v19.16b
        mov     v19.16b, v20.16b
        mov     v20.16b, v21.16b
        mov     v21.16b, v22.16b
        mov     v22.16b, v23.16b
        mov     v23.16b, v24.16b
        mov     v24.16b, v25.16b
        mov     v25.16b, v26.16b
        mov     v26.16b, v27.16b
        mov     v27.16b, v28.16b
        mov     v28.16b, v29.16b
        mov     v29.16b, v30.16b
        mov     v30.16b, v31.16b
        mov     v31.16b, v8.16b
        mov     v8.16b, v9.16b
        mov     v9.16b, v10.16b
        mov     v10.16b, v11.16b
        mov     v11.16b, v12.16b
        mov     v12.16b, v13.16b
        mov     v13.16b, v14.16b
        mov     v14.16b, v15.16b
        mov     v15.16b, v2.16b
        ldr     q2, [sp]                // 16-byte Folded Reload
        fmov    d0, x3
        mul     x3, x12, x1
  [...]

With -consider-local-interval-cost the same section of code results in the
following:

  $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
  [...]
  .LBB0_1:                              // %for.cond1.preheader
                                        // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        ldp     q0, q1, [x14]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64             // =64
        mov     x12, v0.d[1]
        fmov    x15, d0
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldp     q0, q1, [x14, #32]
        mul     x3, x15, x1
        cmp     x9, #512                // =512
        mov     x14, v0.d[1]
        fmov    x17, d0
        fmov    d0, x3
        mul     x3, x12, x1
  [...]

Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet

Reviewed By: dmgreen

Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69437
2019-11-08 10:20:28 +00:00
Galina Kistanova ad3c9d46fe Revert "[MachineVerifier] Improve verification of live-in lists.
This reverts commit b7b170c to give the author more time to address failing tests on the expensive checks buildbots.
2019-11-07 14:02:13 -08:00
Reid Kleckner d91ed80e97 [codeview] Reference types in type parent scopes
Without this change, when a nested tag type of any kind (enum, class,
struct, union) is used as a variable type, it is emitted without
emitting the parent type. In CodeView, parent types point to their inner
types, and inner types do not point back to their parents. We already
walk over all of the parent scopes to build the fully qualified name.
This change simply requests their type indices as we go along to enusre
they are all emitted.

Fixes PR43905

Reviewers: akhuang, amccarth

Differential Revision: https://reviews.llvm.org/D69924
2019-11-07 13:58:01 -08:00
Simon Pilgrim 77cfe83f7d PostRAScheduler - fix uninitialized variable warning. NFCI. 2019-11-07 16:56:16 +00:00
Sanjay Patel 777d1d1d98 [SDAG] reduce code duplication; NFC 2019-11-07 10:28:45 -05:00
Sanjay Patel 2fdd58c506 [SDAG] reduce code duplication; NFC 2019-11-07 10:15:17 -05:00
Dávid Bolvanský 62ad212825 [Analysis] Attribute deref/deref_or_null should not prevent tail call optimization 2019-11-06 23:08:07 +01:00
Philip Reames db036ee0a4 [X86/Atomics] Correct a few transforms for new atomic lowering
This is a partial fix for the issues described in commit message of 027aa27 (the revert of G24609).  Unfortunately, I can't provide test coverage for it on it's own as the only (known) wrong example is still wrong, but due to a separate issue.

These fixes are cases where when performing unrelated DAG combines, we were dropping the atomicity flags entirely.
2019-11-05 13:20:08 -08:00
Amy Huang a078c77d72 [MIR] Add MIR parsing for heap alloc site instruction markers
Summary:
This patch adds MIR parsing and printing for heap alloc markers, which were
added in D69136. They are printed as an operand similar to pre-/post-instr
symbols, with a heap-alloc-marker token and a metadata node.

Reviewers: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69864
2019-11-05 12:57:45 -08:00
Daniel Sanders e74c5b9661 [globalisel] Rename G_GEP to G_PTR_ADD
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD

Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69734
2019-11-05 10:31:17 -08:00
Simon Pilgrim 117e6dd6cc Remove redundant assignment. NFCI.
Fixes cppcheck warning.
2019-11-05 17:08:08 +00:00
Simon Pilgrim 76166a1ac7 Use iterator prefix increment. NFCI. 2019-11-05 17:08:08 +00:00
Simon Pilgrim 7ad2583613 [MachineOutliner] Reduce scope of variable and stop duplicate getMF() calls. NFCI. 2019-11-05 17:08:08 +00:00
jmolloy 39525a6723 [DFAPacketizer] Allow up to 64 functional units
Summary:
To drive the automaton we used a uint64_t as an action type. This
contained the transition's resource requirements as a conjunction:

  (a OR b) AND (b OR c)

We encoded this conjunction as a sequence of four 16-bit bitmasks.
This limited the number of addressable functional units to 16, which
is quite low and has bitten many people in the past.

Instead, the DFAEmitter now generates a lookup table from InstrItinerary
class (index of the ItinData inside the ProcItineraries) to an internal
action index which is essentially a dense embedding of the conjunctive
form. Because we never materialize the conjunctive form, we no longer
have the 16 FU restriction.

In this patch we limit to 64 functional units due to using a uint64_t
bitmask in the DFAEmitter. Now that we've decoupled these representations
we can increase this in future.

Reviewers: ThomasRaoux, kparzysz, majnemer

Reviewed By: ThomasRaoux

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69110
2019-11-05 15:41:42 +00:00
Simon Pilgrim c7f127d93f [MachineOutliner] Fix uninitialized variable warnings. NFCI. 2019-11-05 15:15:14 +00:00
Dávid Bolvanský 9f294fc497 [AtomicExpandPass] Silence static analyzer warnings about operator priority. NFCI. 2019-11-05 13:55:46 +01:00
David Green f01b9aa89e [MachineScheduler] Enable AA in PostRA Machine scheduler
This adds AA to Post-RA Machine Scheduling, allowing the pass more
freedom when handling memory operations.

My understanding is that this was just never done, not that it is
inherently incorrect to do so. The older PostRA List scheduler already
makes use of AA, it's just that the MI PostRA Scheduler was never taught
to use it.

Differential Revision: https://reviews.llvm.org/D69814
2019-11-05 11:58:50 +00:00
Thomas Preud'homme 646896a442 Fix PR40644: miscompile indexed FP constant store
Summary:
Functions replaceStoreOfFPConstant() and OptimizeFloatStore() both
replace store of float by a store of an integer unconditionally. However
this generates wrong code when the store that is replaced is an indexed
or truncating store. This commit solves this issue by adding an early
return in these functions when the store being considered is not a
normal store.

Bug was only observed on out of tree targets, hence the lack of testcase
in this commit.

Reviewers: efriedma

Subscribers: hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68420
2019-11-05 11:07:52 +00:00
David Green 7d9af03ff7 [Scheduling][ARM] Consistently enable PostRA Machine scheduling
In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler as they
are using itinaries and the list scheduler seems to produce better code
(and not crash running out of register on v6m codes). So whether to use
the MIScheduler or not is checked at runtime from the subtarget
features.

This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine schedule is decided as the
pass manager is set up, in arms case from a newly constructed subtarget.
Under some situations, like LTO, this won't include the correct cpu so
can pick the wrong option. This can have a surprising effect on
performance.

To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add a enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.

Thanks to David Penry for the identifying this problem.

Differential Revision: https://reviews.llvm.org/D69775
2019-11-05 10:44:55 +00:00
Sjoerd Meijer 92164cf25d Recommit "[HardwareLoops] Optimisation remarks"
With a few things fixed:
- initialisaiton of the optimisation remark pass (this was causing the buildbot
  failures on PPC),
- a test case.

Differential Revision: https://reviews.llvm.org/D69660
2019-11-05 09:06:22 +00:00
aqjune 58acbce3de [IR] Add Freeze instruction
Summary:
- Define Instruction::Freeze, let it be UnaryOperator
- Add support for freeze to LLLexer/LLParser/BitcodeReader/BitcodeWriter
  The format is `%x = freeze <ty> %v`
- Add support for freeze instruction to llvm-c interface.
- Add m_Freeze in PatternMatch.
- Erase freeze when lowering IR to SelDag.

Reviewers: deadalnix, hfinkel, efriedma, lebedev.ri, nlopes, jdoerfert, regehr, filcab, delcypher, whitequark

Reviewed By: lebedev.ri, jdoerfert

Subscribers: jfb, kristof.beyls, hiraditya, lebedev.ri, steven_wu, dexonsmith, xbolva00, delcypher, spatel, regehr, trentxintong, vsk, filcab, nlopes, mehdi_amini, deadalnix, llvm-commits

Differential Revision: https://reviews.llvm.org/D29011
2019-11-05 15:54:56 +09:00
Sanjay Patel 113181e9bd [DAGCombine][MSP430] use shift amount threshold in DAGCombine (2/2)
Continuation of:
D69116

Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559

See also D69099 and D69116

Use the TLI hook in DAGCombine.cpp to guard against creating
shift nodes that are not optimal for a target.

Patch by: @joanlluch (Joan LLuch)

Differential Revision: https://reviews.llvm.org/D69120
2019-11-04 13:41:41 -05:00
Ulrich Weigand 664f84e246 [FPEnv][SelectionDAG] Refactor strict FP node construction
Small refactoring in visitConstrainedFPIntrinsic that should make
it easier to create DAG nodes requiring extra arguments.  That is
the case currently only for STRICT_FP_ROUND, but may be the case
for additional nodes (in particular compares) in the future.

Extracted from the patch for D69281.

NFC.
2019-11-04 17:45:54 +01:00
Jonas Paulsson b7b170c9b4 [MachineVerifier] Improve verification of live-in lists.
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers without
aliases and that are neither allocatable or reserved, such as the SystemZ::CC
register.

The MachineVerifier earlier only catched the case of a live-in use without
an entry in the live-in list (as "using an undefined physical register").

A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.

Review: Quentin Colombet
https://reviews.llvm.org/D68267
2019-11-04 16:22:00 +01:00
Dávid Bolvanský a18a8db0d4 [SelectionDAG] Fixed null check after dereferencing warning. NFCI. 2019-11-03 19:34:03 +01:00
shkzhang 4e9778e346 [CodeGen] [ExpandReduction] Fix the bug for ExpandReduction() when vector size isn't power of 2
Summary:
For below test case, we will get assert error except for AArch64 and ARM:

declare i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
define i8 @test_v3i8(<3 x i8> %a) nounwind {
  %b = call i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
  ret i8 %b
}
In the function getShuffleReduction (), we can see it needs the vector size must be power of 2.

This patch is fix below error when the number of element is not power of 2 for those llvm.experimental.vector.reduce.* function.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D68625
2019-11-02 23:59:12 -04:00
Simon Pilgrim 095d2a4ced FastISel - fix uninitialized variable warnings in constructor. NFCI. 2019-11-02 18:03:22 +00:00
Simon Pilgrim 97725707f4 Fix uninitialized variable warning. NFCI. 2019-11-02 14:42:38 +00:00
David Blaikie 89b7f16204 DebugInfo: Streamline debug_ranges/rnglists/rnglists.dwo emission code
More code reuse, better basis for modelling
debug_loc/loclists/loclists.dwo emission support.
2019-11-01 14:56:43 -07:00
Craig Topper 96bb076621 [TargetLowering] Move the setBooleanContents check on (xor (setcc), (setcc)) == / != 1 -> (setcc) != / == (setcc) to the right place
We need to be checking the value types for the inner setccs not
the outer setcc. We need to ensure those setccs produce a 0/1
value or that the xor is on the i1 type. I think at the time
this code was originally written, getBooleanContents didn't
take any arguments so this was probably correct. But now we can
have a different boolean contents for integer and floating point.

Not sure why the other combines below the xor were also checking
the boolean contents. None of them involve any setccs other than
the outer one and they only produce a new setcc.

Differential Revision: https://reviews.llvm.org/D69480
2019-11-01 14:43:17 -07:00
Craig Topper 42d77461f3 [MachineBasicBlock] Skip over debug instructions in computeRegisterLiveness before checking for begin
If there are debug instructions before the stopping point,
we need to skip over them before checking for begin in order
to avoid having the debug instructions effect behavior.

Fixes PR43758.

Differential Revision: https://reviews.llvm.org/D69606
2019-11-01 14:43:17 -07:00
Reid Kleckner f5d935c167 [WinCFG] Handle constant casts carefully in .gfids emission
Summary:
The general Function::hasAddressTaken has two issues that make it
inappropriate for our purposes:
1. it is sensitive to dead constant users (PR43858 / crbug.com/1019970),
   leading to different codegen when debu info is enabled
2. it considers direct calls via a function cast to be address escapes

The first is fixable, but the second is not, because IPO clients rely on
this behavior. They assume this function means that all call sites are
analyzable for IPO purposes.

So, implement our own analysis, which gets closer to finding functions
that may be indirect call targets.

Reviewers: ajpaverd, efriedma, hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69676
2019-11-01 13:32:03 -07:00
Bjorn Pettersson 56c22931bd [LDV][RAGreedy] Inform LiveDebugVariables about new VRegs added by InlineSpiller
Summary:
Make sure RAGreedy informs LiveDebugVariables about new VRegs
that is introduced at spill by InlineSpiller.

Consider this example

 LDV: !"var"	 [48r;128r):0 Loc0=%2

 48B   %2 = ...
 ...
 128B  %7 = ADD %2, ...

If %2 is spilled the InlineSpiller will insert spill/reload
instructions and introduces some new vregs. So we get

 48B   %4 = ...
 56B   spill %4
 ...
 120B  reload %5
 128B  %3 = ADD %5, ...

In the past we did not inform LDV about this, and when reintroducing
DBG_VALUE instruction LDV still got information that "var" had the
location of the spilled register %2 for the interval [48r;128r).
The result was bad, since we mapped "var" to the spill slot even
before the spill happened:

 %4 = ...
 DBG_VALUE %spill.0, !"var"
 spill %4 to %spill.0
 ...
 reload %5
 %3 = ADD %5, ...

This patch will inform LDV about the interval split introduced
due to spilling. So the location map in LDV will become

 !"var"	[48r;56r):1 [56r;120r):0 [120r;128r):2 Loc0=%2 Loc1=%4 Loc2=%5

And when inserting DBG_VALUE instructions we get

 %4 = ...
 DBG_VALUE %4, !"var"
 spill %4 to %spill.0
 DBG_VALUE %spill.0, !"var"
 ...
 reload %5
 DBG_VALUE %5, !"var"
 %3 = ADD %5, ...

Fixes: https://bugs.llvm.org/show_bug.cgi?id=38899

Reviewers: jmorse, vsk, aprantl

Reviewed By: jmorse

Subscribers: dstenb, wuzish, MatzeB, qcolombet, nemanjai, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69584
2019-11-01 16:25:32 +01:00
Matt Arsenault 6221767055 DAG: Add DAG argument to isFPExtFoldable
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
2019-10-31 22:32:45 -07:00
Fangrui Song 44d0c3d947 [PGO][PGSO] Fix -DBUILD_SHARED_LIBS=on builds after D69580/llvmorg-10-init-8797-g0d987e411ac
Move TargetLoweringBase::isSuitableForJumpTable from
llvm/CodeGen/TargetLowering.h to .cpp, to avoid the undefined reference
from all LLVM${Target}ISelLowering.cpp.

Another fix is to add a dependency on TransformUtils to all
lib/Target/$Target/LLVMBuild.txt, but that is too disruptive.
2019-10-31 14:02:29 -07:00
Hiroshi Yamauchi 0d987e411a [PGO][PGSO] TargetLowering/TargetTransformationInfo/SwitchLoweringUtils part.
Summary:
(Split of off D67120)

TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile
guided size optimization.

Reviewers: davidxl

Subscribers: eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69580
2019-10-31 13:22:56 -07:00
Simon Pilgrim 3842b94c4e Revert rG57ee0435bd47f23f3939f402914c231b4f65ca5e - [TII] Use optional destination and source pair as a return value; NFC
This is breaking MSVC builds: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20375
2019-10-31 18:00:29 +00:00
Sanne Wouda f2cb9c0eab Fix missing memcpy, memmove and memset tail calls
Summary:
If a wrapper around one of the mem* stdlib functions bitcasts the returned
pointer value before returning it (e.g. to a wchar_t*), LLVM does not emit a
tail call.

Add a check for this scenario so that we emit a tail call.

Reviewers: wmi, mkuper, ramred01, dmgreen

Reviewed By: wmi, dmgreen

Subscribers: hiraditya, sanwou01, javed.absar, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59078
2019-10-31 16:13:29 +00:00
Matt Arsenault 1725f28841 DAG: Add new control for ISD::FMAD formation
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
2019-10-31 07:51:38 -07:00
Djordje Todorovic 57ee0435bd [TII] Use optional destination and source pair as a return value; NFC
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.

Patch by Nikola Prica

Differential Revision: https://reviews.llvm.org/D69622
2019-10-31 15:34:49 +01:00
Jeremy Morse a8db456b53 Revert "[DebugInfo] MachineSink: Insert undef DBG_VALUEs when sinking instructions"
This reverts commit ee50590e16.

PR43855 reports a performance regression from this commit, which I'll
look into.
2019-10-31 12:39:06 +00:00
Jeremy Morse d382a8a768 Revert "[DebugInfo] MachineSink: find more DBG_VALUEs to sink"
This reverts commit f5e1b718a6.

PR43855 reports a performance regression with commit ee50590e. This commit
depends on the faulty one, so has to come out too.
2019-10-31 12:39:06 +00:00
David Candler 92aa0c2dbc [cfi] Add flag to always generate .debug_frame
This adds a flag to LLVM and clang to always generate a .debug_frame
section, even if other debug information is not being generated. In
situations where .eh_frame would normally be emitted, both .debug_frame
and .eh_frame will be used.

Differential Revision: https://reviews.llvm.org/D67216
2019-10-31 09:48:30 +00:00
Quentin Colombet f0eeb3c7a7 [GISel][CombinerHelper] Combine shuffle_vector scalar to build_vector
Teach the combiner helper how to replace shuffle_vector of scalars
into build_vector.
I am not particularly happy about having to add this combine, but we
currently get those from <1 x iN> from the IR.

Bonus: This fixes an assert in the shuffle_vector combines since before
this patch, we were expecting vector types.
2019-10-30 18:20:37 -07:00
Matt Arsenault 0202fa3a47 RegAllocFast: Use Register 2019-10-30 14:40:21 -07:00
Jeremy Morse 3137fe4d23 [DebugInfo][DAG] Distinguish different kinds of location indirection
From SelectionDAGs point of view, debug variable locations specified with
dbg.declare and dbg.addr are indirect -- they specify the address of
something. But calling conventions might mean that a Value is placed on
the stack somewhere, and this too is indirection. Previously this was
mixed up in the "IsIndirect" field of DBG_VALUE insts; this patch
separates them by encoding the indirection in a DIExpression.

If we have a dbg.declare or dbg.addr, then the expression produces an
address that then becomes a DWARF memory location. We can represent
this by putting a DW_OP_deref on the _end_ of the expression. If a Value
has been placed on the stack, then we need to put a DW_OP_deref on the
_start_ of the expression, to load the Value from the stack and have
the rest of the expression operate on it.

Differential Revision: https://reviews.llvm.org/D69028
2019-10-30 18:41:44 +00:00
Kevin P. Neal 72bc291f94 [NFC] Move this set of STRICT_* cases to be next to the non-strict cases.
Requested by Cameron McInally in D69275.
2019-10-30 13:32:27 -04:00
David Tellenbach fbe7f5e972 [NFC][MachineOutliner] Fix typo in comment 2019-10-30 16:28:11 +00:00
Jay Foad 86549c7528 [SelectionDAG] Add support for FP_ROUND in WidenVectorOperand.
Summary:
This is used on AMDGPU for rounding from v3f64 (which is illegal) to
v3f32 (which is legal).

Subscribers: jvesely, nhaehnle, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69339
2019-10-30 15:18:21 +00:00
Krzysztof Parzyszek 43144ffa91 LiveIntervals: Split live intervals on multiple dead defs
This is a follow-up to D67448.

Split live intervals with multiple dead defs during the initial
execution of the live interval analysis, but do it outside of the
function createAndComputeVirtRegInterval.

Differential Revision: https://reviews.llvm.org/D68666
2019-10-30 08:50:46 -05:00
Djordje Todorovic 532815dd5c [ARM][AArch64][DebugInfo] Improve call site instruction interpretation
Extend the describeLoadedValue() with support for target specific ARM and
AArch64 instructions interpretation. The patch provides specialization for
ADD and SUB operations that include a register and an immediate/offset
operand. Some of the instructions can operate with global string addresses
or constant pool indexes but such cases are omitted since we currently lack
flexible support for processing such operands at DWARF production stage.

Patch by Nikola Prica

Differential Revision: https://reviews.llvm.org/D67556
2019-10-30 13:58:14 +01:00
David Zarzycki f68925d450
[X86] Make memcmp vector lowering handle arbitrary expansions
Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp
expansions but do not change any default policy for now.

This also fixes a bug in the memcmp expansion itself when large
displacements are needed.

https://reviews.llvm.org/D69507
2019-10-30 09:12:57 +02:00
Adrian Prantl f919be3365 [DWARF5] Added support for deleted C++ special member functions.
This patch adds support for deleted C++ special member functions in
clang and llvm. Also added Defaulted member encodings for future
support for defaulted member functions.

Patch by Sourabh Singh Tomar!

Differential Revision: https://reviews.llvm.org/D69215
2019-10-29 13:44:06 -07:00
Philip Reames 2460989eab [SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default
Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default.  As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/a atomic marker on the MMO. The later parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other.

This has been fairly heavily fuzzed, and I examined diffs in a reasonable large corpus of assembly by hand, so I'm reasonable sure this is correct for the common case.  Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations.  This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics.  If there are problems with this patch, the most likely area will be legalization.

Differential Revision: https://reviews.llvm.org/D69219
2019-10-29 12:46:24 -07:00
Sander de Smalen d6a7da80aa Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)
llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with
EXPENSIVE_CHECKS enabled, causing the patch to be reverted in
rG2c496bb5309c972d59b11f05aee4782ddc087e71.

This patch relands the patch with a proper fix to the
live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the
callee-saves correctly so that the CalleeSaved info is taken from MIR
directly, rather than letting it be recalculated by the PEI pass. I've
done this by running `llc -stop-before=prologepilog` on the LLVM
IR as captured in the test files, adding the extra MOV instructions
that were manually added in the original test file, then running `llc
-run-pass=prologepilog` and finally re-added the comments for the MOV
instructions.
2019-10-29 16:13:07 +00:00
Greg Bedwell 1ba72a81ca Fix some spelling mistakes in comments. NFC 2019-10-29 12:41:24 +00:00