Commit Graph

132218 Commits

Author SHA1 Message Date
Krzysztof Parzyszek 3656558cec [Hexagon] Only allow single HVX vector loads/stores in lowering
This will prevent store widening from forming vector pair stores,
which eventually end up broken up into single stores.
2020-03-14 14:26:01 -05:00
Simon Pilgrim ee862adf60 Fix signed/unsigned comparison warning. 2020-03-14 18:42:27 +00:00
Simon Pilgrim 0cb2f089c1 [X86] getFauxShuffleMask - pull out repeated byte sizes variables. NFC. 2020-03-14 17:36:17 +00:00
Florian Hahn 4878aa36d4 [ValueLattice] Add new state for undef constants.
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.

The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.

Conversely, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.

Reviewers: efriedma, reames, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75120
2020-03-14 17:19:59 +00:00
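
The merge rule described in the [ValueLattice] entry above can be illustrated with a small sketch. This is not LLVM's implementation: the `Elem` class, the state names, and modelling a constant as a single-element inclusive range are assumptions made for illustration, and the union of two ranges is simplified to their hull.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical state names; LLVM's ValueLatticeElement has more states.
UNKNOWN = "unknown"
UNDEF = "undef"
RANGE = "range"
OVERDEFINED = "overdefined"


@dataclass(frozen=True)
class Elem:
    kind: str
    rng: Optional[Tuple[int, int]] = None  # inclusive (lo, hi), only for RANGE


def is_single(e: Elem) -> bool:
    """A constant is modelled as a range with one element."""
    return e.kind == RANGE and e.rng[0] == e.rng[1]


def merge(a: Elem, b: Elem) -> Elem:
    if OVERDEFINED in (a.kind, b.kind):
        return Elem(OVERDEFINED)
    if a.kind == UNKNOWN:
        return b
    if b.kind == UNKNOWN:
        return a
    if a.kind == UNDEF and b.kind == UNDEF:
        return a
    if UNDEF in (a.kind, b.kind):
        other = b if a.kind == UNDEF else a
        # undef + constant keeps the constant; undef + wider range gives up.
        return other if is_single(other) else Elem(OVERDEFINED)
    # Both are ranges: take the hull (a simplification of a real range union).
    return Elem(RANGE, (min(a.rng[0], b.rng[0]), max(a.rng[1], b.rng[1])))
```

In this sketch, merging undef with `Elem(RANGE, (5, 5))` keeps the constant, while merging undef with `Elem(RANGE, (1, 4))` goes to overdefined, matching the two cases the commit message distinguishes.
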
Georgii Rymar b236b4cb43 [yaml2obj] - Set a default value for `PAddr` property of a program header to a value of `VAddr`
`PAddr` corresponds to `p_paddr` of a program header, which is the segment's physical
address for systems in which physical addressing is relevant. `p_paddr` is often equal
to `p_vaddr`, which is the virtual address of a segment.

This patch changes the default for `PAddr` from 0 to a value of `VAddr`.

Differential revision: https://reviews.llvm.org/D76131
2020-03-14 17:44:57 +03:00
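
The new default in the [yaml2obj] entry above amounts to "use `VAddr` when `PAddr` is not given". A minimal sketch of that rule follows; the `ProgramHeader` class and `effective_paddr` method are hypothetical names for illustration and are not yaml2obj's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgramHeader:
    vaddr: int = 0
    paddr: Optional[int] = None  # None means PAddr was not specified

    def effective_paddr(self) -> int:
        # An explicit PAddr (including 0) wins; otherwise fall back to VAddr.
        return self.vaddr if self.paddr is None else self.paddr
```

Keeping "not specified" distinct from an explicit 0 is the design point: a description that sets `PAddr: 0` on purpose still gets 0.
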
Simon Pilgrim f47f4c137b [X86] getFauxShuffleMask - merge insertelement paths
Merge the INSERT_VECTOR_ELT/SCALAR_TO_VECTOR and PINSRW/PINSRB shuffle mask paths - they both do the same thing (find source vector + handle implicit zero extension). The PINSRW/PINSRB path also handled the insertion-of-zero case, which needed to be added to the general case as well.
2020-03-14 13:11:03 +00:00
Shengchen Kan e6f1dd40bd [X86] Disable nop padding before instruction following a prefix
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: LuoYuanke

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76052
2020-03-14 13:15:30 +08:00
Diogo Sampaio 83cdb654e4 [AArch64][Fix] LdSt optimization generates premature stack-popping
Summary:
When moving add and sub to memory operand instructions,
aarch64-ldst-opt would prematurely pop the stack pointer,
before memory instructions that access the stack using
indirect loads.
e.g.
```
int foo(int offset){
    int local[4] = {0};
    return local[offset];
}
```
would generate:
```
sub     sp, sp, #16            ; Push the stack
mov     x8, sp                 ; Save stack in register
stp     xzr, xzr, [sp], #16    ; Zero initialize stack, and post-increment, making it invalid
------ If an exception goes here, the stack value might be corrupted
ldr     w0, [x8, w0, sxtw #2]  ; Access correct position, but it is not guarded by SP
```

Reviewers: fhahn, foad, thegameg, eli.friedman, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, hiraditya, danielkiss, llvm-commits, simon_tatham

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75755
2020-03-14 02:03:10 +00:00
Craig Topper 755e00876c [X86] Remove isel patterns for X86VBroadcast+trunc+extload. Replace with DAG combines.
This is a little more complicated than I'd like it to be. We have
to manually match a trunc+srl+load pattern that generic DAG
combine won't do for us due to isTypeDesirableForOp.
2020-03-13 18:12:16 -07:00
Whitney Tsang aca7167535 [NFC][LoopUnrollAndJam] clang-format.
I am currently working on this file.
2020-03-14 00:04:10 +00:00
Philip Reames b4c8608eba Adjust debug output for MCRelaxableFragment to include the size so that sanity checking relaxation offsets from -debug output is easier 2020-03-13 16:22:46 -07:00
Eli Friedman 65fc706ddf [SCEV] Add support for GEPs over scalable vectors.
Because we have to use a ConstantExpr at some point, the canonical form
isn't set in stone, but this seems reasonable.

The pretty sizeof(<vscale x 4 x i32>) dumping is a relic of ancient
LLVM; I didn't have to touch that code. :)

Differential Revision: https://reviews.llvm.org/D75887
2020-03-13 16:12:45 -07:00
Brian Cain ad7b930bd1 Initialize IsFast* values
We must initialize these values in case some targets do not assign to
them in allowsMemoryAccess().
2020-03-13 17:46:32 -05:00
Jan Korous b7ce8fa91e [LLJIT] Add std::move() as a workaround for older compilers
Clang 3.8 isn't able to bind the variable to rvalue-ref which breaks the build.
2020-03-13 15:25:25 -07:00
Craig Topper 431df3d873 [SelectionDAGBuilder] Simplify the struct type handling in getUniformBase. 2020-03-13 14:00:21 -07:00
Craig Topper 1d192e09d8 [IR] Fix formatting. NFC 2020-03-13 14:00:20 -07:00
Lang Hames 906a91aa4d [MCJIT] Check for RuntimeDyld errors in MCJIT::finalizeLoadedModules.
Patch based on https://reviews.llvm.org/D75912 by Alexander Shishkin. Thanks
Alexander!

To minimize disruption to existing clients, who may be relying on the fact that
unused references to unresolved symbols do not generate an error, this patch
makes error checking opt-in: Clients can call ExecutionEngine::hasError or
LLVMExecutionEngineGetError to check whether an error has occurred.

Differential revision: https://reviews.llvm.org/D75912
2020-03-13 13:58:41 -07:00
Richard Smith b5aaa60962 Fix "unused variable" warning in NDEBUG builds. 2020-03-13 13:56:57 -07:00
Akira Hatanaka c6f1713c46 [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

This reapplies the patch in https://reviews.llvm.org/rG1f5b471b8bf4,
which was reverted because it was causing crashes.

https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2

Check that HasSafePathToCall is true before checking the call is a tail
call.

Original commit message:

Previously, the ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-13 13:52:14 -07:00
Stanislav Mekhanoshin c262b69dcc [AMDGPU] Fix endcf collapse
Only collapse inner endcf if the outer one belongs to SI_IF.
If it does belong to SI_ELSE then the mask being restored is in fact
a partial inverse of what we need.

Differential Revision: https://reviews.llvm.org/D76154
2020-03-13 13:50:21 -07:00
Martin Storsjö 8f540dad61 [COFF] Assign unique names to autogenerated .weak.<name>.default symbols
These symbols need to be external (MSVC tools error out if a weak
external points at a symbol that isn't external; this was tried before
but had to be reverted in bc5b7217dc,
and this was originally explicitly fixed in
732eeaf2a9).

If multiple object files have weak symbols with defaults, their
defaults could cause linker errors due to duplicate definitions,
unless the names of the defaults are unique.

GNU binutils handles this by appending the name of another symbol
from the same object file to the name of the default symbol. Try
to implement something similar; before writing the object file,
locate a symbol that should have a unique name and use the name of
that one for making the weak defaults unique.

Differential Revision: https://reviews.llvm.org/D75989
2020-03-13 22:44:55 +02:00
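
The naming scheme in the [COFF] entry above can be sketched as follows. The exact name layout and the rule for choosing the "unique" symbol are assumptions here (the commit message only says a symbol that should have a unique name is located and its name is used); `pick_unique_symbol` and `weak_default_name` are hypothetical helpers.

```python
from typing import Iterable, Optional, Tuple


def pick_unique_symbol(symbols: Iterable[Tuple[str, bool, bool]]) -> Optional[str]:
    """symbols: (name, is_external, is_defined) triples from one object file.

    Assumption: an external, defined symbol is unique across object files.
    """
    for name, is_external, is_defined in symbols:
        if is_external and is_defined:
            return name
    return None


def weak_default_name(weak_name: str, unique_symbol: Optional[str]) -> str:
    base = f".weak.{weak_name}.default"
    # Without a usable unique symbol, fall back to the plain name.
    return base if unique_symbol is None else f"{base}.{unique_symbol}"
```

With two object files that both provide a weak `foo`, one defining `main` and the other defining `helper`, the defaults get different names and no longer collide as duplicate definitions.
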
Matt Arsenault 015b640be4 AMDGPU: Add flag to use fixed function ABI
Pass all arguments to every function, rather than only passing the
minimum set of inputs needed for the call graph.
2020-03-13 13:27:05 -07:00
Alexey Zhikhartsev f71abec661 [LoopInterchange] Fix interchanging contents of preheader BBs
Summary:
Previously LCSSA was getting broken by placing instructions into the
(newly) inner *header* instead of the *pre*header.

Fixes PR43474

Reviewers: fhahn

Reviewed By: fhahn

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75943
2020-03-13 15:59:37 -04:00
Matt Arsenault bb8622094d AMDGPU: Don't handle kernarg.segment.ptr in functions
Just lower this to null. Pass implicitarg.ptr in its place in the
argument list.
2020-03-13 12:51:12 -07:00
Nico Weber f82b32a51e Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit 5aa5c943f7.
Causes clang to assert, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1061533#c4
for a repro.
2020-03-13 15:37:44 -04:00
Stanislav Mekhanoshin 32e90cbcd1 [AMDGPU] Disable endcf collapse
There are some functional regressions and I suspect our
scopes are not as perfectly enclosed as I expected.
Disable it for now.

Differential Revision: https://reviews.llvm.org/D76148
2020-03-13 12:33:22 -07:00
Reid Kleckner 478b06e687 Revert "[ObjC][ARC] Check the basic block size before calling DominatorTree::dominate"
This reverts commit 5c3117b0a9

This should not be necessary after
7593a480db, and Florian Hahn has confirmed
that the problem no longer reproduces with this patch.

I happened to notice this code because the FIXME talks about
OrderedBasicBlock.

Reviewed By: fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D76075
2020-03-13 11:57:55 -07:00
Simon Pilgrim 05c0d34918 [X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)
If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB.

The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).

This matches what we already do for PEXTRW.

Differential Revision: https://reviews.llvm.org/D76138
2020-03-13 18:43:04 +00:00
Huihui Zhang fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer tries to vectorize lists of scalar instructions of the same type;
instructions that are already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() first tries to determine the vectorization
factor of a list of instructions before checking whether each instruction has an unsupported
type. For instructions already vectorized for SVE, it crashes in getVectorElementSize(),
where it tries to return a fixed size.

This patch makes sure invalid element types are rejected before trying to get the vectorization
factor, so that we do not try to vectorize instructions that are already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Sanjay Patel 94f5d73182 [SimplifyCFG] fix formatting; NFC 2020-03-13 14:12:28 -04:00
Sanjay Patel 51e53af11c [SimplifyCFG] fix debug print formatting; NFC 2020-03-13 14:12:28 -04:00
Philip Reames 1b86ad27a7 Use 15 byte long nops on modern Intel processors
Back in D42616, we switched our default nop length from 15 to 10 bytes because some platforms have painful decode stalls when encountering multiple instruction prefixes. (10 byte long nops come from the fact that prefixes are used to pad after 8 bytes, and some platforms have issues w/more than two prefixes.)

Based on Agner's guides, it appears to be the case that modern Intel (SandyBridge and later) can decode an arbitrary number of prefixes without issue. Intel's guide only provides up to 9 bytes; I read that as providing a safe default for all their chips. Older chips and Atom series have serious decode stalls. I can't find a conclusive reference beyond those two.

Differential Revision: https://reviews.llvm.org/D75945
2020-03-13 10:51:09 -07:00
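
To make the effect of the maximum nop length in the entry above concrete, here is a small sketch of how a padding request could be split into individual nops. The greedy split and the `split_padding` name are assumptions for illustration; the real emitter may distribute bytes differently.

```python
from typing import List


def split_padding(total_bytes: int, max_nop_len: int) -> List[int]:
    """Split a padding size into nop lengths, each at most max_nop_len."""
    if total_bytes < 0 or max_nop_len <= 0:
        raise ValueError("sizes must be non-negative and max_nop_len positive")
    full, rest = divmod(total_bytes, max_nop_len)
    return [max_nop_len] * full + ([rest] if rest else [])
```

Under this scheme, 30 bytes of padding need three nops with a 10-byte limit but only two with a 15-byte limit, which is the kind of saving the change is after on processors that decode many prefixes without stalls.
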
Simon Cook a26bd4ec16 [TableGen] Support combining AssemblerPredicates with ORs
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).

AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.

To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.

This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html

Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.

At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it is unnecessary.

Differential Revision: https://reviews.llvm.org/D74338
2020-03-13 17:13:51 +00:00
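
The semantics of the two operators in the [TableGen] entry above can be sketched as a tiny evaluator. This is an illustration, not TableGen or LLVM code: the tuple encoding of the dag and the `eval_cond` name are hypothetical, and negated features are left out.

```python
from typing import Set, Tuple


def eval_cond(dag: Tuple[str, ...], enabled: Set[str]) -> bool:
    """dag is (operator, feature, feature, ...), e.g. ("any_of", "zbb", "zbp")."""
    op, *features = dag
    # Mirror the stated restriction: no mixing of ands and ors in one dag.
    if not all(isinstance(f, str) for f in features):
        raise ValueError("nested operators are not supported")
    if op == "all_of":
        return all(f in enabled for f in features)
    if op == "any_of":
        return any(f in enabled for f in features)
    raise ValueError(f"unknown operator: {op}")
```

For a subtarget with only `zbp` enabled, `("any_of", "zbb", "zbp")` holds while `("all_of", "zbb", "zbp")` does not, which is the case the old comma-separated string could not express.
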
Florian Hahn 0c5b6e2ea5 Recommit "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This patch should fix the cause of the stage2 failures and
PR45185.

This reverts the revert commit c52f839e72.
2020-03-13 17:03:22 +00:00
Simon Pilgrim a2db388dce [CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations 2020-03-13 16:51:13 +00:00
Ehud Katz 18eae33122 [SCEV] Fix usage of invalid IP with FoldingSet
Fix the use of invalid Insertion Point pointer with the UniqueSCEVs FoldingSet,
which caused memory corruption.
2020-03-13 18:36:58 +02:00
Tyker 2543567c41 [AssumeBundles] filter useful attributes to preserve
Summary:
This patch filters attributes so that only the useful ones are preserved.
NoAlias is filtered out not because it isn't useful
but because it is incorrect to preserve it, as it is only valid for the
duration of the function.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75828
2020-03-13 17:35:47 +01:00
Tyker 69375fd0a3 [AssumeBundles] Preserve Information in the inliner
Summary:
During inlining, create and insert an llvm.assume with attributes to preserve them.
To prevent any changes for now, generation of llvm.assume is behind a flag that is disabled by default.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75825
2020-03-13 17:35:47 +01:00
Alexandre Ganea a7325298e1 [CodeView] Align type records on 4-bytes when emitting PDBs
When emitting PDBs, the TypeStreamMerger class is used to merge .debug$T records from the input .OBJ files into the output .PDB stream.
Records in .OBJs are not required to be aligned on 4-bytes, and "The Netwide Assembler 2.14" generates non-aligned records.

When compiling with -DLLVM_ENABLE_ASSERTIONS=ON, an assert was triggered in MergingTypeTableBuilder when non-ghash merging was used.
With ghash merging there was no assert.
As a result, LLD could potentially generate a non-aligned TPI stream.

We now align records on 4-bytes when record indices are remapped, in TypeStreamMerger::remapIndices().

Differential Revision: https://reviews.llvm.org/D75081
2020-03-13 12:22:19 -04:00
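
The 4-byte alignment in the [CodeView] entry above is ordinary round-up arithmetic. A minimal sketch follows; the helper names are hypothetical, and the actual padding bytes and the update of the record's length field are deliberately left out.

```python
def align_to(size: int, alignment: int = 4) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def padding_needed(size: int, alignment: int = 4) -> int:
    """Number of bytes to append so that the record ends on a boundary."""
    return align_to(size, alignment) - size
```

A 13-byte record, for example, needs 3 bytes of padding to reach 16, while a 16-byte record needs none.
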
omarahmed1111 b285b333dc [Attributor] Detect possibly unbounded cycles in functions
This patch adds the mayContainUnboundedCycle helper function, which checks whether a function has any cycle that we cannot prove to be bounded.
Loops with a known maximum trip count are considered bounded; any other cycle is not.
It also fixes some tests and adds tests that contain bounded and
unbounded loops and non-loop cycles.

Reviewed By: jdoerfert, uenoku, baziotis

Differential Revision: https://reviews.llvm.org/D74691
2020-03-13 11:17:33 -05:00
Pankaj Gode bf990530ae [Attributor] Improve noalias preservation using reachability
Resolution for below fixme:
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
             check only uses possibly executed before this callsite.

Propagates caller argument's noalias attribute to callee.

Reviewed by: jdoerfert, uenoku

Reviewers: jdoerfert, sstefan1, uenoku

Subscribers: uenoku, sstefan1, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D71617
2020-03-13 21:09:08 +05:30
Simon Pilgrim 846c614f54 [X86] combineExtractWithShuffle - pull out repeated getSizeInBits() call. NFC. 2020-03-13 15:36:04 +00:00
Simon Pilgrim fe047fbccc [X86] LowerEXTRACT_VECTOR_ELT - pull out repeated getOperand() calls. NFC.
Also, cleanup LowerEXTRACT_VECTOR_ELT_SSE4 comments which had references to non-constant extraction indices.
2020-03-13 15:36:02 +00:00
Sanjay Patel cbeffa3f6c [SimplifyCFG] convert if-else chain to switch; NFC
Fix formatting of related function names while changing the code.
2020-03-13 10:28:41 -04:00
Nico Weber 86eb2c3991 Revert "[ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't"
This reverts commit 1f5b471b8b.
Causes asserts when building code with arc. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2
for a full repro. Will post a creduced repro once creduce is done
running.
2020-03-13 10:16:02 -04:00
Ehud Katz fcc2238b8b [SCEV] Add missing cache queries
Calculating SCEVs can be cumbersome and may take a very long time (even
hours, for very long expressions). To prevent recalculating expressions
over and over again, we cache them.
This change adds cache queries at key positions to prevent recalculation
of the expressions.

Fix PR43571.

Differential Revision: https://reviews.llvm.org/D70097
2020-03-13 15:32:43 +02:00
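
The idea behind the cache queries in the [SCEV] entry above is plain memoization: look up the expression before doing the expensive work. The sketch below is generic and not ScalarEvolution's code; `ExprBuilder`, its key format, and the toy "fold the constants" simplification are assumptions for illustration.

```python
class ExprBuilder:
    def __init__(self):
        self._cache = {}
        self.computations = 0  # counts how often the expensive path runs

    def get_add(self, operands):
        key = ("add", tuple(operands))
        cached = self._cache.get(key)
        if cached is not None:
            return cached  # cache hit: skip the recalculation
        self.computations += 1
        result = self._simplify_add(operands)
        self._cache[key] = result
        return result

    @staticmethod
    def _simplify_add(operands):
        # Toy stand-in for expensive work: fold integer constants together.
        constant = sum(op for op in operands if isinstance(op, int))
        symbols = tuple(op for op in operands if not isinstance(op, int))
        return (constant, symbols)
```

Calling `get_add((1, "x", 2))` twice on the same builder returns the same object and leaves `computations` at 1; a missing lookup at such a position is what lets the same expression be rebuilt over and over.
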
Andrzej Warzynski a0c15ed460 [AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
  * DUP <Zd>.<T>, #<imm>
  * DUP <Zd>.<T>, <R><n|SP>

Reviewed by: sdesmalen

Differential Revision: https://reviews.llvm.org/D75900
2020-03-13 12:40:22 +00:00
Alexandre Ganea 28ad9fc208 [Clang][Driver] In -fintegrated-cc1 mode, avoid crashing on exit after a compiler crash
After a crash caught by the CrashRecoveryContext, this patch prevents access to dangling pointers in TimerGroup structures before the clang tool exits. Previously, the default TimerGroup had internal linked lists which were still pointing to old Timer or TimerGroup instances, which lived in stack frames released by the CrashRecoveryContext.

Fixes PR45164.

Differential Revision: https://reviews.llvm.org/D76099
2020-03-13 08:15:35 -04:00
David Green 2c6c169dbd [ARM] Optimise ASRL/LSRL to smaller shifts using demand bits.
The ASRL/LSRL long shifts are generated from 64bit shifts. Once we have
them, it might turn out that enough of the 64bit result is not required
that we can use a smaller shift to produce the same result. As the
smaller shift can in general be folded in more ways, such as into add
instructions in one of the test cases here, we can use the demanded bits
analysis to prefer the smaller shifts where we can.

Differential Revision: https://reviews.llvm.org/D75371
2020-03-13 10:09:03 +00:00
David Green f67d93dc23 [ARM] Constant long shift combines
This changes the way that asrl and lsrl intrinsics are lowered, going
via the ISEL ASRL and LSLL nodes instead of straight to machine nodes.
On top of that, it adds some constant folds for long shifts, in case it
turns out that the shift amount was either constant or 0.

Differential Revision: https://reviews.llvm.org/D75553
2020-03-13 08:54:59 +00:00