Commit Graph

105217 Commits

Author SHA1 Message Date
Keith Wyss 3d0bc9ef1a Xray docs with description of Flight Data Recorder binary format.
Summary:
Adding a new restructuredText file to document the trace format produced with
an FDR mode handler and read by llvm-xray toolset.

Fixed two problems in the documentation from differential review. One bad table
and a missing link in the toc.

Original commit was e97c5836a77db803fe53319c53f3bf8e8b26d2b7.

Reviewers: dberris, pelikan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36041

llvm-svn: 309891
2017-08-02 21:47:27 +00:00
Matt Arsenault e6de494b74 LV: Don't insert runtime ptr checks on divergent targets
llvm-svn: 309890
2017-08-02 21:43:08 +00:00
George Karpenkov 03f9506477 [libFuzzer tests] Use substring comparison in libFuzzer tests
LIT launches executables with absolute, and not relative, path.
strncmp would try to do exact comparison and fail.

Differential Revision: https://reviews.llvm.org/D36242

llvm-svn: 309889
2017-08-02 21:38:50 +00:00
Craig Topper fc9bf50dee [InstCombine] Remove unnecessary temporary APInt. NFCI
llvm-svn: 309887
2017-08-02 21:05:40 +00:00
Teresa Johnson ecd901314d [PM] Split LoopUnrollPass and make partial unroller a function pass
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.

*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.

Reviewers: chandlerc

Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36157

llvm-svn: 309886
2017-08-02 20:35:29 +00:00
Rafael Espindola 9f92995781 Don't pass the code model to MC
I was surprised to see the code model being passed to MC. After all,
it assembles code, it doesn't create it.

The one place it is used is in the expansion of .cfi directives to
handle .eh_frame being more that 2gb away from the code.

As far as I can tell, gnu assembler doesn't even have an option to
enable this. Compiling a c file with gcc -mcmodel=large produces a
regular looking .eh_frame. This is probably because in practice linker
parse and recreate .eh_frames.

In llvm this is used because the JIT can place the code and .eh_frame
very far apart. Ideally we would fix the jit and delete this
option. This is hard.

Apart from confusion another problem with the current interface is
that most callers pass CodeModel::Default, which is bad since MC has
no way to map it to the target default if it actually needed to.

This patch then replaces the argument with a boolean with a default
value. The vast majority of users don't ever need to look at it. In
fact, only CodeGen and llvm-mc use it and llvm-mc just to enable more
testing.

llvm-svn: 309884
2017-08-02 20:32:26 +00:00
Craig Topper 4068e4eec5 [InstCombine] Remove explicit code for folding (xor(zext(cmp)), 1) and (xor(sext(cmp)), -1) to ext(!cmp).
As far as I can tell this should be handled by foldCastedBitwiseLogic which is called later in visitXor.

Differential Revision: https://reviews.llvm.org/D36214

llvm-svn: 309882
2017-08-02 20:30:27 +00:00
Craig Topper ae9b87d10c [InstCombine] Support sext in foldLogicCastConstant
This adds support for sext in foldLogicCastConstant. This is a prerequisite for D36214.

Differential Revision: https://reviews.llvm.org/D36234

llvm-svn: 309880
2017-08-02 20:25:56 +00:00
David Blaikie 22dc4474a6 DebugInfo: Test & handle (differently) non-zero DW_AT_ranges_base
Followup to r309570, fixing it slightly differently (ranges_base and
addr_base should never be read from a DWO file - so there shouldn't be
any issue with 'overriding' the values - conditionalize the code and
assert that the values aren't being overriden).

llvm-svn: 309879
2017-08-02 20:16:22 +00:00
Stefan Pintilie 873889ca16 [Power9] Exploit vector absolute difference instructions on Power 9
Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW)
for byte, halfword and word. We should take advantage of these.

Differential Revision: https://reviews.llvm.org/D34684

llvm-svn: 309876
2017-08-02 20:07:21 +00:00
Jakub Kuderski d869913f3b [Dominators] Teach LoopDeletion to use the new incremental API
Summary:
This patch makes LoopDeletion use the incremental DominatorTree API.

We modify LoopDeletion to perform the deletion in 5 steps:
1. Create a new dummy edge from the preheader to the exit, by adding a conditional branch.
2. Inform the DomTree about the new edge.
3. Remove the conditional branch and replace it with an unconditional edge to the exit. This removes the edge to the loop header, making it unreachable.
4. Inform the DomTree about the deleted edge.
5. Remove the unreachable block from the function.

Creating the dummy conditional branch is necessary to perform incremental DomTree update.
We should consider using the batch updater when it's ready.

Reviewers: dberlin, davide, grosser, sanjoy

Reviewed By: dberlin, grosser

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35391

llvm-svn: 309850
2017-08-02 18:17:52 +00:00
Hiroshi Inoue 0bd906ec8f [StackColoring] Update AliasAnalysis information in stack coloring pass (part 2)
This patch is update after the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments.

Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp

// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.

But, TBAA has been already enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.

This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots.

This patch fixes PR33928.

llvm-svn: 309849
2017-08-02 18:16:32 +00:00
Keith Wyss a2e9d1716f Revert "Xray docs with description of Flight Data Recorder binary format."
This reverts commit 3462b8ad41a840fd54dbbd0d3f2a514c5ad6f656.

The docs-llvm-html target failed.

llvm-svn: 309842
2017-08-02 17:36:52 +00:00
Coby Tayree d483a10791 [AsmParser][GAS-compatibility] Ignore an empty 'p2align' directive
GAS ignores the aforementioned issue
this patch aligns LLVM + throws in an appropriate warning

Differential Revision: https://reviews.llvm.org/D36060

llvm-svn: 309841
2017-08-02 17:36:10 +00:00
Keith Wyss 5a1dd7e990 Xray docs with description of Flight Data Recorder binary format.
Summary:
Adding a new restructuredText file to document the trace format produced with
an FDR mode handler and read by llvm-xray toolset.

Reviewers: dberris, pelikan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36041

llvm-svn: 309836
2017-08-02 17:27:20 +00:00
Adrian Prantl bd6d291c59 Assert that the offset of a DBG_VALUE is always 0. (NFC)
llvm-svn: 309834
2017-08-02 17:19:13 +00:00
Matt Arsenault 2738ede6b2 AMDGPU: Restore using MRI to find highest used regs
If there are no calls, this is a faster path than
searching the entire program for calls.

This was supposed to be left in r309781.
Fixes unused variable warning.

llvm-svn: 309832
2017-08-02 17:15:01 +00:00
Adrian Prantl 5577363a2c Remove the unused Offset field from MachineLocation (NFC)
rdar://problem/33580047

llvm-svn: 309831
2017-08-02 17:07:38 +00:00
Nirav Dave e44032f7e7 [DAG] Improve candidate pruning in store merge failure case. NFCI
During store merge we construct a sorted list of consecutive store
candidates and consider subsequences for merging into a single
store. For each subsequence we check if the stored value type is legal
the merged store would have valid and fast and if the constructed
value to be stored is valid. The only properties that affect this
check between subsequences is the size of the subsequence, the
alignment of the first store, the alignment of the stored load value
(when merging stores-of-loads), and whether the merged value is a
constant zero.

If we do not find a viable mergeable subsequence starting from the
first store of length N, we know that a subsequence starting at a
later store of length N will also fail unless the new store's
alignment, the new load's alignment (if we're merging store-of-loads),
or we've dropped stores of nonzero value and could construct a merged
stores of zero (for merging constants).

As a result if we fail to find a valid subsequence starting from the
first store we can safely skip considering subsequences that start
with subsequent stores unless one of the above properties is
true. This significantly (2x) improves compile time in some
pathological cases.

Reviewers: RKSimon, efriedma, zvi, spatel, waltl

Subscribers: grandinj, llvm-commits

Differential Revision: https://reviews.llvm.org/D35901

llvm-svn: 309830
2017-08-02 16:35:58 +00:00
Adrian Prantl 61b39b2aec Remove unused includes of MachineLocation.h (NFC)
llvm-svn: 309824
2017-08-02 15:32:18 +00:00
Adrian Prantl 2049c0d734 Remove unreachable code. (NFC)
MachineLocation::getOffset() always returns 0.

rdar://problem/33580047

llvm-svn: 309823
2017-08-02 15:22:17 +00:00
Florian Hahn 31f78fd0ae [AArch64] Simplify AES*Tied pseudo expansion (NFC).
Summary:
Suggested by @t.p.northover in https://bugs.llvm.org/show_bug.cgi?id=34015.


Reviewers: javed.absar, t.p.northover, rengolin

Reviewed By: t.p.northover

Subscribers: aemerson, kristof.beyls, llvm-commits, t.p.northover

Differential Revision: https://reviews.llvm.org/D36223

llvm-svn: 309821
2017-08-02 15:17:19 +00:00
Chad Rosier 5ce28f4f92 [InlineCost] Remove redundant call. NFC.
llvm-svn: 309819
2017-08-02 14:50:27 +00:00
Chad Rosier 2e1c050e52 [InlineCost] Simplify more 'and' and 'or' operations.
Differential Revision: https://reviews.llvm.org/D35856

llvm-svn: 309817
2017-08-02 14:40:42 +00:00
Alexey Bataev 36e6096a03 [SLPVectorizer] Generalize interface of functions, NFC.
llvm-svn: 309816
2017-08-02 14:38:07 +00:00
Alexey Bataev fba97e6e21 [SLP] Fix for PR31880: shuffle and vectorize repeated scalar ops on extracted elements
Summary:
Currently most of the time vectors of extractelement instructions are
treated as scalars that must be gathered into vectors. But in some
cases, like when we have extractelement instructions from single vector
with different constant indeces or from 2 vectors of the same size, we
can treat this operations as shuffle of a single vector or blending of 2
vectors.
```
define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
  %x0 = extractelement <2 x i8> %x, i32 0
  %y1 = extractelement <2 x i8> %y, i32 1
  %x0x0 = mul i8 %x0, %x0
  %y1y1 = mul i8 %y1, %y1
  %ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0
  %ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
  ret <2 x i8> %ins2
}
```
can be converted to something like
```
define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
  %1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3>
  %2 = mul <2 x i8> %1, %1
  ret <2 x i8> %2
}
```
Currently this type of conversion is considered as high cost
transformation.

Reviewers: mzolotukhin, delena, mkuper, hfinkel, RKSimon

Subscribers: ashahid, RKSimon, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D30200

llvm-svn: 309812
2017-08-02 13:25:26 +00:00
Diana Picus d5a00b0ff6 [MIR] Print target-specific constant pools
This should enable us to test the generation of target-specific constant
pools, e.g. for ARM:

constants:
 - id:              0
   value:           'g(GOT_PREL)-(LPC0+8-.)'
   alignment:       4
   isTargetSpecific: true

I intend to use this to test PIC support in GlobalISel for ARM.

This is difficult to test outside of that context, since the existing
MIR tests usually rely on parser support as well, and that seems a bit
trickier to add. We could try to add a unit test, but the setup for that
seems rather convoluted and overkill.

We do test however that the parser reports a nice error when
encountering a target-specific constant pool.

Differential Revision: https://reviews.llvm.org/D36092

llvm-svn: 309806
2017-08-02 11:09:30 +00:00
Davide Italiano c2f73b7fae [NewGVN] Fold single-use variables. NFCI.
llvm-svn: 309790
2017-08-02 04:05:49 +00:00
Davide Italiano b13a3fa41b [NewGVN] Remove a (now stale) comment. NFCI.
llvm-svn: 309789
2017-08-02 03:51:40 +00:00
Dehao Chen 3246dc3df1 Fix the bug that parseAAPipeline is not invoked in runNewPMPasses in release compiler.
Summary: The logic is guarded by "assert".

Reviewers: davidxl, davide, chandlerc

Reviewed By: davide, chandlerc

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D36195

llvm-svn: 309787
2017-08-02 03:03:19 +00:00
Craig Topper 36af40c115 [SimplifyCFG] Fix typo in comment. NFC
llvm-svn: 309785
2017-08-02 02:34:16 +00:00
Chandler Carruth 95055d8f8b [PM] Fix a bug where through CGSCC iteration we can get
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.

This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.

The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).

I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.

A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.

The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.

I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.

We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.

Differential Revision: https://reviews.llvm.org/D36188

llvm-svn: 309784
2017-08-02 02:09:22 +00:00
Matt Arsenault 8e8f8f43b0 AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it
llvm-svn: 309783
2017-08-02 01:52:45 +00:00
Matt Arsenault 1d6317c3ad AMDGPU: Fix emitting encoded calls
This was failing on out of bounds access to the extra operands
on the s_swappc_b64 beyond those in the instruction definition.

This was working, but somehow regressed within the past few weeks,
although I don't see any obvious commit.

llvm-svn: 309782
2017-08-02 01:42:04 +00:00
Matt Arsenault 6ed7b9bfc0 AMDGPU: Analyze callee resource usage in AsmPrinter
llvm-svn: 309781
2017-08-02 01:31:28 +00:00
Dehao Chen 89d3226019 Update the new PM pipeline to make ICP aware if it is SamplePGO build.
Summary: In ThinLTO backend compile, OPTOptions are not set so that the ICP in ThinLTO backend does not know if it is a SamplePGO build, in which profile count needs to be annotated directly on call instructions. This patch cleaned up the PGOOptions handling logic and passes down PGOOptions to ThinLTO backend.

Reviewers: chandlerc, tejohnson, davidxl

Reviewed By: chandlerc

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D36052

llvm-svn: 309780
2017-08-02 01:28:31 +00:00
Stanislav Mekhanoshin f23ae4fbe9 [AMDGPU] Fix asan error after last commit
Previous change "Turn s_and_saveexec_b64 into s_and_b64 if
result is unused" introduced asan use-after-poison error.
Instruction was analyzed after eraseFromParent() calls.

Move analysys higher than erase.

llvm-svn: 309779
2017-08-02 01:18:57 +00:00
Nirav Dave 15894f8dec [DAG] Refactor store merge subexpressions. NFC.
Distribute various expressions across ifs.

llvm-svn: 309777
2017-08-02 01:08:38 +00:00
Matt Arsenault d1867c0345 AMDGPU: Don't place arguments in emergency stack slot
When finding the fixed offsets for function arguments,
this needs to skip over the 4 bytes reserved for the
emergency stack slot.

llvm-svn: 309776
2017-08-02 00:59:51 +00:00
Matt Arsenault acc5e82b0e DAG: Undo and->or combine with FrameIndexes
This pattern shows up when lowering byval copies on AMDGPU.

The byval object access is split into 4-byte chunks, adding a
constant offset to the FixedStack base. When some of the offsets
turn into ors, this prevents combining the constant offsets.

This makes it not apparent that the object is there when matching
addressing modes, so it ends up using a scratch wave offset
relative access and the lengthy frame index expansion for that.

llvm-svn: 309775
2017-08-02 00:43:42 +00:00
Adrian Prantl 83ca4fc7bc Update LiveDebugValues to generate DIExpressions for spill offsets
instead of using the deprecated offset field of DBG_VALUE.

This has no observable effect on the generated DWARF, but the
assembler comments will look different.

rdar://problem/33580047

llvm-svn: 309773
2017-08-02 00:16:56 +00:00
Stanislav Mekhanoshin da0edef1bd [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused
With SI_END_CF elimination for some nested control flow we can now
eliminate saved exec register completely by turning a saveexec version
of instruction into just a logical instruction.

Differential Revision: https://reviews.llvm.org/D36007

llvm-svn: 309766
2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin 37e7f959c0 [AMDGPU] Collapse adjacent SI_END_CF
Add a pass to remove redundant S_OR_B64 instructions enabling lanes in
the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any
vector instructions between them we can only keep outer SI_END_CF, given
that CFG is structured and exec bits of the outer end statement are always
not less than exec bit of the inner one.

This needs to be done before the RA to eliminate saved exec bits registers
but after register coalescer to have no vector registers copies in between
of different end cf statements.

Differential Revision: https://reviews.llvm.org/D35967

llvm-svn: 309762
2017-08-01 23:14:32 +00:00
Sanjoy Das 4cad61adb3 [SCEV/IndVars] Always compute loop exiting values if the backedge count is 0
If SCEV can prove that the backedge taken count for a loop is zero, it does not
need to "understand" a recursive PHI to compute its exiting value.

This should fix PR33885.

llvm-svn: 309758
2017-08-01 22:37:58 +00:00
Adrian Prantl aac78ce47e Use helper function instead of manually constructing DBG_VALUEs (NFC)
rdar://problem/33580047

llvm-svn: 309757
2017-08-01 22:37:35 +00:00
Adrian Prantl 032d2381bf Remove PrologEpilogInserter's usage of DBG_VALUE's offset field
In the last half-dozen commits to LLVM I removed code that became dead
after removing the offset parameter from llvm.dbg.value gradually
proceeding from IR towards the backend. Before I can move on to
DwarfDebug and friends there is one last side-called offset I need to
remove:  This patch modifies PrologEpilogInserter's use of the
DBG_VALUE's offset argument to use a DIExpression instead. Because the
PrologEpilogInserter runs at the Machine level I had to play a little
trick with a named llvm.dbg.mir node to get the DIExpressions to print
in MIR dumps (which print the llvm::Module followed by the
MachineFunction dump).

I also had to add rudimentary DwarfExpression support to CodeView and
as a side-effect also fixed a bug (CodeViewDebug::collectVariableInfo
was supposed to give up on variables with complex DIExpressions, but
would fail to do so for fragments, which are also modeled as
DIExpressions).

With this last holdover removed we will have only one canonical way of
representing offsets to debug locations which will simplify the code
in DwarfDebug (and future versions of CodeViewDebug once it starts
handling more complex expressions) and make it easier to reason about.

This patch is NFC-ish: All test case changes are for assembler
comments and the binary output does not change.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D36125

llvm-svn: 309751
2017-08-01 21:45:24 +00:00
Haicheng Wu 50692a203c [AArch64] Fix a typo in isExtFreeImpl()
next => not

Differential Revision: https://reviews.llvm.org/D36104

llvm-svn: 309748
2017-08-01 21:26:45 +00:00
Vedant Kumar 4b102c3d5c [llvm-cov] Allow specifying distinct architectures for each loaded binary
The coverage tool needs to know which slice to look at when it's handed
a universal binary. Some projects need to look at aggregate coverage
reports for a variety of slices in different binaries: this patch adds
support for these kinds of projects to llvm-cov.

rdar://problem/33579007

llvm-svn: 309747
2017-08-01 21:23:26 +00:00
Eugene Zelenko 52889219ef [Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 309746
2017-08-01 21:20:10 +00:00
Martin Storsjo eacf4e408b [AArch64] Rewrite stack frame handling for win64 vararg functions
The previous attempt, which made do with a single offset in
computeCalleeSaveRegisterPairs, wasn't quite enough. The previous
attempt only worked as long as CombineSPBump == true (since the
offset would be adjusted later in fixupCalleeSaveRestoreStackOffset).

Instead include the size for the fixed stack area used for win64
varargs in calculations in emitPrologue/emitEpilogue. The stack
consists of mainly three parts;
- AFI->getLocalStackSize()
- AFI->getCalleeSavedStackSize()
- FixedObject

Most of the places in the code which previously used the CSStackSize
now use PrologueSaveSize instead, which is the sum of the latter
two, while some cases which need exactly the middle one use
AFI->getCalleeSavedStackSize() explicitly instead of a local variable.

In addition to moving the offsetting into emitPrologue/emitEpilogue
(which fixes functions with CombineSPBump == false), also set the
frame pointer to point to the right location, where the frame pointer
and link register actually are stored. In addition to the prologue/epilogue,
this also requires changes to resolveFrameIndexReference.

Add tests for a function that keeps a frame pointer and another one
that uses a VLA.

Differential Revision: https://reviews.llvm.org/D35919

llvm-svn: 309744
2017-08-01 21:13:54 +00:00