Commit Graph

46451 Commits

Author SHA1 Message Date
David Blaikie e5adb68e04 DebugInfo: Provide option for explicitly specifying the name of the DWP file
If you've archived the DWP file somewhere it's probably useful to be
able to just tell llvm-symbolizer where it is when you're symbolizing
stack traces from the binary.

This only provides a mechanism for specifying a single DWP file, good if
you're symbolizing a program with a single DWP file, but it's likely if
the program is dynamically linked that you might have a DWP for each
dynamic library - in which case this feature won't help (at least as
it's surfaced in llvm-symbolizer for now) - in theory it could be
extended to specify a collection of DWP files that could all be
consulted for split CU hash resolution.

llvm-svn: 309498
2017-07-30 01:34:08 +00:00
Sam Elliott 67b0e589d0 Migrate PGOMemOptSizeOpt to use new OptimizationRemarkEmitter Pass
Summary:
Fixes PR33790.

This patch still needs a yaml-style test, which I shall write tomorrow

Reviewers: anemet

Reviewed By: anemet

Subscribers: anemet, llvm-commits

Differential Revision: https://reviews.llvm.org/D35981

llvm-svn: 309497
2017-07-30 00:35:33 +00:00
Florian Hahn f63a5e91db [AArch64] Tie source and destination operands for AESMC/AESIMC.
Summary:
Most CPUs implementing AES fusion require instruction pairs of the form
    AESE Vn, _
    AESMC Vn, Vn
and
    AESD Vn, _
    AESIMC Vn, Vn

The constraint is added to AES(I)MC instructions which use the result of
an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which
constraint source and destination registers to be the same.

A nice side effect of this change is that now all possible pairs are
scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll
test case.

I had to update aes_load_store. The version I added initially was very
reduced and with the new constraint, AESE/AESMC could not be scheduled
back-to-back. I updated the test to be more realistic and still expose
the same scheduling problem as the initial test case.

Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga

Reviewed By: t.p.northover, evandro

Subscribers: aemerson, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35299

llvm-svn: 309495
2017-07-29 20:35:28 +00:00
Florian Hahn 2f86e3d494 [AArch64] Use 8 bytes as preferred function alignment on Cortex-A53.
Summary:
This change gives a 0.25% speedup on execution time, a 0.82% improvement
in benchmark scores and a 0.20% increase in binary size on a Cortex-A53.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite and a range of proprietary suites.

Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin

Reviewed By: rengolin

Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35568

llvm-svn: 309494
2017-07-29 20:04:54 +00:00
Saleem Abdulrasool fc85067f30 MC: account for the return column in the CIE key
If the return column is different, we cannot coalesce the CIE across the
FDEs.  Add that to the key calculation.  This ensures that we emit a
separate CIE.

llvm-svn: 309492
2017-07-29 20:03:00 +00:00
Hiroshi Inoue 40d01f3cb5 Fix test failure without X86 backend
move test/Transforms/SimplifyCFG/disable-lookup-table.ll into test/Transforms/SimplifyCFG/X86/disable-lookup-table.ll to avoid test failure when X86 backend is not enabled

llvm-svn: 309487
2017-07-29 15:03:31 +00:00
Simon Pilgrim 718cb0ea62 [SelectionDAG][X86] CombineBT - more aggressively determine demanded bits
This patch is in 2 parts:

1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.

2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.

Differential Revision: https://reviews.llvm.org/D35896

llvm-svn: 309486
2017-07-29 14:50:25 +00:00
Tobias Grosser 670a5d88a3 [tests] Do not emity binary bitcode to stdout in RegionInfo tests
llvm-svn: 309485
2017-07-29 09:58:43 +00:00
Dehao Chen 4194ebff8d Update the test to make windows bot pass.
llvm-svn: 309482
2017-07-29 07:01:25 +00:00
Dehao Chen 246254b97d update the test file that was omitted in r309478.
llvm-svn: 309479
2017-07-29 04:11:20 +00:00
Dehao Chen ce0842ce9c Refine the PGOOpt and SamplePGOSupport handling.
Summary:
Now that SamplePGOSupport is part of PGOOpt, there are several places that need tweaking:
1. AddDiscriminator pass should *not* be invoked at ThinLTOBackend (as it's already invoked in the PreLink phase)
2. addPGOInstrPasses should only be invoked when either ProfileGenFile or ProfileUseFile is non-empty.
3. SampleProfileLoaderPass should only be invoked when SampleProfileFile is non-empty.
4. PGOIndirectCallPromotion should only be invoked in ProfileUse phase, or in ThinLTOBackend of SamplePGO.

Reviewers: chandlerc, tejohnson, davidxl

Reviewed By: chandlerc

Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36040

llvm-svn: 309478
2017-07-29 04:10:24 +00:00
Matt Arsenault 9608a2891d AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat
Checking the encoding is insufficient since now there can
be global or scratch instructions.

llvm-svn: 309472
2017-07-29 01:26:21 +00:00
Matt Arsenault dc8f5cc39c AMDGPU: Teach isLegalAddressingMode about global_* instructions
Also refine the flat check to respect flat-for-global feature,
and constant fallback should check global handling, not
specifically MUBUF.

llvm-svn: 309471
2017-07-29 01:12:31 +00:00
Matt Arsenault 4e309b0861 AMDGPU: Start selecting global instructions
llvm-svn: 309470
2017-07-29 01:03:53 +00:00
Tobias Edler von Koch 819d84032e [LTO] llvm-lto2: Add option to load sample profile
Summary:
This exposes LTO's Conf.SampleProfile as a command line option
(-lto-sample-profile-file) for testing via the llvm-lto2 utility.

Reviewers: pcc, danielcdh

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D36030

llvm-svn: 309456
2017-07-28 23:43:22 +00:00
Farhana Aleen 7ccc9fd0e2 Added tests for i8 interleaved-load-pattern of stride=4, VF=(8, 16, 32).
llvm-svn: 309447
2017-07-28 22:43:34 +00:00
Sumanth Gundapaneni 8d50a50e98 [SimplifyCFG] Make the no-jump-tables attribute also disable switch lookup tables
Differential Revision: https://reviews.llvm.org/D35579

llvm-svn: 309444
2017-07-28 22:25:40 +00:00
Easwaran Raman 51b809bf2f [Inliner] Do not apply any bonus for cold callsites.
Summary:
Inlining threshold is increased by application of bonuses when the
callee has a single reachable basic block or is rich in vector
instructions. Similarly, inlining cost is reduced by applying a large
bonus when the last call to a static function is considered for
inlining. This patch disables the application of these bonuses when the
callsite or the callee is cold. The intention here is to prevent a large
cold callsite from being inlined to a non-cold caller that could prevent
the caller from being inlined. This is especially important when the
cold callsite is a last call to a static since the associated bonus is
very high.

Reviewers: chandlerc, davidxl

Subscribers: danielcdh, llvm-commits

Differential Revision: https://reviews.llvm.org/D35823

llvm-svn: 309441
2017-07-28 21:47:36 +00:00
Adrian Prantl abe04759a6 Remove the obsolete offset parameter from @llvm.dbg.value
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951

llvm-svn: 309426
2017-07-28 20:21:02 +00:00
Reid Kleckner 9be82c3169 Fix conditional tail call branch folding when both edges are the same
The conditional tail call logic did the wrong thing when both
destinations of a conditional branch were the same:

BB#1: derived from LLVM BB %entry
    Live Ins: %EFLAGS
    Predecessors according to CFG: BB#0
        JE_1 <BB#5>, %EFLAGS<imp-use,kill>
        JMP_1 <BB#5>

BB#5: derived from LLVM BB %sw.epilog
    Predecessors according to CFG: BB#1
        TCRETURNdi64 <ga:@mergeable_conditional_tailcall>, 0, ...

We would fold the JE_1 to a TCRETURNdi64cc, and then remove our BB#5
successor. Then BB#5 would be deleted as it had no predecessors, leaving
a dangling "JMP_1 <BB#5>" reference behind to cause assertions later.

This patch checks that both conditional branch destinations are
different before doing the transform. The standard branch folding logic
is able to remove both the JMP_1 and the JE_1, and for my test case we
end up forming a better conditional tail call later.

Fixes PR33980

llvm-svn: 309422
2017-07-28 19:48:40 +00:00
Matt Arsenault da9ab148f3 AMDGPU: Look through a bitcast user of an out argument
This allows handling of a lot more of the interesting
cases in Blender. Most of the large functions unlikely
to be inlined have this pattern.

This is a special case for what clang emits for OpenCL 3
element vectors. Annoyingly, these are emitted as
<3 x elt>* pointers, but accessed as <4 x elt>* operations.
This also needs to handle cases where a struct containing
a single vector is used.

llvm-svn: 309419
2017-07-28 19:06:16 +00:00
Matt Arsenault c06574ffc0 AMDGPU: Add pass to replace out arguments
It is better to return arguments directly in registers
if we are making a call rather than introducing expensive
stack usage. In one of sample compile from one of
Blender's many kernel variants, this fires on about
~20 different functions. Future improvements may be to
recognize simple cases where the pointer is indexing a small
array. This also fails when the store to the out argument
is in a separate block from the return, which happens in
a few of the Blender functions. This should also probably
be using MemorySSA which might help with that.

I'm not sure this is correct as a FunctionPass, but
MemoryDependenceAnalysis seems to not work with
a ModulePass.

I'm also not sure where it should run.I think it should
run  before DeadArgumentElimination, so maybe either
EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.

llvm-svn: 309416
2017-07-28 18:40:05 +00:00
Hiroshi Yamauchi 1b179bc5ff [LVI] Constant-propagate a zero extension of the switch condition value through case edges
Summary:
LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges.

But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur.

This patch adds a small logic to handle such a case in getEdgeValueLocal().

This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary.

With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%.




Reviewers: wmi, dberlin, sanjoy

Reviewed By: sanjoy

Subscribers: davide, davidxl, llvm-commits

Differential Revision: https://reviews.llvm.org/D34822

llvm-svn: 309415
2017-07-28 18:35:25 +00:00
Tim Northover a7f583e33b GlobalISel: map 128-bit values to an FPR by default.
Eventually we may want to allow a pair of GPRs but absolutely nothing in the
entire world is ready for that yet.

llvm-svn: 309404
2017-07-28 17:11:01 +00:00
Matt Arsenault 9166ce86e8 AMDGPU: Annotate implicitarg.ptr usage
We need to pass something to functions for this to work.
It isn't derivable just from the kernarg segment pointer
because the implicit arguments are placed after the
kernel arguments.

Also fixes missing test for the intrinsic.

llvm-svn: 309398
2017-07-28 15:52:08 +00:00
Wei Mi 55c05e14af [GVN] Recommit the patch "Add phi-translate support in scalarpre"
Recommit after workaround the bug PR31652.

Three bugs fixed in previous recommits: The first one is to use CurrentBlock
instead of PREInstr's Parent as param of performScalarPREInsertion because
the Parent of a clone instruction may be uninitialized. The second one is stop
PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst
is defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();

void foo(long a, long b, long c, long d) {

  g1 = a * b;
  if (__builtin_expect(g2 > 3, 0)) {
    a = c;
    b = d;
    g2 = a * b;
  }
  g3 = a * b;      // fully redundant.

}

The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

Differential Revision: https://reviews.llvm.org/D32252

llvm-svn: 309397
2017-07-28 15:47:25 +00:00
Strahinja Petrovic 25e9e1b866 [ARM] Add the option to directly access TLS pointer
This patch enables choice for accessing thread local
storage pointer (like '-mtp' in gcc).

Differential Revision: https://reviews.llvm.org/D34408

llvm-svn: 309381
2017-07-28 12:54:57 +00:00
Simon Pilgrim 1ff3da7273 [X86] Add test case for PR33290
llvm-svn: 309375
2017-07-28 09:43:52 +00:00
Simon Pilgrim 88d3bed351 [X86][AVX] Cleanup shuffle combine tests - remove old prefixes.
llvm-svn: 309374
2017-07-28 09:41:55 +00:00
Peter Smith 5804364f7a [ARM] Add test to check pcs of ARM ABI runtime floating point helpers
The ARM Runtime ABI document (IHI0043) defines the AEABI floating point
helper functions in section 4.1.2 The floating-point helper functions.
The functions listed in this section must always use the base AAPCS calling
convention.

This test generates calls to all the helper functions that llvm supports
and checks that the base AAPCS calling convention has been used. We test
the equivalent of -mfloat-abi=soft, -mfloat-abi=softfp, -mfloat-abi=hardfp
with an FPU that supports single and double precision, and one that only
supports double precision.

Differential Revision: https://reviews.llvm.org/D35904

llvm-svn: 309371
2017-07-28 09:21:00 +00:00
Max Kazantsev fa4969539a [SCEV] Do not visit nodes twice in containsConstantSomewhere
This patch reworks the function that searches constants in Add and Mul SCEV expression
chains so that now it does not visit a node more than once, and also renames this function
for better correspondence between its implementation and semantics.

Differential Revision: https://reviews.llvm.org/D35931

llvm-svn: 309367
2017-07-28 06:42:15 +00:00
Saleem Abdulrasool 61d81ec754 test: require x86 backend
Ensure that the target is registered before using it.  Should fix the
hexagon Bots.

llvm-svn: 309363
2017-07-28 04:15:35 +00:00
Saleem Abdulrasool a219b3d8d1 MC: add support for cfi_return_column
This adds support for the CFI pseudo-op return_column.  This specifies
the frame table column which contains the return address.

Addresses PR33953!

llvm-svn: 309360
2017-07-28 03:39:19 +00:00
Sanjoy Das 843ab57457 Revert "[SCEV] Cache results of computeExitLimit"
This reverts commit r309080.  The patch needs to clear out the
ScalarEvolution::ExitLimits cache in forgetMemoizedResults.

I've replied on the commit thread for the patch with more details.

llvm-svn: 309357
2017-07-28 03:25:07 +00:00
Davide Italiano 75a001ba78 [JumpThreading] Stop falsely preserving LazyValueInfo.
JumpThreading claims to preserve LVI, but it doesn't preserve
the analyses which LVI holds a reference to (e.g. the Dominator).
In the current pass manager infrastructure, after JT runs, the
PM frees these analyses (including DominatorTree) but preserves
LVI.

CorrelatedValuePropagation runs immediately after and queries
a corrupted domtree, causing weird miscompiles.

This commit disables the preservation of LVI for the time being.
Eventually, we should either move LVI to a proper dependency
tracking mechanism (i.e. an analyses shouldn't hold references
to other analyses and compute them on demand if needed), or
we should teach all the passes preserving LVI to preserve the
analyses LVI depends on.

The new pass manager has a mechanism to invalidate LVI in case
one of the analyses it depends on becomes invalid, so this problem
shouldn't exist (at least not in this immediate form), but handling
of analyses holding references is still a very delicate subject.

Fixes PR33917 (and rustc).

llvm-svn: 309355
2017-07-28 03:10:43 +00:00
David Blaikie 89daf77a11 DebugInfo: Consider a CU containing only local imported entities to be 'empty'
This can come up in ThinLTO & wastes space & makes degenerate IR.

As per the added FIXME, ultimately, local imported entities should hang
off the function and that way the imported entity list on the CU can be
tested for emptiness like all the other CU lists.

(function-attached local imported entities are probably also the best
path forward for fixing how imported entities are handled both in
cross-module use (currently, while ThinLTO preserves the imported
entities, they would not get used at the imported inlined location -
only in the abstract origin that appears in the partial CU created by
the import (which isn't emitted under Fission due to cross-CU
limitations there)) and to reduce the number of points where imported
entities are emitted (they're currently emitted into every inlined
instance, concrete instance, and abstract origin - they should only go
in teh abstract origin if there is one, otherwise in the concrete
instance - but this requires lots of delayed handling and wiring up,
same as abstract variables & subprograms))

llvm-svn: 309354
2017-07-28 03:06:25 +00:00
Davide Italiano 01cb947abb [JumpThreading] Add an option to dump LazyValueInfo after the run.
Differential Revision:  https://reviews.llvm.org/D35973

llvm-svn: 309353
2017-07-28 02:57:43 +00:00
Matthias Braun c618a466f1 ARMFrameLowering: Only set ExtraCSSpill for actually unused registers.
The code assumed that unclobbered/unspilled callee saved registers are
unused in the function. This is not true for callee saved registers that are
also used to pass parameters such as swiftself.

rdar://33401922

llvm-svn: 309350
2017-07-28 01:36:32 +00:00
Dehao Chen f4240b5b91 Separate the ICP total threshold and remaining threshold.
Summary: In the current implementation, isPromotionProfitable only checks if the call count to a direct target is no less than a certain percentage threshold of the remaining call counts that have not been promoted. This causes code size problems when the target count is small but greater than a large portion of remaining counts. E.g. target1 takes 99.9%, while target2 takes 0.1%. Both targets will be promoted and inlined, makes the function size too large, which potentially prevents it from further inlining into its callers. This patch adds another percentage threshold against the total indirect call count. If the target count needs to be no less than both thresholds in order to be promoted speculatively.

Reviewers: davidxl, tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D35962

llvm-svn: 309345
2017-07-28 01:02:54 +00:00
Reid Kleckner 07a5d4372e [X86] Fix latent bug in sibcall eligibility logic
The X86 tail call eligibility logic was correct when it was written, but
the addition of inalloca and argument copy elision broke its
assumptions. It was assuming that fixed stack objects were immutable.

Currently, we aim to emit a tail call if no arguments have to be
re-arranged in memory. This code would trace the outgoing argument
values back to check if they are loads from an incoming stack object.
If the stack argument is immutable, then we won't need to store it back
to the stack when we tail call.

Fortunately, stack objects track their mutability, so we can just make
the obvious check to fix the bug.

This was http://crbug.com/749826

llvm-svn: 309343
2017-07-28 00:58:35 +00:00
Kostya Serebryany 063b652096 [sanitizer-coverage] rename sanitizer-coverage-create-pc-table into sanitizer-coverage-pc-table and add plumbing for a clang flag
llvm-svn: 309337
2017-07-28 00:09:29 +00:00
Kostya Serebryany b75d002f15 [sanitizer-coverage] add a feature sanitizer-coverage-create-pc-table=1 (works with trace-pc-guard and inline-8bit-counters) that adds a static table of instrumented PCs to be used at run-time
llvm-svn: 309335
2017-07-27 23:36:49 +00:00
Davide Italiano 1a26f24f35 [ConstantFolder] Don't try to fold gep when the idx is a vector.
The code in ConstantFoldGetElementPtr() assumes integers, and
therefore it crashes trying to get the integer bidwith of a vector
type (in this case <4 x i32>. I just changed the code to prevent
the folding in case of vectors and I didn't bother to generalize
as this doesn't seem to me something that really happens in
practice, but I'm willing to change the patch if you think
it's worth it.
This is hard to trigger from -instsimplify or -instcombine
only as the second instruction is dead, so the test uses loop-unroll.

Differential Revision:  https://reviews.llvm.org/D35956

llvm-svn: 309330
2017-07-27 22:20:44 +00:00
Ahmed Bougacha 87807c5a86 [AArch64] Fix legality info passed to demanded bits for TBI opt.
The (seldom-used) TBI-aware optimization had a typo lying dormant since
it was first introduced, in r252573:  when asking for demanded bits, it
told TLI that it was running after legalize, where the opposite was
true.

This is an important piece of information, that the demanded bits
analysis uses to make assumptions about the node.  r301019 added such an
assumption, which was broken by the TBI combine.

Instead, pass the correct flags to TLO.

llvm-svn: 309323
2017-07-27 21:27:25 +00:00
Eric Beckmann d8bac0fa3f Add test to reject merging of empty manifest.
Reviewers: ruiu, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35954

llvm-svn: 309317
2017-07-27 19:58:12 +00:00
Dinar Temirbulatov 636ac1b6da Change prefix in vector-shuffle-combining-avx.patch to reduce test size.
llvm-svn: 309315
2017-07-27 19:47:35 +00:00
Hiroshi Yamauchi 60855214c2 [InstCombine] Simplify pointer difference subtractions (GEP-GEP) where GEPs have other uses and one non-constant index
Summary:
Pointer difference simplifications currently happen only if input GEPs don't have other uses or their indexes are all constants, to avoid duplicating indexing arithmetic.

This patch enables cases with exactly one non-constant index among input GEPs to happen where there is no duplicated arithmetic or code size increase even if input GEPs have other uses.

For example, this patch allows "(&A[42][i]-&A[42][0])" --> "i", which didn't happen previously, if the input GEP(s) have other uses.

Reviewers: sanjoy, bkramer

Reviewed By: sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D35499

llvm-svn: 309304
2017-07-27 18:27:11 +00:00
Simon Pilgrim ac84850ea6 [SelectionDAG] Improve DAGTypeLegalizer::convertMask assertion (PR33960)
Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them.

llvm-svn: 309302
2017-07-27 18:15:54 +00:00
Dinar Temirbulatov aead31a36f [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35839

llvm-svn: 309298
2017-07-27 17:47:01 +00:00
Adam Nemet a67dfe3b04 Relax the matching in these tests
Looks like the template arguments are displayed differently depending on the
host compiler(?).  E.g.:

InnerAnalysisManagerProxy<CGSCCAnalysisManager
InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::LazyCallGraph::SCC, ...

Fix fallout after r309294

llvm-svn: 309297
2017-07-27 17:45:02 +00:00