Commit Graph

21060 Commits

Author SHA1 Message Date
Steven Wu fa43892d6f Revert "[ThinLTO] Internalize readonly globals"
This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2.

llvm-svn: 346768
2018-11-13 17:35:04 +00:00
Florian Hahn a4dc7feeea [VPlan] VPlan version of InterleavedAccessInfo.
This patch turns InterleaveGroup into a template with the instruction type
being a template parameter. It also adds a VPInterleavedAccessInfo class, which
only contains a mapping from VPInstructions to their respective InterleaveGroup.
As we do not have access to scalar evolution in VPlan, we can re-use
convert InterleavedAccessInfo to VPInterleavedAccess info.


Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D49489

llvm-svn: 346758
2018-11-13 15:58:18 +00:00
Zhizhou Yang cc633af55b Introduce DebugCounter into ConstProp pass
Summary:
This patch introduces DebugCounter into ConstProp pass at per-transformation level.

It will provide an option to skip first n or stop after n transformations for the whole ConstProp pass.

This will make debug easier for the pass, also providing chance to do transformation level bisecting.

Reviewers: davide, fhahn

Reviewed By: fhahn

Subscribers: llozano, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D50094

llvm-svn: 346720
2018-11-13 00:31:22 +00:00
Sanjay Patel 35b1c2d19d [InstCombine] narrow width of rotate patterns, part 3
This is a longer variant for the pattern handled in
rL346713 
This one includes zexts. 

Eventually, we should canonicalize all rotate patterns 
to the funnel shift intrinsics, but we need a bit more
infrastructure to make sure the vectorizers handle those
intrinsics as well as the shift+logic ops.

https://rise4fun.com/Alive/FMn

Name: narrow rotateright
  %neg = sub i8 0, %shamt
  %rshamt = and i8 %shamt, 7
  %rshamtconv = zext i8 %rshamt to i32
  %lshamt = and i8 %neg, 7
  %lshamtconv = zext i8 %lshamt to i32
  %conv = zext i8 %x to i32
  %shr = lshr i32 %conv, %rshamtconv
  %shl = shl i32 %conv, %lshamtconv
  %or = or i32 %shl, %shr
  %r = trunc i32 %or to i8
  =>
  %maskedShAmt2 = and i8 %shamt, 7
  %negShAmt2 = sub i8 0, %shamt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shl2 = lshr i8 %x, %maskedShAmt2
  %shr2 = shl i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2
llvm-svn: 346716
2018-11-12 22:52:25 +00:00
Sanjay Patel 98e427ccf2 [InstCombine] narrow width of rotate patterns, part 2 (PR39624)
The sub-pattern for the shift amount in a rotate can take on
several different forms, and there's apparently no way to
canonicalize those without seeing the entire rotate sequence.

This is the form noted in:
https://bugs.llvm.org/show_bug.cgi?id=39624

https://rise4fun.com/Alive/qnT

  %zx = zext i8 %x to i32
  %maskedShAmt = and i32 %shAmt, 7
  %shl = shl i32 %zx, %maskedShAmt
  %negShAmt = sub i32 0, %shAmt
  %maskedNegShAmt = and i32 %negShAmt, 7
  %shr = lshr i32 %zx, %maskedNegShAmt
  %rot = or i32 %shl, %shr
  %r = trunc i32 %rot to i8
  =>
  %truncShAmt = trunc i32 %shAmt to i8
  %maskedShAmt2 = and i8 %truncShAmt, 7
  %shl2 = shl i8 %x, %maskedShAmt2
  %negShAmt2 = sub i8 0, %truncShAmt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shr2 = lshr i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2

llvm-svn: 346713
2018-11-12 22:11:09 +00:00
Sanjay Patel ceab2329b6 [InstCombine] refactor code for matching shift amount of a rotate; NFC
As shown in existing test cases and with:
https://bugs.llvm.org/show_bug.cgi?id=39624
...we're missing at least 2 more patterns for rotate narrowing.

llvm-svn: 346711
2018-11-12 22:00:00 +00:00
Philip Reames b8d8db30ea [GC][InstCombine] Fix a potential iteration issue
Noticed via inspection.  Appears to be largely innocious in practice, but slight code change could have resulted in either visit order dependent missed optimizations or infinite loops.  May be a minor compile time problem today.

llvm-svn: 346698
2018-11-12 20:00:53 +00:00
Simon Pilgrim 631f2bf51e [CostModel] Add more realistic SK_ExtractSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.

llvm-svn: 346656
2018-11-12 14:25:23 +00:00
Max Kazantsev 7d49a3a816 [LICM] Hoist guards from non-header blocks
This patch relaxes overconservative checks on whether or not we could write
memory before we execute an instruction. This allows us to hoist guards out of
loops even if they are not in the header block.

Differential Revision: https://reviews.llvm.org/D50891
Reviewed By: fedor.sergeev

llvm-svn: 346643
2018-11-12 09:29:58 +00:00
Calixte Denizet c6fabeac11 [GCOV] Add options to filter files which must be instrumented.
Summary:
When making code coverage, a lot of files (like the ones coming from /usr/include) are removed when post-processing gcno/gcda so finally they doen't need to be instrumented nor to appear in gcno/gcda.
The goal of the patch is to be able to filter the files we want to instrument, there are several advantages to do that:
- improve speed (no overhead due to instrumentation on files we don't care)
- reduce gcno/gcda size
- it gives the possibility to easily instrument only few files (e.g. ones modified in a patch) without changing the build system
- need to accept this patch to be enabled in clang: https://reviews.llvm.org/D52034

Reviewers: marco-c, vsk

Reviewed By: marco-c

Subscribers: llvm-commits, sylvestre.ledru

Differential Revision: https://reviews.llvm.org/D52033

llvm-svn: 346641
2018-11-12 09:01:43 +00:00
Florian Hahn 9026d4ee9b [IPSCCP,PM] Preserve PDT in the new pass manager.
Reviewers: kuhar, chandlerc, NutshellySima, brzycki

Reviewed By: NutshellySima, brzycki

Differential Revision: https://reviews.llvm.org/D54317

llvm-svn: 346618
2018-11-11 20:22:45 +00:00
Sanjay Patel 4a12aa9791 [InstCombine] simplify code for merging stores; NFCI
llvm-svn: 346596
2018-11-10 20:29:25 +00:00
Eugene Leviant be8d19967a [ThinLTO] Internalize readonly globals
This patch allows internalising globals if all accesses to them
(from live functions) are from non-volatile load instructions

Differential revision: https://reviews.llvm.org/D49362

llvm-svn: 346584
2018-11-10 08:31:21 +00:00
Eli Friedman 15930bf352 [JumpThreading] Fix exponential time algorithm computing known values.
ComputeValueKnownInPredecessors has a "visited" set to prevent infinite
loops, since a value can be visited more than once.  However, the
implementation didn't prevent the algorithm from taking exponential
time. Instead of removing elements from the RecursionSet one at a time,
we should keep around the whole set until
ComputeValueKnownInPredecessors finishes, then discard it.

The testcase is synthetic because I was having trouble effectively
reducing the original.  But it's basically the same idea.

Instead of failing, we could theoretically cache the result instead.
But I don't think it would help substantially in practice.

Differential Revision: https://reviews.llvm.org/D54239

llvm-svn: 346562
2018-11-09 22:35:26 +00:00
Florian Hahn 9f878e9bae Revert r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site).
This cause a failure with EXPENSIVE_CHECKS

llvm-svn: 346492
2018-11-09 13:28:58 +00:00
Florian Hahn a1062f4b68 [IPSCCP,PM] Preserve DT in the new pass manager.
After D45330, Dominators are required for IPSCCP and can be preserved.

This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis.

Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima

Reviewed By: chandlerc, kuhar, NutshellySima

Differential Revision: https://reviews.llvm.org/D47259

llvm-svn: 346486
2018-11-09 11:52:27 +00:00
Florian Hahn 52578f95c9 [CallSiteSplitting] Only record conditions up to the IDom(call site).
We can stop recording conditions once we reached the immediate dominator
for the block containing the call site. Conditions in predecessors of the
that node will be the same for all paths to the call site and splitting
is not beneficial.

This patch makes CallSiteSplitting dependent on the DT anlysis. because
the immediate dominators seem to be the easiest way of finding the node
to stop at.

I had to update some exiting tests, because they were checking for
conditions that were true/false on all paths to the call site. Those
should now be handled by instcombine/ipsccp.

Reviewers: davide, junbuml

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D44627

llvm-svn: 346483
2018-11-09 10:23:46 +00:00
Carlos Alberto Enciso fa9cf89734 [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
In SimplifyCFG when given a conditional branch that goes to BB1 and BB2, the hoisted common terminator instruction in the two blocks, caused debug line records associated with subsequent select instructions to become ambiguous. It causes the debugger to display unreachable source lines.

Differential Revision: https://reviews.llvm.org/D53390

llvm-svn: 346481
2018-11-09 09:42:10 +00:00
Max Kazantsev 9883d1e1a7 [NFC] Add utility function for SafetyInfo updates for moveBefore
llvm-svn: 346472
2018-11-09 05:39:04 +00:00
Florian Hahn a684a99441 [LoopInterchange] Support reductions across inner and outer loop.
This patch adds logic to detect reductions across the inner and outer
loop by following the incoming values of PHI nodes in the outer loop. If
the incoming values take part in a reduction in the inner loop or come
from outside the outer loop, we found a reduction spanning across inner
and outer loop.

With this change, ~10% more loops are interchanged in the LLVM
test-suite + SPEC2006.

Fixes https://bugs.llvm.org/show_bug.cgi?id=30472

Reviewers: mcrosier, efriedma, karthikthecool, davide, hfinkel, dmgreen

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D43245

llvm-svn: 346438
2018-11-08 20:44:19 +00:00
Pirama Arumuga Nainar e61652a384 [LTO] Drop non-prevailing definitions only if linkage is not local or appending
Summary:
This fixes PR 37422

In ELF, non-weak symbols can also be non-prevailing.  In this particular
PR, the __llvm_profile_* symbols are non-prevailing but weren't getting
dropped - causing multiply-defined errors with lld.

Also add a test, strong_non_prevailing.ll, to ensure that multiple
copies of a strong symbol are dropped.

To fix the test regressions exposed by this fix,
- do not mark prevailing copies for symbols with 'appending' linkage.
There's no one prevailing copy for such symbols.
- fix the prevailing version in dead-strip-fulllto.ll
- explicitly pass exported symbols to llvm-lto in fumcimport.ll and
funcimport_var.ll

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith,
dang, srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D54125

llvm-svn: 346436
2018-11-08 20:10:07 +00:00
Tom Stellard 28d662164d InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe
Summary:
When the 3rd argument to these intrinsics is zero, lowering them
to shift instructions produces poison values, since we end up with
shift amounts equal to the number of bits in the shifted value.  This
means we can only lower these intrinsics if we can prove that the
3rd argument is not zero.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D53739

llvm-svn: 346422
2018-11-08 17:57:57 +00:00
Vedant Kumar d6699423f1 [CodeExtractor] Mark functions noreturn when applicable
This eliminates the outlining penalty for llvm.trap/unreachable, because
callers no longer have to emit cleanup/ret instructions after calling an
outlined `noreturn` function.

rdar://45523626

llvm-svn: 346421
2018-11-08 17:57:09 +00:00
Max Kazantsev 266c087b9d Return "[IndVars] Smart hard uses detection"
The patch has been reverted because it ended up prohibiting propagation
of a constant to exit value. For such values, we should skip all checks
related to hard uses because propagating a constant is always profitable.

Differential Revision: https://reviews.llvm.org/D53691

llvm-svn: 346397
2018-11-08 11:54:35 +00:00
Gil Rapaport 7b88bab386 [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

This commit fixes a bug in the original patch (committed at r345114, reverted
at r345123).

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 346390
2018-11-08 09:01:19 +00:00
whitequark 73cb978495 [MergeFuncs] Improve ordering of equal functions
Summary:
MergeFunctions currently tries to process strong functions before
weak functions, because weak functions can simply call strong
functions, while a strong/weak function cannot call a weak function
(a backing strong function is needed).

This patch additionally tries to process external functions before
local functions, because we definitely have to keep the external
function, but may be able to drop the local one (and definitely
can if it is also unnamed_addr).

Unfortunately, this exposes an existing bug in the implementation:
The FnTree and FNodesInTree structures can currently go out of
sync in the case where two weak functions are merged, because the
function in FnTree/FNodesInTree is RAUWed. This leaves it behind in
FnTree (this is intended, as it is the strong backing function which
should be used for further merges), while it is replaced in
FNodesInTree (this is not intended).

This is fixed by switching FNodesInTree from using a ValueMap to
using a DenseMap of AssertingVH.

This exposes another minor issue: Currently FNodesInTree is not
cleared after MergeFunctions finishes running. Currently, this is
potentially dangerous (e.g. if something else wants to RAUW a function
with a non-function), but at the very least it is unnecessary/inefficient.
After the change to use AssertingVH it becomes more problematic,
because there are certainly passes that remove functions.

This issue is fixed by clearing FNodesInTree at the end of the pass.

Reviewers: jfb, whitequark

Reviewed By: whitequark

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D53271

llvm-svn: 346386
2018-11-08 03:58:01 +00:00
whitequark 3580ac6125 [MergeFuncs] Call removeUsers() prior to unnamed_addr RAUW
Summary:
For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities.

Fix this by calling removeUsers() prior to RAUW.

Reviewers: jfb, whitequark

Reviewed By: whitequark

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D53262

llvm-svn: 346385
2018-11-08 03:57:55 +00:00
Reid Kleckner b41b372171 [sancov] Put .SCOV* sections into the right comdat groups on COFF
Avoids linker errors about relocations against discarded sections.

This was uncovered during the Chromium clang roll here:
https://chromium-review.googlesource.com/c/chromium/src/+/1321863#message-717516acfcf829176f6a2f50980f7a4bdd66469a

After this change, Chromium's libGLESv2 links successfully for me.

Reviewers: metzman, hans, morehouse

Differential Revision: https://reviews.llvm.org/D54232

llvm-svn: 346381
2018-11-08 00:57:33 +00:00
Rong Xu fb4bcc452c [PGO] Exit early if all count values are zero
If all the edge counts for a function are zero, skip count population and
annotation, as nothing will happen. This can save some compile time.

Differential Revision: https://reviews.llvm.org/D54212

llvm-svn: 346370
2018-11-07 23:51:20 +00:00
Fedor Sergeev f9a02a7006 [SimpleLoopUnswitch] partial unswitch needs to be careful when replacing invariants with constants
When partial unswitch operates on multiple conditions at once, .e.g:
   if (Cond1 || Cond2 || NonInv) ...

it should infer (and replace) values for individual conditions only on one
side of unswitch and not another.

More precisely only these derivations hold true:
   (Cond1 || Cond2) == false  =>  Cond1 == Cond2 == false
   (Cond1 && Cond2) == true   =>  Cond1 == Cond2 == true

By the way we organize unswitching it means only replacing on "continue" blocks
and never on "unswitched" ones. Since trivial unswitch does not have "unswitched"
blocks it does not have this problem.

Fixes PR 39568.

Reviewers: chandlerc, asbirlea
Differential Revision: https://reviews.llvm.org/D54211

llvm-svn: 346350
2018-11-07 20:05:11 +00:00
Mandeep Singh Grang d47d188b6f [LoopSink] Do not sink instructions into non-cold blocks
Summary: This fixes PR39570.

Reviewers: danielcdh, rnk, bkramer

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54181

llvm-svn: 346337
2018-11-07 18:26:24 +00:00
Florian Hahn ac86038b40 [NewGVN] Make sure we do not add a user to itself.
If we simplify an instruction to itself, we do not need to add a user to
itself. For congruence classes with a defining expression, we already
use a similar logic.

Fixes PR38259.

Reviewers: davide, efriedma, mcrosier

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D51168

llvm-svn: 346335
2018-11-07 17:20:07 +00:00
Sanjay Patel 57a08b3343 [InstCombine] propagate FMF for fcmp+fabs folds
By morphing the instruction rather than deleting and creating a new one,
we retain fast-math-flags and potentially other metadata (profile info?).

llvm-svn: 346331
2018-11-07 16:15:01 +00:00
Sanjay Patel bb521e63af [InstCombine] peek through fabs() when checking isnan()
That should be the end of the missing cases for this fold.
See earlier patches in this series:
rL346321
rL346324

llvm-svn: 346327
2018-11-07 15:44:26 +00:00
Sanjay Patel fa5f146872 [InstCombine] add folds for fcmp Pred fabs(X), 0.0
Similar to rL346321, we had folds for the ordered
versions of these compares already, so add the
unordered siblings for completeness.

llvm-svn: 346324
2018-11-07 15:33:03 +00:00
James Y Knight 72f76bf230 Add support for llvm.is.constant intrinsic (PR4898)
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.

Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.

Updated from patch initially by Janusz Sobczak.

Differential Revision: https://reviews.llvm.org/D4276

llvm-svn: 346322
2018-11-07 15:24:12 +00:00
Sanjay Patel 76faf5145d [InstCombine] add fold for fabs(X) u< 0.0
The sibling fold for 'oge' --> 'ord' was already here,
but this half was missing. 

The result of fabs() must be positive or nan, so asking 
if the result is negative or nan is the same as asking 
if the result is nan.

This is another step towards fixing:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 346321
2018-11-07 15:11:32 +00:00
Sanjay Patel de58e93666 fix typos aggressively; NFC
llvm-svn: 346316
2018-11-07 14:35:36 +00:00
Sanjay Patel 7552d0d2e6 [InstCombine] do not shrink switch conditions to illegal types (PR29009)
This patch makes shrinking switch conditions less aggressive which was introduced by:
rL274233

Note that we have 2 new bugs to track potential follow-ups that might have solved PR29009
in different ways:
https://bugs.llvm.org/show_bug.cgi?id=39569
https://bugs.llvm.org/show_bug.cgi?id=39578

Patch by:
@dendibakh (Denis Bakhvalov)

Differential Revision: https://reviews.llvm.org/D54115

llvm-svn: 346315
2018-11-07 14:12:41 +00:00
Calixte Denizet c3bed1e8e6 [GCOV] Flush counters before to avoid counting the execution before fork twice and for exec** functions we must flush before the call
Summary:
This is replacement for patch in https://reviews.llvm.org/D49460.
When we fork, the counters are duplicate as they're and so the values are finally wrong when writing gcda for parent and child.
So just before to fork, we flush the counters and so the parent and the child have new counters set to zero.
For exec** functions, we need to flush before the call to have some data.

Reviewers: vsk, davidxl, marco-c

Reviewed By: marco-c

Subscribers: llvm-commits, sylvestre.ledru, marco-c

Differential Revision: https://reviews.llvm.org/D53593

llvm-svn: 346313
2018-11-07 13:49:17 +00:00
Sanjay Patel d1172a0c20 [IR] add optional parameter for copying IR flags to compare instructions
As shown, this is used to eliminate redundant code in InstCombine,
and there are more cases where we should be using this pattern, but
we're currently unintentionally dropping flags. 

llvm-svn: 346282
2018-11-07 00:00:42 +00:00
Teresa Johnson cb397461e1 [ThinLTO] Split NotEligibleToImport into legality and inlinability flags
Summary:
The NotEligibleToImport flag on the GlobalValueSummary was set if it
isn't legal to import (e.g. because it references unpromotable locals)
and when it can't be inlined (in which case importing is pointless).

I split out the inlinable piece into a separate flag on the
FunctionSummary (doesn't make sense for aliases or global variables),
because in the future we may want to import for reasons other than
inlining.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53345

llvm-svn: 346261
2018-11-06 19:41:35 +00:00
Vedant Kumar 1e209e284f [CodeExtractor] Do not extract calls to eh_typeid_for (PR39545)
The lowering for a call to eh_typeid_for changes when it's moved from
one function to another.

There are several proposals for fixing this issue in llvm.org/PR39545.
Until some solution is in place, do not allow CodeExtractor to extract
calls to eh_typeid_for, as that results in serious miscompilations.

llvm-svn: 346256
2018-11-06 19:06:08 +00:00
Vedant Kumar 09b7aa443d [CodeExtractor] Erase use-without-def debug intrinsics in parent func
When CodeExtractor moves instructions to a new function, debug
intrinsics referring to those instructions within the parent function
become invalid.

This results in the same verifier failure which motivated r344545, about
function-local metadata being used in the wrong function.

llvm-svn: 346255
2018-11-06 19:05:53 +00:00
Sanjay Patel 724014adde [InstCombine] allow vector types for fcmp+fpext fold
llvm-svn: 346245
2018-11-06 17:20:20 +00:00
Sanjay Patel 46bf3922c1 [InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2
llvm-svn: 346242
2018-11-06 16:45:27 +00:00
Sanjay Patel 7c3ee4da42 [InstCombine] rearrange code for fcmp+fpext; NFCI
llvm-svn: 346241
2018-11-06 16:37:35 +00:00
Sanjay Patel 1b85f00201 [InstCombine] propagate fast-math-flags when folding fcmp+fpext
llvm-svn: 346240
2018-11-06 16:23:03 +00:00
Sanjay Patel 2fd5b0ebfb [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
llvm-svn: 346238
2018-11-06 15:58:57 +00:00
Sanjay Patel 05e70fb978 [InstCombine] reduce code; NFC
llvm-svn: 346235
2018-11-06 15:53:58 +00:00
Sanjay Patel 70282a0501 [InstCombine] propagate fast-math-flags when folding fcmp+fneg
This is another part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

This might be enough to fix that particular issue, but as noted
with the FIXME, we're still dropping FMF on other folds around here.

llvm-svn: 346234
2018-11-06 15:49:45 +00:00
Simon Pilgrim c1da5f757e [InstCombine] Ensure nested shifts are in range (OSS-Fuzz #9880)
llvm-svn: 346225
2018-11-06 11:28:22 +00:00
Max Kazantsev 0042c06e8b [LICM] Remove too conservative IsMustExecute variable
LICM relies on variable `MustExecute` which is conservatively set to `false`
in all non-headers. It is used when we decide whether or not we want to hoist
an instruction or a guard.

For the guards, it might be too conservative to use this variable, we can
instead use a more precise logic from LoopSafetyInfo. Currently it is only NFC
because `IsMemoryNotModified` is also conservatively set to `false` for all
non-headers, and we cannot hoist guards from non-header blocks. However once we
give up using `IsMemoryNotModified` and use a smarter check instead, this will
allow us to hoist guards from all mustexecute non-header blocks.

Differential Revision: https://reviews.llvm.org/D50888
Reveiwed By: fedor.sergeev

llvm-svn: 346204
2018-11-06 04:17:40 +00:00
Max Kazantsev 69f6dfa0f8 [LICM] Use ICFLoopSafetyInfo in LICM
This patch makes LICM use `ICFLoopSafetyInfo` that is a smarter version
of LoopSafetyInfo that leverages power of Implicit Control Flow Tracking
to keep track of throwing instructions and give less pessimistic answers
to queries related to throws.

The ICFLoopSafetyInfo itself has been introduced in rL344601. This patch
enables it in LICM only.

Differential Revision: https://reviews.llvm.org/D50377
Reviewed By: apilipenko

llvm-svn: 346201
2018-11-06 02:44:49 +00:00
Max Kazantsev e059f4452b Revert "[IndVars] Smart hard uses detection"
This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402.

It seems that the check that we still should do the transform if we
know the result is constant is missing in this code. So the logic that
has been deleted by this change is still sometimes accidentally useful.
I revert the change to see what can be done about it. The motivating
case is the following:

@Y = global [400 x i16] zeroinitializer, align 1

define i16 @foo() {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %i = phi i16 [ 0, %entry ], [ %inc, %for.body ]

  %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i
  store i16 0, i16* %arrayidx, align 1
  %inc = add nuw nsw i16 %i, 1
  %cmp = icmp ult i16 %inc, 400
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  %inc.lcssa = phi i16 [ %inc, %for.body ]
  ret i16 %inc.lcssa
}

We should be able to figure out that the result is constant, but the patch
breaks it.

Differential Revision: https://reviews.llvm.org/D51584

llvm-svn: 346198
2018-11-06 02:02:05 +00:00
Sanjay Patel 1440107821 [InstSimplify] fold select (fcmp X, Y), X, Y
This is NFCI for InstCombine because it calls InstSimplify, 
so I left the tests for this transform there. As noted in
the code comment, we can allow this fold more often by using
FMF and/or value tracking.

llvm-svn: 346169
2018-11-05 21:51:39 +00:00
Taewook Oh 2b7ae47ccb [MergeICmps] Do not perform the transformation if GEP is used outside of block
Summary:
This patch prevents MergeICmps to performn the transformation if the address operand GEP of the load instruction has a use outside of the load's parent block. Without this patch, compiler crashes with the given test case because the use of `%first.i` is still around when the basic block is erased from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/MergeICmps.cpp#L620. I think checking `isUsedOutsideOfBlock` with `GEP` is the original intention of the code, as the checking for `LoadI` is already performed in the same function.

This patch is incomplete though, as this makes the pass overly conservative and fails the test `tuple-four-int8.ll`. I believe what needs to be done is checking if GEP has a use outside of block that is not the part of "Comparisons" chain. Submit the patch as of now to prevent compiler crash.

Reviewers: courbet, trentxintong

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54089

llvm-svn: 346151
2018-11-05 18:16:32 +00:00
Sanjay Patel c26fd1e772 [InstCombine] canonicalize -0.0 to +0.0 in fcmp
As stated in IEEE-754 and discussed in:
https://bugs.llvm.org/show_bug.cgi?id=38086
...the sign of zero does not affect any FP compare predicate.

Known regressions were fixed with:
rL346097 (D54001)
rL346143

The transform will help reduce pattern-matching complexity to solve:
https://bugs.llvm.org/show_bug.cgi?id=39475
...as well as improve CSE and codegen (a zero constant is almost always
easier to produce than 0x80..00).

llvm-svn: 346147
2018-11-05 17:26:42 +00:00
Sanjay Patel 87aa10062c [InstCombine] loosen FP 0.0 constraint for fcmp+select substitution
It looks like we correctly removed edge cases with 0.0 from D50714,
but we were a bit conservative because getBinOpIdentity() doesn't
distinguish between +0.0 and -0.0 and 'nsz' is effectively always
true for fcmp (see discussion in:
https://bugs.llvm.org/show_bug.cgi?id=38086

Without this change, we would get regressions by canonicalizing
to +0.0 in all fcmp, and that's a step towards solving:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 346143
2018-11-05 16:50:44 +00:00
Vedant Kumar d2a895a972 [HotColdSplitting] Use TTI to inform outlining threshold
Using TargetTransformInfo allows the splitting pass to factor in the
code size cost of instructions as it decides whether or not outlining is
profitable.

This did not regress the overall amount of outlining seen on the handful
of internal frameworks I tested.

Thanks to Jun Bum Lim for suggesting this!

Differential Revision: https://reviews.llvm.org/D53835

llvm-svn: 346108
2018-11-04 23:11:57 +00:00
Jordan Rupprecht 80e7e86c29 [DebugInfo][InstMerge] Fix -debugify for phi node created by -mldst-motion
Summary:
-mldst-motion creates a new phi node without any debug info. Use the merged debug location from the incoming stores to fix this.

Fixes PR38177. The test case here is (somewhat) simplified from:

```
struct S {
  int foo;
  void fn(int bar);
};
void S::fn(int bar) {
  if (bar)
    foo = 1;
  else
    foo = 0;
}
```

Reviewers: dblaikie, gbedwell, aprantl, vsk

Reviewed By: vsk

Subscribers: vsk, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D54019

llvm-svn: 346027
2018-11-02 18:25:41 +00:00
Ayal Zaks 45a3ca7be7 [LV] Avoid vectorizing loops under opt for size that involve SCEV checks
Fix PR39417, PR39497

The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PR's showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.

Differential Revision: https://reviews.llvm.org/D53612

llvm-svn: 345959
2018-11-02 09:16:12 +00:00
Max Kazantsev 872bb74b0a [NFC][LICM] Factor out instruction erasing logic
This patch factors out a function that makes all required updates
whenever an instruction gets erased.

Differential Revision: https://reviews.llvm.org/D54011
Reviewed By: apilipenko

llvm-svn: 345914
2018-11-02 00:21:45 +00:00
Florian Hahn de4f774783 [LoopInterchange] Fix unused variables in release build
llvm-svn: 345881
2018-11-01 19:51:13 +00:00
Florian Hahn c8bd6ea35e [LoopInterchange] Remove support for inner-only reductions.
Inner-loop only reductions require additional checks to make sure they
form a load-phi-store cycle across inner and outer loop. Otherwise the
reduction value is not properly preserved. This patch disables
interchanging such loops for now, as it causes miscompiles in some
cases and it seems to apply only for a tiny amount of loops. Across the
test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are
interchange with inner loop reduction support disabled. With
-loop-interchange-threshold=-1000, 3256 instead of 3267.

See the discussion and history of D53027 for an outline of how such legality
checks could look like.

Reviewers: efriedma, mcrosier, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D53027

llvm-svn: 345877
2018-11-01 19:25:00 +00:00
Reid Kleckner 3f756fbabe Remove unnecessary fallthrough annotation after unreachable
Clang's -Wimplicit-fallthrough implementation warns on this. I built
clang with GCC 7.3 in +asserts and -asserts mode, and GCC doesn't warn
on this in either configuration. I think it is unnecessary. I separated
it from the large mechanical patch (https://reviews.llvm.org/D53950) in
case I am wrong and it has to be reverted.

llvm-svn: 345876
2018-11-01 19:11:05 +00:00
Max Kazantsev 46955b58ee [NFC] Reorganize code to prepare it for more transforms
llvm-svn: 345820
2018-11-01 09:42:50 +00:00
Max Kazantsev 3d347bf545 [IndVars] Smart hard uses detection
When rewriting loop exit values, IndVars considers this transform not profitable if
the loop instruction has a loop user which it believes cannot be optimized away.
In current implementation only calls that immediately use the instruction are considered
as such.

This patch extends the definition of "hard" users to any side-effecting instructions
(which usually cannot be optimized away from the loop) and also allows handling
of not just immediate users, but use chains.

Differentlai Revision: https://reviews.llvm.org/D51584
Reviewed By: etherzhhb

llvm-svn: 345814
2018-11-01 06:47:01 +00:00
Volkan Keles 3ca146d083 [InstCombine] Combine nested min/max intrinsics with constants
Reviewers: arsenm, spatel

Reviewed By: spatel

Subscribers: lebedev.ri, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53774

llvm-svn: 345751
2018-10-31 17:50:52 +00:00
Sanjay Patel 1c254c6716 [InstCombine] refactor fabs+fcmp fold; NFC
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.

llvm-svn: 345734
2018-10-31 16:34:43 +00:00
Sanjay Patel b9fe3fbb57 [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC
The 'OLT' case was updated at rL266175, so I assume it was just an
oversight that 'UGE' was not included because that patch handled
both predicates in InstSimplify.

llvm-svn: 345727
2018-10-31 15:31:45 +00:00
Sanjay Patel 85cba3b6fb [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the 
transform.

This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

Differential Revision: https://reviews.llvm.org/D53874

llvm-svn: 345725
2018-10-31 14:57:23 +00:00
Fedor Sergeev 412ed34744 [LoopUnroll] allow customization for new-pass-manager version of LoopUnroll
Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.

In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.

Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.

Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440

llvm-svn: 345723
2018-10-31 14:33:14 +00:00
Max Kazantsev 541f824d32 [IndVars] Strengthen restricton in rewriteLoopExitValues
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.

There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.

It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.

Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb

llvm-svn: 345708
2018-10-31 10:30:50 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Alexander Potapenko c1c4c9a494 [MSan] another take at instrumenting inline assembly - now with calls
Turns out it's not always possible to figure out whether an asm()
statement argument points to a valid memory region.
One example would be per-CPU objects in the Linux kernel, for which the
addresses are calculated using the FS register and a small offset in the
.data..percpu section.
To avoid pulling all sorts of checks into the instrumentation, we replace
actual checking/unpoisoning code with calls to
msan_instrument_asm_load(ptr, size) and
msan_instrument_asm_store(ptr, size) functions in the runtime.

This patch doesn't implement the runtime hooks in compiler-rt, as there's
been no demand in assembly instrumentation for userspace apps so far.

llvm-svn: 345702
2018-10-31 09:32:47 +00:00
Matthias Braun 9fd397b423 ADT/STLExtras: Introduce llvm::empty; NFC
This is modeled after C++17 std::empty().

Differential Revision: https://reviews.llvm.org/D53909

llvm-svn: 345679
2018-10-31 00:23:23 +00:00
Sanjay Patel 4c39dfc91e [InstCombine] use 'match' to reduce code; NFC
llvm-svn: 345647
2018-10-30 20:52:25 +00:00
Quentin Colombet 900678227c [InstCombine] Teach the move free before null test opti how to deal with noop casts
InstCombine features an optimization that essentially replaces:
if (a)
  free(a)
into:
free(a)

Right now, this optimization is gated by the minsize attribute and therefore
we only perform it if we can prove that we are going to be able to eliminate
the branch and the destination block.

However when casts are involved the optimization would fail to apply, because
the optimization was not smart enough to realize that it is possible to also
move the casts away from the destination block and that is harmless to the
performance since they are just noops.
E.g.,
foo(int *a)
if (a)
  free((char*)a)

Wouldn't be optimized by instcombine, because
- We would refuse to hoist the `bitcast i32* %a to i8` in the source block
- We would fail to see that `bitcast i32* %a to i8` and %a are the same value.

This patch fixes both these problems:
- It teaches the pattern matching of the comparison how to look
  through casts.
- It checks that whether the additional instruction in the destination block
  can be hoisted and are harmless performance-wise.
- It hoists all the code of the destination block in the source block.

Differential Revision: D53356

llvm-svn: 345644
2018-10-30 20:51:04 +00:00
Calixte Denizet 38d50545fe [GCOV] Function counters are wrong when on one line
Summary:
After commit https://reviews.llvm.org/rL344228, the function definitions have a counter but when on one line the counter is wrong (e.g. void foo() { })
I added a test in: https://reviews.llvm.org/D53601

Reviewers: marco-c

Reviewed By: marco-c

Subscribers: llvm-commits, sylvestre.ledru

Differential Revision: https://reviews.llvm.org/D53600

llvm-svn: 345624
2018-10-30 18:41:31 +00:00
Simon Pilgrim 44a9a71d2a [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.

Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!

I've done my best to fix a number of vectorizer uses:

SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.

LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.

Differential Revision: https://reviews.llvm.org/D53573

llvm-svn: 345617
2018-10-30 18:10:02 +00:00
Sanjay Patel 68a61cb07c [InstCombine] use getFltSemantics() instead of duplicating it; NFC
llvm-svn: 345613
2018-10-30 16:21:56 +00:00
Sanjay Patel b12e410082 [InstCombine] try to turn shuffle into insertelement
shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'

The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of 
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start 
with or instcombine has converted insert/extract to shuffles.

Independent of that, an insertelement is always a simpler op for IR 
analysis vs. a shuffle, so we should transform to insert when possible.

I don't think there's any codegen concern here - if a target can't insert 
a scalar directly to some fixed element in a vector (x86?), then this 
should get expanded to the insert+shuffle that we started with.

Differential Revision: https://reviews.llvm.org/D53507

llvm-svn: 345607
2018-10-30 15:26:39 +00:00
Jonas Paulsson 1f067c94dc [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

llvm-svn: 345603
2018-10-30 14:34:15 +00:00
Nicola Zaghen f96383c99e [SROA] Use offset sizes from the DataLayout instead of the pointer siezes.
This fixes an assertion when constant folding a GEP when the part of the offset
was in i32 (IndexSize, as per DataLayout) and part in the i64 (PointerSize) in
the newly created test case.

Differential Revision: https://reviews.llvm.org/D52609

llvm-svn: 345585
2018-10-30 11:15:04 +00:00
Vedant Kumar dd4be53b20 [HotColdSplitting] Allow outlining single-block cold regions
It can be profitable to outline single-block cold regions because they
may be large.

Allow outlining single-block regions if they have over some threshold of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.

In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.

Differential Revision: https://reviews.llvm.org/D53824

llvm-svn: 345524
2018-10-29 19:15:39 +00:00
Florian Hahn fc7654a67b [Local] Keep K's range if K does not move when combining metadata.
As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.

In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.

I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.

Reviewers: efriedma, nlopes, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D51629

llvm-svn: 345456
2018-10-27 16:53:45 +00:00
Simon Pilgrim a132016d4d Fix -Wdocumentation warning. NFCI.
llvm-svn: 345454
2018-10-27 15:14:42 +00:00
Leonard Chan eebecb3214 Revert "[PassManager/Sanitizer] Enable usage of ported AddressSanitizer passes with -fsanitize=address"
This reverts commit 8d6af840396f2da2e4ed6aab669214ae25443204 and commit
b78d19c287b6e4a9abc9fb0545de9a3106d38d3d which causes slower build times
by initializing the AddressSanitizer on every function run.

The corresponding revisions are https://reviews.llvm.org/D52814 and
https://reviews.llvm.org/D52739.

llvm-svn: 345433
2018-10-26 22:51:51 +00:00
Christy Lee 3cc0e935c4 Pointer types were treated as zero-size by MergeICmps
Summary:
The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size.
Found with "self-build" of clang/LLVM on Windows.

Reviewers: christylee, trentxintong, courbet

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53536

llvm-svn: 345413
2018-10-26 18:02:06 +00:00
Max Kazantsev 619a83463f [SimpleLoopUnswitch] Unswitch by experimental.guard intrinsics
This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial
simple loop unswitching. These intrinsics represent implicit control flow which
has pretty much the same semantics as usual conditional branches. The
algorithm of dealing with them is following:

- Consider guards as unswitching candidates;
- If a guard is considered the best candidate, turn it into a branch;
- Apply normal unswitching algorithm on this branch.

The patch has no compile time effect on code that does not contain any guards.

Differential Revision: https://reviews.llvm.org/D53744
Reviewed By: chandlerc

llvm-svn: 345387
2018-10-26 14:20:11 +00:00
Max Kazantsev bde31000b1 [SimpleLoopUnswitch] Make all checks before actual non-trivial unswitch
We should be able to make all relevant checks before we actually start the non-trivial
unswitching, so that we could guarantee that once we have started the transform,
it will always succeed.

Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D53747

llvm-svn: 345375
2018-10-26 09:52:58 +00:00
Cameron McInally 384a74b0e6 [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
separate FNeg from the FSub IR instruction.

Differential Revision: https://reviews.llvm.org/D53650

llvm-svn: 345295
2018-10-25 18:09:33 +00:00
Carlos Alberto Enciso 9a24e1a7cd [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.

Differential Revision: https://reviews.llvm.org/D53287

llvm-svn: 345250
2018-10-25 09:58:59 +00:00
Gabor Buella 1f6ca0ba15 Add -instcombine-code-sinking option
Reviewers: craig.topper, andrew.w.kaylor, efriedma

Reviewed By: craig.topper, andrew.w.kaylor, efriedma

Differential Revision: https://reviews.llvm.org/D52709

llvm-svn: 345248
2018-10-25 08:32:29 +00:00
Alina Sbirlea ad4d018202 Update MemorySSA in LoopRotate.
Summary:
Teach LoopRotate to preserve MemorySSA.
Enable tests for correctness, dependency disabled by default.

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D51718

llvm-svn: 345216
2018-10-24 22:46:45 +00:00
Vedant Kumar c299006879 [HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages:

  1) Identify cold blocks, then
  2) Use forward/backward propagation to mark hot blocks, then
  3) Grow a SESE region of blocks *outside* of the set of hot blocks and
  start outlining.

While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:

  - An inconsistency can arise in the internal state of the hotness
  propagation stage, as a block may end up in both the ColdBlocks set
  and the HotBlocks set. Further inconsistencies can arise as these sets
  do not match what's in ProfileSummaryInfo.

  - It isn't necessary to limit outlining to single-exit regions.

This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.

Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.

Results:
  - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
  47KB pre-patch, or a ~3x improvement. Did not see a performance impact
  across two runs.
  - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
  of TEXT were outlined. Ditto re: performance impact.
  - Outlining results improve marginally in the internal frameworks I
  tested.

Follow-ups:
  - Outline more than once per function, outline large single basic
  blocks, & try to remove unconditional branches in outlined functions.

Differential Revision: https://reviews.llvm.org/D53627

llvm-svn: 345209
2018-10-24 22:15:41 +00:00
Teresa Johnson c8dba682bb [hot-cold-split] Name split functions with ".cold" suffix
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.

Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.

Reviewers: sebpop, hiraditya

Subscribers: llvm-commits, erik.pilkington

Differential Revision: https://reviews.llvm.org/D53534

llvm-svn: 345178
2018-10-24 18:53:47 +00:00
Sanjay Patel 3b206305fd [InstCombine] try harder to form select from logic ops (2nd try)
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits(). 

The tests that caused bot failures for the previous commit are 
over-reaching front-end tests that run the entire -O optimizer 
pipeline: 
    Clang :: CodeGen/builtins-systemz-zvector.c
    Clang :: CodeGen/builtins-systemz-zvector2.c

I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.

Original commit message:

This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).

The backend has hooks/logic to convert back to logic ops
if that's better for the target.

llvm-svn: 345149
2018-10-24 15:17:56 +00:00
Cameron McInally 678f43f666 [FPEnv] Convert more BinaryOperator::isFNeg(...) to m_FNeg(...)
This work is to avoid regressions when we seperate FNeg from the FSub IR instruction. 

Differential Revision: https://reviews.llvm.org/D53205

llvm-svn: 345146
2018-10-24 14:45:18 +00:00
Gil Rapaport c523036fd2 Revert r345114
Investigating fails.

llvm-svn: 345123
2018-10-24 08:41:22 +00:00
Dorit Nuzman 5114390e48 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559

llvm-svn: 345115
2018-10-24 07:11:38 +00:00
Gil Rapaport 5012e7f6ac [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 345114
2018-10-24 07:08:38 +00:00
Wei Mi 80a0c97e07 [PM] keeping history when original SCC split and then merge into itself
in the same round of SCC update.

In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.

We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.

What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.

Differential Revision: https://reviews.llvm.org/D52915

llvm-svn: 345103
2018-10-23 23:29:45 +00:00
Vedant Kumar 503154615d [HotColdSplitting] Attach MinSize to outlined code
Outlined code is cold by assumption, so it makes sense to optimize it
for minimal code size rather than performance.

After r344869 moved the splitting pass to the end of the IR pipeline,
this does not result in much of a code size reduction. This is probably
because a comparatively small number backend transforms make use of the
MinSize hint.

Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of
919 bytes of TEXT reduction. I didn't measure a significant performance
impact.

Differential Revision: https://reviews.llvm.org/D53518

llvm-svn: 345072
2018-10-23 19:41:12 +00:00
Sanjay Patel 95790c546f [InstCombine] use 'match' to simplify code
There's probably some vector-with-undef-element pattern
that shows an improvement, so this is probably not quite
'NFC'.

This is the last step towards removing the fake binop 
queries for not/neg. Ie, there are no more uses of those
functions in trunk. Fneg should follow.

llvm-svn: 345050
2018-10-23 16:54:28 +00:00
Jordan Rupprecht 2fed6ac186 [DebugInfo][GlobalOpt] Fix -debugify for globalopt shrinking globals to booleans.
Summary:
TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass.

Fixes PR37959. The test case here is the simplified .ll for:

```
static int foo;
int bar() {
  foo = 5;
  return foo;
}
```

Reviewers: dblaikie, gbedwell, aprantl

Reviewed By: dblaikie

Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D53531

llvm-svn: 345046
2018-10-23 16:35:51 +00:00
Sanjay Patel 5b6b090cf2 [Reassociate] replace fake binop queries with 'match' API
We need to update this code before introducing an 'fneg' instruction in IR,
so we might as well kill off the integer neg/not queries too.

This is no-functional-change-intended for scalar code and most vector code. 
For vectors, we can see that the 'match' API allows for undef elements in 
constants, so we optimize those cases better.

Ideally, there would be a test for each code diff, but I don't see evidence
of that for the existing code, so I didn't try very hard to come up with new 
vector tests for each code change.

Differential Revision: https://reviews.llvm.org/D53533

llvm-svn: 345042
2018-10-23 15:55:06 +00:00
Simon Pilgrim 532a0f122e [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions
Expand arithmetic reduction to include mul/and/or/xor instructions.

This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.

This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.

Differential Revision: https://reviews.llvm.org/D53473

llvm-svn: 345037
2018-10-23 15:13:09 +00:00
Sanjay Patel 747feb28e4 [InstCombine] use 'match' to handle vectors and simplify code
This is another step towards completely removing the fake 
binop queries for not/neg/fneg.

llvm-svn: 345036
2018-10-23 15:05:12 +00:00
Sanjay Patel ad76c682c7 [InstCombine] swap select profile metadata when swapping select ops
llvm-svn: 345034
2018-10-23 14:43:31 +00:00
Sanjay Patel 5141435d23 [SLSR] use 'match' to simplify code; NFC
This pass could probably be modified slightly to allow
vector splat transforms for practically no cost, but
it only works on scalars for now. So the use of the
newer 'match' API should make no functional difference. 

llvm-svn: 345030
2018-10-23 14:07:39 +00:00
Dorit Nuzman da5dc13355 Leftover bits from https://reviews.llvm.org/D53420 that were accidentally left
out of revision 344883

llvm-svn: 345021
2018-10-23 11:51:55 +00:00
Kostya Serebryany af95597c3c [hwasan] add stack frame descriptions.
Summary:
At compile-time, create an array of {PC,HumanReadableStackFrameDescription}
for every function that has an instrumented frame, and pass this array
to the run-time at the module-init time.
Similar to how we handle pc-table in SanitizerCoverage.
The run-time is dummy, will add the actual logic in later commits.

Reviewers: morehouse, eugenis

Reviewed By: eugenis

Subscribers: srhines, llvm-commits, kubamracek

Differential Revision: https://reviews.llvm.org/D53227

llvm-svn: 344985
2018-10-23 00:50:40 +00:00
Sanjay Patel dd1c3df72d [Reassociate] add 'using namespace' to reduce bloat; NFC
llvm-svn: 344959
2018-10-22 21:37:02 +00:00
Teresa Johnson f431a2f261 [hot-cold-split] Add opt remark on success
Summary: Emit optimization remark on successful hot cold split.

Reviewers: sebpop, hiraditya

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53512

llvm-svn: 344938
2018-10-22 19:06:42 +00:00
Benjamin Kramer 3e778165d6 [CGProfile] Turn constant-size SmallVector into array
No functionality change.

llvm-svn: 344893
2018-10-22 10:51:34 +00:00
Dorit Nuzman 3ec99fe21b [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when
optimizing for size

LV is careful to respect -Os and not to create a scalar epilog in all cases
(runtime tests, trip-counts that require a remainder loop) except for peeling
due to gaps in interleave-groups. This patch fixes that; -Os will now have us
invalidate such interleave-groups and vectorize without an epilog.

The patch also removes a related FIXME comment that is now obsolete, and was
also inaccurate:
"FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller
MaxVF that does not require a scalar epilog."
(requiresScalarEpilog() has nothing to do with VF).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53420

llvm-svn: 344883
2018-10-22 06:17:09 +00:00
Aditya Kumar d9e2e383a9 Schedule Hot Cold Splitting pass after most optimization passes
Summary:
In the new+old pass manager, hot cold splitting was schedule too early.
Thanks to Vedant for pointing this out.

Reviewers: sebpop, vsk

Reviewed By: sebpop, vsk

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D53437

llvm-svn: 344869
2018-10-21 18:11:56 +00:00
Sanjay Patel 0522b0da31 [InstCombine] use 'match' to simplify code; NFC
llvm-svn: 344855
2018-10-20 17:15:57 +00:00
Sanjay Patel ec572ade20 [InstCombine] make code more flexible with lambda; NFC
I couldn't tell from svn history when these checks were added,
but it pre-dates the split of instcombine into its own directory
at rL92459.

The motivation for changing the check is partly shown by the
code in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724

There are also existing regression tests for SLPVectorizer with
sequences of extract+insert that are likely assumed to become
shuffles by the vectorizer cost models.

llvm-svn: 344854
2018-10-20 16:58:27 +00:00
Sanjay Patel 729c4362cf [InstCombine] add explanatory comment for strange vector logic; NFC
llvm-svn: 344852
2018-10-20 16:25:55 +00:00
Evandro Menezes 164ea101ab [NFC][InstCombine] Undo stray change
Undo stray change introduced by r344725.

llvm-svn: 344814
2018-10-19 20:57:45 +00:00
Thomas Lively c339250e12 [InstCombine] InstCombine and InstSimplify for minimum and maximum
Summary: Depends on D52765

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52766

llvm-svn: 344799
2018-10-19 19:01:26 +00:00
Sanjay Patel 70daf85bc2 [InstCombine] use m_Neg() in dyn_castNegVal() to match vectors with undef elts
llvm-svn: 344793
2018-10-19 17:54:53 +00:00
Fangrui Song 2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Ayal Zaks b0b5312e67 [LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size
When optimizing for size, a loop is vectorized only if the resulting vector loop
completely replaces the original scalar loop. This holds if no runtime guards
are needed, if the original trip-count TC does not overflow, and if TC is a
known constant that is a multiple of the VF. The last two TC-related conditions
can be overcome by
1. rounding the trip-count of the vector loop up from TC to a multiple of VF;
2. masking the vector body under a newly introduced "if (i <= TC-1)" condition.

The patch allows loops with arbitrary trip counts to be vectorized under -Os,
subject to the existing cost model considerations. It also applies to loops with
small trip counts (under -O2) which are currently handled as if under -Os.

The patch does not handle loops with reductions, live-outs, or w/o a primary
induction variable, and disallows interleave groups.

(Third, final and main part of -)
Differential Revision: https://reviews.llvm.org/D50480

llvm-svn: 344743
2018-10-18 15:03:15 +00:00
Mikael Holmen e3605d0f70 Add a emitUnaryFloatFnCall version that fetches the function name from TLI
Summary:
In several places in the code we use the following pattern:

  if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) {
    [...]
    Value *Res = emitUnaryFloatFnCall(X, TLI.getName(LibFunc_tan), B, Attrs);
    [...]
  }

In short, we check if there is a lib-function for a certain type, and then
we _always_ fetch the name of the "double" version of the lib function and
construct a call to the appropriate function, that we just checked exists,
using that "double" name as a basis.

This is of course a problem in cases where the target doesn't support the
"double" version, but e.g. only the "float" version.

In that case TLI.getName(LibFunc_tan) returns "", and
emitUnaryFloatFnCall happily appends an "f" to "", and we erroneously end
up with a call to a function called "f".

To solve this, the above pattern is changed to

  if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) {
    [...]
    Value *Res = emitUnaryFloatFnCall(X, &TLI, LibFunc_tan, LibFunc_tanf,
                                      LibFunc_tanl, B, Attrs);
    [...]
  }

I.e instead of first fetching the name of the "double" version and then
letting emitUnaryFloatFnCall() add the final "f" or "l", we let
emitUnaryFloatFnCall() fetch the right name from TLI.

Reviewers: eli.friedman, efriedma

Reviewed By: efriedma

Subscribers: efriedma, bjope, llvm-commits

Differential Revision: https://reviews.llvm.org/D53370

llvm-svn: 344725
2018-10-18 06:27:53 +00:00
Chandler Carruth 60b2e054dc [TI removal] Switch simple loop unswitch to `Instruction`.
llvm-svn: 344719
2018-10-18 00:40:26 +00:00
Chandler Carruth c6cad4251e [TI removal] Switch NewGVN to directly use `Instruction`.
llvm-svn: 344718
2018-10-18 00:39:46 +00:00
Chandler Carruth c8eaea71c9 [TI removal] Use `Instruction` instead of `TerminatorInst` for
a variable's type.

llvm-svn: 344717
2018-10-18 00:39:18 +00:00
Chandler Carruth 8b7a8123dd [TI removal] Update CodeExtractor to use Instruction directly.
llvm-svn: 344716
2018-10-18 00:38:54 +00:00
Chandler Carruth c1e3ee29a4 [TI removal] Switch ObjCARC code to directly use the nice range-based
successors API or directly build the iterators out of the terminator
instruction and avoid requiring a TerminatorInst variable.

llvm-svn: 344715
2018-10-18 00:38:34 +00:00
Chandler Carruth 93cf2ea27a [TI removal] Switch MergeFunctions to directly use Instruction API.
llvm-svn: 344714
2018-10-18 00:37:37 +00:00
Nicolai Haehnle 0823050b9f StructurizeCFG: Simplify inserted PHI nodes
Summary:
This improves subsequent divergence analysis in some cases.

Change-Id: I5e95e7ec7fd3fa80d414d1a53a02fea23e3d67d3

Reviewers: arsenm, rampitec

Subscribers: jvesely, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53316

llvm-svn: 344697
2018-10-17 15:37:41 +00:00
Fedor Sergeev c297e84b97 [LoopPredication] add some simple stats
Just adding some useful statistics to LoopPredication pass
which was lacking any of these.

llvm-svn: 344681
2018-10-17 09:02:54 +00:00
Leonard Chan 423957ad3a [Sanitizer][PassManager] Fix for failing ASan tests on arm-linux-gnueabihf
Forgot to initialize the legacy pass in it's constructor.

Differential Revision: https://reviews.llvm.org/D53350

llvm-svn: 344659
2018-10-17 00:16:07 +00:00
Teresa Johnson d2c234a4cc [ThinLTO] Add importing stats to thin link
Summary:
Previously we could only get the number of imported functions and
variables from the backend. This adds stats to the thin link where the
importing is decided.

Reviewers: wmi

Subscribers: inglorion, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D53337

llvm-svn: 344658
2018-10-16 23:49:50 +00:00
Jonathan Metzman 5eb8cba280 [SanitizerCoverage] Don't duplicate code to get section pointers
Summary:
Merge code used to get section start and section end pointers
for SanitizerCoverage constructors. This includes code that handles
getting the start pointers when targeting MSVC.

Reviewers: kcc, morehouse

Reviewed By: morehouse

Subscribers: kcc, hiraditya

Differential Revision: https://reviews.llvm.org/D53211

llvm-svn: 344657
2018-10-16 23:43:57 +00:00
David Bolvansky 7c7760da7e [InstCombine] Cleanup libfunc attribute inferring
Reviewers: efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53338

llvm-svn: 344645
2018-10-16 21:18:31 +00:00
Anna Thomas 6f732bfb79 [LV] Teach vectorizer about variant value store into uniform address
Summary:
Teach vectorizer about vectorizing variant value stores to uniform
address. Similar to rL343028, we do not allow vectorization if we have
multiple stores to the same uniform address.

Cost model already has the change for considering the extract
instruction cost for a variant value store. See added test cases for how
vectorization is done.
The patch also contains changes to the ORE messages.

Reviewers: Ayal, mkuper, anemet, hsaito

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D52656

llvm-svn: 344613
2018-10-16 15:46:26 +00:00
Sanjay Patel bb3dd34e62 revert rL344609: [InstCombine] try harder to form select from logic ops
I noticed a missing check and added it at rL344610, but there actually
are codegen tests that will fail without that, so I'll edit those and
submit a fixed patch with more tests.

llvm-svn: 344612
2018-10-16 15:26:08 +00:00
Sanjay Patel f6a7c8b1fc [InstCombine] make sure type is integer before calling ComputeNumSignBits
llvm-svn: 344610
2018-10-16 14:44:50 +00:00
Sanjay Patel 0c48c977b8 [InstCombine] try harder to form select from logic ops
This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).

The backend has hooks/logic to convert back to logic ops
if that's better for the target.

llvm-svn: 344609
2018-10-16 14:35:21 +00:00
Max Kazantsev 9c90ec2fae [NFC] Make LoopSafetyInfo abstract to allow alternative implementations
llvm-svn: 344592
2018-10-16 08:31:05 +00:00
Max Kazantsev 8d56be7070 [NFC] Encapsulate work with BlockColors in LoopSafetyInfo
llvm-svn: 344590
2018-10-16 08:07:14 +00:00
David Stenberg c9163855dd [DebugInfo][LCSSA] Rewrite pre-existing debug values outside loop
Summary:
Extend LCSSA so that debug values outside loops are rewritten to
use the PHI nodes that the pass creates.

This fixes PR39019. In that case, we ran LCSSA on a loop that
was later on vectorized, which left us with something like this:

  for.cond.cleanup:
    %add.lcssa = phi i32 [ %add, %for.body ], [ %34, %middle.block ]
    call void @llvm.dbg.value(metadata i32 %add,
    ret i32 %add.lcssa

  for.body:
    %add =
    [...]
    br i1 %exitcond, label %for.cond.cleanup, label %for.body

which later resulted in the debug.value becoming undef when
removing the scalar loop (and the location would have probably
been wrong for the vectorized case otherwise).

As we now may need to query the AvailableVals cache more than
once for a basic block, FindAvailableVals() in SSAUpdaterImpl is
changed so that it updates the cache for blocks that we do not
create a PHI node for, regardless of the block's number of
predecessors. The debug value in the attached IR reproducer
would not be properly rewritten without this.

Debug values residing in blocks where we have not inserted any
PHI nodes are currently left as-is by this patch. I'm not sure
what should be done with those uses.

Reviewers: mattd, aprantl, vsk, probinson

Reviewed By: mattd, aprantl

Subscribers: jmorse, gbedwell, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D53130

llvm-svn: 344589
2018-10-16 08:06:48 +00:00
Max Kazantsev c8466f937c [NFC] Turn isGuaranteedToExecute into a method
llvm-svn: 344587
2018-10-16 06:34:53 +00:00
Lang Hames 8f9a2446e0 Change a TerminatorInst* to an Instruction* in HotColdSplitting.cpp.
r344558 added an assignment to a TerminatorInst* from
BasicBlock::getTerminatorInst(), but BasicBlock::getTerminatorInst() returns an
Instruction* rather than a TerminatorInst* since r344504 so this fails to
compile.

Changing the variable to an Instruction* should get the bots building again.

llvm-svn: 344566
2018-10-15 22:27:03 +00:00
Sebastian Pop 542e522b87 [hot-cold-split] fix static analysis of cold regions
Make the code of blockEndsInUnreachable to match the function
blockEndsInUnreachable in CodeGen/BranchFolding.cpp. I also have
added a note to make sure the code of this function will not be
modified unless the back-end version is also modified.

An early return before outlining has been added to avoid
outlining the full function body when the first block in the
function is marked cold.

The static analysis of cold code has been amended to avoid
marking the whole function as cold by back-propagation
because the back-propagation would mark blocks with return
statements as cold.

The patch adds debug statements to help discover these problems.

Differential Revision: https://reviews.llvm.org/D52904

llvm-svn: 344558
2018-10-15 21:43:11 +00:00
Vedant Kumar 15718a6190 [CodeExtractor] Erase debug intrinsics in outlined thunks (fix PR22900)
Variable updates within the outlined function are invisible to
debuggers. This could be improved by defining a DISubprogram for the
new function. For the moment, simply erase the debug intrinsics instead.

This fixes verifier failures about function-local metadata being used in
the wrong function, seen while testing the hot/cold splitting pass.

rdar://45142482

Differential Revision: https://reviews.llvm.org/D53267

llvm-svn: 344545
2018-10-15 19:22:20 +00:00
Chandler Carruth e303c87e19 [TI removal] Make `getTerminator()` return a generic `Instruction`.
This removes the primary remaining API producing `TerminatorInst` which
will reduce the rate at which code is introduced trying to use it and
generally make it much easier to remove the remaining APIs across the
codebase.

Also clean up some of the stragglers that the previous mechanical update
of variables missed.

Users of LLVM and out-of-tree code generally will need to update any
explicit variable types to handle this. Replacing `TerminatorInst` with
`Instruction` (or `auto`) almost always works. Most of these edits were
made in prior commits using the perl one-liner:
```
perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g'
```

This also my break some rare use cases where people overload for both
`Instruction` and `TerminatorInst`, but these should be easily fixed by
removing the `TerminatorInst` overload.

llvm-svn: 344504
2018-10-15 10:42:50 +00:00
Chandler Carruth 52eaaf3ff8 [TI removal] Rework `InstVisitor` to support visiting instructions that
are terminators without relying on the specific `TerminatorInst` type.

This required cleaning up two users of `InstVisitor`s usage of
`TerminatorInst` as well.

llvm-svn: 344503
2018-10-15 10:10:54 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Chandler Carruth ae98759ec5 [TI removal] Remove `TerminatorInst` from GVN.h and GVN.cpp.
This is the last interesting usage in all of LLVM's headers. The
remaining usages in headers are the core typesystem bits (Core.h,
instruction types, and InstVisitor) and as the return of
`BasicBlock::getTerminator`. The latter is the big remaining API point
that I'll remove after mass updates to user code.

llvm-svn: 344501
2018-10-15 10:00:15 +00:00
Chandler Carruth 4a2d58e16a [TI removal] Remove `TerminatorInst` from BasicBlockUtils.h
This requires updating a number of .cpp files to adapt to the new API.
I've just systematically updated all uses of `TerminatorInst` within
these files te `Instruction` so thta I won't have to touch them again in
the future.

llvm-svn: 344498
2018-10-15 09:34:05 +00:00
Chandler Carruth b99a24689b [TI removal] Remove TerminatorInst as an input parameter from all public
LLVM APIs. There weren't very many.

We still have the instruction visitor, and APIs with TerminatorInst as
a return type or an output parameter.

llvm-svn: 344494
2018-10-15 09:17:09 +00:00
Ayal Zaks e567b5b526 [LV] Fix comments reported when not vectorizing single iteration loops; NFC
Landing this as a separate part of https://reviews.llvm.org/D50480, being a
seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count
without remainder under opt for size).

llvm-svn: 344483
2018-10-14 17:53:02 +00:00
Sanjay Patel 7181146c6c [InstCombine] combine a shuffle and an extract subvector shuffle
This is part of the missing IR-level folding noted in D52912.
This should be ok as a canonicalization because the new shuffle mask can't
be any more complicated than the existing shuffle mask. If there's some 
target where the shorter vector shuffle is not legal, it should just end up 
expanding to something like the pair of shuffles that we're starting with here.

Differential Revision: https://reviews.llvm.org/D53037

llvm-svn: 344476
2018-10-14 15:25:06 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Benjamin Kramer c55e997556 Move some helpers from the global namespace into anonymous ones.
llvm-svn: 344468
2018-10-13 22:18:22 +00:00
Sanjay Patel 47579b21e2 [InstCombine] fix complexity canonicalization with fake unary vector ops
This is a preliminary step to avoid regressions when we add
an actual 'fneg' instruction to IR. See D52934 and D53205.

llvm-svn: 344458
2018-10-13 16:15:37 +00:00
David Bolvansky e8b3bba717 [InstCombine] Fixed crash with aliased functions
Summary: Fixes PR39177

Reviewers: spatel, jbuening

Reviewed By: jbuening

Subscribers: jbuening, llvm-commits

Differential Revision: https://reviews.llvm.org/D53129

llvm-svn: 344454
2018-10-13 15:21:55 +00:00
Kostya Serebryany bc504559ec move GetOrCreateFunctionComdat to Instrumentation.cpp/Instrumentation.h
Summary:
GetOrCreateFunctionComdat is currently used in SanitizerCoverage,
where it's defined. I'm planing to use it in HWASAN as well,
so moving it into a common location.
NFC

Reviewers: morehouse

Reviewed By: morehouse

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53218

llvm-svn: 344433
2018-10-12 23:21:48 +00:00
Jonathan Metzman 0b94e88007 [SanitizerCoverage] Prevent /OPT:REF from stripping constructors
Summary:
Linking with the /OPT:REF linker flag when building COFF files causes
the linker to strip SanitizerCoverage's constructors. Prevent this by
giving the constructors WeakODR linkage and by passing the linker a
directive to include sancov.module_ctor.

Include a test in compiler-rt to verify libFuzzer can be linked using
/OPT:REF

Reviewers: morehouse, rnk

Reviewed By: morehouse, rnk

Subscribers: rnk, morehouse, hiraditya

Differential Revision: https://reviews.llvm.org/D52119

llvm-svn: 344391
2018-10-12 18:11:47 +00:00
Max Moroz 4d010ca35b [SanitizerCoverage] Make Inline8bit and TracePC counters dead stripping resistant.
Summary:
Otherwise, at least on Mac, the linker eliminates unused symbols which
causes libFuzzer to error out due to a mismatch of the sizes of coverage tables.

Issue in Chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=892167

Reviewers: morehouse, kcc, george.karpenkov

Reviewed By: morehouse

Subscribers: kubamracek, llvm-commits

Differential Revision: https://reviews.llvm.org/D53113

llvm-svn: 344345
2018-10-12 13:59:31 +00:00
Tim Northover 487780678f SCCP: avoid caching DenseMap entry that might be invalidated.
Later calls to getValueState might insert entries into the ValueState map and
cause reallocation, invalidating a reference.

llvm-svn: 344327
2018-10-12 09:01:59 +00:00
Eugene Leviant eddf6b5df5 [ThinLTO] Don't import GV which contains blockaddress
Differential revision: https://reviews.llvm.org/D53139

llvm-svn: 344325
2018-10-12 07:24:02 +00:00
Kostya Serebryany d891ac9794 merge two near-identical functions createPrivateGlobalForString into one
Summary:
We have two copies of createPrivateGlobalForString (in asan and in esan).
This change merges them into one. NFC

Reviewers: vitalybuka

Reviewed By: vitalybuka

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53178

llvm-svn: 344314
2018-10-11 23:03:27 +00:00
Leonard Chan 64e21b5cfd [PassManager/Sanitizer] Port of AddresSanitizer pass from legacy to new PassManager
This patch ports the legacy pass manager to the new one to take advantage of
the benefits of the new PM. This involved moving a lot of the declarations for
`AddressSantizer` to a header so that it can be publicly used via
PassRegistry.def which I believe contains all the passes managed by the new PM.

This patch essentially decouples the instrumentation from the legacy PM such
hat it can be used by both legacy and new PM infrastructure.

Differential Revision: https://reviews.llvm.org/D52739

llvm-svn: 344274
2018-10-11 18:31:51 +00:00
Amara Emerson 54f60255a2 [InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it.
InstCombine keeps a worklist and assumes that optimizations don't
eraseFromParent() the instruction, which SimplifyLibCalls violates. This change
adds a new callback to SimplifyLibCalls to let clients specify their own hander
for erasing actions.

Differential Revision: https://reviews.llvm.org/D52729

llvm-svn: 344251
2018-10-11 14:51:11 +00:00
David Green 8066198442 [InstCombine] Demand bits of UMin
This is the umin alternative to the umax code from rL344237. We use
DeMorgans law on the umax case to bring us to the same thing on umin,
but using countLeadingOnes, not countLeadingZeros.

Differential Revision: https://reviews.llvm.org/D53036

llvm-svn: 344239
2018-10-11 11:28:27 +00:00
David Green 30c0e98b9c [InstCombine] Demand bits of UMax
Use the demanded bits of umax(A,C) to prove we can just use A so long as the
lowest non-zero bit of DemandMask is higher than the highest non-zero bit of C

Differential Revision: https://reviews.llvm.org/D53033

llvm-svn: 344237
2018-10-11 11:04:09 +00:00
Florian Hahn 18e07bb822 [LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC).
We assign indices sequentially for seen instructions, so we can just use
a vector and push back the seen instructions. No need for using a
DenseMap.

Reviewers: hsaito, rengolin, nadav, dcaballe

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D53089

llvm-svn: 344233
2018-10-11 09:46:25 +00:00
Florian Hahn 7eb5cb4ebc [LV] Ignore more debug info.
We can avoid doing some unnecessary work by skipping debug instructions
in a few loops. It also helps to ensure debug instructions do not
prevent vectorization, although I do not have any concrete test cases
for that.

Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk

Reviewed By: rengolin, dcaballe

Differential Revision: https://reviews.llvm.org/D53091

llvm-svn: 344232
2018-10-11 09:27:24 +00:00
Calixte Denizet d2f290b034 [gcov] Display the hit counter for the line of a function definition
Summary:
Right now there is no hit counter on the line of function.
So the idea is add the line of the function to all the lines covered by the entry block.
Tests in compiler-rt/profile will be fixed in another patch: https://reviews.llvm.org/D49854

Reviewers: marco-c, davidxl

Reviewed By: marco-c

Subscribers: sylvestre.ledru, llvm-commits

Differential Revision: https://reviews.llvm.org/D49853

llvm-svn: 344228
2018-10-11 08:53:43 +00:00
Max Kazantsev b2e51090a4 [IndVars] Drop "exact" flag from lshr and udiv when substituting their args
There is a transform that may replace `lshr (x+1), 1` with `lshr x, 1` in case
if it can prove that the result will be the same. However the initial instruction
might have an `exact` flag set, and it now should be dropped unless we prove
that it may hold. Incorrectly set `exact` attribute may then produce poison.

Differential Revision: https://reviews.llvm.org/D53061
Reviewed By: sanjoy

llvm-svn: 344223
2018-10-11 07:22:26 +00:00
Richard Smith 6c67662816 Add a flag to remap manglings when reading profile data information.
This can be used to preserve profiling information across codebase
changes that have widespread impact on mangled names, but across which
most profiling data should still be usable. For example, when switching
from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI,
or even from a 32-bit to a 64-bit build.

The user can provide a remapping file specifying parts of mangled names
that should be treated as equivalent (eg, std::__1 should be treated as
equivalent to std::__cxx11), and profile data will be treated as
applying to a particular function if its name is equivalent to the name
of a function in the profile data under the provided equivalences. See
the documentation change for a description of how this is configured.

Remapping is supported for both sample-based profiling and instruction
profiling. We do not support remapping indirect branch target
information, but all other profile data should be remapped
appropriately.

Support is only added for the new pass manager. If someone wants to also
add support for this for the old pass manager, doing so should be
straightforward.

This is the LLVM side of Clang r344199.

Reviewers: davidxl, tejohnson, dlj, erik.pilkington

Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51249

llvm-svn: 344200
2018-10-10 23:13:47 +00:00
George Burgess IV 6ef8002c2c Replace most users of UnknownSize with LocationSize::unknown(); NFC
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).

This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.

llvm-svn: 344186
2018-10-10 21:28:44 +00:00
Sanjay Patel 05aadf885d [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try
Re-trying r344082 because it unintentionally included extra diffs.

Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {

  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r

}

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vandps  LCPI0_0(%rip), %xmm0, %xmm0
vxorps  %xmm1, %xmm1, %xmm1
vpcmpeqd        %xmm1, %xmm0, %xmm0
vblendvps       %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vblendvps       %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpbroadcastd    LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd       %zmm1, %zmm0, %k1
vblendmps       %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpslld  $31, %xmm0, %xmm0
vptestmd        %zmm0, %zmm0, %k1
vblendmps       %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
movi    v1.4s, #1
and     v0.16b, v0.16b, v1.16b
cmeq    v0.4s, v0.4s, #0
bsl     v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
bsl     v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344181
2018-10-10 20:47:46 +00:00
Sanjay Patel 58fc00d0bc revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
This commit accidentally included the diffs from D53057.

llvm-svn: 344178
2018-10-10 20:39:39 +00:00
Renato Golin d8e7ca4a32 [VPlan] Fix CondBit quoting in dumpBasicBlock
Quotes were being printed for VPInstructions but not the rest.

llvm-svn: 344161
2018-10-10 17:55:21 +00:00
Scott Linder 3759efc650 Relax trivial cast requirements in CallPromotionUtils
Differential Revision: https://reviews.llvm.org/D52792

llvm-svn: 344153
2018-10-10 16:35:47 +00:00
Carlos Alberto Enciso c0952c8a08 Revert "[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG."
This reverts commit r344120.

It was causing buildbot failures.

llvm-svn: 344135
2018-10-10 12:09:34 +00:00
Neil Henning 3d4579829e Fix an ordering bug in the scalarizer.
I've added a new test case that causes the scalarizer to try and use
dead-and-erased values - caused by the basic blocks not being in
domination order within the function. To fix this, instead of iterating
through the blocks in function order, I walk them in reverse post order.

Differential Revision: https://reviews.llvm.org/D52540

llvm-svn: 344128
2018-10-10 09:27:45 +00:00
Carlos Alberto Enciso e7a347e5f8 [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines. 

Differential Revision: https://reviews.llvm.org/D52887

llvm-svn: 344120
2018-10-10 08:29:55 +00:00
George Burgess IV 40dc63e1f0 [Analysis] Make LocationSizes carry an 'imprecise' bit
There are places where we need to merge multiple LocationSizes of
different sizes into one, and get a sensible result.

There are other places where we want to optimize aggressively based on
the value of a LocationSizes (e.g. how can a store of four bytes be to
an area of storage that's only two bytes large?)

This patch makes LocationSize hold an 'imprecise' bit to note whether
the LocationSize can be treated as an upper-bound and lower-bound for
the size of a location, or just an upper-bound.

This concludes the series of patches leading up to this. The most recent
of which is r344108.

Fixes PR36228.

Differential Revision: https://reviews.llvm.org/D44748

llvm-svn: 344114
2018-10-10 06:39:40 +00:00
Max Kazantsev 1d893bfcea [NFC] Make a variable const
llvm-svn: 344113
2018-10-10 04:19:38 +00:00
Sanjay Patel e9ca7ea3e5 [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be 
regressions unless we add more trunc folds as we're doing here 
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using 
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344082
2018-10-09 21:26:01 +00:00
Sanjay Patel 88194dfe1a [InstCombine] make helper function 'static'; NFC
llvm-svn: 344056
2018-10-09 15:29:26 +00:00
George Burgess IV f96e618017 Make LocationSize a proper Optional type; NFC
This is the second in a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context. The first change being r344012.

Since I was requested to do all of this with post-commit review, this is
about as small as I can make this patch.

This patch makes LocationSize into an actual type that wraps a uint64_t;
users are required to call getValue() in order to get the size now. If
the LocationSize has an Unknown size (e.g. if LocSize ==
MemoryLocation::UnknownSize), getValue() will assert.

This also adds DenseMap specializations for LocationInfo, which required
taking two more values from the set of values LocationInfo can
represent. Hence, heavy users of multi-exabyte arrays or structs may
observe slightly lower-quality code as a result of this change.

The intent is for getValue()s to be very close to a corresponding
hasValue() (which is often spelled `!= MemoryLocation::UnknownSize`).
Sadly, small diff context appears to crop that out sometimes, and the
last change in DSE does require a bit of nonlocal reasoning about
control-flow. :/

This also removes an assert, since it's now redundant with the assert in
getValue().

llvm-svn: 344013
2018-10-09 03:18:56 +00:00
George Burgess IV fefc42c9bb Use locals instead of struct fields; NFC
This is one of a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context.

Since I was requested to do all of this with post-commit review, this is
about as small as I can make it (beyond committing changes to these few
files separately, but they're incredibly similar in spirit, so...)

On its own, this change doesn't make a great deal of sense. I plan on
having a follow-up Real Soon Now(TM) to make the bits here make more
sense. :)

In particular, the next change in this series is meant to make
LocationSize an actual type, which you have to call .getValue() on in
order to get at the uint64_t inside. Hence, this change refactors code
so that:
- we only need to call the soon-to-come getValue() once in most cases,
  and
- said call to getValue() happens very closely to a piece of code that
  checks if the LocationSize has a value (e.g. if it's != UnknownSize).

llvm-svn: 344012
2018-10-09 02:14:33 +00:00
Robert Lougher 0c93ea2634 [TailCallElim] Enable marking of calls with byval as tails
In r339636 the alias analysis rules were changed with regards to tail calls
and byval arguments. Previously, tail calls were assumed not to alias
allocas from the current frame. This has been updated, to not assume this
for arguments with the byval attribute.

This patch aligns TailCallElim with the new rule. Tail marking can now be
more aggressive and mark more calls as tails, e.g.:

define void @test() {
  %f = alloca %struct.foo
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test2(%struct.foo* byval %f) {
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test3(%struct.foo* byval %f) {
  %agg.tmp = alloca %struct.foo
  %0 = bitcast %struct.foo* %agg.tmp to i8*
  %1 = bitcast %struct.foo* %f to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
  call void @bar(%struct.foo* byval %agg.tmp)
  ret void
}

The problematic case where a byval parameter is captured by a call is still
handled correctly, and will not be marked as a tail (see PR7272).

llvm-svn: 343986
2018-10-08 18:03:40 +00:00
Xin Tong bfdad33b82 [ThinLTO] Keep non-prevailing (linkonce|weak)_odr symbols live
Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even it is not prevailing.

IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.

By not dead stripping them, We will convert these symbols to
available_externally linkage as a result of non-prevailing and eventually
dropping them after inlining.

I modified cache-prevailing.ll to use linkonce linkage as it is
testing whether cache prevailing bit is effective or not, not
we should treat linkonce_odr alive or not

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52893

llvm-svn: 343970
2018-10-08 15:12:48 +00:00
Neil Henning 57f5d0a885 [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
  CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
  weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

llvm-svn: 343962
2018-10-08 10:32:33 +00:00
Ewan Crawford fa120cbdbc [InstCombine] Fix incongruous GEP type addrspace
Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.

Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
   %t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
   %t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
   %t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
   %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
   call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
   ret void
}
```

Output:

```
define void @insertelem_after_gep(<16 x i32>* %t0) {
  %t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
  call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
  ret void
}
```

Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space.

```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```

I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.

Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52294

llvm-svn: 343956
2018-10-08 08:40:45 +00:00
Max Kazantsev b07369651e [LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160
At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.

The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.

This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.

Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.

Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito

llvm-svn: 343954
2018-10-08 05:46:29 +00:00
Jonas Paulsson 29d80f07ee [LoopVectorizer] Use TTI.getOperandInfo()
Call getOperandInfo() instead of using (near) duplicated code in
LoopVectorizationCostModel::getInstructionCost().

This gets the OperandValueKind and OperandValueProperties values for a Value
passed as operand to an arithmetic instruction.

getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but
is now instead a public member.

Review: Florian Hahn
https://reviews.llvm.org/D52883

llvm-svn: 343852
2018-10-05 14:34:04 +00:00