Commit Graph

12050 Commits

Author SHA1 Message Date
Nikita Popov 14ca9a8355 Revert "[DemandedBits][BDCE] Support vectors of integers"
This reverts commit r348549. Causing assertion failures during
clang build.

llvm-svn: 348558
2018-12-07 00:42:03 +00:00
Nikita Popov cf65b9207b [DemandedBits][BDCE] Support vectors of integers
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348549
2018-12-06 23:50:32 +00:00
Nikita Popov d7b6b62deb [BDCE] Add tests for BDCE applied to vector instructions; NFC
These are baseline tests for D55297.

llvm-svn: 348548
2018-12-06 23:50:19 +00:00
Alexandros Lamprineas e4c91f5c4c [GVN] Don't perform scalar PRE on GEPs
Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from
sinking the addressing mode computation of memory instructions back
to its uses. The problem comes from the insertion of PHIs, which
confuse CGP and make it bail.

I've autogenerated the check lines of an existing test and added a
store instruction to demonstrate the motivation behind this change.
The store is now using the gep instead of a phi.

Differential Revision: https://reviews.llvm.org/D55009

llvm-svn: 348496
2018-12-06 16:11:58 +00:00
Ilya Biryukov cb5331eb93 Revert "[LoopSimplifyCFG] Delete dead in-loop blocks"
This reverts commit r348457.
The original commit causes clang to crash when doing an instrumented
build with a new pass manager. Reverting to unbreak our integrate.

llvm-svn: 348484
2018-12-06 13:21:01 +00:00
Roman Lebedev 98cb1216a6 [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
I was finally able to quantify what i thought was missing in the fix,
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.

Thus, we want to avoid the fold if *at least one* element is -1.
Or in other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().

A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348462
2018-12-06 08:14:24 +00:00
Roman Lebedev d9941fa270 [NFC][InstCombine] Add more miscompile tests for foldICmpWithLowBitMaskedVal()
We also have to me aware of vector constants. If at least one element
is -1, we can't transform.

llvm-svn: 348461
2018-12-06 08:11:20 +00:00
Max Kazantsev 0b1d069d64 [LoopSimplifyCFG] Delete dead in-loop blocks
This patch teaches LoopSimplifyCFG to delete loop blocks that have
become unreachable after terminator folding has been done.

Differential Revision: https://reviews.llvm.org/D54023
Reviewed By: anna

llvm-svn: 348457
2018-12-06 05:45:02 +00:00
Matt Arsenault ca8631ba6e InstCombine: Add some missing tests for scalarization
llvm-svn: 348456
2018-12-06 03:32:50 +00:00
David L. Jones 5ff7b8a04a Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants"
This change caused SEGVs in instcombine. (The r347934 change seems to me to be a
precipitating cause, not a root cause. Details are on the llvm-commits thread
for r347934.)

llvm-svn: 348426
2018-12-05 23:13:50 +00:00
Sanjay Patel 998ececef0 [InstCombine] remove dead code from visitExtractElement
Extracting from a splat constant is always handled by InstSimplify.
Move the test for this from InstCombine to InstSimplify to make
sure that stays true.

llvm-svn: 348423
2018-12-05 23:09:33 +00:00
Sanjay Patel de3db684b7 [InstCombine] add/move tests for extractelement; NFC
llvm-svn: 348417
2018-12-05 21:56:13 +00:00
Vedant Kumar 09415a850e [CodeExtractor] Do not marked outlined calls which may resume EH as noreturn
Treat terminators which resume exception propagation as returning instructions
(at least, for the purposes of marking outlined functions `noreturn`). This is
to avoid inserting traps after calls to outlined functions which unwind.

rdar://46129950

llvm-svn: 348404
2018-12-05 19:35:37 +00:00
Christian Bruel 4ead99b3ac Allow norecurse attribute on functions that have debug infos.
Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks.

Reviewers: chandlerc, jmolloy, aprantl

Reviewed By: aprantl

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55187

llvm-svn: 348381
2018-12-05 16:48:00 +00:00
Sanjay Patel baffae91b2 [InstCombine] simplify icmps with same operands based on dominating cmp
The tests here are based on the motivating cases from D54827.

More background:
1. We don't get these cases in general with SimplifyCFG because the root
   of the pattern match is an icmp, not a branch. I'm not sure how often
   we encounter this pattern vs. the seemingly more likely case with 
   branches, but I don't see evidence to leave the minimal pattern
   unoptimized.

2. This has a chance of increasing compile-time because we're using a
   ValueTracking call to handle the match. The motivating cases could be
   handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
   isImpliedFalseByMatchingCmp, but I saw that we have a more 
   comprehensive wrapper around those, so we might as well use it here
   unless there's evidence that it's significantly slower.

3. Ideally, we'd handle the fold to constants in InstSimplify, but as
   with the existing code here, we could extend this to handle cases
   where the result is not a constant, but a new combined predicate.
   That would mean splitting the logic across the 2 passes and possibly
   duplicating the pattern-matching cost.

4. As mentioned in D54827, this seems like the kind of thing that should
   be handled in Correlated Value Propagation, but that pass is currently
   limited to dealing with instructions with constant operands, so extending
   this bit of InstCombine is the smallest/easiest way to get these patterns 
   optimized.

llvm-svn: 348367
2018-12-05 15:04:00 +00:00
Max Kazantsev 594cb55686 [NFC] Verify memoryssa in test for PR39783
llvm-svn: 348333
2018-12-05 05:20:08 +00:00
Sanjay Patel 1df1facae2 [InstCombine] add tests for implied simplifications; NFC
Ideally, we would fold all of these in InstSimplify in a
similar way to rL347896, but this is a bit awkward when
we're trying to simplify a compare directly because the
ValueTracking API expects the compare as an input, but
in InstSimplify, we just have the operands of the compare.

Given that we can do transforms besides just simplifications,
we might as well just extend the code in InstCombine (which 
already does simplifications with constant operands).

llvm-svn: 348312
2018-12-04 22:25:33 +00:00
Sanjay Patel 882555628b [InstCombine] auto-generate full checks for icmp overflow tests; NFC
llvm-svn: 348274
2018-12-04 15:41:34 +00:00
Sanjay Patel 320cf5dde5 [InstCombine] auto-generate full checks for icmp dominator tests; NFC
llvm-svn: 348270
2018-12-04 15:00:35 +00:00
Simon Pilgrim 924f98e579 Add common check prefix. NFCI.
llvm-svn: 348265
2018-12-04 14:32:42 +00:00
Alina Sbirlea a2eebb828e Update MemorySSA in SimpleLoopUnswitch.
Summary:
Teach SimpleLoopUnswitch to preserve MemorySSA.

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D47022

llvm-svn: 348263
2018-12-04 14:23:37 +00:00
Vedant Kumar d129569e34 [CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433)
If a PHI node out of extracted region has multiple incoming values from it,
split this PHI on two parts. First PHI has incomings only from region and
extracts with it (they are placed to the separate basic block that added to the
list of outlined), and incoming values in original PHI are replaced by first
PHI. Similar solution is already used in CodeExtractor for PHIs in entry block
(severSplitPHINodes method). It covers PR39433 bug.

Patch by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D55018

llvm-svn: 348205
2018-12-03 22:40:21 +00:00
Sanjay Patel 8c65515082 [InstCombine] fix undef propagation bug with shuffle+binop
When we have a shuffle that extends a source vector with undefs
and then do some binop on that, we must make sure that the extra
elements remain undef with that binop if we reverse the order of
the binop and shuffle.

'or' is probably the easiest example to show the bug because
'or C, undef --> -1' (not undef). But there are other 
opcode/constant combinations where this is true as shown by 
the 'shl' test.

llvm-svn: 348191
2018-12-03 21:15:17 +00:00
Roman Lebedev 7bf2fed167 [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.
These two folds are invalid for this non-constant pattern
when the mask ends up being all-ones:
https://rise4fun.com/Alive/9au
https://rise4fun.com/Alive/UcQM

Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348181
2018-12-03 20:07:58 +00:00
Sanjay Patel 3e66d81ec6 [InstCombine] add tests for shuffle+binop fold; NFC
llvm-svn: 348173
2018-12-03 19:41:21 +00:00
Sanjay Patel 8918a511a1 [SimplifyCFG] add tests for cross block compare folding; NFC
These are the baseline tests for D54827.
Patch based on code originally written by: @yinyuefengyi (luo xionghu)

Differential Revision: https://reviews.llvm.org/D54994

llvm-svn: 348151
2018-12-03 16:55:29 +00:00
Michal Gorny ff13c24cfe [test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSD
Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required
for sort implementation on NetBSD.  The '-b' modifier is ineffective
if specified without any key.  Per the manpage:

  Note that the -b option has no effect unless key fields are specified.

Differential Revision: https://reviews.llvm.org/D55168

llvm-svn: 348097
2018-12-02 16:49:33 +00:00
Simon Pilgrim 102854f4d4 [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 348076
2018-12-01 14:18:31 +00:00
Nikita Popov 0c5d6ccbfc [InstCombine] Support ssub.sat canonicalization for non-splats
Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
support non-splat vector constants. This is done by generalizing
the implementation of the isNotMinSignedValue() helper to return
true for constants that are non-splat, but don't contain any
signed min elements.

Differential Revision: https://reviews.llvm.org/D55011

llvm-svn: 348072
2018-12-01 10:58:34 +00:00
Teresa Johnson 5b8ff375c8 [ThinLTO] Allow importing of functions with var args
Summary:
Follow up to D54270, which allowed importing of var args functions
unless they called va_start. As pointed out in the post-commit comments
on that patch, the inliner can handle functions that call va_start in
certain situations as well. Go ahead and enable importing of all var
args functions. Measurements on a large binary show that this increases
imports and binary size by an insignificant amount.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54607

llvm-svn: 348068
2018-12-01 05:11:46 +00:00
Craig Topper 88270231f8 [X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512.
llvm-svn: 348063
2018-12-01 01:38:44 +00:00
Craig Topper 502fc1bdd5 [X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer=vector-width=256 and -mprefer-vector-width=512.
This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors.

llvm-svn: 348046
2018-11-30 22:53:21 +00:00
Sanjay Patel 398728732e [InstSimplify] add tests for undef + partial undef constant folding; NFC
These tests should probably go under a separate test file because they
should fold with just -constprop, but they're similar to the scalar
tests already in here.

llvm-svn: 348045
2018-11-30 22:51:34 +00:00
Joseph Tremoulet 27b1e3bd4f [Mem2Reg] Fix nondeterministic corner case
Summary:
When mem2reg inserts phi nodes in blocks with unreachable predecessors,
it adds undef operands for those incoming edges.  When there are
multiple such predecessors, the order is currently based on the address
of the BasicBlocks.  This change fixes that by using the BBNumbers in
the sort/search predicates, as is done elsewhere in mem2reg to ensure
determinism.

Also adds a testcase with a bunch of unreachable preds, which
(nodeterministically) fails without the fix.


Reviewers: majnemer

Reviewed By: majnemer

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55077

llvm-svn: 348024
2018-11-30 19:20:02 +00:00
Alexey Bataev 3689747619 [SLP]PR39774: Update references of the replaced external instructions.
Summary:
An additional fix for PR39774. Need to update the references for the
RedcutionRoot instruction when it is replaced during the vectorization
phase to avoid compiler crash on reduction vectorization.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55017

llvm-svn: 347997
2018-11-30 15:14:20 +00:00
Renato Golin 135e72e1b9 Add a new reduction pattern match
Adding a new reduction pattern match for vectorizing code similar
to TSVC s3111:

for (int i = 0; i < N; i++)
  if (a[i] > b)
    sum += a[i];

This patch adds support for fadd, fsub and fmull, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.

The difference from the previous patch(https://reviews.llvm.org/D49168)
is as follows:
 - Added check of fast-math property of fp-instruction to the
   previous patch
 - Fix/add some pattern for if-reduction.ll


Differential Revision: https://reviews.llvm.org/D54464

Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>
     and Masakazu Ueno <masakazu.ueno@linaro.org>

llvm-svn: 347989
2018-11-30 13:40:10 +00:00
Max Kazantsev 9cf417db78 [LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783
Terminator folding transform lacks MemorySSA update for memory Phis,
while they exist within MemorySSA analysis. They need exactly the same
type of updates as regular Phis. Failing to update them properly ends up
with inconsistent MemorySSA and manifests in various assertion failures.

This patch adds Memory Phi updates to this transform.

Thanks to @jonpa for finding this!

Differential Revision: https://reviews.llvm.org/D55050
Reviewed By: asbirlea

llvm-svn: 347979
2018-11-30 10:06:23 +00:00
Max Kazantsev deaa3e2068 [NFC] Simplify and reduce tests for PR39783
llvm-svn: 347976
2018-11-30 09:51:25 +00:00
Warren Ristow 72d1f3a285 [SCEV] Guard movement of insertion point for loop-invariants
r320789 suppressed moving the insertion point of SCEV expressions with
dev/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D54713

llvm-svn: 347934
2018-11-30 00:02:54 +00:00
David Stuttard c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
Sanjay Patel d802270808 [InstSimplify] fold select with implied condition
This is an almost direct move of the functionality from InstCombine to 
InstSimplify. There's no reason not to do this in InstSimplify because 
we never create a new value with this transform.

(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)

I've changed 1 of the conditions for the fold (1 of the blocks for the 
branch must be the block we started with) into an assert because I'm not 
sure how that could ever be false.

We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.

The 3-way compare changes show that InstCombine has some kind of 
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.

llvm-svn: 347896
2018-11-29 18:44:39 +00:00
John Brawn a7eb2c863f [LICM] Reapply r347776 "Make LICM able to hoist phis" with fix
This commit caused a large compile-time slowdown in some cases when NDEBUG is
off due to the dominator tree verification it added. Fix this by only doing
dominator tree and loop info verification when something has been hoisted.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347889
2018-11-29 17:10:00 +00:00
Sanjay Patel 515b91cdef [SimplifyCFG] auto-generate complete checks; NFC
llvm-svn: 347882
2018-11-29 16:28:37 +00:00
Sanjay Patel 81449c6b0e [InstCombine] auto-generate complete checks; NFC
llvm-svn: 347881
2018-11-29 16:26:03 +00:00
Joseph Tremoulet 926ee459c4 [CallSiteSplitting] Report edge deletion to DomTreeUpdater
Summary:
When splitting musttail calls, the split blocks' original terminators
get removed; inform the DTU when this happens.

Also add a testcase that fails an assertion in the DTU without this fix.


Reviewers: fhahn, junbuml

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55027

llvm-svn: 347872
2018-11-29 15:27:04 +00:00
David Stuttard de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Martin Storsjo bfd1d27585 Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix"
This reverts commits r347776 and r347778.

The first one, r347776, caused significant compile time regressions
for certain input files, see PR39836 for details.

llvm-svn: 347867
2018-11-29 14:39:39 +00:00
Sanjay Patel 83d1d3f167 [CVP] auto-generate complete test checks; NFC
llvm-svn: 347866
2018-11-29 14:28:47 +00:00
Max Kazantsev a63b275285 [NFC] Add two XFAIL tests from PR39783
llvm-svn: 347845
2018-11-29 09:38:22 +00:00
Sam Parker d6ebf0108e [LoopStrengthReduce] ComplexityLimit as an option
Convert ComplexityLimit into a command line value.

Differential Revision: https://reviews.llvm.org/D54899

llvm-svn: 347843
2018-11-29 08:34:22 +00:00
Craig Topper 961b956eb4 [Inliner] Modify the merging of min-legal-vector-width attribute to better handle when the caller or callee don't have the attribute.
Lack of an attribute means that the function hasn't been checked for what vector width it requires. So if the caller or the callee doesn't have the attribute we should make sure the combined function after inlining does not have the attribute.

If the caller already doesn't have the attribute we can just avoid adding it. Otherwise if the callee doesn't have the attribute just remove the caller's attribute.

llvm-svn: 347841
2018-11-29 07:27:38 +00:00
Craig Topper 645cc6e331 [Inliner] Add test for merging of min-legal-vector-width function attribute.
This should have been added in r337844, but apparently was I failed to 'git add' the file.

llvm-svn: 347840
2018-11-29 07:02:47 +00:00
Paul Robinson adcdc1bd0a [DebugInfo] IR/Bitcode changes for DISubprogram flags.
Packing the flags into one bitcode word will save effort in
adding new flags in the future.

Differential Revision: https://reviews.llvm.org/D54755

llvm-svn: 347806
2018-11-28 21:14:32 +00:00
Jeremy Morse 9b4cfa55b1 [DebugInfo] Give inlinable calls DILocs (PR39807)
In PR39807 we incorrectly handle circumstances where calls are common'd
from conditional blocks into the parent BB. Calls that can be inlined
must always have DebugLocs, however we strip them during commoning, which
the IR verifier asserts on.

Fix this by using applyMergedLocation: it will perform the same DebugLoc
stripping of conditional Locs, but will also generate an unknown location
DebugLoc that satisfies the requirement for inlinable calls to always have
locations.

Some of the prior logic for selecting a DebugLoc is now likely redundant;
I'll generate a follow-up to remove it (involves editing more regression
tests).

Differential Revision: https://reviews.llvm.org/D54997

llvm-svn: 347782
2018-11-28 17:58:45 +00:00
John Brawn 4557ffeb63 [LICM] Enable control flow hoisting by default
Differential Revision: https://reviews.llvm.org/D54949

llvm-svn: 347778
2018-11-28 17:23:03 +00:00
John Brawn 31c9769580 [LICM] Reapply r347190 "Make LICM able to hoist phis" with fix
This commit caused failures because it failed to correctly handle cases where
we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We
need to make sure that we rehoist the use to _after_ the hoisted phi, which we
do by always rehoisting to the immediate dominator instead of just rehoisting
everything to the original preheader.

An option is also added to control whether control flow is hoisted, which is
off in this commit but will be turned on in a subsequent commit.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347776
2018-11-28 17:21:49 +00:00
Nikita Popov 8d63aed459 [InstCombine] Combine saturating add/sub with constant operands
Combine
  sat(sat(X + C1) + C2) -> sat(X + (C1+C2))
and
  sat(sat(X - C1) - C2) -> sat(X - (C1+C2))
if the sign of C1 and C2 matches.

In the unsigned case we can compute C1+C2 with saturating arithmetic,
and InstSimplify will reduce this just to the saturation value. For
the signed case, we cannot perform the simplification if the result
of the addition overflows.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347773
2018-11-28 16:37:15 +00:00
Nikita Popov 42f89989a1 [InstCombine] Canonicalize ssub.sat to sadd.sat
Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and
not signed minimum. This will help further optimizations to apply.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347772
2018-11-28 16:37:09 +00:00
Nikita Popov cf596a8c26 [ValueTracking] Determine always-overflow condition for unsigned sub
Always-overflow was already determined for unsigned addition, but
not subtraction. This patch establishes parity.

This allows us to perform some additional simplifications for
signed saturating subtractions.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347771
2018-11-28 16:37:04 +00:00
Nikita Popov 78a9295e15 [InstCombine] Use known overflow information for saturating add/sub
If ValueTracking can determine that the add/sub can newer overflow,
replace it with the corresponding nuw/nsw add/sub.

Additionally, for the unsigned case, if ValueTracking determines
that the add/sub always overflows, replace the result with the
saturation value.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347770
2018-11-28 16:36:59 +00:00
Nikita Popov 085d24a8b3 [InstCombine] Canonicalize const arg for saturating adds
If a saturating add intrinsic has one constant argument, make sure
it is on the RHS. This will simplify further transformations.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347769
2018-11-28 16:36:52 +00:00
Alexey Bataev 579c2d9d64 [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.
Summary:
If the original reduction root instruction was vectorized, it might be
removed from the tree. It means that the insertion point may become
invalidated and the whole vectorization of the reduction leads to the
incorrect output result.
The ReductionRoot instruction must be marked as externally used so it
could not be removed. Otherwise it might cause inconsistency with the
cost model and we may end up with too optimistic optimization.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54955

llvm-svn: 347759
2018-11-28 14:34:11 +00:00
Nikita Popov e20e6b4a53 [InstCombine] Add tests for saturating add/sub; NFC
These are baseline tests for D54534.

llvm-svn: 347700
2018-11-27 19:52:56 +00:00
Florian Hahn fd6ea134f4 [PartialInliner] Make PHIs free in cost computation.
InlineCost also treats them as free and the current implementation
can cause assertion failures if PHI nodes are moved outside the region
from entry BBs to the region.

It also updates the code to use the instructionsWithoutDebug iterator.

Reviewers: davidxl, davide, vsk, graham-yiu-huawei

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D54748

llvm-svn: 347683
2018-11-27 18:17:27 +00:00
Max Kazantsev b0a9b75e2a Add missing REQUIRES: asserts
llvm-svn: 347644
2018-11-27 07:51:18 +00:00
Max Kazantsev c4e4d6449a [LoopSimplifyCFG] Fix corner case with duplicating successors
It fixes a bug that doesn't update Phi inputs of the only live successor that
is in the list of block's successors more than once.

Thanks @uabelho for finding this.

Differential Revision: https://reviews.llvm.org/D54849
Reviewed By: anna

llvm-svn: 347640
2018-11-27 06:17:21 +00:00
Sanjay Patel 703299e5e9 [InstCombine] add tests for rotate/bswap equality; NFC
llvm-svn: 347618
2018-11-27 00:08:21 +00:00
Xin Tong 04d49779a1 [ICP] Remove incompatible attributes at indirect-call promoted callsites.
Summary:
Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in
at least a IR verification error.

Reviewers: davidxl, xur, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54913

llvm-svn: 347605
2018-11-26 22:03:52 +00:00
Fedor Sergeev 8cd9d1b5ce Revert "[TTI] Reduction costs only need to include a single extract element cost"
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.

llvm-svn: 347541
2018-11-26 10:17:27 +00:00
Florian Hahn 6615a7132a [IPSCCP] Use input operand instead of OriginalOp for ssa_copy.
OriginalOp of a Predicate refers to the original IR value,
before renaming. While solving in IPSCCP, we have to use
the operand of the ssa_copy instead, to avoid missing
updates for nested conditions on the same IR value.

Fixes PR39772.

llvm-svn: 347524
2018-11-25 16:32:02 +00:00
Nikita Popov 2c779c0e34 [InstCombine] Determine demanded and known bits for funnel shifts
Support funnel shifts in InstCombine demanded bits simplification.
If the shift amount is constant, we can determine both the demanded
bits of the operands, as well as the known bits of the result.

If one of the operands has no demanded bits, it will be replaced
by undef and the funnel shift will be simplified into a simple shift
due to the simplifications added in D54778.

Differential Revision: https://reviews.llvm.org/D54869

llvm-svn: 347515
2018-11-24 19:00:45 +00:00
Joel Jones 7459398a43 Revert unapproved commit
llvm-svn: 347511
2018-11-24 07:26:55 +00:00
Joel Jones 5f533c5fe1 [AArch64] Enable libm vectorized functions via SLEEF
This changeset is modeled after Intel's submission for SVML. It enables
trigonometry functions vectorization via SLEEF: http://sleef.org/.

 * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
 * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
 * A comprehensive test case is included in this changeset.
 * In a separate changeset (for clang), a new vectorization library argument is
   added to -fveclib: -fveclib=SLEEF.

Trigonometry functions that are vectorized by sleef:

acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma

Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D53927

llvm-svn: 347510
2018-11-24 06:41:39 +00:00
Nikita Popov 6e81d421e1 [InstCombine] Simplify funnel shift with zero/undef operand to shift
The following simplifications are implemented:

 * `fshl(X, 0, C) -> shl X, C%BW`
 * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
 * `fshl(0, X, C) -> lshr X, BW-C%BW`
 * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
 * `fshr(X, 0, C) -> shl X, (BW-C%BW)`
 * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
 * `fshr(0, X, C) -> lshr X, C%BW`
 * `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0)

The simplification is only performed if the shift amount C is constant,
because we can explicitly compute C%BW and BW-C%BW in this case.

Differential Revision: https://reviews.llvm.org/D54778

llvm-svn: 347505
2018-11-23 22:45:08 +00:00
Max Kazantsev 7231009b78 [NFC] Add test that demonstrates buggy behavior on term folding of LoopSimplifyCFG
llvm-svn: 347488
2018-11-23 10:34:22 +00:00
Max Kazantsev e1c2dc27d3 Disable LoopSimplifyCFG terminator folding by default
llvm-svn: 347486
2018-11-23 09:14:53 +00:00
Max Kazantsev cb8e240334 [LoopSimplifyCFG] Don't delete LCSSA Phis
When removing edges, we also update Phi inputs and may end up removing
a Phi if it has only one input. We should not do it for edges that leave the current
loop because these Phis are LCSSA Phis and need to be preserved.

Thanks @dmgreen	for finding this!

Differential Revision: https://reviews.llvm.org/D54841

llvm-svn: 347484
2018-11-23 07:56:47 +00:00
Max Kazantsev a10c1c7412 [NFC] Add verification flags to tests
llvm-svn: 347483
2018-11-23 05:21:53 +00:00
Nikita Popov a70fdf8635 [InstCombine] Add tests for funnel shift with zero operand; NFC
These are additional baseline tests for D54778.

llvm-svn: 347414
2018-11-21 20:34:11 +00:00
Nikita Popov 6f54fb0052 [MergeFuncs] Generate alias instead of thunk if possible
The MergeFunctions pass was originally intended to emit aliases
instead of thunks where possible (unnamed_addr). However, for a
long time this functionality was behind a flag hardcoded to false,
bitrotted and was eventually removed in r309313.

Originally the functionality was first disabled in r108417 due to
lack of support for aliases in Mach-O. I believe that this is no
longer the case nowadays, but not really familiar with this area.

In the interest of being conservative, this patch reintroduces the
aliasing functionality behind a default disabled -mergefunc-use-aliases
flag.

Differential Revision: https://reviews.llvm.org/D53285

llvm-svn: 347407
2018-11-21 19:37:19 +00:00
Mikael Holmen b6f76002d9 [PM] Port Scalarizer to the new pass manager.
Patch by: markus (Markus Lavin)

Reviewers: chandlerc, fedor.sergeev

Reviewed By: fedor.sergeev

Subscribers: llvm-commits, Ka-Ka, bjope

Differential Revision: https://reviews.llvm.org/D54695

llvm-svn: 347392
2018-11-21 14:00:17 +00:00
Max Kazantsev bcd3f55827 [NFC] More complex tests for LoopSimplifyCFG
llvm-svn: 347384
2018-11-21 09:55:09 +00:00
Max Kazantsev 6d9e7918ec [NFC] Add some sophisticated tests on LoopSimplifyCFG
llvm-svn: 347381
2018-11-21 07:22:06 +00:00
John Regehr 3a1c9d55cc [LVI] run transfer function for binary operator even when the RHS isn't a constant
LVI was symbolically executing binary operators only when the RHS was
constant, missing the case where we have a ConstantRange for the RHS,
but not an actual constant. Tested using check-all and by
bootstrapping. Compile time is not impacted measurably.

Differential Revision: https://reviews.llvm.org/D19859

llvm-svn: 347379
2018-11-21 05:24:12 +00:00
Sanjay Patel 96152dcb1c [InstCombine] add tests for funnel shifts; NFC
These are included in D54666, so adding them first with baseline results.

Patch by: @nikic (Nikita Popov)

llvm-svn: 347333
2018-11-20 17:51:49 +00:00
Sanjay Patel 14ab9170b8 [InstSimplify] fold funnel shifts with undef operands
Splitting these off from the D54666.

Patch by: nikic (Nikita Popov)

llvm-svn: 347332
2018-11-20 17:34:59 +00:00
Sanjay Patel 2778f56a40 [InstSimplify] add tests for funnel shift with undef operands; NFC
These are part of D54666, so adding them here before the patch to
show the baseline (currently unoptimized) results.

Patch by: @nikic (Nikita Popov)

llvm-svn: 347331
2018-11-20 17:30:09 +00:00
Sanjay Patel eea21da12a [InstructionSimplify] Add support for saturating add/sub
Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported:

    sat(X + 0) -> X
    sat(X + undef) -> -1
    sat(X uadd MAX) -> MAX
    (and commutative variants)

    sat(X - 0) -> X
    sat(X - X) -> 0
    sat(X - undef) -> 0
    sat(undef - X) -> 0
    sat(0 usub X) -> 0
    sat(X usub MAX) -> 0

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54532

llvm-svn: 347330
2018-11-20 17:20:26 +00:00
Guozhi Wei c21fba1bab [LoopSink] Add preheader to alias set
This patch fixes PR39695.

The original LoopSink only considers memory alias in loop body. But PR39695 shows that instructions following sink candidate in preheader should also be checked. This is a conservative patch, it simply adds whole preheader block to alias set. It may lose some optimization opportunity, but I think that is very rare because: 1 in the most common case st/ld to the same address, the load should already be optimized away. 2 usually preheader is not very large. 

Differential Revision: https://reviews.llvm.org/D54659

llvm-svn: 347325
2018-11-20 16:49:07 +00:00
Sanjay Patel f5ead29b78 [PatternMatch] Handle undef vectors consistently
This patch fixes the issue noticed in D54532. 
The problem is that cst_pred_ty-based matchers like m_Zero() currently do not match 
scalar undefs (as expected), but *do* match vector undefs. This may lead to optimization 
inconsistencies in rare cases.

There is only one existing test for which output changes, reverting the change from D53205. 
The reason here is that vector fsub undef, %x is no longer matched as an m_FNeg(). While I 
think that the new output is technically worse than the previous one, it is consistent with 
scalar, and I don't think it's really important either way (generally that undef should have 
been folded away prior to reassociation.)

I've also added another test case for this issue based on InstructionSimplify. It took some 
effort to find that one, as in most cases undef folds are either checked first -- and in the 
cases where they aren't it usually happens to not make a difference in the end. This is the 
only case I was able to come up with. Prior to this patch the test case simplified to undef 
in the scalar case, but zeroinitializer in the vector case.

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54631

llvm-svn: 347318
2018-11-20 16:08:19 +00:00
Max Kazantsev c04b5307d1 Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches"
The initial version of patch lacked Phi nodes updates in destinations of removed
edges. This version contains this update and tests on this situation.

Differential Revision: https://reviews.llvm.org/D54021

llvm-svn: 347289
2018-11-20 05:43:32 +00:00
Benjamin Kramer fdd9b4fc8f Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches"
This reverts commits r347183 & r347184. Crashes while building libxml.

llvm-svn: 347260
2018-11-19 20:01:20 +00:00
Vedant Kumar 238533ec2e [InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phi
Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi
improves backtrace quality.

Fixes llvm.org/PR38083.

llvm-svn: 347257
2018-11-19 19:55:02 +00:00
Benjamin Kramer 2cad359c91 Revert "[LICM] Make LICM able to hoist phis"
This reverts commit r347190.

llvm-svn: 347225
2018-11-19 16:51:57 +00:00
Anna Thomas 5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
John Brawn 12c046fba0 [LICM] Make LICM able to hoist phis
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.

This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347190
2018-11-19 11:31:24 +00:00
Fangrui Song 209cfbe60e [LoopSimplifyCFG] Add requires: asserts after rL347183
llvm-svn: 347184
2018-11-19 06:28:15 +00:00
Max Kazantsev 8e3e33d138 [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches
This patch introduces infrastructure and the simplest case for constant-folding
of branch and switch instructions within loop into unconditional branches.
It is useful as a cleanup for such passes as loop unswitching that sometimes
produce such branches.

Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.

Differential Revision: https://reviews.llvm.org/D54021
Reviewed By: anna

llvm-svn: 347183
2018-11-19 05:54:38 +00:00
Vedant Kumar 35f504c113 [CorrelatedValuePropagation] Preserve debug locations (PR38178)
Fix all of the missing debug location errors in CVP found by debugify.

This includes the missing-location-after-udiv-truncation case described
in llvm.org/PR38178.

llvm-svn: 347147
2018-11-18 00:29:58 +00:00
Fedor Sergeev 2e3e224e71 [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with
We need to control exponential behavior of loop-unswitch so we do not get
run-away compilation.

Suggested solution is to introduce a multiplier for an unswitch cost that
makes cost prohibitive as soon as there are too many candidates and too
many sibling loops (meaning we have already started duplicating loops
by unswitching).

It does solve the currently known problem with compile-time degradation
(PR 39544).

Tests are built on top of a recently implemented CHECK-COUNT-<num>
FileCheck directives.

Reviewed By: chandlerc, mkazantsev
Differential Revision: https://reviews.llvm.org/D54223

llvm-svn: 347097
2018-11-16 21:16:43 +00:00
Adrian Prantl 83d87520ed GlobalDCE: Teach isEmptyFunction() to ignore debug intrinsics.
This fixes PR39669.
https://bugs.llvm.org/show_bug.cgi?id=39669

llvm-svn: 347065
2018-11-16 17:47:21 +00:00
Sanjay Patel f967328e24 [InstSimplify] add tests for saturating add/sub; NFC
These are baseline tests for D54532.
Patch based on the original tests by:
@nikic (Nikita Popov)

llvm-svn: 347060
2018-11-16 16:32:34 +00:00
Sanjay Patel 5ebd2a785e [InstSimplify] add test to demonstrate undef matching differences; NFC
This is a baseline test for D54631.
Patch by:
@nikic (Nikita Popov)

llvm-svn: 347055
2018-11-16 15:35:58 +00:00
Sanjay Patel c92aa7618f [InstCombine] adjust rotate direction in tests; NFC
Copy/paste errors - all of the changed tests rotated left before.

llvm-svn: 346982
2018-11-15 19:15:41 +00:00
Sanjay Patel 6cda87463f [InstCombine] add tests for funnel shift (rotate) canonicalization; NFC
llvm-svn: 346975
2018-11-15 18:19:56 +00:00
Simon Pilgrim 924f193419 [TTI] Reduction costs only need to include a single extract element cost
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 346970
2018-11-15 17:42:53 +00:00
Sanjay Patel bc56b2432d [InstCombine] fix rotate narrowing bug for non-pow-2 types
llvm-svn: 346968
2018-11-15 17:19:14 +00:00
Sanjay Patel 712bdb275c [InstCombine] add rotate narrowing tests with odd types; NFC
There's a potential miscompile here. It's unlikely in the real 
world because this transform is guarded with shouldChangeType(), 
but this test file doesn't include a standard data-layout for
some reason (despite including a custom 1), so we can see the bug. 

llvm-svn: 346966
2018-11-15 16:34:26 +00:00
Simon Pilgrim 5a1b7cea91 [SLPVectorizer][X86] Regenerate reduction minmax tests and cleanup check prefixes
llvm-svn: 346965
2018-11-15 16:34:15 +00:00
Simon Pilgrim 4dd692ec2a [SLPVectorizer][X86] Regenerate reduction tests and add PR37731 test
Cleanup check prefixes

llvm-svn: 346964
2018-11-15 16:08:25 +00:00
Sanjay Patel e98ec77a95 [InstSimplify] delete shift-of-zero guard ops around funnel shifts
This is a problem seen in common rotate idioms as noted in:
https://bugs.llvm.org/show_bug.cgi?id=34924

Note that we are not canonicalizing standard IR (shifts and logic) to the intrinsics yet. 
(Although I've written this before...) I think this is the last step before we enable 
that transform. Ie, we could regress code by doing that transform without this 
simplification in place.

In PR34924, I questioned whether this is a valid transform for target-independent IR, 
but I convinced myself this is ok. If we're speculating a funnel shift by turning cmp+br 
into select, then SimplifyCFG has already determined that the transform is justified. 
It's possible that SimplifyCFG is not taking into account profile or other metadata, 
but if that's true, then it's a bug independent of funnel shifts.

Also, we do have CGP code to restore a guard like this around an intrinsic if it can't 
be lowered cheaply. But that isn't necessary for funnel shift because the default 
expansion in SelectionDAGBuilder includes this same cmp+select.

Differential Revision: https://reviews.llvm.org/D54552

llvm-svn: 346960
2018-11-15 14:53:37 +00:00
Sanjay Patel 4832ffee39 [InstSimplify] add more tests for funnel shift with select; NFC
The cases are just different enough that we should have 
complete tests to avoid bugs from typos in the code.

llvm-svn: 346902
2018-11-14 22:34:25 +00:00
Vedant Kumar 808e157356 Mark @llvm.trap cold
A call to @llvm.trap can be expected to be cold (i.e. unlikely to be
reached in a normal program execution).

Outlining paths which unconditionally trap is an important memory
saving. As the hot/cold splitting pass (imho) should not treat all
noreturn calls as cold, explicitly mark @llvm.trap cold so that it can
be outlined.

Split out of https://reviews.llvm.org/D54244.

Differential Revision: https://reviews.llvm.org/D54329

llvm-svn: 346885
2018-11-14 19:53:41 +00:00
Teresa Johnson 32dc5b9bf1 [ThinLTO] Update handling of vararg functions to match inliner
Summary:
Previously we marked all vararg functions as non-inlinable in the
function summary, which prevented their importing. However, the
corresponding inliner restriction was loosened in r321940/r342675
to only apply to functions calling va_start. Adjust the summary
flag computation to match.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54270

llvm-svn: 346883
2018-11-14 19:30:13 +00:00
Sanjay Patel 7d028670f6 [InstSimplify] add tests for funnel shift with select; NFC
llvm-svn: 346881
2018-11-14 19:12:54 +00:00
Mandeep Singh Grang 0905fc77c1 [InstCombine] Remove a couple of asserts based on incorrect assumptions
Summary:
These asserts are based on the assumption that the order of true/false operands in a select and those in the compare would always be the same.
This fixes PR39595.

Reviewers: craig.topper, spatel, dmgreen

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54359

llvm-svn: 346874
2018-11-14 17:55:07 +00:00
John Brawn 9fd8c20c4f [SimplifyCFG] Regenerate preserve-branchweights.ll test. NFC
Regenerate this test using update_test_checks.py in preparation for an
upcomming commit, to make it not depend on the names of instructions.

llvm-svn: 346869
2018-11-14 15:27:07 +00:00
Florian Hahn 505091a8f2 Recommit r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site).
The underlying problem causing the expensive-check failure was fixed in
rL346769.

llvm-svn: 346843
2018-11-14 10:04:30 +00:00
Reid Kleckner 41390b47de Revert r346810 "Preserve loop metadata when splitting exit blocks"
It broke the Windows self-host:
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/1457

llvm-svn: 346823
2018-11-14 01:47:32 +00:00
Sanjay Patel a139564896 [InstCombine] fold funnel shift amount based on demanded bits
The shift amount of a funnel shift is modulo the scalar bitwidth:
http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic
...so we can use demanded bits analysis on that operand to simplify it
when we have a power-of-2 bitwidth.

This is another step towards canonicalizing {shift/shift/or} to the 
intrinsics in IR.

Differential Revision: https://reviews.llvm.org/D54478

llvm-svn: 346814
2018-11-13 23:27:23 +00:00
Craig Topper 3c87c2a3c5 Preserve loop metadata when splitting exit blocks
LoopUtils.cpp contains a utility that splits an loop exit block, so that the new block contains only edges coming from the loop. In the case of nested loops, the exit path for the inner loop might also be the back-edge of the outer loop. The new block which is inserted on this path, is now a latch for the outer loop, and it needs to hold the loop metadata for the outer loop. (The test case gives a more concrete view of the situation.)

Patch by Chang Lin (clin1)

Differential Revision: https://reviews.llvm.org/D53876

llvm-svn: 346810
2018-11-13 23:06:49 +00:00
Sanjay Patel f8f12272e8 [InstCombine] canonicalize rotate patterns with cmp/select
The cmp+branch variant of this pattern is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924
...and as discussed there, we probably can't transform
that without a rotate intrinsic. We do have that now
via funnel shift, but we're not quite ready to 
canonicalize IR to that form yet. The case with 'select'
should already be transformed though, so that's this patch.

The sequence with negation followed by masking is what we
use in the backend and partly in clang (though that part 
should be updated).

https://rise4fun.com/Alive/TplC
  %cmp = icmp eq i32 %shamt, 0
  %sub = sub i32 32, %shamt
  %shr = lshr i32 %x, %shamt
  %shl = shl i32 %x, %sub
  %or = or i32 %shr, %shl
  %r = select i1 %cmp, i32 %x, i32 %or
  =>
  %neg = sub i32 0, %shamt
  %masked = and i32 %shamt, 31
  %maskedneg = and i32 %neg, 31
  %shl2 = lshr i32 %x, %masked
  %shr2 = shl i32 %x, %maskedneg
  %r = or i32 %shl2, %shr2

llvm-svn: 346807
2018-11-13 22:47:24 +00:00
Cameron McInally cbde0d9c7b [IR] Add a dedicated FNeg IR Instruction
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.

Differential Revision: https://reviews.llvm.org/D53877

llvm-svn: 346774
2018-11-13 18:15:47 +00:00
Florian Hahn 107d0a8756 [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater.
This patch updates DuplicateInstructionsInSplitBetween to update a DTU
instead of applying updates to the DT directly.

Given that there only are 2 users, also updated them in this patch to
avoid churn.

I slightly moved the code in CallSiteSplitting around to reduce the
places where we have to pass in DTU. If necessary, I could split those
changes in a separate patch.

This fixes missing DT updates when dealing with musttail calls in
CallSiteSplitting, by using DTU->deleteBB.

Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki

Reviewed By: NutshellySima

llvm-svn: 346769
2018-11-13 17:54:43 +00:00
Sanjay Patel bcc5a74261 [InstCombine] add tests for funnel shift demanded bits; NFC
llvm-svn: 346762
2018-11-13 16:47:16 +00:00
Sanjay Patel 02f289e587 [InstCombine] add rotate variants that include select; NFC
llvm-svn: 346719
2018-11-12 23:58:59 +00:00
Sanjay Patel 35b1c2d19d [InstCombine] narrow width of rotate patterns, part 3
This is a longer variant for the pattern handled in
rL346713 
This one includes zexts. 

Eventually, we should canonicalize all rotate patterns 
to the funnel shift intrinsics, but we need a bit more
infrastructure to make sure the vectorizers handle those
intrinsics as well as the shift+logic ops.

https://rise4fun.com/Alive/FMn

Name: narrow rotateright
  %neg = sub i8 0, %shamt
  %rshamt = and i8 %shamt, 7
  %rshamtconv = zext i8 %rshamt to i32
  %lshamt = and i8 %neg, 7
  %lshamtconv = zext i8 %lshamt to i32
  %conv = zext i8 %x to i32
  %shr = lshr i32 %conv, %rshamtconv
  %shl = shl i32 %conv, %lshamtconv
  %or = or i32 %shl, %shr
  %r = trunc i32 %or to i8
  =>
  %maskedShAmt2 = and i8 %shamt, 7
  %negShAmt2 = sub i8 0, %shamt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shl2 = lshr i8 %x, %maskedShAmt2
  %shr2 = shl i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2
llvm-svn: 346716
2018-11-12 22:52:25 +00:00
Sanjay Patel 98e427ccf2 [InstCombine] narrow width of rotate patterns, part 2 (PR39624)
The sub-pattern for the shift amount in a rotate can take on
several different forms, and there's apparently no way to
canonicalize those without seeing the entire rotate sequence.

This is the form noted in:
https://bugs.llvm.org/show_bug.cgi?id=39624

https://rise4fun.com/Alive/qnT

  %zx = zext i8 %x to i32
  %maskedShAmt = and i32 %shAmt, 7
  %shl = shl i32 %zx, %maskedShAmt
  %negShAmt = sub i32 0, %shAmt
  %maskedNegShAmt = and i32 %negShAmt, 7
  %shr = lshr i32 %zx, %maskedNegShAmt
  %rot = or i32 %shl, %shr
  %r = trunc i32 %rot to i8
  =>
  %truncShAmt = trunc i32 %shAmt to i8
  %maskedShAmt2 = and i8 %truncShAmt, 7
  %shl2 = shl i8 %x, %maskedShAmt2
  %negShAmt2 = sub i8 0, %truncShAmt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shr2 = lshr i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2

llvm-svn: 346713
2018-11-12 22:11:09 +00:00
Sanjay Patel b32d03dfed [InstCombine] add more tests for rotate narrowing; NFC
llvm-svn: 346703
2018-11-12 20:32:59 +00:00
Sanjay Patel 8512e5909e [InstCombine] regenerate checks; NFC
llvm-svn: 346689
2018-11-12 18:41:08 +00:00
Simon Pilgrim 47d38198eb [CostModel] Add more realistic SK_InsertSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

llvm-svn: 346662
2018-11-12 15:20:24 +00:00
Sanjay Patel 1456fd7614 [VectorUtils] add funnel-shifts to the list of vectorizable intrinsics
This just identifies the intrinsics as candidates for vectorization.
It does not mean we will attempt to vectorize under normal conditions
(the test file is forcing vectorization). 

The cost model must be fixed to show that the transform is profitable 
in general.

Allowing vectorization with these intrinsics is required to avoid
potential regressions from canonicalizing to the intrinsics from
generic IR:
https://bugs.llvm.org/show_bug.cgi?id=37417

llvm-svn: 346661
2018-11-12 15:20:14 +00:00
Sanjay Patel 75120dcb06 [LoopVectorize] add tests for funnel shifts; NFC
llvm-svn: 346658
2018-11-12 14:52:01 +00:00
Max Kazantsev 7d49a3a816 [LICM] Hoist guards from non-header blocks
This patch relaxes overconservative checks on whether or not we could write
memory before we execute an instruction. This allows us to hoist guards out of
loops even if they are not in the header block.

Differential Revision: https://reviews.llvm.org/D50891
Reviewed By: fedor.sergeev

llvm-svn: 346643
2018-11-12 09:29:58 +00:00
Florian Hahn 9026d4ee9b [IPSCCP,PM] Preserve PDT in the new pass manager.
Reviewers: kuhar, chandlerc, NutshellySima, brzycki

Reviewed By: NutshellySima, brzycki

Differential Revision: https://reviews.llvm.org/D54317

llvm-svn: 346618
2018-11-11 20:22:45 +00:00
Sanjay Patel 3482801dea [InstCombine] auto-generate full checks; NFC
llvm-svn: 346594
2018-11-10 18:51:10 +00:00
Eli Friedman 15930bf352 [JumpThreading] Fix exponential time algorithm computing known values.
ComputeValueKnownInPredecessors has a "visited" set to prevent infinite
loops, since a value can be visited more than once.  However, the
implementation didn't prevent the algorithm from taking exponential
time. Instead of removing elements from the RecursionSet one at a time,
we should keep around the whole set until
ComputeValueKnownInPredecessors finishes, then discard it.

The testcase is synthetic because I was having trouble effectively
reducing the original.  But it's basically the same idea.

Instead of failing, we could theoretically cache the result instead.
But I don't think it would help substantially in practice.

Differential Revision: https://reviews.llvm.org/D54239

llvm-svn: 346562
2018-11-09 22:35:26 +00:00
Simon Pilgrim fc8f1d7da7 [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
2018-11-09 19:04:27 +00:00
Florian Hahn 9f878e9bae Revert r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site).
This cause a failure with EXPENSIVE_CHECKS

llvm-svn: 346492
2018-11-09 13:28:58 +00:00
Florian Hahn a1062f4b68 [IPSCCP,PM] Preserve DT in the new pass manager.
After D45330, Dominators are required for IPSCCP and can be preserved.

This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis.

Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima

Reviewed By: chandlerc, kuhar, NutshellySima

Differential Revision: https://reviews.llvm.org/D47259

llvm-svn: 346486
2018-11-09 11:52:27 +00:00
Florian Hahn 52578f95c9 [CallSiteSplitting] Only record conditions up to the IDom(call site).
We can stop recording conditions once we reached the immediate dominator
for the block containing the call site. Conditions in predecessors of the
that node will be the same for all paths to the call site and splitting
is not beneficial.

This patch makes CallSiteSplitting dependent on the DT anlysis. because
the immediate dominators seem to be the easiest way of finding the node
to stop at.

I had to update some exiting tests, because they were checking for
conditions that were true/false on all paths to the call site. Those
should now be handled by instcombine/ipsccp.

Reviewers: davide, junbuml

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D44627

llvm-svn: 346483
2018-11-09 10:23:46 +00:00
Florian Hahn a684a99441 [LoopInterchange] Support reductions across inner and outer loop.
This patch adds logic to detect reductions across the inner and outer
loop by following the incoming values of PHI nodes in the outer loop. If
the incoming values take part in a reduction in the inner loop or come
from outside the outer loop, we found a reduction spanning across inner
and outer loop.

With this change, ~10% more loops are interchanged in the LLVM
test-suite + SPEC2006.

Fixes https://bugs.llvm.org/show_bug.cgi?id=30472

Reviewers: mcrosier, efriedma, karthikthecool, davide, hfinkel, dmgreen

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D43245

llvm-svn: 346438
2018-11-08 20:44:19 +00:00
Pirama Arumuga Nainar e61652a384 [LTO] Drop non-prevailing definitions only if linkage is not local or appending
Summary:
This fixes PR 37422

In ELF, non-weak symbols can also be non-prevailing.  In this particular
PR, the __llvm_profile_* symbols are non-prevailing but weren't getting
dropped - causing multiply-defined errors with lld.

Also add a test, strong_non_prevailing.ll, to ensure that multiple
copies of a strong symbol are dropped.

To fix the test regressions exposed by this fix,
- do not mark prevailing copies for symbols with 'appending' linkage.
There's no one prevailing copy for such symbols.
- fix the prevailing version in dead-strip-fulllto.ll
- explicitly pass exported symbols to llvm-lto in fumcimport.ll and
funcimport_var.ll

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith,
dang, srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D54125

llvm-svn: 346436
2018-11-08 20:10:07 +00:00
Tom Stellard 28d662164d InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe
Summary:
When the 3rd argument to these intrinsics is zero, lowering them
to shift instructions produces poison values, since we end up with
shift amounts equal to the number of bits in the shifted value.  This
means we can only lower these intrinsics if we can prove that the
3rd argument is not zero.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D53739

llvm-svn: 346422
2018-11-08 17:57:57 +00:00
Vedant Kumar d6699423f1 [CodeExtractor] Mark functions noreturn when applicable
This eliminates the outlining penalty for llvm.trap/unreachable, because
callers no longer have to emit cleanup/ret instructions after calling an
outlined `noreturn` function.

rdar://45523626

llvm-svn: 346421
2018-11-08 17:57:09 +00:00
Max Kazantsev 266c087b9d Return "[IndVars] Smart hard uses detection"
The patch has been reverted because it ended up prohibiting propagation
of a constant to exit value. For such values, we should skip all checks
related to hard uses because propagating a constant is always profitable.

Differential Revision: https://reviews.llvm.org/D53691

llvm-svn: 346397
2018-11-08 11:54:35 +00:00
Gil Rapaport 7b88bab386 [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

This commit fixes a bug in the original patch (committed at r345114, reverted
at r345123).

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 346390
2018-11-08 09:01:19 +00:00
whitequark 73cb978495 [MergeFuncs] Improve ordering of equal functions
Summary:
MergeFunctions currently tries to process strong functions before
weak functions, because weak functions can simply call strong
functions, while a strong/weak function cannot call a weak function
(a backing strong function is needed).

This patch additionally tries to process external functions before
local functions, because we definitely have to keep the external
function, but may be able to drop the local one (and definitely
can if it is also unnamed_addr).

Unfortunately, this exposes an existing bug in the implementation:
The FnTree and FNodesInTree structures can currently go out of
sync in the case where two weak functions are merged, because the
function in FnTree/FNodesInTree is RAUWed. This leaves it behind in
FnTree (this is intended, as it is the strong backing function which
should be used for further merges), while it is replaced in
FNodesInTree (this is not intended).

This is fixed by switching FNodesInTree from using a ValueMap to
using a DenseMap of AssertingVH.

This exposes another minor issue: Currently FNodesInTree is not
cleared after MergeFunctions finishes running. Currently, this is
potentially dangerous (e.g. if something else wants to RAUW a function
with a non-function), but at the very least it is unnecessary/inefficient.
After the change to use AssertingVH it becomes more problematic,
because there are certainly passes that remove functions.

This issue is fixed by clearing FNodesInTree at the end of the pass.

Reviewers: jfb, whitequark

Reviewed By: whitequark

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D53271

llvm-svn: 346386
2018-11-08 03:58:01 +00:00
whitequark 3580ac6125 [MergeFuncs] Call removeUsers() prior to unnamed_addr RAUW
Summary:
For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities.

Fix this by calling removeUsers() prior to RAUW.

Reviewers: jfb, whitequark

Reviewed By: whitequark

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D53262

llvm-svn: 346385
2018-11-08 03:57:55 +00:00
Rong Xu fb4bcc452c [PGO] Exit early if all count values are zero
If all the edge counts for a function are zero, skip count population and
annotation, as nothing will happen. This can save some compile time.

Differential Revision: https://reviews.llvm.org/D54212

llvm-svn: 346370
2018-11-07 23:51:20 +00:00
Fedor Sergeev f9a02a7006 [SimpleLoopUnswitch] partial unswitch needs to be careful when replacing invariants with constants
When partial unswitch operates on multiple conditions at once, .e.g:
   if (Cond1 || Cond2 || NonInv) ...

it should infer (and replace) values for individual conditions only on one
side of unswitch and not another.

More precisely only these derivations hold true:
   (Cond1 || Cond2) == false  =>  Cond1 == Cond2 == false
   (Cond1 && Cond2) == true   =>  Cond1 == Cond2 == true

By the way we organize unswitching it means only replacing on "continue" blocks
and never on "unswitched" ones. Since trivial unswitch does not have "unswitched"
blocks it does not have this problem.

Fixes PR 39568.

Reviewers: chandlerc, asbirlea
Differential Revision: https://reviews.llvm.org/D54211

llvm-svn: 346350
2018-11-07 20:05:11 +00:00
Mandeep Singh Grang d47d188b6f [LoopSink] Do not sink instructions into non-cold blocks
Summary: This fixes PR39570.

Reviewers: danielcdh, rnk, bkramer

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54181

llvm-svn: 346337
2018-11-07 18:26:24 +00:00
Florian Hahn ac86038b40 [NewGVN] Make sure we do not add a user to itself.
If we simplify an instruction to itself, we do not need to add a user to
itself. For congruence classes with a defining expression, we already
use a similar logic.

Fixes PR38259.

Reviewers: davide, efriedma, mcrosier

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D51168

llvm-svn: 346335
2018-11-07 17:20:07 +00:00
Sanjay Patel 57a08b3343 [InstCombine] propagate FMF for fcmp+fabs folds
By morphing the instruction rather than deleting and creating a new one,
we retain fast-math-flags and potentially other metadata (profile info?).

llvm-svn: 346331
2018-11-07 16:15:01 +00:00
Sanjay Patel bb521e63af [InstCombine] peek through fabs() when checking isnan()
That should be the end of the missing cases for this fold.
See earlier patches in this series:
rL346321
rL346324

llvm-svn: 346327
2018-11-07 15:44:26 +00:00
Sanjay Patel d80ec9e11a [InstCombine] add tests for isnan(fabs(X)); NFC
llvm-svn: 346325
2018-11-07 15:36:23 +00:00
Sanjay Patel fa5f146872 [InstCombine] add folds for fcmp Pred fabs(X), 0.0
Similar to rL346321, we had folds for the ordered
versions of these compares already, so add the
unordered siblings for completeness.

llvm-svn: 346324
2018-11-07 15:33:03 +00:00
Sanjay Patel 16a527e7de [InstCombine] add tests for more fcmp+fabs preds; NFC
llvm-svn: 346323
2018-11-07 15:27:02 +00:00
James Y Knight 72f76bf230 Add support for llvm.is.constant intrinsic (PR4898)
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.

Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.

Updated from patch initially by Janusz Sobczak.

Differential Revision: https://reviews.llvm.org/D4276

llvm-svn: 346322
2018-11-07 15:24:12 +00:00
Sanjay Patel 76faf5145d [InstCombine] add fold for fabs(X) u< 0.0
The sibling fold for 'oge' --> 'ord' was already here,
but this half was missing. 

The result of fabs() must be positive or nan, so asking 
if the result is negative or nan is the same as asking 
if the result is nan.

This is another step towards fixing:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 346321
2018-11-07 15:11:32 +00:00
Sanjay Patel c006a0ad4b [InstCombine] add test for fcmp+fabs; NFC
llvm-svn: 346320
2018-11-07 15:01:09 +00:00
Sanjay Patel 46a2510d01 [InstCombine] add FMF to fcmp to show failure to propagate; NFC
llvm-svn: 346317
2018-11-07 14:44:09 +00:00
Sanjay Patel 7552d0d2e6 [InstCombine] do not shrink switch conditions to illegal types (PR29009)
This patch makes shrinking switch conditions less aggressive which was introduced by:
rL274233

Note that we have 2 new bugs to track potential follow-ups that might have solved PR29009
in different ways:
https://bugs.llvm.org/show_bug.cgi?id=39569
https://bugs.llvm.org/show_bug.cgi?id=39578

Patch by:
@dendibakh (Denis Bakhvalov)

Differential Revision: https://reviews.llvm.org/D54115

llvm-svn: 346315
2018-11-07 14:12:41 +00:00
Max Kazantsev 68b2ad7e63 [NFC] Add missing test case, some test renaming
llvm-svn: 346295
2018-11-07 05:58:10 +00:00
Vedant Kumar 1e209e284f [CodeExtractor] Do not extract calls to eh_typeid_for (PR39545)
The lowering for a call to eh_typeid_for changes when it's moved from
one function to another.

There are several proposals for fixing this issue in llvm.org/PR39545.
Until some solution is in place, do not allow CodeExtractor to extract
calls to eh_typeid_for, as that results in serious miscompilations.

llvm-svn: 346256
2018-11-06 19:06:08 +00:00
Vedant Kumar 09b7aa443d [CodeExtractor] Erase use-without-def debug intrinsics in parent func
When CodeExtractor moves instructions to a new function, debug
intrinsics referring to those instructions within the parent function
become invalid.

This results in the same verifier failure which motivated r344545, about
function-local metadata being used in the wrong function.

llvm-svn: 346255
2018-11-06 19:05:53 +00:00
Eli Friedman e3a5fc6d80 Disable calls to *_finite and other glibc-only functions on Musl.
Non-GNU environments don't have __finite_*, so treat them as
unavailable.

Differential Revision: https://reviews.llvm.org/D51282

llvm-svn: 346250
2018-11-06 18:23:32 +00:00
Sanjay Patel 724014adde [InstCombine] allow vector types for fcmp+fpext fold
llvm-svn: 346245
2018-11-06 17:20:20 +00:00
Sanjay Patel db272b3720 [InstCombine] add vector test for fcmp+fpext; NFC
llvm-svn: 346243
2018-11-06 17:06:58 +00:00
Sanjay Patel 46bf3922c1 [InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2
llvm-svn: 346242
2018-11-06 16:45:27 +00:00
Sanjay Patel 1b85f00201 [InstCombine] propagate fast-math-flags when folding fcmp+fpext
llvm-svn: 346240
2018-11-06 16:23:03 +00:00
Sanjay Patel 6aea3071e8 [InstCombine] adjust tests to show dropping FMF; NFC
llvm-svn: 346239
2018-11-06 16:07:39 +00:00
Sanjay Patel 2fd5b0ebfb [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
llvm-svn: 346238
2018-11-06 15:58:57 +00:00
Sanjay Patel a166d19d93 [InstCombine] adjust tests to show dropping FMF; NFC
Also, remove some stale FIXME comments ( rL346234 ).

llvm-svn: 346236
2018-11-06 15:57:52 +00:00
Sanjay Patel 70282a0501 [InstCombine] propagate fast-math-flags when folding fcmp+fneg
This is another part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

This might be enough to fix that particular issue, but as noted
with the FIXME, we're still dropping FMF on other folds around here.

llvm-svn: 346234
2018-11-06 15:49:45 +00:00
Sanjay Patel be985e33f0 [InstCombine] add tests for FMF propagation failure; NFC
llvm-svn: 346232
2018-11-06 15:21:44 +00:00
Simon Pilgrim c1da5f757e [InstCombine] Ensure nested shifts are in range (OSS-Fuzz #9880)
llvm-svn: 346225
2018-11-06 11:28:22 +00:00
Max Kazantsev 69f6dfa0f8 [LICM] Use ICFLoopSafetyInfo in LICM
This patch makes LICM use `ICFLoopSafetyInfo` that is a smarter version
of LoopSafetyInfo that leverages power of Implicit Control Flow Tracking
to keep track of throwing instructions and give less pessimistic answers
to queries related to throws.

The ICFLoopSafetyInfo itself has been introduced in rL344601. This patch
enables it in LICM only.

Differential Revision: https://reviews.llvm.org/D50377
Reviewed By: apilipenko

llvm-svn: 346201
2018-11-06 02:44:49 +00:00
Max Kazantsev c210c65e77 [NFC] Add motivating test case for revert in rL346198
llvm-svn: 346199
2018-11-06 02:12:44 +00:00
Max Kazantsev e059f4452b Revert "[IndVars] Smart hard uses detection"
This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402.

It seems that the check that we still should do the transform if we
know the result is constant is missing in this code. So the logic that
has been deleted by this change is still sometimes accidentally useful.
I revert the change to see what can be done about it. The motivating
case is the following:

@Y = global [400 x i16] zeroinitializer, align 1

define i16 @foo() {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %i = phi i16 [ 0, %entry ], [ %inc, %for.body ]

  %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i
  store i16 0, i16* %arrayidx, align 1
  %inc = add nuw nsw i16 %i, 1
  %cmp = icmp ult i16 %inc, 400
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  %inc.lcssa = phi i16 [ %inc, %for.body ]
  ret i16 %inc.lcssa
}

We should be able to figure out that the result is constant, but the patch
breaks it.

Differential Revision: https://reviews.llvm.org/D51584

llvm-svn: 346198
2018-11-06 02:02:05 +00:00
Sanjay Patel 1440107821 [InstSimplify] fold select (fcmp X, Y), X, Y
This is NFCI for InstCombine because it calls InstSimplify, 
so I left the tests for this transform there. As noted in
the code comment, we can allow this fold more often by using
FMF and/or value tracking.

llvm-svn: 346169
2018-11-05 21:51:39 +00:00
Sanjay Patel 72c2d355b7 [InstSimplify] add tests for select+fcmp; NFC
These are translated from InstCombine's test file with the same name.
We should move the transform from InstCombine to InstSimplify.

llvm-svn: 346168
2018-11-05 21:42:01 +00:00
Taewook Oh 2b7ae47ccb [MergeICmps] Do not perform the transformation if GEP is used outside of block
Summary:
This patch prevents MergeICmps to performn the transformation if the address operand GEP of the load instruction has a use outside of the load's parent block. Without this patch, compiler crashes with the given test case because the use of `%first.i` is still around when the basic block is erased from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/MergeICmps.cpp#L620. I think checking `isUsedOutsideOfBlock` with `GEP` is the original intention of the code, as the checking for `LoadI` is already performed in the same function.

This patch is incomplete though, as this makes the pass overly conservative and fails the test `tuple-four-int8.ll`. I believe what needs to be done is checking if GEP has a use outside of block that is not the part of "Comparisons" chain. Submit the patch as of now to prevent compiler crash.

Reviewers: courbet, trentxintong

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54089

llvm-svn: 346151
2018-11-05 18:16:32 +00:00
Sanjay Patel 1cfba9b5ed [InstCombine] add/adjust tests for fcmp+select substitution; NFC
There was no coverage for at least 2 out of the 4 patterns because
of fcmp canonicalization. The tests and code should be moved to
InstSimplify in a follow-up because this doesn't create any new values.

llvm-svn: 346150
2018-11-05 18:09:10 +00:00
Sanjay Patel c26fd1e772 [InstCombine] canonicalize -0.0 to +0.0 in fcmp
As stated in IEEE-754 and discussed in:
https://bugs.llvm.org/show_bug.cgi?id=38086
...the sign of zero does not affect any FP compare predicate.

Known regressions were fixed with:
rL346097 (D54001)
rL346143

The transform will help reduce pattern-matching complexity to solve:
https://bugs.llvm.org/show_bug.cgi?id=39475
...as well as improve CSE and codegen (a zero constant is almost always
easier to produce than 0x80..00).

llvm-svn: 346147
2018-11-05 17:26:42 +00:00
Sanjay Patel 87aa10062c [InstCombine] loosen FP 0.0 constraint for fcmp+select substitution
It looks like we correctly removed edge cases with 0.0 from D50714,
but we were a bit conservative because getBinOpIdentity() doesn't
distinguish between +0.0 and -0.0 and 'nsz' is effectively always
true for fcmp (see discussion in:
https://bugs.llvm.org/show_bug.cgi?id=38086

Without this change, we would get regressions by canonicalizing
to +0.0 in all fcmp, and that's a step towards solving:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 346143
2018-11-05 16:50:44 +00:00
Sanjay Patel 8b2a1f7fd9 [InstCombine] adjust tests for select with FP identity op; NFC
These are mislabeled as negative tests.

llvm-svn: 346142
2018-11-05 16:27:03 +00:00
Sanjay Patel 92a53eabc6 [InstCombine] add/adjust tests for select with fsub identity op; NFC
llvm-svn: 346138
2018-11-05 15:45:01 +00:00
Sanjay Patel 278db2fba1 [InstCombine] add tests for select with FP identity op; NFC
llvm-svn: 346136
2018-11-05 15:08:36 +00:00
David Green ba9f245b0d [Inliner] Penalise inlining of calls with loops at Oz
We currently seem to underestimate the size of functions with loops in them,
both in terms of absolute code size and in the difficulties of dealing with
such code. (Calls, for example, can be tail merged to further reduce
codesize). At -Oz, we can then increase code size by inlining small loops
multiple times.

This attempts to penalise functions with loops at -Oz by adding a CallPenalty
for each top level loop in the function. It uses LI (and hence DT) to calculate
the number of loops. As we are dealing with minsize, the inline threshold is
small and functions at this point should be relatively small, making the
construction of these cheap.

Differential Revision: https://reviews.llvm.org/D52716

llvm-svn: 346134
2018-11-05 14:54:34 +00:00
Vedant Kumar d2a895a972 [HotColdSplitting] Use TTI to inform outlining threshold
Using TargetTransformInfo allows the splitting pass to factor in the
code size cost of instructions as it decides whether or not outlining is
profitable.

This did not regress the overall amount of outlining seen on the handful
of internal frameworks I tested.

Thanks to Jun Bum Lim for suggesting this!

Differential Revision: https://reviews.llvm.org/D53835

llvm-svn: 346108
2018-11-04 23:11:57 +00:00
Sanjay Patel e7c94ef1de [ValueTracking] determine sign of 0.0 from select when matching min/max FP
In PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
..we may fail to recognize/simplify fabs() in some cases because we do not 
canonicalize fcmp with a -0.0 operand.

Adding that canonicalization can cause regressions on min/max FP tests, so 
that's this patch: for the purpose of determining whether something is min/max, 
let the value returned by the select determine how we treat a 0.0 operand in the fcmp.

This patch doesn't actually change the -0.0 to +0.0. It just changes the analysis, so 
we don't fail to recognize equivalent min/max patterns that only differ in the 
signbit of 0.0.

Differential Revision: https://reviews.llvm.org/D54001

llvm-svn: 346097
2018-11-04 14:28:48 +00:00
Sanjay Patel cac28b452e [ValueTracking] peek through 2-input shuffles in ComputeNumSignBits
This patch gives the IR ComputeNumSignBits the same functionality as the 
DAG version (the code is derived from the existing code).

This an extension of the single input shuffle analysis added with D53659.

Differential Revision: https://reviews.llvm.org/D53987

llvm-svn: 346071
2018-11-03 13:18:55 +00:00
Jordan Rupprecht 80e7e86c29 [DebugInfo][InstMerge] Fix -debugify for phi node created by -mldst-motion
Summary:
-mldst-motion creates a new phi node without any debug info. Use the merged debug location from the incoming stores to fix this.

Fixes PR38177. The test case here is (somewhat) simplified from:

```
struct S {
  int foo;
  void fn(int bar);
};
void S::fn(int bar) {
  if (bar)
    foo = 1;
  else
    foo = 0;
}
```

Reviewers: dblaikie, gbedwell, aprantl, vsk

Reviewed By: vsk

Subscribers: vsk, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D54019

llvm-svn: 346027
2018-11-02 18:25:41 +00:00
Jonas Paulsson 79f2441eee [SystemZ] Rework getInterleavedMemoryOpCost()
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.

This makes it more readable and just slightly more accurate generally.

Review: Ulrich Weigand
https://reviews.llvm.org/D53071

llvm-svn: 345998
2018-11-02 17:15:36 +00:00
Ayal Zaks 45a3ca7be7 [LV] Avoid vectorizing loops under opt for size that involve SCEV checks
Fix PR39417, PR39497

The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PR's showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.

Differential Revision: https://reviews.llvm.org/D53612

llvm-svn: 345959
2018-11-02 09:16:12 +00:00
Florian Hahn c8bd6ea35e [LoopInterchange] Remove support for inner-only reductions.
Inner-loop only reductions require additional checks to make sure they
form a load-phi-store cycle across inner and outer loop. Otherwise the
reduction value is not properly preserved. This patch disables
interchanging such loops for now, as it causes miscompiles in some
cases and it seems to apply only for a tiny amount of loops. Across the
test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are
interchange with inner loop reduction support disabled. With
-loop-interchange-threshold=-1000, 3256 instead of 3267.

See the discussion and history of D53027 for an outline of how such legality
checks could look like.

Reviewers: efriedma, mcrosier, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D53027

llvm-svn: 345877
2018-11-01 19:25:00 +00:00
Sanjay Patel 73bb119940 [InstCombine] add test for ComputeNumSignBits on 2-input shuffle; NFC
llvm-svn: 345852
2018-11-01 16:57:54 +00:00
Sanjay Patel 746ebb4ee8 [InstSimplify] fold icmp based on range of abs/nabs (2nd try)
This is retrying the fold from rL345717 
(reverted at rL347780)
...with a fix for the miscompile
demonstrated by PR39510:
https://bugs.llvm.org/show_bug.cgi?id=39510

Original commit message:

This is a fix for PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

We managed to get some of these patterns using computeKnownBits in https://reviews.llvm.org/D47041, but that
can't be used for nabs(). Instead, put in some range-based logic, so we can fold
both abs/nabs with icmp with a constant value.

Alive proofs:
https://rise4fun.com/Alive/21r

Name: abs_nsw_is_positive

  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp sgt i32 %abs, -1
    =>
  %r = i1 true


Name: abs_nsw_is_not_negative

  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp slt i32 %abs, 0
    =>
  %r = i1 false


Name: nabs_is_negative_or_0

  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp slt i32 %nabs, 1
    =>
  %r = i1 true

Name: nabs_is_not_over_0

  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp sgt i32 %nabs, 0
    =>
  %r = i1 false

Differential Revision: https://reviews.llvm.org/D53844

llvm-svn: 345832
2018-11-01 14:07:39 +00:00
Sanjay Patel 056807b01e [InstSimplify] add tests for icmp fold bug (PR39510); NFC
Verify that set intersection/subset are not confused.

llvm-svn: 345831
2018-11-01 14:03:22 +00:00
Max Kazantsev 3d347bf545 [IndVars] Smart hard uses detection
When rewriting loop exit values, IndVars considers this transform not profitable if
the loop instruction has a loop user which it believes cannot be optimized away.
In current implementation only calls that immediately use the instruction are considered
as such.

This patch extends the definition of "hard" users to any side-effecting instructions
(which usually cannot be optimized away from the loop) and also allows handling
of not just immediate users, but use chains.

Differentlai Revision: https://reviews.llvm.org/D51584
Reviewed By: etherzhhb

llvm-svn: 345814
2018-11-01 06:47:01 +00:00
Sanjay Patel 72fe03f93b revert rL345717 : [InstSimplify] fold icmp based on range of abs/nabs
This can miscompile as shown in PR39510:
https://bugs.llvm.org/show_bug.cgi?id=39510

llvm-svn: 345780
2018-10-31 21:37:40 +00:00
Sanjay Patel b041831a1a [InstCombine] add tests for fmin/fmax pattern matching failure; NFC
llvm-svn: 345771
2018-10-31 20:03:27 +00:00
Sanjay Patel 886893883a [InstCombine] regenerate test checks; NFC
llvm-svn: 345757
2018-10-31 18:17:51 +00:00
Sanjay Patel 5bcec66c55 [InstCombine] add tests for fcmp with -0.0; NFC
From IEEE754: "Comparisons shall ignore the sign of zero (so +0 = −0)."

llvm-svn: 345752
2018-10-31 17:55:40 +00:00
Volkan Keles 3ca146d083 [InstCombine] Combine nested min/max intrinsics with constants
Reviewers: arsenm, spatel

Reviewed By: spatel

Subscribers: lebedev.ri, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53774

llvm-svn: 345751
2018-10-31 17:50:52 +00:00
Sanjay Patel 1c254c6716 [InstCombine] refactor fabs+fcmp fold; NFC
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.

llvm-svn: 345734
2018-10-31 16:34:43 +00:00
Sanjay Patel d4dc30c20d [InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negative
This is the inverted case for the transform added with D53874 / rL345725.

llvm-svn: 345728
2018-10-31 15:35:46 +00:00
Sanjay Patel 85cba3b6fb [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the 
transform.

This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

Differential Revision: https://reviews.llvm.org/D53874

llvm-svn: 345725
2018-10-31 14:57:23 +00:00
Fedor Sergeev 412ed34744 [LoopUnroll] allow customization for new-pass-manager version of LoopUnroll
Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.

In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.

Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.

Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440

llvm-svn: 345723
2018-10-31 14:33:14 +00:00
Sanjay Patel 1cd9917edf [InstSimplify] add tests for fcmp and known positive; NFC
llvm-svn: 345722
2018-10-31 14:29:21 +00:00
Sanjay Patel 2efccd2cf2 [InstSimplify] fold icmp based on range of abs/nabs
This is a fix for PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

We managed to get some of these patterns using computeKnownBits in D47041, but that 
can't be used for nabs(). Instead, put in some range-based logic, so we can fold 
both abs/nabs with icmp with a constant value.

Alive proofs:
https://rise4fun.com/Alive/21r

Name: abs_nsw_is_positive
  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp sgt i32 %abs, -1
    =>
  %r = i1 true
 
Name: abs_nsw_is_not_negative
  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp slt i32 %abs, 0
    =>
  %r = i1 false
 
Name: nabs_is_negative_or_0
  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp slt i32 %nabs, 1
    =>
  %r = i1 true

Name: nabs_is_not_over_0
  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp sgt i32 %nabs, 0
    =>
  %r = i1 false

Differential Revision: https://reviews.llvm.org/D53844

llvm-svn: 345717
2018-10-31 13:25:10 +00:00
Max Kazantsev ea35455d9e [NFC] Add tests for loop-simplifycfg for further development
llvm-svn: 345713
2018-10-31 11:28:23 +00:00
Max Kazantsev 541f824d32 [IndVars] Strengthen restricton in rewriteLoopExitValues
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.

There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.

It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.

Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb

llvm-svn: 345708
2018-10-31 10:30:50 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Quentin Colombet 900678227c [InstCombine] Teach the move free before null test opti how to deal with noop casts
InstCombine features an optimization that essentially replaces:
if (a)
  free(a)
into:
free(a)

Right now, this optimization is gated by the minsize attribute and therefore
we only perform it if we can prove that we are going to be able to eliminate
the branch and the destination block.

However when casts are involved the optimization would fail to apply, because
the optimization was not smart enough to realize that it is possible to also
move the casts away from the destination block and that is harmless to the
performance since they are just noops.
E.g.,
foo(int *a)
if (a)
  free((char*)a)

Wouldn't be optimized by instcombine, because
- We would refuse to hoist the `bitcast i32* %a to i8` in the source block
- We would fail to see that `bitcast i32* %a to i8` and %a are the same value.

This patch fixes both these problems:
- It teaches the pattern matching of the comparison how to look
  through casts.
- It checks that whether the additional instruction in the destination block
  can be hoisted and are harmless performance-wise.
- It hoists all the code of the destination block in the source block.

Differential Revision: D53356

llvm-svn: 345644
2018-10-30 20:51:04 +00:00
Volkan Keles 5a672b22e8 [InstCombine] Add preliminary tests for nested min/max combines. NFC
Summary: As requested in D53774.

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53875

llvm-svn: 345616
2018-10-30 17:51:14 +00:00
Sanjay Patel bcde5afcac [InstSimplify] add tests for fcmp folds; NFC
This is part of a problem noted in PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 345615
2018-10-30 16:58:43 +00:00
Sanjay Patel b12e410082 [InstCombine] try to turn shuffle into insertelement
shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'

The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of 
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start 
with or instcombine has converted insert/extract to shuffles.

Independent of that, an insertelement is always a simpler op for IR 
analysis vs. a shuffle, so we should transform to insert when possible.

I don't think there's any codegen concern here - if a target can't insert 
a scalar directly to some fixed element in a vector (x86?), then this 
should get expanded to the insert+shuffle that we started with.

Differential Revision: https://reviews.llvm.org/D53507

llvm-svn: 345607
2018-10-30 15:26:39 +00:00
Jonas Paulsson 1f067c94dc [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

llvm-svn: 345603
2018-10-30 14:34:15 +00:00
Nicola Zaghen f96383c99e [SROA] Use offset sizes from the DataLayout instead of the pointer siezes.
This fixes an assertion when constant folding a GEP when the part of the offset
was in i32 (IndexSize, as per DataLayout) and part in the i64 (PointerSize) in
the newly created test case.

Differential Revision: https://reviews.llvm.org/D52609

llvm-svn: 345585
2018-10-30 11:15:04 +00:00
Sanjay Patel 33603198f2 [InstSimplify] add tests for abs/nabs+icmp folding; NFC
llvm-svn: 345541
2018-10-29 21:05:41 +00:00
Fedor Sergeev f923e86205 [LoopUnroll] NFC. Factor out runtime-loop.ll common test behavior.
Adding COMMON prefix to get common part handled there.
Needed to simplify test changes for D53440.

llvm-svn: 345538
2018-10-29 20:38:23 +00:00
Vedant Kumar dd4be53b20 [HotColdSplitting] Allow outlining single-block cold regions
It can be profitable to outline single-block cold regions because they
may be large.

Allow outlining single-block regions if they have over some threshold of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.

In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.

Differential Revision: https://reviews.llvm.org/D53824

llvm-svn: 345524
2018-10-29 19:15:39 +00:00
Renato Golin 53bd4f4832 Revert r344172: [LV] Add a new reduction pattern match
This patch has caused fast-math issues in the reduction pattern.

Will re-work and land again.

llvm-svn: 345465
2018-10-27 22:13:43 +00:00
Florian Hahn fc7654a67b [Local] Keep K's range if K does not move when combining metadata.
As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.

In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.

I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.

Reviewers: efriedma, nlopes, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D51629

llvm-svn: 345456
2018-10-27 16:53:45 +00:00
Sanjay Patel cc9e401e3c [ValueTracking] peek through shuffles in ComputeNumSignBits (PR37549)
The motivating case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The analysis improvement allows us to form a vector 'select' out of 
bitwise logic (the use of ComputeNumSignBits was added at rL345149).

The smaller test shows another InstCombine improvement - we use 
ComputeNumSignBits to add 'nsw' to shift-left. But the negative
test shows an example where we must not add 'nsw' - when the shuffle
mask contains undef elements.

Differential Revision: https://reviews.llvm.org/D53659

llvm-svn: 345429
2018-10-26 21:05:14 +00:00
Christy Lee 3cc0e935c4 Pointer types were treated as zero-size by MergeICmps
Summary:
The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size.
Found with "self-build" of clang/LLVM on Windows.

Reviewers: christylee, trentxintong, courbet

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53536

llvm-svn: 345413
2018-10-26 18:02:06 +00:00
Max Kazantsev 619a83463f [SimpleLoopUnswitch] Unswitch by experimental.guard intrinsics
This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial
simple loop unswitching. These intrinsics represent implicit control flow which
has pretty much the same semantics as usual conditional branches. The
algorithm of dealing with them is following:

- Consider guards as unswitching candidates;
- If a guard is considered the best candidate, turn it into a branch;
- Apply normal unswitching algorithm on this branch.

The patch has no compile time effect on code that does not contain any guards.

Differential Revision: https://reviews.llvm.org/D53744
Reviewed By: chandlerc

llvm-svn: 345387
2018-10-26 14:20:11 +00:00
Cameron McInally 384a74b0e6 [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
separate FNeg from the FSub IR instruction.

Differential Revision: https://reviews.llvm.org/D53650

llvm-svn: 345295
2018-10-25 18:09:33 +00:00
Sam Parker a16667e79b [ARM] Use Cortex-A57 sched model for Cortex-A72
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.

Differential Revision: https://reviews.llvm.org/D53562

llvm-svn: 345272
2018-10-25 15:08:29 +00:00
Simon Pilgrim 53e8e145e9 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Simon Pilgrim 0573b8d8b6 [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Gabor Buella 1f6ca0ba15 Add -instcombine-code-sinking option
Reviewers: craig.topper, andrew.w.kaylor, efriedma

Reviewed By: craig.topper, andrew.w.kaylor, efriedma

Differential Revision: https://reviews.llvm.org/D52709

llvm-svn: 345248
2018-10-25 08:32:29 +00:00
Alina Sbirlea ad4d018202 Update MemorySSA in LoopRotate.
Summary:
Teach LoopRotate to preserve MemorySSA.
Enable tests for correctness, dependency disabled by default.

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D51718

llvm-svn: 345216
2018-10-24 22:46:45 +00:00
Vedant Kumar c299006879 [HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages:

  1) Identify cold blocks, then
  2) Use forward/backward propagation to mark hot blocks, then
  3) Grow a SESE region of blocks *outside* of the set of hot blocks and
  start outlining.

While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:

  - An inconsistency can arise in the internal state of the hotness
  propagation stage, as a block may end up in both the ColdBlocks set
  and the HotBlocks set. Further inconsistencies can arise as these sets
  do not match what's in ProfileSummaryInfo.

  - It isn't necessary to limit outlining to single-exit regions.

This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.

Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.

Results:
  - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
  47KB pre-patch, or a ~3x improvement. Did not see a performance impact
  across two runs.
  - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
  of TEXT were outlined. Ditto re: performance impact.
  - Outlining results improve marginally in the internal frameworks I
  tested.

Follow-ups:
  - Outline more than once per function, outline large single basic
  blocks, & try to remove unconditional branches in outlined functions.

Differential Revision: https://reviews.llvm.org/D53627

llvm-svn: 345209
2018-10-24 22:15:41 +00:00
Sanjay Patel e9f9a2a29c [InstCombine] add test for fptrunc with vector with undef elt; NFC
This should be fixed with D53650.

llvm-svn: 345206
2018-10-24 22:02:05 +00:00
Teresa Johnson c8dba682bb [hot-cold-split] Name split functions with ".cold" suffix
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.

Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.

Reviewers: sebpop, hiraditya

Subscribers: llvm-commits, erik.pilkington

Differential Revision: https://reviews.llvm.org/D53534

llvm-svn: 345178
2018-10-24 18:53:47 +00:00
Sanjay Patel d1fe437cf1 [InstCombine] add test for ComputeNumSignBits with shuffle; NFC
llvm-svn: 345162
2018-10-24 17:01:42 +00:00
Sanjay Patel 2169b9c976 [InstCombine] add test for select with shuffled condition (PR37549); NFC
llvm-svn: 345156
2018-10-24 16:21:23 +00:00
Sanjay Patel 3b206305fd [InstCombine] try harder to form select from logic ops (2nd try)
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits(). 

The tests that caused bot failures for the previous commit are 
over-reaching front-end tests that run the entire -O optimizer 
pipeline: 
    Clang :: CodeGen/builtins-systemz-zvector.c
    Clang :: CodeGen/builtins-systemz-zvector2.c

I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.

Original commit message:

This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).

The backend has hooks/logic to convert back to logic ops
if that's better for the target.

llvm-svn: 345149
2018-10-24 15:17:56 +00:00
Cameron McInally 678f43f666 [FPEnv] Convert more BinaryOperator::isFNeg(...) to m_FNeg(...)
This work is to avoid regressions when we seperate FNeg from the FSub IR instruction. 

Differential Revision: https://reviews.llvm.org/D53205

llvm-svn: 345146
2018-10-24 14:45:18 +00:00
Gil Rapaport c523036fd2 Revert r345114
Investigating fails.

llvm-svn: 345123
2018-10-24 08:41:22 +00:00
Dorit Nuzman 5114390e48 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559

llvm-svn: 345115
2018-10-24 07:11:38 +00:00
Gil Rapaport 5012e7f6ac [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 345114
2018-10-24 07:08:38 +00:00
Wei Mi 80a0c97e07 [PM] keeping history when original SCC split and then merge into itself
in the same round of SCC update.

In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.

We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.

What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.

Differential Revision: https://reviews.llvm.org/D52915

llvm-svn: 345103
2018-10-23 23:29:45 +00:00
Vedant Kumar 503154615d [HotColdSplitting] Attach MinSize to outlined code
Outlined code is cold by assumption, so it makes sense to optimize it
for minimal code size rather than performance.

After r344869 moved the splitting pass to the end of the IR pipeline,
this does not result in much of a code size reduction. This is probably
because a comparatively small number backend transforms make use of the
MinSize hint.

Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of
919 bytes of TEXT reduction. I didn't measure a significant performance
impact.

Differential Revision: https://reviews.llvm.org/D53518

llvm-svn: 345072
2018-10-23 19:41:12 +00:00
Jordan Rupprecht 2fed6ac186 [DebugInfo][GlobalOpt] Fix -debugify for globalopt shrinking globals to booleans.
Summary:
TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass.

Fixes PR37959. The test case here is the simplified .ll for:

```
static int foo;
int bar() {
  foo = 5;
  return foo;
}
```

Reviewers: dblaikie, gbedwell, aprantl

Reviewed By: dblaikie

Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D53531

llvm-svn: 345046
2018-10-23 16:35:51 +00:00
Sanjay Patel 5b6b090cf2 [Reassociate] replace fake binop queries with 'match' API
We need to update this code before introducing an 'fneg' instruction in IR,
so we might as well kill off the integer neg/not queries too.

This is no-functional-change-intended for scalar code and most vector code. 
For vectors, we can see that the 'match' API allows for undef elements in 
constants, so we optimize those cases better.

Ideally, there would be a test for each code diff, but I don't see evidence
of that for the existing code, so I didn't try very hard to come up with new 
vector tests for each code change.

Differential Revision: https://reviews.llvm.org/D53533

llvm-svn: 345042
2018-10-23 15:55:06 +00:00
Simon Pilgrim 532a0f122e [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions
Expand arithmetic reduction to include mul/and/or/xor instructions.

This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.

This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.

Differential Revision: https://reviews.llvm.org/D53473

llvm-svn: 345037
2018-10-23 15:13:09 +00:00