Commit Graph

49804 Commits

Author SHA1 Message Date
Max Kazantsev a13e163a27 [RewriteStatepoints] Fix incorrect assertion
`RewriteStatepointsForGC` iterates over function blocks and their predecessors
in order of declaration. One of the outcomes of this is that callsites are placed in
arbitrary order which has nothing to do with traversal order.

On the other hand, function `recomputeLiveInValues` asserts that bases are
added to `Info.PointerToBase` before their derived pointers are updated. But
if call sites are processed in an order different from RPOT, this is not necessarily
true. We cannot guarantee that the base was placed there before every
pointer derived from it. All we can guarantee is that this base was marked as
a known base by this point.

This patch changes the assertion from checking that the base was added to the
map to checking that the base was marked as a known base.
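
A minimal toy model of the change, using plain C++ containers instead of the real
RewriteStatepointsForGC data structures (the names below are illustrative only):

  #include <cassert>
  #include <map>
  #include <set>
  #include <string>

  // Toy stand-ins for Info.PointerToBase and the set of values already
  // identified as known bases.
  using PointerToBaseMap = std::map<std::string, std::string>;
  using KnownBaseSet = std::set<std::string>;

  void recordDerivedPointer(const std::string &Derived, const std::string &Base,
                            PointerToBaseMap &PointerToBase,
                            const KnownBaseSet &KnownBases) {
    // Old check - too strong when call sites are not visited in RPOT order:
    //   assert(PointerToBase.count(Base) && "base must already be mapped");
    // New check - all we can rely on is that Base was marked as a known base:
    assert(KnownBases.count(Base) && "derived pointer from an unknown base");
    PointerToBase[Derived] = Base;
  }

  int main() {
    KnownBaseSet KnownBases{"base"};
    PointerToBaseMap PointerToBase;
    recordDerivedPointer("derived", "base", PointerToBase, KnownBases);
  }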

Differential Revision: https://reviews.llvm.org/D41593

llvm-svn: 321517
2017-12-28 12:03:12 +00:00
Simon Pilgrim 62411e4d4f [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros
If the v4i32 elements have 17 or more leading zeros, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow.

The 17 bits need to be zero because PMADDWD performs a v8i16 signed-mul-extend + pairwise-add: the upper 16 bits must be zero so we are adding a zero pair, and the 17th bit must be zero so we don't incorrectly sign-extend.
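
A small standalone C++ illustration of the reasoning (not the lowering code itself;
the helper below just models what PMADDWD computes for one pair of lanes):

  #include <cassert>
  #include <cstdint>

  // Models one 32-bit result lane of PMADDWD: multiply the signed 16-bit
  // halves and add the two products.
  static int32_t pmaddwd_lane(uint32_t a, uint32_t b) {
    int16_t alo = (int16_t)(a & 0xFFFF), ahi = (int16_t)(a >> 16);
    int16_t blo = (int16_t)(b & 0xFFFF), bhi = (int16_t)(b >> 16);
    return (int32_t)alo * blo + (int32_t)ahi * bhi;
  }

  int main() {
    // With at least 17 leading zeros the values fit in 15 bits, so the high
    // halves are zero and the low halves are non-negative int16 values.
    for (uint32_t a = 0; a < (1u << 15); a += 97)
      for (uint32_t b = 0; b < (1u << 15); b += 89)
        assert(pmaddwd_lane(a, b) == (int32_t)(a * b));
    // With only 16 leading zeros (e.g. 0x8000) the low half would be
    // interpreted as a negative int16 and the result would be wrong.
    assert(pmaddwd_lane(0x8000u, 3u) != (int32_t)(0x8000u * 3u));
  }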

Differential Revision: https://reviews.llvm.org/D41484

llvm-svn: 321516
2017-12-28 10:05:49 +00:00
Simon Pilgrim 472689a159 [InstCombine] Check for isa<Instruction> before using cast<>
Protects against casts from constexpr etc.

Reduced from oss-fuzz #4788 test case

llvm-svn: 321515
2017-12-28 09:35:35 +00:00
Reid Kleckner 6d31001cd6 Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks"
This reverts r321138. It seems there are still underlying issues with
memdep. PR35519 seems to still be present if debug info is enabled. We
end up losing a memcpy. Somehow during store to memset merging, we
insert the memset after the memcpy or fail to update the memdep analysis
to account for the newly inserted memset of a pair.

Reduced test case:

  #include <assert.h>
  #include <stdio.h>
  #include <string>
  #include <utility>
  #include <vector>

  void do_push_back(
      std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
    crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
  }

  int __attribute__((optnone)) main() {
    // Put some data in the vector and then remove it so we take the push_back
    // fast path.
    std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
    crl_set.push_back({"asdf", {}});
    crl_set.pop_back();
    printf("first word in vector storage: %p\n", *(void**)crl_set.data());

    // Do the push_back which may fail to initialize the data.
    do_push_back(&crl_set);
    auto* first = &crl_set.back().first;
    printf("first word in vector storage (should be zero): %p\n",
           *(void**)crl_set.data());
    assert(first->empty());
    puts("ok");
  }

Compile with libc++, enable optimizations, and enable debug info:
$ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib

This program will assert with this change.

llvm-svn: 321510
2017-12-28 05:10:33 +00:00
Sanjay Patel 84d54c3d8f [InstCombine] add tests for min/max folds (PR35717); NFC
llvm-svn: 321500
2017-12-27 21:55:06 +00:00
Petr Hosek ad6f457c39 [llvm-readobj] Support -needed-libs option for COFF files
This implements the -needed-libs option in the COFF dumper.

Differential Revision: https://reviews.llvm.org/D41529

llvm-svn: 321498
2017-12-27 19:59:56 +00:00
Andrew V. Tischenko 428e302f4d A special test to demonstrate debug logging for asm matcher.
llvm-svn: 321497
2017-12-27 19:25:21 +00:00
Craig Topper 72bbbeb2a7 [X86] Reimplement r321437 using custom lowering instead of as a DAG combine.
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.

So just do it in LowerMUL where we can catch more cases.

llvm-svn: 321496
2017-12-27 19:09:40 +00:00
Matthew Simpson 9439f54902 [AArch64] Change order of candidate FMLS patterns
r319980 added new patterns to the machine combiner for transforming (fsub (fmul
x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source
operand is an fmul are transformed. We previously only matched the case where
the second source operand of an fsub was an fmul, transforming (fsub z (fmul x
y)) into (fmls z x y). Now, if we have an fsub where both source operands are
fmuls, both of the above patterns are applicable.

However, the order in which we add the patterns to the list of candidates
determines the transformation that takes place, since only the first pattern
that matches will be used. This patch changes the order these two patterns are
added to the list of candidates such that we prefer the case where the second
source operand is an fmul (the fmls case), rather than the other one (the
fmla/fneg case). When both source operands are fmuls, this ordering results in
fewer instructions.
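
As a quick illustration of the two rewrites (expressed with the C fma() function
rather than actual AArch64 codegen; this is a sketch, not the MachineCombiner code):

  #include <cmath>
  #include <cstdio>

  int main() {
    double x = 1.5, y = -2.25, z = 4.0;
    // fmls z, x, y: z - (x * y) as a single fused instruction.
    std::printf("%f %f\n", z - x * y, std::fma(-x, y, z));
    // fmla (fneg z), x, y: (x * y) - z, which needs an extra fneg of z.
    std::printf("%f %f\n", x * y - z, std::fma(x, y, -z));
  }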

Differential Revision: https://reviews.llvm.org/D41587

llvm-svn: 321491
2017-12-27 15:25:01 +00:00
Benjamin Kramer 293f34301e [X86] Fix vmul combine for AVX1 targets.
v8i32 is legal on AVX1, but it doesn't have pmuludq for it.

llvm-svn: 321490
2017-12-27 13:31:50 +00:00
Simon Pilgrim e7d032f1d8 [InstCombine] Gracefully handle out of range extractelement indices
InstSimplify is responsible for handling these, but we shouldn't just assert here.

Reduced from oss-fuzz #4808 test case

llvm-svn: 321489
2017-12-27 12:00:18 +00:00
Simon Pilgrim 6fad3cbc66 [DAGCombine] foldBinOpIntoSelect can fail to constant fold in some cases.
For example, float operations may fail to constant fold under certain circumstances (inf/nan/denormal creation etc.)

Reduced from oss-fuzz #4802 test case

llvm-svn: 321488
2017-12-27 11:36:18 +00:00
Mikael Holmen fb2fd20f50 [Lint] Don't warn about noalias argument aliasing if other argument is byval
Summary:
When using byval, the data is effectively copied as part of the call
anyway, so we aren't actually passing the pointer and thus there is no
reason to issue a warning.

Reviewers: rnk

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40118

llvm-svn: 321478
2017-12-27 08:48:33 +00:00
Gadi Haber 309b06cb5c [X86][RD]: Adding full coverage of MC encoding for RD isa sets.<NFC>
NFC.
Adding MC regression tests to cover the RDPMC, RDRAND, RDSEED, RDTSCP, RDWRFSGS isa sets.
This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952

Reviewers: zvi, craig.topper, RKSimon, AndreiGrischenk
Differential Revision: https://reviews.llvm.org/D41328

Change-Id: Ie97b397546e6b1ed180c6abd7b41fccb136d2b82
llvm-svn: 321476
2017-12-27 08:35:57 +00:00
Serguei Katkov edf3c8292b [SCEV] Do not insert if it is already in cache
This is a fix for the crash caused by ScalarEvolution::getTruncateExpr.

It expects that if it checked at the beginning that the SCEV is not in the
UniqueSCEVs cache, it will not be there later inside this method.

However, during recursion and transformation/simplification of a sub expression,
it is possible that these modifications will end up with the same SCEV we started from.

So we must always check whether the SCEV is in the cache and not insert the item if it is already there.
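
The pattern of the fix, sketched with a plain std::map instead of the real
UniqueSCEVs folding set (names here are illustrative, not the ScalarEvolution API):

  #include <functional>
  #include <map>
  #include <string>

  // Toy expression cache: after recursive simplification, the resulting
  // expression may already have been created and cached, so re-check the
  // cache instead of inserting unconditionally.
  struct ExprCache {
    std::map<std::string, int> Unique; // stands in for UniqueSCEVs
    int NextId = 0;

    int getOrCreate(const std::string &Expr,
                    const std::function<std::string(const std::string &)> &Simplify) {
      if (auto It = Unique.find(Expr); It != Unique.end())
        return It->second;                     // not in the cache at entry...
      std::string Simplified = Simplify(Expr); // ...but recursion may add it
      if (auto It = Unique.find(Simplified); It != Unique.end())
        return It->second;                     // re-check before inserting
      return Unique[Simplified] = NextId++;
    }
  };

  int main() {
    ExprCache C;
    auto Id = [](const std::string &S) { return S; };
    return C.getOrCreate("trunc(x)", Id) == C.getOrCreate("trunc(x)", Id) ? 0 : 1;
  }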

Reviewers: sanjoy, mkazantsev, craig.topper	
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41380

llvm-svn: 321472
2017-12-27 07:15:23 +00:00
Philip Reames cd13a66381 [instcombine] add powi(x, 2) -> x * x
llvm-svn: 321468
2017-12-27 01:30:12 +00:00
Zhaoshi Zheng 8af1e1cb78 [Unroll][DebugInfo] Propagate loop body's debug location to epilog preheader
NewExit and the epilog PreHeader should have the same debug loc as the original loop
body, instead of the original loop exit.

llvm-svn: 321465
2017-12-26 23:31:21 +00:00
Simon Pilgrim b17c204cc0 [DAGCombine] visitANDLike - ensure APInt is in range for getSExtValue/getZExtValue
Reduced from oss-fuzz #4782 test case

llvm-svn: 321464
2017-12-26 23:27:44 +00:00
Craig Topper 8d8a7da745 [X86] Regenerate test using update_llc_test_checks.py.
llvm-svn: 321462
2017-12-26 22:22:57 +00:00
Sanjay Patel 14adbacd8a [InstCombine] fix miscompile of frem with 0.0 operand (PR34870)
We might want to select NAN here or do this transform with fast-math,
but this should at least fix the miscompile.

llvm-svn: 321461
2017-12-26 22:12:20 +00:00
Sanjay Patel 546c43fd1a [InstCombine] add test for frem with 0.0 (PR34870); NFC
llvm-svn: 321460
2017-12-26 22:06:57 +00:00
Andrew V. Tischenko 1dd7856af5 It's a fix for Bug 35741 - can't use comments after x86 prefixes.
Differential Revision: https://reviews.llvm.org/D41579

llvm-svn: 321459
2017-12-26 18:29:52 +00:00
Sanjay Patel 9a39979dd2 [ValueTracking] ignore FP signed-zero when detecting a casted-to-integer fmin/fmax pattern
This is a preliminary step for the patch discussed in D41136 (and denoted here with the FIXME comment).

When we match an FP min/max that is cast to integer, any intermediate difference between +0.0 and -0.0 
is muted by the conversion (either fptosi or fptoui) of the result. Thus, we can 
enable 'nsz' for the purpose of matching fmin/fmax.
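
A tiny standalone check of the underlying fact (plain C++, not the ValueTracking
code): once the value is converted to integer, +0.0 and -0.0 are indistinguishable:

  #include <cassert>
  #include <cmath>

  int main() {
    double pz = +0.0, nz = -0.0;
    assert(std::signbit(pz) != std::signbit(nz));              // distinct as FP values
    assert((long long)pz == (long long)nz);                    // identical after fptosi
    assert((unsigned long long)pz == (unsigned long long)nz);  // identical after fptoui
  }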

Note that there's probably room to generalize this more, possibly by fixing the current calls to the
weak version of isKnownNonZero() in matchSelectPattern() to the more powerful recursive version.

Differential Revision: https://reviews.llvm.org/D41333

llvm-svn: 321456
2017-12-26 15:09:19 +00:00
Simon Pilgrim 628f63e5fd [DAGCombine] Don't combine (and (setne X, 0), (setne X, -1)) --> (setuge (add X, 1), 2) for i1
Reduced from oss-fuzz #4773 test case

llvm-svn: 321455
2017-12-26 14:48:28 +00:00
Simon Pilgrim 79c2c2f08c [InstSimplify] Check for in range extraction index before calling APInt::getZExtValue()
Reduced from oss-fuzz #4768 test case

llvm-svn: 321454
2017-12-26 11:42:39 +00:00
Craig Topper 162439dcdf [X86] Pass itins.rr/itins.rm through properly for some instructions.
llvm-svn: 321452
2017-12-26 05:43:05 +00:00
Craig Topper 9b800c692e [X86] Use SSE_INTMUL_ITINS_P for the AVX-512 MUL instructions to match their SSE/AVX counterparts.
llvm-svn: 321451
2017-12-26 05:43:04 +00:00
Eugene Leviant 193e701d86 [ThinLTO] Don't import functions with noinline attribute
Differential revision: https://reviews.llvm.org/D41489

llvm-svn: 321443
2017-12-25 13:57:24 +00:00
George Rimar 18e6a788fb [MC] - Disallow invalid section groups declarations.
This fixes parseGroup() so that it always sets the error condition on error.
Previously that was not done because parseIdentifier() never does it itself,
assuming that the caller will do so if needed.

So previously the cases from the test were silently accepted and produced broken output.

Differential revision: https://reviews.llvm.org/D41559

llvm-svn: 321439
2017-12-25 09:41:00 +00:00
Max Kazantsev ddb096853d [SafepointIRVerifier] Allow non-dereferencing uses of unrelocated or poisoned PHI nodes
A PHI that has at least one unrelocated input cannot cause any issues by itself,
though its uses should be carefully verified. With this patch PHIs are allowed
to have any inputs, but when all inputs are unrelocated the PHI is marked as
unrelocated, and if only some inputs are unrelocated then the PHI is marked as
poisoned. Poisoned pointers can be used only in three ways: to derive new
pointers, in PHIs, or in comparisons against constants that are exclusively
derived from null.

Patch by Daniil Suchkov!

Differential Revision: https://reviews.llvm.org/D41006

llvm-svn: 321438
2017-12-25 09:35:10 +00:00
Craig Topper 705fef3ef3 [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros.
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.

This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove it's unnecessary. PMULLQ is 3 uops that take 4 cycles each, while pmuldq/pmuludq is only one 4-cycle uop.
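
For reference, a standalone model of what a single PMULUDQ lane computes and why it
suffices when the upper 32 bits are zero (illustration only, not the DAG combine):

  #include <cassert>
  #include <cstdint>

  // One lane of PMULUDQ: zero-extend the low 32 bits of each 64-bit operand
  // and produce the full 64-bit product.
  static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
  }

  int main() {
    // When the upper 32 bits of both operands are known zero, this equals the
    // full 64-bit multiply that PMULLQ would compute.
    uint64_t a = 0x00000000DEADBEEFULL, b = 0x0000000012345678ULL;
    assert(a * b == pmuludq_lane(a, b));
  }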

llvm-svn: 321437
2017-12-25 06:47:10 +00:00
Craig Topper b28460a0d6 [X86] Add avx512vl and avx512dq command lines to combine-pmuldq.ll to demonstrate where we fail to use pmuldq/pmuludq and use pmullq instead.
It's nice that pmullq exists, but it has higher latency and probably lower throughput than pmuldq/pmuludq. We should prefer those if we can.

llvm-svn: 321436
2017-12-25 06:47:08 +00:00
Simon Pilgrim 34265223df [X86][AVX] Add AVX1/AVX2 vmul tests
llvm-svn: 321426
2017-12-24 12:51:54 +00:00
Simon Pilgrim e0434fad16 [X86][X87] Mark pseudo memory fold instructions as load/sideeffects (PR21160, PR34080, PR34454).
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass.

llvm-svn: 321424
2017-12-24 12:20:21 +00:00
Simon Pilgrim ed99c332c3 [X86][X87] Renamed CHECK prefix; it's not actually broken anymore, just scheduled differently
llvm-svn: 321423
2017-12-24 10:25:01 +00:00
Simon Pilgrim fcbf977e04 [X86][X87] Add another test case mentioned on PR34080
Did my best to reduce this, but the X87 scheduling bug is hard to hit at the best of times...

llvm-svn: 321422
2017-12-24 10:22:55 +00:00
Craig Topper 2d1d9a11c1 [X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ.
Previously we extended v2i1 to v2i64 and then tried to use cvtuqq2pd/cvtqq2pd, but those are only available with avx512dq, so we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.

llvm-svn: 321420
2017-12-24 06:51:36 +00:00
George Rimar 64edcdc3fb [MC] - Teach llvm-mc to handle comdats whose names are numbers.
Currently llvm-mc ignores COMDATs whose names are numbers;
for example, the following code:

.section .foo,"G",@progbits,123,comdat

would produce no COMDATs at all.

Patch fixes the issue. 

Differential revision: https://reviews.llvm.org/D41552

llvm-svn: 321419
2017-12-24 06:13:36 +00:00
Craig Topper f65a6e4ed8 [DAGCombiners] Don't turn ANDs to shuffles with zero so early. Give some other combines a chance to run.
This moves the combine for turning ANDs into shuffle with zero out of SimplifyVBinOps and places it only in visitAND below the reassociate handling. This fixes the specific case I noticed where we failed to combine two ands with constants.

llvm-svn: 321417
2017-12-24 02:05:18 +00:00
Craig Topper 62fd123731 [X86] Teach WidenMaskArithmetic to handle any constant buildvector on the RHS not just all zeros/ones.
llvm-svn: 321415
2017-12-24 01:03:31 +00:00
Craig Topper 1f2f265fc1 [SelectionDAG] Teach SelectionDAG::getNode to constant fold zext/aext/sext of constant build vectors.
llvm-svn: 321414
2017-12-23 20:21:29 +00:00
Florian Hahn 7e9328906b [CallSiteSplitting] Remove isOrHeader restriction.
By following the single predecessors of the predecessors of the call
site, we do not need to restrict the control flow.

Reviewed By: junbuml, davide

Differential Revision: https://reviews.llvm.org/D40729

llvm-svn: 321413
2017-12-23 20:02:26 +00:00
Craig Topper 06dad14797 [X86] Remove type restrictions from WidenMaskArithmetic.
This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types.

llvm-svn: 321408
2017-12-23 18:53:05 +00:00
Sam Clegg 6006e09169 [WebAssembly] MC: Fix for address taken aliases
Previously, taking the address of an alias would result in:
 "Symbol not found in table index space"

Increase test coverage for weak aliases.

This code should be more efficient too as it avoids building
the `IsAddressTaken` set.

Differential Revision: https://reviews.llvm.org/D41510

llvm-svn: 321384
2017-12-22 20:31:39 +00:00
Alina Sbirlea ca741a87f2 [MemorySSA] Allow reordering of loads that alias in the presence of volatile loads.
Summary:
Make MemorySSA allow reordering of two loads that may alias, when one is volatile.
This makes MemorySSA less conservative and behave the same as the AliasSetTracker.
For more context, see D16875.

LLVM language reference: "The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior."

Reviewers: george.burgess.iv, dberlin

Subscribers: sanjoy, reames, hfinkel, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D41525

llvm-svn: 321382
2017-12-22 19:54:03 +00:00
Guozhi Wei 33250340f4 [SimplifyCFG] Don't do if-conversion if there is a long dependence chain
If, after if-conversion, most of the instructions in the new BB form a long and slow dependence chain, the result may be slower than cmp/branch, even if the branch has a high miss rate. This is because the control dependence is transformed into a data dependence, and a control dependence can be speculated, so the second part can execute in parallel with the first part on a modern OOO processor.

This patch checks for a long dependence chain and gives up on if-conversion if it finds one.
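
A rough C++ illustration of the trade-off (not the SimplifyCFG heuristic itself):

  // Branchy form: the long chain in the 'then' block only executes (and only
  // delays the result) when the condition is true; a predicted branch lets the
  // CPU speculate past it otherwise.
  long f_branch(long x, bool c) {
    long t = x;
    if (c) {
      t = t * 3 + 1;   // long, serial dependence chain
      t = t * 5 + 7;
      t = t * 9 + 11;
    }
    return t;
  }

  // If-converted form: every operation in the chain executes unconditionally
  // and feeds a final select, so the whole chain sits on the critical path
  // even when c is false.
  long f_select(long x, bool c) {
    long t = x;
    t = t * 3 + 1;
    t = t * 5 + 7;
    t = t * 9 + 11;
    return c ? t : x;
  }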

Differential Revision: https://reviews.llvm.org/D39352

llvm-svn: 321377
2017-12-22 18:54:04 +00:00
Dmitry Preobrazhensky 471adf7fdc [AMDGPU][MC] Corrected handling of negative expressions
See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41488

llvm-svn: 321372
2017-12-22 18:03:35 +00:00
Craig Topper b2368fbdf4 [SelectionDAG] Reverse the order of operands in the ISD::ADD created by TargetLowering::getVectorElementPointer so that the FrameIndex is on the left.
This seems to improve X86's ability to match this into an address computation. Otherwise the other operand gets assigned to the base register and the stack pointer + frame index ends up in the index register. But index registers can't encode ESP/RSP so we end up having to move it into another register to meet the constraint.

I could try to improve the address matcher in X86, but swapping the producer seemed easier. Several other places already have the operands in this order so this is at least consistent.

llvm-svn: 321370
2017-12-22 17:18:13 +00:00
Craig Topper 576335f998 [X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a non-constant index just use either a 128-bit type or the vXi8 type with the correct number of elements.
Despite what the comment said, there isn't better codegen for 512-bit vectors. The 128/256/512-bit implementation just stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact, in many cases it causes a stack realignment and generates worse code.

llvm-svn: 321369
2017-12-22 17:18:11 +00:00
Dmitry Preobrazhensky c5b0c172f6 [AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32
See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41186

llvm-svn: 321367
2017-12-22 17:13:28 +00:00