Commit Graph

50045 Commits

Author SHA1 Message Date
Craig Topper bb8b79b0a0 [X86] Add test cases for vXi1 fptosi/fptoui.
Currently we do a lot of scalarization in these test cases.

llvm-svn: 321631
2018-01-01 21:12:10 +00:00
Sanjay Patel 18962dabb7 [x86] add runs for more vector variants; NFC
Preliminary step to see what the effects of D41618 look like.

llvm-svn: 321624
2018-01-01 16:36:47 +00:00
Simon Pilgrim e337268df7 [X86][SSE] Add test case from PR32160
llvm-svn: 321620
2018-01-01 13:04:04 +00:00
Uriel Korach c06596ced4 [X86] Regenerate test checks in sse-intrinsics-x86-upgrade with update-llc
Removing outdated checks.
NFC

llvm-svn: 321619
2018-01-01 09:00:13 +00:00
Uriel Korach e87d240699 [X86] Regenerate test checks in sse2-intrinsics-x86-upgrade with update-llc
Removing outdated checks.
NFC

llvm-svn: 321618
2018-01-01 08:47:50 +00:00
Craig Topper 0d35edda90 [X86] In LowerTruncateVecI1, don't add SHL if the input is known to be all sign bits.
If the input is all sign bits then the LSB through MSB are all the same, so we don't need to move the LSB to the MSB.

llvm-svn: 321617
2018-01-01 04:52:58 +00:00
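
A scalar sketch of the observation above (illustrative C++ with a hypothetical 8-bit lane, not the lowering code): when every bit of a lane equals its sign bit, the LSB already equals the MSB, so the extra SHL is redundant.

  #include <cassert>
  #include <cstdint>

  int main() {
    // An all-sign-bits 8-bit lane is either 0 or -1 (0x00 or 0xFF).
    const int8_t lanes[] = {0, -1};
    for (int8_t v : lanes) {
      unsigned lsb = v & 1;                 // bit 0
      unsigned msb = (uint8_t(v) >> 7) & 1; // bit 7, the sign bit
      assert(lsb == msb);                   // no shift needed to move the LSB up
    }
  }
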
Craig Topper fc3ce4993c [X86] Add patterns for using zmm registers for v8i32/v8f32 vselect with the false input being zero.
We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately.

llvm-svn: 321612
2018-01-01 01:11:29 +00:00
Craig Topper f78b75fb59 [X86] Use CONCAT_VECTORS instead of INSERT_SUBVECTOR for padding v4i1/v2i1 vector to v8i1 pre-legalize.
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order that allows the ANDs to be removed.

I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.

llvm-svn: 321611
2017-12-31 19:17:52 +00:00
Simon Pilgrim b000675374 [X86][AVX2] Combine extract(broadcast(scalar_value)) --> scalar_value
As it has a scalar source we don't treat it as a target shuffle, so it needs special handling.

llvm-svn: 321610
2017-12-31 18:59:30 +00:00
Simon Pilgrim e940b86c5f [X86][AVX] Add test case from PR33740
llvm-svn: 321608
2017-12-31 17:16:48 +00:00
Simon Pilgrim f205ec716b [X86][SSE] Don't vectorize splat buildvector of binops (PR30780)
Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if it's a splat - keep the binop scalar and just splat the result to avoid large vector constants.

llvm-svn: 321607
2017-12-31 17:07:47 +00:00
Davide Italiano 9f074fe915 [SimplifyCFG] Stop hoisting musttail calls incorrectly.
PR35774.

llvm-svn: 321603
2017-12-31 16:47:16 +00:00
Craig Topper f0f6eefb49 [X86] Add a DAG combine to widen (i4 (bitcast (v4i1))) before type legalization sees the i4 and changes to load/store.
Same for v2i1 and i2.

llvm-svn: 321602
2017-12-31 09:50:38 +00:00
Craig Topper 7f39623533 [X86] Add a DAG combine to fix (v4i1 (bitcast (i4))) before type legalization sees the i4 and changes to load/store.
Same for i2 and v2i1.

llvm-svn: 321601
2017-12-31 08:25:50 +00:00
George Rimar 7672eb84af [MC] - Stop ignoring invalid meta data symbols.
Previously llvm-mc would silently accept code from the test case
that contains an invalid metadata symbol in a section declaration.

Patch fixes the issue.

Differential revision: https://reviews.llvm.org/D41641

llvm-svn: 321599
2017-12-31 07:41:02 +00:00
Craig Topper 876ec0b558 [X86] Prevent combining (v8i1 (bitconvert (i8 load)))->(v8i1 load) if we don't have DQI.
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codegen in some cases and I'd like to kill off some of the load patterns.

llvm-svn: 321598
2017-12-31 07:38:41 +00:00
Craig Topper a362dee774 [X86] Remove AND32ri8 from pattern for v1i1 load.
I don't think anything would actually expect the other bits to be zero.

llvm-svn: 321596
2017-12-31 07:38:33 +00:00
Craig Topper 7ba1b76854 [X86] Fix a crash when returning a <1 x i1> value.
llvm-svn: 321595
2017-12-31 07:38:30 +00:00
Philip Reames 232951dfb2 2nd attempt at "fixing" amdgpu tests after r321575
The test needs to be changed; it was exercising UB and that likely wasn't the intent of the test author. I simply removed the checks because I have absolutely no idea what this test was trying to accomplish. With multiple check patterns, no explanation, and no familiarity on my part with the ISA, a true fix is going to have to come from someone familiar with the target.

llvm-svn: 321591
2017-12-31 03:34:36 +00:00
Philip Reames 3580c90458 Test fix after r321575
The test in question was checking for a particular interpretation of undefined behavior. Relax the test to check that we simply don't crash.

Sorry for the breakage; I don't generally build AMDGPU locally and just saw the failure this morning.

llvm-svn: 321589
2017-12-30 18:42:37 +00:00
Simon Pilgrim 06f6d262f9 [X86][SSE] Add PR30780 test cases
Broadcast of sign/zero extended scalars resulting in unnecessary vector constants

llvm-svn: 321584
2017-12-30 11:51:45 +00:00
Simon Pilgrim fa0f793c8d [X86][SSE] Add test for (v2f32 uitofp(build_vector(i32, i32))) (PR35732)
To compare against (v2f32 build_vector(f32 uitofp(i32), f32 uitofp(i32)))

llvm-svn: 321583
2017-12-30 11:20:56 +00:00
Hiroshi Inoue ca3cdd7f27 [PowerPC] fix a bug in TCO eligibility check
If the callee and caller use different calling conventions, we cannot apply TCO if the callee requires arguments on the stack; e.g. the C calling convention and Fast CC use the same registers for parameter passing, but the stack offset is not necessarily the same.

This patch also recommits r319218 "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions." by @sfertile since the problem reported in r320106 should be fixed.

Differential Revision: https://reviews.llvm.org/D40893

llvm-svn: 321579
2017-12-30 08:09:04 +00:00
Craig Topper c5fd31a802 [X86] Custom legalize vXi1 extract_subvector with KSHIFTR.
This allows us to remove some isel patterns.

This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.

llvm-svn: 321576
2017-12-30 06:45:43 +00:00
Philip Reames e499bc3042 [instsimplify] consistently handle undef and out of bound indices for insertelement and extractelement
In one case, we were handling out of bounds, but not undef indices.  In the other, we were handling undef (with the comment making the analogy to out of bounds), but not out of bounds.  Be consistent and treat both undef and constant out of bounds indices as producing undefined results.

As a side effect, this also protects instcombine from having to handle large constant indices as we always simplify first.

llvm-svn: 321575
2017-12-30 05:54:22 +00:00
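
A minimal model of the unified policy (hypothetical helper in plain C++, not the actual InstSimplify API; std::nullopt stands in for undef): both an undef index and a constant out-of-bounds index fold the extract to undef.

  #include <cstdint>
  #include <optional>
  #include <vector>

  // Hypothetical model of the simplification rule for extractelement.
  std::optional<int> simplifyExtractElement(const std::vector<int> &Vec,
                                            std::optional<uint64_t> Idx) {
    if (!Idx || *Idx >= Vec.size())
      return std::nullopt; // undef or out-of-bounds index -> undef result
    return Vec[*Idx];
  }
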
Philip Reames 8e1abe4a7d Add another test case for r321489
Went to reduce another fuzzer failure only to find it had already been fixed, but the test case is slightly different so it's worth adding anyway.

Reduced from oss-fuzz #4768 test case

llvm-svn: 321573
2017-12-30 04:10:48 +00:00
Philip Reames 3e9c671923 Move tests associated with transforms moved in r321467
llvm-svn: 321572
2017-12-30 03:13:00 +00:00
Simon Atanasyan 970f686faa [mips] Replace assert by an error message
Previously, applying the `c` constraint to the wrong data type caused
LLVM to assert. This commit replaces the assert with an error
message.

llvm-svn: 321565
2017-12-29 19:18:24 +00:00
Matt Arsenault d94b63d765 AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
Atomics still have hasSideEffects set on them because
of the mess that is the memory properties.

llvm-svn: 321556
2017-12-29 17:18:18 +00:00
Simon Pilgrim c701596e86 [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.

This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.

By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.

We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.

Differential Revision: https://reviews.llvm.org/D38318

llvm-svn: 321553
2017-12-29 14:41:50 +00:00
Dmitry Preobrazhensky 414e05383f [AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers
See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730

Differential Revision: https://reviews.llvm.org/D41598

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 321552
2017-12-29 13:55:11 +00:00
Nemanja Ivanovic 4e1f5e0734 [PowerPC] Fix for PR35688 - handle out-of-range values for r+r to r+i conversion
Revision 320791 introduced a pass that transforms reg+reg instructions to
reg+imm if they're fed by "load immediate". However, it didn't
handle out-of-range shifts correctly as reported in PR35688.
This patch fixes that and therefore the PR.

Furthermore, there was undefined behaviour in the patch where the RHS of an
initialization expression was 32 bits and constant `1` was shifted left 32
bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.

Differential Revision: https://reviews.llvm.org/D41369

llvm-svn: 321551
2017-12-29 12:22:27 +00:00
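
A sketch of the shift UB being described (illustrative values only; the actual patch operates on the PPC MI-level code): shifting a 32-bit `1` left by 32 bits is undefined, so the left-hand side of the shift must be 64 bits.

  #include <cstdint>

  // Build a mask of the low 'Shift' bits; Shift may legitimately reach 32 here.
  uint64_t lowBitMask(unsigned Shift) {
    // The buggy form was the 32-bit equivalent of (1 << Shift) - 1,
    // which is undefined behaviour when Shift == 32.
    return (uint64_t(1) << Shift) - 1; // 64-bit LHS: well-defined for Shift <= 63
  }

  int main() { return lowBitMask(32) == 0xFFFFFFFFu ? 0 : 1; }
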
Andrew V. Tischenko 03ddad853d Fix incorrect operand sizes for some MMX instructions: punpcklwd, punpcklbw and punpckldq.
Differential Revision: https://reviews.llvm.org/D41595

llvm-svn: 321549
2017-12-29 08:31:01 +00:00
Fedor Sergeev 02e7f0247b [PM] pass -debug-pass-manager flag into FunctionToLoopPassAdaptor's canonicalization PM
Summary:
New pass manager driver passes DebugPM (-debug-pass-manager) flag into
individual PassManager constructors in order to enable debug logging.
FunctionToLoopPassAdaptor has its own internal LoopCanonicalizationPM
which never gets its debug logging enabled and that means canonicalization
passes like LoopSimplify are never present in -debug-pass-manager output.

Extending FunctionToLoopPassAdaptor's constructor and
createFunctionToLoopPassAdaptor wrapper with an optional
boolean DebugLogging argument.

Passing debug-logging flags there as appropriate.

Reviewers: chandlerc, davide

Reviewed By: davide

Subscribers: mehdi_amini, eraman, llvm-commits, JDevlieghere

Differential Revision: https://reviews.llvm.org/D41586

llvm-svn: 321548
2017-12-29 08:16:06 +00:00
Sanjay Patel 2aae217a97 [x86] add tests for potential memcmp expansion (PR33325); NFC
llvm-svn: 321542
2017-12-28 22:07:47 +00:00
Benjamin Kramer a19d42f972 Unbreak test relying on debug output after r321540.
llvm-svn: 321541
2017-12-28 21:36:10 +00:00
Craig Topper 55cf880900 [X86] When lowering extending loads from v2i1/v4i1, if we have VLX, use a narrower extend.
Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.

This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.

If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.

llvm-svn: 321538
2017-12-28 19:46:11 +00:00
Reid Kleckner a2d119a059 [WinEH] Don't emit state stores or EH thunks for available_externally functions
The exception handler thunk needs to reference the LSDA of the parent
function, which won't be emitted if it's available_externally.

Fixes PR35736. ThinLTO ends up producing available_externally functions
that use _CxxFrameHandler3.

llvm-svn: 321532
2017-12-28 18:41:31 +00:00
Guozhi Wei 29697c13bc Revert r321377, it causes regression to https://reviews.llvm.org/P8055.
llvm-svn: 321528
2017-12-28 17:02:34 +00:00
Benjamin Kramer 0af3be4560 Fix tests after move to utohexstr.
llvm-svn: 321527
2017-12-28 17:00:37 +00:00
Gadi Haber efdb49bc9a [X86][PREFETCH]: Adding full coverage of MC encoding for the PREFETCH isa sets.<NFC>
NFC.
Adding MC regression tests to cover the PREFETCH isa sets for both 32 and 64 bit.
This patch is part of a larger task to cover MC encoding of all X86 ISA Sets started in revision: https://reviews.llvm.org/D39952

Reviewers: zvi, craig.topper, RKSimon, AndreiGrischenko
Differential Revision: https://reviews.llvm.org/D41161

Change-Id: Icdc8c5fb68c414de7d2cfdb50da1cc6763d9932a
llvm-svn: 321524
2017-12-28 15:00:41 +00:00
Max Kazantsev a13e163a27 [RewriteStatepoints] Fix incorrect assertion
`RewriteStatepointsForGC` iterates over function blocks and their predecessors
in order of declaration. One outcome of this is that callsites are placed in
arbitrary order, which has nothing to do with traversal order.

On the other hand, function `recomputeLiveInValues` asserts that bases are
added to `Info.PointerToBase` before their derived pointers are updated. But
if call sites are processed in an order different from RPOT, this is not necessarily
true. We cannot guarantee that the base was placed there before every
pointer derived from it. All we can guarantee is that this base was marked as
a known base by this point.

This patch replaces the assertion that the base was added to the map with an
assertion that the base was marked as a known base.

Differential Revision: https://reviews.llvm.org/D41593

llvm-svn: 321517
2017-12-28 12:03:12 +00:00
Simon Pilgrim 62411e4d4f [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros
If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow.

The 17 bits need to be zero because PMADDWD performs a v8i16 signed-mul-extend + pairwise-add: the upper 16 bits must be zero so we're adding a zero pair, and the 17th bit must be zero so we don't incorrectly sign extend.

Differential Revision: https://reviews.llvm.org/D41484

llvm-svn: 321516
2017-12-28 10:05:49 +00:00
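
A scalar model of why 17 leading zeros suffice (a sketch of PMADDWD's per-pair semantics in C++, not the vector lowering): each i32 element is split into two signed i16 halves, the halves are multiplied, and the two products are added; with the upper 16 bits zero the high-half product contributes nothing, and with bit 15 also zero the low halves are non-negative as i16, so no incorrect sign extension occurs.

  #include <cassert>
  #include <cstdint>

  // One PMADDWD element pair: signed 16x16->32 multiplies of the low and high
  // halves, added together.
  int32_t pmaddwdPair(uint32_t a, uint32_t b) {
    int16_t aLo = int16_t(a & 0xFFFF), aHi = int16_t(a >> 16);
    int16_t bLo = int16_t(b & 0xFFFF), bHi = int16_t(b >> 16);
    return int32_t(aLo) * bLo + int32_t(aHi) * bHi;
  }

  int main() {
    uint32_t a = 0x5432, b = 0x7FFF;             // both have >= 17 leading zeros
    assert(pmaddwdPair(a, b) == int32_t(a * b)); // matches the full i32 multiply
  }
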
Simon Pilgrim 472689a159 [InstCombine] Check for isa<Instruction> before using cast<>
Protects against casts from constexpr etc.

Reduced from oss-fuzz #4788 test case

llvm-svn: 321515
2017-12-28 09:35:35 +00:00
Reid Kleckner 6d31001cd6 Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks"
This reverts r321138. It seems there are still underlying issues with
memdep. PR35519 seems to still be present if debug info is enabled. We
end up losing a memcpy. Somehow during store to memset merging, we
insert the memset after the memcpy or fail to update the memdep analysis
to account for the newly inserted memset of a pair.

Reduced test case:

  #include <assert.h>
  #include <stdio.h>
  #include <string>
  #include <utility>
  #include <vector>

  void do_push_back(
      std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
    crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
  }

  int __attribute__((optnone)) main() {
    // Put some data in the vector and then remove it so we take the push_back
    // fast path.
    std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
    crl_set.push_back({"asdf", {}});
    crl_set.pop_back();
    printf("first word in vector storage: %p\n", *(void**)crl_set.data());

    // Do the push_back which may fail to initialize the data.
    do_push_back(&crl_set);
    auto* first = &crl_set.back().first;
    printf("first word in vector storage (should be zero): %p\n",
           *(void**)crl_set.data());
    assert(first->empty());
    puts("ok");
  }

Compile with libc++, enable optimizations, and enable debug info:
$ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib

This program will assert with this change.

llvm-svn: 321510
2017-12-28 05:10:33 +00:00
Sanjay Patel 84d54c3d8f [InstCombine] add tests for min/max folds (PR35717); NFC
llvm-svn: 321500
2017-12-27 21:55:06 +00:00
Petr Hosek ad6f457c39 [llvm-readobj] Support -needed-libs option for COFF files
This implements the -needed-libs option in the COFF dumper.

Differential Revision: https://reviews.llvm.org/D41529

llvm-svn: 321498
2017-12-27 19:59:56 +00:00
Andrew V. Tischenko 428e302f4d A special test to demonstrate debug logging for asm matcher.
llvm-svn: 321497
2017-12-27 19:25:21 +00:00
Craig Topper 72bbbeb2a7 [X86] Reimplement r321437 using custom lowering instead of as a DAG combine.
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.

So just do it in LowerMUL where we can catch more cases.

llvm-svn: 321496
2017-12-27 19:09:40 +00:00
Matthew Simpson 9439f54902 [AArch64] Change order of candidate FMLS patterns
r319980 added new patterns to the machine combiner for transforming (fsub (fmul
x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source
operand is an fmul are transformed. We previously only matched the case where
the second source operand of an fsub was an fmul, transforming (fsub z (fmul x
y)) into (fmls z x y). Now, if we have an fsub where both source operands are
fmuls, both of the above patterns are applicable.

However, the order in which we add the patterns to the list of candidates
determines the transformation that takes place, since only the first pattern
that matches will be used. This patch changes the order these two patterns are
added to the list of candidates such that we prefer the case where the second
source operand is an fmul (the fmls case), rather than the other one (the
fmla/fneg case). When both source operands are fmuls, this ordering results in
fewer instructions.

Differential Revision: https://reviews.llvm.org/D41587

llvm-svn: 321491
2017-12-27 15:25:01 +00:00
Benjamin Kramer 293f34301e [X86] Fix vmul combine for AVX1 targets.
v8i32 is legal on AVX1, but AVX1 doesn't have pmuludq for it.

llvm-svn: 321490
2017-12-27 13:31:50 +00:00
Simon Pilgrim e7d032f1d8 [InstCombine] Gracefully handle out of range extractelement indices
InstSimplify is responsible for handling these, but we shouldn't just assert here.

Reduced from oss-fuzz #4808 test case

llvm-svn: 321489
2017-12-27 12:00:18 +00:00
Simon Pilgrim 6fad3cbc66 [DAGCombine] foldBinOpIntoSelect can fail to constant fold in some cases.
For example, float operations may fail to constant fold under certain circumstances (inf/nan/denormal creation etc.)

Reduced from oss-fuzz #4802 test case

llvm-svn: 321488
2017-12-27 11:36:18 +00:00
Mikael Holmen fb2fd20f50 [Lint] Don't warn about noalias argument aliasing if other argument is byval
Summary:
When using byval, the data is effectively copied as part of the call
anyway, so we aren't actually passing the pointer and thus there is no
reason to issue a warning.

Reviewers: rnk

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40118

llvm-svn: 321478
2017-12-27 08:48:33 +00:00
Gadi Haber 309b06cb5c [X86][RD]: Adding full coverage of MC encoding for RD isa sets.<NFC>
NFC.
Adding MC regression tests to cover the RDPMC, RDRAND, RDSEED, RDTSCP, DWRFSGS isa sets.
This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952

Reviewers: zvi, craig.topper, RKSimon, AndreiGrischenko
Differential Revision: https://reviews.llvm.org/D41328

Change-Id: Ie97b397546e6b1ed180c6abd7b41fccb136d2b82
llvm-svn: 321476
2017-12-27 08:35:57 +00:00
Serguei Katkov edf3c8292b [SCEV] Do not insert if it is already in cache
This is fix for the crash caused by ScalarEvolution::getTruncateExpr.

It expects that if it checked at the beginning that the SCEV is not in the UniqueSCEVs
cache, then it will not be there later inside this method.

However, during recursion and transformation/simplification of a sub-expression,
it is possible that these modifications will end up with the same SCEV we started from.

So we must always check whether the SCEV is in the cache and not insert the item if it is already there.

Reviewers: sanjoy, mkazantsev, craig.topper	
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41380

llvm-svn: 321472
2017-12-27 07:15:23 +00:00
Philip Reames cd13a66381 [instcombine] add powi(x, 2) -> x * x
llvm-svn: 321468
2017-12-27 01:30:12 +00:00
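
The fold relies on powi with a constant exponent of 2 being exactly one multiply; a scalar sketch (powiModel is a hypothetical reference implementation, not the llvm.powi intrinsic):

  #include <cassert>

  // Hypothetical reference: repeated multiplication, as powi is defined.
  double powiModel(double x, int n) {
    double r = 1.0;
    for (int i = 0; i < n; ++i)
      r *= x;
    return r;
  }

  int main() {
    double x = 3.5;
    assert(powiModel(x, 2) == x * x); // powi(x, 2) is just x * x
  }
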
Zhaoshi Zheng 8af1e1cb78 [Unroll][DebugInfo] Propagate loop body's debug location to epilog preheader
NewExit and the epilog PreHeader should have the same debug loc as the original loop
body, instead of the original loop exit.

llvm-svn: 321465
2017-12-26 23:31:21 +00:00
Simon Pilgrim b17c204cc0 [DAGCombine] visitANDLike - ensure APInt is in range for getSExtValue/getZExtValue
Reduced from oss-fuzz #4782 test case

llvm-svn: 321464
2017-12-26 23:27:44 +00:00
Craig Topper 8d8a7da745 [X86] Regenerate test using update_llc_test_checks.py.
llvm-svn: 321462
2017-12-26 22:22:57 +00:00
Sanjay Patel 14adbacd8a [InstCombine] fix miscompile of frem with 0.0 operand (PR34870)
We might want to select NAN here or do this transform with fast-math,
but this should at least fix the miscompile.

llvm-svn: 321461
2017-12-26 22:12:20 +00:00
Sanjay Patel 546c43fd1a [InstCombine] add test for frem with 0.0 (PR34870); NFC
llvm-svn: 321460
2017-12-26 22:06:57 +00:00
Andrew V. Tischenko 1dd7856af5 It's a fix for Bug 35741 - can't use comments after x86 prefixes.
Differential Revision: https://reviews.llvm.org/D41579

llvm-svn: 321459
2017-12-26 18:29:52 +00:00
Sanjay Patel 9a39979dd2 [ValueTracking] ignore FP signed-zero when detecting a casted-to-integer fmin/fmax pattern
This is a preliminary step for the patch discussed in D41136 (and denoted here with the FIXME comment).

When we match an FP min/max that is cast to integer, any intermediate difference between +0.0 and -0.0
should be muted in the result by the conversion (either fptosi or fptoui). Thus, we can
enable 'nsz' for the purpose of matching fmin/fmax.

Note that there's probably room to generalize this more, possibly by fixing the current calls to the
weak version of isKnownNonZero() in matchSelectPattern() to the more powerful recursive version.

Differential Revision: https://reviews.llvm.org/D41333

llvm-svn: 321456
2017-12-26 15:09:19 +00:00
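
A tiny illustration of why the sign of zero is muted (plain C++, independent of the ValueTracking code): once the selected value is converted to an integer, +0.0 and -0.0 are indistinguishable, so treating the select as an fmin/fmax under 'nsz' is safe here.

  #include <cassert>

  int main() {
    double pz = +0.0, nz = -0.0;
    assert((long long)pz == 0 && (long long)nz == 0);         // fptosi mutes the sign
    assert((unsigned long long)pz == (unsigned long long)nz); // fptoui likewise
  }
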
Simon Pilgrim 628f63e5fd [DAGCombine] Don't combine (and (setne X, 0), (setne X, -1)) --> (setuge (add X, 1), 2) for i1
Reduced from oss-fuzz #4773 test case

llvm-svn: 321455
2017-12-26 14:48:28 +00:00
Simon Pilgrim 79c2c2f08c [InstSimplify] Check for in range extraction index before calling APInt::getZExtValue()
Reduced from oss-fuzz #4768 test case

llvm-svn: 321454
2017-12-26 11:42:39 +00:00
Craig Topper 162439dcdf [X86] Pass itins.rr/itins.rm through properly for some instructions.
llvm-svn: 321452
2017-12-26 05:43:05 +00:00
Craig Topper 9b800c692e [X86] Use SSE_INTMUL_ITINS_P for the AVX-512 MUL instructions to match their SSE/AVX counterparts.
llvm-svn: 321451
2017-12-26 05:43:04 +00:00
Eugene Leviant 193e701d86 [ThinLTO] Don't import functions with noinline attribute
Differential revision: https://reviews.llvm.org/D41489

llvm-svn: 321443
2017-12-25 13:57:24 +00:00
George Rimar 18e6a788fb [MC] - Disallow invalid section groups declarations.
This fixes parseGroup() so that it always sets the error condition on error.
Previously this was not done because parseIdentifier never does that itself,
assuming that the caller should do it if needed.

So previously the cases from the test were silently accepted and produced broken output.

Differential revision: https://reviews.llvm.org/D41559

llvm-svn: 321439
2017-12-25 09:41:00 +00:00
Max Kazantsev ddb096853d [SafepointIRVerifier] Allow non-dereferencing uses of unrelocated or poisoned PHI nodes
PHI that has at least one unrelocated input cannot cause any issues by itself,
though its uses should be carefully verified. With this patch PHIs are allowed
to have any inputs but when all inputs are unrelocated the PHI is marked as
unrelocated and if not all inputs are unrelocated then the PHI is marked as
poisoned. Poisoned pointers can be used only in three ways: to derive new
pointers, in PHIs or in comparisons against constants that are exclusively
derived from null.

Patch by Daniil Suchkov!

Differential Revision: https://reviews.llvm.org/D41006

llvm-svn: 321438
2017-12-25 09:35:10 +00:00
Craig Topper 705fef3ef3 [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros.
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.

This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove it's unnecessary. PMULLQ is 3 uops that take 4 cycles each, while pmuldq/pmuludq is only one 4-cycle uop.

llvm-svn: 321437
2017-12-25 06:47:10 +00:00
Craig Topper b28460a0d6 [X86] Add avx512vl and avx512dq command lines to combine-pmuldq.ll to demonstrate where we fail to use pmuldq/pmuludq and use pmullq instead.
It's nice that pmullq exists, but it has higher latency and probably lower throughput than pmuldq/pmuludq. We should prefer those if we can.

llvm-svn: 321436
2017-12-25 06:47:08 +00:00
Simon Pilgrim 34265223df [X86][AVX] Add AVX1/AVX2 vmul tests
llvm-svn: 321426
2017-12-24 12:51:54 +00:00
Simon Pilgrim e0434fad16 [X86][X87] Mark pseudo memory fold instructions as load/sideeffects (PR21160, PR34080, PR34454).
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass.

llvm-svn: 321424
2017-12-24 12:20:21 +00:00
Simon Pilgrim ed99c332c3 [X86][X87] Renamed CHECK prefix, it's not actually broken anymore, just scheduled differently
llvm-svn: 321423
2017-12-24 10:25:01 +00:00
Simon Pilgrim fcbf977e04 [X86][X87] Add another test case mentioned on PR34080
Did my best to reduce this, but the X87 scheduling bug is hard to hit at the best of times...

llvm-svn: 321422
2017-12-24 10:22:55 +00:00
Craig Topper 2d1d9a11c1 [X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ.
Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.

llvm-svn: 321420
2017-12-24 06:51:36 +00:00
George Rimar 64edcdc3fb [MC] - Teach llvm-mc to handle comdats whose names are numbers.
Currently llvm-mc ignores COMDATs whose names are numbers,
for example the following code:

.section .foo,"G",@progbits,123,comdat

would produce no COMDATs at all.

The patch fixes the issue.

Differential revision: https://reviews.llvm.org/D41552

llvm-svn: 321419
2017-12-24 06:13:36 +00:00
Craig Topper f65a6e4ed8 [DAGCombiners] Don't turn ANDs to shuffles with zero so early. Give some other combines a chance to run.
This moves the combine for turning ANDs into shuffle with zero out of SimplifyVBinOps and places it only in visitAND below the reassociate handling. This fixes the specific case I noticed where we failed to combine two ands with constants.

llvm-svn: 321417
2017-12-24 02:05:18 +00:00
Craig Topper 62fd123731 [X86] Teach WidenMaskArithmetic to handle any constant buildvector on the RHS not just all zeros/ones.
llvm-svn: 321415
2017-12-24 01:03:31 +00:00
Craig Topper 1f2f265fc1 [SelectionDAG] Teach SelectionDAG::getNode to constant fold zext/aext/sext of constant build vectors.
llvm-svn: 321414
2017-12-23 20:21:29 +00:00
Florian Hahn 7e9328906b [CallSiteSplitting] Remove isOrHeader restriction.
By following the single predecessors of the predecessors of the call
site, we do not need to restrict the control flow.

Reviewed By: junbuml, davide

Differential Revision: https://reviews.llvm.org/D40729

llvm-svn: 321413
2017-12-23 20:02:26 +00:00
Craig Topper 06dad14797 [X86] Remove type restrictions from WidenMaskArithmetic.
This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types.

llvm-svn: 321408
2017-12-23 18:53:05 +00:00
Sam Clegg 6006e09169 [WebAssembly] MC: Fix for address taken aliases
Previously, taking the address of an alias would result in:
 "Symbol not found in table index space"

Increase test coverage for weak aliases.

This code should be more efficient too as it avoids building
the `IsAddressTaken` set.

Differential Revision: https://reviews.llvm.org/D41510

llvm-svn: 321384
2017-12-22 20:31:39 +00:00
Alina Sbirlea ca741a87f2 [MemorySSA] Allow reordering of loads that alias in the presence of volatile loads.
Summary:
Make MemorySSA allow reordering of two loads that may alias, when one is volatile.
This makes MemorySSA less conservative and behaving the same as the AliasSetTracker.
For more context, see D16875.

LLVM language reference: "The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior."

Reviewers: george.burgess.iv, dberlin

Subscribers: sanjoy, reames, hfinkel, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D41525

llvm-svn: 321382
2017-12-22 19:54:03 +00:00
Guozhi Wei 33250340f4 [SimplifyCFG] Don't do if-conversion if there is a long dependence chain
If, after if-conversion, most of the instructions in the new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate. This is because the control dependence is transformed into a data dependence; control dependence can be speculated, and thus the second part can execute in parallel with the first part on a modern OOO processor.

This patch checks for a long dependence chain and gives up on if-conversion if it finds one.

Differential Revision: https://reviews.llvm.org/D39352

llvm-svn: 321377
2017-12-22 18:54:04 +00:00
Dmitry Preobrazhensky 471adf7fdc [AMDGPU][MC] Corrected handling of negative expressions
See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41488

llvm-svn: 321372
2017-12-22 18:03:35 +00:00
Craig Topper b2368fbdf4 [SelectionDAG] Reverse the order of operands in the ISD::ADD created by TargetLowering::getVectorElementPointer so that the FrameIndex is on the left.
This seems to improve X86's ability to match this into an address computation. Otherwise the other operand gets assigned to the base register and the stack pointer + frame index ends up in the index register. But index registers can't encode ESP/RSP so we end up having to move it into another register to meet the constraint.

I could try to improve the address matcher in X86, but swapping the producer seemed easier. Several other places already have the operands in this order so this is at least consistent.

llvm-svn: 321370
2017-12-22 17:18:13 +00:00
Craig Topper 576335f998 [X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a non-constant index just use either a 128-bit type or the vXi8 type with the correct number of elements.
Despite what the comment said, there isn't better codegen for 512-bit vectors. The 128/256/512 bit implementation just stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact in many cases it causes a stack realignment and generates worse code.

llvm-svn: 321369
2017-12-22 17:18:11 +00:00
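
A scalar sketch of the store-and-reload lowering mentioned above (plain C++ with made-up lane values): the vector is spilled to a local array and a single lane is loaded back using the runtime index, which works the same way regardless of vector width.

  #include <cstdint>
  #include <cstdio>

  int32_t extractElt(const int32_t (&vec)[8], unsigned idx) {
    int32_t slot[8];
    for (unsigned i = 0; i < 8; ++i) // "store" the whole vector to the stack
      slot[i] = vec[i];
    return slot[idx & 7];            // load one element at the variable index
  }

  int main() {
    int32_t v[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    printf("%d\n", extractElt(v, 5)); // prints 15
  }
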
Dmitry Preobrazhensky c5b0c172f6 [AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32
See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41186

llvm-svn: 321367
2017-12-22 17:13:28 +00:00
Haicheng Wu 6d14dfe8f3 [InlineCost] Find more free binary operations
Currently, the inline cost model considers a binary operator as free only if both
its operands are constants. Some simple cases are missing, such as a + 0, a - a,
etc. This patch modifies visitBinaryOperator() to call SimplifyBinOp() without
going through simplifyInstruction() to get rid of the constant restriction.
Thus, visitAnd() and visitOr() are not needed.

Differential Revision: https://reviews.llvm.org/D41494

llvm-svn: 321366
2017-12-22 17:09:09 +00:00
Dmitry Preobrazhensky 2713495318 [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers
See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561

This patch also affects the implementation of SGPR and VGPR registers, though the changes are cosmetic.

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41437

llvm-svn: 321359
2017-12-22 15:18:06 +00:00
Simon Atanasyan 5cd90ccbe3 [mips] Add test case to check that calls to mcount follow long calls / short calls options. NFC
llvm-svn: 321357
2017-12-22 13:45:46 +00:00
Diana Picus 28a6d0e639 [ARM GlobalISel] Support G_INTTOPTR and G_PTRTOINT for s32
Mark conversions between pointers and 32-bit scalars as legal, map them
to the GPR and select to a simple COPY.

llvm-svn: 321356
2017-12-22 13:05:51 +00:00
Diana Picus 68773859c8 [ARM GlobalISel] Support pointer constants
Pointer constants are pretty rare, since we usually represent them as
integer constants and then cast to pointer. One notable exception is the
null pointer constant, which is represented directly as a G_CONSTANT 0
with pointer type. Mark it as legal and make sure it is selected like
any other integer constant.

llvm-svn: 321354
2017-12-22 11:09:18 +00:00
Sam Parker cf426fccd4 [DAGCombine] Revert r321259
The 'Improve ReduceLoadWidth for SRL' patch is causing an issue on the
PPC64 BE sanitizer.

llvm-svn: 321349
2017-12-22 08:36:25 +00:00
Craig Topper e268598dd3 [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions.
Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.

The prfchw flag now implies that at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.

Similarly, the prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint we would have levels for the non-write hint, which is why we enable the sse prefetch instructions.

I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations.

llvm-svn: 321335
2017-12-22 02:30:30 +00:00
Craig Topper 9befe89367 [X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1.
llvm-svn: 321334
2017-12-22 02:30:26 +00:00
Eli Friedman 07d94d59a4 inline-fp.ll was moved in r321332; delete it properly.
llvm-svn: 321333
2017-12-22 02:10:40 +00:00
Eli Friedman 39ed9a602b [Inliner] Restrict soft-float inlining penalty.
The penalty is currently getting applied in a bunch of places where it
doesn't make sense, like bitcasts (which are free) and calls (which
were getting the call penalty applied twice). Instead, just apply the
penalty to binary operators and floating-point casts.

While I'm here, also fix getFPOpCost() to do the right thing in more
cases, so we don't have to dig into function attributes.

Differential Revision: https://reviews.llvm.org/D41522

llvm-svn: 321332
2017-12-22 02:08:08 +00:00
Craig Topper 8772228963 [X86] Use SIGN_EXTEND rather than ZERO_EXTEND for lowering extract_vector_elt from vXi1 with a non-const index.
We have a better range of instructions we can use if we can fill with the i1 value rather than zeroing.

llvm-svn: 321315
2017-12-21 22:08:23 +00:00
Alina Sbirlea 50db8a2086 [ModRefInfo] Add must alias info to ModRefInfo.
Summary:
Add an additional bit to ModRefInfo, ModRefInfo::Must, to be cleared for known must aliases.
Shift existing Mod/Ref/ModRef values to include an additional most
significant bit. Update wrappers that modify ModRefInfo values to
reflect the change.

Notes:
* ModRefInfo::Must is almost entirely cleared in the AAResults methods, the remaining changes are trying to preserve it.
* Only some small changes to make custom AA passes set ModRefInfo::Must (BasicAA).
* GlobalsModRef already declares a bit whose meaning overlaps with the most significant bit in ModRefInfo (MayReadAnyGlobal). No changes to shift the value of MayReadAnyGlobal (see AlignedMap). FunctionInfo.getModRef() adjusts the most significant bit so correctness is preserved, but the Must info is lost.
* There are cases where the ModRefInfo::Must is not set, e.g. 2 calls that only read will return ModRefInfo::NoModRef, though they may read from exactly the same location.

Reviewers: dberlin, hfinkel, george.burgess.iv

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D38862

llvm-svn: 321309
2017-12-21 21:41:53 +00:00
Craig Topper 742ac98d01 [X86] When lowering truncates to vXi1, don't sign extend i16/i8 types to 512-bit if we have VLX.
This should only affect what we do for v8i16. Previously we went to v8i64, but if we have VLX we only need v8i32. This prevents an unnecessary zmm usage.

llvm-svn: 321303
2017-12-21 20:45:13 +00:00
Wolfgang Pieb 6ecd6a8088 [DWARF v5] Rework of string offsets table reader
Reorganizes the DWARF consumer to derive the string offsets table 
contribution's format from the contribution header instead of 
(incorrectly) from the unit's format.

Reviewers: JDevlieghere, aprantl

Differential Revision: https://reviews.llvm.org/D41146
 

llvm-svn: 321295
2017-12-21 19:38:13 +00:00
Craig Topper 410a289b79 [X86] Promote v8i1 shuffles to v8i32 instead of v8i64 if we have VLX.
We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX.

We still use 512-bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors. i32 and i64 element sizes get the best shuffle support.

llvm-svn: 321291
2017-12-21 18:44:06 +00:00
Simon Pilgrim 4de5bb093c [X86][SSE] Split large PAVGB/PAVGW vectors to legal widths
Patch to allow detectAVGPattern to handle vectors larger than the legal size (128 SSE2, 256 AVX2, 512 AVX512BW), splitting the vectors accordingly.

Differential Revision: https://reviews.llvm.org/D41440

llvm-svn: 321288
2017-12-21 18:12:31 +00:00
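
A scalar reference for the AVG pattern being detected (a sketch, not the actual detectAVGPattern code): the rounded average of two u8 values computed in a wider type, which PAVGB implements per lane; the patch only changes how wide vectors of this pattern are split.

  #include <cassert>
  #include <cstdint>

  uint8_t avgRound(uint8_t a, uint8_t b) {
    // Widen, add, round up, shift back down - the per-lane PAVGB semantics.
    return uint8_t((uint16_t(a) + uint16_t(b) + 1) >> 1);
  }

  int main() {
    assert(avgRound(250, 251) == 251); // widening avoids the u8 wraparound
  }
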
Simon Pilgrim 6b915d3353 [DAGCombiner] Generalize (or (and X, c1), c2) -> (and (or X, c2), c1|c2) combine to work on non-splat vectors
The knownbits_mask_or_shuffle_uitofp change is interesting - shuffle combines manage to kick in, removing the AND constant mask load. For targets with fast-variable-shuffle this should reduce further to VPOR+VPSHUFB+VCVTDQ2PS.

llvm-svn: 321279
2017-12-21 16:34:46 +00:00
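
The combine rests on a per-element bitwise identity, so it applies to each lane of a non-splat vector independently; a brute-force check over 8-bit values (a standalone sketch, not the DAG code) confirms it:

  #include <cassert>
  #include <cstdint>

  int main() {
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned c1 = 0; c1 < 256; ++c1)
        for (unsigned c2 = 0; c2 < 256; ++c2) {
          uint8_t lhs = (x & c1) | c2;
          uint8_t rhs = (x | c2) & (c1 | c2);
          assert(lhs == rhs); // (or (and X, c1), c2) == (and (or X, c2), c1|c2)
        }
  }
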
Simon Pilgrim 4707709d1b [X86] Add (or (and X, c1), c2) -> (and (or X, c2), c1|c2) non-splat vector test
llvm-svn: 321278
2017-12-21 16:08:41 +00:00
Tony Jiang eba757e45c [PowerPC] Fix parest build failure in SPEC2017.
The build failure was caused by an assertion in pre-legalization DAGCombine:

Combining: t6: ppcf128 = uint_to_fp t5
... into: t20: f32 = PPCISD::FCFIDUS t19

which is clearly wrong since ppcf128 is definitely a different type from f32 and
we cannot change the node value type when doing DAGCombine. The fix is to not
handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and
leave it to downstream code to legalize and expand them to small legal types.

Differential Revision: https://reviews.llvm.org/D41411

llvm-svn: 321276
2017-12-21 15:42:50 +00:00
Simon Pilgrim 4dd03ed7e3 [DAGCombiner] Generalize (and (or x, C), D) -> D iff (C & D) == D combine to work on non-splat vectors
llvm-svn: 321275
2017-12-21 15:17:29 +00:00
Simon Dardis 4b35ea1a0a [mips] Fix the invalid EVA test
During the review of D40362 I spotted that this test wasn't actually
testing the eva instructions due to '-mattr==eva', rather than '-mattr=+eva',
which resulted in the test having no effect.

llvm-svn: 321273
2017-12-21 15:14:07 +00:00
Simon Pilgrim 636510a702 [X86] Add (and (or x, C), D) -> D iff (C & D) == D non-splat vector test
llvm-svn: 321268
2017-12-21 14:33:40 +00:00
Simon Pilgrim bed1aceadb [X86] Add v48i8 AVG test case, based on discussion on D41440
llvm-svn: 321261
2017-12-21 13:18:19 +00:00
Sam Parker 59efb8cb5b [DAGCombine] Improve ReduceLoadWidth for SRL
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 321259
2017-12-21 12:55:04 +00:00
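
A C++ sketch of the effect on the load side (assuming a little-endian host; illustrative only, not the DAG code): when a wide load is shifted right and masked, the same value can be produced by a narrower load of just the bytes the mask selects.

  #include <cassert>
  #include <cstdint>
  #include <cstring>

  int main() {
    uint32_t word = 0x11223344;
    uint32_t wide = (word >> 16) & 0xFF; // wide load, SRL, then AND
    uint8_t narrow;                      // equivalent narrow 8-bit load at
    std::memcpy(&narrow, reinterpret_cast<const uint8_t *>(&word) + 2, 1);
    assert(wide == narrow);              // byte offset 2 (little-endian): 0x22
  }
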
Sam Parker 98727bc261 [ARM] Armv8-R DFB instruction
Implement MC support for the Armv8-R 'Data Full Barrier' instruction.

Differential Revision: https://reviews.llvm.org/D41430

llvm-svn: 321256
2017-12-21 11:17:49 +00:00
Simon Atanasyan 62d3259d2d [llvm-readobj] Support 'GNU' style for MIPS GOT/PLT dumping
This change adds `printMipsGOT` and `printMipsPLT` methods to the
`DumpStyle` class and overrides them in the `GNUStyle` and `LLVMStyle`
descendants. To pass information about GOT/PLT layout into these
methods, the `MipsGOTParser` class has been extended to hold all
necessary data.

llvm-svn: 321253
2017-12-21 10:26:02 +00:00
Craig Topper 72c22f4366 [X86] Use PSHUFB for v32i16 shuffles before falling back to VPERMW/VPERMI2W.
PSHUFB has the ability to implicitly zero elements, which VPERMI2W can't do. So give it a chance to be used first.

llvm-svn: 321251
2017-12-21 08:22:51 +00:00
Craig Topper 38af615b4c [X86] Use VPERMI2B for v16i8 shuffles if we have VBMI+VLX and would have otherwise used two PSHUFBs ORed together.
llvm-svn: 321249
2017-12-21 07:31:30 +00:00
Craig Topper 03b2bc4838 [X86] Use VPERMB/VPERMI2B for v32i8 shuffle lowering if VBMI and VLX are supported.
llvm-svn: 321248
2017-12-21 05:58:31 +00:00
Craig Topper 5e4c7bc963 [X86] Add avx512vbmi command lines to vector-shuffle-256-v32.ll
llvm-svn: 321247
2017-12-21 03:58:31 +00:00
Sam Clegg b6a429842e [WebAssembly] Fix local references to weak aliases
When weak aliases are used within the same translation
unit we need to be able to directly reference the alias
and not just the thing it aliases.  We do this by
defining both a wasm import and a wasm export in this
case that result in a single Symbol.  This change is
a partial revert of rL314245.  A corresponding lld
change addresses the previous issues we had with this.

See: https://github.com/WebAssembly/tool-conventions/issues/34

Differential Revision: https://reviews.llvm.org/D41472

llvm-svn: 321242
2017-12-21 02:30:38 +00:00
Michael Zolotukhin ad371e0caa [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking.
If a block has N predecessors, then the current algorithm will try to
sink common code to this block N times (whenever we visit a
predecessor). Every attempt to sink the common code includes going
through all predecessors, so the complexity of the algorithm becomes
O(N^2).
With this patch we try to sink common code only when we visit the block
itself. With this, the complexity goes down to O(N).
As a side effect, the moment the code is sunk is slightly different than
before (the order of simplifications has been changed), that's why I had
to adjust two tests (note that neither of the tests is supposed to test
SimplifyCFG):
* test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic
the changes that previous implementation of SimplifyCFG would do.
* test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common
code sinking by a command line flag.

llvm-svn: 321236
2017-12-21 01:22:13 +00:00
Joel Galenson 6f4e827e4c [ARM] Optimize {s,u}{add,sub}.with.overflow.
The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG.  This commit ports that code to the ARM backend.

Differential revision: https://reviews.llvm.org/D35635

llvm-svn: 321224
2017-12-20 22:25:39 +00:00
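
For reference, the {s,u}{add,sub}.with.overflow intrinsics correspond to the overflow-checked arithmetic Clang and GCC expose as builtins; a minimal host-side C++ sketch of their semantics (not backend code):

  #include <cassert>
  #include <cstdint>

  int main() {
    int32_t sum;
    // Signed add with overflow check; Clang typically lowers this to
    // llvm.sadd.with.overflow.i32.
    bool ovf = __builtin_add_overflow(INT32_MAX, 1, &sum);
    assert(ovf); // 0x7fffffff + 1 overflows i32

    uint32_t diff;
    // Unsigned sub with borrow check (llvm.usub.with.overflow.i32).
    bool borrow = __builtin_sub_overflow(0u, 1u, &diff);
    assert(borrow && diff == UINT32_MAX);
  }
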
Krzysztof Parzyszek e4ce92cabf [Hexagon] Allow construction of HVX vector predicates
Handle BUILD_VECTOR of boolean values.

llvm-svn: 321220
2017-12-20 20:49:43 +00:00
Yonghong Song 25bf825961 bpf: add support for objdump -print-imm-hex
Add support for 'objdump -print-imm-hex' for imm64, operand imm
and branch target. If user programs encode immediate values
as hex numbers, such an option will make it easy to correlate
asm insns with source code. This option also makes it easy
to correlate imm values with insn encoding.

There is one changed behavior in this patch. In the old way, we
print the 64-bit imm as a u64:
  O << (uint64_t)Op.getImm();
and the new way is:
  O << formatImm(Op.getImm());

The formatImm is defined in llvm/MC/MCInstPrinter.h as
  format_object<int64_t> formatImm(int64_t Value)

So the new way prints the 64-bit imm as an i64.
If a 64-bit value has the highest bit set, the old way
will print the value as a positive value and the
new way will print it as a negative value. The new way
is consistent with x86_64.
For the code (see the test program):
 ...
 if (a == 0xABCDABCDabcdabcdULL)
 ...
x86_64 objdump, with and without -print-imm-hex, looks like:
 48 b8 cd ab cd ab cd ab cd ab   movabsq $-6067004223159161907, %rax
 48 b8 cd ab cd ab cd ab cd ab   movabsq $-0x5432543254325433, %rax

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 321215
2017-12-20 19:39:58 +00:00
Matt Arsenault 303327d58b TableGen: Allow setting SDNodeProperties on intrinsics
Allows preserving MachineMemOperands on intrinsics
through selection. For reasons I don't understand, this
is a static property of the pattern and the selector
deliberately goes out of its way to drop if not present.

Intrinsics already inherit from SDPatternOperator allowing
them to be used directly in instruction patterns. SDPatternOperator
has a list of SDNodeProperty, but you currently can't set them on
the intrinsic. Without SDNPMemOperand, when the node is selected
any memory operands are always dropped. Allowing setting this
on the intrinsics avoids needing to introduce another equivalent
target node just to have SDNPMemOperand set.

llvm-svn: 321212
2017-12-20 19:36:28 +00:00
Matthew Simpson cb35c5d5c2 [ICP] Expose unconditional call promotion interface
This patch modifies the indirect call promotion utilities by exposing and using
an unconditional call promotion interface. The unconditional promotion
interface (i.e., call promotion without creating an if-then-else) can be used
if it's known that an indirect call has only one possible callee. The existing
conditional promotion interface uses this unconditional interface to promote an
indirect call after it has been versioned and placed within the "then" block.

A consequence of unconditional promotion is that the fix-up operations for phi
nodes in the normal destination of invoke instructions are changed. This is
necessary because the existing implementation assumed that an invoke had been
versioned, creating a "merge" block where a return value bitcast could be
placed. In the new implementation, the edge between a promoted invoke's parent
block and its normal destination is split if needed to add a bitcast for the
return value. If the invoke is also versioned, the phi node merging the return
value of the promoted and original invoke instructions is placed in the "merge"
block.

Differential Revision: https://reviews.llvm.org/D40751

llvm-svn: 321210
2017-12-20 19:26:37 +00:00
Craig Topper 07820f2fe4 [X86] Remove zext from vXi32 to vXi64 on indices of gather/scatter instructions if we can prove the pre-extended value is positive.
Gather/scatter can implicitly sign extend from i32->i64 on indices. So if we know the sign bit of the input to a zext is 0 we can use the implicit extension.

llvm-svn: 321209
2017-12-20 19:25:33 +00:00
Warren Ristow c07d49585f Improve the test for r320216. NFC.
Patch by Matthew Voss!

llvm-svn: 321207
2017-12-20 19:11:31 +00:00
Evgeniy Stepanov 3fd1b1a764 [hwasan] Implement -fsanitize-recover=hwaddress.
Summary: Very similar to AddressSanitizer, with the exception of the error type encoding.

Reviewers: kcc, alekseyshl

Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41417

llvm-svn: 321203
2017-12-20 19:05:44 +00:00
Matt Arsenault f7f59b5292 [AMDGPU, AsmParser] Enable the mnemonic spell corrector.
Patch by Dmitry Venikov

llvm-svn: 321202
2017-12-20 18:52:57 +00:00
Craig Topper bc92e00f2e [X86] Implement the fusing of MUL+SUBADD to FMSUBADD
This patch turns shuffles of fadd/fsub with fmul into fmsubadd.

Patch by Dmitry Venikov

Differential Revision: https://reviews.llvm.org/D40335

llvm-svn: 321200
2017-12-20 18:05:15 +00:00
Teresa Johnson a4ce3bfdda [PGO] Function section hotness prefix should look at all blocks
Summary:
The function section prefix for PGO based layout (e.g. hot/unlikely)
should look at the hotness of all blocks not just the entry BB.
A function with a cold entry but a very hot loop should be placed in the
hot section, for example, so that it is located close to other hot
functions it may call. For SamplePGO it was already looking at the
branch weights on calls, and I made that code conditional on whether
this is SamplePGO since it was essentially a noop for instrumentation
PGO anyway.

Reviewers: davidxl

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D41395

llvm-svn: 321197
2017-12-20 17:53:10 +00:00
Florian Hahn 012c8f97b2 [InstCombine] Add debug location to new caller.
Reviewers: rnk, aprantl, majnemer

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D414

llvm-svn: 321191
2017-12-20 17:16:59 +00:00
Mohammad Shahid 3a934d6ab9 Revert r320548:[SLP] Vectorize jumbled memory loads
llvm-svn: 321181
2017-12-20 15:26:59 +00:00
Krzysztof Parzyszek 8f6b0c850a [Hexagon] Adjust the value type for BCvt in LowerFormalArguments
llvm-svn: 321177
2017-12-20 14:44:05 +00:00
Daniel Sanders 32de8bbd30 [globalisel][tablegen] Allow ImmLeaf predicates to use InstructionSelector members
NFC for currently supported targets. This resolves a problem encountered by
targets such as RISCV that reference `Subtarget` in ImmLeaf predicates.

llvm-svn: 321176
2017-12-20 14:41:51 +00:00
Florian Hahn 467abe3e4f [LV] Remove unnecessary DoExtraAnalysis guard (silent bug)
canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true.
This doesn't make sense to me because reporting analysis information shouldn't alter legality
checks. This is probably the result of a last minute minor change before committing (?).

Patch by Diego Caballero.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D40973

llvm-svn: 321172
2017-12-20 13:28:38 +00:00
Simon Pilgrim a50eec0293 [X86][AVX2] Split more shuffle tests into 'slow' and 'fast' variable shuffles
llvm-svn: 321171
2017-12-20 13:12:34 +00:00
Diana Picus 75ce852abe [ARM GlobalISel] Fix assertion in RegBankSelect
We get an assertion in RegBankSelect for code along the lines of
my_32_bit_int = my_64_bit_int, which tends to translate into a 64-bit
load, followed by a G_TRUNC, followed by a 32-bit store. This appears in
a couple of places in the test-suite.

At the moment, the legalizer doesn't distinguish between integer and
floating point scalars, so a 64-bit load will be marked as legal for
targets with VFP, and so will the rest of the sequence, leading to a
slightly bizarre G_TRUNC reaching RegBankSelect.

Since the current support for 64-bit integers is rather immature, this
patch works around the issue by explicitly handling this case in
RegBankSelect and InstructionSelect. In the future, we may want to
revisit this decision and make sure 64-bit integer loads are narrowed
before reaching RegBankSelect.

llvm-svn: 321165
2017-12-20 11:27:10 +00:00
Florian Hahn c3aa6d83fd [ARM] Lower unsigned saturation to USAT
Summary:
Implement lowering of unsigned saturation on an interval [0, k], where k + 1 is a power of two, using the USAT instruction, in a similar way to how [~k, k] is lowered using SSAT on ARM models that support it.

Patch by Marten Svanfeldt

Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41348

llvm-svn: 321164
2017-12-20 11:13:57 +00:00
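
A sketch of the saturation pattern being lowered (plain C++; the interval [0, 255] is just an example where k + 1 = 256 is a power of two):

  #include <cassert>
  #include <cstdint>

  // Clamp a signed value into [0, 255] - the kind of select/clamp pattern the
  // patch lowers to a single USAT on targets that support it.
  uint8_t usat8(int32_t x) {
    if (x < 0)
      return 0;
    if (x > 255)
      return 255;
    return uint8_t(x);
  }

  int main() {
    assert(usat8(-7) == 0 && usat8(300) == 255 && usat8(42) == 42);
  }
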
Sander de Smalen cd6be960ce [AArch64][SVE] Re-submit patch series for ZIP1/ZIP2
This patch resubmits the SVE ZIP1/ZIP2 patch series consisting of
of r320992, r320986, r320973, and r320970 by reverting
https://reviews.llvm.org/rL321024.

The issue that caused r321024 has been addressed in https://reviews.llvm.org/rL321158,
so this patch-series should be safe to resubmit.

llvm-svn: 321163
2017-12-20 11:02:42 +00:00
Tim Northover 6db5d027c6 AArch64: fix one more place movi.2d could be created.
Somehow got missed out of r320965.

llvm-svn: 321162
2017-12-20 10:45:39 +00:00
Bjorn Steinbrink 030123e8e8 Give up on array allocas in getPointerDereferenceableBytes
Summary:
As suggested by Eli Friedman, don't try to handle array allocas here
because of possible overflows; instead, rely on instcombine converting
them to allocations of array types.

Reviewers: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41398

llvm-svn: 321159
2017-12-20 10:01:30 +00:00
Sander de Smalen c067c30d9e [AArch64] Asm: Fix parsing of register aliases that have a name starting with 'z'
Summary: This fixes an issue as identified by @rnk in https://reviews.llvm.org/rL321029.

Reviewers: rnk, fhahn, rengolin, efriedma, echristo, olista01

Reviewed By: rnk, fhahn

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D41382

llvm-svn: 321158
2017-12-20 09:45:45 +00:00
Sam Parker daed9de622 [AArch64] CCSIDR2 system register
Implement the 'Current Cache Size' register that has been introduced
as part of the Armv8.3 architecture. I originally missed this, and
(hopefully) should be the final patch for assembler support.

Differential Revision: https://reviews.llvm.org/D41396

llvm-svn: 321155
2017-12-20 08:56:41 +00:00
Gadi Haber 0ae485c581 [X86][CLFLUSH]: Adding full coverage of MC encoding for the CLFLUSH isa sets.<NFC>
NFC.
Adding MC regression tests to cover the CLFLSH and CLFLUSHOPT isa sets.
This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952

Reviewers: zvi, RKSimon, craig.topper, m_zuckerman
Differential Revision: https://reviews.llvm.org/D41331

Change-Id: Ifa643dd52f1b7184c52bc1806038dc74b234fc65
llvm-svn: 321153
2017-12-20 08:28:24 +00:00
Craig Topper abed821c36 [X86] Optimize sign extends on index operand to gather/scatter to not sign extend past i32.
The gather instruction will implicitly sign extend to the pointer width, so we don't need to further extend it. This can prevent unnecessary splitting in some cases.

There's still an issue that lowering on non-VLX can introduce another sign extend that doesn't get combined with shifts from a lowered sign_extend_inreg.

llvm-svn: 321152
2017-12-20 07:36:59 +00:00
Martin Storsjo 2778fd0b59 [AArch64] Implement stack probing for windows
Differential Revision: https://reviews.llvm.org/D41131

llvm-svn: 321150
2017-12-20 06:51:45 +00:00
Hiroshi Inoue 11e571e0c6 [PowerPC] fix a bug in redundant compare elimination
This patch fixes a bug in the redundant compare elimination reported in https://reviews.llvm.org/rL320786 and re-enables the optimization.

The redundant compare elimination assumes that we can replace signed comparison with unsigned comparison for the equality check. But due to the difference in the sign extension behavior we cannot change the opcode if the comparison is against an immediate and the most significant bit of the immediate is one.

Differential Revision: https://reviews.llvm.org/D41385

llvm-svn: 321147
2017-12-20 05:18:19 +00:00
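
A small demonstration of the sign-extension caveat (plain C++ with a 16-bit immediate field as in the PPC D-form encodings; not the pass itself): when the immediate's most significant bit is one, the signed and unsigned compares see different extended constants, so the equality check cannot simply switch opcodes.

  #include <cassert>
  #include <cstdint>

  int main() {
    int16_t imm = -0x8000;            // immediate with its MSB set
    int32_t sextImm = imm;            // signed compare sees 0xFFFF8000
    uint32_t zextImm = uint16_t(imm); // unsigned compare sees 0x00008000
    uint32_t reg = 0x00008000;
    // The two forms of the "equality" check disagree for this immediate.
    assert((reg == zextImm) != (int32_t(reg) == sextImm));
  }
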
Dan Gohman aa3922819e [memcpyopt] Teach memcpyopt to optimize across basic blocks
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.

This is r319482 and r319483, along with fixes for PR35519: fix the 
optimization that merges stores into memsets to preserve cached memdep
info, and fix memdep's non-local caching strategy to not assume that larger
queries are always more conservative than smaller ones.

Fixes PR28958 and PR35519.

Differential Revision: https://reviews.llvm.org/D40802

llvm-svn: 321138
2017-12-20 01:36:25 +00:00
Craig Topper b1ae03fd61 [X86] Improve coverage of fma negations.
llvm-svn: 321137
2017-12-20 01:26:36 +00:00
Craig Topper 171fb15786 [X86] Fix probable typo in fma fneg test.
llvm-svn: 321136
2017-12-20 01:26:35 +00:00
Adrian McCarthy 86795d9166 Revert "Fix faulty assertion in debug info"
This reverts commit e32def3f7ebe1136b7038336eff56a415a962bf2.

llvm-svn: 321125
2017-12-19 23:34:37 +00:00
Adrian McCarthy 2ed8f36834 Fix faulty assertion in debug info
It appears the code uses nullptr to represent a void type in debug metadata,
which led to an assertion failure when building DeltaAlgorithm.cpp with a
self-hosted clang on Windows.

I'm not sure why/if the problem was Windows-specific.

Fixes bug https://bugs.llvm.org/show_bug.cgi?id=35543

Differential Revision: https://reviews.llvm.org/D41264

llvm-svn: 321122
2017-12-19 23:01:17 +00:00
Francis Visoiu Mistrih 3b265c8fcf [CodeGen] Move printing MO_FPImmediate operands to MachineOperand::print
Work towards the unification of MIR and debug output by refactoring the
interfaces.

llvm-svn: 321110
2017-12-19 21:47:00 +00:00
Jonas Devlieghere b2a03193dd [dwarfdump][test] Add test case for r321064
Verify that -lookup takes a 64-bit address.

llvm-svn: 321101
2017-12-19 19:42:32 +00:00
Mark Searles e4f067ebe2 [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU.
Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop (MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc.) to warrant turning it off until the issues can be addressed.
Differential Revision: https://reviews.llvm.org/D41377

llvm-svn: 321100
2017-12-19 19:26:23 +00:00
Simon Pilgrim 7cabb4c384 [X86] Regenerate popcnt tests
llvm-svn: 321093
2017-12-19 18:05:13 +00:00
Amara Emerson b6ddbef673 [GlobalISel][Legalizer] Fix crash when trying to lower G_FNEG of fp128 types.
This doesn't add legalizer support, just prevents crashing so that we
can gracefully fall back to SDAG.

Fixes PR35690.

llvm-svn: 321091
2017-12-19 17:21:35 +00:00
Nirav Dave 51425fa5ba [DAG] Elide overlapping store
Summary:
Extend overlapping store elision to handle overwrites of stores by
larger stores.

Nontemporal tests have been modified to add memory dependencies to
prevent store elision.

Reviewers: craig.topper, rnk, t.p.northover

Subscribers: javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40969

llvm-svn: 321089
2017-12-19 17:10:56 +00:00
Simon Pilgrim d873b6f6ba [X86][AVX512] Attempt target shuffle combining to different types instead of early-out
We try to prevent shuffle combining to value types that would stop the folding of masked operations, but by just returning early, we were failing to try different shuffle types.

The TODOs are all still relevant here to improve codegen but we're lacking test examples.

llvm-svn: 321085
2017-12-19 16:54:07 +00:00
Ben Dunbobbin 9ecb8b548c [Support][CachePruning] Disable cache pruning regression fix
borked by: rL284966 (see: https://reviews.llvm.org/D25730).

Previously, Interval was unsigned (see CachePruning.h); replacing the type with std::chrono::seconds (which is signed) caused a regression in behaviour, because the C API intends negative values to translate to large positive intervals to *effectively* disable the pruning (see the comments on setCachePruningInterval()).
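
A minimal sketch of the behavioural difference (assumed values; not the CachePruning API itself):

#include <chrono>
#include <cstdint>
#include <iostream>

int main() {
  int CApiInterval = -1;  // C-API convention: a negative value should effectively disable pruning
  // With an unsigned Interval, -1 wrapped to a huge value, so pruning effectively never ran.
  uint64_t OldUnsigned = static_cast<uint64_t>(CApiInterval);
  // With signed std::chrono::seconds, the value stays -1 and the old "disable" trick is lost.
  std::chrono::seconds NewSigned(CApiInterval);
  std::cout << OldUnsigned << " vs " << NewSigned.count() << '\n';
  return 0;
}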

Differential Revision: https://reviews.llvm.org/D41231

llvm-svn: 321077
2017-12-19 14:42:38 +00:00
Haicheng Wu b3689cabda [InlineCost] Skip volatile loads when looking for repeated loads
This is a follow-up fix of r320814.  A test case is also added.

llvm-svn: 321075
2017-12-19 13:42:58 +00:00
Simon Pilgrim fd5df639a3 [X86][SSE] Add cpu feature for aggressive combining to variable shuffles
As mentioned in D38318 and D40865, modern Intel processors prefer to combine multiple shuffles to a variable shuffle mask (PSHUFB/VPERMPS etc.) instead of multiple stages of 'fixed' shuffles, which put more pressure on Port 5 (at the expense of extra shuffle mask loads).

This patch provides a FeatureFastVariableShuffle target flag for Haswell+ CPUs that prefers combining 2 or more fixed shuffles to a single variable shuffle (default is 3 shuffles).

The long term aim is to drive more of this from schedule data (probably via the MC) but we're not close to being ready for that yet.

Differential Revision: https://reviews.llvm.org/D41323

llvm-svn: 321074
2017-12-19 13:16:43 +00:00
David Green 110844d21c [ARM] Register the Thumb2SizeReducePass. NFC
Also adds a simple test case.

llvm-svn: 321072
2017-12-19 12:19:08 +00:00
Simon Pilgrim f6d4ab6daf [X86][SSE] Use (V)PHMINPOSUW for vXi8 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841)
This extends D39729, which performed this for vXi16; with the same bit flipping
to handle the SMAX/SMIN/UMAX cases, vXi8 UMIN horizontal reductions can be performed.

This makes use of the fact that, by performing a pair-wise i8 SHUFFLE/UMIN before
PHMINPOSUW, we not only get the UMIN of each pair but also zero-extend the upper
bits, ready for v8i16.
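
A scalar model of the described reduction (illustration only, not the actual X86 lowering):

#include <algorithm>
#include <array>
#include <cstdint>

uint8_t umin_v16i8(const std::array<uint8_t, 16> &V) {
  // Pair-wise unsigned-min of adjacent bytes; each result sits in a 16-bit
  // lane with its upper 8 bits already zero.
  std::array<uint16_t, 8> Pairs;
  for (int I = 0; I < 8; ++I)
    Pairs[I] = std::min(V[2 * I], V[2 * I + 1]);
  // A v8i16 horizontal unsigned-min (what PHMINPOSUW computes) finishes the job.
  uint16_t Min = Pairs[0];
  for (int I = 1; I < 8; ++I)
    Min = std::min(Min, Pairs[I]);
  return static_cast<uint8_t>(Min);
}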

Differential Revision: https://reviews.llvm.org/D41294

llvm-svn: 321070
2017-12-19 12:02:40 +00:00
Simon Dardis 1ade566c45 [mips] Handle the emission of microMIPSr6 sll instruction when used as a nop.
This instruction is encoded as zero, so we have to handle that case when checking
for unimplemented opcodes while producing the encoding for an instruction.

llvm-svn: 321066
2017-12-19 11:16:22 +00:00
Max Kazantsev fd95ee0c9a [JumpThreading] Restrict PRE across instructions that don't pass control to successors
PRE in JumpThreading should not be able to hoist copies of non-speculable loads
across instructions that don't always transfer execution to their successors;
doing so may introduce an unsafe load which would otherwise not be executed.

The same problem for GVN was fixed as rL316975.
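
A C++ illustration of the hazard (hypothetical functions, not from the test case):

#include <cstdlib>

void mayNotReturn(bool Exit) {
  if (Exit)
    std::exit(0);  // does not always transfer execution to its successor
}

int use(int *P, bool C, bool Exit) {
  mayNotReturn(Exit);
  if (C)
    return *P;     // PRE must not hoist this load above mayNotReturn(): on the
                   // exiting path the original program never performs it
  return 0;
}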

Differential Revision: https://reviews.llvm.org/D40347

llvm-svn: 321063
2017-12-19 09:10:21 +00:00
Bjorn Steinbrink 2da4d9d86d Treat sret arguments as being dereferenceable in getPointerDereferenceableBytes()
Reviewers: rnk, hfinkel, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41355

llvm-svn: 321061
2017-12-19 08:46:46 +00:00
Craig Topper 13142b10d5 [X86] Don't extend v16i8 non-uniform shifts to v16i32 if we have BWI. Use v16i16 instead.
BWI supports shifting by word amounts. Even if VLX isn't supported we can still widen to v32i16 and extract the lower half. For SKX it's preferable not to use a 512-bit vector if we can avoid it.

llvm-svn: 321059
2017-12-19 06:59:10 +00:00
Justin Bogner 4314f3adc2 update_mir_test_checks: Accept IR as input as well as MIR
We need to handle IR for tests that want to do lowering (or just
-stop-after with IR as input). I've run this on one AArch64 test to
demonstrate what it looks like.

llvm-svn: 321048
2017-12-19 00:49:04 +00:00
Jake Ehrlich e8437de727 [llvm-objcopy] Add option to add a progbits section from a file
This change adds support for adding progbits sections with contents from a file

Differential Revision: https://reviews.llvm.org/D41212

llvm-svn: 321047
2017-12-19 00:47:30 +00:00
Matthias Braun e29c0b8862 TargetLoweringBase: Followup to r321035
When doing my refactoring in r321035 I missed some prefixes and the fact that
on AArch64 we use "bzero" instead of "__bzero" as on X86.

Improve tests for bzero.

llvm-svn: 321046
2017-12-19 00:43:00 +00:00
Krzysztof Parzyszek e704583f23 [Hexagon] Cache loads to select to avoid traversing mutating DAG
llvm-svn: 321034
2017-12-18 23:13:27 +00:00
Evandro Menezes 687df6380e [AArch64] Expand test coverage of vector element shuffling to Exynos
Make sure that all test cases are run for Exynos as well.  Otherwise, NFC.

llvm-svn: 321032
2017-12-18 22:17:39 +00:00
Bob Haarman ea5ff9fa6b Fix buffer overrun in WindowsResourceCOFFWriter::writeSymbolTable()
Summary:
We were using sprintf(..., "$R%06X", <some uint32_t>) to create strings
that are expected to be exactly 8 characters long, but this results in longer
strings if the uint32_t is greater than 0xffffff. This change modifies
the behavior as follows:

 - Uses the loop counter instead of the data offset. This gives us
   sequential symbol names, avoiding collisions as much as possible.

 - Masks the value to 0xffffff to avoid generating names longer than 8
   bytes.

 - Uses formatv instead of sprintf.

Fixes PR35581.
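
The masking and fixed-width formatting can be sketched as follows (hypothetical helper, not the resource writer's actual code):

#include <cstdint>
#include <cstdio>
#include <string>

std::string makeResourceSymbolName(uint32_t LoopIndex) {
  char Buf[16];
  // Mask to 24 bits so the hex portion never exceeds 6 digits, giving an
  // exactly-8-character name: "$R" + 6 hex digits.
  std::snprintf(Buf, sizeof(Buf), "$R%06X", LoopIndex & 0xFFFFFF);
  return std::string(Buf);
}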

Reviewers: ruiu, zturner

Reviewed By: ruiu

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41270

llvm-svn: 321030
2017-12-18 22:10:14 +00:00
Reid Kleckner 8f3c351aa3 Add test for .req directive starting with 'p'
Reduced test case from libjpeg_turbo.

llvm-svn: 321029
2017-12-18 22:01:18 +00:00
Craig Topper 4802d4e23e [X86] Don't use NOPL when the assembler is passed an empty CPU string. Update tests to force a CPU with NOPL
An empty string should be equivalent to "generic", which doesn't allow NOPL. Force tests to specify 'pentiumpro' to guarantee NOPL.

Fixes PR35686

llvm-svn: 321026
2017-12-18 21:37:27 +00:00
Reid Kleckner 37517a2ddd Revert "[AArch64][SVE] Asm" changes, they broke libjpeg_turbo
This reverts changes r320992, r320986, r320973, and r320970.

r320970 by itself breaks the test case, and the rest depend on it.

Test case will land soon.

llvm-svn: 321024
2017-12-18 20:58:25 +00:00
Ivan A. Kosarev a80c79b5bf [Analysis] Generate more precise TBAA tags when one access encloses the other
There are cases when two tags with different base types denote
accesses to the same direct or indirect member of a structure
type. Currently, merging of such tags results in a tag that
represents an access to an object that has the type of that
member. This patch changes this so that if one of the accesses
encloses the other, then the generic tag is the one of the
enclosed access.
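
A conceptual C++ example of two such accesses (assumed illustration; the patch itself operates on the TBAA metadata, not on source code):

struct Inner { int X; };
struct Outer { Inner I; };

// Both functions access the same int member, but with different base types.
// The access described in terms of Outer encloses the access described in
// terms of Inner, so the merged (generic) tag should be the one of the
// enclosed access, i.e. the Inner-based tag.
int readViaOuter(Outer *O) { return O->I.X; }
int readViaInner(Inner *P) { return P->X; }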

Differential Revision: https://reviews.llvm.org/D39557

llvm-svn: 321019
2017-12-18 20:05:20 +00:00
Teresa Johnson 915897e21b [PGO] Fix handling of cold entry count for instrumented PGO
Summary:
In r277849, getEntryCount was changed to return None when the entry
count was 0, specifically for SamplePGO where it means no samples were
recorded. However, for instrumentation PGO a 0 entry count should be
returned directly, since it does mean that the function was completely
cold. Otherwise we end up treating these functions conservatively
in isFunctionEntryCold() and isColdBB().

Instead, for SamplePGO use -1 when there are no samples, and change
getEntryCount to return None when the value is -1.

Reviewers: danielcdh, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41307

llvm-svn: 321018
2017-12-18 20:02:43 +00:00
Quentin Colombet ec76d9c47f [TableGen][GlobalISel] Optimize MatchTable for faster instruction selection
*** Context ***

Prior to this patch, the table generated for matching instructions was
straightforward but highly inefficient.

Basically, each pattern generates its own set of self-contained checks
and actions.
E.g., TableGen generated:
// First pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDrr
// Second pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDri
// Third pattern
CheckNumOperand 3
CheckOpcode G_SUB
...
Build SUBrr

*** Problem ***

Because of that generation scheme, a *lot* of checks were redundant between
each pattern and were re-checked every single time until we reached the pattern
that matches.
E.g., taking the previous table, let's say we are matching a G_SUB; that
means we are going to check all the rules for G_ADD before looking at
the G_SUB rule. In particular we are going to do:
check 3 operands; PASS
check G_ADD; FAIL
; Next rule
check 3 operands; PASS (but we already knew that!)
check G_ADD; FAIL (well it is still not true)
; Next rule
check 3 operands; PASS (really!!)
check G_SUB; PASS (at last :P)

*** Proposed Solution ***

This patch introduces a concept of group of rules (GroupMatcher) that
share some predicates and only get checked once for the whole group.

This patch only creates groups with one nesting level. Conceptually
there is nothing preventing us from having deeper nesting levels. However,
the current implementation is not smart enough to share the recording
(aka capturing) of values, which limits its ability to do more sharing.

For the given example the current patch will generate:
// First group
CheckOpcode G_ADD

 // First pattern
 CheckNumOperand 3
 ...
 Build ADDrr
 // Second pattern
 CheckNumOperand 3
 ...
 Build ADDri

// Second group
CheckOpcode G_SUB

 // Third pattern
 CheckNumOperand 3
 ...
 Build SUBrr

But if we allowed several nesting levels, it could create a subgroup
for the CheckNumOperand 3 check.
(We would need to call optimizeRules on the rules within a group.)
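
A minimal sketch of the grouping idea (hypothetical types and names; the real emitter works on TableGen rule matchers):

#include <map>
#include <string>
#include <vector>

struct Rule {
  std::string Opcode;               // the shared leading predicate, e.g. "G_ADD"
  std::vector<std::string> Checks;  // the remaining per-rule checks
  std::string Action;               // e.g. "Build ADDrr"
};

// Bucket rules by their leading predicate so the shared check is emitted (and
// evaluated) once per group instead of once per rule.
std::map<std::string, std::vector<Rule>>
groupByOpcode(const std::vector<Rule> &Rules) {
  std::map<std::string, std::vector<Rule>> Groups;
  for (const Rule &R : Rules)
    Groups[R.Opcode].push_back(R);
  return Groups;
}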

*** Result ***

With only one level of nesting, the instruction selection pass is up
to 4x faster. For instance, one instruction now takes 500 checks
instead of 24k! With more nesting we could get into the tens, I believe.

Differential Revision: https://reviews.llvm.org/D39034

rdar://problem/34670699

llvm-svn: 321017
2017-12-18 19:47:41 +00:00
Dimitry Andric e4f5d01033 Fix more inconsistent line endings. NFC.
llvm-svn: 321016
2017-12-18 19:46:56 +00:00
Jessica Paquette 02c124d644 [MachineOutliner] Recommit r320229
LR was undefined entering outlined functions that contain calls. This made the
machine verifier unhappy when expensive checks were enabled. This fixes that.

llvm-svn: 321014
2017-12-18 19:33:21 +00:00
Benjamin Kramer efc7c88ea8 [PPC] Also disable the pre-emit version of reg+reg to reg+imm transformation.
This has the same issue as the early pass disabled in r321010.

llvm-svn: 321013
2017-12-18 19:21:56 +00:00
Paul Robinson a06f8dcca6 Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header."
Adds missing support for DW_FORM_data16.

Update of r320852/r320886, fixing the unittest again, this time use a
raw char string for the test data.

Differential Revision: https://reviews.llvm.org/D41090

llvm-svn: 321011
2017-12-18 19:08:35 +00:00
Krzysztof Parzyszek 6b589e593d [Hexagon] Generate HVX code for vector sign-, zero- and any-extends
Implement any-extend as zero-extend.

llvm-svn: 321004
2017-12-18 18:32:27 +00:00
Simon Pilgrim f947137ed0 [X86] Regenerate test to improve codegen testing for D41350
llvm-svn: 321003
2017-12-18 18:31:02 +00:00
Teresa Johnson 9ecaaff251 [ThinLTO] Make distributed indexes test more robust
Modify test so that it passes in the reverse-iteration bot.
We use DenseMap instead of std::map for the summaries to emit into
distributed index files. The iteration order is not defined, but
it is deterministic, which is good enough.

llvm-svn: 321000
2017-12-18 18:00:32 +00:00
Xinliang David Li 19fb5b467b [PGO] add MST min edge selection heuristic to ensure non-zero entry count
Differential Revision: http://reviews.llvm.org/D41059

llvm-svn: 320998
2017-12-18 17:56:19 +00:00
Francis Visoiu Mistrih b213b27ee3 [YAML] Add support for non-printable characters
LLVM IR function names which disable mangling start with '\01'
(https://www.llvm.org/docs/LangRef.html#identifiers).

When an identifier like "\01@abc@" gets dumped to MIR, it is quoted, but
only with single quotes.

http://www.yaml.org/spec/1.2/spec.html#id2770814:

"The allowed character range explicitly excludes the C0 control block
allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."

http://www.yaml.org/spec/1.2/spec.html#id2776092:

"All non-printable characters must be escaped.
[...]
Note that escape sequences are only interpreted in double-quoted scalars."

This patch adds support for printing escaped non-printable characters
between double quotes if needed.

Should also fix PR31743.
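
A hedged sketch of the quoting decision (not the LLVM YAML I/O implementation):

#include <string>

// TAB, LF, CR and the printable ASCII range may appear unescaped; everything
// else must be escaped, and escapes are only interpreted in double-quoted
// scalars. (Simplified: real YAML also allows printable non-ASCII.)
static bool isYAMLPrintable(unsigned char C) {
  return C == 0x09 || C == 0x0A || C == 0x0D || (C >= 0x20 && C <= 0x7E);
}

bool needsDoubleQuoting(const std::string &S) {
  for (unsigned char C : S)
    if (!isYAMLPrintable(C))
      return true;  // e.g. the '\x01' prefix that disables IR name mangling
  return false;
}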

Differential Revision: https://reviews.llvm.org/D41290

llvm-svn: 320996
2017-12-18 17:38:03 +00:00
Sander de Smalen 09f56a54d0 [AArch64][SVE] Asm: Improve diagnostics further when +sve is not specified
Summary: Patch [4/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. This patch further improves diagnostic messages for when the SVE feature is not specified.

Reviewers: rengolin, fhahn, olista01, echristo, efriedma

Reviewed By: fhahn

Subscribers: sdardis, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D40363

llvm-svn: 320992
2017-12-18 16:48:53 +00:00
Simon Dardis fd8c65e868 Reland "[mips] Fix the target specific instruction verifier"
Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.

This version has the correct commit message.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D41183

llvm-svn: 320991
2017-12-18 15:56:40 +00:00
Sean Fertile 5fb624a3b8 [Memcpy Loop Lowering] Remove the fixed int8 lowering.
Switch over to the lowering that uses target supplied operand types.

Differential Revision: https://reviews.llvm.org/D41201

llvm-svn: 320989
2017-12-18 15:31:14 +00:00
Sander de Smalen 190979189a [TableGen][AsmMatcherEmitter] Only choose specific diagnostic for enabled instruction
Summary:
When emitting a diagnostic for an invalid operand, a specific diagnostic
should only be reported when the instruction being matched is actually
enabled by the feature flags.

Patch [3/4] in a series to add parsing of predicates and properly parse SVE 
ZIP1/ZIP2 instructions. This patch fixes bogus diagnostic messages for when
the SVE feature is not specified.

Reviewers: rengolin, craig.topper, olista01, sdardis, stoklund

Reviewed By: olista01, sdardis

Subscribers: fhahn, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D40362

llvm-svn: 320986
2017-12-18 14:34:24 +00:00
Max Kazantsev 1acab00229 [LVI] Support for ashr in LVI
Enhance LVI to analyze the ‘ashr’ binary operation. This leverages the infrastructure in ConstantRange for the ashr operation.
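
As a toy illustration of the range arithmetic (not the ConstantRange API):

#include <cassert>
#include <cstdint>

struct SignedRange { int64_t Lo, Hi; };  // non-wrapping signed range [Lo, Hi]

// For a constant shift amount, arithmetic shift right is monotonic, so the
// result range can be computed by shifting both endpoints.
SignedRange ashrRange(SignedRange R, unsigned S) {
  assert(S < 64 && R.Lo <= R.Hi);
  return {R.Lo >> S, R.Hi >> S};  // >> on a signed value acts as an arithmetic shift here
}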

Patch by Surya Kumari Jangala!

Differential Revision: https://reviews.llvm.org/D40886

llvm-svn: 320983
2017-12-18 14:23:30 +00:00
Simon Dardis f70af977af Revert "[mips] Fix the target specific instruction verifier"
This reverts commit r320974. The commit message lacked the Differential Revision: line.

llvm-svn: 320975
2017-12-18 12:30:34 +00:00
Simon Dardis c3c0d4590b [mips] Fix the target specific instruction verifier
Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.

Reviewers: atanasyan

https://reviews.llvm.org/D41183

llvm-svn: 320974
2017-12-18 12:24:17 +00:00
Sander de Smalen fce0c1c45b [AArch64][SVE] Asm: Add ZIP1/ZIP2 instructions (predicate/data vectors)
Summary: Patch [2/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions.

Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D40361

llvm-svn: 320973
2017-12-18 11:29:59 +00:00
Sander de Smalen ce1e0975f4 [AArch64][SVE] Asm: Add SVE predicate register definitions and parsing support
Summary: Patch [1/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions.

Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro, echristo, efriedma

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D40360

llvm-svn: 320970
2017-12-18 11:26:34 +00:00
Tim Northover 9097a07e4e AArch64: work around how Cyclone handles "movi.2d vD, #0".
For Cyclone, the instruction "movi.2d vD, #0" is executed incorrectly in some rare
circumstances. Work around the issue conservatively by avoiding the instruction entirely.

This patch changes CodeGen so that problematic instructions are never
generated, and the AsmParser so that an equivalent instruction is used (with a
warning).

llvm-svn: 320965
2017-12-18 10:36:00 +00:00
Igor Laevsky 7bd3fb15e1 [TargetLibraryInfo] Discard library functions with incorrectly sized integers
Differential Revision: https://reviews.llvm.org/D41184

llvm-svn: 320964
2017-12-18 10:31:58 +00:00
Sam Parker fd967f2f7a [ARM] Adjust test checks
Correct the CHECK-LABELS of a couple of dag combine tests.

llvm-svn: 320963
2017-12-18 10:08:03 +00:00
Sam Parker 00804efd72 [DAGCombine] Move AND nodes to multiple load leaves
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes, and all but one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off, meaning that the 'and' isn't removed, but the
load(s) are still narrowed.
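
A scalar model of the narrowing (illustration only; the combine itself operates on DAG nodes, and a little-endian target is assumed):

#include <cstdint>
#include <cstring>

uint32_t wideLoadThenMask(const unsigned char *P) {
  uint32_t V;
  std::memcpy(&V, P, sizeof(V));  // wide 32-bit load
  return V & 0xFF;                // AND keeps only the low byte
}

uint32_t narrowLoad(const unsigned char *P) {
  return P[0];                    // equivalent narrow byte load on little-endian
}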

Differential Revision: https://reviews.llvm.org/D41177

llvm-svn: 320962
2017-12-18 10:04:27 +00:00
Craig Topper 7034d401f8 [X86] Use mattr instead of mcpu in some of the cost model tests.
Based on the names of the check lines, features seem more appropriate than cpu.

Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes.

llvm-svn: 320959
2017-12-18 07:21:58 +00:00
Hiroshi Inoue c6faf15459 [SROA] Disable non-whole-alloca splits by default
This patch introduces a switch to control splitting of non-whole-alloca slices, defaulting to off.
The switch will default to on again after fixing an issue reported in PR35657.

llvm-svn: 320958
2017-12-18 06:47:37 +00:00
Serguei Katkov b0b67a8d38 [CGP] Fix the handling select inst in complex addressing mode
When we put the value in the select placeholder we must pass
it through the simplification tracker, because the value might
already have been simplified and erased.

This is a fix for PR35658.

Reviewers: john.brawn, uabelho
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41251

llvm-svn: 320956
2017-12-18 04:25:07 +00:00
Sanjay Patel 9da049fa8a [x86] add tests for finite libcall lowering (PR35672); NFC
llvm-svn: 320955
2017-12-18 00:38:45 +00:00
Bjorn Steinbrink 3603de2fa2 Re-commit "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()"
llvm-clang-x86_64-expensive-checks-win is still broken, so the failure
seems unrelated.

llvm-svn: 320953
2017-12-17 21:20:16 +00:00
Craig Topper 255a76d6d1 [X86] Add test cases that show cases where buildvector of extract and inserts should be turned into fmsubadd.
This is a follow up to the fmaddsub support added in r320950. Hopefully in the future we can fix lowering to handle this fmsubadd too.

llvm-svn: 320951
2017-12-17 18:31:36 +00:00
Craig Topper fd8d040820 [X86] Make the code that creates fmaddsub from build_vector of extracts and inserts functional and add tests.
Summary:
We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.

The missing coverage was noticed during the review of D40335.

Reviewers: RKSimon, zvi, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41133

llvm-svn: 320950
2017-12-17 18:23:45 +00:00
Simon Pilgrim 406d04a916 [X86] Regenerate truncated rotation tests + add missing 32-bit checks
llvm-svn: 320949
2017-12-17 18:20:42 +00:00
Bjorn Steinbrink 6f7bbf349f Revert "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()"
This reverts commit 217067d5179882de9deb60d2e866befea4c126e7.

Fails on llvm-clang-x86_64-expensive-checks-win

llvm-svn: 320945
2017-12-17 15:16:58 +00:00
Bjorn Steinbrink c27f81b92b Properly handle byval arguments in getPointerDereferenceableBytes()
Summary:
For byval arguments, the number of dereferenceable bytes is equal to
the size of the pointee, not the pointer.

Reviewers: hfinkel, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41305

llvm-svn: 320939
2017-12-17 02:37:42 +00:00
Bjorn Steinbrink 5d86532467 Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()
Reviewers: hfinkel, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41288

llvm-svn: 320938
2017-12-17 01:54:25 +00:00
Craig Topper ee1e71e576 [X86] Use extract_vector_elt instead of X86ISD::VEXTRACT for isel of vXi1 extractions.
llvm-svn: 320937
2017-12-17 01:35:48 +00:00
Craig Topper c0c2d19e08 [X86] Canonicalize extract_vector_elt from vXi1 to always return MVT::i32.
This allows us to remove some isel patterns that allowed MVT::i8 result type.

llvm-svn: 320936
2017-12-17 01:35:47 +00:00
Simon Pilgrim 4c9e8215e9 [X86][AVX] lowerVectorShuffleAsBroadcast - aggressively peek through BITCASTs
Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, peek through BITCASTs when looking for the broadcast source.

Fixes PR32007

llvm-svn: 320933
2017-12-16 23:32:18 +00:00
Simon Pilgrim f3b6da00f5 [X86][AVX] Fix failed broadcast fold
Strip excess BITCASTs from EXTRACT_SUBVECTOR input

llvm-svn: 320930
2017-12-16 22:57:17 +00:00
Sean Fertile 68d7f9da76 [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed.
If the loop operand type is int8 then there will be no residual loop for the
unknown-size expansion. Don't create the residual-size and bytes-copied values
when they are not needed.

llvm-svn: 320929
2017-12-16 22:41:39 +00:00
Craig Topper 1260a4e826 [X86] When using vpopcntdq for ctpop of v8i16 vectors, only promote to v8i32.
Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half.

llvm-svn: 320926
2017-12-16 19:31:36 +00:00
Simon Pilgrim 5f022d278b [InstCombine] Regenerate FMUL/FMA combine tests with update_test_checks.py
llvm-svn: 320922
2017-12-16 17:18:15 +00:00
Sanjay Patel 5a0cdac174 [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
   into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
   into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
   into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node 
   when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
   variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant.
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel 
   produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the 
   shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift 
   produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific 
   decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31 
  %add = add i32 %signbit, %x  
  %abs = xor i32 %signbit, %add 
  ret i32 %abs
}

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
  %add = add <4 x i32> %signbit, %x  
  %abs = xor <4 x i32> %signbit, %add 
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
> abs_shifty:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_cmpsubsel:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_shifty_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> abs_cmpsubsel_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
> 
> abs_cmpsubsel: 
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
>                                        
> abs_shifty_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> abs_cmpsubsel_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
> abs_shifty:  
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_cmpsubsel:
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_shifty_vec:   
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
> 
> abs_cmpsubsel_vec: 
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
>

Differential Revision: https://reviews.llvm.org/D40984

llvm-svn: 320921
2017-12-16 16:41:17 +00:00
Hal Finkel 92ea8acbcd Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory
This test depends on X86's TTI; move into the X86 subdirectory.

llvm-svn: 320914
2017-12-16 05:10:20 +00:00
Hal Finkel 5444f40965 [LV] Extend InstWidening with CM_Widen_Reverse
Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.

Fixes PR34965.

Patch by Diego Caballero, thanks!

Differential Revision: https://reviews.llvm.org/D40742

llvm-svn: 320913
2017-12-16 02:55:24 +00:00
Hal Finkel e86a8b79b5 [PowerPC, AsmParser] Enable the mnemonic spell corrector
r307148 added an assembly mnemonic spelling correction support and enabled it
on ARM. This enables that support on PowerPC as well.

Patch by Dmitry Venikov, thanks!

Differential Revision: https://reviews.llvm.org/D40552

llvm-svn: 320911
2017-12-16 02:42:18 +00:00
Craig Topper c08960597c [X86] Add 128 and 256-bit VPOPCNTDQ instructions. Adjust some tablegen classes LZCNT/POPCNT.
I think when this instruction was first published it was only for a Knights CPU, and thus the VLX version was missing.

llvm-svn: 320910
2017-12-16 02:40:28 +00:00
Vitaly Buka 12f9b8cf24 [LTO] Update tests for r320905
llvm-svn: 320909
2017-12-16 02:40:20 +00:00
Vitaly Buka fd563a0352 Remove trailing whitespace
llvm-svn: 320907
2017-12-16 02:12:35 +00:00
Vitaly Buka a5376f393e [LTO] Make processing of combined module more consistent
Summary:
1. Use stream 0 only for the combined module. Previously, if the combined module was not
processed, ThinLTO used the stream for its own output. However, small changes in the input
could trigger the combined module and shuffle outputs, making life harder for llvm::LTO.

2. Always process the combined module and write its output to stream 0. Processing an empty
combined module is cheap and allows llvm::LTO users to avoid implementing processing
which is already done in llvm::LTO.

Subscribers: mehdi_amini, inglorion, eraman, hiraditya

Differential Revision: https://reviews.llvm.org/D41267

llvm-svn: 320905
2017-12-16 02:10:00 +00:00
Teresa Johnson 160f4bb803 Add another missing -enable-import-metadata to test
r320895 modified a test so that it needs -enable-import-metadata which
is false by default for NDEBUG, found another place that needs this
added.

llvm-svn: 320903
2017-12-16 01:35:36 +00:00
Hal Finkel 2ff24731bb [SimplifyLibCalls] Inline calls to cabs when it's safe to do so
When unsafe algebra is allowed, calls to cabs(r) can be replaced by:

  sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))
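
The same identity written out in C++ for illustration (assumed example; the transform itself rewrites the libcall in IR):

#include <cmath>
#include <complex>

double cabsExpanded(std::complex<double> R) {
  double Re = R.real(), Im = R.imag();
  return std::sqrt(Re * Re + Im * Im);  // what cabs(R) computes, ignoring overflow concerns
}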

Patch by Paul Walker, thanks!

Differential Revision: https://reviews.llvm.org/D40069

llvm-svn: 320901
2017-12-16 01:26:25 +00:00
Teresa Johnson 4358a40345 Add -enable-import-metadata to test
r320895 modified a test so that it needs -enable-import-metadata which
is false by default for NDEBUG.

llvm-svn: 320899
2017-12-16 01:00:48 +00:00
Teresa Johnson 81bbf74265 [ThinLTO] Enable importing of aliases as copy of aliasee
Summary:
This implements a missing feature to allow importing of aliases, which
was previously disabled because alias cannot be available_externally.
We instead import an alias as a copy of its aliasee.

Some additional work was required in the IndexBitcodeWriter for the
distributed build case, to ensure that the aliasee has a value id
in the distributed index file (i.e. even when it is not being
imported directly).

This is a performance win in code that has many aliases, e.g. C++
applications that have many constructor and destructor aliases.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D40747

llvm-svn: 320895
2017-12-16 00:18:12 +00:00
Paul Robinson 6d0484f2b6 Revert "Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header.""
This reverts commit 0afef672f63f0e4e91938656bc73424a8c058bfc.
Still failing at runtime on bots.

llvm-svn: 320888
2017-12-15 23:21:52 +00:00
Paul Robinson 5c8f7d7de4 Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header."
Adds missing support for DW_FORM_data16.

Update of r320852, fixing the unittest to use a hand-coded struct
instead of std::array to guarantee data layout.

Differential Revision: https://reviews.llvm.org/D41090

llvm-svn: 320886
2017-12-15 22:57:17 +00:00
Krzysztof Parzyszek 266d6f03a1 [Hexagon] Handle concat_vectors of all allowed HVX types
llvm-svn: 320865
2017-12-15 21:23:12 +00:00
Jun Bum Lim 44c58d35c1 Re-commit : [LICM] Allow sinking when foldable in loop
This recommits r320823, which was reverted due to the test failure in sink-foldable.ll
and an unused variable. Added "REQUIRES: aarch64-registered-target" to the test
and removed the unused variable.

Original commit message:

  Continue trying to sink an instruction if its users in the loop are foldable.
  This will allow the instruction to be folded in the loop by decoupling it from
  the user outside of the loop.

  Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier

  Reviewed By: hfinkel

  Subscribers: javed.absar, bmakam, mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D37076

llvm-svn: 320858
2017-12-15 20:33:24 +00:00
Paul Robinson 67ca67d1b2 Revert "[DWARFv5] Dump an MD5 checksum in the line-table header."
Unit test fails on some bots.

llvm-svn: 320857
2017-12-15 20:29:25 +00:00
Krzysztof Parzyszek 29832a6c8b [Hexagon] Fix operand-swapping PatFrag for atomic stores
PatFrag now has the atomicity information stored as bit fields. They
need to be copied to the new PatFrag.

llvm-svn: 320855
2017-12-15 20:13:57 +00:00
Paul Robinson 72546fe87b [DWARFv5] Dump an MD5 checksum in the line-table header.
Adds missing support for DW_FORM_data16.

Differential Revision: https://reviews.llvm.org/D41090

llvm-svn: 320852
2017-12-15 19:52:34 +00:00
Craig Topper 3fb8386685 [SelectionDAG][X86] Fix insert_vector_elt lowering for v32i1/v64i1 with non-constant index
Summary:
Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1.

We also can't type legalize v64i1 insert_vector_elt correctly on KNL because the type is not byte addressable, as the legalize-through-memory path requires.

For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that.

For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them.

Reviewers: RKSimon, delena, spatel, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40942

llvm-svn: 320849
2017-12-15 19:35:22 +00:00
Sean Fertile 42b13343fd [Memcpy Loop Lowering] Insert loop BB in between the split BBs.
The original memcpy expansion inserted the loop basic block in between
the 2 new basic blocks created by splitting the original block the memcpy
call was in. This commit makes the new memcpy expansion do the same to keep the
layout of the IR matching between the old and new implementations.
layout of the IR matching between the old and new implementations.

Differential Revision: https://reviews.llvm.org/D41197

llvm-svn: 320848
2017-12-15 19:29:12 +00:00
Andrew V. Tischenko 22f0742dda Fix for bug PR35549 - Repeated schedule comments.
Differential Revision: https://reviews.llvm.org/D40960

llvm-svn: 320837
2017-12-15 18:13:05 +00:00
Jun Bum Lim 5efd4d8b5e Revert "Re-commit : [LICM] Allow sinking when foldable in loop"
This reverts commit r320833.

llvm-svn: 320836
2017-12-15 18:12:49 +00:00
Jun Bum Lim 83ccad6684 Re-commit : [LICM] Allow sinking when foldable in loop
This recommit r320823 after fixing a test failure.

 Original commit message:

    Continue trying to sink an instruction if its users in the loop are foldable.
    This will allow the instruction to be folded in the loop by decoupling it from
    the user outside of the loop.

    Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier

    Reviewed By: hfinkel

    Subscribers: javed.absar, bmakam, mcrosier, llvm-commits

    Differential Revision: https://reviews.llvm.org/D37076

llvm-svn: 320833
2017-12-15 17:58:59 +00:00
Michael Trent a1703b1fc2 Updated llvm-objdump to display local relocations in Mach-O binaries
Summary:
llvm-objdump's Mach-O parser was updated in r306037 to display external
relocations for MH_KEXT_BUNDLE file types. This change extends the Macho-O
parser to display local relocations for MH_PRELOAD files. When used with
the -macho option relocations will be displayed in a historical format.

All tests are passing for llvm, clang, and lld. llvm-objdump builds without
compiler warnings.

rdar://35778019

Reviewers: enderby

Reviewed By: enderby

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41199

llvm-svn: 320832
2017-12-15 17:57:40 +00:00
Craig Topper a16395008c [X86] Fix XSAVE64 and similar instructions to not be allowed by the assembler in 32-bit mode.
There was a top level "let Predicates =" in the .td file that was overriding the Requires on each instruction.

I've added an assert to the code emitter to catch more cases like this. I'm sure this isn't the only place where the right predicates aren't being applied. This assert already found that we don't block btq/btsq/btrq in 32-bit mode.

llvm-svn: 320830
2017-12-15 17:22:58 +00:00