Commit Graph

353732 Commits

Author SHA1 Message Date
Raphael Isemann 7283ec0170 [lldb] Fix RecordDecl match string in module-ownership.mm to get the test running again
The relevant output FileCheck is scanning in this test is as follows:

    CXXRecordDecl 0x7f96cf8239c8 <<invalid sloc>> <invalid sloc> imported in A.B <undeserialized declarations> struct definition
    <<DefinitionData boilerplate>>
    `-FieldDecl 0x7f96cf823b90 <<invalid sloc>> <invalid sloc> imported in A.B anon_field_b 'int'
    (anonymous struct)
    CXXRecordDecl 0x7f96cf823be8 <<invalid sloc>> <invalid sloc> imported in A.B struct

Before 710fa2c4ee this test was passing by
accident as it had a -DAG suffix in the checks changed by this patch,
causing FileCheck to first match the last line of the output above
(instead of the first one), and then finding the FieldDecl above.
When I removed the -DAG suffix, FileCheck actually enforced the ordering
and started failing as the FieldDecl comes before the CXXRecordDecl match
we get.

This patch fixes the CXXRecordDecl check to find the first line of the output
above which caused FileCheck to also find the FieldDecl that follows. Also
gives the FieldDecl a more unique name to make name collisions less likely.
2020-05-08 15:05:19 +02:00
Benjamin Kramer f936457f80 Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding.""
This reverts commit ae45b4dbe7. It
causes miscompilations, test case on the mailing list.
2020-05-08 14:49:10 +02:00
Raphael Isemann 13a1b3c1e6 [lldb] Prevent objc-root-class warning when compiling module-ownership.mm test
This test was generating the following false-positive warning when being compiled:
 warning: class 'SomeClass' defined without specifying a base class [-Wobjc-root-class]
2020-05-08 14:41:01 +02:00
Adam Czachorowski 8e7bb37dfb [clangd] Fix crash in AddUsing tweak due to non-identifier DeclName
Patch by Adam Czachorowski!

Reviewers: hokein

Reviewed By: hokein

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79582
2020-05-08 14:26:21 +02:00
Raphael Isemann 710fa2c4ee [lldb] Make module-ownership.mm test more robust against AST node ordering
The current test is checking both the anonymous structs and the template
specializations in one FileCheck run, but the anonymous struct line can
partially match the AST dump of a template specialization, causing that
FileCheck won't match that same line later against the template specialization
check and incorrectly fails on that check. This only happens when the
template specialization node somehow ends up before the anonymous struct node.

This patch just puts the checks for the anonymous structs in their own FileCheck
run to prevent them from partially matching any other record decl.

Fixes rdar://62997926
2020-05-08 13:44:13 +02:00
Simon Pilgrim 5f9f37c42a [X86][AVX] Don't let X86ISD::BROADCAST peek through bitcasts to illegal types.
This was an existing bug exposed by the more aggressive X86ISD::BROADCAST generation by rG8817334ce3c7

Original test case thanks to @mstorsjo
2020-05-08 12:30:50 +01:00
Simon Pilgrim a0da4466d8 RemarkStringTable.h - reduce StringRef/Remark includes to forward declarations. NFC
Move StringRef.h include down to RemarkStringTable.cpp and remove some unused includes there as well.
2020-05-08 12:30:49 +01:00
Calixte Denizet 0da37bedc2 [compiler-rt] Reduce the number of threads in gcov test to avoid failure
Summary:
Patch in D78477 introduced a new test for gcov and this test is failing on arm:
 - http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-full-sh/builds/4752/steps/ninja%20check%202/logs/stdio
  - http://lab.llvm.org:8011/builders/clang-cmake-armv7-full/builds/10501/steps/ninja%20check%202/logs/stdio
So try to fix it in reducing the number of threads.

Reviewers: marco-c

Reviewed By: marco-c

Subscribers: dberris, kristof.beyls, #sanitizers, serge-sans-paille, sylvestre.ledru

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D79621
2020-05-08 12:48:07 +02:00
Simon Pilgrim 8fd9af4518 Remark.h - reduce ArrayRef.h include to SmallVector.h. NFC.
We only need to include SmallVector.h in Remark.h, and then the more bulky ArrayRef.h in Remark.cpp.
2020-05-08 11:10:28 +01:00
Simon Pilgrim ffd9cfa740 AArch6/ARMTargetParser.h - move Triple.h dependency down to cpp file. NFC.
Reduce Triple.h include to a forward declaration in the header. Only the implementations in the cpp files need the actual Triple class definition.
2020-05-08 11:10:28 +01:00
Frederik Gossen 5d5f61fc89 [MLIR] Add complex addition and substraction to the standard dialect
Complex addition and substraction are the first two binary operations on complex
numbers.
Remaining operations will follow the same pattern.

Differential Revision: https://reviews.llvm.org/D79479
2020-05-08 09:54:18 +00:00
Kirill Bobyrev 9c198b550e
[clangd] NFC: Use deprecated grpc++ headers for compatibility
Summary:
Ubuntu 18.04 and older versions do not provide latest gRCP packages and the
ones that are in the repository use deprecated headers. Use these headers to
make builds possible.

https://packages.ubuntu.com/bionic/amd64/libgrpc++-dev/filelist

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79487
2020-05-08 11:04:52 +02:00
Igor Kudrin 6ab09e7177 Fix a failing test.
Differential Revision: https://reviews.llvm.org/D79501
2020-05-08 15:36:01 +07:00
Nikita Popov 5a2265647e Reapply [InstSimplify] Remove known bits constant folding
No changes relative to last time, but after a mitigation for
an AMDGPU regression landed.

---

If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.

On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):

    instsimplify.NumKnownBits: 216
    instsimplify.NumKnownBitsComputed: 13828375
    valuetracking.NumKnownBitsComputed: 45860806

Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)

There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.

Differential Revision: https://reviews.llvm.org/D79294
2020-05-08 10:24:53 +02:00
Igor Kudrin 989ae9e848 [DebugInfo] Fix handling DW_OP_call_ref in DWARF64 units.
DW_OP_call_ref is the only operation that has an operand which depends
on the DWARF format. The patch fixes handling that operation in DWARF64
units.

Differential Revision: https://reviews.llvm.org/D79501
2020-05-08 15:14:42 +07:00
Igor Kudrin 050c9dd43a [DebugInfo] Fix printing values of forms which depend on the DWARF format.
The values are 8 bytes long in DWARF64, so they should not be truncated
to uint32_t on dumping.

Differential Revision: https://reviews.llvm.org/D79093
2020-05-08 15:14:41 +07:00
Nikita Popov 5fa87ec004 [AMDGPU] Try to determine sign bit during div/rem expansion
This is preparation for D79294, which removes an expensive
InstSimplify optimization, on the assumption that it will be
picked up by InstCombine instead. Of course, this does not hold
up if a backend performs non-trivial IR expansions without running
a canonicalization pipeline afterwards, which turned up as an
issue in the context of AMDGPU div/rem expansion.

This patch mitigates the issue by explicitly performing a known
bits calculation where it matters. No test changes, as those would
only be visible after the other patch lands.

Differential Revision: https://reviews.llvm.org/D79596
2020-05-08 10:11:26 +02:00
Marcel Koester 568787f81e [mlir] Updated SideEffect interface definitions to use tablegen Resource objects.
The SideEffect interface definitions currently use string expressions to
reference custom resource objects. This CL introduces Resource objects in
tablegen definitions to simplify linking of resource reference to resource
objects.

Differential Revision: https://reviews.llvm.org/D78917
2020-05-08 09:55:08 +02:00
Craig Topper 5e74cf2999 [X86] Add v32i8 and v64i8 tests to vec_smulo.ll and vec_umulo.ll. NFC
I was look at our vXi8 handling in LowerMULH and noticed that
vXi8 mulo uses but we don't test all types.
2020-05-07 22:17:25 -07:00
sameeran joshi 332e6aea37 [flang]Semantics for SELECT RANK.
Summary:
    Initially on github I worked on semantic checks.Then I tried some compile-time
    test of the rank value, they were failing as there were no symbols
    generated for them inside SELECT RANK's scope.So I went further to
    add new symbol in each scope, also added the respective 'rank: '
    field for a symbol when we dump the symboltable. I added a field to
    keep track of the rank in AssocEntityDetails class.This caused shape
    analysis framework to become inconsistent. So shape analysis framework
    was updated to handle this new representation.

	 *   I added more tests for above changes.

	 *   On phabricator I addressed some minor changes.

	 *   Lastly I worked on review comments.

    Reviewers: klausler,sscalpone,DavidTruby,kiranchandramohan,tskeith,anchu-rajendran,kiranktp

    Reviewed By:klausler, DavidTruby, tskeith

    Subscribers:#flang-commits, #llvm-commits

    Tags: #flang, #llvm

    Differential Revision: https://reviews.llvm.org/D78623
2020-05-08 08:52:31 +05:30
Weverything 4ae537c222 Fix false positive with -Wnon-c-typedef-for-linkage
Implicit methods for structs can confuse the warning, so exclude checking
the Decl's that are implicit. Implicit Decl's for lambdas still need to
be checked, so skipping all implicit Decl's won't work.

Differential Revision: https://reviews.llvm.org/D79548
2020-05-07 19:20:08 -07:00
aartbik 771d30c647 [llvm] [CodeGen] Fixed vector halving bug for masked store
Summary:
Note that this fix is very similar to what has already been
done for the masked load in https://reviews.llvm.org/D78608

Bugs:
https://bugs.llvm.org/show_bug.cgi?id=45563
https://bugs.llvm.org/show_bug.cgi?id=45833

Reviewers: craig.topper, nicolasvasilache, mehdi_amini

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79611
2020-05-07 19:01:40 -07:00
Jason Molenda 2ea7187ab9 Add a new lockdownd plist for launching posix processes
Similar to
com.apple.debugserver.plist & com.apple.debugserver.internal.plist
com.apple.debugserver.applist.plist & com.apple.debugserver.applist.internal.plist
add a variant of the posix plist.

<rdar://problem/62995567>
2020-05-07 18:53:51 -07:00
Xing GUO ce86a986c3 [Object] Remove unneeded check in ELFFile<ELFT>::dynamicEntries().
Check for `DynSecSize % sizeof(Elf_Dyn) != 0` is unneeded in this context.

1. If the .dynamic section is acquired from program headers, the .dynamic
section is "cut off" by

```
makeArrayRef(..., Phdr.p_filesz / sizeof(Elf_Dyn));
DynSeSize = Phdr.p_filesz;
```

2. If the .dynamic section is acquired from section headers, the .dynamic
section is checked in `getSectionContentsAsArray<Elf_Dyn>(&Sec)`.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D79560
2020-05-08 09:54:36 +08:00
Sriraman Tallam e8147ad822 Uniuqe Names for Internal Linkage Symbols.
This is a standalone patch and this would help Propeller do a better job of code
layout as it can accurately attribute the profiles to the right internal linkage
function.

This also helps SampledFDO/AutoFDO correctly associate sampled profiles to the
right internal function. Currently, if there is more than one internal symbol
foo, their profiles are aggregated by SampledFDO.

This patch adds a new clang option, -funique-internal-funcnames, to generate
unique names for functions with internal linkage. This patch appends the md5
hash of the module name to the function symbol as a best effort to generate a
unique name for symbols with internal linkage.

Differential Revision: https://reviews.llvm.org/D73307
2020-05-07 18:18:37 -07:00
Diego Caballero f5224d437e [LoopFusion] Remove unreachable blocks from DT and LI after fusion
This patch removes FC0.ExitBlock and FC1GuardBlock from DT and LI
after fusion of guarded loops. They become unreachable and LI
verification failed when they happened to be inside another loop.

Reviewed By: kbarton

Differential Revision: https://reviews.llvm.org/D78679
2020-05-07 16:44:40 -07:00
Nico Weber 29396059a4 Revert "[YAMLVFSWriter][Test][NFC] Add couple tests"
This reverts commit 7143d79254.
Breaks check-llvm on Windows, see e.g.
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/15919/steps/stage%201%20check/logs/stdio
2020-05-07 19:07:08 -04:00
James Y Knight 7af9d386da Correctly modify the CFG in IfConverter, and then remove the
CorrectExtraCFGEdges function.

The latter was a workaround for "Various pieces of code" leaving bogus
extra CFG edges in place. Where by "various" it meant only
IfConverter::MergeBlocks, which failed to clear all of the successors
of dead blocks it emptied out. This wouldn't matter a whole lot,
except that the dead blocks remained listed as predecessors of
still-useful blocks, inhibiting optimizations.

This fix slightly changed two thumb tests, because the correct CFG
successors allowed for the "diamond" if-conversion pattern to be
detected, when it could only use "simple" before.

Additionally, the removal of a now-redundant call to analyzeBranch
(with AllowModify=true) in BranchFolder::OptimizeFunction caused a
later check for an empty block in BranchFolder::OptimizeBlock to
fail. Correct this by moving the call to analyzeBranch in
OptimizeBlock higher.

Differential Revision: https://reviews.llvm.org/D79527
2020-05-07 18:17:07 -04:00
Jonas Devlieghere 13062d0fb7 [lldb/Test] Skip more tests that are not expected to work with passive replay
This skips some tests that pass with active replay (which doesn't check
the output) but fail with passive replay. Valid reasons for this
include:

 - Checking the output of the process (which doesn't run during replay),
 - Checking files that cannot be captured in the VFS (non-existing or
   unreadable files or files that are removed during test),

Unfortunately there's no good way to mark a test as supported for active
replay but unsupported for passive replay because the number and order
of API calls needs to be identical during capture and replay. I don't
think this is a huge loss however.
2020-05-07 15:16:52 -07:00
Johannes Doerfert edf0391491 [Attributor][FIX] Record dependences for assumed dead abstract attributes
In a recent patch we introduced a problem with abstract attributes that
were assumed dead at some point. Since `Attributor::updateAA` was
introduced in 95e0d28b71, we did not
remember the dependence on the liveness AA when an abstract attribute
was assumed dead and therefore not updated.

Explicit reproducer added in liveness.ll.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 509242 (345483/s)
temporary memory allocations: 98666 (66937/s)
peak heap memory consumption: 18.60MB
peak RSS (including heaptrack overhead): 103.29MB
total memory leaked: 269.10KB
```

After:
```
calls to allocation functions: 529332 (355494/s)
temporary memory allocations: 102107 (68574/s)
peak heap memory consumption: 19.40MB
peak RSS (including heaptrack overhead): 102.79MB
total memory leaked: 269.10KB
```

Difference:
```
calls to allocation functions: 20090 (1339333/s)
temporary memory allocations: 3441 (229400/s)
peak heap memory consumption: 801.45KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
2020-05-07 17:00:50 -05:00
Johannes Doerfert 675334daef [Attributor] Mark dependence as optional 2020-05-07 17:00:50 -05:00
zoecarver 6d2a66b10d [libc++] ECMAScript IdentityEscape is ambiguous (2584)
This patch fixes [[ https://cplusplus.github.io/LWG/issue2584 | 2584 ]]. Now the following works:
    const std::regex r1("\\z");
    assert(std::regex_match("z", r1));

Differential Revision: https://reviews.llvm.org/D66610
2020-05-07 14:26:25 -07:00
Ed Maste 21e5e1724b getMainExecutable: Fix hand-rolled AT_EXECPATH for older FreeBSD
Once we hit AT_NULL, we need to bail out of the loop; not just the
enclosing switch.  This fixes basic usage (e.g. `cc --version`) when
AT_EXECPATH isn't present on older branches (e.g. under
emu-user-static, at the moment), where we would previously run off
the end of ::environ.

Patch By: kevans

Reviewed By: arichardson

Differential Revision:	https://reviews.llvm.org/D79239
2020-05-07 17:05:17 -04:00
zoecarver df73e36dc6 [libcxx] [NFC] fpos Requirements (p0759r1).
Implements p0759r1. Test-only change. Adds explicit test for table 106 and type checking.

Differential Review: https://reviews.llvm.org/D60491
2020-05-07 14:02:42 -07:00
mydeveloperday 5a4ddbd69d [clang-format] [PR45639] clang-format splits up the brackets of C++17 attribute [[ ]] when used with the first parameter
Summary:
https://bugs.llvm.org/show_bug.cgi?id=45639

clang-format incorrectly splits the `[[` in a long argument list

```
void SomeLongClassName::ALongMethodNameInThatClass([[maybe_unused]] const shared_ptr<ALongTypeName>& argumentNameForThat
LongType) {

}
```

becomes

```
void SomeLongClassName::ALongMethodNameInThatClass([
    [maybe_unused]] const shared_ptr<ALongTypeName> &argumentNameForThatLongType) {

}
```

leaving one `[` on the previous line

For a function with just 1 very long argument, clang-format chooses to split between the `[[`,

This revision prevents the slip between the two `[` and the second `[`

Reviewed By: krasimir

Subscribers: cfe-commits

Tags: #clang, #clang-format

Differential Revision: https://reviews.llvm.org/D79401
2020-05-07 22:00:04 +01:00
Alina Sbirlea 6227f021ad [SimpleLoopUnswitch] Update DefaultExit condition to check unreachable is not empty.
Summary:
Update the check for the default exit block to not only check that the
terminator is not unreachable, but also check that unreachable block has
*only* the unreachable instruction.

Reviewers: chandlerc

Subscribers: hiraditya, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78277
2020-05-07 13:48:30 -07:00
Marcel Hlopko c9e6519d15 Remove unused _LIBCPP_RAW_ITERATORS
Summary: This change removes seemingly unused _LIBCPP_RAW_ITERATORS.

Reviewers: #libc, EricWF, ldionne

Reviewed By: #libc, EricWF, ldionne

Subscribers: dexonsmith, ldionne, gribozavr2, libcxx-commits

Tags: #libc

Differential Revision: https://reviews.llvm.org/D79323
2020-05-07 22:26:15 +02:00
Erich Keane f9eaa6934e Ensure aux-target specific builtins get validated.
I discovered that when using an aux-target builtin, it was recognized as
a builtin but never checked. This patch checks for an aux-target builtin
and instead validates it against the correct target.

It does this by extracting the checking code for Target-specific
builtins into its own function, then calls with either targetInfo or
AuxTargetInfo.
2020-05-07 13:22:10 -07:00
Sanjay Patel 5b48f7d2fc [VectorCombine] adjust test to make intent clearer; NFC
Create a non-zero result to show that the other lane is computed correctly.
2020-05-07 16:21:17 -04:00
Huihui Zhang e8ea1eb4c1 [NFC] Adjust test check lines for D78267.
This wasn't identified through buildbot before.
2020-05-07 13:20:15 -07:00
Evgenii Stepanov b4aa71e1bd Allow -fsanitize-minimal-runtime with memtag sanitizer.
Summary:
MemTag does not have any runtime at the moment, it's strictly code
instrumentation.

Reviewers: pcc

Subscribers: cryptoad, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79522
2020-05-07 13:07:46 -07:00
Huihui Zhang 1ec0cc0f02 [InstCombine][SVE] Fix visitExtractElementInst for scalable type.
Summary:
This patch fix the following issues with visitExtractElementInst:

      1. Restrict VectorUtils::findScalarElement to fixed-length vector.
         For scalable type, the number of elements in shuffle mask is
         unknown at compile-time.
      2. Fix out-of-range calculation for fixed-length vector.
      3. Skip scalable type when analysis rely on fixed number of elements.
      4. Add unit tests to check functionality of extractelement for scalable type.

Reviewers: sdesmalen, efriedma, spatel, nikic

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78267
2020-05-07 13:03:52 -07:00
Nico Weber d03838343f Make -Wnonportable-include-path ignore drive case on Windows.
See PR45812 for motivation.

No explicit test since I couldn't figure out how to get the
current disk drive in lower case into a form in lit where I could
mkdir it and cd to it. But the change does have test coverage in
that I can remove the case normalization in lit, and tests failed
on several bots (and for me locally if in a pwd with a lower-case
drive) without that normalization prior to this change.

Differential Revision: https://reviews.llvm.org/D79531
2020-05-07 15:54:09 -04:00
Erich Keane ed86058b53 Add static assert to ID Table to make sure aux targets work right.
I discovered that the limit on possible builtins managed by this
ObjCOrBuiltin variable is too low when combining large targets, since
aux-targets are appended to the targets list. A runtime assert exists
for this, however this patch creates a static-assert as well.

The logic for said static-assert is to make sure we have the room for
the aux-target and target to both be the largest list, which makes sure
we have room for all possible combinations.

I also incremented the number of bits by 1, since I discovered this
currently broken.  The current bit-count was 36, so this doesn't
increase any size.
2020-05-07 12:49:46 -07:00
Huihui Zhang 08c9c13749 [InstCombine][SVE] Fix visitInsertElementInst for scalable type.
Summary:
This patch fixes the following issues in visitInsertElementInst:

      1. Bail out for scalable type when analysis requires fixed size number of vector elements.
      2. Use cast<FixedVectorType> to get vector number of elements. This ensure assertion
          on scalable vector type.
      3. For scalable type, avoid folding a chain of insertelement into splat:
            insertelt(insertelt(insertelt(insertelt X, %k, 0), %k, 1), %k, 2) ...
              ->
            shufflevector(insertelt(X, %k, 0), undef, zero)
          The length of scalable vector is unknown at compile-time, therefore we don't know if
          given insertelement sequence is valid for splat.

Reviewers: sdesmalen, efriedma, spatel, nikic

Reviewed By: sdesmalen, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78895
2020-05-07 12:44:52 -07:00
Jacques Pienaar 5eae715a31 [mlir] Add NamedAttrList
This is a wrapper around vector of NamedAttributes that keeps track of whether sorted and does some minimal effort to remain sorted (doing more, e.g., appending attributes in sorted order, could be done in follow up). It contains whether sorted and if a DictionaryAttr is queried, it caches the returned DictionaryAttr along with whether sorted.

Change MutableDictionaryAttr to always return a non-null Attribute even when empty (reserve null cases for errors). To this end change the getter to take a context as input so that the empty DictionaryAttr could be queried. Also create one instance of the empty dictionary attribute that could be reused without needing to lock context etc.

Update infer type op interface to use DictionaryAttr and use NamedAttrList to avoid incurring multiple conversion costs.

Fix bug in sorting helper function.

Differential Revision: https://reviews.llvm.org/D79463
2020-05-07 12:33:36 -07:00
Sanjay Patel 5d0f2fdfa5 [VectorCombine] add tests with undefs; NFC
Goes with D79452.
2020-05-07 15:28:26 -04:00
Matt Arsenault 6f17b3e3a7 AMDGPU: Fix broken tests for HSA metadata
These were testing byval private kernel arguments, which doesn't make
any sense and has never been used. There didn't seem to be any tests
for real value struct arguments, which are.
2020-05-07 15:27:12 -04:00
Sanjay Patel 02051c7f3a [SLP] add another bailout for load-combine patterns (2nd try)
The original patch (rG86dfbc676ebe) exposed an existing bug:
we could wrongly cast a constant expression to BinaryOperator
because the pattern matching allows that. This adds a check
for that case, and there's a reduced test case to verify no
crashing.

Original commit message:

This builds on the or-reduction bailout that was added with D67841.
We still do not have IR-level load combining, although that could
be a target-specific enhancement for -vector-combiner.

The heuristic is narrowly defined to catch the motivating case from
PR39538:
https://bugs.llvm.org/show_bug.cgi?id=39538
...while preserving existing functionality.

That is, there's an unmodified test of pure load/zext/store that is
not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll.
That's the reason for the logic difference to require the 'or'
instructions. The chances that vectorization would actually help a
memory-bound sequence like that seem small, but it looks nicer with:

  vpmovzxwd     (%rsi), %xmm0
  vmovdqu       %xmm0, (%rdi)

rather than:

  movzwl        (%rsi), %eax
  movl  %eax, (%rdi)
  ...

In the motivating test, we avoid creating a vector mess that is
unrecoverable in the backend, and SDAG forms the expected bswap
instructions after load combining:

  movzbl (%rdi), %eax
  vmovd %eax, %xmm0
  movzbl 1(%rdi), %eax
  vmovd %eax, %xmm1
  movzbl 2(%rdi), %eax
  vpinsrb $4, 4(%rdi), %xmm0, %xmm0
  vpinsrb $8, 8(%rdi), %xmm0, %xmm0
  vpinsrb $12, 12(%rdi), %xmm0, %xmm0
  vmovd %eax, %xmm2
  movzbl 3(%rdi), %eax
  vpinsrb $1, 5(%rdi), %xmm1, %xmm1
  vpinsrb $2, 9(%rdi), %xmm1, %xmm1
  vpinsrb $3, 13(%rdi), %xmm1, %xmm1
  vpslld $24, %xmm0, %xmm0
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpslld $16, %xmm1, %xmm1
  vpor %xmm0, %xmm1, %xmm0
  vpinsrb $1, 6(%rdi), %xmm2, %xmm1
  vmovd %eax, %xmm2
  vpinsrb $2, 10(%rdi), %xmm1, %xmm1
  vpinsrb $3, 14(%rdi), %xmm1, %xmm1
  vpinsrb $1, 7(%rdi), %xmm2, %xmm2
  vpinsrb $2, 11(%rdi), %xmm2, %xmm2
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpinsrb $3, 15(%rdi), %xmm2, %xmm2
  vpslld $8, %xmm1, %xmm1
  vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
  vpor %xmm2, %xmm1, %xmm1
  vpor %xmm1, %xmm0, %xmm0
  vmovdqu %xmm0, (%rsi)

  movl  (%rdi), %eax
  movl  4(%rdi), %ecx
  movl  8(%rdi), %edx
  movbel        %eax, (%rsi)
  movbel        %ecx, 4(%rsi)
  movl  12(%rdi), %ecx
  movbel        %edx, 8(%rsi)
  movbel        %ecx, 12(%rsi)

Differential Revision: https://reviews.llvm.org/D78997
2020-05-07 15:04:37 -04:00
Sanjay Patel 62ea77ec02 [SLP] add test for constant expression fake of load-combine pattern; NFC
This is a reduction of the test that caused D78997 to be reverted.
2020-05-07 15:04:37 -04:00