Commit Graph

387882 Commits

Author SHA1 Message Date
Roman Lebedev d5494931f2 [NFCI][X86] Mark a few lately-added system instructions as such for Scheduling purposes 2021-05-09 01:07:07 +03:00
Fangrui Song 492173d42b [test] Fix tools/gold/X86/new-pm.ll after D101797 2021-05-08 13:41:36 -07:00
Krzysztof Parzyszek 561026936b [Hexagon] Propagate metadata in Hexagon Vector Combine 2021-05-08 14:35:55 -05:00
Andrea Di Biagio de1843e51a [llvm-mca][View] Update the Register File statistics.
Correctly track the number of moves eliminated in the
Register File statistics.
2021-05-08 19:43:16 +01:00
Greg McGary 5be8502271 [lld-macho] Explicitly undefine literal exported symbols
Symbols explicitly exported via command-line options `--exported_symbol SYM` and `--exported_symbols_list FILE` must be defined. Before this fix, lazy symbols defined in archives would be left to languish. We now force them to be included in the linked output.

Differential Revision: https://reviews.llvm.org/D102100
2021-05-08 11:37:00 -07:00
Andrea Di Biagio 9ceea66602 [MCA][RegisterFile] Refactor the move elimination logic to address PR50258.
This patch lifts the restriction on the number of read/write registers for a
move elimination candidate.  With this patch, move elimination candidates with
exactly two reads and two writes are treated like register swap operations for
the purpose of move elimination.

This patch currently doesn't affect any upstream model. However, it should help
unblock the progress on PR50258.
2021-05-08 18:10:35 +01:00
Nico Weber 7b6dd265ce [lld/mac] Copy some of the commit message of d5a70db193 into a comment 2021-05-08 13:03:17 -04:00
Louis Dionne 2054474640 [libc++] NFC: Refactor Lit annotations
Annotations for c++03 mode are useless, since we only run these tests
in C++11 and C++14.
2021-05-08 12:16:41 -04:00
Florian Hahn 2bf34c0a93 [VPlan] Add test for sink scalars and merging using VPlan.
Add a couple of tests with scalars that can be sunk to their predicated
users.

This pre-commits tests for D100258.
2021-05-08 16:47:48 +01:00
Simon Pilgrim ab5ee342b9 [GlobalISel] Ensure MachineIRBuilder::getDebugLoc() returns a const reference. NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-08 16:23:28 +01:00
Simon Pilgrim 4524d8b755 [X86] combineHorizOpWithShuffle - generalize HOP(SHUFFLE(X),SHUFFLE(Y)) -> SHUFFLE(HOP(X,Y)) fold.
For 128-bit types, generalize the fold to recognise duplicate operands in either shuffle.
2021-05-08 16:23:27 +01:00
Louis Dionne 74d096e558 [libc++] Move handling of the target triple to the DSL
This fixes a long standing issue where the triple is not always set
consistently in all configurations. This change also moves the
back-deployment Lit features to using the proper target triple
instead of using something ad-hoc.

This will be necessary for using from-scratch Lit configuration files
in both normal testing and back-deployment testing.

Differential Revision: https://reviews.llvm.org/D102012
2021-05-08 11:10:53 -04:00
Vinayaka Bandishti 9610a2d753 [MLIR] Add memref dialect dependency for affine fusion pass
For `AffineLoopFusion` pass, add `memref` dialect as a dependent
dialect. Since the fusion pass can create `memref::AllocOp`s, the
dialect must be registered in its dependent dialects.

The missing dependency was not discovered until now because the op
creation described above happens only when the input already has
`memref::AllocOp`s in it, and all dialects in the input are
automatically added to the context.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D102104
2021-05-08 20:12:33 +05:30
Uday Bondhugula 73df48158b [MLIR][NFC] Remove unused MLIRContext declaration
Remove unused MLIRContext declaration. NFC.

Differential Revision: https://reviews.llvm.org/D102103
2021-05-08 19:07:24 +05:30
Roman Lebedev 1acd9a1a29 Revert "[LICM] Hoist loads with invariant.group metadata"
This appears to miscompile google benchmark's GetCacheSizesFromKVFS()
when compiling with -fstrict-vtable-pointers.
Runnable reproducer: https://godbolt.org/z/f9ovKqTzb
The "f.fail()" crashes with a BUS error: it is compiled into testb,
and the address it is testing is nonsensical.

This reverts commit 4c89bcadf6.
2021-05-08 15:44:49 +03:00
Saurabh Jha 4e192edb2d Test commit to check commit access 2021-05-08 13:24:05 +01:00
Roman Lebedev b1c38207e9 [X86] Improve costmodel for scalar byte swaps
Currently we model i16 bswap as very high cost (`10`),
which doesn't seem right, with all others being at `1`.

Regardless of `MOVBE`, i16 reg-reg bswap is lowered into
(an extending move plus) rot-by-8:
https://godbolt.org/z/8jrq7fMTj
I think it should at worst have throughput of `1`:

Since i32/i64 already have cost of `1`,
`MOVBE` doesn't improve their costs any further.

BUT, `MOVBE` must have at least a single memory operand,
with the other being a register. Which means, if we have
a bswap of a load, iff the load has a single use,
we'll fold the bswap into the load.

Likewise, if we have a store of a bswap, iff the bswap
has a single use, we'll fold the bswap into the store.

So I think we should treat such a bswap as free,
unless of course we know that for the particular CPU
they are performing badly.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D101924
2021-05-08 15:17:35 +03:00
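The cost rule argued for in the bswap commit above can be sketched roughly as follows. This is only an illustration of the reasoning, not the actual X86 cost-model code: the function `bswap_cost` and all of its parameters are hypothetical names, and the only figures taken from the message are the cost of `1` for a register bswap and "free" for a bswap that folds into a `MOVBE` memory operation.

```python
def bswap_cost(bits, has_movbe=False, folds_into_memop=False, movbe_is_slow=False):
    """Hypothetical sketch of the proposed scalar bswap cost.

    folds_into_memop: True when the bswap operand is a single-use load,
    or when the bswap's only user is a store (the two cases in the message).
    movbe_is_slow: stands in for "we know that for the particular CPU
    they are performing badly".
    """
    if bits not in (16, 32, 64):
        raise ValueError("this sketch only covers i16, i32 and i64")
    if has_movbe and folds_into_memop and not movbe_is_slow:
        # The bswap becomes part of the MOVBE load/store, so treat it as free.
        return 0
    # Register-register bswap (i16 is a rotate by 8): cost 1 for every width.
    return 1
```

For example, `bswap_cost(16)` gives `1` rather than the old `10`, and `bswap_cost(32, has_movbe=True, folds_into_memop=True)` gives `0`.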
Louis Dionne c42007e266 [libc++] Use Xcode's CMake if it's present
This resolves issues when the CMake in use on the host is too old to
configure libc++ properly, but Xcode has a sufficiently recent version.
It is technically possible for the reverse issue to happen, where the
Xcode version would be too old and the user-installed version would be
better, however in the context of our build bots, we use AppleClang on
Apple platforms, and the CMake shipped with Xcode should work with the
AppleClang shipped alongside that Xcode.

Differential Revision: https://reviews.llvm.org/D102083
2021-05-08 07:40:35 -04:00
Qiu Chaofan 2db4979c0f [VectorCombine] Simplify to scalar store if only one element updated
This patch simplifies load-insertelt-store pattern into
getelementptr-store.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98240
2021-05-08 18:14:51 +08:00
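Why the load-insertelt-store pattern in the VectorCombine commit above can be replaced by a single scalar store can be illustrated with a small model. This is a sketch only: memory is modelled as a plain Python list, the function names are made up for illustration, and it assumes the element index is known to be in bounds.

```python
def load_insert_store(memory, base, length, index, value):
    """Original pattern: load the whole vector, update one lane, store it back."""
    vec = memory[base:base + length]      # load <length x T>
    vec[index] = value                    # insertelement
    memory[base:base + length] = vec      # store <length x T>


def scalar_store(memory, base, index, value):
    """Simplified pattern: address one element and store just that scalar."""
    memory[base + index] = value          # getelementptr + scalar store
```

Both functions leave `memory` in the same state whenever `0 <= index < length`, which is the observation behind the simplification.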
Butygin e2a7764481 [mlir] Debug print pattern before and after matchAndRewrite call
Motivation: we have passes with a lot of rewrites, and when one of them segfaults or asserts, it is very hard to find exactly which pattern failed without debug info.

Differential Revision: https://reviews.llvm.org/D101443
2021-05-08 12:00:36 +03:00
Xiang1 Zhang d4bdeca576 [X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
2021-05-08 14:21:11 +08:00
Arthur Eubanks 72bd0116e3 Fix build after 34a8a437b 2021-05-07 23:18:44 -07:00
Xiang1 Zhang bebafe01a7 Revert "[X86] Support AMX fast register allocation"
This reverts commit 77e2e5e07d.
2021-05-08 13:43:32 +08:00
Xiang1 Zhang 77e2e5e07d [X86] Support AMX fast register allocation 2021-05-08 13:27:21 +08:00
Michael Liao 631da3b152 Replace a remaining CRLF with LF. NFC. 2021-05-08 01:09:15 -04:00
Arthur Eubanks 34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
RamNalamothu 223852d76f [DebugInfo] UnwindTable::create() should not add empty rows to CFI unwind table
UnwindTable::parseRows() may return successfully if the CFIProgram has either
no CFI instructions or only DW_CFA_nop instructions and the UnwindRow return
argument will be empty. But currently, the callers are not checking for this case
which is leading to incorrect dumps in the unwind tables in such cases i.e.

  CFA=unspecified

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D101892
2021-05-08 10:19:02 +05:30
River Riddle 53b946aa63 [mlir] Refactor the representation of function-like argument/result attributes.
The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in practice. The design requires building a string every time we look up the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries.

This revision restructures the design to instead have an ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup for individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a more sparse structure that still provides a lot of the wins in this revision.

This dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.

Differential Revision: https://reviews.llvm.org/D102035
2021-05-07 19:32:31 -07:00
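The two representations described in the commit above can be contrasted with a toy model. This is a sketch under assumptions, not MLIR code: attribute dictionaries are plain Python dicts, and the key `"arg_attrs"` is a hypothetical name for the single array-valued entry.

```python
def lookup_old(attrs, index):
    """Old layout: one entry per argument, keyed by a string built on every lookup."""
    return attrs.get(f"arg{index}", {})


def lookup_new(attrs, index):
    """New layout: one named entry holding a list with a dictionary per argument."""
    return attrs["arg_attrs"][index]


def collect_all_old(attrs, num_args):
    """Collecting everything needs num_args string builds and name lookups."""
    return [lookup_old(attrs, i) for i in range(num_args)]


def collect_all_new(attrs):
    """Collecting everything needs a single name lookup."""
    return list(attrs["arg_attrs"])
```

The trade-off mentioned in the message shows up directly: the old layout stores nothing for arguments without attributes, while the new one keeps an (empty) dictionary for every argument.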
Arthur Eubanks 44d14d5de6 [lit] Bump up the Windows process cap from 32 to 60
At 61 or over, I see messages like

  File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)

  ValueError: need at most 63 handles, got a sequence of length 64

60 seems to work for me.

If this causes issues for anybody else, feel free to revert.
2021-05-07 18:13:38 -07:00
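The idea of the cap in the lit commit above can be sketched as below. This is not lit's actual code: `usable_workers` and `WINDOWS_PROCESS_CAP` are hypothetical names, and the value 60 is simply the one the message reports as working.

```python
import sys

# Value reported as working in the commit message; 61 or more hit the
# WaitForMultipleObjects handle limit shown in the traceback.
WINDOWS_PROCESS_CAP = 60


def usable_workers(requested, platform=sys.platform):
    """Clamp the number of worker processes on Windows, leave it alone elsewhere."""
    if platform == "win32":
        return min(requested, WINDOWS_PROCESS_CAP)
    return requested
```

For example, `usable_workers(128, platform="win32")` yields 60, while the same request on `"linux"` is returned unchanged.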
River Riddle 5c84195b8c [mlir] Add hover support to mlir-lsp-server
This provides information when the user hovers over a part of the source .mlir file. This revision adds the following hover behavior:
* Operation:
  - Shows the generic form.
* Operation Result:
  - Shows the parent operation name, result number(s), and type(s).
* Block:
  - Shows the parent operation name, block number, predecessors, and successors.
* Block Argument:
  - Shows the parent operation name, parent block, argument number, and type.

Differential Revision: https://reviews.llvm.org/D101113
2021-05-07 18:09:01 -07:00
Arthur Eubanks ddff81f692 Revert "lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162"
This reverts commit d319005a37.

Causing messages like:

  File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 74
2021-05-07 18:00:11 -07:00
Arthur Eubanks d82bc9e81d [gn build] Manually port 5b158093e 2021-05-07 17:54:32 -07:00
thomasraoux 6aaf06f929 [mlir][vector] Fix warning
Previous change caused another warning in some build configurations:
"default label in switch which covers all enumeration values"
2021-05-07 17:12:47 -07:00
Amara Emerson 5b158093e2 [AArch64][GlobalISel] Create a new minimal combiner pass just for -O0.
We never bothered to have a separate set of combines for -O0 in the prelegalizer
before. This results in some minor performance hits for a mode where performance
isn't a concern (although not regressing code size significantly is still preferable).

This also removes the CSE option since we don't need it for -O0.

Through experiments, I've arrived at a set of combines that gets the most code
size improvement at -O0, while reducing the amount of time spent in the combiner
by around 35% give or take.

Differential Revision: https://reviews.llvm.org/D102038
2021-05-07 17:01:27 -07:00
Amara Emerson 808bc11d9e [GlobalISel] Don't form zero/sign extending loads for atomics.
For importing patterns, we only support matching G_LOAD, not G_ZEXTLOAD or
G_SEXTLOAD.

Differential Revision: https://reviews.llvm.org/D101932
2021-05-07 16:41:48 -07:00
Weston Carvalho 1f65f42dd3 Make `hasTypeLoc` matcher support more node types.
Differential Revision: https://reviews.llvm.org/D101572
2021-05-08 00:35:22 +01:00
Weston Carvalho 0ad494838b NFC: Move TypeList implementation up the file
This will make it possible for more code to use it.
2021-05-08 00:35:13 +01:00
Arthur Eubanks 6f7131002b [NewPM] Move analysis invalidation/clearing logging to instrumentation
We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102093
2021-05-07 15:25:31 -07:00
Jessica Paquette 1312852040 [AArch64][GlobalISel] Legalize narrow type G_CTPOPs
Using `clampScalar` here because we ought to mark s128 as custom eventually.

(Right now, it will just fall back.)

With this legalization, we get the same code as SDAG:
https://godbolt.org/z/TneoPKrKG

Differential Revision: https://reviews.llvm.org/D100908
2021-05-07 14:52:23 -07:00
Adrian Prantl c6ddf669dc Fix the module-enabled build by removing a redundant type definition. 2021-05-07 14:45:17 -07:00
Petr Hosek 167906c109 [BareMetal] Ensure that sysroot always comes after library paths
This addresses an issue introduced in D91559. We would invoke the
compiler with -Lpath/to/lib --sysroot=path/to/sysroot where both
locations contain libraries with the same name, but we expect the linker
to pick up the library in path/to/lib since that version is more
specialized. This was the case before D91559, where the sysroot path
would be ignored, but after that change the linker would pick up the
library from the sysroot, which resulted in unexpected behavior.

The sysroot path should always come after any user provided library
paths, followed by compiler runtime paths. We want libraries in user
provided library paths to always take precedence over sysroot libraries.
This matches the behavior of other toolchains used with other targets.

Differential Revision: https://reviews.llvm.org/D102049
2021-05-07 14:42:02 -07:00
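Why the order of search directories matters in the BareMetal commit above can be shown with a small model of first-match library lookup. This is a sketch, not the driver or linker code: `resolve_library` is a hypothetical helper, the directory contents are made up, and it assumes the linker takes the first directory that contains the requested library.

```python
def resolve_library(name, search_dirs, available):
    """Return the first directory in search_dirs that contains the library.

    available maps a directory to the set of library names found in it.
    """
    for directory in search_dirs:
        if name in available.get(directory, ()):
            return directory
    return None


# Both locations provide a library with the same name (hypothetical contents).
available = {
    "path/to/lib": {"libfoo.a"},
    "path/to/sysroot/lib": {"libfoo.a"},
}
```

With the user path listed first, `resolve_library("libfoo.a", ["path/to/lib", "path/to/sysroot/lib"], available)` picks `path/to/lib`; with the order reversed, the sysroot copy wins, which is the unexpected behavior the message describes.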
Nico Weber d5a70db193 [lld/mac] Write every weak symbol only once in the output
Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function to the output. With this
patch, it only writes one copy.

Reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB linked with ld64, which also does dead-stripping, which we
don't do yet), and makes linking it faster:

        N           Min           Max        Median           Avg        Stddev
    x  10     3.9957051     4.3496981     4.1411121      4.156837    0.10092097
    +  10      3.908154      4.169318     3.9712729     3.9846753   0.075773012
    Difference at 95.0% confidence
            -0.172162 +/- 0.083847
            -4.14165% +/- 2.01709%
            (Student's t, pooled s = 0.0892373)

Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" on the InputSection belonging to the weak symbol not put in
the symbol table. We then don't write InputSections that have this set, as long
as they are not referenced from other symbols. (This happens e.g. for object
files that don't set .subsections_via_symbols or that use .alt_entry.)

Some restrictions:
- not yet done for bitcode inputs
- no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) --
  Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs)
  (that is, catch block unwind information) and Personality Routines
  associated with weak functions are still not stripped. This is wasteful,
  but harmless.
- However, this does strip weaks from __unwind_info (which is needed for
  correctness and not just for size)
- This nopes out on InputSections that are referenced from more than
  one symbol (eg from .alt_entry) for now

Things that work based on symbols Just Work:
- map files (change in MapFile.cpp is no-op and not needed; I just
  found it a bit more explicit)
- exports

Things that work with inputSections need to explicitly check if
an inputSection is written (e.g. unwind info).

This patch is useful in itself, but it's likely also a useful foundation
for dead_strip.

I used to have a "canoncialRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.

Differential Revision: https://reviews.llvm.org/D102076
2021-05-07 17:11:40 -04:00
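The weak-symbol handling described in the commit above can be sketched as follows. This is a simplified illustration, not lld's implementation: the class and function names are hypothetical (only the flag name mirrors "canOmitFromOutput"), the first definition seen is assumed to win, and the restriction about sections referenced from other symbols is deliberately left out.

```python
class InputSection:
    """Minimal stand-in for a section that holds one weak definition."""

    def __init__(self, name):
        self.name = name
        self.can_omit_from_output = False


def merge_weak(symtab, symbol_name, section):
    """Keep the first weak definition; mark later duplicates as omittable."""
    if symbol_name not in symtab:
        symtab[symbol_name] = section
    else:
        # This copy lost the merge, so its section need not be written.
        section.can_omit_from_output = True


def sections_to_write(sections):
    """Writer side: skip every section flagged during symbol merging."""
    return [s for s in sections if not s.can_omit_from_output]
```

Merging the same inline function from three object files this way leaves one section to write instead of three, which is where the size reduction reported in the message comes from.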
thomasraoux b90b66bcbe [mlir] Missed clang-format 2021-05-07 13:57:34 -07:00
thomasraoux d0453a8933 [mlir][vector] Extend pattern to trim lead unit dimension to Splat Op
Differential Revision: https://reviews.llvm.org/D102091
2021-05-07 13:54:41 -07:00
Petr Hosek f97ada27aa Revert "[BareMetal] Ensure that sysroot always comes after library paths"
This reverts commit 6b00b34b8a.
2021-05-07 13:38:04 -07:00
Florian Hahn 75b9997760 [LV] Remove reference of PHI from comment, they are not recorded (NFC).
The comment incorrectly states that the PHI is recorded. That's not
accurate, only the recipe for the incoming value is recorded.

Suggested post-commit for 4ba8720f88.
2021-05-07 21:34:23 +01:00
Andrea Di Biagio 3822ac909e [MCA][RegisterFile] Fix register class check for move elimination (PR50265)
The register file should always check if the destination register is from a
register class that allows move elimination.

Before this change, the check on the register class was only performed in a few
very specific cases. However, it should have always been performed.
This patch fixes the issue.

Note that none of the upstream scheduling models is currently affected by this
bug, so there is no test for it. The issue was found by Roman while working on
the znver3 model. I was able to reproduce the issue locally by tweaking the
btver2 model. I then verified that this patch fixes the issue.
2021-05-07 21:30:25 +01:00
Olivier Goffart c4adc49a1c [SEH] Fix regression with SEH in noexcept functions
Commit 5baea05601 set the CurCodeDecl
because it was needed to pass the assert in CodeGenFunction::EmitLValueForLambdaField.
But this was not the right thing to do, as CodeGenFunction::FinishFunction passes it to EmitEndEHSpec,
which causes corruption of the EHStack.

Revert the part of the commit that changes the CurCodeDecl, and instead
adjust the assert to check for a null CurCodeDecl.

Differential Revision: https://reviews.llvm.org/D102027
2021-05-07 13:27:59 -07:00
Florian Hahn 337d765282 [LV] Assert if trying to sink replicate region into another region (NFC)
Currently sinking a replicate region into another replicate region is
not supported. Add an assert, to make the problem more obvious, should
it occur.

Discussed post-commit for ccebf7a109.
2021-05-07 21:25:35 +01:00
Florian Hahn 01c26d4e04 [LV] Rename Region to TargetRegion, similar to SinkRegion (NFC).
Adjust the name to make it clearer this is the region containing the
target recipe, similar to SinkRegion below.

Suggested post-commit for ccebf7a109.
2021-05-07 21:25:35 +01:00