Commit Graph

397494 Commits

Author SHA1 Message Date
Balazs Benics e5646b9254 Revert "Revert "[analyzer] Ignore IncompleteArrayTypes in getStaticSize() for FAMs""
This reverts commit df1f4e0cc6.

Now the test case explicitly specifies the target triple.
I decided to use x86_64 for that matter, to have a fixed
bitwidth for `size_t`.

Aside from that, relanding the original changes of:
https://reviews.llvm.org/D105184
2021-08-25 17:19:06 +02:00
Andrea Di Biagio 5f848b311f [X86][SchedModel] Fix latency the Hi register write of MULX (PR51495).
Before this patch, WriteIMulH reported a latency value which is correct for the
RR variant of MULX, but not for the RM variant.

This patch fixes the issue by introducing a new WriteIMulHLd, which is meant to
be used only by the RM variant of MULX.

Differential Revision: https://reviews.llvm.org/D108701
2021-08-25 16:12:09 +01:00
Vyacheslav Zakharin 2e192ab1f4 [CodeExtractor] Preserve topological order for the return blocks.
Differential Revision: https://reviews.llvm.org/D108673
2021-08-25 08:09:01 -07:00
Jon Chesterfield 85eedf7acb [openmp] Delete unused grid value field, missed from D108380 2021-08-25 15:54:25 +01:00
Thomas Johnson 8c3886b0ec [ARC] Add ADC (addition with carry) and SBC (subtraction with carry) instructions
Differential Revision: https://reviews.llvm.org/D108672
2021-08-25 07:46:15 -07:00
Balazs Benics df1f4e0cc6 Revert "[analyzer] Ignore IncompleteArrayTypes in getStaticSize() for FAMs"
This reverts commit 360ced3b8f.
2021-08-25 16:43:25 +02:00
Nicholas Guy 36fcf47fc8 [AArch64] Generate SMOV in place of sext(fmov(...))
A single smov instruction is capable of moving from a vector register while performing
the sign-extend during said move, rather than each step being performed by separate instructions.

Differential Revision: https://reviews.llvm.org/D108633
2021-08-25 15:23:22 +01:00
Balazs Benics 360ced3b8f [analyzer] Ignore IncompleteArrayTypes in getStaticSize() for FAMs
Currently only `ConstantArrayType` is considered for flexible array
members (FAMs) in `getStaticSize()`.
However, `IncompleteArrayType` also shows up in practice as FAMs.

This patch will ignore the `IncompleteArrayType` and return Unknown
for that case as well. This way it will be at least consistent with
the current behavior until we start modeling them accurately.

I'm expecting that this will resolve a bunch of false-positives
internally, caused by the `ArrayBoundV2`.

Reviewed By: ASDenysPetrov

Differential Revision: https://reviews.llvm.org/D105184
2021-08-25 16:12:17 +02:00
Jon Chesterfield ba0af885e7 [libomptarget][amdgpu][nfc] Make grid value access match devicertl 2021-08-25 15:11:19 +01:00
Jeremy Morse 0116ed0069 [DebugInfo][InstrRef] Don't use instr-ref for unoptimised functions
InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing
optimised code -- however, it's much slower when analysing code compiled
-O0.

To avoid this: don't use instruction referencing for -O0 functions. In the
"pure" case of unoptimised code, this won't really harm the debugging
experience because most variables won't have been promoted off the stack,
so can't go missing. It becomes more complicated when optimised code is
inlined into functions marked optnone; however these are rare, and as -O0
doesn't run many optimisations there should be little damage to the debug
experience as a result.

I've taken the opportunity to refactor testing for instruction-referencing
into a MachineFunction method, which seems the most appropriate place to
put it.

Differential Revision: https://reviews.llvm.org/D108585
2021-08-25 15:10:36 +01:00
Jon Chesterfield 9b2c6c07b5 [libomptarget][amdgpu] Refactor debug printing
Move most debug printing in rtl.cpp behind DP() macro
Adjust the print output for gpu arch mismatch when the architectures match
Convert an assert into graceful failure

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108562
2021-08-25 14:57:51 +01:00
Joe Nash e381833ba5 [AMDGPU] Support global_atomic_fmin/max on gfx10
Makes patterns added for gfx90a usable with the gfx10 versions of the
insts.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108654

Change-Id: I86167bf6b4823f975f74ccb619bd6190331ba16b
2021-08-25 09:35:10 -04:00
Andrea Di Biagio fe13b81ed9 [X86][NFC] Pre-commit llvm-mca tests for PR51495.
WriteIMulH reports an incorrect latency for RM variants of MULX.
2021-08-25 14:17:17 +01:00
Louis Dionne 77b32055ec [libc++] Assume that compilers support extended constexpr in C++14 mode
We don't support any compiler that doesn't support C++14 constexpr when
compiling in C++14 mode anymore, so we can just assume that we have C++14
extended constexpr when compiling in C++14 mode. This allows us to remove
some workarounds for older compilers.

Differential Revision: https://reviews.llvm.org/D108638
2021-08-25 08:41:07 -04:00
Florian Hahn 90d09eb300
[LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.
Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.

So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, like if all non-latch exiting
blocks are unreachable.

The motivating case are loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses to load the bounds of B is conditionally executed in the loop
and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.

     int sum(std::vector<int> *A, std::vector<int> *B, int N) {
       int cost = 0;
       for (int i = 0; i < N; ++i)
         cost += A->at(i) + B->at(i);
       return cost;
     }

This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.

Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.

Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.

I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure if there's much value in the option?

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D108108
2021-08-25 13:26:40 +01:00
Dawid Jurczak bdcf04246c [LoopIdiom] Don't transform loop into memmove when load from body has more than one use
This change fixes issue found by Markus: https://reviews.llvm.org/rG11338e998df1
Before this patch following code was transformed to memmove:

for (int i = 15; i >= 1; i--) {
  p[i] = p[i-1];
  sum += p[i-1];
}

However load from p[i-1] is used not only by store to p[i] but also by sum computation.
Therefore we cannot emit memmove in loop header.

Differential Revision: https://reviews.llvm.org/D107964
2021-08-25 14:22:40 +02:00
Jan Kuehle e708808f87 [clang-format] Support TypeScript override keyword
TypeScript 4.3 added a new "override" keyword for class members. This
lets clang-format know about it, so it can format code using it
properly.

Reviewed By: krasimir

Differential Revision: https://reviews.llvm.org/D108692
2021-08-25 14:11:50 +02:00
Peilin Guo 4c4dbeeeea [DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107795
2021-08-25 19:33:39 +08:00
Jeremy Morse cc1e87bf55 [DebugInfo][InstrRef] Avoid stack-slot-coloring changing codegen due to DI
Stack slot colouring adds "weight" to slots if a non-dbg-value instruction
refers to it. This, unfortunately, means that DBG_PHI instructions can have
an effect on codegen. The fix is very simple, replace isDebugValue with
isDebugInstr.

The regression test contains a scenario that reproduces this problem; I've
represented both normal-debug mode and instr-ref debug mode instructions
in comment lines prefixed with AAAAAA and BBBBBB, and un-comment them with
sed to test that the two different modes produce the same behaviour.

Differential Revision: https://reviews.llvm.org/D108627
2021-08-25 12:04:59 +01:00
River Riddle c8d9e1ce43 [mlir][AttrTypeGen] Add support for specifying a "accessor" type of a parameter
This allows for using a different type when accessing a parameter than the
one used for storage. This allows for returning parameters by reference,
enables using more optimized/convient reference results, and more.

Differential Revision: https://reviews.llvm.org/D108593
2021-08-25 09:27:36 +00:00
River Riddle 9658b061dd [mlir] Update DialectAsmParser::parseString to use std::string instead of StringRef
This allows for parsing strings that have escape sequences, which require constructing
a string (as they can't be represented by looking at the Token contents directly).

Differential Revision: https://reviews.llvm.org/D108589
2021-08-25 09:27:35 +00:00
River Riddle aea3026ea7 [mlir] Move the Operation use iteration utilities to ResultRange
This allows for iterating and interacting with the uses of a specific subset of
results as opposed to just the full range.

Differential Revision: https://reviews.llvm.org/D108586
2021-08-25 09:27:35 +00:00
Jean Perier b3e392c081 [flang] Implement Posix version of DATE_AND_TIME runtime
Use gettimeofday and localtime_r to implement DATE_AND_TIME intrinsic.
The Windows version fallbacks to the "no date and time information
available" defined by the standard (strings set to blanks and values to
-HUGE).

The implementation uses an ifdef between windows and the rest because
from my tests, the SFINAE approach leads to undeclared name bogus errors
with clang 8 that seems to ignore failure to instantiate is not an error
for the function names (i.e., it understands it should not instantiate
the version using gettimeofday if it is not there, but still yields an
error that it is not declared on the spot where it is called in the
uninstantiated version).

Differential Revision: https://reviews.llvm.org/D108622
2021-08-25 11:16:52 +02:00
Jan Svoboda b5088cb408 [clang][deps] Ensure deterministic order of TU '-fmodule-file=' arguments
Translation units with multiple direct modular dependencies trigger a non-deterministic ordering in `clang-scan-deps`. This boils down to usage of `std::unordered_map`, which gets replaced by `std::map` in this patch.

Depends on D103526.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D103807
2021-08-25 11:14:16 +02:00
Rosie Sumpter e221724714 [LoopFlatten] Add statistic for number of loops flattened. NFC
Differential Revision: https://reviews.llvm.org/D108644
2021-08-25 10:10:10 +01:00
Tres Popp 868bd9938d [mlir] Add assertion in NamedAttrList to prevent adding null attributes
Differential Revision: https://reviews.llvm.org/D108570
2021-08-25 11:06:53 +02:00
LLVM GN Syncbot b0b26ae4b3 [gn build] Port 48958d02d2 2021-08-25 09:02:08 +00:00
Daniil Fukalov 48958d02d2 [NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
   and `R600TargetMachine::getSubtargetImpl()` had different return value type
   than base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596
2021-08-25 12:01:55 +03:00
Jan Svoboda 3b8f536fec [clang][deps] Use top-level modules as precompiled dependencies
The `ASTReader` populates `Module::PresumedModuleMapFile` only for top-level modules, not submodules. To avoid generating empty `-fmodule-map-file=` arguments, make discovered modules depend on top-level precompiled modules. The granularity of submodules is not important here.

The documentation of `Module::PresumedModuleMapFile` says this field is non-empty only when building from preprocessed source. This means there can still be cases where the dependency scanner generates empty `-fmodule-map-file=` arguments. That's being addressed in separate patch: D108544.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D108647
2021-08-25 10:51:34 +02:00
serge-sans-paille 46c947af7e Have lit preserve SOURCE_DATE_EPOCH
This environment variable has been standardized for reproducible builds. Setting
it can help to have reproducible tests too, so keep it as part of the testing
env when set.

See https://reproducible-builds.org/docs/source-date-epoch/

Differential Revision: https://reviews.llvm.org/D108332
2021-08-25 10:38:20 +02:00
Jan Svoboda 83c633ea1a [clang][deps] Collect precompiled deps from submodules too
In this patch, the dependency scanner starts collecting precompiled dependencies from all encountered submodules, not only from top-level modules.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D108540
2021-08-25 10:35:34 +02:00
Florian Mayer 023f18bbaf [hwasan] do not check if freed pointer belonged to allocator.
In that case it is very likely that there will be a tag mismatch anyway.

We handle the case that the pointer belongs to neither of the allocators
by getting a nullptr from allocator.GetBlockBegin.

Reviewed By: hctim, eugenis

Differential Revision: https://reviews.llvm.org/D108383
2021-08-25 09:31:01 +01:00
Konstantin Schwarz 4b4bc1ea16 [GlobalISel] Do not generate illegal G_SEXTLOADs after legalization
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108626
2021-08-25 10:13:39 +02:00
Jonas Hahnfeld ea08c4cd1c [CUDA] Fix static device variables with -fgpu-rdc
NVPTX does not allow dots in the identifier, so ptxas errors out with
   fatal   : Parsing error near '.static': syntax error
because it parses .static as a directive. Avoid this problem by using
two underscores, similar to what OpenMP does for outlined functions.

Differential Revision: https://reviews.llvm.org/D108456
2021-08-25 09:31:22 +02:00
Yi Kong 5fc4828aa6 [clang] Don't generate warn-stack-size when the warning is ignored
8ace121305 introduced a regression for code that explicitly ignores the
-Wframe-larger-than= warning. Make sure we don't generate the
warn-stack-size attribute for that case.

Differential Revision: https://reviews.llvm.org/D108686
2021-08-25 14:58:45 +08:00
Douglas Yung 323a6bfbb8 Add "REQUIRES: arm-registered-target" line to test added in D108603.
This should fix the test failure on the PS4 build bot.
2021-08-24 22:22:16 -07:00
Vang Thao 549f6a819a [MachineCopyPropagation] Check CrossCopyRegClass for cross-class copys
On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. A intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.

To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.

Issue seen while attempting to optimize another AGPR to AGPR issue:

Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0

After machine-cp:

$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0

Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copys. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.

Reviewed By: lkail, rampitec

Differential Revision: https://reviews.llvm.org/D108011
2021-08-24 21:22:36 -07:00
Lang Hames 2a35d59b2f [JITLink][MachO] Add more detail to error message. 2021-08-25 13:31:52 +10:00
Lang Hames fc3b2675e7 [ORC] Fix typo in debugging output 2021-08-25 13:31:51 +10:00
Carl Ritson 28ba16c31b [DAGCombine] Pre-commit test for D108619 2021-08-25 12:18:01 +09:00
Fangrui Song 9ab9a9595b [InstrProfiling] Keep profd non-private for non-renamable comdat functions
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).

If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.

The fix is straightforward: just keep non-private profd in such a `DataReferencedByCode` case.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D108432
2021-08-24 20:14:03 -07:00
Richard Smith cd4d6d718b PR48030: Fix COMDAT-related linking problem with C++ thread_local static data members.
Previously when emitting a C++ guarded initializer, we tried to work out what
the enclosing function would be used for and added it to the COMDAT containing
the variable if we thought that doing so would be correct. But this was done
from a context in which we didn't -- and realistically couldn't -- correctly
infer how the enclosing function would be used.

Instead, add the initialization function to a COMDAT from the code that
creates it, in the case where it makes sense to do so: when we know that
the one and only reference to the initialization function is in
@llvm.global.ctors and that reference is in the same COMDAT.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D108680
2021-08-24 19:53:44 -07:00
Thomas Lively 977eeb0c38 [WebAssembly] Fix some UB from ca541aa319 2021-08-24 19:44:03 -07:00
Fangrui Song 32e2326cda Revert D108432 "[InstrProfiling] Keep profd non-private for non-renamable comdat functions"
This reverts commit f653beea88.

It broke Windows coverage-inline.cpp because link.exe has a limitation
that external symbols in IMAGE_COMDAT_SELECT_ASSOCIATIVE don't work.

It essentially dropped the previous size optimization for coverage
because coverage doesn't rename comdat by default.
Needs more investigation what we should do.
2021-08-24 19:16:07 -07:00
Rob Suderman 5541a05d6a [mlir][tosa] Quantized tosa.avg_pool2d lowering to linalg
Includes the quantized version of average pool lowering to linalg dialect.
This includes a lit test for the transform. It is not 100% correct as the
multiplier / shift should be done in i64 however this is negligable rounding
difference.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D108676
2021-08-24 18:54:23 -07:00
Rob Suderman 4ef1770abd [mlir][tosa] Table did not apply offset before extract on i8 input
Lowering to table was incorrect as it did not apply a 128 offset before
extracting the value from the table. Fixed and correct tensor length on input
table.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D108436
2021-08-24 18:52:33 -07:00
Matthias Springer a9cff97f94 [mlir][SCF] Generalize AffineMinSCFCanonicalization to min/max ops
* Add support for affine.max ops to SCF loop peeling pattern.
* Add support for affine.max ops to `AffineMinSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalizationPattern` to `AffineOpSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalization` pass to `SCFAffineOpCanonicalization`.

Differential Revision: https://reviews.llvm.org/D108009
2021-08-25 10:40:34 +09:00
wren romano 90e0c657b7 [mlir][sparse] Correcting the use of emplace_back
The emplace commands are variadic and should take all the constructor arguments directly, since they implicitly call the constructor themselves in order to avoid the cost of constructing and then moving/copying temporaries.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D108670
2021-08-24 18:32:13 -07:00
Heejin Ahn d5244fb160 [WebAssembly] Use SSAUpdaterBulk in LowerEmscriptenSjLj
We update SSA in two steps in Emscripten SjLj:
1. Rewrite uses of `setjmpTable` and `setjmpTableSize` variables and
   place `phi`s where necessary, which are updated where we call
   `saveSetjmp`.
2. Do a whole function level SSA update for all variables, because we
   split BBs where `setjmp` is called and there are possibly variable
   uses that are not dominated by a def.
   (See 955b91c19c/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1314-L1324))

We have been using `SSAUpdater` to do this, but `SSAUpdaterBulk` class
was added after this pass was first created, and for the step 2 it looks
like a better alternative with a possible performance benefit. Not sure
the author is aware of it, but `SSAUpdaterBulk` seems to have a
limitation: it cannot handle a use within the same BB as a def but
before it. For example:
```
... = %a + 1
%a = foo();
```
or
```
%a = %a + 1
```
The uses `%a` in RHS should be rewritten with another SSA variable of
`%a`, most likely one generated from a `phi`. But `SSAUpdaterBulk`
thinks all uses of `%a` are below the def of `%a` within the same BB.
(`SSAUpdater` has two different functions of rewriting because of this:
`RewriteUse` and `RewriteUseAfterInsertions`.) This doesn't affect our
usage in the step 2 because that deals with possibly non-dominated uses
by defs after block splitting. But it does in the step 1, which still
uses `SSAUpdater`.

But this CL also simplifies the step 1 by using `make_early_inc_range`,
removing the need to advance the iterator before rewriting a use.

This is NFC; the test changes are just the order of PHI nodes.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D108583
2021-08-24 18:23:51 -07:00
Rob Suderman a7bf93807b [mlir][tosa] Fix conv/depthwise conv padding for quantized values
When padding quantized operations, the padding needs to equal the zero point
of the input value. Corrected the pass to change the padding value if quantized.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D108440
2021-08-24 18:13:22 -07:00