Commit Graph

411777 Commits

Author SHA1 Message Date
Jonas Paulsson 792853cb78 [SystemZ] Remove the ManipulatesSP flag from backend (NFC).
This flag was set in the presence of stacksave/stackrestore in order to force
a frame pointer.

This should however not be needed per the comment in MachineFrameInfo.h
stating that a a variable sized object "...is the sole condition which
prevents frame pointer elimination", and experiments have also shown that
there seems to be no effect whatsoever on code generation with ManipulatesSP.

Review: Ulrich Weigand
2022-01-20 13:00:51 -06:00
John Ericson df31ff1b29 [cmake] Make include(GNUInstallDirs) always below project(..)
Its defaulting logic must go after `project(..)` to work correctly,  but `project(..)` is often in a standalone condition making this
awkward, since the rest of the condition code may also need GNUInstallDirs.

The good thing is there are the various standalone booleans, which I had missed before. This makes splitting the conditional blocks less awkward.

Reviewed By: arichardson, phosek, beanz, ldionne, #libunwind, #libc, #libc_abi

Differential Revision: https://reviews.llvm.org/D117639
2022-01-20 18:59:17 +00:00
Marco Elver c65186c89f [clang] Improve -Wdeclaration-after-statement
With 118f966b46, Clang matches GCC's behaviour and allows enabling
-Wdeclaration-after-statement with C99 and later.

However, the check for mixing declarations and code is not a constant time
algorithm, and therefore should be guarded with Diags.isIgnored().

Furthermore, improve test coverage with: non-pedantic C89 with the
warning; C11 with the warning; and when using -Wall.

Finally, mention the changed behaviour in ReleaseNotes.rst.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D117232
2022-01-20 19:56:34 +01:00
Sanjay Patel 2d031ec5e5 [InstCombine] add one-use check to opposite shift folds
Test comments say this might be intentional, but I don't
see any hard evidence to support it. The extra instruction
shows up as a potential regression in D117680.

One test does show a missed fold that might be recovered
with better demanded bits analysis.
2022-01-20 13:49:23 -05:00
Sanjay Patel 587dccfb12 [InstCombine] avoid 'tmp' usage in test files; NFC
The update script ( utils/update_test_checks.py ) warns against this
because it can conflict with the default FileCheck names given to
anonymous values in the IR.
2022-01-20 13:49:23 -05:00
eopXD b58cc9fb23 [NFC][RISCV] Add end-of-line symbol in target-feature testcases
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117808
2022-01-20 10:47:49 -08:00
Craig Topper 9abc593e98 [TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC
Use alignDown instead of &= ~7.

Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so
(BitWidth - NTZ - 8 == NLZ).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D117804
2022-01-20 10:45:17 -08:00
Roman Lebedev eb6c6e6058
[NFC][InstCombine] Add test showing failure to sink into `resume` block 2022-01-20 21:42:08 +03:00
Evgeny Shulgin b80db150cd Add `isConsteval` matcher
Support C++20 consteval functions and C++2b if consteval for AST Matchers.
2022-01-20 13:35:10 -05:00
Tue Ly aad04534c4 [libc] Implement correct rounding with all rounding modes for hypot functions.
Update the rounding logic for generic hypot function so that it will round correctly with all rounding modes.

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D117590
2022-01-20 13:33:20 -05:00
Joseph Huber af5600420b [OpenMP] Don't pass empty files to nvlink
This patch adds and exception to the nvlink wrapper tool to not pass
empty cubin files to the nvlink job. If an empty file is passed to
nvlink it will cause an error indicating that the file could not be
opened. This would occur if the user tried to link object files that
contained offloading code with a file that didnt. This will act as a
workaround until the new OpenMP offloading driver becomes the default.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117777
2022-01-20 13:12:02 -05:00
Sergei Grechanik 5abf116322 [mlir][vector] Allow values outside of [0; dim-size] in create_mask
This commits explicitly states that negative values and values exceeding
vector dimensions are allowed in vector.create_mask (but not in
vector.constant_mask). These values are now truncated when
canonicalizing vector.create_mask to vector.constant_mask.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D116069
2022-01-20 09:34:42 -08:00
Alexandre Ganea 5af2433e17 [clang-cl] Support the /HOTPATCH flag
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019

The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).

Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)

The outcome is that we can finally use Live++ or Recode along with clang-cl.

NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.

Depends on D43002, D80833 and D81301 for the full feature.

Differential Revision: https://reviews.llvm.org/D116511
2022-01-20 12:57:19 -05:00
Matt Arsenault 237502c1a4 AMDGPU: Fix asm in test using wrong IR type for physical register 2022-01-20 12:56:53 -05:00
Matt Arsenault 064cea9c9a AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection
Avoids a test diff with SDAG.
2022-01-20 12:56:53 -05:00
Matt Arsenault 2d1f9aa27d AMDGPU/GlobalISel: Regenerate test checks with -NEXT 2022-01-20 12:56:53 -05:00
Matt Arsenault 08549ba51e AMDGPU/GlobalISel: Explicitly set -global-isel-abort in failure tests
If the default mode is the fallback, this would fail since it would
end up seeing the DAG failure message instead.
2022-01-20 12:56:53 -05:00
zijunzhao c0f9592daa add tsan shared library
Add tsan shared library on Android. Only build tsan when minSdkVersion is above 23.

Reviewed By: danalbert, vitalybuka

Differential Revision: https://reviews.llvm.org/D108394
2022-01-20 17:54:16 +00:00
Matt Arsenault 2e49e0cfde AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics
Emit an error if the return value is used on subtargets that do not
support them. Previously we were falling back to the DAG on selection
failure, where it would emit this error and then fail again.
2022-01-20 12:46:45 -05:00
Nikolas Klauser 4822447522 [libc++] basic_string::resize_and_overwrite: Adopt LWG3645 (Not voted in yet)
Adopt LWG3645, which fixes the value categories of basic_string::resize_and_overwrite
https://timsong-cpp.github.io/lwg-issues/3645

Reviewed By: ldionne, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D116815
2022-01-20 18:41:09 +01:00
Roman Lebedev 1455eddcf7
[NFC][SimplifyCFG] Add some tests for `invoke` merging 2022-01-20 20:37:29 +03:00
Sam Clegg feddf11502 [lld][WebAssemlby] Convert test to check disassembly output. NFC
Differential Revision: https://reviews.llvm.org/D117739
2022-01-20 09:32:01 -08:00
Nadav Rotem 191a6e9dfa optimize icmp-ugt-ashr
This diff optimizes the sequence icmp-ugt(ashr,C_1) C_2. InstCombine
already implements this optimization for sgt, and this patch adds
support ugt. This patch adds the check for UGT.

@craig.topper came up with the idea and proof:

  define i1 @src(i8 %x, i8 %y, i8 %c) {
    %cp1 = add i8 %c, 1
    %i = shl i8 %cp1, %y
    %i.2 = ashr i8 %i, %y
    %cmp = icmp eq i8 %cp1, %i.2
    ;Assume: C + 1 == (((C + 1) << y) >> y)
    call void @llvm.assume(i1 %cmp)

    ; uncomment for the sgt case
    %j = shl i8 %cp1, %y
    %j.2 = sub i8 %j, 1
    %cmp2 = icmp ne i8 %j.2, 127
    ;Assume (((c + 1 ) << y) - 1) != 127
    call void @llvm.assume(i1 %cmp2)

    %s = ashr i8 %x, %y
    %r = icmp sgt i8 %s, %c
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %y, i8 %c) {
    %cp1 = add i8 %c, 1
    %j = shl i8 %cp1, %y
    %j.2 = sub i8 %j, 1

    %r = icmp sgt i8 %x, %j.2
    ret i1 %r
  }

  declare void @llvm.assume(i1)

  This change is related to the optimizations in D117252.

  Differential Revision: https://reviews.llvm.org/D117365
2022-01-20 09:31:46 -08:00
Valentin Clement 81cbbe3e17
[flang][NFC] Remove unused/duplicated kStridePosInDim
kStridePosInDim is a duplicate of kDimStridePos and is not used. Just
remove it.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D117784
2022-01-20 18:30:29 +01:00
Fraser Cormack 75017db08c [RISCV] Add tests for commuted vector/scalar VP patterns
This patch adds a variety of tests checking that we can match
vector/scalar instructions against masked VP intrinsics when the splat
is on the LHS. At this stage, we can't, despite us having
ostensibly-commutable ISel patterns for them. The use of V0 as the mask
operand interferes with the auto-generated ISel table.
2022-01-20 17:10:09 +00:00
Matt Arsenault be7e938e27 AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd
This code is not structured to handle the legacy buffer intrinsics and
was miscompiling them.
2022-01-20 12:12:06 -05:00
Matt Arsenault 8ff3c9e0be AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics
The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn't handle at all). We also are not diagnosing the
use of the return value on gfx908. These will be addressed separately.
2022-01-20 12:12:06 -05:00
Matt Arsenault 89c447e4e6 AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal
This was inheriting the mesa behavior, and as far as I know nobody is
using opencl kernels with amdpal. The isMesaKernel check was
irrelevant because this property needs to be held for all functions.
2022-01-20 12:12:05 -05:00
Random06457 ee198df2e1 [mips] Improve vr4300 mulmul bugfix pass
When compiling with dwarf info, the mfix4300 flag introduced in
https://reviews.llvm.org/D116238 can miss some occurrences of the vr4300
mulmul bug if a debug instruction happens to be between two `muls`
instructions. This change skips debug instructions in order to fix
the mulmul bug detection.

Fixes https://github.com/llvm/llvm-project/issues/53094

Differential Revision: https://reviews.llvm.org/D117615
2022-01-20 20:10:04 +03:00
Lucas Prates 283f5a198a [GlobalISel] Fix incorrect sign extension when combining G_INTTOPTR and G_PTR_ADD
The GlobalISel combiner currently uses sign extension when manipulating
the LHS constant when combining a sequence of the following sequence of
machine instructions into a single constant:
```
  %0:_(s32) = G_CONSTANT i32 <CONSTANT>
  %1:_(p0) = G_INTTOPTR %0:_(s32)
  %2:_(s64) = G_CONSTANT i64 <CONSTANT>
  %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
```

This causes an issue when the bit width of the first contant and the
target pointer size are different, as G_INTTOPTR has no sign extension
semantics.

This patch fixes this by capture an arbitrary precision in when matching
the constant, allowing the matching function to correctly zero extend
it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D116941
2022-01-20 17:02:52 +00:00
Sjoerd Meijer fabf1de132 [FuncSpec] Add a reference, and some other clarifying comments. NFC. 2022-01-20 17:01:08 +00:00
Philip Reames c104fca36b {SLP] Delete dead code in favor of proper assert [NFC] 2022-01-20 08:54:12 -08:00
Philip Reames c43ebae838 [SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC] 2022-01-20 08:46:44 -08:00
Sander de Smalen 990bab89ff [ScalableVectors] Warn instead of error for invalid size requests.
This was intended to be fixed by D98856, but that only seemed to have
the desired behaviour when compiling to assembly using `-S`, not when
compiling into an object file or executable. Given that this was not
the intention of D98856, this patch fixes the behaviour.
2022-01-20 16:42:08 +00:00
Adrian Prantl c0957bd617 Add missing include to fix modular build 2022-01-20 08:35:33 -08:00
Adrian Prantl 54ba376d08 Add missing include to fix modular build 2022-01-20 08:33:44 -08:00
Philip Reames 3c422cbe6b [SLP] Add an asser to make a non-obvious precondition clear [NFC] 2022-01-20 08:24:10 -08:00
Michael Kruse 616f77172f [OpenMPIRBuilder] Detect and fix ambiguous InsertPoints for createParallel.
When a Builder methods accepts multiple InsertPoints, when both point to
the same position, inserting instructions at one position will "move" the
other after the inserted position since the InsertPoint is pegged to the
instruction following the intended InsertPoint. For instance, when
creating a parallel region at Loc and passing the same position as AllocaIP,
creating instructions at Loc will "move" the AllocIP behind the Loc
position.

To avoid this ambiguity, add an assertion checking this condition and
fix the unittests.

In case of AllocaIP, an alternative solution could be to implicitly
split BasicBlock at InsertPoint, using the first as AllocaIP, the second
for inserting the instructions themselves. However, this solution is
specific to AllocaIP since AllocaIP will always have to be first. Hence,
this is an argument to generally handling ambiguous InsertPoints as API
sage error.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117226
2022-01-20 10:13:44 -06:00
Nico Weber 91eca967b9 [gn build] (manually) port f29256a64a 2022-01-20 11:02:06 -05:00
Nikita Popov 805bc24868 [InstSimplify] Add test for load of non-integral pointer (NFC) 2022-01-20 16:50:05 +01:00
Mubashar Ahmad 3da69fb5a2 [Clang][AArch64][ARM] Unaligned Access Test Fix
Test fixed for the unaligned access warning.
2022-01-20 15:29:53 +00:00
Nikita Popov 0f283de9d1 [InstSimplify] Add test for reinterpret load of pointer type (NFC) 2022-01-20 16:25:54 +01:00
Simon Pilgrim 866311e71c [X86] lowerToAddSubOrFMAddSub - lower 512-bit ADDSUB patterns to blend(fsub,fadd)
AVX512 doesn't provide a ADDSUB instruction, but if we've built this from a build vector of scalar fsub/fadd elements we can still lower to blend(fsub,fadd)
2022-01-20 15:16:05 +00:00
Mircea Trofin f29256a64a [MLGO] Improved support for AOT cross-targeting scenarios
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.

To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.

This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.

The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.

Differential Revision: https://reviews.llvm.org/D117752
2022-01-20 07:05:39 -08:00
Nikita Popov 81d35f27dd [DebugInstrRef] Memoize variable order during sorting (NFC)
Instead of constructing DebugVariables and looking up the order
in the comparison function, compute the order upfront and then sort
a vector of (order, instr).

This improves compile-time by -0.4% geomean on CTMark ReleaseLTO-g.

Differential Revision: https://reviews.llvm.org/D117575
2022-01-20 16:04:24 +01:00
Simon Pilgrim 4130357f96 [X86] Fix v16f32 ADDSUB test
This was supposed to ensure we're not generating 512-bit ADDSUB nodes, but cut+paste typos meant we weren't generating a full 512-bit pattern
2022-01-20 14:58:36 +00:00
eopXD 14c5fd920b [Clang][RISCV] Change TARGET_BUILTIN to require zve32x for vector instruction
According to v-spec v1.0, `zve-32x` is the new minimum extension to include
to have vector instructions.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D112613
2022-01-20 06:53:48 -08:00
Jan Svoboda 9011903e36 [llvm][vfs] Abstract in-memory node creation
The creation of in-memory VFS nodes happens in a single function that deduces what kind of node to create from the arguments. This leads to complicated if-then-else logic that's difficult to cleanly extend.

This patch abstracts away in-memory node creation via a type-erased factory function that's passed instead.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117648
2022-01-20 15:48:02 +01:00
Jan Svoboda 9e24d14ac8 [llvm][vfs] NFC: Virtualize in-memory `getStatus`
This patch virtualizes the `getStatus` function on `InMemoryNode` in LLVM VFS. Currently, this is implemented via top-level function `getNodeStatus` that tries to cast `InMemoryNode *` into each subtype. Virtual functions seem to be the simpler solution here.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117649
2022-01-20 15:48:02 +01:00
Stephan Herhut 6d45284618 [mlir][memref] Add better support for identity layouts in memref.collapse_shape canonicalizer
When computing the new type of a collapse_shape operation, we need to at least
take into account whether the type has an identity layout, in which case we can
easily support dynamic strides. Otherwise, the canonicalizer creates invalid
IR.

Longer term, both the verifier and the canoncializer need to be extended to
support the general case.

Differential Revision: https://reviews.llvm.org/D117772
2022-01-20 15:31:43 +01:00