Commit Graph

990 Commits

Author SHA1 Message Date
Fraser Cormack 73244e8f85 [VP] Add vp.icmp comparison intrinsic and docs
This patch mostly follows up on D121292 which introduced the vp.fcmp
intrinsic.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122729
2022-03-30 17:05:11 +01:00
Fraser Cormack da6131f20a [VP] Add vp.fcmp comparison intrinsic and docs
This patch adds the first support for vector-predicated comparison
intrinsics, starting with vp.fcmp. It uses metadata to encode its
condition code, like the llvm.experimental.constrained.fcmp intrinsic.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121292
2022-03-30 14:39:18 +01:00
Serge Pavlov 8160dd582b Revert "Mapping of FP operations to constrained intrinsics"
This reverts commit 115b3ace36.
Starting from this commit the buildbot sanitizer-x86_64-linux-bootstrap-msan
starts failing (build 10071). Reverted for investigation.
2022-03-30 16:46:43 +07:00
Serge Pavlov 115b3ace36 Mapping of FP operations to constrained intrinsics
A new function 'getConstrainedIntrinsic' is added, which for any gived
instruction returns id of the corresponding constrained intrinsic. If
there is no constrained counterpart for the instruction or the instruction
is already a constrained intrinsic, the function returns zero.

Differential Revision: https://reviews.llvm.org/D69562
2022-03-30 12:21:30 +07:00
Craig Topper 49c2206b3b [VP] Preserve address space of pointer for strided load/store intrinsics.
This adds LLVMAnyPointerToElt to use instead of LLVMPointerToElt.
This allows us to preserve the address space as part of the type
overload for the intrinsic, but still require the vector element
type to match the pointer type.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D122042
2022-03-22 09:52:54 -07:00
Lorenzo Albano 28cfa764c2 [VP] Strided loads/stores
This patch introduces two new experimental IR intrinsics and SDAG nodes
to represent vector strided loads and stores.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D114884
2022-03-10 18:46:54 +01:00
Simon Moll 5f62156762 [VP] Introducing VectorBuilder, the VP intrinsic builder
VectorBuilder wraps around an IRBuilder and
VectorBuilder::createVectorInstructions emits VP intrinsics as if they
were regular instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105283
2022-03-07 10:02:07 +01:00
Simon Moll 8de8731591 Revert "[VP] Introducing VectorBuilder, the VP intrinsic builder"
This reverts commit 8bcbfb50e8.

Taking this patch offline to fix breakage: https://lab.llvm.org/buildbot/#/builders/110/builds/10912
2022-03-03 13:34:37 +01:00
Simon Moll 8bcbfb50e8 [VP] Introducing VectorBuilder, the VP intrinsic builder
VectorBuilder wraps around an IRBuilder and
VectorBuilder::createVectorInstructions emits VP intrinsics as if they
were regular instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105283
2022-03-03 11:31:57 +01:00
Simon Moll d05ddb86f6 [VP] vp.sitofp cast intrinsic and docs
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D119922
2022-03-02 10:16:19 +01:00
serge-sans-paille a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
Simon Moll 03e83cc8eb [VP] vp.fptosi cast intrinsic and docs
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119535
2022-02-15 18:17:19 +01:00
Momchil Velikov 6398903ac8 Extend the `uwtable` attribute with unwind table kind
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e.  we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.

That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.

This patch extends the `uwtable` attribute with an optional
value:
      -  `uwtable` (default to `async`)
      -  `uwtable(sync)`, synchronous unwind tables
      -  `uwtable(async)`, asynchronous (instruction precise) unwind tables

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114543
2022-02-14 14:35:02 +00:00
YASHASVI KHATAVKAR 70fdbf35de Adding DiBuilder interface for assumed length strings 2022-02-11 14:40:02 -05:00
Paul Robinson d2495b69f2 [RGT] Exercise both paths through a test
BitcastToGEP had an opaque/typed pointer decision point, make sure it
exercises both sides.

Found by the Rotten Green Tests project.
2022-02-11 10:47:00 -08:00
YASHASVI KHATAVKAR 93d1a623ce Reverting an entire stack of changes causing build failures 2022-02-10 17:58:22 -05:00
YASHASVI KHATAVKAR 0e7341b7b1 worked on review comments 2022-02-10 15:24:51 -05:00
YASHASVI KHATAVKAR c26a0d1cda Updated the test to include proper string get functions 2022-02-10 15:24:50 -05:00
YASHASVI KHATAVKAR 929499eb64 Updated the test to include addtional details 2022-02-10 15:24:50 -05:00
YASHASVI KHATAVKAR 99f990be64 Added StringLocationExp to the new apis 2022-02-10 15:24:50 -05:00
YASHASVI KHATAVKAR 43d421cda3 Adding DIBuilder interface for assumed length string 2022-02-10 15:24:50 -05:00
Craig Topper 60745fb16f [VP] llvm.vp.fneg intrinsic and LangRef
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D119262
2022-02-09 07:54:36 -08:00
Craig Topper cef177d186 [VP] llvm.vp.fma intrinsic and LangRef
Differential Revision: https://reviews.llvm.org/D119185
2022-02-07 15:53:27 -08:00
serge-sans-paille ffe8720aa0 Reduce dependencies on llvm/BinaryFormat/Dwarf.h
This header is very large (3M Lines once expended) and was included in location
where dwarf-specific information were not needed.

More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.

This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].

As a consequence of that patch, downstream user may need to manually some extra
files:

llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h

In some situations, codes maybe relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.

$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after:   10978519
before:  11245451

Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions

Differential Revision: https://reviews.llvm.org/D118781
2022-02-04 11:44:03 +01:00
serge-sans-paille e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest change below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k lines less to process is no that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Michael Gottesman 7ed95d1577 [debug-info] Add support for llvm.dbg.addr in DIBuilder.
I based this off of the API already create for llvm.dbg.value since both
intrinsics have the same arguments at the API level.

I added some tests exercising the API a little as well as an additional small
test that shows how one can use llvm.dbg.addr to limit the PC range where an
address value is available in the debugger. This is done by calling
llvm.dbg.value with undef and the same metadata info as one used to create the
llvm.dbg.addr.

rdar://83957028

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D117442
2022-01-18 18:26:50 -08:00
Simon Moll 33efbc8184 [VP] llvm.vp.merge intrinsic and LangRef
llvm.vp.merge interprets the %evl operand differently than the other vp
intrinsics: all lanes at positions greater or equal than the %evl
operand are passed through from the second vector input. Otherwise it
behaves like llvm.vp.select.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116725
2022-01-12 14:06:56 +01:00
Serge Guelton d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using and std::map<SmallString, SmallString> for target dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Nikita Popov 32808cfb24 [IR] Track users of comdats
Track all GlobalObjects that reference a given comdat, which allows
determining whether a function in a comdat is dead without scanning
the whole module.

In particular, this makes filterDeadComdatFunctions() have complexity
O(#DeadFunctions) rather than O(#SymbolsInModule), which addresses
half of the compile-time issue exposed by D115545.

Differential Revision: https://reviews.llvm.org/D115864
2022-01-06 09:13:58 +01:00
serge-sans-paille 9290ccc3c1 Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.

Differential Revision: https://reviews.llvm.org/D116110
2022-01-04 15:37:46 +01:00
Florian Hahn 5d68dc184e
[Verifier] Iteratively traverse all indirect users.
The recursive implementation can run into stack overflows, e.g. like in PR52844.

The order the users are visited changes, but for the current use case
this only impacts the order error messages are emitted.
2021-12-23 23:20:12 +01:00
Fraser Cormack 8002fa6760 [LangRef] Remove incorrect vector alignment rules
The LangRef incorrectly says that if no exact match is found when
seeking alignment for a vector type, the largest vector type smaller
than the sought-after vector type. This is incorrect as vector types
require an exact match, else they fall back to reporting the natural
alignment.

The corrected rule was not added in its place, as rules for other types
(e.g., floating-point types) aren't documented.

A unit test was added to demonstrate this.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D112463
2021-12-14 14:35:40 +00:00
Nikita Popov 6213f1dd03 [IR] Make VPIntrinsic::getDeclarationForParams() opaque pointer compatible
The vp.load and vp.gather intrinsics require the intrinsic return
type to determine the correct function signature. With opaque pointers,
it cannot be derived from the parameter pointee types.

Differential Revision: https://reviews.llvm.org/D115632
2021-12-14 14:20:59 +01:00
Nikita Popov 9cbab13282 [ConstantsTest] Avoid crash with opaque pointers
With opaque pointers there will be no bitcast, so don't assume
that.
2021-12-13 15:23:12 +01:00
Arthur Eubanks 93a20ecee4 [DebugInfo] Check DIEnumerator bit width when comparing for equality
As mentioned in D106585, this causes non-determinism, which can also be
shown by this test case being flaky without this patch.

We were using the APSInt's bit width for hashing, but not for checking
for equality. APInt::isSameValue() does not check bit width.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D115054
2021-12-03 13:40:22 -08:00
Sanjay Patel 97e921c81f [PatternMatch] create and use matcher for 'not' that excludes undef elements
We needed a stricter version of m_Not for D114462, but I wasn't
sure if that was going to be required anywhere else, so I didn't bother
to make that reusable.

It turns out we have one more existing simplification that needs
this (currently miscompiles):
https://alive2.llvm.org/ce/z/9-nTKi

And there's at least one more fold in that family that we could add.

Differential Revision: https://reviews.llvm.org/D114882
2021-12-02 08:51:13 -05:00
Nikita Popov 55d392cc30 [llvm-c] Make LLVMAddAlias opaque pointer compatible
Deprecate LLVMAddAlias in favor of LLVMAddAlias2, which accepts a
value type and an address space. Previously these were extracted
from the pointer type.

Differential Revision: https://reviews.llvm.org/D114860
2021-12-02 09:21:16 +01:00
Ellis Hoag 0150645bf5 [DebugInfo] Do not replace existing nodes from DICompileUnit
When creating a new DIBuilder with an existing DICompileUnit, load the
DINodes from the current DICompileUnit so they don't get overwritten.
This is done in the MachineOutliner pass, but it didn't change the CU so
the bug never appeared. We need this if we ever want to add DINodes to
the CU after it has been created, e.g., DIGlobalVariables.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114556
2021-11-29 19:46:10 -08:00
Simon Moll 1e65b93f3a [VP] Canonicalize macros of VPIntrinsics.def
Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>.  Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro).
A follow-up patch has documentation on how the macros are (intended) to be used.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D114144
2021-11-23 16:51:11 +01:00
David Sherwood ca18fcc2c0 [IR] Change CreateStepVector to work with element types smaller than i8
Currently the stepvector intrinsic only supports element types that
are integers of size 8 bits or more. This patch adds support for the
creation of stepvectors with smaller element types by creating
the intrinsic with i8 elements that we then truncate to the requested
size.

It's not currently possible to write a vectoriser test to exercise
this code path so I have added a unit test here:

  llvm/unittests/IR/IRBuilderTest.cpp

Differential Revision: https://reviews.llvm.org/D113767
2021-11-17 10:47:50 +00:00
Mircea Trofin a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Nikita Popov 1c5d636af1 [ConstantRangeTest] Add helper to enumerate APInts (NFC)
While ForeachNumInConstantRange(ConstantRange::getFull(Bits))
works, it's somewhat roundabout, and I keep looking for this
function.
2021-11-12 18:18:51 +01:00
Nikita Popov 2060895c9c [ConstantRange] Add exact union/intersect (NFC)
For some optimizations on comparisons it's necessary that the
union/intersect is exact and not a superset. Add methods that
return Optional<ConstantRange> only if the result is exact.

For the sake of simplicity this is implemented by comparing
the subset and superset approximations for now, but it should be
possible to do this more directly, as unionWith() and intersectWith()
already distinguish the cases where the result is imprecise for the
preferred range type functionality.
2021-11-07 21:46:06 +01:00
Nikita Popov cf71a5ea8f [ConstantRange] Support zero size in isSizeLargerThan()
From an API perspective, it does not make a lot of sense that 0
is not a valid argument to this function. Add the exact check needed
to support it.
2021-11-07 21:22:45 +01:00
Luke Benes 2249ecee8d
[IR][ShuffleVector] Fix Wdangling-else warning in InstructionsTest
Fix a dangling else that gcc-11 warned about. The EXPECT_EQ macro
expands to an if-else, so the whole construction contains a hidden
dangling else.

Differential Revision: https://reviews.llvm.org/D113346
2021-11-07 00:07:01 +03:00
Nikita Popov 9f0194be45 [ConstantRange] Add getEquivalentICmp() variant with offset (NFCI)
Add a variant of getEquivalentICmp() that produces an optional
offset. This allows us to create an equivalent icmp for all ranges.

Use this in the with.overflow folding code, which was doing this
adjustment separately -- this clarifies that the fold will indeed
always apply.
2021-11-06 21:59:45 +01:00
Roman Lebedev a5cd27880a
[IR] Improve member `ShuffleVectorInst::isReplicationMask()`
When we have an actual shuffle, we can impose the additional restriction
that the mask replicates the elements of the first operand, so we know
the replication factor as a ratio of output and op0 vector sizes.
2021-11-06 00:09:27 +03:00
Roman Lebedev 0b36431810
[NFCI] InstructionTest: trim `InstructionsTest.ShuffleMaskIsReplicationMask_*` complexity
These tests have pretty high O() complexity due to their nature,
which leads to potentially-long runtimes.

While in release build for me they took ~1 and ~2 sec,
as noted in https://reviews.llvm.org/D113214#inline-1080479
they take minutes in debug build.

Fine-tune the amount of permutations they deal with,
without affecting the test coverage. After this,
they take <~10ms each for me (in release build),
hopefully that is good-enough for debug build too.
2021-11-05 19:22:48 +03:00
Roman Lebedev 01d8759ac9
[IR][ShuffleVector] Introduce `isReplicationMask()` matcher
Avid readers of this saga may recall from previous installments,
that replication mask replicates (lol) each of the `VF` elements
in a vector `ReplicationFactor` times. For example, the mask for
`ReplicationFactor=3` and `VF=4` is: `<0,0,0,1,1,1,2,2,2,3,3,3>`.
More importantly, replication mask is used by LoopVectorizer
when using masked interleaved memory operations.

As discussed in previous installments, while it is used by LV,
and we **seem** to support masked interleaved memory operations on X86,
it's support in cost model leaves a lot to be desired:
until basically yesterday even for AVX512 we had no cost model for it.

As it has been witnessed in the recent
AVX2 `X86TTIImpl::getInterleavedMemoryOpCost()`
costmodel patches, while it is hard-enough to query the cost
of a particular assembly sequence [from llvm-mca],
afterwards the check lines LV costmodel tests must be updated manually.
This is, at the very least, boring.

Okay, now we have decent costmodel coverage for interleaving shuffles,
but now basically the same mind-killing sequence has to be performed
for replication mask. I think we can improve at least the second half
of the problem, by teaching
the `TargetTransformInfoImplCRTPBase::getUserCost()` to recognize
`Instruction::ShuffleVector` that are repetition masks,
adding exhaustive test coverage
using `-cost-model -analyze` + `utils/update_analyze_test_checks.py`

This way we can have good exhaustive coverage for cost model,
and only basic coverage for the LV costmodel.

This patch adds precise undef-aware `isReplicationMask()`,
with exhaustive test coverage.
* `InstructionsTest.ShuffleMaskIsReplicationMask` shows that
   it correctly detects all the known masks.
* `InstructionsTest.ShuffleMaskIsReplicationMask_undef`
  shows that replacing some mask elements in a known replication mask
  still allows us to recognize it as a replication mask.
  Note, with enough undef elts, we may detect a different tuple.
* `InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness`
  shows that if we detected the replication mask with given params,
  then if we actually generate a true replication mask with said params,
  it matches element-wise ignoring undef mask elements.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113214
2021-11-05 16:53:47 +03:00
Jakub Kuderski 3348b841d3 Make enum iteration with seq safe by default
By default `llvm::seq` would happily iterate over enums, which may be unsafe if the enum values are not continuous. This patch disable enum iteration with `llvm::seq` and `llvm::seq_inclusive` and adds two new functions: `enum_seq` and `enum_seq_inclusive`.

To make sure enum iteration is safe, we require users to declare their enum types as iterable by specializing `enum_iteration_traits<SomeEnum>`. Because it's not always possible to add these traits next to enum definition (e.g., for enums defined in external libraries), we provide an escape hatch to allow iteration on per-callsite basis by passing `force_iteration_on_noniterable_enum`.

The main benefit of this approach is that these global declarations via traits can appear just next to enum definitions, making easy to spot when enums are miss-labeled, e.g., after introducing new enum values, whereas `force_iteration_on_noniterable_enum` should stand out and be easy to grep for.

This emerged from a discussion with gchatelet@ about reusing llvm's `Sequence.h` in lieu of https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h.

Reviewed By: dblaikie, gchatelet, aaron.ballman

Differential Revision: https://reviews.llvm.org/D107378
2021-11-03 20:52:21 -04:00