The way function gets the induction variable is by judging whether
StepInst or IndVar in the phi statement is one of the operands of CMP.
But if the LatchCmpOp0/LatchCmpOp1 is a constant, the subsequent
comparison may result in null == null, which is meaningless. This patch
fixes the typo.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D112980
mapped_iterator is a useful abstraction for applying a
map function over an existing iterator, but our current
usage ends up allocating storage/making indirect calls
even with the map function is a known function, which
is horribly inefficient. This commit refactors the usage
of mapped_iterator to avoid this, and allows for directly
referencing the map function when dereferencing.
Fixes PR52319
Differential Revision: https://reviews.llvm.org/D113511
This patch adds minimal support for D programming language demangling on LLVM
core based on the D name mangling spec. This will allow easier integration on a
future LLDB plugin for D either in the upstream tree or outside of it.
Minimal support includes recognizing D demangling encoding and at least one
mangling name, which in this case is `_Dmain` mangle.
Reviewed By: jhenderson, lattner
Differential Revision: https://reviews.llvm.org/D111414
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,
```
foo(float *p) {
__builtin_assume(__isGlobal(p));
// From there, we could assume p is a global pointer instead of a
// generic one.
}
```
This makes the code portable without introducing the implementation-specific features.
Note that NVCC starts to support __builtin_assume from version 11.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112041
Add new triple and target info for ‘spirv32’ and ‘spirv64’ and,
thus, enabling clang (LLVM IR) code emission to SPIR-V target.
The target for SPIR-V is mostly reused from SPIR by derivation
from a common base class since IR output for SPIR-V is mostly
the same as SPIR. Some refactoring are made accordingly.
Added and updated tests for parts that are different between
SPIR and SPIR-V.
Patch by linjamaki (Henry Linjamäki)!
Differential Revision: https://reviews.llvm.org/D109144
For some optimizations on comparisons it's necessary that the
union/intersect is exact and not a superset. Add methods that
return Optional<ConstantRange> only if the result is exact.
For the sake of simplicity this is implemented by comparing
the subset and superset approximations for now, but it should be
possible to do this more directly, as unionWith() and intersectWith()
already distinguish the cases where the result is imprecise for the
preferred range type functionality.
Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth.
Helps prevent future scheduler model mismatches like those that were only addressed in D44687.
Differential Revision: https://reviews.llvm.org/D113302
Fix a dangling else that gcc-11 warned about. The EXPECT_EQ macro
expands to an if-else, so the whole construction contains a hidden
dangling else.
Differential Revision: https://reviews.llvm.org/D113346
Add a variant of getEquivalentICmp() that produces an optional
offset. This allows us to create an equivalent icmp for all ranges.
Use this in the with.overflow folding code, which was doing this
adjustment separately -- this clarifies that the fold will indeed
always apply.
When we have an actual shuffle, we can impose the additional restriction
that the mask replicates the elements of the first operand, so we know
the replication factor as a ratio of output and op0 vector sizes.
These tests have pretty high O() complexity due to their nature,
which leads to potentially-long runtimes.
While in release build for me they took ~1 and ~2 sec,
as noted in https://reviews.llvm.org/D113214#inline-1080479
they take minutes in debug build.
Fine-tune the amount of permutations they deal with,
without affecting the test coverage. After this,
they take <~10ms each for me (in release build),
hopefully that is good-enough for debug build too.
Avid readers of this saga may recall from previous installments,
that replication mask replicates (lol) each of the `VF` elements
in a vector `ReplicationFactor` times. For example, the mask for
`ReplicationFactor=3` and `VF=4` is: `<0,0,0,1,1,1,2,2,2,3,3,3>`.
More importantly, replication mask is used by LoopVectorizer
when using masked interleaved memory operations.
As discussed in previous installments, while it is used by LV,
and we **seem** to support masked interleaved memory operations on X86,
it's support in cost model leaves a lot to be desired:
until basically yesterday even for AVX512 we had no cost model for it.
As it has been witnessed in the recent
AVX2 `X86TTIImpl::getInterleavedMemoryOpCost()`
costmodel patches, while it is hard-enough to query the cost
of a particular assembly sequence [from llvm-mca],
afterwards the check lines LV costmodel tests must be updated manually.
This is, at the very least, boring.
Okay, now we have decent costmodel coverage for interleaving shuffles,
but now basically the same mind-killing sequence has to be performed
for replication mask. I think we can improve at least the second half
of the problem, by teaching
the `TargetTransformInfoImplCRTPBase::getUserCost()` to recognize
`Instruction::ShuffleVector` that are repetition masks,
adding exhaustive test coverage
using `-cost-model -analyze` + `utils/update_analyze_test_checks.py`
This way we can have good exhaustive coverage for cost model,
and only basic coverage for the LV costmodel.
This patch adds precise undef-aware `isReplicationMask()`,
with exhaustive test coverage.
* `InstructionsTest.ShuffleMaskIsReplicationMask` shows that
it correctly detects all the known masks.
* `InstructionsTest.ShuffleMaskIsReplicationMask_undef`
shows that replacing some mask elements in a known replication mask
still allows us to recognize it as a replication mask.
Note, with enough undef elts, we may detect a different tuple.
* `InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness`
shows that if we detected the replication mask with given params,
then if we actually generate a true replication mask with said params,
it matches element-wise ignoring undef mask elements.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D113214
This normalizes most paths (except ones input from the user as command
line arguments) into the preferred form, if `real_style()` evaluates to
`windows_forward`.
Differential Revision: https://reviews.llvm.org/D111880
This behaves just like the regular Windows style, with both separator
forms accepted, but with get_separator() returning forward slashes.
Add a more descriptive name for the existing style, keeping the old
name around as an alias initially.
Add a new function `make_preferred()` (like the C++17
`std::filesystem::path` function with the same name), which converts
windows paths to the preferred separator form (while this one works on
any platform and takes a `path::Style` argument).
Contrary to `native()` (just like `make_preferred()` in `std::filesystem`),
this doesn't do anything at all on Posix, it doesn't try to reinterpret
backslashes into forward slashes there.
Differential Revision: https://reviews.llvm.org/D111879
This interface should not have existed in the first place, let alone
be a public member.
It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous.
The interfaces to be used are either getFixedValue() or getKnownMinValue().
This expands the lookup table statically and avoids routing through methods that
contain asserts (like StringRef/std::string element accessors and drop_front)
such that performance is more predictable across compilation environments. This
was primarily driven by slow debug mode performance but has a large benefit in
release builds as well.
```
ssd_mobilenet_v2_face_float (42MB .mlir)
Debug/MSVC (old): 5.22s
Debug/MSVC (new): 0.16s
Release/MSVC (old): 0.81s
Release/MSVC (new): 0.02s
huggingface_minilm (536MB .mlir)
Debug/MSVC (old): 65.31s
Debug/MSVC (new): 2.03s
Release/MSVC (old): 9.93s
Release/MSVC (new): 0.27s
```
Now in debug the time is split evenly between lexString, tryGetFromHex, and
element attrs hashing, with the next step to making it faster being to combine
the work (incremental hashing during conversion, etc) - but this is at least in
the right order of magnitude and retains the original API surface.
I have not profiled a build with clang but this is strictly less code and simpler
data structures so I'd expect improvements there as well.
This also fixes a bug where 0xFF bytes in the input would read out of bounds.
Reviewed By: dblaikie, stellaraccident
Differential Revision: https://reviews.llvm.org/D112105
By default `llvm::seq` would happily iterate over enums, which may be unsafe if the enum values are not continuous. This patch disable enum iteration with `llvm::seq` and `llvm::seq_inclusive` and adds two new functions: `enum_seq` and `enum_seq_inclusive`.
To make sure enum iteration is safe, we require users to declare their enum types as iterable by specializing `enum_iteration_traits<SomeEnum>`. Because it's not always possible to add these traits next to enum definition (e.g., for enums defined in external libraries), we provide an escape hatch to allow iteration on per-callsite basis by passing `force_iteration_on_noniterable_enum`.
The main benefit of this approach is that these global declarations via traits can appear just next to enum definitions, making easy to spot when enums are miss-labeled, e.g., after introducing new enum values, whereas `force_iteration_on_noniterable_enum` should stand out and be easy to grep for.
This emerged from a discussion with gchatelet@ about reusing llvm's `Sequence.h` in lieu of https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h.
Reviewed By: dblaikie, gchatelet, aaron.ballman
Differential Revision: https://reviews.llvm.org/D107378
Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.
Reviewed By: reames, mkazantsev, nikic
Differential Revision: https://reviews.llvm.org/D109821
* Properly specify reference type in enumerator_iter
* Fix constness of iterator_adaptor_base::operator*
Differential Revision: https://reviews.llvm.org/D112981
blockaddresses do not participate in the call graph since the only
instructions that use them must all return to someplace within the
current function. And passes cannot retrieve a function address from a
blockaddress.
This was suggested by efriedma in D58260.
Fixes PR50881.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112178
For certain combination of LHS and RHS constant ranges,
the signedness of the relational comparison predicate is irrelevant.
This implements complete and precise model for all predicates,
as confirmed by the brute-force tests. I'm not sure if there are
some more cases that we can handle here.
In a follow-up, CVP will make use of this.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D90924
This type has been moved up into the llvm::orc::shared namespace.
This type was originally put in the detail:: namespace on the assumption that
few (if any) LLVM source files would need to use it. In practice it has been
needed in many places, and will continue to be needed until/unless
OrcTargetProcess is fully merged into the ORC runtime.
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.
There may be some more places where it could be used,
but these are the most obvious ones.
Remove sys::path::is_style_native(), which was added alongside
is_style_windows() and is_style_posix().
Thinking a bit about the windows forward-slash style variant in
https://reviews.llvm.org/D111879, it's not clear to me how the new
sys::path::is_style_native() should behave for them.
- Should it return true for both `windows_slash` and
`windows_backslash`?
- Should it return true for only one of them?
I can think of hypothetical uses and justifications for either one, and
I could also imagine clients guessing either behaviour when just looking
at the function name in code.
Call sites will probably be more clear if they don't use this function,
and instead write out the code:
```
// Is "S" the coarse-grained native style?
if (is_style_windows(S) == is_style_windows(Style::native))
// Is "S" the fine-grained native style?
if (is_style_windows(S) == is_style_windows(Style::native) &&
preferred_separator(S) == preferred_separator(Style::native))
```
Can always add this again if someone needs it and can justify one
behaviour over the other, but for now might as well avoid growing users.
fs::copy_file() on Darwin has a nice optimization to clone the file when
possible. Change the implementation to use clonefile() directly, instead
of the higher-level copyfile(). The latter does the wrong thing for
symlinks, which requires calling `stat` first...
With that out of the way, optimistically call clonefile() all the time,
and then for any error that's recoverable try again with copyfile()
(without the COPYFILE_CLONE flag, as before).
Differential Revision: https://reviews.llvm.org/D112250
Expose three helpers in namespace llvm::sys::path to detect the
path rules followed by sys::path::Style.
- is_style_posix()
- is_style_windows()
- is_style_native()
This are constexpr functions that that will allow a bunch of
path-related code to stop checking `_WIN32`.
Originally I looked at adding system_style(), analogous to
sys::endian::system_endianness(), but future patches (from others) will
add more Windows style variants for slash preferences. These helpers
should be resilient to that change, allowing callers to detect basic
path rules.
Differential Revision: https://reviews.llvm.org/D112288
This patch adds the inclusive clause (which was missed in previous
reorganization - https://reviews.llvm.org/D110903) in omp.wsloop operation.
Added a test for validating it.
Also fixes the order clause, which was not accepting any values. It now accepts
"concurrent" as a value, as specified in the standard.
Reviewed By: kiranchandramohan, peixin, clementval
Differential Revision: https://reviews.llvm.org/D112198
The support for neoverse-512tvb mirrors the same option available in GCC[1].
There is no functional effect for this option yet.
This patch ensures the driver accepts "-mcpu=neoverse-512tvb", and enough
plumbing is in place to allow the new option to be used in the future.
[1]https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
Differential Revision: https://reviews.llvm.org/D112406
There's precedent for that in `CreateOr()`/`CreateAnd()`.
The motivation here is to avoid bloating the run-time check's IR
in `SCEVExpander::generateOverflowCheck()`.
Refs. https://reviews.llvm.org/D109368#3089809
Over in e7084ceab3 the InstrRefBasedLDV class grew a MachineRegisterInfo
pointer to lookup register sizes -- however, that field wasn't initialized
in the corresponding unit tests. This patch initializes it!
Fixes a buildbot failure reported on D112006
Otherwise, ODRUniquing would map some member method/variable MDNodes
to have enum type DIScope, resulting in invalid debug info and bad
DWARF.
- Add a Verifier check that when a 'scope:' operand is an ODR type that is not an enum.
- Makes ODRUniquing apply to only ODR types with the same tag so that the debuginfo/DWARF is well-formed.
Reviewed By: probinson, aprantl
Differential Revision: https://reviews.llvm.org/D111770
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.
While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.
Differential Revision: https://reviews.llvm.org/D112333
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.
The test added runs placeMLocPHIs directly, and tests that:
* A def of the lower 8 bits of a stack slot causes all aliasing regs to
have PHIs placed,
* It doesn't cause the equivalent location to x86's $ah, which isn't
aliased, to have a PHI placed.
Differential Revision: https://reviews.llvm.org/D112324
Change buffer_unique_ostream's constructor to call
raw_ostream::SetUnbuffered() on its owned stream. Otherwise,
buffer_unique_ostream's destructor could cause the owned stream to
temporarily allocate a buffer only to be immediately flushed.
Also add some tests for buffer_ostream and buffer_unique_ostream. Use
the same naming scheme as other raw_ostream-related tests (e.g.,
`raw_ostreamTest` for the fixture, `raw_ostream_test.cpp` for the
filename).
(I considered changing buffer_ostream in the same way (calling
SetUnbuffered on the referenced stream), but that seemed like overreach
since the client may have more things to write.)
(I considered merging buffer_ostream and buffer_unique_ostream into a
single class (with a `raw_ostream&` and a `std::unique_ptr` that is only
sometimes used), but that makes the class bigger and the small amount of
code deduplication seems uncompelling.)
Differential Revision: https://reviews.llvm.org/D110369
Expected<T>::moveInto() takes as an out parameter any `OtherT&` that's
assignable from `T&&`. It moves any stored value before returning
takeError().
Since moveInto() consumes both the Error and the value, it's only
anticipated that we'd use call it on temporaries/rvalues, with naming
the Expected first likely to be an anti-pattern of sorts (either you
want to deal with both at the same time, or you don't). As such,
starting it out as `&&`-qualified... but it'd probably be fine to drop
that if there's a good use case for lvalues that appears.
There are two common patterns that moveInto() cleans up:
```
// If the variable is new:
Expected<std::unique_ptr<int>> ExpectedP = makePointer();
if (!ExpectedP)
return ExpectedP.takeError();
std::unique_ptr<int> P = std::move(*ExpectedP);
// If the target variable already exists:
if (Expected<T> ExpectedP = makePointer())
P = std::move(*ExpectedP);
else
return ExpectedP.takeError();
```
moveInto() takes less typing and avoids needing to name (or leak into
the scope) an extra variable.
```
// If the variable is new:
std::unique_ptr<int> P;
if (Error E = makePointer().moveInto(P))
return E;
// If the target variable already exists:
if (Error E = makePointer().moveInto(P))
return E;
```
It also seems useful for unit tests, to log errors (but continue) when
there's an unexpected failure. E.g.:
```
// Crash on error, or undefined in non-asserts builds.
std::unique_ptr<MemoryBuffer> MB = cantFail(makeMemoryBuffer());
// Avoid crashing on error without moveInto() :(.
Expected<std::unique_ptr<MemoryBuffer>>
ExpectedMB = makeMemoryBuffer();
ASSERT_THAT_ERROR(ExpectedMB.takeError(), Succeeded());
std::unique_ptr<MemoryBuffer> MB = std::move(ExpectedMB);
// Avoid crashing on error with moveInto() :).
std::unique_ptr<MemoryBuffer> MB;
ASSERT_THAT_ERROR(makeMemoryBuffer().moveInto(MB), Succeeded());
```
Differential Revision: https://reviews.llvm.org/D112278
Sometimes we generate code that writes to a subregister, then spills /
restores a super-register to the stack, for example:
$eax = MOV32ri 0
MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
$rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg
This patch takes a different approach: it adds another index to
MLocTracker that identifies a size/offset within a stack slot. A location
on the stack is then a pari of {FrameIndex, SlotNum}. Spilling and
restoring now involves pairing up the src/dest register numbers, and the
dest/src stack position to be transferred to/from. Location coverage
improves as a result, compile-time performance decreases, alas.
One limitation is that if a PHI occurs inside a stack slot:
DBG_PHI %stack.0, 1
We don't know how large the resulting value is, and so might have
difficulty picking which value to use. DBG_PHI might need to be augmented
in the future with such a size.
Unit tests added ensure that spills and restores correctly transfer to
positions in the Location => Value map, and that different register classes
written to the stack will correctly clobber all other positions in the
stack slot.
Differential Revision: https://reviews.llvm.org/D112133
This patch adds some unit tests for the machine-location transfer-function
building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR
into the transfer-function building code, does it create the correct
transfer function.
There are a number of minor defects that get corrected in the process:
* The unit test was selecting the x86 (i.e. 32 bit) backend rather than
x86_64's 64 bit backend,
* COPY instructions weren't actually having their subregister values
correctly represented in the transfer function. Subregisters were being
defined by the COPY, rather than taking the value in the source register.
* SP aliases were at risk of being clobbered, if an SP subregister was
clobbered.
Differential Revision: https://reviews.llvm.org/D112006
On AIX, the plugins are linked with `-WL,-G`, which produces shared objects enabled for use with the run-time linker. This patch sets the run-time
linker at the main executable link step to allow symbols from the plugins shared objects to be properly bound.
Reviewed By: daltenty
Differential Revision: https://reviews.llvm.org/D112275