In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.
I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics.
I'll work on scatter and gatherpf/scatterpf next.
Differential Revision: https://reviews.llvm.org/D56527
llvm-svn: 351234
Summary:
Clang calls these functions to produce IR for assume-aligned attributes.
I would like to teach UBSAN to verify these assumptions.
For that, i need to access the final pointer on which the check is performed,
and the actual `icmp` that does the check.
The alternative to this would be to fully re-implement this in clang.
This is a second commit, the original one was r351104,
which was mass-reverted in r351159 because 2 compiler-rt tests were failing.
Reviewers: spatel, dneilson, craig.topper, dblaikie, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D54588
llvm-svn: 351176
This adds support for multilib paths for wasm32 targets, following
[Debian's Multiarch conventions], and also adds an experimental OS name in
order to test it.
[Debian's Multiarch conventions]: https://wiki.debian.org/Multiarch/
Differential Revision: https://reviews.llvm.org/D56553
llvm-svn: 351163
This will allow other utilities (including a future RuntimeDyld replacement) to
use these types without pulling in the major Core types (JITDylib, etc.).
llvm-svn: 351138
MachOObjectFile::getSymbolByIndex.
ObjectFile derivatives should prefer symbol_iterator/SymbolRef over
basic_symbol_iterator/BasicSymbolRef where possible, as the former
retain their link to the ObjectFile (rather than a SymbolicFile) and provide
more functionality.
No test for this: Existing code is working, and we don't have (m)any libObject
unit tests. I'll think about how we can test more systematically going forward.
llvm-svn: 351128
consistently accept a pointee-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56559
llvm-svn: 351124
accept a return-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56558
llvm-svn: 351123
accept a callee-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56557
llvm-svn: 351122
accept a callee-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56556
llvm-svn: 351121
Summary:
This allows a bit more control for scenarios where client might
modifiy a DIContext
Reviewers: twoh, Kader, modocache
Reviewed By: Kader
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56505
llvm-svn: 351107
Summary:
Clang calls these functions to produce IR for assume-aligned attributes.
I would like to teach UBSAN to verify these assumptions.
For that, i need to access the final pointer on which the check is performed,
and the actual `icmp` that does the check.
The alternative to this would be to fully re-implement this in clang.
Reviewers: spatel, dneilson, craig.topper, dblaikie, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D54588
llvm-svn: 351104
This removes the old grow_memory and mem.grow-style intrinsics, leaving just
the memory.grow-style intrinsics.
Differential Revision: https://reviews.llvm.org/D56645
llvm-svn: 351084
Split MachinePipeliner code into header and cpp files to allow
inheritance from SwingSchedulerDAG.
This reapplies https://reviews.llvm.org/D56084 after moving the
implementation of the dump functions into the .cpp files. This fixes a
linker error when building with Clang modules enables and local
submodule visibility disabled.
Original patch by Lama Saba <lama.saba@intel.com>!
llvm-svn: 351077
Normally, changing the function signatures of C APIs is disallowed,
but as these two are brand new last week, and haven't been released
yet, it is okay in this instance.
As per discussion in D56556, we will not add NameLen arguments to IR
building APIs, for the following reasons:
1. We do not want to deprecate all of the IR building APIs, just to add a
NameLen argument to each one.
2. Consistency is important, so adding it just to new ones is unfortunate.
3. The IR names are completely optional, useful for readability of IR
only. There is no value in ever supporting nul bytes.
Differential Revision: https://reviews.llvm.org/D56669
llvm-svn: 351076
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).
There's an additional fix now to avoid a dmask=0
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
Work around for ppcle compiler bug
Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
Summary:
Previously only one RealFileSystem instance was available, and its working
directory is shared with the process. This doesn't work well for multithreaded
programs that want to work with relative paths - the vfs::FileSystem is assumed
to provide the working directory, but a thread cannot control this exclusively.
The new vfs::createPhysicalFileSystem() factory copies the process's working
directory initially, and then allows it to be independently modified.
This implementation records the working directory path, and glues it to relative
paths to provide the correct absolute path to the sys::fs:: functions.
This will give different results in unusual situations (e.g. the CWD is moved).
The main alternative is the use of openat(), fstatat(), etc to ask the OS to
resolve paths relative to a directory handle which can be kept open. This is
more robust. There are two reasons not to do this initially:
1. these functions are not available on all supported Unixes, and are somewhere
between difficult and unavailable on Windows. So we need a path-based
fallback anyway.
2. this would mean also adding support at the llvm::sys::fs level, which is a
larger project. My clearest idea is an OS-specific `BaseDirectory` object
that can be optionally passed to functions there. Eventually this could be
backed by either paths or a fd where openat() is supported.
This is a large project, and demonstrating here that a path-based fallback
works is a useful prerequisite.
There is some subtlety to the path-manipulation mechanism:
- when setting the working directory, both Specified=makeAbsolute(path) and
Resolved=realpath(path) are recorded. These may differ in the presence of
symlinks.
- getCurrentWorkingDirectory() and makeAbsolute() use Specified - this is
similar to the behavior of $PWD and sys::path::current_path
- IO operations like openFileForRead use Resolved. This is similar to the
behavior of an openat() based implementation, that doesn't see changes
in symlinks.
There may still be combinations of operations and FS states that yield unhelpful
behavior. This is hard to avoid with symlinks and FS abstractions :(
The caching behavior of the current working directory is removed in this patch.
getRealFileSystem() is now specified to link to the process CWD, so the caching
is incorrect.
The user who needed this so far is clangd, which will immediately switch to
createPhysicalFileSystem().
Reviewers: ilya-biryukov, bkramer, labath
Subscribers: ioeric, kadircet, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D56545
llvm-svn: 351050
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
Utility function `DeleteDeadBlock` expects that all predecessors of a block being
deleted are already deleted, with the exception of single-block loop. It makes it
hard to use for deletion of a set of blocks that may contain cyclic dependencies.
The is no correct order of invocations of this function that does not produce
dangling pointers on already deleted blocks.
This patch introduces a generalized version of this function `DeleteDeadBlocks`
that allows us to remove multiple blocks at once, even if there are cycles among
them. The only requirement is that no block being deleted should have a predecessor
that is not being deleted.
The logic of `DeleteDeadBlocks` is following:
for each block
create relevant DT updates;
remove all instructions (replace with undef if needed);
replace terminator with unreacheable;
apply DT updates;
for each block
delete block;
Therefore, `DeleteDeadBlock` becomes a particular case of
the general algorithm called for a single block.
Differential Revision: https://reviews.llvm.org/D56120
Reviewed By: skatkov
llvm-svn: 351045
Summary:
Add support for options that always prefix their value, giving an error
if the value is in the next argument or if the option is given a value
assignment (ie. opt=val). This is the desired behavior for the -D option
of FileCheck for instance.
Copyright:
- Linaro (changes in version 2 of revision D55940)
- GraphCore (changes in later versions and introduced when creating
D56549)
Reviewers: jdenny
Subscribers: llvm-commits, probinson, kristina, hiraditya,
JonChesterfield
Differential Revision: https://reviews.llvm.org/D56549
llvm-svn: 351038
This shortcut mechanism for creating types was added 10 years ago, but
has seen almost no uptake since then, neither internally nor in
external projects.
The very small number of characters saved by using it does not seem
worth the mental overhead of an additional type-creation API, so,
delete it.
Differential Revision: https://reviews.llvm.org/D56573
llvm-svn: 351020
MIPS ABI states that every function must be called through jalr $t9. In
other words, a function expect that t9 register points to the beginning
of its code. A function uses this register to calculate offset to the
Global Offset Table and save it to the `gp` register.
```
lui $gp, %hi(_gp_disp)
addiu $gp, %lo(_gp_disp)
addu $gp, $gp, $t9
```
If `t9` and as a result `$gp` point to the wrong place the following code
loads incorrect value from GOT and passes control to invalid code.
```
lw $v0,%call16(foo)($gp)
jalr $t9
```
OrcMips32 and OrcMips64 writeResolverCode methods pass control to the
resolved address, but do not setup `$t9` before the call. The `t9` holds
value of the beginning of `resolver` code so any attempts to call
routines via GOT failed.
This change fixes the problem. The `OrcLazy/hidden-visibility.ll` test
starts to pass correctly. Before the change it fails on MIPS because the
`exitOnLazyCallThroughFailure` called from the resolver code could not
call libc routine `exit` via GOT.
Differential Revision: http://reviews.llvm.org/D56058
llvm-svn: 351000
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a
switch (via a phi node). The patch unfolds select under this condition.
A testcase is provided.
llvm-svn: 350931
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below, to what we are trying to fix, by always ensuring a updated BlockRPONumber.
== Backstory ==
I was investigating an ICE (segfault accessing a DenseMap item). This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).
After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`. Our test case that triggered this warning and periodic crash is rather involved. But the problematic line looks to be:
GVN.cpp: Line 2197
```
if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```
To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.
My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above). In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.
1) There is no sequence point for the `>=` operation.
2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).
3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) . A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid. The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.
The assembly generated from the Window's host compiler does show the result of the first access to the map via `operator[]` produces a pointer to an unsigned int. And that pointer is being stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid.
Reviewers: t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D55974
llvm-svn: 350880
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.
Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.
Reviewers: sanjoy, davide, gberry, george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D40375
llvm-svn: 350879
Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It
should have been a 64 bit quantity instead. That means, ResourceUnitMask was
always implicitly truncated to a 32 bit quantity.
This issue has been found by inspection. Surprisingly, that bug was latent, and
it never negatively affected any existing upstream targets.
This patch fixes the wrong definition of ResourceUnitMask, and adds a bunch of
extra debug prints to help debugging potential issues related to invalid
processor resource masks.
llvm-svn: 350820
Summary:
If we don't reset the optimized value O for access A, even though A is no longer optimized to O, A will still show up in that O's users list.
This fails verification when hoisting a Def outside a loop, even though the updates are correct.
The reason is that the phi in the loop header still find as user the hoisted def, because the Def has a pointer to the Phi in its optimized operand.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D56467
llvm-svn: 350783
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).
Reviewers: davidxl
Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D56464
llvm-svn: 350755
Summary: All a non-default title for the debugging this debugging aide
Reviewers: twoh, Kader, modocache
Reviewed By: twoh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56499
llvm-svn: 350749
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now.
Test is fully reworked, added comments. Other fixes:
1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
Differential revision: https://reviews.llvm.org/D55444
llvm-svn: 350721
Current strategy of dropping `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal strictures: `OrderedInstructions`
that is needed to be invalidated whenever the contents changes, and the map
with first special instructions in block. This second map does not need an
update if we add/remove a non-special instuction because it cannot
affect the contents of this map.
This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.
Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma
llvm-svn: 350694
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions. This patch
makes llvm-undname properly demangle them.
Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769
llvm-svn: 350656
A straightforward port of tsan to the new PM, following the same path
as D55647.
Differential Revision: https://reviews.llvm.org/D56433
llvm-svn: 350647
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.
Differential Revision: https://reviews.llvm.org/D56386
llvm-svn: 350624
This reverts commit rL350497
reported remaining issues seem to be unrelated to modules or this change.
more info: https://reviews.llvm.org/D56084
llvm-svn: 350621
Summary: Add a utility function for creating a basic block without a parent function. A useful operation for compilers that need to synthesize and conditionally insert code without having to bother with appending and immediately unlinking a block.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56279
llvm-svn: 350608
Summary: Fix an old outstanding problem with the int cast builder binding always assuming the cast is signed by introducing a new LLVMBuildIntCast2 operation and deprecating the old prototype.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56280
llvm-svn: 350607
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Differential Revision: https://reviews.llvm.org/D56087
llvm-svn: 350560
`CallSite`.
With this change, the remaining `CallSite` usages are just for
implementing the wrapper type itself.
This does update the C API but leaves the names of that API alone and
only updates their implementation.
Differential Revision: https://reviews.llvm.org/D56184
llvm-svn: 350509
update client code.
Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.
Differential Revision: https://reviews.llvm.org/D56183
llvm-svn: 350508
minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
Thanks for the review from Saleem!
Differential Revision: https://reviews.llvm.org/D55641
llvm-svn: 350503
a way that it still supports `CallSite` but users can be ported to rely
on `CallBase` instead.
This will unblock the ports across the analysis and transforms libraries
(and out-of-tree users) and once done we can clean this up by removing
the `CallSite` layer.
Differential Revision: https://reviews.llvm.org/D56182
llvm-svn: 350502
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.
I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.
Differential Revision: https://reviews.llvm.org/D56247
llvm-svn: 350435
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):
```
entry:
+------------+
| x = alloca |
| y = alloca |
+------------+
/ \
lhs: rhs:
+-------------------+ +-------------------+
| lifetime_start(x) | | lifetime_start(x) |
| use(x) | | lifetime_start(y) |
| lifetime_end(x) | | use(x, y) |
| lifetime_start(y) | | lifetime_end(y) |
| use(y) | | lifetime_end(x) |
| lifetime_end(y) | +-------------------+
+-------------------+
```
Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.
This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.
Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.
Fixes llvm.org/PR39671 (rdar://45939472).
Differential Revision: https://reviews.llvm.org/D55967
llvm-svn: 350420
Added field 'MustIssueImmediately' to the instruction descriptor of instructions
that only consume in-order issue/dispatch processor resources.
This speeds up queries from the hardware Scheduler, and gives an average ~5%
speedup on a release build.
No functional change intended.
llvm-svn: 350397
Method ResourceManager::use() is responsible for updating the internal state of
used processor resources, as well as notifying resource groups that contain used
resources.
Before this patch, method 'use()' didn't know how to quickly obtain the set of
groups that contain a particular resource unit. It had to discover groups by
perform a potentially slow search (done by iterating over the set of processor
resource descriptors).
With this patch, the relationship between resource units and groups is stored in
the ResourceManager. That means, method 'use()' no longer has to search for
groups. This gives an average speedup of ~4-5% on a release build.
This patch also adds extra code comments in ResourceManager.h to better describe
the resource mask layout, and how resouce indices are computed from resource
masks.
llvm-svn: 350387
Prediction control instructions are only
mandatory from v8.5a onwards but is optional
from Armv8.0-A. This patch adds a command
line option to enable it by it's own.
Differential Revision: https://reviews.llvm.org/D56007
llvm-svn: 350385
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.
Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
library functions, for which it inserts declarations as necessary.
- Update tests.
Caveats:
- There is one test that I dropped, because it specifically tested the
definition of the ctor.
Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka
Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji
Differential Revision: https://reviews.llvm.org/D55647
llvm-svn: 350305
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also renames FeatureSpecRestrict to FeatureSB.
Reviewed By: olista01, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55990
llvm-svn: 350299
There can be multiple local symbols with the same name (for e.g.
comdat sections), and thus the symbol name itself isn't enough
to disambiguate symbols.
Differential Revision: https://reviews.llvm.org/D56140
llvm-svn: 350288
Summary: Add read[only|write] PIC relocation models to the C API and teach the TargetMachine API about it.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56187
llvm-svn: 350279
Sometimes it's useful to be able to output demangled names without
tag specifiers like "struct", "class", etc. This patch adds a
flag enabling this.
llvm-svn: 350241
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).
Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.
Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
Differential Revision: https://reviews.llvm.org/D38662
llvm-svn: 350220
GlobalVariable
Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly using default arguments. Additionally overload
getOrInsertGlobal for the previous default behavior.
Reviewers: chandlerc
Subscribers: hiraditya, llvm-commits, bollu
Differential Revision: https://reviews.llvm.org/D56130
llvm-svn: 350219
Summary: Add accessors so the performance improvement from this setting is accessible to third parties.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56179
llvm-svn: 350196
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also moves to FeatureSB the old FeatureSpecRestrict.
Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55921
llvm-svn: 350126
Summary:
This patch extends the MemberAttributes interface with the isStatic method.
It is needed for D56126.
Reviewers: zturner, rnk
Reviewed By: zturner
Differential Revision: https://reviews.llvm.org/D56127
llvm-svn: 350125
Summary:
This will make migrating code easier and generally seems like a good collection
of API improvements.
Some of these APIs seem like more consistent / better naming of existing
ones. I've retained the old names for migration simplicit and am just
adding the new ones in this commit. I'll try to garbage collect these
once CallSite is gone.
Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55638
llvm-svn: 350109
Summary:
It does so using a simple nesting stack, and gives clear errors upon
violation. This is unique to wasm, since most CPUs do not have
any nested constructs.
Had to add an end of file check to the general assembler for this.
Note: if/else/end instructions are not currently supported in our
tablegen defs, so these tests will be enabled in a follow-up.
They already pass the nesting check.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55797
llvm-svn: 350078
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
This is difficult/not possible to test in LLVM, but is visible as a
crash in LLD when parsing DWARF to generate gdb-index.
This function is called by llvm-dwarfdump when parsing high_pc for
non-verbose output (to print the actual high_pc rather than the low_pc
relative value), but in that case llvm-dwarfdump doesn't print section
names (if it did, it would hit this problem).
We could add some other features to llvm-dwarfdump to expose this, but
nothing really springs to my mind. I will add a test to lld, though.
llvm-svn: 350010
Currently the section name (& possibly number) is only printed on
addresses in ranges - but no reason it couldn't also be displayed on
other addresses (like low/high PC).
Refactor in that direction by pulling out the section lookup and name
ambiguity dumping logic into a reusable helper.
llvm-svn: 349995
Propagate the llvm::Error a little further up. This is NFC for
llvm-dwarfdump in this change, but allows ld.lld to emit more precise
error messages about which object and archive the erroneous DWARF is in.
llvm-svn: 349978
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)
The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.
Reviewers: davidxl, danielcdh, wmi, dblaikie
Reviewed By: dblaikie
Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D55681
llvm-svn: 349973
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
Summary:
This function checks whether the mappings in the interval map overlap
with the given range [a;b]. The motivation is to enable checking for
overlap before inserting a new interval into the map.
Reviewers: vsk, dblaikie
Subscribers: dexonsmith, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D55760
llvm-svn: 349898
-print-after IR printing generally can not print the IR unit (Loop or SCC)
which has just been invalidated by the pass. However, when working in -print-module-scope
mode even if Loop was invalidated there is still a valid module that we can print.
Since we can not access invalidated IR unit from AfterPassInvalidated instrumentation
point we can remember the module to be printed *before* pass. This change introduces
BeforePass instrumentation that stores all the information required for module printing
into the stack and then after pass (in AfterPassInvalidated) just print whatever
has been placed on stack.
Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D55278
llvm-svn: 349896
- When signing return addresses with -msign-return-address=<scope>{+<key>},
either the A key instructions or the B key instructions can be used. To
correctly authenticate the return address, the unwinder/debugger must know
which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
unwind the stack. To accomplish this, the unwinder/debugger must be able to
first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
inclusion of a 'B' character. Functions that are signed using the B key
variant of the instructions should have and FDE whose associated CIE has a 'B'
in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
high level language into assembly and then, as a second step, into an object
file. To achieve this, I have introduced a new assembly directive
'.cfi_b_key_frame ', that tells the assembler the current frame uses return
address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
augmentation string.
Differential Revision: https://reviews.llvm.org/D51798
llvm-svn: 349895
This saves materializing the immediate. The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.
I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.
Differential Revision: https://reviews.llvm.org/D55630
llvm-svn: 349857
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.
This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.
Reviewers: Ayal, anemet, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D55798
llvm-svn: 349794
Summary:
Add PLATFORM constants for iOS, tvOS, and watchOS simulators, as well
as human readable names for these constants, to the Mach-O file format
header files.
rdar://46854119
Reviewers: ab, davide
Reviewed By: ab, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55905
llvm-svn: 349779
These tools were assuming ABI version is 0,
that is not always true.
Patch teaches them to work with that field.
Differential revision: https://reviews.llvm.org/D55884
llvm-svn: 349737
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
Reviewers: spatel, gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55263
llvm-svn: 349731
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.
This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).
This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.
Differential Revision: https://reviews.llvm.org/D52116
llvm-svn: 349725
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 349674
This is to address post-commit feedback from Paul Robinson on r348954.
The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)).
I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (eg: lower bound 1, count 2 would be "[1, 3)" and an unknown parts use a '?' (eg: "[1, ?)" or "[?, 7)" or "[?, ? + 3)").
Reviewers: aprantl, probinson, JDevlieghere
Differential Revision: https://reviews.llvm.org/D55721
llvm-svn: 349670
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
llvm-svn: 349664
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349628
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
This is an initial implementation of no-op passthrough copying of COFF
with objcopy.
Differential Revision: https://reviews.llvm.org/D54939
llvm-svn: 349605
In AsmPrinter, make struct HandlerInfo and SmallVector
Handlers protected, so target extended AsmPrinter will
be able to add their own handlers.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55756
llvm-svn: 349602
We need to keep the underlying profile reader alive as long as the
profile data, because the profile data may contain StringRefs referring
to strings in the reader's name table.
llvm-svn: 349600
This patch moved the following files in lib/CodeGen/AsmPrinter/
AsmPrinterHandler.h
DbgEntityHistoryCalculator.h
DebugHandlerBase.h
to include/llvm/CodeGen directory.
Such a change will enable Target to extend DebugHandlerBase
and emit Target specific debug info sections.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55755
llvm-svn: 349564
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.
Reviewers: anemet, anna, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D54892
llvm-svn: 349556
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
llvm-svn: 349552
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics. This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.
Differential Revision: https://reviews.llvm.org/D55348
Reviewers: ahatanak
llvm-svn: 349534
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced
Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.
Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.
Differential Revision: https://reviews.llvm.org/D55785
llvm-svn: 349513
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.
This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.
The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.
Differential Revision: https://reviews.llvm.org/D55716
llvm-svn: 349509
This is https://bugs.llvm.org/show_bug.cgi?id=39992,
If we have the following code (test.cpp):
thread_local int tdata = 24;
and build an .o file with debug information:
clang --target=x86_64-pc-linux -c bar.cpp -g
Then object produced may have R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations.
(clang emits R_X86_64_DTPOFF64 and gcc emits R_X86_64_DTPOFF32 for the code above for me)
Currently, llvm-dwarfdump fails to compute this TLS relocation when dumping
object and reports an
error:
failed to compute relocation: R_X86_64_DTPOFF64, Invalid data was encountered while parsing the file
This relocation represents the offset in the TLS block and resolved by the linker,
but this info is unavailable at the
point when the object file is dumped by this tool.
The patch adds the simple evaluation for such relocations to avoid emitting errors.
Resulting behavior seems to be equal to GNU dwarfdump.
Differential revision: https://reviews.llvm.org/D55762
llvm-svn: 349476
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
string printing later on
Differential Revision: https://reviews.llvm.org/D55774
llvm-svn: 349472
Apply final suggestions from probinson for this patch series plus a
few more tweaks:
* Improve various docs, for MatchType in particular.
* Rename some members of MatchType. The main problem was that the
term "final match" became a misnomer when CHECK-COUNT-<N> was
created.
* Split InputStartLine, etc. declarations into multiple lines.
Differential Revision: https://reviews.llvm.org/D55738
Reviewed By: probinson
llvm-svn: 349425
This patch implements annotations for diagnostics reporting CHECK-NOT
failed matches. These diagnostics are enabled by -vv. As for
diagnostics reporting failed matches for other directives, these
annotations mark the search ranges using `X~~`. The difference here
is that failed matches for CHECK-NOT are successes not errors, so they
are green not red when colors are enabled.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the only match result for a pattern of type T from line L of
the check file
- T:L'N labels the Nth match result for a pattern of type T from line L of
the check file
- ^~~ marks good match (reported if -v)
- !~~ marks bad match, such as:
- CHECK-NEXT on same line as previous match (error)
- CHECK-NOT found (error)
- CHECK-DAG overlapping match (discarded, reported if -vv)
- X~~ marks search range when no match is found, such as:
- CHECK-NEXT not found (error)
- CHECK-NOT not found (success, reported if -vv)
- CHECK-DAG not found after discarded matches (error)
- ? marks fuzzy match when no match is found
- colors success, error, fuzzy match, discarded match, unmatched input
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -vv -dump-input=always check5 < input5 |& sed -n '/^<<<</,$p'
<<<<<<
1: abcdef
check:1 ^~~
not:2 X~~
2: ghijkl
not:2 ~~~
check:3 ^~~
3: mnopqr
not:4 X~~~~~
4: stuvwx
not:4 ~~~~~~
5:
eof:4 ^
>>>>>>
$ cat check5
CHECK: abc
CHECK-NOT: foobar
CHECK: jkl
CHECK-NOT: foobar
$ cat input5
abcdef
ghijkl
mnopqr
stuvwx
```
Reviewed By: george.karpenkov, probinson
Differential Revision: https://reviews.llvm.org/D53899
llvm-svn: 349424
This patch implements input annotations for diagnostics reporting
CHECK-DAG discarded matches. These diagnostics are enabled by -vv.
These annotations mark discarded match ranges using `!~~` because they
are bad matches even though they are not errors.
CHECK-DAG discarded matches create another case where there can be
multiple match results for the same directive.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the only match result for a pattern of type T from line L of
the check file
- T:L'N labels the Nth match result for a pattern of type T from line L of
the check file
- ^~~ marks good match (reported if -v)
- !~~ marks bad match, such as:
- CHECK-NEXT on same line as previous match (error)
- CHECK-NOT found (error)
- CHECK-DAG overlapping match (discarded, reported if -vv)
- X~~ marks search range when no match is found, such as:
- CHECK-NEXT not found (error)
- CHECK-DAG not found after discarded matches (error)
- ? marks fuzzy match when no match is found
- colors success, error, fuzzy match, discarded match, unmatched input
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -vv -dump-input=always check4 < input4 |& sed -n '/^<<<</,$p'
<<<<<<
1: abcdef
dag:1 ^~~~
dag:2'0 !~~~ discard: overlaps earlier match
2: cdefgh
dag:2'1 ^~~~
check:3 X~ error: no match found
>>>>>>
$ cat check4
CHECK-DAG: abcd
CHECK-DAG: cdef
CHECK: efgh
$ cat input4
abcdef
cdefgh
```
This shows that the line 3 CHECK fails to match even though its
pattern appears in the input because its search range starts after the
line 2 CHECK-DAG's match range. The trouble might be that the line 2
CHECK-DAG's match range is later than expected because its first match
range overlaps with the line 1 CHECK-DAG match range and thus is
discarded.
Because `!~~` for CHECK-DAG does not indicate an error, it is not
colored red. Instead, when colors are enabled, it is colored cyan,
which suggests a match that went cold.
Reviewed By: george.karpenkov, probinson
Differential Revision: https://reviews.llvm.org/D53898
llvm-svn: 349423
This patch implements input annotations for diagnostics enabled by -v,
which report good matches for directives. These annotations mark
match ranges using `^~~`.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the only match result for a pattern of type T from line L of
the check file
- T:L'N labels the Nth match result for a pattern of type T from line L of
the check file
- ^~~ marks good match (reported if -v)
- !~~ marks bad match, such as:
- CHECK-NEXT on same line as previous match (error)
- CHECK-NOT found (error)
- X~~ marks search range when no match is found, such as:
- CHECK-NEXT not found (error)
- ? marks fuzzy match when no match is found
- colors success, error, fuzzy match, unmatched input
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -v -dump-input=always check3 < input3 |& sed -n '/^<<<</,$p'
<<<<<<
1: abc foobar def
check:1 ^~~
not:2 !~~~~~ error: no match expected
check:3 ^~~
>>>>>>
$ cat check3
CHECK: abc
CHECK-NOT: foobar
CHECK: def
$ cat input3
abc foobar def
```
-vv enables these annotations for FileCheck's implicit EOF patterns as
well. For an example where EOF patterns become relevant, see patch 7
in this series.
If colors are enabled, `^~~` is green to suggest success.
-v plus color enables highlighting of input text that has no final
match for any expected pattern. The highlight uses a cyan background
to suggest a cold section. This highlighting can make it easier to
spot text that was intended to be matched but that failed to be
matched in a long series of good matches.
CHECK-COUNT-<num> good matches are another case where there can be
multiple match results for the same directive.
Reviewed By: george.karpenkov, probinson
Differential Revision: https://reviews.llvm.org/D53897
llvm-svn: 349422
This patch implements input annotations for diagnostics that report
unexpected matches for CHECK-NOT. Like wrong-line matches for
CHECK-NEXT, CHECK-SAME, and CHECK-EMPTY, these annotations mark match
ranges using red `!~~` to indicate bad matches that are errors.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the only match result for a pattern of type T from line L of
the check file
- T:L'N labels the Nth match result for a pattern of type T from line L of
the check file
- !~~ marks bad match, such as:
- CHECK-NEXT on same line as previous match (error)
- CHECK-NOT found (error)
- X~~ marks search range when no match is found, such as:
- CHECK-NEXT not found (error)
- ? marks fuzzy match when no match is found
- colors error, fuzzy match
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -v -dump-input=always check3 < input3 |& sed -n '/^<<<</,$p'
<<<<<<
1: abc foobar def
not:2 !~~~~~ error: no match expected
>>>>>>
$ cat check3
CHECK: abc
CHECK-NOT: foobar
CHECK: def
$ cat input3
abc foobar def
```
Reviewed By: george.karpenkov, probinson
Differential Revision: https://reviews.llvm.org/D53896
llvm-svn: 349421
This patch implements input annotations for diagnostics that report
wrong-line matches for the directives CHECK-NEXT, CHECK-SAME, and
CHECK-EMPTY. Instead of the usual `^~~`, which is used by later
patches for good matches, these annotations use `!~~` to mark the bad
match ranges so that this category of errors is visually distinct.
Because such matches are errors, these annotates are red when colors
are enabled.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the only match result for a pattern of type T from line L of
the check file
- T:L'N labels the Nth match result for a pattern of type T from line L of
the check file
- !~~ marks bad match, such as:
- CHECK-NEXT on same line as previous match (error)
- X~~ marks search range when no match is found, such as:
- CHECK-NEXT not found (error)
- ? marks fuzzy match when no match is found
- colors error, fuzzy match
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -v -dump-input=always check2 < input2 |& sed -n '/^<<<</,$p'
<<<<<<
1: foo bar
next:2 !~~ error: match on wrong line
>>>>>>
$ cat check2
CHECK: foo
CHECK-NEXT: bar
$ cat input2
foo bar
```
Reviewed By: george.karpenkov, probinson
Differential Revision: https://reviews.llvm.org/D53894
llvm-svn: 349420
This patch implements input annotations for diagnostics that suggest
fuzzy matches for directives for which no matches were found. Instead
of using the usual `^~~`, which is used by later patches for good
matches, these annotations use `?` so that fuzzy matches are visually
distinct. No tildes are included as these diagnostics (independently
of this patch) currently identify only the start of the match.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the only match result for a pattern of type T from line L of
the check file
- T:L'N labels the Nth match result for a pattern of type T from line L of
the check file
- X~~ marks search range when no match is found
- ? marks fuzzy match when no match is found
- colors error, fuzzy match
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -v -dump-input=always check1 < input1 |& sed -n '/^<<<</,$p'
<<<<<<
1: ; abc def
2: ; ghI jkl
next:3'0 X~~~~~~~~ error: no match found
next:3'1 ? possible intended match
>>>>>>
$ cat check1
CHECK: abc
CHECK-SAME: def
CHECK-NEXT: ghi
CHECK-SAME: jkl
$ cat input1
; abc def
; ghI jkl
```
This patch introduces the concept of multiple "match results" per
directive. In the above example, the first match result for the
CHECK-NEXT directive is the failed match, for which the annotation
shows the search range. The second match result is the fuzzy match.
Later patches will introduce other cases of multiple match results per
directive.
When colors are enabled, `?` is colored magenta. That is, it doesn't
indicate the actual error, which a red `X~~` marker indicates, but its
color suggests it's closely related.
Reviewed By: george.karpenkov, probinson
Differential Revision: https://reviews.llvm.org/D53893
llvm-svn: 349419
Extend FileCheck to dump its input annotated with FileCheck's
diagnostics: errors, good matches if -v, and additional information if
-vv. The goal is to make it easier to visualize FileCheck's matching
behavior when debugging.
Each patch in this series implements input annotations for a
particular category of FileCheck diagnostics. While the first few
patches alone are somewhat useful, the annotations become much more
useful as later patches implement annotations for -v and -vv
diagnostics, which show the matching behavior leading up to the error.
This first patch implements boilerplate plus input annotations for
error diagnostics reporting that no matches were found for a
directive. These annotations mark the search ranges of the failed
directives. Instead of using the usual `^~~`, which is used by later
patches for good matches, these annotations use `X~~` so that this
category of errors is visually distinct.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the match result for a pattern of type T from line L of
the check file
- X~~ marks search range when no match is found
- colors error
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -v -dump-input=always check1 < input1 |& sed -n '/^Input file/,$p'
Input file: <stdin>
Check file: check1
-dump-input=help describes the format of the following dump.
Full input was:
<<<<<<
1: ; abc def
2: ; ghI jkl
next:3 X~~~~~~~~ error: no match found
>>>>>>
$ cat check1
CHECK: abc
CHECK-SAME: def
CHECK-NEXT: ghi
CHECK-SAME: jkl
$ cat input1
; abc def
; ghI jkl
```
Some additional details related to the boilerplate:
* Enabling: The annotated input dump is enabled by `-dump-input`,
which can also be set via the `FILECHECK_OPTS` environment variable.
Accepted values are `help`, `always`, `fail`, or `never`. As shown
above, `help` describes the format of the dump. `always` is helpful
when you want to investigate a successful FileCheck run, perhaps for
an unexpected pass. `-dump-input-on-failure` and
`FILECHECK_DUMP_INPUT_ON_FAILURE` remain as a deprecated alias for
`-dump-input=fail`.
* Diagnostics: The usual diagnostics are not suppressed in this mode
and are printed first. For brevity in the example above, I've
omitted them using a sed command. Sometimes they're perfectly
sufficient, and then they make debugging quicker than if you were
forced to hunt through a dump of long input looking for the error.
If you think they'll get in the way sometimes, keep in mind that
it's pretty easy to grep for the start of the input dump, which is
`<<<`.
* Colored Annotations: The annotated input is colored if colors are
enabled (enabling colors can be forced using -color). For example,
errors are red. However, as in the above example, colors are not
vital to reading the annotations.
I don't know how to test color in the output, so any hints here would
be appreciated.
Reviewed By: george.karpenkov, zturner, probinson
Differential Revision: https://reviews.llvm.org/D52999
llvm-svn: 349418
This was a pre-existing bug that could be triggered with assembly like
this:
.p2align 2
.LtmpN:
.cv_def_range "..."
I noticed this when attempting to change clang to emit aligned symbol
records.
llvm-svn: 349403
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.
I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.
Differential Revision: https://reviews.llvm.org/D55768
llvm-svn: 349374
Class InstrBuilder wrongly assumed that llvm targets were always able to return
a non-null pointer when createMCInstrAnalysis() was called on them.
This was causing crashes when simulating executions for targets that don't
provide an MCInstrAnalysis object.
This patch fixes the issue by making MCInstrAnalysis optional.
llvm-svn: 349352
Summary:
This patch checks if the section order is correct when reading a wasm
object file in `WasmObjectFile` and converting YAML to wasm object in
yaml2wasm. (It is not possible to check when reading YAML because it is
handled exclusively by the YAML reader.)
This checks the ordering of all known sections (core sections + known
custom sections). This also adds section ID DataCount section that will
be scheduled to be added in near future.
Reviewers: sbc100
Subscribers: dschuff, mgorny, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54924
llvm-svn: 349221
Summary:
This allows us to register it with the MachineFunction delegate and be
notified automatically about erasure and creation of instructions. However,
we still need explicit notification for modifications such as those caused
by setReg() or replaceRegWith().
There is a catch with this though. The notification for creation is
delivered before any operands can be added. While appropriate for
scheduling combiner work. This is unfortunate for debug output since an
opcode by itself doesn't provide sufficient information on what happened.
As a result, the work list remembers the instructions (when debug output is
requested) and emits a more complete dump later.
Another nit is that the MachineFunction::Delegate provides const pointers
which is inconvenient since we want to use it to schedule future
modification. To resolve this GISelWorkList now has an optional pointer to
the MachineFunction which describes the scope of the work it is permitted
to schedule. If a given MachineInstr* is in this function then it is
permitted to schedule work to be performed on the MachineInstr's. An
alternative to this would be to remove the const from the
MachineFunction::Delegate interface, however delegates are not permitted
to modify the MachineInstr's they receive.
In addition to this, the observer has three interface changes.
* erasedInstr() is now erasingInstr() to indicate it is about to be erased
but still exists at the moment.
* changingInstr() and changedInstr() have been added to report changes
before and after they are made. This allows us to trace the changes
in the debug output.
* As a convenience changingAllUsesOfReg() and
finishedChangingAllUsesOfReg() will report changingInstr() and
changedInstr() for each use of a given register. This is primarily useful
for changes caused by MachineRegisterInfo::replaceRegWith()
With this in place, both combine rules have been updated to report their
changes to the observer.
Finally, make some cosmetic changes to the debug output and make Combiner
and CombinerHelp
Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: mgorny, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D52947
llvm-svn: 349167
Implement options in clang to enable recording the driver command-line
in an ELF section.
Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.
This differs from the GCC implementation in some key ways:
* In GCC there is only one command-line possible per compilation-unit,
in LLVM it mirrors llvm.ident and multiple are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
command-lines are separated by NULL bytes. The advantage of the GCC
approach is to clearly delineate options in the face of embedded
spaces. The advantage of the LLVM approach is to support merging
multiple command-lines unambiguously, while handling embedded spaces
with escaping.
Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489
llvm-svn: 349155
Summary:
The two utility functions were added in D47919 to support SHT_RELR.
However, these are just relative relocations types and are't
necessarily be named Relr.
Reviewers: phosek, dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55691
llvm-svn: 349133
build version load commands in the object file
This commit introduces a new metadata node called "SDK Version". It will be set
by the frontend to mark the platform SDK (macOS/iOS/etc) version which was used
during that particular compilation.
This node is used when machine code is emitted, by either saving the SDK version
into the appropriate macho load command (version min/build version), or by
emitting the assembly for these load commands with the SDK version specified as
well.
The assembly for both load commands is extended by allowing it to contain the
sdk_version X, Y [, Z] trailing directive to represent the SDK version
respectively.
rdar://45774000
Differential Revision: https://reviews.llvm.org/D55612
llvm-svn: 349119
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.
Reviewers: tejohnson
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43521
llvm-svn: 349076
Summary:
llvm-size uses "isText()" etc. which seem to indicate whether the section contains code-like things, not whether or not it will actually go in the text segment when in a fully linked executable.
The unit test added (elf-sizes.test) shows some types of sections that cause discrepencies versus the GNU size tool. llvm-size is not correctly reporting sizes of things mapping to text/data segments, at least for ELF files.
This fixes pr38723.
Reviewers: echristo, Bigcheese, MaskRay
Reviewed By: MaskRay
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54369
llvm-svn: 349074
This is split from D55452 with the correct patch this time.
Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.
Differential Revision: https://reviews.llvm.org/D55615
llvm-svn: 349072
When calling BinaryStreamArray::drop_front(), if the stream
is skewed it means we must never drop the first bytes of the
stream since offsets which occur in records assume the existence
of those bytes. So if we want to skip the first record in a
stream, then what we really want to do is just set the begin
pointer to the next record. But we shouldn't actually remove
those bytes from the underlying view of the data.
llvm-svn: 349066
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.
Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.
Begun removing x86 vector rotate custom lowering to use the expansion.
llvm-svn: 349025
Summary:
In addition to knowing that an instruction is changed. It's also useful to
know when it's about to change. For example, it might print the instruction so
you can track the changes in a debug log, it might remove it from some queue
while it's being worked on, or it might want to change several instructions as
a single transaction and act on all the changes at once.
Added changingInstr() to all existing uses of changedInstr()
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55623
llvm-svn: 348992
Summary:
There's little of interest that can be done to an already-erased instruction.
You can't inspect it, write it to a debug log, etc. It ought to be notification
that we're about to erase it. Rename the function to clarify the timing of the
event and reflect current usage.
Also fixed one case where we were trying to print an erased instruction.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55611
llvm-svn: 348976
Use the replacement execute once threading support in LLVM for PPC64le. It
seems that GCC does not define `__ppc__` and so we would actually call out to
the C++ runtime there which is not what the current code intended. Check both
`__ppc__` and `__PPC__`. This avoids the need for checking the endianness.
Thanks to nemanjai for the hint about GCC's behaviour and the fact that the
reviewed condition could be simplified.
Original patch by Sarvesh Tamba!
llvm-svn: 348970
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in Messagepack in the code object.
Differential Revision: https://reviews.llvm.org/D48179
llvm-svn: 348963
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.
A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.
Differential Revision: https://reviews.llvm.org/D55426
llvm-svn: 348953
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.
#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)
is the same as
#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)
and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}
defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
Reviewed By: hfinkel, dmgreen
Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288
llvm-svn: 348944
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D54719
llvm-svn: 348912
Without this check, we hit an assertion in getZExtValue, if the constant
value does not fit into an uint64_t.
As getZExtValue returns an uint64_t, should we update
getAggregateElement to take an uin64_t as well?
This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6109.
Reviewers: efriedma, craig.topper, spatel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D55547
llvm-svn: 348906
https://reviews.llvm.org/D55516
Add the ability to pass in flags to buildInstr calls. Currently no
validation is performed but that can be easily performed based on the
opcode (if necessary).
Reviewed by: paquette.
llvm-svn: 348893
IR-printing AfterPass instrumentation might be called on a loop
that has just been invalidated. We should skip printing it to
avoid spurious asserts.
Reviewed By: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D54740
llvm-svn: 348887
This change makes DT_SONAME treated as an optional trait for ELF TextAPI
stubs. This change accounts for the fact that shared objects aren't
guaranteed to have a DT_SONAME entry. Tests have been updated to check
for correct behavior of an optional soname.
Differential Revision: https://reviews.llvm.org/D55533
llvm-svn: 348817
https://reviews.llvm.org/D55294
Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as
B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to
buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like
B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual and other
builders now only have to override buildInstr (for say constant
folding/cseing) is straightforward.
Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.
llvm-svn: 348815
Not sure how I missed that in my testing, but obvious enough - this
causes segfaults when attempting to dereference the Error in end
iterators.
llvm-svn: 348814
Using an Error as an out parameter from an indirect operation like
iteration as described in the documentation (
http://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges
) seems to be a little fussy - so here's /one/ possible solution, though
I'm not sure it's the right one.
Alternatively such APIs may be better off being switched to a standard
algorithm style, where they take a lambda to do the iteration work that
is then called back into (eg: "Error e = obj.for_each_note([](const
Note& N) { ... });"). This would be safer than having an unwritten
assumption that the user of such an iteration cannot return early from
the inside of the function - and must always exit through the gift
shop... I mean error checking. (even though it's guaranteed that if
you're mid-way through processing an iteration, it's not in an error
state).
Alternatively we'd need some other (the super untrustworthy/thing we've
generally tried to avoid) error handling primitive that actually clears
the error state entirely so it's safe to ignore.
Fleshed this solution out a bit further during review - it now relies on
op==/op!= comparison as the equivalent to "if (Err)" testing the Error.
So just like an Error must be checked (even if it's in a success state),
the Error hiding in the iterator must be checked after each increment
(including by comparison with another iterator - perhaps this could be
constrained to only checking if the iterator is compared to the end
iterator? Not sure it's too important).
So now even just creating the iterator and not incrementing it at all
should still assert because the Error has not been checked.
Reviewers: lhames, jakehehrlich
Differential Revision: https://reviews.llvm.org/D55235
llvm-svn: 348811
Summary: The APFloat and Constant APIs taking an APInt allow arbitrary payloads,
and that's great. There's a convenience API which takes an unsigned, and that's
silly because it then directly creates a 64-bit APInt. Just change it to 64-bits
directly.
At the same time, add ConstantFP NaN getters which match the APFloat ones (with
getQNaN / getSNaN and APInt parameters).
Improve the APFloat testing to set more payload bits.
Reviewers: scanon, rjmccall
Subscribers: jkorous, dexonsmith, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D55460
llvm-svn: 348791
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, for the Exynos processors.
Differential revision: https://reviews.llvm.org/D55345
llvm-svn: 348774
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register?
Reviewers: RKSimon, spatel, ABataev
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55480
llvm-svn: 348739
Both intrinsics do the exact same thing so we really only need one.
Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency.
llvm-svn: 348737
Since TBEHandler doesn't maintain state or otherwise have any need to be
a class right now, the read and write functions have been moved out and
turned into standalone functions. Additionally, the TBE read function
has been updated to return an Expected value for better error handling.
Tests have been updated to reflect these changes.
Differential Revision: https://reviews.llvm.org/D55450
llvm-svn: 348735