The cost computation assumes we do this correctly, but the actual
lowering was wrong.
Differential Revision: https://reviews.llvm.org/D46923
llvm-svn: 332514
Summary:
This is needed for the continuation of D46504,
to be able to store the timings.
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: alexfh
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46939
llvm-svn: 332506
Summary:
This is needed for the continuation of D46504,
to be able to store the timings.
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: alexfh
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46938
llvm-svn: 332505
Summary:
Although this is not stricly required, i would very much prefer
not to have known random precision losses along the way.
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: george.karpenkov
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46937
llvm-svn: 332504
Summary: We have just used `.sys` suffix for the previous timer, this is clearly a typo
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: alexfh
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46936
llvm-svn: 332503
As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead.
Fixes PR3163.
Differential Revision: https://reviews.llvm.org/D43441
llvm-svn: 332498
Summary:
Before this patch, signal handling wasn't signal safe. This leads to real-world
crashes. It used ManagedStatic inside of signals, this can allocate and can lead
to unexpected state when a signal occurs during llvm_shutdown (because
llvm_shutdown destroys the ManagedStatic). It also used cl::opt without custom
backing storage. Some de-allocation was performed as well. Acquiring a lock in a
signal handler is also a great way to deadlock.
We can't just disable signals on llvm_shutdown because the signals might do
useful work during that shutdown. We also can't just disable llvm_shutdown for
programs (instead of library uses of clang) because we'd have to then mark the
pointers as not leaked and make sure all the ManagedStatic uses are OK to leak
and remain so.
Move all of the code to lock-free datastructures instead, and avoid having any
of them in an inconsistent state. I'm not trying to be fancy, I'm not using any
explicit memory order because this code isn't hot. The only purpose of the
atomics is to guarantee that a signal firing on the same or a different thread
doesn't see an inconsistent state and crash. In some cases we might miss some
state (for example, we might fail to delete a temporary file), but that's fine.
Note that I haven't touched any of the backtrace support despite it not
technically being totally signal-safe. When that code is called we know
something bad is up and we don't expect to continue execution, so calling
something that e.g. sets errno is the least of our problems.
A similar patch should be applied to lib/Support/Windows/Signals.inc, but that
can be done separately.
Fix r332428 which I reverted in r332429. I originally used double-wide CAS
because I was lazy, but some platforms use a runtime function for that which
thankfully failed to link (it would have been bad for signal handlers
otherwise). I use a separate flag to guard the data instead.
<rdar://problem/28010281>
Reviewers: dexonsmith
Subscribers: steven_wu, llvm-commits
llvm-svn: 332496
As part of merging stores we check that fusing the nodes does not
cause a cycle due to one candidate store being indirectly dependent on
another store (this may happen via chained memory copies). This is
done by searching if a store is a predecessor to another store's
value.
Prune the search at the candidate search's root node which is a
predecessor to all candidate stores. This reduces the
size of the subgraph searched in large basic blocks.
Reviewers: jyknight
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46955
llvm-svn: 332490
For regular SVE vector operands, this patch introduces a more
sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b).
For example:
add z0.s, z1.s, z2.b -> invalid element width
^_____^
mismatch
For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes
a slightly different approach and instead returns a 'invalid operand'
if the element size is not as expected. This is because the diagnostics
are more specificied to suggest using the right shift/extend suffix. This
is a trade-off not to introduce more operand classes and still provide
useful diagnostics for LD1 and PRF instructions.
For example:
ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw)'
ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand
^________________^
mismatch
For gather prefetches, both 'z0.s' and 'z0.d' would be allowed:
prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw) #2'
prfw #0, p0, [x0, z0.d] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'
Without this change, the diagnostic would unnecessarily suggest a
different element size:
prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'
Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D46688
llvm-svn: 332483
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.
Differential Revision https://reviews.llvm.org/D46477
llvm-svn: 332482
The canonicalization was restricted to shuffle masks with
a 1-to-1 mapping to the constant vector, but that disqualifies
the common splat pattern. This is part of solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
llvm-svn: 332479
Summary:
A recent patch ([[ https://reviews.llvm.org/rL331587 | rL331587 ]]) to Capture Tracking taught it that the `launder_invariant_group` intrinsic captures its argument only by returning it. Unfortunately, BasicAA still considered every call instruction as a possible escape source and hence concluded that the result of a `launder_invariant_group` call cannot alias any local non-escaping value. This led to [[ https://bugs.llvm.org/show_bug.cgi?id=37458 | bug 37458 ]].
This patch updates the relevant check for escape sources in BasicAA.
Reviewers: Prazek, kuhar, rsmith, hfinkel, sanjoy, xbolva00
Reviewed By: hfinkel, xbolva00
Subscribers: JDevlieghere, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D46900
llvm-svn: 332466
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja
Reviewed By: rja
Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 332452
A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332451
So that it can be shared with other passes that may end up doing the same
thing.
Differential Revision: https://reviews.llvm.org/D45874
llvm-svn: 332450
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.
This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.
A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.
Patch is based on the original work of Tim Northover.
Differential Revision: https://reviews.llvm.org/D46018
llvm-svn: 332449
Add support for this target hook, covering MIPS, microMIPS and MIPSR6, along
with some tests. Also add missing getOppositeBranchOpc() cases exposed by the
tests.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D46794
llvm-svn: 332446
This patch re-introduces the "S" inline assembler constraint. This matches
an absolute symbolic address or a label reference. The primary use case is
asm("adrp %0, %1\n\t"
"add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var));
I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.
Fixes PR37180
Differential Revision: https://reviews.llvm.org/D46745
llvm-svn: 332444
Summary:
SelectionDAGLegalize::ExpandNode() inserts an ISD::MUL when lowering a
BR_JT opcode. While many backends optimize this multiply into a shift, e.g.
the MIPS backend currently always lowers this into a sequence of
load-immediate+multiply+mflo in MipsSETargetLowering::lowerMulDiv().
I initially changed the multiply to a shift in the MIPS backend but it
turns out that would not have handled the MIPSR6 case and was a lot more
code than doing it in LegalizeDAG.
I believe performing this simple optimization in LegalizeDAG instead of
each individual backend is the better solution since this also fixes other
backeds such as MSP430 which calls the multiply runtime function
__mspabi_mpyi without this patch.
Reviewers: sdardis, atanasyan, pftbest, asl
Reviewed By: sdardis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45760
llvm-svn: 332439
A catchswitch must be the only non-phi instruction in its basic block;
attempting to move a retain or release into a catchswitch basic block
will result in invalid IR. Explicitly mark a CFG hazard in this case to
prevent the code motion.
Differential Revision: https://reviews.llvm.org/D46482
llvm-svn: 332430
Summary:
Before this patch, signal handling wasn't signal safe. This leads to real-world
crashes. It used ManagedStatic inside of signals, this can allocate and can lead
to unexpected state when a signal occurs during llvm_shutdown (because
llvm_shutdown destroys the ManagedStatic). It also used cl::opt without custom
backing storage. Some de-allocation was performed as well. Acquiring a lock in a
signal handler is also a great way to deadlock.
We can't just disable signals on llvm_shutdown because the signals might do
useful work during that shutdown. We also can't just disable llvm_shutdown for
programs (instead of library uses of clang) because we'd have to then mark the
pointers as not leaked and make sure all the ManagedStatic uses are OK to leak
and remain so.
Move all of the code to lock-free datastructures instead, and avoid having any
of them in an inconsistent state. I'm not trying to be fancy, I'm not using any
explicit memory order because this code isn't hot. The only purpose of the
atomics is to guarantee that a signal firing on the same or a different thread
doesn't see an inconsistent state and crash. In some cases we might miss some
state (for example, we might fail to delete a temporary file), but that's fine.
Note that I haven't touched any of the backtrace support despite it not
technically being totally signal-safe. When that code is called we know
something bad is up and we don't expect to continue execution, so calling
something that e.g. sets errno is the least of our problems.
A similar patch should be applied to lib/Support/Windows/Signals.inc, but that
can be done separately.
<rdar://problem/28010281>
Reviewers: dexonsmith
Subscribers: aheejin, llvm-commits
Differential Revision: https://reviews.llvm.org/D46858
llvm-svn: 332428
The instructions using registers should be DBG_VALUE and normal
instructions. Use isDebugValue() to filter out DBG_VALUE and add
an assert to ensure there is no other kind of debug instructions
using the registers.
Differential Revision: https://reviews.llvm.org/D46739
Patch by Hsiangkai Wang.
llvm-svn: 332427
It doesn't matter much this late in the pipeline, but one place that
does check for it is the function alignment code.
Differential Revision: https://reviews.llvm.org/D46373
llvm-svn: 332415
It is legal for the type passed to isLegalAddressingMode to be
unsized or, more specifically, VoidTy. In this case, we must
check the legality of load / stores for all legal types. Directly
trying to call getTypeStoreSize is incorrect, and leads to breakage
in e.g. Loop Strength Reduction. This change guards against that
behaviour.
Differential Revision: https://reviews.llvm.org/D40405
llvm-svn: 332409
WasmObjectWriter mostly operates with function segments offsets that do
not include their size fields. WasmObjectFile needs to have and provide
this information to the lld to maintain proper
R_WEBASSEMBLY_FUNCTION_OFFSET_I32 relocations entries.
Patch by Yury Delendik
Differential Revision: https://reviews.llvm.org/D46763
llvm-svn: 332406
Author: Samuel Pitoiset
Without this patch, it appears to me that we are selecting
the wrong operand when inverting conditions. In the attached
test, it will select %tmp3 instead of %tmp4. To fix it, just
use 'A' as everywhere.
This fixes a regression introduced by
"[PatternMatch] define m_Not using m_Xor and cst_pred_ty"
https://reviews.llvm.org/D46351
llvm-svn: 332403
When storing the 0th lane of a vector, use a simpler and usually more
efficient scalar store instead. In this case, also using the unscaled
offset.
Differential revision: https://reviews.llvm.org/D46762
llvm-svn: 332394
search. NFCI.
Migrate single-use and non-volatility, non-indexed requirements on
stores of immediate store values to candidate collection pass from
later stage.
llvm-svn: 332392
specially handle SETB_C* pseudo instructions.
Summary:
While the logic here is somewhat similar to the arithmetic lowering, it
is different enough that it made sense to have its own function.
I actually tried a bunch of different optimizations here and none worked
well so I gave up and just always do the arithmetic based lowering.
Looking at code from the PR test case, we actually pessimize a bunch of
code when generating these. Because SETB_C* pseudo instructions clobber
EFLAGS, we end up creating a bunch of copies of EFLAGS to feed multiple
SETB_C* pseudos from a single set of EFLAGS. This in turn causes the
lowering code to ruin all the clever code generation that SETB_C* was
hoping to achieve. None of this is needed. Whenever we're generating
multiple SETB_C* instructions from a single set of EFLAGS we should
instead generate a single maximally wide one and extract subregs for all
the different desired widths. That would result in substantially better
code generation. But this patch doesn't attempt to address that.
The test case from the PR is included as well as more directed testing
of the specific lowering pattern used for these pseudos.
Reviewers: craig.topper
Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46799
llvm-svn: 332389
Summary:
After r332167 we started to sort the IDF blocks inside IDF calculation, so
there is no need to re-sort them on the user site. The test changes are due to
a slightly different order we're using now (originally we used DFSInNumber and
now the blocks are sorted by a pair (LevelFromRoot, DFSInNumber)).
Reviewers: dberlin, mgrang
Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D46899
llvm-svn: 332385
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332376
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups.
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.
It should provide a superset of the functionality from D46563 - the extra tests show propagation and
codegen diffs for fcmp, vecreduce, and FP libcalls.
The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently,
so that shouldn't make any difference.
Differential Revision: https://reviews.llvm.org/D46854
llvm-svn: 332358
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency
Allows us to remove the WriteCvtF2FSt conversion store class
llvm-svn: 332357
This is a resubmit of r331868 (D46583), which was reverted due to
failures on the PS4 bot.
These have been resolved with r332246/D46748.
llvm-svn: 332349
When two interposable functions are merged, we cannot replace
uses and have to emit calls to a common internal function. However,
writeThunk() will not actually emit a thunk if the function is too
small. This leaves us in a broken state where mergeTwoFunctions
already rewired the functions, but writeThunk doesn't do anything.
This patch changes the implementation so that:
* writeThunk() does just that.
* The direct replacement of calls is moved into mergeTwoFunctions()
into the non-interposable case only.
* isThunkProfitable() is extracted and will be called for
the non-iterposable case always, and in the interposable case
only if uses are still left after replacement.
This issue has been introduced in https://reviews.llvm.org/D34806,
where the code for checking thunk profitability has been moved.
Differential Revision: https://reviews.llvm.org/D46804
Reviewed By: whitequark
llvm-svn: 332342
Summary:
New unsigned saturation downconvert patterns detection was implemented in
X86 Codegen:
(truncate (smin (smax (x, C1), C2)) to dest_type),
where C1 >= 0 and C2 is unsigned max of destination type.
(truncate (smax (smin (x, C2), C1)) to dest_type)
where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
These two patterns are equivalent to:
(truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)
Reviewers: RKSimon
Subscribers: llvm-commits, a.elovikov
Differential Revision: https://reviews.llvm.org/D45315
llvm-svn: 332336
As requested in D46858, pulling this function into its own lambda makes it
easier to read that part of the code and reason as to what's going on because
the scope it can be called from is extremely limited. We want to keep it as a
function because it's called from the two subsequent lines.
llvm-svn: 332325
1. Deine FeatureRelax to enable/disable linker relaxation.
2. Define shouldForceRelocation to preserve relocation types even if the fixup
can be resolved when linker relaxation enabled. This is necessary for
correctness as offsets may change during relaxation.
Differential Revision: https://reviews.llvm.org/D46674
llvm-svn: 332318
This adds a -debugify-each mode to opt which, when enabled, wraps each
{Module,Function}Pass in a pipeline with logic to add, check, and strip
synthetic debug info for testing purposes.
This mode can be used to test complex pipelines for debug info bugs, or
to collect statistics about the number of debug values & locations lost
throughout various stages of a pipeline.
Patch by Son Tuan Vu!
Differential Revision: https://reviews.llvm.org/D46525
llvm-svn: 332312
Summary:
bugpoint has several options specified as `PositionalEatArgs` to pass
options through to the underlying tool, e.g. `-tool-args`. The `-help`
message suggests the usage is: `-tool-args=<string>`. However, this is
misleading, because that's not how these arguments work. Rather than taking
a value, the option consumes all positional arguments until the next
recognized option (or all arguments if `--` is specified at some point).
To make this slightly clearer, instead print the help as:
```
-tool-args <string>... - <tool arguments>...
```
Additionally, add an error if the user attempts to use a `PositionalEatArgs`
argument with a value, instead of silently ignoring it. Example:
```
./bin/bugpoint -tool-args=-mpcu=skylake-avx512
bugpoint: for the -tool-args option: This argument does not take a value.
Instead, it consumes any positional arguments until the next recognized option.
```
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D46787
llvm-svn: 332311
Summary:
Part of the InstCombine code for simplifying GEPs looks through
addrspacecasts. However, this was done by updating a variable
also used by the next transformation, for marking GEPs as
inbounds. This led to replacing a GEP with a similar instruction
in a different addrspace, which caused an assertion failure in RAUW.
This caused julia issue https://github.com/JuliaLang/julia/issues/27055
Patch by Jeff Bezanson <jeff@juliacomputing.com>
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D46722
llvm-svn: 332302
Summary:
In https://bugs.llvm.org/show_bug.cgi?id=37006 Nico Weber points out a
flaw in `OptTable::findNearest`: if an option "foo"'s prefixes are "--"
and "-", then the nearest option for "--fob" will be "-foo". This is
incorrect, however, since the function is expected to return "--foo".
The bug is due to a naive loop that attempts to predetermines which
prefix is best. Instead, compute the edit distance for each prefix/name
pair.
Test Plan: `check-llvm`
Reviewers: thakis
Reviewed By: thakis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46776
llvm-svn: 332299
Extract information related to a "unit header" from DWARFUnit into a
new DWARFUnitHeader class, and add a DWARFUnit member for the header.
This is one step in the direction of allowing type units in the
.debug_info section for DWARF v5.
Differential Revision: https://reviews.llvm.org/D46707
llvm-svn: 332289
Summary:
The BranchFolding pass is currently missing opportunities to hoist
common code if the hoisted-to block contains a single conditional branch
that has register uses. This occurs somewhat frequently on AArch64 with
CBZ/TBZ opcodes.
This change also eliminates some code differences when debug info is
present since the presence of e.g. DBG_VALUE instructions in the
hoisted-to block can enable hoisting that wouldn't have occurred without
them.
Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar
Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46324
llvm-svn: 332265
xsrqpi is currently using Z23Form_1.
The instruction format is xsrqpi R,VRT,VRB,RMC.
Rathar than bits 11-15 being used for FRA, it should have
bits 11-14 reserved and bit 15 for R. This patch adds a new
class Z23Form_4 to fix the instruction format.
Differential Revision: https://reviews.llvm.org/D46761
llvm-svn: 332253
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.
Differential revision: https://reviews.llvm.org/D46655
llvm-svn: 332251
Summary:
If we are not emitting a linkage name in the .debug_info sections, we
should not add it into the index either. This makes sure our index is
consistent with the actual debug info.
I am also explicitly setting the --dwarf-linkage-names=All in the
name-collsions test as that one would now fail on targets where this
defaults to "Abstract" (in fact, it would have failed already if there
wasn't a bug in the DWARF verifier, which I fix as well).
Reviewers: probinson, aprantl, JDevlieghere
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46748
llvm-svn: 332246
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Summary:
The first foray into merging debug info into the echo tests.
- Add bindings to Module::getModuleFlagsMetadata() in the form of LLVMCopyModuleFlagsMetadata
- Add the opaque type LLVMModuleFlagEntry to represent Module::ModuleFlagEntry
- Add accessors for LLVMModuleFlagEntry's behavior, key, and metadata node.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: aprantl, JDevlieghere, llvm-commits, harlanhaskins
Differential Revision: https://reviews.llvm.org/D46792
llvm-svn: 332219
This pass is
a) broken.
b) r600 specific.
Fixing (a) is a bit more non-trivial, but fixing (b)
is easy. Move this pass to being R600 only for now.
This pass does pass all the unit tests, however clang
no longer generates code that looks like the unit test
input, so fixing the pass requires fixing the tests and
the pass as one, and checking it works with clang still.
Patch by Dave Airlie
llvm-svn: 332196
Summary:
As reported in PR37264, in some cases the X86 Domain Reassignment
`runOnMachineFunction()` is called twice. Because it only deletes the
`.second` members of its `InstrConverterBaseMap`, and does not clean up
the map itself, this can lead to double frees and crashes.
Use `DeleteContainerSeconds()` instead, so the `Converters` map can
safely be reinitialized and its members re-deleted for each X86 Domain
Reassignment pass.
Reviewers: guyblank, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46425
llvm-svn: 332176
Summary:
r271558 moved getManagedStaticMutex's mutex from a function-local
static to using call_once, but left a comment added in r211424. That comment is
now erroneous, remove it.
Reviewers: zturner, chandlerc
Subscribers: aheejin, llvm-commits
Differential Revision: https://reviews.llvm.org/D46784
llvm-svn: 332175
Summary:
Currently the order of blocks returned by `IDF::calculate` can be
non-deterministic. This was discovered in several attempts to enable
SSAUpdaterBulk for JumpThreading (which led to miscompare in bootstrap between
stage 3 and stage4). Originally, the blocks were put into a priority queue with
a depth level as their key, and this patch adds a DFSIn number as a second key
to specify a deterministic order across blocks from one level.
The solution was suggested by Daniel Berlin.
Reviewers: dberlin, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46646
llvm-svn: 332167
We cannot query this attribute from a subtarget given a machine function.
At this point attribute itself is already unavailable and can only be
obtained through MFI.
Differential Revision: https://reviews.llvm.org/D46781
llvm-svn: 332166
There's only one use of this currently, but that could
change with D46563. Either way, we shouldn't have to
update code outside of the flags struct when those
flag definitions change.
llvm-svn: 332155
This is a CodeExtractor improvement which adds support for extracting blocks
which have exception handling constructs if that is legal to do. CodeExtractor
performs validation checks to ensure that extraction is legal when it finds
invoke instructions or EH pads (landingpad, catchswitch, or cleanuppad) in
blocks to be extracted.
I have also added an option to allow extraction of blocks with alloca
instructions, but no validation is done for allocas. CodeExtractor caller has
to validate it himself before allowing alloca instructions to be extracted.
By default allocas are still not allowed in extraction blocks.
Differential Revision: https://reviews.llvm.org/D45904
llvm-svn: 332151
Summary:
We have no logic to promote alloca to vector for an AddrSpaceCast instruction.
Reviewer:
arsenm
Differential Revision:
https://reviews.llvm.org/D45993
llvm-svn: 332147
Let separate-const-offset-from-gep pass handle trunc() when it calculates
constant offset relative to base. The pass itself may insert trunc()
instructions when it canonicalises array indices to pointer-size integers
and needs to handle trunc() in order to evaluate the offset.
Differential Revision: https://reviews.llvm.org/D46732
llvm-svn: 332142
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.
The SwitchSection is useless because the current section is already
correct text section for the function therefore no need to switch.
It causes compilation failure for comdat because functions with comdat
has specific text section, not the default .text section.
Since HIP uses comdat, this bug caused failures for HIP.
Differential Revision: https://reviews.llvm.org/D46770
llvm-svn: 332137
Summary:
This change adds handling of the atomic memset intrinsic to the
code path that simplifies the regular memset. In practice this means
that we will now also expand a small constant-length atomic memset
into a single unordered atomic store.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46660
llvm-svn: 332132
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).
Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.
ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.
This was suggested by Adrian in https://reviews.llvm.org/D45995.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46216
llvm-svn: 332119
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.
The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46158
llvm-svn: 332118
Part of the logic for combining (zext (load ...)) and (sext (load ...))
is duplicated. This creates problems because bugs in one version have to
be fixed again in the other version.
To address this, as a first step, I've extracted the duplicate logic
into a helper. I'll fix the debug location bug in the helper and
eliminate the copy of its logic in a followup.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46157
llvm-svn: 332117
These directives allow the 'C' (compressed) extension to be enabled/disabled
within a single file.
Differential Revision: https://reviews.llvm.org/D45864
Patch by Kito Cheng
llvm-svn: 332107
If the sprintf function is static (as on mingw-w64, where many stdio
functions are static inline wrappers), earlier optimization passes
could optimize out the return value altogether, and make it void,
which could break optimizations of this libcall that touch the
return value.
This fixes the issue discussed in PR37408 for the sprintf function.
Differential Revision: https://reviews.llvm.org/D46752
llvm-svn: 332106
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction. The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:
- an assert when lowering the LD1LANE ISD node since it assumes an
constant operand
- an assert in isel if the lane index value depends on the
post-incremented base register
Both of these issues are avoided by simply checking that the lane index
is a constant.
Fixes bug 35822.
Reviewers: t.p.northover, javed.absar
Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46591
llvm-svn: 332103
Phi nodes can reside in live blocks but one of their incoming
arguments can come from a dead block. Dead blocks and reassociate
don't play nice together. In fact, reassociate performs an RPO
as a first step to avoid processing dead blocks.
The reason why Reassociate might not fixpoint when examining
dead blocks is that the following:
%xor0 = xor i16 %xor1, undef
%xor1 = xor i16 %xor0, undef
is perfectly valid LLVM IR (if it appears in a dead block),
so the worklist algorithm keeps pushing the two instructions for
reexamination. Note that this is not Reassociate fault, at least
not entirely. It's llvm that has a weird definition of dominance.
Fixes PR37390.
llvm-svn: 332100
Confirmed by both Agner and Intel's AOM - the IEC/FPC are not required for pure load/stores (even if its a partial update).
Can't fix WriteStore until all RMW instructions are cleaned up though....
llvm-svn: 332096
Summary:
This change reworks the handling of atomic memcpy within the instcombine pass.
Previously, a constant length atomic memcpy would be lowered into loads & stores
as long as no more than 16 load/store pairs are created. This is quite different
from the lowering done for a non-atomic memcpy; which only ever lowers into a single
load/store pair of no more than 8 bytes. Larger constant-sized memcpy calls are
expanded to load/stores in later passes, such as SelectionDAG lowering.
In this change the behaviour for atomic memcpy is unified with non-atomic memcpy;
atomic memcpy is now treated in the same was as non-atomic memcpy has always been.
We leave it to later passes to lower longer-length atomic memcpy calls.
Due to the structure of the pass's handling of memtransfer intrinsics, this change
also gives us handling of atomic memmove that we did not previously have.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46658
llvm-svn: 332093
losesInfo would be left unset when no conversion needs to be done. A
caller such as InstCombine's fitsInFPType would then branch on an
uninitialized value.
Caught using valgrind on an out-of-tree target.
Differential Revision: https://reviews.llvm.org/D46645
llvm-svn: 332087
Summary:
https://bugs.llvm.org/show_bug.cgi?id=34897 demonstrates an incorrect
coroutine frame allocation elision in the coro-elide pass. The elision
is performed on the basis that the SSA variables from all llvm.coro.begin
are directly referenced in subsequent llvm.coro.destroy instructions.
However, this ignores the fact that the function may exit through paths
that do not run these destroy instructions. In the sample program from
PR34897, for example, the llvm.coro.destroy instruction is only
executed in exception handling code. When the coroutine function exits
normally, llvm.coro.destroy is not called. Eliding the allocation in
this case causes a subsequent reference to the coroutine handle from
outside of the function to access freed memory.
To fix the issue, when finding an llvm.coro.destroy for each llvm.coro.begin,
only consider llvm.coro.destroy that are executed along non-exceptional paths.
Test Plan:
1. Download the sample program from
https://bugs.llvm.org/show_bug.cgi?id=34897, compile it with
`clang++ -fcoroutines-ts -stdlib=libc++ -std=c++1z -O2`, and run it.
It should print `"run1\ncheck1\nrun2\ncheck2"` and then exit
successfully.
2. Compile https://godbolt.org/g/mCKfnr and confirm it is still
optimized to a single instruction, 'return 1190'.
3. `check-llvm`
Reviewers: rsmith, GorNishanov, eric_niebler
Reviewed By: GorNishanov
Subscribers: andrewrk, lewissbaker, EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D43242
llvm-svn: 332077
Summary:
Add documentation for the LLVM Support functions `openFileForWrite` and
`openFileForRead`. The `openFileForRead` parameter `RealPath`, in
particular, I think warranted some explanation.
In addition, make the behavior of the functions more consistent across
platforms. Prior to this patch, Windows would set or not set the result
file descriptor based on the nature of the error, whereas Unix would
consistently set it to `-1` if the open failed. Make Windows
consistently set it to `-1` as well.
Test Plan:
1. `ninja check-llvm`
2. `ninja docs-llvm-html`
Reviewers: zturner, rnk, danielmartin, scanon
Reviewed By: danielmartin, scanon
Subscribers: scanon, danielmartin, llvm-commits
Differential Revision: https://reviews.llvm.org/D46499
llvm-svn: 332075
Summary:
Ship kNetBSD_ShadowOffset32 set to 1ULL << 30.
This is prepared for the amd64 kernel runtime.
Sponsored by <The NetBSD Foundation>
Reviewers: vitalybuka, joerg, kcc
Reviewed By: vitalybuka
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46724
llvm-svn: 332069
We found current sampleFDO had a performance issue when triaging a regression.
For a callsite with inline instance in the profile, even if hot callsite inliner
cannot inline it, it may still execute enough times and should not be treated as
cold in regular inliner later. However, currently if such callsite is not inlined
by hot callsite inliner, and the BB where the callsite locates doesn't get
samples from other instructions inside of it, the callsite will have no profile
metadata annotated. In regular inliner cost analysis, if the callsite has no
profile annotated and its caller has profile information, it will be treated as
cold.
The fix changes the isCallsiteHot check and chooses to compare
CallsiteTotalSamples with hot cutoff value computed by ProfileSummaryInfo.
Differential Revision: https://reviews.llvm.org/D45377
llvm-svn: 332058
This commit adds a wrapper for std::distance() which works with ranges.
As it would be a common case to write `distance(predecessors(BB))`, this
also introduces `pred_size()` and `succ_size()` helpers to make that
easier to write.
Differential Revision: https://reviews.llvm.org/D46668
llvm-svn: 332057
The bitwidth of the operation should always be wider than the result width of the truncate since we don't recurse through any width changing operations.
llvm-svn: 332055
This implements a new table-gen emitter to create tables for
a wasm disassembler, and a dissassembler to use them.
Comes with 2 tests, that tests a few instructions manually. Is also able to
disassemble large .wasm files with objdump reasonably.
Not working so well, to be addressed in followups:
- objdump appears to be passing an incorrect starting point.
- since the disassembler works an instruction at a time, and it is
disassembling stack instruction, it has no idea of pseudo register assignments.
These registers are required for the instruction printing code that follows.
For now, all such registers appear in the output as $0.
Patch by Wouter van Oortmerssen
Differential Revision: https://reviews.llvm.org/D45848
llvm-svn: 332052
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
We may be able to drop some of the old patterns, but I leave that for a future patch.
llvm-svn: 332049
This reverts commit SVN r331889, which could trigger failed
assertions for cases where the snprintf function is declared
with a vaguely differing signature (e.g. being defined as
static inline), see PR37408.
llvm-svn: 332043
Summary: Move and correct LLVMDIBuilderCreateTypedef. This is the last API in DIBuilderBindings.h, so it is being removed and the C API will now be re-exported from IRBindings.h.
Reviewers: whitequark, harlanhaskins, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46725
llvm-svn: 332041
length excluding the table header. Instead it must encode the contribution length minus the length
field itself.
Reviewer: JDevliegehere
Differential Revision: https://reviews.llvm.org/D45922
llvm-svn: 332030
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.
This patch tries to split a large data struct starting from %base like the
following.
Before:
BB0:
%base =
BB1:
%gep0 = gep %base, off0
%gep1 = gep %base, off1
%gep2 = gep %base, off2
BB2:
%load1 = load %gep0
%load2 = load %gep1
%load3 = load %gep2
After:
BB0:
%base =
%new_base = gep %base, off0
BB1:
%new_gep0 = %new_base
%new_gep1 = gep %new_base, off1 - off0
%new_gep2 = gep %new_base, off2 - off0
BB2:
%load1 = load i32, i32* %new_gep0
%load2 = load i32, i32* %new_gep1
%load3 = load i32, i32* %new_gep2
In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.
The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.
Differential Revision: https://reviews.llvm.org/D42759
llvm-svn: 332015
Summary:
- Adds getters for the line, column, and scope of a DILocation
- Adds getters for the name, size in bits, offset in bits, alignment in bits, line, and flags of a DIType
Reviewers: whitequark, harlanhaskins, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46627
llvm-svn: 332014
These symbols only get included in the output symbols table if
they are used in a relocation.
This behaviour matches more closely the ELF object writer.
Differential Revision: https://reviews.llvm.org/D46561
llvm-svn: 332005
This is a follow up to the rL330983. The patch teaches ld, sd, and lld
commands accept 32-bit memory offsets by replacing `mem_simm16` operand
to `mem_simmptr`. In fact, these commands should accept 64-bit offsets,
but so large offsets require another command expanding and will be
supported by a separate patch.
Differential Revision: https://reviews.llvm.org/D46629
llvm-svn: 331997
This is a follow up to the rL330983. The patch teaches lh and lhu
commands accepts 32-bit memory offsets by replacing `mem_simm16` operand
to `mem_simmptr`.
Differential Revision: https://reviews.llvm.org/D46513
llvm-svn: 331996
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).
Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.
Note: it's possible that other targets have similar problems
as seen here.
Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403
The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.
llvm-svn: 331992
Summary:
This change teaches DSE that the atomic memory intrinsics can be overwriten
partially in the same way as the non-atomic forms. Specifically, that the
atomic memcpy & memset can be shortened at the end and that the atomic memset
can be shortened at the beginning, if they partially overwritten
by later stores.
Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith, spatel, filcab, sanjoy
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45584
llvm-svn: 331991
Fixes bug https://bugs.llvm.org/show_bug.cgi?id=37339.
InlineAsm is only uniqued if the FunctionTypes are exactly the
same, while cmpTypes() for example considers all pointer types
in the default address space to be the same. For this reason
the end of cmpInlineAsm() can be reached.
This patch replaces the unreachable assertion with a check that
the function types are not identical.
Differential Revision: https://reviews.llvm.org/D46495
Reviewers: jfb
llvm-svn: 331990
Summary:
The combine in rebuildSetCC may be combined to another
node leaving our references stale. Keep a handle on
it to avoid stale references.
Fixes PR36602.
Reviewers: dbabokin, RKSimon, eli.friedman, davide
Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D46404
llvm-svn: 331985
The print format was causing at least 2 unit-test failures from r331971.
The signed/unsigned comparison warnings only appeared to affect two lines but
it was unclear whether it might just pop up on other lines, so I have been
explicit in all the literals in the tests.
There were other bot unit-test failures that I am still investigating.
llvm-svn: 331978
Summary:
Operand 0 is the condition, not the true value.
Use op 1 and op 2 as the correct values.
Reviewers: george.burgess.iv, nlopes, efriedma
Reviewed By: george.burgess.iv
Subscribers: craig.topper, rjmccall, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D46343
llvm-svn: 331976
Put in a conservatively correct estimate for now. Avoids miscompiling
clang in FDO mode. This is really tricky to trigger in reality as
basically all interesting cases will be folded away by computeKnownBits
earlier, I was unable to find a reasonably small test case.
llvm-svn: 331975
Reviewed by: dblaikie, JDevlieghere, espindola
Differential Revision: https://reviews.llvm.org/D44560
Summary:
The .debug_line parser previously reported errors by printing to stderr and
return false. This is not particularly helpful for clients of the library code,
as it prevents them from handling the errors in a manner based on the calling
context. This change switches to using llvm::Error and callbacks to indicate
what problems were detected during parsing, and has updated clients to handle
the errors in a location-specific manner. In general, this means that they
continue to do the same thing to external users. Below, I have outlined what
the known behaviour changes are, relating to this change.
There are two levels of "errors" in the new error mechanism, to broadly
distinguish between different fail states of the parser, since not every
failure will prevent parsing of the unit, or of subsequent unit. Malformed
table errors that prevent reading the remainder of the table (reported by
returning them) and other minor issues representing problems with parsing that
do not prevent attempting to continue reading the table (reported by calling a
specified callback funciton). The only example of this currently is when the
last sequence of a unit is unterminated. However, I think it would be good to
change the handling of unrecognised opcodes to report as minor issues as well,
rather than just printing to the stream if --verbose is used (this would be a
subsequent change however).
I have substantially extended the DwarfGenerator to be able to handle
custom-crafted .debug_line sections, allowing for comprehensive unit-testing
of the parser code. For now, I am just adding unit tests to cover the basic
error reporting, and positive cases, and do not currently intend to test every
part of the parser, although the framework should be sufficient to do so at a
later point.
Known behaviour changes:
- The dump function in DWARFContext now does not attempt to read subsequent
tables when searching for a specific offset, if the unit length field of a
table before the specified offset is a reserved value.
- getOrParseLineTable now returns a useful Error if an invalid offset is
encountered, rather than simply a nullptr.
- The parse functions no longer use `WithColor::warning` directly to report
errors, allowing LLD to call its own warning function.
- The existing parse error messages have been updated to not specifically
include "warning" in their message, allowing consumers to determine what
severity the problem is.
- If the line table version field appears to have a value less than 2, an
informative error is returned, instead of just false.
- If the line table unit length field uses a reserved value, an informative
error is returned, instead of just false.
- Dumping of .debug_line.dwo sections is now implemented the same as regular
.debug_line sections.
- Verbose dumping of .debug_line[.dwo] sections now prints the prologue, if
there is a prologue error, just like non-verbose dumping.
As a helper for the generator code, I have re-added emitInt64 to the
AsmPrinter code. This previously existed, but was removed way back in r100296,
presumably because it was dead at the time.
This change also requires a change to LLD, which will be committed separately.
llvm-svn: 331971
During simplification umax we trigger isKnownPredicate twice. As a first attempt it
tries the induction. To do that it tries to get post increment of SCEV.
Re-writing the SCEV may result in simplification of umax. If the SCEV contains a lot
of umax operations this recursion becomes very slow.
The added test demonstrates the slow behavior.
To resolve this we use only simple ways to check whether the predicate is known.
Reviewers: sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: lebedev.ri, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46046
llvm-svn: 331949
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has substantial performance impact
on kernels that uses shared memory for intermediary results.
The feature is disabled by default.
Differential Revision: https://reviews.llvm.org/D46147
llvm-svn: 331941
This is a follow-up to D45986. As suggested there, we should match the "all-bits-set"
pattern in addition to "any-bits-set".
This was a little more complicated than I thought it would be initially because the
"and 1" instruction can be anywhere in the chain. Hopefully, the code comments make
that logic understandable, but if you see a way to simplify or improve that, it's
most appreciated.
This transforms patterns that emerge from bitfield tests as seen in PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
I think it would also help reduce the large test from:
D46336
D46595
but we need something to reassociate that case to the forms we're expecting here first.
Differential Revision: https://reviews.llvm.org/D46649
llvm-svn: 331937
The previous handling for guard widening in InstCombine was extremely restrictive. In particular, it didn't handle the common case where we had two guards separated by a single icmp. Handle this by scanning through a small fixed window of instructions to find the next guard if needed.
Differential Revision: https://reviews.llvm.org/D46203
llvm-svn: 331935
This is safe as long as the udiv is not exact. The pattern is not common in
C++ code, but comes up all the time in code generated by XLA's GPU backend.
Differential Revision: https://reviews.llvm.org/D46647
llvm-svn: 331933
Summary: As per title. SETCCE is deprecated and will eventually be removed.
Reviewers: rogfer01, efriedma, rengolin, javed.absar
Subscribers: kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D46512
llvm-svn: 331929
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, therefore its incorrect to widen it
in any way but by zero extending it.
G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their
destination and first source operands, but not the "number of bits to
shift" operand).
Generally, shifts aren't as similar to regular binary operations as it
might seem, for instance, they aren't commutative nor associative and
the second source operand usually requires a special treatment.
Reviewers: bogner, javed.absar, aivchenk, rovka
Reviewed By: bogner
Subscribers: igorb, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46413
llvm-svn: 331926
If a multiply is truncated, SimplifyDemandedBits
sometimes turns a zero_extend of the inputs into an
any_extend, which makes the known bits computation unhelpful.
Ignore these and compute known bits for the underlying value,
since we insert the correct extend type after.
llvm-svn: 331919
This is an extension of an existing combine to reduce wider
shls if the result fits in the final result type. This
introduces the same combine, but reduces the shift to a middle
sized type to avoid the slow 64-bit shift.
llvm-svn: 331916
If the truncate is only accessing the first element of the vector,
we can use the original source value.
This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.
llvm-svn: 331909
Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification.
This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations.
llvm-svn: 331896
The previous value of 8192 resulted in severe compile time hits in
some pathological cases.
rdar://39781410
Differential Revision: https://reviews.llvm.org/D46581
llvm-svn: 331888
Summary:
Unnormal values are a feature of some very old x87 processors. We handle
them correctly for the most part -- the only exception was an unnormal
value whose significand happened to be zero. In this case the APFloat
was still initialized as normal number (category = fcNormal), but a
subsequent toString operation would assert because the math would
produce nonsensical values for the zero significand.
During review, it was decided that the correct way to fix this is to
treat all unnormal values as NaNs (as that is what any >=386 processor
will do).
The issue was discovered because LLDB would crash when trying to print
some "long double" values.
Reviewers: skatkov, scanon, gottesmm
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41868
llvm-svn: 331884
Summary:
Various path functions were not treating paths consisting of slashes
alone consistently. For example, the iterator-based accessors decomposed the
path "///" into two elements: "/" and ".". This is not too bad, but it
is different from the behavior specified by posix:
```
A pathname that contains ***at least one non-slash character*** and that
ends with one or more trailing slashes shall be resolved as if a single
dot character ( '.' ) were appended to the pathname.
```
More importantly, this was different from how we treated the same path
in the filename+parent_path functions, which decomposed this path into
"." and "". This was completely wrong as it lost the information that
this was an absolute path which referred to the root directory.
This patch fixes this behavior by making sure all functions treat paths
consisting of (back)slashes alone the same way as "/". I.e., the
iterator-based functions will just report one component ("/"), and the
filename+parent_path will decompose them into "/" and "".
A slightly controversial topic here may be the treatment of "//". Posix
says that paths beginning with "//" may have special meaning and indeed
we have code which parses paths like "//net/foo/bar" specially. However,
as we were already not being consistent in parsing the "//" string
alone, and any special parsing for it would complicate the code further,
I chose to treat it the same way as longer sequences of slashes (which
are guaranteed to be the same as "/").
Another slight change of behavior is in the parsing of paths like
"//net//". Previously the last component of this path was ".". However,
as in our parsing the "//net" part in this path was the same as the
"drive" part in "c:\" and the next slash was the "root directory", it
made sense to treat "//net//" the same way as "//net/" (i.e., not to add
the extra "." component at the end).
Reviewers: zturner, rnk, dblaikie, Bigcheese
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45942
llvm-svn: 331876
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32). The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.
Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.
llvm-svn: 331873
The new verifier check has found an error in the
debug-names-name-collisions.ll test on the PS4 bot:
error: Name Index @ 0x0: Entry @ 0xdc: mismatched Name of DIE @ 0x23: index - _ZN3foo3fooE; debug_info - foo.
Reverting while I investigate whether this is a bug in the verifier or
the generator.
This reverts commit r331868.
llvm-svn: 331869
Summary:
This patch implements a check which makes sure all entries required by
the DWARF v5 specification are present in the Name Index. The algorithm
tries to follow the wording of Section 6.1.1.1 of the spec as closely as
possible.
The main deviation from it is that instead of a whitelist-based approach
in the spec "The name index must contain an entry for each debugging
information entry that defines a named subprogram, label, variable,
type, or namespace" I chose a blacklist-based one, where I consider
everything to be "in" and then remove the entries that don't make sense.
I did this because it has more potential for catching interesting cases
and the above is a bit vague (it uses plain words like "variable" and
"subprogram", but the rest of the section speaks about specific TAGs).
This approach has raised some interesting questions, the main one being
whether enumerator values should be indexed. The consensus seems to be
that they should, although it does not follow from section 6.1.1.1.
For the time being I made the verifier ignore these, as LLVM does not do
this yet, and I wanted to get a clean run when verifying generated debug
info.
Another interesting case was the DW_TAG_imported_declaration. It was not
immediately clear to me whether this should go in or not, but currently
it is not indexed, and (unlike the enumerators) in does not seem to cause
problems for LLDB, so I've also ignored it.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46583
llvm-svn: 331868
MOVNTPD/MOVNTPS should be WriteFStore
Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction....
llvm-svn: 331864
The operator == used for exporting a function with a different
name in the DLL compared to the name in the import library
(which is useful for adding linker level aliases for function
in the import library) is a feature distinct and different from
the operator = used for exporting a function with a different
name (both in import library and DLL) than in the implementation
producing the DLL.
When creating an import library using dlltool, from a def file that
contains forwards (Func = OtherDll.Func), this shouldn't affect the
produced import library, which should still behave just as if it
was a normal exported function.
This clears a lot of confusion and subtle misunderstandings, and
avoids a parameter that was used to avoid creating weak aliases
when invoked from lld. (This parameter was added previously due to
the existing conflation of the two features.)
Differential Revision: https://reviews.llvm.org/D46245
llvm-svn: 331859
Summary:
MergedLoadStoreMotion::mergeStores is using some heuristics
to limit the amount of stores that it tries to sink (see
MagicCompileTimeControl in MergedLoadStoreMotion.cpp). The
heuristic involves counting the number of instructions in
one of the basic blocks that is part of the transformation.
We now ignore dbg intrinsics when counting instruction for
the MagicCompileTimeControl heuristic. This to make sure that
the amount of stores that are sunk doesn't depend on the amount
of debug information (if -g is used or not).
Reviewers: Gerolf, davide, majnemer
Reviewed By: davide
Subscribers: dberlin, bjope, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46600
llvm-svn: 331852
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.
llvm-svn: 331846
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.
This patch has no new test case. I have run regression test and there is
no difference in regression test.
Differential Revision: https://reviews.llvm.org/D45342
Patch by Hsiangkai Wang.
llvm-svn: 331844
In order to convert LLVM IR to MachineInstr, we need a new TargetOpcode,
DBG_LABEL, to ‘lower’ intrinsic llvm.dbg.label. The patch
creates this new TargetOpcode and convert intrinsic llvm.dbg.label to
MachineInstr through SelectionDAG.
In SelectionDAG, debug information is stored in SDDbgInfo. We create a
new data member of SDDbgInfo for labels and use the new data member,
SDDbgLabel, to create DBG_LABEL MachineInstr.
The new DBG_LABEL MachineInstr uses label metadata from LLVM IR as its
parameter. So, the backend could get metadata information of labels from
DBG_LABEL MachineInstr.
Differential Revision: https://reviews.llvm.org/D45341
Patch by Hsiangkai Wang.
llvm-svn: 331842
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is
!DILabel(scope: !1, name: "foo", file: !2, line: 3)
We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is
llvm.dbg.label(metadata !1)
It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.
We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.
Differential Revision: https://reviews.llvm.org/D45024
Patch by Hsiangkai Wang.
llvm-svn: 331841
Previously thumb bits were only checked for external relocations (thumb to arm
code and vice-versa). This patch adds detection for thumb callees in the same
section asthe (also thumb) caller.
The MachO/Thumb test case is updated to cover this, and redundant checks
(handled by the MachO/ARM test) are removed.
llvm-svn: 331838
Summary:
The current LowerInvoke pass cannot handle invoke instructions with a
funclet bundle operand. The order of operands for an invoke instruction
is {call arguments, callee, funclet operand (if any), normal dest,
unwind dest}. The current code assumes there is no funclet operand and
incorrectly includes a funclet operand into call arguments.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46242
llvm-svn: 331832
We enter MergeBlockIntoPredecessor with a block looking like this:
for.inc.us-lcssa: ; preds = %cond.end
%k.1.lcssa.ph = phi i32 [ %conv15, %cond.end ]
%t.3.lcssa.ph = phi i32 [ %k.1.lcssa.ph, %cond.end ]
br label %for.inc, !dbg !66
[note the first arg of the PHI being a PHI].
FoldSingleEntryPHINodes gets rid of both PHIs (calling, eraseFromParent).
But right before we call the function, we push into IncomingValues the
only argument of the PHIs, and shortly after we try to iterate over
something which has been invalidated before :(
The fix its not trying to remove PHIs which have an incoming value
coming from the same BB we're looking at.
Fixes PR37300 and rdar://problem/39910460
Differential Revision: https://reviews.llvm.org/D46568
llvm-svn: 331824
Fix a silly mistake in my pre-commit changes for r331816. It should check what
opcode the insn is before extracting the operands.
NFC at the moment since the caller already checked the opcode.
llvm-svn: 331820
Refactoring LegalizerHelper::widenScalar member function reducing its
size by approximately a factor of 2 and (hopefuly) making it more
straightforward and regular by introducing widenScalarSrc and
widenScalarDst helper methods.
The new widenScalar* methods mutate the instructions in place instead
of recreating them from scratch and removing the originals. The
compile time implications of this were measured on sqlite3
amalgamation, targeting AArch64 in -O0:
LegalizerHelper::widenScalar: > 25% faster
Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster
Also adding MachineOperand::setCImm and refactoring out
MachineIRBuilder::recordInsertion methods to make the change possible.
Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm
Reviewed By: aditya_nandakumar
Subscribers: wdng, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46414
llvm-svn: 331819
Before SVN r244158, codeview debug info was emitted always
emitted for msvc if debug info was enabled, but that commit
added a module flag.
Since it's still restricted by the flag, we can allow it
for any target if the user requests it, not only msvc (and
windows-itanium, added in SVN r287567).
Add a test for emitting it for a mingw target.
Differential Revision: https://reviews.llvm.org/D46303
llvm-svn: 331809
Summary:
Don't skip functions with the same name but from different files.
That change makes it possible to generate code coverage reports from
different binaries compiled from different sources even if there are functions
with non-unique names. Without that change, code coverage for such functions is
missing except of the first function processed.
Reviewers: vsk, morehouse
Reviewed By: vsk
Subscribers: llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D46478
llvm-svn: 331801
Summary:
Broadcast code generation emitted instructions in pre-header, while the instruction they are dependent on in the vector loop body.
This resulted in an IL verification error ---- value used before defined.
Reviewers: rengolin, fhahn, hfinkel
Reviewed By: rengolin, fhahn
Subscribers: dcaballe, Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D46302
llvm-svn: 331799
Summary:
AMDGPU stores a numerical code for the particular GPU variant in EFlags
in the ELF file. This commit provides a mapping from that number into
the machine name for use by objdump-type tools.
Change-Id: Id37fc0bebad443bd89c0080985ce298c4e7e9319
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46587
llvm-svn: 331798
Legalize and emit code for truncate and convert float128 to (un)signed short
and (un)signed char.
Differential Revision: https://reviews.llvm.org/D46194
llvm-svn: 331797
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.
Differential Revision: https://reviews.llvm.org/D45537
llvm-svn: 331783
Existing DAG combine only handles conversions for FP_TO_SINT:
"{f32, f64} x { i32, i16 }"
This patch simplifies the code to handle:
"{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }"
Differential Revision: https://reviews.llvm.org/D46102
llvm-svn: 331778
- Report error for invalid dpp_ctrl values.
- Changed the way it is reported, now the error will be emitted into
asm and will work with release build as well.
- Added dpp_ctrl value verifier for codegen.
- Added symbolic constants for dpp_ctrl.
Differential Revision: https://reviews.llvm.org/D46565
llvm-svn: 331775
Introduced a new pattern for matching splat.d explicitly.
Both splat.d and splati.d can now be generated from the @llvm.mips.splat.d
intrinsic depending on whether an immediate value has been passed.
Differential Revision: https://reviews.llvm.org/D45683
llvm-svn: 331771
This fixes a couple of BtVer2 missing instructions that weren't been handled in the override.
NOTE: There are still a lot of overrides that still need cleaning up!
llvm-svn: 331770
I've created the necessary classes but there are still a lot of overrides that need cleaning up.
NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767
Patch https://reviews.llvm.org/D41445 changed the behaviour of 'isReg()'
to also return 'true' if the parsed register operand is a vector
register. Code in the AsmMatcher checks if a register is a subclass of the
expected register class. However, even though both parsed registers map
to the same physical register, the 'v' register is of kind 'NeonVector',
where 'q' is of type Scalar, where isSubclass() does not distinguish
between the two cases.
The solution is to use an AsmOperand instead of the register directly,
and use the PredicateMethod to distinguish the two operands.
This fixes for example:
ldr v0, [x0] // 'v0' is an invalid operand for this instruction
ldr q0, [x0] // valid
Reviewers: aemerson, Gerolf, SjoerdMeijer, javed.absar
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46310
llvm-svn: 331755
This is a fix for PR30290: by marking all byval stack slots as being aliased,
the instruction scheduler is more conservative about rescheduling memory
accesses to such stack slots as an LLVM Value* might alias it. This fixes
errors such as in the patched test case, where reads and writes to a data
structure are illegally mixed.
This could be fixed better in the future with better analysis for the
instruction scheduler to know what Values alias what stack slots.
Differential Revision: https://reviews.llvm.org/D45022
llvm-svn: 331749
This patch adds a shadow stack fix when compiling
setjmp/longjmp with the shadow stack enabled. This
allows setjmp/longjmp to work correctly with CET.
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46181
llvm-svn: 331748
The code was previously relying on there being a null terminator
somewhere in (or after) the string table, something made less likely by
r330786.
Differential Revision: https://reviews.llvm.org/D46527
llvm-svn: 331746
Summary:
and use the -msgx flag as a requirement
for the SGX instructions.
Reviewers: craig.topper, zvi
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D46436
llvm-svn: 331742
Summary:
In formLCSSAForInstructions we speculatively add new PHI
nodes, that sometimes ends up without having any uses. It
has been discovered that sometimes an added PHI node can
appear as being unused in one iteration of the Worklist,
although it can end up being used by a PHI node added in
a later iteration. We now check, a second time, that the
PHI node still is unused before we remove it. This avoids
an assert about "Trying to remove a phi with uses." for the
added test case.
Reviewers: davide, mzolotukhin, mattd, dberlin
Reviewed By: mzolotukhin, dberlin
Subscribers: dberlin, mzolotukhin, davide, bjope, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D46422
llvm-svn: 331741
Making sure we don't truncate / extend pointers, don't try to change
vector topology or bitcast vectors to scalars or back, and most
importantly, don't extend to a smaller type or truncate to a large
one.
Reviewers: qcolombet t.p.northover aditya_nandakumar
Reviewed By: qcolombet
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46490
llvm-svn: 331718
Summary: Makes this consistent with the old PM.
Reviewers: eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D46526
llvm-svn: 331709
MCSymbol has getIndex/setIndex which are implementation defined
and on other platforms are used to store the symbol table
index. It makes sense to use this rather than invent a new
mapping.
Differential Revision: https://reviews.llvm.org/D46555
llvm-svn: 331705
This change causes us to re-run tablegen for every single target on
every single build. This is much, much worse than the problem being
fixed AFAICT.
On my system, it makes a clean rebuild of `llc` with nothing changed go
from .5s to over 8s. On systems with less parallelism, slower file
systems, or high process startup overhead this will be even more
extreme.
The only way I see this could be a win is in clean builds where we churn
the filesystem. But I think incremental rebuild is more important, and
so if we want to re-instate this, it needs to be done in a way that
doesn't trigger constant re-runs of tablegen.
llvm-svn: 331702
Every generic machine instruction must have generic virtual registers
only, that is, have a low-level type attached to each operand.
Previously MachineVerifier would catch a type missing on an operand
only if the previous operand for the the same type index exists and
have a type attached to it and it will report it as a type mismatch.
This is incosistent behaviour and a misleading error message.
This commit makes sure MachineVerifier explicitly checks that the
types are there for every operand and if not provides a
straightforward error message.
Reviewers: qcolombet t.p.northover bogner ab
Reviewed By: qcolombet
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46455
llvm-svn: 331694
This is an NFC pre-commit for the following "Checking that generic
instrs have LLTs on all vregs" commit.
This overloads MachineOperand::print to make it possible to print LLTs
with standalone machine operands.
This also overloads MachineVerifier::print(...MachineOperand...) with
an optional LLT using the newly introduced MachineOperand::print
variant; no actual calls added.
This also refactors MachineVerifier::visitMachineInstrBefore in the
parts dealing with all generic instructions (checking Selected
property, LLTs, and phys regs).
llvm-svn: 331693
Summary:
Split off from D46031.
The previous patch, D46493, completely disabled unfolding in case of immediates.
But we can do better:
{F6120274} {F6120277}
https://rise4fun.com/Alive/xJS
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D46494
llvm-svn: 331685
Summary:
Split off from D46031.
In masked merge case, this degrades IPC by decreasing instruction count.
{F6108777}
The next patch should be able to recover and improve this.
This also affects the transform @spatel have added in D27489 / rL289738,
and the test coverage for X86 was missing.
But after i have added it, and looked at the changes in MCA, i'm somewhat confused.
{F6093591} {F6093592} {F6093593}
I'd say this regression is an improvement, since `IPC` increased in that case?
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: andreadb, llvm-commits, spatel
Differential Revision: https://reviews.llvm.org/D46493
llvm-svn: 331684
to be needed: jump table sections are created with .cfi.jumptable suffix. With
this change each jump table is placed in a separate section, which allows the
linker to re-order them.
Differential Revision: https://reviews.llvm.org/D46537
llvm-svn: 331680
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.
Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.
llvm-svn: 331672
Summary:
getNode optimizes (ext (trunc x)) to x and the dbgvalue node on trunc is lost. The fix calls transferDbgValues to add the dbgvalue to x.
Add DebugInfo/AArch64/dbg-value-i16.ll
Patch by Sejong Oh!
Reviewers: aprantl, javed.absar, llvm-commits, vsk
Reviewed By: aprantl, vsk
Subscribers: kristof.beyls, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46348
llvm-svn: 331665
Updated wasm section symbols names to match section name, and ensure all
referenced sections will have a symbol (per DWARF spec v3, Figure 43)
Patch by Yury Delendik!
Differential Revision: https://reviews.llvm.org/D46543
llvm-svn: 331664
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.
Differential Revision: https://reviews.llvm.org/D46229
llvm-svn: 331659
Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained
and getting crufty
Differential Revision: https://reviews.llvm.org/D46448
llvm-svn: 331641
Instead of enabling it for non NDEBUG builds, use -verify-cfiinstrs to
run verifier in CFIInstrInserter. It defaults to false.
Differential Revision: https://reviews.llvm.org/D46444
llvm-svn: 331635
Summary:
Previously, all DS ops forced WQM in a pixel shader. That was a hack to
allow for graphics frontends using ds_swizzle to implement explicit
derivatives, on SI/CI at least where DPP is not available. But it forced
WQM for _any_ DS op.
With this commit, DS ops no longer force WQM. Both graphics frontends
(Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm
intrinsic call when calculating explicit derivatives.
The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for
explicit derivatives".
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46051
Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81
llvm-svn: 331633
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.
WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.
This removes all InstrRW overrides for these instructions.
NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.
NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629