Summary:
A store to an object whose lifetime is about to end can be removed.
See PR40550 for motivation.
Reviewers: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57541
llvm-svn: 354244
The pattern-matching from rL354221 / rL354224 doesn't cover
these, so we're up to 8 different commuted possibilities.
There may still be 1 more variant of this pattern.
llvm-svn: 354236
No need for a separate stack slot. The lifetimes don't overlap.
Also fix the MachinePointerInfo for the final load after the integer conversion to indicate it came from the stack slot.
llvm-svn: 354234
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
(The matching here is incomplete. Trying to take minimal steps
to make sure we don't induce infinite looping from existing
canonicalizations of the 'select'.)
llvm-svn: 354221
No need to manually split everything. We can let the type legalizer work for us.
The test change seems to be caused by some DAG ordering issue that was previously circumventing a one use check in LowerSELECT where FP selects are turned into blends if the setcc has one use. But it was running after an integer select and the same setcc had been legalized to cmov and X86SISD::CMP. This dropped the use count of the setcc, but wasn't what was intended.
llvm-svn: 354197
Summary:
This change fixes the `-no-llvm-bc` flag to work with object files within
archives. Currently the `-no-llvm-bc` flag works for regular object files, but
not static libraries, where it continues to show bitcode symbol info.
Original support was added in D4371.
Reviewers: compnerd, smeenai, pcc
Reviewed By: compnerd
Subscribers: rupprecht, keith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D48798
llvm-svn: 354196
Preventing the load fold won't fix the partial register update since the
input we can fold is a GPR. So it will do nothing to prevent a false dependency
on an XMM register.
llvm-svn: 354193
When we need to do an fp->int conversion using x87 instructions, we need to temporarily change the rounding mode to 0b11 and perform a store. To do this we save the old value of the fpcw to the stack, then set the fpcw to 0xc7f, do the store, then restore fpcw. But the 0xc7f value forces the exception mask bits 1. While this is what they would be in the default FP environment, as we move to support changing the FP environments, we shouldn't make this assumption.
This patch changes the code to explicitly OR 0xc00 with the old value so that only the rounding mode is changed. Unfortunately, this requires two stack temporaries instead of one. One to hold the old value and one to hold the new value. Without two stack temporaries we would need an additional GPR. We already need one to do the OR operation in. This is similar to what gcc and icc do for this operation. Though they are both better at reusing the stack temporaries when there are multiple truncates in a function(or at least in a basic block)
Differential Revision: https://reviews.llvm.org/D57788
llvm-svn: 354178
Implement two more transforms of atomicrmw:
1) We can convert an atomicrmw which produces a known value in memory into an xchg instead.
2) We can convert an atomicrmw xchg w/o users into a store for some orderings.
Differential Revision: https://reviews.llvm.org/D58290
llvm-svn: 354170
It seems there were some problem with using a .mir test. For some reason
doing '-stop-before=codegenprepare' and then '-start-before=codegenprepare'
on the output .mir file results in the NoVRegs Property after instruction
selection.
Recommitting the same test as an .ll file instead.
llvm-svn: 354160
If a lifetime.end marker occurs along one path through the extraction
region, but not another, then it's still incorrect to lift the marker,
because there is some path through the extracted function which would
ordinarily not reach the marker. If the call to the extracted function
is in a loop, unrolling can cause inputs to the function to become
optimized out as undef after the first iteration.
To prevent incorrect stack slot merging in the calling function, it
should be sufficient to lift lifetime.start markers for region inputs.
I've tested this theory out by doing a stage2 check-all with randomized
splitting enabled.
This is a follow-up to r353973, and there's additional context for this
change in https://reviews.llvm.org/D57834.
rdar://47896986
Differential Revision: https://reviews.llvm.org/D58253
llvm-svn: 354159
With or without PGO data applied, splitting early in the pipeline
(either before the inliner or shortly after it) regresses performance
across SPEC variants. The cause appears to be that splitting hides
context for subsequent optimizations.
Schedule splitting late again, in effect reversing r352080, which
scheduled the splitting pass early for code size benefits (documented in
https://reviews.llvm.org/D57082).
Differential Revision: https://reviews.llvm.org/D58258
llvm-svn: 354158
Testing based on the total size of the elements failed to catch a few
invalid scenarios, so explicitly check the number of elements/operands
and types.
This failed to catch situations like
<4 x s16> = G_BUILD_VECTOR s32, s32 since the total size added
up. This also would fail to catch an implicit conversion between
pointers and scalars.
llvm-svn: 354139
The Verifier is separate from the MachineVerifier, so move it to a
different directory. Some other verifier tests were scattered in
target codegen tests as well (although I'm sure I missed some). Work
towards using a more consistent naming scheme to make it clearer where
the gaps still are for generic instructions.
llvm-svn: 354138
Constant hoisting may have hidden a constant behind a bitcast so that
it isn't folded into its users. However, this prevents BPI from
calculating some of its heuristics that are based upon constant
values. So, I've added a simple helper function to look through these
casts.
Differential Revision: https://reviews.llvm.org/D58166
llvm-svn: 354119
As detailed on PR40730, we are not correctly filling in the lane shuffle mask (D53148/rL344446) - we fill in for the correct src lane but don't add it to the correct mask element, so any reference to the correct element is likely to see an UNDEF mask index.
This allows constant folding to propagate UNDEFs prior to the lane mask being (correctly) lowered to vperm2f128.
This patch fixes the issue by fully populating the lane shuffle mask - this is more than is necessary (if we only filled in the required mask elements we might be able to match other shuffle instructions - broadcasts etc.), but its the most cautious approach as this needs to be cherrypicked into the 8.0.0 release branch.
Differential Revision: https://reviews.llvm.org/D58237
llvm-svn: 354117
This patch also introduces the emitAuipcInstPair helper, which is then used
for both emitLoadAddress and emitLoadLocalAddress.
Differential Revision: https://reviews.llvm.org/D55325
Patch by James Clarke.
llvm-svn: 354111
ConvertTruncs is used to replace a trunc for an AND mask, however
this function wasn't working as expected. By performing the change
later, we can create a wide type integer mask instead of a narrow -1
value, which could then be simply removed (incorrectly). Because we
now perform this action later, it's necessary to cache the trunc type
before we perform the promotion.
Differential Revision: https://reviews.llvm.org/D57686
llvm-svn: 354108
Summary:
llvm-symbolizer would originally report symbols that belonged to an invalid object file section.
Specifically the case where: `*Symbol.getSection() == ObjFile.section_end()`
This patch prevents the Symbolizer from collecting symbols that belong to invalid sections.
The test (from PR40591) introduces a case where two symbols have address 0,
one symbol is defined, 'foo', and the other is not defined, 'bar'. This patch will cause
the Symbolizer to keep 'foo' and ignore 'bar'.
As a side note, the logic for adding symbols to the Symbolizer's store
(`SymbolizableObjectFile::addSymbol`) replaces symbols with the
same <address, size> pair. At some point that logic should be revisited as in the
aforementioned case, 'bar' was overwriting 'foo' in the Symbolizer's store,
and 'foo' was forgotten.
This fixes PR40591
Reviewers: jhenderson, rupprecht
Reviewed By: rupprecht
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58146
llvm-svn: 354083
These haven't been checking anything useful and have been testing the
wrong failure reason for many years. Replace them with something which
stresses what is actually implemented in the verifier now.
llvm-svn: 354070
This is basically a pointer typed add, so shouldn't be any different.
This was assuming everything was an SGPR, which is not true.
Also cleanup legality for GEP. I don't seem to be seeing the problem
the hack marking s64 as a legal pointer type the comment mentions.
llvm-svn: 354067
Reassociate adds to collect scalar operands in a single
instruction when possible. That will result in a scalar
add followed by vector instead of two vector adds, thus
better utilizing SALU.
Differential Revision: https://reviews.llvm.org/D58220
llvm-svn: 354066
Summary:
The changes to disable LTO unit splitting by default (r350949) and
detect inconsistently split LTO units (r350948) are causing some crashes
when the inconsistency is detected in multiple threads simultaneously.
Fix that by having the code always look for the inconsistently split
LTO units during the thin link, by checking for the presence of type
tests recorded in the summaries.
Modify test added in r350948 to remove single threading required to fix
a bot failure due to this issue (and some debugging options added in the
process of diagnosing it).
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57561
llvm-svn: 354062
For "idempotent" atomicrmw instructions which we can't simply turn into load, canonicalize the operation and constant. This reduces the matching needed elsewhere in the optimizer, but doesn't directly impact codegen.
For any architecture where OR/Zero is not a good default choice, you can extend the AtomicExpand lowerIdempotentRMWIntoFencedLoad mechanism. I reviewed X86 to make sure this works well, haven't audited other backends.
Differential Revision: https://reviews.llvm.org/D58244
llvm-svn: 354058
Expand on Quentin's r353471 patch which converts some atomicrmws into loads. Handle remaining operation types, and fix a slight bug. Atomic loads are required to have alignment. Since this was within the InstCombine fixed point, somewhere else in InstCombine was adding alignment before the verifier saw it, but still, we should fix.
Terminology wise, I'm using the "idempotent" naming that is used for the same operations in AtomicExpand and X86ISelLoweringInfo. Once this lands, I'll add similar tests for AtomicExpand, and move the pattern match function to a common location. In the review, there was seemingly consensus that "idempotent" was slightly incorrect for this context. Once we setle on a better name, I'll update all uses at once.
Differential Revision: https://reviews.llvm.org/D58242
llvm-svn: 354046
Summary:
GNU ar has a `P` modifier that changes filename comparisons to use full paths instead of the basename. As noted in the GNU docs, regular archives are not created with full path names, so P is used to deal with archives created by other archive programs (e.g. see the updated `absolute-paths.test` test case).
Since thin archives use full path names -- paths are relative to the archive -- it seems very error prone to not imply P when dealing with thin archives, so P is implied in those cases. (I think this is a deviation from GNU ar that makes sense).
This fixes PR37436 via https://github.com/ClangBuiltLinux/linux/issues/33.
Reviewers: mstorsjo, pcc, ruiu, davide, david2050, rnk
Subscribers: tpimh, llvm-commits, nickdesaulniers
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57927
llvm-svn: 354044
Summary:
In r353537 we now copy all metadata to the new function, with the old
being removed when the old function is eliminated. In some cases the old
function is dropped to a declaration (seems to only occur with the old
PM). Go ahead and clear all metadata from the old function to handle that
case, since verification will complain otherwise. This is consistent
with what was being done for debug metadata before r353537.
Reviewers: davidxl, uabelho
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58215
llvm-svn: 354032
The test case requires the peeled loop to be forgotten after peeling,
even though it does not have a parent. When called via the unroller,
SE->forgetTopmostLoop is also called, so the test case would also pass
without any SCEV invalidation, but peelLoop is exposed as utility
function. Also, in the test case, simplifyLoop will make changes,
removing the loop from SCEV, but it is better to not rely on this
behavior.
Reviewers: sanjoy, mkazantsev
Reviewed By: mkazantsev
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58192
llvm-svn: 354031
Select G_BR and G_BRCOND for MIPS32.
Unconditional branch G_BR does not have register operand,
for that reason we only add tests.
Since conditional branch G_BRCOND compares register to zero on MIPS32,
explicit extension must be performed on i1 condition in order to set
high bits to appropriate value.
Differential Revision: https://reviews.llvm.org/D58182
llvm-svn: 354022
The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can
be used to optimise away a CMP. In the rare case that both are found and not
ruled-out as valid, we could end up setting the flags on the wrong one.
Instead make sure we are using SubAdd if it exists, as it will be closer to the
CMP.
The testcase here is a little theoretical, with a dead def of cpsr. It should
hopefully show the point.
Differential Revision: https://reviews.llvm.org/D58176
llvm-svn: 354018