Summary:
This is a follow-up to rL331182. A PHI node can be split up into
several MIR PHI nodes when being selected. When there is a
dbg.value intrinsic that uses the result of such a PHI node we
need to select several DBG_VALUE instructions, with fragment
expressions, in order to do a correct selection.
Reviewers: rnk, aprantl, vsk
Reviewed By: vsk
Subscribers: mattd, llvm-commits, JDevlieghere, aprantl, gbedwell, rnk
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D46329
llvm-svn: 331337
and (or (lshr X, C), ...), 1 --> (X & C') != 0
I initially thought about implementing the minimal pattern in instcombine as mentioned here:
https://bugs.llvm.org/show_bug.cgi?id=37098#c6
...but we need to do better to catch the more general sequence from the motivating test
(more than 2 bits in the compare). And a test-suite run with statistics showed that this
pattern only happened 2 times currently. It would potentially happen more often if
reassociation worked better (D45842), but it's probably still not too frequent?
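A small Python model can sanity-check the identity behind this fold for non-negative (unsigned) values. Testing bit 0 of an OR of right-shifted copies of X is the same as testing whether any of the shifted-out bit positions is set, and the mask C' encodes exactly that set of positions. The helper names are hypothetical and are not taken from AggressiveInstCombine.

```python
from typing import Sequence


def or_of_shifted_bits(x: int, shifts: Sequence[int]) -> int:
    """Unfolded form: and (or (lshr x, c0), (lshr x, c1), ...), 1."""
    acc = 0
    for c in shifts:
        acc |= x >> c
    return acc & 1


def masked_test(x: int, shifts: Sequence[int]) -> bool:
    """Folded form: (x & C') != 0, where C' has bit c set for each shift c."""
    mask = 0
    for c in shifts:
        mask |= 1 << c
    return (x & mask) != 0


# Exhaustive check over all 8-bit values for a few shift sets.
for x in range(256):
    for shifts in ([0, 3], [1, 2, 5], [0, 4, 6, 7]):
        assert bool(or_of_shifted_bits(x, shifts)) == masked_test(x, shifts)
```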
This is small enough that I didn't see a need to create a whole new class/file within
AggressiveInstCombine. There are likely other relatively small matchers like what was
discussed in D44266 that would slide under foldUnusualPatterns() (name suggestions welcome).
We could potentially also consolidate matchers for ctpop, bswap, etc under here.
Differential Revision: https://reviews.llvm.org/D45986
llvm-svn: 331311
While running the lit tests for the most recent version of D45916
(https://reviews.llvm.org/D45916), I found that a couple tests for this pass
suddenly started segfaulting. Since the outliner wasn't actually doing anything
to the code in either of these tests I got curious.
I found that the pass doesn’t completely create the machine-level constructs
necessary to actually add a MachineFunction and MachineBasicBlock to the
module. This patch adds in those missing bits. After this, adding the
outliner before this pass won’t cause it to segfault.
You can recreate this behaviour by adding the MachineOutliner directly before
the pass and having it return false immediately.
https://reviews.llvm.org/D46330
llvm-svn: 331307
The logic for this combine is almost identical to the logic for a
(sext (sextload x)) combine.
This commit factors out the logic so it can be shared by both combines,
and corrects the SDLoc assigned in the zext version of the combine.
Prior to this patch, for the given test case, we would apply the
location associated with the udiv instruction to instructions which
perform the load.
Part of: llvm.org/PR37262
llvm-svn: 331303
Prior to this patch, for the given test case, we would apply the
location associated with the sdiv instruction to instructions which
perform the load.
Part of: llvm.org/PR37262.
Differential Revision: https://reviews.llvm.org/D46222
llvm-svn: 331302
In DAGCombiner, we try to simplify this pattern:
([s|z]ext (load ...))
Conceptually, a new extload which is created while splitting the load
should have the same debug location as the load.
Making this change affects the IROrder of the new load, causing some
test case churn.
In practice, the new location is never different from the location of
the [s|z]ext, at least not during check-llvm or a stage2 build.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46156
llvm-svn: 331301
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.
Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasible and regenerates test cases where not.
In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.
rdar://33755881, Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D45995
llvm-svn: 331300
LLVM always puts function definition DIEs at the top level, but under
some circumstances GCC does not (at least in this case with member
functions of a function-local type).
To ensure that the local type's member function doesn't appear to be
unduly inlined within the outer function, make the inline discovery
DIE parent walk stop at the first DW_TAG_subprogram.
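The parent walk can be pictured with a toy DIE tree in Python. The `Die` class and `inline_chain` function are hypothetical stand-ins, not the LLVM DWARF API, and real DIEs carry far more state. The example reproduces the GCC-style nesting described above, where a member function of a function-local type sits under the outer function's DIE.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Die:
    tag: str
    name: str
    parent: Optional["Die"] = None


def inline_chain(die: Die) -> List[str]:
    """Walk from `die` towards the root, collecting the inlining chain.

    The walk stops at the first DW_TAG_subprogram it reaches, so a member
    function nested inside an outer function's DIE is not reported as
    inlined into that outer function.
    """
    chain: List[str] = []
    cur: Optional[Die] = die
    while cur is not None:
        if cur.tag in ("DW_TAG_inlined_subroutine", "DW_TAG_subprogram"):
            chain.append(cur.name)
        if cur.tag == "DW_TAG_subprogram":
            break  # never continue past the innermost subprogram
        cur = cur.parent
    return chain


# GCC-style nesting: a function-local type whose member function is
# defined under the outer function's DIE.
cu = Die("DW_TAG_compile_unit", "cu")
outer = Die("DW_TAG_subprogram", "outer", cu)
local_type = Die("DW_TAG_structure_type", "Local", outer)
method = Die("DW_TAG_subprogram", "Local::method", local_type)
helper = Die("DW_TAG_inlined_subroutine", "helper", method)
```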
llvm-svn: 331291
This is a follow-up to r331272.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
https://reviews.llvm.org/D46290
llvm-svn: 331275
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
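The same substitution can be expressed in Python for readers who prefer not to use perl. This is an illustrative equivalent of the one-liner, not the command used to produce the patch. It removes the marker together with its trailing space and leaves words that merely start with "brief" untouched.

```python
import re

# Matches a doxygen "\brief " or "@brief " marker (including the trailing
# space), mirroring the perl substitutions s/\\brief //g and s/\@brief //g.
BRIEF_RE = re.compile(r"[\\@]brief ")


def strip_brief(text: str) -> str:
    """Remove explicit \\brief / @brief markers from comment text."""
    return BRIEF_RE.sub("", text)
```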
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
Summary:
This is a fix for PR23997.
The loop vectorizer is not preserving the inbounds property of GEPs that it creates.
This is inhibiting some optimizations. This patch preserves the inbounds property in
the case where a load/store is being fed by an inbounds GEP.
Reviewers: mkuper, javed.absar, hsaito
Reviewed By: hsaito
Subscribers: dcaballe, hsaito, llvm-commits
Differential Revision: https://reviews.llvm.org/D46191
llvm-svn: 331269
phi is on lhs of a comparison op.
For the following testcase,
L1:
%t0 = add i32 %m, 7
%t3 = icmp eq i32* %t2, null
br i1 %t3, label %L3, label %L2
L2:
%t4 = load i32, i32* %t2, align 4
br label %L3
L3:
%t5 = phi i32 [ %t0, %L1 ], [ %t4, %L2 ]
%t6 = icmp eq i32 %t0, %t5
br i1 %t6, label %L4, label %L5
We know that if we go through the path L1 --> L3, %t6 is always true. However,
currently, if the RHS of the eq comparison is a phi, JumpThreading fails to
evaluate %t6 to true. InstCombine cannot guarantee that it always
canonicalizes the phi to the left-hand side of the comparison, because of its
operand-priority ordering. The patch handles the case when the RHS of the
comparison op is a phi.
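A simplified Python model of the idea: when the phi sits on the right-hand side, swap the operands together with the predicate, then evaluate the comparison once per incoming edge. The data structures and function names are hypothetical, and JumpThreading's real code works on LLVM IR rather than on dictionaries.

```python
from typing import Dict, Optional, Union

# A value is either an integer constant or the name of an SSA value.
Value = Union[int, str]

# icmp predicates and their operand-swapped forms: (a P b) == (b SWAPPED[P] a).
SWAPPED = {"eq": "eq", "ne": "ne", "slt": "sgt", "sgt": "slt"}


def fold_icmp(pred: str, lhs: Value, rhs: Value) -> Optional[bool]:
    """Return the comparison result if it is known, else None."""
    if lhs == rhs:
        # Identical SSA values (or equal constants): of these predicates,
        # only eq holds.
        return pred == "eq"
    if isinstance(lhs, int) and isinstance(rhs, int):
        return {"eq": lhs == rhs, "ne": lhs != rhs,
                "slt": lhs < rhs, "sgt": lhs > rhs}[pred]
    return None


def is_phi(v: Value, phis: Dict[str, Dict[str, Value]]) -> bool:
    return isinstance(v, str) and v in phis


def known_in_predecessors(pred: str, lhs: Value, rhs: Value,
                          phis: Dict[str, Dict[str, Value]]) -> Dict[str, Optional[bool]]:
    """Evaluate `icmp pred lhs, rhs` per predecessor when one operand is a phi.

    `phis` maps a phi name to {predecessor block: incoming value}.
    """
    if is_phi(rhs, phis) and not is_phi(lhs, phis):
        # Phi on the RHS: swap the operands and the predicate.
        pred, lhs, rhs = SWAPPED[pred], rhs, lhs
    if not is_phi(lhs, phis):
        return {}
    return {bb: fold_icmp(pred, incoming, rhs)
            for bb, incoming in phis[lhs].items()}
```

For the test case above, `known_in_predecessors("eq", "%t0", "%t5", {"%t5": {"L1": "%t0", "L2": "%t4"}})` reports `True` for L1 and `None` (unknown) for L2.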
Differential Revision: https://reviews.llvm.org/D46275
llvm-svn: 331266
The previous version of this patch restricted the 'jal' instruction to MIPS and
microMIPSr3. microMIPS32r6 does not have this instruction and instead uses jal
as an alias for balc.
Original commit message:
> Reviewers: smaksimovic, atanasyan, abeserminji
>
> Differential Revision: https://reviews.llvm.org/D46114
>
llvm-svn: 331259
Without this change, GCC 7 raises the warning below:
control reaches end of non-void function
Reviewers: sbc100, andreadb
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D46304
llvm-svn: 331255
This patch fixes a bug introduced by revision 330778 (originally reviewed at:
https://reviews.llvm.org/D44782), where function isFrameLoadOpcode returned
the wrong number of bytes read for opcodes VMOVSSrm and VMOVSDrm.
This corrects that mistake, and extends the regression test to catch cases where
the dead stores should be removed.
Patch by Jeremy Morse.
Differential Revision: https://reviews.llvm.org/D46256
llvm-svn: 331252
The warning was (introduced in r331220):
lib/MC/WasmObjectWriter.cpp:51:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
llvm-svn: 331251
unswitch and replace it with the amazingly simple update API code.
This addresses piles of FIXMEs around the update logic here and makes
everything substantially simpler.
llvm-svn: 331247
code review.
It turns out this *is* necessary, and I read the comment on the API
correctly the first time. ;]
The `applyUpdates` routine requires that updates are "balanced". This is
in order to cleanly handle cycles like inserting, removing, and then
re-inserting the same edge. This precludes inserting the same edge
multiple times in a row as handling that would cause the insertion logic
to become *ordered* instead of *unordered* (which is what the API
provides).
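The balance requirement can be modelled in a few lines of Python: collapse each edge's operations to a net count and reject any edge whose net count exceeds one in either direction. This is a hypothetical sketch of the idea, not the DominatorTree implementation.

```python
from collections import Counter
from typing import List, Tuple

# (kind, from_block, to_block) where kind is "insert" or "delete".
Update = Tuple[str, str, str]


def legalize_updates(updates: List[Update]) -> List[Update]:
    """Collapse a batch of CFG updates to at most one net update per edge.

    A batch is "balanced" when, for every edge, insertions minus deletions
    is -1, 0 or +1; anything else is rejected.
    """
    net: Counter = Counter()
    order: List[Tuple[str, str]] = []  # first-seen order of edges
    for kind, src, dst in updates:
        edge = (src, dst)
        if edge not in net:
            order.append(edge)
        net[edge] += 1 if kind == "insert" else -1
    result: List[Update] = []
    for src, dst in order:
        count = net[(src, dst)]
        if abs(count) > 1:
            raise ValueError(f"unbalanced updates for edge {src}->{dst}")
        if count == 1:
            result.append(("insert", src, dst))
        elif count == -1:
            result.append(("delete", src, dst))
    return result
```

Insert, remove, re-insert nets out to a single insertion. Inserting the same edge twice in a row gives a net count of +2, which is the unbalanced case the assert catches.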
It happens that in this specific case nothing (other than an assert and
contract violation) goes wrong because we're never inserting and
removing the same edge. The implementation *happens* to do the right
thing to eliminate redundant insertions in that case.
But the requirement is there and there is an assert to catch it.
Somehow, after the code review I never did another asserts-clang build
testing loop-unswitch for a long time. As a consequence, I didn't notice
this despite a bunch of testing going on, but it shows up immediately
with an asserts build of clang itself.
llvm-svn: 331246
Previously for instructions like fxsave we would print "opaque ptr" as part of the memory operand. Now we print nothing.
We also no longer accept "opaque ptr" in the parser. We still accept any size to be specified for these instructions, but we may want to consider only parsing when no explicit size is specified. This is what gas does.
llvm-svn: 331243
This appears to have some issues associated with the file directive output
causing multiple global symbols with the name "file" to be emitted into a
startup section. I'm investigating more specific causes and working with the
original author.
This reverts commit r330271.
Also Revert "[DEBUGINFO, NVPTX] Add the test for the debug info of the local"
This reverts commit r330592 and the follow up of 330779 as the testcase is dependent upon r330271.
llvm-svn: 331237
This patch updates some code responsible for skipping debug info to use
BasicBlock::instructionsWithoutDebug. I think this makes things slightly
simpler and more direct.
Reviewers: aprantl, vsk, hans, danielcdh
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D46252
llvm-svn: 331221
Dead defs were being removed from the live set (in stepForward), but
registers clobbered by regmasks weren't (more specifically, they were
actually removed by removeRegsInMask, but then they were added back in).
llvm-svn: 331219
Teach AsmParser to consult the Assembler when evaluating constant
expressions. This improves the handling of preprocessor expressions
that must be resolved at parse time. This idiom can be found as
assembly-time assertion checks in source-level assemblers. Note that
this relies on the MCStreamer to keep sufficient tabs on Section /
Fragment information which the MCAsmStreamer does not. As a result the
textual output may fail where the equivalent object generation would
pass. This can most easily be resolved by folding the MCAsmStreamer
and MCObjectStreamer together which is planned for in a separate
patch.
Currently, this feature is only enabled for assembly input, keeping IR
compilation consistent between assembly and object generation.
Reviewers: echristo, rnk, probinson, espindola, peter.smith
Reviewed By: peter.smith
Subscribers: eraman, peter.smith, arichardson, jyknight, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45164
llvm-svn: 331218
This patch updates some code responsible for skipping debug info to use
BasicBlock::instructionsWithoutDebug. I think this makes things slightly
simpler and more direct.
Reviewers: aprantl, vsk, chandlerc
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D46253
llvm-svn: 331217
No need to waste space nor number MBBs differently if MF gets recreated.
Reviewers: qcolombet, stoklund, t.p.northover, bogner, javed.absar
Reviewed By: qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46078
llvm-svn: 331213
We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required.
I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future.
This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality.
Differential Revision: https://reviews.llvm.org/D46266
llvm-svn: 331208
Summary:
As discussed in D45733, we want to do this in InstCombine.
https://rise4fun.com/Alive/LGk
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: chandlerc, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D45867
llvm-svn: 331205
This provides an optimized implementation of SADDO/SSUBO/UADDO/USUBO
as well as ADDCARRY/SUBCARRY on top of the new CC implementation.
In particular, multi-word arithmetic now uses UADDO/ADDCARRY instead
of the old ADDC/ADDE logic, which means we no longer need to use
"glue" links for those instructions. This also allows making full
use of the memory-based instructions like ALSI, which couldn't be
recognized due to limitations in the DAG matcher previously.
Also, the llvm.sadd.with.overflow et al. intrinsics now expand to
directly using the ADD instructions and checking for a CC 3 result.
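The multi-word pattern can be modelled in Python with 32-bit words. The lowest word uses an unsigned add with overflow (UADDO), and every higher word uses an add-with-carry (ADDCARRY) that consumes the previous carry as an ordinary value instead of a glued flag. The word size and function names are assumptions for illustration, and this is not SystemZ backend code.

```python
from typing import List, Tuple

WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def uaddo(a: int, b: int) -> Tuple[int, int]:
    """Unsigned add with overflow: returns (sum mod 2**W, carry-out)."""
    s = a + b
    return s & MASK, s >> WORD_BITS


def addcarry(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Add with carry-in: returns (sum mod 2**W, carry-out)."""
    s = a + b + carry_in
    return s & MASK, s >> WORD_BITS


def add_multiword(x: List[int], y: List[int]) -> List[int]:
    """Add two equal-length little-endian word lists, dropping the final carry."""
    out = []
    word, carry = uaddo(x[0], y[0])  # lowest word: no carry-in
    out.append(word)
    for a, b in zip(x[1:], y[1:]):
        word, carry = addcarry(a, b, carry)  # carry flows as a value
        out.append(word)
    return out
```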
llvm-svn: 331203
Currently, an instruction setting the condition code is linked to
the instruction using the condition code via a "glue" link in the
SelectionDAG. This has a number of drawbacks; in particular, it
means the same CC cannot be used by multiple users. It also makes
it more difficult to efficiently implement SADDO et al.
This patch changes the back-end to represent CC dependencies as
normal values during SelectionDAG matching, along the lines of
how this is handled in the X86 back-end already.
In addition to the core mechanics of updating all relevant patterns,
this requires a number of additional changes:
- We now need to be able to spill/restore a CC value into a GPR
if necessary. This means providing a copyPhysReg implementation
for moves involving CC, and defining getCrossCopyRegClass.
- Since we still prefer to avoid such spills, we provide an override
for IsProfitableToFold to avoid creating a merged LOAD / ICMP if
this would result in multiple users of the CC.
- combineCCMask no longer requires a single CC user, and no longer
needs to be careful about preventing invalid glue/chain cycles.
- emitSelect needs to be more careful in marking CC live-in to
the basic block it generates. Also, we can now optimize the
case of multiple subsequent selects with the same condition
just like X86 does.
llvm-svn: 331202
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.
llvm-svn: 331201
This prevents infinite recursion in DWARFDie::findRecursively for
malformed DWARF where a DIE references itself.
This fixes PR36257.
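One way to make such a lookup terminate on self-referencing DWARF is to track the DIEs already visited. The sketch below shows this in Python with a hypothetical offset-to-DIE table. It illustrates the technique and does not claim to be how the patch implements it.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: DIE offset -> (attributes, offsets of referenced DIEs,
# e.g. via DW_AT_specification or DW_AT_abstract_origin).
DieTable = Dict[int, Tuple[Dict[str, str], List[int]]]


def find_recursively(table: DieTable, offset: int, attr: str,
                     seen: Optional[Set[int]] = None) -> Optional[str]:
    """Look up `attr` on a DIE, following references to other DIEs.

    The `seen` set guarantees termination even if a malformed DIE refers
    to itself (or DIEs form a reference cycle).
    """
    if seen is None:
        seen = set()
    if offset in seen:
        return None  # already visited: stop instead of recursing forever
    seen.add(offset)
    attrs, refs = table[offset]
    if attr in attrs:
        return attrs[attr]
    for ref in refs:
        value = find_recursively(table, ref, attr, seen)
        if value is not None:
            return value
    return None
```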
Differential revision: https://reviews.llvm.org/D43092
llvm-svn: 331200
In patterns where we need to specify a result VT, prefer
[(set (tr.vt tr.op:$V1), (operator ...))]
over
[(set tr.op:$V1, (tr.vt (operator ...)))]
This is NFC now, but simplifies some future changes.
llvm-svn: 331192
If we have LOCR instructions, select them directly from SelectionDAG
instead of first going through a pseudo instruction and then using
the custom inserter to emit the LOCR.
Provide Select pseudo-instructions for VR32/VR64 if we have vector
instructions, to avoid having to go through the first 16 FPRs
unnecessarily.
If we do not have LOCFHR, prefer using LOCR followed by a move
over a conditional branch.
llvm-svn: 331191
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
    f = f.strip()
    fl = open(f).readlines()
    found = False
    for i in xrange(len(fl)):
        p = '#include "llvm/'
        if not fl[i].startswith(p):
            continue
        if fl[i][len(p):] > 'Config':
            fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
            found = True
            break
    if not found:
        print 'not found', f
    else:
        open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
Summary:
This patch introduces copying of DBG_VALUE instructions
from an otherwise empty basic block to predecessor/successor
blocks in case the empty block is eliminated/bypassed. It
is currently only done in one identified situation in the
BranchFolding pass, before optimizing an empty block.
It can be seen as a light variant of the propagation done
by the LiveDebugValues pass, which unfortunately is executed
after the BranchFolding pass.
We only propagate (copy) DBG_VALUE instructions in a limited
number of situations:
a) If the empty BB is the only predecessor of a successor
we can copy the DBG_VALUE instruction to the beginning of
the successor (because the DBG_VALUE instruction is always
part of the flow between the blocks).
b) If the empty BB is the only successor of a predecessor
we can copy the DBG_VALUE instruction to the end of the
predecessor (because the DBG_VALUE instruction is always
part of the flow between the blocks). In this case we add
the DBG_VALUE just before the first terminator (assuming
that the terminators do not impact the DBG_VALUE).
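Rules a) and b) can be expressed as a small Python function over a hypothetical CFG given as predecessor/successor maps. It only decides where copies may go and does not model the DBG_VALUE instructions themselves; the function name and position labels are illustrative.

```python
from typing import Dict, List, Tuple


def plan_dbg_value_copies(empty: str,
                          preds: Dict[str, List[str]],
                          succs: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (block, position) pairs where DBG_VALUEs of `empty` may be copied."""
    plan: List[Tuple[str, str]] = []
    # Rule a): `empty` is the only predecessor of a successor, so copy to
    # the beginning of that successor.
    for succ in succs.get(empty, []):
        if preds.get(succ, []) == [empty]:
            plan.append((succ, "begin"))
    # Rule b): `empty` is the only successor of a predecessor, so copy to
    # the end of that predecessor, just before its first terminator.
    for pred in preds.get(empty, []):
        if succs.get(pred, []) == [empty]:
            plan.append((pred, "before-first-terminator"))
    return plan
```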
A future solution, to handle more situations, could perhaps
be to run the LiveDebugValues pass before branch folding?
This fix is related to PR37234. It is expected to resolve
the problem seen, when applied together with the fix in
SelectionDAG from here: https://reviews.llvm.org/D46129
Reviewers: #debug-info, aprantl, rnk
Reviewed By: #debug-info, aprantl
Subscribers: ormris, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D46184
llvm-svn: 331183
Summary:
When building the selection DAG at ISel all PHI nodes are
selected and lowered to Machine Instruction PHI nodes before
we start to create any SDNodes. So there are no SDNodes for
values produced by the PHI nodes.
In the past when selecting a dbg.value intrinsic that uses
the value produced by a PHI node we have been handling such
dbg.value intrinsics as "dangling debug info". I.e. we have
not created a SDDbgValue node directly, because there is
no existing SDNode for the PHI result, instead we deferred
the creation of a SDDbgValue until we found the first use
of the PHI result.
The old solution had a couple of flaws. The position of the
selected DBG_VALUE instruction would end up quite late in a
basic block, and for example not directly after the PHI node
as in the LLVM IR input. And if there was no use at all
in the basic block the dbg.value could be dropped completely.
This patch introduces a new VREG kind of SDDbgValue nodes.
It is similar to a SDNODE kind of node, but it refers directly
to a virtual register and not a SDNode. When we do selection
for a dbg.value that is using the result of a PHI node we
can do a lookup of the virtual register directly (as it already
is determined for the PHI node) and create a SDDbgValue node
immediately instead of delaying the selection until we find a
use.
This should fix a problem with losing debug info at ISel
as seen in PR37234 (https://bugs.llvm.org/show_bug.cgi?id=37234).
It does not resolve PR37234 completely, because the debug info
is dropped later on in the BranchFolder (see D46184).
Reviewers: #debug-info, aprantl
Reviewed By: #debug-info, aprantl
Subscribers: rnk, gbedwell, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46129
llvm-svn: 331182
This patch updates some code responsible for skipping debug info to use
BasicBlock::instructionsWithoutDebug. I think this makes things
slightly simpler and more direct.
Reviewers: mkuper, rengolin, dcaballe, aprantl, vsk
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D46254
llvm-svn: 331174
The PMAXSD/PMINSD instregexs had been written as PMAX(C?)SD - looks like this was a search+replace error when matching float MAXSD/MINSD commutative instructions.
llvm-svn: 331167
Previously these instructions were unselectable and instead were generated
through the instruction mapping tables.
Reviewers: atanasyan, smaksimovic, abeserminji
Differential Revision: https://reviews.llvm.org/D46055
llvm-svn: 331165
This patch extends the 'isSVEVectorRegWithShiftExtend' function to
improve diagnostics for SVE's gather load (scalar + vector) addressing
modes. Instead of always suggesting the 'unscaled' addressing mode,
the use of DiagnosticPredicate enables a more specific error message
in the context where the scaling is incorrect. For example:
ld1h z0.d, p0/z, [x0, z0.d, lsl #2]
^
shift amount should be '1'
Instead of suggesting the packed, unscaled addressing mode:
expected 'z[0..31].d, (uxtw|sxtw)'
the assembler now suggests using the proper scaling:
expected 'z[0..31].d, (lsl|uxtw|sxtw) #1'
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D46124
llvm-svn: 331162
The instructions have predicates of Not64BitMode, but there are identical strings in InstAliases that have Mode32Bit and Mode16Bit. But the ordering is uncontrolled and the less specific Not64BitMode was ordered first.
This patch hides the Not64BitMode from the table so there is no conflict anymore.
llvm-svn: 331158
These aliases are used to default the memory forms of call and jmp to the size of the operating mode. This doesn't work for Intel syntax. We have a different hack in the AsmParser code itself to force a size on unsized memory operands.
llvm-svn: 331153
Most of the add<operandname>Operands() functions are the same
and can be replaced by using a single 'RenderMethod' in
the AArch64InstrFormats.td file. Since many of the scaled
immediates (with different scaling/bits) are the same, most of
these can reuse the same AsmOperandClass.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D46122
llvm-svn: 331146
Summary:
This is a follow up to D45420 (included here since it is still under review and this change is dependent on that) and D45072 (committed).
Actual change for this patch is LoopVectorize* and cmakefile. All others are all from D45420.
LoopVectorizationLegality is an analysis and thus really belongs to the Analysis tree. It is modular enough and it is reusable enough ---- we can further improve those aspects once uses outside of LV pick up.
Hopefully, this will make it easier for people familiar with vectorization theory, but not necessarily LV itself to contribute, by lowering the volume of code they should deal with. We probably should start adding some code in LV to check its own capability (i.e., vectorization is legal but LV is not ready to handle it) and then bail out.
Reviewers: rengolin, fhahn, hfinkel, mkuper, aemerson, mssimpso, dcaballe, sguggill
Reviewed By: rengolin, dcaballe
Subscribers: egarcia, rogfer01, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45552
llvm-svn: 331139
This allows the instruction selection to follow mode in Intel syntax. And allows a suffix to be used to change size.
This matches gas behavior from what I could tell.
llvm-svn: 331138
It doesn't really exist. The instruction always writes 16-bits of memory. Putting a REX.w on it won't change anything.
While I was touching the encoding tests to remove it, I added some other missing register form test cases.
llvm-svn: 331135
LLVM_ON_WIN32 is set in HandleLLVMOptions.cmake exactly for MSVC and MinGW
(but not Cygwin), which are exactly the configurations where _WIN32 is
defined too. Just use the default macro instead of a reinvented one.
See thread "Replacing LLVM_ON_WIN32 with just _WIN32" on llvm-dev and cfe-dev.
No intended behavior change.
This moves over all uses of the macro, but doesn't remove the definition
of it in (llvm-)config.h yet.
llvm-svn: 331127
Summary: Add bindings to create import declarations for modules, functions, types, and other entities. This wraps the conveniences available in the existing DIBuilder API, but these seem C++-specific.
Reviewers: whitequark, harlanhaskins, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46167
llvm-svn: 331123
Many of these aliases exist to give one syntax or the other a slightly different mnemonic, and the other variant gets a duplicate of its normal mnemonic.
This patch restricts a lot of these to only one variant so we don't get the duplication.
This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.
llvm-svn: 331117
Summary:
Previously, an extending load was represented as (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
improve optimization of extends and truncates, this legality requirement
would spread without considerable care w.r.t when certain combines were
permitted.
* The SelectionDAG importer required some ugly and fragile pattern
rewriting to translate patterns into this style.
This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.
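The semantics of the three load flavours can be modelled in Python, assuming a little-endian byte layout. `load_ext` is hypothetical and is not a GlobalISel API. For the any-extending G_LOAD the high bits are unspecified, so the sketch zero-fills them as one valid choice.

```python
def load_ext(memory: bytes, addr: int, mem_bytes: int,
             result_bits: int, kind: str) -> int:
    """Load mem_bytes little-endian bytes and extend them to result_bits.

    kind: "sext" (G_SEXTLOAD), "zext" (G_ZEXTLOAD) or "any" (a G_LOAD whose
    memory size is smaller than its result type).
    Returns the result as an unsigned bit pattern of width result_bits.
    """
    if mem_bytes * 8 > result_bits:
        raise ValueError("memory size exceeds result type")
    raw = memory[addr:addr + mem_bytes]
    if kind == "sext":
        value = int.from_bytes(raw, "little", signed=True)
    elif kind in ("zext", "any"):
        # For "any" the high bits are unspecified; zero-filling them is
        # one valid choice.
        value = int.from_bytes(raw, "little", signed=False)
    else:
        raise ValueError(f"unknown extension kind: {kind}")
    return value & ((1 << result_bits) - 1)
```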
This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.
Depends on D45466
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45540
llvm-svn: 331115
Summary:
* rL328953 does not include bindings for LLVMDIBuilderCreateClassType and LLVMDIBuilderCreateBitFieldMemberType despite declaring their prototypes. Provide these bindings now.
* Switch to more precise types with specific numeric limits matching the DIBuilder's C++ API.
Reviewers: harlanhaskins, whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46168
llvm-svn: 331114
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
so we should canonicalize the pattern to the non-inverted mask.
https://rise4fun.com/Alive/Yol
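The equivalence can also be checked exhaustively over 4-bit values in Python. This is a spot check, while the Alive proof linked above covers all widths. Both forms select bits of x where M is set and bits of y where M is clear.

```python
def masked_merge(x: int, y: int, m: int) -> int:
    """Canonical form: bits of x where m is 1, bits of y where m is 0."""
    return ((x ^ y) & m) ^ y


def masked_merge_inverted(x: int, y: int, m: int) -> int:
    """Inverted-mask form: ((x ^ y) & ~m) ^ x.

    Python's ~ on non-negative ints acts like an infinitely wide two's
    complement NOT, so no explicit width mask is needed here.
    """
    return ((x ^ y) & ~m) ^ x


# Exhaustive check over all 4-bit inputs against a plain bitwise select.
for x in range(16):
    for y in range(16):
        for m in range(16):
            select = (x & m) | (y & ~m)
            assert masked_merge(x, y, m) == select
            assert masked_merge_inverted(x, y, m) == select
```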
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45664
llvm-svn: 331112
These instructions don't use their memory operands as normal memory operands. They're just used as addresses. They don't have a size because they aren't directly representing a load or store.
llvm-svn: 331104
Favor the 0x1a encoding for register/register move to match gas.
The instructions used RM and MR in their name along with rr/rm/mr at the end. To make them more consistent with other instructions, remove the RM/MR and use rr/rm/mr/rr_REV.
Hide the _REV encoding from the assembler but leave it for the disassembler.
llvm-svn: 331101
The invocation of getExact in ScalarEvolution::getBackedgeTakenInfo is used
only for gathering statistics and for an assert.
Even if statistics are disabled and the code related to them is eliminated,
the invocation of getExact itself will not be eliminated,
because it may have side effects such as the creation of new SCEVs.
So only perform the invocation when we collect statistics or execute asserts.
Reviewers: mkazantsev, sanjoy, javed.absar
Reviewed By: javed.absar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46178
llvm-svn: 331099
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.
Outlined calls shouldn't break liveness assumptions that someone might make.
This also un-XFAILs the noredzone test, and updates the calls test.
llvm-svn: 331095
The effect of doing so is not disrupting the LoopPassManager when mixing this pass with other loop passes. This should help locality of access substantially and avoids the cost of computing PostDom.
The assumption here is that the full GuardWidening (which does use PostDom) is run as a canonicalization before loop opts and that this version is just catching cases exposed by other loop passes (e.g. LoopPredication, IndVarSimplify, LoopUnswitch, etc.).
llvm-svn: 331094
Summary:
D42479 (rL329525) enabled SDIV combine for pow2 non-splat vector
divisors. But when there is a 1 in a vector, the instruction sequence to
be generated involves shifting a value by its bit width,
which is undefined
(c64f4dbfe3/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L6000-L6006)).
In particular, on architectures that do not support vector instructions,
each element of the vector is computed separately using scalar
operations, and the resulting value is undef for the lanes whose
divisor is 1.
(An all-1s vector is fine; only vectors mixing 1 with other values are
affected.)
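The scalar expansion for a power-of-two divisor 2**k can be modelled in Python to show where a divisor of 1 goes wrong. The bias term needs a logical shift by (bit width - k), which for k = 0 is a shift by the full bit width. The sketch special-cases k = 0 and only illustrates the arithmetic. It does not claim to reproduce the exact node sequence DAGCombiner emits (for example, it ignores negative divisors), nor how the patch itself handles the divisor-1 lanes.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(v: int) -> int:
    """Interpret the low BITS bits of v as a two's complement integer."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v


def sra(v: int, amt: int) -> int:
    """Arithmetic shift right of a BITS-wide value."""
    if not 0 <= amt < BITS:
        raise ValueError("shift amount >= bit width is undefined")
    return (to_signed(v) >> amt) & MASK


def srl(v: int, amt: int) -> int:
    """Logical shift right of a BITS-wide value."""
    if not 0 <= amt < BITS:
        raise ValueError("shift amount >= bit width is undefined")
    return (v & MASK) >> amt


def sdiv_pow2(x: int, k: int) -> int:
    """Signed x / 2**k, rounding toward zero, via the shift-based expansion.

    Returns the BITS-wide result as an unsigned bit pattern.
    """
    if k == 0:
        # Divisor 1: the generic sequence would need srl by BITS (undefined).
        return x & MASK
    sign = sra(x, BITS - 1)     # all ones if x < 0, otherwise 0
    bias = srl(sign, BITS - k)  # 2**k - 1 if x < 0, otherwise 0
    return sra((x + bias) & MASK, k)


def trunc_div(a: int, b: int) -> int:
    """Reference signed division rounding toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q
```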
Reviewers: RKSimon, jgravelle-google
Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D46161
llvm-svn: 331092
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.
To fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a single node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.
Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.
I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.
Reviewers: chandlerc, RKSimon, spatel
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46202
llvm-svn: 331091
For local variables the first DW_OP_deref is consumed by turning the
location kind into a memory location, but that only makes sense for
values that are in a register to begin with, which cannot happen for
global variables that are attached to a symbol.
rdar://problem/39741860
This reapplies r330970 after fixing an uncovered bug in r331086 and
working around the situation caused by it.
llvm-svn: 331090
Now local value sinking only scans and numbers instructions added
between the current flush point and the last flush point. This ensures
that ISel is overall linear in the size of the BB.
Fixes PR37010 and re-enables local value sinking by default.
llvm-svn: 331087
This patch adds support for fragment expressions in
TryToShrinkGlobalToBoolean(); previously they were just dropped.
Thanks to Reid Kleckner for providing me a reproducer!
llvm-svn: 331086
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.
This works OK.
But it complicates writing commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.
This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.
The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one, because I guess all the current users are OK with the existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, I'm storing a pointer, not a reference, because for some reason that is mysterious to me
it did not work with the reference.
The first one appears trivial, too.
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ **operands**:
1. match ~~`RHS`~~ **`LHS`** matcher to the ~~`first`~~ **`second`** operand of binary operator,
2. and then match ~~`LHS`~~ **`RHS`** matcher to the ~~`second`~~ **`first`** operand of binary operator.
Surprisingly, `$ ninja check-llvm` still passes with this.
But I expect the bots will disagree.
The motivational unittest is included.
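A toy model of the two changes, using hypothetical stand-in types rather than LLVM's `PatternMatch` classes: `DeferredSpecific` plays the role of the proposed `m_c_Specific()` by holding a pointer to the variable and dereferencing it at match time, and `matchCommutative` retries by swapping the operands (not the matchers), so the LHS matcher always runs first and its binding is visible to the RHS matcher.

```cpp
#include <cassert>

// Toy IR node: leaves have no operands; binary ops have two.
struct Value {
  const Value *lhs = nullptr;
  const Value *rhs = nullptr;
};

// Like m_Value(V): binds whatever it sees into the caller's variable.
struct Bind {
  const Value *&out;
  bool match(const Value *v) const {
    out = v;
    return true;
  }
};

// Like m_Specific(V): compares against the value captured at construction.
struct Specific {
  const Value *val;
  bool match(const Value *v) const { return v == val; }
};

// Like the proposed m_c_Specific(V): reads the variable at match time, so it
// sees a binding made earlier in the same match.
struct DeferredSpecific {
  const Value *const *slot;
  bool match(const Value *v) const { return v == *slot; }
};

// Commutative matcher: on failure, swap the operands; l still runs before r.
template <typename L, typename R>
bool matchCommutative(const Value *v, const L &l, const R &r) {
  if (!v->lhs || !v->rhs)
    return false;  // leaves have no operands
  if (l.match(v->lhs) && r.match(v->rhs))
    return true;
  return l.match(v->rhs) && r.match(v->lhs);
}
```

Matching `X op X` succeeds with `Bind{x}` plus `DeferredSpecific{&x}`, but fails with `Specific{x}`, because the latter captured `x` before it was bound.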
I'd like to use this in D45664.
Reviewers: spatel, craig.topper, arsenm, RKSimon
Reviewed By: craig.topper
Subscribers: xbolva00, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45828
llvm-svn: 331085
Some of the bots were failing in a different way to the others. These were
unable to compare tuples. Fix this by changing to a struct, thereby avoiding
the quirks of tuples.
llvm-svn: 331081
We currently have a hard-to-solve analysis problem around the order of instructions within a potentially throwing block. We can't cheaply determine whether a given instruction is before the first potential throw in the block. While we're working on that in the background, special-case the first instruction within the header.
Why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice.
In a follow-on patch, I intend to extend LICM with an alternate approach which works for any instruction in the header before the first throw, but this is the best I can come up with for other users of the analysis (such as store promotion).
Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV. Using the full walk is not suitable for a general solution.
llvm-svn: 331079
Summary:
Only allow a single unique .symver alias per symbol. This matches the
behavior of gas. I noticed that we ignored multiple mismatched symver
directives while looking at https://reviews.llvm.org/D45798
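A minimal sketch of the rule, assuming that repeating the identical directive is tolerated while a second, different alias for the same symbol is rejected. `SymverTable` and `addSymver` are hypothetical names, not the MC layer's API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table recording the single .symver alias allowed per symbol.
class SymverTable {
public:
  // Records `alias` for `sym`. Returns false if `sym` already has a
  // different alias; an identical repeat is accepted.
  bool addSymver(const std::string &sym, const std::string &alias) {
    auto result = aliases_.emplace(sym, alias);
    return result.second || result.first->second == alias;
  }

private:
  std::map<std::string, std::string> aliases_;
};
```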
Reviewers: pcc, tejohnson, espindola
Reviewed By: pcc
Subscribers: emaste, arichardson, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D45845
llvm-svn: 331078
Extend the live-in check for all aliased registers so that we can
allow sinking Copy instructions when only an implicit def is in the
successor's live-in.
llvm-svn: 331072
Summary:
Currently only the memory size is supported but others can be added as
needed.
narrowScalar for G_LOAD and G_STORE now correctly updates the
MachineMemOperand and will refuse to legalize atomics, since those need more
careful expansions to maintain atomicity.
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45466
llvm-svn: 331071
The idea is to have a pass which performs the same transformation as GuardWidening, but can be run within a loop pass manager without disrupting the pass manager structure. As demonstrated by the test case, this doesn't quite get there because of issues with post dom, but it is a good step in the right direction. The motivation is purely to reduce compile time, since we can now preserve locality during the loop walk.
This patch only includes a legacy pass. A follow up will add a new style pass as well.
llvm-svn: 331060
These branches were previously unanalyzable and unselectable. Add them and
recognize how to generate their inverses.
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D46113
llvm-svn: 331050
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.
This didn't work in case no frame was needed and resulted in code size
regressions.
llvm-svn: 331044
If the MachineInstr uses a custom inserter and is then erased after
instruction selection, there is no use for mapping it to a sched class.
Review: Ulrich Weigand
llvm-svn: 331040
We currently support LCSSA PHI nodes in the outer loop exit, if their
incoming values do not come from the outer loop latch or if the
outer loop latch has a single predecessor. In that case, the outer loop latch
will be executed only if the inner loop gets executed. If we have multiple
predecessors for the outer loop latch, it may be executed even if the inner
loop does not get executed.
This is a first step to support the case described in
https://bugs.llvm.org/show_bug.cgi?id=30472
Reviewers: efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D43237
llvm-svn: 331037
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.
Differential revision: https://reviews.llvm.org/D46107
llvm-svn: 331036
Since PTX has grown a <2 x half> datatype, vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.
This is still very limited: SLP vectorization happily creates <2 x half>
if it's a legal type, but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win by reducing the number of arithmetic instructions.
I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.
Differential Revision: https://reviews.llvm.org/D46130
llvm-svn: 331035
This patch makes the compiler not fuse fmul and fadd/fsub into
fmadd/fmsub by default. Instead, the -fp-contract=fast option can
be used when such behavior is desired.
Differential Revision: https://reviews.llvm.org/D46057
llvm-svn: 331033
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.
Differential revision: https://reviews.llvm.org/D46106
llvm-svn: 331032
Back when the R52 schedule was added in rL286949, there was no way
to enable the machine scheduler for specific ARM cores. Since then a
target feature has been added. This enables the feature for R52,
removing the need to manually specify compiler flags.
llvm-svn: 331027
Summary:
The value tracking analysis uses function alignment to infer that the
least significant bits of function pointers are known to be zero.
Unfortunately, this is not correct for ARM targets: the least
significant bit of a function pointer stores the ARM/Thumb state
information (i.e., the LSB is set for Thumb functions and cleared for
ARM functions).
The original approach (https://reviews.llvm.org/D44781) introduced a
new field for function pointer alignment in the DataLayout structure
to address this. But it seems unlikely that optimizations based on
function pointer alignment would bring much benefit in practice to
justify the additional maintenance burden, so this patch simply
assumes that function pointer alignment is always unknown.
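A small illustration of why alignment-based known-bits reasoning breaks for ARM function pointers. `knownZeroLowBits` and `armFunctionPointer` are hypothetical helpers, and the 4-byte alignment and address 0x8000 are arbitrary example values.

```cpp
#include <cassert>
#include <cstdint>

// Low bits that alignment-based reasoning would treat as known zero for an
// address aligned to `align` bytes (`align` must be a power of two).
uint32_t knownZeroLowBits(uint32_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return align - 1;
}

// Hypothetical model of an ARM function pointer: bit 0 is set for Thumb
// functions and clear for ARM functions.
uint32_t armFunctionPointer(uint32_t entryAddress, bool isThumb) {
  return entryAddress | (isThumb ? 1u : 0u);
}
```

Even though the code at 0x8000 is 4-byte aligned, a pointer to it as a Thumb function is 0x8001, so "the low two bits are zero" is false for the pointer value.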
Reviewers: javed.absar, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, llvm-commits, hfinkel, rogfer01
Differential Revision: https://reviews.llvm.org/D46110
llvm-svn: 331025
Add a new umin creation method which accepts a list of operands.
SCEV does not represent umin, which is required in getExact, so
it transforms umin to umax with not. As a result, the transformation of
a tree of max nodes into a single max with several operands does not work.
We use the newly introduced method to create umin from several operands.
Reviewers: sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46047
llvm-svn: 331015
It doesn't unwind, and the wrong marking leads to the creation of an
.eh_frame section when it isn't necessary.
Differential Revision: https://reviews.llvm.org/D46082
llvm-svn: 331008
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.
I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.
Differential Revision: https://reviews.llvm.org/D46091
llvm-svn: 331007
Summary: Also test for symbols information in test/MC/WebAssembly/debug-info.ll.
Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46160
llvm-svn: 331005
Summary: If the file stream argument is not captured and its source is fopen, we can replace IO calls with unlocked IO ("_unlocked" function variants) to gain better speed.
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 331002
Summary:
Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to
X % (C0 * C1). This is a common pattern seen in code generated by the XLA
GPU backend.
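Why the fold holds, for non-negative X and assuming C0 * C1 does not overflow: with q = X / C0, we have X % C0 = X - q*C0 and (q % C1) * C0 = q*C0 - floor(q / C1)*C0*C1, so the sum is X - floor(X / (C0*C1))*C0*C1 = X % (C0*C1), using floor(floor(X/C0)/C1) = floor(X/(C0*C1)). A small sketch on unsigned values (helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: X % C0 + ((X / C0) % C1) * C0.
uint32_t remainderSum(uint32_t x, uint32_t c0, uint32_t c1) {
  return x % c0 + ((x / c0) % c1) * c0;
}

// Right-hand side: X % (C0 * C1). Only valid if C0 * C1 does not overflow.
uint32_t foldedRemainder(uint32_t x, uint32_t c0, uint32_t c1) {
  assert(c0 != 0 && c1 != 0);
  assert(static_cast<uint64_t>(c0) * c1 <= UINT32_MAX);
  return x % (c0 * c1);
}
```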
Add test cases for this new optimization.
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D45976
llvm-svn: 330992
The main goal of this change is to make it much easier to track which
rules are actually covered by Testgen'erated regression tests.
Reviewers: aemerson, dsanders
Differential Revision: https://reviews.llvm.org/D46095
llvm-svn: 330988
`lb` and `lbu` commands accept 16-bit signed offsets. But GAS accepts
larger offsets for these commands. If an offset does not fit in the 16-bit
range, the `lb` command is translated into a lui/lb or lui/addu/lb series.
Interestingly, the LLVM assembler initially supported this feature,
but it was later broken.
This patch restores support for 32-bit offsets. It replaces the `mem_simm16`
operand for `LB` and `LBu` definitions with the new `mem_simmptr` operand.
This operand is intended to check that the offset fits in the same size as
is used for pointers. Later we will be able to extend this rule to
accept 64-bit offsets when possible.
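The lui/lb expansion can be illustrated with the usual %hi/%lo split: `lb` sign-extends its 16-bit displacement, so the `lui` immediate is rounded by adding 0x8000 before taking the upper half. A minimal sketch assuming that convention; `HiLo`, `splitOffset`, and `recombine` are hypothetical helpers, not MipsAsmParser code. For 0x80000004 it yields `lui 0x8000` and displacement 4, matching the GAS output quoted below.

```cpp
#include <cassert>
#include <cstdint>

// Immediate for `lui` plus a signed 16-bit displacement for `lb`, chosen so
// that (hi << 16) + sext(lo) == offset (mod 2^32).
struct HiLo {
  uint16_t hi;  // operand for lui
  int16_t lo;   // displacement for lb
};

HiLo splitOffset(uint32_t offset) {
  HiLo r;
  // Adding 0x8000 compensates for lo being sign-extended when negative.
  r.hi = static_cast<uint16_t>((offset + 0x8000u) >> 16);
  r.lo = static_cast<int16_t>(offset & 0xFFFFu);
  return r;
}

// Recomputes the effective address the lui/lb pair would form.
uint32_t recombine(HiLo r) {
  return (static_cast<uint32_t>(r.hi) << 16) +
         static_cast<uint32_t>(static_cast<int32_t>(r.lo));
}
```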
Some issues remain:
- The regression also affects LD, SD, LH, LHU commands. I'm going
to fix them in a separate patch.
- GAS accepts any 32-bit value as an offset. Currently LLVM accepts signed
16-bit values, and this patch extends the range to signed 32-bit offsets.
In other words, the following code is accepted by GAS but still triggers
an error in LLVM:
```
lb $4, 0x80000004
# gas
lui a0, 0x8000
lb a0, 4(a0)
```
- In case of 64-bit pointers GAS accepts a 64-bit offset and translates
it to the li/dsll/lb series of commands. LLVM still rejects it.
Probably this feature has never been implemented in LLVM. This issue
is for a separate patch.
```
lb $4, 0x800000004
# gas
li a0, 0x8000
dsll a0, a0, 0x14
lb a0, 4(a0)
```
Differential Revision: https://reviews.llvm.org/D45020
llvm-svn: 330983
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.
Based on problem and patch reported by @paulwalker-arm
This is an alternative to solution proposed in D45770
Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar
Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46063
llvm-svn: 330976
For local variables the first DW_OP_deref is consumed by turning the
location kind into a memory location, but that only makes sense for
values that are in a register to begin with, which cannot happen for
global variables that are attached to a symbol.
rdar://problem/39741860
llvm-svn: 330970
Summary:
The old comment referred to llvm/IR/Writer.h, which no longer exists.
This patch replaces it with an up-to-date description of AsmWriter library.
Patch by Alex Yursha.
Reviewers: gribozavr, vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45895
llvm-svn: 330962
Summary:
Follow-up to D43690, the EliminateAvailableExternally pass currently
runs under -O0 and -O2 and up. Under -O1 we would still want to drop
available_externally symbols to reduce space without inlining having
run.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D46093
llvm-svn: 330961
Correct the definitions of ei, di, eret, deret, wait, syscall and break.
Also provide microMIPS specific aliases to match the MIPS aliases.
Additionally correct the definition of the wait instruction so that
it is present in the instruction mapping tables.
Reviewers: smaksimovic, abeserminji, atanasyan
Differential Revision: https://reviews.llvm.org/D45939
llvm-svn: 330952
As noted, the attribute name is subject to change once we have
the clang side implemented, but it's clear that we need some
kind of attribute-based predication here based on the discussion
for:
rL330437
llvm-svn: 330951
This causes some slight shuffling but no meaningful codegen differences on the
corpus I used for testing, though it has a larger impact when combined with e.g.
rematerialisation. Regardless, it makes sense to report as accurate
target-specific information as possible.
llvm-svn: 330949
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.
Differential Revision: https://reviews.llvm.org/D46116
llvm-svn: 330948
As discussed in the post-review comments for rL330437,
we need to guard this fold to allow existing code to
keep working with the undefined behavior that they've
come to rely on.
That would mean duplicating more code than we already
have, so let's fix that first.
llvm-svn: 330947
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.
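A small sketch of the masks this shuffle kind describes, following TRN1/TRN2 semantics: TRN1 pairs up the even-numbered elements of the two sources and TRN2 the odd-numbered ones. `transposeMask` is a hypothetical helper, not the TTI interface; as in LLVM shufflevector masks, indices >= n select from the second source.

```cpp
#include <cassert>
#include <vector>

// Mask for a TRN1-style (which == 0) or TRN2-style (which == 1) transpose of
// two n-element vectors. Indices < n select from the first source, indices
// >= n from the second.
std::vector<int> transposeMask(int n, int which) {
  assert(n > 0 && n % 2 == 0 && (which == 0 || which == 1));
  std::vector<int> mask(n);
  for (int i = 0; i < n; i += 2) {
    mask[i] = i + which;          // even/odd element from the first source
    mask[i + 1] = i + n + which;  // same element from the second source
  }
  return mask;
}
```

For n = 4 this gives [0, 4, 2, 6] (TRN1) and [1, 5, 3, 7] (TRN2).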
Differential Revision: https://reviews.llvm.org/D45982
llvm-svn: 330941
I'm unable to construct a representative test case that demonstrates the
advantage, but it seems sensible to report accurate target-specific
information regardless.
llvm-svn: 330938
This patch extends the PredicateMethod of AsmOperands used in SVE's
LD1 instructions with a DiagnosticPredicate. This makes them 'context
sensitive' to the operand that has been parsed and tells the user to
use the right register (with expected shift/extend), rather than telling
the immediate is out of range when it actually parsed a register.
Patch [2/2] in a series to improve assembler diagnostics for SVE:
- Patch [1/2]: https://reviews.llvm.org/D45879
- Patch [2/2]: https://reviews.llvm.org/D45880
Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D45880
llvm-svn: 330934
This has no impact on codegen for the current RISC-V unit tests or my small
benchmark set and very minor changes in a few programs in the GCC torture
suite. Based on this, I haven't been able to produce a representative test
program that demonstrates a benefit from isLegalAddressingMode. I'm committing
the patch anyway, on the basis that presenting accurate information to the
target-independent code is preferable to relying on incorrect generic
assumptions.
llvm-svn: 330932
instructions.
These have special permission according to the x86 manual to read
unaligned memory, and this folding is done by ICC and GCC as well.
This corrects one of the issues identified in PR37246.
llvm-svn: 330896
This reverts commit 023c8be90980e0180766196cba86f81608b35d38.
This patch triggers miscompile of zlib on PowerPC platform. Most likely it is
caused by some pre-backend PPC-specific pass, but we don't clearly know the
reason yet. So we temporarily revert this patch with the intention of returning
it once the problem is resolved. See bug 37229 for details.
llvm-svn: 330893