When multi-instruction emission is supported, it will no longer be guaranteed
that every BuildMIAction has a corresponding matched instruction. BuildMIAction
should support not having one to cover the case where a rule produces more
instructions than it matched.
llvm-svn: 316463
The type index is from the TPI stream, not the IPI stream. Fix the
dumper, fix type index discovery, and add a test in LLD.
Also improve the log message we emit when we fail to rewrite type
indices in LLD. That's how I found this bug.
llvm-svn: 316461
By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits.
llvm-svn: 316448
Ideally, we should compare 32- and 64-bit versions to see if the
ret line is the only difference and then insert the regex only
in that case. But this is a quick hack to avoid a bunch of noise
as existing tests are updated.
llvm-svn: 316443
These tests checked for the line number without a leading ":", so for example,
a missed diagnostic on line 123 could match one on line 1123, 2123, etc,
desynchronising the test for hundreds of lines.
This couldn't cause it to incorrectly pass or fail, but made it hard to track
down test failures.
Differential revision: https://reviews.llvm.org/D39238
llvm-svn: 316442
Report a diagnostic when we fail to parse a shift in a memory operand because
the shift type is not an identifier. Without this, we were silently ignoring
the whole instruction.
Differential revision: https://reviews.llvm.org/D39237
llvm-svn: 316441
Summary:
r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However,
X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable.
This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms.
An alternative fix would be to prevent the 'store 0/1 idioms' patterns from firing when accessing the stack. This would save
the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire we may result
in cases where neither X86CallFrameOptimization not the patterns for 'store 0/1 idioms' fire.
Fixes pr34863
Reviewers: DavidKreitzer, guyblank, aymanmus
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38738
llvm-svn: 316431
Summary:
Got asserts in llvm::CastInst::getCastOpcode saying:
`DestBits == SrcBits && "Illegal cast to vector (wrong type or size)"' failed.
Problem seemed to be that llvm::ConstantFoldCastInstruction did
not handle ptrtoint cast of a getelementptr returning a vector
correctly. I assume such situations are quite rare, since the
GEP needs to be considered as a constant value (base pointer
being null).
The solution used here is to simply avoid the constant fold
of ptrtoint when the value is a vector. It is not supported,
and by bailing out we do not fail on assertions later on.
Reviewers: craig.topper, majnemer, davide, filcab, efriedma
Reviewed By: efriedma
Subscribers: efriedma, filcab, llvm-commits
Differential Revision: https://reviews.llvm.org/D38546
llvm-svn: 316430
Summary:
When describing trunc/zext/sext/ptrtoint/inttoptr in the chapter
about Constant Expressions we now simply refer to the Instruction
Reference. As far as I know there are no difference when it comes
to the semantics and the argument constraints. The only difference
is that the syntax is slighly different for the constant expressions,
regarding the use of parenthesis in constant expressions.
Referring to the Instruction Reference is the same solution as
already used for several other operations, such as bitcast.
The main goal was to add information that vector types are allowed
also in trunc/zext/sext/ptrtoint/inttoptr constant expressions.
That was not explicitly mentioned earlier, and resulted in some
questions in the review of https://reviews.llvm.org/D38546
Reviewers: efriedma, majnemer
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39165
llvm-svn: 316429
This change fixes values of test so that it passes
-verify without errors and also adds comments.
Test was introduced in D39119 and intention was to check
that tool is able to dump few
DW_*GNU_call_site* tags and attributes, so that
change is NFC cleanup.
llvm-svn: 316428
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.
Also allow kill in all shader stages.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D38544
llvm-svn: 316427
* Remove the -arm-asm-parser-dev-diags option.
* Use normal DEBUG(dbgs()) printing for the extra development information about
missing diagnostics.
Differential Revision: https://reviews.llvm.org/D39194
llvm-svn: 316423
rL316059 fixed the potential build failure when compiling
with -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON.
rL316372 just reverted the part of the fix, so restore it.
llvm-svn: 316422
This is the Thumb encoding, so the Requires list must include IsThumb.
No test because we happen to select the ARM one first, but that's just luck.
Differential Revision: https://reviews.llvm.org/D39190
llvm-svn: 316421
This alias caused a crash when trying to print the "cps #0" instruction in a
diagnostic for thumbv6 (which doesn't have that instruction).
The comment was incorrect, this instruction is UNPREDICTABLE if no flag bits
are set, so I don't think it's worth keeping.
Differential Revision: https://reviews.llvm.org/D39191
llvm-svn: 316420
Summary:
Support formatv of TimePoint with strftime-style formats.
Extensions for millis/micros/nanos are added.
Inital use case is HH:MM:SS.MMM timestamps in clangd logs.
Reviewers: bkramer, ilya-biryukov
Subscribers: labath, llvm-commits
Differential Revision: https://reviews.llvm.org/D38992
llvm-svn: 316419
Refactor ExpandMemcmp:
- Stop duplicating the logic for computation of the sequence of loads to
generate (thsi was done in three different places), this is now done
only once in MemCmpExpansion::MemCmpExpansion().
- Add a FIXME to expose a bug with the computation of the number of loads
when not all sizes are loadable. For example, on X86-32 + SSE, possible
loads are {16,4,2,1} bytes. The current code considers that all loads
starting at MaxLoadSize are possible. This is not an issue right now as
vector loads are not enabled, so I'm not fixing the issue here to keep
the change as small as possible. I'm going to address this in a
subsequent revision, where I enable vector loads.
See https://bugs.llvm.org/show_bug.cgi?id=34887
Differential Revision: https://reviews.llvm.org/D38498
llvm-svn: 316417
SelectionDAG inserts a copy of ESP into a virtual register.
X86CallFrameOptimization assumed that the COPY, if present, is always
right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a
wrong assumption as the COPY can be located anywhere between the call-frame setup
instruction and its first use. If the COPY happened to be located in a different
location than what X86CallFrameOptimization assumed, visiting it while
processing the call chain would lead to a conservative bail-out.
The fix is quite straightfoward, scan ahead for the stack-pointer copy and make note
of it so it can be ignored while processing the call chain.
Fixes pr34903
Differential Revision: https://reviews.llvm.org/D38730
llvm-svn: 316416
Besides all the goodness from modularizing a header, this is necessary
to compile ToT with modules with the clang host compiler from Xcode 9 in
macOS 10.13, which our bots don't use yet.
rdar://problem/35038151
llvm-svn: 316414
Infrastructure designed for padding code with nop instructions in key places such that preformance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
This patch by itself in a NFC. Future patches will make use of this infrastructure to implement required policies for code padding.
Reviewers:
aaboud
zvi
craig.topper
gadi.haber
Differential revision: https://reviews.llvm.org/D34393
Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
Summary:
The motivation of this change is to enable .mir testing for this pass.
Added one test case to cover the functionality, this same case will be improved by
a future patch.
Reviewers: igorb, guyblank, DavidKreitzer
Reviewed By: guyblank, DavidKreitzer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38729
llvm-svn: 316412
Summary:
Previously, we would emit error messages like "IO failure on output
stream". This change causes use to include information about what
actually went wrong, e.g. "No space left on device".
Reviewers: sunfish, rnk
Reviewed By: rnk
Subscribers: mehdi_amini, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39203
llvm-svn: 316404
The `BasicBlock::getFirstInsertionPt` call may return `std::end` for the
BB. Dereferencing the end iterator results in an assertion failure
"(!NodePtr->isKnownSentinel()), function operator*". Ensure that the
returned iterator is valid before dereferencing it. If the end is
returned, move one position backward to get a valid insertion point.
llvm-svn: 316401
This adds type index discovery and dumper support for symbol record kind
0x1168, which is a list of inlined function ids. This symbol kind is
undocumented, but S_INLINEES is consistent with the existing
nomenclature.
Fixes PR34222
llvm-svn: 316398
This commit adds optimisation remarks for outlining which fire when a function
is successfully outlined.
To do this, OutlinedFunctions must now contain references to their Candidates.
Since the Candidates must still be sorted and worked on separately, this is
done by working on everything in terms of shared_ptrs to Candidates. This is
good; it means that we can easily move everything to outlining in terms of
the OutlinedFunctions rather than the individual Candidates. This is far more
intuitive than what's currently there!
(Remarks are output when a function is created for some group of Candidates.
In a later commit, all of the outlining logic should be rewritten so that we
loop over OutlinedFunctions rather than over Candidates.)
llvm-svn: 316396
This is in preparation for a verifier check that makes sure
copies are of the same size (when generic virtual registers are involved).
llvm-svn: 316388
This is in preparation for a verifier check that makes sure copies are
of the same size (when generic virtual registers are involved).
llvm-svn: 316387
This pass adds pgo-memop-opt pass to the new pass manager.
It is in the old pass manager but somehow left out in the new pass manager.
Differential Revision: http://reviews.llvm.org/D39145
llvm-svn: 316384
Apple's iOS, tvOS and watchOS simulator platforms have never been clearly
distinguished in the target triples. Even though they are intended to
behave similarly to the corresponding device platforms, they have separate
SDKs and are really separate platforms from the compiler's perspective.
Clang now defines a macro when building for one of these simulator platforms
(r297866) but that relies on the very indirect mechanism of checking to see
which option was used to specify the minimum deployment target. That is not
so great. Swift would also like to distinguish these simulator platforms in
a similar way, but unlike Clang, Swift does not use a separate option to
specify the minimum deployment target -- it uses a -target option to
specify the target triple directly, including the OS version number.
Using a different target triple for the simulator platforms is a much
more direct and obvious way to specify this. Putting the "simulator" in
the environment component of the triple means the OS values can stay the
same and existing code the looks at the OS field will not be affected.
https://reviews.llvm.org/D39143
rdar://problem/34729432
llvm-svn: 316380
This fixes a bug where we'd crash given code like the test-case from
https://bugs.llvm.org/show_bug.cgi?id=30792 . Instead, we let the
offending clobber silently slide through.
This doesn't fully fix said bug, since the assembler will still complain
the moment it sees a crypto/fp/vector op, and we still don't diagnose
calls that require vector regs.
Differential Revision: https://reviews.llvm.org/D39030
llvm-svn: 316374
Implement a localised graph builder for indirect control flow
instructions. Main interface is through GraphBuilder::buildFlowGraph,
which will build a flow graph around an indirect CF instruction. Various
modifications to FileVerifier are also made to const-expose some members
needed for machine code analysis done by the graph builder.
Reviewers: vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Subscribers: llvm-commits, kcc, pcc
Differential Revision: https://reviews.llvm.org/D38427
llvm-svn: 316372
Summary:
The elts of ActivePreds which is defined as a SmallPtrSet are copied
into Blocks using std::copy. This makes the resultant order of Blocks
non-deterministic. We cannot simply sort Blocks as they need to match
the corresponding Values. So a better approach is to define ActivePreds
as SmallSetVector.
This fixes the following failures in
http://lab.llvm.org:8011/builders/reverse-iteration:
LLVM :: Transforms/GVNSink/indirect-call.ll
LLVM :: Transforms/GVNSink/sink-common-code.ll
LLVM :: Transforms/GVNSink/struct.ll
Reviewers: dberlin, jmolloy, bkramer, efriedma
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39025
llvm-svn: 316369
In HexagonISelLowering, there is code to handle the case when
a function returns an i1 type. In this case, we need to generate
extra nodes to copy the result from R0 to a predicate register.
The code was returning the wrong value for the chain edge which
caused an assert "Wrong topological sorting" when converting the
instructions to MIs.
This patch fixes the problem by returning the chain for the final
copy.
Patch by Brendon Cahoon.
llvm-svn: 316367
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.
Differential Revision: https://reviews.llvm.org/D39009
llvm-svn: 316366
This patch enables the import of stores. Unfortunately, doing so by itself,
loses an optimization where storing 0 to memory makes use of WZR/XZR.
To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to only apply to the stores.
Applying it to GPR32/GPR64 register classes in general will be done after
review see (https://reviews.llvm.org/D39150).
llvm-svn: 316360
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.
llvm-svn: 316346
I don't think this changes anything functionally yet, but I plan to fix the disassembler to use this to disable matching certain instructions with 0xf3/0xf2/0x66 prefixes.
llvm-svn: 316337
Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection.
Differential Revision: https://reviews.llvm.org/D39169
llvm-svn: 316336
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.
This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.
Differential Revision: https://reviews.llvm.org/D38696
llvm-svn: 316331
Summary:
Support formatting formatv_objects.
While here, fix documentation about member-formatters, and attempted
perfect-forwarding (I think).
Reviewers: zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38997
llvm-svn: 316330
This fixes exporting functions starting with an underscore, and
fully decorated fastcall/vectorcall functions.
Tests will be added in the lld repo.
Differential Revision: https://reviews.llvm.org/D39168
llvm-svn: 316316
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.
Differential Revision: https://reviews.llvm.org/D38952
llvm-svn: 316313
The overflow detection assertions were tautological due to truncation.
Adjust them to no longer be tautological.
Patch by Alex Langford!
llvm-svn: 316303
Summary:
It's unclear if this is the only thing we can do but at least this is consistent with the check
of address space agreement in `isBitCastable`.
The code is used at least in both instcombine and jumpthreading though
I could only find a way to trigger the invalid cast in instcombine.
Reviewers: loladiro, sanjoy, majnemer
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34335
llvm-svn: 316302
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early.
In fact, it's questionable whether this transform belongs in SimplifyCFG
at all. I'll look at moving this to codegen as a follow-up step.
llvm-svn: 316298
Summary: test/CodeGen/PowerPC/pr33093.ll uses both powerpc64 (big-endian) and powerpc64le while the former was unsupported.
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D39164
llvm-svn: 316297
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload
Such sequences are created in 2 scenarios:
Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
This is similar to how we generate the VEX tables.
More fixes are still needed for the instructions that use EVEX.b (broadcast and embedded rounding).
llvm-svn: 316294
The missed canonicalization/optimization in the motivating test from PR34471 leads to very different codegen:
int switcher(int x) {
switch(x) {
case 17: return 17;
case 19: return 19;
case 42: return 42;
default: break;
}
return 0;
}
int comparator(int x) {
if (x == 17) return 17;
if (x == 19) return 19;
if (x == 42) return 42;
return 0;
}
For the first example, we use a bit-test optimization to avoid a series of compare-and-branch:
https://godbolt.org/g/BivDsw
Differential Revision: https://reviews.llvm.org/D39011
llvm-svn: 316293
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors, which support only the 16-bit Thumb instruction set
the compiler ignores the alignment attributes of automatic variables and may
silently generate incorrect code.
Differential revision: https://reviews.llvm.org/D38143
llvm-svn: 316289
The pass scans the function to find instruction chains that define
registers in the same domain (closures).
It then calculates the cost of converting the closure to another domain.
If found profitable, the instructions are converted to instructions in
the other domain and the register classes are changed accordingly.
This commit adds the pass infrastructure and a simple conversion from
the GPR domain to the Mask domain.
Differential Revision:
https://reviews.llvm.org/D37251
Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
llvm-svn: 316288
This introduces a new operand type to encode the whether the index register should be XMM/YMM/ZMM. And new code to fixup the results created by readSIB.
This has the nice effect of removing a bunch of code that hard coded the name of every GATHER and SCATTER instruction to map the index type.
This fixes PR32807.
llvm-svn: 316273
Neither of these cases really require a temporary APInt outside the loop. For the ConstantDataSequential case the APInt will never be larger than 64-bits so its fine to just call getElementAsAPInt. For ConstantVector we can get the APInt by reference and only make a copy where the inversion is needed.
llvm-svn: 316265
The way that splitInnerLoopHeader splits blocks requires that
the induction PHI will be the first PHI in the inner loop
header. This makes sure that is actually the case when there
are both IV and reduction phis.
Differential Revision: https://reviews.llvm.org/D38682
llvm-svn: 316261
We don't need to do any additional recursion, we just need to analyze the APInt stored in the node. This matches what the ValueTracking versions do for IR.
llvm-svn: 316256
Summary:
We shouldn't recurse any further but it doesn't mean we shouldn't be able to give the known bits for a constant. The caller would probably like that we always return the right answer for a constant RHS. This matches what InstCombine does in this case.
I don't have a test case because this showed up while trying to revive D31724.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D38967
llvm-svn: 316255
Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function. This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test.
Patch by Riyaz V Puthiyapurayil
Reviewers: craig.topper, schweitz
Reviewed By: craig.topper, schweitz
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D38668
llvm-svn: 316254
This was suggested in PR35003:
https://bugs.llvm.org/show_bug.cgi?id=35003
32-bit checks may be identical to 64-bit (if we avoid those pesky scalar params!).
I'll check in the script change shortly assuming this doesn't anger any bots.
llvm-svn: 316223
Summary:
This change comes from using lld for i686-windows-msvc. Before this change, lld
emits an error of:
error: relocation against symbol in discarded section: .xdata
It's possible that this could be addressed in lld, but I think this change is
reasonable on its own.
At a high level, this is being generated:
A (.text comdat) -> B (.text) -> C (.xdata comdat)
Where A is a C++ inline function, which references B, an exception handler
thunk, which references C, the exception handling info.
With this structure, lld will error when applying relocations to B if the C it
references has been discarded (some other C has been selected).
This change checks if A is comdat, and if so places the exception registration
thunk (B) in the comdata group of A (and B).
It appears that MSVC makes the __ehhandler function comdat.
Is it possible that duplicate thunks are being emitted into the final binary
with other linkers, or are they stripping the unused thunks?
Reviewers: rnk, majnemer, compnerd, smeenai
Reviewed By: rnk, compnerd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38940
llvm-svn: 316219
Normally, if the registers holding the induction variable's bounds
are redefined inside of the loop's body, the loop cannot be converted
to a hardware loop. However, if the redefining instruction is actually
loading an immediate value into the register, this conversion is both
possible and legal (since the immediate itself will be used in the
loop setup in the preheader).
llvm-svn: 316218
(recommit #2 after checking for timeout issue).
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Reviewers: reames, hfinkel
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 316208
This test checks that load from constant memory will be sunk regardless of
aliasing stores in the loop.
Patch by Daniil Suchkov!
Differential Revision: https://reviews.llvm.org/D39113
llvm-svn: 316207
The commit at https://reviews.llvm.org/rL315888 is causing some failures
with internal testing. Disabling this code until we can resolve the issues.
llvm-svn: 316199
Summary:
Updated the XRayExample docs with instructions for using the llvm-xray stacks
command.
Reviewers: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39106
llvm-svn: 316192