By using the widest possible type for PACKSS truncation we have a better chance of being able to peek through bitcasts, and we improve other combines driven by ComputeNumSignBits.
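For intuition, here is a hedged sketch of the legality test such a combine relies on; canTruncateWithPACKSS is an illustrative helper name, not the exact code from this patch:

  #include "llvm/CodeGen/SelectionDAG.h"

  using namespace llvm;

  // PACKSS performs a signed-saturating pack, so it only acts as a plain
  // per-element truncate from SrcBits to DstBits when saturation can never
  // trigger, i.e. when the top (SrcBits - DstBits + 1) bits are all sign bits.
  static bool canTruncateWithPACKSS(SelectionDAG &DAG, SDValue In,
                                    unsigned SrcBits, unsigned DstBits) {
    return DAG.ComputeNumSignBits(In) > SrcBits - DstBits;
  }

Packing at the widest legal element type makes this sign-bit count more likely to survive intervening bitcasts of the same underlying value.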
llvm-svn: 316448
These tests checked for the line number without a leading ":", so, for example,
a missed diagnostic on line 123 could match one on line 1123, 2123, etc.,
desynchronising the test for hundreds of lines.
This couldn't cause a test to incorrectly pass or fail, but it made failures
hard to track down.
Differential revision: https://reviews.llvm.org/D39238
llvm-svn: 316442
Report a diagnostic when we fail to parse a shift in a memory operand because
the shift type is not an identifier. Without this, we were silently ignoring
the whole instruction.
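A minimal sketch of the shape of such a check in a target AsmParser; the surrounding helper is illustrative, not the exact code from this patch:

  #include "llvm/MC/MCParser/MCAsmLexer.h"
  #include "llvm/MC/MCParser/MCAsmParser.h"

  // Returns true on error, following the MC parser convention.
  static bool parseShiftType(llvm::MCAsmParser &Parser) {
    const llvm::AsmToken &Tok = Parser.getTok();
    if (Tok.isNot(llvm::AsmToken::Identifier))
      return Parser.Error(Tok.getLoc(),
                          "expected shift type in memory operand");
    // ... match the identifier (lsl, lsr, asr, ...) as before ...
    return false;
  }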
Differential revision: https://reviews.llvm.org/D39237
llvm-svn: 316441
Summary:
r264440 added or/and patterns for storing -1 or 0 (e.g. "and mem, 0" and "or mem, -1") with the intention of decreasing code size. However,
X86CallFrameOptimization does not recognize these memory accesses, so it will not replace them with pushes when profitable.
This patch fixes the problem by teaching X86CallFrameOptimization these store 0/-1 idioms.
An alternative fix would be to prevent the store 0/-1 idiom patterns from firing when accessing the stack. This would save
the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire, we may end up
with cases where neither X86CallFrameOptimization nor the store 0/-1 idiom patterns fire.
Fixes PR34863.
Reviewers: DavidKreitzer, guyblank, aymanmus
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38738
llvm-svn: 316431
Summary:
Got asserts in llvm::CastInst::getCastOpcode saying:
`DestBits == SrcBits && "Illegal cast to vector (wrong type or size)"' failed.
The problem seemed to be that llvm::ConstantFoldCastInstruction did
not correctly handle a ptrtoint cast of a getelementptr returning a
vector. I assume such situations are quite rare, since the
GEP needs to be a constant value (base pointer
being null).
The solution used here is to simply avoid the constant fold
of ptrtoint when the value is a vector. That case is not supported,
and by bailing out we avoid failing assertions later on.
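A hedged sketch of the bail-out; the helper below is illustrative and only mirrors the shape of the change in the constant folder:

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Type.h"

  using namespace llvm;

  static Constant *tryFoldPtrToInt(Constant *V, Type *DestTy) {
    // A constant ptrtoint of a getelementptr that yields a vector of
    // pointers is not supported; bail out and keep the original cast
    // rather than hit the getCastOpcode assertion downstream.
    if (V->getType()->isVectorTy())
      return nullptr;
    // ... the scalar folding path continues as before ...
    return nullptr; // folding elided in this sketch
  }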
Reviewers: craig.topper, majnemer, davide, filcab, efriedma
Reviewed By: efriedma
Subscribers: efriedma, filcab, llvm-commits
Differential Revision: https://reviews.llvm.org/D38546
llvm-svn: 316430
This change fixes the values in the test so that it passes
-verify without errors, and it also adds comments.
The test was introduced in D39119 with the intention of checking
that the tool is able to dump a few
DW_*GNU_call_site* tags and attributes, so this
change is an NFC cleanup.
llvm-svn: 316428
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.
Also allow kill in all shader stages.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D38544
llvm-svn: 316427
This alias caused a crash when trying to print the "cps #0" instruction in a
diagnostic for thumbv6 (which doesn't have that instruction).
The comment was incorrect: this instruction is UNPREDICTABLE if no flag bits
are set, so I don't think it's worth keeping.
Differential Revision: https://reviews.llvm.org/D39191
llvm-svn: 316420
SelectionDAG inserts a copy of ESP into a virtual register.
X86CallFrameOptimization assumed that the COPY, if present, is always
right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a
wrong assumption, as the COPY can be located anywhere between the call-frame
setup instruction and its first use. If the COPY happened to be located
somewhere other than where X86CallFrameOptimization assumed, visiting it while
processing the call chain would lead to a conservative bail-out.
The fix is quite straightforward: scan ahead for the stack-pointer copy and
make note of it so it can be ignored while processing the call chain.
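A hedged sketch of the scan-ahead; findSPCopy is an illustrative helper name, not the exact code from this patch:

  #include <iterator>

  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"

  // Walk forward from the call-frame setup instruction and remember the
  // stack-pointer COPY, wherever it is, so the main processing loop can
  // skip it instead of conservatively bailing out.
  static llvm::MachineInstr *
  findSPCopy(llvm::MachineBasicBlock &MBB,
             llvm::MachineBasicBlock::iterator FrameSetup, unsigned StackPtr) {
    for (auto I = std::next(FrameSetup); I != MBB.end(); ++I) {
      if (I->isCall())
        break; // reached the call without seeing the copy
      if (I->isCopy() && I->getOperand(1).getReg() == StackPtr)
        return &*I;
    }
    return nullptr;
  }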
Fixes PR34903.
Differential Revision: https://reviews.llvm.org/D38730
llvm-svn: 316416
Infrastructure designed for padding code with NOP instructions in key places,
such that a performance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the
assembler, after the layout is done and all IPs and alignments are known.
This patch by itself is an NFC. Future patches will make use of this
infrastructure to implement the required policies for code padding.
Reviewers: aaboud, zvi, craig.topper, gadi.haber
Differential revision: https://reviews.llvm.org/D34393
Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
Summary:
The motivation of this change is to enable .mir testing for this pass.
Added one test case to cover the functionality; this same case will be
improved by a future patch.
Reviewers: igorb, guyblank, DavidKreitzer
Reviewed By: guyblank, DavidKreitzer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38729
llvm-svn: 316412
The `BasicBlock::getFirstInsertionPt` call may return the end iterator for the
BB. Dereferencing the end iterator results in an assertion failure
"(!NodePtr->isKnownSentinel()), function operator*". Ensure that the
returned iterator is valid before dereferencing it. If the end iterator is
returned, move one position backward to get a valid insertion point.
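A minimal sketch of the guard, assuming the block is non-empty whenever the end iterator comes back:

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instruction.h"

  static llvm::Instruction *getValidInsertionPt(llvm::BasicBlock &BB) {
    llvm::BasicBlock::iterator IP = BB.getFirstInsertionPt();
    if (IP == BB.end())
      --IP; // step back rather than dereference the end iterator
    return &*IP;
  }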
llvm-svn: 316401
This adds type index discovery and dumper support for symbol record kind
0x1168, which is a list of inlined function ids. This symbol kind is
undocumented, but S_INLINEES is consistent with the existing
nomenclature.
Fixes PR34222
llvm-svn: 316398
This commit adds optimisation remarks for outlining which fire when a function
is successfully outlined.
To do this, OutlinedFunctions must now contain references to their Candidates.
Since the Candidates must still be sorted and worked on separately, this is
done by working on everything in terms of shared_ptrs to Candidates. This is
good; it means that we can easily move everything to outlining in terms of
the OutlinedFunctions rather than the individual Candidates. This is far more
intuitive than what's currently there!
(Remarks are output when a function is created for some group of Candidates.
In a later commit, all of the outlining logic should be rewritten so that we
loop over OutlinedFunctions rather than over Candidates.)
llvm-svn: 316396
This is in preparation for a verifier check that makes sure
copies are of the same size (when generic virtual registers are involved).
llvm-svn: 316388
This is in preparation for a verifier check that makes sure copies are
of the same size (when generic virtual registers are involved).
llvm-svn: 316387
This patch adds the pgo-memop-opt pass to the new pass manager.
It is in the old pass manager but was somehow left out of the new pass manager.
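With that in place, a hedged usage sketch (assuming the PGOMemOPSizeOpt class name; the pipeline setup here is illustrative):

  #include "llvm/IR/PassManager.h"
  #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"

  using namespace llvm;

  void addMemOPOpt(FunctionPassManager &FPM) {
    // pgo-memop-opt: optimize memory intrinsic calls based on
    // profiled size information.
    FPM.addPass(PGOMemOPSizeOpt());
  }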
Differential Revision: http://reviews.llvm.org/D39145
llvm-svn: 316384
This fixes a bug where we'd crash given code like the test-case from
https://bugs.llvm.org/show_bug.cgi?id=30792 . Instead, we let the
offending clobber silently slide through.
This doesn't fully fix said bug, since the assembler will still complain
the moment it sees a crypto/fp/vector op, and we still don't diagnose
calls that require vector regs.
Differential Revision: https://reviews.llvm.org/D39030
llvm-svn: 316374
In HexagonISelLowering, there is code to handle the case when
a function returns an i1 type. In this case, we need to generate
extra nodes to copy the result from R0 to a predicate register.
The code was returning the wrong value for the chain edge which
caused an assert "Wrong topological sorting" when converting the
instructions to MIs.
This patch fixes the problem by returning the chain for the final
copy.
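A hedged sketch of the pattern (not the actual Hexagon code); the point is that the chain handed back comes from the final copy:

  #include "llvm/CodeGen/SelectionDAG.h"

  using namespace llvm;

  // Copy the returned value out of the return register and thread the
  // copy's own chain onward; returning the stale incoming chain is what
  // broke the topological ordering.
  static SDValue copyOutResult(SelectionDAG &DAG, const SDLoc &DL,
                               SDValue Chain, unsigned RetReg, SDValue Glue,
                               SDValue &Result) {
    SDValue Copy = DAG.getCopyFromReg(Chain, DL, RetReg, MVT::i32, Glue);
    Result = Copy;
    return Copy.getValue(1); // the chain of the final copy
  }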
Patch by Brendon Cahoon.
llvm-svn: 316367
If we have the situation where a Swap feeds a Splat, we can sometimes change
the index on the Splat and then remove the Swap instruction. For example,
splatting doubleword 0 of a doubleword-swapped vector is the same as splatting
doubleword 1 of the original vector.
Differential Revision: https://reviews.llvm.org/D39009
llvm-svn: 316366
This patch enables the import of stores. Unfortunately, doing so by itself
loses an optimization where storing 0 to memory makes use of WZR/XZR.
To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to apply only to stores.
Applying it to the GPR32/GPR64 register classes in general will be done after
review (see https://reviews.llvm.org/D39150).
llvm-svn: 316360
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.
llvm-svn: 316346
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.
This patch adds one additional case: if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, keeping UNDEF only in the elements left undefined by the shuffle mask.
Differential Revision: https://reviews.llvm.org/D38696
llvm-svn: 316331
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.
Differential Revision: https://reviews.llvm.org/D38952
llvm-svn: 316313
Summary:
It's unclear if this is the only thing we can do, but at least this is
consistent with the check of address-space agreement in `isBitCastable`.
The code is used at least in both instcombine and jumpthreading, though
I could only find a way to trigger the invalid cast in instcombine.
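A hedged sketch mirroring the address-space test in `isBitCastable`; the helper name is illustrative:

  #include "llvm/IR/Type.h"

  static bool addrSpacesAgree(llvm::Type *SrcTy, llvm::Type *DstTy) {
    if (SrcTy->isPtrOrPtrVectorTy() && DstTy->isPtrOrPtrVectorTy())
      return SrcTy->getPointerAddressSpace() ==
             DstTy->getPointerAddressSpace();
    return true; // nothing to compare for non-pointer types
  }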
Reviewers: loladiro, sanjoy, majnemer
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34335
llvm-svn: 316302
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early.
In fact, it's questionable whether this transform belongs in SimplifyCFG
at all. I'll look at moving this to codegen as a follow-up step.
llvm-svn: 316298
This fixes Bugzilla 26810 (https://bugs.llvm.org/show_bug.cgi?id=26810).
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4-byte Reload
Such sequences are created in two scenarios:

Scenario #1:
1. vreg0 is evicted from physreg0 by vreg1.
2. Evictee vreg0 is intended for region splitting with split candidate physreg0 (the register vreg0 was evicted from).
3. Region splitting creates a local interval because of interference with the evictor vreg1. (Normally region splitting creates two intervals, the "by reg" and "by stack" intervals; a local interval is created when interference occurs.)
4. One of the split intervals ends up evicting vreg2 from physreg1.
5. Evictee vreg2 is intended for region splitting with split candidate physreg1.
6. One of the split intervals ends up evicting vreg3 from physreg2, and so on, until someone spills.

Scenario #2:
1. vreg0 is evicted from physreg0 by vreg1.
2. vreg2 is evicted from physreg2 by vreg3, and so on.
3. Evictee vreg0 is intended for region splitting with split candidate physreg1.
4. Region splitting creates a local interval because of interference with the evictor vreg1.
5. One of the split intervals ends up evicting the original evictor vreg1 from physreg0 (the register vreg0 was evicted from).
6. Another evictee vreg2 is intended for region splitting with split candidate physreg1.
7. One of the split intervals ends up evicting vreg3 from physreg2, and so on, until someone spills.

As compile time was a concern, I've added a flag to control whether we do cost calculations for local intervals we expect to be created (it's on by default for the X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295