As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level.
This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen.
Fixes PR28136.
llvm-svn: 275543
Thumb-1 doesn't have post-inc or pre-inc load or store instructions. However the LDM/STM instructions with writeback can function as post-inc load/store:
ldm r0!, {r1} @ load from r0 into r1 and increment r0 by 4
Obviously, this only works if the post increment is 4.
llvm-svn: 275540
... When we emit several calls to the same function in the same basic block.
An indirect call uses a "BLX r0" instruction which has a 16-bit encoding. If many calls are made to the same target, this can enable significant code size reductions.
llvm-svn: 275537
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.
Fixes bug 28550
llvm-svn: 275510
For a fully inlined call chain like a -> b -> c -> d, we were emitting
line info for 'd' 3 separate times: once for d's actual InlineSite line
table, and twice for 'b' and 'c'. This is particularly inefficient when
all these functions are in different headers, because now we need to
encode the file change. Windbg was coping with our suboptimal output, so
this should not be noticeable from the debugger.
llvm-svn: 275502
We don't need to print any of the special __mh_*_header symbols when
disassembling. Since they point at the beginning of the segment (not where the
actual code is) they're pretty misleading.
Should also fix lld bots.
llvm-svn: 275498
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.
This was incorrectly reverted in rL275421 during triage of PR28552.
llvm-svn: 275497
We were quite happy to read past the end of the valid section data when
disassembling. Instead we entirely skip stub dylibs, and tell the user what's
happened if their section only has partial data.
llvm-svn: 275487
Fix for PR 28418.
opt never finishes compiling a test when -gvn option is passed.
The problem is caused by the fact that GVN fails to fold a constant expression.
Differential Revision: https://reviews.llvm.org/D22185
llvm-svn: 275483
Summary:
Invoke the weak/linkonce symbol resolution support (already used by
libLTO) that operates via the summary index.
This ensures prevailing linkonce are kept, by making them weak, and
marks preempted copies as available_externally when possible.
With this change, the older support for keeping the prevailing linkonce
(by changing their symbol resolution) is removed.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: http://reviews.llvm.org/D22302
llvm-svn: 275474
This patch allows the formation of interleaved access groups in loops
containing predicated blocks. However, the predicated accesses are prevented
from forming groups.
Differential Revision: https://reviews.llvm.org/D19694
llvm-svn: 275471
On Hexagon is it legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.
llvm-svn: 275458
Summary: Port Dead Loop Deletion Pass to the new pass manager.
Reviewers: silvas, davide
Subscribers: llvm-commits, sanjoy, mcrosier
Differential Revision: https://reviews.llvm.org/D21483
llvm-svn: 275453
If there was a tail call, we would incorrectly handle the relocation. It would
end up indexing into the array with an incorrect section id. The symbol was
external to the module, so the Section ID was UNDEFINED (-1). We would then
index the SmallVector with this ID, triggering an assertion. Use the Value
rather than the section load address in this case.
llvm-svn: 275442
Note: I removed the checks after each jump because that's noise, but we apparently
need branches rather than returning i1 to see the bt codegen in some cases.
llvm-svn: 275439
Summary:
In preparation for changing GlobalsAA to stop assuming that intrinsics
can't read arbitrary globals, we need to make sure GlobalsAA is querying
function attributes rather than relying on this assumption.
This patch was inspired by: http://reviews.llvm.org/D20206
Reviewers: jmolloy, hfinkel
Subscribers: eli.friedman, llvm-commits
Differential Revision: https://reviews.llvm.org/D21318
llvm-svn: 275433
We were able to assemble, but not disassemble.
Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit
the uint8_t max. The control registers were already squarely above
it, but I don't think they ever go in .r/m, only in .reg.
I also did notice an extra REX.W in our encoding, but I think that's
fine.
llvm-svn: 275427
This patch prevents increases in the number of instructions, pre-instcombine,
due to induction variable scalarization. An increase in instructions can lead
to an increase in the compile-time required to simplify the induction
variables. We now maintain a new map for scalarized induction variables to
prevent us from converting between the scalar and vector forms.
This patch should resolve compile-time regressions seen after r274627.
llvm-svn: 275419
stdcall is callee-pop like thiscall, so the thiscall changes already did most
of the work for this. This change only opts stdcall in and adds tests.
llvm-svn: 275414
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.
llvm-svn: 275411
Summary:
It was recently discovered that, for Mips's SelectionDAGISel subclasses,
all optimization levels caused SelectionDAGISel to behave like -O2.
This change adds the necessary plumbing to initialize the optimization level.
Reviewers: andrew.w.kaylor
Subscribers: andrew.w.kaylor, sdardis, dean, llvm-commits, vradosavljevic, petarj, qcolombet, probinson, dsanders
Differential Revision: https://reviews.llvm.org/D14900
llvm-svn: 275410
Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations.
llvm-svn: 275406
enables the code size optimisation to fold a rem and div into a single
aeabi_uidivmod call. This was not happening before because sdiv was converted
but srem not, and instructions with different signedness are not combined.
Differential Revision: http://reviews.llvm.org/D22214
llvm-svn: 275403
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.
Differential Revision: http://reviews.llvm.org/D19338
llvm-svn: 275401
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because less
negative offsets will be created. Negative offsets/immediates are not supported
by Thumb1 thus preventing more compact instruction encoding.
Differential Revision: http://reviews.llvm.org/D21183
llvm-svn: 275382
We were able to fold masked loads with an all-ones mask to a normal
load. However, we couldn't turn a masked load with a mask with mixed
ones and undefs into a normal load.
llvm-svn: 275380