Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases.
Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls
Reviewed By: fhahn
Subscribers: aemerson, javed.absar, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D42295
llvm-svn: 324343
- Fix condition for detecting that a complex basic block was the first in
the chain.
- Add tests.
This was caught by buildbots when submitting rL324319.
llvm-svn: 324341
Followup to D42544 that matches PACKSSWB cases for non-AVX512, SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching.
llvm-svn: 324339
It is better to update pointer of the DISuprogram before we call RAUW for
still live arguments of the function, because with the change reviewed in
D42541 in RAUW we compare DISubprograms rather than functions itself.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D42794
llvm-svn: 324335
This adds most of the FP16 codegen support, but these areas need further work:
- FP16 literals and immediates are not properly supported yet (e.g. literal
pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
added.
This will be addressed in follow-up patches.
Differential Revision: https://reviews.llvm.org/D42849
llvm-svn: 324321
Summary: Now that PR33325 is fixed, this should always improve the generated code.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42793
llvm-svn: 324317
This patch moves ThinLTOBitcodeWriter/module-asm.ll test case into x86 directory to avoid a test failure when x86 backend is not enabled.
llvm-svn: 324316
These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero.
I plan to teach DAG combine to optimize these so need to stop using them.
llvm-svn: 324315
If the inline asm provides the definition of a symbol, this can result
in duplicate symbol errors.
Differential Revision: https://reviews.llvm.org/D42944
llvm-svn: 324313
Summary:
This method is trying to use the truncate node to find which SETCC operand should be replaced directly with the extended load.
This used to work correctly because all uses of the original load were replaced by the truncate before this function was called. So this was used to effectively bypass the truncate and find the load under it.
All but one of the callers now call this before the truncate has replaced the laod so the setcc doesn't yet use the truncate. To account for this we should pass the original load instead.
I changed the order of that one caller to make this work there too.
I don't have a test case because this is probably hidden by later DAG combines causing the extend and truncate to cancel out. I assume this way is a little more efficient and matches what was originally intended.
Reviewers: RKSimon, spatel, niravd
Reviewed By: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42878
llvm-svn: 324311
Summary:
Removing the dropped symbols will prevent indirect call promotion in the
ThinLTO Backend from adding a new reference to a symbol, which can
result in linker unsats. This can happen when we compile with a sample
profile collected from one binary by used for another, which may have
profiled targets that aren't used in the new binary.
Note that until dropDeadSymbols handles variables and aliases (in
progress), we may not be able to remove the declaration and can still
have an issue.
Reviewers: grimar, davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D42816
llvm-svn: 324299
We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX.
llvm-svn: 324294
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
LowerMemIntrinsics pass to cease using the old getAlignment() API of MemoryIntrinsic in
favour of getting source & dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324278
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SimplifyLibCalls pass to cease using the old IRBuilder createMemCpy/createMemMove
single-alignment APIs in favour of the new API that allows setting source and destination
alignments independently.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, r3L24148 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324273
The major visible difference here is that in line-table dumps,
directory and file names are wrapped in double-quotes; previously,
directory names got single quotes and file names were not quoted at
all.
The improvement in this patch is that when a DWARF v5 line table
header has indirect strings, in a verbose dump these will all have
their section[offset] printed as well as the name itself. This
matches the format used for dumping strings in the .debug_info
section.
Differential Revision: https://reviews.llvm.org/D42802
llvm-svn: 324270
As PR36225 shows, we definitely don't want to enable the
canEvaluate* logic with phis.
There's still a question of whether we should just revert
r324014 completely because it exposes a compile-time sinkhole
(although that problem might exist independently).
llvm-svn: 324266
Summary:
The signatures for the builtins @llvm.memcpy, @llvm.memmove, and @llvm.memset
where changed in rL322965. The number of arguments has decreased from five to
four with the removal of the alignment argument. Alignment is now conveyed
by supplying the align parameter attribute on the destination and/or source of
the cpy/move/set.
llvm-svn: 324265
This allows the immediate to folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings since the and can sign extend an immediate.
This also allows us to match an and to a movzx after a not.
This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT.
llvm-svn: 324260
This is the instcombine part of unsigned saturation canonicalization.
Backend patches already commited:
https://reviews.llvm.org/D37510https://reviews.llvm.org/D37534
It converts unsigned saturated subtraction patterns to forms recognized
by the backend:
(a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)
(a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)
((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
Patch by Yulia Koval!
Differential Revision: https://reviews.llvm.org/D41480
llvm-svn: 324255
There was a logic hole in D42739 / rL324014 because we're not accounting for select and phi
instructions that might have repeated operands. This is likely a source of an infinite loop.
I haven't manufactured a test case to prove that, but it should be safe to speculatively limit
this transform to binops while we try to create that test.
llvm-svn: 324252
If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the impliciting zeroing of 32 bit ops.
This patch teachs shrinkAndImmediate not to break that optimization.
Differential Revision: https://reviews.llvm.org/D42899
llvm-svn: 324249
This broke the Chromium build; see PR36238.
> This patch is an enhancement to propagate dbg.value information when
> Phis are created on behalf of LCSSA. I noticed a case where a value
> carried across a loop was reported as <optimized out>.
>
> Specifically this case:
>
> int bar(int x, int y) {
> return x + y;
> }
>
> int foo(int size) {
> int val = 0;
> for (int i = 0; i < size; ++i) {
> val = bar(val, i); // Both val and i are correct
> }
> return val; // <optimized out>
> }
>
> In the above case, after all of the interesting computation completes
> our value is reported as "optimized out." This change will add a
> dbg.value to correct this.
>
> This patch also moves the dbg.value insertion routine from
> LoopRotation.cpp into Local.cpp, so that we can share it in both places
> (LoopRotation and LCSSA).
>
> Patch by Matt Davis!
>
> Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 324247
The function shuffp2 was breaking up a wide shuffle into a pair of
narrower ones, except that the narrower shuffle masks were actually
uninitialized.
llvm-svn: 324243
Summary:
This complements the fixes in r323633 and r324075 which drop the
definitions of dead functions and variables, respectively.
Fixes PR36208.
Reviewers: grimar, rafael
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D42856
llvm-svn: 324242