Summary:
Following the cleanup in D48202, method foldBlockIntoPredecessor has the
same behavior. Replace its uses with MergeBlockIntoPredecessor.
Remove foldBlockIntoPredecessor.
Reviewers: chandlerc, dmgreen
Subscribers: jlebar, javed.absar, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62751
llvm-svn: 362538
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.
Fixes PR42118
Differential Revision: https://reviews.llvm.org/D62828
llvm-svn: 362533
Oddly, I had to change a value name from "tmp0" to "bc0" to get the autogened test to pass. I'm putting this down to an oddity of update_test_checks or FileCheck, but don't understand it.
llvm-svn: 362532
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is a reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.
Differential Revision: https://reviews.llvm.org/D62498
llvm-svn: 362524
Arm Architecture v8.5a introduces Branch Target Identification (BTI). When
enabled all indirect branches must target a bti instruction of the
appropriate form. As PLT sequences may sometimes be the target of an
indirect branch and PLT[0] always is, a static linker may need to generate
PLT sequences that contain "bti c" as the first instruction. In effect:
bti c
adrp x16, page offset to .got.plt
...
Instead of:
adrp x16, page offset to .got.plt
...
At present the PLT decoding assumes the adrp will always be the first
instruction. This patch adds support for a single "bti c" to prefix it. A
test binary has been uploaded with such a PLT sequence. A forthcoming LLD
patch will make heavy use of the PLT decoding code.
Differential Revision: https://reviews.llvm.org/D62598
llvm-svn: 362523
- For error returns in demangleSpecialTableNode(),
demangleLocalStaticGuard(), RTTITypeDescriptor,
demangleRttiBaseClassDescriptorNode(), demangleUnsigned(),
demangleUntypedVariable() (via RttiBaseClassArray)
- For ?_A and ?_P which are handled at early levels of the
demangler but are not implemented in a later stage; this
is now more obvious
- Replace a "default:" with an explicit list of cases, to
get -Wswitch check we list all cases
llvm-svn: 362520
The underlying ConstantRange functionality has been added in D60952,
D61207 and D61238, this just exposes it for LVI.
I'm switching the code from using a whitelist to a blacklist, as
we're down to one unsupported operation here (xor) and writing it
this way seems more obvious :)
Differential Revision: https://reviews.llvm.org/D62822
llvm-svn: 362519
- Add test coverage around invalid anon namespaces and
for error paths in demanglePrimitiveType() and in
demangleFullyQualifiedTypeName()
- Use DEMANGLE_UNREACHABLE in two more unreachable places
llvm-svn: 362514
One way of using llvm-symbolizer is to interactively within a process
write a line from a parent process to llvm-symbolizer's stdin, and then
read the output, then write the next line, read, etc. This worked as
long as all the lines were good. However, this didn't work prior to this
patch if any of the inputs were bad inputs, because the output is not
flushed after a bad input, meaning the parent process is sat waiting for
output, whilst llvm-symbolizer is sat waiting for input. This patch
flushes the output after every invocation of symbolizeInput when reading
from stdin. It also removes unnecessary flushing when llvm-symbolizer is
not reading addresses from stdin, which should give a slight performance
boost in these situations.
Reviewed by: ikudrin
Differential Revision: https://reviews.llvm.org/D62371
llvm-svn: 362511
This is to address some of the problems in existing P9 resource modeling,
especially about the dispatching rules.
Instead of using a hypothetical DISPATCHER , we try to use the number of
actual dispatch slots, and define SchedWriteRes to model dispatch rules,
then update instruction classes according to dispatch rules.
All the dispatch rules and instruction classes update are made according
to POWER9 User Manual.
Differential Revision: https://reviews.llvm.org/D61873
llvm-svn: 362509
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
- Replace `Error = true` in a few branches that are truly unreachable
with DEMANGLE_UNREACHABLE
- Remove early return early in startsWithLocalScopePattern() because
it's redundant with the next two early returns
- Remove unreachable `case '0'` (it's handled in the branch below)
- Remove an unused bool return
- Add test coverage for several early error returns, mostly in
array type parsing
llvm-svn: 362506
Even if one bit is defined, the code is not clear what it is suppose to do.
The test wants to assert that some bits are undef, but that's not what the IR does and I don't think it's even possible to do that in any meaningful way. It was added in D12497, so @reames might want to double check.
Differential Revision: https://reviews.llvm.org/D60859
llvm-svn: 362499
ELF for the 64-bit Arm Architecture defines two processor-specific dynamic
tags:
DT_AARCH64_BTI_PLT 0x70000001, d_val
DT_AARCH64_PAC_PLT 0x70000003, d_val
These presence of these tags indicate that PLT sequences have been
protected using Branch Target Identification and Pointer Authentication
respectively. The presence of both indicates that the PLT sequences have
been protected with both Branch Target Identification and Pointer
Authentication.
This patch adds the tags and tests for llvm-readobj and yaml2obj.
As some of the processor specific dynamic tags overlap, this patch splits
them up, keeping their original default value if they were not previously
mentioned explicitly in a switch case.
Differential Revision: https://reviews.llvm.org/D62596
llvm-svn: 362493
ELF for the 64-bit Arm Architecture defines a processor specific property
type GNU_PROPERTY_AARCH64_FEATURE_1_AND as GNU_PROPERTY_LOPROC. This
property works in a similar way to the existing X86 processor specific
property GNU_PROPERTY_GNU_X86_FEATURE_1_AND.
Two feature bits are defined for GNU_PROPERTY_AARCH64_FEATURE_1_AND:
- GNU_PROPERTY_AARCH64_FEATURE_1_BTI 0x1
- GNU_PROPERTY_AARCH64_FEATURE_1_PAC 0x2
This patch defines the property, feature bits and implements support for
printing in llvm-readobj.
Differential Revision: https://reviews.llvm.org/D62595
llvm-svn: 362490
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.
As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.
This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/vMd3
Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62774
llvm-svn: 362488
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.
The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.
It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
Differential Revision: https://reviews.llvm.org/D62777
llvm-svn: 362486
Includes a fix for an introduced build failure due to a post c++11 use of std::mismatch.
This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command.
Differential Revision: https://reviews.llvm.org/D59491
llvm-svn: 362484
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.
Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
and will not be implementable in pure tablegen
Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).
Without these patterns the ARM backend would sometimes fail during
instruction selection.
This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
instruction
* an FP16 load and insertion into a single VLD1 instruction
Differential Revision: https://reviews.llvm.org/D62651
llvm-svn: 362482
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
llvm-svn: 362472
The family of 32-bit Thumb instruction encodings that include t2ORR,
t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15.
The Tablegen descriptions of those instructions listed them as ?. This
change tightens that up by making them into 0 + Unpredictable.
In the specific case of t2ORR, we tighten it up still further by
making the zero bit mandatory. This change comes from Arm v8.1-M, in
which encodings with that bit equal to 1 will now be used for
different instructions.
Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma
Reviewed By: dmgreen, efriedma
Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60705
llvm-svn: 362470
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.
TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.
Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.
llvm-svn: 362447
Instead of emitting all of the test stuff for a compare when it's only used by
a select, instead, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit all of the test instructions
etc.
For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.
Also add a test.
Differential Revision: https://reviews.llvm.org/D62695
llvm-svn: 362446
r362199 fixed it for zero masking, but not zero masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.
llvm-svn: 362440
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
llvm-svn: 362439
We currently miss the opportunities for optmizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.
This patch simply lets the optimization peek through copies.
Differential revision: https://reviews.llvm.org/D59633
llvm-svn: 362438
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash.
Differential Revision: https://reviews.llvm.org/D62560
llvm-svn: 362427