This test returns nonnative integer types which aren't supported on all targets.
The real issue with the SelectionDAG scheduler is with x86 EFLAGS.
llvm-svn: 233355
We don't have any logic to emit those tables yet, so the SDAG lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233354
It can happen (by line CurSU->isPending = true; // This SU is not in
AvailableQueue right now.) that a SUnit is mark as available but is
not in the AvailableQueue. For SUnit being selected for scheduling
both conditions must be met.
This patch mainly defensively protects from invalid removing a node
from a queue. Sometimes nodes are marked isAvailable but are not in
the queue because they have been defered due to some hazard.
Patch by Pawel Bylica!
llvm-svn: 233351
Fix testcases whose variables are invalid. I'm working on a patch that
adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`),
and these failed because:
- `scope:` fields need to point at `MDLocalScope` and can't be null.
- `file:` fields need to point at `MDFile`.
- `inlinedAt:` fields need to point at `MDLocation`.
llvm-svn: 233349
Summary:
The ARM backend can use a loop to implement copying byval parameters before
a call. In non-thumb2 mode it uses a constant pool load to materialize the
trip count. For targets that need movt instead (e.g. Native Client), use
the same code as in thumb2 mode to materialize the trip count.
Reviewers: jfb, t.p.northover
Differential Revision: http://reviews.llvm.org/D8442
llvm-svn: 233324
This patch teaches fast-isel how to select 128-bit vector load instructions.
Added test CodeGen/X86/fast-isel-vecload.ll
Differential Revision: http://reviews.llvm.org/D8605
llvm-svn: 233270
those are in the same basic block.
The previous approach was the topological order of the basic block.
By default this rule is disabled.
Related to PR22768.
llvm-svn: 233241
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.
It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.
Differential Revision: http://reviews.llvm.org/D8593
llvm-svn: 233224
We don't have any logic to emit those tables yet, so the sdag lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233209
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.
The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.
The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.
This is send along a clang patch to enabled the builtins and option switch.
[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html
Phabricator Review: http://reviews.llvm.org/D8247
llvm-svn: 233204
This patch allows AVX blend instructions to handle insertion into the low
element of a 256-bit vector for the appropriate data types.
For f32, instead of:
vblendps $1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3]
vblendps $15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]
we get:
vblendps $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]
For f64, instead of:
vmovsd %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1]
vblendpd $3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3]
we get:
vblendpd $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3]
For the hardware-neglected integer data types, I left a TODO comment in the
code and added regression tests for a follow-on patch.
Differential Revision: http://reviews.llvm.org/D8609
llvm-svn: 233199
1. There were no CHECK-LABELs, so we could match instructions from the wrong function.
2. The use of zero operands meant multiple xor instructions could match some CHECKs.
3. The test was over-specified to need a Sandybridge CPU and Darwin triple.
llvm-svn: 233198
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.
llvm-svn: 233153
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximiation of that. It fails to handle cases
with interestingly large stack alignments, which is pretty uncommon on
Win64 and is TODO.
llvm-svn: 233137
vperm2x128 instructions have the special ability (aka free hardware capability)
to shuffle zero values into a vector.
This patch recognizes that type of shuffle and generates the appropriate
control byte.
https://llvm.org/bugs/show_bug.cgi?id=22984
Differential Revision: http://reviews.llvm.org/D8563
llvm-svn: 233100
Summary:
Previous behaviour of 'R' and 'm' has been preserved for now. They will be
improved in subsequent commits.
The offset permitted by ZC varies according to the subtarget since it is
intended to match the restrictions of the pref, ll, and sc instructions.
The restrictions on these instructions are:
* For microMIPS: 12-bit signed offset.
* For Mips32r6/Mips64r6: 9-bit signed offset.
* Otherwise: 16-bit signed offset.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8414
llvm-svn: 233063
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):
%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0
The vector constant folding was always using sitofp:
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
This patch fixes this so that the correct opcode is used for sitofp and uitofp.
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>
Differential Revision: http://reviews.llvm.org/D8560
llvm-svn: 233033
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers. Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators. Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes. I fixed them.
Soon I'll add verifier checks for them too.
This also adds explicit dereference operators. Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.
llvm-svn: 233030
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve, and, on code that uses
few globals (e.g., the odd file- or function- static), more often than
not is degraded by the optimization. Lengthy discussion can be found
on llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
llvm-svn: 233024
This enables very common cases to switch to the
smaller encoding.
All of the standard LLVM canonicalizations of comparisons
are the opposite of what we want. Compares with constants
are moved to the RHS, but the first operand can be an inline
immediate, literal constant, or SGPR using the 32-bit VOPC
encoding.
There are additional bad canonicalizations that should
also be fixed, such as canonicalizing ge x, k to gt x, (k + 1)
if this makes k no longer an inline immediate value.
llvm-svn: 232988
This change is incorrect since it converts double rounding into single rounding,
which can produce different results. Instead this optimization will be done by
modifying Clang's codegen to not produce double rounding in the first place.
This reverts commit r232954.
llvm-svn: 232962
Specifically when the conversion is done in two steps, f16 -> f32 -> f64.
For example:
%1 = tail call float @llvm.convert.from.fp16.f32(i16 %0)
%conv = fpext float %1 to double
to:
vcvtb.f64.f16
llvm-svn: 232954
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D7791
llvm-svn: 232943
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).
llvm-svn: 232935
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.
llvm-svn: 232879
If we couldn't analyze its terminator (i.e., it's an indirectbr, or some
other weirdness), we can't safely re-if-convert a predicated block,
because we can't tell whether the predicated terminator can
fallthrough (it does).
Currently, we would completely ignore the fallthrough successor. In
the added testcase, this means we used to generate:
...
@ %entry:
cmp r5, #21
ittt ne
@ %cc1f:
cmpne r7, #42
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %cc1t:
...
Whereas the successor of %cc1f was originally %bb1.
With the fix, we get the correct:
...
@ %entry:
cmp r5, #21
itt eq
@ %cc1t:
streq.w r5, [r11]
moveq pc, r0
@ %cc1f:
cmp r7, #42
itt ne
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %bb1:
...
rdar://20192768
Differential Revision: http://reviews.llvm.org/D8509
llvm-svn: 232872
With this patch, for this one exact case, we'll generate:
blendps %xmm0, %xmm1, $1
instead of:
insertps %xmm0, %xmm1, $0
If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.
The detailed performance data motivation for this may be found in D7866;
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.
Differential Revision: http://reviews.llvm.org/D8332
llvm-svn: 232850
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
llvm-svn: 232842
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
llvm-svn: 232825