Fix testcases that don't pass the verifier after a WIP patch to check
`MDSubprogram` operands more effectively. I found the following issues:
- When `isDefinition: false`, the `variables:` field might point at
`!{i32 786468}`, or at a tuple that pointed at an empty tuple with
the comment "previously: invalid DW_TAG_base_type" (I vaguely recall
adding those comments during an upgrade script). In these cases, I
just dropped the array.
- The `variables:` field might point at something like `!{!{!8}}`,
where `!8` was an `MDLocation`. I removed the extra layer of
indirection.
- Invalid `type:` (not an `MDSubroutineType`).
llvm-svn: 233466
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types. The problems look like they were all
caused by bitrot. They fell into these categories:
- Using `!{i32 0}` instead of `!{}`.
- Using `!{null}` instead of `!{}`.
- Using `!MDExpression()` instead of `!{}`.
- Using `!8` instead of `!{!8}`.
- `file:` references that pointed at `MDCompileUnit`s instead of the
same `MDFile` as the compile unit.
- `file:` references that were numerically off-by-one or (off-by-ten).
llvm-svn: 233415
method.
This enables the instprinter to print a different system register name based on
the feature bits of the per-function subtarget.
Differential Revision: http://reviews.llvm.org/D8668
llvm-svn: 233412
Tailcalls are only OK with forwarded sret pointers. With explicit sret,
one approximation is to check that the pointer isn't an Instruction, as
in that case it might point into some local memory (alloca). That's not
OK with tailcalls.
Explicit sret counterpart to r233409.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233410
Tailcalls are only OK with forwarded sret pointers. With sret demotion,
they're not, as we'd have a pointer into a soon-to-be-dead stack frame.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233409
Expose bpf pseudo load instruction via intrinsic. It is used by front-ends that
can encode file descriptors directly into IR instead of relying on relocations.
llvm-svn: 233396
nodes.
When a node is terminal it is pushed at the end of the list of the copies to
coalesce instead of being completely ignored. In effect, this reduces its
priority over non-terminal nodes.
Because of that, we do not miss the rematerialization opportunities, nor the
copies that can be merged with more complex, than the terminal rule,
interference checks.
Related to PR22768.
llvm-svn: 233395
Change `LLParser` to require a non-null `scope:` field for both
`MDLocation` and `MDLocalVariable`. There's no need to wait for the
verifier for this check. This also allows their `::getImpl()` methods
to assert that the incoming scope is non-null.
llvm-svn: 233394
"Fix the MachineScheduler's logic for updating ready times for in-order.
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes."
This fix was only made in one variant of the ScheduleDAGMI driver.
Francois de Ferriere reported the issue in the other bit of code where
it was also needed.
I never got around to coming up with a test case, but it's an
obvious fix that shouldn't be delayed any longer.
I'll try to refactor this code a little better.
I did verify performance on a wide variety of targets and saw no
negative impact with this fix.
llvm-svn: 233366
This was discussed a while back and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.
llvm-svn: 233357
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint. This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure. The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like. One major option is to give up on doing spilling in the backend and do it at the IR level instead. We'd give up the ability to have gc values in registers, but that's a minor cost in practice. We are not neccessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infastructure. Given the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.
I'm holding off on the documentation for the new feature until I'm reasonable sure this is going to stick around.
llvm-svn: 233356
This test returns nonnative integer types which aren't supported on all targets.
The real issue with the SelectionDAG scheduler is with x86 EFLAGS.
llvm-svn: 233355
We don't have any logic to emit those tables yet, so the SDAG lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233354
It can happen (by line CurSU->isPending = true; // This SU is not in
AvailableQueue right now.) that a SUnit is mark as available but is
not in the AvailableQueue. For SUnit being selected for scheduling
both conditions must be met.
This patch mainly defensively protects from invalid removing a node
from a queue. Sometimes nodes are marked isAvailable but are not in
the queue because they have been defered due to some hazard.
Patch by Pawel Bylica!
llvm-svn: 233351
Fix testcases whose variables are invalid. I'm working on a patch that
adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`),
and these failed because:
- `scope:` fields need to point at `MDLocalScope` and can't be null.
- `file:` fields need to point at `MDFile`.
- `inlinedAt:` fields need to point at `MDLocation`.
llvm-svn: 233349
Summary:
The ARM backend can use a loop to implement copying byval parameters before
a call. In non-thumb2 mode it uses a constant pool load to materialize the
trip count. For targets that need movt instead (e.g. Native Client), use
the same code as in thumb2 mode to materialize the trip count.
Reviewers: jfb, t.p.northover
Differential Revision: http://reviews.llvm.org/D8442
llvm-svn: 233324
This patch teaches fast-isel how to select 128-bit vector load instructions.
Added test CodeGen/X86/fast-isel-vecload.ll
Differential Revision: http://reviews.llvm.org/D8605
llvm-svn: 233270
those are in the same basic block.
The previous approach was the topological order of the basic block.
By default this rule is disabled.
Related to PR22768.
llvm-svn: 233241
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.
It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.
Differential Revision: http://reviews.llvm.org/D8593
llvm-svn: 233224
We don't have any logic to emit those tables yet, so the sdag lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233209
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.
The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.
The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.
This is send along a clang patch to enabled the builtins and option switch.
[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html
Phabricator Review: http://reviews.llvm.org/D8247
llvm-svn: 233204
This patch allows AVX blend instructions to handle insertion into the low
element of a 256-bit vector for the appropriate data types.
For f32, instead of:
vblendps $1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3]
vblendps $15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]
we get:
vblendps $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]
For f64, instead of:
vmovsd %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1]
vblendpd $3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3]
we get:
vblendpd $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3]
For the hardware-neglected integer data types, I left a TODO comment in the
code and added regression tests for a follow-on patch.
Differential Revision: http://reviews.llvm.org/D8609
llvm-svn: 233199
1. There were no CHECK-LABELs, so we could match instructions from the wrong function.
2. The use of zero operands meant multiple xor instructions could match some CHECKs.
3. The test was over-specified to need a Sandybridge CPU and Darwin triple.
llvm-svn: 233198
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.
llvm-svn: 233153
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximiation of that. It fails to handle cases
with interestingly large stack alignments, which is pretty uncommon on
Win64 and is TODO.
llvm-svn: 233137
vperm2x128 instructions have the special ability (aka free hardware capability)
to shuffle zero values into a vector.
This patch recognizes that type of shuffle and generates the appropriate
control byte.
https://llvm.org/bugs/show_bug.cgi?id=22984
Differential Revision: http://reviews.llvm.org/D8563
llvm-svn: 233100
Summary:
Previous behaviour of 'R' and 'm' has been preserved for now. They will be
improved in subsequent commits.
The offset permitted by ZC varies according to the subtarget since it is
intended to match the restrictions of the pref, ll, and sc instructions.
The restrictions on these instructions are:
* For microMIPS: 12-bit signed offset.
* For Mips32r6/Mips64r6: 9-bit signed offset.
* Otherwise: 16-bit signed offset.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8414
llvm-svn: 233063
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):
%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0
The vector constant folding was always using sitofp:
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
This patch fixes this so that the correct opcode is used for sitofp and uitofp.
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>
Differential Revision: http://reviews.llvm.org/D8560
llvm-svn: 233033
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers. Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators. Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes. I fixed them.
Soon I'll add verifier checks for them too.
This also adds explicit dereference operators. Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.
llvm-svn: 233030
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve, and, on code that uses
few globals (e.g., the odd file- or function- static), more often than
not is degraded by the optimization. Lengthy discussion can be found
on llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
llvm-svn: 233024
This enables very common cases to switch to the
smaller encoding.
All of the standard LLVM canonicalizations of comparisons
are the opposite of what we want. Compares with constants
are moved to the RHS, but the first operand can be an inline
immediate, literal constant, or SGPR using the 32-bit VOPC
encoding.
There are additional bad canonicalizations that should
also be fixed, such as canonicalizing ge x, k to gt x, (k + 1)
if this makes k no longer an inline immediate value.
llvm-svn: 232988
This change is incorrect since it converts double rounding into single rounding,
which can produce different results. Instead this optimization will be done by
modifying Clang's codegen to not produce double rounding in the first place.
This reverts commit r232954.
llvm-svn: 232962
Specifically when the conversion is done in two steps, f16 -> f32 -> f64.
For example:
%1 = tail call float @llvm.convert.from.fp16.f32(i16 %0)
%conv = fpext float %1 to double
to:
vcvtb.f64.f16
llvm-svn: 232954