GCC 4.8 detected a signed compare [-Wsign-compare]. Add a cast for the
destination index. Add an assert to catch a potential overflow however unlikely
it may be.
llvm-svn: 213878
When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was on the vector.
Added a test case.
llvm-svn: 213826
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.
llvm-svn: 213808
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.
llvm-svn: 213800
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.
llvm-svn: 213799
This chang fully reverts r211771.
That revision added a canonicalization rule which has the potential to causes a
combine-cycle in the target-independent canonicalizing DAG combine.
The plan is to move the logic that forms target specific addsub nodes as part of
the lowering of shuffles.
llvm-svn: 213736
Without this, we produce non-extern relocations when targeting older OS X
versions that ld64 can't cope with in the particular context of __eh_frame
sections (who'd want generic relocation-processing anyway?).
This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be
needed when targeting such platforms with a modern version of LLVM, but this is
probably the case anyway and a reasonable requirement.
PR20212, rdar://problem/17544795
llvm-svn: 213665
This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp
and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'.
This refactoring is in preperation of an upcoming change to the DAGCombiner.
llvm-svn: 213503
Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
UNARYOP(AND(VECTOR_CMP(x,y), constant))
--> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).
This implements the transform where UNARYOP is [su]int_to_fp.
For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
%cmp = fcmp oeq <4 x float> %val, %test
%ext = zext <4 x i1> %cmp to <4 x i32>
%result = sitofp <4 x i32> %ext to <4 x float>
ret <4 x float> %result
}
Before this change, the SSE code is generated as:
LCPI0_0:
.long 1 ## 0x1
.long 1 ## 0x1
.long 1 ## 0x1
.long 1 ## 0x1
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
cmpeqps %xmm1, %xmm0
andps LCPI0_0(%rip), %xmm0
cvtdq2ps %xmm0, %xmm0
retq
After, the code is improved to:
LCPI0_0:
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
cmpeqps %xmm1, %xmm0
andps LCPI0_0(%rip), %xmm0
retq
The cvtdq2ps has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via the ModRM operand of andps.
llvm-svn: 213342
There are two parts here. First is to modify tablegen to adjust the encoding
type ENCODING_RM with the scaling factor.
The second is to use the new encoding types to compute the correct
displacement in the decoder.
Fixes <rdar://problem/17608489>
llvm-svn: 213281
Passes the computed scaling factor in TSFlags rather than the old attributes.
Also removes the C++ version of computing the scaling factor (MemObjSize)
along with the asserts added by the previous patch.
No functional change.
llvm-svn: 213279
This does not actually move the logic yet but reimplements it in the Tablegen
language. Then asserts that the new implementation results in the same value.
The next patch will remove the assert and the temporary use of the TSFlags and
remove the C++ implementation.
The formula requires a limited form of the logical left and right operators.
I implemented these with the bit-extract/insert operator (i.e. blah{bits}).
No functional change.
llvm-svn: 213278
Previously we asserted on this code. Currently compiler-rt doesn't
actually implement any of these new libcalls, but external help is
pretty much the only viable option for LLVM.
I've followed the much more generic "__truncST2" naming, as opposed to
the odd name for f32 -> f16 truncation. This can obviously be changed
later, or overridden by any targets that need to.
llvm-svn: 213252
x86 has no native ability to extend an f16 to f64, but the same result
is obtained if we expand it into two separate extensions: f16 -> f32
-> f64.
Unfortunately the same is not true for truncate, so that still results
in a compilation failure.
llvm-svn: 213251
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
llvm-svn: 213248
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.
This is not a good situation, and a more reasonable default can be
formed by ackowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).
Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADS for MVT::f16 softening support.
llvm-svn: 213162
Before this change, method 'isShuffleMaskLegal' didn't know that shuffles
implementing a 'movhlps' operation were perfectly legal for SSE targets.
This patch adds the missing check for 'isMOVHLPSMask' inside method
'isShuffleMaskLegal' to fix the problem.
The reason why it is important to do this is because the DAGCombiner
conservatively avoids combining a pair of shuffles if the resulting shuffle
node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were
wrongly considered not to be legal. This was the root cause of some poor-code
generation bugs.
llvm-svn: 213137
There exists a helper function to abstract away the various differences
between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc.
Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant.
llvm-svn: 213104
Refactoring; no functional changes intended
Removed PostRAScheduler bits from subtargets (X86, ARM).
Added PostRAScheduler bit to MCSchedModel class.
This bit is set by a CPU's scheduling model (if it exists).
Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
d. X86 is the only target that actually has postRA specified via sched model info.
Differential Revision: http://reviews.llvm.org/D4217
llvm-svn: 213101
Fixes a gcc warning caused by a typo. A redundant assignment operation was
accidentally used as the third operand of a conditional expression.
No functional change intended.
llvm-svn: 213061
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
llvm-svn: 213049
Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
Revert "[FastISel][X86] Implement the FastLowerCall hook."
This reverts commit r213035, r213036, and r213037 to make the
buildbots happy again.
llvm-svn: 213048
The constant pool entry code for WinCOFF assumed that vector constants
would be formed using ConstantDataVector, it did not expect to see a
ConstantVector. Furthermore, it did not expect undef as one of the
elements of the vector.
ConstantVectors should be handled like ConstantDataVectors, treat Undef
as zero.
llvm-svn: 213038
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
llvm-svn: 213035
No functional change.
The offsets for the other bitfields are specified symbolically. I need to
increase the size for one of the earlier fields which is easier after this
cleanup.
Why these bits are relative to VEXShift is a bit strange but that is for
another cleanup.
I made sure that the values for the enums are unchanged after this change.
llvm-svn: 213011
COFF lacks a feature that other object file formats support: mergeable
sections.
To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section. This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.
This fixes PR20262.
Differential Revision: http://reviews.llvm.org/D4482
llvm-svn: 213006
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target. They were added at different times and would result in the
border condition being mishandled.
This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs). Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.
llvm-svn: 212956
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.
Thanks Saleem for pointing this out!
llvm-svn: 212948
Rename member variables and functions for the MCStreamer for DWARF-like
unwinding management. Rename the Windows ones as well and make the naming and
handling similar across the two. No functional change intended.
llvm-svn: 212912
No functional change. As I was trying to understand this function, I found
that variables were reused with confusing names and the broadcast case was a
bit too implicit. Hopefully, this is an improvement.
llvm-svn: 212795