This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in Vectorizer. These intrinsics are different from other
intrinsics as second argument to these function must be same in order to vectorize them and it should be represented as a scalar.
Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857
llvm-svn: 209873
The corresponding CFE patch replaces these intrinsics with vector initializers
in avxintrin.h. This patch removes the LLVM intrinsics from the backend.
We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering
that further to the intrinsics.
The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue
to use intrinsics. As explained in the CFE patch, the reason is that we
currently don't generate as good code for them without the intrinsics.
CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It
checks that for a series of insertelements we generate the appropriate
vbroadcast instruction.
Also verified that there was no assembly change in the test-suite before and
after this patch.
llvm-svn: 209864
They are replaced with the same IR that is generated for the
vector-initializers in avxintrin.h.
The test verifies that we get back the original instruction. I haven't seen
this approach to be used in other auto-upgrade tests (i.e. llc + FileCheck)
but I think it's the most direct way to test this case. I believe this should
work because llc upgrades calls during parsing. (Other tests mostly check
that assembling and disassembling yields the upgraded IR.)
llvm-svn: 209863
The loop vectorizer instantiates be-taken-count + 1 as the loop iteration count.
If this expression overflows the generated code was invalid.
In case of overflow the code now jumps to the scalar loop.
Fixes PR17288.
llvm-svn: 209854
These tests ensure that a change I will propose in clang works as
expected.
Summary:
Added tests for the generation of blend+immediate instructions from a
shufflevector.
These tests were proposed along with a patch that was dropped. I'm
committing the tests anyway to protect against possible regressions in
codegen.
Reviewers: nadav, bkramer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3600
llvm-svn: 209853
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
llvm-svn: 209843
This seems to match what gcc does for ppc and what every other llvm
backend does.
This is a fixed version of r209638. The difference is to avoid any change
in behavior for functions. The logic for using constant pools for function
addresseses is spread over a few places and we have to keep them in sync.
llvm-svn: 209821
field represents ELF section header sh_info field and does not have any
sense for regular sections. Its interpretation depends on section type.
llvm-svn: 209801
During loop-unroll, loop exits from the current loop may end up in in different
outer loop. This requires to re-form LCSSA recursively for one level down from
the outer most loop where loop exits are landed during unroll. This fixes PR18861.
Differential Revision: http://reviews.llvm.org/D2976
llvm-svn: 209796
An address only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.
llvm-svn: 209788
Don't assume that dynamically initialized globals are all initialized from
_GLOBAL__<module_name>I_ function. Instead, scan the llvm.global_ctors and
insert poison/unpoison calls to each function there.
Patch by Nico Weber!
llvm-svn: 209780
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).
Clang still has to be updated to expose this feature to C.
llvm-svn: 209759
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
llvm-svn: 209755
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.
llvm-svn: 209747
This patch implements two things:
1. If we know one number is positive and another is negative, we return true as
signed addition of two opposite signed numbers will never overflow.
2. Implemented TODO : If one of the operands only has one non-zero bit, and if
the other operand has a known-zero bit in a more significant place than it
(not including the sign bit) the ripple may go up to and fill the zero, but
won't change the sign. e.x - (x & ~4) + 1
We make sure that we are ignoring 0 at MSB.
Patch by Suyog Sarda.
llvm-svn: 209746
This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the
Clang-compiled TableGen would segfault because it jumped to an invalid address
from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E
(which is within the command-line parameter registration process)).
llvm-svn: 209745
Add regression tests for the following transformation:
str X, [x20]
...
add x20, x20, #32
->
str X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
llvm-svn: 209715
Add regression tests for the following transformation:
ldr X, [x20]
...
add x20, x20, #32
->
ldr X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
llvm-svn: 209711
The delinearization is needed only to remove the non linearity induced by
expressions involving multiplications of parameters and induction variables.
There is no problem in dealing with constant times parameters, or constant times
an induction variable.
For this reason, the current patch discards all constant terms and multipliers
before running the delinearization algorithm on the terms. The only thing
remaining in the term expressions are parameters and multiply expressions of
parameters: these simplified term expressions are passed to the array shape
recognizer that will not recognize constant dimensions anymore: these will be
recognized as different strides in parametric subscripts.
The only important special case of a constant dimension is the size of elements.
Instead of relying on the delinearization to infer the size of an element,
compute the element size from the base address type. This is a much more precise
way of computing the element size than before, as we would have mixed together
the size of an element with the strides of the innermost dimension.
llvm-svn: 209691
%higher and %highest can have non-zero values only for offsets greater
than 2GB, which is highly unlikely, if not impossible when compiling a
single function. This makes long branch for MIPS64 3 instructions smaller.
Differential Revision: http://llvm-reviews.chandlerc.com/D3281.diff
llvm-svn: 209678
After much puppetry, here's the major piece of the work to ensure that
even when a concrete definition preceeds all inline definitions, an
abstract definition is still created and referenced from both concrete
and inline definitions.
Variables are still broken in this case (see comment in
dbg-value-inlined-parameter.ll test case) and will be addressed in
follow up work.
llvm-svn: 209677
A further step to correctly emitting concrete out of line definitions
preceeding inlined instances of the same program.
To do this, emission of subprograms must be delayed until required since
we don't know which (abstract only (if there's no out of line
definition), concrete only (if there are no inlined instances), or both)
DIEs are required at the start of the module.
To reduce the test churn in the following commit that actually fixes the
bug, this commit introduces the lazy DIE construction and cleans up test
cases that are impacted by the changes in the resulting DIE ordering.
llvm-svn: 209675
This is a precursor to fixing inlined debug info where the concrete,
out-of-line definition may preceed any inlined usage. To cope with this,
the attributes that may appear on the concrete definition or the
abstract definition are delayed until the end of the module. Then, if an
abstract definition was created, it is referenced (and no other
attributes are added to the out-of-line definition), otherwise the
attributes are added directly to the out-of-line definition.
In a couple of cases this causes not just reordering of attributes, but
reordering of types. When the creation of the attribute is delayed, if
that creation would create a type (such as for a DW_AT_type attribute)
then other top level DIEs may've been constructed during the delay,
causing the referenced type to be created and added after those
intervening DIEs. In the extreme case, in cross-cu-inlining.ll, this
actually causes the DW_TAG_basic_type for "int" to move from one CU to
another.
llvm-svn: 209674
This is an enhancement to SeparateConstOffsetFromGEP. With this patch, we can
extract a constant offset from "s/zext and/or/xor A, B".
Added a new test @ext_or to verify this enhancement.
Refactoring the code, I also extracted some common logic to function
Distributable.
llvm-svn: 209670
This old test didn't have the argument numbering that's now squirelled
away in the high bits of the line number in the DW_TAG_arg_variable
metadata.
Add the numbering and update the test to ensure arguments are in-order.
llvm-svn: 209669
Detected by Daniel Jasper, Ilia Filippov, and Andrea Di Biagio
Fixed the argument order to select (the mask semantics to blendv* are the
inverse of select) and fixed the tests
Added parenthesis to the assert condition
Ran clang-format
llvm-svn: 209667
In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there
is an optimization for certain patterns to generate one or two vector
splats followed by a vector add or subtract. This operation is
represented by a VADD_SPLAT in the selection DAG. Prior to this
patch, it was possible for the VADD_SPLAT to be assigned the wrong
data type, causing incorrect code generation. This patch corrects the
problem.
Specifically, the code previously assigned the value type of the
BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is
correct much of the time, but not always. The problem is that the
call to isConstantSplat() may return a SplatBitSize that is not the
same as the number of bits in the original element vector type. The
correct type to assign is a vector type with the same element bit size
as SplatBitSize.
The included test case shows an example of this, where the
BUILD_VECTOR node has a type of v16i8. The vector to be built is {0,
16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat
detects that we can generate a splat of 16 for type v8i16, which is
the type we must assign to the VADD_SPLAT node. If we do not, we
generate a vspltisb of 8 and a vaddubm, which generates the incorrect
result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16}. The correct code generation is a vspltish of 8 and a vadduhm.
This patch also corrected code generation for
CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked
as an XFAIL, so we can remove the XFAIL from the test case.
llvm-svn: 209662