There is no need to open-code the alignment calculation, we have a
handy RoundUpToAlignment function which "Does The Right Thing (TM)".
llvm-svn: 230392
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Mon Feb 23 23:04:28 2015 +0000
Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.
and
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Sun Feb 22 18:17:28 2015 +0000
[DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.
This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.
This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.
Differential Revision: http://reviews.llvm.org/D7816
as the root cause of PR22678 which is causing an assertion inside the DAG combiner.
I'll follow up to the main thread as well.
llvm-svn: 230358
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.
llvm-svn: 230348
For almost all node types, if the target requested custom lowering, and
LowerOperation returned its input, we'd treat the original node as legal. This
did not work, however, for many loads and stores, because they follow
slightly different code paths, and we did not account for the possibility of
LowerOperation returning its input at those call sites.
I think that we now handle this consistently everywhere. At the call sites in
LegalizeDAG, we used to assert in this case, so there's no functional change
for any existing code there. For the call sites in LegalizeVectorOps, this
really only affects whether or not we set Changed = true, but I think makes the
semantics clearer.
No test case here, but it will be covered by an upcoming PowerPC commit adding
QPX support.
llvm-svn: 230332
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.
Added test CodeGen/X86/fastmath-float-half-conversion.ll
Differential Revision: http://reviews.llvm.org/D7832
llvm-svn: 230276
Front-ends could use global unnamed_addr to hold pointers to other
symbols, like @gotequivalent below:
@foo = global i32 42
@gotequivalent = private unnamed_addr constant i32* @foo
@delta = global i32 trunc (i64 sub (i64 ptrtoint (i32** @gotequivalent to i64),
i64 ptrtoint (i32* @delta to i64))
to i32)
The global @delta holds a data "PC"-relative offset to @gotequivalent,
an unnamed pointer to @foo. The darwin/x86-64 assembly output for this follows:
.globl _foo
_foo:
.long 42
.globl _gotequivalent
_gotequivalent:
.quad _foo
.globl _delta
_delta:
.long _gotequivalent-_delta
Since unnamed_addr indicates that the address is not significant, only
the content, we can optimize the case above by replacing pc-relative
accesses to "GOT equivalent" globals, by a PC relative access to the GOT
entry of the final symbol instead. Therefore, "delta" can contain a pc
relative relocation to foo's GOT entry and we avoid the emission of
"gotequivalent", yielding the assembly code below:
.globl _foo
_foo:
.long 42
.globl _delta
_delta:
.long _foo@GOTPCREL+4
There are a couple of advantages of doing this: (1) Front-ends that need
to emit a great deal of data to store pointers to external symbols could
save space by not emitting such "got equivalent" globals and (2) IR
constructs combined with this opt opens a way to represent GOT pcrel
relocations by using the LLVM IR, which is something we previously had
no way to express.
Differential Revision: http://reviews.llvm.org/D6922
rdar://problem/18534217
llvm-svn: 230264
It was previously using the subtarget to get values for the global
offset without actually checking each function as it was generating
code. Go ahead and solidify the current behavior and make the
existing FIXMEs more prominent.
As a note the ARM backend previously had a thumb1 and non-thumb1
set of defaults. Only the former was tested so I've changed the
behavior to only use that for now.
llvm-svn: 230245
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.
This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.
This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.
Differential Revision: http://reviews.llvm.org/D7816
llvm-svn: 230177
DAGCombine will rewrite an BUILD_VECTOR where all non-undef inputs some from
[US]INT_TO_FP, as a BUILD_VECTOR of integers with the conversion applied as a
vector operation. We check operation legality of the conversion, but fail to
check legality of the integer vector type itself. Because targets don't
normally override operation legality defaults for illegal types, we need to
check this also.
This came up in the context of the QPX vector entensions for PowerPC (which can
have legal floating-point vector types without corresponding legal integer
vector types). No in-tree test case for this yes, but one can be added once
the QPX support has been committed.
llvm-svn: 230176
When expanding a truncating store or extending load using vector extracts or
inserts and scalar stores and loads, we were giving each of these scalar stores
or loads the same alignment as the original vector operation. While this will
often be right (most vector operations, especially those produced by
autovectorization, have the alignment of the underlying scalar type), the
vector operation could certainly have a larger alignment.
No test case (yet); noticed by inspection.
llvm-svn: 230175
asm parsing since it's not subtarget dependent and we can't depend
upon the one hanging off the MachineFunction's subtarget still
being around.
llvm-svn: 230135
Synthesizing a call directly using the MI layer would confuse the frame
lowering code. This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.
llvm-svn: 230129
Summary:
Letting them begin at the PHI instruction slightly simplifies the code
but more importantly avoids breaking the assumption that live ranges
starting at the block begin are also live at the end of the predecessor
blocks. The MachineVerifier checks that but was apparently never run in
the few instances where liveranges are calculated for machine-SSA
functions.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7779
llvm-svn: 230093
This allows sharing of FMA forming combines to work
with instructions that have the same semantics as a separate
multiply and add.
This is expand by default, and only formed post legalization
so it shouldn't have much impact on targets that do not want it.
llvm-svn: 230070
AsmPrinter.
getSubtargetInfo now asserts that the MachineFunction exists.
Debug printing of register naming now uses the register info
from MCAsmInfo as that's unchanging.
llvm-svn: 229978
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.
Fixes PR20300.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7744
llvm-svn: 229944
during SetupMachineFunction. This is also the single use of MII
and it'll be changing to TargetInstrInfo (which is MachineFunction
based) in the next commit here.
llvm-svn: 229931
asm support in the asm printer. If we can get a subtarget from
the machine function then we should do so, otherwise we can
go ahead and create a default one since we're at the module
level.
llvm-svn: 229916
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.
However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.
Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).
This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!
There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.
llvm-svn: 229835