All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.
llvm-svn: 230845
Straightforward patch to emit an alignment directive when emitting a
TOC entry. The test case was generated from the test in PR22711 that
demonstrated a misaligned .toc section. The object code is run
through llvm-readobj to verify that the correct alignment has been
applied to the .toc section.
Thanks to Ulrich Weigand for running down where the fix was needed.
llvm-svn: 230801
Summary:
Until now, we did this (among other things) based on whether or not the
target was Windows. This is clearly wrong, not just for Win64 ABI functions
on non-Windows, but for System V ABI functions on Windows, too. In this
change, we make this decision based on the ABI the calling convention
specifies instead.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7953
llvm-svn: 230793
When using Altivec, we can use vector loads and stores for aligned memcpy and
friends. Starting with the P7 and VXS, we have reasonable unaligned vector
stores. Starting with the P8, we have fast unaligned loads too.
For QPX, we use vector loads are stores, but only for aligned memory accesses.
llvm-svn: 230788
vectors. This lets us fix the rest of the v16 lowering problems when
pshufb is clearly better.
We might still be able to improve some of the lowerings by enabling the
other combine-based rewriting to fire for non-128-bit vectors, but this
at least should remove any regressions from using the fancy v16i16
lowering strategy.
llvm-svn: 230753
repeated 128-bit lane shuffles of wider vector types and use it to lower
256-bit v16i16 vector shuffles where applicable.
This should let us perfectly lowering the pattern of pshuflw and pshufhw
even for AVX2 256-bit patterns.
I've not added AVX-512 support, but it should be trivial for someone
working on that to wire up.
Note that currently this generates bad, long shuffle chains because we
don't combine 256-bit target shuffles. The subsequent patches will fix
that.
llvm-svn: 230751
Summary:
We identify the cases where the operand to an ADDE node is a constant
zero. In such cases, we can avoid generating an extra ADDu instruction
disguised as an identity move alias (ie. addu $r, $r, 0 --> move $r, $r).
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7906
llvm-svn: 230742
Summary:
This change causes us to actually save non-volatile registers in a Win64
ABI function that calls a System V ABI function, and vice-versa.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7919
llvm-svn: 230714
uses of TM->getSubtargetImpl and propagate to all calls.
This could be a debugging regression in places where we had a
TargetMachine and/or MachineFunction but don't have it as part
of the MachineInstr. Fixing this would require passing a
MachineFunction/Function down through the print operator, but
none of the existing uses in tree seem to do this.
llvm-svn: 230710
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.
llvm-svn: 230699
blend as legal.
We made the same mistake in two different places. Whenever we are custom
lowering a v32i8 blend we need to check whether we are custom lowering
it only for constant conditions that can be shuffled, or whether we
actually have AVX2 and full dynamic blending support on bytes. Both are
fixed, with comments added to make it clear what is going on and a new
test case.
llvm-svn: 230695
dynamic blends.
This makes it much more clear what is going on. The case we're handling
is that of dynamic conditions, and we're bailing when the nature of the
vector types and subtarget preclude lowering the dynamic condition
vselect as an actual blend.
No functionality changed here, but this will make a subsequent bug-fix
to this code much more clear.
llvm-svn: 230690
There was a problem when passing structures as variable arguments.
The structures smaller than 64 bit were not left justified on MIPS64
big endian. This is now fixed by shifting the value to make it left-
justified when appropriate.
This fixes the bug http://llvm.org/bugs/show_bug.cgi?id=21608
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D7881
llvm-svn: 230657
In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the
features bits are not computed. This patch lets the asm printer
emit ".cpu cortex-a9" directive for krait and the hwdiv feature is
enabled through ".arch_extension". In short, krait is treated
as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since
it is not supported bu GNU GAS yet
llvm-svn: 230651
This patch is in response to r223147 where the avaiable features are
computed based on ".cpu" directive. This will work clean for the standard
variants like cortex-a9. For custom variants which rely on standard cpu names
for assembly, the additional features of a CPU should be propagated. This can be
done via ".arch_extension" as long as the assembler supports it. The
implementation for krait along with unit test will be submitted in next patch.
llvm-svn: 230650
The latency for the WriteMULm class was set to 4, which is actually lower than the latency for WriteMULr (5).
A better estimate would be 4 added to WriteMULr, that is, 9.
llvm-svn: 230634
formulaic into the top v8i16 lowering routine.
This makes the generalized lowering a completely general and single path
lowering which will allow generalizing it in turn for multiple 128-bit
lanes.
llvm-svn: 230623
Explanation: This function is in TargetLowering because it uses
RegClassForVT which would need to be moved to TargetRegisterInfo
and would necessitate moving isTypeLegal over as well - a massive
change that would just require TargetLowering having a TargetRegisterInfo
class member that it would use.
llvm-svn: 230585
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.
llvm-svn: 230583
LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node,
represent loads from the TOC, which is invariant. As a result, these loads can
be hoisted out of loops, etc. In order to do this, we need to generate
GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory
intrinsic node type. Once this is done, the MMO transfer is automatically
handled for TableGen-driven instruction selection, and for nodes generated
directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.
Also, we were not transferring MMOs associated with pre-increment loads, so do
that too.
Lastly, this fixes an exposed bug where R30 was not added as a defined operand of
UpdateGBR.
This problem was highlighted by an example (used to generate the test case)
posted to llvmdev by Francois Pichet.
llvm-svn: 230553
The Win64 epilogue structure is very restrictive, it permits a very
small number of opcodes and none of them are 'mov'.
This means that given:
mov %rbp, %rsp
pop %rbp
The mov isn't the epilogue, only the pop is. This is problematic unless
a frame pointer is present in which case we are free to do whatever we'd
like in the "body" of the function. If a frame pointer is present,
unwinding will undo the prologue operations in reverse order regardless
of the fact that we are at an instruction which is reseting the stack
pointer.
llvm-svn: 230543
We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64
TOC access that were referenced only in TableGen patterns. The associated
(pseudo-)instructions are used, but are being generated directly. NFC.
llvm-svn: 230518
Reapply r230248.
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
llvm-svn: 230499
MMX_MOVD64rm zero-extends i32 load results into i64 registers.
The peephole optimizer will try to fold it in other MMX foldable
instructions, the wrong thing to do, since there's no MMX memory
instruction that loads from i32 and does implict zero extension.
Remove 'canFoldAsLoad' from MOVD64rm in order to prevent such folding.
The current MMX tests already test this, but since there are no MMX
instructions in the foldable tables yet, this did not trigger. This
commit prepares the addition of those instructions.
llvm-svn: 230498
Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR,
STR, and ADD only allow offsets that are a multiple of 4. Make some changes
to better make use of these instructions:
* Use word loads for anyext byte and halfword loads from the stack.
* Enforce 4-byte alignment on objects accessed in this way, to ensure that
the offset is valid.
* Do the same for objects whose frame index is used, in order to avoid having
to use more than one ADD to generate the frame index.
* Correct how many bits of offset we think AddrModeT1_s has.
Patch by John Brawn.
llvm-svn: 230496
Gather and scatter instructions additionally write to one of the source operands - mask register.
In this case Gather has 2 destination values - the loaded value and the mask.
Till now we did not support code gen pattern for gather - the instruction was generated from
intrinsic only and machine node was hardcoded.
When we introduce the masked_gather node, we need to select instruction automatically,
in the standard way.
I added a flag "hasTwoExplicitDefs" that allows to handle 2 destination operands.
(Some code in the X86InstrFragmentsSIMD.td is commented out, just to split one big
patch in many small patches)
llvm-svn: 230471
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).
I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).
The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.
llvm-svn: 230413
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).
Differential Revision: http://reviews.llvm.org/D6940
llvm-svn: 230355
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.
llvm-svn: 230348
Summary: Separated some instruction and pseudo-instruction definitions from InstAlias definitions, added banner for pseudo-instructions and removed a redundant whitespace from a pseudo-instruction definition. No functional change.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7552
llvm-svn: 230327
Summary: Begin to add various address modes; including alloca.
Test Plan: Make sure there are no regressions in test-suite at O0/02 in mips32r1/r2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: echristo, rfuhler, llvm-commits
Differential Revision: http://reviews.llvm.org/D6426
llvm-svn: 230300
This is a follow up to r230233 to fix something that I noticed by
inspection. The AddrModeT2_i8s4 addressing mode does not support
negative offsets. I spent a good chunk of the day trying to come up with
a testcase for this but was not successful. This addressing mode is used
to spill and restore GPRPair registers in Thumb2 code and that does not
happen often. We also make very limited used of negative offsets when
lowering frame indexes. I am going ahead with the change anyway, because
I am pretty confident that it is correct. I also added a missing assertion
to check that the low bits of the scaled offset are zero.
llvm-svn: 230297
Prologue emission, in some cases, requires calls to a stack probe helper
function. The amount of stack to probe is passed as a register
argument in the Win64 ABI but the instruction sequence used is
pessimistic: it assumes that the number of bytes to probe is greater
than 4 GB.
Instead, select a more appropriate opcode depending on the number of
bytes we are going to probe.
llvm-svn: 230270
Front-ends could use global unnamed_addr to hold pointers to other
symbols, like @gotequivalent below:
@foo = global i32 42
@gotequivalent = private unnamed_addr constant i32* @foo
@delta = global i32 trunc (i64 sub (i64 ptrtoint (i32** @gotequivalent to i64),
i64 ptrtoint (i32* @delta to i64))
to i32)
The global @delta holds a data "PC"-relative offset to @gotequivalent,
an unnamed pointer to @foo. The darwin/x86-64 assembly output for this follows:
.globl _foo
_foo:
.long 42
.globl _gotequivalent
_gotequivalent:
.quad _foo
.globl _delta
_delta:
.long _gotequivalent-_delta
Since unnamed_addr indicates that the address is not significant, only
the content, we can optimize the case above by replacing pc-relative
accesses to "GOT equivalent" globals, by a PC relative access to the GOT
entry of the final symbol instead. Therefore, "delta" can contain a pc
relative relocation to foo's GOT entry and we avoid the emission of
"gotequivalent", yielding the assembly code below:
.globl _foo
_foo:
.long 42
.globl _delta
_delta:
.long _foo@GOTPCREL+4
There are a couple of advantages of doing this: (1) Front-ends that need
to emit a great deal of data to store pointers to external symbols could
save space by not emitting such "got equivalent" globals and (2) IR
constructs combined with this opt opens a way to represent GOT pcrel
relocations by using the LLVM IR, which is something we previously had
no way to express.
Differential Revision: http://reviews.llvm.org/D6922
rdar://problem/18534217
llvm-svn: 230264
It was previously using the subtarget to get values for the global
offset without actually checking each function as it was generating
code. Go ahead and solidify the current behavior and make the
existing FIXMEs more prominent.
As a note the ARM backend previously had a thumb1 and non-thumb1
set of defaults. Only the former was tested so I've changed the
behavior to only use that for now.
llvm-svn: 230245
This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.
Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>
llvm-svn: 230241
Summary:
-mno-odd-spreg prohibits the use of odd-numbered single-precision floating
point registers. However, vector insert/extract was still using them when
manipulating the subregisters of an MSA register. Fixed this by ensuring
that insertion/extraction is only performed on even-numbered vector
registers when -mno-odd-spreg is given.
Reviewers: vmedic, sstankovic
Reviewed By: sstankovic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7672
llvm-svn: 230235
The natural way to handle this addressing mode would be to say that it has
8 bits and gets scaled by 4, but since the MC layer is expecting the scaling
to be already reflected in the immediate value, we have been setting the
Scale to 1. That's fine, but then NumBits needs to be adjusted to reflect
the effective increase in the range of the immediate. That adjustment was
missing.
The consequence is that the register scavenger can fail.
The estimateRSStackSizeLimit() function in ARMFrameLowering.cpp correctly
assumes that the AddrModeT2_i8s4 address mode can handle scaled offsets up to
1020. Under just the right circumstances, we fail to reserve space for the
scavenger because it thinks that nothing will be needed. However, the overly
pessimistic behavior in rewriteT2FrameIndex causes some frame indexes to be
out of range and require scavenged registers, and so the scavenger asserts.
Unfortunately I have not been able to come up with a testcase for this. I
can only reproduce it on an internal branch where the frame layout and
register allocation is slightly different than trunk. We really need a
way to serialize MachineInstr-level IR to write reasonable tests for things
like this.
rdar://problem/19909005
llvm-svn: 230233
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
llvm-svn: 230226
I made the templates general, no need to define pattern separately for each instruction/intrinsic.
Now only need to add r_Int pattern for AVX.
llvm-svn: 230221
Synthesizing a call directly using the MI layer would confuse the frame
lowering code. This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.
llvm-svn: 230129
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.
There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.
llvm-svn: 230118
Stack realignment occurs after the prolog, not during, for Win64.
Because of this, don't factor in the maximum stack alignment when
establishing a frame pointer.
This fixes PR22572.
llvm-svn: 230113
The expansion code does the same thing. Since
the operands were not defined with the correct
types, this has the side effect of fixing operand
folding since the expanded pseudo would never use
SGPRs or inline immediates.
llvm-svn: 230072
This enables a few useful combines that used to only
use fma.
Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.
llvm-svn: 230071
usage of instruction ADDU16 by CodeGen. For this instruction an improper
register is allocated, i.e. the register that is not from register set defined
for the instruction.
llvm-svn: 230053
changes to remove non-Function based subtargets out of the asm
printer. For module level emission we'll need to construct up
an MCSubtargetInfo so that we can encode instructions for
emission.
llvm-svn: 230050
This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and
intrinsic 'convert_to_fp16'.
If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion,
and VCVTPH2PSrr for a half-float conversion.
Differential Revision: http://reviews.llvm.org/D7673
llvm-svn: 230043
EmitFunctionStubs is called from doFinalization and so can't
depend on the Subtarget existing. It's also irrelevant as
we know we're darwin since we're in the darwin asm printer.
llvm-svn: 230039
This canonicalization step saves us 3 pattern matching possibilities * 4 math ops
for scalar FP math that uses xmm regs. The backend can re-commute the operands
post-instruction-selection if that makes register allocation better.
The tests in llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll cover this scenario already,
so there are no new tests with this patch.
Differential Revision: http://reviews.llvm.org/D7777
llvm-svn: 230024
the wrong answer. We also got initializer lists which are *way* cleaner
for this kind of thing. Let's use those and make this a normal, boring
functionn accepting ArrayRef.
llvm-svn: 230004