address of the stack guard was being spilled to the stack.
Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register.
<rdar://problem/12475629>
llvm-svn: 213967
This commit implements the frameaddress intrinsic for the AArch64 architecture
in FastISel.
There were two test cases that pretty much tested the same, so I combined them
to a single test case.
Fixes <rdar://problem/17811834>
llvm-svn: 213959
SDValues, fixing the two bugs left in the regression suite.
The key for both of these was the use a single value type rather than
a VTList which caused an unintentionally single-result merge-value node.
Fix this by getting the appropriate VTList in place.
Doing this exposed that the comments in x86's code abouth how MUL_LOHI
operands are handle is wrong. The bug with the use of out-of-range
result numbers was hiding the bug about the order of operands here (as
best i can tell). There are more places where the code appears to get
this backwards still...
llvm-svn: 213931
assembly instructions.
This is necessary to ensure ARM assembler switches to Thumb mode before it
starts assembling the file level inline assembly instructions at the beginning
of a .s file.
<rdar://problem/17757232>
llvm-svn: 213924
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle. This was
previously missed in the vector LE implementation.
There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option: "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.
I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.
This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4. A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem. I've verified this fix takes care of that issue.
llvm-svn: 213915
vector operation legalization with support for custom target lowering
and fallback to expand when it fails, and use this to implement sext and
anyext load lowering for x86 in a more principled way.
Previously, the x86 backend relied on a target DAG combine to "combine
away" sextload and extload nodes prior to legalization, or would expand
them during legalization with terrible code. This is particularly
problematic because the DAG combine relies on running over non-canonical
DAG nodes at just the right time to match several common and important
patterns. It used a combine rather than lowering because we didn't have
good lowering support, and to expose some tricks being employed to more
combine phases.
With this change it becomes a proper lowering operation, the backend
marks that it can lower these nodes, and I've added support for handling
the canonical forms that don't have direct legal representations such as
sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases
for this behavior continue to pass even after the DAG combiner beigns
running more systematically over every node.
There is some noise caused by this in the test suite where we actually
use vector extends instead of subregister extraction. This doesn't
really seem like the right thing to do, but is unlikely to be a critical
regression. We do regress in one case where by lowering to the
target-specific patterns early we were able to combine away extraneous
legal math nodes. However, this regression is completely addressed by
switching to a widening based legalization which is what I'm working
toward anyways, so I've just switched the test to that mode.
Differential Revision: http://reviews.llvm.org/D4654
llvm-svn: 213897
The Microsoft ABI and MSVCRT are considered the canonical C runtime and ABI.
The long double routines are not part of this environment. However, cygwin and
MinGW both provide supplementary implementations. Change the condition to
reflect this reality.
llvm-svn: 213896
This patch minimizes the number of nops that must be emitted on X86 to satisfy
stackmap shadow constraints.
To minimize the number of nops inserted, the X86AsmPrinter now records the
size of the most recent stackmap's shadow in the StackMapShadowTracker class,
and tracks the number of instruction bytes emitted since the that stackmap
instruction was encountered. Padding is emitted (if it is required at all)
immediately before the next stackmap/patchpoint instruction, or at the end of
the basic block.
This optimization should reduce code-size and improve performance for people
using the llvm stackmap intrinsic on X86.
<rdar://problem/14959522>
llvm-svn: 213892
This target is identical to the Windows MSVC (and follows Microsoft ABI for C).
Correct the library call setup for this target. The same set of library calls
are missing on this environment.
llvm-svn: 213883
GCC 4.8 detected a signed compare [-Wsign-compare]. Add a cast for the
destination index. Add an assert to catch a potential overflow however unlikely
it may be.
llvm-svn: 213878
Quite a bit of cruft had accumulated as we realised the various different cases
it had to handle and squeezed them in where possible. This refactoring mostly
flattens the logic and special-cases. The result is slightly longer, but I
think clearer.
Should be no functionality change.
llvm-svn: 213867
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).
This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.
No functionality change intended.
llvm-svn: 213859
The ARM ARM prohibits STRH instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRH instructions with unpredictable behavior.
llvm-svn: 213850
Summary: The ll and sc instructions for r6 and non-r6 are misplaced. This patch fixes that.
Patch by Jyun-Yan You
Differential Revision: http://reviews.llvm.org/D4578
llvm-svn: 213847
Use ComputeNumSignBits instead of checking for i8 / i16 which only
worked when AMDIL was lying about having legal i8 / i16.
If an integer is known to fit in 24-bits, we can
do division faster with float ops.
llvm-svn: 213843
This bug is introduced by r211144. The element of operand may be
smaller than the element of result, but previous commit can
only handle the contrary condition. This commit is to handle this
scenario and generate optimized codes like ZIP1.
llvm-svn: 213830
When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was on the vector.
Added a test case.
llvm-svn: 213826
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.
llvm-svn: 213808
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.
llvm-svn: 213800
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.
llvm-svn: 213799
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.
Reduced test case from Chad. Thanks.
llvm-svn: 213788
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.
llvm-svn: 213773
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2
However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2
llvm-svn: 213758
We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though.
llvm-svn: 213754
Although the final shifter operand is a rotate, this actually only matters for
the half-word extends when the amount == 24. Otherwise folding a shift in is
just as good.
llvm-svn: 213753
This pass attempts to speculatively use a sqrt instruction if one exists on the target, falling back to a libcall if the target instruction returned NaN.
This was enabled for MIPS and System-Z, but is well guarded and is good for most targets - GCC does this for (that I've checked) X86, ARM and AArch64.
llvm-svn: 213752
The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior.
llvm-svn: 213750
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".
llvm-svn: 213748
The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior.
llvm-svn: 213745
Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and
invites bugs where only one is checked. In reality, the only legitimate
difference between the two (arm64 usually means iOS) is also present in the OS
part of the triple and that's what should be checked.
We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so
there aren't any LLVM-side test changes.
llvm-svn: 213743
This chang fully reverts r211771.
That revision added a canonicalization rule which has the potential to causes a
combine-cycle in the target-independent canonicalizing DAG combine.
The plan is to move the logic that forms target specific addsub nodes as part of
the lowering of shuffles.
llvm-svn: 213736
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.
llvm-svn: 213729
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.
This enables patchpoint intrinsic lowering for AArch64.
This fixes <rdar://problem/17733076>
llvm-svn: 213704
Without this, we produce non-extern relocations when targeting older OS X
versions that ld64 can't cope with in the particular context of __eh_frame
sections (who'd want generic relocation-processing anyway?).
This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be
needed when targeting such platforms with a modern version of LLVM, but this is
probably the case anyway and a reasonable requirement.
PR20212, rdar://problem/17544795
llvm-svn: 213665
GCC believes it may be possible to not return a value from the switch:
lib/Target/R600/SIRegisterInfo.cpp:187:1: warning: control reaches end of non-void function [-Wreturn-type]
Add an unreachable label to indicate that this is not possible and still permit
switch coverage checking.
llvm-svn: 213572
There are a few more cleanups to do, but I ran into some problems
with ext loads and trunc stores, when I tried to change some of the
vector loads and stores from custom to legal, so I wasn't able to
get rid of everything.
llvm-svn: 213552
We now emit this value when we need to contradict the default value. This
restores support for binutils 2.24.
When a suitable binutils has been released we can resume unconditionally
emitting .module directives. This is preferable to omitting the .module
directives since the .module directives protect against, for example,
accidentally assembling FP32 code with -mfp64 and producing an unusuable object.
llvm-svn: 213548
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.
This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.
llvm-svn: 213530
This abstraction allows us to support the various records that can be placed in
the .MIPS.options section in the future. We currently use it to record register
usage information (the ODK_REGINFO record in our ELF64 spec).
Each .MIPS.options record should subclass MipsOptionRecord and provide an
implementation of EmitMipsOptionRecord.
Patch by Matheus Almeida and Toma Tabacu
llvm-svn: 213522
Fix a dangerous default case that caused MipsCodeEmitter to discard pseudo
instructions it didn't recognize. It will now call llvm_unreachable() for
unrecognized pseudo's and explicitly handles PseudoReturn, PseudoReturn64,
PseudoIndirectBranch, PseudoIndirectBranch64, CFI_INSTRUCTION, IMPLICIT_DEF,
and KILL.
There may be other pseudos that need handling but this was enough for the
ExecutionEngine tests to pass on my test system.
llvm-svn: 213513
We now emit this directive when we need to contradict the default value (e.g.
-mno-odd-spreg is given) or an option changed the default value (e.g. -mfpxx
is given).
This restores support for the currently available head of binutils. However,
at this point binutils 2.24 is still not sufficient since it does not support
'.module fp=...'.
llvm-svn: 213511
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:
fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
fpext -> FP16_TO_FP (if f16 illegal) -> libcall
Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).
As a result we can remove some redundant AArch64 patterns.
llvm-svn: 213507
This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp
and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'.
This refactoring is in preperation of an upcoming change to the DAGCombiner.
llvm-svn: 213503
This patch adds infrastructure support for passing array types
directly. These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type). The details of the
array type being used inform the back-end about ABI-relevant
properties. Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
GPRs/stack slots (for float / vector / integer element types,
respectively)
- what the alignment requirements of the parameter are when passed in
GPRs/stack slots (8 for float / 16 for vector / the element type
size for integer element types) -- this corresponds to the
"byval align" field
Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
or vector aggregates in FPRs and VRs (this is similar to the ARM
homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
that fit fully in registers without using the "byval" mechanism
The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.
As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs. This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs. Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.
Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs. The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.
This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST). That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.
The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)
Reviewed by Hal Finkel.
llvm-svn: 213493
This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a single CFI record refering to CR2. In ELFv2 instead,
each of the CR fields is represented by its own CFI record. The
advantage is that the compiler can now chose to save just a single
(or two) CR fields instead of all of them, if those are the only ones
that actually need saving. That can lead to more efficient code using
mf(o)crf instead of the (slow) mfcr instruction.
Note that this patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to
be ABI compliant: creating multiple CFI records if multiple CR fields
are saved.
Reviewed by Hal Finkel.
llvm-svn: 213492
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
present in certain cases (it remains mandatory in functions with
variable arguments, and functions that have any parameter that is
passed on the stack)
The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional
This latter part requires some extra explanation: Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily. This can be addressed by a follow-on
optimization patch.
On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.
To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.
Reviewed by Hal Finkel.
llvm-svn: 213490
This patch builds upon the two preceding MC changes to implement the
basic ELFv2 function call convention. In the ELFv1 ABI, a "function
descriptor" was associated with every function, pointing to both the
entry address and the related TOC base (and a static chain pointer
for nested functions). Function pointers would actually refer to that
descriptor, and the indirect call sequence needed to load up both entry
address and TOC base.
In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it
up into r12 (as required by the ABI rules for a global entry point).
Direct function calls continue to just do a "bl" to the target symbol;
this will be resolved by the linker to the local entry point of the
target function if it is local, and to a PLT stub if it is global.
That PLT stub would then load the (global) entry point address of the
final target into r12 and branch to it. Note that when performing a
local function call, r2 must be set up to point to the current TOC
base: if the target ends up local, the ABI requires that its local
entry point is called with r2 set up; if the target ends up global,
the PLT stub requires that r2 is set up.
This patch implements all LLVM changes to implement that scheme:
- No longer create a function descriptor when emitting a function
definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2)
anywhere (this is done EmitFunctionBodyStart; note that this cannot
be done in EmitFunctionBodyStart because the global entry point
prologue code must be *part* of the function as covered by debug info).
- In order to make use tracking of r2 (as needed above) work correctly,
mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive
to tell the linker to create the appropriate version of PLT stubs.
Reviewed by Hal Finkel.
llvm-svn: 213489
As discussed in a previous checking to support the .localentry
directive on PowerPC, we need to inspect the actual target symbol
in needsRelocateWithSymbol to make the appropriate decision based
on that symbol's st_other bits.
Currently, needsRelocateWithSymbol does not get the target symbol.
However, it is directly available to its sole caller. This patch
therefore simply extends the needsRelocateWithSymbol by a new
parameter "const MCSymbolData &SD", passes in the target symbol,
and updates all derived implementations.
In particular, in the PowerPC implementation, this patch removes
the FIXME added by the previous checkin.
llvm-svn: 213487
A second binutils feature needed to support ELFv2 is the .localentry
directive. In the ELFv2 ABI, functions may have two entry points:
one for calling the routine locally via "bl", and one for calling the
function via function pointer (either at the source level, or implicitly
via a PLT stub for global calls). The two entry points share a single
ELF symbol, where the ELF symbol address identifies the global entry
point address, while the local entry point is found by adding a delta
offset to the symbol address. That offset is encoded into three
platform-specific bits of the ELF symbol st_other field.
The .localentry directive instructs the assembler to set those fields
to encode a particular offset. This is typically used by a function
prologue sequence like this:
func:
addis r2, r12, (.TOC.-func)@ha
addi r2, r2, (.TOC.-func)@l
.localentry func, .-func
Note that according to the ABI, when calling the global entry point,
r12 must be set to point the global entry point address itself; while
when calling the local entry point, r2 must be set to point to the TOC
base. The two instructions between the global and local entry point in
the above example translate the first requirement into the second.
This patch implements support in the PowerPC MC streamers to emit the
.localentry directive (both into assembler and ELF object output), as
well as support in the assembler parser to parse that directive.
In addition, there is another change required in MC fixup/relocation
handling to properly deal with relocations targeting function symbols
with two entry points: When the target function is known local, the MC
layer would immediately handle the fixup by inserting the target
address -- this is wrong, since the call may need to go to the local
entry point instead. The GNU assembler handles this case by *not*
directly resolving fixups targeting functions with two entry points,
but always emits the relocation and relies on the linker to handle
this case correctly. This patch changes LLVM MC to do the same (this
is done via the processFixupValue routine).
Similarly, there are cases where the assembler would normally emit a
relocation, but "simplify" it to a relocation targeting a *section*
instead of the actual symbol. For the same reason as above, this
may be wrong when the target symbol has two entry points. The GNU
assembler again handles this case by not performing this simplification
in that case, but leaving the relocation targeting the full symbol,
which is then resolved by the linker. This patch changes LLVM MC
to do the same (via the needsRelocateWithSymbol routine).
NOTE: The method used in this patch is overly pessimistic, since the
needsRelocateWithSymbol routine currently does not have access to the
actual target symbol, and thus must always assume that it might have
two entry points. This will be improved upon by a follow-on patch
that modifies common code to pass the target symbol when calling
needsRelocateWithSymbol.
Reviewed by Hal Finkel.
llvm-svn: 213485
ELFv2 binaries are marked by a bit in the ELF header e_flags field.
A new assembler directive .abiversion can be used to set that flag.
This patch implements support in the PowerPC MC streamers to emit the
.abiversion directive (both into assembler and ELF binary output),
as well as support in the assembler parser to parse the .abiversion
directive.
Reviewed by Hal Finkel.
llvm-svn: 213484
When handling an incoming byval argument, we need to possibly write
incoming registers to the stack in order to create an on-stack image
of the parameter, so we can return its address to common code.
This currently uses CreateFixedObject to access the parts of the
parameter save area where the argument is (or needs to be) stored.
However, sometimes we need to access multiple parts of that area,
e.g. to write multiple registers. The code currently uses a new
CreateFixedObject call for each of these accesses, resulting in
a patchwork of overlapping (fixed) stack objects.
This doesn't really matter in the case of fixed objects, since
any access to those turns into a fixed stackpointer + offset
address anyway. However, with the upcoming ELFv2 patches, we
may actually need to place an incoming argument into our *own*
stack frame instead of the caller's. This means we need to use
CreateStackObject instead, and we cannot have multiple overlapping
instances of those.
To make the rest of the argument handling code work equally in
both situations, this patch refactors it to always use just a
single call to CreateFixedObject, and access parts of that object
as required using address arithmetic. This way, we can in a future
patch substitute CreateStackObject without further changes.
No change to generated code intended.
llvm-svn: 213483
The PPCTargetLowering::SelectAddressRegImm routine needs to handle
FrameIndex nodes in a special manner, by tranlating them into a
TargetFrameIndex node. This was done in most cases, but seems to
have been neglected in one path: when the input tree has an OR of
the FrameIndex with an immediate. This can happen if the FrameIndex
can be proven to be sufficiently aligned that an OR of that immediate
is equivalent to an ADD.
The missing handling of FrameIndex in that case caused the SelectionDAG
instruction selection to miss opportunities to merge the OR back into
the FrameIndex node, leading to superfluous addi/ori instructions in
the final assembler output.
llvm-svn: 213482
These instructions can only take a limited input range, and return
the constant value 1 out of range. We should do range reduction to
be able to process arbitrary values. Use a FRACT instruction after
normalization to achieve this. Also add a test for constant folding
with the lowered code with unsafe-fp-math enabled.
v2: use DAG lowering instead of intrinsic, adapt test
v3: calculate constant, fold pattern into instruction definition
v4: misc style fixes, add sin-fold testcase, cosmetics
Patch by Grigori Goronzy
llvm-svn: 213458
When performing a dynamic stack adjustment without optimisations, we would mark
SP as def and R4 as kill. This occurred as part of the expansion of a
WIN__CHKSTK SDNode which indicated the proper handling of SP and R4. The result
would be that we would double define SP as part of an operation, which is
obviously incorrect.
Furthermore, the VTList for the chain had an incorrect parameter type of i32
instead of Other.
Correct these to permit proper lowering of __builtin_alloca at -O0.
llvm-svn: 213442
a) Move the replacement level decision to the target machine.
b) Create additional subtargets at the TargetMachine level to
cache and make replacement easy.
c) Make the mips16 features obvious.
d) Remove the override logic as it no longer does anything.
e) Have MipsModuleDAGToDAGISel take only the target machine.
f) Have the constant islands pass grab the current subtarget
from the MachineFunction (via the TargetMachine) instead
of caching it.
g) Unconditionally initialize TLOF.
h) Remove the old complicated subtarget based resetting and
replace it with simple conditionals.
llvm-svn: 213430
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.
Patch by Justin Hibbits!
llvm-svn: 213427