llvm-project

Commit Graph

Author	SHA1	Message	Date
Bill Schmidt	9bac9ca796	[PPC64] Add test case for r215685. I had deferred adding this test case until I could get it down to a reasonable size. That's done now. Thanks, Bill llvm-svn: 215711	2014-08-15 13:51:57 +00:00
Juergen Ributzka	790bacf232	Revert several FastISel commits to track down a buildbot error. This reverts: r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants." r215594 "[FastISel][X86] Use XOR to materialize the "0" value." r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization." r215591 "[FastISel][AArch64] Make use of the zero register when possible." r215588 "[FastISel] Let the target decide first if it wants to materialize a constant." r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI." llvm-svn: 215673	2014-08-14 19:56:28 +00:00
Juergen Ributzka	7cee768e55	[FastISel] Let the target decide first if it wants to materialize a constant. This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed too. On AArch64 it would materialize the constant 0 in a register even the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or not use mvn. This change simply changes the order and always asks the target first if it likes to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 215588	2014-08-13 22:08:02 +00:00
Hal Finkel	46ef7ce283	[PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type). This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed). llvm-svn: 215512	2014-08-13 01:15:40 +00:00
Hal Finkel	415e344f29	Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are used for memory intrinsics (those with MachineMemOperands), and sometimes not. When not, the nodes are not created as instances of MemIntrinsicSDNode, but rather created as some other subclass of SDNode using DAG::getNode. When they are memory intrinsics, they are created using DAG::getMemIntrinsicNode as instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return false. In r214452, MemSDNode::classof was changed to return true for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The problem is that neither the pre-r214452 logic and the post-r214452 logic are really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and the return value from classof needs to reflect that. This was broken before r214452 (because MemIntrinsicSDNode::classof always returned true), and was broken afterward (because MemSDNode::classof also always returned true), and will now be correct. The minimal solution is to grab one of the SubclassData bits (there is one left for MemIntrinsicSDNode nodes) and use it to store whether or not a particular INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof and MemSDNode::classof to return the correct answer for the underlying object for both the memory-intrinsic and non-memory-intrinsic cases. This fixes the problem that r214452 created in the SelectionDAGDumper (thanks to Matt Arsenault for pointing it out). Because PowerPC does not implement getTgtMemIntrinsic, this change breaks test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will fix it in a follow-up commit. llvm-svn: 215511	2014-08-13 01:15:37 +00:00
Joerg Sonnenberger	7ee0f31a8b	Provide an implementation of getNoopForMachoTarget for PPC, otherwise empty functions will assert in the MC object writer. llvm-svn: 215238	2014-08-08 19:13:23 +00:00
Bill Schmidt	42a6936c78	[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923	2014-08-05 20:47:25 +00:00
Bill Schmidt	f04e998e00	[PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR. llvm-svn: 214800	2014-08-04 23:21:01 +00:00
Joerg Sonnenberger	6d05a2b461	MC uses .lcomm now, so adjust. llvm-svn: 214776	2014-08-04 21:06:00 +00:00
Ulrich Weigand	983341d3f3	[PowerPC] Add target triple to vec_urem_const.ll test case This should hopefully fix build bots on other architectures. llvm-svn: 214721	2014-08-04 14:55:26 +00:00
Ulrich Weigand	cc9909b881	[PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl* by swapping the input arguments. As it turns out, the exact same fix is also required for the vpkuhum/vpkuwum patterns. This fixes another regression in llvmpipe when vector support is enabled. Reviewed by Bill Schmidt. llvm-svn: 214718	2014-08-04 13:53:40 +00:00
Ulrich Weigand	51eccec5d9	[PowerPC] MULHU/MULHS are not legal for vector types I ran into some test failures where common code changed vector division by constant into a multiply-high operation (MULHU). But these are not implemented by the back-end, so we failed to recognize the insn. Fixed by marking MULHU/MULHS as Expand for vector types. llvm-svn: 214716	2014-08-04 13:27:12 +00:00
Ulrich Weigand	c4cc7febb0	[PowerPC] Fix and improve vector comparisons This patch refactors code generation of vector comparisons. This fixes a wrong code-gen bug for ISD::SETGE for floating-point types, and improves generated code for vector comparisons in general. Specifically, the patch moves all logic deciding how to implement vector comparisons into getVCmpInst, which gets two extra boolean outputs indicating to its caller whether its needs to swap the input operands and/or negate the result of the comparison. Apart from implementing these two modifications as directed by getVCmpInst, there is no need to ever implement vector comparisons in any other manner; in particular, there is never a need to perform two separate comparisons (e.g. one for equal and one for greater-than, as code used to do before this patch). Reviewed by Bill Schmidt. llvm-svn: 214714	2014-08-04 13:13:57 +00:00
Hal Finkel	3604bf7fe7	[PowerPC] Recognize consecutive memory accesses from intrinsics When generating unaligned vector loads, we need to search for other loads or stores nearby offset by one vector width. If we find one, then we know that we can safely generate another aligned load at that address. Otherwise, we must generate the next load using an offset of the vector width minus one byte (so we don't read off the end of the allocation if the base unaligned address happened to be aligned at runtime). We had previously done this using only other vector loads and stores, but did not consider the PowerPC-specific vector load/store intrinsics. Now we'll also consider vector intrinsics. By itself, this change is a feature enhancement, but is a necessary step toward fixing the underlying problem behind PR19991. llvm-svn: 214469	2014-08-01 01:02:01 +00:00
Will Schmidt	44ff8f06ec	Disable IsSub subregister assert. pr18663. This is a follow-up to the activity in the bug at http://llvm.org/bugs/show_bug.cgi?id=18663 . The underlying issue has to do with how the KILL pseudo-instruction is handled. I defer to Hal/Jakob/Uli for additional details and background. This will disable the (bad?) assert, add an associated fixme comment, and add a pair of tests. The code change and the pr18663-2.ll test are copied from the referenced bug. That test does not immediately fail in my environment, but I have added the pr18663.ll test which does. (Comment from Hal) to provide everyone else with some context, this assert was not bad when it was written. At that time, we only generated KILL pseudo instructions around subregister copies. This logic, unfortunately, had its own problems. In r199797, the relevant logic in MachineCopyPropagation was replaced to generate KILLs for other kinds of copies too. This change in semantics broke this now-problematic assumption in AggressiveAntiDepBreaker. The AggressiveAntiDepBreaker really needs a proper cleanup to deal with the change, but removing the assert (which just allows the function to return false) is a safe conservative behavior, and should do for the time being. llvm-svn: 214429	2014-07-31 19:50:53 +00:00
Hal Finkel	36eff0f854	Fix ScalarEvolutionExpander when creating a PHI in a block with duplicate predecessors It seems that when I fixed this, almost exactly a year ago, I did not quite do it correctly. When we have duplicate block predecessors, we can indeed not have different incoming values for the same block, but we must have duplicate entries. So, instead of skipping the duplicates, we explicitly add the duplicate incoming values. Fixes PR20442. llvm-svn: 214423	2014-07-31 19:13:38 +00:00
Ulrich Weigand	e09f73716a	[PowerPC] Fix ppc64-elf-abi.ll test case on Darwin Use full -mtriple instead of just -march to ensure Linux ABI (ELFv1 or ELFv2) is selected. llvm-svn: 214179	2014-07-29 12:48:14 +00:00
Ulrich Weigand	085a10c49e	[PowerPC] Add testcase forgotten in the 214072 commit. llvm-svn: 214073	2014-07-28 13:10:25 +00:00
Hal Finkel	7c8ae53506	[PowerPC] Support TLS on PPC32/ELF Patch by Justin Hibbits! llvm-svn: 213960	2014-07-25 17:47:22 +00:00
Bill Schmidt	c9fa5dd618	[PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl. Because the PowerPC vmrgh and vmrgl* instructions have a built-in big-endian bias, it is necessary to swap their inputs in little-endian mode when using them to implement a vector shuffle. This was previously missed in the vector LE implementation. There was already logic to distinguish between unary and "normal" vmrg* vector shuffles, so this patch extends that logic to use a third option: "swapped" vmrg* vector shuffles that are used for little endian in place of the "normal" ones. I've updated the vec-shuffle-le.ll test to check for the expected register ordering on the generated instructions. This bug was discovered when testing the LE and ELFv2 patches for safety if they were backported to 3.4. A different vectorization decision was made in 3.4 than on mainline trunk, and that exposed the problem. I've verified this fix takes care of that issue. llvm-svn: 213915	2014-07-25 01:55:55 +00:00
Joerg Sonnenberger	b5459e6e22	Don't use 128bit functions on PPC32. llvm-svn: 213899	2014-07-24 22:20:10 +00:00
Chandler Carruth	9a0051cd59	[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time shows slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect). This change results in subtle, frustrating churn in the particular order in which DAG combines are applied which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist to directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better. A major change required to make this work is to significantly harden the way in which the DAG combiner handle nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner. I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit. Differential Revision: http://reviews.llvm.org/D4616 llvm-svn: 213727	2014-07-23 07:08:53 +00:00
Ulrich Weigand	85d5df25de	[PowerPC] ELFv2 aggregate passing support This patch adds infrastructure support for passing array types directly. These can be used by the front-end to pass aggregate types (coerced to an appropriate array type). The details of the array type being used inform the back-end about ABI-relevant properties. Specifically, the array element type encodes: - whether the parameter should be passed in FPRs, VRs, or just GPRs/stack slots (for float / vector / integer element types, respectively) - what the alignment requirements of the parameter are when passed in GPRs/stack slots (8 for float / 16 for vector / the element type size for integer element types) -- this corresponds to the "byval align" field Using the infrastructure provided by this patch, a companion patch to clang will enable two features: - In the ELFv2 ABI, pass (and return) "homogeneous" floating-point or vector aggregates in FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI) - As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates that fit fully in registers without using the "byval" mechanism The patch uses the functionArgumentNeedsConsecutiveRegisters callback to encode that special treatment is required for all directly-passed array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set as a results are then used to implement the required size and alignment rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc. As a related change, the ABI routines have to be modified to support passing floating-point types in GPRs. This is necessary because with homogeneous aggregates of 4-byte float type we can now run out of FPRs before we run out of the 64-byte argument save area that is shadowed by GPRs. Any extra floating-point arguments that no longer fit in FPRs must now be passed in GPRs until we run out of those too. Note that there was already code to pass floating-point arguments in GPRs used with vararg parameters, which was done by writing the argument out to the argument save area first and then reloading into GPRs. The patch re-implements this, however, in favor of code packing float arguments directly via extension/truncation, BITCAST, and BUILD_PAIR operations. This is required to support the ELFv2 ABI, since we cannot unconditionally write to the argument save area (which the caller might not have allocated). The change does, however, affect ELFv1 varags routines too; but even here the overall effect should be advantageous: Instead of loading the argument into the FPR, then storing the argument to the stack slot, and finally reloading the argument from the stack slot into a GPR, the new code now just loads the argument into the FPR, and subsequently loads the argument into the GPR (via BITCAST). That BITCAST might imply a save/reload from a stack temporary (in which case we're no worse than before); but it might be implemented more efficiently in some cases. The final part of the patch enables up to 8 FPRs and VRs for argument return in PPCCallingConv.td; this is required to support returning ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs since LLVM wil only look for which register to use if the parameter is marked as "direct" return anyway.) Reviewed by Hal Finkel. llvm-svn: 213493	2014-07-21 00:13:26 +00:00
Ulrich Weigand	be928cc276	[PowerPC] ELFv2 explicit CFI for CR fields This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a single CFI record refering to CR2. In ELFv2 instead, each of the CR fields is represented by its own CFI record. The advantage is that the compiler can now chose to save just a single (or two) CR fields instead of all of them, if those are the only ones that actually need saving. That can lead to more efficient code using mf(o)crf instead of the (slow) mfcr instruction. Note that this patch does not (yet) implement this more efficient code generation, but it does implement the part that is required to be ABI compliant: creating multiple CFI records if multiple CR fields are saved. Reviewed by Hal Finkel. llvm-svn: 213492	2014-07-21 00:03:18 +00:00
Ulrich Weigand	8658f17eef	[PowerPC] ELFv2 stack space reduction The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways: * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating two unused doublewords * the 64-byte "parameter save area" is now optional and need not be present in certain cases (it remains mandatory in functions with variable arguments, and functions that have any parameter that is passed on the stack) The following patch implements this required changes: - reducing the linkage area, and associated relocation of the TOC save slot, in getLinkageSize / getTOCSaveOffset (this requires updating all callers of these routines to pass in the isELFv2ABI flag). - (partially) handling the case where the parameter save are is optional This latter part requires some extra explanation: Currently, we still always allocate the parameter save area when calling a function. That is certainly always compliant with the ABI, but may cause code to allocate stack unnecessarily. This can be addressed by a follow-on optimization patch. On the callee side, in LowerFormalArguments, we must track correctly whether the ABI guarantees that the caller has allocated the parameter save area for our use, and the patch does so. However, there is one complication: the code that handles incoming "byval" arguments will currently always write to the parameter save area, because it has to force incoming register arguments to the stack since it must return an address to implement the byval semantics. To fix this, the patch changes the LowerFormalArguments code to write arguments to a freshly allocated stack slot on the function's own stack frame instead of the argument save area in those cases where that area is not present. Reviewed by Hal Finkel. llvm-svn: 213490	2014-07-20 23:43:15 +00:00
Ulrich Weigand	aa0ac4f11c	[PowerPC] ELFv2 function call changes This patch builds upon the two preceding MC changes to implement the basic ELFv2 function call convention. In the ELFv1 ABI, a "function descriptor" was associated with every function, pointing to both the entry address and the related TOC base (and a static chain pointer for nested functions). Function pointers would actually refer to that descriptor, and the indirect call sequence needed to load up both entry address and TOC base. In the ELFv2 ABI, there are no more function descriptors, and function pointers simply refer to the (global) entry point of the function code. Indirect function calls simply branch to that address, after loading it up into r12 (as required by the ABI rules for a global entry point). Direct function calls continue to just do a "bl" to the target symbol; this will be resolved by the linker to the local entry point of the target function if it is local, and to a PLT stub if it is global. That PLT stub would then load the (global) entry point address of the final target into r12 and branch to it. Note that when performing a local function call, r2 must be set up to point to the current TOC base: if the target ends up local, the ABI requires that its local entry point is called with r2 set up; if the target ends up global, the PLT stub requires that r2 is set up. This patch implements all LLVM changes to implement that scheme: - No longer create a function descriptor when emitting a function definition (in EmitFunctionEntryLabel) - Emit two entry points if the function needs the TOC base (r2) anywhere (this is done EmitFunctionBodyStart; note that this cannot be done in EmitFunctionBodyStart because the global entry point prologue code must be part of the function as covered by debug info). - In order to make use tracking of r2 (as needed above) work correctly, mark direct function calls as implicitly using r2. - Implement the ELFv2 indirect function call sequence (no function descriptors; load target address into r12). - When creating an ELFv2 object file, emit the .abiversion 2 directive to tell the linker to create the appropriate version of PLT stubs. Reviewed by Hal Finkel. llvm-svn: 213489	2014-07-20 23:31:44 +00:00
Ulrich Weigand	55a96650d9	[PowerPC] Fix FrameIndex handling in SelectAddressRegImm The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex nodes in a special manner, by tranlating them into a TargetFrameIndex node. This was done in most cases, but seems to have been neglected in one path: when the input tree has an OR of the FrameIndex with an immediate. This can happen if the FrameIndex can be proven to be sufficiently aligned that an OR of that immediate is equivalent to an ADD. The missing handling of FrameIndex in that case caused the SelectionDAG instruction selection to miss opportunities to merge the OR back into the FrameIndex node, leading to superfluous addi/ori instructions in the final assembler output. llvm-svn: 213482	2014-07-20 22:26:40 +00:00
Hal Finkel	3ee2af7d1c	[PowerPC] 32-bit ELF PIC support This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend. Patch by Justin Hibbits! llvm-svn: 213427	2014-07-18 23:29:49 +00:00
Ulrich Weigand	ea147a9d43	[PowerPC] Fix invalid displacement created by LocalStackAlloc This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that could result in the LocalStackAlloc pass creating an MI instruction out-of-range displacement: %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32) %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31 (In final assembler output the top bits are stripped off, resulting in a negative offset loading from below the stack pointer.) Common code expects the isFrameOffsetLegal routine to verify whether adding a given offset to the offset already present in the instruction results in a valid displacement. However, on PowerPC the routine did not take the already present instruction offset into account. This commit fixes isFrameOffsetLegal to add the instruction offset, and updates a local caller (needsFrameBaseReg) to no longer add the instruction offset itself before calling isFrameOffsetLegal. Reviewed by Hal Finkel. llvm-svn: 212832	2014-07-11 17:19:31 +00:00
Ulrich Weigand	5fd91e0c0b	[PowerPC] Fix testcase regression Use -mcpu to avoid different codegen depending on host platform. llvm-svn: 212478	2014-07-07 19:41:54 +00:00
Ulrich Weigand	ec2bf93895	[PowerPC] Fix "byval align" arguments Arguments passed as "byval align" should get the specified alignment in the parameter save area. There was some code in PPCISelLowering.cpp that attempted to implement this, but this didn't work correctly: while code did update the ArgOffset value, it neglected to update the PtrOff value (which was already computed from the old ArgOffset), and it also neglected to update GPR_idx -- fields skipped due to alignment in the save area must likewise be skipped in GPRs. This patch fixes and simplifies this logic by: - handling argument offset alignment right at the beginning of argument processing, using a new helper routine CalculateStackSlotAlignment (this avoids having to update PtrOff and other derived values later on) - not tracking GPR_idx separately, but always computing the correct GPR_idx for each argument from its ArgOffset - removing some redundant computation in LowerFormalArguments: MinReservedArea must equal ArgOffset after argument processing, so there's no use in computing it twice. [This doesn't change the behavior of the current clang front-end, since that never creates "byval align" arguments at the moment. This will change with a follow-on patch, however.] llvm-svn: 212476	2014-07-07 19:26:41 +00:00
Tim Northover	07f99fb769	llvm-readobj: fix MachO relocatoin printing a bit. There were two issues here: 1. At the very least, scattered relocations cannot use the same code to determine the corresponding symbol being referred to. For some reason we pretend there is no symbol, even when one actually exists in the symtab, so to match this behaviour getRelocationSymbol should simply return symbols_end for scattered relocations. 2. Printing "-" when we can't get a symbol (including the scattered case, but not exclusively), isn't that helpful. In both cases there is interesting information in that field, so we should print it. As hex will do. Small part of rdar://problem/17553104 llvm-svn: 212332	2014-07-04 10:57:56 +00:00
Ulrich Weigand	f236bb1b5b	Fix ppcf128 component access on little-endian systems The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a pair of two doubles, where one is considered the "high" or more-significant part, and the other is considered the "low" or less-significant part. When a ppcf128 value is stored in memory or a register pair, the high part always comes first, i.e. at the lower memory address or in the lower-numbered register, and the low part always comes second. This is true both on big-endian and little-endian PowerPC systems. (Similar to how with a complex number, the real part always comes first and the imaginary part second, no matter the byte order of the system.) This was implemented incorrectly for little-endian systems in LLVM. This commit fixes three related issues: - When printing an immediate ppcf128 constant to assembler output in emitGlobalConstantFP, emit the high part first on both big- and little-endian systems. - When lowering a ppcf128 type to a pair of f64 types in SelectionDAG (which is used e.g. when generating code to load an argument into a register pair), use correct low/high part ordering on little-endian systems. - In a related issue, because lowering ppcf128 into a pair of f64 must operate differently from lowering an int128 into a pair of i64, bitcasts between ppcf128 and int128 must not be optimized away by the DAG combiner on little-endian systems, but must effect a word-swap. Reviewed by Hal Finkel. llvm-svn: 212274	2014-07-03 15:06:47 +00:00
Ulrich Weigand	14bd521f4c	[PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex I've run into a bug where current LLVM at -O0 (with fast-isel) generated invalid code like: ld 0, 20936(1) # 8-byte Folded Reload stw 12, 10348(0) stw 12, 10344(0) The underlying vreg had been introduced as base register by the Local Stack Slot Allocation pass. That register was constrained to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match the ADDI instruction used to set it, but it was not constrained to G8RC_NOX0 to fit the use of the register in an address. That should have happened in PPCRegisterInfo::resolveFrameIndex. This patch adds an appropriate constrainRegClass call. Reviewed by Hal Finkel. llvm-svn: 211897	2014-06-27 13:04:12 +00:00
Eli Bendersky	5d5e18da3e	Rename loop unrolling and loop vectorizer metadata to have a common prefix. [LLVM part] These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.' prefix. Metadata name changes: llvm.vectorizer.* => llvm.loop.vectorizer.* llvm.loopunroll.* => llvm.loop.unroll.* This was a suggestion from an earlier review (http://reviews.llvm.org/D4090) which added the loop unrolling metadata. Patch by Mark Heffernan. llvm-svn: 211710	2014-06-25 15:41:00 +00:00
Bill Schmidt	83973ef23b	[PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction) PR20071 identifies a problem in PowerPC's fast-isel implementation for floating-point conversion to integer. The fctiduz instruction was added in Power ISA 2.06 (i.e., Power7 and later). However, this instruction is being generated regardless of which 64-bit PowerPC target is selected. The intent is for fast-isel to punt to DAG selection when this instruction is not available. This patch implements that change. For testing purposes, the existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests for the expected code generation. Additionally, the existing test fast-isel-conversion-p5.ll was found to be incorrectly expecting the unavailable instruction to be generated. I've removed these test variants since we have adequate coverage in fast-isel-conversion.ll. llvm-svn: 211627	2014-06-24 20:05:18 +00:00
Ulrich Weigand	f316e1db75	[PowerPC] Allow stack frames without parameter save area The PPCFrameLowering::determineFrameLayout routine currently ensures that every function that allocates a stack frame provides space for the parameter save area (via PPCFrameLowering::getMinCallFrameSize). This is actually not necessary. There may be functions that never call another routine but still allocate a frame; those do not require the parameter save area. In the future, with the ELFv2 ABI, even some routines that do call other functions do not need to allocate the parameter save area. While it is not a bug to allocate the parameter area when it is not needed, it is better to avoid it to save stack space. Note that when any particular function call requires the parameter save area, this space will already have been included by ABI code in the size the CALLSEQ_START insn is annotated with, and therefore included in the size returned by MFI->getMaxCallFrameSize(). This means that determineFrameLayout simply does not need to care about the parameter save area. (It still needs to ensure that every frame provides the linkage area.) This is implemented by this patch. Note that this exposed a bug in the new fast-isel code where the parameter area was not included in the CALLSEQ_START size; this is also fixed. A couple of test cases needed to be adapted for the new (smaller) stack frame size those tests now see. llvm-svn: 211495	2014-06-23 13:47:52 +00:00
Ulrich Weigand	9ba552db89	[PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4 Current 64-bit SVR4 code seems to have some remnants of Darwin code in AltiVec argument handing. This had the effect that AltiVec arguments (or subsequent arguments) were not correctly placed in the parameter area in some cases. The correct behaviour with the 64-bit SVR4 ABI is: - All AltiVec arguments take up space in the parameter area, just like any other arguments, whether vararg or not. - They are always 16-byte aligned, skipping a parameter area doubleword (and the associated GPR, if any), if necessary. This patch implements the correct behaviour and adds a test case. (Verified against GCC behaviour via the ABI compat test suite.) llvm-svn: 211492	2014-06-23 12:36:34 +00:00
Ulrich Weigand	59c6ab20d6	[PowerPC] Fix small argument stack slot offset for LE When small arguments (structures < 8 bytes or "float") are passed in a stack slot in the ppc64 SVR4 ABI, they must reside in the least significant part of that slot. On BE, this means that an offset needs to be added to the stack address of the parameter, but on LE, the least significant part of the slot has the same address as the slot itself. This changes the PowerPC back-end ABI code to only add the small argument stack slot offset for BE. It also adds test cases to verify the correct behavior on both BE and LE. llvm-svn: 211368	2014-06-20 16:34:05 +00:00
Alp Toker	1d099d9339	Fix typos llvm-svn: 211304	2014-06-19 19:41:26 +00:00
Ulrich Weigand	ad0cb91ed9	[PowerPC] Simplify and improve loading into TOC register During an indirect function call sequence on the 64-bit SVR4 ABI, generate code must load and then restore the TOC register. This does not use a regular LOAD instruction since the TOC register r2 is marked as reserved. Instead, the are two special instruction patterns: let RST = 2, DS = 2 in def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg), "ld 2, 8($reg)", IIC_LdStLD, [(PPCload_toc i64:$reg)]>, isPPC64; let RST = 2, DS = 10, RA = 1 in def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins), "ld 2, 40(1)", IIC_LdStLD, [(PPCtoc_restore)]>, isPPC64; Note that these not only restrict the destination of the load to r2, but they also restrict the source of the load to particular address combinations. The latter is a problem when we want to support the ELFv2 ABI, since there the TOC save slot is no longer at 40(1). This patch replaces those two instructions with a single instruction pattern that only hard-codes r2 as destination, but supports generic addresses as source. This will allow supporting the ELFv2 ABI, and also helps generate more efficient code for calls to absolute addresses (allowing simplification of the ppc64-calls.ll test case). llvm-svn: 211193	2014-06-18 17:52:49 +00:00
Ulrich Weigand	e581920d12	[PowerPC] Add back test case for absolute calls (removed in r211174) As requested by Hal Finkel, this adds back a test for calls to a known-constant function pointer value, and verifies that the 64-bit SVR4 indirect function call sequence is used. llvm-svn: 211190	2014-06-18 17:28:56 +00:00
Ulrich Weigand	9aa09ef30f	[PowerPC] Do not use BLA with the 64-bit SVR4 ABI The PowerPC back-end uses BLA to implement calls to functions at known-constant addresses, which is apparently used for certain system routines on Darwin. However, with the 64-bit SVR4 ABI, this is actually incorrect. An immediate function pointer value on this platform is not directly usable as a target address for BLA: - in the ELFv1 ABI, the function pointer value refers to the function descriptor, not the code address - in the ELFv2 ABI, the function pointer value refers to the global entry point, but BL(A) would only be correct when calling the local entry point This bug didn't show up since using immediate function pointer values is not usually done in the 64-bit SVR4 ABI in the first place. However, I ran into this issue with a certain use case of LLVM as JIT, where immediate function pointer values were uses to implement callbacks from JITted code to helpers in statically compiled code. Fixed by simply not using BLA with the 64-bit SVR4 ABI. llvm-svn: 211174	2014-06-18 16:14:04 +00:00
Bill Schmidt	5d82f09b53	[PPC64] Fix PR19893 - improve code generation for local function addresses Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal code generation for forming a function address that is local to the compile unit. The existing code was treating both local and non-local functions identically. This patch fixes the problem by properly identifying local functions and generating the proper addis/addi code. I also noticed that Rafael's earlier changes to correct the surrounding code in PPCISelLowering.cpp were also needed for fast instruction selection in PPCFastISel.cpp, so this patch fixes that code as well. The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new code generation. I've added a -O0 run line to test the fast-isel code as well. Tested on powerpc64[le]-unknown-linux-gnu with no regressions. llvm-svn: 211056	2014-06-16 21:36:02 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Bill Schmidt	f910a0650e	[PPC64LE] Recognize shufflevector patterns for little endian Various masks on shufflevector instructions are recognizable as specific PowerPC instructions (vector pack, vector merge, etc.). There is existing code in PPCISelLowering.cpp to recognize the correct patterns for big endian code. The masks for these instructions are different for little endian code due to the big-endian numbering employed by these instructions. This patch adds the recognition code for little endian. I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for this. The existing recognizer test (vec_shuffle.ll) is unnecessarily verbose and difficult to read, so I felt it was better to add a new test rather than modify the old one. llvm-svn: 210536	2014-06-10 14:35:01 +00:00
Alp Toker	d3d017cf00	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Bill Schmidt	6b5a7dfc24	[PPC64LE] Generate correct code for unaligned little-endian vector loads The code in PPCTargetLowering::PerformDAGCombine() that handles unaligned Altivec vector loads generates a lvsl followed by a vperm. As we've seen in numerous other places, the vperm instruction has a big-endian bias, and this is fixed for little endian by complementing the permute control vector and swapping the input operands. In this case the lvsl is providing the permute control vector. Rather than generating an lvsl and a complement operation, it is sufficient to generate an lvsr instruction instead. Thus for LE code generation we will generate an lvsr rather than an lvsl, and swap the other input arguments on the vperm. The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test the code generation for PPC64 and PPC64LE, in addition to the existing PPC32/G5 testing. llvm-svn: 210493	2014-06-09 22:00:52 +00:00
Bill Schmidt	42995e8c74	[PPC64LE] Generate correct little-endian code for v16i8 multiply The existing code in PPCTargetLowering::LowerMUL() for multiplying two v16i8 values assumes that vector elements are numbered in big-endian order. For little-endian targets, the vector element numbering is reversed, but the vmuleub, vmuloub, and vperm instructions still assume big-endian numbering. To account for this, we must adjust the permute control vector and reverse the order of the input registers on the vperm instruction. The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed on powerpc64 and powerpc64le targets as well as the original powerpc (32-bit) target. llvm-svn: 210474	2014-06-09 16:06:29 +00:00
Bill Schmidt	4aedff8995	[PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian This patch fixes a couple of lowering issues for little endian PowerPC. The code for lowering BUILD_VECTOR contains a number of optimizations that are only valid for big endian. For now, we disable those optimizations for correctness. In the future, we will add analogous optimizations that are correct for little endian. When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to make the now-familiar transformation of swapping the input operands and complementing the permute control vector. Correctness of this transformation is tested by the accompanying test case. llvm-svn: 210336	2014-06-06 14:06:26 +00:00

1 2 3 4 5 ...

813 Commits