llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	9f92995781	Don't pass the code model to MC I was surprised to see the code model being passed to MC. After all, it assembles code, it doesn't create it. The one place it is used is in the expansion of .cfi directives to handle .eh_frame being more than 2 GB away from the code. As far as I can tell, gnu assembler doesn't even have an option to enable this. Compiling a c file with gcc -mcmodel=large produces a regular looking .eh_frame. This is probably because in practice linkers parse and recreate .eh_frames. In llvm this is used because the JIT can place the code and .eh_frame very far apart. Ideally we would fix the JIT and delete this option. This is hard. Apart from confusion another problem with the current interface is that most callers pass CodeModel::Default, which is bad since MC has no way to map it to the target default if it actually needed to. This patch then replaces the argument with a boolean with a default value. The vast majority of users don't ever need to look at it. In fact, only CodeGen and llvm-mc use it and llvm-mc just to enable more testing. llvm-svn: 309884	2017-08-02 20:32:26 +00:00
Stefan Pintilie	873889ca16	[Power9] Exploit vector absolute difference instructions on Power 9 Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW) for byte, halfword and word. We should take advantage of these. Differential Revision: https://reviews.llvm.org/D34684 llvm-svn: 309876	2017-08-02 20:07:21 +00:00
Matt Arsenault	2738ede6b2	AMDGPU: Restore using MRI to find highest used regs If there are no calls, this is a faster path than searching the entire program for calls. This was supposed to be left in r309781. Fixes unused variable warning. llvm-svn: 309832	2017-08-02 17:15:01 +00:00
Adrian Prantl	61b39b2aec	Remove unused includes of MachineLocation.h (NFC) llvm-svn: 309824	2017-08-02 15:32:18 +00:00
Florian Hahn	31f78fd0ae	[AArch64] Simplify AES*Tied pseudo expansion (NFC). Summary: Suggested by @t.p.northover in https://bugs.llvm.org/show_bug.cgi?id=34015. Reviewers: javed.absar, t.p.northover, rengolin Reviewed By: t.p.northover Subscribers: aemerson, kristof.beyls, llvm-commits, t.p.northover Differential Revision: https://reviews.llvm.org/D36223 llvm-svn: 309821	2017-08-02 15:17:19 +00:00
Matt Arsenault	8e8f8f43b0	AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it llvm-svn: 309783	2017-08-02 01:52:45 +00:00
Matt Arsenault	1d6317c3ad	AMDGPU: Fix emitting encoded calls This was failing on out of bounds access to the extra operands on the s_swappc_b64 beyond those in the instruction definition. This was working, but somehow regressed within the past few weeks, although I don't see any obvious commit. llvm-svn: 309782	2017-08-02 01:42:04 +00:00
Matt Arsenault	6ed7b9bfc0	AMDGPU: Analyze callee resource usage in AsmPrinter llvm-svn: 309781	2017-08-02 01:31:28 +00:00
Stanislav Mekhanoshin	f23ae4fbe9	[AMDGPU] Fix asan error after last commit Previous change "Turn s_and_saveexec_b64 into s_and_b64 if result is unused" introduced asan use-after-poison error. Instruction was analyzed after eraseFromParent() calls. Move analysis higher than erase. llvm-svn: 309779	2017-08-02 01:18:57 +00:00
Matt Arsenault	d1867c0345	AMDGPU: Don't place arguments in emergency stack slot When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot. llvm-svn: 309776	2017-08-02 00:59:51 +00:00
Stanislav Mekhanoshin	da0edef1bd	[AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused With SI_END_CF elimination for some nested control flow we can now eliminate saved exec register completely by turning a saveexec version of instruction into just a logical instruction. Differential Revision: https://reviews.llvm.org/D36007 llvm-svn: 309766	2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin	37e7f959c0	[AMDGPU] Collapse adjacent SI_END_CF Add a pass to remove redundant S_OR_B64 instructions enabling lanes in the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any vector instructions between them, we can keep only the outer SI_END_CF, given that the CFG is structured and the exec bits of the outer end statement are never less than the exec bits of the inner one. This needs to be done before the RA to eliminate saved exec bits registers, but after the register coalescer so that there are no vector register copies between different end cf statements. Differential Revision: https://reviews.llvm.org/D35967 llvm-svn: 309762	2017-08-01 23:14:32 +00:00
Haicheng Wu	50692a203c	[AArch64] Fix a typo in isExtFreeImpl() next => not Differential Revision: https://reviews.llvm.org/D36104 llvm-svn: 309748	2017-08-01 21:26:45 +00:00
Eugene Zelenko	52889219ef	[Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 309746	2017-08-01 21:20:10 +00:00
Martin Storsjo	eacf4e408b	[AArch64] Rewrite stack frame handling for win64 vararg functions The previous attempt, which made do with a single offset in computeCalleeSaveRegisterPairs, wasn't quite enough. The previous attempt only worked as long as CombineSPBump == true (since the offset would be adjusted later in fixupCalleeSaveRestoreStackOffset). Instead include the size for the fixed stack area used for win64 varargs in calculations in emitPrologue/emitEpilogue. The stack consists mainly of three parts: AFI->getLocalStackSize(), AFI->getCalleeSavedStackSize(), and FixedObject. Most of the places in the code which previously used the CSStackSize now use PrologueSaveSize instead, which is the sum of the latter two, while some cases which need exactly the middle one use AFI->getCalleeSavedStackSize() explicitly instead of a local variable. In addition to moving the offsetting into emitPrologue/emitEpilogue (which fixes functions with CombineSPBump == false), also set the frame pointer to point to the right location, where the frame pointer and link register actually are stored. In addition to the prologue/epilogue, this also requires changes to resolveFrameIndexReference. Add tests for a function that keeps a frame pointer and another one that uses a VLA. Differential Revision: https://reviews.llvm.org/D35919 llvm-svn: 309744	2017-08-01 21:13:54 +00:00
Matt Arsenault	206f826348	AMDGPU: Fix handling of div_scale with undef inputs The src0 register must match src1 or src2, but if these were undefined they could end up using different implicit_defed virtual registers. Force these to use one undef vreg or pick the defined other register. Also fixes producing invalid nodes without the right number of inputs when src2 is undef. llvm-svn: 309743	2017-08-01 20:49:41 +00:00
Matt Arsenault	b62a4eb524	AMDGPU: Initial implementation of calls Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732	2017-08-01 19:54:18 +00:00
Davide Italiano	ffb1098e92	[AMDGPU] Put a function used only inside assert() under NDEBUG. llvm-svn: 309723	2017-08-01 19:07:20 +00:00
Jacques Pienaar	922928b62d	[lanai] Add getIntImmCost in LanaiTargetTransformInfo. Add simple int immediate cost function. llvm-svn: 309721	2017-08-01 18:40:08 +00:00
Simon Pilgrim	486072d3d6	[X86][SSE] Added missing vector logic intrinsic schedules Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them. llvm-svn: 309715	2017-08-01 17:51:20 +00:00
Craig Topper	2a5bba7325	[X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36129 llvm-svn: 309706	2017-08-01 17:18:14 +00:00
Simon Pilgrim	3f24ff6130	[X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class llvm-svn: 309701	2017-08-01 16:47:48 +00:00
Simon Pilgrim	810677eba2	[X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules llvm-svn: 309699	2017-08-01 16:18:25 +00:00
Craig Topper	2462a713ae	[AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32. We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though. This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need. This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch. Differential Revision: https://reviews.llvm.org/D35978 llvm-svn: 309693	2017-08-01 15:31:24 +00:00
Strahinja Petrovic	a2b4748bdc	[Mips] Fix for BBIT octeon instruction This patch enables control flow optimization for variations of BBIT instruction. In this case optimization removes unnecessary branch after BBIT instruction. Differential Revision: https://reviews.llvm.org/D35359 llvm-svn: 309679	2017-08-01 13:42:45 +00:00
Krzysztof Parzyszek	91ff5c6d47	[Hexagon] Convert HVX vector constants of i1 to i8 Certain operations require vector of i1 values. However, for Hexagon architecture compatibility, they need to be represented as vector of i8. Patch by Suyog Sarda. llvm-svn: 309677	2017-08-01 13:12:53 +00:00
Tom Stellard	9d8337d857	AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35916 llvm-svn: 309675	2017-08-01 12:38:33 +00:00
Craig Topper	410d252f5b	[AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. llvm-svn: 309632	2017-07-31 22:07:29 +00:00
Konstantin Belochapka	b77d0a5cf1	[X86][MMX] Added custom lowering action for MMX SELECT (PR30418) Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch Differential Revision: https://reviews.llvm.org/D34661 llvm-svn: 309614	2017-07-31 20:11:49 +00:00
Craig Topper	cb0e74975a	[AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead. These were taking priority over the aligned load instructions since there is no vmovdqa8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use. But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output as avx/avx2 in some cases. I also generally dislike patterns rooted in a bitcast which these were. Differential Revision: https://reviews.llvm.org/D35977 llvm-svn: 309589	2017-07-31 17:35:44 +00:00
Simon Pilgrim	7b89ab5887	Strip trailing whitespace. NFCI. llvm-svn: 309584	2017-07-31 17:09:27 +00:00
Simon Pilgrim	77bdbc197e	Fix typo in comment. llvm-svn: 309583	2017-07-31 17:06:55 +00:00
Aditya Nandakumar	02c602e18c	[GISel]: Support Widening G_ICMP's destination operand. Updated AArch64 to widen destination to s32. https://reviews.llvm.org/D35737 Reviewed by Tim llvm-svn: 309579	2017-07-31 17:00:16 +00:00
Amaury Sechet	4358c5217d	Do not recombine FMA when that is not needed. Summary: As per title. This creates useless recombines. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33848 llvm-svn: 309578	2017-07-31 16:56:25 +00:00
Florian Hahn	a3ad61d874	Exclude more unused functions from release build. llvm-svn: 309576	2017-07-31 16:44:28 +00:00
Alexey Bataev	3e9b3eb91d	[Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC. llvm-svn: 309563	2017-07-31 14:19:32 +00:00
Florian Hahn	6b3216aad8	Guard print() functions only used by dump() functions. Summary: Since r293359, most dump() functions are only defined when `!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions only used by dump() functions are now unused in release builds, generating lots of warnings. This patch only defines some print() functions if they are used. Reviewers: MatzeB Reviewed By: MatzeB Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D35949 llvm-svn: 309553	2017-07-31 10:07:49 +00:00
Guy Blank	b169d56dc3	[X86][AVX512] Add masked MOVS[S|D] patterns Added patterns to recognize that an AND with 1 on the mask of a scalar masked move is not needed, since only the lower bit is relevant for the instruction. Differential Revision: https://reviews.llvm.org/D35897 llvm-svn: 309546	2017-07-31 08:26:14 +00:00
Hiroshi Inoue	5703fe37ab	[PowerPC] Change method names; NFC Changed method names based on the discussion in https://reviews.llvm.org/D34986: getInt64 -> selectI64Imm, getInt64Count -> selectI64ImmInstrCount. llvm-svn: 309541	2017-07-31 06:27:09 +00:00
Craig Topper	97e9fa7954	[X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved. We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority. llvm-svn: 309535	2017-07-31 05:55:54 +00:00
Coby Tayree	48d67cdbb4	[x86][inline-asm][ms-compat] legalize the use of "jc/jz short <op>" MS ignores the keyword "short" when used after a jc/jz instruction, LLVM ought to do the same. Test: D35893 Differential Revision: https://reviews.llvm.org/D35892 llvm-svn: 309509	2017-07-30 11:12:47 +00:00
Craig Topper	951f0ca104	[X86] Add addsub intrinsics to the intrinsic lowering table so we have a single set of isel patterns. llvm-svn: 309502	2017-07-30 06:02:59 +00:00
Florian Hahn	f63a5e91db	[AArch64] Tie source and destination operands for AESMC/AESIMC. Summary: Most CPUs implementing AES fusion require instruction pairs of the form "AESE Vn, _; AESMC Vn, Vn" and "AESD Vn, _; AESIMC Vn, Vn". The constraint is added to AES(I)MC instructions which use the result of an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which constrain source and destination registers to be the same. A nice side effect of this change is that now all possible pairs are scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll test case. I had to update aes_load_store. The version I added initially was very reduced and with the new constraint, AESE/AESMC could not be scheduled back-to-back. I updated the test to be more realistic and still expose the same scheduling problem as the initial test case. Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga Reviewed By: t.p.northover, evandro Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35299 llvm-svn: 309495	2017-07-29 20:35:28 +00:00
Florian Hahn	2f86e3d494	[AArch64] Use 8 bytes as preferred function alignment on Cortex-A53. Summary: This change gives a 0.25% speedup on execution time, a 0.82% improvement in benchmark scores and a 0.20% increase in binary size on a Cortex-A53. These numbers are the geomean results on a wide range of benchmarks from the test-suite and a range of proprietary suites. Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin Reviewed By: rengolin Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35568 llvm-svn: 309494	2017-07-29 20:04:54 +00:00
Simon Pilgrim	718cb0ea62	[SelectionDAG][X86] CombineBT - more aggressively determine demanded bits This patch is in 2 parts: 1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT. 2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match. Differential Revision: https://reviews.llvm.org/D35896 llvm-svn: 309486	2017-07-29 14:50:25 +00:00
Tom Stellard	503fd446ad	AMDGPU: Remove deadcode from AMDGPUInstPrinter Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D36034 llvm-svn: 309477	2017-07-29 03:56:53 +00:00
Tom Stellard	5c50cdf0e8	AMDGPU: Move INDIRECT_BASE_ADDR definition out of common files Summary: This is only used by R600. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35926 llvm-svn: 309476	2017-07-29 03:44:07 +00:00
Jessica Paquette	d87f54493d	[MachineOutliner] NFC: Change IsTailCall to a call class + frame class This commit: (1) removes IsTailCall and replaces it with a target-defined unsigned; (2) refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall; (3) adds a call class + frame class classification to OutlinedFunction and Candidate respectively. This accomplishes a couple of things. Firstly, we don't need the notion of tail call in the general outlining algorithm. Secondly, we now can have different "outlining classes" for each candidate within a set of candidates. This will make it easy to add new ways to outline sequences for certain targets and dynamically choose an appropriate cost model for a sequence depending on the context that that sequence lives in. Ultimately, this should get us closer to being able to do something like, say, avoid saving the link register when outlining AArch64 instructions. llvm-svn: 309475	2017-07-29 02:55:46 +00:00
Matt Arsenault	9608a2891d	AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472	2017-07-29 01:26:21 +00:00
Matt Arsenault	dc8f5cc39c	AMDGPU: Teach isLegalAddressingMode about global_* instructions Also refine the flat check to respect flat-for-global feature, and constant fallback should check global handling, not specifically MUBUF. llvm-svn: 309471	2017-07-29 01:12:31 +00:00


43409 Commits