llvm-project

Commit Graph

Author	SHA1	Message	Date
Francis Visoiu Mistrih	90aba024c5	[MC] Fallback on DWARF when generating compact unwind on AArch64 Instead of asserting when using the def_cfa directive with a register different from fp, fallback on DWARF. Easily triggered with: .cfi_def_cfa x1, 32; rdar://40249694 Differential Revision: https://reviews.llvm.org/D47593 llvm-svn: 333667	2018-05-31 16:33:26 +00:00
Roman Tereshin	f34d7ecc15	[GlobalISel][Mips] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for Mips Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333665	2018-05-31 16:16:49 +00:00
Roman Tereshin	76c29c68dc	[GlobalISel][AMDGPU] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for AMDGPU Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333664	2018-05-31 16:16:48 +00:00
Roman Tereshin	667c7581ed	[GlobalISel][ARM] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call and fixing bugs exposed Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333663	2018-05-31 16:16:48 +00:00
Roman Tereshin	cc1a16fdf9	[GlobalISel][X86] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call and fixing bugs exposed Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333662	2018-05-31 16:16:47 +00:00
Simon Pilgrim	ff0623cd29	[X86][SSE] Recognise splat rotations and expand back to shift ops. Noticed while fixing PR37426, for splat rotations (rotation by an uniform value) its better to just expand back to shift ops than performing as a general non-uniform rotation. llvm-svn: 333661	2018-05-31 15:47:17 +00:00
Simon Pilgrim	c34395d889	[X86][AVX] Add peekThroughEXTRACT_SUBVECTORs helper (NFCI) We often need this for AVX1 128-bit integer ops as they may have been split from a 256-bit source. llvm-svn: 333660	2018-05-31 15:15:49 +00:00
Clement Courbet	2e41c5a79c	[X86] Introduce WriteFLDC for x87 constant loads. Summary: {FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded. - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form InstRWs. - For SLM and BtVer2, I've guessed some values :( Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47585 llvm-svn: 333656	2018-05-31 14:22:01 +00:00
Simon Dardis	d9a453832d	[mips] Guard all short instructions correctly. Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47533 llvm-svn: 333645	2018-05-31 12:47:01 +00:00
Clement Courbet	b78ab5097d	[X86] Extract latency of fldz/fld1 in separate classes. Summary: - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form `InstRW`s. - For SLM and BtVer2, values are from Agner. This is split off from https://reviews.llvm.org/D47377 Reviewers: RKSimon, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47523 llvm-svn: 333642	2018-05-31 11:41:27 +00:00
Simon Pilgrim	346886bc0d	[X86][SSE] Add support for detecting SUB(SPLAT_BV, SPLAT) cases for shift-rotate patterns. This improves splat rotations (rotation by an uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426). llvm-svn: 333641	2018-05-31 11:25:16 +00:00
Luke Geeson	2e09995d42	[AArch64] Reverted rL333427 fixing Clang UnitTest Failure llvm-svn: 333634	2018-05-31 08:27:53 +00:00
Stanislav Mekhanoshin	d4b500cb08	[AMDGPU] Track occupancy in MFI Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629	2018-05-31 05:36:04 +00:00
Jan Vesely	f5016b79a6	AMDGPU/R600: Make sure functions are cacheline aligned v2: use "ensureAlignment" make functions cache line aligned Fixes GPU hangs since r333219: "AMDGPU: Split R600 AsmPrinter code into its own class" Differential Revision: https://reviews.llvm.org/D47516 llvm-svn: 333622	2018-05-31 04:08:08 +00:00
Roman Tereshin	5a65eb75c7	[GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...) Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333618	2018-05-31 01:56:05 +00:00
Tom Stellard	c7624317d7	AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 llvm-svn: 333605	2018-05-30 22:55:35 +00:00
Roman Tereshin	8f1753e994	[GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs This is to make it clear what kind of bugs the LegalizerInfo::verifier is able to catch and test its output Reviewers: aemerson, qcolombet Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D46338 llvm-svn: 333597	2018-05-30 22:10:04 +00:00
Simon Pilgrim	5e9f459c62	[X86][SSE] Pulled out splat detection helper from LowerScalarVariableShift (NFCI) Created the IsSplatValue helper from the splat detection code in LowerScalarVariableShift as a first NFC step towards improving support for splat rotations, which is an extension of PR37426. llvm-svn: 333580	2018-05-30 19:16:59 +00:00
Mark Searles	ed54ff1d51	[AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst' https://reviews.llvm.org/rL333556 caused a buildbot failure. See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(), The unused variable was for debugging purposes; removing that piece of code to fix the build. llvm-svn: 333559	2018-05-30 16:27:57 +00:00
Matt Arsenault	7b4826e6ce	AMDGPU: Use better alignment for kernarg lowering This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558	2018-05-30 16:17:51 +00:00
Mark Searles	1054541490	[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked if greater than 2 passes over a loop, which doesn't account for loop with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 llvm-svn: 333556	2018-05-30 15:47:45 +00:00
Gabor Buella	890e363e11	[X86] Lowering FMA intrinsics to native IR (LLVM part) Support for Clang lowering of fused intrinsics. This patch: 1. Removes bindings to clang fma intrinsics. 2. Introduces new LLVM unmasked intrinsics with rounding mode: int_x86_avx512_vfmadd_pd_512 int_x86_avx512_vfmadd_ps_512 int_x86_avx512_vfmaddsub_pd_512 int_x86_avx512_vfmaddsub_ps_512 supported with a new intrinsic type (INTR_TYPE_3OP_RM). 3. Introduces new x86 fmaddsub/fmsubadd folding. 4. Introduces new tests for code emitted by sequentions introduced in Clang part. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D47443 llvm-svn: 333554	2018-05-30 15:25:16 +00:00
Amaury Sechet	f47d9f30b0	[ARM] Remove code handling ADDC/ADDE/SUBC/SUBE Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY . Reviewers: rogfer01, efriedma, rengolin, javed.absar Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D47413 llvm-svn: 333544	2018-05-30 13:45:43 +00:00
Krzysztof Parzyszek	8987174627	[Hexagon] Use vector align-left when shift amount fits in 3 bits This saves an instruction because for align-right the shift amount would need to be put in a register first. llvm-svn: 333543	2018-05-30 13:45:34 +00:00
Simon Dardis	f990bf1cd2	[mips] Correct the definition of CTC2/CFC2 llvm-svn: 333542	2018-05-30 13:21:13 +00:00
Simon Dardis	a3aa926c09	[mips] Correct the predicates of microMIPS compact branch instructions llvm-svn: 333541	2018-05-30 13:16:17 +00:00
Simon Dardis	f909058ad4	[mips] Sink PredicateControl further down the class hierarchy. Previously PredicateControl in some cases was a member of <X>Inst classes for some X (DSP, EVA) or was in more irregular place in the hierarchry for any given instruction. This patch moves PredicateControl down to the root so that it is consistently available. Then correct the base class of microMIPS instructions as using EncodingPredicates instead of the general Predicates field of Instruction. Reviewers: smaksimovic, abeserminji, atanasyan Differential Revision: https://reviews.llvm.org/D47526 llvm-svn: 333536	2018-05-30 12:40:53 +00:00
Simon Dardis	39710e3555	[mips] Correct the predicates of arithmetic and logic instructions. As part of this effort, duplicate and correct the predicates of some aliases. Also disable code generation of some short form instructions for FastISel, as it would otherwise reject them. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D47075 llvm-svn: 333530	2018-05-30 11:33:35 +00:00
Tim Northover	d8949f5002	AArch64: print correct annotation for ADRP addresses. The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the actual PC-offset that will be calculated. llvm-svn: 333525	2018-05-30 09:54:59 +00:00
Sander de Smalen	bdf09fe7a2	[AArch64][AsmParser] Fix segfault on illegal fpimm. Floating point immediate combining a negative sign and a hexadecimal number, e.g. #-0x0 caused the compiler to crash. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: javed.absar Differential Revision: https://reviews.llvm.org/D47483 llvm-svn: 333524	2018-05-30 09:54:19 +00:00
Daniel Cederman	248dae81dc	[Sparc] Treat %fxx registers with value type Other as single precision They get type Other when used in the clobber list in inline assembly. This fixes tests fp128.ll and float.ll that failed after r333512. llvm-svn: 333523	2018-05-30 09:52:18 +00:00
Daniel Cederman	60e6ce4155	[Sparc] Select correct register class for FP register constraints Summary: The fX version of floating-point registers only supports single precision. We need to map the name to dX for doubles and qX for long doubles if we want getRegForInlineAsmConstraint() to be able to pick the correct register class. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47258 llvm-svn: 333512	2018-05-30 06:07:55 +00:00
Craig Topper	cc0741e59f	[X86] Add unmasked AVX512VNNI instrinsics. Use a select in IR instead. A future patch will remove the old masked intrinsics. llvm-svn: 333508	2018-05-30 05:25:59 +00:00
Shiva Chen	c3d0e89284	[RISCV] Support resolving fixup_riscv_call and add to MCFixupKindInfo table Resolving fixup_riscv_call by assembler when the linker relaxation diabled and the function and callsite within the same compile unit. And also adding static_assert after Infos array declaration to avoid missing any new fixup in MCFixupKindInfo in the future. Differential Revision: https://reviews.llvm.org/D47126 llvm-svn: 333487	2018-05-30 01:16:36 +00:00
Craig Topper	5989db0fb4	[X86] Remove some of the extractelts from the new MOVSS+FMA patterns. We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced. So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV. llvm-svn: 333473	2018-05-29 22:52:09 +00:00
Craig Topper	dbd371e931	[X86] Use VR128X instead of VR128 in EVEX instruction patterns. llvm-svn: 333464	2018-05-29 20:46:27 +00:00
Craig Topper	aba57bfebd	[X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction. The order should be controlled in the input pattern. llvm-svn: 333463	2018-05-29 20:46:26 +00:00
Craig Topper	5439b3d1e5	[X86] Fix a potential crash that occur after r333419. The code could issue a truncate from a small type to larger type. We need to extend in that case instead. llvm-svn: 333460	2018-05-29 20:04:10 +00:00
Matt Arsenault	2e4d338d16	AMDGPU: Fix typo in option description llvm-svn: 333457	2018-05-29 19:35:46 +00:00
Matt Arsenault	1ea0402e82	AMDGPU: Round up kernel argument allocation size AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. llvm-svn: 333456	2018-05-29 19:35:00 +00:00
Sameer AbuAsal	97684419e8	[RISCV] Add peepholes for Global Address lowering patterns Summary: Base and offset are always separated when a GlobalAddress node is lowered (rL332641) as an optimization to reduce instruction count. However, this optimization is not profitable if the Global Address ends up being used in only instruction. This patch adds peephole optimizations that merge an offset of an address calculation into the LUI %%hi and ADD %lo of the lowering sequence. The peephole handles three patterns: 1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset ---> ADDI (LUI %hi(global + offset)) %lo(global + offset). This generates: lui a0, hi (global + offset) add a0, a0, lo (global + offset) Instead of lui a0, hi (global) addi a0, hi (global) addi a0, offset This pattern is for cases when the offset is small enough to fit in the immediate filed of ADDI (less than 12 bits). 2) ADD ((ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset)) ---> offset = hi_offset << 12 ADDI (LUI %hi(global + offset)) %lo(global + offset) Which generates the ASM: lui a0, hi(global + offset) addi a0, lo(global + offset) Instead of: lui a0, hi(global) addi a0, lo(global) lui a1, (offset) add a0, a0, a1 This pattern is for cases when the offset doesn't fit in an immediate field of ADDI but the lower 12 bits are all zeros. 3) ADD ((ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset))) ---> offset = global + offhi20<<12 + offlo12 ADDI (LUI %hi(global + offset)) %lo(global + offset) Which generates the ASM: lui a1, %hi(global + offset) addi a1, %lo(global + offset) Instead of: lui a0, hi(global) addi a0, lo(global) lui a1, (offhi20) addi a1, (offlo12) add a0, a0, a1 This pattern is for cases when the offset doesn't fit in an immediate field of ADDI and both the lower 1 bits and high 20 bits are non zero. Reviewers: asb Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang llvm-svn: 333455	2018-05-29 19:34:54 +00:00
Konstantin Zhuravlyov	2ca6b1f2ba	AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as it is set by CP Differential Revision: https://reviews.llvm.org/D47392 llvm-svn: 333451	2018-05-29 19:09:13 +00:00
Eli Friedman	63fead0f43	[ARM] Enable SETCCCARRY lowering for Thumb1. We've had Thumb1 support for ARMISD::SUBE for a while now, so this just works. Reduces codesize a bit for 64-bit integer comparisons. Differential Revision: https://reviews.llvm.org/D47387 llvm-svn: 333445	2018-05-29 18:17:16 +00:00
Matt Arsenault	ceafc55e5a	AMDGPU: Pass function directly instead of MachineFunction These functions just query the underlying IR function, so pass it directly. llvm-svn: 333442	2018-05-29 17:42:50 +00:00
Matt Arsenault	2fb9ccf770	AMDGPU: Add nuw to add off of kernarg ptr llvm-svn: 333441	2018-05-29 17:42:38 +00:00
Matt Arsenault	ab2b79cb97	DAG: Remove redundant version of getRegisterTypeForCallingConv There seems to be no real reason to have these separate copies. The existing implementations just copy each other for x86. For Mips there is a subtle difference, which is just a bug since it changes based on the context where which one was called. Dropping this version, all tests pass. If I try to merge them to match the removed version, a test fails. llvm-svn: 333440	2018-05-29 17:42:26 +00:00
Tom Stellard	57b9342c80	AMDGPU: Split R600 MCInst lowering into its own class Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47307 llvm-svn: 333439	2018-05-29 17:41:59 +00:00
Evandro Menezes	f8425340e4	[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change makes the inlining of `memset()` and `memcpy()` more aggressive when compiling for speed. The tuning remains the same when optimizing for size. Patch by: Sebastian Pop <s.pop@samsung.com> Evandro Menezes <e.menezes@samsung.com> Differential revision: https://reviews.llvm.org/D45098 llvm-svn: 333429	2018-05-29 15:58:50 +00:00
Simon Atanasyan	69301c9eb9	[mips] Process numeric register name in the .set assignment directive Now LLVM assembler cannot process the following code and generates an error. GNU tools support .set assignment directive with numeric register name. ``` .set r4, 4 test.s:1:11: error: invalid token in expression .set r4, $4 ^ ``` This patch teach assembler to handle such directives correctly. Unfortunately a numeric register name cannot be represented as an expression. That's why we have to maintain a separate `StringMap` in the `MipsAsmParser` to keep mapping between aliases names and register numbers. Differential revision: https://reviews.llvm.org/D47464 llvm-svn: 333428	2018-05-29 15:58:06 +00:00
Amara Emerson	d5a9e7bbc9	Revert "[AArch64] added FP16 vcvth intrinsic support" This reverts commit r333410 due to bot failures. llvm-svn: 333427	2018-05-29 15:34:22 +00:00
Sander de Smalen	8704b03c4d	[AArch64][SVE] Asm: Support for predicated LSL/LSR (vectors) Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47365 llvm-svn: 333422	2018-05-29 14:40:24 +00:00
Jonas Devlieghere	43dce3edbe	[CodeView] Add prefix to CodeView registers. Adds CVReg to CodeView register names to prevent a duplicate symbol with CR3 defined in termios.h, as suggested by Zachary on the mailing list. http://lists.llvm.org/pipermail/llvm-dev/2018-May/123372.html Differential revision: https://reviews.llvm.org/D47478 rdar://39863705 llvm-svn: 333421	2018-05-29 14:35:34 +00:00
Alexander Ivchenko	96062eaa8e	[X86] Scalar mask and scalar move optimizations 1. Introduction of mask scalar TableGen patterns. 2. Introduction of new scalar move TableGen patterns and refactoring of existing ones. 3. Folding of pattern created by introducing scalar masking in Clang header files. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D47012 llvm-svn: 333419	2018-05-29 14:27:11 +00:00
Lei Huang	716103f1cd	[PowerPC] Fix the incorrect iterator inside peephole Instruction selection can insert nodes into the underlying list after the root node so iterating will thereby miss it. We should NOT assume that, the root node is the last element in the DAG nodelist. Patch by: steven.zhang (Qing Shan Zhang) Differential Revision: https://reviews.llvm.org/D47437 llvm-svn: 333415	2018-05-29 13:38:56 +00:00
Sander de Smalen	26b9b2a8c3	[AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions. This patch addresses the following variants: - bitmask immediate, e.g. 'and z0.d, z0.d, #0x6'. - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'. - predicated data vectors, e.g. 'and z0.d, p0/m, z0.d, z1.d'. And also several aliases, such as: - ORN, alias of ORR. - EON, alias of EOR. - BIC, alias of AND (immediate variant) - MOV, alias of ORR (if unpredicated and source register operands are the same) Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47363 llvm-svn: 333414	2018-05-29 13:08:43 +00:00
Luke Geeson	16092ab3c5	[AArch64] added FP16 vcvth intrinsic support Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705 Reviewers: javed.absar, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: llvm-commits, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46311 llvm-svn: 333410	2018-05-29 11:40:33 +00:00
Simon Atanasyan	a1d69f9e53	[mips] Emit R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_LO16 / HI16 relocations Emit R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_LO16 and R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_HI16 chains of relocations for %lo(%neg(%gp_rel())) and %hi(%neg(%gp_rel())) expressions in case of microMIPS. Differential Revision: http://reviews.llvm.org/D47220 llvm-svn: 333409	2018-05-29 11:33:54 +00:00
Sander de Smalen	98686c6b15	[AArch64][SVE] Asm: Support for ADD (immediate) instructions. This patch adds addsub_imm8_opt_lsl_(i8\|i16\|i32\|i64) operands that are unsigned values in the range 0 to 255. For element widths of 16 bits or higher it may also be a signed multiple of 256 in the range 0 to 65280. Note: This also does some refactoring to reuse convenience function getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be accepted to be mapped to SUB #4096. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47310 llvm-svn: 333408	2018-05-29 10:39:49 +00:00
Simon Atanasyan	6be87bce29	[mips] Emit R_MICROMIPS_HIGHER / R_MICROMIPS_HIGHEST relocations Emit R_MICROMIPS_HIGHER / R_MICROMIPS_HIGHEST relocations for %higher() and %highest() expressions in case of microMIPS. These relocations do exactly the same things as R_MIPS_HIGHER / R_MIPS_HIGHEST, but for consistency it's better to write microMIPS variants. Differential Revision: http://reviews.llvm.org/D47219 llvm-svn: 333407	2018-05-29 10:27:44 +00:00
Simon Dardis	0fad58cbaf	[mips] Correct the predicates for a number of instructions. Previously, their listed predicates were overridden at the scope level. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D46947 llvm-svn: 333405	2018-05-29 09:56:19 +00:00
Simon Atanasyan	b2d61fa3d8	[mips] Cleanup the code to reduce diff with the upcoming patches. NFC llvm-svn: 333404	2018-05-29 09:51:33 +00:00
Simon Atanasyan	d408ec4cfa	[mips] Escape else-after-return. NFC llvm-svn: 333403	2018-05-29 09:51:28 +00:00
Simon Atanasyan	3535cb1130	[mips] Stop parsing a .set assignment if the first argument is not an identifier Before this fix the following code triggers two error messages. The second one is at least useless: test.s:1:9: error: expected identifier after .set .set 123, $a0 ^ test-set.s:1:9: error: unexpected token, expected comma .set 123, $a0 ^ llvm-svn: 333402	2018-05-29 09:51:22 +00:00
Tim Renouf	fa213f797b	[AMDGPU] Fixed build warning Summary: V2: Use cast instead of extra if. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47426 Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a llvm-svn: 333397	2018-05-29 08:15:37 +00:00
Craig Topper	a34f8731c7	[X86] Disable a DAG combine to allow packed AVX512DQ instructions to be consistently used for i64->float/double conversions. Summary: We already get this right if the i64 didn't come from a load. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47439 llvm-svn: 333393	2018-05-29 06:22:45 +00:00
Clement Courbet	07c9ec6f2e	[X86][Sched] Add InstRW for CLC on Intel after SNB. Summary: After SNB, Intel CPUs can rename CF independently of other EFLAGS, so the renamer can zero it for free. Note that STC still consumes resources. To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC` On SNB: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: sandybridge llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.0014, debug_string: SBPort0 } - { key: '4', value: 0.0013, debug_string: SBPort1 } - { key: '5', value: 0.0003, debug_string: SBPort4 } - { key: '6', value: 0.0029, debug_string: SBPort5 } - { key: '10', value: 0.0003, debug_string: SBPort23 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` On HSW: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.001, debug_string: HWPort0 } - { key: '4', value: 0.0009, debug_string: HWPort1 } - { key: '5', value: 0.0004, debug_string: HWPort2 } - { key: '6', value: 0.0006, debug_string: HWPort3 } - { key: '7', value: 0.0002, debug_string: HWPort4 } - { key: '8', value: 0.0012, debug_string: HWPort5 } - { key: '9', value: 0.0022, debug_string: HWPort6 } - { key: '10', value: 0.0001, debug_string: HWPort7 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` Reviewers: craig.topper, RKSimon Subscribers: gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D47362 llvm-svn: 333392	2018-05-29 06:19:39 +00:00
Craig Topper	21aeddc3dc	[X86] Remove masked vpermi2var/vpermt2var intrinsics and autoupgrade. We have unmasked intrinsics now and wrap them with a select. This is a net reduction of 36 intrinsics from before the unmasked intrinsics were added. llvm-svn: 333388	2018-05-29 05:22:05 +00:00
Craig Topper	2adc7d956c	[X86] Add unmasked vermi2var intrinsics so we can use explicit select instructions for masking in clang. This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch. llvm-svn: 333386	2018-05-29 03:26:30 +00:00
Craig Topper	dcfcfdb0d1	[X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode. These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node. This is a step towards reducing the number of intrinsics needed. A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial. llvm-svn: 333383	2018-05-28 19:33:11 +00:00
Craig Topper	6b545182fb	[X86] Fix typo in comment. NFC llvm-svn: 333382	2018-05-28 19:33:06 +00:00
Farhana Aleen	eacb1020aa	[AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default. Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found to be resolved by some other fixes. Author: FarhanaAleen llvm-svn: 333380	2018-05-28 18:15:11 +00:00
Lei Huang	651be44913	[Power9]Legalize and emit code for HW/Byte vector extract and convert to QP Implemente patterns to extract HWord and Byte vector elements and convert to quad-precision. Differential Revision: https://reviews.llvm.org/D46774 llvm-svn: 333377	2018-05-28 16:43:29 +00:00
Zaara Syeda	6f3df02fdc	[PowerPC] Set isAsmParserOnly=1 for X-form TLS loads/stores The X-form TLS load/store instructions added for optimizing the initial-exec sequence in https://reviews.llvm.org/rL327635 fail to assemble. llvm-mc fails with the error: invalid operand for instruction. This patch adds these instructions into a block with isAsmParserOnly, similar to how ADD8TLS_ is currently handled. Differential Revision: https://reviews.llvm.org/D47382 llvm-svn: 333374	2018-05-28 15:27:58 +00:00
Daniel Cederman	2e7fe0edaf	[Sparc] Add .uahalf and .uaword directives Summary: Adding these makes it easier to assemble the output from GCC which generates a lot of .uahalf and .uaword directives. GAS treats .uahalf and .half the same unless the --enforce-aligned-data flag is used. I could not find a similar flag for LLVM so it seems that .half does not have any alignment requirement and is treated the same as .uahalf should be. If that would change later on then the tests in sparc-directives.s would fail due to bad alignment. Reviewers: jyknight, asb Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47319 llvm-svn: 333372	2018-05-28 12:42:55 +00:00
Craig Topper	26bc84860a	[X86] Stop forcing X86VPermi2X node index operand to match destination type to make masking pattern matching easier. Add extra patterns with bitcasts instead. This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future. This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed. llvm-svn: 333365	2018-05-28 05:37:25 +00:00
Tim Renouf	364edcd2e5	[AMDGPU] Fixed WWM bug in block otherwise entirely in WQM Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47027 Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6 llvm-svn: 333362	2018-05-27 17:26:11 +00:00
Simon Pilgrim	79dae5ba2a	[X86] Don't hardcode scheduler class Also fixes BEXTRI instruction to use WritBEXTR class, which was missed when the class was added. llvm-svn: 333360	2018-05-27 14:54:18 +00:00
David Green	aee7ad0cde	Revert 333358 as it's failing on some builders. I'm guessing the tests reply on the ARM backend being built. llvm-svn: 333359	2018-05-27 12:54:33 +00:00
David Green	3034281b43	[UnrollAndJam] Add a new Unroll and Jam pass This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now-jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 333358	2018-05-27 12:11:21 +00:00
Eric Christopher	958a1f8d87	Remove boolean argument from isSuitableFromBSS. The argument was used as an additional negative condition and can be expressed in the if conditional without needing to pass it down. Update bss commentary around main use. llvm-svn: 333357	2018-05-27 11:39:34 +00:00
Eric Christopher	ed169ec424	Cleanups for getKindForGlobal: - Clarify block comment - Make Function/GlobalVariable split more explicit. - Move locals closer to uses. llvm-svn: 333356	2018-05-27 11:23:20 +00:00
Craig Topper	51eddb8749	[X86] Remove masking from avx512ifma intrinsics. Use a select instead. This allows us to avoid having mask and maskz variant. Reducing from 12 intrinsics to 6. llvm-svn: 333346	2018-05-26 18:55:19 +00:00
George Burgess IV	319be3a4e6	Replace AA's uses of uint64_t with LocationSize; NFC. The uint64_ts that we pass around AA to represent MemoryLocation sizes are logically an Optional<uint64_t>. In D44748, we want to add an extra 'imprecise' bit to this Optional<uint64_t> to represent whether a given MemoryLocation size is an upper-bound or an exact size. For more context on why, please see D44748. That patch is quite large, but reviewers seem to be OK with the approach. In D45581 (my first attempt to split 'noise' out of D44748), reames asked that I land a precursor that is solely replacing uint64_t with LocationSize, which starts out as `using LocationSize = uint64_t;`. He also gave me the OK to submit this rename without further review. llvm-svn: 333314	2018-05-25 21:16:58 +00:00
Mark Searles	32efedcff3	[AMDGPU][Waitcnt] Remove obsolete waitcnt option With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it. Differential Revision: https://reviews.llvm.org/D47378 llvm-svn: 333303	2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin	7fc1cee051	[AMDGPU] Fixed test failure with AMDGPUPerfHint We shall not keep iterator to a map while map is modified, this leads to a broken map. llvm-svn: 333298	2018-05-25 18:46:58 +00:00
Reid Kleckner	cb48efd585	Fix -Winconsistent-missing-overrides in AMDGPU code llvm-svn: 333291	2018-05-25 17:46:24 +00:00
Stanislav Mekhanoshin	1c538423dc	[AMDGPU] Add perf hints to functions This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289	2018-05-25 17:25:12 +00:00
Simon Dardis	6a31992383	[mips] Fix the definitions of lwp, swp Rather than using a regpair operand of these instructions, use two seperate operands and a custom converter to handle the implicit second register operand. Additionally, remove the microMIPS32R6 definition as its redundant. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D47255 llvm-svn: 333288	2018-05-25 16:15:48 +00:00
Krzysztof Parzyszek	95b073525b	[Hexagon] Fix packing source vectors in shufflevector selection When the shuffle mask selected a subvector of the second input vector, and aligning of the source was performed, the shuffle mask was updated incorrectly, resulting in an ICE further in the selection process. llvm-svn: 333279	2018-05-25 14:53:14 +00:00
Simon Pilgrim	0155bf0da9	[X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286) As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models...... llvm-svn: 333271	2018-05-25 12:18:11 +00:00
Sander de Smalen	6e2a5b4cf0	Fix ubsan errors introduced by r333263 re. left-shifting negative values. llvm-svn: 333270	2018-05-25 11:41:04 +00:00
Sander de Smalen	62770795a5	[AArch64][SVE] Asm: Support for DUP (immediate) instructions. Unpredicated copy of optionally-shifted immediate to SVE vector, along with MOV-aliases. This patch contains parsing and printing support for cpy_imm8_opt_lsl_(i8\|i16\|i32\|i64). This operand allows a signed value in the range -128 to +127. For element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512. For element-width of 8 bits a range of -128 to 255 is accepted, since a copy of a byte can be considered either signed/unsigned. Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift() and moves the behaviour of trying to shift a plain immediate by an allowed shift-value to its addImmWithOptionalShiftOperands() method, so that the parsing itself is generic and allows immediates from multiple shifted operands. This is done because an immediate can be divisible by both shifted operands. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47309 llvm-svn: 333263	2018-05-25 09:47:52 +00:00
Jonas Paulsson	307e782cbc	[SystemZ] Bugfix in combineSTORE(). Remember to check if store is truncating before calling combineTruncateExtract(). Review: Ulrich Weigand llvm-svn: 333262	2018-05-25 09:01:23 +00:00
Tim Renouf	ad8b7c1190	[AMDGPU] Fixed incorrect break from loop Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c llvm-svn: 333258	2018-05-25 07:55:04 +00:00
Gabor Buella	d2f1ab1b10	[x86] invpcid LLVM intrinsic Re-add the feature flag for invpcid, which was removed in r294561. Add an intrinsic, which always uses a 32 bit integer as first argument, while the instruction actually uses a 64 bit register in 64 bit mode for the INVPCID_TYPE argument. Reviewers: craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47141 llvm-svn: 333255	2018-05-25 06:32:05 +00:00
Tom Stellard	79fffe3515	AMDGPU: Remove AMDGPUMCInstLower.h Summary: The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp, so we don't need a header file. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47264 llvm-svn: 333254	2018-05-25 04:57:02 +00:00
Tom Stellard	c501501055	AMDGPU: Split R600 AsmPrinter code into its own class Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47245 llvm-svn: 333219	2018-05-24 20:02:01 +00:00
Eli Friedman	9e177882aa	[AArch64] Improve orr+movk sequences for MOVi64imm. The existing code has three different ways to try to lower a 64-bit immediate to the sequence ORR+MOVK. The result is messy: it misses some possible sequences, and the order of the checks means we sometimes emit two MOVKs when we only need one. Instead, just use a simple loop to try all possible two-instruction ORR+MOVK sequences. Differential Revision: https://reviews.llvm.org/D47176 llvm-svn: 333218	2018-05-24 19:38:23 +00:00
Geoff Berry	98150e3a62	[AArch64] Take advantage of variable shift/rotate amount implicit mod operation. Summary: Optimize code generated for variable shifts/rotates by taking advantage of the implicit and/mod done on the variable shift amount register. Resolves bug 27582 and bug 37421. Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46844 llvm-svn: 333214	2018-05-24 18:29:42 +00:00
Simon Pilgrim	b8c7c9c369	[X86][SSE] Pull out (AND (XOR X, -1), Y) matching into a helper function. NFC. llvm-svn: 333201	2018-05-24 16:16:42 +00:00
Simon Pilgrim	8bd73573c3	Fix unused variable warnings. NFCI. llvm-svn: 333195	2018-05-24 15:34:50 +00:00
Simon Pilgrim	0c72316a21	[X86][SSE] Pull out OR(AND(~MASK,X),AND(MASK,Y)) matching into a helper function. NFC. First stage towards matching more variants of the bitselect pattern for combineLogicBlendIntoPBLENDV (PR37549) llvm-svn: 333191	2018-05-24 15:12:48 +00:00
Simon Pilgrim	a90c211820	[X86][BtVer2] Added Jaguar cpu cycle counter to permit llvm-exegesis latency testing Ideally we'd be able to test a CPU by using __builtin_readcyclecounter()/RDTSC instead (PR37193) if a model/cycle-counter is not specified. NOTE: Jaguar PMCs don't give good coverage of resource pipes specified in the model (at the macro-vs-micro-op levels) but we should be able to cover at least a few resources. llvm-svn: 333190	2018-05-24 14:54:32 +00:00
Simon Atanasyan	f6b0c93fb3	[mips] Remove duplicated code from the expandLoadInst. NFC llvm-svn: 333164	2018-05-24 07:36:18 +00:00
Simon Atanasyan	a188267f0a	[mips] Remove redundant argument from expandLoadInst/expandStoreInst. NFC llvm-svn: 333163	2018-05-24 07:36:11 +00:00
Simon Atanasyan	be8a42efe2	[mips] Add precondition asserts to the expandLoadInst/expandStoreInst. NFC llvm-svn: 333162	2018-05-24 07:36:06 +00:00
Simon Atanasyan	478220f1fc	[mips] Cleanup the code a bit. NFC llvm-svn: 333161	2018-05-24 07:36:00 +00:00
Shiva Chen	43bfe84451	[RISCV] Support linker relax function call from auipc and jalr to jal To do this: 1. Add fixup_riscv_relax fixup types which eventually will transfer to R_RISCV_RELAX relocation types. 2. Insert R_RISCV_RELAX relocation types to auipc function call expression when linker relaxation enabled. Differential Revision: https://reviews.llvm.org/D44886 llvm-svn: 333158	2018-05-24 06:21:23 +00:00
Tom Stellard	1b95fed6f7	AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 llvm-svn: 333153	2018-05-24 05:28:34 +00:00
Lei Huang	f4ec67822f	[PowerPC] Remove the match pattern in the definition of LXSDX/STXSDX The match pattern in the definition of LXSDX is xoaddr, so the Pseudo instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post RA based on the register pressure. To avoid ambiguity, we need to remove the select pattern for LXSDX, same as what was done for LXSD. STXSDX also have the same issue. Patch by Qing Shan Zhang (steven.zhang). Differential Revision: https://reviews.llvm.org/D47178 llvm-svn: 333150	2018-05-24 03:20:28 +00:00
Mandeep Singh Grang	ddcb95664e	[RISCV] Lower the tail pseudoinstruction This patch lowers the tail pseudoinstruction. This has been modeled after ARM's tail call opt. llvm-svn: 333137	2018-05-23 22:44:08 +00:00
Sameer AbuAsal	eadce02741	[RISCV] Set CostPerUse for registers Summary: Set CostPerUse higher for registers that are not used in the compressed instruction set. This will influence the greedy register allocator to reduce the use of registers that can't be encoded in 16 bit instructions. This affects register allocation even when compressed instruction isn't targeted, we see no major negative codegen impact. Reviewers: asb Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang Differential Revision: https://reviews.llvm.org/D47039 llvm-svn: 333132	2018-05-23 21:34:30 +00:00
Lei Huang	8b0da65bfb	[Power9]Legalize and emit code for W vector extract and convert to QP Implemente patterns to extract [Un]signed Word vector element and convert to quad-precision. Differential Revision: https://reviews.llvm.org/D46536 llvm-svn: 333115	2018-05-23 19:31:54 +00:00
Lei Huang	8990168a45	[Power9]Legalize and emit code for DW vector extract and convert to QP Implemente patterns to extract [Un]signed DWord vector element and convert to quad-precision. Differential Revision: https://reviews.llvm.org/D46333 llvm-svn: 333112	2018-05-23 18:36:51 +00:00
Chad Rosier	3f66363139	[CodeGen][AArch64] Use RegUnits to track register aliases. (NFC) Use RegUnits to track register aliases in AArch64RedundantCopyElimination. Differential Revision: https://reviews.llvm.org/D47269 llvm-svn: 333107	2018-05-23 17:49:38 +00:00
Petar Jovanovic	7d37bb42a1	Silence warnings introduced with r333093 r333093 introduced several warnings (-Wlogical-not-parentheses, -Wbool-compare). Adding parentheses in MipsSEInstrInfo::isCopyInstr() to silence it. llvm-svn: 333097	2018-05-23 16:27:51 +00:00
Petar Jovanovic	c051000b83	[X86][MIPS][ARM] New machine instruction property 'isMoveReg' This property is needed in order to follow values movement between registers. This property is used in TII to implement method that returns true if simple copy like instruction is recognized, along with source and destination machine operands. Patch by Nikola Prica. Differential Revision: https://reviews.llvm.org/D45204 llvm-svn: 333093	2018-05-23 15:28:28 +00:00
Nicola Zaghen	03d0b91f43	Remove DEBUG macro. Now that the LLVM_DEBUG() macro landed on the various sub-projects the DEBUG macro can be removed. Also change the new uses of DEBUG to LLVM_DEBUG. Differential Revision: https://reviews.llvm.org/D46952 llvm-svn: 333091	2018-05-23 15:09:29 +00:00
Alex Bradbury	257d5b5639	[RISCV] Add symbol diff relocation support for RISC-V For RISC-V it is desirable to have relaxation happen in the linker once addresses are known, and as such the size between two instructions/byte sequences in a section could change. For most assembler expressions, this is fine, as the absolute address results in the expression being converted to a fixup, and finally relocations. However, for expressions such as .quad .L2-.L1, the assembler folds this down to a constant once fragments are laid out, under the assumption that the difference can no longer change, although in the case of linker relaxation the differences can change at link time, so the constant is incorrect. One place where this commonly appears is in debug information, where the size of a function expression is in a form similar to the above. This patch extends the assembler to allow an AsmBackend to declare that it does not want the assembler to fold down this expression, and instead generate a pair of relocations that allow the linker to carry out the calculation. In this case, the expression is not folded, but when it comes to emitting a fixup, the generic FK_Data_* fixups are converted into a pair, one for the addition half, one for the subtraction, and this is passed to the relocation generating methods as usual. I have named these FK_Data_Add_* and FK_Data_Sub_* to indicate which half these are for. For RISC-V, which supports this via e.g. the R_RISCV_ADD64, R_RISCV_SUB64 pair of relocations, these are also set to always emit relocations relative to local symbols rather than section offsets. This is to deal with the fact that if relocations were calculated on e.g. .text+8 and .text+4, the result 12 would be stored rather than 4 as both addends are added in the linker. Differential Revision: https://reviews.llvm.org/D45181 Patch by Simon Cook. llvm-svn: 333079	2018-05-23 12:36:18 +00:00
Alex Bradbury	3fa69dd055	[Sparc] Use addAliasForDirective to support data directives The Sparc asm parser currently has custom parsing logic for .half, .word, .nword and .xword. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. https://reviews.llvm.org/D47003 llvm-svn: 333078	2018-05-23 11:20:28 +00:00
Alex Bradbury	0a59f18951	[AArch64] Use addAliasForDirective to support data directives The AArch64 asm parser currently has custom parsing logic for .hword, .word, and .xword. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. Differential Revision: https://reviews.llvm.org/D47000 llvm-svn: 333077	2018-05-23 11:17:20 +00:00
Alex Bradbury	1c010d0fa4	[RISCV] Correctly report sizes for builtin fixups This is a different approach to fixing the problem described in D46746. RISCVAsmBackend currently depends on the getSize helper function returning the number of bytes a fixup may change (note: some other backends have a similar helper named getFixupNumKindBytes). As noted in that review, this doesn't return the correct size for FK_Data_1, FK_Data_2, or FK_Data_8 meaning that too few bytes will be written in the case of FK_Data_8, and there's the potential of writing outside the Data array for the smaller fixups. D46746 extends getSize to recognise some of the builtin fixup types. Rather than having a function that needs to be kept up to date as new builtin or target-specific fixups are added, We can calculate an appropriate bound on the number of bytes that might be touched using Info.TargetSize and Info.TargetOffset. Differential Revision: https://reviews.llvm.org/D46965 llvm-svn: 333076	2018-05-23 10:53:56 +00:00
Daniel Cederman	6356571ec0	[Sparc] Add mnemonic aliases for flush, stb, stba, sth, and stha Reviewers: jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47140 llvm-svn: 333068	2018-05-23 08:26:49 +00:00
Roman Tereshin	e79d656c33	[GlobalISel][ARM] Adding HPR and QPR regclasses to FPRB regbank Also bringing ARMRegisterBankInfo::getRegBankFromRegClass implementation up to speed with the *.td-definition. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D43982 llvm-svn: 333056	2018-05-23 02:59:31 +00:00
Matt Arsenault	606bc315d6	AMDGPU: Fix v2f16 fneg/fabs pattern The integer operation convertion for some reason only happens if the source is a bitcast from an integer, which happens to always be the situation when the result is loaded. Add an additional pattern for when the source operation is really an FP operation. llvm-svn: 333019	2018-05-22 20:13:34 +00:00
Eli Friedman	785acce51d	Delete unused variable from r333015. (The assertion suppressed the unused variable warning on Release+Asserts builds, so I didn't notice.) llvm-svn: 333018	2018-05-22 19:38:07 +00:00
Tom Stellard	b12f4dec08	AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering Summary: This is always false for R600. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47180 llvm-svn: 333016	2018-05-22 19:37:55 +00:00
Eli Friedman	042dc9e092	[MachineOutliner] Add "thunk" outlining for AArch64. When we're outlining a sequence that ends in a call, we can save up to three instructions in the outlined function by turning the call into a tail-call. I refer to this as thunk outlining because the resulting outlined function looks like a thunk; suggestions welcome for a better name. In addition to making the outlined function shorter, thunk outlining allows outlining calls which would otherwise be illegal to outline: we don't need to save/restore LR, so we don't need to prove anything about the stack access patterns of the callee. To make this work effectively, I also added MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner code; this allows treating an arbitrary instruction as a terminator in the suffix tree. Differential Revision: https://reviews.llvm.org/D47173 llvm-svn: 333015	2018-05-22 19:11:06 +00:00
Krzysztof Parzyszek	840b02bccf	[Hexagon] Add patterns for accumulating HVX compares llvm-svn: 333009	2018-05-22 18:27:02 +00:00
Aleksandar Beserminji	a5f755186a	[mips] Merge MipsLongBranch and MipsHazardSchedule passes MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it potentially breaks some jumps, so they have to be expanded to long branches. When some branch is expanded to long branch, it potentially creates a hazard situation, which should be fixed by adding nops. New pass is called MipsBranchExpansion, it combines these two passes, and runs them alternately until one of them reports no changes were made. Differential Revision: https://reviews.llvm.org/D46641 llvm-svn: 332977	2018-05-22 13:24:38 +00:00
Simon Dardis	437153bb80	[mips] Correct the predicates of the cache and pref instructions Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D46949 llvm-svn: 332970	2018-05-22 10:55:05 +00:00
Simon Pilgrim	4162d77744	[TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969	2018-05-22 10:40:09 +00:00
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Dan Gohman	b81848272d	[WebAssembly] Fix fast-isel lowering illegal argument and return types. For both argument and return types, promote illegal types like i24 to i32, and if a type can't be easily promoted, clear out the signature before bailing out, so avoid leaving it in a partially complete state. Fixes PR37546. llvm-svn: 332947	2018-05-22 04:58:36 +00:00
Tom Stellard	44b30b4537	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930	2018-05-22 02:03:23 +00:00
Craig Topper	358b094971	[X86] Remove 128/256-bit cvtdq2ps, cvtudq2ps, cvtqq2pd, cvtuqq2pd intrinsics. These can all be implemented with sitofp/uitofp instructions. llvm-svn: 332916	2018-05-21 23:15:00 +00:00
Roman Lebedev	7772de25d0	[DAGCombine][X86][AArch64] Masked merge unfolding: vector edition. Summary: This appears to be the last missing piece for the masked merge pattern handling in the backend. This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated. Differential Revision: https://reviews.llvm.org/D46528 llvm-svn: 332904	2018-05-21 21:41:02 +00:00
Reid Kleckner	537917d13c	[X86] Simplify some X86 address mode folding code, NFCI This code should really do exactly the same thing for 32-bit x86 and 64-bit small code models, with the exception that RIP-relative addressing can't use base and index registers. llvm-svn: 332893	2018-05-21 21:03:19 +00:00
Craig Topper	aad3aefaeb	[X86] Remove masking from vpternlog intrinsics. Use a select in IR instead. This removes 6 intrinsics since we no longer need separate mask and maskz intrinsics. Differential Revision: https://reviews.llvm.org/D47124 llvm-svn: 332890	2018-05-21 20:58:09 +00:00
Peter Collingbourne	9a45114b3c	CodeGen: Add a dwo output file argument to addPassesToEmitFile and hook it up to dwo output. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47089 llvm-svn: 332881	2018-05-21 20:16:41 +00:00
Peter Collingbourne	dcd7d6c331	MC: Separate creating a generic object writer from creating a target object writer. NFCI. With this we gain a little flexibility in how the generic object writer is created. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47045 llvm-svn: 332868	2018-05-21 19:20:29 +00:00
Peter Collingbourne	2602a0d40c	Fix ubsan bounds check failure. llvm-svn: 332866	2018-05-21 19:09:47 +00:00
Stanislav Mekhanoshin	9badad2051	[AMDGPU] Add divergence analysis as a dependency for ISel AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage but does not list it in pass dependencies which may lead to crash. Differential Revision: https://reviews.llvm.org/D47151 llvm-svn: 332862	2018-05-21 18:18:52 +00:00
Peter Collingbourne	571a3301ae	MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI. To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47035 llvm-svn: 332857	2018-05-21 17:57:19 +00:00
Tom Stellard	a91ce17b5f	AMDGPU/GlobalISel: Address post-commit review comments for r332379 MCRegisterInfo::getPhysRegSize() will be deprecated. llvm-svn: 332856	2018-05-21 17:49:31 +00:00
Andrea Di Biagio	b5757abefb	[X86][BtVer2] Add a 'J' prefix to the PRF/RCU defs. NFC This is to keep the Jaguar model's naming convention. Processor resources all have a 'J' prefix in the BtVer2 scheduling model. llvm-svn: 332851	2018-05-21 16:30:26 +00:00
Lama Saba	9417f7ff2e	[X86] - Avoid SFB pass - fix bug in updating the offsets for newly created copies Change-Id: I169ab6fe7e187727c0298c2a1e2868a683f3e688 llvm-svn: 332849	2018-05-21 16:23:16 +00:00
Simon Pilgrim	a8869e68a9	[X86][SSE] Add an assert to ensure that rotation amount is converted to a scale Missed in rL332832 where we added SSE v4i32 rotations for PR37426. llvm-svn: 332844	2018-05-21 15:17:23 +00:00
Tim Northover	4e3eec39fa	ARM: be conservative when asked load/store alignment of weird type. Chances are we'll be asked again after type legalization, but before that point it's better to claim misaligned accesses aren't allowed than to assert. llvm-svn: 332840	2018-05-21 12:43:54 +00:00
Aleksandar Beserminji	4977705727	[mips] Revert Merge MipsLongBranch and MipsHazardSchedule passes Revert this patch due buildbot failure. Differential Revision: https://reviews.llvm.org/D46641 llvm-svn: 332837	2018-05-21 11:38:52 +00:00
Eric Christopher	563d0b9cb9	Fix up a few grammar issues. llvm-svn: 332835	2018-05-21 10:27:36 +00:00
Aleksandar Beserminji	de7be5e46f	[mips] Merge MipsLongBranch and MipsHazardSchedule passes MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it potentially breaks some jumps, so they have to be expanded to long branches. When some branch is expanded to long branch, it potentially creates a hazard situation, which should be fixed by adding nops. New pass is called MipsBranchExpansion, it combines these two passes, and runs them alternately until one of them reports no changes were made. Differential Revision: https://reviews.llvm.org/D46641 llvm-svn: 332834	2018-05-21 10:20:02 +00:00
Simon Pilgrim	5aa7cdfd70	[X86][SSE] Support v4i32 rotations (PR37426) As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation. v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering. Differential Revision: https://reviews.llvm.org/D46954 llvm-svn: 332832	2018-05-21 09:45:59 +00:00
Craig Topper	e4c045b7df	[X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. Someday maybe we'll use selects for all intrinsics. llvm-svn: 332824	2018-05-20 23:34:04 +00:00
Simon Dardis	777afc7fbd	[mips] Add microMIPSR6 ll/sc instructions. Previously the compiler was using the microMIPSR3 variants, incorrectly. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D46948 llvm-svn: 332820	2018-05-20 17:21:00 +00:00
Simon Pilgrim	ede0e4073e	Fix MSVC unused variable warning. NFCI. AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance. llvm-svn: 332807	2018-05-19 12:46:02 +00:00
Matt Arsenault	372d796ab1	AMDGPU: Add pass to optimize reqd_work_group_size Eliminate loads from the dispatch packet when they will have a known value. Also pattern match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used. llvm-svn: 332771	2018-05-18 21:35:00 +00:00
Peter Collingbourne	e3f652973e	Support: Simplify endian stream interface. NFCI. Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47032 llvm-svn: 332757	2018-05-18 19:46:24 +00:00
Konstantin Zhuravlyov	caa8251971	AMDGPU/NFC: Set symbol's type that is coming from an argument in EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL. llvm-svn: 332753	2018-05-18 18:41:37 +00:00
Peter Collingbourne	f7b81db715	MC: Change the streamer ctors to take an object writer instead of a stream. NFCI. The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47050 llvm-svn: 332749	2018-05-18 18:26:45 +00:00
Brendon Cahoon	e5ed563cc5	[Hexagon] Generate post-increment for floating point types The code that generates post-increments for Hexagon considered integer values only. This patch adds support to generate them for floating point values, f32 and f64. Differential Revision: https://reviews.llvm.org/D47036 llvm-svn: 332748	2018-05-18 18:14:44 +00:00
Simon Pilgrim	1273f4ad93	[X86] Add GPR<->XMM Schedule Tags BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737) The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1: SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM) Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner) llvm-svn: 332745	2018-05-18 17:58:36 +00:00
Craig Topper	f94ed26ea9	[X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization. but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel. This patch adds a custom legalization hook to morph it to a vXi8 vselect instead. This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly. Fixes PR37499 Differential Revision: https://reviews.llvm.org/D47025 llvm-svn: 332743	2018-05-18 17:48:06 +00:00
Simon Pilgrim	007b50fd35	[X86][BtVer2] Improve simulation of (V)PINSR values Include the 6cy delay transferring from the GPR to FPU. llvm-svn: 332737	2018-05-18 17:09:41 +00:00
Simon Pilgrim	3ecb0b80f6	[X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latency llvm-svn: 332722	2018-05-18 14:22:22 +00:00
Simon Pilgrim	c4b8d367a8	[X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332718	2018-05-18 14:08:01 +00:00
Simon Pilgrim	e819199e2a	[X86][AVX] VEXTRACTF128mr store is a WriteFStoreX not WriteFStore llvm-svn: 332715	2018-05-18 13:17:51 +00:00
Simon Pilgrim	d749b321b2	[X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332714	2018-05-18 13:13:59 +00:00
Clement Courbet	8892c7db08	[ExynosM3] Fix scheduling info. Differential Revision: https://reviews.llvm.org/D46356 llvm-svn: 332713	2018-05-18 13:10:41 +00:00
Simon Pilgrim	a325dffd36	[X86][ZnVer1] Cleanup more single match instregexs llvm-svn: 332712	2018-05-18 13:05:26 +00:00
Jonas Paulsson	b51ccaf4d4	[SystemZ] Fix commit message of previous commit. Sorry, the commit comment for r332703 is completely broken. My mind slipped - the right description would be: In SystemZDAGToDAGISel::Select(), in the handling for SELECT_CCMASK: Check if UpdateNodeOperands() returns a different SDNode and in that case call ReplaceNode. Review: Ulrich Weigand. llvm-svn: 332706	2018-05-18 12:07:16 +00:00
Alexander Ivchenko	5c54742da4	[X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part) This patch aims to match the changes introduced in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The IBT feature definition is removed, with the IBT instructions being freely available on all X86 targets. The shadow stack instructions are also being made freely available, and the use of all these CET instructions is controlled by the module flags derived from the -fcf-protection clang option. The hasSHSTK option remains since clang uses it to determine availability of shadow stack instruction intrinsics, but it is no longer directly used. Comes with a clang patch (D46881). Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46882 llvm-svn: 332705	2018-05-18 11:58:25 +00:00
Jonas Paulsson	de54c058a6	[SystemZ] Fold AHIMux in foldMemoryOperandImpl. AHIMux can be folded the same way as AHI. Review: Ulrich Weigand llvm-svn: 332703	2018-05-18 11:54:04 +00:00
Shiva Chen	6e07dfb148	[RISCV] Add WasForced parameter to MCAsmBackend::fixupNeedsRelaxationAdvanced For RISCV branch instructions, we need to preserve relocation types when linker relaxation enabled, so then linker could modify offset when the branch offsets changed. We preserve relocation types by define shouldForceRelocation. IsResolved return by evaluateFixup will always false when shouldForceRelocation return true. It will make RISCV MC Branch Relaxation always relax 16-bit branches to 32-bit form, even if the symbol actually could be resolved. To avoid 16-bit branches always relax to 32-bit form when linker relaxation enabled, we add a new parameter WasForced to indicate that the symbol actually couldn't be resolved and not forced by shouldForceRelocation return true. RISCVAsmBackend::fixupNeedsRelaxationAdvanced could relax branches with unresolved symbols by (!IsResolved && !WasForced). RISCV MC Branch Relaxation is needed because RISCV could perform 32-bit to 16-bit transformation in MC layer. Differential Revision: https://reviews.llvm.org/D46350 llvm-svn: 332696	2018-05-18 06:42:21 +00:00
Eric Christopher	68f2218e1e	Revert "Temporarily revert "[DEBUG] Initial adaptation of NVPTX target for debug info emission."" This reapplies commits: r330271, r330592, r330779. [DEBUG] Initial adaptation of NVPTX target for debug info emission. Summary: Patch adds initial emission of the debug info for NVPTX target. Currently, only .file and .loc directives are emitted, everything else is commented out to not break the compilation of Cuda. llvm-svn: 332689	2018-05-18 03:13:08 +00:00
Eli Friedman	d268bf0a4d	Fix unused lambda capture. llvm-svn: 332686	2018-05-18 02:11:25 +00:00
Eli Friedman	4081a57af7	[MachineOutliner] Count savings from outlining in bytes. Counting the number of instructions is both unintuitive and inaccurate. On AArch64, this only affects the generated remarks and certain rare pseudo-instructions, but it will have a bigger impact on other targets. Differential Revision: https://reviews.llvm.org/D46921 llvm-svn: 332685	2018-05-18 01:52:16 +00:00
Keno Fischer	e07153a859	[X86DomainReassignment] Don't compare stack-allocated values by address Summary: The Closure allocated in the main loop is allocated on the stack. However, later in the code its address is taken (and used for comparisons). This obviously doesn't work. In fact, the Closure will get the same stack address during every loop iteration, rendering the check that intended to identify Closure conflicts entirely ineffective. Fix this bug by giving every Closure a unique ID and using that for comparison. Alternatively, we could heap allocate the closure object. Fixes PR37396 Fixes JuliaLang/julia#27032 Reviewers: craig.topper, guyblank Reviewed By: craig.topper Subscribers: vchuravy, llvm-commits Differential Revision: https://reviews.llvm.org/D46800 llvm-svn: 332682	2018-05-18 01:03:01 +00:00
Keno Fischer	66ab99c3ee	[X86DomainReassignment] Don't delete IMPLICIT_DEF nodes Summary: We cannot simply delete IMPLICIT_DEF nodes. They may be used later (e.g. by a PHI) and deleting them will cause later passes (e.g. LiveVariables) to crash. However, it seems fine to ignore them for purposes of the domain reassignment (as we do with PHI). Fixes PR37430 Fixes JuliaLang/julia#27080 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46797 llvm-svn: 332680	2018-05-18 00:40:52 +00:00
Changpeng Fang	860d460063	AMDGPU/SI: Don't promote alloca to vector for atomic load/store Summary: Don't promote alloca to vector for atomic load/store Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D46085 llvm-svn: 332673	2018-05-17 21:49:44 +00:00
Peter Collingbourne	0d8fa1b6fd	ARC, Nios2: Silence build warnings. NFCI. llvm-svn: 332663	2018-05-17 20:46:01 +00:00
Sameer AbuAsal	1dc0a8fb18	[RISCV] Separate base from offset in lowerGlobalAddress Summary: When lowering global address, lower the base as a TargetGlobal first then create an SDNode for the offset separately and chain it to the address calculation This optimization will create a DAG where the base address of a global access will be reused between different access. The offset can later be folded into the immediate part of the memory access instruction. With this optimization we generate: lui a0, %hi(s) addi a0, a0, %lo(s) ; shared base address. addi a1, zero, 20 ; 2 instructions per access. sw a1, 44(a0) addi a1, zero, 10 sw a1, 8(a0) addi a1, zero, 30 sw a1, 80(a0) Instead of: lui a0, %hi(s+44) ; 3 instructions per access. addi a1, zero, 20 sw a1, %lo(s+44)(a0) lui a0, %hi(s+8) addi a1, zero, 10 sw a1, %lo(s+8)(a0) lui a0, %hi(s+80) addi a1, zero, 30 sw a1, %lo(s+80)(a0) Which will save one instruction per access. Reviewers: asb, apazos Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, apazos, asb, llvm-commits Differential Revision: https://reviews.llvm.org/D46989 llvm-svn: 332641	2018-05-17 18:14:53 +00:00
Mandeep Singh Grang	ef0ebf2806	[RISCV] Implement MC layer support for the tail pseudoinstruction Summary: This patch implements MC support for tail psuedo instruction. A follow-up patch implements the codegen support as well as handling of the indirect tail pseudo instruction. Reviewers: asb, apazos Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, llvm-commits Differential Revision: https://reviews.llvm.org/D46221 llvm-svn: 332634	2018-05-17 17:31:27 +00:00
Simon Pilgrim	2782a19fad	[X86] Split WriteCMOV + WriteCMOV2 scheduler classes Handle SNB+ targets which treat CMOVA/CMOVBE specially due to partial EFLAGS handling. llvm-svn: 332626	2018-05-17 16:47:30 +00:00
Changpeng Fang	391bcf8893	AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops. Summary: The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working. In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops. This will make CFG with infinite loops be structurized. Reviewer: nhaehnle Differential Revision: https://reviews.llvm.org/D46340 llvm-svn: 332625	2018-05-17 16:45:01 +00:00
Petar Jovanovic	daf5169398	[mips] Add support for Global INValidate ASE This includes Instructions: ginvi, ginvt, Assembler directives: .set ginv, .set noginv, .module ginv, .module noginv Attribute: ginv .MIPS.abiflags: GINV (0x20000) Patch by Vladimir Stefanovic. Differential Revision: https://reviews.llvm.org/D46268 llvm-svn: 332624	2018-05-17 16:30:32 +00:00
Alex Bradbury	6a53023b4e	[RISCV] Set isReMaterializable on ADDI and LUI instructions The isReMaterlizable flag is somewhat confusing, unlike most other instruction flags it is currently interpreted as a hint (mightBeRematerializable would be a better name). While LUI is always rematerialisable, for an instruction like ADDI it depends on its operands. TargetInstrInfo::isTriviallyReMaterializable will call TargetInstrInfo::isReallyTriviallyReMaterializable, which in turn calls TargetInstrInfo::isReallyTriviallyReMaterializableGeneric. We rely on the logic in the latter to pick out instances of ADDI that really are rematerializable. The isReMaterializable flag does make a difference on a variety of test programs. The recently committed remat.ll test case demonstrates how stack usage is reduce and a unnecessary lw/sw can be removed. Stack usage in the Proc0 function in dhrystone reduces from 192 bytes to 112 bytes. For the sake of completeness, this patch also implements RISCVRegisterInfo::isConstantPhysReg. Although this is called from a number of places, it doesn't seem to result in different codegen for any programs I've thrown at it. However, it is called in the rematerialisation codepath and it seems sensible to implement something correct here. Differential Revision: https://reviews.llvm.org/D46182 llvm-svn: 332617	2018-05-17 15:51:37 +00:00
Simon Pilgrim	b5741f5c3d	[X86][BtVer2] ADC/SBB take 2cy on an ALU pipe, not 1cy like ADD/SUB llvm-svn: 332616	2018-05-17 15:43:23 +00:00
Alex Bradbury	5e41fc83c5	[Hexagon] Use addAliasForDirective for data directives Data directives such as .word, .half, .hword are currently parsed using HexagonAsmParser::ParseDirectiveValue which effectively duplicates logic from AsmParser::parseDirectiveValue. This patch deletes that duplicated logic in favour of using addAliasForDirective. Differential Revision: https://reviews.llvm.org/D46999 llvm-svn: 332607	2018-05-17 13:21:18 +00:00
Simon Pilgrim	0c0336e003	[X86] Split WriteADC/WriteADCRMW scheduler classes For integer ALU instructions taking eflags as an input (ADC/SBB/ADCX/ADOX) llvm-svn: 332605	2018-05-17 12:43:42 +00:00
Jonas Paulsson	caafed5570	[SystemZ] Commenting (NFC) Some minor commenting in scheduler files. Review: Ulrich Weigand llvm-svn: 332599	2018-05-17 11:53:56 +00:00
Simon Pilgrim	ceb4933dc1	[X86][SNB] Minor scheduler cleanup Merge 2 instregex and explain the VMOVDQArr/MOVDQArr difference llvm-svn: 332591	2018-05-17 10:36:29 +00:00
Sander de Smalen	75cfa34156	[AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+scalar) store instructions. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46680 llvm-svn: 332584	2018-05-17 09:05:41 +00:00
Alex Bradbury	cea6db0480	[RISCV] Add support for .half, .hword, .word, .dword directives These directives are recognised by gas. Support is added through the use of addAliasForDirective. Also match RISC-V gcc in preferring .half and .word for 16-bit and 32-bit data directives. llvm-svn: 332574	2018-05-17 05:58:08 +00:00
Craig Topper	a2c5264718	[X86] Add OptForSize to a couple load folding patterns. Remove some bad FIXME comments. The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use GPR as input when the load isn't folded. This won't help prevent a partial xmm update. llvm-svn: 332573	2018-05-17 05:41:11 +00:00
Dan Gohman	aef674102c	[WebAssembly] Fix the opcode number for i64.load16_u. Fixes PR37488. llvm-svn: 332561	2018-05-17 00:14:13 +00:00
Simon Pilgrim	820433f533	[X86][SNB] Remove unnecessary CVT InstRW overrides llvm-svn: 332536	2018-05-16 22:14:29 +00:00
Eli Friedman	ddbf6d6514	[MachineOutliner] Don't outline instructions that modify SP. This breaks the code which saves and restores LR, so we can't outline without doing something more complicated for stack adjustment. Found by inspection; we get lucky in most cases because getMemOpInfo only handles STRWpost, not any other pre/post-increment forms. But it hits a couple of artificial testcases in the tree. Differential Revision: https://reviews.llvm.org/D46920 llvm-svn: 332529	2018-05-16 21:20:16 +00:00
Krzysztof Parzyszek	f18009dbc6	[Hexagon] Fix the order of operands when selecting QCAT llvm-svn: 332526	2018-05-16 21:02:43 +00:00
Krzysztof Parzyszek	e8a0ae7346	[Hexagon] Mark HVX vector predicate bitwise ops as legal, add patterns llvm-svn: 332525	2018-05-16 21:00:24 +00:00

... 2 3 4 5 6 ...

47879 Commits