llvm-project

Commit Graph

Author	SHA1	Message	Date
Nemanja Ivanovic	e597bd8230	[PowerPC] Eliminate integer compare instructions - vol. 2 This patch builds upon https://reviews.llvm.org/rL302810 to add handling for bitwise logical operations in general purpose registers. The idea is to keep the values in GPRs as long as possible - only extracting them to a condition register bit when no further operations are to be done. Differential Revision: https://reviews.llvm.org/D31851 llvm-svn: 304282	2017-05-31 05:40:25 +00:00
Matthias Braun	bcd4c68233	X86FrameLowering: No need to mark FP as live-in everywhere The frame pointer (when used as frame pointer) is a reserved register. We do not track liveness of reserved registers and hence do not need to add them to the basic block livein lists. llvm-svn: 304274	2017-05-31 02:11:10 +00:00
Matthias Braun	05eeadbfd1	ARM: Fix cmpxchg O0 expansion This is the equivalent of r304048 for ARM: - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1] - Zero the status register at the beginning of the loop to make sure it has a defined value. - Remove kill flags of values that need to stay alive throughout the loop. [1] An upcoming commit of mine will tighten the MachineVerifier to catch these. llvm-svn: 304267	2017-05-31 01:21:35 +00:00
Matthias Braun	0dba4e3509	ARM: Do not add reserved registers to block livein lists; NFC llvm-svn: 304266	2017-05-31 01:21:30 +00:00
Eugene Zelenko	4e9736b1c9	[CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 304265	2017-05-31 01:10:10 +00:00
Abderrazek Zaafrani	855411566b	Add latency info for Exynos interleaved Load/Store instructions. llvm-svn: 304259	2017-05-31 00:20:55 +00:00
Matthias Braun	5e394c3d6f	TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine. While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr. llvm-svn: 304247	2017-05-30 21:36:41 +00:00
Vedant Kumar	87aefe9042	Revert "This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level." This reverts commit r304209. I think this change is responsible for a tablgen failure in stage2 builds: http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/2171/ I reproduced the failure locally (without ThinLTO), reverted the commit, rebuilt the stage1 clang, rebuilt the stage2 llvm-tblgen tool, and found that the crash disappears when the commit is reverted. Here is the stack trace: FAILED: lib/Target/ARM/ARMGenRegisterBank.inc.tmp cd /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM && /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes /Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp 0 llvm-tblgen 0x0000000106fc9568 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40 1 llvm-tblgen 0x0000000106fc9be6 SignalHandler(int) + 422 2 libsystem_platform.dylib 0x00000001076a7fba _sigtramp + 26 3 libsystem_platform.dylib 0x00007fff58deb468 _sigtramp + 1366570184 4 llvm-tblgen 0x0000000106e89cc7 llvm::CodeGenRegBank::getCompositeSubRegIndex(llvm::CodeGenSubRegIndex, llvm::CodeGenSubRegIndex) + 615 5 llvm-tblgen 0x0000000106e88be6 llvm::CodeGenRegister::computeSubRegs(llvm::CodeGenRegBank&) + 2182 6 llvm-tblgen 0x0000000106e8e9f0 llvm::CodeGenRegBank::CodeGenRegBank(llvm::RecordKeeper&) + 2192 7 llvm-tblgen 0x0000000106f384a1 llvm::EmitRegisterBank(llvm::RecordKeeper&, llvm::raw_ostream&) + 65 8 llvm-tblgen 0x0000000106f72c64 (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&) + 1172 9 llvm-tblgen 0x0000000106fcb15f llvm::TableGenMain(char, bool ()(llvm::raw_ostream&, llvm::RecordKeeper&)) + 3599 10 llvm-tblgen 0x0000000106f727a6 main + 134 11 libdyld.dylib 0x000000010733c6a5 start + 1 Stack dump: 0. Program arguments: /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp /bin/sh: line 1: 41986 Segmentation fault: 11 /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz -master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp llvm-svn: 304231	2017-05-30 19:25:22 +00:00
Matthias Braun	700603555a	ARM: Add missing flags to TBB_[JH]T pseudo instructions NFC except for calming down the machine verifier in some cases. llvm-svn: 304227	2017-05-30 18:52:33 +00:00
Krzysztof Parzyszek	ef58017b35	[Hexagon] Improve code generation for 32x32-bit multiplication For multiplications of 64-bit values (giving 64-bit result), detect cases where the arguments are sign-extended 32-bit values, on a per- operand basis. This will allow few patterns to match a wider variety of combinations in which extensions can occur. llvm-svn: 304223	2017-05-30 17:47:51 +00:00
Stanislav Mekhanoshin	56ea488d8b	[AMDGPU] Allow SDWA in instructions with immediates and SGPRs An encoding does not allow to use SDWA in an instruction with scalar operands, either literals or SGPRs. That is however possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To cleanup MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219	2017-05-30 16:49:24 +00:00
Mark Searles	00ce96f6ee	[AMDGPU] Require waitcnt before barrier for all targets; adjust tests. Differential Revision: https://reviews.llvm.org/D33576 llvm-svn: 304217	2017-05-30 16:22:43 +00:00
Craig Topper	f6d4dc5b4a	[SelectionDAG] Set ISD::FPOWI to Expand by default Summary: Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie". This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default. Reviewers: spatel, RKSimon, efriedma Reviewed By: RKSimon Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits Differential Revision: https://reviews.llvm.org/D33530 llvm-svn: 304215	2017-05-30 15:27:55 +00:00
Andrew V. Tischenko	8b04826663	This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level. llvm-svn: 304209	2017-05-30 13:00:44 +00:00
Ulrich Weigand	3f484e68cc	[SystemZ] Add decimal floating-point instructions This adds assembler / disassembler support for the decimal floating-point instructions. Since LLVM does not yet have support for decimal float types, these cannot be used for codegen at this point. llvm-svn: 304203	2017-05-30 10:15:16 +00:00
Ulrich Weigand	f32adf6944	[SystemZ] Add hexadecimal floating-point instructions This adds assembler / disassembler support for the hexadecimal floating-point instructions. Since the Linux ABI does not use any hex float data types, these are not useful for codegen. llvm-svn: 304202	2017-05-30 10:13:23 +00:00
Zoran Jovanovic	375b60de74	[mips] Expansion of LI.S and LI.D Author: smaksimovic Reviewers: dsanders sdardis Introduces LI.S and LI.D pseudo instructions with floating point operands. Differential Revision: https://reviews.llvm.org/D14390 llvm-svn: 304198	2017-05-30 09:33:43 +00:00
Kristof Beyls	2af1e90eb2	Fix PR33031: correct the estimate of maximum offset for instructions spilling/filling the stack. llvm-svn: 304196	2017-05-30 06:58:41 +00:00
Jonas Paulsson	fe0c0935c8	[SystemZ] Improve buildVector() in SystemZISelLowering.cpp. Use VLREP when inserting one or more loads into a vector. This is more efficient than to first load and then use a VLVGP. Review: Ulrich Weigand llvm-svn: 304152	2017-05-29 13:22:23 +00:00
Nikolai Bozhenov	82f0801c1b	[Nios2] Target registration Reviewers: craig.topper, hfinkel, joerg, lattner, zvi Reviewed By: craig.topper Subscribers: oren_ben_simhon, igorb, belickim, tvvikram, mgorny, llvm-commits, pavel.v.chupin, DavidKreitzer Differential Revision: https://reviews.llvm.org/D32669 Patch by AndreiGrischenko <andrei.l.grischenko@intel.com> llvm-svn: 304144	2017-05-29 09:48:30 +00:00
Diana Picus	0c05cce4e0	[ARM] GlobalISel: Extract helper. NFCI. Create a helper to deal with the common code for merging incoming values together after they've been split during call lowering. There's likely more stuff that can be commoned up here, but we'll leave that for later. llvm-svn: 304143	2017-05-29 09:09:54 +00:00
Diana Picus	bf4aed2c38	[ARM] GlobalISel: Support array returns These are a bit rare in practice, but they don't require anything special compared to array parameters, so support them as well. llvm-svn: 304137	2017-05-29 08:19:19 +00:00
Hiroshi Inoue	e3c14ebbfa	[PPC] Fix assertion failure during binary encoding with -mcpu=pwr9 Summary clang -c -mcpu=pwr9 test/CodeGen/PowerPC/build-vector-tests.ll causes an assertion failure during the binary encoding. The failure occurs when a D-form load instruction takes two register operands instead of a register + an immediate. This patch fixes the problem and also adds an assertion to catch this failure earlier before the binary encoding (i.e. during lit test). The fix is from Nemanja Ivanovic @nemanjai. Differential Revision: https://reviews.llvm.org/D33482 llvm-svn: 304133	2017-05-29 07:12:39 +00:00
Diana Picus	8cca8cb0ce	[ARM] GlobalISel: Support array parameters/arguments Clang coerces structs into arrays, so it's a good idea to support them. Most of the support boils down to getting the splitToValueTypes helper to actually split types. We then use G_INSERT/G_EXTRACT to deal with the parts. llvm-svn: 304132	2017-05-29 07:01:52 +00:00
Zachary Turner	df1832cf86	Resubmit "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables." This was reverted due to buildbot breakages and I was not familiar with this code to investigate it. But while trying to get a useful backtrace for the author, it turns out the fix was very obvious. Resubmitting this patch as is, and will submit the fix in a followup so that the fix is not hidden in the larger CL. llvm-svn: 304122	2017-05-29 02:19:37 +00:00
Zachary Turner	5b199be769	Revert "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables." This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well as all subsequent followups. llvm-tblgen currently segfaults with this change, and it seems it has been broken on the bots all day with no fixes in preparation. See, for example: http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/ llvm-svn: 304121	2017-05-29 01:48:53 +00:00
Dylan McKay	74fc1ce0c2	[AVR] Remove SREG from CPI's Uses; authored by Florian Zeitz Summary: CPI does not read the status register, but only writes it. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33223 llvm-svn: 304116	2017-05-29 00:10:14 +00:00
Geoff Berry	2739ebafb6	[AArch64][Falkor] Combine sched details files into one. NFC. llvm-svn: 304109	2017-05-28 22:20:44 +00:00
Geoff Berry	b542fb3817	[AArch64][Falkor] Fix some sched details. - Remove all uses of base sched model entries and set them all to Unsupported so all the opcodes are described in AArch64SchedFalkorDetails.td. - Remove entries for unsupported half-float opcodes. - Remove entries for unsupported LSE extension opcodes. - Add entry for MOVbaseTLS (and set Sched in base td file entry to WriteSys) and a few other pseudo ops. - Fix a few FP load/store with reg offset entries to use the LSLfast predicates. - Add Q size BIF/BIT/BSL entries. - Fix swapped Q/D sized CLS/CLZ/CNT/RBIT entires. - Fix pre/post increment address register latency (this operand is always dest 0). - Fix swapped FCVTHD/FCVTHS/FCVTDH/FCVTDS entries. - Fix XYZ resource over usage on LD[1-4] opcodes. llvm-svn: 304108	2017-05-28 21:48:31 +00:00
Ayman Musa	d9f1fe43a8	[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables. X86 backend holds huge tables in order to map between the register and memory forms of each instruction. This TableGen Backend automatically generated all these tables with the appropriate flags for each entry. Differential Revision: https://reviews.llvm.org/D32684 llvm-svn: 304088	2017-05-28 12:55:36 +00:00
Ayman Musa	0b4f97d5e9	[X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen backend) to X86Inst class and set its value for the relevant instructions. Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual: Opcode Instruction Op/En 64-Bit Mode Compat/Leg Mode Description 8A /r MOV r8,r/m8 RM Valid Valid Move r/m8 to r8. 88 /r MOV r/m8,r8 MR Valid Valid Move r8 to r/m8. Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions. In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen. Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr. This field enable this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables. Differential Revision: https://reviews.llvm.org/D32683 llvm-svn: 304087	2017-05-28 12:39:37 +00:00
Matthias Braun	88c8c9847d	AArch64/PEI: Do not add reserved regs to liveins We do not track liveness for reserved registers. It is unnecessary to add them to block livein lists. llvm-svn: 304059	2017-05-27 03:38:02 +00:00
Quentin Colombet	7a43eddf28	[AArch64][GlobalISel] Add the Localizer pass for the O0 pipeline This should fix most of the issue we have right now with constants being spilled all over the place. llvm-svn: 304052	2017-05-27 01:34:07 +00:00
Matthias Braun	b4f74224ff	AArch64: Fix cmpxchg O0 expansion - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1] - Zero the status register at the beginning of the loop to make sure it has a defined value. - Remove kill flags of values that need to stay alive throughout the loop. [1] An upcoming commit of mine will tighten the MachineVerifier to catch these. llvm-svn: 304048	2017-05-26 23:48:59 +00:00
Alexei Starovoitov	3c585d3a8f	[bpf] disallow global_addr+off folding Wrong assembly code is generated for a simple program with clang. If clang only produces IR and llc is used for IR lowering and optimization, correct assembly code is generated. The main reason is that clang feeds default Reloc::Static to llvm and llc feeds no RelocMode to llvm, where for llc case, BPF backend picks up Reloc::PIC_ mode. This leads different IR lowering behavior and clang permits global_addr+off folding while llc doesn't. This patch introduces isOffsetFoldingLegal function into BPF backend and the function always return false. This will make clang and llc behave the same for the lowering. Bug https://bugs.llvm.org//show_bug.cgi?id=33183 has more detailed explanation. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 304043	2017-05-26 22:32:41 +00:00
Davide Italiano	ef9bfe9531	[Mips] Placate GCC's -Wmisleading-indentation. NFCI. llvm-svn: 304041	2017-05-26 21:56:19 +00:00
Matthias Braun	ac4307c41e	LivePhysRegs: Rework constructor + documentation; NFC - Take reference instead of pointer to a TRI that cannot be nullptr. - Improve documentation comments. llvm-svn: 304038	2017-05-26 21:51:00 +00:00
Sumanth Gundapaneni	a6cf2fd5ec	[Hexagon] Cleanup of unused function isCalleeSaveReg (NFC) llvm-svn: 304034	2017-05-26 21:09:54 +00:00
Konstantin Zhuravlyov	b2ff8dfea0	Resubmit r303859 with test fixed. [AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Patch by Tim Corringham llvm-svn: 304031	2017-05-26 20:38:26 +00:00
Benjamin Kramer	debb3c35e0	Make helper functions static. NFC. llvm-svn: 304029	2017-05-26 20:09:00 +00:00
Dmitry Preobrazhensky	6a2431df0b	[AMDGPU][MC][GFX9] Corrected encoding of flat_scratch* for SDWA opcodes See bug 33171: https://bugs.llvm.org/show_bug.cgi?id=33171 Reviewers: Sam Kolton Differential Revision: https://reviews.llvm.org/D33553 llvm-svn: 304015	2017-05-26 18:01:29 +00:00
Tom Stellard	dde28a8c92	AMDGPU/GlobalISel: Mark 32-bit float constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33212 llvm-svn: 304003	2017-05-26 16:40:03 +00:00
Sam Kolton	363f47a2c7	[AMDGPU] SDWA: add disassembler support for GFX9 Summary: Added decoder methods and tests Reviewers: vpykhtin, artem.tamazov, dp Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33545 llvm-svn: 303999	2017-05-26 15:52:00 +00:00
John Brawn	9009d2905d	[ARM] Fix lowering of misaligned memcpy/memset Currently getOptimalMemOpType returns i32 for large enough sizes without checking for alignment, leading to poor code generation when misaligned accesses aren't permitted as we generate a word store then later split it up into byte stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for memset we splat the memset value into a word then immediately split it up again. Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type to use, but also fix a bug there where it wasn't correctly checking if misaligned memory accesses are allowed. Differential Revision: https://reviews.llvm.org/D33442 llvm-svn: 303990	2017-05-26 13:59:12 +00:00
Andrew V. Tischenko	fdb264e263	The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands." llvm-svn: 303985	2017-05-26 13:23:34 +00:00
Nirav Dave	6ff50bf242	Fix signedness of constant. NFC. llvm-svn: 303980	2017-05-26 12:53:10 +00:00
Tim Shen	a76f20c364	[PPC] Add text for assert. llvm-svn: 303940	2017-05-25 23:40:46 +00:00
Tim Shen	a2b85da879	[PPC] Fix atomics lowering in DAG lowering. I forgot to forward the chain, causing some missing instruction dependencies. The test crashes the compiler without this patch. Inspired by the test case, D33519 also tries to remove the extra sync. Differential Revision: https://reviews.llvm.org/D33573 llvm-svn: 303931	2017-05-25 22:58:35 +00:00
Kyle Butt	13379d7c99	PPC: Correct Size for GETtlsADDR PPC::GETtlsADDR is lowered to a branch and a nop, by the assembly printer. Its size was incorrectly marked as 4, correct it to 8. The incorrect size can cause incorrect branch relaxation in PPCBranchSelector under the right conditions. llvm-svn: 303904	2017-05-25 19:37:41 +00:00
Nico Weber	b3d83a092a	Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots. llvm-svn: 303902	2017-05-25 19:19:29 +00:00
Manoj Gupta	d536180fdc	[AArch64]: add 'a' inline asm operand modifier. Summary: This is used in the Linux kernel, and effectively just means "print an address". This brings back r193593. Reviewed by: Renato Golin Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls Subscribers: aemerson, javed.absar, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D33558 llvm-svn: 303901	2017-05-25 19:07:57 +00:00
Tim Corringham	32d0d38679	[AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32862 llvm-svn: 303859	2017-05-25 14:04:14 +00:00
Oren Ben Simhon	7bf27f03f2	[X86] Adding vpopcntd and vpopcntq instructions AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858	2017-05-25 13:45:23 +00:00
Tony Jiang	0a429f040e	[PowerPC] Fix a performance bug for PPC::XXSLDWI. There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI instruction, this patch recognizes them and does the selection to improve the PPC performance. llvm-svn: 303822	2017-05-24 23:48:29 +00:00
Nirav Dave	bb20b5d5c3	[AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC In preparation for late-stage store merging. llvm-svn: 303800	2017-05-24 19:55:49 +00:00
Zaara Syeda	932978315b	P9: D-form vector load/store. Differential Revision: https://reviews.llvm.org/D33248 llvm-svn: 303780	2017-05-24 17:50:37 +00:00
Matthew Simpson	6349380fa4	Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor The default vector insert/extract cost is more profitable on Falkor than the reduced cost. llvm-svn: 303771	2017-05-24 16:48:39 +00:00
Nirav Dave	d20066cbad	[AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI. Various address spaces on the SI and R600 subtargets have stricter limits on memory access size that other address spaces. Use canMergeStoresTo predicate to prevent the DAGCombiner from creating these stores as they will be split up during legalization. llvm-svn: 303767	2017-05-24 15:59:09 +00:00
Vadzim Dambrouski	b07351f4f8	[MSP430] Fix PR33050: Don't use ADD16ri to lower FrameIndex. Use ADDframe pseudo instruction instead. This will fix machine verifier error, and will help to fix PR32146. Differential Revision: https://reviews.llvm.org/D33452 llvm-svn: 303758	2017-05-24 15:08:30 +00:00
Marek Olsak	8973a0a22c	Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns" This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa. It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field. llvm-svn: 303754	2017-05-24 14:53:50 +00:00
Krzysztof Parzyszek	e3ec97b031	[Hexagon] Fix comment in HexagonPacketizer::runOnMachineFunction Patch by Wei-Ren Chen. Differential Revision: https://reviews.llvm.org/D33439 llvm-svn: 303745	2017-05-24 13:43:42 +00:00
Jonas Paulsson	8624b7e1ce	[LoopVectorizer] Let target prefer scalar addressing computations. The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking show improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744	2017-05-24 13:42:56 +00:00
Jonas Paulsson	081b5a1e9d	[SystemZ] Fix register modelling in expandLoadStackGuard() EXPENSIVE_CHECKS found this bug (https://bugs.llvm.org/show_bug.cgi?id=33047), which this patch fixes. The EAR instruction defines a GR32, not a GR64. Review: Ulrich Weigand llvm-svn: 303743	2017-05-24 13:15:48 +00:00
Simon Pilgrim	9f46d1d479	Strip trailing whitespace. NFCI. llvm-svn: 303736	2017-05-24 11:02:27 +00:00
Florian Hahn	d211fe7c26	[ARM] Remove ThumbTargetMachines. (NFC) Summary: Thumb code generation is controlled by ARMSubtarget and the concrete ThumbLETargetMachine and ThumbBETargetMachine are not needed. Eric Christopher suggested removing the unneeded target machines in https://reviews.llvm.org/D33287. I think it still makes sense to keep separate TargetMachines for big and little endian as we probably do not want to have different endianess for difference functions in a single compilation unit. The MIPS backend has two separate TargetMachines for big and little endian as well. Reviewers: echristo, rengolin, kristof.beyls, t.p.northover Reviewed By: echristo Subscribers: aemerson, javed.absar, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D33318 llvm-svn: 303733	2017-05-24 10:18:57 +00:00
Javed Absar	a32e3a1acf	[ARM] Add VLDx/VSTx sched defs for machine-schedulers. NFCI This patch adds missing scheds for Neon VLDx/VSTx instructions. This will help one write schedulers easier/faster in the future for ARM sub-targets. Existing models will not affected by this patch. Reviewed by: Renato Golin, Diana Picus Differential Revision: https://reviews.llvm.org/D33120 llvm-svn: 303717	2017-05-24 05:32:48 +00:00
Vadzim Dambrouski	49dd6e68c2	[MSP430] Add subtarget features for hardware multiplier. Also add more processors to make -mcpu option behave similar to gcc. Differential Revision: https://reviews.llvm.org/D33335 llvm-svn: 303695	2017-05-23 21:49:42 +00:00
Simon Pilgrim	c910a70b21	[AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045) This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045 Differential Revision: https://reviews.llvm.org/D33451 llvm-svn: 303691	2017-05-23 21:27:15 +00:00
Changpeng Fang	1dbace195d	AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent handling of Alloca and should not affect each other. As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage related checking out and replace it after the calling convention checking. Reviewer: arsenm Differential Revision: http://reviews.llvm.org/D33139 llvm-svn: 303684	2017-05-23 20:25:41 +00:00
Geoff Berry	d6ac96f953	[AArch64][Falkor] Refine sched details for LSLfast/ASRfast. llvm-svn: 303682	2017-05-23 19:57:45 +00:00
Stanislav Mekhanoshin	53a21292f8	[AMDGPU] Combine and (srl) into shl (bfe) Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681	2017-05-23 19:54:48 +00:00
Geoff Berry	e6366f505f	[AArch64][Falkor] Fix sched details for FMOV of WZR/XZR. llvm-svn: 303680	2017-05-23 19:54:28 +00:00
Oleg Ranevskyy	09df0020fc	[ARM] Temporarily disable globals promotion to constant pools to prevent miscompilation Summary: A temporary workaround for PR32780 - rematerialized instructions accessing the same promoted global through different constant pool entries. The patch turns off the globals promotion optimization leaving all its code in place, so that it can be easily turned on once PR32780 is fixed. Since this is a miscompilation issue causing generation of misbehaving code, and the problem is very subtle, the patch might be valuable enough to get into 4.0.1. Reviewers: efriedma, jmolloy Reviewed By: efriedma Subscribers: aemerson, javed.absar, llvm-commits, rengolin, asl, tstellar Differential Revision: https://reviews.llvm.org/D33446 llvm-svn: 303679	2017-05-23 19:38:37 +00:00
Nirav Dave	6c910c0dd8	[DAG] Add AddressSpace parameter to canMergeStoresTo. NFC. llvm-svn: 303673	2017-05-23 18:53:02 +00:00
Marek Olsak	7dadd86a35	AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4. Reviewers: arsenm, nhaehnle, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28994 llvm-svn: 303658	2017-05-23 17:14:34 +00:00
Stanislav Mekhanoshin	a96ec3f360	[AMDGPU] Convert shl (add) into add (shl) shl (or\|add x, c2), c1 => or\|add (shl x, c1), (c2 << c1) This allows to fold a constant into an address in some cases as well as to eliminate second shift if the expression is used as an address and second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641	2017-05-23 15:59:58 +00:00
Simon Atanasyan	57253043a4	[mips] Remove unused class field. NFC llvm-svn: 303639	2017-05-23 15:00:30 +00:00
Simon Atanasyan	039b02ec78	[mips] Change type of MipsSubtarget ctor arguments s/std::string/StringRef/. NFC llvm-svn: 303638	2017-05-23 15:00:26 +00:00
Sam Kolton	f7659d71eb	[AMDGPU] SDWA: Add assembler support for GFX9 Summary: Added separate pseudo and real instruction for GFX9 SDWA instructions. Currently supports only in assembler. Depends D32493 Reviewers: vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33132 llvm-svn: 303620	2017-05-23 10:08:55 +00:00
Florian Hahn	abb4218b98	[AArch64] Make instruction fusion more aggressive. Summary: This patch makes instruction fusion more aggressive by * adding artificial edges between the successors of FirstSU and SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps. * updating PostGenericScheduler::tryCandidate to keep clusters together, similar to GenericScheduler::tryCandidate. This change increases the number of AES instruction pairs generated on Cortex-A57 and Cortex-A72. This doesn't change code at all in most benchmarks or general code, but we've seen improvement on kernels using AESE/AESMC and AESD/AESIMC. Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB Reviewed By: evandro Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33230 llvm-svn: 303618	2017-05-23 09:33:34 +00:00
Igor Breger	617be6e475	[GlobalISel][X86] G_LOAD/G_STORE vec256/512 support Summary: mark G_LOAD/G_STORE vec256/512 legal for AVX/AVX512. Implement instruction selection. Reviewers: zvi, guyblank Reviewed By: zvi Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33268 llvm-svn: 303617	2017-05-23 08:23:51 +00:00
Akira Hatanaka	e8ae3346a3	[AArch64] Fix PRR33100. This commit fixes a bug introduced in r301019 where optimizeLogicalImm would replace a logical node's immediate operand that was CSE'd and was also an operand of another node. This commit fixes the bug by replacing the logical node instead of its immediate operand. rdar://problem/32295276 llvm-svn: 303607	2017-05-23 06:08:37 +00:00
Krzysztof Parzyszek	9a23d40ee8	[Hexagon] Fix definitions of vector predicate loads and stores This fixes http://llvm.org/PR33048. llvm-svn: 303572	2017-05-22 20:02:53 +00:00
Stanislav Mekhanoshin	5fa289f0d8	[AMDGPU] Narrow lshl from 64 to 32 bit if possible Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569	2017-05-22 16:58:10 +00:00
Valery Pykhtin	74cb9c8831	[AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker Differential revision: https://reviews.llvm.org/D33289 llvm-svn: 303548	2017-05-22 13:09:40 +00:00
Simon Atanasyan	e0b726f2fa	[mips] Support micromips attribute passed by front-end This patch adds handling of the `micromips` and `nomicromips` attributes passed by front-end. The patch depends on D33363. Differential revision: https://reviews.llvm.org/D33364 llvm-svn: 303545	2017-05-22 12:47:41 +00:00
James Molloy	6110be9759	Re-apply r302416: [ARM] Clear the constant pool cache on explicit .ltorg directives Re-applying now that PR32825 which was raised on the commit this fixed up is now known to have also been fixed by this commit. Original commit message: Multiple ldr pseudoinstructions with the same constant value will reuse the same constant pool entry. However, if the constant pool is explicitly flushed with a .ltorg directive, we should not try to reference constants in the previous pool any longer, since they may be out of range. This fixes assembling hand-written assembler source which repeatedly loads the same constant value, across a binary size larger than the pc-relative fixup range for ldr instructions (4096 bytes). Such assembler source already uses explicit .ltorg instructions to emit constant pools with regular intervals. However if we try to reuse constants emitted in earlier pools, they end up out of range. This makes the output of the testcase match what binutils gas does (prior to this patch, it would fail to assemble). Differential Revision: https://reviews.llvm.org/D32847 llvm-svn: 303540	2017-05-22 09:42:07 +00:00
Strahinja Petrovic	ab9573f37c	[MIPS] Add support to match more patterns for DINS instruction This patch adds support for recognizing patterns to match DINS instruction. Differential Revision: https://reviews.llvm.org/D31465 llvm-svn: 303537	2017-05-22 09:06:44 +00:00
James Molloy	5cc75ae8f9	Revert "[ARM] Clear the constant pool cache on explicit .ltorg directives" This reverts commit r302416. This was a fixup for r286006, which has now been reverted so this doesn't apply (either in concept or in code). This commit itself has no problems, but the underlying issue it was fixing has now disappeared from the codebase. llvm-svn: 303536	2017-05-22 08:49:28 +00:00
Igor Breger	014fc566e7	[GlobalISel][X86] Fix G_TRUNC instruction selection. Updated tests with -verify-machineinstrs flag. It fixes 3 tests failed with machine verifier enabled and listed in PR27481 llvm-svn: 303502	2017-05-21 11:13:56 +00:00
Hiroshi Inoue	37e63b1b21	Summary PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass. This patch improves this optimization to eliminate more compare instructions in two types of common case. - comparison against a constant 1 or -1 The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant. This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0". With this patch, compare can be eliminated in the following code sequence, as an example. uint64_t a, b; if ((a \| b) & 0x8000000000000000ull) { ... } else { ... } - andi for 32-bit comparison on PPC64 Since record-form instructions execute 64-bit signed comparison and so we have limitation in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check. %1 = and i32 %a, 10 %2 = icmp ne i32 %1, 0 br i1 %2, label %foo, label %bar In this simple example, LLVM generates andi. + cmplwi + beq on PPC64. This patch make it possible to eliminate the cmplwi for this case. I added andi. for optimization targets if it is safe to do so. Differential Revision: https://reviews.llvm.org/D30081 llvm-svn: 303500	2017-05-21 06:00:05 +00:00
Dmitry Preobrazhensky	ce941c9c38	[AMDGPU][MC] Corrected disassembler to decode instructions with 2 literals See bug 32922: https://bugs.llvm.org//show_bug.cgi?id=32922 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32912 llvm-svn: 303428	2017-05-19 14:27:52 +00:00
Dmitry Preobrazhensky	9321e8fcec	[AMDGPU][MC] Fixed bugs in export instruction See Bugs 33019, 33056: https://bugs.llvm.org//show_bug.cgi?id=33019 https://bugs.llvm.org//show_bug.cgi?id=33056 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33288 llvm-svn: 303423	2017-05-19 13:36:09 +00:00
Guy Blank	548e22a1a7	[X86][AVX512] Make i1 illegal in the CodeGen This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421	2017-05-19 12:35:15 +00:00
Daniel Sanders	a1b2db7919	[globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates. Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32861 llvm-svn: 303418	2017-05-19 11:08:33 +00:00
Hans Wennborg	b00ffd8cb7	Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB." This also reverts follow-ups r303292 and r303298. It broke some Chromium tests under MSan, and apparently also internal tests at Google. llvm-svn: 303369	2017-05-18 18:50:05 +00:00
Reid Kleckner	96ab8726a3	[IR] De-virtualize ~Value to save a vptr Summary: Implements PR889 Removing the virtual table pointer from Value saves 1% of RSS when doing LTO of llc on Linux. The impact on time was positive, but too noisy to conclusively say that performance improved. Here is a link to the spreadsheet with the original data: https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing This change makes it invalid to directly delete a Value, User, or Instruction pointer. Instead, such code can be rewritten to a null check and a call Value::deleteValue(). Value objects tend to have their lifetimes managed through iplist, so for the most part, this isn't a big deal. However, there are some places where LLVM deletes values, and those places had to be migrated to deleteValue. I have also created llvm::unique_value, which has a custom deleter, so it can be used in place of std::unique_ptr<Value>. I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which derives from User outside of lib/IR. Code in IR cannot include MemorySSA headers or call the MemoryAccess object destructors without introducing a circular dependency, so we need some level of indirection. Unfortunately, no class derived from User may have any virtual methods, because adding a virtual method would break User::getHungOffOperands(), which assumes that it can find the use list immediately prior to the User object. I've added a static_assert to the appropriate OperandTraits templates to help people avoid this trap. Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv Reviewed By: chandlerc Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D31261 llvm-svn: 303362	2017-05-18 17:24:10 +00:00
Francis Visoiu Mistrih	8b61764cbb	[LegacyPassManager] Remove TargetMachine constructors This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 llvm-svn: 303360	2017-05-18 17:21:13 +00:00
Sam Kolton	ebfdaf7394	[AMDGPU] SDWA operands should not intersect with potential MIs Summary: There should be no intesection between SDWA operands and potential MIs. E.g.: ``` v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0 v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0 v_add_u32 v3, v4, v2 ``` In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it. Reviewers: vpykhtin, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32804 llvm-svn: 303347	2017-05-18 12:12:03 +00:00
Igor Breger	842b5b36ba	[GlobalISel][X86] G_ADD/G_SUB vector legalizer/selector support. Summary: G_ADD/G_SUB vector legalizer/selector support. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33232 llvm-svn: 303345	2017-05-18 11:10:56 +00:00
Simon Pilgrim	6bba6068be	[X86][AVX512] Add 512-bit vector ctpop costs + tests llvm-svn: 303342	2017-05-18 10:42:34 +00:00
Lama Saba	2ea271b54a	[X86] Replace slow LEA instructions in X86 According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303333	2017-05-18 08:11:50 +00:00
Davide Italiano	9ae69a75ec	[Target/X86] Remove unneeded return. NFCI. llvm-svn: 303323	2017-05-18 02:36:42 +00:00
Matt Arsenault	2b1f9aa577	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308	2017-05-17 21:56:25 +00:00
Kyle Butt	f6c61ef64d	CodeGen: Power: Add lowering for shifts of v1i128. When legalizing vector operations on vNi128, they will be split to v1i128 because that is a legal type on ppc64, but then the compiler will crash in selection dag because it fails to select for these operations. This patch fixes shift operations. Logical shift right and left shift can be performed in the vector unit, but algebraic shift right requires being split. Differential Revision: https://reviews.llvm.org/D32774 llvm-svn: 303307	2017-05-17 21:54:41 +00:00
Michael Liao	ab12984634	Fix PR33028 - '-verify-mahcineinstrs' starts to complain allocatable live-in physical registers on non-entry or non-landing-pad basic blocks. - Refactor the XBEGIN translation to define EAX on a dedicated fallback code path due to XABORT. Add a pseudo instruction to define EAX explicitly to avoid add physical register live-in. Differential Revision: https://reviews.llvm.org/D33168 llvm-svn: 303306	2017-05-17 21:48:00 +00:00
Matt Arsenault	2525e4e4c2	AMDGPU: Expand frame indexes to be relative to scratch wave offset In order for an arbitrary callee to access an object in a caller's stack frame, the 32-bit offset used as the private pointer needs to be relative to the kernel's scratch wave offset register. Convert to this by finding the difference from the current stack frame and scaling by the wavefront size. llvm-svn: 303303	2017-05-17 21:23:14 +00:00
Matt Arsenault	156d3ae0b6	AMDGPU: Change mubuf soffset register when SP relative Check the MachinePointerInfo for whether the access is supposed to be relative to the stack pointer. No tests because this is used in later commits implementing calls. llvm-svn: 303301	2017-05-17 21:02:58 +00:00
Simon Pilgrim	23ef26728a	[X86][AVX512] Add 512-bit vector ctlz costs + tests llvm-svn: 303300	2017-05-17 21:02:18 +00:00
Matt Arsenault	98f2946ab3	AMDGPU: Make better use of op_sel with high components Handle more general swizzles. llvm-svn: 303296	2017-05-17 20:30:58 +00:00
Simon Pilgrim	d0365967c4	[X86][AVX512] Add 512-bit vector cttz costs + tests llvm-svn: 303293	2017-05-17 20:22:54 +00:00
Dehao Chen	02828a93e8	Only enable LiveRangeShrink for x86. Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for archtectures with great register pressure. Reviewers: MatzeB, qcolombet Reviewed By: qcolombet Subscribers: jholewinski, jyknight, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33294 llvm-svn: 303292	2017-05-17 20:18:13 +00:00
Matt Arsenault	786eeea23e	AMDGPU: Try to use op_sel when selecting packed instructions Avoids instructions to pack a vector when the source is really a scalar being broadcast. Also be smarter and look for per-component fneg. Doesn't yet handle scalar from upper half of register or other swizzles. llvm-svn: 303291	2017-05-17 20:00:00 +00:00
Jacob Gravelle	c63fb00f13	[WebAssembly][NFC] Update expected testsuite failures for newly passing tests Summary: r303050 fixes crashes when calling scalarizeMaskedMemIntrin pass from WebAssembly backend. This updates expected test failures for that. Reviewers: sbc100 Subscribers: jfb, llvm-commits, dschuff Differential Revision: https://reviews.llvm.org/D33295 llvm-svn: 303288	2017-05-17 19:45:22 +00:00
Matt Arsenault	ea8a4ed588	AMDGPU: Use appropriate soffset for spilling This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same. llvm-svn: 303287	2017-05-17 19:37:57 +00:00
Matt Arsenault	ee324ffc1f	AMDGPU: Fix min3/max3 combines for f16/i16 Fix missing instruction definitions for min3/max3. llvm-svn: 303284	2017-05-17 19:25:06 +00:00
Simon Pilgrim	a9a92a1a6a	[X86][AVX512] Add 512-bit vector bitreverse costs + tests llvm-svn: 303283	2017-05-17 19:20:20 +00:00
Krzysztof Parzyszek	2b0533126e	[PPC] Properly update register save area offsets The variables MinGPR/MinG8R were not updated properly when resetting the offsets, which in the included testcase lead to saving the CR register in the same location as R30. This fixes another issue reported in PR26519. Differential Revision: https://reviews.llvm.org/D33017 llvm-svn: 303257	2017-05-17 13:25:09 +00:00
Igor Breger	28f290fab8	[GlobalISel][X86] Support add i64 in IA32. Summary: support G_UADDE instruction selection. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D33096 llvm-svn: 303255	2017-05-17 12:48:08 +00:00
Jonas Paulsson	8722ade770	[SystemZ] Modelling of costs of divisions with a constant power of 2. Such divisions will eventually be implemented with shifts which should be reflected in the cost function. Review: Ulrich Weigand llvm-svn: 303254	2017-05-17 12:46:26 +00:00
Diana Picus	eafa4aa910	Reland r303247: [ARM] GlobalISel: Remove dead instruction selection code It only failed on llvm-clang-x86_64-expensive-checks-win, probably because the TableGen stuff hasn't been regenerated. Requires a clean build. llvm-svn: 303252	2017-05-17 12:42:52 +00:00
Diana Picus	36e4ba0f6e	Revert "[ARM] GlobalISel: Remove dead instruction selection code" This reverts commit r303247 because the tests are failing on some bots. Sorry! llvm-svn: 303249	2017-05-17 11:56:07 +00:00
Diana Picus	68d21c864e	[ARM] GlobalISel: Remove dead instruction selection code We can now generate code for selecting G_ADD, G_SUB and G_MUL. Remove the hand-written versions. llvm-svn: 303247	2017-05-17 11:39:26 +00:00
Daniel Cederman	4af795b499	[Sparc] Remove execute permissions from non-executable text files Reviewers: jyknight, lero_chris, venkatra Reviewed By: jyknight Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27127 llvm-svn: 303245	2017-05-17 11:05:20 +00:00
Francis Visoiu Mistrih	b52e036600	BitVector: add iterators for set bits Differential revision: https://reviews.llvm.org/D32060 llvm-svn: 303227	2017-05-17 01:07:53 +00:00
Amara Emerson	c9916d7e97	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Tim Shen	3bef27cc6f	[PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync. Summary: This fixes pr32392. The lowering pipeline is: llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in expandPostRAPseudo. The reason why expandPostRAPseudo is chosen is because previous passes are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4 (some branch pass(s)). Differential Revision: https://reviews.llvm.org/D32763 llvm-svn: 303205	2017-05-16 20:18:06 +00:00
Reid Kleckner	0ad69fc89f	Revert "[X86] Replace slow LEA instructions in X86" This reverts commit r303183, it broke various buildbots and introduced sanitizer errors. llvm-svn: 303199	2017-05-16 19:55:03 +00:00
Renato Golin	d69570e017	Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove" Revert "[ARM] Mark LEApcrel as not having side effects" This reverts commit r303054 and r303053, as they broke the ARM self-hosting buildbots: http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845 Offline investigation on course. llvm-svn: 303193	2017-05-16 17:59:07 +00:00
Stanislav Mekhanoshin	acca0f5c02	[AMDGPU] Use GCNRPTracker dumper methods in scheduler Differential Revision: https://reviews.llvm.org/D33244 llvm-svn: 303186	2017-05-16 16:31:45 +00:00
Stanislav Mekhanoshin	b10860788f	[AMDGPU] Cache live-ins and register pressure in scheduler Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things: 1. Caches the info for the second stage when we schedule with decreased target occupancy. 2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block. The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver. There is no functional change to the current behavior, only compilation speed is affected. In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region. That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance. At the moment the worst known case compilation time has improved from 26 minutes to 8.5. Differential Revision: https://reviews.llvm.org/D33117 llvm-svn: 303184	2017-05-16 16:11:26 +00:00
Lama Saba	52e892577d	[X86] Replace slow LEA instructions in X86 According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303183	2017-05-16 16:01:36 +00:00
Stanislav Mekhanoshin	464cecf81e	[AMDGPU] Turn register pressure estimation into forward tracker This factors register pressure estimation mechanism from the GCNSchedStrategy into the forward tracker to unify interface with other strategies and expose it to other interested phases. Differential Revision: https://reviews.llvm.org/D33105 llvm-svn: 303179	2017-05-16 15:43:52 +00:00
Chad Rosier	8b12a03215	Fix an improperly placed curly bracket. NFC. llvm-svn: 303165	2017-05-16 12:43:23 +00:00
NAKAMURA Takumi	994a43d27a	AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable] llvm-svn: 303137	2017-05-16 04:01:23 +00:00
Peter Collingbourne	6f0ecca3b5	IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape(). This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name. Differential Revision: https://reviews.llvm.org/D33162 llvm-svn: 303134	2017-05-16 00:39:01 +00:00
Davide Italiano	60d36c7506	[AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI. llvm-svn: 303122	2017-05-15 22:10:15 +00:00
Tim Northover	203c6f055d	AArch64: use linker-private symbols for globals in MachO. We don't use section-relative relocations on AArch64, so all symbols must be at least visible to the linker (i.e. properly global or l_whatever, but not L_whatever). llvm-svn: 303118	2017-05-15 21:51:38 +00:00
Adam Nemet	e29686e5c1	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Hans Wennborg	bd6e9e77a7	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Jan Sjodin	a06bfe054e	Re-submit AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111	2017-05-15 20:18:37 +00:00
Tim Northover	8b96c7e9b5	AArch64: diagnose unrecognized features in .cpu directive. We were silently ignoring any features we couldn't match up, which led to errors in an inline asm block missing the conventional "\n\t". llvm-svn: 303108	2017-05-15 19:42:15 +00:00
Geoff Berry	e369653bf3	[AArch64][Falkor] Fix sched details for FMOV llvm-svn: 303099	2017-05-15 18:50:22 +00:00
Jan Sjodin	0e289822fa	Revert 303091. llvm-svn: 303098	2017-05-15 18:39:47 +00:00
Jan Sjodin	e9d2ddc9dd	Add AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091	2017-05-15 18:13:56 +00:00
Simon Pilgrim	55ff57861a	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082	2017-05-15 17:17:44 +00:00
Florian Hahn	af91e7e6d2	[AArch64] Enable FeatureFuseAES on Cortex-A72. This patch enables fusing dependent AESE/AESMC and AESD/AESIMC instruction pairs on Cortex-A72, as recommended in the Software Optimization Guide, section 4.10. llvm-svn: 303073	2017-05-15 15:15:22 +00:00
Dmitry Preobrazhensky	167f8b69e3	[AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64 See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33123 llvm-svn: 303070	2017-05-15 14:28:23 +00:00
Dmitry Preobrazhensky	03852a9dca	[AMDGPU][MC] Removed V_MQSAD_U16_U8 This instruction does not really exist See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018 Reviewers: vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D33126 llvm-svn: 303055	2017-05-15 12:37:03 +00:00
John Brawn	9486becf09	[ARM] Mark LEApcrel instructions as isAsCheapAsAMove Doing this means that if an LEApcrel is used in two places we will rematerialize instead of generating two MOVs. This is particularly useful for printfs using the same format string, where we want to generate an address into a register that's going to get corrupted by the call. Differential Revision: https://reviews.llvm.org/D32858 llvm-svn: 303054	2017-05-15 11:57:54 +00:00
John Brawn	43132c46a6	[ARM] Mark LEApcrel as not having side effects Doing this lets us hoist it out of loops, and I've also marked it as rematerializable the same as the thumb1 and thumb2 counterparts. It looks like it being marked as such was just a mistake, as the commit that made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the LEApcrelJT instructions were marked as having side-effects, so it looks like the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was accidentally marked as such also. Differential Revision: https://reviews.llvm.org/D32857 llvm-svn: 303053	2017-05-15 11:50:21 +00:00
Simon Pilgrim	f8389656e3	[NVPTX] Don't rely on default arguments to SelectionDAG::getMemIntrinsicNode. NFC. NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues. llvm-svn: 303047	2017-05-15 10:47:48 +00:00
Zvi Rackover	e6b278bc65	[X86] Utilize SelectionDAG::getSelect(). NFC. Replace SelectionDAG::getNode(ISD::SELECT, ...) and SelectionDAG::getNode(ISD::VSELECT, ...) with SelectionDAG::getSelect(...) Saves a few lines of code and in some cases saves the need to explicitly check the type of the desired node. llvm-svn: 303024	2017-05-14 21:30:38 +00:00
Simon Pilgrim	d0ef9d8e93	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts llvm-svn: 303023	2017-05-14 20:52:11 +00:00
Simon Pilgrim	f96b4ab92d	[X86][AVX2] Fix costs for v4i64 ashr by splat llvm-svn: 303022	2017-05-14 20:25:42 +00:00
Simon Pilgrim	de4467b182	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat llvm-svn: 303021	2017-05-14 20:02:34 +00:00
Craig Topper	ceea1a76a1	[X86] Remove unused value from IntrinsicType enum. NFC llvm-svn: 303018	2017-05-14 19:38:06 +00:00
Simon Pilgrim	d3f0d03cc5	[X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences llvm-svn: 303017	2017-05-14 18:52:15 +00:00
Simon Pilgrim	5bef9c627e	[X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask. Tweak cost model to match what lowering actually does. llvm-svn: 303013	2017-05-14 17:59:46 +00:00
Simon Pilgrim	aa8dffb69b	[X86][SSE] Account for cost of extract/insert of v32i8 vector shifts llvm-svn: 303012	2017-05-14 17:36:07 +00:00
Simon Pilgrim	4599eaa09a	[X86][XOP] Account for cost of extract/insert of 256-bit vector shifts llvm-svn: 303010	2017-05-14 13:38:53 +00:00
Simon Pilgrim	f3ee9c6997	[X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts. llvm-svn: 303009	2017-05-14 11:46:26 +00:00
Simon Pilgrim	ef46c2762a	[x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization) Further perf tests on Jaguar indicate that: vxorps %ymm0, %ymm0, %ymm0 vcmpps $15, %ymm0, %ymm0, %ymm0 is consistently faster (by about 9%) than: vpcmpeqd %xmm0, %xmm0, %xmm0 vinsertf128 $1, %xmm0, %ymm0, %ymm0 Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well. Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D32416 llvm-svn: 302989	2017-05-13 13:42:35 +00:00
Dylan McKay	0c4debc123	[AVR] When lowering Select8/Select16, put newly generated MBBs in the same spot Contributed by Dr. Gergő Érdi. Fixes a bug. Raised from (https://github.com/avr-rust/rust/issues/49). llvm-svn: 302973	2017-05-13 00:22:34 +00:00
Dylan McKay	0c707da6ac	[AVR] Remove an unused variable llvm-svn: 302970	2017-05-13 00:00:26 +00:00
Changpeng Fang	161e8c39af	AMDGPU/SI: Don't promote to vector if the load/store is volatile. Summary: We should not change volatile loads/stores in promoting alloca to vector. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D33107 llvm-svn: 302943	2017-05-12 20:31:12 +00:00
Simon Pilgrim	a1978aaefd	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146) This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33147 llvm-svn: 302942	2017-05-12 19:56:43 +00:00
Tim Shen	10c64e6aea	[PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC. Summary: Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc., because those are not defined for b > sizeof(a) * 8, even after some of the combiners run. However, PPCISD::SHL defines that behavior (as the instructions themselves). Move the combination to the backend. The tests in shift_mask.ll still pass. Reviewers: echristo, hfinkel, efriedma, iteratee Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D33076 llvm-svn: 302937	2017-05-12 19:25:37 +00:00
Geoff Berry	ddbbf6416c	[AArch64][Falkor] Refine modeling of multiply accumulate forwarding. llvm-svn: 302933	2017-05-12 18:57:10 +00:00
Simon Pilgrim	b146e61828	Strip trailing whitespace. NFCI. llvm-svn: 302927	2017-05-12 17:42:36 +00:00
Craig Topper	8df66c602a	[KnownBits] Add bit counting methods to KnownBits struct and use them where possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925	2017-05-12 17:20:30 +00:00
Tom Stellard	a0d67c748a	AMDGPU/GlobalISel: Mark 32-bit integer constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33115 llvm-svn: 302919	2017-05-12 16:46:46 +00:00
James Y Knight	d4e1b00e7c	[SPARC] Support 'f' and 'e' inline asm constraints. Based on patch by Patrick Boettcher and Chris Dewhurst. Differential Revision: https://reviews.llvm.org/D29116 llvm-svn: 302911	2017-05-12 15:59:10 +00:00
Simon Pilgrim	7f03231cc6	Use SDValue::getOperand() helper. NFCI. llvm-svn: 302894	2017-05-12 13:08:45 +00:00
Leslie Zhai	a1149e01d2	[AVR] Migrate to new StructType::get owing to Supress all uses of LLVM_END_WITH_NULL Reviewers: dylanmckay, jroelofs, RKSimon, serge-sans-paille Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D33119 llvm-svn: 302885	2017-05-12 09:08:03 +00:00
Reid Kleckner	43bbeb4c9f	Issue diagnostics when returning FP values on x86_64 without SSE1/2 Avoid using report_fatal_error, because it will ask the user to file a bug. If the user attempts to disable SSE on x86_64 and them use floating point, that's a bug in their code, not a bug in the compiler. This is just a start. There are other ways to crash the backend in this configuration, but they should be updated to follow this pattern. Differential Revision: https://reviews.llvm.org/D27522 llvm-svn: 302835	2017-05-11 22:43:02 +00:00
Guozhi Wei	22e7da9597	[PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0 According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0. This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified. Differential Revision: https://reviews.llvm.org/D32880 llvm-svn: 302834	2017-05-11 22:17:35 +00:00
Chad Rosier	aeffffdb44	[AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD. Differential Revision: http://reviews.llvm.org/D33101. llvm-svn: 302822	2017-05-11 20:07:24 +00:00
Davide Italiano	0dcc015a81	[AMDGPU] Placate unused variable warning in release builds. llvm-svn: 302821	2017-05-11 19:58:52 +00:00
Vadzim Dambrouski	38e30197c3	[MSP430] Generate EABI-compliant libcalls Updates the MSP430 target to generate EABI-compatible libcall names. As a byproduct, adjusts the hardware multiplier options available in the MSP430 target, adds support for promotion of the ISD::MUL operation for 8-bit integers, and correctly marks R11 as used by call instructions. Patch by Andrew Wygle. Differential Revision: https://reviews.llvm.org/D32676 llvm-svn: 302820	2017-05-11 19:56:14 +00:00
Matt Arsenault	47ccafe787	AMDGPU: Remove tfe bit from flat instruction definitions We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814	2017-05-11 17:38:33 +00:00
Matt Arsenault	bf5482e4bb	AMDGPU: Pull fneg out of extract_vector_elt This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813	2017-05-11 17:26:25 +00:00
Stanislav Mekhanoshin	33a97ec4ed	[AMDGPU] Fix incorrect register pressure calculation Earlier fix D32572 introduced a bug where live-ins were calculated for basic block instead of scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812	2017-05-11 17:16:55 +00:00
Nemanja Ivanovic	96c3d626a2	[PowerPC] Eliminate integer compare instructions - vol. 1 This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR. It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended. Differential Revision: https://reviews.llvm.org/D31847 llvm-svn: 302810	2017-05-11 16:54:23 +00:00
Igor Breger	a44fc83d9f	[GlobalISel][X86] Remove hand-written G_FADD/F_SUB selection. Now it handle by TableGen. llvm-svn: 302793	2017-05-11 12:15:03 +00:00
Chandler Carruth	97500a9918	[x86] Fix a failure to select with AVX-512 when the type legalizer manages to form a VSELECT with a non-i1 element type condition. Those are technically allowed in SDAG (at least, the generic type legalization logic will form them and I wouldn't want to try to audit everything te preclude forming them) so we need to be able to lower them. This isn't too hard to implement. We mark VSELECT as custom so we get a chance in C++, add a fast path for i1 conditions to get directly handled by the patterns, and a fallback when we need to manually force the condition to be an i1 that uses the vptestm instruction to turn a non-mask into a mask. This, unsurprisingly, generates awful code. But it at least doesn't crash. This was actually impacting open source packages built with LLVM for AVX-512 in the wild, so quickly landing a patch that at least stops the immediate bleeding. I think I've found where to fix the codegen quality issue, but less confident of that change so separating it out from the thing that doesn't change the result of any existing test case but causes mine to not crash. llvm-svn: 302785	2017-05-11 10:52:16 +00:00
Simon Pilgrim	a4a13a0da0	Strip trailing whitespace. NFCI. llvm-svn: 302784	2017-05-11 10:03:05 +00:00
Diana Picus	9cfbc6d94f	[ARM][GlobalISel] Legalize narrow scalar ops by widening This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB and MUL to 32 bits since we only have TableGen patterns for 32 bits. See the commit message for r292827 for more details. At this point we could just remove some of the tests for regbankselect and instruction-select, since we're not going to see any narrow operations at those levels anymore. Instead I decided to update them with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences generated by the legalizer. llvm-svn: 302782	2017-05-11 09:45:57 +00:00
Serge Guelton	f4dc59ba8e	Remove spurious cast of nullptr. NFC. Conversion rules allow automatic casting of nullptr to any pointer type. llvm-svn: 302780	2017-05-11 08:53:00 +00:00
Serge Guelton	1b421c259f	Remove now useless trailing nullptr in StructType::get llvm-svn: 302779	2017-05-11 08:46:02 +00:00
Diana Picus	657bfd3302	[ARM][GlobalISel] Support for G_ANYEXT G_ANYEXT can be introduced by the legalizer when widening scalars. Add support for it in the register bank info (same mapping as everything else) and in the instruction selector. When selecting it, we treat it as a COPY, just like G_TRUNC. On this occasion we get rid of some assertions in selectCopy so we can reuse it. This shouldn't be a problem at the moment since we're not supporting any complicated cases (e.g. FPR, different register banks). We might want to separate the paths when we do. llvm-svn: 302778	2017-05-11 08:28:31 +00:00
Igor Breger	c7b5977bb1	[GlobalISel][X86] G_ICMP support. Summary: support G_ICMP for scalar types i8/i16/i64. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski Differential Revision: https://reviews.llvm.org/D32995 llvm-svn: 302774	2017-05-11 07:17:40 +00:00
Igor Breger	db75455990	[X86] Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp. NFC Summary: Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by GloabalIsel instruction selector. This is a pre-commit for a patch I'm working on to support G_ICMP. NFC. Reviewers: zvi, guyblank, delena Reviewed By: guyblank, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33038 llvm-svn: 302767	2017-05-11 06:36:37 +00:00
Matt Arsenault	3c5e4237c6	AMDGPU: Make some packed shuffles free VOP3P instructions can encode access to either half of the register. llvm-svn: 302730	2017-05-10 21:29:33 +00:00
Matt Arsenault	acdc7659cc	AMDGPU: Add new subtarget features for gfx9 flat instructions Flat instructions gain an immediate offset, and 2 new sets of segment specific flat instructions are added. llvm-svn: 302729	2017-05-10 21:19:05 +00:00
Quentin Colombet	307e29124c	[AArch64][RegisterBankInfo] Change the default mapping of fp stores. For stores, check if the stored value is defined by a floating point instruction and if yes, we return a default mapping with FPR instead of GPR. llvm-svn: 302679	2017-05-10 15:19:41 +00:00
Amara Emerson	816542ceb3	[AArch64] Enable use of reduction intrinsics. The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns are replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678	2017-05-10 15:15:38 +00:00
Ulrich Weigand	93b369ed11	[SystemZ] Add miscellaneous instructions This adds a few missing instructions for the assembler and disassembler. Those should be the last missing general- purpose (Chapter 7) instructions for the z10 ISA. llvm-svn: 302667	2017-05-10 14:20:15 +00:00
Ulrich Weigand	d3604dc72c	[SystemZ] Add missing arithmetic instructions This adds the remaining general arithmetic instructions for assembler / disassembler use. Most of these are not useful for codegen; a few might be, and those are listed in the README.txt for future improvements. llvm-svn: 302665	2017-05-10 14:18:47 +00:00
Jonas Paulsson	11d251c05c	[SystemZ] Implement getRepRegClassFor() This method must return a valid register class, or the list-ilp isel scheduler will crash. For MVT::Untyped nullptr was previously returned, but now ADDR128BitRegClass is returned instead. This is needed just as long as list-ilp (and probably also list-hybrid) is still there. Review: Ulrich Weigand, A Trick https://reviews.llvm.org/D32802 llvm-svn: 302649	2017-05-10 13:03:25 +00:00
Dmitry Preobrazhensky	da61a7f9ef	[AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D32913 llvm-svn: 302648	2017-05-10 13:00:28 +00:00
Ulrich Weigand	c7eb5a95b2	[SystemZ] Add decimal integer instructions This adds the set of decimal integer (BCD) instructions for assembler / disassembler use. llvm-svn: 302646	2017-05-10 12:42:45 +00:00
Ulrich Weigand	33a441adf9	[SystemZ] Add crypto instructions This adds the set of message-security assist instructions for assembler / disassembler use. llvm-svn: 302645	2017-05-10 12:42:00 +00:00
Ulrich Weigand	435cd1a3e4	[SystemZ] Add translate/convert instructions This adds the set of character-set translate and convert instructions for assembler / disassembler use. llvm-svn: 302644	2017-05-10 12:41:12 +00:00
Ulrich Weigand	eb17909536	[SystemZ] Add missing memory/string instructions This adds a number of missing memory and string instructions for assembler / disassembler use. llvm-svn: 302643	2017-05-10 12:40:15 +00:00
Martin Storsjo	605b0466ea	[AArch64] Fix a comment to match the code. NFC. For the ELF case, the default/preferred form is the generic one, not the short one as used for Apple - fix the comment to say so. Currently it is a copy-paste typo. Make the comments on the darwin default a bit more verbose. Use enum names instead of literal 0/1 to further increase readability and reduce fragility. Differential Revision: https://reviews.llvm.org/D32963 llvm-svn: 302634	2017-05-10 10:51:32 +00:00
Amara Emerson	836b0f48c1	Add a late IR expansion pass for the experimental reduction intrinsics. This pass uses a new target hook to decide whether or not to expand a particular intrinsic to the shuffevector sequence. Differential Revision: https://reviews.llvm.org/D32245 llvm-svn: 302631	2017-05-10 09:42:49 +00:00
Igor Breger	fda31e64e0	[GlobalISel][X86] G_ZEXT i1 to i32/i64 support. Summary: Support G_ZEXT i1 to i32/i64 instruction selection. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32965 llvm-svn: 302623	2017-05-10 06:52:58 +00:00
Stanislav Mekhanoshin	7e3794d5c3	[AMDGPU] Fixed typo in GCNRegPressure, NFC VGRP -> VGPR, SGRP -> SGPR llvm-svn: 302586	2017-05-09 20:50:04 +00:00
Matthew Simpson	78fd46b230	[AArch64] Consider widening instructions in cost calculations The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582	2017-05-09 20:18:12 +00:00
Serge Guelton	e38003f839	Suppress all uses of LLVM_END_WITH_NULL. NFC. Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable. Differential Revision: https://reviews.llvm.org/D32541 llvm-svn: 302571	2017-05-09 19:31:13 +00:00
Jacques Pienaar	0dbcc34f6b	[lanai] Add computeKnownBitsForTargetNode for Lanai. Summary: computeKnownBitsForTargetNode was not defined for Lanai which resulted in additional AND's with 0x1 for the output of SETCC instructions. Reviewers: eliben, majnemer Reviewed By: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29605 llvm-svn: 302568	2017-05-09 18:35:26 +00:00
Craig Topper	f893d49f0c	[X86] Add more patterns for BZHI isel This patch adds more patterns that a reasonable person might write that can be compiled to BZHI. This adds support for (~0U >> (32 - b)) & a; and a << (32 - b) >> (32 - b); This was inspired by the code in APInt::clearUnusedBits. This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case. I think this is still missing cases where the subtract portion is an 8-bit operation. Differential Revision: https://reviews.llvm.org/D32616 llvm-svn: 302549	2017-05-09 16:32:11 +00:00
Guy Blank	0c42d8c35b	VX512] Only look at lower bit in constant scalar masks for scalar masked instructions only the lower bit of the mask is relevant. so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit. This patch handles cases where the lower bit is '1'. Differential Revision: https://reviews.llvm.org/D32805 llvm-svn: 302546	2017-05-09 16:16:48 +00:00
Tim Shen	04de70d3a7	[Atomic] Remove IsStore/IsLoad in the interface, and pass the instruction instead. NFC. Now both emitLeadingFence and emitTrailingFence take the instruction itself, instead of taking IsLoad/IsStore pairs. Instruction::mayReadFromMemory and Instrucion::mayWriteToMemory are used for determining those two booleans. The instruction argument is also useful for later D32763, in emitTrailingFence. For emitLeadingFence, it seems to have cleaner interface with the proposed change. Differential Revision: https://reviews.llvm.org/D32762 llvm-svn: 302539	2017-05-09 15:27:17 +00:00
Aaron Ballman	3234647df6	Amend r302535; ifndef and ifdef are different, as it turns out. llvm-svn: 302537	2017-05-09 15:12:03 +00:00
Aaron Ballman	06297e839a	ARMRegisterBankInfo.h requires LLVM_BUILD_GLOBAL_ISEL to be defined. If it is not defined, then ARMGenRegisterBank.inc is not table generated and the inclusion of this header causes the build to fail. llvm-svn: 302535	2017-05-09 14:59:48 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Simon Dardis	659c43f11a	Revert "[MIPS] Add support to match more patterns for DINS instruction" This reverts commit rL302512. This broke the mips buildbots. llvm-svn: 302526	2017-05-09 13:18:48 +00:00
Simon Pilgrim	ca3a63a849	[X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X) Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD. Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal. Differential Revision: https://reviews.llvm.org/D32973 llvm-svn: 302525	2017-05-09 13:14:40 +00:00
Nikolai Bozhenov	b7bf386e80	[X86] Clang option -fuse-init-array has no effect when generating for MCU target Reviewers: Eugene.Zelenko, dschuff, craig.topper Reviewed By: craig.topper Subscribers: ahatanak, aaboud, DavidKreitzer, llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D32543 Patch by AndreiGrischenko <andrei.l.grischenko@intel.com> llvm-svn: 302513	2017-05-09 10:14:03 +00:00
Strahinja Petrovic	27ae4c3259	[MIPS] Add support to match more patterns for DINS instruction This patch adds support for recognizing patterns to match DINS instruction. Differential Revision: https://reviews.llvm.org/D31465 llvm-svn: 302512	2017-05-09 10:02:00 +00:00
Diana Picus	95640a1c4d	[ARM GlobalISel] Remove hand-written G_FADD selection Remove the code selecting G_FADD - now that TableGen can handle more opcodes, it's not needed anymore. llvm-svn: 302511	2017-05-09 08:32:42 +00:00
Tim Northover	c48c993b75	ARM: use divmod libcalls on embedded MachO platforms too. The separated libcalls are implemented in terms of __divmodsi4 and __udivmodsi4 anyway, so we should always use them if possible. llvm-svn: 302462	2017-05-08 20:00:14 +00:00
Quentin Colombet	55a72b3b05	[AArch64][RegisterBankInfo] Change the default mapping of fp loads. This fixes PR32550, in a way that does not imply running the greedy mode at O0. The fix consists in checking if a load is used by any floating point instruction and if yes, we return a default mapping with FPR instead of GPR. llvm-svn: 302453	2017-05-08 18:16:31 +00:00
Quentin Colombet	0e41a41b87	[AArch64][RegisterBankInfo] Fix mapping cost for GPR. In r292478, we changed the order of the enum that is referenced by PMI_FirstXXX. This had the side effect of changing the cost of the mapping of all the loads, instead of just the FPRs ones. Reinstate the higher cost for all but GPR loads. Note: This did not have any external visible effects: - For Fast mode, the cost would have been higher, but we don't care because we don't try to use alternative mappings. - For Greedy mode, the higher cost of the GPR loads, would have triggered the use of the supposedly alternative mapping, that would be in fact the same GPR mapping but with a lower cost. llvm-svn: 302452	2017-05-08 18:16:23 +00:00
Craig Topper	8297e52285	[ARM] Use a Changed flag to avoid making a pass's return value dependent on a compare with a Statistic object. Statistic compile to always be 0 in release build so this compare would always return false. And in the debug builds Statistic are global variables and remember their values across pass runs. So this compare returns true anytime the pass runs after the first time it modifies something. This was found after reviewing all usages of comparison operators on a Statistic object. We had some internal code that did a compare with a statistic that caused a mismatch in output between debug and release builds. So we did an audit out of paranoia. llvm-svn: 302450	2017-05-08 18:02:51 +00:00
Simon Pilgrim	df39b03f29	[X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks. Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead. This is a first step in several things we can do to improve PBLENDV support: * Better matching of X86ISD::ANDNP patterns. * Handle floating point cases. * Better vector and bitcast support in computeNumSignBits. * Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.). Differential Revision: https://reviews.llvm.org/D32953 llvm-svn: 302424	2017-05-08 14:16:39 +00:00
Simon Pilgrim	f5ca255d18	[ARM][NEON] Add support for ISD::ABS lowering Update NEON int_arm_neon_vabs intrinsic to use the ISD::ABS opcode directly Added constant folding tests. Differential Revision: https://reviews.llvm.org/D32938 llvm-svn: 302417	2017-05-08 10:37:34 +00:00
Martin Storsjo	fd4c158a84	[ARM] Clear the constant pool cache on explicit .ltorg directives Multiple ldr pseudoinstructions with the same constant value will reuse the same constant pool entry. However, if the constant pool is explicitly flushed with a .ltorg directive, we should not try to reference constants in the previous pool any longer, since they may be out of range. This fixes assembling hand-written assembler source which repeatedly loads the same constant value, across a binary size larger than the pc-relative fixup range for ldr instructions (4096 bytes). Such assembler source already uses explicit .ltorg instructions to emit constant pools with regular intervals. However if we try to reuse constants emitted in earlier pools, they end up out of range. This makes the output of the testcase match what binutils gas does (prior to this patch, it would fail to assemble). Differential Revision: https://reviews.llvm.org/D32847 llvm-svn: 302416	2017-05-08 10:26:24 +00:00
Simon Pilgrim	7a28a3ac78	[AARCH64][NEON] Add support for ISD::ABS lowering Update int_aarch64_neon_abs intrinsic to use the ISD::ABS opcode directly Differential Revision: https://reviews.llvm.org/D32940 llvm-svn: 302415	2017-05-08 10:25:18 +00:00
Igor Breger	810c6257f1	[GlobalISel][X86] G_GEP selection support. Summary: [GlobalISel][X86] G_GEP selection support. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32396 llvm-svn: 302412	2017-05-08 09:40:43 +00:00
Igor Breger	605b965ae5	[GlobalISel][X86] G_MUL legalizer/selector support. Summary: G_MUL legalizer/selector/regbank support. Use only Tablegen-erated instruction selection. This patch dealing with legal operations only. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: krytarowski, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32698 llvm-svn: 302410	2017-05-08 09:03:37 +00:00
Dean Michael Berris	9bcaed867a	[XRay] Custom event logging intrinsic This patch introduces an LLVM intrinsic and a target opcode for custom event logging in XRay. Initially, its use case will be to allow users of XRay to log some type of string ("poor man's printf"). The target opcode compiles to a noop sled large enough to enable calling through to a runtime-determined relative function call. At runtime, when X-Ray is enabled, the sled is replaced by compiler-rt with a trampoline to the logic for creating the custom log entries. Future patches will implement the compiler-rt parts and clang-side support for emitting the IR corresponding to this intrinsic. Reviewers: timshen, dberris Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D27503 llvm-svn: 302405	2017-05-08 05:45:21 +00:00
Simon Pilgrim	2d1c6d6e8d	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics. Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378	2017-05-07 20:58:55 +00:00
Simon Pilgrim	33f7397cc0	[X86][AVX512] Relax assertion and just exit combine for unsupported types (PR32907) llvm-svn: 302361	2017-05-06 20:53:52 +00:00
Simon Pilgrim	fea153f341	[X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegen Extend NoVLX targets to use the 512-bit versions llvm-svn: 302359	2017-05-06 19:11:59 +00:00
Simon Pilgrim	f15a2f4d94	[X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI. llvm-svn: 302357	2017-05-06 18:17:56 +00:00
Simon Pilgrim	98f1d02677	[NVPTX] Add support for ISD::ABS lowering Use the ISD::ABS opcode directly Differential Revision: https://reviews.llvm.org/D32944 llvm-svn: 302356	2017-05-06 17:42:09 +00:00
Simon Pilgrim	781cb10104	[X86][SSE] Break register dependencies on v16i8/v8i16 BUILD_VECTOR on SSE41 rL294581 broke unnecessary register dependencies on partial v16i8/v8i16 BUILD_VECTORs, but on SSE41 we (currently) use insertion for full BUILD_VECTORs as well. By allowing full insertion to occur on SSE41 targets we can break register dependencies here as well. llvm-svn: 302355	2017-05-06 17:30:39 +00:00
Quentin Colombet	245994d968	[RegisterBankInfo] Uniquely allocate instruction mapping. This is a step toward having statically allocated instruciton mapping. We are going to tablegen them eventually, so let us reflect that in the API. NFC. llvm-svn: 302316	2017-05-05 22:48:22 +00:00
Krzysztof Parzyszek	ee93e009c8	[Hexagon] Disable predicated calls by default llvm-svn: 302307	2017-05-05 22:13:57 +00:00
Krzysztof Parzyszek	e260332838	[Hexagon] Remove C6 and C7 as separate registers These are M0 and M1. Removing duplicated registers reduces the number of explicit register aliasing. llvm-svn: 302306	2017-05-05 22:12:12 +00:00
Krzysztof Parzyszek	d0c71ef8ab	[RDF] Remove covered parts of reached uses for phi and use in same block llvm-svn: 302305	2017-05-05 22:10:32 +00:00
Matthias Braun	4682ac6c83	ARM: Compute MaxCallFrame size early This exposes a method in MachineFrameInfo that calculates MaxCallFrameSize and calls it after instruction selection in the ARM target. This avoids ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame() giving different answers in early/late phases of codegen. The testcase shows a particular nasty example result of that where we would fail to properly align an alloca. Differential Revision: https://reviews.llvm.org/D32622 llvm-svn: 302303	2017-05-05 22:04:05 +00:00
Kannan Narayanan	5e73b04b84	[AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header. Differential Revision: https://reviews.llvm.org/D32831 llvm-svn: 302290	2017-05-05 21:10:17 +00:00
Simon Pilgrim	430a335b7b	[X86] Use SDValue::getConstantOperandVal helper. NFCI. llvm-svn: 302286	2017-05-05 20:53:52 +00:00
Konstantin Zhuravlyov	6ccb076aeb	AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0 This field is populated by the CP Differential Revision: https://reviews.llvm.org/D32619 llvm-svn: 302277	2017-05-05 20:13:55 +00:00
Alexei Starovoitov	7bab73b1f8	[bpf] fix a bug which causes incorrect big endian reloc fixup o Add bpfeb support in BPF dwarfdump unit test case Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@fb.com> llvm-svn: 302265	2017-05-05 18:05:00 +00:00
Craig Topper	f0aeee01c3	[KnownBits] Add wrapper methods for setting and clear all bits in the underlying APInts in KnownBits. This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown. Differential Revision: https://reviews.llvm.org/D32637 llvm-svn: 302262	2017-05-05 17:36:09 +00:00

... 3 4 5 6 7 ...

42815 Commits