llvm-project

Commit Graph

Author	SHA1	Message	Date
Vasileios Kalintiris	3955b75ba9	[mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel. Summary: Instead of instantiating the MipsFastISel class and checking if the target is supported in the overridden methods, we should perform that check before creating the class. This allows us to enable FastISel only for targets that truly support it, i.e. MIPS32 to MIPS32R5. Reviewers: sdardis Subscribers: ehostunreach, llvm-commits Differential Revision: https://reviews.llvm.org/D24824 llvm-svn: 284475	2016-10-18 13:05:42 +00:00
George Rimar	bcfcb9e60f	[llvm-readobj] - Teach readobj to print PT_OPENBSD_RANDOMIZE/PT_OPENBSD_WXNEEDED headers. These are OpenBSD specific program headers and currently we support them in LLD. Description of headers (just in case) available here: http://man.openbsd.org/OpenBSD-current/man5/elf.5 OpenBSD commits were: For PT_OPENBSD_RANDOMIZE: `c494713c45` For PT_OPENBSD_WXNEEDED: `2a5a8fc7e3` Differential revision: https://reviews.llvm.org/D25616 llvm-svn: 284471	2016-10-18 10:54:56 +00:00
John Brawn	ecf79300dd	[SCEV] More accurate calculation of max backedge count of some less-than loops In loops that look something like i = n; do { ... } while(i++ < n+k); where k is a constant, the maximum backedge count is k (in fact the backedge count will be either 0 or k, depending on whether n+k wraps). More generally for LHS < RHS if RHS-(LHS of first comparison) is a constant then the loop will iterate either 0 or that constant number of times. This allows for more loop unrolling with the recent upper bound loop unrolling changes, and I'm working on a patch that will let loop unrolling additionally make use of the loop being executed either 0 or k times (we need to retain the loop comparison only on the first unrolled iteration). Differential Revision: https://reviews.llvm.org/D25607 llvm-svn: 284465	2016-10-18 10:10:53 +00:00
Simon Pilgrim	33f138b566	[X86][SSE] Added extra (mul x, (1 << c)) -> x << c style vector tests vXi64 will benefit more from lowering to shifts than multiplies llvm-svn: 284461	2016-10-18 09:29:13 +00:00
Javed Absar	e7c338081a	[ARM] Assign cost of scaling for Cortex-R52 This patch assigns cost of the scaling used in addressing for Cortex-R52. On Cortex-R52 a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. Differential Revision: http://reviews.llvm.org/D25670 Reviewer: jmolloy llvm-svn: 284460	2016-10-18 09:08:54 +00:00
Simon Pilgrim	4ddc92b6cd	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459	2016-10-18 07:42:15 +00:00
Dean Michael Berris	156f6cafc2	[XRay] Support for tail calls for ARM no-Thumb This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too early, before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456	2016-10-18 05:54:15 +00:00
Craig Topper	72b9f9864f	[AVX-512] Add test case to check shuffle decoding for masked vpermilps for r284450. This is harder to do for vpermilpd as shuffle combining turns the constant vector into an immediate since all vpermilpd's inputs with constant vector can also be encoded with the immediate form. llvm-svn: 284455	2016-10-18 05:44:04 +00:00
Craig Topper	448358b5f1	[X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself. This is especially important for 32-bit targets with 64-bit shuffle elements. llvm-svn: 284453	2016-10-18 04:48:33 +00:00
Craig Topper	7268bf99ab	[AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself. Summary: This is especially important for 32-bit targets with 64-bit shuffle elements. This is similar to how PSHUFB and VPERMIL handle the same problem. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25666 llvm-svn: 284451	2016-10-18 04:00:32 +00:00
Konstantin Zhuravlyov	98a3ac7106	[AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it Differential Revision: https://reviews.llvm.org/D25694 llvm-svn: 284435	2016-10-17 22:40:15 +00:00
Kevin Enderby	2490de06f7	Next set of additional error checks for invalid Mach-O files for the load commands that use the MachO::sub_framework_command, MachO::sub_umbrella_command, MachO::sub_library_command and MachO::sub_client_command types but are not used in llvm libObject code but used in llvm tool code. This includes the LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA, LC_SUB_LIBRARY and LC_SUB_CLIENT load commands. llvm-svn: 284431	2016-10-17 22:09:25 +00:00
Sanjay Patel	8716b3cbe0	remove FIXME comment (fixed with r284424); NFC llvm-svn: 284427	2016-10-17 21:08:39 +00:00
Sanjay Patel	523cd8290a	[DAG] use isConstOrConstSplat in ComputeNumSignBits to optimize SRA The scalar version of this pattern was noted in: https://reviews.llvm.org/D25485 and fixed with: https://reviews.llvm.org/rL284395 More refactoring of the constant/splat helpers is needed and will happen in follow-up patches. Differential Revision: https://reviews.llvm.org/D25685 llvm-svn: 284424	2016-10-17 20:41:39 +00:00
Davide Italiano	84bd58e915	[opt] Strip coverage if debug info is not present. If -coverage is passed, but -g is not, clang populates the PassManager pipeline with StripSymbols(debugOnly = true). The stripSymbol pass therefore scans the list of named metadata, drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD) around. The verifier runs, and finds out that there's a CU not listed in !llvm.dbg.cu (as it was previously dropped) -> crash. So, when we strip debug info, check whether there is coverage data and strip it as well, to avoid leaving dangling metadata around. Differential Revision: https://reviews.llvm.org/D25689 llvm-svn: 284418	2016-10-17 20:05:35 +00:00
Dehao Chen	018a3afa99	Ignore debug info when making optimization decisions in SimplifyCFG. Summary: Debug info should not affect code generation. This patch properly handles debug info to make sure the generated code is the same with or without debug info. Reviewers: davidxl, mzolotukhin, jmolloy Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D25286 llvm-svn: 284415	2016-10-17 19:28:44 +00:00
Walter Erquinigo	b58d6a5655	Handle relocations to thumb functions when dynamic linking COFF modules Summary: This adds the necessary logic to support relocations to thumb functions in the COFF dynamic linker. The jumps to function addresses are mostly blx, which requires the ISA selection bit when jumping to a thumb function. Note: I'm determining if the relocation requires the ISA bit when creating the relocation entries and not when resolving the relocation. I have to do that because I need the ObjectFile and the actual Symbol, which are available only when creating the entries. It would require a gross refactor if I do it otherwise, but I'm okay with doing it if you think it's better. Reviewers: peter.smith, compnerd Subscribers: rengolin, sas Differential Revision: https://reviews.llvm.org/D25151 llvm-svn: 284410	2016-10-17 18:56:18 +00:00
Tim Northover	020d104496	GlobalISel: support wider range of load/store sizes in AArch64. llvm-svn: 284406	2016-10-17 18:36:53 +00:00
Tom Stellard	bc6c523cce	AMDGPU/SI: Fix LowerParameter() for i16 arguments Summary: If we are loading an i16 value from a 32-bit memory location, then we need to be able to truncate the loaded value to i16. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25198 llvm-svn: 284397	2016-10-17 16:21:45 +00:00
Sanjay Patel	2cf6bfaf73	[DAG] optimize away an arithmetic-right-shift of a 0 or -1 value This came up as part of: https://reviews.llvm.org/D25485 Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors. llvm-svn: 284395	2016-10-17 15:58:28 +00:00
Sanjay Patel	95db75791e	[x86] add tests to show missing DAG folds for arithmetic-shift-right llvm-svn: 284394	2016-10-17 15:44:59 +00:00
Sanjay Patel	832962110a	[x86] auto-generate checks llvm-svn: 284393	2016-10-17 15:38:41 +00:00
George Rimar	272c410c24	[Object/ELF] - Check Header->e_shoff value earlier and do not crash. Patch checks that section pointer is aligned properly. This should be done before getStringTable() call. Differential revision: https://reviews.llvm.org/D25462 llvm-svn: 284387	2016-10-17 14:28:12 +00:00
James Molloy	aa79b19a3e	[SDAG] Use ABI type alignment for constant pools when optimizing for size SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128. In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding. llvm-svn: 284381	2016-10-17 12:54:07 +00:00
Oliver Stannard	fe4432b105	[SimplifyCFG] Don't lower complex ConstantExprs to lookup tables Not all ConstantExprs can be represented by a global variable, for example most pointer arithmetic other than addition of a constant, so we can't convert these values from switch statements to lookup tables. Differential Revision: https://reviews.llvm.org/D25550 llvm-svn: 284379	2016-10-17 12:00:24 +00:00
Tobias Grosser	2bbec0ee7f	[SCEV] Consider delinearization pattern with extension with identity factor Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. an identity factor. We lose cases where the size is of char type, where there will be no multiply factor. Reviewers: sanjoy, grosser Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser Differential Revision: https://reviews.llvm.org/D16492 llvm-svn: 284378	2016-10-17 11:56:26 +00:00
Andrea Di Biagio	fa90c692db	[CodeGenPrepare] When moving a zext near to its associated load, do not retain the original debug location. CodeGenPrepare knows how to move a zext of a load into the same basic block where the load lives. The goal is to help ISel match a zero-extending load instead of two separated instructions. CGP attempts to move a zext computation even if it lives in a basic block that does not post-dominate the load's basic block. That means, the hoisted zext may be speculated. Preserving the zext location would hurt the debugging experience and the quality of sample pgo. With this patch, when moving a zext near to its associated load, CGP no longer propagates the zext's debug location. Instead, CGP conservatively reuses the same debug location for the load and the zext. An alternative approach would be to assign an artificial line-0 location to the zext. However we don't want to over-use the 'line-0' for this particular case because it would have a size cost in the line-table section for no additional benefit. Differential Revision: https://reviews.llvm.org/D25611 llvm-svn: 284377	2016-10-17 11:32:26 +00:00
George Rimar	65807f899b	Recommit r284371 "[Object/ELF] - Check that e_shnum is null when e_shoff is." With fix: hex-edited the precompiled inputs from other testcases to pass new checks. Original commit message: [Object/ELF] - Check that e_shnum is null when e_shoff is. Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) : e_shnum This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero. Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540. That was the cause of a crash in lld on an incorrect input file. Binary reduced using afl-min. Differential revision: https://reviews.llvm.org/D25090 llvm-svn: 284374	2016-10-17 10:58:02 +00:00
George Rimar	830a62aa39	Revert r284371 "[Object/ELF] - Check that e_shnum is null when e_shoff is." It broke build bot: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/908/steps/test-stage1-compiler/logs/stdio llvm-svn: 284373	2016-10-17 10:20:47 +00:00
George Rimar	7d97e73589	[Object/ELF] - Check that e_shnum is null when e_shoff is. Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) : e_shnum This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero. Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540. That was the cause of a crash in lld on an incorrect input file. Binary reduced using afl-min. Differential revision: https://reviews.llvm.org/D25090 llvm-svn: 284371	2016-10-17 10:06:44 +00:00
George Rimar	71f3c1921a	[Object/ELF] - Do not crash on invalid section index. If an object has a wrong (large) string table index and also an incorrectly large value for the total number of sections, then the section index passes the check: if (Index >= getNumSections()) return object_error::invalid_section_index; But the resulting pointer then points far past the end of the file data, which results in a crash. Differential revision: https://reviews.llvm.org/D25081 llvm-svn: 284369	2016-10-17 09:30:06 +00:00
Craig Topper	5b24cd31f5	[AVX-512] Add shuffle combining support for vpermi2var shuffles derived from existing support for vpermt2var. llvm-svn: 284357	2016-10-17 04:26:47 +00:00
Craig Topper	2052318700	[AVX-512] Add vpermi2var test cases to shuffle combining test case. Combining will be added in a future commit. llvm-svn: 284356	2016-10-17 04:26:44 +00:00
Craig Topper	715ad7fef5	[AVX-512] Add support for turning a 256-bit load that goes to both halves of an insert_subvector into a subvector broadcast. Differential Revision: https://reviews.llvm.org/D25650 llvm-svn: 284353	2016-10-16 23:29:51 +00:00
Craig Topper	aa1370ac57	[AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the other vpermi2var intrinsics. llvm-svn: 284329	2016-10-16 04:54:35 +00:00
Craig Topper	4729fe8bb6	[AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS. llvm-svn: 284328	2016-10-16 04:54:31 +00:00
Davide Italiano	590ad7037e	[GVN/PRE] Hoist global values outside of loops. In theory this could be generalized to move anything where we prove the operands are available, but that would require rewriting PRE. As NewGVN will hopefully come soon, and we're trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably not worth spending too much time on it. Fix provided by Daniel Berlin! llvm-svn: 284311	2016-10-15 21:35:23 +00:00
Simon Pilgrim	730f83a750	[X86][SSE] Added some basic examples of knownbits failing for vector types computeKnownBits only returns the common bits of each vector element instead of only the elements that are actually used llvm-svn: 284308	2016-10-15 19:29:26 +00:00
Simon Pilgrim	d654530e55	[X86] Regenerate known bits test llvm-svn: 284306	2016-10-15 18:56:38 +00:00
Craig Topper	dde865afb5	[AVX-512] Add shuffle comments for vbroadcast instructions. llvm-svn: 284305	2016-10-15 16:26:07 +00:00
Tom Stellard	961811c906	AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizer Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25526 llvm-svn: 284298	2016-10-15 00:58:14 +00:00
Tim Northover	69fa84a6e9	GlobalISel: rename legalizer components to match others. The previous names were both misleading (the MachineLegalizer actually contained the info tables) and inconsistent with the selector & translator (in having a "Machine" prefix). This should make everything sensible again. The only functional change is the name of a couple of command-line options. llvm-svn: 284287	2016-10-14 22:18:18 +00:00
Tim Northover	22bff66a9a	PowerPC: specify full triple to avoid different Darwin asm syntax. llvm-svn: 284281	2016-10-14 21:25:29 +00:00
Sanjay Patel	8f5bdb9d28	[ARM] add tests for PR30660 llvm-svn: 284280	2016-10-14 20:52:43 +00:00
Sanjay Patel	928f3d73f6	[PowerPC] add tests for PR30661 llvm-svn: 284279	2016-10-14 20:51:41 +00:00
Guozhi Wei	0cd65429be	[PPC] Shorter sequence to load 64bit constant with same hi/lo words This is a patch to implement pr30640. When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into the high word of the same register. This optimization caused failure of test case bperm.ll because of a non-optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading the 64bit constant mask used in the AND method, and causes a different code sequence. But the ROTATE method is actually better in this test case. The reason is that in the ROTATE method the final OR operation can be avoided, since rldimi can insert the rotated bits into the target register directly. So this patch also enhances SelectAndParts64 to prefer the ROTATE method when the two methods have the same cost and there are multiple bit groups that need to be ORed together. Differential Revision: https://reviews.llvm.org/D25521 llvm-svn: 284276	2016-10-14 20:41:50 +00:00
Tom Stellard	09c2bd6bd4	AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations Summary: We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff) Reviewers: arsenm, nhaehnle Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24672 llvm-svn: 284267	2016-10-14 19:14:29 +00:00
David L Kreitzer	01a057a0c4	Add a pass to optimize patterns of vectorized interleaved memory accesses for X86. The pass optimizes as a unit the entire wide load + shuffles pattern produced by interleaved vectorization. This initial patch optimizes one pattern (64-bit elements interleaved by a factor of 4). Future patches will generalize to additional patterns. Patch by Farhana Aleen Differential revision: http://reviews.llvm.org/D24681 llvm-svn: 284260	2016-10-14 18:20:41 +00:00
Tom Stellard	64a9d0876c	AMDGPU/SI: Don't allow unaligned scratch access Summary: The hardware doesn't support this. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25523 llvm-svn: 284257	2016-10-14 18:10:39 +00:00
David L Kreitzer	d5c6755d83	[safestack] Use non-thread-local unsafe stack pointer for Contiki OS Patch by Michael LeMay Differential revision: http://reviews.llvm.org/D19852 llvm-svn: 284254	2016-10-14 17:56:00 +00:00
Pierre Gousseau	b6d652adb5	[X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))). This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248	2016-10-14 16:41:38 +00:00
Sanjay Patel	6d6eca5cdc	[InstCombine] use m_APInt to allow sub with constant folds for splat vectors llvm-svn: 284247	2016-10-14 16:31:54 +00:00
Sanjay Patel	ecd0da2619	[InstCombine] add tests for missing vector folds llvm-svn: 284245	2016-10-14 15:55:34 +00:00
Sanjay Patel	a3bc38b36d	[InstCombine] auto-generate checks llvm-svn: 284244	2016-10-14 15:41:25 +00:00
Sanjay Patel	0b611dcabf	[InstCombine] remove redundant test This test was apparently checking for 2 independent folds, but we have plenty of tests for those individual folds already. We are lacking vector tests, however, because we don't have the shift folds for vectors. llvm-svn: 284243	2016-10-14 15:36:28 +00:00
Sanjay Patel	ad0757febb	[InstCombine] update test to use FileCheck and auto-generate checks llvm-svn: 284242	2016-10-14 15:30:31 +00:00
Sanjay Patel	c6c5965a42	[InstCombine] sub X, sext(bool Y) -> add X, zext(bool Y) Prefer add/zext because they are better supported in terms of value-tracking. Note that the backend should be prepared for this IR canonicalization (including vector types) after: https://reviews.llvm.org/rL284015 Differential Revision: https://reviews.llvm.org/D25135 llvm-svn: 284241	2016-10-14 15:24:31 +00:00
Sanjay Patel	00fc7a6159	[DAG] add folds for negated shifted sign bit The same folds exist in InstCombine already. This came up as part of: https://reviews.llvm.org/D25485 llvm-svn: 284239	2016-10-14 14:26:47 +00:00
Sanjay Patel	7b4e4afb61	[x86] add tests to show missing folds for negated shifted sign bit llvm-svn: 284238	2016-10-14 14:14:40 +00:00
Nicolai Haehnle	67624af0cc	AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes Summary: This will be used for 64-bit MULHU, which is in turn used for the 64-bit divide-by-constant optimization (see D24822). Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25289 llvm-svn: 284224	2016-10-14 10:30:00 +00:00
Diana Picus	68c7b04e8d	[GlobalISel] Get the AArch64 tests to work on Linux Mostly this just means changing the triple from aarch64-apple-ios to the generic aarch64--. Only one test needs more significant changes, but GlobalISel already does the right thing so it's ok to just change the checks. Differential Revision: https://reviews.llvm.org/D25532 llvm-svn: 284223	2016-10-14 10:19:40 +00:00
Simon Dardis	b3fd189cb5	[mips] Fix aui/daui/dahi/dati for MIPSR6 For compatibility with binutils, define these instructions to take two registers with a 16bit unsigned immediate. Both of the registers have to be the same for dahi and dati. Reviewers: dsanders, zoran.jovanovic Differential Review: https://reviews.llvm.org/D21473 llvm-svn: 284218	2016-10-14 09:31:42 +00:00
Craig Topper	40feb7f157	[DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size. This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated up to v64i8 and goes through this code. llvm-svn: 284204	2016-10-14 06:00:42 +00:00
Konstantin Zhuravlyov	c96b5d7073	[AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables Differential Revision: https://reviews.llvm.org/D25562 llvm-svn: 284196	2016-10-14 04:37:34 +00:00
Konstantin Zhuravlyov	2a2ac37c2c	[AMDGPU] Add 32-bit lo/hi got and pc relative variant kinds and emit appropriate relocations Differential Revision: https://reviews.llvm.org/D25548 llvm-svn: 284195	2016-10-14 04:21:32 +00:00
Konstantin Zhuravlyov	ee68fdadfe	[Support/ELF/AMDGPU] Add 32-bit lo/hi got and pc relative relocations Added relocation names: - R_AMDGPU_GOTPCREL32_LO - R_AMDGPU_GOTPCREL32_HI - R_AMDGPU_REL32_LO - R_AMDGPU_REL32_HI AMDGPU isa only supports 32-bit immediates. In order to access 64-bit address we need to generate 32-bit lo/hi relocations, and do the right math (separate patch). Currently we only generate one 32 bit relocation for lower bits for each access, losing higher bits. Hence we need relocations listed above. Differential Revision: https://reviews.llvm.org/D25546 llvm-svn: 284191	2016-10-14 04:03:49 +00:00
Saleem Abdulrasool	7705c4f1be	CodeGen: use MSVC division on windows itanium Windows itanium is identical to MSVC when dealing with everything but C++. Lower the math routines into msvcrt rather than compiler-rt. llvm-svn: 284175	2016-10-13 23:00:11 +00:00
Saleem Abdulrasool	06383dd272	CodeGen: adjust floating point operations in Windows itanium Windows itanium is equivalent to MSVC except in C++ mode. Ensure that we promote the 32-bit floating point operations to their 64-bit equivalents. llvm-svn: 284173	2016-10-13 22:38:15 +00:00
Sriraman Tallam	f29fa586e1	New llc option pie-copy-relocations to optimize access to extern globals. This option indicates copy relocations support is available from the linker when building as PIE and allows accesses to extern globals to avoid the GOT. Differential Revision: https://reviews.llvm.org/D24849 llvm-svn: 284160	2016-10-13 20:54:39 +00:00
Nirav Dave	a81682aad4	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r284151 which appears to be triggering LTO failures on Hexagon. llvm-svn: 284157	2016-10-13 20:23:25 +00:00
Nirav Dave	4b36957243	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after upstream changes. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 284151	2016-10-13 19:20:16 +00:00
David L Kreitzer	d9ca3589de	[safestack] Move X86-targeted tests into the X86 subdirectory. Patch by Michael LeMay Differential revision: http://reviews.llvm.org/D25340 llvm-svn: 284139	2016-10-13 17:51:59 +00:00
Reid Kleckner	edfc9dcf42	Truncate long names in type records In the MS ABI, the frontend is supposed to MD5 such pathologically long names. LLVM should still defend itself from long names, though. Fixes part of PR29098. llvm-svn: 284136	2016-10-13 17:33:22 +00:00
Igor Breger	8409c356ad	[X86][AVX512] Fix sext v32i1 -> v32i8 lowering. Fix PR30600. Differential Revision: https://reviews.llvm.org/D25554 llvm-svn: 284134	2016-10-13 17:20:38 +00:00
Reid Kleckner	468e793fea	Fix for PR30687. Avoid dereferencing MBB.end(). We don't need to return a MachineInstr* from these stack probe insertion calls anyway. If we ever need to add it back, we can return an iterator instead. Based on a patch by David Kreitzer. This bug is a consequence of r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines. We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion, but only when inserting a stack probe call at the end of an MBB, which isn't necessarily a common situation. Differential Revision: https://reviews.llvm.org/D25566 llvm-svn: 284130	2016-10-13 15:48:48 +00:00
Javed Absar	85874a9360	[ARM]: Assign cost of scaling used in addressing mode for ARM cores This patch assigns cost of the scaling used in addressing. On many ARM cores, a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. For instance: (1) LDR R0, [R1, R2 LSL #2] (2) LDR R0, [R1, -R2 LSL #2] Above, (1) takes fewer cycles than (2). By assigning an appropriate scaling factor cost, we enable LLVM to make the right trade-offs in the optimization and code-selection phase. Differential Revision: http://reviews.llvm.org/D24857 Reviewers: jmolloy, rengolin llvm-svn: 284127	2016-10-13 14:57:43 +00:00
Matthew Simpson	1d4b163fc0	[LV] Account for predicated stores in instruction costs This patch ensures that we scale the estimated cost of predicated stores by block probability. This is a follow-on patch for r284123. llvm-svn: 284126	2016-10-13 14:54:31 +00:00
Sanjay Patel	24b6ef7792	[x86] add negate-i1 run for 32-bit target llvm-svn: 284124	2016-10-13 14:27:08 +00:00
Matthew Simpson	6cdb5a6f96	[LV] Avoid rounding errors for predicated instruction costs This patch modifies the cost calculation of predicated instructions (div and rem) to avoid the accumulation of rounding errors due to multiple truncating integer divisions. The calculation for predicated stores will be addressed in a follow-on patch since we currently don't scale the cost of predicated stores by block probability. Differential Revision: https://reviews.llvm.org/D25333 llvm-svn: 284123	2016-10-13 14:19:48 +00:00
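The rounding problem described in the commit above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a predicated block executes with probability 1/2, so each cost is divided by a hypothetical reciprocal `RECIP_BLOCK_PROB = 2`. The helper names are hypothetical, and this is not the LoopVectorize implementation. Truncating every term separately can lose up to one unit per division, whereas summing first and dividing once loses at most one unit in total.

```python
# Hypothetical reciprocal of the block probability (assumes p = 1/2).
RECIP_BLOCK_PROB = 2

def cost_truncate_each(costs):
    # Scales every per-instruction cost separately; each integer
    # division truncates, so the rounding error accumulates.
    return sum(c // RECIP_BLOCK_PROB for c in costs)

def cost_truncate_once(costs):
    # Sums the unscaled costs first and truncates only once.
    return sum(costs) // RECIP_BLOCK_PROB

costs = [3, 3, 3]
print(cost_truncate_each(costs))  # 3
print(cost_truncate_once(costs))  # 4
```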
Simon Pilgrim	cb59b5257c	[DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines llvm-svn: 284122	2016-10-13 14:04:35 +00:00
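The algebra behind the combine above can be checked lane by lane. The sketch below is a hypothetical Python model of i32 vector lanes with wrap-around arithmetic. It only verifies that (mul (shl X, Y), Z) and (shl (mul X, Z), Y) agree for in-range shift amounts (0 to 31); it says nothing about when DAGCombiner actually chooses to apply the fold.

```python
MASK = (1 << 32) - 1  # model each lane as an unsigned 32-bit integer

def mul_of_shl(xs, ys, zs):
    # Original form: (mul (shl X, Y), Z), evaluated lane by lane.
    return [(((x << y) & MASK) * z) & MASK for x, y, z in zip(xs, ys, zs)]

def shl_of_mul(xs, ys, zs):
    # Combined form: (shl (mul X, Z), Y), evaluated lane by lane.
    return [(((x * z) & MASK) << y) & MASK for x, y, z in zip(xs, ys, zs)]

X = [1, 0xFFFFFFFF, 123456789, 7]
Y = [0, 31, 5, 3]  # shift amounts must stay below the 32-bit lane width
Z = [5, 3, 0x80000001, 0xFFFFFFFF]
assert mul_of_shl(X, Y, Z) == shl_of_mul(X, Y, Z)
```

Both forms compute X * Z * 2^Y modulo 2^32 in each lane, which is why the rewrite is safe whenever the shift amount is smaller than the lane width.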
Matt Arsenault	253640e18d	AMDGPU: Assume spilling will occur at -O0 Because everything live is spilled at the end of a block by fast regalloc, assume this will happen and avoid the copies of the resource descriptor. llvm-svn: 284119	2016-10-13 13:10:00 +00:00
Simon Pilgrim	26b6dbc369	Copy+paste typo in comment describing combine test Repeated the "fold (mul x, 0) -> 0" instead of "fold (mul x, 1) -> x" llvm-svn: 284118	2016-10-13 12:54:32 +00:00
Simon Pilgrim	fa8fadc0e5	[DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding llvm-svn: 284117	2016-10-13 12:49:31 +00:00
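The C2-(A+C1) -> (C2-C1)-A fold above relies on wrap-around subtraction being associative in the right way. The following hypothetical Python sketch models i32 lanes and checks that both forms agree; the helper names and sample values are illustrative only and do not come from DAGCombiner.

```python
MASK = (1 << 32) - 1  # model each lane as an unsigned 32-bit integer

def before_fold(a, c1, c2):
    # C2 - (A + C1), lane by lane, with 32-bit wrap-around.
    return [(k2 - ((x + k1) & MASK)) & MASK for x, k1, k2 in zip(a, c1, c2)]

def after_fold(a, c1, c2):
    # (C2 - C1) - A: the constant difference can be folded ahead of time.
    return [(((k2 - k1) & MASK) - x) & MASK for x, k1, k2 in zip(a, c1, c2)]

A = [0, 1, 0xFFFFFFFF, 1000]
C1 = [5, 0xFFFFFFFF, 1, 0]
C2 = [3, 0, 0x80000000, 42]
assert before_fold(A, C1, C2) == after_fold(A, C1, C2)
```

Because C1 and C2 are constant vectors, the fold replaces an add and a sub on A with a single sub from a precomputed constant.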
Simon Dardis	515e8699f4	[mips] Add IAS support for dvp, evp These instructions were only defined for microMIPSR6 previously. Add definitions for MIPSR6, correct definitions for microMIPSR6, flag these instructions as having unmodelled side effects (they disable/enable virtual processors) and add missing disassembler tests for microMIPSR6. Reviewers: vkalintiris Differential Review: https://reviews.llvm.org/D24291 llvm-svn: 284115	2016-10-13 12:12:56 +00:00
Simon Pilgrim	833b8a2071	[DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization Improves commutation potential llvm-svn: 284113	2016-10-13 12:05:20 +00:00
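In two's complement, subtracting a value from all-ones never borrows, so (sub -1, x) is bitwise NOT, i.e. (xor x, -1). A minimal Python sketch modelling i32 lanes (helper names are hypothetical) checks this:

```python
MASK = (1 << 32) - 1
ALL_ONES = MASK  # the i32 constant -1, viewed as unsigned

def sub_form(xs):
    # (sub -1, x): subtract each lane from all-ones.
    return [(ALL_ONES - x) & MASK for x in xs]

def xor_form(xs):
    # (xor x, -1): flip every bit of each lane.
    return [x ^ ALL_ONES for x in xs]

lanes = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
assert sub_form(lanes) == xor_form(lanes)
print(xor_form([0, 1]))  # [4294967295, 4294967294]
```

Presumably the "commutation potential" in the commit message refers to xor being commutative while sub is not, which gives later combines more freedom to reorder operands.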
Oren Ben Simhon	92ccbf20ff	[X86] Basic additions to support RegCall Calling Convention. The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call. This calling convention ensures that as many values as possible are passed or returned in registers. This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86. Differential Revision: http://reviews.llvm.org/D25022 llvm-svn: 284108	2016-10-13 07:53:43 +00:00
Craig Topper	3d41f91f61	[AVX-512] Fix v16i32 zero extending shuffle test case so it's really zero extend. llvm-svn: 284106	2016-10-13 05:41:01 +00:00
Craig Topper	ff23af4299	[AVX-512] Teach shuffle lowering to recognize 512-bit zero extends. llvm-svn: 284105	2016-10-13 05:29:41 +00:00
Craig Topper	05242739c2	[AVX-512] Add tests for basic 512-bit zero extending shuffle patterns. Code will be improved in a future commit. llvm-svn: 284104	2016-10-13 05:29:37 +00:00
Sebastian Pop	5ba9f24ed7	commit back "GVN-hoist: fix store past load dependence analysis (PR30216, PR30499)" This is with an extra change to avoid calling MemoryLocation::get() on a call instruction. Differential Revision: https://reviews.llvm.org/D25542 llvm-svn: 284098	2016-10-13 01:39:10 +00:00
Quentin Colombet	6b87a3109c	[AArch64][RegisterBankInfo] Provide alternative mappings for 64-bit load This allows RegBankSelect in greedy mode to get rid of some of the cross register bank copies when loads are involved in the chain of computation. llvm-svn: 284097	2016-10-13 01:01:23 +00:00
Reid Kleckner	741d8a21d3	Correct PrivateLinkage for COFF - Use storage class C_STAT for 'PrivateLinkage' The storage class for PrivateLinkage should be the same as for Internal Linkage. - Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix "L" may conflict with a normal symbol name starting with 'L'. Based on a patch by Han Sangjin! Manually updated test cases. llvm-svn: 284096	2016-10-13 00:55:24 +00:00
Quentin Colombet	cd80e97e88	[AArch64][RegisterBankInfo] Provide alternative mappings for G_BITCASTs. Thanks to this patch, RegBankSelect is able to get rid of some register bank copies as demonstrated in the test case. llvm-svn: 284094	2016-10-13 00:34:48 +00:00
Reid Kleckner	8958f6a529	Revert "GVN-hoist: fix store past load dependence analysis (PR30216, PR30499)" This CL didn't actually address the test case in PR30499, and clang still crashes. Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC" Reverts r283965 and r283967. llvm-svn: 284093	2016-10-13 00:18:26 +00:00
Quentin Colombet	9e64919b7c	[AArch64][RegisterBankInfo] Use static mapping for same bank G_BITCAST. NFC. llvm-svn: 284090	2016-10-13 00:12:04 +00:00
Quentin Colombet	db643d9091	[AArch64][MachineLegalizer] Mark more G_BITCAST as legal. Basically, any vector type that fits in a 32-bit register is also valid as far as copies are concerned. llvm-svn: 284089	2016-10-13 00:12:01 +00:00
Albert Gutowski	3245ee7e57	fix function label name in addressofreturnaddress test llvm-svn: 284085	2016-10-12 23:58:45 +00:00
Krzysztof Parzyszek	abc0662f04	Handle lane masks in LivePhysRegs when adding live-ins Differential Revision: https://reviews.llvm.org/D25533 llvm-svn: 284076	2016-10-12 22:53:41 +00:00
Tim Northover	fb8d989818	GlobalISel: support G_TRUNC selection on AArch64. Ahmed's patch again. llvm-svn: 284075	2016-10-12 22:49:15 +00:00
Tim Northover	69271c64d5	GlobalISel: support int <-> float conversions on AArch64. More of Ahmed's work. llvm-svn: 284074	2016-10-12 22:49:11 +00:00

40260 Commits