llvm-project

Commit Graph

Author	SHA1	Message	Date
Artur Pilipenko	c93cc5955f	[DAGCombiner] Match load by bytes idiom and fold it into a single load Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it. Assuming little endian target: i8 a = ... i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24) => i32 val = ((i32)a) i8 a = ... i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3] => i32 val = BSWAP(((i32)a)) This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in the presence of atomic loads load widening is an irreversible transformation and it might hinder other optimizations. Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part: i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24) Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses. The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed. Reviewed By: hfinkel, RKSimon, filcab Differential Revision: https://reviews.llvm.org/D26149 llvm-svn: 289538	2016-12-13 14:21:14 +00:00
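For readers unfamiliar with the idiom, the source-level pattern that c93cc5955f targets can be sketched in C++ as below. This is only an illustration of the two byte orders described in the message, not code from the patch; the names `load_le_by_bytes`, `load_be_by_bytes`, `bswap32`, and `kSample` are hypothetical.

```cpp
#include <cstdint>

// Sample buffer used for illustration only.
const std::uint8_t kSample[4] = {0x78, 0x56, 0x34, 0x12};

// Little-endian assembly: byte 0 is the least significant byte.
// On a little-endian target this is equivalent to one 32-bit load.
std::uint32_t load_le_by_bytes(const std::uint8_t *a) {
  return static_cast<std::uint32_t>(a[0]) |
         (static_cast<std::uint32_t>(a[1]) << 8) |
         (static_cast<std::uint32_t>(a[2]) << 16) |
         (static_cast<std::uint32_t>(a[3]) << 24);
}

// Big-endian assembly: byte 0 is the most significant byte.
// On a little-endian target this is one 32-bit load plus a byte swap.
std::uint32_t load_be_by_bytes(const std::uint8_t *a) {
  return (static_cast<std::uint32_t>(a[0]) << 24) |
         (static_cast<std::uint32_t>(a[1]) << 16) |
         (static_cast<std::uint32_t>(a[2]) << 8) |
         static_cast<std::uint32_t>(a[3]);
}

// Portable 32-bit byte swap, standing in for the BSWAP node.
std::uint32_t bswap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

The two loaders differ exactly by a byte swap, which is why the big-endian form can be folded into a wide load followed by a bswap on a little-endian target.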
Artur Pilipenko	01e86444a0	Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the upcoming user llvm-svn: 289537	2016-12-13 14:16:02 +00:00
Simon Pilgrim	9dc67c0101	[SelectionDAG] computeKnownBits - simplified knownbits sign extension. NFCI. We don't need to extract+test the sign bit of the known ones/zeros, we can use sext which will handle all of this. llvm-svn: 289534	2016-12-13 13:36:27 +00:00
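The observation in 9dc67c0101 can be illustrated with plain integers: sign-extending the known-zero and known-one masks individually already yields the right answer for the widened value, so no separate sign-bit test is needed. The sketch below uses 8-bit masks widened to 16 bits; `Known16`, `sext8To16`, and `sextKnownBits` are hypothetical names, not LLVM APIs.

```cpp
#include <cstdint>

// Known-bits pair for a 16-bit value: a set bit in `zero` means that bit
// is known to be 0, a set bit in `one` means it is known to be 1.
struct Known16 {
  std::uint16_t zero;
  std::uint16_t one;
};

// Sign-extend an 8-bit mask to 16 bits using the xor/subtract trick.
std::uint16_t sext8To16(std::uint8_t m) {
  return static_cast<std::uint16_t>((m ^ 0x80) - 0x80);
}

// Known bits of sext(x) given the known bits of the 8-bit x.
// If the sign bit is known zero, `zero` gains the upper bits;
// if it is known one, `one` gains them; if unknown, neither does.
Known16 sextKnownBits(std::uint8_t zero, std::uint8_t one) {
  return {sext8To16(zero), sext8To16(one)};
}
```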
Simon Dardis	c97cfb69ba	[mips][rtdyld] Move MIPS relocation resolution to a subclass and implement N32 relocations N32 relocations are only correct for individual relocations at the moment. Support for relocation composition will follow in a later patch. Patch By: Daniel Sanders Reviewers: vkalintiris, atanasyan Differential Revision: https://reviews.llvm.org/D27467 llvm-svn: 289532	2016-12-13 11:39:18 +00:00
Simon Dardis	e8af792439	[mips] Fix comment to respect 80 chars per line; NFC llvm-svn: 289530	2016-12-13 11:10:53 +00:00
Simon Dardis	43b5ce492d	[mips] Fix compact branch hazard detection In certain cases it is possible for transient instructions such as %reg = IMPLICIT_DEF, as the single instruction in a basic block, to reach the MipsHazardSchedule pass. This patch teaches MipsHazardSchedule to properly look through such cases. Reviewers: vkalintiris, zoran.jovanovic Differential Revision: https://reviews.llvm.org/D27209 llvm-svn: 289529	2016-12-13 11:07:51 +00:00
Diana Picus	2d9adbf524	[GlobalISel] Move extendRegister where it belongs. NFCI Apparently I missed this one when I moved ValueHandler back in r288658. Sorry! llvm-svn: 289528	2016-12-13 10:46:12 +00:00
Craig Topper	ac75bca1eb	[X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523	2016-12-13 07:45:45 +00:00
NAKAMURA Takumi	b8ea75a010	llvm/test/Transforms/PGOProfile/noreturncall.ll REQUIRES asserts due to -debug-only. llvm-svn: 289522	2016-12-13 07:04:03 +00:00
Rong Xu	51a1e3c430	[PGO] Fix insane counts due to nonreturn calls Summary: Since we don't break BBs for function calls, we might get some insane counts (wrap of unsigned) in the presence of noreturn calls. This patch sets these counts to zero instead of the wrapped number. Reviewers: davidxl Subscribers: xur, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D27602 llvm-svn: 289521	2016-12-13 06:41:14 +00:00
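The "wrap of unsigned" mentioned in 51a1e3c430 is ordinary unsigned underflow. A minimal sketch, assuming a count is derived by subtracting already-known counts from a total (the helper names `wrappedCount` and `clampedCount` are hypothetical and do not come from the patch):

```cpp
#include <cstdint>

// Naive derivation: wraps around to a huge value when known > total,
// which can happen when a noreturn call makes the totals inconsistent.
std::uint64_t wrappedCount(std::uint64_t total, std::uint64_t known) {
  return total - known;
}

// Clamped derivation: report zero instead of the wrapped number.
std::uint64_t clampedCount(std::uint64_t total, std::uint64_t known) {
  return known > total ? 0 : total - known;
}
```

With `total = 3` and `known = 10`, the naive version yields 2^64 - 7, while the clamped version yields 0.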
Davide Italiano	463bebc319	[SCCP] Debug diagnostic goes under DEBUG(). NFCI. llvm-svn: 289519	2016-12-13 05:56:04 +00:00
Dylan McKay	1e57fa487b	[AVR] Add a 'relax memory operation' pass Summary: This pass will be used to relax instructions which use out of bounds memory accesses to equivalent operations that can work with the addresses. The pass currently implements relaxation for the STDWPtrQRr instruction. Without this pass, an assertion error would be hit in the pseudo expansion pass. In the future, we will need to add more instructions to this pass. We can do that on a case-by-case basis. Reviewers: arsenm, kparzysz Subscribers: wdng, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D27650 llvm-svn: 289517	2016-12-13 05:53:14 +00:00
Philip Reames	1f1bbac8da	[peephole] Enhance folding logic to work for STATEPOINTs The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are: (1) support for folding multiple operands at once which reference the same load; (2) support for folding multiple loads into a single instruction; (3) walk all the operands of the instruction for variadic instructions (this is a bug fix!). Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here. Differential Revision: https://reviews.llvm.org/D24103 llvm-svn: 289510	2016-12-13 01:38:41 +00:00
Philip Reames	51387a8c28	[Statepoints] Reuse stack slots more than once within a basic block The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exactly once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but code with multiple calls within a basic block does. Net result: as we optimize invokes into calls, lowering gets worse. The root error here is that the bitmap used by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross block case), reset the bit once, but then never reset it again. Differential Revision: https://reviews.llvm.org/D25243 llvm-svn: 289509	2016-12-13 01:21:15 +00:00
Kostya Serebryany	a31300e789	[libFuzzer] don't require extra flags with -minimize_crash=1 (default to -max_total_time=600). Also respect exact_artifact_path when outputting the end result llvm-svn: 289506	2016-12-13 00:40:47 +00:00
Chris Bieneman	5d58aa80ad	Missed a file in r289503. llvm-svn: 289504	2016-12-13 00:32:43 +00:00
Chris Bieneman	a0523fd0cd	[LIT] Fix system-windows Turns out if you were on windows and your default target wasn't windows the system-windows feature wasn't getting enabled. This fixes that and updates the coff-dwarf test to rely on the new "target-windows" feature. That test was the reason why system-windows was changed to not always be enabled on Windows hosts. llvm-svn: 289503	2016-12-13 00:29:56 +00:00
Chris Bieneman	5a7c5069da	Revert "Suppress LLVM::tools/llvm-symbolizer/coff-dwarf.test for mingw, for now." This reverts commit r249937. llvm-svn: 289502	2016-12-13 00:29:51 +00:00
Chris Bieneman	e96abc6d45	[llvm-config] Unsupported should be win32 Hopefully this will fix the failing Windows bot. llvm-svn: 289497	2016-12-12 23:42:08 +00:00
Tim Northover	d82cc61744	Stop lying about pointers' required alignments. These extra specializations were added in the depths of history (r67984 from 2009) and are clearly problematic now. The pointers actually are aligned to the default (8 bytes), since otherwise UBSan would be complaining loudly. I think it originally made sense because there was no "alignof" to infer the correct value so the generic case went with what malloc returned (8-byte aligned objects), and on 32-bit machines this specialization was correct. It became wrong when we started compiling for 64-bit, and caused a UBSan failure when we tried to put a ValueHandle into a DenseMap. Should fix the Green Dragon UBSan bot. llvm-svn: 289496	2016-12-12 23:29:07 +00:00
Marcos Pividori	681e904419	[libFuzzer] Implement Timers for Windows. Implemented timeouts for Windows using TimerQueueTimers. Timers are used to supervise the time of execution of the callback function that is being fuzzed. Differential Revision: https://reviews.llvm.org/D27237 llvm-svn: 289495	2016-12-12 23:25:11 +00:00
Sanjay Patel	2a1554a0b6	[x86] fix test specifications llvm-svn: 289493	2016-12-12 23:16:35 +00:00
Sanjay Patel	1740526e99	[x86] fix test specifications and auto-generate checks llvm-svn: 289492	2016-12-12 23:15:15 +00:00
Petr Hosek	024a17b06d	[CMake] Multi-target builtins build This change enables building builtins for multiple different targets using LLVM runtimes directory. To specify the builtin targets to be built, use the LLVM_BUILTIN_TARGETS variable, where the value is the list of targets. To pass a per target variable to the builtin build, you can set BUILTINS_<target>_<variable> where <variable> will be passed to the builtin build for <target>. Differential Revision: https://reviews.llvm.org/D26652 llvm-svn: 289491	2016-12-12 23:15:10 +00:00
Chris Bieneman	1a5e67869e	Revert "Disable all llvm-config tests for now, will investigate later" This reverts commit r260386. These tests all pass for me locally. I have no idea if they will pass on all configurations, so I'll watch the bots closely. llvm-svn: 289490	2016-12-12 23:14:58 +00:00
Dan Liew	197d2f0df3	[llvm-config] Fix bug where `--libfiles` and `--names` would produce incorrect output when LLVM is built with `LLVM_BUILD_LLVM_DYLIB`. `llvm-config` previously produced output like this ``` $ llvm-config --libfiles /usr/lib/liblibLLVM-4.0svn.so.so $ llvm-config --libnames liblibLLVM-4.0svn.so.so ``` The library prefix and shared library extension were added to the library name twice which was wrong. I wanted to write a test cases for this but it looks like all `llvm-config` tests were disabled by r260386 so I'll leave this for now. Subscribers: llvm-commits, tstellarAMD Reviewers: beanz, DiamondLovesYou, axw Differential Revision: https://reviews.llvm.org/D27393 llvm-svn: 289488	2016-12-12 23:07:22 +00:00
Andrew Kaylor	ff6a1edfa8	Avoid infinite loops in branch folding Differential Revision: https://reviews.llvm.org/D27582 llvm-svn: 289486	2016-12-12 23:05:38 +00:00
Chris Bieneman	7495a4895c	clang-format to fix post-commit feedback Thanks dblaikie! llvm-svn: 289485	2016-12-12 23:05:15 +00:00
Chris Bieneman	f07d05eccd	[llvm-config] Fix cflags test looking for "error" This test is (I think) actually trying to make sure no errors are printed, but it hits on the string "error" in flags. llvm-svn: 289484	2016-12-12 23:03:28 +00:00
Chris Bieneman	04418623fe	Revert "Remove system-libs.test for now" This reverts commit r260281. llvm-svn: 289483	2016-12-12 23:03:01 +00:00
Sanjoy Das	804b629812	Revert "[SCEVExpander] Use llvm data structures; NFC" This reverts r289215 (git SHA1 cb7b86a1). It breaks the ubsan build because a DenseMap that keys off of `AssertingVH<T>` will hit UB when it tries to cast the empty and tombstone keys to `T *` (due to insufficient alignment). This is the relevant stack trace (thanks to Mike Aizatsky): #0 0x25cf100 in llvm::AssertingVH<llvm::PHINode>::getValPtr() const llvm/include/llvm/IR/ValueHandle.h:212:39 #1 0x25cea20 in llvm::AssertingVH<llvm::PHINode>::operator=(llvm::AssertingVH<llvm::PHINode> const&) llvm/include/llvm/IR/ValueHandle.h:234:19 #2 0x25d0092 in llvm::DenseMapBase<llvm::DenseMap<llvm::AssertingVH<llvm::PHINode>, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::AssertingVH<llvm::PHINode> >, llvm::detail::DenseSetPair<llvm::AssertingVH<llvm::PHINode> > >, llvm::AssertingVH<llvm::PHINode>, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::AssertingVH<llvm::PHINode> >, llvm::detail::DenseSetPair<llvm::AssertingVH<llvm::PHINode> > >::clear() llvm/include/llvm/ADT/DenseMap.h:113:23 llvm-svn: 289482	2016-12-12 23:00:12 +00:00
Kostya Serebryany	092d5764a1	[libFuzzer] split one slow test into several, for more parallel testing llvm-svn: 289481	2016-12-12 22:55:25 +00:00
Nico Weber	b3901bdde8	Fix MSVC build after 289461; MSVC isn't sure if this is std:: or llvm:: llvm-svn: 289480	2016-12-12 22:46:40 +00:00
Kostya Serebryany	a4b43bf8e8	[libFuzzer] make SimpleCmpTest a bit simpler to crack and more verbose llvm-svn: 289477	2016-12-12 22:39:33 +00:00
Sanjay Patel	62104ee6d9	[x86] fix formatting; NFC llvm-svn: 289476	2016-12-12 22:31:01 +00:00
Eugene Zelenko	6a9226d9b8	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 289475	2016-12-12 22:23:53 +00:00
Tim Shen	18e7ae672e	[APFloatTest] Use std::make_tuple to make GCC 4.8 happy Differential Revision: https://reviews.llvm.org/D26817 llvm-svn: 289474	2016-12-12 22:16:08 +00:00
Guozhi Wei	1fd553c934	[PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSR Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so moving 1 or 2 bytes to VSR through MTVSRWZ is much faster than storing the extended value to the stack and loading it with LXSIWZX. This patch fixes pr31144. Differential Revision: https://reviews.llvm.org/D27287 llvm-svn: 289473	2016-12-12 22:09:02 +00:00
Tim Shen	44bde896a5	[APFloat] Implement PPCDoubleDouble add and subtract. Summary: I looked at libgcc's implementation (which is based on the paper "Software for Doubled-Precision Floating-Point Computations", by Seppo Linnainmaa, ACM TOMS vol 7 no 3, September 1981, pages 272-283) and made it generic to arbitrary IEEE floats. Differential Revision: https://reviews.llvm.org/D26817 llvm-svn: 289472	2016-12-12 21:59:30 +00:00
Matthew Simpson	92ce0230b5	[SLP] Fix sign-extends for type-shrinking This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470	2016-12-12 21:11:04 +00:00
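The extra bit described in 92ce0230b5 can be seen with a single value. The sketch below is an illustration only, with hypothetical helper names (`truncThenSext`, `truncThenZext`) and a bit width assumed to be between 1 and 32; it is not the SLP vectorizer's code.

```cpp
#include <cstdint>

// Keep only the low `bits` bits of v (bits must be in 1..32).
std::uint32_t lowBits(std::int32_t v, unsigned bits) {
  std::uint32_t mask = (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
  return static_cast<std::uint32_t>(v) & mask;
}

// Truncate to `bits` bits, then sign-extend back to 32 bits.
std::int32_t truncThenSext(std::int32_t v, unsigned bits) {
  std::uint32_t low = lowBits(v, bits);
  std::uint32_t sign = 1u << (bits - 1);
  // (low ^ sign) - sign propagates the top kept bit into the upper bits.
  std::int64_t r = static_cast<std::int64_t>(low ^ sign) -
                   static_cast<std::int64_t>(sign);
  return static_cast<std::int32_t>(r);
}

// Truncate to `bits` bits, then zero-extend back to 32 bits.
std::int32_t truncThenZext(std::int32_t v, unsigned bits) {
  return static_cast<std::int32_t>(lowBits(v, bits));
}
```

The value 128 fits in 8 bits as an unsigned quantity, but shrinking to 8 bits and sign-extending turns it into -128. Using 9 bits, or zero-extending when the sign bit is known to be zero, preserves it.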
Kostya Serebryany	035af9b346	[libFuzzer] build libFuzzer itself with asan llvm-svn: 289469	2016-12-12 20:58:10 +00:00
Paul Robinson	ac7fe5e0c4	Recommit r288212: Emit 'no line' information for interesting 'orphan' instructions. DWARF specifies that "line 0" really means "no appropriate source location" in the line table. By default, use this for branch targets and some other cases that have no specified source location, to prevent inheriting unfortunate line numbers from physically preceding instructions (which might be from completely unrelated source). Updated patch allows enabling or suppressing this behavior for all unspecified source locations. Differential Revision: http://reviews.llvm.org/D24180 llvm-svn: 289468	2016-12-12 20:49:11 +00:00
Kostya Serebryany	d4be88913e	[libFuzzer] respect -max_len during merge llvm-svn: 289467	2016-12-12 20:39:35 +00:00
Teresa Johnson	a29bd6ffcc	[ThinLTO] Remove useless code (NFC) Should have been removed in r288446. llvm-svn: 289466	2016-12-12 20:34:28 +00:00
Mehdi Amini	ef27db879c	Refactor BitcodeReader: move Metadata and ValueId handling in their own class/file Summary: I'm planning on changing the way we load metadata to enable laziness. I'm getting lost in the gigantic file and gigantic class that is the bitcode reader. This is a first step toward splitting it into a few coarse components that are more easily understandable. Reviewers: pcc, tejohnson Subscribers: mgorny, llvm-commits, dexonsmith Differential Revision: https://reviews.llvm.org/D27646 llvm-svn: 289461	2016-12-12 19:34:26 +00:00
Mehdi Amini	bf2090e31a	Remove IsMetadataMaterialized from BitcodeReader (NFC) Summary: It does not seem useful. Reviewers: pcc, dexonsmith Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27668 llvm-svn: 289457	2016-12-12 19:23:39 +00:00
Geoff Berry	d73420d591	[LiveRangeEdit] Add assert string and descriptive comment. llvm-svn: 289456	2016-12-12 19:12:41 +00:00
Dimitry Andric	59e5cb4342	Fix compile with GCC 5 or later Summary: Compiling with GCC 5 or later can fail with a bogus error "constructor required before non-static data member for llvm::ValueEnumerator::MDRange::First has been parsed". This was originally fixed upstream in GCC PR 70528, but later this fix was reverted, and released versions of GCC still show the bogus error. To work around this, replace MDRange's declaration of a default constructor with a definition. Reviewers: dexonsmith, rsmith, rivanvx Subscribers: llvm-commits, dim, dexonsmith Differential Revision: https://reviews.llvm.org/D18730 llvm-svn: 289454	2016-12-12 19:05:52 +00:00
Reid Kleckner	30422eea0f	Revert "[SCEVExpand] do not hoist divisions by zero (PR30935)" Reverts r289412. It caused an OOB PHI operand access in instcombine when ASan is enabled. Reduction in progress. Also reverts "[SCEVExpander] Add a test case related to r289412" llvm-svn: 289453	2016-12-12 18:52:32 +00:00
Simon Atanasyan	5048514c20	[mips] For PIC code convert unconditional jump to unconditional branch Unconditional branch uses relative addressing which is the right choice in case of position independent code. This is a fix for the bug: https://dmz-portal.mips.com/bugz/show_bug.cgi?id=2445 Differential revision: https://reviews.llvm.org/D27483 llvm-svn: 289448	2016-12-12 17:40:26 +00:00
Nicolai Haehnle	f45ea4bbc5	AMDGPU: llvm.amdgcn.interp.mov is a source of divergence Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447	2016-12-12 16:52:19 +00:00
Sanjay Patel	052220c5c8	remove stale FIXME note from test; NFC llvm-svn: 289445	2016-12-12 16:20:21 +00:00
Simon Pilgrim	a64d4dc22f	[X86] Regenerate vector bitcast/widening tests. llvm-svn: 289443	2016-12-12 16:15:45 +00:00
Sanjay Patel	e730ce87a5	[InstCombine] fix bug when offsetting case values of a switch (PR31260) We could truncate the condition and then try to fold the add into the original condition value causing wrong case constants to be used. Move the offset transform ahead of the truncate transform and return after each transform, so there's no chance of getting confused values. Fix for: https://llvm.org/bugs/show_bug.cgi?id=31260 llvm-svn: 289442	2016-12-12 16:13:52 +00:00
Teresa Johnson	040cc16835	[ThinLTO] Import only necessary DICompileUnit fields Summary: As discussed on mailing list, for ThinLTO importing we don't need to import all the fields of the DICompileUnit. Don't import enums, macros, retained types lists. Also only import local scoped imported entities. Since we don't currently import any global variables, we also don't need to import the list of global variables (added an assert to verify none are being imported). This is being done by pre-populating the value map entries to map the unneeded metadata to nullptr. For the imported entities, we can simply replace the source module's list with a new list containing only those needed imported entities. This is done in the IRLinker constructor so that value mapping automatically does the desired mapping. Reviewers: mehdi_amini, dexonsmith, dblaikie, aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27635 llvm-svn: 289441	2016-12-12 16:09:30 +00:00
Sanjay Patel	87e2f677d7	[InstCombine] clean up range-for-loops in visitSwitchInst(); NFCI llvm-svn: 289439	2016-12-12 15:52:56 +00:00
Simon Pilgrim	d4ff86b973	[X86] Regenerate test. llvm-svn: 289438	2016-12-12 15:47:53 +00:00
Sanjay Patel	2b060c7700	[InstCombine] add test to show PR31260 miscompile; NFC llvm-svn: 289437	2016-12-12 15:28:44 +00:00
Sanjoy Das	b1227db1f4	[SCEVExpander] Add a test case related to r289412 llvm-svn: 289435	2016-12-12 14:57:11 +00:00
Simon Pilgrim	4cbe1834e4	Update inline argument comment. NFCI. combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time. llvm-svn: 289430	2016-12-12 13:43:15 +00:00
Simon Pilgrim	5ebd2b542b	[X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts. Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift. llvm-svn: 289429	2016-12-12 13:33:58 +00:00
Simon Pilgrim	369cd349b9	[X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 llvm-svn: 289426	2016-12-12 10:49:15 +00:00
Simon Pilgrim	040a36c176	[SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBits Pre-commit as discussed on D27657 llvm-svn: 289425	2016-12-12 10:29:43 +00:00
Craig Topper	36ecce9bed	[X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode. Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics. llvm-svn: 289424	2016-12-12 07:57:24 +00:00
Craig Topper	f2c6f7abf3	[X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as its memory pattern instead of full vector load. These intrinsics only load a single element. We should use sse_loadf32/f64 to give more options of what loads it can match. Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/64 so we can match without the peephole. llvm-svn: 289423	2016-12-12 07:57:21 +00:00
Craig Topper	081c0e2864	[X86] Remove some intrinsic instructions from hasPartialRegUpdate Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded. Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too. Reviewers: spatel, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27611 llvm-svn: 289419	2016-12-12 05:07:17 +00:00
Sebastian Pop	8c9cc8c86b	[SCEVExpand] do not hoist divisions by zero (PR30935) SCEVExpand computes the insertion point for the components of a SCEV to be code generated. When it comes to generating code for a division, SCEVexpand would not be able to check (at compilation time) all the conditions necessary to avoid a division by zero. The patch disables hoisting of expressions containing divisions by anything other than non-zero constants in order to avoid hoisting these expressions past conditions that should hold before doing the division. The patch passes check-all on x86_64-linux. Differential Revision: https://reviews.llvm.org/D27216 llvm-svn: 289412	2016-12-12 02:52:51 +00:00
Craig Topper	7fc6d34ed1	[InstCombine][XOP] The instructions for the scalar frcz intrinsics are defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead. llvm-svn: 289411	2016-12-11 22:32:38 +00:00
Simon Pilgrim	831435cb14	[X86][SSE] Add support for combining target shuffles to SHUFPD. llvm-svn: 289407	2016-12-11 21:26:25 +00:00
Davide Italiano	0a1476c756	[SCCP] Use the appropriate helper function. NFCI. llvm-svn: 289406	2016-12-11 21:19:03 +00:00
Ayman Musa	7ec4ed55d3	[X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64). When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded. A fallback pattern is added to catch these cases and provide another solution. Differential Revision: https://reviews.llvm.org/D27661 llvm-svn: 289404	2016-12-11 20:11:17 +00:00
Sanjoy Das	6de678815c	[TBAA] Don't generate invalid TBAA when merging nodes Summary: Fix a corner case in `MDNode::getMostGenericTBAA` where we can sometimes generate invalid TBAA metadata. Reviewers: chandlerc, hfinkel, mehdi_amini, manmanren Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26635 llvm-svn: 289403	2016-12-11 20:07:25 +00:00
Sanjoy Das	3336f681e3	[Verifier] Add verification for TBAA metadata Summary: This change adds some verification in the IR verifier around struct path TBAA metadata. Other than some basic sanity checks (e.g. we get constant integers where we expect constant integers), this checks: - That by the time a struct access tuple `(base-type, offset)` is "reduced" to a scalar base type, the offset is `0`. For instance, in C++ you can't start from, say `("struct-a", 16)`, and end up with `("int", 4)` -- by the time the base type is `"int"`, the offset better be zero. In particular, a variant of this invariant is needed for `llvm::getMostGenericTBAA` to be correct. - That there are no cycles in a struct path. - That struct type nodes have their offsets listed in an ascending order. - That when generating the struct access path, you eventually reach the access type listed in the tbaa tag node. Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26438 llvm-svn: 289402	2016-12-11 20:07:15 +00:00
Sanjay Patel	81ed3499cd	[Constants] don't die processing non-ConstantInt GEP indices in isGEPWithNoNotionalOverIndexing() (PR31262) This should fix: https://llvm.org/bugs/show_bug.cgi?id=31262 llvm-svn: 289401	2016-12-11 20:07:02 +00:00
Simon Pilgrim	7c98a79f7b	[X86][AVX512] Add target shuffle test showing missing PSHUFPD combine. llvm-svn: 289400	2016-12-11 19:41:23 +00:00
Sebastian Pop	e08d9c7c87	instr-combiner: sum up all latencies of the transformed instructions We have found that -- when the selected subarchitecture has a scheduling model and we are not optimizing for size -- the machine-instruction combiner uses a too-simple algorithm to compute the cost of one of the two alternatives [before and after running a combining pass on a section of code], and therefore it throws away the combination results too often. This fix has the potential to help any ISA with the potential to combine instructions and for which at least one subarchitecture has a scheduling model. As of now, this is only known to definitely affect AArch64 subarchitectures with a scheduling model. Regression tested on AMD64/GNU-Linux, new test case tested to fail on an unpatched compiler and pass on a patched compiler. Patch by Abe Skolnik and Sebastian Pop. llvm-svn: 289399	2016-12-11 19:39:32 +00:00
Simon Pilgrim	8766a76f3d	[X86][XOP] Add target shuffle tests showing missing PSHUFPD combine. llvm-svn: 289398	2016-12-11 19:36:25 +00:00
Sanjoy Das	ba1bf87586	[SCEVExpander] Explicitly expand AddRec starts into loop preheader This is NFC today, but won't be once D27216 (or an equivalent patch) is in. This change fixes a design problem in SCEVExpander -- it relied on a hoisting optimization to generate correct code for add recurrences. This meant changing the hoisting optimization to not kick in under certain circumstances (to avoid speculating faulting instructions, say) would break correctness. The fix is to make the correctness requirements explicit, and have it not rely on the hoisting optimization for correctness. llvm-svn: 289397	2016-12-11 19:02:21 +00:00
Oren Ben Simhon	9683ecbff6	[X86] Regcall - Adding support for mask types Regcall calling convention passes mask types arguments in x86 GPR registers. The review includes the changes required in order to support v32i1, v16i1 and v8i1. Differential Revision: https://reviews.llvm.org/D27148 llvm-svn: 289383	2016-12-11 14:10:52 +00:00
Chandler Carruth	726774cbf8	[FileCheck] Re-implement the logic to find each check prefix in the check file to not be unreasonably slow in the face of multiple check prefixes. The previous logic would repeatedly scan potentially large portions of the check file looking for alternative prefixes. In the worst case this would scan most of the file looking for a rare prefix between every single occurrence of a common prefix. Even if we bounded the scan, this would do bad things if the order of the prefixes was "unlucky" and the distant prefix was scanned for first. None of this is necessary. It is straightforward to build a state machine that recognizes the first, longest of the set of alternative prefixes. That is in fact exactly what a regular expression does. This patch builds a regular expression once for the set of prefixes and then uses it to search incrementally for the next prefix. This requires some threading of state but actually makes the code dramatically simpler. I've also added a big comment describing the algorithm as it was not at all obvious to me when I started. With this patch, several previously pathological test cases in test/CodeGen/X86 are 5x or more faster. Overall, running all tests under test/CodeGen/X86 uses 10% less CPU after this, and because all the slowest tests were hitting this, finishes in 40% less wall time on my system (going from just over 5.38s to just over 3.23s) on a release build! This patch substantially improves the time of all 7 X86 tests that were in the top 20 reported by --time-tests, 5 of them are completely off the list and the remaining 2 are much lower. (Sadly, the new tests on the list include 2 new X86 ones that are slow for unrelated reasons, so the count stays at 4 of the top 20.) It isn't clear how much this helps debug builds in aggregate in part because of the noise, but it again makes many of the slowest x86 tests significantly faster (10% or more improvement). llvm-svn: 289382	2016-12-11 12:49:05 +00:00
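The idea in 726774cbf8 (one regular expression for all prefixes, searched incrementally) can be sketched with the standard library. This is an approximation under stated assumptions, not FileCheck's implementation: it uses `std::regex` rather than LLVM's own regex support, the names `buildPrefixRegex` and `findPrefixes` are hypothetical, and prefixes are assumed to contain only letters, digits, hyphens, and underscores so no escaping is needed. Because ECMAScript alternation prefers earlier alternatives, the prefixes are sorted longest first to get the "first, longest" match.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Build a single alternation such as "AVX2|AVX", longest prefix first.
std::regex buildPrefixRegex(std::vector<std::string> prefixes) {
  std::sort(prefixes.begin(), prefixes.end(),
            [](const std::string &a, const std::string &b) {
              return a.size() > b.size();
            });
  std::string pattern;
  for (const std::string &p : prefixes) {
    if (!pattern.empty())
      pattern += '|';
    pattern += p;
  }
  return std::regex(pattern);
}

// Scan the text once and return (offset, matched prefix) pairs in order.
std::vector<std::pair<std::size_t, std::string>>
findPrefixes(const std::string &text,
             const std::vector<std::string> &prefixes) {
  std::vector<std::pair<std::size_t, std::string>> hits;
  if (prefixes.empty())
    return hits; // An empty pattern would match at every position.
  const std::regex re = buildPrefixRegex(prefixes);
  for (std::sregex_iterator it(text.begin(), text.end(), re), end;
       it != end; ++it)
    hits.emplace_back(static_cast<std::size_t>(it->position()), it->str());
  return hits;
}
```

Each call advances from the end of the previous match, so the text is traversed once regardless of how many prefixes are in play.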
Chandler Carruth	b03c166a6c	[FileCheck] Remove a parameter that was simply always set to a commandline flag and test the flag directly. NFC. If we ever need this generality it can be added back. llvm-svn: 289381	2016-12-11 10:22:17 +00:00
Chandler Carruth	4dabac20ad	[FileCheck] Clean up doxygen comments throughout. NFC. llvm-svn: 289380	2016-12-11 10:16:21 +00:00
Chandler Carruth	e8f2fb2061	[FileCheck] Run clang-format over this code. NFC. This fixes one formatting goof I left in my previous commit and many other inconsistencies. I'm planning to make substantial changes here and so wanted to get to a clean baseline. llvm-svn: 289379	2016-12-11 09:54:36 +00:00
Chandler Carruth	20247900d7	Refactor FileCheck some to reduce memory allocation and copying. Also make some readability improvements. Both the check file and input file have to be fully buffered to normalize their whitespace. But previously this would be done in a stack SmallString and then copied into a heap allocated MemoryBuffer. That seems pretty wasteful, especially for something like FileCheck where there are only ever two such entities. This just rearranges the code so that we can keep the canonicalized buffers on the stack of the main function, use reasonably large stack buffers to reduce allocation. A rough estimate seems to show that about 80% of LLVM's .ll and .s files will fit into a 4k buffer, so this should completely avoid heap allocation for the buffer in those cases. My system's malloc is fast enough that the allocations don't directly show up in timings. However, on some very slow test cases, this saves 1% - 2% by avoiding the copy into the heap allocated buffer. This also splits out the code which checks the input into a helper much like the code to build the checks as that made the code much more readable to me. Nit picks and suggestions welcome here. It has really exposed a bunch of stuff that could be cleaned up though, so I'm probably going to go and spring clean all of this code as I have more changes coming to speed things up. llvm-svn: 289378	2016-12-11 09:50:05 +00:00
Craig Topper	23ebd9564f	[X86][InstCombine] Add support for scalar FMA intrinsics to SimplifyDemandedVectorElts. This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them. llvm-svn: 289377	2016-12-11 08:54:52 +00:00
Craig Topper	7a230f4225	[X86][InstCombine] Add the test cases for r289370, r289371, and r289372. I forgot to add the new files before committing. llvm-svn: 289374	2016-12-11 08:00:51 +00:00
Chandler Carruth	ecbe61966f	Tweak the core loop in StringRef::find to avoid calling memcmp on every iteration. Instead, load the byte at the needle length, compare it directly, and save it to use in the lookup table of lengths we can skip forward. I also added an annotation to expect that the comparison fails so that the loop gets laid out contiguously without the call to memcmp (and the substantial register shuffling that the ABI requires of that call). Finally, because this behaves especially badly with a needle length of one (by calling memcmp with a zero length) special case that to directly call memchr, which is what we should have been doing anyways. This was motivated by the fact that there are a large number of test cases in 'check-llvm' where FileCheck's performance is dominated by calls to StringRef::find (in a release, no-asserts build). I'm working on patches to generally improve matters there, but this alone was worth a 12.5% improvement in one test case where FileCheck spent 92% of its time in this routine. I experimented a bunch with different minor variations on this theme, for example setting the pointer at the last byte and indexing backwards for the call to memcmp. That didn't improve anything on this version and seemed more complex. I also tried other things to make the loop flow more nicely and none worked. =/ It is a bit unfortunate, the generated code here remains pretty gross, but I don't see any obvious ways to improve it. At this point, most of my ideas would be really elaborate: 1) While the remainder of the string is long enough, we could load a 16-byte or 32-byte vector at the address of the last byte and use palignr to rotate that and check the first 15- or 31-bytes at the front of the next segment, essentially pre-loading the first several bytes of the next iteration so we could quickly detect a mismatch in those bytes without an additional memory access. 
Downside would be the code complexity, having a fallback loop, and likely misaligned vector load. Plus it would make the common case of the last byte not matching somewhat slower (need some extraction from a vector). 2) While we have space, we could do an aligned load of a 16- or 32-byte vector that contains the end byte, and use any preceding bytes to have a more precise "no" test, and any subsequent bytes could be saved for the next iteration. This removes any unaligned load penalty, but still requires us to pay the overhead of vector extraction for the cases where we didn't need to do anything other than load and compare the last byte. 3) Try to walk from the last byte in a way that is more friendly to cache and/or memory pre-fetcher considering we have to poke the last byte anyways. No idea if any of these are really worth pursuing though. They all seem somewhat unlikely to yield big wins in practice and to be a lot of work and complexity. So I settled here, which at least seems like a strict improvement over the previous version. llvm-svn: 289373	2016-12-11 07:46:21 +00:00
Craig Topper	61b280e7b0	[X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for scalar FMA intrinsics. These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them. llvm-svn: 289372	2016-12-11 07:42:06 +00:00
Craig Topper	d96395365a	[AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar cmp intrinsics with masking and rounding. These intrinsics don't read the upper elements of their first and second input. These are slightly different from the SSE version, which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead. llvm-svn: 289371	2016-12-11 07:42:04 +00:00
Craig Topper	790d0fa569	[AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding. These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well. llvm-svn: 289370	2016-12-11 07:42:01 +00:00
Dylan McKay	bf1d2edab2	[AVR] Add calling convention CodeGen tests This adds CodeGen tests for the AVR C calling convention. llvm-svn: 289369	2016-12-11 07:09:45 +00:00
Kostya Serebryany	441e6310ae	[libFuzzer] don't depend on time in a test llvm-svn: 289368	2016-12-11 06:28:09 +00:00
Dylan McKay	72967a56e1	[AVR] Add a test to validate a simple 'blinking led' program llvm-svn: 289362	2016-12-11 04:59:39 +00:00
Craig Topper	58917f3508	[AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit. llvm-svn: 289354	2016-12-11 01:59:36 +00:00
Craig Topper	e7166ce237	[X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC llvm-svn: 289352	2016-12-11 01:28:08 +00:00
Craig Topper	1f1b441267	[X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350	2016-12-11 01:26:44 +00:00
Dylan McKay	139c0c7c37	[AVR] Fix a signed vs unsigned compiler warning llvm-svn: 289349	2016-12-11 00:24:13 +00:00
Craig Topper	9a63d7ade5	[X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant. llvm-svn: 289348	2016-12-11 00:23:50 +00:00
Dylan McKay	658bb0964a	[AVR] Remove incorrect comment This should've been removed in r289323. llvm-svn: 289346	2016-12-10 23:50:30 +00:00
Craig Topper	edab02b50b	[X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344	2016-12-10 23:09:43 +00:00
Sanjay Patel	4c48bbe94d	[InstCombine] add helper for shift-by-shift folds; NFCI These are currently limited to integer types, but we should be able to extend to splat vectors and possibly general vectors. llvm-svn: 289343	2016-12-10 22:16:29 +00:00
Simon Pilgrim	b71b214287	[X86][SSE] Add tests for sign extended vXi64 multiplication llvm-svn: 289342	2016-12-10 22:02:36 +00:00
Simon Pilgrim	a03e350e69	[X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum llvm-svn: 289341	2016-12-10 21:16:45 +00:00
Craig Topper	abe7c5b5e9	[AVX-512] Remove 128/256 masked vpermil intrinsics and autoupgrade to a select around the unmasked avx1 intrinsics. llvm-svn: 289340	2016-12-10 21:15:52 +00:00
Craig Topper	a4744d170e	[X86][IR] Move the autoupgrading of store intrinsics out of the main nested if/else chain. This should buy a little more time against the MSVC limit mentioned in PR31034. The handlers for stores all return at the end of their block so they can be picked off early. llvm-svn: 289339	2016-12-10 21:15:48 +00:00
Matt Arsenault	fbc728853f	AMDGPU: Fix asan errors when folding operands This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337	2016-12-10 19:58:00 +00:00
Simon Pilgrim	fb58550d73	[X86][SSE] Move ZeroVector creation into the shuffle pattern case where it's actually used. Also fix the ZeroVector's type - I've no idea how this hasn't caused problems... llvm-svn: 289336	2016-12-10 19:49:55 +00:00
Craig Topper	18b57da491	[AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available. llvm-svn: 289335	2016-12-10 19:35:39 +00:00
Craig Topper	8e288e0b68	[X86] Clarify indentation. NFC llvm-svn: 289334	2016-12-10 19:35:36 +00:00
Craig Topper	85f0e57c33	[X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag. llvm-svn: 289333	2016-12-10 19:35:33 +00:00
Sanjay Patel	35289c62a8	[InstSimplify] improve function name; NFC llvm-svn: 289332	2016-12-10 17:40:47 +00:00
Simon Atanasyan	edd7a7bb40	[mips] Eliminate else-after-return. NFC llvm-svn: 289331	2016-12-10 17:30:09 +00:00
Simon Pilgrim	54945a12ec	[SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector. Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types. llvm-svn: 289329	2016-12-10 17:00:00 +00:00
Simon Pilgrim	90a040e745	[X86][XOP] Add permil2ps buildvector combine test llvm-svn: 289327	2016-12-10 13:45:08 +00:00
Dylan McKay	41258cf07d	[AVR] Add a stub README file llvm-svn: 289326	2016-12-10 12:08:19 +00:00
Dylan McKay	d8a603c23b	[AVR] Fix and clean up the inline assembly tests There was a bug where we would hit an assertion if 'Q' was used as a constraint. I also removed hardcoded register names to prefer regexes so the tests don't break when the register allocator changes. llvm-svn: 289325	2016-12-10 11:49:07 +00:00
Dylan McKay	a7e0548722	[AVR] Explicitly set the target in all CodeGen tests This seems to have caused failures on the buildbot. llvm-svn: 289324	2016-12-10 11:23:16 +00:00
Dylan McKay	801a4bd4ed	[AVR] Fix an inline asm assertion which would always trigger It looks like some time in the past, constraint codes were changed from chars being passed around to enums. llvm-svn: 289323	2016-12-10 11:18:37 +00:00
Dylan McKay	5c90b8cb4f	[AVR] Use the register scavenger when expanding 'LDDW' instructions Summary: This gets rid of the hardcoded 'r0' that was used previously. Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27567 llvm-svn: 289322	2016-12-10 10:51:55 +00:00
Dylan McKay	5d0233bea2	[AVR] Support stores to undefined pointers This would previously trigger an assertion error in AVRISelDAGToDAG. llvm-svn: 289321	2016-12-10 10:16:13 +00:00
Chandler Carruth	cef2482875	[PM] Further broaden this test's regex as both the CGSCC and Function inner AM proxies are now being rendered differently. llvm-svn: 289319	2016-12-10 07:59:59 +00:00
Chandler Carruth	d8aecb0e5c	[PM] Try to support the new spelling of one of the proxy names that are showing up on the build bots. llvm-svn: 289318	2016-12-10 07:46:51 +00:00
Chandler Carruth	6b9816477b	[PM] Support invalidation of inner analysis managers from a pass over the outer IR unit. Summary: This never really got implemented, and was very hard to test before a lot of the refactoring changes to make things more robust. But now we can test it thoroughly and cleanly, especially at the CGSCC level. The core idea is that when an inner analysis manager proxy receives the invalidation event for the outer IR unit, it needs to walk the inner IR units and propagate it to the inner analysis manager for each of those units. For example, each function in the SCC needs to get an invalidation event when the SCC gets one. The function / module interaction is somewhat boring here. This really becomes interesting in the face of analysis-backed IR units. This patch effectively handles all of the CGSCC layer's needs -- both invalidating SCC analysis and invalidating function analysis when an SCC gets invalidated. However, this second aspect doesn't really handle the LoopAnalysisManager well at this point. That one will need some change of design in order to fully integrate, because unlike the call graph, the entire function behind a LoopAnalysis's results can vanish out from under us, and we won't even have a cached API to access. I'd like to try to separate solving the loop problems into a subsequent patch though in order to keep this more focused so I've adapted them to the API and updated the tests that immediately fail, but I've not added the level of testing and validation at that layer that I have at the CGSCC layer. An important aspect of this change is that the proxy for the FunctionAnalysisManager at the SCC pass layer doesn't work like the other proxies for an inner IR unit as it doesn't directly manage the FunctionAnalysisManager and invalidation or clearing of it. This would create an ever worsening problem of dual ownership of this responsibility, split between the module-level FAM proxy and this SCC-level FAM proxy. 
Instead, this patch changes the SCC-level FAM proxy to work in terms of the module-level proxy and defer to it to handle much of the updates. It only does SCC-specific invalidation. This will become more important in subsequent patches that support more complex invalidation scenarios. Reviewers: jlebar Subscribers: mehdi_amini, mcrosier, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D27197 llvm-svn: 289317	2016-12-10 06:34:44 +00:00
Craig Topper	a39b650d72	[X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements. Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements. Similar things have already been done for other convert intrinsics. llvm-svn: 289316	2016-12-10 06:02:48 +00:00
Dylan McKay	f368509543	[AVR] Fix a bunch of incorrect assertion messages These should've been checking whether the immediate is a 6-bit unsigned integer. If the immediate was '63', this would cause an assertion error which shouldn't have occurred. llvm-svn: 289315	2016-12-10 05:48:48 +00:00
Kostya Serebryany	c05cb60369	[libFuzzer] test cleanup (3) llvm-svn: 289314	2016-12-10 02:48:42 +00:00
Kostya Serebryany	832d39e9cc	[libFuzzer] test cleanup (2) llvm-svn: 289313	2016-12-10 02:47:00 +00:00
Kostya Serebryany	2f962fe5f7	[libFuzzer] test cleanup llvm-svn: 289312	2016-12-10 02:45:56 +00:00
Kostya Serebryany	61be0f947d	[libFuzzer] switch all libFuzzer tests to use -fsanitize-coverage=trace-pc-guard. Support for the previously used instrumentation will be removed in the following changes. llvm-svn: 289311	2016-12-10 02:26:23 +00:00
Kostya Serebryany	1394ce2aa2	[libFuzzer] use __sanitizer_get_module_and_offset_for_pc to get the module name while printing the coverage llvm-svn: 289310	2016-12-10 01:19:35 +00:00
Matt Arsenault	2402b95db0	AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307	2016-12-10 00:52:50 +00:00
Matt Arsenault	4bd7236193	AMDGPU: Fix handling of 16-bit immediates Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306	2016-12-10 00:39:12 +00:00
Matt Arsenault	f0c862594b	AMDGPU: Fix vintrp disassembly llvm-svn: 289292	2016-12-10 00:29:55 +00:00
Matt Arsenault	618b330dd0	AMDGPU: Change vintrp printing to better match sc Some of the immediates need to be printed differently eventually. llvm-svn: 289291	2016-12-10 00:23:12 +00:00
Paul Robinson	0a32eab125	Bigger-hammer REQUIRES to fix Windows bot. llvm-svn: 289288	2016-12-09 23:08:17 +00:00
Eugene Zelenko	2bc2f33ba2	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 289282	2016-12-09 22:06:55 +00:00
Paul Robinson	5e0bfa4a54	Speculative REQUIRES to fix Windows bot. llvm-svn: 289281	2016-12-09 21:59:00 +00:00
Simon Pilgrim	8dc97a4591	[X86] Regenerate test llvm-svn: 289279	2016-12-09 21:53:12 +00:00
Matt Arsenault	5869b5a447	AMDGPU: Cleanup checks in sext_inreg test llvm-svn: 289272	2016-12-09 21:10:41 +00:00
Adrian Prantl	8fafb8d378	Fix LLVM's use of DW_OP_bit_piece in DWARF expressions. LLVM's use of DW_OP_bit_piece is incorrect and based on a misunderstanding of the wording in the DWARF specification. The offset argument of DW_OP_bit_piece refers to the offset into the location that is on the top of the DWARF expression stack, and not an offset into the source variable. This has since also been clarified in the DWARF specification. This patch fixes all uses of DW_OP_bit_piece to emit the correct offset and simplifies the DwarfExpression class to semi-automatically emit empty DW_OP_pieces to adjust the offset of the source variable, thus simplifying the code using DwarfExpression. While this is an incompatible bugfix, in practice I don't expect this to be much of a problem since LLVM's old interpretation and the correct interpretation of DW_OP_bit_piece differ only when there are gaps in the fragmented locations of the described variables or if individual fragments are smaller than a byte. LLDB at least won't interpret locations with gaps in them because it has no way to present undefined bits in a variable, and there is a high probability that an old-form expression will be malformed when interpreted correctly, because the DW_OP_bit_piece offset will be outside of the location at the top of the stack. As a nice side-effect, this patch enables us to use a more efficient encoding for subregisters: In order to express a sub-register at a non-zero offset we now use a DW_OP_bit_piece instead of shifting the value into place manually. This patch also adds missing test coverage for code paths that weren't exercised before. <rdar://problem/29335809> Differential Revision: https://reviews.llvm.org/D27550 llvm-svn: 289266	2016-12-09 20:43:40 +00:00
Matthias Braun	34359cf0fa	Add README describing the intention of test/CodeGen/MIR llvm-svn: 289265	2016-12-09 20:16:12 +00:00
Marek Olsak	23ae31cca0	AMDGPU/SI: Remove XNACK feature from CI Summary: CI doesn't have XNACK. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27175 llvm-svn: 289263	2016-12-09 19:49:58 +00:00
Marek Olsak	0f55fbae6c	AMDGPU/SI: Don't reserve XNACK when it's disabled Summary: This frees 2 additional scalar registers. These are results from all of my 3 patches combined: Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %) Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %) Spilled VGPRs: 100 -> 84 (-16.00 %) Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27151 llvm-svn: 289262	2016-12-09 19:49:54 +00:00
Marek Olsak	693e9be918	AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects Summary: This frees 2 scalar registers. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27150 llvm-svn: 289261	2016-12-09 19:49:48 +00:00
Marek Olsak	91f22fbf4f	AMDGPU/SI: Allow using SGPRs 96-101 on VI Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260	2016-12-09 19:49:40 +00:00
Reid Kleckner	166eb37537	Remove /Zc:sizedDealloc- from the MSVC build According to the connect bug (https://connect.microsoft.com/VisualStudio/feedback/details/1351894), this was only necessary with pre-release versions of MSVC 2015. Fixes PR23513 llvm-svn: 289257	2016-12-09 19:20:28 +00:00
Paul Robinson	4fa7b57a1f	[DWARF] Suppress .loc directives from CFI instructions Like DBG_VALUE, these emit nothing to the .text section, and sometimes have no source location specified. Just ignore them. Differential Revision: http://reviews.llvm.org/D27492 llvm-svn: 289256	2016-12-09 19:15:32 +00:00
Matthias Braun	2c7d52a540	Move .mir tests to appropriate directories test/CodeGen/MIR should contain tests that intend to test the MIR printing or parsing. Tests that test something else should be in test/CodeGen/TargetName even when they are written in .mir. As a rule of thumb, only tests using "llc -run-pass none" should be in test/CodeGen/MIR. llvm-svn: 289254	2016-12-09 19:08:15 +00:00
Matt Arsenault	7b00cf4706	AMDGPU: Fix isTypeDesirableForOp for i16 This should do nothing for targets without i16. llvm-svn: 289235	2016-12-09 17:57:43 +00:00
Simon Pilgrim	017b7a71d8	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED) Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232	2016-12-09 17:53:11 +00:00
Matt Arsenault	38d8ed2b75	AMDGPU: Fix i128 mul llvm-svn: 289231	2016-12-09 17:49:14 +00:00
Matt Arsenault	52facf0195	AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructions Fixes assembler regressions. llvm-svn: 289230	2016-12-09 17:49:11 +00:00
Matt Arsenault	eb4a55e066	AMDGPU: Clean up instruction bits Sort the instruction bits by type and make sure there is one for each format. Also cleanup namespaces. llvm-svn: 289229	2016-12-09 17:49:08 +00:00
Sean Fertile	1c4109b4c2	[PPC] Add intrinsics for vector extract word and vector insert word. Revision: https://reviews.llvm.org/D26547 llvm-svn: 289227	2016-12-09 17:21:42 +00:00
Nirav Dave	bedb5d906c	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226	2016-12-09 17:18:24 +00:00
Nirav Dave	fd51ff4fd8	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? 
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221	2016-12-09 16:15:12 +00:00
Simon Pilgrim	b9eb99f570	Use SelectionDAG.getSplatBuildVector helper. NFCI. llvm-svn: 289220	2016-12-09 16:01:50 +00:00
Tom Stellard	2a48433fcf	AMDGPU/SI: Don't mark VINTRP instructions as mayLoad Summary: These instructions technically do read from memory, but the memory is considered to be out of bounds for normal load/store instructions. shader-db stats: SGPRS: 1416075 -> 1413323 (-0.19 %) VGPRS: 867413 -> 863935 (-0.40 %) Spilled SGPRs: 1409 -> 1354 (-3.90 %) Spilled VGPRs: 63 -> 63 (0.00 %) Private memory VGPRs: 880 -> 880 (0.00 %) Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread Code Size: 37889052 -> 37897340 (0.02 %) bytes LDS: 2147 -> 2147 (0.00 %) blocks Max Waves: 279243 -> 280369 (0.40 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, mareko, arsenm Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27593 llvm-svn: 289219	2016-12-09 15:57:15 +00:00
Simon Pilgrim	bf9c0e7434	[SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI. Makes interception of BUILD_VECTOR creation easier for debugging. llvm-svn: 289218	2016-12-09 15:23:41 +00:00
Sanjoy Das	004de6fe69	[SCEVExpander] Remove \brief, reflow comments; NFC llvm-svn: 289216	2016-12-09 14:42:14 +00:00
Sanjoy Das	1f6b0433c8	[SCEVExpander] Use llvm data structures; NFC llvm-svn: 289215	2016-12-09 14:42:11 +00:00
Simon Pilgrim	15f1f828b5	[SelectionDAG] Add additional checks to CONCAT_VECTORS creation Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements. llvm-svn: 289214	2016-12-09 14:27:52 +00:00
Benjamin Kramer	eedc4059c3	Plug another leak in the DWARF unittests, DIEInlineStrings are never destroyed. llvm-svn: 289208	2016-12-09 13:33:41 +00:00
Benjamin Kramer	9fcb7fe51e	Fix memory leak in unit test. The StringPool entries are destroyed with the allocator, the string pool itself is not. llvm-svn: 289207	2016-12-09 13:12:30 +00:00
NAKAMURA Takumi	6bd372bae7	llvm/test/Object/archive-thin-create.test: Make sure that %t is empty to stabilize the test. llvm-svn: 289202	2016-12-09 11:44:57 +00:00
Dylan McKay	1cdbf42a33	[AVR] Remove a set of redundant tests This fixes the build. llvm-svn: 289201	2016-12-09 11:22:26 +00:00
Simon Pilgrim	e4050a2961	[SelectionDAG] Add partial BITCAST support to computeKnownBits Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together. We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them. Differential Revision: https://reviews.llvm.org/D27129 llvm-svn: 289200	2016-12-09 10:13:45 +00:00
Malcolm Parsons	1b1b02d25b	Update Doxygen comment in StringSaver (NFC) llvm-svn: 289196	2016-12-09 09:33:33 +00:00
Daniel Jasper	f51e05ffbc	Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes" This reverts commit r288916 as it is currently causing a crasher in Halide. Reproducer on llvm.org/PR31323. While it might be that halide is generating invalid IR, llc shouldn't crash. llvm-svn: 289194	2016-12-09 09:04:51 +00:00
Craig Topper	38b1b5d44f	[X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match. sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case. There is a test case that does depend on this behavior. llvm-svn: 289193	2016-12-09 07:57:21 +00:00
Dylan McKay	18ae0f68f8	[AVR] Use a more appropriate integer type for wide IN/OUT instructions We could previously select an integer which would hit an assertion error in pseudo expansion. The new type will also generate the appropriate fixups if needed, which wasn't done beforehand. llvm-svn: 289192	2016-12-09 07:49:14 +00:00
Dylan McKay	a5d49dfbb3	[AVR] Add tests for a large number of pseudo instructions This adds MIR tests for 24 pseudo instructions. llvm-svn: 289191	2016-12-09 07:49:04 +00:00
Craig Topper	a55b483bb5	[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics Summary: Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection. This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass thru its upper bits. We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version. We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that. This fixes PR30913. Reviewers: delena, zvi, v_klochkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27144 llvm-svn: 289190	2016-12-09 06:42:28 +00:00
Matt Arsenault	27c062932a	AMDGPU: Select i16 instructions to VOP3 forms These were selecting directly to the VOP2 form instead of VOP3 like the i32 instructions. Fixes regressions in future commits where an immediate isn't folded because it was initially used for the second operand. Because uniform 16-bit operations are promoted to i32, it's difficult to get a simple testcase where this matters. Fold failures in SIFoldOperands here tend to be hidden by commute and fold in SIShrinkInstructions. llvm-svn: 289189	2016-12-09 06:19:12 +00:00
Peter Collingbourne	8db7e5e4ee	Re-commit r289184, "Support: Use a 64-bit seek in raw_fd_ostream::seek()." with a configure-time check for lseek64. llvm-svn: 289187	2016-12-09 05:20:43 +00:00
Craig Topper	c4f2b0996d	[X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables. llvm-svn: 289186	2016-12-09 05:20:11 +00:00
Peter Collingbourne	f74fcdd30c	Revert r289184, we need more configury for Darwin and *BSD. llvm-svn: 289185	2016-12-09 05:04:30 +00:00
Peter Collingbourne	08ba509266	Support: Use a 64-bit seek in raw_fd_ostream::seek(). llvm-svn: 289184	2016-12-09 04:57:19 +00:00
Davide Italiano	f8f391db16	[SCCP] Make the test added in r289175 more meaningful. Add a comment while here. llvm-svn: 289182	2016-12-09 03:49:20 +00:00
Davide Italiano	824d695231	[SCCP] Teach the pass about `mul %x 0` even if %x is overdefined. The motivating example is: extern int patatino; int goo() { int x = 0; for (int i = 0; i < 1000000; ++i) { x *= patatino; } return x; } Currently SCCP will not realize that this function returns always zero, therefore will try to unroll and vectorize the loop at -O3 producing an awful lot of (useless) code. With this change, it will just produce: 0000000000000000 <g>: xor %eax,%eax retq llvm-svn: 289175	2016-12-09 03:08:42 +00:00
Craig Topper	2aeb456425	[AVX-512] Add vpermilps/pd to load folding tables. llvm-svn: 289173	2016-12-09 02:18:11 +00:00
Craig Topper	df9de00928	[AVX-512] Move some floating point stack folding test cases out of the integer test. llvm-svn: 289172	2016-12-09 02:18:07 +00:00
Craig Topper	107b187d2a	[Analysis] Fix typo in comment. NFC llvm-svn: 289171	2016-12-09 02:18:04 +00:00
Kostya Serebryany	111e1d69e3	[libFuzzer] implement crash-resistant merge (https://github.com/google/sanitizers/issues/722 ). This is a first experimental variant that needs some more testing, thus not yet adding a lit test (but there are unit tests). llvm-svn: 289166	2016-12-09 01:17:24 +00:00
Peter Collingbourne	8786754cc3	WholeProgramDevirt: Teach the pass to handle structs of arrays. This will become necessary in some cases once D22296 lands. llvm-svn: 289165	2016-12-09 01:10:11 +00:00
Chandler Carruth	86f0bdf832	[LCG] Minor cleanup to the LCG walk over a function, NFC. This just hoists the check for declarations up a layer which allows various sets used in the walk to be smaller. Also moves the relevant comments to match, and catches a few other cleanups in this code. llvm-svn: 289163	2016-12-09 00:46:44 +00:00
Peter Collingbourne	7a1e5bbe4e	Make WholeProgramDevirt understand ConstStruct vtables. Based on a patch by LemonBoy! Differential Revision: https://reviews.llvm.org/D26581 llvm-svn: 289162	2016-12-09 00:33:27 +00:00
Chris Bieneman	313b326bb6	[ObjectYAML] Support for DWARF debug_aranges This patch adds support for round tripping DWARF debug_aranges in and out of YAML. llvm-svn: 289161	2016-12-09 00:26:44 +00:00
Sanjay Patel	568196bf7b	[InstCombine] add tests for umin+icmp; NFC llvm-svn: 289157	2016-12-08 23:44:58 +00:00
Sanjay Patel	73d8bd9905	[InstCombine] add tests for umax+icmp; NFC llvm-svn: 289156	2016-12-08 23:36:57 +00:00
Zia Ansari	394cef803a	[InstSimplify] Add "X / 1.0" to SimplifyFDivInst. Differential Revision: https://reviews.llvm.org/D27587 llvm-svn: 289153	2016-12-08 23:27:40 +00:00
Sanjay Patel	b641aa3f14	[InstCombine] add tests for smax+icmp; NFC llvm-svn: 289151	2016-12-08 23:16:06 +00:00
Tim Northover	b58346f2f2	GlobalISel: fall back gracefully for debug intrinsics. Supporting them properly is a reasonably complex chunk of work, so to allow bot testing before then we should at least be able to fall back to DAG ISel. llvm-svn: 289150	2016-12-08 22:44:13 +00:00
Tim Northover	1e656ec137	GlobalISel: factor overflow handling into separate function. NFC. llvm-svn: 289149	2016-12-08 22:44:00 +00:00
Davide Italiano	54c683f9e7	[SCCP] Make sure SCCP and ConstantFolding agree on undef >> a. Currently SCCP folds the value to -1, while ConstantProp folds to 0. This changes SCCP to do what ConstantFolding does. llvm-svn: 289147	2016-12-08 22:28:53 +00:00
Simon Atanasyan	dccdfac877	[mips] Make the test case more specific and provide OS component of a triple. NFC llvm-svn: 289117	2016-12-08 22:10:52 +00:00
Simon Atanasyan	71db32110b	[mips] Change instruction s/daddiu/addiu/ since O32 prohibits the use of 64-bit GPRs. NFC llvm-svn: 289115	2016-12-08 22:10:48 +00:00
Simon Atanasyan	7f64300a7e	[mips] Change gnueabi to gnu in the triple because EABI has been removed recently. NFC llvm-svn: 289114	2016-12-08 22:10:44 +00:00
Simon Atanasyan	7625e771d2	[mips] Remove N32 Android test because Android does not support N32 ABI. NFC llvm-svn: 289113	2016-12-08 22:10:38 +00:00
Reid Kleckner	785e7d282c	Don't emit .seh_handler directives for any cleanup funclets We were falsely claiming that we had an LSDA for the relevant EH personality before this change, which could lead to the EH machinery interpreting random adjacent data as an LSDA. Fixes PR31317 This change is safe because cleanups can't contain exception handlers today. We do these things to maintain that invariant: - C++ destructors are naturally out-of-line - __finally blocks are outlined in clang - LLVM's inliner will not inline EH constructs into cleanups llvm-svn: 289101	2016-12-08 20:38:46 +00:00
Krzysztof Parzyszek	77a45576ef	[RDF] Fix incorrect lane mask calculation This was exposed by some code that used more than one level of sub- registers. There is no testcase, because there is no such code in the Hexagon backend. llvm-svn: 289099	2016-12-08 20:33:45 +00:00
Sanjay Patel	2580c95dc1	[InstSimplify] add fdiv x/1.0 test and update checks; NFC llvm-svn: 289098	2016-12-08 20:23:56 +00:00
Matt Arsenault	e96d03745d	AMDGPU: Make f16 ConstantFP legal Not having this legal led to combine failures, resulting in dumb things like bitcasts of constants not being folded away. The only reason I'm leaving the v_mov_b32 hack that f32 already uses is to avoid madak formation test regressions. PeepholeOptimizer has an ordering issue where the immediate fold attempt is into the sgpr->vgpr copy instead of the actual use. Running it twice avoids that problem. llvm-svn: 289096	2016-12-08 20:14:46 +00:00
Stanislav Mekhanoshin	73b54f4134	[AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use Differential Revision: https://reviews.llvm.org/D27225 llvm-svn: 289095	2016-12-08 20:07:23 +00:00
Matt Arsenault	6c06a6f48a	AMDGPU: Fix commuting v_sub_u16 The correct commutable opcode was set to itself, so this was simply swapping the operands to commute instead of also changing the opcode to v_subrev_u16. llvm-svn: 289093	2016-12-08 19:52:38 +00:00
Stanislav Mekhanoshin	50ea93a2bd	[AMDGPU] Add amdgpu-unify-metadata pass Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates a semantic problem because we cannot tell which version of OpenCL the composite module conforms to. Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several Mb when linked against our OpenCL library. Lastly, such long lists obscure reading of dumped IR. The pass unifies metadata after linking. Differential Revision: https://reviews.llvm.org/D25381 llvm-svn: 289092	2016-12-08 19:46:04 +00:00
Peter Collingbourne	235c275b20	IR, X86: Understand !absolute_symbol metadata on global variables. Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087	2016-12-08 19:01:00 +00:00
Chris Bieneman	fbf7dfe1ba	[ObjectYAML] Remove DWARF from class names Since all the DWARF classes are in a DWARFYAML namespace having every class start with DWARF seems like a bit of overkill. llvm-svn: 289080	2016-12-08 17:46:57 +00:00
Alexander Timofeev	18009560c5	[AMDGPU] Scalarization of global uniform loads. Summary: LC can currently select scalar load for uniform memory access based on readonly memory address space only. This restriction originated from the fact that in HW prior to VI vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along all paths from the function entry. Reviewers: rampitec, tstellarAMD, arsenm Subscribers: wdng, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D26917 llvm-svn: 289076	2016-12-08 17:28:47 +00:00
Keno Fischer	dc09119776	ConstantFolding: Don't crash when encountering vector GEP ConstantFolding tried to cast one of the scalar indices to a vector type. Instead, use the vector type only for the first index (which is the only one allowed to be a vector) and use its scalar type otherwise. Fixes PR31250. Reviewers: majnemer Differential Revision: https://reviews.llvm.org/D27389 llvm-svn: 289073	2016-12-08 17:22:35 +00:00
Greg Clayton	b90328356a	Fix ASAN buildbots by fixing a double free crash. The dwarfgen::Generator::StringPool was in a unique_ptr but it was owned by the Allocator member variable so it was being freed twice. llvm-svn: 289070	2016-12-08 16:57:04 +00:00
NAKAMURA Takumi	689493bb12	Prune unused libdeps. llvm-svn: 289060	2016-12-08 15:28:02 +00:00
NAKAMURA Takumi	66c8fa94a6	Prune unused \param(s) in r289050. [-Wdocumentation] llvm-svn: 289057	2016-12-08 15:00:12 +00:00
NAKAMURA Takumi	a495f882be	DIE::addAttribute(): Prune a redundant \param. [-Wdocumentation] llvm-svn: 289056	2016-12-08 15:00:07 +00:00
NAKAMURA Takumi	9ccd966612	LanaiInstPrinter: Prune unused libdeps. llvm-svn: 289054	2016-12-08 14:26:30 +00:00
NAKAMURA Takumi	b0f7b03711	DebugInfoDWARFTests: Prune unused libdeps. llvm-svn: 289053	2016-12-08 14:26:23 +00:00
NAKAMURA Takumi	fdf3edeb0c	DebugInfoDWARFTests: Add missing deps, AsmPrinter and Object. llvm-svn: 289052	2016-12-08 14:11:02 +00:00
NAKAMURA Takumi	bf177380ab	DebugInfoDWARFTests: Reorder LLVM_LINK_COMPONENTS. llvm-svn: 289051	2016-12-08 14:10:57 +00:00
Nicolai Haehnle	f08dc90253	[SelectionDAG] Add expansion and promotion of [US]MUL_LOHI Summary: Most targets set the action for these nodes to Expand even though there isn't actually any code for them in ExpandNode. Instead, targets simply relied on the fact that no code generates these nodes as long as the nodes aren't legal or custom. However, generating these nodes can be useful e.g. for divide-by-constant in wider integer types. Expand of [US]MUL_LOHI will use MULH[US] when legal or custom, and a sequence of half-width multiplications otherwise. Promote uses a wider multiply. This patch intends to not change the generated code, but indirect effects are possible since expansions/promotions that were previously done in DAGCombine may now be done in LegalizeDAG. See D24822 for a change that actually uses the new expansion. Reviewers: spatel, bkramer, venkatra, efriedma, hfinkel, ast, nadav, tstellarAMD Subscribers: arsenm, jyknight, nemanjai, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D24956 llvm-svn: 289050	2016-12-08 14:08:14 +00:00
Nicolai Haehnle	3c67a08d1b	X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math This re-adds checks for the patterns that were disabled with r288506. Reviewers: spatel, delena, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27346 llvm-svn: 289049	2016-12-08 14:08:08 +00:00
Nicolai Haehnle	2857dc3893	AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register. With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places. Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D27344 llvm-svn: 289048	2016-12-08 14:08:02 +00:00
Daniel Jasper	0f77869d58	Move DwarfGenerator.cpp to unittests So far it creates a test helper and so it should be moved there. It also creates a layering cycle between CodeGen and CodeGen/AsmPrinter, which should be avoided. Review: https://reviews.llvm.org/D27570 llvm-svn: 289044	2016-12-08 12:45:29 +00:00
Alexey Bataev	4f0d469d45	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043	2016-12-08 11:57:51 +00:00
Pavel Labath	82b95acfbe	Fix MSVC compilation broken by r289040 I wanted to use the "not" keyword to make sure it does not get lost in between other checks. MSVC does not like that. llvm-svn: 289041	2016-12-08 11:45:38 +00:00
Pavel Labath	fefefeb7f6	Improve format member detection in llvm::formatv Summary: The existing detection of a format member function has a couple of deficiencies: - the member function does not get detected if one calls formatv with an lvalue, because the template parameter gets deduced as T&, which fails the is_class check. - it also did not work if the function was called with a const variable because the template parameter would get deduced as const T&, again failing the is_class check. This fixes the problem by stripping the references in the uses_format_member template, to make sure the type is correctly detected as class. It also provides specializations of the has_FormatMember template for const and non-const members of the types in order to enable declaring the format member as a "const" function. I have added tests that verify that formatv can be now called in these scenarios. As some scenarios could not be verified at runtime (e.g. making sure that calling a non-const format member on a const object does not compile), I have also added some static_asserts which test the behaviour of the template classes used internally by formatv(). Reviewers: zturner Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27525 llvm-svn: 289040	2016-12-08 11:31:19 +00:00
Dylan McKay	371117e7a5	[AVR] Add MIR tests for pseudo instruction expansions This adds tests for 13 pseudo instruction expansions. llvm-svn: 289039	2016-12-08 10:52:13 +00:00
Simon Pilgrim	413c8e217f	Wdocumentation fix llvm-svn: 289038	2016-12-08 10:41:41 +00:00
Simon Pilgrim	d35d067e47	Wdocumentation fix llvm-svn: 289037	2016-12-08 10:31:32 +00:00
Oliver Stannard	68e7c21ca0	Add a comment consumer mechanism to MCAsmLexer This allows clients to register an AsmCommentConsumer with the MCAsmLexer, which receives a callback each time a comment is parsed. Differential Revision: https://reviews.llvm.org/D27511 llvm-svn: 289036	2016-12-08 10:31:21 +00:00
Simon Pilgrim	d9c53710d5	[X86][SSE] Add vector test for (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) detailed in D19325 llvm-svn: 289035	2016-12-08 10:17:25 +00:00
Dylan McKay	0cc0446ad2	[AVR] Add MIR tests for a few pseudo instructions llvm-svn: 289031	2016-12-08 08:54:41 +00:00
Dylan McKay	fac9ce5413	[AVR] Add an assertion to ensure we don't emit LPM when it's unsupported llvm-svn: 289030	2016-12-08 08:34:13 +00:00
Peter Collingbourne	f4257528e9	LTO: Hash the parts of the LTO configuration that affect code generation. Most importantly, we need to hash the relocation model, otherwise we can end up trying to link non-PIC object files into PIEs or DSOs. Differential Revision: https://reviews.llvm.org/D27556 llvm-svn: 289024	2016-12-08 05:28:30 +00:00
Greg Clayton	fd461fe360	Unbreak buildbots where the debug info test was crashing due to unchecked error. llvm-svn: 289017	2016-12-08 02:11:03 +00:00
Keno Fischer	d4ea4c18f1	Revert "[CodeGen] Fix invalid DWARF info on Win64" Appears to break on build bots. Reverting pending investigation. llvm-svn: 289014	2016-12-08 01:56:23 +00:00
Keno Fischer	460218fb7d	[CodeGen] Fix invalid DWARF info on Win64 The relocations for `DIEEntry::EmitValue` were wrong for Win64 (emitting FK_Data_4 instead of FK_SecRel_4). This corrects that oversight so that the DWARF data is correct in Win64 COFF files. Fixes PR15393. Patch by Jameson Nash <jameson@juliacomputing.com> based on a patch by David Majnemer. Differential Revision: https://reviews.llvm.org/D21731 llvm-svn: 289013	2016-12-08 01:40:21 +00:00
Greg Clayton	3462a420d1	Make a DWARF generator so we can unit test DWARF APIs with gtest. The only tests we have for the DWARF parser are the tests that use llvm-dwarfdump and expect output from textual dumps. More DWARF parser modification are coming in the next few weeks and I wanted to add tests that can verify that we can encode and decode all form types, as well as test some other basic DWARF APIs where we ask DIE objects for their children and siblings. DwarfGenerator.cpp was added in the lib/CodeGen directory. This file contains the code necessary to easily create DWARF for tests: dwarfgen::Generator DG; Triple Triple("x86_64--"); bool success = DG.init(Triple, Version); if (!success) return; dwarfgen::CompileUnit &CU = DG.addCompileUnit(); dwarfgen::DIE CUDie = CU.getUnitDIE(); CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c"); CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C); dwarfgen::DIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram); SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main"); SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U); SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U); dwarfgen::DIE IntDie = CUDie.addChild(DW_TAG_base_type); IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int"); IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed); IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4); dwarfgen::DIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter); ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc"); // ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref4, IntDie); ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie); StringRef FileBytes = DG.generate(); MemoryBufferRef FileBuffer(FileBytes, "dwarf"); auto Obj = object::ObjectFile::createObjectFile(FileBuffer); EXPECT_TRUE((bool)Obj); DWARFContextInMemory DwarfContext(*Obj.get()); This code is backed by the AsmPrinter code that emits DWARF for the actual compiler. 
While adding unit tests it was discovered that DIEValue objects that used DIEEntry as their values had bugs where DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref8, and DW_FORM_ref_udata forms were not supported. These are all now supported. Added support for DW_FORM_string so we can emit inlined C strings. Centralized the code to unique abbreviations into a new DIEAbbrevSet class and made both the dwarfgen::Generator and the llvm::DwarfFile classes use the new class. Fixed comments in the llvm::DIE class so that the Offset is known to be the compile/type unit offset. DIEInteger now supports more DW_FORM values. There are also unit tests that cover: Encoding and decoding all form types and values. Encoding and decoding all reference types (DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, DW_FORM_ref8, DW_FORM_ref_udata, DW_FORM_ref_addr), including cross compile unit references that go forward one compile unit and backward one compile unit. Differential Revision: https://reviews.llvm.org/D27326 llvm-svn: 289010	2016-12-08 01:03:48 +00:00
Evgeniy Stepanov	0c8957c198	CFI-icall on Thumb Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb. Use b.w branch instruction. Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set. This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc. Differential Revision: https://reviews.llvm.org/D27499 llvm-svn: 289008	2016-12-08 00:32:26 +00:00
Peter Collingbourne	cfef0cd31b	LTO: Remove the unused Config::Features field. We are currently initializing Features via MAttrs. llvm-svn: 289007	2016-12-08 00:27:37 +00:00
Matthias Braun	9ee1a1df24	The few days mentioned in r267095 are over llvm-svn: 289004	2016-12-08 00:16:42 +00:00
Matthias Braun	e2d2ead661	TargetPassConfig: Rename DisablePostRA -> DisablePostRASched; NFC llvm-svn: 289003	2016-12-08 00:16:08 +00:00
Matthias Braun	0c989a893b	LivePhysReg: Use reference instead of pointer in init(); NFC llvm-svn: 289002	2016-12-08 00:15:51 +00:00
Quentin Colombet	ae3168da3f	[InlineSpiller] Don't call TargetInstrInfo::foldMemoryOperand with an empty list. Since r287792 if we try to do that we will hit an assert. llvm-svn: 289001	2016-12-08 00:06:51 +00:00
Filipe Cabecinhas	d5b21b2418	[asan] Split load and store checks in test. NFCI llvm-svn: 288991	2016-12-07 22:37:11 +00:00
Chris Bieneman	7d7364ab4f	[yaml2obj] Refactor and abstract yaml2dwarf functions This abstracts the code for emitting DWARF binary from the DWARFYAML types into reusable interfaces that could be used by ELF and COFF. llvm-svn: 288990	2016-12-07 22:30:15 +00:00
Eugene Zelenko	9408c61830	[ADT, IR] Fix some Clang-tidy modernize-use-equals-delete and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 288989	2016-12-07 22:06:02 +00:00
Davide Italiano	1ed5396304	[BDCE] Skip metadata while replacing uses. The fix committed in r288851 doesn't cover all the cases. In particular, if we have an instruction with side effects which has a no non-dbg use not depending on the bits, we still perform RAUW destroying the dbg.value's first argument. Prevent metadata from being replaced here to avoid the issue. Differential Revision: https://reviews.llvm.org/D27534 llvm-svn: 288987	2016-12-07 21:47:32 +00:00
Chris Bieneman	26d060fbf9	[obj2yaml] Refactor and abstract dwarf2yaml This makes the dwarf2yaml code separated and reusable allowing ELF and COFF to share implementations with MachO. llvm-svn: 288986	2016-12-07 21:47:28 +00:00
Tim Northover	c53606ef02	GlobalISel: use correct builder for ConstantExprs. ConstantExpr instances were emitting code into the current block rather than the entry block. This meant they didn't necessarily dominate all uses, which is clearly wrong. llvm-svn: 288985	2016-12-07 21:29:15 +00:00
Chris Bieneman	79e60eb948	[ObjectYAML] Pull DWARF support into DWARFYAML namespace Since DWARF formatting is agnostic to the object file it is stored in, it doesn't make sense for this to be in the MachOYAML implementation. Pulling it into its own namespace means we could modify the ELF and COFF YAML tools to emit DWARF as well. In a follow-up patch I will better abstract this in obj2yaml and yaml2obj so that the DWARF bits in the tools can be re-used too. llvm-svn: 288984	2016-12-07 21:26:32 +00:00
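
The r289050 entry above (f08dc90253) says that Expand of [US]MUL_LOHI falls back to "a sequence of half-width multiplications" when MULH[US] is not available. The following is only an illustrative sketch of that arithmetic idea on plain integers, not the LegalizeDAG code itself; the function name umul_lohi64 is hypothetical and assumes an unsigned 64-bit multiply split into 32-bit halves.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not an LLVM API): computes the full 128-bit product
// of two unsigned 64-bit values using only 32x32->64 multiplications.
// Returns {low 64 bits, high 64 bits}.
std::pair<uint64_t, uint64_t> umul_lohi64(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFFull;
  const uint64_t a_lo = a & mask, a_hi = a >> 32;
  const uint64_t b_lo = b & mask, b_hi = b >> 32;

  // Four partial products; each fits in 64 bits.
  const uint64_t ll = a_lo * b_lo;
  const uint64_t lh = a_lo * b_hi;
  const uint64_t hl = a_hi * b_lo;
  const uint64_t hh = a_hi * b_hi;

  // Sum the contributions to bits 32..63; the sum of three 32-bit values
  // cannot overflow 64 bits, and its upper half is the carry into 'hi'.
  const uint64_t mid = (ll >> 32) + (lh & mask) + (hl & mask);

  const uint64_t lo = (mid << 32) | (ll & mask);
  const uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
  return {lo, hi};
}
```

A signed variant would need sign corrections on the high half; that case is not sketched here.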

142098 Commits
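
The r289040 entry (fefefeb7f6) describes a detection check that failed because the template parameter was deduced as T& or const T&, which does not satisfy is_class, and a fix that strips the reference first. A minimal sketch of that effect using only standard type traits follows; the names Widget, is_class_naive and is_class_stripped are hypothetical stand-ins and are not the templates used inside llvm::formatv.

```cpp
#include <type_traits>

// Hypothetical class type standing in for a user type with a format member.
struct Widget {
  void format() const {}
};

// Naive check: applies is_class directly to the deduced type, so it
// reports false when T is deduced as a reference type.
template <typename T>
struct is_class_naive
    : std::integral_constant<bool, std::is_class<T>::value> {};

// Check with the reference stripped first, so Widget&, const Widget& and
// Widget&& are all recognized as class types.
template <typename T>
struct is_class_stripped
    : std::integral_constant<
          bool,
          std::is_class<typename std::remove_reference<T>::type>::value> {};

static_assert(!is_class_naive<Widget &>::value,
              "a reference type is not itself a class type");
static_assert(is_class_stripped<const Widget &>::value,
              "stripping the reference exposes the class type");
```

The static_asserts mirror the approach the commit message mentions for scenarios that cannot be verified at runtime.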