llvm-project

Commit Graph

Author	SHA1	Message	Date
Cameron McInally	9b7c15a364	[AVX512] Add 512b integer shift by variable intrinsics and patterns. llvm-svn: 222786	2014-11-25 20:41:51 +00:00
Colin LeMahieu	f7f156ffc3	[Hexagon] [NFC] Adding trailing whitespace to test files. llvm-svn: 222785	2014-11-25 20:22:24 +00:00
Colin LeMahieu	e83bc7476f	[Hexagon] Adding C2_mux instruction. llvm-svn: 222784	2014-11-25 20:20:09 +00:00
Hans Wennborg	bda193edff	Remove useless rdar:// comment from switch_to_lookup_table.ll test. llvm-svn: 222772	2014-11-25 18:45:23 +00:00
Colin LeMahieu	902157c249	[Hexagon] Replacing cmp* instructions with ones that contain encoding bits. llvm-svn: 222771	2014-11-25 18:20:52 +00:00
Hans Wennborg	45172aceb3	LazyValueInfo: Actually re-visit partially solved block-values in solveBlockValue() If solveBlockValue() needs results from predecessors that are not already computed, it returns false with the intention of resuming when the dependencies have been resolved. However, the computation would never be resumed since an 'overdefined' result had been placed in the cache, preventing any further computation. The point of placing the 'overdefined' result in the cache seems to have been to break cycles, but we can check for that when inserting work items in the BlockValue stack instead. This makes the "stop and resume" mechanism of solveBlockValue() work as intended, unlocking more analysis. Using this patch shaves 120 KB off a 64-bit Chromium build on Linux. I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in compile time. Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me. Differential Revision: http://reviews.llvm.org/D6397 llvm-svn: 222768	2014-11-25 17:23:05 +00:00
Joerg Sonnenberger	4af2f12153	Small model and JIT generally don't go well with each other. On LP64 platforms, it will work or not depending on the choosen memory layout, so neither PASS nor XFAIL is appropiate. As UNSUPPORTED as per-test target doesn't exist (yet), remove the test instead to unbreak the builds. llvm-svn: 222767	2014-11-25 17:14:22 +00:00
Rafael Espindola	c81c3f554c	Set the body of a new struct as soon as it is created. This changes the order in which different types are passed to get, but one order is not inherently better than the other. The main motivation is that this simplifies linkDefinedTypeBodies now that it is only linking "real" opaque types. It is also means that we only have to call it once and that we don't need getImpl. A small change in behavior is that we don't copy type names when resolving opaque types. This is an improvement IMHO, but it can be added back if desired. A test is included with the new behavior. llvm-svn: 222764	2014-11-25 15:33:40 +00:00
Joerg Sonnenberger	cf0ea262b1	Reapply 222538 and update tests to explicitly request small code model and PIC: Allow FDE references outside the +/-2GB range supported by PC relative offsets for code models other than small/medium. For JIT application, memory layout is less controlled and can result in truncations otherwise. Patch from Akos Kiss. Differential Revision: http://reviews.llvm.org/D6079 llvm-svn: 222760	2014-11-25 13:37:55 +00:00
Joerg Sonnenberger	b6e36e1421	Mark as explicit failing on x86-64 -- small memory model doesn't agree with default address selections. llvm-svn: 222759	2014-11-25 13:28:56 +00:00
Zoran Jovanovic	b554bba90f	[mips][micromips] Use call instructions with short delay slots Differential Revision: http://reviews.llvm.org/D6338 llvm-svn: 222752	2014-11-25 10:50:00 +00:00
Chandler Carruth	816d26fe5e	[InstCombine] Change LLVM To canonicalize toward the value type being stored rather than the pointer type. This change is analogous to r220138 which changed the canonicalization for loads. The rationale is the same: memory does not have a type, operations (and thus the values they produce) have a type. We should match that type as closely as possible rather than reading some form of semantics into the pointer type. With this change, loads and stores should no longer be made with nonsensical types for the values that tehy load and store. This is particularly important when trying to match specific loaded and stored types in the process of doing other instcombines, which is what led me down this twisty maze of miscanonicalization. I've put quite some effort into looking through IR to find places where LLVM's optimizer was being unreasonably conservative in the face of mismatched load and store types, however it is possible (let's say, likely!) I have missed some. If you see regressions here, or from r220138, the likely cause is some part of LLVM failing to cope with load and store types differing. Test cases appreciated, it is important that we root all of these out of LLVM. llvm-svn: 222748	2014-11-25 10:09:51 +00:00
Suyog Sarda	99c9c1f2b0	Change the test case file to use FileCheck instead of grep. NFC. Change by Ankur Garg. Differential Revision: http://reviews.llvm.org/D6382 llvm-svn: 222740	2014-11-25 08:44:56 +00:00
Chandler Carruth	1a3c2c414c	Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite clearly only exactly equal width ptrtoint and inttoptr casts are no-op casts, it says so right there in the langref. Make the code agree. Original log from r220277: Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 222739	2014-11-25 08:20:27 +00:00
David Majnemer	8cf0dbb015	Forgot to add a file for r222734 llvm-svn: 222736	2014-11-25 07:45:56 +00:00
David Majnemer	0349a019ca	COFF: Add another test for r222124 llvm-svn: 222734	2014-11-25 07:42:36 +00:00
Rafael Espindola	86911440c2	Fix overly aggressive type merging. If we find out that two types are not isomorphic, we learn nothing about opaque sub types in both the source and destination. llvm-svn: 222727	2014-11-25 05:59:24 +00:00
Simon Atanasyan	84f4651f9a	[Object][Mips] Return address of MIPS symbol with cleared microMIPS indicator bit llvm-svn: 222726	2014-11-25 05:57:55 +00:00
Rafael Espindola	e96d7eb8bd	Link the type of aliases. They are not more or less "well typed" than GlobalVariables. llvm-svn: 222725	2014-11-25 04:43:59 +00:00
Juergen Ributzka	eb67bd8d74	[FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching. The pattern matching failed to recognize all instances of "-1", because when comparing against "-1" we didn't use an APInt of the same bitwidth. This commit fixes this and also adds inverse versions of the conditon to catch more cases. llvm-svn: 222722	2014-11-25 04:16:15 +00:00
Rafael Espindola	27ce577356	Add an interesting test that we already get right. NFC. llvm-svn: 222720	2014-11-25 03:47:57 +00:00
David Majnemer	bd9ce4ea51	InstSimplify: Handle some simple tautological comparisons This handles cases where we are comparing a masked value against itself. The analysis could be further improved by making it recursive but such expense is not currently justified. llvm-svn: 222716	2014-11-25 02:55:48 +00:00
Hal Finkel	5901676581	[PowerPC] Add the 'attn' instruction The attn instruction is not part of the Power ISA, but is documented in the A2 user manual, and is accepted by the GNU assembler for the A2 and the POWER4+. Reported as part of PR21650. llvm-svn: 222712	2014-11-25 00:30:11 +00:00
Hal Finkel	360f213d03	[PowerPC] Implement combineRepeatedFPDivisors This does not matter on newer cores (where we can use reciprocal estimates in fast-math mode anyway), but for older cores this allows us to generate better fast-math code where we have multiple FDIVs with a common divisor. llvm-svn: 222710	2014-11-24 23:45:21 +00:00
Matt Arsenault	238ff1ad1e	Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons llvm-svn: 222705	2014-11-24 23:15:18 +00:00
Matt Arsenault	ea515d33c9	Convert test to FileCheck and use CHECK-LABEL llvm-svn: 222704	2014-11-24 23:03:17 +00:00
Rafael Espindola	6953a3a6e0	Add a disable-output option to the gold plugin. This corresponds to the opt option and is handy for profiling. llvm-svn: 222687	2014-11-24 21:18:14 +00:00
Rafael Espindola	3410466495	Pass the .ll files to llvm-link directly. NFC. llvm-svn: 222681	2014-11-24 20:35:59 +00:00
Kostya Serebryany	4cadd4afa0	[asan/coverage] change the way asan coverage instrumentation is done: instead of setting the guard to 1 in the generated code, pass the pointer to guard to __sanitizer_cov and set it there. No user-visible functionality change expected llvm-svn: 222675	2014-11-24 18:49:53 +00:00
Ulrich Weigand	a69bcd5ed0	[PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment When processing an assignment in the integrated assembler that sets a symbol to the value of another symbol, we need to copy the st_other bits that encode the local entry point offset. Modeled after MipsTargetELFStreamer::emitAssignment handling of the ELF::STO_MIPS_MICROMIPS flag. llvm-svn: 222672	2014-11-24 18:09:47 +00:00
Colin LeMahieu	397a25e7cd	[Hexagon] Adding asrh instruction, removing unused multiclasses. llvm-svn: 222670	2014-11-24 18:04:42 +00:00
Colin LeMahieu	3b3197ef95	[Hexagon] Adding aslh instruction. llvm-svn: 222668	2014-11-24 17:44:19 +00:00
Colin LeMahieu	098256c5e6	[Hexagon] Adding zxth instruction. llvm-svn: 222662	2014-11-24 17:11:34 +00:00
Colin LeMahieu	bb7d6f5514	[Hexagon] Adding zxtb instruction. llvm-svn: 222660	2014-11-24 16:48:43 +00:00
David Majnemer	8e6f6a98b5	InstCombine: Don't create an unused instruction We would create an instruction but not inserting it. Not inserting the unused instruction would lead us to verification failure. This fixes PR21653. llvm-svn: 222659	2014-11-24 16:41:13 +00:00
Jozef Kolek	11bdb8bf33	[mips][microMIPS] Fix JRADDIUSP instruction Fix JRADDIUSP instruction, remove delay slot flag because this instruction doesn't have delay slot. Differential Revision: http://reviews.llvm.org/D6365 llvm-svn: 222658	2014-11-24 16:14:10 +00:00
Jozef Kolek	e8c9d1eaf7	[mips][microMIPS] Implement LBU16, LHU16, LW16, SB16, SH16 and SW16 instructions Differential Revision: http://reviews.llvm.org/D5122 llvm-svn: 222653	2014-11-24 14:39:13 +00:00
Jozef Kolek	ea22c4cfbb	[mips][microMIPS] Implement disassembler support for 16-bit instructions With the help of new method readInstruction16() two bytes are read and decodeInstruction() is called with DecoderTableMicroMips16, if this fails four bytes are read and decodeInstruction() is called with DecoderTableMicroMips32. Differential Revision: http://reviews.llvm.org/D6149 llvm-svn: 222648	2014-11-24 13:29:59 +00:00
Andrea Di Biagio	23e2cfa834	[X86] Improved target specific combine on VSELECT dag nodes. This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1. On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd. Also, removed a target specific combine that performed a premature lowering of VSELECT nodes to target specific MOVSS/MOVSD nodes. llvm-svn: 222647	2014-11-24 12:23:15 +00:00
David Majnemer	b2a6e7458d	InstCombine: Don't assume DataLayout is always available We tried to get the result of DataLayout::getLargestLegalIntTypeSize but we didn't have a DataLayout. This resulted in opt crashing. This fixes PR21651. llvm-svn: 222645	2014-11-24 07:26:20 +00:00
Michael Kuperstein	9ef90647b9	[X86] Fixes bug in build_vector v4x32 lowering r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where: 1. A single extracted element is used twice. 2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask. This caused a crash, since the source value for the insertps ends-up uninitialized. Differential Revision: http://reviews.llvm.org/D6377 llvm-svn: 222635	2014-11-23 13:09:06 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
Matt Arsenault	2a495975ed	R600: Fix extloads of i1 on R600/Evergreen llvm-svn: 222631	2014-11-23 02:57:54 +00:00
Matt Arsenault	b7ebdffe3c	R600/SI: Add additional tests for i1 loads llvm-svn: 222629	2014-11-23 02:57:50 +00:00
Matt Arsenault	79db0a70bc	R600/SI: Fix broken check lines and modernize prefixes Use -LABEL and remove -CHECK llvm-svn: 222628	2014-11-23 02:57:49 +00:00
Matt Arsenault	8499ea6a90	R600/SI: Fix missing -verify-machineinstrs on a test llvm-svn: 222627	2014-11-23 02:57:47 +00:00
David Majnemer	fb3805576b	InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2) llvm-svn: 222625	2014-11-22 20:00:41 +00:00
David Majnemer	ec6e481bc5	InstCombine: Propagate exact for (sdiv X, Y) -> (udiv X, Y) llvm-svn: 222624	2014-11-22 20:00:38 +00:00
David Majnemer	fa4699e65f	InstCombine: Propagate exact for (sdiv -X, C) -> (sdiv X, -C) llvm-svn: 222623	2014-11-22 20:00:34 +00:00
David Majnemer	a3aeb15613	InstCombine: Propagate exact in (udiv (lshr X,C1),C2) -> (udiv x,C1<<C2) llvm-svn: 222620	2014-11-22 18:16:54 +00:00
David Majnemer	546f81064c	InstCombine: Propagate NSW/NUW for X*(1<<Y) -> X<<Y llvm-svn: 222613	2014-11-22 08:57:02 +00:00
David Majnemer	8279a7506d	InstCombine: Propagate NSW for -X * -Y -> X * Y llvm-svn: 222612	2014-11-22 07:25:19 +00:00
David Majnemer	4efa9ff8ca	InstSimplify: Simplify (sub 0, X) -> X if it's NUW This is a generalization of the X - (0 - Y) -> X transform. llvm-svn: 222611	2014-11-22 07:15:16 +00:00
Chandler Carruth	8c44d86ab8	[x86] Add some tests for a common unpack pattern of vector shuffle that has a remarkably unique and efficient lowering. While we get this some of the time already, we miss a few cases and there wasn't a principled reason we got it. We should at least test this. v8 already has tests for this pattern. llvm-svn: 222607	2014-11-22 05:44:43 +00:00
David Majnemer	80c8f627db	InstCombine: Preserve nsw when folding X*(2^C) -> X << C llvm-svn: 222606	2014-11-22 04:52:55 +00:00
David Majnemer	fd4a6d2b7a	InstCombine: Preserve nsw/nuw for ((X << C2)C1) -> (X (C1 << C2)) llvm-svn: 222605	2014-11-22 04:52:52 +00:00
David Majnemer	027bc80928	InstCombine: Preserve nsw for (mul %V, -1) -> (sub 0, %V) llvm-svn: 222604	2014-11-22 04:52:38 +00:00
Gerolf Hoflehner	ec6217c929	[InstCombine] Re-commit of r218721 (Optimize icmp-select-icmp sequence) Fixes the self-host fail. Note that this commit activates dominator analysis in the combiner by default (like the original commit did). llvm-svn: 222590	2014-11-21 23:36:44 +00:00
Joerg Sonnenberger	02b13a8d9b	Fix transformation of add with pc argument to adr for non-immediate arguments. llvm-svn: 222587	2014-11-21 22:39:34 +00:00
Kostya Serebryany	60ef25bd54	[asan] remove old experimental code llvm-svn: 222586	2014-11-21 22:34:29 +00:00
Tom Stellard	f1206edfd0	R600/SI: Add a failing test case for offset order in ds_read2 instructions llvm-svn: 222585	2014-11-21 22:31:47 +00:00
Tom Stellard	a99ada528c	R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction This s_mov_b32 will write to a virtual register from the M0Reg class and all the ds instructions now take an extra M0Reg explicit argument. This change is necessary to prevent issues with the scheduler mixing together instructions that expect different values in the m0 registers. llvm-svn: 222583	2014-11-21 22:31:44 +00:00
Tom Stellard	6596ba7933	R600/SI: Add SIFoldOperands pass This pass attempts to fold the source operands of mov and copy instructions into their uses. llvm-svn: 222581	2014-11-21 22:06:37 +00:00
Jozef Kolek	3b8ddb665b	[mips][microMIPS] This patch implements functionality in MIPS delay slot filler such as if delay slot filler have to put NOP instruction into the delay slot of microMIPS BEQ or BNE instruction which uses the register $0, then instead of emitting NOP this instruction is replaced by the corresponding microMIPS compact branch instruction, i.e. BEQZC or BNEZC. Differential Revision: http://reviews.llvm.org/D3566 llvm-svn: 222580	2014-11-21 22:04:35 +00:00
Tom Stellard	3ae588789e	R600/SI: Use hex notation for constant in test llvm-svn: 222578	2014-11-21 22:00:13 +00:00
Colin LeMahieu	310991c66f	[Hexagon] Adding sxth instruction. llvm-svn: 222577	2014-11-21 21:54:59 +00:00
Colin LeMahieu	91ffec908f	[Hexagon] Adding sxtb instruction. Renaming some identically named classes that will be removed after converting referencing defs. llvm-svn: 222575	2014-11-21 21:35:52 +00:00
Manman Ren	f0a582bada	Debug Info: revert r222195, r222210 and r222239. This is no longer needed after David's fix at r222377 + r222485. rdar://18958417 llvm-svn: 222563	2014-11-21 19:55:23 +00:00
Sanjay Patel	501890e909	Add a feature flag for slow 32-byte unaligned memory accesses [x86]. This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen for Sandy Bridge and Ivy Bridge. There is no functionality change intended for those chips. Previously, the absence of AVX2 was being used as a proxy to detect this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2 that do not have the 32-byte unaligned access slowdown. Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6355 llvm-svn: 222544	2014-11-21 17:40:04 +00:00
Chandler Carruth	ce5a26b0e7	[x86] Restructure the checking patterns for v16 and v32 avx2 vector shuffle lowering to allow much better blend matching. Specifically, with the new structure the code seems clearer to me and we correctly can hit the cases where merging two 128-bit lanes is a clear win and can be shuffled cheaply afterward. llvm-svn: 222539	2014-11-21 14:53:03 +00:00
Chandler Carruth	6c4d1ea8c4	[x86] Make the previous logic significantly less conservative and get a bunch more improvements. Non-lane-crossing is fine, the key is that lane merging only makes sense for single-input shuffles. Not sure why I got so turned around here. The code all works, I was just using the wrong model for it. This only updates v4 and v8 lowering. The v16 and v32 lowering requires restructuring the entire check sequence. llvm-svn: 222537	2014-11-21 14:33:24 +00:00
Andrea Di Biagio	0225b5bf6f	[DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. llvm-svn: 222536	2014-11-21 14:32:06 +00:00
Chandler Carruth	d2b19bc867	[x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit lanes. By special casing these we can often either reduce the total number of shuffles significantly or reduce the number of (high latency on Haswell) AVX2 shuffles that potentially cross 128-bit lanes. Even when these don't actually cross lanes, they have much higher latency to support that. Doing two of them and a blend is worse than doing a single insert across the 128-bit lanes to blend and then doing a single interleaved shuffle. While this seems like a narrow case, it kept cropping up on me and the difference is huge as you can see in many of the test cases. I first hit this trying to perfectly fix the interleaving shuffle patterns used by Halide for AVX2. llvm-svn: 222533	2014-11-21 13:56:05 +00:00
Chandler Carruth	77e1a0ad1f	[x86] Remove more windows line endings that slipped into this file... llvm-svn: 222528	2014-11-21 12:33:46 +00:00
Chandler Carruth	61c7b6252c	[x86] Add a bunch of test cases to 256-bit shuffles that exercise merging 128-bit subvectors and also shuffling all the elements of those subvectors. Currently we generate pretty bad code for many of these, but I'm testing a patch that should dramatically improve this in addition to making the shuffle lowering robust to other changes. llvm-svn: 222525	2014-11-21 12:17:50 +00:00
Alexey Volkov	fd1731d876	[X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers Differential Revision: http://reviews.llvm.org/D5938 llvm-svn: 222521	2014-11-21 11:19:34 +00:00
Yury Gribov	55441bb601	[asan] Add new hidden compile-time flag asan-instrument-allocas to sanitize variable-sized dynamic allocas. Patch by Max Ostapenko. Reviewed at http://reviews.llvm.org/D6055 llvm-svn: 222519	2014-11-21 10:29:50 +00:00
Hao Liu	44e5d7a131	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510	2014-11-21 06:39:58 +00:00
Hal Finkel	f413be11f0	[PPC] Use SeparateConstOffsetFromGEP This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM, there is a store moved out of the inner loop) and a potential speedup on MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it makes some code look cleaner, and synchronizing the backends in this regard seems like a generally good thing. llvm-svn: 222504	2014-11-21 04:35:51 +00:00
David Majnemer	c0a313b57c	SROA: The alloca type isn't a candidate promotion type for vectors The alloca's type is irrelevant, only those types which are used in a load or store of the exact size of the slice should be considered. This manifested as an assertion failure when we compared the various types: we had a size mismatch. This fixes PR21480. llvm-svn: 222499	2014-11-21 02:34:55 +00:00
Quentin Colombet	a7439d4483	[X86] Do not custom lower UINT_TO_FP when the target type does not match the custom lowering. <rdar://problem/19026326> llvm-svn: 222489	2014-11-21 00:47:19 +00:00
Michael Zolotukhin	0dcae71449	Fix a trip-count overflow issue in LoopUnroll. Currently LoopUnroll generates a prologue loop before the main loop body to execute first N%UnrollFactor iterations. Also, this loop is used if trip-count can overflow - it's determined by a runtime check. However, we've been mistakenly optimizing this loop to a linear code for UnrollFactor = 2, not taking into account that it also serves as a safe version of the loop if its trip-count overflows. llvm-svn: 222451	2014-11-20 20:19:55 +00:00
Saleem Abdulrasool	2f3b3f3182	X86: use the correct alloca symbol for Windows Itanium Windows itanium targets the MSVCRT, and the stack probe symbol is provided by MSVCRT. This corrects the emission of stack probes on i686-windows-itanium. llvm-svn: 222439	2014-11-20 18:01:26 +00:00
Renato Golin	a03161c6ee	MCJIT tests passing on ARM after r222414 fixed the relocation llvm-svn: 222430	2014-11-20 13:32:16 +00:00
Jyoti Allur	5b9f35220e	[ELF] Prevent ARM ELF object writer from generating deprecated relocation code R_ARM_PLT32 llvm-svn: 222414	2014-11-20 05:58:11 +00:00
David Majnemer	ccce9ae4c7	Add a test for r221870 bad-relocs.obj.coff-i386 has a relocation whose symbol index is outside the symbol table. llvm-svn: 222413	2014-11-20 05:32:10 +00:00
Colin LeMahieu	ac00643603	[Hexagon] Adding A2_xor instruction with IR selection pattern and test. llvm-svn: 222399	2014-11-19 23:22:23 +00:00
Chad Rosier	90a2f9b110	Revert "[Reassociate] As the expression tree is rewritten make sure the operands are" This reverts commit r222142. This is causing/exposing an execution-time regression in spec2006/gcc and coremark on AArch64/A57/Ofast. Conflicts: test/Transforms/Reassociate/optional-flags.ll llvm-svn: 222398	2014-11-19 23:21:20 +00:00
Colin LeMahieu	21866546ae	[Hexagon] Adding A2_or instruction with IR selection pattern and test. llvm-svn: 222396	2014-11-19 22:58:04 +00:00
Andrea Di Biagio	1b657bfcc8	[X86] Improved lowering of v4x32 build_vector dag nodes. This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes that are known to have at least two non-zero elements. With this patch, a build_vector that performs a blend with zero is converted into a shuffle. This is done to let the shuffle legalizer expand the dag node in a optimal way. For example, if we know that a build_vector performs a blend with zero, we can try to lower it as a movq/blend instead of always selecting an insertps. This patch also improves the logic that lowers a build_vector into a insertps with zero masking. See for example the extra test cases added to test sse41.ll. Differential Revision: http://reviews.llvm.org/D6311 llvm-svn: 222375	2014-11-19 19:34:29 +00:00
Tom Stellard	e0ddfd11ea	R600/SI: Make SIInstrInfo::isOperandLegal() more strict A register operand that has a common sub-class with its instruction's defined register class is not always legal. For example, SReg_32 and M0Reg both have a common sub-class, but we can't use an SReg_32 in instructions that expect a M0Reg. This prevents the llvm.SI.sendmsg.ll test from failing when the fold operand pass is added. llvm-svn: 222368	2014-11-19 16:58:49 +00:00
Zoran Jovanovic	a4c4b5fc01	[mips][micromips] Implement SWM32 and LWM32 instructions Differential Revision: http://reviews.llvm.org/D5519 llvm-svn: 222367	2014-11-19 16:44:02 +00:00
Suyog Sarda	aba97f4aba	Vectorize a reduction chain feeding into a 'return' statement. e.x return (a[0]+b[0]) + (a[1]+b[1]) Differential Revision: http://reviews.llvm.org/D6227 llvm-svn: 222364	2014-11-19 16:07:38 +00:00
Jozef Kolek	ffeed44190	[mips][microMIPS] Fix opcodes of MFHC1 and MTHC1 instructions. Differential Revision: http://reviews.llvm.org/D6169 llvm-svn: 222355	2014-11-19 13:37:51 +00:00
Arnaud A. de Grandmaison	7b9dc28060	Fix tail recursion elimination When the BasicBlock containing the return instrution has a PHI with 2 incoming values, FoldReturnIntoUncondBranch will remove the no longer used incoming value and remove the no longer needed phi as well. This leaves us with a BB that no longer has a PHI, but the subsequent call to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not remove the return instruction (which still uses the result of the call instruction). This prevents EliminateRecursiveTailCall to remove the value, as it is still being used in a basicblock which has no predecessors. The basicblock can not be erased on the spot, because its iterator is still being used in runTRE. This issue was exposed when removing the threshold on size for lifetime marker insertion for named temporaries in clang. The testcase is a much reduced version of peelOffOuterExpr(const Expr, const ExplodedNode ) from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp. llvm-svn: 222354	2014-11-19 13:32:51 +00:00
Jozef Kolek	4d55b4d768	[mips][microMIPS] Implement CodeGen support for 16-bit instruction ADDIUR2. Differential Revision: http://reviews.llvm.org/D5800 llvm-svn: 222352	2014-11-19 13:23:58 +00:00
Jozef Kolek	73f64eac8c	[mips][microMIPS] Implement CodeGen support for ADDIUS5 instruction. Differential Revision: http://reviews.llvm.org/D5799 llvm-svn: 222351	2014-11-19 13:11:09 +00:00
Jozef Kolek	55bb542856	[mips][microMIPS] Add disassembler tests for new microMIPS 32-bit instructions: LWXS, BGEZALS, BLTZALS, BEQZC, BNEZC, JALS and JALRS. http://reviews.llvm.org/D5413 llvm-svn: 222349	2014-11-19 11:49:57 +00:00
Jozef Kolek	5f95dd2b65	[mips][microMIPS] Implement LWXS instruction. Differential Revision: http://reviews.llvm.org/D5407 llvm-svn: 222348	2014-11-19 11:39:12 +00:00
Jozef Kolek	dc62fc4a8f	[mips][microMIPS] Implement SDBBP and RDHWR instructions. Differential Revision: http://reviews.llvm.org/D5240 llvm-svn: 222347	2014-11-19 11:25:50 +00:00
Simon Pilgrim	3ac3b251a9	[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2 This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain. I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr. Differential Revision: http://reviews.llvm.org/D5699 llvm-svn: 222340	2014-11-19 10:06:49 +00:00
David Majnemer	b7adf34ee0	AliasSetTracker: UnknownInsts should contribute to the refcount AliasSetTracker::addUnknown may create an AliasSet devoid of pointers just to contain an instruction if no suitable AliasSet already exists. It will then AliasSet::addUnknownInst and we will be done. However, it's possible for addUnknown to choose an existing AliasSet to addUnknownInst. If this were to occur, we are in a bit of a pickle: removing pointers from the AliasSet can cause the entire AliasSet to become destroyed, taking our unknown instructions out with them. Instead, keep track whether or not our AliasSet has any unknown instructions. This fixes PR21582. llvm-svn: 222338	2014-11-19 09:41:05 +00:00
Hao Liu	fd46bea46a	[AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend. SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE and LICM. By enabling these passes we can have better address calculations and generate a better addressing mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57. Reviewed in http://reviews.llvm.org/D5864. llvm-svn: 222331	2014-11-19 06:39:53 +00:00
Rui Ueyama	970dda295e	llvm-readobj: fix off-by-one error in COFFDumper It printed out base relocation table header as table entry. This patch also makes llvm-readobj to not skip ABSOLUTE entries becuase it was confusing. llvm-svn: 222299	2014-11-19 02:07:10 +00:00
Weiming Zhao	7a2d15678e	[Aarch64] Customer lowering of CTPOP to SIMD should check for NEON availability llvm-svn: 222292	2014-11-19 00:29:14 +00:00
Kostya Serebryany	cb45b126fb	[asan] add experimental basic-block tracing to asan-coverage; also fix -fsanitize-coverage=3 which was broken by r221718 llvm-svn: 222290	2014-11-19 00:22:58 +00:00
Rui Ueyama	74e85130a0	llvm-readobj: teach it how to dump COFF base relocation table llvm-svn: 222289	2014-11-19 00:18:07 +00:00
Manman Ren	c67109313c	Revert r222039 because of bot failure. http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/ Hopefully, bot will be green. If not, we will re-submit the commit. llvm-svn: 222287	2014-11-19 00:13:26 +00:00
Matt Arsenault	c09cc3c5b0	R600/SI: Implement areMemAccessesTriviallyDisjoint This partially makes up for not having address spaces used for alias analysis in some simple cases. This is not yet enabled by default so shouldn't change anything yet. llvm-svn: 222286	2014-11-19 00:01:31 +00:00
Simon Pilgrim	9c1e4123f8	[X86][AVX] 256-bit vector stack unaligned load/stores identification Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills. This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills. This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll Differential Revision: http://reviews.llvm.org/D6252 llvm-svn: 222281	2014-11-18 23:38:19 +00:00
Colin LeMahieu	44fd1c8bdf	[Hexagon] Adding A2_and instruction. llvm-svn: 222274	2014-11-18 22:45:47 +00:00
Chad Rosier	c250881838	[FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic shift-right for booleans (i1). Arithmetic shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222272	2014-11-18 22:41:49 +00:00
Chad Rosier	e16d16ae41	[FastISel][AArch64] Also allow folding of sign-/zero-extend and logical shift-right for booleans (i1). Logical shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222270	2014-11-18 22:38:42 +00:00
David Majnemer	c6b8e20a5c	InstCombine: Fix another infinite loop caused by visitFPTrunc We would attempt to replace an frem's operand with the same operand. This would cause InstCombine to think real work was done, causing InstCombine to enter an infinite loop. This fixes the second part of PR21576. llvm-svn: 222265	2014-11-18 22:06:45 +00:00
Colin LeMahieu	38765e6d89	[Hexagon] Adding A2_sub instruction Renaming test files. llvm-svn: 222263	2014-11-18 21:51:51 +00:00
David Majnemer	b32eaddf11	Revert "Revert r222040 because of bot failure." This reverts commit r222203, reverting r222040 didn't end up turning the bot green. llvm-svn: 222261	2014-11-18 21:30:02 +00:00
Juergen Ributzka	cdda930843	[FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts." Shifts also perform sign-/zero-extends to larger types, which requires us to emit an integer extend instead of a simple COPY. Related to PR21594. llvm-svn: 222257	2014-11-18 21:20:17 +00:00
Matt Arsenault	162c1010bd	R600/SI: Move SIFixSGPRCopies to inst selector passes This should expose more of the actually used VALU instructions to the machine optimization passes. This also should help getting i1 handling into a better state. For not entirly understood reasons, this fixes the split-scalar-i64-add.ll test where a 64-bit add would only partially be moved to the VALU resulting in use of undefined VCC. llvm-svn: 222256	2014-11-18 21:06:58 +00:00
Tom Stellard	f0a2107c6b	R600/SI: Make sure resource descriptors are always stored in SGPRs llvm-svn: 222253	2014-11-18 20:39:39 +00:00
Chad Rosier	b83c6d9c08	[Reassociate] Use test cases that can actually be optimized to verify optional flags are cleared. The reassociation pass was just reordering the leaf nodes in the previous test cases. llvm-svn: 222250	2014-11-18 20:34:01 +00:00
Colin LeMahieu	efa74e0280	[Hexagon] Converting from ADD_rr to A2_add which has encoding bits. Adding test to show correct instruction selection and encoding. llvm-svn: 222249	2014-11-18 20:28:11 +00:00
Juergen Ributzka	4328fd94b0	[FastISel][AArch64] Fix shift-immediate emission for "zero" shifts. This change emits a COPY for a shift-immediate with a "zero" shift value. This fixes PR21594 where we emitted a shift instruction with an incorrect immediate operand. llvm-svn: 222247	2014-11-18 19:58:59 +00:00
Philip Reames	018dbf18c4	Tweak EarlyCSE to recognize series of dead stores EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!). Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D6301 llvm-svn: 222241	2014-11-18 17:46:32 +00:00
Manman Ren	4723d7965e	Remove triple in testing case to recover an arm bot. llvm-svn: 222239	2014-11-18 16:45:34 +00:00
David Majnemer	6fdb6b8fd4	InstCombine: Fold away tautological masked compares It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true. While this sort of reasoning should normally live in InstSimplify, the machinery that derives this result is not trivial to split out. llvm-svn: 222230	2014-11-18 09:31:41 +00:00
Frederic Riss	fdccfc1e19	Allow DwarfCompileUnit::constructImportedEntityDIE to instanciate a GlobalVariable DIE. Usually global variables are in a retain list and instanciated before any call to constructImportedEntityDIE is made. This isn't true for forward declarations though. The testcase for this change is generated by a clang patched to emit such forward declarations (patch at http://reviews.llvm.org/D6173 which will land soon). The updated testcase tests more than just global variables, it now tests every type of 'using' clause we support. llvm-svn: 222217	2014-11-18 02:46:11 +00:00
David Majnemer	774aadf144	llvm-readobj: Don't print the Characteristics field as the Subsystem We claimed that we were printing the Subystem field when we were actually printing the Characteristics field. llvm-svn: 222216	2014-11-18 02:45:28 +00:00
David Majnemer	9a91e4a18a	IndVarSimplify: Allow LFTR to fire more often I added a pessimization in r217102 to prevent miscompiles when the incremented induction variable was used in a comparison; it would be poison. Try to use the incremented induction variable more often when we can be sure that the increment won't end in poison. Differential Revision: http://reviews.llvm.org/D6222 llvm-svn: 222213	2014-11-18 02:20:58 +00:00
Manman Ren	8975b044d3	Update testing case that was accidently duplicated. llvm-svn: 222210	2014-11-18 01:49:06 +00:00
Manman Ren	a64bd44fd8	Revert r222040 because of bot failure. http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/ Hopefully, bot will be green. llvm-svn: 222203	2014-11-18 00:33:22 +00:00
Manman Ren	554865da5b	Debug Info: In DIBuilder, the context field of a global variable is updated to use DIScopeRef. A paired commit at clang will follow to show cases where we will use an identifer for the context of a global variable. rdar://18958417 llvm-svn: 222195	2014-11-18 00:29:08 +00:00
Duncan P. N. Exon Smith	f39c3b8108	IR: Simplify uniquing for MDNode Change uniquing from a `FoldingSet` to a `DenseSet` with custom `DenseMapInfo`. Unfortunately, this doesn't save any memory, since `DenseSet<T>` is a simple wrapper for `DenseMap<T, char>`, but I'll come back to fix that later. I used the name `GenericDenseMapInfo` to the custom `DenseMapInfo` since I'll be splitting `MDNode` into two classes soon: `MDNodeFwdDecl` for temporaries, and `GenericMDNode` for everything else. I also added a non-debug-info reduced version of a type-uniquing test that started failing on an earlier draft of this patch. Part of PR21532. llvm-svn: 222191	2014-11-17 23:28:21 +00:00
Juergen Ributzka	c9591e9bdb	[SimplifyCFG] Make the value type of the hole check bitmask a power-of-2. When converting a switch to a lookup table we might have to generate a bitmaks to encode and check for holes in the original switch statement. The type of this mask depends on the number of switch statements, which can result in illegal types for pretty much all architectures. To avoid unnecessary type legalization and help FastISel this commit increases the size of the bitmask to next power-of-2 value when necessary. This fixes rdar://problem/18984639. llvm-svn: 222168	2014-11-17 19:39:56 +00:00
Chad Rosier	bc0b869be9	[Reassociate] As the expression tree is rewritten make sure the operands are emitted in canonical form. llvm-svn: 222142	2014-11-17 16:33:50 +00:00
Alexey Volkov	7de210bd52	[X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUs Differential Revision: http://reviews.llvm.org/D5934 llvm-svn: 222141	2014-11-17 16:17:51 +00:00
Chad Rosier	9a1ac6e494	[Reassociate] Canonicalize constants to RHS operand. Fix a thinko where the RHS was already a constant. llvm-svn: 222139	2014-11-17 15:52:51 +00:00
Renato Golin	609bf92365	Fix ARM triple parsing The triple parser should only accept existing architecture names when the triple starts with armv, armebv, thumbv or thumbebv. Patch by Gabor Ballabas. llvm-svn: 222129	2014-11-17 14:08:57 +00:00
Oliver Stannard	970b0d576c	[Thumb1] Re-write emitThumbRegPlusImmediate This was motivated by a bug which caused code like this to be miscompiled: declare void @take_ptr(i8) define void @test() { %addr1.32 = alloca i8 %addr2.32 = alloca i32, i32 1028 call void @take_ptr(i8 %addr1) ret void } This was emitting the following assembly to get the value of %addr1: add r0, sp, #1020 add r0, r0, #8 However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this could not be assembled. The generated object file contained this, resulting in r0 holding SP+8 rather tha SP+1028: add r0, sp, #1020 add r0, sp, #8 This function looked like it could have caused miscompilations for other combinations of registers and offsets (though I don't think it is currently called with these), and the heuristic it used did not match the emitted code in all cases. llvm-svn: 222125	2014-11-17 11:18:10 +00:00
David Majnemer	236b0ca790	Object, COFF: Tighten the object file parser We were a little lax in a few areas: - We pretended that import libraries were like any old COFF file, they are not. In fact, they aren't really COFF files at all, we should probably grow some specialized functionality to handle them smarter. - Our symbol iterators were more than happy to attempt to go past the end of the symbol table if you had a symbol with a bad list of auxiliary symbols. llvm-svn: 222124	2014-11-17 11:17:17 +00:00
Oliver Stannard	d29db9b949	Fix optimisations of SELECT_CC which assumed result is boolean Some optimisations in DAGCombiner cause miscompilations for targets that use TargetLowering::UndefinedBooleanContent, because they assume that the results of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and XORed. These optimisations are only valid for targets that use ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent. This is a follow-up to D6210/r221693. llvm-svn: 222123	2014-11-17 10:49:31 +00:00
Erik Eckstein	105374fe5e	Optimize switch lookup tables with linear mapping. This is a simple optimization for switch table lookup: It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output. Example: int f1(int x) { switch (x) { case 0: return 10; case 1: return 11; case 2: return 12; case 3: return 13; } return 0; } generates: define i32 @f1(i32 %x) #0 { entry: %0 = icmp ult i32 %x, 4 br i1 %0, label %switch.lookup, label %return switch.lookup: %switch.offset = add i32 %x, 10 ret i32 %switch.offset return: ret i32 0 } llvm-svn: 222121	2014-11-17 09:13:57 +00:00
Bob Wilson	a61a19037a	Fix CR/LF line endings in test case. llvm-svn: 222120	2014-11-17 08:00:45 +00:00
Rafael Espindola	a3b5b60753	Add back r222061 with a fix. This adds back r222061, but now calls initializePAEvalPass from the correct library to avoid link problems. Original message: Don't make assumptions about the name of private global variables. Private variables are can be renamed, so it is not reliable to make decisions on the name. The name is also dropped by the assembler before getting to the linker, so using the name causes a disconnect between how llvm makes a decision (var name) and how the linker makes a decision (section it is in). This patch changes one case where we were looking at the variable name to use the section instead. Test tuning by Michael Gottesman. llvm-svn: 222117	2014-11-17 02:28:27 +00:00
Frederic Riss	d431932d38	Implement MachODumper::printFileHeaders Patch by Chilledheart. Differential Revision: http://reviews.llvm.org/D6163 llvm-svn: 222115	2014-11-17 01:34:15 +00:00
Jingyue Wu	0fa125a77d	[DependenceAnalysis] Allow subscripts of different types Summary: Several places in DependenceAnalysis assumes both SCEVs in a subscript pair share the same integer type. For instance, isKnownPredicate calls SE->getMinusSCEV(X, Y) which asserts X and Y share the same type. However, DependenceAnalysis fails to ensure this assumption when producing a subscript pair, causing tests such as NonCanonicalizedSubscript to crash. With this patch, DependenceAnalysis runs unifySubscriptType before producing any subscript pair, ensuring the assumption. Test Plan: Added NonCanonicalizedSubscript.ll on which DependenceAnalysis before the fix crashed because subscripts have different types. Reviewers: spop, sebpop, jingyue Reviewed By: jingyue Subscribers: eliben, meheff, llvm-commits Differential Revision: http://reviews.llvm.org/D6289 llvm-svn: 222100	2014-11-16 16:52:44 +00:00
David Majnemer	0df1d12476	ScalarEvolution: HowFarToZero was wrongly using signed division HowFarToZero was supposed to use unsigned division in order to calculate the backedge taken count. However, SCEVDivision::divide performs signed division. Unless I am mistaken, no users of SCEVDivision actually want signed arithmetic: switch to udiv and urem. This fixes PR21578. llvm-svn: 222093	2014-11-16 07:30:35 +00:00
Andrea Di Biagio	e13a0b81f4	[DAG] Improved target independent vector shuffle folding logic. This patch teaches the DAGCombiner how to combine shuffles according to rules: shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2) llvm-svn: 222090	2014-11-15 22:56:25 +00:00
Simon Pilgrim	6d675f4e35	[X86][SSE] Improve legal SHUFP and PSHUFD shuffle matching Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input. As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector. Differential Revision: http://reviews.llvm.org/D6287 llvm-svn: 222087	2014-11-15 21:13:05 +00:00
Matt Arsenault	36094d788a	R600: Permute operands when selecting legacy min/max This gets the correct NaN behavior based on the compare type the hardware uses. This now passes the new piglit test I have for this on SI. Add stricter tests for the operand order. llvm-svn: 222079	2014-11-15 05:02:57 +00:00
Reid Kleckner	007239863e	Revert "Don't make assumptions about the name of private global variables." This reverts commit r222061. It's causing linker errors. llvm-svn: 222077	2014-11-15 02:03:53 +00:00
Rafael Espindola	2fc723099f	Don't make assumptions about the name of private global variables. Private variables are can be renamed, so it is not reliable to make decisions on the name. The name is also dropped by the assembler before getting to the linker, so using the name causes a disconnect between how llvm makes a decision (var name) and how the linker makes a decision (section it is in). This patch changes one case where we were looking at the variable name to use the section instead. Test tuning by Michael Gottesman. llvm-svn: 222061	2014-11-14 23:17:47 +00:00
Tim Northover	603d316517	ARM: refactor .cfi_def_cfa_offset emission. We use to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable. It's easier to just add any stack-adjusting instruction to a list and emit them together. llvm-svn: 222057	2014-11-14 22:45:33 +00:00
Tim Northover	9d2d218f49	ARM: correctly calculate the offset of FP in its push. When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring. llvm-svn: 222056	2014-11-14 22:45:31 +00:00
Tim Northover	a0691c8983	ARM: simplify test. The test's DWARF stubs were there just to trigger the emission of .cfi directives. Fortunately, the NetBSD ABI already demands proper DWARF unwind info, so it's easier to just use that triple. llvm-svn: 222055	2014-11-14 22:45:23 +00:00
Kevin Enderby	ae3c126135	Add the code and test cases for 64-bit ARM to llvm-objdump’s Mach-O symbolizer. FYI, removed the unused MCInstrAnalysis as it does not exist for 64-bit ARM and was causing a “couldn't initialize disassembler for target” error. llvm-svn: 222045	2014-11-14 21:52:18 +00:00
Frederic Riss	2700d03da5	Add a test for r222029 that doesn't rely on the default target being a COFF platform. llvm-svn: 222041	2014-11-14 21:23:26 +00:00
David Majnemer	8c3d92e7e5	InstCombine: Fix infinite loop caused by visitFPTrunc We would attempt to replace a fptrunc of an frem with an identical fptrunc. This would cause the new fptrunc to be added to the worklist. Of course, this results in an infinite loop because we will keep visiting the newly created fptruncs. This fixes PR21576. llvm-svn: 222040	2014-11-14 21:21:15 +00:00
Chad Rosier	1ff4c0bf0b	Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" This commit updates the failing test in Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll The failing test is sensitive to the order in which we process loads. This version turns on the RPO traversal instead of the while DT traversal in GVN. The new test code is functionally same just the order of loads that are eliminated is swapped. This new version also fixes an issue where GVN splits a critical edge and potentially invalidate the RPO/DT iterator. llvm-svn: 222039	2014-11-14 21:09:13 +00:00
Tom Stellard	bdd567d86d	R600/SI: Fix spilling of m0 register If we have spilled the value of the m0 register, then we need to restore it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't write to m0. v_readlane_b32 can't write to m0, so llvm-svn: 222036	2014-11-14 20:43:26 +00:00
Matt Arsenault	cc3c2b3946	R600/SI: Combine min3/max3 instructions llvm-svn: 222032	2014-11-14 20:08:52 +00:00
Frederic Riss	7c50047684	[dwarfdump] Handle relocations in Dwarf accelerator tables ELF targets (and maybe COFF) use relocations when referring to strings in the .debug_str section. Handle that in the accelerator table dumper. This commit restores the test/DebugInfo/cross-cu-inlining.ll test to its expected platform independant form, validating that the fix works (this test failed on linux boxes). llvm-svn: 222029	2014-11-14 19:30:08 +00:00
Matt Arsenault	72858935f7	R600/SI: Fix verifier error from a branch on IMPLICIT_DEF SIILowerI1Copies wasn't correctly handling this case. llvm-svn: 222020	2014-11-14 18:43:41 +00:00
Matt Arsenault	d28a7fde32	R600/SI: Match integer min / max instructions llvm-svn: 222015	2014-11-14 18:30:06 +00:00
Matt Arsenault	94812216ef	R600/SI: Use S_BFE_I64 for 64-bit sext_inreg llvm-svn: 222012	2014-11-14 18:18:16 +00:00
Chad Rosier	df8f2a23cb	[Reassociate] Canonicalize the operands of all binary operators. llvm-svn: 222008	2014-11-14 17:09:19 +00:00
Frederic Riss	39e1cda45b	Tentatively appease the bots. If this workaround gets the bots green, then we have to find out why the -dwarf-accel-tables=Enable option doesn't work as expected on non-darwin platforms. llvm-svn: 222007	2014-11-14 17:08:18 +00:00
Chad Rosier	d99df68e19	[Reassociate] Canonicalize operands of vector binary operators. Prior to this commit fmul and fadd binary operators were being canonicalized for both scalar and vector versions. We now canonicalize add, mul, and, or, and xor vector instructions. llvm-svn: 222006	2014-11-14 17:08:15 +00:00
Chad Rosier	f8b55f1bc5	[Reassociate] Canonicalize constants to RHS operand. llvm-svn: 222005	2014-11-14 17:05:59 +00:00
Frederic Riss	e837ec29c3	Reapply "[dwarfdump] Add support for dumping accelerator tables." This reverts commit r221842 which was a revert of r221836 and of the test parts of r221837. This new version fixes an UB bug pointed out by David (along with addressing some other review comments), makes some dumping more resilient to broken input data and forces the accelerator tables to be dumped in the tests where we use them (this decision is platform specific otherwise). llvm-svn: 222003	2014-11-14 16:15:53 +00:00
Cameron McInally	04400449c5	[AVX512] Add 512b masked integer shift by immediate patterns. llvm-svn: 222002	2014-11-14 15:43:00 +00:00
Tom Stellard	9d7ddd516e	R600/SI: Start implementing an assembler This was done using the Sparc and PowerPC AsmParsers as guides. So far it is very simple and only supports sopp instructions. llvm-svn: 221994	2014-11-14 14:08:00 +00:00
Bill Schmidt	7674692961	[PowerPC] Add VSX builtins for vec_div This patch adds builtin support for xvdivdp and xvdivsp, along with a test case. Straightforward stuff. There's a companion patch for Clang. llvm-svn: 221983	2014-11-14 12:10:40 +00:00
Tim Northover	d3be12a6c7	X86: use getConstant rather than getTargetConstant behind BUILD_VECTOR. getTargetConstant should only be used when you can guarantee the instruction selected will be able to cope with the raw value. BUILD_VECTOR is rather too generic for this so we should use getConstant instead. In that case, an instruction can still consume the constant, but if it doesn't it'll be materialised through its own round of ISel. Should fix PR21352. llvm-svn: 221961	2014-11-14 01:30:14 +00:00
Reid Kleckner	283bc2ed28	Allow the use of functions as typeinfo in landingpad clauses This is one step towards supporting SEH filter functions in LLVM. llvm-svn: 221954	2014-11-14 00:35:50 +00:00
Reed Kotler	d5c4196cb6	First stage of call lowering for Mips fast-isel Summary: This has most of what is needed for mips fast-isel call lowering for O32. What is missing I will add on the next patch because this patch is already too large. It should not be doing anything wrong but it will punt on some cases that it is basically capable of doing. The mechanism is there for parameters to be passed on the stack but I have not enabled it because it serves as a way for now to prevent some of the strange cases of O32 register passing that I have not fully checked yet and have some issues. The Mips O32 abi rules are very complicated as far how data is passed in floating and integer registers. However there is a way to think about this all very simply and this implementation reflects that. Basically, the ABI rules are written as if everything is passed on the stack and aligned as such. Once that is conceptually done, it is nearly trivial to reassign those locations to registers and then all the complexity disappears. So I have told tablegen that all the data is passed on the stack and during the lowering I fix this by assigning to registers as per the ABI doc. This has been my approach and you can line up what I did with the ABI document and see 1 to 1 what is going on. Test Plan: callabi.ll Reviewers: dsanders Reviewed By: dsanders Subscribers: jholewinski, echristo, ahatanak, llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5714 llvm-svn: 221948	2014-11-13 23:37:45 +00:00
Reid Kleckner	9aeb04793a	Fix symbol resolution of floating point libc builtins in MCJIT Fix for LLI failure on Windows\X86: http://llvm.org/PR5053 LLI.exe crashes on Windows\X86 when single precession floating point intrinsics like the following are used: acos, asin, atan, atan2, ceil, copysign, cos, cosh, exp, floor, fmin, fmax, fmod, log, pow, sin, sinh, sqrt, tan, tanh The above intrinsics are defined as inline-expansions in math.h, and are not exported by msvcr120.dll (Win32 API GetProcAddress returns null). For an FREM instruction, the JIT compiler generates a call to a stub for the fmodf() intrinsic, and adds a relocation to fixup at load time. The loader searches the libraries for the function, but fails because the symbol is not exported. So, the call target remains NULL and the execution crashes. Since the math functions are loaded at JIT/runtime, the JIT can patch CALL instruction directly instead of the searching the libraries' exported symbols. However, this fix caused build failures due to unresolved symbols like _fmodf at link time. Therefore, the current fix defines helper functions in the Runtime link/load library to perform the above operations. The address of these helper functions are used to patch up the CALL instruction at load time. Reviewers: lhames, rnk Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D5387 Patch by Swaroop Sridhar! llvm-svn: 221947	2014-11-13 23:32:52 +00:00
Reid Kleckner	29d880bdc7	Relax the gcov version.ll test to check '.' instead of '\' The escaping of the '\' doesn't work with my combination of testing tools. llvm-svn: 221944	2014-11-13 23:07:55 +00:00
Matt Arsenault	da59f3de45	R600/SI: Fix fmin_legacy / fmax_legacy matching for SI select_cc is expanded on SI, so this was never matched. llvm-svn: 221941	2014-11-13 23:03:09 +00:00
Chad Rosier	8716b58583	Revert "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE." This reverts commit r221924. It appears the commit was a bit premature and is causing bot failures that need further investigation. llvm-svn: 221939	2014-11-13 22:54:59 +00:00
Chandler Carruth	99b261ce6d	[x86] Add some tests for specific patterns of lane-flips combined with in-lane shuffles that aren't always handled well by the current vector shuffle lowering. No functionality change yet, that will follow in a subsequent commit. llvm-svn: 221938	2014-11-13 22:49:44 +00:00
Chad Rosier	dd526665fc	[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE. Phabricator Revision: http://reviews.llvm.org/D6103 Patch by "Balaram Makam" <bmakam@codeaurora.org>! llvm-svn: 221924	2014-11-13 21:17:58 +00:00
Juergen Ributzka	0af310d052	[FastISel][AArch64] Don't bail during simple GEP instruction selection. The generic FastISel code would bail, because it can't emit a sign-extend for AArch64. This copies the code over and uses AArch64 specific emit functions. This is not ideal and 'computeAddress' should handles this, so it can fold the address computation into the memory operation. I plan to clean up 'computeAddress' anyways, so I will add that in a future commit. Related to rdar://problem/18962471. llvm-svn: 221923	2014-11-13 20:50:44 +00:00
Matt Arsenault	7784992999	R600/SI: Use s_movk_i32 llvm-svn: 221922	2014-11-13 20:44:23 +00:00
Matt Arsenault	6ef66144f3	R600: Fix assert on empty function If a function is just an unreachable, this would hit a "this is not a MachO target" assertion because of setting HasSubsectionViaSymbols. llvm-svn: 221920	2014-11-13 20:07:40 +00:00
Matt Arsenault	cc8d3b8774	R600: Error on initializer for LDS. Also give a proper error for other address spaces. llvm-svn: 221917	2014-11-13 19:56:13 +00:00
Matt Arsenault	1cffa4c191	R600/SI: Get rid of FCLAMP_SI pseudo It's not necessary. Also use complex patterns to allow src modifier usage. llvm-svn: 221916	2014-11-13 19:49:04 +00:00
Matt Arsenault	581a7a6933	R600/SI: Allow commuting with src2_modifiers llvm-svn: 221911	2014-11-13 19:26:50 +00:00
Matt Arsenault	95e48668b6	R600/SI: Allow commuting some 3 op instructions e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c This simplifies matching v_madmk_f32. This looks somewhat surprising, but it appears to be OK to do this. We can commute src0 and src1 in all of these instructions, and that's all that appears to matter. llvm-svn: 221910	2014-11-13 19:26:47 +00:00
Tim Northover	631cc9ce1a	ARM: allow constpool entry to be moved to the user's block in all cases. Normally entries can only move to a lower address, but when that wasn't viable, the user's block was considered anyway. Unfortunately, it went via createNewWater which wasn't designed to handle the case where there's already an island after the block. Unfortunately, the test we have is slow and fragile, and I couldn't reduce it to anything sane even with the @llvm.arm.space intrinsic. The test change here is recreating the previous one after the change. rdar://problem/18545506 llvm-svn: 221905	2014-11-13 17:58:53 +00:00
Tim Northover	ab85dcc7b8	ARM: avoid duplicating branches during constant islands. We were using a naive heuristic to determine whether a basic block already had an unconditional branch at the end. This mostly corresponded to reality (assuming branches got optimised) because there's not much point in a branch to the next block, but could go wrong. llvm-svn: 221904	2014-11-13 17:58:51 +00:00
Tim Northover	650b0ee53b	ARM: add @llvm.arm.space intrinsic for testing ConstantIslands. Creating tests for the ConstantIslands pass is very difficult, since it depends on precise layout details. Having the ability to precisely inject a number of bytes into the stream helps greatly. llvm-svn: 221903	2014-11-13 17:58:48 +00:00
Elena Demikhovsky	d5e95b57e0	AVX-512: SINT_TO_FP cost model and some bugfixes Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883	2014-11-13 11:46:16 +00:00
Hal Finkel	e353eba33b	OCAMLFLAGS can contain =, don't use = with sed Like HOST_LDFLAGS, etc. OCAMLFLAGS can contain =, so use ! as the substitution separator instead of = (otherwise, sed might error). llvm-svn: 221879	2014-11-13 09:29:30 +00:00
Hal Finkel	45ba2c10e4	Revert r219432 - "Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information.""" Let's try this again... This reverts r219432, plus a bug fix. Description of the bug in r219432 (by Nick): The bug was using AllPositive to break out of the loop; if the loop break condition i != e is changed to i != e && AllPositive then the test_modulo_analysis_with_global test I've added will fail as the Modulo will be calculated incorrectly (as the last loop iteration is skipped, so Modulo isn't updated with its Scale). Nick also adds this comment: ComputeSignBit is safe to use in loops as it takes into account phi nodes, and the == EK_ZeroEx check is safe in loops as, no matter how the variable changes between iterations, zero-extensions will always guarantee a zero sign bit. The isValueEqualInPotentialCycles check is therefore definitely not needed as all the variable analysis holds no matter how the variables change between loop iterations. And this patch also adds another enhancement to GetLinearExpression - basically to convert ConstantInts to Offsets (see test_const_eval and test_const_eval_scaled for the situations this improves). Original commit message: This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick): The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 221876	2014-11-13 09:16:54 +00:00
Chandler Carruth	fee91883f4	[x86] Teach the vector shuffle lowering to make a more nuanced decision between splitting a vector into 128-bit lanes and recombining them vs. decomposing things into single-input shuffles and a final blend. This handles a large number of cases in AVX1 where the cross-lane shuffles would be much more expensive to represent even though we end up with a fast blend at the root. Instead, we can do a better job of shuffling in a single lane and then inserting it into the other lanes. This fixes the remaining bits of Halide's regression captured in PR21281 for AVX1. However, the bug persists in AVX2 because I've made this change reasonably conservative. The cases where it makes sense in AVX2 to split into 128-bit lanes are much more rare because we can often do full permutations across all elements of the 256-bit vector. However, the particular test case in PR21281 is an example of one of the rare cases where it is always better to work in a single 128-bit lane. I'm going to try to teach the logic to detect and form the good code even in AVX2 next, but it will need to use a separate heuristic. Finally, there is one pesky regression here where we previously would craftily use vpermilps in AVX1 to shuffle both high and low halves at the same time. We no longer pull that off, and not for any really good reason. Ultimately, I think this is just another missing nuance to the selection heuristic that I'll try to add in afterward, but this change already seems strictly worth doing considering the magnitude of the improvements in common matrix math shuffle patterns. As always, please let me know if this causes a surprising regression for you. llvm-svn: 221861	2014-11-13 04:06:10 +00:00
Rui Ueyama	ffa4cebe91	llvm-readobj: Print out address table when dumping COFF delay-import table llvm-svn: 221855	2014-11-13 03:22:54 +00:00
Frederic Riss	0f7abef2cf	Add an assert and a test that verify r221709's fix. llvm-svn: 221854	2014-11-13 03:20:23 +00:00
Chandler Carruth	253dd39a9a	[x86] Don't form overly fragmented blends when splitting and re-combining shuffles because nothing was available in the wider vector type. The key observation (which I've put in the comments for future maintainers) is that at this point, no further combining is really possible. And so even though these shuffles trivially could be combined, we need to actually do that as we produce them when producing them this late in the lowering. This fixes another (huge) part of the Halide vector shuffle regressions. As it happens, this was already well covered by the tests, but I hadn't noticed how bad some of these got. The specific patterns that turn directly into unpckl/h patterns were occurring many times in common vector processing code. There are still more problems here sadly, but trying to incrementally tease them apart and it looks like this is the core of the problem in the splitting logic. There is some chance of regression here, you can see it in the test changes. Specifically, where we stop forming pshufb in some cases, it is possible that pshufb was in fact faster. Intel "says" that pshufb is slower than the instruction sequences replacing it. llvm-svn: 221852	2014-11-13 02:42:08 +00:00
Quentin Colombet	f5485bb008	[CodeGenPrepare] Handle zero extensions in the TypePromotionHelper. Prior to this patch the TypePromotionHelper was promoting only sign extensions. Supporting zero extensions changes: - How constants are extended. - How sign extensions, zero extensions, and truncate are composed together. - How the type of the extended operation is recorded. Now we need to know the kind of the extension as well as its type. Each change is fairly small, unlike the diff. Most of the diff are comments/variable renaming to say "extension" instead of "sign extension". The performance improvements on the test suite are within the noise. Related to <rdar://problem/18310086>. llvm-svn: 221851	2014-11-13 01:44:51 +00:00
Juergen Ributzka	957a1454cc	[FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value. Optimize selects of i1 in the presence of 'true' and 'false' operands to simple logic operations. This fixes rdar://problem/18960150. llvm-svn: 221848	2014-11-13 00:36:46 +00:00
Juergen Ributzka	424c5fd12f	[FastISel][AArch64] Fold the cmp into the select when possible. This folds the compare emission into the select emission when possible, so we can directly use the flags and don't have to emit a separate compare. Related to rdar://problem/18960150. llvm-svn: 221847	2014-11-13 00:36:43 +00:00
Juergen Ributzka	d1a042abd0	[FastISel][AArch64] Extend 'select' lowering to support also i1 to i16. Related to rdar://problem/18960150. llvm-svn: 221846	2014-11-13 00:36:38 +00:00
Frederic Riss	e1f4958122	Revert "[dwarfdump] Add support for dumping accelerator tables." This reverts commit r221836. The tests are asserting on some buildbots. This also reverts the test part of r221837 as it relies on dwarfdump dumping the accelerator tables. llvm-svn: 221842	2014-11-13 00:15:15 +00:00
Sanjoy Das	c5676df3ec	Teach ScalarEvolution to sharpen range information. If x is known to have the range [a, b), in a loop predicated by (icmp ne x, a) its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. This change was originally landed in r219834 but had a bug and broke ASan. It was reverted in r219878, and is now being re-landed after fixing the original bug. phabricator: http://reviews.llvm.org/D5639 reviewed by: atrick llvm-svn: 221839	2014-11-13 00:00:58 +00:00
Frederic Riss	3a6b354b3e	Fix emission of Dwarf accelerator table when there are multiple CUs. The DIE offset in the accel tables is an offset relative to the start of the debug_info section, but we were encoding the offset to the start of the containing CU. llvm-svn: 221837	2014-11-12 23:48:14 +00:00
Frederic Riss	39467276d0	[dwarfdump] Add support for dumping accelerator tables. The class used for the dump only allows to dump for the moment, but it can (and will) be easily extended to support search also. llvm-svn: 221836	2014-11-12 23:48:10 +00:00
Ahmed Bougacha	0788d49a40	[CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead. Fixes PR21548. Related to PR20474. llvm-svn: 221820	2014-11-12 22:16:55 +00:00
Sanjay Patel	f6f7d5d1dd	Expose the number of Newton-Raphson iterations applied to the hardware's reciprocal estimate as a parameter (x86). This is a follow-on to r221706 and r221731 and discussed in more detail in PR21385. This patch also loosens the testcase checking for btver2. We know that the "1.0" will be loaded, but we can't tell exactly when, so replace the CHECK-NEXT specifiers with plain CHECKs. The CHECK-NEXT sequence relied on a quirk of post-RA-scheduling that may change independently of anything in these tests. llvm-svn: 221819	2014-11-12 21:39:01 +00:00
Timur Iskhodzhanov	0e76a16200	Temporary fix for PR21528 - use mangled C++ function names in COFF debug info to un-break ASan on Windows llvm-svn: 221813	2014-11-12 20:21:20 +00:00
Timur Iskhodzhanov	a11b32b7e5	[COFF] Make it clearer that the symbols subsection holds function display name rather than just name llvm-svn: 221812	2014-11-12 20:10:09 +00:00
Cameron McInally	73a6bca32b	[AVX512] Add integer shift by immediate intrinsics. llvm-svn: 221811	2014-11-12 19:58:54 +00:00
Sanjay Patel	4c219fd248	CGSCC should not treat intrinsic calls like function calls (PR21403) Make the handling of calls to intrinsics in CGSCC consistent: they are not treated like regular function calls because they are never lowered to function calls. Without this patch, we can get dangling pointer asserts from the subsequent loop that processes callsites because it already ignores intrinsics. See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion. Differential Revision: http://reviews.llvm.org/D6124 llvm-svn: 221802	2014-11-12 18:25:47 +00:00
Jingyue Wu	8a12cea5f1	Disable indvar widening if arithmetics on the wider type are more expensive Summary: Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test was run whether NVPTX was enabled or not. IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. This test is run only when NVPTX is enabled. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221799	2014-11-12 18:09:15 +00:00
Justin Hibbits	21c5353f54	Revert part of the PIC tests (TLS part) This change actually wasn't warranted for -O0, and the new changes prove it and break the build. llvm-svn: 221793	2014-11-12 16:50:15 +00:00
Justin Hibbits	b296c9735e	Fix thet tests. I seem to have missed the update I made for changing 'flag_pic' to "PIC Level". Mea culpa. llvm-svn: 221792	2014-11-12 16:40:00 +00:00
Justin Hibbits	a88b605721	Add support for small-model PIC for PowerPC. Summary: Large-model was added first. With the addition of support for multiple PIC models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This generates more optimal code, for shared libraries with less than about 16380 data objects. Test Plan: Test cases added or updated Reviewers: joerg, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, mcrosier, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D5399 llvm-svn: 221791	2014-11-12 15:16:30 +00:00
Rafael Espindola	301396c911	Fix the test. It was broken since r221708. llvm-svn: 221783	2014-11-12 14:23:04 +00:00
Zoran Jovanovic	fd888630b5	[mips][micromips] Add predicate 'InMicroMips' at CodeGen patterns for microMIPS instructions Differential Revision: http://reviews.llvm.org/D6198 llvm-svn: 221780	2014-11-12 13:30:10 +00:00
Chandler Carruth	0c922fcec5	[x86] Start improving the matching of unpck instructions based on test cases from Halide folks. This initial step was extracted from a prototype change by Clay Wood to try and address regressions found with Halide and the new vector shuffle lowering. llvm-svn: 221779	2014-11-12 10:05:18 +00:00
Chandler Carruth	ce6947d4cf	[x86] Clean up a bunch of vector shuffle tests with my script. Notably, removes windows line endings and other noise. This is in prelude to making substantive changes to these tests. llvm-svn: 221776	2014-11-12 09:17:15 +00:00
Elena Demikhovsky	be8808dc3f	AVX-512: Intrinsics for ERI 3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms. Intrinsics include SAE (Suppres All Exceptions) parameter. http://reviews.llvm.org/D6214 llvm-svn: 221774	2014-11-12 07:31:03 +00:00
Jingyue Wu	a48273390c	Reverts r221772 which fails tests llvm-svn: 221773	2014-11-12 07:19:25 +00:00
Jingyue Wu	635a9b14fa	Disable indvar widening if arithmetics on the wider type are more expensive Summary: IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221772	2014-11-12 06:58:45 +00:00
Bill Schmidt	729547847f	[PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC, which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions. New LLVM intrinsics are provided to represent these four instructions in IntrinsicsPowerPC.td. These are patterned after the similar intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied to the code gen patterns, with additional patterns to allow plain vanilla loads and stores to still generate these instructions. At -O1 and higher the intrinsics are immediately converted to loads and stores in InstCombineCalls.cpp. This will open up more optimization opportunities while still allowing the correct instructions to be generated. (Similar code exists for aligned Altivec loads and stores.) The new intrinsics are added to the code that checks for consecutive loads and stores in PPCISelLowering.cpp, as well as to PPCTargetLowering::getTgtMemIntrinsic(). There's a new test to verify the correct instructions are generated. The loads and stores tend to be reordered, so the test just counts their number. It runs at -O2, as it's not very effective to test this at -O0, when many unnecessary loads and stores are generated. I ended up having to modify vsx-fma-m.ll. It turns out this test case is slightly unreliable, but I don't know a good way to prevent problems with it. The xvmaddmdp instructions read and write the same register, which is one of the multiplicands. Commutativity allows either to be chosen. If the FMAs are reordered differently than expected by the test, the register assignment can be different as a result. Hopefully this doesn't change often. There is a companion patch for Clang. llvm-svn: 221767	2014-11-12 04:19:40 +00:00
Nick Kledzik	f44dbda542	Object, support both mach-o archive t.o.c file names For historical reasons archives on mach-o have two possible names for the file containing the table of contents for the archive: "__.SYMDEF SORTED" and "__.SYMDEF". But the libObject archive reader only supported the former. This patch fixes llvm::object::Archive to support both names. llvm-svn: 221747	2014-11-12 01:37:45 +00:00
Chad Rosier	f53f07046b	[Reassociate] Canonicalize negative constants out of expressions. Add support for FDiv, which was regressed by the previous commit. llvm-svn: 221738	2014-11-11 23:36:42 +00:00
Philip Reames	66c6de61ee	Canonicalize an assume(load != null) into !nonnull metadata We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the second in to the former. This avoids a need to implement optimizations using both forms. We should probably extend this basic idea to metadata of other forms; in particular, range metadata. We view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something. Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D5951 llvm-svn: 221737	2014-11-11 23:33:19 +00:00
Juergen Ributzka	89441b0dd8	[FastISel][AArch64] Add support for fabs intrinsic. Lower the llvm.fabs intrinsic to the 'fabs' MI instruction. This fixes rdar://problem/18946552. llvm-svn: 221729	2014-11-11 23:10:44 +00:00
Chad Rosier	094ac7735b	[Reassociate] Canonicalize negative constants out of expressions. This is a reapplication of r221171, but we only perform the transformation on expressions which include a multiplication. We do not transform rem/div operations as this doesn't appear to be safe in all cases. llvm-svn: 221721	2014-11-11 22:58:35 +00:00
Kostya Serebryany	29a18dcbc5	Move asan-coverage into a separate phase. Summary: This change moves asan-coverage instrumentation into a separate Module pass. The other part of the change in clang introduces a new flag -fsanitize-coverage=N. Another small patch will update tests in compiler-rt. With this patch no functionality change is expected except for the flag name. The following changes will make the coverage instrumentation work with tsan/msan Test Plan: Run regression tests, chromium. Reviewers: nlewycky, samsonov Reviewed By: nlewycky, samsonov Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6152 llvm-svn: 221718	2014-11-11 22:14:37 +00:00
Tom Roeder	eb7a303d1b	Add Forward Control-Flow Integrity. This commit adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails. The default failure function ignores the error and continues. This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site. Also, this removes support for jumptables in ARM, pending further performance analysis and discussion. Review: http://reviews.llvm.org/D4167 llvm-svn: 221708	2014-11-11 21:08:02 +00:00
Colin LeMahieu	eb4675fb29	[llvm-mc] Fixing case where if a file ended with non-newline whitespace or a comma it would access invalid memory. Cleaned up parse loop. llvm-svn: 221707	2014-11-11 21:03:09 +00:00
Sanjay Patel	e2e589288f	Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385). This is a first step for generating SSE rcp instructions for reciprocal calcs when fast-math allows it. This is very similar to the rsqrt optimization enabled in D5658 ( http://reviews.llvm.org/rL220570 ). For now, be conservative and only enable this for AMD btver2 where performance improves significantly both in terms of latency and throughput. We may never enable this codegen for Intel Core* chips because the divider circuits are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21 cycle critical path for the rcp + mul + sub + mul + add estimate. Follow-on patches may allow configuration of the number of Newton-Raphson refinement steps, add AVX512 support, and enable the optimization for more chips. More background here: http://llvm.org/bugs/show_bug.cgi?id=21385 Differential Revision: http://reviews.llvm.org/D6175 llvm-svn: 221706	2014-11-11 20:51:00 +00:00
Rafael Espindola	07e694d293	Simplify testcase. NFC. Thanks to Filipe Cabecinhas for the tip. llvm-svn: 221705	2014-11-11 20:49:16 +00:00
Bill Schmidt	3d9674cfb1	[PowerPC] Replace foul hackery with real calls to __tls_get_addr My original support for the general dynamic and local dynamic TLS models contained some fairly obtuse hacks to generate calls to __tls_get_addr when lowering a TargetGlobalAddress. Rather than generating real calls, special GET_TLS_ADDR nodes were used to wrap the calls and only reveal them at assembly time. I attempted to provide correct parameter and return values by chaining CopyToReg and CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not fully correct. Problems were seen with two back-to-back stores to TLS variables, where the call sequences ended up overlapping with unhappy results. Additionally, since these weren't real calls, the proper register side effects of a call were not recorded, so clobbered values were kept live across the calls. The proper thing to do is to lower these into calls in the first place. This is relatively straightforward; see the changes to PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp. The changes here are standard call lowering, except that we need to track the fact that these calls will require a relocation. This is done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the TargetGlobalAddress operand that appears earlier in the sequence. The calls to LowerCallTo() eventually find their way to LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(), which calls PrepareCall(). In PrepareCall(), we detect the calls to __tls_get_addr and immediately snag the TargetGlobalTLSAddress with the annotated relocation information. This becomes an extra operand on the call following the callee, which is expected for nodes of type tlscall. We change the call opcode to CALL_TLS for this case. Back in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only, since we require a TOC-restore nop following the call for the 64-bit ABIs. During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td convert the CALL_TLS nodes into BL_TLS nodes, and convert the CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS nodes can now be emitted normally using their patterns and the associated printTLSCall print method. Finally, as a result of these changes, all references to get-tls-addr in its various guises are no longer used, so they have been removed. There are existing TLS tests to verify the changes haven't messed anything up). I've added one new test that verifies that the problem with the original code has been fixed. llvm-svn: 221703	2014-11-11 20:44:09 +00:00
Rafael Espindola	a9c28b68cd	Use a 8 bit immediate when possible. This fixes pr21529. llvm-svn: 221700	2014-11-11 19:46:36 +00:00
Dario Domizioli	e904e85faf	[X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access The ISel lowering for global TLS access in PIC mode was creating a pseudo instruction that is later expanded to a call, but the code was not setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack flag. This caused some functions to be mistakenly recognized as leaf functions, and this in turn affected the decision to eliminate the frame pointer. With the fix, hasCalls is properly set and the leaf frame pointer is correctly preserved. llvm-svn: 221695	2014-11-11 18:44:49 +00:00
Oliver Stannard	8c2c67e63c	LLVM incorrectly folds xor into select LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with (set_cc !cc x y), which is only correct when the xor has type i1. Instead, we should check that the constant operand to the xor is all ones. llvm-svn: 221693	2014-11-11 17:36:01 +00:00
Vasileios Kalintiris	b2dd15f8c7	[mips] Add preliminary support for the MIPS II target. Summary: This patch enables code generation for the MIPS II target. Pre-Mips32 targets don't have the MUL instruction, so we add the correspondent pattern that uses the MULT/MFLO combination in order to retrieve the product. This is WIP as we don't support code generation for select nodes due to the lack of conditional-move instructions. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6150 llvm-svn: 221686	2014-11-11 11:43:55 +00:00
Vasileios Kalintiris	8c1c95e95c	[mips] Add hardware register name "hwr_ulr" ($29) The canonical name when printing assembly is still $29. The reason is that GAS does not accept "$hwr_ulr" at the moment. This addresses the comments from r221307, which reverted the original commit r221299. llvm-svn: 221685	2014-11-11 11:22:39 +00:00
Andrea Di Biagio	5fa2e15453	[X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'. This helps the DAGCombiner to identify more opportunities to fold shuffles. llvm-svn: 221684	2014-11-11 11:20:31 +00:00
Vasileios Kalintiris	10b5ba3f6e	Recommit "[mips] Add names and tests for the hardware registers" The original commit r221299 was reverted in r221307. I removed the name "hrw_ulr" ($29) from the original commit because two tests were failing. llvm-svn: 221681	2014-11-11 10:31:31 +00:00
David Majnemer	185b5b1d24	llvm-objdump: Skip empty sections when dumping contents Empty sections are just noise when using objdump. This is similar to what binutils does. llvm-svn: 221680	2014-11-11 09:58:25 +00:00
Manuel Klimek	bfd0039e73	Was convinced in commit comments that requiring a specific python version is the wrong approach; reverting. llvm-svn: 221679	2014-11-11 08:53:18 +00:00
David Majnemer	2cc4bc77bf	MC, COFF: Use relocations for function references inside the section Referencing one symbol from another in the same section does not generally require a relocation. However, the MS linker has a feature called /INCREMENTAL which enables incremental links. It achieves this by creating thunks to the actual function and redirecting all relocations to point to the thunk. This breaks down with the old scheme if you have a function which references, say, itself. On x86_64, we would use %rip relative addressing to reference the start of the function from out current position. This would lead to miscompiles because other references might reference the thunk instead, breaking function pointer equality. This fixes PR21520. llvm-svn: 221678	2014-11-11 08:43:57 +00:00
Suyog Sarda	beb064bd94	Addition to r216371 (SLP and Loop Vectorization) and r218607 where cost model for signed division by power of 2 was improved for AArch64. The revision r218607 missed test case for Loop Vectorization. Adding it in this revision. Differential Revision: http://reviews.llvm.org/D6181 llvm-svn: 221674	2014-11-11 07:39:27 +00:00
Michael Kuperstein	3fe15e498f	[X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Recommitting - This time, with a hopefully working test. Differential Revision: http://reviews.llvm.org/D6128 llvm-svn: 221672	2014-11-11 07:07:40 +00:00
Rafael Espindola	f7b5ba0621	Only run the gold plugin tests if gold supports the targets we test with. This fixes pr21345. llvm-svn: 221669	2014-11-11 05:27:12 +00:00
Quentin Colombet	360460ba64	[X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if AVX2 is available. According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one. Althought this lowering kicks in some SPECs benchmarks, the performance improvement was within the noise. Correctness testing has been done for the whole range of uint32_t with the following program: uint4 v = (uint4) {0,1,2,3}; uint32_t i; //Check correctness over entire range for uint4 -> float4 conversion for( i = 0; i < 1U << (32-2); i++ ) { float4 t = test(v); float4 c = correct(v); if( 0xf != _mm_movemask_ps( t == c )) { printf( "Error @ %vx: %vf vs. %vf\n", v, c, t); return -1; } v += 4; } Where "correct" is the old lowering and "test" the new one. The patch adds a test case for the two custom lowering instruction. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads). rdar://problem/18153096> llvm-svn: 221657	2014-11-11 02:23:47 +00:00
Chad Rosier	a9ae3e311c	[yaml2obj] Support AArch64 relocations. Patch by Daniel Stewart <stewartd@codeaurora.org>! Phabricator Revision: http://reviews.llvm.org/D6192 llvm-svn: 221639	2014-11-10 23:02:03 +00:00

... 3 4 5 6 7 ...

27392 Commits