llvm-project

Commit Graph

Author	SHA1	Message	Date
Nick Kledzik	3e95fa431e	[llvm-objdump] clean up test cases now that build bots are green llvm-svn: 217985	2014-09-17 21:53:07 +00:00
Justin Bogner	5cbed6e09e	llvm-cov: Push some more debug output into the View (NFC) llvm-svn: 217984	2014-09-17 21:48:52 +00:00
Rafael Espindola	51bd8ee309	Internalize common symbols when we can. This fixes pr20974. llvm-svn: 217981	2014-09-17 20:41:13 +00:00
Juergen Ributzka	c611d72754	[FastISel][AArch64] Simplify mul to shift when possible. This is related to rdar://problem/18369687. llvm-svn: 217980	2014-09-17 20:35:41 +00:00
Alexey Samsonov	7bddb0a56a	Exclude known and bugzilled failures from UBSan bootstrap llvm-svn: 217979	2014-09-17 20:17:52 +00:00
Juergen Ributzka	3871c69422	[FastISel][AArch64] Fold mul into add/sub and logical operations. Try to fold the multiply into the add/sub or logical operations (when possible). This is related to rdar://problem/18369687. llvm-svn: 217978	2014-09-17 19:51:38 +00:00
Juergen Ributzka	22d4cd0a4f	[FastISel][AArch64] Fold mul into the address computation of memory operations. Teach 'computeAddress' to also fold multiplies into the address computation (when possible). This fixes rdar://problem/18369443. llvm-svn: 217977	2014-09-17 19:19:31 +00:00
Robin Morisset	bf26f8fd56	Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" It is breaking the build on the buildbots but works fine on my machine, I revert while trying to understand what happens (it appears to depend on the compiler used to build, I probably used a C++11 feature that is not perfectly supported by some of the buildbots). This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc. llvm-svn: 217973	2014-09-17 18:09:13 +00:00
Juergen Ributzka	d8e30c0db8	[FastISel][AArch64] Fold compare with zero and branch into CBZ and CBNZ. This takes advantage of the CBZ and CBNZ instructions to further optimize the common null check pattern into a single instruction. This is related to rdar://problem/18358882. llvm-svn: 217972	2014-09-17 18:05:34 +00:00
Juergen Ributzka	fb3e14375a	[FastISel][AArch64] Improve branch selection to support all FP conditions. This adds the last two missing floating-point condition codes (FCMP_UEQ and FCMP_ONE) also to the branch selection. In these two cases an additional branch instruction is required. This also adds unit tests to check all the different condition codes. This is related to rdar://problem/18358882. llvm-svn: 217966	2014-09-17 17:46:47 +00:00
Robin Morisset	1c8a457575	[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors Summary: I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5304 llvm-svn: 217965	2014-09-17 17:41:16 +00:00
Matt Arsenault	02dc26529e	R600/SI: Change formatting of printed FP immediates Only 1 decimal place should be printed for inline immediates. Other constants should be hex constants. Does not include f64 tests because folding those inline immediates currently does not work. llvm-svn: 217964	2014-09-17 17:32:13 +00:00
Chad Rosier	307b50b0f6	[IndVarSimplify] Partially revert r217953 to see if this fixes the bots. Specifically, disable widening of unsigned compare instructions. llvm-svn: 217962	2014-09-17 16:35:09 +00:00
Chad Rosier	bb99f40530	[IndVarSimplify] Widen loop compare instructions. This improves other optimizations such as LSR. A sext may be added to the compare's other operand, but this can often be hoisted outside of the loop. llvm-svn: 217953	2014-09-17 14:10:33 +00:00
Andrea Di Biagio	5b92b4971a	[InstCombine] Fix wrong folding of constant comparison involving ashr and negative quantities (PR20945). Example: define i1 @foo(i32 %a) { %shr = ashr i32 -9, %a %cmp = icmp ne i32 %shr, -5 ret i1 %cmp } Before this fix, the instruction combiner wrongly thought that %shr could have never been equal to -5. Therefore, %cmp was always folded to 'true'. However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore, in this example, it is not valid to fold %cmp to 'true'. The problem was only affecting the case where the comparison was between negative quantities where one of the quantities was obtained from arithmetic shift of a negative constant. This patch fixes the problem with the wrong folding (fixes PR20945). With this patch, the 'icmp' from the example is now simplified to a comparison between %a and 1. This still allows us to get rid of the arithmetic shift (%shr). llvm-svn: 217950	2014-09-17 11:32:31 +00:00
Toma Tabacu	351b2feeb3	[mips] Add assembler support for the .set nodsp directive. Summary: This directive is used to tell the assembler to reject DSP-specific instructions. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5142 llvm-svn: 217946	2014-09-17 09:01:54 +00:00
Pavel Chupin	37b65d81dd	[x32] Fix function indirect calls Summary: Zero-extend register to 64-bit for callq/jmpq. Test Plan: 3 tests added Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5355 llvm-svn: 217942	2014-09-17 07:09:23 +00:00
David Majnemer	b435a4214e	InstSimplify: Don't allow (x srem y) urem y -> x srem y Let's consider the case where: %x i16 = 32768 %y i16 = 384 %x srem %y = 65408 (%x srem %y) urem %y = 128 llvm-svn: 217939	2014-09-17 04:16:35 +00:00
David Majnemer	ac717f0972	InstSimplify: ((X % Y) % Y) -> (X % Y) Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D5350 llvm-svn: 217937	2014-09-17 03:34:34 +00:00
Nick Kledzik	3006130a8e	[llvm-objdump] properly use c_str() with format("%s"). Improve getLibraryShortNameByIndex() error handling. llvm-svn: 217930	2014-09-17 00:25:22 +00:00
Quentin Colombet	ac55b15bf4	[CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting instructions when truncate, sext, or zext were created. Fix that. llvm-svn: 217926	2014-09-16 22:36:07 +00:00
Nick Kledzik	53a80d3a46	tweak test case for debugging bot llvm-svn: 217906	2014-09-16 21:29:54 +00:00
Kevin Enderby	98c9accace	Hook up the MCSymbolizer to llvm-objdump’s disassembly for Mach-O files. The first step, done in this commit, is to flesh out enough of the SymbolizerGetOpInfo() routine to symbolicate an X86_64 hello world .o and its loading of the literal string and call to printf. The code to symbolicate the X86_64_RELOC_SUBTRACTOR relocation and a test are also added to show a slightly more complicated case. Next will be to flesh out enough of SymbolizerSymbolLookUp() to get the literal string “Hello world” printed as a comment on the instruction that loads the pointer to it. llvm-svn: 217893	2014-09-16 18:00:57 +00:00
Adam Nemet	e5a07167f5	[TableGen] Fully resolve class-instance values before defs in multiclasses By class-instance values I mean 'Class<Arg>' in 'Class<Arg>.Field' or in 'Other<Class<Arg>>' (syntactically a SimpleValue). This is to differentiate from unnamed/anonymous record definitions (syntactically an ObjectBody) which are not affected by this change. Consider the testcase: class Struct<int i> { int I = !shl(i, 1); int J = !shl(I, 1); } class Class<Struct s> { int Class_J = s.J; } multiclass MultiClass<int i> { def Def : Class<Struct<i>>; } defm Defm : MultiClass<2>; Before this fix, DefmDef.Class_J yields !shl(I, 1) instead of 8. This is the sequence of events. We start with this: multiclass MultiClass<int i> { def Def : Class<Struct<i>>; } During ParseDef the anonymous object for the class-instance value is created: multiclass Multiclass<int i> { def anonymous_0 : Struct<i>; def Def : Class<NAME#anonymous_0>; } Then class Struct<i> is added to anonymous_0. Also Class<NAME#anonymous_0> is added to Def: multiclass Multiclass<int i> { def anonymous_0 { int I = !shl(i, 1); int J = !shl(I, 1); } def Def { int Class_J = NAME#anonymous_0.J; } } So far so good but then we move on to instantiating this in the defm by substituting the template arg 'i'. This is how the anonymous prototype looks after fully instantiating. defm Defm = { def Defmanonymous_0 { int I = 4; int J = !shl(I, 1); } Note that we only resolved the reference to the template arg. The non-template-arg reference in 'J' has not been resolved yet. Then we go on to instantiating the Def prototype: def DefmDef { int Class_J = NAME#anonymous_0.J; } Which is resolved to Defmanonymous_0.J and then to !shl(I, 1). When we fully resolve each record in a defm, Defmanonymous_0.J does get set to 8 but that's too late for its use. The patch adds a new attribute to the Record class that indicates that this def is actually a class-instance value that may be used by other defs in a multiclass. (This is unlike regular defs which don't reference each other and thus can be resolved independently.) They are then fully resolved before the other defs while the multiclass is instantiated. I added vg_leak to the new test. I am not sure if this is necessary but I don't think I have a way to test it. I can also check in without the XFAIL and let the bots test this part. Also tested that X86.td.expanded and AArch64.td.expanded were unchanged before and after this change. (This issue triggering this problem is a WIP patch.) Part of <rdar://problem/17688758> llvm-svn: 217886	2014-09-16 17:14:13 +00:00
Toma Tabacu	65f1057191	[mips] Improve the error messages given by MipsAsmParser. Summary: Changed error messages to be more informative and to resemble other clang/llvm error messages (first letter is lower case, no ending punctuation) and updated corresponding tests. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5065 llvm-svn: 217873	2014-09-16 15:00:52 +00:00
Toma Tabacu	18227e6f20	[mips] Move 32-bit ADDiu instruction alias from Mips64InstrInfo.td to MipsInstrInfo.td. Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5244 llvm-svn: 217868	2014-09-16 10:19:03 +00:00
Toma Tabacu	25cdd222b0	[mips] Marked the ADDi instruction aliases as not available in Mips32R6 and Mips64R6. Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5242 llvm-svn: 217867	2014-09-16 09:26:09 +00:00
Tilmann Scheller	40fc9595c8	[InstCombine] Remove redundant test case. Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D5284 llvm-svn: 217865	2014-09-16 08:50:10 +00:00
Elena Demikhovsky	27012478d2	AVX-512: added cost for some AVX-512 instructions llvm-svn: 217863	2014-09-16 07:57:37 +00:00
Nick Kledzik	c17c8093db	tweak test case to help build bot llvm-svn: 217860	2014-09-16 04:51:38 +00:00
Hal Finkel	cc4f31d3d7	Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector types The default implementation of getCmpSelInstrCost, which provides the cost of icmp/fcmp/select instructions, did not deal sensibly with illegal vector types that were scalarized. We'd ask for the legalization cost of the vector type, which would return something like (4, f64) given an input of <4 x double>, and we'd then check the TLI status of the ISD opcode on that scalar type. This would result in querying (ISD::VSELECT, f64), for example. Amusingly enough, ISD::VSELECT on scalar types is marked as Legal by default (as with most other operations), and most backends never change this because VSELECT is never generated on scalars. However, seeing the resulting operation as Legal, we'd neglect to add the scalarization cost before returning. The result is that we'd grossly under-estimate the cost of cmps/selects on illegal vector types. Now, if type legalization clearly results in scalarization, we skip the early return and add the scalarization cost. llvm-svn: 217859	2014-09-16 04:35:50 +00:00
David Majnemer	2cbc13878f	yaml2obj: Support bigobj Teach yaml2obj how to make a bigobj COFF file. Like the rest of LLVM, we automatically decide whether or not to use regular COFF or bigobj COFF on the fly depending on how many sections the resulting object would have. This ends the task of adding bigobj support to LLVM. N.B. This was tested by forcing yaml2obj to be used in bigobj mode regardless of the number of sections. While a dedicated test was written, the smallest I could make it was 36 MB (!) of yaml and it still took a significant amount of time to execute on a powerful machine. llvm-svn: 217858	2014-09-16 03:52:46 +00:00
Nick Kledzik	c1a750bba6	tweak test case to help solve why failing on one build bot llvm-svn: 217856	2014-09-16 02:33:36 +00:00
Nick Kledzik	56ebef45ef	[llvm-objdump] for mach-o add -bind, -lazy-bind, and -weak-bind options This finishes the ability of llvm-objdump to print out all information from the LC_DYLD_INFO load command. The -bind option prints out symbolic references that dyld must resolve immediately. The -lazy-bind option prints out symbolic references that are lazily resolved on first use. The -weak-bind option prints out information about symbols which dyld must try to coalesce across images. llvm-svn: 217853	2014-09-16 01:41:51 +00:00
Juergen Ributzka	59e631c728	[FastISel][AArch64] Add vector support to argument lowering. Lower the first 8 vector arguments too. llvm-svn: 217850	2014-09-16 00:25:30 +00:00
Chandler Carruth	f845e89425	[x86] As a follow-up to r217819, don't check for VSELECT legality now that we don't use VSELECT and directly emit an addsub synthetic node. Also remove a stale comment referencing VSELECT. The test case is updated to use 'core2' which only has SSE3, not SSE4.1, and it still passes. Previously it would not because we lacked sufficient blend support to legalize the VSELECT. llvm-svn: 217849	2014-09-16 00:24:42 +00:00
Chandler Carruth	de5f2b356b	[x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and ADDSUBPD nodes out of blends of adds and subs. This allows us to actually form these instructions with SSE3 rather than only forming them when we had both SSE3 for the ADDSUB instructions and SSE4.1 for the blend instructions. ;] Kind-of important. I've adjusted the CPU requirements on one of the tests to demonstrate this kicking in nicely for an SSE3 cpu configuration. llvm-svn: 217848	2014-09-16 00:15:20 +00:00
Juergen Ributzka	f693787ed0	[FastISel][AArch64] Add missing test case for previous commit. This adds the missing test case for the previous commit: Allow handling of vectors during return lowering for little endian machines. Sorry for the noise. llvm-svn: 217847	2014-09-15 23:47:57 +00:00
Juergen Ributzka	993224a553	[FastISel][AArch64] Lower sin/cos/pow to runtime lib calls. Also lower sin/cos/pow to runtime lib calls. This fixes rdar://problem/18343468. llvm-svn: 217839	2014-09-15 22:33:06 +00:00
Justin Bogner	92bb302314	llvm-cov: Make debug output more consistent This changes the debug output of the llvm-cov tool to consistently write to stderr, and moves the highlighting output closer to where it's relevant. llvm-svn: 217838	2014-09-15 22:23:29 +00:00
Justin Bogner	0b3614f806	llvm-cov: Fix an issue with showing regions but not counts In r217746, though it was supposed to be NFC, I broke llvm-cov's handling of showing regions without showing counts. This should've shown up in the existing tests, except they were checking debug output that was displayed regardless of what was actually output. I've moved the relevant debug output to a more appropriate place so that the tests catch this kind of thing. llvm-svn: 217835	2014-09-15 22:12:28 +00:00
Rafael Espindola	9dd2d5810f	Add back tests for empty function in SPARC and PowerPC. llvm-svn: 217834	2014-09-15 22:11:07 +00:00
Juergen Ributzka	afa034fb61	[FastISel][AArch64] Add lowering support for frem. This lowers frem to a runtime libcall inside fast-isel. The test case also checks the CallLoweringInfo bug that was exposed by this change. This fixes rdar://problem/18342783. llvm-svn: 217833	2014-09-15 22:07:49 +00:00
Juergen Ributzka	8984f48d89	[FastISel][AArch64] Improve floating-point compare support. Add support for the last two missing fcmp condition codes: UEQ and ONE. This fixes rdar://problem/18341575. llvm-svn: 217823	2014-09-15 20:47:16 +00:00
Reed Kotler	32be74b178	Add mips32 r1 to the list of supported targets for Mips fast-isel Summary: Expand list of supported targets for Mips to include mips32 r1. Previously it only included r2. More patches are coming where there is a difference but in the current patches as pushed upstream, r1 and r2 are equivalent. Test Plan: simplestorefp1.ll add new build bots at mips to test this flavor at both -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5306 llvm-svn: 217821	2014-09-15 20:30:25 +00:00
NAKAMURA Takumi	33d585cb25	llvm/test/CodeGen/X86/peephole-fold-movsd.ll: Relax an expression for win32. llvm-svn: 217806	2014-09-15 19:00:31 +00:00
Rafael Espindola	4235c2dade	Add a triple to fix the bots. llvm-svn: 217805	2014-09-15 18:54:41 +00:00
Rafael Espindola	6865d6f08a	Fix a lot of confusion around inserting nops on empty functions. On MachO, and MachO only, we cannot have a truly empty function since that breaks the linker logic for atomizing the section. When we are emitting a frame pointer, the presence of an unreachable will create a cfi instruction pointing past the last instruction. This is perfectly fine. The FDE information encodes the pc range it applies to. If some tool cannot handle this, we should explicitly say which bug we are working around and only work around it when it is actually relevant (not for ELF for example). Given the unreachable we could omit the .cfi_def_cfa_register, but then again, we could also omit the entire function prologue if we wanted to. llvm-svn: 217801	2014-09-15 18:32:58 +00:00
Quentin Colombet	9dcb724d31	[CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> zext promotion introduced in r217629. We were returning the old sext instead of the new zext as the promoted instruction! Thanks Joerg Sonnenberger for the test case. llvm-svn: 217800	2014-09-15 18:26:58 +00:00
Akira Hatanaka	760814a7e1	[X86] Fix a bug in X86's peephole optimization. Peephole optimization was folding MOVSDrm, which is a zero-extending double precision floating point load, into ADDPDrr, which is a SIMD add of two packed double precision floating point values. (before) %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21 %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21 (after) %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20 X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this from happening. However the check wasn't being conducted for loads from stack objects. This commit factors out the logic into a new function and uses it for checking loads from stack slots are not zero-extending loads. rdar://problem/18236850 llvm-svn: 217799	2014-09-15 18:23:52 +00:00
Matt Arsenault	f090bda1d5	CHECK-LABELize test llvm-svn: 217797	2014-09-15 17:56:56 +00:00
Matt Arsenault	49dd4283ed	R600/SI: Prefer selecting more e64 instruction forms. Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. llvm-svn: 217789	2014-09-15 17:15:02 +00:00
Matt Arsenault	0fd0a316ed	R600/SI: Make sure double vector fmul is tested llvm-svn: 217787	2014-09-15 17:04:54 +00:00
Matt Arsenault	72aafd0689	R600/SI: Add some mubuf testcases. I noticed some odd looking cases where addr64 wasn't set when storing to a pointer in an SGPR. This seems to be intentional, and partially tested already. The documentation seems to describe addr64 in terms of which registers addressing modifiers come from, but I would expect to always need addr64 when using 64-bit pointers. If no offset is applied, it makes sense to not need to worry about doing a 64-bit add for the final address. A small immediate offset can be applied, so is it OK to not have addr64 set if a carry is necessary when adding the base pointer in the resource to the offset? llvm-svn: 217785	2014-09-15 16:48:01 +00:00
Matt Arsenault	3f98140c87	R600/SI: Add preliminary support for flat address space llvm-svn: 217777	2014-09-15 15:41:53 +00:00
Toma Tabacu	bbd0eca340	[mips] Marked the DADDiu instruction aliases as MIPS III. Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5239 llvm-svn: 217770	2014-09-15 14:47:46 +00:00
Chandler Carruth	707a2e098d	[x86] Begin emitting PBLENDW instructions for integer blend operations when SSE4.1 is available. This removes a ton of domain crossing from blend code paths that were ending up in the floating point code path. This is just the tip of the iceberg though. The real switch is for integer blend lowering to more actively rely on this instruction being available so we don't hit shufps at all any longer. =] That will come in a follow-up patch. Another place where we need better support is for using PBLENDVB when doing so avoids the need to have two complementary PSHUFB masks. llvm-svn: 217767	2014-09-15 12:40:54 +00:00
Chandler Carruth	00b1e0fc9d	[x86] Add an explicit SSE3 run to this test and flesh out a bunch of missing specific checks. While there is a lot of redundancy here where all-but-one mode use the same code generation, I'd rather have each variant spelled out and checked so that readers aren't misled by an omission in the test suite. llvm-svn: 217765	2014-09-15 11:40:20 +00:00
Chandler Carruth	12d4a70cbd	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but this is all of the tricks I'm aware of. With AVX there is a new trick to use the VPERMILPS instruction, that's coming up in a subsequent patch. llvm-svn: 217761	2014-09-15 11:26:25 +00:00
Chandler Carruth	41a25dd7ef	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP instructions when it finds an appropriate pattern. These are lovely instructions, and it's a shame to not use them. =] They are fast, and can have loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. llvm-svn: 217758	2014-09-15 11:15:23 +00:00
Chandler Carruth	35e3b545d6	[x86] Undo a flawed transform I added to form UNPCK instructions when AVX is available, and generally tidy up things surrounding UNPCK formation. Originally, I was thinking that the only advantage of PSHUFD over UNPCK instruction variants was its free copy, and otherwise we should use the shorter encoding UNPCK instructions. This isn't right though, there is a larger advantage of being able to fold a load into the operand of a PSHUFD. For UNPCK, the operand must be in a register so it can be the second input. This removes the UNPCK formation in the target-specific DAG combine for v4i32 shuffles. It also lifts the v8 and v16 cases out of the AVX-specific check as they are potentially replacing multiple instructions with a single instruction and so should always be valuable. The floating point checks are simplified accordingly. This also adjusts the formation of PSHUFD instructions to attempt to match the shuffle mask to one which would fit an UNPCK instruction variant. This was originally motivated to allow it to match the UNPCK instructions in the combiner, but clearly won't now. Eventually, we should add a MachineCombiner pass that can form UNPCK instructions post-RA when the operand is known to be in a register and thus there is no loss. llvm-svn: 217755	2014-09-15 10:35:41 +00:00
Chandler Carruth	44e64b5267	[x86] Teach the new vector shuffle lowering to use 'punpcklwd' and 'punpckhwd' instructions when suitable rather than falling back to the generic algorithm. While we could canonicalize to these patterns late in the process, that wouldn't help when the freedom to use them is only visible during initial lowering when undef lanes are well understood. This, it turns out, is very important for matching the shuffle patterns that are used to lower sign extension. Fixes a small but relevant regression in gcc-loops with the new lowering. When I changed this I noticed that several 'pshufd' lowerings became unpck variants. This is bad because it removes the ability to freely copy in the same instruction. I've adjusted the widening test to handle undef lanes correctly and now those will correctly continue to use 'pshufd' to lower. However, this caused a bunch of churn in the test cases. No functional change, just churn. Both of these changes are part of addressing a general weakness in the new lowering -- it doesn't sufficiently leverage undef lanes. I've at least a couple of patches that will help there at least in an academic sense. llvm-svn: 217752	2014-09-15 09:02:37 +00:00
David Majnemer	a315bd80c2	InstSimplify: Simplify trivial and/or of icmps Some ICmpInsts when anded/ored with another ICmpInst trivially reduces to true or false depending on whether or not all integers or no integers satisfy the intersected/unioned range. This sort of trivial looking code can come about when InstCombine performs a range reduction-type operation on sdiv and the like. This fixes PR20916. llvm-svn: 217750	2014-09-15 08:15:28 +00:00
Chandler Carruth	0a98790b32	[x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD. These are super simple. They even take precedence over crazy instructions like INSERTPS because they have very high throughput on modern x86 chips. I still have to teach the integer shuffle variants about this to avoid so many domain crossings. However, due to the particular instructions available, that's a touch more complex and so a separate patch. Also, the backend doesn't seem to realize it can commute blend instructions by negating the mask. That would help remove a number of copies here. Suggestions on how to do this welcome, it's an area I'm less familiar with. llvm-svn: 217744	2014-09-14 23:43:33 +00:00
NAKAMURA Takumi	da86d7c26b	llvm/test/CodeGen/X86/vec_shuffle-38.ll: Add explicit -mtriple=x86_64-unknown to avoid incompatibility of win32. llvm-svn: 217742	2014-09-14 23:39:01 +00:00
Chandler Carruth	f2a92921f9	[x86] Add an SSE41 mode to this test. Nothing interesting here, it's the same as SSE3. llvm-svn: 217741	2014-09-14 23:28:12 +00:00
Chandler Carruth	b396922647	[x86] Switch this test to use an ALL prefix with special SSE2 and SSE3 variants where significant. This will make it more obvious what is happening when we start using blends in SSE41. llvm-svn: 217740	2014-09-14 23:19:37 +00:00
Chandler Carruth	da5ce5cad8	[x86] Add some test cases where we should emit blendpd in SSE4.1. No actual change yet though. llvm-svn: 217739	2014-09-14 23:15:52 +00:00
Chandler Carruth	47ebd24e24	[x86] Teach the vector combiner that picks a canonical shuffle form to support transforming the forms from the new vector shuffle lowering to use 'movddup' when appropriate. A bunch of the cases where we actually form 'movddup' don't actually show up in the test results because something even later than DAG legalization maps them back to 'unpcklpd'. If this shows back up as a performance problem, I'll probably chase it down, but it is at least an encoded size loss. =/ To make this work, also always do this canonicalizing step for floating point vectors where the baseline shuffle instructions don't provide any free copies of their inputs. This also causes us to canonicalize unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space win. There is one test which is "regressed" by this: extractelement-load. There, the test case where the optimization it is testing fails, the exact instruction pattern which results is slightly different. This should probably be fixed by having the appropriate extract formed earlier in the DAG, but that would defeat the purpose of the test.... If this test case is critically important for anyone, please let me know and I'll try to work on it. The prior behavior was actually contrary to the comment in the test case and seems likely to have been an accident. llvm-svn: 217738	2014-09-14 22:41:37 +00:00
Matt Arsenault	f620a575bf	R600/SI: Fix broken check lines llvm-svn: 217736	2014-09-14 18:32:05 +00:00
Juergen Ributzka	85c1f84650	[FastISel][AArch64] Add support for non-native types for logical ops. Extend the logical ops selection to also support non-native types such as i1, i8, and i16. Fixes rdar://problem/18330589. llvm-svn: 217732	2014-09-13 23:46:28 +00:00
Chad Rosier	ce65c060e7	[AArch64] Update test case to pass with post-RA MI scheduler. Check that the post RA scheduler is being skipped, regardless of whether it's the top-down list latency scheduler or the post-RA MI scheduler. llvm-svn: 217725	2014-09-13 03:23:23 +00:00
Nick Kledzik	b8536b1db8	Stop suppressing error messages in test case to see why one buildbot is failing llvm-svn: 217715	2014-09-12 22:46:01 +00:00
Nick Kledzik	ac43144e5a	[llvm-objdump] support -rebase option for mach-o to dump rebasing info Similar to my previous -exports-trie option, the -rebase option dumps info from the LC_DYLD_INFO load command. The rebasing info is a list of the locations that dyld needs to adjust if a mach-o image is not loaded at its preferred address. Since ASLR is now the default, images almost never load at their preferred address, and thus need to be rebased by dyld. llvm-svn: 217709	2014-09-12 21:34:15 +00:00
Justin Bogner	54b112828f	llvm-profdata: Avoid undefined behaviour when reading raw profiles The raw profiles that are generated in compiler-rt always add padding so that each profile is aligned, so we can simply treat files that don't have this property as malformed. Caught by Alexey's new ubsan bot. Thanks! llvm-svn: 217708	2014-09-12 21:22:55 +00:00
Chad Rosier	e668f61076	FileCheckize. NFC. llvm-svn: 217698	2014-09-12 17:55:16 +00:00
Chad Rosier	486e087f26	[AArch64] Enable post-RA MI scheduler. Phabricator Revision: http://reviews.llvm.org/D5278 Patch by Sanjin Sijaric! llvm-svn: 217693	2014-09-12 17:40:39 +00:00
Jordan Rose	ef78038775	[lit] Parse all strings as UTF-8 rather than ASCII. As far as I can tell UTF-8 has been supported since the beginning of Python's codec support, and it's the de facto standard for text these days, at least for primarily-English text. This allows us to put Unicode into lit RUN lines. rdar://problem/18311663 llvm-svn: 217688	2014-09-12 16:46:05 +00:00
NAKAMURA Takumi	9e424c0eb4	llvm/test/CodeGen/X86/vec_ctbits.ll: Add explicit -mtriple=x86_64-unknown. It was incompatible to Win32 x64. llvm-svn: 217683	2014-09-12 15:10:56 +00:00
Zoran Jovanovic	c74e3eb9a6	[mips][microMIPS] Implement JRADDIUSP instruction Differential Revision: http://reviews.llvm.org/D5046 llvm-svn: 217681	2014-09-12 14:29:54 +00:00
Bill Schmidt	b73b370809	Address comments on r217622 llvm-svn: 217680	2014-09-12 14:26:36 +00:00
Zoran Jovanovic	ed6dd6bd39	[mips][microMIPS] Implement BGEZALS and BLTZALS instructions Differential Revision: http://reviews.llvm.org/D5004 llvm-svn: 217678	2014-09-12 13:51:58 +00:00
Zoran Jovanovic	ac9ef12fc5	[mips][microMIPS] Implement JALS and JALRS instructions. Differential Revision: http://reviews.llvm.org/D5003 llvm-svn: 217676	2014-09-12 13:43:41 +00:00
Zoran Jovanovic	4e7ac4ad2a	[mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions Differential Revision: http://reviews.llvm.org/D5211 llvm-svn: 217675	2014-09-12 13:33:33 +00:00
James Molloy	a9f47b6bae	[ARM] Teach the cost model that cross-class copies are costly. Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case. llvm-svn: 217674	2014-09-12 13:29:40 +00:00
Benjamin Kramer	6d527ef9d6	Legalizer: Use the scalar bit width when promoting bit counting instrs on vectors. e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup the result by 32 bits, not 64. PR20917. llvm-svn: 217671	2014-09-12 12:50:27 +00:00
Justin Bogner	3d7260e7b2	Revert "llvm-cov: Remove an overly system specific test" This fixes a call to sys::fs::equivalent that should've been to CodeCoverageTool::equivalentFiles, which lets us restore the test of r217476 that was removed in r217478. This reverts r217478, but the test works this time. llvm-svn: 217646	2014-09-11 23:20:48 +00:00
Matt Arsenault	362f345bab	R600/SI: Fix off by 1 error in used register count The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636	2014-09-11 22:51:37 +00:00
Lang Hames	691a21ce5a	[MCJIT] Make sure we test ARM BR24 relocations with both internal and external symbols. Previously we have only been testing these relocations with external symbols. <rdar://problem/18308413> llvm-svn: 217635	2014-09-11 22:43:36 +00:00
Quentin Colombet	b2c5c6dde3	[CodeGenPrepare] Teach the addressing mode matcher how to promote zext. I.e., teach it about 'sext (zext a to ty) to ty2' => zext a to ty2. llvm-svn: 217629	2014-09-11 21:22:14 +00:00
Bill Schmidt	3ae268076b	Add missing colon to RUN line... llvm-svn: 217623	2014-09-11 20:13:52 +00:00
Bill Schmidt	be95fd5357	[PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm Inline asm may specify 'U' and 'X' constraints to print a 'u' for an update-form memory reference, or an 'x' for an indexed-form memory reference. However, these are really only useful in GCC internal code generation. In inline asm the operand of the memory constraint is typically just a register containing the address, so 'U' and 'X' make no sense. This patch quietly accepts 'U' and 'X' in inline asm patterns, but otherwise does nothing. If we ever unexpectedly see a non-register, we'll assert and sort it out afterwards. I've added a new test for these constraints; the test case should be used for other asm-constraints changes down the road. llvm-svn: 217622	2014-09-11 20:10:03 +00:00
Lang Hames	6f1048f94e	[MCJIT] Add support for ARM HALF_DIFF relocations to MCJIT. Fixes <rdar://problem/18297804>. llvm-svn: 217620	2014-09-11 19:21:14 +00:00
Matt Arsenault	d40e1c3fbc	Add triple to test to fix bots llvm-svn: 217612	2014-09-11 17:50:20 +00:00
Brad Smith	2ce0d91bde	Provide an implementation of getNoopForMachoTarget for SPARC. llvm-svn: 217611	2014-09-11 17:40:51 +00:00
Matt Arsenault	8239eaab99	Add DAG combine for shl + add of constants. Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610	2014-09-11 17:34:19 +00:00
Lang Hames	4669cd08a7	[MCJIT] Take the relocation addend into account when applying ARM MachO VANILLA and BR24 relocations. <rdar://problem/18296496> llvm-svn: 217605	2014-09-11 17:27:01 +00:00
Adam Nemet	053c4e825c	[AVX512] Fix miscompile for unpack r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack between the low and the high 256 bits of src1 into the low part of the destination and another unpack of the low and high 256 bits of src2 into the high part of the destination. I don't think that's how unpack works. AVX512 unpack simply has more 128-bit lanes but other than that it works the same way as AVX. So in each 128-bit lane, we're always interleaving certain parts of both operands rather than different parts of one of the operands. E.g. for this: __v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }; __v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; __v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16, 24, 17, 25, 20, 28, 21, 29); we generated punpcklps (notice how the elements of a and b are not interleaved in the shuffle). In turn, c was set to this: 0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29 Obviously this should have just returned the mask vector of the shuffle vector. I mostly reverted this change and made sure the original AVX code worked for 512-bit vectors as well. Also updated the tests because they matched the logic from the code. llvm-svn: 217602	2014-09-11 16:51:10 +00:00
Sanjay Patel	1eb5047ddb	Add triple and remove hashes to account for buildbot differences in comment strings. llvm-svn: 217601	2014-09-11 16:08:44 +00:00
Sanjay Patel	7bd228a82e	Combine fmul vector FP constants when unsafe math is allowed. This is an extension of the change made with r215820: http://llvm.org/viewvc/llvm-project?view=revision&revision=215820 That patch allowed combining of splatted vector FP constants that are multiplied. This patch allows combining non-uniform vector FP constants too by relaxing the check on the type of vector. Also, canonicalize a vector fmul in the same way that we already do for scalars - if only one operand of the fmul is a constant, make it operand 1. Otherwise, we miss potential folds. This fold is also done by -instcombine, but it's possible that extra fmuls may have been generated during lowering. Differential Revision: http://reviews.llvm.org/D5254 llvm-svn: 217599	2014-09-11 15:45:27 +00:00
Aaron Watry	3ffc560094	R600: Test local atomics for evergreen Now that the operations are all implemented, we can test this sub-arch here. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217595	2014-09-11 15:02:52 +00:00
Tilmann Scheller	ee0e49398c	[ARM] Add Thumb-2 code size optimization regression test for LSR (register). llvm-svn: 217582	2014-09-11 10:45:50 +00:00
Tilmann Scheller	579379a6f4	[ARM] Add Thumb-2 code size optimization regression test for LSR (immediate). llvm-svn: 217581	2014-09-11 10:42:17 +00:00
Arnaud A. de Grandmaison	3690266739	[AArch64] Reenable the PBQP test now that the leak issue has been fixed. David Blaikie's commits r217563 & r217564, which added shared_ptr to the CostPool have fixed some memory leak issues exposed by the PBQP with coalescing constraints. The sanitizer bot was failing because of those leaks. Now that the leaks are gone, we can reenable the aarch64/pbqp test. llvm-svn: 217580	2014-09-11 10:39:52 +00:00
Tilmann Scheller	0c1249ac60	[ARM] Add Thumb-2 code size optimization regression test for LSL (register). llvm-svn: 217579	2014-09-11 10:33:39 +00:00
Tilmann Scheller	7430df486e	[ARM] Add Thumb2 code size optimization regression test for LSL (immediate). llvm-svn: 217576	2014-09-11 10:29:42 +00:00
Chandler Carruth	1ec3e4e4bd	[x86] Fixup r217565 which baked in an assumption about the function name that breaks on some platforms. This part of the test just doesn't matter... llvm-svn: 217575	2014-09-11 10:21:25 +00:00
Hal Finkel	f83e1f7f66	[AlignmentFromAssumptions] Don't crash just because the target is 32-bit We used to crash processing any relevant @llvm.assume on a 32-bit target (because we'd ask SE to subtract expressions of differing types). I've copied our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get some meaningful test coverage here. llvm-svn: 217574	2014-09-11 08:40:17 +00:00
David Xu	f7aff68fe3	Build correct vector filled with undef nodes llvm-svn: 217570	2014-09-11 05:10:28 +00:00
Chandler Carruth	292303dd47	[x86] FileCheck-ize this test. llvm-svn: 217565	2014-09-11 00:13:35 +00:00
Matt Arsenault	61a528adc7	R600/SI: Fix losing chain when fixing reg class of loads. The lost chain resulted in earlier side-effecting nodes being deleted. llvm-svn: 217561	2014-09-10 23:26:19 +00:00
Peter Collingbourne	d0ec5ab948	Add LLVMgold target to test dependencies. llvm-svn: 217557	2014-09-10 22:20:49 +00:00
Matt Arsenault	16e313343d	R600: Custom lower frem llvm-svn: 217553	2014-09-10 21:44:27 +00:00
Hal Finkel	71b7084112	[AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment The routine that determines an alignment given some SCEV returns zero if the answer is unknown. In a case where we could determine the increment of an AddRec but not the starting alignment, we would compute the integer modulus by zero (which is illegal and traps). Prevent this by returning early if either the start or increment alignment is unknown (zero). llvm-svn: 217544	2014-09-10 21:05:52 +00:00
Rafael Espindola	71143ed24b	Remember to eraseFromParent after replaceAllUsesWith. llvm-svn: 217536	2014-09-10 19:39:41 +00:00
Arnaud A. de Grandmaison	d17f96c9ad	[AArch64] Temporarily deactivate the PBQP test, while I investigate some leaks in the allocator llvm-svn: 217531	2014-09-10 18:40:18 +00:00
Sanjay Patel	b653de1ada	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Arnaud A. de Grandmaison	c75dbbbdd6	[AArch64] Add experimental PBQP support This adds target specific support for using the PBQP register allocator on the AArch64, for the A57 cpu. By default, the PBQP allocator is not used, unless explicitly required on the command line with "-aarch64-pbqp". llvm-svn: 217504	2014-09-10 14:06:10 +00:00
Asiri Rathnayake	369c030633	[AArch64] Use a constant pool load for weak symbol references when using static relocation model and small code model. Summary: currently we generate GOT based relocations for weak symbol references regardless of the underlying relocation model. This should be changed so that in static relocation model we use a constant pool load instead. Patch from: Keith Walker Reviewers: Renato Golin, Tim Northover llvm-svn: 217503	2014-09-10 13:54:38 +00:00
Tim Northover	ba1d704229	ARM: don't size-reduce STMs using the LR register. The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which translates to STMDB, so we shouldn't convert STMIAs. Patch by Sergey Dmitrouk. llvm-svn: 217498	2014-09-10 12:53:28 +00:00
David Majnemer	44f51e5113	Object: Add support for bigobj This adds support for reading the "bigobj" variant of COFF produced by cl's /bigobj and mingw's -mbig-obj. The most significant difference that bigobj brings is more than 2**16 sections to COFF. bigobj brings a few interesting differences with it: - It doesn't have a Characteristics field in the file header. - It doesn't have a SizeOfOptionalHeader field in the file header (it's only used in executable files). - Auxiliary symbol records have the same width as a symbol table entry. Since symbol table entries are bigger, so are auxiliary symbol records. Write support will come soon. Differential Revision: http://reviews.llvm.org/D5259 llvm-svn: 217496	2014-09-10 12:51:52 +00:00
Yuri Gorshenin	3939dec1f7	[asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code. Summary: [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5189 llvm-svn: 217482	2014-09-10 09:45:49 +00:00
Job Noorman	eb19aea4f9	Drop the W postfix on the 16-bit registers. This ensures the inline assembly register constraints are properly recognised in TargetLowering::getRegForInlineAsmConstraint. llvm-svn: 217479	2014-09-10 06:58:14 +00:00
Justin Bogner	32cc7abdb6	llvm-cov: Remove an overly system specific test It appears that the -filename-equivalence option for testing llvm-cov doesn't work correctly with -show-expansions. I'm reverting this test to get the bots green while I look into fixing that. This partially reverts r217476 llvm-svn: 217478	2014-09-10 06:35:38 +00:00
Kai Nacke	d287094566	[MIPS] Add aliases for sync instruction used by Octeon CPU This commit adds aliases for the sync instruction (synciobdma, syncs, syncw, syncws) which are used by the Octeon CPU. Reviewed by D. Sanders llvm-svn: 217477	2014-09-10 06:10:24 +00:00
Justin Bogner	3f81d4953a	llvm-cov: Fix a misuse of ArrayRef::slice I introduced in r217430 It appears this code was completely untested, so using ArrayRef wrong didn't break anything obvious. llvm-svn: 217476	2014-09-10 06:06:07 +00:00
Rafael Espindola	890db27b67	Handle common linkage correctly in the gold plugin. This is the plugin version of pr20882. This handles the case of every common symbol being in the IR. We will need some support from gold to handle the case where some symbols are in ELF and some in the IR. llvm-svn: 217458	2014-09-09 20:08:22 +00:00
Rafael Espindola	fe3842cda7	Merge alignment of common GlobalValue. Fixes pr20882. llvm-svn: 217455	2014-09-09 17:48:18 +00:00
Bjorn Steinbrink	3c33150801	Add a test for hoisting instructions with metadata out of then/else blocks Test for the bug fixed in r215723. llvm-svn: 217453	2014-09-09 17:10:21 +00:00
Rafael Espindola	0910605af6	When merging two common GlobalValues, keep the largest. llvm-svn: 217451	2014-09-09 15:59:12 +00:00
Rafael Espindola	14a41ce802	Make this input file pass the verifier. This was not noticed before because llvm-link only runs the verifier on the result and these globals were not present in the result. llvm-svn: 217450	2014-09-09 15:40:12 +00:00
Rafael Espindola	c83c8d4e74	Fix a use of an undefined value (the linkage). llvm-svn: 217445	2014-09-09 14:52:27 +00:00
Rafael Espindola	7fc29546f9	Prefer common over weak linkage when linking. This matches the behavior of ELF linkers. llvm-svn: 217443	2014-09-09 14:27:09 +00:00
Toma Tabacu	2664779b27	[mips] Add assembler support for .set mips0 directive. Summary: This directive is used to reset the assembler options to their initial values. Assembly programmers use it in conjunction with the ".set mipsX" directives. This patch depends on the .set push/pop directive (http://reviews.llvm.org/D4821). Contains work done by Matheus Almeida. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D4957 llvm-svn: 217438	2014-09-09 12:52:14 +00:00
Pavel Chupin	e6617fc6d4	[x32] Emit callq for CALLpcrel32 Summary: In AT&T syntax, calls for both x86_64 and x32 should be printed as callq in assembly. It's only a matter of correct mnemonic, object output is ok. Test Plan: trivial test added Reviewers: nadav, dschuff, craig.topper Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5213 llvm-svn: 217435	2014-09-09 11:54:12 +00:00
Tim Northover	0b0add517b	llvm-objdump: don't crash when __compact_unwind has no relocs. llvm-svn: 217433	2014-09-09 10:45:06 +00:00
Toma Tabacu	9db22db963	[mips] Add assembler support for .set push/pop directive. Summary: These directives are used to save the current assembler options (in the case of ".set push") and restore the previously saved options (in the case of ".set pop"). Contains work done by Matheus Almeida. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D4821 llvm-svn: 217432	2014-09-09 10:15:38 +00:00
Renato Golin	63e27980da	ARM: Negative offset support problem This patch is to permit a negative offset usage for a non frame access. Patch by Igor Oblakov. llvm-svn: 217431	2014-09-09 09:57:59 +00:00
Bob Wilson	b3482af341	Set trunc store action to Expand for all X86 targets. When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack. Patch by Luqman Aden! llvm-svn: 217410	2014-09-09 01:13:36 +00:00
Hans Wennborg	18f0a986c1	Fast-ISel: Remove dead code after falling back from selecting call instructions (PR20863) Previously, fast-isel would not clean up after failing to select a call instruction, because it would have called flushLocalValueMap() which moves the insertion point, making SavedInsertPt in selectInstruction() invalid. Fixing this by making SavedInsertPt a member variable, and having flushLocalValueMap() update it. This removes some redundant code at -O0, and more importantly fixes PR20863. Differential Revision: http://reviews.llvm.org/D5249 llvm-svn: 217401	2014-09-08 20:24:10 +00:00
Matt Arsenault	7ac9c4a074	R600/SI: Replace LDS atomics with no return versions llvm-svn: 217379	2014-09-08 15:07:31 +00:00
Chad Rosier	3528c1e4c6	[AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph. Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! Phabricator Review: http://reviews.llvm.org/D5103 llvm-svn: 217371	2014-09-08 14:43:48 +00:00
Hal Finkel	cebf0cc210	Make use of @llvm.assume for loop guards in ScalarEvolution This adds a basic (but important) use of @llvm.assume calls in ScalarEvolution. When SE is attempting to validate a condition guarding a loop (such as whether or not the loop count can be zero), this check should also include dominating assumptions. llvm-svn: 217348	2014-09-07 21:37:59 +00:00
Hal Finkel	93873cc10e	Check for all known bits on ret in InstCombine From a combination of @llvm.assume calls (and perhaps through other means, such as range metadata), it is possible that all bits of a return value might be known. Previously, InstCombine did not check for this (which is understandable given assumptions of constant propagation), but means that we'd miss simple cases where assumptions are involved. llvm-svn: 217346	2014-09-07 21:28:34 +00:00
Hal Finkel	7e1844940e	Make use of @llvm.assume from LazyValueInfo This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with the known-bits change (r217342), this requires feeding a "context" instruction pointer through many functions. Aside from a little refactoring to reuse the logic that turns predicates into constant ranges in LVI, the only new code is that which can 'merge' the range from an assumption into that otherwise computed. There is also a small addition to JumpThreading so that it can have LVI use assumptions in the same block as the comparison feeding a conditional branch. With this patch, we can now simplify this as expected: int foo(int a) { __builtin_assume(a > 5); if (a > 3) { bar(); return 1; } return 0; } llvm-svn: 217345	2014-09-07 20:29:59 +00:00
Hal Finkel	d67e463901	Add an AlignmentFromAssumptions Pass This adds a ScalarEvolution-powered transformation that updates load, store and memory intrinsic pointer alignments based on invariant((a+q) & b == 0) expressions. Many of the simple cases we can get with ValueTracking, but we still need something like this for the more complicated cases (such as those with an offset) that require some algebra. Note that gcc's __builtin_assume_aligned's optional third argument provides exactly for this kind of 'misalignment' offset for which this kind of logic is necessary. The primary motivation is to fixup alignments for vector loads/stores after vectorization (and unrolling). This pass is added to the optimization pipeline just after the SLP vectorizer runs (which, admittedly, does not preserve SE, although I imagine it could). Regardless, I actually don't think that the preservation matters too much in this case: SE computes lazily, and this pass won't issue any SE queries unless there are any assume intrinsics, so there should be no real additional cost in the common case (SLP does preserve DT and LoopInfo). llvm-svn: 217344	2014-09-07 20:05:11 +00:00
Hal Finkel	15aeaaf24a	Add additional patterns for @llvm.assume in ValueTracking This builds on r217342, which added the infrastructure to compute known bits using assumptions (@llvm.assume calls). That original commit added only a few patterns (to catch common cases related to determining pointer alignment); this change adds several other patterns for simple cases. r217342 contained that, for assume(v & b = a), bits in the mask that are known to be one, we can propagate known bits from the a to v. It also had a known-bits transfer for assume(a = b). This patch adds: assume(~(v & b) = a) : For those bits in the mask that are known to be one, we can propagate inverted known bits from the a to v. assume(v | b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. assume(~(v | b) = a): For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. assume(v ^ b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. For those bits in b that are known to be one, we can propagate inverted known bits from the a to v. assume(~(v ^ b) = a) : For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. For those bits in b that are known to be one, we can propagate known bits from the a to v. assume(v << c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the right by c. assume(~(v << c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the right by c. assume(v >> c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the left by c. assume(~(v >> c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the left by c. 
assume(v >=_s c) where c is non-negative: The sign bit of v is zero assume(v >_s c) where c is at least -1: The sign bit of v is zero assume(v <=_s c) where c is negative: The sign bit of v is one assume(v <_s c) where c is non-positive: The sign bit of v is one assume(v <=_u c): Transfer the known high zero bits assume(v <_u c): Transfer the known high zero bits (if c is known to be a power of 2, transfer one more) A small addition to InstCombine was necessary for some of the test cases. The problem is that when InstCombine was simplifying and, or, etc. it would fail to check the 'do I know all of the bits' condition before checking less specific conditions and would not fully constant-fold the result. I'm not sure how to trigger this aside from using assumptions, so I've just included the change here. llvm-svn: 217343	2014-09-07 19:21:07 +00:00
Hal Finkel	60db05896a	Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.) This change, which allows @llvm.assume to be used from within computeKnownBits (and other associated functions in ValueTracking), adds some (optional) parameters to computeKnownBits and friends. These functions now (optionally) take a "context" instruction pointer, an AssumptionTracker pointer, and also a DomTree pointer, and most of the changes are just to pass this new information when it is easily available from InstSimplify, InstCombine, etc. As explained below, the significant conceptual change is that known properties of a value might depend on the control-flow location of the use (because we care that the @llvm.assume dominates the use because assumptions have control-flow dependencies). This means that, when we ask if bits are known in a value, we might get different answers for different uses. The significant changes are all in ValueTracking. Two main changes: First, as with the rest of the code, new parameters need to be passed around. To make this easier, I grouped them into a structure, and I made internal static versions of the relevant functions that take this structure as a parameter. The new code does as you might expect, it looks for @llvm.assume calls that make use of the value we're trying to learn something about (often indirectly), attempts to pattern match that expression, and uses the result if successful. By making use of the AssumptionTracker, the process of finding @llvm.assume calls is not expensive. Part of the structure being passed around inside ValueTracking is a set of already-considered @llvm.assume calls. This is to prevent a query using, for example, the assume(a == b), to recurse on itself. The context and DT params are used to find applicable assumptions. An assumption needs to dominate the context instruction, or come after it deterministically. 
In this latter case we only handle the specific case where both the assumption and the context instruction are in the same block, and we need to exclude assumptions from being used to simplify their own ephemeral values (those which contribute only to the assumption) because otherwise the assumption would prove its feeding comparison trivial and would be removed. This commit adds the plumbing and the logic for a simple masked-bit propagation (just enough to write a regression test). Future commits add more patterns (and, correspondingly, more regression tests). llvm-svn: 217342	2014-09-07 18:57:58 +00:00
David Blaikie	c42f9ac01c	DebugInfo: Do not use DW_FORM_GNU_addr_index in skeleton CUs, GDB 7.8 errors on this. It's probably not a huge deal to not do this - if we could, maybe the address could be reused by a subprogram low_pc and avoid an extra relocation, but it's just one per CU at best. llvm-svn: 217338	2014-09-07 17:31:42 +00:00
Hal Finkel	57f03dda49	Add functions for finding ephemeral values This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or indirectly), and thus will be removed prior to code generation, implying that they should be considered free for certain purposes (like inlining). The inliner's cost analysis, and a few other passes, have been updated to account for ephemeral values using the provided functionality. This functionality is important for the usability of @llvm.assume, because it limits the "non-local" side-effects of adding llvm.assume on inlining, loop unrolling, etc. (these are hints, and do not generate code, so they should not directly contribute to estimates of execution cost). llvm-svn: 217335	2014-09-07 13:49:57 +00:00
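
Note on 8239eaab99 (r217610): the DAG combine rewrites (shl (add x, c1), c2) as (add (shl x, c2), c1 << c2). The sketch below is not LLVM code; it only checks that the two forms agree under wrapping 32-bit arithmetic. The function names and the choice of a 32-bit width are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # model i32 arithmetic with wraparound (assumed width)


def shl_of_add(x, c1, c2):
    """Original form: (shl (add x, c1), c2)."""
    return (((x + c1) & MASK32) << c2) & MASK32


def add_of_shl(x, c1, c2):
    """Combined form: (add (shl x, c2), c1 << c2)."""
    return (((x << c2) & MASK32) + ((c1 << c2) & MASK32)) & MASK32


def forms_agree(samples, c1, c2):
    """Return True if both forms give the same result for every sample x."""
    return all(shl_of_add(x, c1, c2) == add_of_shl(x, c1, c2) for x in samples)
```

Both forms compute (x + c1) * 2**c2 modulo 2**32, so they agree for any shift amount from 0 to 31. Whether the rewritten immediate is legal for a given target is a separate question, as the commit message itself points out.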

26212 Commits
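
Note on 6d527ef9d6 (r217671): when ctlz is promoted from i32 to i64, the zero-extended value gains 32 extra leading zeros, so the result has to be corrected by the difference between the scalar widths. The sketch below is a plain-Python model of that fixup, not the legalizer code; `ctlz` and `promoted_ctlz` are hypothetical helper names. As I read the commit message, the bug came from using the whole vector's bit width for vector types, which is why the fixup was 64 instead of 32 for <2 x i32>.

```python
def ctlz(value, width):
    """Count leading zeros of a non-negative value viewed as a width-bit integer."""
    return width - value.bit_length()


def promoted_ctlz(value, narrow=32, wide=64):
    """Compute a narrow ctlz through the wide type, then subtract the extra zeros."""
    # Zero-extension leaves a non-negative Python int unchanged.
    return ctlz(value, wide) - (wide - narrow)


def promoted_ctlz_lanes(lanes, narrow=32, wide=64):
    """Apply the same per-element fixup to every lane of a vector."""
    return [promoted_ctlz(v, narrow, wide) for v in lanes]
```

The fixup depends only on the element widths, so a two-lane vector uses the same correction of 32 as a scalar would.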