Commit Graph

72990 Commits

Author SHA1 Message Date
Adam Nemet ce465421d7 [AVX512] Simplify use of !con()
No change in X86.td.expanded.

llvm-svn: 218485
2014-09-26 00:53:12 +00:00
Adam Nemet f7988d7364 [AVX512] Pull pattern for subvector extract into the instruction definition
No functional change.

I initially thought that pulling the Pat<> into the instruction pattern was
not possible because it was doing a transform on the index in order to convert
it from a per-element (extract_subvector) index into a per-chunk (vextract*x4)
index.

Turns out this also works inside the pattern because the vextract_extract
PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the
index in $idx goes through the same conversion.

The existing test CodeGen/X86/avx512-insert-extract.ll extended in the
previous commit provides coverage for this change.

llvm-svn: 218480
2014-09-25 23:48:49 +00:00
Adam Nemet 55536c6a8f [AVX512] Refactor subvector extracts
No functional change.

These are now implemented as two levels of multiclasses heavily relying on the
new X86VectorVTInfo class.  The multiclass at the first level that is called
with float or int provides the 128 or 256 bit subvector extracts.  The second
level provides the register and memory variants and some more Pat<>s.

I've compared the td.expanded files before and after.  One change is that
ExeDomain for 64x4 is SSEPackedDouble now.  I think this is correct, i.e. a
bugfix.

(BTW, this is the change that was blocked on the recent tablegen fix.  The
class-instance values of X86VectorVTInfo inside vextract_for_type weren't
properly evaluated.)

Part of <rdar://problem/17688758>

llvm-svn: 218478
2014-09-25 23:48:45 +00:00
Adam Nemet 6ea09eb148 [AVX512] Fix typo
F->I in VEXTRACTF32x4rr.

llvm-svn: 218477
2014-09-25 23:48:42 +00:00
Bruno Cardoso Lopes d04f7596e7 [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo
Machine Sink uses loop depth information to select between successor BBs to
sink machine instructions into, preferring BBs at smaller loop depths.  This
patch adds support for choosing between successors using profile information
from BlockFrequencyInfo instead, whenever that information is available.

Tested under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup on average on x86-64 Darwin.

<rdar://problem/18021659>

llvm-svn: 218472
2014-09-25 23:14:26 +00:00
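
As a rough illustration of the heuristic described above (not the actual
MachineSink diff; Block and pickSinkSuccessor are made-up stand-ins), the
successor choice amounts to preferring the least frequently executed block
when profile data exists, falling back to loop depth otherwise:

   #include <cassert>
   #include <cstdint>
   #include <vector>

   // Stand-in for a MachineBasicBlock plus the data the pass would query
   // from LoopInfo and MachineBlockFrequencyInfo.
   struct Block { unsigned LoopDepth; uint64_t Freq; };

   // Prefer the coldest successor when frequencies are available (Freq == 0
   // models "no profile data"); otherwise prefer the smallest loop depth.
   const Block *pickSinkSuccessor(const std::vector<const Block *> &Succs) {
     assert(!Succs.empty() && "need at least one candidate successor");
     const Block *Best = Succs.front();
     for (const Block *S : Succs) {
       bool HaveProfile = S->Freq != 0 && Best->Freq != 0;
       if (HaveProfile ? S->Freq < Best->Freq
                       : S->LoopDepth < Best->LoopDepth)
         Best = S;
     }
     return Best;
   }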
Nick Kledzik e648037449 [Support] Add type-safe alternative to llvm::format()
llvm::format() is somewhat unsafe. The compiler does not check that the
integer parameter size matches the %x or %d size, and it does not complain
when a StringRef is passed for a %s.  Correctly using a StringRef with
format() is also ugly because you have to convert it to a std::string and
then call c_str().

The cases where llvm::format() is useful are controlling how numbers and
strings are printed, especially when you want fixed-width output.  This
patch adds some new formatting functions to raw_ostream to format numbers
and StringRefs in a type-safe manner. Some examples:

   OS << format_hex(255, 6)        => "0x00ff"
   OS << format_hex(255, 4)        => "0xff"
   OS << format_decimal(0, 5)      => "    0"
   OS << format_decimal(255, 5)    => "  255"
   OS << right_justify(Str, 5)     => "  foo"
   OS << left_justify(Str, 5)      => "foo  "

llvm-svn: 218463
2014-09-25 20:30:58 +00:00
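
A short usage sketch of the helpers listed above, assuming they live in
llvm/Support/Format.h and stream into a raw_ostream as in current LLVM:

   #include "llvm/ADT/StringRef.h"
   #include "llvm/Support/Format.h"
   #include "llvm/Support/raw_ostream.h"

   int main() {
     llvm::StringRef Str = "foo";
     // Fixed-width numeric output without printf-style format strings, so
     // there is no integer-size/format-specifier mismatch to get wrong.
     llvm::outs() << llvm::format_hex(255, 6) << "\n";     // "0x00ff"
     llvm::outs() << llvm::format_decimal(255, 5) << "\n"; // "  255"
     // StringRefs are padded directly; no std::string/c_str() detour.
     llvm::outs() << llvm::right_justify(Str, 5) << "|\n"; // "  foo|"
     llvm::outs() << llvm::left_justify(Str, 5) << "|\n";  // "foo  |"
     return 0;
   }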
Anton Yartsev 3fa65d4ef4 Refactoring: raw pointer -> unique_ptr
llvm-svn: 218462
2014-09-25 19:55:58 +00:00
Tom Stellard 1fa1ce6112 ARM: Remove unneeded check for MI->hasPostISelHook()
llvm-svn: 218459
2014-09-25 18:59:23 +00:00
Tom Stellard 529efcf9d0 SelectionDAG: Remove #if NDEBUG from check for a post-isel hook
The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.

This was added in r140228, and I'm not sure whether it was intentional,
but it is a likely source of bugs, because it means that in
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.

llvm-svn: 218458
2014-09-25 18:59:22 +00:00
Tom Stellard 7980fc8562 R600/SI: Add support for global atomic add
llvm-svn: 218457
2014-09-25 18:30:26 +00:00
Robin Morisset 810739d174 Lower idempotent RMWs to fence+load
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).

This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.

Details of the optimization are discussed in D5091.

Test Plan: make check-all + a new test

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5422

llvm-svn: 218455
2014-09-25 17:27:43 +00:00
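
Conceptually (C++ rather than IR, and only a sketch of the equivalence the
pass exploits), an idempotent RMW such as fetch_or with 0 can be replaced by
a fence followed by an ordinary atomic load, so the hot path no longer needs
exclusive ownership of the cache line:

   #include <atomic>
   #include <cstdio>

   std::atomic<int> X{42};

   int viaIdempotentRMW() {
     // Or-ing in 0 changes nothing but still returns the old value with
     // full sequentially consistent ordering.
     return X.fetch_or(0, std::memory_order_seq_cst);
   }

   int viaFencePlusLoad() {
     // The shape the pass emits instead: fence, then a plain load. The load
     // does not write, so it does not force the cache line to be owned
     // exclusively the way a locked RMW does.
     std::atomic_thread_fence(std::memory_order_seq_cst);
     return X.load(std::memory_order_seq_cst);
   }

   int main() { std::printf("%d %d\n", viaIdempotentRMW(), viaFencePlusLoad()); }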
Sid Manning 31f7125562 Add missing attributes to !cmp.[eq,gt,gtu] instructions.
These instructions did not indicate that they are extendable or the
number of bits in the extendable operand.  Rename them to match the
architected names.  Add a testcase for the intrinsics.

llvm-svn: 218453
2014-09-25 13:09:54 +00:00
Daniel Sanders 621589e7c0 Add llvm_unreachables() for [ASZ]ExtUpper to X86FastISel.cpp to appease the buildbots.
llvm-svn: 218452
2014-09-25 13:08:51 +00:00
Daniel Sanders ae275e38a2 [mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle structs correctly on big-endian N32/N64 return values.
Summary:
The N32/N64 ABI's require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.

We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5286

llvm-svn: 218451
2014-09-25 12:15:05 +00:00
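
A tiny sketch of what "shifted into the upper bits" means for the
[ASZ]ExtUpper handling (illustrative only; promoteToUpperBits is not a real
helper in the patch):

   #include <cstdint>

   // Big-endian N32/N64: a 4-byte aggregate carried in a 64-bit register
   // must occupy bits [63:32], so that spilling the whole register with
   // 'sd' puts the aggregate's bytes at the lowest address of the slot.
   uint64_t promoteToUpperBits(uint32_t AggregateBits) {
     return static_cast<uint64_t>(AggregateBits) << 32;
   }

On little-endian targets no shift is needed, which is why the message calls
that case trivial.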
Renato Golin f5dd1dacb6 Add aliases for VAND imm to VBIC ~imm
On ARM NEON, VAND with an immediate (16/32 bits) is an alias of VBIC ~imm with
the same type size. This adds that logic to the parser and generates VBIC
instructions from VAND assembly.

This patch also fixes the validation routines for NEON splat immediates, which
were wrong.

Fixes PR20702.

llvm-svn: 218450
2014-09-25 11:31:24 +00:00
Chandler Carruth 0a6e961efd [x86] Teach the new vector shuffle lowering to use AVX2 instructions for
v4f64 and v8f32 shuffles when they are lane-crossing. We have fully
general lane-crossing permutation functions in AVX2 that make this easy.

Part of this also changes exactly when and how these vectors are split
up when we don't have AVX2. This isn't always a win, but it usually is,
so on balance I think it's better. The primary regressions are
all things that just need to be fixed anyway, such as modeling when
a blend can be completely accomplished via VINSERTF128, etc.

Also, this highlights one of the few remaining big features: we do
a really poor job of inserting elements into AVX registers efficiently.

This completes almost all of the big tricks I have in mind for AVX2. The
only things left that I plan to add:

1) element insertion smarts
2) palignr and other fairly specialized lowerings when they happen to
   apply

llvm-svn: 218449
2014-09-25 11:03:55 +00:00
Chandler Carruth e91d68c475 [x86] Teach the new vector shuffle lowering a fancier way to lower
256-bit vectors with lane-crossing shuffles.

Rather than immediately decomposing to 128-bit vectors, try flipping the
256-bit vector lanes, shuffling them and blending them together. This
reduces our worst case shuffle by a pretty significant margin across the
board.

llvm-svn: 218446
2014-09-25 10:21:15 +00:00
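
The general idea, sketched with AVX intrinsics for one arbitrary mask (an
illustration of the strategy, not code from the patch): swap the 128-bit
lanes once, shuffle both copies in-lane, and blend the results, instead of
decomposing to 128-bit vectors right away:

   #include <immintrin.h>

   // Lane-crossing v8f32 shuffle built from a lane flip + in-lane shuffles
   // + blend. The shuffle and blend immediates encode one example mask.
   __m256 laneCrossingShuffle(__m256 V) {
     __m256 Flipped = _mm256_permute2f128_ps(V, V, 0x01); // swap 128-bit lanes
     __m256 InLane  = _mm256_permute_ps(V,       _MM_SHUFFLE(0, 1, 2, 3));
     __m256 Crossed = _mm256_permute_ps(Flipped, _MM_SHUFFLE(0, 1, 2, 3));
     // Per element, take either the original-lane or the flipped-lane value.
     return _mm256_blend_ps(InLane, Crossed, 0xAA);
   }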
Oliver Stannard 3256b26ef2 [Thumb2] BXJ should be undefined for v7M, v8A
The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not
defined for v7M or v8A. It is defined for all other Thumb2-supporting
architectures (v6T2, v7A and v7R).

llvm-svn: 218445
2014-09-25 10:02:05 +00:00
Chandler Carruth 02387122e0 [x86] Fix an oversight in the v8i32 path of the new vector shuffle
lowering where it only used the mask of the low 128-bit lane rather than
the entire mask.

This allows the new lowering to correctly match the unpack patterns for
v8i32 vectors.

For reference, the reason that we check the entire mask rather
than the repeated mask is that the repeated masks don't
abide by all of the invariants of normal masks. As a consequence, it is
safer to use the full mask with functions like the generic equivalence
test.

llvm-svn: 218442
2014-09-25 04:10:27 +00:00
Chandler Carruth 8140158cb5 [x86] Rearrange the code for v16i16 lowering a bit for clarity and to
reduce the amount of checking we do here.

The first realization is that only non-crossing cases between 128-bit
lanes are handled by almost the entire function. It makes more sense to
handle the crossing cases first.

The second is that until we actually generate fancy shared lowering
strategies that use the repeated semantics of the v8i16 lowering, we
shouldn't waste time checking for repeated masks. It is simplest to
directly test for the entire unpck masks anyway, so we gained nothing
from this.

This also matches the structure of v32i8 more closely.

No functionality changed here.

llvm-svn: 218441
2014-09-25 04:03:22 +00:00
Chandler Carruth d8f528adb8 [x86] Implement AVX2 support for v32i8 in the new vector shuffle
lowering.

This completes the basic AVX2 feature support, but there are still some
improvements I'd like to do to really get the last mile of performance
here.

llvm-svn: 218440
2014-09-25 02:52:12 +00:00
Reid Kleckner 81782f0cb8 MC: Use @IMGREL instead of @IMGREL32, which we can't parse
Nico Rieck added support for this 32-bit COFF relocation some time ago
for Win64 stuff. It appears that as an oversight, the assembly output
used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse.

Sadly, there were actually tests that took in IMGREL and put out
IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can
assemble its own output with slightly more fidelity.

llvm-svn: 218437
2014-09-25 02:09:18 +00:00
Chandler Carruth d355369dbb [x86] Remove the defunct X86ISD::BLENDV entry -- we use vector selects
for this now.

Should prevent folks from running afoul of this and not knowing why
their code won't instruction select the way I just did...

llvm-svn: 218436
2014-09-25 01:16:01 +00:00
Chandler Carruth a577bc26b6 [x86] Fix the v16i16 blend logic I added in the prior commit and add the
missing test cases for it.

Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix that next.

llvm-svn: 218434
2014-09-25 01:13:38 +00:00
Justin Bogner b35a72ae9e llvm-cov: Combine segments that cover the same location
If we have multiple coverage counts for the same segment, we need to
add them up rather than arbitrarily choosing one. This fixes that and
adds a test with template instantiations to exercise it.

llvm-svn: 218432
2014-09-25 00:34:18 +00:00
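
The fix boils down to accumulation rather than selection; a generic sketch of
that shape (not the actual llvm-cov data structures):

   #include <cstdint>
   #include <map>
   #include <utility>
   #include <vector>

   using Loc = std::pair<unsigned, unsigned>; // (line, column)

   // When several instantiations report counts for the same source location,
   // sum them instead of arbitrarily keeping one.
   std::map<Loc, uint64_t>
   combineSegments(const std::vector<std::pair<Loc, uint64_t>> &Segments) {
     std::map<Loc, uint64_t> Combined;
     for (const auto &S : Segments)
       Combined[S.first] += S.second;
     return Combined;
   }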
Akira Hatanaka 8cc48bd159 [X86,AVX] Add an isel pattern for X86VBroadcast.
This fixes PR21050 and rdar://problem/18434607.

llvm-svn: 218431
2014-09-25 00:26:15 +00:00
Chandler Carruth 98443d89b9 [x86] Implement v16i16 support with AVX2 in the new vector shuffle
lowering.

This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.

Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.

llvm-svn: 218430
2014-09-25 00:24:19 +00:00
Kostya Serebryany 34ddf8725c [asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together with -coverage
llvm-svn: 218421
2014-09-24 22:41:55 +00:00
Renato Golin 4b5f91f513 Revert 218406 - Refactor the RelocVisitor::visit method
llvm-svn: 218416
2014-09-24 21:30:43 +00:00
Akira Hatanaka 8e77dbbf5a Revert r218380. This was breaking Apple internal build bots.
llvm-svn: 218409
2014-09-24 20:37:14 +00:00
Renato Golin 2b25450061 Refactor the RelocVisitor::visit method
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.

A set of tests has been added to help show that the refactoring processes
relocations for the same targets as the original code.

Patch by Charlie Turner.

llvm-svn: 218406
2014-09-24 20:07:22 +00:00
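
The shape of the refactoring, roughly (hypothetical visitX86_64/visitAArch64
stubs; the real RelocVisitor has one handler per supported relocation type,
and this assumes ObjectFile::getArch() returning a Triple::ArchType as in
LLVM's Object library): dispatch on the object file's detected architecture
rather than string-comparing getFileFormatName():

   #include "llvm/ADT/Triple.h"
   #include "llvm/Object/ObjectFile.h"
   #include <cstdint>

   // Hypothetical per-target handlers standing in for the real ones.
   static uint64_t visitX86_64(uint32_t Type, uint64_t Value) { return Value; }
   static uint64_t visitAArch64(uint32_t Type, uint64_t Value) { return Value; }

   // Switch on the architecture recovered from the object file itself,
   // instead of an if/else chain over file format name strings.
   static uint64_t visitReloc(const llvm::object::ObjectFile &Obj,
                              uint32_t Type, uint64_t Value) {
     switch (Obj.getArch()) {
     case llvm::Triple::x86_64:  return visitX86_64(Type, Value);
     case llvm::Triple::aarch64: return visitAArch64(Type, Value);
     default:                    return Value; // unhandled target
     }
   }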
Chris Bieneman 7827217131 Adding #ifdef around TermColorMutex based on feedback from Craig Topper.
llvm-svn: 218401
2014-09-24 18:35:58 +00:00
Chandler Carruth edcba62b4a [x86] Factor out the logic to generically decompose a vector shuffle
into unblended shuffles and a blend.

This is the consistent fallback for the lowering paths that have fast
blend operations available, and it's getting quite repetitive.

No functionality changed.

llvm-svn: 218399
2014-09-24 18:20:09 +00:00
Kaelyn Takata f2fce14920 Revert "Refactor the RelocVisitor::visit method"
This reverts commit faac033f7364bb4226e22c8079c221c96af10d02.

The test depends on all targets being enabled in llc in order to pass,
and needs to be rewritten/refactored to not have that dependency.

llvm-svn: 218393
2014-09-24 17:49:07 +00:00
Renato Golin 53f6034f8e Refactor the RelocVisitor::visit method
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.

A set of tests has been added to help show that the refactoring processes
relocations for the same targets as the original code.

Patch by Charlie Turner.

llvm-svn: 218388
2014-09-24 17:00:42 +00:00
David Peixotto 0d4d5e64ec Fix assertion in LICM doFinalization()
The doFinalization method checks that the LoopToAliasSetMap is
empty. LICM populates that map as it runs through the loop nest,
deleting the entries for child loops as it goes. However, if a child
loop is deleted by another pass (e.g. unrolling) then the loop will
never be deleted from the map because LICM walks the loop nest to
find entries it can delete.

The fix is to delete the loop from the map and free the alias set
when the loop is deleted from the loop nest.

Differential Revision: http://reviews.llvm.org/D5305

llvm-svn: 218387
2014-09-24 16:48:31 +00:00
Moritz Roth f5d0c7c2c0 [Thumb] Make load/store optimizer less conservative.
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.

This is effectively a (heavily bug-fixed) rewrite of r208992.

llvm-svn: 218386
2014-09-24 16:35:50 +00:00
Oliver Stannard 1ae8b476f4 [Thumb] 32-bit encodings of 'cps' are not valid for v7M
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.

llvm-svn: 218382
2014-09-24 14:20:01 +00:00
Aaron Ballman f086a14d53 Silencing an "enumeral and non-enumeral type in conditional expression" warning. NFC.
llvm-svn: 218381
2014-09-24 13:54:56 +00:00
Benjamin Kramer ce246a13ea Replace a hand-written suffix compare with std::lexicographical_compare.
No functionality change.

llvm-svn: 218380
2014-09-24 13:19:28 +00:00
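
For reference, a generic illustration of the pattern being switched to (not
the actual call site): a suffix comparison is just a lexicographical
comparison over reverse iterators.

   #include <algorithm>
   #include <string>

   // True if A's suffix (read back-to-front) orders before B's.
   bool suffixLess(const std::string &A, const std::string &B) {
     return std::lexicographical_compare(A.rbegin(), A.rend(),
                                         B.rbegin(), B.rend());
   }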
Chandler Carruth e7e9c04ddf [x86] Teach the instruction lowering to add comments describing constant
pool data being loaded into a vector register.

The comments take the form of:

  # ymm0 = [a,b,c,d,...]
  # xmm1 = <x,y,z...>

The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.

My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.

Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.

llvm-svn: 218377
2014-09-24 09:39:41 +00:00
Michael Liao d120916ca7 Allow BB duplication threshold to be adjusted through JumpThreading's ctor
- BB duplication may not be desired on targets where there is little or no
  branch penalty and code duplication needs to be strictly controlled.

llvm-svn: 218375
2014-09-24 04:59:06 +00:00
NAKAMURA Takumi f744ad43e1 Windows/Host.inc: Reformat the header to fit 80-col.
llvm-svn: 218374
2014-09-24 04:45:14 +00:00
NAKAMURA Takumi 239a226dea Unix/Host.inc: Remove <cstdlib>. It has been unused for a long time.
llvm-svn: 218373
2014-09-24 04:45:02 +00:00
NAKAMURA Takumi 12abbdaeab Unix/Host.inc: Wrap a comment line in 80-col.
llvm-svn: 218371
2014-09-24 04:44:50 +00:00
NAKAMURA Takumi 3d238b47ec Unix/Host.inc: Remove leading whitespace. It had been here since r56942!
llvm-svn: 218370
2014-09-24 04:44:37 +00:00
Jiangning Liu 3b096172cf Clear PreferredExtendType in each function-specific FunctionLoweringInfo state.
llvm-svn: 218364
2014-09-24 03:22:56 +00:00
Chandler Carruth 7b688c6884 [x86] More refactoring of the shuffle comment emission. The previous
attempt didn't work out so well. It looks like it will be much better
for introducing extra logic to find a shuffle mask if the finding logic
is totally separate. This also makes it easy to sink the opcode logic
completely out of the routine so we don't re-dispatch across it.

Still no functionality changed.

llvm-svn: 218363
2014-09-24 03:06:37 +00:00
Chandler Carruth edf50212df [x86] Bypass the shuffle mask comment generation when not using verbose
asm. This can be somewhat expensive and there is no reason to do it
outside of tests or debugging sessions. I'm also likely to make it
significantly more expensive to support more styles of shuffles.

llvm-svn: 218362
2014-09-24 03:06:34 +00:00
Chandler Carruth ab8b37a9d2 [x86] Hoist the logic for extracting the relevant bits of information
from the MachineInstr into the caller which is already doing a switch
over the instruction.

This will make it clearer how to compute different operands to feed
the comment selection, for example.

Also, in a drive-by-fix, don't append an empty comment string (which is
a no-op ultimately).

No functionality changed.

llvm-svn: 218361
2014-09-24 02:24:41 +00:00