llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	8ac488b161	[x86] Fix an amazing goof in the handling of sub, or, and xor lowering. The comment for this code indicated that it should work similar to our handling of add lowering above: if we see uses of an instruction other than flag usage and store usage, it tries to avoid the specialized X86ISD::* nodes that are designed for flag+op modeling and emits an explicit test. Problem is, only the add case actually did this. In all the other cases, the logic was incomplete and inverted. Any time the value was used by a store, we bailed on the specialized X86ISD node. All of this appears to have been historical where we had different logic here. =/ Turns out, we have quite a few patterns designed around these nodes. We should actually form them. I fixed the code to match what we do for add, and it has quite a positive effect just within some of our test cases. The only thing close to a regression I see is using: notl %r testl %r, %r instead of: xorl -1, %r But we can add a pattern or something to fold that back out. The improvements seem more than worth this. I've also worked with Craig to update the comments to no longer be actively contradicted by the code. =[ Some of this still remains a mystery to both Craig and myself, but this seems like a large step in the direction of consistency and slightly more accurate comments. Many thanks to Craig for help figuring out this nasty stuff. Differential Revision: https://reviews.llvm.org/D37096 llvm-svn: 311737	2017-08-25 00:34:07 +00:00
Chandler Carruth	93a645525c	[x86] Teach the cmov converter to aggressively convert cmovs with memory operands into control flow. We have seen periodically performance problems with cmov where one operand comes from memory. On modern x86 processors with strong branch predictors and speculative execution, this tends to be much better done with a branch than cmov. We routinely see cmov stalling while the load is completed rather than continuing, and if there are subsequent branches, they cannot be speculated in turn. Also, in many (even simple) cases, macro fusion causes the control flow version to be fewer uops. Consider the IACA output for the initial sequence of code in a very hot function in one of our internal benchmarks that motivates this, and notice the micro-op reduction provided. Before, SNB: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.20 Cycles Throughput Bottleneck: Port1 \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1 \| \| 1.0 \| \| \| \| \| CP \| mov rcx, rdi \| 0* \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| 0.1 \| 0.6 \| 0.5 0.5 \| 0.5 0.5 \| \| 0.4 \| CP \| cmp byte ptr [rsi+0xf], 0xf \| 1 \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| \| \| mov rax, qword ptr [rsi] \| 3 \| 1.8 \| 0.6 \| \| \| \| 0.6 \| CP \| cmovbe rax, rdi \| 2^ \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| 1.0 \| \| cmp byte ptr [rcx+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| jb 0xf Total Num Of Uops: 9 ``` After, SNB: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.00 Cycles Throughput Bottleneck: Port5 \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1 \| 0.5 \| 0.5 \| \| \| \| \| \| mov rax, rdi \| 0* \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| 0.5 \| 0.5 \| 1.0 1.0 \| \| \| \| \| cmp byte ptr [rsi+0xf], 0xf \| 1 \| 0.5 \| 0.5 \| \| \| \| \| \| mov ecx, 0x0 \| 1 \| \| \| \| \| \| 1.0 \| CP \| jnbe 0x39 \| 2^ \| \| \| \| 1.0 1.0 \| \| 1.0 \| CP \| cmp byte ptr [rax+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| jnb 0x3c Total Num Of Uops: 7 ``` The difference even manifests in a throughput cycle rate difference on Haswell. Before, HSW: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.00 Cycles Throughput Bottleneck: FrontEnd \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| 6 \| 7 \| \| --------------------------------------------------------------------------------- \| 0* \| \| \| \| \| \| \| \| \| \| mov rcx, rdi \| 0* \| \| \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| 1.0 \| \| \| \| cmp byte ptr [rsi+0xf], 0xf \| 1 \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| \| \| \| \| mov rax, qword ptr [rsi] \| 3 \| 1.0 \| 1.0 \| \| \| \| \| 1.0 \| \| \| cmovbe rax, rdi \| 2^ \| 0.5 \| \| 0.5 0.5 \| 0.5 0.5 \| \| \| 0.5 \| \| \| cmp byte ptr [rcx+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| \| \| jb 0xf Total Num Of Uops: 8 ``` After, HSW: ``` Throughput Analysis Report -------------------------- Block Throughput: 1.50 Cycles Throughput Bottleneck: FrontEnd \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| 6 \| 7 \| \| --------------------------------------------------------------------------------- \| 0* \| \| \| \| \| \| \| \| \| \| mov rax, rdi \| 0* \| \| \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| \| \| 1.0 1.0 \| \| \| 1.0 \| \| \| \| cmp byte ptr [rsi+0xf], 0xf \| 1 \| \| 1.0 \| \| \| \| \| \| \| \| mov ecx, 0x0 \| 1 \| \| \| \| \| \| \| 1.0 \| \| \| jnbe 0x39 \| 2^ \| 1.0 \| \| \| 1.0 1.0 \| \| \| \| \| \| cmp byte ptr [rax+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| \| \| jnb 0x3c Total Num Of Uops: 6 ``` Note that this cannot be usefully restricted to inner loops. Much of the hot code we see hitting this is not in an inner loop or not in a loop at all. The optimization still remains effective and indeed critical for some of our code. I have run a suite of internal benchmarks with this change. I saw a few very significant improvements and a very few minor regressions, but overall this change rarely has a significant effect. However, the improvements were very significant, and in quite important routines responsible for a great deal of our C++ CPU cycles. The gains pretty clealy outweigh the regressions for us. I also ran the test-suite and SPEC2006. Only 11 binaries changed at all and none of them showed any regressions. Amjad Aboud at Intel also ran this over their benchmarks and saw no regressions. Differential Revision: https://reviews.llvm.org/D36858 llvm-svn: 311226	2017-08-19 05:01:19 +00:00
Sanjay Patel	7c026cb1af	[x86] auto-generate full checks; NFC llvm-svn: 307718	2017-07-11 22:04:36 +00:00
Geoff Berry	92a286ae5a	[SelectionDAG] Handle inverted conditions when splitting into multiple branches. Summary: When conditional branches with complex conditions are split into multiple branches in SelectionDAGBuilder::FindMergedConditions, also handle inverted conditions. These may sometimes appear without having been optimized by InstCombine when CodeGenPrepare decides to sink and duplicate cmp instructions, causing them to have only one use. This problem can be increased by e.g. GVNHoist hiding more cmps from InstCombine by combining equivalent cmps from different blocks. For example codegen X & !(Y \| Z) as: jmp_if_X TmpBB jmp FBB TmpBB: jmp_if_notY Tmp2BB jmp FBB Tmp2BB: jmp_if_notZ TBB jmp FBB Reviewers: bogner, MatzeB, qcolombet Subscribers: llvm-commits, hiraditya, mcrosier, sebpop Differential Revision: https://reviews.llvm.org/D28380 llvm-svn: 292944	2017-01-24 16:36:07 +00:00
Sanjay Patel	bf51c8a975	[x86] fix usage of stale operands when lowering select I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR. The debug output shows nodes like this: t4: i32 = xor t2, Constant:i32<-1> t21: i8 = setcc t4, Constant:i32<0>, setlt:ch t14: i32 = select t21, t4, Constant:i32<-1> And because the select is holding onto the t4 (xor) node while EmitTest creates a new x86-specific xor node, the lowering results in: t4: i32 = xor t2, Constant:i32<-1> t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1> t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1 Differential Revision: https://reviews.llvm.org/D28374 llvm-svn: 291392	2017-01-08 15:53:40 +00:00
Sanjay Patel	686527c1e0	[x86] add test to show bug in select lowering; NFC llvm-svn: 291151	2017-01-05 18:35:44 +00:00
David L Kreitzer	8b959e5cfa	Avoid unnecessary 32-bit to 64-bit zero extensions following 32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly zero extends. Differential Revision: https://reviews.llvm.org/D22941 llvm-svn: 277148	2016-07-29 15:09:54 +00:00
Michael Kuperstein	3e3652aef2	Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD) which was not appreciated by fast regalloc on 32-bit. llvm-svn: 274802	2016-07-07 22:50:23 +00:00
Michael Kuperstein	edb38a94f8	Revert r274692 to check whether this is what breaks windows selfhost. llvm-svn: 274771	2016-07-07 16:55:35 +00:00
Michael Kuperstein	1ef6c59b1d	[X86] Transform setcc + movzbl into xorl + setcc xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. Differential Revision: http://reviews.llvm.org/D21774 llvm-svn: 274692	2016-07-06 21:56:18 +00:00
David Blaikie	23af64846f	[opaque pointer type] Add textual IR support for explicit type parameter to the call instruction See r230786 and r230794 for similar changes to gep and load respectively. Call is a bit different because it often doesn't have a single explicit type - usually the type is deduced from the arguments, and just the return type is explicit. In those cases there's no need to change the IR. When that's not the case, the IR usually contains the pointer type of the first operand - but since typed pointers are going away, that representation is insufficient so I'm just stripping the "pointerness" of the explicit type away. This does make the IR a bit weird - it /sort of/ reads like the type of the first operand: "call void () %x(" but %x is actually of type "void ()" and will eventually be just of type "ptr". But this seems not too bad and I don't think it would benefit from repeating the type ("void (), void () %x(" and then eventually "void (), ptr %x(") as has been done with gep and load. This also has a side benefit: since the explicit type is no longer a pointer, there's no ambiguity between an explicit type and a function that returns a function pointer. Previously this case needed an explicit type (eg: a function returning a void() function was written as "call void () () * @x(" rather than "call void () * @x(" because of the ambiguity between a function returning a pointer to a void() function and a function returning void). No ambiguity means even function pointer return types can just be written alone, without writing the whole function's type. This leaves /only/ the varargs case where the explicit type is required. Given the special type syntax in call instructions, the regex-fu used for migration was a bit more involved in its own unique way (as every one of these is) so here it is. Use it in conjunction with the apply.sh script and associated find/xargs commands I've provided in rr230786 to migrate your out of tree tests. Do let me know if any of this doesn't cover your cases & we can iterate on a more general script/regexes to help others with out of tree tests. About 9 test cases couldn't be automatically migrated - half of those were functions returning function pointers, where I just had to manually delete the function argument types now that we didn't need an explicit function type there. The other half were typedefs of function types used in calls - just had to manually drop the * from those. import fileinput import sys import re pat = re.compile(r'((?:=\|:\|^\|\s)call\s(?:[^@]?))(\s$\|\s(?:(?:\[\[[a-zA-Z0-9_]+\]\]\|[@%](?:(")?[\\\?@a-zA-Z0-9_.]?(?(3)"\|)\|{{.}}))(?:$\|$)\|undef\|inttoptr\|bitcast\|null\|asm).$)') addrspace_end = re.compile(r"addrspace\(\d+$\s\$") func_end = re.compile("(?:void.\|\)\s)\$") def conv(match, line): if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)): return line return line[:match.start()] + match.group(1)[:match.group(1).rfind('')].rstrip() + match.group(2) + line[match.end():] for line in sys.stdin: sys.stdout.write(conv(re.search(pat, line), line)) llvm-svn: 235145	2015-04-16 23:24:18 +00:00
David Blaikie	f72d05bc7b	[opaque pointer type] Add textual IR support for explicit type parameter to gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s$)((<\d\s+x\s+)?([^@]?)(\|\saddrspace\(\d+$)\s\(?(3)>)\s*)(?=$\|%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|zeroinitializer\|<\|\[\[[a-zA-Z]\|\{\{)", re.MULTILINE \| re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) llvm-svn: 232184	2015-03-13 18:20:45 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
JF Bastien	5d3280c7a7	x86-32: PUSHF/POPF use/def EFLAGS Summary: As a side-quest for D6629 jvoung pointed out that I should use -verify-machineinstrs and this found a bug in x86-32's handling of EFLAGS for PUSHF/POPF. This patch fixes the use/def, and adds -verify-machineinstrs to all x86 tests which contain 'EFLAGS'. One exception: this patch leaves inline-asm-fpstack.ll as-is because it fails -verify-machineinstrs in a way unrelated to EFLAGS. This patch also modifies cmpxchg-clobber-flags.ll along the lines of what D6629 already does by also testing i386. Test Plan: ninja check Reviewers: t.p.northover, jvoung Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6687 llvm-svn: 224359	2014-12-16 20:15:45 +00:00
Andrew Trick	e4083f9e85	Disabled subregister copy coalescing during MachineCSE. This effectively backs out r197465 but leaves some of the general fixes in place. Not all targets are ready to handle this feature. To enable it, some infrastructure work is needed to better handle register class constraints. llvm-svn: 197514	2013-12-17 19:29:36 +00:00
Andrew Trick	e339828b90	Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies. Without this, MachineCSE is powerless to handle redundant operations with truncated source operands. This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled: %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1 %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2 %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def> Test case: cse-add-with-overflow.ll. This exposed an existing bug in PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case: PowerPC/crash.ll. llvm-svn: 197465	2013-12-17 04:50:45 +00:00
Rafael Espindola	f152836788	Revert "Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies." This reverts commit r197414. It broke the ppc64 bootstrap. I will post a testcase in a sec. llvm-svn: 197424	2013-12-16 20:57:09 +00:00
Andrew Trick	88bd8629b2	Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies. Without this, MachineCSE is powerless to handle redundant operations with truncated source operands. This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled: %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1 %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2 %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def> llvm-svn: 197414	2013-12-16 19:36:21 +00:00
Andrew Trick	e97d8d6dde	Enable MI Sched for x86. This changes the SelectionDAG scheduling preference to source order. Soon, the SelectionDAG scheduler can be bypassed saving a nice chunk of compile time. Performance differences that result from this change are often a consequence of register coalescing. The register coalescer is far from perfect. Bugs can be filed for deficiencies. On x86 SandyBridge/Haswell, the source order schedule is often preserved, particularly for small blocks. Register pressure is generally improved over the SD scheduler's ILP mode. However, we are still able to handle large blocks that require latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also attempts to discover the critical path in single-block loops and adjust heuristics accordingly. The MI scheduler relies on the new machine model. This is currently unimplemented for AVX, so we may not be generating the best code yet. Unit tests are updated so they don't depend on SD scheduling heuristics. llvm-svn: 192750	2013-10-15 23:33:07 +00:00
Stephen Lin	f799e3f944	Convert CodeGen//.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion. This was done with the following sed invocation to catch label lines demarking function boundaries: sed -i '' "s/^;$ $$[A-Z0-9_]$:$ $test$[A-Za-z0-9_-]$:$ $$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen//*.ll which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct. llvm-svn: 186258	2013-07-13 20:38:47 +00:00
Andrew Trick	121124acf8	Revert "Temporarily enable MI-Sched on X86." This reverts commit 98a9b72e8c56dc13a2617de84503a3d78352789c. llvm-svn: 184823	2013-06-25 02:48:58 +00:00
Andrew Trick	5a1e0af838	Temporarily enable MI-Sched on X86. Sorry for the unit test churn. I'll try to make the change permanently next time. llvm-svn: 184705	2013-06-24 09:13:20 +00:00
Manman Ren	cc1dc6dc11	Disable rematerialization in TwoAddressInstructionPass. It is redundant; RegisterCoalescer will do the remat if it can't eliminate the copy. Collected instruction counts before and after this. A few extra instructions are generated due to spilling but it is normal to see these kinds of changes with almost any small codegen change, according to Jakob. This also fixed rdar://11830760 where xor is expected instead of movi0. llvm-svn: 160749	2012-07-25 18:28:13 +00:00
Benjamin Kramer	3d38c17b59	Switch the select to branch transformation on by default. The primitive conservative heuristic seems to give a slight overall improvement while not regressing stuff. Make it available to wider testing. If you notice any speed regressions (or significant code size regressions) let me know! llvm-svn: 156258	2012-05-06 14:25:16 +00:00
Chris Lattner	6a144a2227	Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic. llvm-svn: 145171	2011-11-27 06:54:59 +00:00
Jakob Stoklund Olesen	1f72dd40c7	Pseudo CMOV instructions don't clobber EFLAGS. The explanation about a 0 argument being materialized as xor is no longer valid. Rematerialization will check if EFLAGS is live before clobbering it. The code produced by X86TargetLowering::EmitLoweredSelect does not clobber EFLAGS. This causes one less testb instruction to be generated in the cmov.ll test case. llvm-svn: 139057	2011-09-02 23:52:55 +00:00
Bill Wendling	410ec4aad1	As Dan pointed out, movzbl, movsbl, and friends are nicer than their alias (movzx/movsx) because they give more information. Revert that part of the patch. llvm-svn: 129498	2011-04-14 01:46:37 +00:00
Bill Wendling	7e07d6fb69	Have the X86 back-end emit the alias instead of what's being aliased. In most cases, it's much nicer and more informative reading the alias. llvm-svn: 129497	2011-04-14 01:11:51 +00:00
Sean Callanan	04d8cb74f3	Instruction fixes, added instructions, and AsmString changes in the X86 instruction tables. Also (while I was at it) cleaned up the X86 tables, removing tabs and 80-line violations. This patch was reviewed by Chris Lattner, but please let me know if there are any problems. * X86.td Removed tabs and fixed 80-line violations X86Instr64bit.td (IRET, POPCNT, BT_, LSL, SWPGS, PUSH_S, POP_S, L_S, SMSW) Added (CALL, CMOV) Added qualifiers (JMP) Added PC-relative jump instruction (POPFQ/PUSHFQ) Added qualifiers; renamed PUSHFQ to indicate that it is 64-bit only (ambiguous since it has no REX prefix) (MOV) Added rr form going the other way, which is encoded differently (MOV) Changed immediates to offsets, which is more correct; also fixed MOV64o64a to have to a 64-bit offset (MOV) Fixed qualifiers (MOV) Added debug-register and condition-register moves (MOVZX) Added more forms (ADC, SUB, SBB, AND, OR, XOR) Added reverse forms, which (as with MOV) are encoded differently (ROL) Made REX.W required (BT) Uncommented mr form for disassembly only (CVT__2__) Added several missing non-intrinsic forms (LXADD, XCHG) Reordered operands to make more sense for MRMSrcMem (XCHG) Added register-to-register forms (XADD, CMPXCHG, XCHG) Added non-locked forms * X86InstrSSE.td (CVTSS2SI, COMISS, CVTTPS2DQ, CVTPS2PD, CVTPD2PS, MOVQ) Added * X86InstrFPStack.td (COM_FST0, COMP_FST0, COM_FI, COM_FIP, FFREE, FNCLEX, FNOP, FXAM, FLDL2T, FLDL2E, FLDPI, FLDLG2, FLDLN2, F2XM1, FYL2X, FPTAN, FPATAN, FXTRACT, FPREM1, FDECSTP, FINCSTP, FPREM, FYL2XP1, FSINCOS, FRNDINT, FSCALE, FCOMPP, FXSAVE, FXRSTOR) Added (FCOM, FCOMP) Added qualifiers (FSTENV, FSAVE, FSTSW) Fixed opcode names (FNSTSW) Added implicit register operand * X86InstrInfo.td (opaque512mem) Added for FXSAVE/FXRSTOR (offset8, offset16, offset32, offset64) Added for MOV (NOOPW, IRET, POPCNT, IN, BTC, BTR, BTS, LSL, INVLPG, STR, LTR, PUSHFS, PUSHGS, POPFS, POPGS, LDS, LSS, LES, LFS, LGS, VERR, VERW, SGDT, SIDT, SLDT, LGDT, LIDT, LLDT, LODSD, OUTSB, OUTSW, OUTSD, HLT, RSM, FNINIT, CLC, STC, CLI, STI, CLD, STD, CMC, CLTS, XLAT, WRMSR, RDMSR, RDPMC, SMSW, LMSW, CPUID, INVD, WBINVD, INVEPT, INVVPID, VMCALL, VMCLEAR, VMLAUNCH, VMRESUME, VMPTRLD, VMPTRST, VMREAD, VMWRITE, VMXOFF, VMXON) Added (NOOPL, POPF, POPFD, PUSHF, PUSHFD) Added qualifier (JO, JNO, JB, JAE, JE, JNE, JBE, JA, JS, JNS, JP, JNP, JL, JGE, JLE, JG, JCXZ) Added 32-bit forms (MOV) Changed some immediate forms to offset forms (MOV) Added reversed reg-reg forms, which are encoded differently (MOV) Added debug-register and condition-register moves (CMOV) Added qualifiers (AND, OR, XOR, ADC, SUB, SBB) Added reverse forms, like MOV (BT) Uncommented memory-register forms for disassembler (MOVSX, MOVZX) Added forms (XCHG, LXADD) Made operand order make sense for MRMSrcMem (XCHG) Added register-register forms (XADD, CMPXCHG) Added unlocked forms * X86InstrMMX.td (MMX_MOVD, MMV_MOVQ) Added forms * X86InstrInfo.cpp: Changed PUSHFQ to PUSHFQ64 to reflect table change * X86RegisterInfo.td: Added debug and condition register sets * x86-64-pic-3.ll: Fixed testcase to reflect call qualifier * peep-test-3.ll: Fixed testcase to reflect test qualifier * cmov.ll: Fixed testcase to reflect cmov qualifier * loop-blocks.ll: Fixed testcase to reflect call qualifier * x86-64-pic-11.ll: Fixed testcase to reflect call qualifier * 2009-11-04-SubregCoalescingBug.ll: Fixed testcase to reflect call qualifier * x86-64-pic-2.ll: Fixed testcase to reflect call qualifier * live-out-reg-info.ll: Fixed testcase to reflect test qualifier * tail-opts.ll: Fixed testcase to reflect call qualifiers * x86-64-pic-10.ll: Fixed testcase to reflect call qualifier * bss-pagealigned.ll: Fixed testcase to reflect call qualifier * x86-64-pic-1.ll: Fixed testcase to reflect call qualifier * widen_load-1.ll: Fixed testcase to reflect call qualifier llvm-svn: 91638	2009-12-18 00:01:26 +00:00
Dan Gohman	d29cafc083	Restore a comment that was lost in the merge. llvm-svn: 81857	2009-09-15 15:09:54 +00:00
Chris Lattner	30fceaba05	this is failing on linux hosts, force a triple. llvm-svn: 81833	2009-09-15 04:27:29 +00:00
Chris Lattner	170c116aa7	merge one more in. llvm-svn: 81824	2009-09-15 02:27:23 +00:00
Chris Lattner	42aaa6e443	merge some more cmov tests into cmov.ll llvm-svn: 81823	2009-09-15 02:25:21 +00:00
Chris Lattner	baf78e5b52	merge two cmov tests into one. llvm-svn: 81822	2009-09-15 02:22:47 +00:00

34 Commits