600 Commits

Author SHA1 Message Date
Craig Topper d78429f850 Add a bunch of AVX instructions to the folding tables. Also fixed the alignment on 256-bit AVX2 instructions.
llvm-svn: 148194
2012-01-14 18:14:53 +00:00
Craig Topper e52d86a740 Convert SHUFPD with the same register for both sources to PSHUFD if it would prevent a register copy. Similar to SHUFPS, but requires the mask to be converted.
llvm-svn: 148112
2012-01-13 09:21:41 +00:00
Craig Topper cb7e13d7c0 Make X86 instruction selection use 256-bit VPXOR for build_vector of all ones if AVX2 is enabled. This gives the ExeDepsFix pass a chance to choose FP vs int as appropriate. Also use v8i32 as the type for getZeroVector if AVX2 is enabled. This is consistent with SSE2 preferring v4i32.
llvm-svn: 148108
2012-01-13 08:12:35 +00:00
Craig Topper a4c5a47b97 Use 8i32 constant pool entry for converting AVX2_SETALLONES. Possibly fixes PR11750.
llvm-svn: 148101
2012-01-13 06:12:41 +00:00
Evan Cheng 7fae11b231 - Add MachineInstrBundle.h and MachineInstrBundle.cpp. This includes a function
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
  and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
  prevent IT blocks from being broken apart.

llvm-svn: 146542
2011-12-14 02:11:42 +00:00
Benjamin Kramer 2dc5dec41d X86: Split (v)rounds[sd] into a normal and an intrinsic version.
llvm-svn: 146256
2011-12-09 15:43:55 +00:00
Evan Cheng 7f8e563a69 Add bundle aware API for querying instruction properties and switch the code
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.

For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.

llvm-svn: 146026
2011-12-07 07:15:52 +00:00
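
The three query behaviours described in commit 7f8e563a69 (any-of for mayLoad / mayStore, all-of for isPredicable, conservatively false for canFoldAsLoad) might be sketched as below. This is only an illustration: the `Instr` struct, the `Bundle` alias, and the three function names are hypothetical and are not the LLVM API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the static properties of one instruction.
struct Instr {
  bool mayLoad;
  bool isPredicable;
  bool canFoldAsLoad;
};

// Assumption: a bundle is a non-empty list of instructions, and a
// non-bundled instruction is modeled as a one-element list.
using Bundle = std::vector<Instr>;

// "Any" semantics: true if at least one bundled instruction may load.
bool bundleMayLoad(const Bundle &B) {
  for (const Instr &I : B)
    if (I.mayLoad)
      return true;
  return false;
}

// "All" semantics: true only if every bundled instruction is predicable.
bool bundleIsPredicable(const Bundle &B) {
  for (const Instr &I : B)
    if (!I.isPredicable)
      return false;
  return true;
}

// Conservative semantics: a real bundle (more than one instruction)
// always answers false; a lone instruction answers for itself.
bool bundleCanFoldAsLoad(const Bundle &B) {
  return B.size() == 1 && B.front().canFoldAsLoad;
}
```

For a one-element list all three functions simply return that instruction's own property, which mirrors the statement that non-bundle instructions behave exactly as in the MC layer API.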
Jakob Stoklund Olesen bde32d36bb Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions.
Like V_SET0, these instructions are expanded by ExpandPostRA to xorps /
vxorps so they can participate in execution domain swizzling.

This also makes the AVX variants redundant.

llvm-svn: 145440
2011-11-29 22:27:25 +00:00
Craig Topper 12b72def4e Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled.
llvm-svn: 145376
2011-11-29 05:37:58 +00:00
Craig Topper 897a7d4b9c Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2.
llvm-svn: 145370
2011-11-29 03:57:34 +00:00
Jakob Stoklund Olesen 02845410f9 Fix PR11422.
This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

llvm-svn: 145096
2011-11-23 04:03:08 +00:00
Craig Topper a3a6583694 Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled.
llvm-svn: 145004
2011-11-19 22:34:59 +00:00
Jay Foad 0745e645e0 Remove some unnecessary includes of PseudoSourceValue.h.
llvm-svn: 144631
2011-11-15 07:24:32 +00:00
Craig Topper 649d1c5eec Fix PR11370 for real. Prevents converting 256-bit FP instruction to AVX2 256-bit integer instructions when AVX2 isn't enabled.
llvm-svn: 144629
2011-11-15 06:39:01 +00:00
Craig Topper 05baa85f58 Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370.
llvm-svn: 144622
2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen f8ad336bc4 Break false dependencies before partial register updates.
Two new TargetInstrInfo hooks let the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previous N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602
2011-11-15 01:15:30 +00:00
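
The rule stated in commit f8ad336bc4 (break the false dependency when the partially updated register was written within the previous N instructions) could look roughly like the following. This is a simplified sketch, not the ExecutionDepsFix implementation: the `Instr` struct, `findBreakPoints`, and the single-block, one-def-per-instruction model are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction: one destination register and a
// flag saying whether only part of that register is written.
struct Instr {
  int DefReg;          // index of the register written
  bool PartialUpdate;  // true if the old register value is still needed
};

// Returns the indices of instructions that would get a
// dependency-breaking instruction inserted in front of them.
// N is the clearance: how many instructions back a previous write still
// counts as "recent". Registers never written in the block are ignored,
// which is a simplification of this sketch.
std::vector<long> findBreakPoints(const std::vector<Instr> &Block,
                                  int NumRegs, long N) {
  const long Never = -1;
  std::vector<long> LastWrite(NumRegs, Never);
  std::vector<long> Breaks;
  for (long I = 0; I < static_cast<long>(Block.size()); ++I) {
    const Instr &MI = Block[I];
    long Prev = LastWrite[MI.DefReg];
    // A partial update shortly after a write to the same register
    // creates a false dependency worth breaking.
    if (MI.PartialUpdate && Prev != Never && I - Prev <= N)
      Breaks.push_back(I);
    LastWrite[MI.DefReg] = I;
  }
  return Breaks;
}
```

A partial update that follows the previous write by more than N instructions is left alone, on the assumption that the earlier write has already completed by then.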
Craig Topper 182b00a2e0 Add AVX2 version of instructions to load folding tables. Also add a bunch of missing SSE/AVX instructions.
llvm-svn: 144525
2011-11-14 08:07:55 +00:00
Craig Topper f87a2bef51 Enable execution dependency fix pass for YMM registers when AVX2 is enabled. Add AVX2 logical operations to list of replaceable instructions.
llvm-svn: 144179
2011-11-09 09:37:21 +00:00
Jakob Stoklund Olesen 0241308954 Expand V_SET0 to xorps by default.
The xorps instruction is smaller than pxor, so prefer that encoding.

The ExecutionDepsFix pass will switch the encoding to pxor and xorpd
when appropriate.

llvm-svn: 143996
2011-11-07 19:15:58 +00:00
Jakob Stoklund Olesen 729abd360e Add TEST8ri_NOREX pseudo to constrain sub_8bit_hi copies.
In 64-bit mode, sub_8bit_hi sub-registers can only be used by NOREX
instructions. The COPY created from the EXTRACT_SUBREG DAG node cannot
target all GR8 registers, only those in GR8_NOREX.

To enforce this, we ensure that all instructions using the
EXTRACT_SUBREG are GR8_NOREX constrained.

This fixes PR11088.

llvm-svn: 141499
2011-10-08 18:28:28 +00:00
Jakob Stoklund Olesen 464fcc0035 Constrain both operands on MOVZX32_NOREXrr8.
This instruction is explicitly encoded without an REX prefix, so both
operands must be *_NOREX.

Also add an assertion to copyPhysReg() that fires when the MOV8rr_NOREX
constraints are not satisfied.

This fixes a miscompilation in 20040709-2 in the gcc test suite.

llvm-svn: 141410
2011-10-07 20:15:54 +00:00
Jakob Stoklund Olesen dd1904e7a6 Expand the x86 V_SET0* pseudos right after register allocation.
This also makes it possible to reduce the number of pseudo instructions
and get rid of the encoding information.

llvm-svn: 140776
2011-09-29 05:10:54 +00:00
Jakob Stoklund Olesen b48c994cc0 Promote the X86 Get/SetSSEDomain functions to TargetInstrInfo.
I am going to unify the SSEDomainFix and NEONMoveFix passes into a
single target independent pass.  They are essentially doing the same
thing.

llvm-svn: 140652
2011-09-27 22:57:18 +00:00
Jakob Stoklund Olesen f05864ad7d Add support for GR32 <-> FR32 cross class copies.
We already support GR64 <-> VR128 copies.  All of these copies break
partial register dependencies by zeroing the high part of the target
register.

llvm-svn: 140348
2011-09-22 22:45:24 +00:00
Bruno Cardoso Lopes 7b43568a93 Add a fixme note!
llvm-svn: 139872
2011-09-15 23:04:24 +00:00
Bruno Cardoso Lopes c69d68a150 Add the remaining AVX versions of instructions to X86InstrInfo, this
time for describing high latency ones and for recognizing loads
from the same base pointer

llvm-svn: 139864
2011-09-15 22:15:52 +00:00
Bruno Cardoso Lopes 6b302955b1 Factor out partial register update checks for some SSE instructions.
Also add the AVX versions and add comments!

llvm-svn: 139854
2011-09-15 21:42:23 +00:00
Bruno Cardoso Lopes d560b8c8e9 Teach the foldable tables about 128-bit AVX instructions and make the
alignment check for 256-bit classes more strict. There're no testcases
but we catch more folding cases for AVX while running single and multi
sources in the llvm testsuite.

Since some 128-bit AVX instructions have a different number of operands
than their SSE counterparts, they are placed in different tables.

256-bit AVX instructions should also be added in the table soon. And
there are a few more 128-bit versions to be handled, which should come in
the following commits.

llvm-svn: 139687
2011-09-14 02:36:58 +00:00
Bruno Cardoso Lopes 23eb5265b4 * Combines Alignment, AuxInfo, and TB_NOT_REVERSABLE flag into a
single field (Flags), which is a bitwise OR of items from the TB_*
enum. This makes it easier to add new information in the future.

* Gives every static array an equivalent layout: { RegOp, MemOp, Flags }

* Adds a helper function, AddTableEntry, to avoid duplication of the
insertion code.

* Renames TB_NOT_REVERSABLE to TB_NO_REVERSE.

* Adds TB_NO_FORWARD, which is analogous to TB_NO_REVERSE, except that
it prevents addition of the Reg->Mem entry. (This is going to be used
by Native Client, in the next CL).

Patch by David Meyer

llvm-svn: 139311
2011-09-08 18:35:57 +00:00
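
The table layout from commit 23eb5265b4 ({ RegOp, MemOp, Flags } entries, a Flags field built by OR-ing TB_* values, and an AddTableEntry helper that honours TB_NO_REVERSE and TB_NO_FORWARD) might be sketched as follows. The flag values, the `FoldEntry` and `FoldTables` types, and this `AddTableEntry` signature are assumptions for illustration; the commit names the flags but does not give their encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical encoding of the combined Flags field.
enum : uint16_t {
  TB_ALIGN_MASK = 0x00ff, // assumed: low bits hold the required alignment
  TB_NO_REVERSE = 0x0100, // do not add the Mem->Reg entry
  TB_NO_FORWARD = 0x0200  // do not add the Reg->Mem entry
};

// Every static table row shares the same layout.
struct FoldEntry {
  unsigned RegOp;
  unsigned MemOp;
  uint16_t Flags;
};

// Hypothetical pair of lookup maps filled from the static tables.
struct FoldTables {
  std::map<unsigned, FoldEntry> RegToMem;
  std::map<unsigned, FoldEntry> MemToReg;
};

// Single insertion helper, so the flag checks live in one place.
void AddTableEntry(FoldTables &T, const FoldEntry &E) {
  if (!(E.Flags & TB_NO_FORWARD))
    T.RegToMem.emplace(E.RegOp, E);
  if (!(E.Flags & TB_NO_REVERSE))
    T.MemToReg.emplace(E.MemOp, E);
}
```

Keeping everything in one Flags field means a new property only needs another TB_* bit, not another column in every table.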
Bruno Cardoso Lopes aad5e50ded Add AVX versions of FsMOVAPS and FsMOVAPS. Teach X86InstrInfo how to use
it!

llvm-svn: 139063
2011-09-03 00:46:45 +00:00
Jakob Stoklund Olesen f08354d183 Check for EFLAGS live-out before clobbering it.
It is only allowed to clobber EFLAGS at the end of a block if it isn't
live-in to any successor.

llvm-svn: 139056
2011-09-02 23:52:52 +00:00
Bruno Cardoso Lopes db520db514 Teach more places to use VMOVAPS,VMOVUPS instead of MOVAPS,MOVUPS,
whenever AVX is enabled.

llvm-svn: 138849
2011-08-31 03:04:09 +00:00
Bruno Cardoso Lopes dbd1352c80 Cleanup: Remove Int_ CVTSS2SI* forms
llvm-svn: 137297
2011-08-11 02:52:36 +00:00
Jakob Stoklund Olesen daa2cad723 Hoist hasLoadFromStackSlot and hasStoreToStackSlot.
These methods are target-independent since they simply scan the
memory operands.  They can live in TargetInstrInfoImpl.

llvm-svn: 137063
2011-08-08 20:53:24 +00:00
Bruno Cardoso Lopes 9212bf275d Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128
This also fixes PR10452

llvm-svn: 136004
2011-07-25 23:05:32 +00:00
Evan Cheng 7e763d86ba Refactor X86 target to separate MC code from Target code.
llvm-svn: 135930
2011-07-25 18:43:53 +00:00
Bruno Cardoso Lopes a89039998d Fix PR10422 by adding the necessary AVX UCOMISD memory versions to
load folding logic

llvm-svn: 135801
2011-07-22 20:53:20 +00:00
Chris Lattner 229907cd11 land David Blaikie's patch to de-constify Type, with a few tweaks.
llvm-svn: 135375
2011-07-18 04:54:35 +00:00
Evan Cheng bc153d49b7 Next round of MC refactoring. This patch factors MC table instantiations, MC
registration and creation code into XXXMCDesc libraries.

llvm-svn: 135184
2011-07-14 20:59:42 +00:00
Bruno Cardoso Lopes 6778597deb Add 256-bit load/store recognition and matching in several places.
llvm-svn: 135171
2011-07-14 18:50:58 +00:00
Evan Cheng 703a0fbf39 Hide the call to InitMCInstrInfo into tblgen generated ctor.
llvm-svn: 134244
2011-07-01 17:57:27 +00:00
Evan Cheng 194c3dc01f Move CallFrameSetupOpcode and CallFrameDestroyOpcode to TargetInstrInfo.
llvm-svn: 134030
2011-06-28 21:14:33 +00:00
Evan Cheng 1e210d08d8 Merge XXXGenRegisterNames.inc into XXXGenRegisterInfo.inc
llvm-svn: 134024
2011-06-28 20:07:07 +00:00
Evan Cheng 6cc775f905 - Rename TargetInstrDesc, TargetOperandInfo to MCInstrDesc and MCOperandInfo and
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Change
TargetInstrInfo so it's based off MCInstrInfo.

llvm-svn: 134021
2011-06-28 19:10:37 +00:00
Evan Cheng 8d71a75777 More refactoring. Move getRegClass from TargetOperandInfo to TargetInstrInfo.
llvm-svn: 133944
2011-06-27 21:26:13 +00:00
Evan Cheng ee9b90a727 Get rid of one getStackAlignment(). RegisterInfo shouldn't need to know about stack alignment.
llvm-svn: 133679
2011-06-23 01:53:43 +00:00
Rafael Espindola defd4b0875 AnalyzeBranch doesn't change which successors a bb has, just the order
we try to branch to them.

Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would cause it to
return the wrong answer for

-----------
...
jne foo
jmp bar

foo:
----------

llvm-svn: 132882
2011-06-12 03:20:32 +00:00
Eli Friedman 87ef38784e PR10092 (second try): Don't crash on a load without a memoperand; fast-isel creates loads like this.
llvm-svn: 132826
2011-06-10 01:13:01 +00:00
Eli Friedman 9008377c2d Revert 132789; it breaks tests. My mistake.
llvm-svn: 132795
2011-06-09 19:33:30 +00:00
Eli Friedman c095116710 Add a check to make sure we don't crash with strange configurations where we do fast-isel, then try to fold instructions. PR10092.
llvm-svn: 132789
2011-06-09 18:55:00 +00:00
Jakob Stoklund Olesen 56ce3a0f01 Fix PR10059 and future variations by handling all register subclasses.
Add TargetRegisterInfo::hasSubClassEq and use it to check for compatible
register classes instead of trying to list all register classes in
X86's getLoadStoreRegOpcode.

llvm-svn: 132398
2011-06-01 15:32:10 +00:00
Jakob Stoklund Olesen 2348cdd67f X86AsmPrinter doesn't know how to handle the X86II::MO_GOT_ABSOLUTE_ADDRESS flag
after folding ADD32ri to ADD32mi, so don't do that.

This only happens when the greedy register allocator gets itself in trouble and
spills %vreg9 here:

16L             %vreg9<def> = MOVPC32r 0, %ESP<imp-use>; GR32:%vreg9
48L             %vreg9<def> = ADD32ri %vreg9, <es:_GLOBAL_OFFSET_TABLE_>[TF=1], %EFLAGS<imp-def,dead>; GR32:%vreg9

That should never happen, the live range should be split instead.

llvm-svn: 130625
2011-04-30 23:00:05 +00:00
Chris Lattner 0ab5e2cded Fix a ton of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129558
2011-04-15 05:18:47 +00:00
Bill Wendling b902f1dd88 Reapply r129401 with patch for clang.
llvm-svn: 129419
2011-04-13 00:36:11 +00:00
Bill Wendling dbfde42468 Revert r129401 for now. Clang is using the old way of doing things.
llvm-svn: 129403
2011-04-12 22:59:27 +00:00
Bill Wendling 47c24875a1 Remove the unaligned load intrinsics in favor of using native unaligned loads.
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.

First part of <rdar://problem/8460511>.

llvm-svn: 129401
2011-04-12 22:46:31 +00:00
Andrew Trick 641e2d4f8c Increased the register pressure limit on x86_64 from 8 to 12
regs. This is the only change in this checkin that may affect the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.

Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.

Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.

llvm-svn: 127067
2011-03-05 08:00:22 +00:00
Evan Cheng 3923466e82 Fix bug in X86 folding / unfolding table. Int_CMPSDrm and Int_CMPSSrm memory
operands start at index 2, not 1.
rdar://9045024
PR9305

llvm-svn: 126359
2011-02-24 02:36:52 +00:00
NAKAMURA Takumi 0cfdac078e Target/X86: Tweak win64's tailcall.
llvm-svn: 124272
2011-01-26 02:04:09 +00:00
NAKAMURA Takumi 9d29eff198 Fix whitespace.
llvm-svn: 124270
2011-01-26 02:03:37 +00:00
Nate Begeman 073901c836 Add support for AVX to materialize +0.0 when doing scalar FP.
llvm-svn: 121415
2010-12-09 21:43:51 +00:00
Anton Korobeynikov d08fbd19f5 Move callee-saved regs spills / reloads to TFI
llvm-svn: 120228
2010-11-27 23:05:03 +00:00
Evan Cheng 63c7608c34 Re-enable register pressure aware machine licm with fixes. Hoist() may have
erased the instruction during LICM so UpdateRegPressureAfter() should not
reference it afterwards.

llvm-svn: 116845
2010-10-19 18:58:51 +00:00
Daniel Dunbar 418204e523 Revert r116781 "- Add a hook for target to determine whether an instruction def
is", which breaks some nightly tests.

llvm-svn: 116816
2010-10-19 17:14:24 +00:00
Evan Cheng 8249dfe6ce - Add a hook for target to determine whether an instruction def is
"long latency" enough to hoist even if it may increase spilling. Reloading
  a value from spill slot is often cheaper than performing an expensive
  computation in the loop. For X86, that means machine LICM will hoist
  SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
  instructions.
- Enable register pressure aware machine LICM by default.

llvm-svn: 116781
2010-10-19 00:55:07 +00:00
Jakob Stoklund Olesen aec745326a Remove the x86 MOV{32,64}{rr,rm,mr}_TC instructions.
The reg-reg copies were no longer being generated since copyPhysReg copies
physical registers only.

The loads and stores are not necessary - The TC constraint is imposed by the
TAILJMP and TCRETURN instructions, there should be no need for constrained loads
and stores.

llvm-svn: 116314
2010-10-12 17:15:00 +00:00
Chris Lattner dd77477690 reapply: Use the new TB_NOT_REVERSABLE flag instead of special
reapply: reimplement the second half of the or/add optimization.  We should now

with no changes.  Turns out that one missing "Defs = [EFLAGS]" can upset things
a bit.

llvm-svn: 116040
2010-10-08 03:57:25 +00:00
Chris Lattner 626656a562 reapply the patch reverted in r116033:
"Reimplement (part of) the or -> add optimization.  Matching 'or' into 'add'"

With a critical fix: the add pseudos clobber EFLAGS.

llvm-svn: 116039
2010-10-08 03:54:52 +00:00
Daniel Dunbar 8f21f9c1fb Revert "Reimplement (part of) the or -> add optimization. Matching 'or' into
'add'", which seems to have broken just about everything.

llvm-svn: 116033
2010-10-08 02:07:32 +00:00
Daniel Dunbar 5b2a411c77 Revert "Use the new TB_NOT_REVERSABLE flag instead of special ", which depends
on r116007, which I am about to revert.

llvm-svn: 116032
2010-10-08 02:07:29 +00:00
Daniel Dunbar efdf08b5b8 Revert "reimplement the second half of the or/add optimization. We should now",
which depends on r116007, which I am about to revert.

llvm-svn: 116031
2010-10-08 02:07:26 +00:00
Chris Lattner 134f415bf8 reimplement the second half of the or/add optimization. We should now
only end up emitting LEA instead of OR.  If we aren't able to promote
something into an LEA, we should never be emitting it as an ADD.

Add some testcases that we emit "or" in cases where we used to produce
an "add".

llvm-svn: 116026
2010-10-08 01:05:10 +00:00
Chris Lattner e2245542ce Use the new TB_NOT_REVERSABLE flag instead of special
casing FsMOVAPDrr/FsMOVAPSrr.

llvm-svn: 116016
2010-10-08 00:03:02 +00:00
Chris Lattner 0921bfdf36 simplify some map operations.
llvm-svn: 116014
2010-10-07 23:57:02 +00:00
Chris Lattner 4fb38d3cd3 Reimplement (part of) the or -> add optimization. Matching 'or' into 'add'
is general goodness because it allows ORs to be converted to LEA to avoid
inserting copies.  However, this is bad because it makes the generated .s
file less obvious and gives valgrind heartburn (tons of false positives in
bitfield code).

While the general fix should be in valgrind, we can at least try to avoid
emitting ADD instructions that *don't* get promoted to LEA.  This is more
work because it requires introducing pseudo instructions to represent
"add that knows the bits are disjoint", but hey, people really love valgrind.

This fixes this testcase:
https://bugs.kde.org/show_bug.cgi?id=242137#c20

the add r/i cases are coming next.

llvm-svn: 116007
2010-10-07 23:36:18 +00:00
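
The arithmetic fact behind commit 4fb38d3cd3 is that an 'or' whose operands share no set bits produces the same value as an 'add', because no bit position can generate a carry. A small sketch of that identity follows; the function names are hypothetical and this is not the instruction selector's code.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit position is set in both operands.
bool bitsAreDisjoint(uint32_t A, uint32_t B) { return (A & B) == 0; }

// Compares the two operations directly. With disjoint bits no carry can
// occur, so this holds; with overlapping bits it does not.
bool orEqualsAdd(uint32_t A, uint32_t B) { return (A | B) == A + B; }

// Typical bitfield pattern: a value shifted left by two has its low two
// bits clear, so OR-ing in a 2-bit tag is the same as adding it. An
// add of this shape is what can later be expressed as an LEA.
uint32_t packWithTag(uint32_t Value, uint32_t Tag2Bits) {
  return (Value << 2) | (Tag2Bits & 0x3u);
}
```

When the bits overlap, for example 3 | 1 versus 3 + 1, the results differ, which is why the pseudo instructions have to know that the bits are disjoint.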
Chris Lattner 1c090c00bc Reduce casting in various tables by defining the table
with the right types.

llvm-svn: 116001
2010-10-07 23:08:41 +00:00
Chris Lattner 70a7b54f97 simplify code: don't build up vector only to assert it is empty.
llvm-svn: 115997
2010-10-07 22:26:19 +00:00
Jakob Stoklund Olesen b19bae4e3e Constrain the offset register to a *_NOSP register class when inserting LEA
instructions.

This unbreaks the machine code verifier and fixes PR8317.

llvm-svn: 115879
2010-10-07 00:07:26 +00:00
Chris Lattner 1a1c600110 Use #NAME# to have the CMOV multiclass define things with the same names as before
(e.g. CMOVBE16rr instead of CMOVBErr16).

llvm-svn: 115705
2010-10-05 23:00:14 +00:00
Chris Lattner 0067ee02f9 switch CMOVBE to the multipattern:
21 insertions(+), 53 deletions(-)

Moar change coming before I switch the rest.

llvm-svn: 115697
2010-10-05 22:23:58 +00:00
Chris Lattner f60062fd55 add basic avx support to the disassembler, also teach it about ssmem/sdmem
operands.

With this done, we can remove the _Int suffixes from the round instructions
without the disassembler blowing up.  This allows the assembler to support
them, implementing rdar://8456376 - llvm-mc rejects 'roundss'

llvm-svn: 115019
2010-09-29 02:57:56 +00:00
Chris Lattner ff3a3930a0 add asmparser support for cvttpd2dq by removing some Int_ prefixes.
Clean up cvttps2dq by removing some redundant implementations of the
same instruction.  rdar://8456382

llvm-svn: 115018
2010-09-29 02:36:32 +00:00
Chris Lattner ef1c2fc305 implement rdar://8456382 - cvtsd2si support, by removing some Int_ prefixes.
llvm-svn: 115017
2010-09-29 02:24:57 +00:00
Chris Lattner 37fc469f88 fix rdar://8456412 - llvm-mc crash in encoder on "mov %rdx, %cr8"
Teaching the code generator about CR8-15, how to rex them up, etc.

llvm-svn: 114533
2010-09-22 05:29:50 +00:00
Dan Gohman 534db8a5c8 Avoid emitting a PIC base register if no PIC addresses are needed.
This fixes rdar://8396318.

llvm-svn: 114201
2010-09-17 20:24:24 +00:00
Anton Korobeynikov c0b36921c2 Properly handle passing of FP stuff to varargs function on Win64:
value should be copied to the corresponding shadow reg as well.
Patch by Cameron Esfahani!

llvm-svn: 112262
2010-08-27 14:43:06 +00:00
Anton Korobeynikov 88c09879c7 Revert part of one of the prev. patches - tailjmp will follow later.
llvm-svn: 111291
2010-08-17 21:08:28 +00:00
Anton Korobeynikov cd78af6e3c Enable more win64 calls folding opportunities.
Patch by Cameron Esfahani!

llvm-svn: 111288
2010-08-17 21:06:01 +00:00
Bruno Cardoso Lopes 7f704b31a9 - Teach SSEDomainFix to switch between different levels of AVX instructions. Here we guess that AVX will have domain issues, so just implement them for consistency and in the future we remove if it's unnecessary.
- Make foldMemoryOperandImpl aware of 256-bit zero vectors folding and support the 128-bit counterparts of AVX too.
- Make sure MOV[AU]PS instructions are only selected when SSE1 is enabled, and duplicate the patterns to match AVX.
- Add a testcase for a simple 128-bit zero vector creation.

llvm-svn: 110946
2010-08-12 20:20:53 +00:00
Bruno Cardoso Lopes 1401e040eb Fix comment order
llvm-svn: 110898
2010-08-12 02:08:52 +00:00
Jakob Stoklund Olesen 9c473e46f3 Fix <rdar://problem/8282498> even if it doesn't reproduce on trunk.
When a register is defined by a partial load:

  %reg1234:sub_32 = MOV32mr <fi#-1>; GR64:%reg1234

That load cannot be folded into an instruction using the full 64-bit register.
It would become a 64-bit load.

This is related to the recent change to have isLoadFromStackSlot return false on
a sub-register load.

llvm-svn: 110874
2010-08-11 23:08:22 +00:00
Owen Anderson a7aed18624 Reapply r110396, with fixes to appease the Linux buildbot gods.
llvm-svn: 110460
2010-08-06 18:33:48 +00:00
Owen Anderson bda59bd247 Revert r110396 to fix buildbots.
llvm-svn: 110410
2010-08-06 00:23:35 +00:00
Owen Anderson 755aceb5d0 Don't use PassInfo* as a type identifier for passes. Instead, use the address of the static
ID member as the sole unique type identifier.  Clean up APIs related to this change.

llvm-svn: 110396
2010-08-05 23:42:04 +00:00
Jakob Stoklund Olesen ba0e124aaf Revert r109652, and remove the offending assert in loadRegFromStackSlot instead.
We do sometimes load from a too small stack slot when dealing with x86 arguments
(varargs and smaller-than-32-bit args). It looks like we know what we are doing
in those cases, so I am going to remove the assert instead of artificially
enlarging stack slot sizes.

The assert in storeRegToStackSlot stays in. We don't want to write beyond the
bounds of a stack slot.

llvm-svn: 109764
2010-07-29 17:42:27 +00:00
Jakob Stoklund Olesen 96a890a7f8 The isLoadFromStackSlot and isStoreToStackSlot have no way of reporting
subregister operands like this:

%reg1040:sub_32bit<def> = MOV32rm <fi#-2>, 1, %reg0, 0, %reg0, %reg1040<imp-def>; mem:LD4[FixedStack-2](align=8)

Make them return false when subreg operands are present. VirtRegRewriter is
making bad assumptions otherwise.

This fixes PR7713.

llvm-svn: 109489
2010-07-27 04:17:01 +00:00
Jakob Stoklund Olesen c3c05ed02e Add assertions that expose the PR7713 miscompilation: Accessing a stack slot
with a too-big register class.

llvm-svn: 109488
2010-07-27 04:16:58 +00:00
Chris Lattner 8f3adc9057 remove the JIT "NeedsExactSize" feature and supporting logic.
llvm-svn: 109167
2010-07-22 21:17:55 +00:00
Chris Lattner 083be4d384 instead of migrating it to the MC instruction encoder, just
rip out the implementation of X86InstrInfo::GetInstSizeInBytes.
The code being ripped out just implemented a copy and hacked up
version of the (old) instruction encoder, and is buggy and 
terrible in other ways.  Since "GetInstSizeInBytes" is really 
only there to support the JIT's "NeedsExactSize" hook (which
no one is using), just rip out the code.  I will rip out the
NeedsExactSize hook next.

This resolves rdar://7617809 - switch X86InstrInfo::GetInstSizeInBytes to use X86MCCodeEmitter

llvm-svn: 109149
2010-07-22 21:05:13 +00:00
Rafael Espindola 350b1a449f Fixes win64. It was broken by a previous patch where I missed the !isWin64
and then forced every register to be a vr128 on win64.

llvm-svn: 109060
2010-07-21 23:19:57 +00:00