Commit Graph

9087 Commits

Author SHA1 Message Date
Craig Topper 0498b88d48 Post process ADC/SBB and use a shorter encoding if they use a sign extended immediate.
llvm-svn: 177243
2013-03-18 03:34:55 +00:00
Craig Topper 7e9a1cb199 Refactor some duplicated code into helper functions.
llvm-svn: 177242
2013-03-18 02:53:34 +00:00
Craig Topper 612f7bfa4d Add X86 code emitter support AVX encoded MRMDestReg instructions.
Previously we weren't skipping the VVVV encoded register. Based on patch by Michael Liao.

llvm-svn: 177221
2013-03-16 03:44:31 +00:00
Jakob Stoklund Olesen 63bff2eb39 Define more SchedWrites for annotating X86 instructions.
Since almost all X86 instructions can fold loads, use a multiclass to
define register/memory pairs of SchedWrites.

An X86FoldableSchedWrite represents the register version of an
instruction. It holds a reference to the SchedWrite to use when the
instruction folds a load.

This will be used inside multiclasses that define rr and rm instruction
versions together.

llvm-svn: 177210
2013-03-16 00:02:17 +00:00
Eric Christopher 8996c5d469 Silence anonymous type in anonymous union warnings.
llvm-svn: 177135
2013-03-15 00:42:55 +00:00
Nadav Rotem adfa5eaf8c Unaligned loads should use the VMOVUPS opcode.
llvm-svn: 177130
2013-03-14 23:49:44 +00:00
Jakob Stoklund Olesen 712366821a Prepare for adding InstrSchedModel annotations to X86 instructions.
The new InstrSchedModel is easier to use than the instruction
itineraries. It will be used to model instruction latency and throughput
in modern Intel microarchitectures like Sandy Bridge.

InstrSchedModel should be able to coexist with instruction itinerary
classes, but for cleanliness we should switch the Atom processor model
to the new InstrSchedModel as well.

llvm-svn: 177122
2013-03-14 22:42:17 +00:00
Chad Rosier 4b54f594b4 [fast-isel] The X86FastISel::FastLowerArguments function doesn't properly handle
the win64 calling convention.
rdar://13423768

llvm-svn: 177113
2013-03-14 21:25:04 +00:00
Craig Topper ba82429826 Fix the name of a variable to match its declaration. Fixes build failure from r177014.
llvm-svn: 177015
2013-03-14 07:47:43 +00:00
Craig Topper 872999737d Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185.
llvm-svn: 177014
2013-03-14 07:40:52 +00:00
Craig Topper a66d81d521 Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior.
llvm-svn: 177011
2013-03-14 07:09:57 +00:00
Michael Liao 20d287044c Fix PR15309
- Fix the typo on type checking

llvm-svn: 177010
2013-03-14 06:57:42 +00:00
Kevin Enderby f15856ebb4 Fixes disassembler crashes on 2013 Haswell RTM instructions.
rdar://13318048

llvm-svn: 176828
2013-03-11 21:17:13 +00:00
Tom Stellard b1588fc057 DAGCombiner: Use correct value type for checking legality of BR_CC v3
LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694
2013-03-08 15:36:57 +00:00
Benjamin Kramer 2c3d0df8ee X86: Fold EXTRACT_SUBVECTORs of a BUILD_VECTOR into a smaller BUILD_VECTOR.
That can usually be lowered efficiently and is common in sandybridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.

Fixes PR15462.

llvm-svn: 176634
2013-03-07 18:48:40 +00:00
Michael Liao d5cac37dc5 Fix two remaining issue after fixing PR15355 when CMOV is not available
- Phi nodes should be replaced/updated after lowering CMOV into branch
  because 'mainMBB' updating operand in Phi node is changed.
- Add EFLAGS in livein before lowering the 2nd CMOV. It's necessary as
  we will reuse the EFLAGS generated before the 1st lowered CMOV, which
  won't clobber EFLAGS. However, we need explicitly specify that.
- '-attr=-cmov' test case are added.

llvm-svn: 176598
2013-03-07 01:01:29 +00:00
Michael Liao da22b30be5 Fix PR15355
- Clear 'mayStore' flag when loading from the atomic variable before the
  spin loop
- Clear kill flag from one use to multiple use in registers forming the
  address to that atomic variable
- don't use a physical register as live-in register in BB (neither entry
  nor landing pad.) by copying it into virtual register

(patch by Cameron Zwarich)

llvm-svn: 176538
2013-03-06 00:17:04 +00:00
David Sehr 4c8979cd4d The current X86 NOP padding uses one long NOP followed by the remainder in
one-byte NOPs.  If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact.  From my
micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15 followed by a 12.  This patch
changes NOP emission to emit as many 15-byte (the maximum) as possible followed
by at most one shorter NOP.

llvm-svn: 176464
2013-03-05 00:02:23 +00:00
Preston Gurd 485296d1e8 Bypass Slow Divides
* Only apply divide bypass optimization when not optimizing for size. 
* Fixed bug caused by constant for 0 value of type Int32,
  used dividend type to generate the constant instead.
* For atom x86-64 apply the divide bypass to use 16-bit divides instead of
  64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442
2013-03-04 18:13:57 +00:00
Arnold Schwaighofer 20ef54f4c1 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403
2013-03-02 04:02:52 +00:00
Michael Liao 6af16fc3b7 Fix PR10475
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands
  but TLI.getShiftAmountTy() so far only return scalar type. As a
  result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return target-specificed scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar
  type.

llvm-svn: 176364
2013-03-01 18:40:30 +00:00
Duncan Sands 2cb41d372c GCC thinks that this variable might be used uninitialized (it isn't).
llvm-svn: 176341
2013-03-01 09:46:03 +00:00
Yiannis Tsiouris d4842e5ee9 Re-format comments (and check commit access)
llvm-svn: 176270
2013-02-28 16:59:10 +00:00
Nadav Rotem 08ab877cc7 Revert r176166 because it broke one of the lit tests.
llvm-svn: 176171
2013-02-27 05:56:20 +00:00
Nadav Rotem 85e1211fbf std::string to StringRef.
llvm-svn: 176166
2013-02-27 05:23:56 +00:00
Chad Rosier 1b33e8d63e [fast-isel] Make sure the FastLowerArguments function checks to make sure the
arguments type is a simple type.
rdar://13290455

llvm-svn: 176066
2013-02-26 01:05:31 +00:00
Michael Liao 609a527286 Refine fix to PR10499, no functionality change
- Put expensive checking after simple one

llvm-svn: 176060
2013-02-25 23:16:36 +00:00
Michael Liao ab97668061 Fix PR10499
- Check whether SSE is available before lowering all 1s vector building with
  PCMPEQD, which is only available from SSE2

llvm-svn: 176058
2013-02-25 23:01:03 +00:00
Chad Rosier a92ef4ba5b [fast-isel] Add X86FastIsel::FastLowerArguments to handle functions with 6 or
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.

Also, add the new llc -fast-isel-abort-args option, which is similar to
-fast-isel-abort option, but for formal argument lowering.

llvm-svn: 176052
2013-02-25 21:59:35 +00:00
Chad Rosier 669bb3ee77 [ms-inline asm] Add support for the pushad/popad mnemonics.
rdar://13254235

llvm-svn: 176036
2013-02-25 19:06:27 +00:00
Nadav Rotem b532fca92c Revert r169638 because it broke Mesa llvmpipe tests.
Fix PR15239.

llvm-svn: 175985
2013-02-24 07:09:35 +00:00
Benjamin Kramer ee23dcb461 X86: Disable cmov-memory patterns on subtargets without cmov.
Fixes PR15115.

llvm-svn: 175962
2013-02-23 10:40:58 +00:00
Peter Collingbourne 7b57621fb3 x86_64: designate most general purpose and SSE registers as callee save under coldcc
llvm-svn: 175911
2013-02-22 19:19:44 +00:00
Eli Bendersky 8da87163ca Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.

There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.

The refactoring was OK'd by Anton Korobeynikov on llvmdev.

Note: this touches the target interfaces, so out-of-tree targets may
be affected.

llvm-svn: 175788
2013-02-21 20:05:00 +00:00
Eli Bendersky e93249befa getX86SubSuperRegister has a special mode with High=true for i64 which
exists solely to enable it to call itself for i8 with some registers.
The proposed patch simplifies the function somewhat to make the High
bit only meaningful for the i8 mode, which makes sense. No functional
difference (getX86SubSuperRegister is not getting called from anywhere
outside with i64 and High=true).

llvm-svn: 175762
2013-02-21 16:40:18 +00:00
Jim Grosbach d2037eb1ee MCParser: Update method names per coding guidelines.
s/AddDirectiveHandler/addDirectiveHandler/
s/ParseMSInlineAsm/parseMSInlineAsm/
s/ParseIdentifier/parseIdentifier/
s/ParseStringToEndOfStatement/parseStringToEndOfStatement/
s/ParseEscapedString/parseEscapedString/
s/EatToEndOfStatement/eatToEndOfStatement/
s/ParseExpression/parseExpression/
s/ParseParenExpression/parseParenExpression/
s/ParseAbsoluteExpression/parseAbsoluteExpression/
s/CheckForValidSection/checkForValidSection/

http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

No functional change intended.

llvm-svn: 175675
2013-02-20 22:21:35 +00:00
Jim Grosbach 341ad3e72a Update TargetLowering ivars for name policy.
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

ivars should be camel-case and start with an upper-case letter. A few in
TargetLowering were starting with a lower-case letter.

No functional change intended.

llvm-svn: 175667
2013-02-20 21:13:59 +00:00
Chad Rosier a018cfd10c [ms-inline asm] Make the comment a bit more verbose.
llvm-svn: 175641
2013-02-20 18:03:44 +00:00
Elena Demikhovsky 0ccdd1315b I optimized the following patterns:
sext <4 x i1> to <4 x i64>
 sext <4 x i8> to <4 x i64>
 sext <4 x i16> to <4 x i64>
 
I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns:
 (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
 
 The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
 The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution.

I also added a cost of this operations to the AVX costs table.

llvm-svn: 175619
2013-02-20 12:42:54 +00:00
Chad Rosier 45a52fa097 [ms-inline asm] Force the use of a base pointer if the MachineFunction includes
MS-style inline assembly.

This is a follow-on to r175334.  Forcing a FP to be emitted doesn't ensure it
will be used.  Therefore, force the base pointer as well.  We now treat MS
inline assembly in the same way we treat functions with dynamic stack
realignment and VLAs.  This guarantees the BP will be used to reference 
parameters and locals.
rdar://13218191

llvm-svn: 175576
2013-02-19 23:50:45 +00:00
Jakub Staszak e167cf5c4d Add obvious constantness.
llvm-svn: 175560
2013-02-19 21:54:59 +00:00
Benjamin Kramer 1cb826b0ad Clean up HiPE prologue emission a bit and avoid signed arithmetic tricks.
No intended functionality change.

llvm-svn: 175536
2013-02-19 17:32:57 +00:00
Rafael Espindola 1c040b5788 Move LLVM_LIBRARY_VISIBILITY for consistency with what was done to
PPCJITInfo.cpp in r175394.

llvm-svn: 175531
2013-02-19 17:14:33 +00:00
Eli Bendersky b0b13b22a3 Make pass name more precise and fix comment.
llvm-svn: 175525
2013-02-19 16:38:32 +00:00
Craig Topper f371e89264 Fix capitalization in comment to match function name.
llvm-svn: 175497
2013-02-19 07:43:59 +00:00
Jakub Staszak 1f199a0ef2 Use array_pod_sort instead of std::sort.
llvm-svn: 175472
2013-02-18 23:18:22 +00:00
NAKAMURA Takumi 3a8002f61d X86FrameLowering.cpp: Fixup. Sorry for the breakage.
llvm-svn: 175467
2013-02-18 23:15:21 +00:00
NAKAMURA Takumi a614ec7e6f X86FrameLowering.cpp: Fix a warning in -Asserts. [-Wunused-variable]
llvm-svn: 175464
2013-02-18 23:08:49 +00:00
Chad Rosier 441e81287f Remove a useless assert.
llvm-svn: 175463
2013-02-18 22:20:16 +00:00
Benjamin Kramer 5c6e653b72 Fix a 32/64 bit incompatibility in the HiPE prologue generation.
llvm-svn: 175458
2013-02-18 21:45:01 +00:00