Commit Graph

54325 Commits

Author SHA1 Message Date
Benjamin Kramer 047d7ca0b1 CodeGenPrepare: Add a transform to turn selects into branches in some cases.
This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:

	ucomisd	(%rdi), %xmm0
	cmovbel	%edx, %esi

cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.

This patch uses a dumb conservative heuristic, it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-Order architectures are
unlikely to benefit as well, those are included in the
"predictableSelectIsExpensive" flag.

It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.


Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.

Thanks to Chandler for the initial test case to him and Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)

llvm-svn: 156234
2012-05-05 12:49:22 +00:00
Benjamin Kramer e31f31e5c0 Add a new target hook "predictableSelectIsExpensive".
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.

Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.

I'm not entirely happy with the name of this flag, suggestions welcome ;)

llvm-svn: 156233
2012-05-05 12:49:14 +00:00
Benjamin Kramer a25a61b9e8 NVPTX: Initialize the UseF32FTZ flag.
llvm-svn: 156232
2012-05-05 11:22:02 +00:00
Stepan Dyatkovskiy cb2a1a34e2 Small fix in InstCombineCasts.cpp. Restored "alloca + bitcast" reducing for case when alloca's size is calculated within the "add/sub/... nsw".
Also added fix to 2011-06-13-nsw-alloca.ll test.

llvm-svn: 156231
2012-05-05 07:09:40 +00:00
Eric Christopher de9e92ed9b Typo.
llvm-svn: 156226
2012-05-05 01:16:06 +00:00
Jakob Stoklund Olesen e326ed33a8 Make sure findRepresentativeClass picks the widest super-register.
We want the representative register class to contain the largest
super-registers available. This makes the function less sensitive to the
register class numbering.

llvm-svn: 156220
2012-05-04 22:53:28 +00:00
Jakob Stoklund Olesen e89496fe63 Remove extra comma in debug output.
llvm-svn: 156219
2012-05-04 22:53:26 +00:00
David Blaikie 891d0a3d20 Fix warnings in release build.
This fixes a couple of Clang warnings in release builds of LLVM:

* Missing return in ISelLowering
* Unused variable in NVPTXutil.cpp

llvm-svn: 156216
2012-05-04 22:34:16 +00:00
Kevin Enderby cabbae653e Tweak to the fix in r156212, as with the change in removing the shift the
SignExtend32<22>(Val<<1) also needs to change to SignExtend32<21>(Val) .

llvm-svn: 156213
2012-05-04 22:09:52 +00:00
Kevin Enderby 8ce1ada1be Fix a bug in the ARM disassembler for wide branch conditional instructions
where the symbolic operand's displacement was incorrectly shifted left by 1.
rdar://11387046

llvm-svn: 156212
2012-05-04 22:02:27 +00:00
Chandler Carruth cd3464ee22 Fix a Clang warning in the new NVPTX backend:
In file included from ../lib/Target/NVPTX/VectorElementize.cpp:53:
../lib/Target/NVPTX/NVPTX.h:44:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
  default: assert(0 && "Unknown condition code");
  ^
1 warning generated.

The prevailing pattern in LLVM is to not use a default label, and instead to
use llvm_unreachable to denote that the switch in fact covers all return paths
from the function.

llvm-svn: 156209
2012-05-04 21:35:49 +00:00
Chandler Carruth 6781821c01 Teach the code extractor how to extract a sequence of blocks from
RegionInfo's RegionNode. This mirrors the logic for automating the
extraction from a Loop.

llvm-svn: 156208
2012-05-04 21:33:30 +00:00
Chandler Carruth 8880325a92 Rename the Region::block_iterator to Region::block_node_iterator, and
add a new Region::block_iterator which actually iterates over the basic
blocks of the region.

The old iterator, now call 'block_node_iterator' iterates over
RegionNodes which contain a single basic block. This works well with the
GraphTraits-based iterator design, however most users actually want an
iterator over the BasicBlocks inside these RegionNodes. Now the
'block_iterator' is a wrapper which exposes exactly this interface.
Internally it uses the block_node_iterator to walk all nodes which are
single basic blocks, but transparently unwraps the basic block to make
user code simpler.

While this patch is a bit of a wash, most of the updates are to internal
users, not external users of the RegionInfo. I have an accompanying
patch to Polly that is a strict simplification of every user of this
interface, and I'm working on a pass that also wants the same simplified
interface.

This patch alone should have no functional impact.

llvm-svn: 156202
2012-05-04 20:55:23 +00:00
Justin Holewinski ae556d3ef7 This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:

nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX

The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.

NV_CONTRIB

llvm-svn: 156196
2012-05-04 20:18:50 +00:00
Sebastian Pop 2420e8b7d5 Added missing CMN case in Thumb2SizeReduction pass so that LLVM emits 16-bits encoding of CMN instructions.
llvm-svn: 156195
2012-05-04 19:53:56 +00:00
Preston Gurd d6c440cd4c Adds Intel Atom scheduling latencies to X86InstrSystem.td.
llvm-svn: 156194
2012-05-04 19:26:37 +00:00
Matt Beaumont-Gay e82ab6baa7 Pacify GCC's -Wreturn-type
llvm-svn: 156189
2012-05-04 18:34:27 +00:00
Chandler Carruth 14316fcf7d Factor the computation of input and output sets into a public interface
of the CodeExtractor utility. This allows speculatively computing input
and output sets to measure the likely size impact of the code
extraction.

These sets cannot be reused sadly -- we mutate the function prior to
forming the final sets used by the actual extraction.

The interface has been revamped slightly to make it easier to use
correctly by making the interface const and sinking the computation of
the number of exit blocks into the full extraction function and away
from the rest of this logic which just computed two output parameters.

llvm-svn: 156168
2012-05-04 11:20:27 +00:00
Chandler Carruth 44e13911bc Rather than trying to gracefully handle input sequences with repeated
blocks, assert that this doesn't happen. We don't want to bother trying
to support this call pattern as it isn't necessary.

llvm-svn: 156167
2012-05-04 11:17:06 +00:00
Chandler Carruth 0a570552d1 Fix a goof with my previous commit by completely returning when we
detect an in-eligible block rather than just breaking out of the loop.

llvm-svn: 156166
2012-05-04 11:14:19 +00:00
Chandler Carruth 2f5d0191f7 Hoist a safety assert from the extraction method into the construction
of the extractor itself.

llvm-svn: 156164
2012-05-04 10:26:45 +00:00
Chandler Carruth 0fde00150d Move the CodeExtractor utility to a dedicated header file / source file,
and expose it as a utility class rather than as free function wrappers.

The simple free-function interface works well for the bugpoint-specific
pass's uses of code extraction, but in an upcoming patch for more
advanced code extraction, they simply don't expose a rich enough
interface. I need to expose various stages of the process of doing the
code extraction and query information to decide whether or not to
actually complete the extraction or give up.

Rather than build up a new predicate model and pass that into these
functions, just take the class that was actually implementing the
functions and lift it up into a proper interface that can be used to
perform code extraction. The interface is cleaned up and re-documented
to work better in a header. It also is now setup to accept the blocks to
be extracted in the constructor rather than in a method.

In passing this essentially reverts my previous commit here exposing
a block-level query for eligibility of extraction. That is no longer
necessary with the more rich interface as clients can query the
extraction object for eligibility directly. This will reduce the number
of walks of the input basic block sequence by quite a bit which is
useful if this enters the normal optimization pipeline.

llvm-svn: 156163
2012-05-04 10:18:49 +00:00
Hans Wennborg aea412008e Make ARM and Mips use TargetMachine::getTLSModel()
This moves the logic for selecting a TLS model to a single place,
instead of the previous three (ARM, Mips, and X86 which already
uses this function).

llvm-svn: 156162
2012-05-04 09:40:39 +00:00
Craig Topper bdd2e34b1f Fix some loops to match coding standards. No functional change intended.
llvm-svn: 156159
2012-05-04 06:39:13 +00:00
Craig Topper d4d3237bb8 Fix up some spacing. No functional change.
llvm-svn: 156158
2012-05-04 06:18:33 +00:00
Craig Topper e2ae413746 Simplify broadcast lowering code. No functional change intended.
llvm-svn: 156157
2012-05-04 05:49:51 +00:00
Craig Topper 42f2182366 Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles.
llvm-svn: 156156
2012-05-04 04:44:49 +00:00
Bill Wendling fa0ebcd1b0 Add 'landingpad' instructions to the list of instructions to ignore.
Also combine the code in the 'assert' statement.

llvm-svn: 156155
2012-05-04 04:22:32 +00:00
Craig Topper 59063c0a3d Simplify shuffle narrowing code a bit. No functional change intended.
llvm-svn: 156154
2012-05-04 04:08:44 +00:00
Jakob Stoklund Olesen 796e5272ab Remove the SubRegClasses field from RegisterClass descriptions.
This information in now computed by TableGen.

llvm-svn: 156152
2012-05-04 03:30:34 +00:00
Jakob Stoklund Olesen 75fbe90839 Use SuperRegClassIterator for findRepresentativeClass().
The masks returned by SuperRegClassIterator are computed automatically
by TableGen. This is better than depending on the manually specified
SuperRegClasses.

llvm-svn: 156147
2012-05-04 02:19:22 +00:00
Jakob Stoklund Olesen 34a8f13e5f Initialize SparcInstrInfo before SparcTargetLowering.
The TargetLowering construction needs to use a valid TargetRegisterInfo
instance.

llvm-svn: 156146
2012-05-04 02:16:39 +00:00
Jakob Stoklund Olesen 57c7050675 Add a SuperRegClassIterator class.
This iterator class provides a more abstract interface to the (Idx,
Mask) lists of super-registers for a register class. The layout of the
tables shouldn't be exposed to clients.

llvm-svn: 156144
2012-05-04 01:48:29 +00:00
Chandler Carruth da7513a834 A pile of long over-due refactorings here. There are some very, *very*
minor behavior changes with this, but nothing I have seen evidence of in
the wild or expect to be meaningful. The real goal is unifying our logic
and simplifying the interfaces. A summary of the changes follows:

- Make 'callIsSmall' actually accept a callsite so it can handle
  intrinsics, and simplify callers appropriately.
- Nuke a completely bogus declaration of 'callIsSmall' that was still
  lurking in InlineCost.h... No idea how this got missed.
- Teach the 'isInstructionFree' about the various more intelligent
  'free' heuristics that got added to the inline cost analysis during
  review and testing. This mostly surrounds int->ptr and ptr->int casts.
- Switch most of the interesting parts of the inline cost analysis that
  were essentially computing 'is this instruction free?' to use the code
  metrics routine instead. This way we won't keep duplicating logic.

All of this is motivated by the desire to allow other passes to compute
a roughly equivalent 'cost' metric for a particular basic block as the
inline cost analysis. Sadly, re-using the same analysis for both is
really messy because only the actual inline cost analysis is ever going
to go to the contortions required for simplification, SROA analysis,
etc.

llvm-svn: 156140
2012-05-04 00:58:03 +00:00
Jakob Stoklund Olesen 2f460ae3b4 Use a shared implementation of getMatchingSuperRegClass().
TargetRegisterClass now gives access to the necessary tables.

llvm-svn: 156122
2012-05-03 22:49:04 +00:00
Kevin Enderby 914223010c Fix issues with the ARM bl and blx thumb instructions and the J1 and J2 bits
for the assembler and disassembler.  Which were not being set/read correctly
for offsets greater than 22 bits in some cases.

Changes to lib/Target/ARM/ARMAsmBackend.cpp from Gideon Myles!

llvm-svn: 156118
2012-05-03 22:41:56 +00:00
Chandler Carruth a46e62424b Factor the logic for testing whether a basic block is viable for code
extraction into a public interface. Also clean it up and apply it more
consistently such that we check for landing pads *anywhere* in the
extracted code, not just in single-block extraction.

This will be used to guide decisions in passes that are planning to
eventually perform a round of code extraction.

llvm-svn: 156114
2012-05-03 22:26:53 +00:00
Nuno Lopes d4cf35d775 remove calls to calloc if the allocated memory is not used (it was already being done for malloc)
fix a few typos found by Chad in my previous commit

llvm-svn: 156110
2012-05-03 22:08:19 +00:00
Sirish Pande f8e5e3c072 Support for target dependent Hexagon VLIW packetizer.
This patch creates and optimizes packets as per Hexagon ISA rules.

llvm-svn: 156109
2012-05-03 21:52:53 +00:00
Nuno Lopes d2b71e7fa9 add support for calloc to objectsize lowering
llvm-svn: 156102
2012-05-03 21:19:58 +00:00
Silviu Baranga 9560af848c Fixed disassembler for vstm/vldm ARM VFP instructions.
llvm-svn: 156077
2012-05-03 16:38:40 +00:00
Sirish Pande c92c31674e Extensions of Hexagon V4 instructions.
This adds new instructions for Hexagon V4 architecture.

llvm-svn: 156071
2012-05-03 16:18:50 +00:00
Nuno Lopes 22f6f3b055 replace 'break's with 'return 0' in visitCallInst code for objectsize, since there is no need to fallback to visitCallSite.
This gives a 0.9% in a test case

llvm-svn: 156069
2012-05-03 16:06:07 +00:00
Craig Topper 242183834a Use 'unsigned' instead of 'int' in a few places dealing with counts of vector elements.
llvm-svn: 156060
2012-05-03 07:26:59 +00:00
Craig Topper 315a5cc789 Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982.
llvm-svn: 156059
2012-05-03 07:12:59 +00:00
Evan Cheng b64e7b778b Fix two-address pass's aggressive instruction commuting heuristics. It's meant
to catch cases like:
 %reg1024<def> = MOV r1
 %reg1025<def> = MOV r0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

By commuting ADD, it let coalescer eliminate all of the copies. However, there
was a bug in the heuristics where it ended up commuting the ADD in:

 %reg1024<def> = MOV r0
 %reg1025<def> = MOV 0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

That did no benefit but rather ensure the last MOV would not be coalesced.

rdar://11355268

llvm-svn: 156048
2012-05-03 01:45:13 +00:00
Andrew Trick 32aea358e1 Added TargetRegisterInfo::getAllocatableClass.
The ensures that virtual registers always belong to an allocatable class.
If your target attempts to create a vreg for an operand that has no
allocatable register subclass, you will crash quickly.

This ensures that targets define register classes as intended.

llvm-svn: 156046
2012-05-03 01:14:37 +00:00
Bill Wendling c94d86c4ad Whitespace cleanup.
llvm-svn: 156034
2012-05-02 23:43:23 +00:00
Owen Anderson 41b0665b5b Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs.
llvm-svn: 156029
2012-05-02 22:17:40 +00:00
Preston Gurd 926afd7401 For Intel Atom, use ILP scheduling always, instead of ILP for 64 bit
and Hybrid for 32 bit, since benchmarks show ILP scheduling is better
most of the time.

llvm-svn: 156028
2012-05-02 22:02:02 +00:00
Preston Gurd c0b976c42a Change the Intel Atom detection code to recognize
Lincroft and Medfield.

llvm-svn: 156025
2012-05-02 21:38:46 +00:00
Owen Anderson b5f167c660 Teach DAG combine that multiplication by 1.0 can always be constant folded.
llvm-svn: 156023
2012-05-02 21:32:35 +00:00
Jim Grosbach 28b0b7279e ARM: Add missing two-operand VBIC aliases.
llvm-svn: 156019
2012-05-02 21:11:56 +00:00
Douglas Gregor 12c1cd33f4 Move llvm-tblgen's StringMatcher into the TableGen library so it can
be used by clang-tblgen.

llvm-svn: 156000
2012-05-02 17:32:48 +00:00
Preston Gurd fa3f6cb830 This patch continues the work of adding instruction latencies for X86 Atom,
by providing the latencies for the instructions in X86InstrFPStack.td.

llvm-svn: 155996
2012-05-02 16:03:35 +00:00
Manman Ren f02efc8731 Revert r155853
The commit is intended to fix rdar://10961709.
But it is the root cause of PR12720.
Revert it for now.

llvm-svn: 155992
2012-05-02 15:24:32 +00:00
Kostya Serebryany ae7188d9b9 [tsan] typo and style (thanks to Nick Lewycky)
llvm-svn: 155986
2012-05-02 13:12:19 +00:00
Bill Wendling 274ba89d77 The value held in the vector may be RAUW'ed by some of the canonicalization
methods. Use a weak value handle to keep up with this.
PR12245

llvm-svn: 155984
2012-05-02 09:59:45 +00:00
Richard Barton 0fc56890ba Disallow YIELD and other allocated nop hints in pre-ARMv6 architectures.
llvm-svn: 155983
2012-05-02 09:43:18 +00:00
Craig Topper c73bc39c22 Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter.
llvm-svn: 155982
2012-05-02 08:03:44 +00:00
Eli Friedman 4a80e94b86 Fix the implementation of MachOObjectFile::isSectionZeroInit so it follows the MachO spec.
llvm-svn: 155976
2012-05-02 02:31:28 +00:00
Jim Grosbach edcb868fe3 Tidy up. Naming conventions.
llvm-svn: 155960
2012-05-01 23:21:41 +00:00
Jakub Staszak 6126401c83 Remove unneeded break.
llvm-svn: 155959
2012-05-01 23:08:16 +00:00
Jakub Staszak cd2353402d Use dyn_cast instead of checking opcode and cast.
llvm-svn: 155957
2012-05-01 23:06:00 +00:00
Jakub Staszak 339380286b Remove trailing spaces.
llvm-svn: 155956
2012-05-01 23:04:38 +00:00
Bill Wendling b6b50c6638 Strip the pointer casts off of allocas so that the selection DAG can find them.
PR10799

llvm-svn: 155954
2012-05-01 22:50:45 +00:00
Sirish Pande 94212168fc Target independent Hexagon Packetizer fix.
llvm-svn: 155947
2012-05-01 21:28:30 +00:00
Jim Grosbach 1d20efb837 ARM: Add a few missing add->sub aliases w/ 'w' suffix.
Aliases for adding a negative immediate when using an explicit 'w'
suffix. E.g.,
        adds.w r2, #-16
        adds.w r2, r2, #-16
        addw r2, #-16
        addw r2, #-16
        addw r2, r2, #-16

rdar://11330769

llvm-svn: 155946
2012-05-01 21:17:34 +00:00
Jim Grosbach 70bed4faaf ARM: allow vanilla expressions for movw/movt.
Expressions for movw/movt don't always have an :upper16: or :lower16:
on them and that's ok. When they don't, it's just a plain [0-65536]
immediate result, effectively the same as a :lower16: variant kind.

rdar://10550147

llvm-svn: 155941
2012-05-01 20:43:21 +00:00
Preston Gurd 5ae5278ca1 This patch marks the X86 floating point stack registers ST0-ST7 as reserved
in order to avoid assertion failures in the register scavenger. The assertion
failures were “Bad machine code: Using an undefined physical register” and
“Bad machine code: MBB exits via unconditional fall-through but its successor
differs from its CFG successor!”.

llvm-svn: 155930
2012-05-01 19:50:22 +00:00
Jim Grosbach 758e0cc94a MC: Unknown assembler directives are now hard errors.
Previously, an unsupported/unknown assembler directive issued a warning.
That's generally unsafe, and inconsistent with the behaviour of pretty
much every system assembler. Now that the MC assemblers are mature
enough to be the default on multiple targets, it's reasonable to
issue errors for these.

For target or platform directives that need to stay warnings, we
should add explicit handlers for them in, e.g., ELFAsmParser.cpp,
DarwinAsmParser.cpp, et. al., and issue the warning there.

rdar://9246275

llvm-svn: 155926
2012-05-01 18:38:27 +00:00
Jim Grosbach a0c53f147a MC: Remove errant EatToEndOfStatement() in asm parser.
The caller is already responsible for eating any additional input on the
line. Putting an additional EatToEndOfStatement() in ParseStatement()
causes an entire extra statement to be consumed when treating warnings
as errors. For example, test/MC/macros.s will assert() because the
.endmacro directive is missed as a result.

rdar://11355843

llvm-svn: 155925
2012-05-01 18:38:24 +00:00
Manman Ren 425a55c1ce X86: optimization for max-like struct
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0

FROM
movl    %edi, %ecx
subl    %esi, %ecx
cmpl    %edi, %esi
movl    $0, %eax
cmovll  %ecx, %eax
TO
xorl    %eax, %eax
subl    %esi, %edi
cmovll  %eax, %edi
movl    %edi, %eax

rdar: 10734411
llvm-svn: 155919
2012-05-01 17:16:15 +00:00
Alexey Samsonov c4b3ad8195 X86: Use StackRegister instead of FrameRegister in getFrameIndexReference (to generate debug info for local variables) if stack needs realignment
llvm-svn: 155917
2012-05-01 15:16:06 +00:00
Benjamin Kramer cb3e98cf44 Move MipsDisassembler classes into an anonymous namespace.
llvm-svn: 155915
2012-05-01 14:34:24 +00:00
Benjamin Kramer 512c1dce8f Value-initialize global to avoid global construction.
llvm-svn: 155909
2012-05-01 10:48:02 +00:00
Eli Bendersky 667b879e73 RuntimeDyld cleanup:
- Improved parameter names for clarity
- Added comments
- emitCommonSymbols should return void because its return value is not being
  used anywhere
- Attempt to reduce the usage of the RelocationValueRef type. Restricts it 
  for a single goal and may serve as a step for eventual removal.

llvm-svn: 155908
2012-05-01 10:41:12 +00:00
Benjamin Kramer 84b857e4e6 YAMLParser: get rid of global ctors & dtors.
llvm-svn: 155907
2012-05-01 10:19:59 +00:00
Bill Wendling b12f16e75f Change the PassManager from a reference to a pointer.
The TargetPassManager's default constructor wants to initialize the PassManager
to 'null'. But it's illegal to bind a null reference to a null l-value. Make the
ivar a pointer instead.
PR12468

llvm-svn: 155902
2012-05-01 08:27:43 +00:00
Craig Topper 05eb6e096a Allow BMI, AES, F16C, POPCNT, FMA3, and CLMUL to be detected on AMD processors.
llvm-svn: 155899
2012-05-01 07:10:32 +00:00
Eli Bendersky fc079081b7 RuntimeDyld code cleanup:
- There's no point having a different type for the local and global symbol
  tables.
- Renamed SymbolTable to GlobalSymbolTable to clarify the intention
- Improved const correctness where relevant

llvm-svn: 155898
2012-05-01 06:58:59 +00:00
Craig Topper bae0e9ea1d Make XOP and FMA4 require SSE4A to match GCC behavior. Use this to simplify Bulldozer feature list.
llvm-svn: 155897
2012-05-01 06:54:48 +00:00
Craig Topper d32ebcc36b Attempt to handle MRMInitReg in emitVEXOpcodePrefix. Hopefully fixes PR12711.
llvm-svn: 155896
2012-05-01 06:34:01 +00:00
Craig Topper 43518cc55f Make XOP imply AVX as its needed to legalize the registers types.
llvm-svn: 155891
2012-05-01 05:41:41 +00:00
Craig Topper c0cef32b83 Remove HasSSE2 from AES and CLMUL predicates. It's now implied by the HasAES and HasCLMUL predicates.
llvm-svn: 155890
2012-05-01 05:35:02 +00:00
Craig Topper 29dd148a71 Make CLMUL and AES imply SSE2 since its needed to legalize the type.
llvm-svn: 155888
2012-05-01 05:28:32 +00:00
Craig Topper 0eacda5f69 Enable AVX and FMA4 for AMD Bulldozer processors.
llvm-svn: 155885
2012-05-01 05:18:13 +00:00
Nick Lewycky 78ee67e814 An instruction in a loop is not guaranteed to be executed just because the loop
has no exit blocks. Fixes PR12706!

llvm-svn: 155884
2012-05-01 04:03:01 +00:00
Lang Hames 3a90fabd85 Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. Fixes
<rdar://problem/11291436>.

This is a second attempt at a fix for this, the first was r155468. Thanks
to Chandler, Bob and others for the feedback that helped me improve this.

llvm-svn: 155866
2012-05-01 00:20:38 +00:00
Jakub Staszak cec09b2594 Add some constantness. No functionality change.
llvm-svn: 155859
2012-04-30 23:41:30 +00:00
Manman Ren 4f4d5c8fc8 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

llvm-svn: 155853
2012-04-30 22:51:25 +00:00
Jim Grosbach e78031a9f3 ARM: Diagnostics for out of range fixups.
Replace some assert() calls w/ actual diagnostics. In a perfect world,
there'd be range checks on these values long before things ever reached
this code. For now, though, issuing a better-late-than-never diagnostic
is still a big improvement over assert().

rdar://11347287

llvm-svn: 155851
2012-04-30 22:30:43 +00:00
Jakob Stoklund Olesen 8503ba984f Fix address calculation error from r155744.
This was exposed by SingleSource/UnitTests/Vector/constpool.c.

The computed size of a basic block isn't always a multiple of its known
alignment, and that can introduce extra alignment padding after the
block.

<rdar://problem/11347135>

llvm-svn: 155845
2012-04-30 20:19:00 +00:00
Chad Rosier d427d51c2b Tidy up. No functional change intended.
llvm-svn: 155832
2012-04-30 17:47:15 +00:00
Derek Schuff b051adf263 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

(this time, actually commit what was reviewed!)

llvm-svn: 155825
2012-04-30 16:57:15 +00:00
Bob Wilson 9245c93656 Don't introduce illegal types when creating vmull operations. <rdar://11324364>
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal.  Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.

llvm-svn: 155824
2012-04-30 16:53:34 +00:00
Eli Bendersky b92e1cf636 It doesn't make sense to move symbol relocations to section relocations when
relocations are resolved.  It's much more reasonable to do this decision when
relocations are just being added - we have all the information at that point.

Also a bit of renaming and extra comments to clarify extensions.

llvm-svn: 155819
2012-04-30 12:15:58 +00:00
Duncan Sands 34c4869cf6 Just mark the sign bit as known zero, rather than any other irrelevant bits
known zero in the LHS.  Fixes PR12541.

llvm-svn: 155818
2012-04-30 11:56:58 +00:00
Bill Wendling bf4b9afbeb Second attempt at PR12573:
Allow the "SplitCriticalEdge" function to split the edge to a landing pad. If
the pass is *sure* that it thinks it knows what it's doing, then it may go ahead
and specify that the landing pad can have its critical edge split. The loop
unswitch pass is one of these passes. It will split the critical edges of all
edges coming from a loop to a landing pad not within the loop. Doing so will
retain important loop analysis information, such as loop simplify.

llvm-svn: 155817
2012-04-30 10:44:54 +00:00
Bill Wendling 325e6cd9cb Use an ArrayRef instead of explicit vector type.
llvm-svn: 155816
2012-04-30 10:25:51 +00:00
Eli Bendersky 32d5488f64 Code cleanup in RuntimeDyld:
- Add comments
- Change field names to be more reasonable
- Fix indentation and naming to conform to coding conventions
- Remove unnecessary includes / replace them by forward declatations

llvm-svn: 155815
2012-04-30 10:06:27 +00:00
Bill Wendling 712d85a8c0 Remove hack from r154987. The problem persists even with it, so it's not even a good hack.
llvm-svn: 155813
2012-04-30 09:23:48 +00:00
Craig Topper 55b3990837 No need to normalize index before calling Extract128BitVector
llvm-svn: 155811
2012-04-30 05:17:10 +00:00
Pete Cooper f76b5fe5ab Copied all the VEX prefix encoding code from X86MCCodeEmitter to the x86 JIT emitter. Needs some major refactoring as these two code emitters are almost identical
llvm-svn: 155810
2012-04-30 03:56:44 +00:00
Rafael Espindola dd48931461 Make sure HoistInsertPosition finds a position that is dominated by all
inputs.

llvm-svn: 155809
2012-04-30 03:53:06 +00:00
Jakub Staszak da03f3ba64 Remove unneeded casts. No functionality change.
llvm-svn: 155800
2012-04-29 20:52:53 +00:00
Craig Topper 3b94fa63d6 Simplify code a bit. No functional change intended.
llvm-svn: 155798
2012-04-29 20:22:05 +00:00
Kalle Raiskila 4c5f83ea19 Update the documentation of CellSPU, in case it gets removed in 3.1.
llvm-svn: 155797
2012-04-29 20:00:55 +00:00
Benjamin Kramer db25381a54 RegisterPressure: ArrayRefize some functions for better readability. No functionality change.
llvm-svn: 155795
2012-04-29 18:52:56 +00:00
Eli Bendersky 0e2ac5bcdb Fix some formatting, grammar and style issues and add a couple of missing comments.
llvm-svn: 155793
2012-04-29 12:40:47 +00:00
Jakob Stoklund Olesen 6053899aa0 Don't update spill weights when joining intervals.
We don't compute spill weights until after coalescing anyway.

llvm-svn: 155766
2012-04-28 19:19:11 +00:00
Jakob Stoklund Olesen 4fe0e1908e Spring cleaning - Delete dead code.
llvm-svn: 155765
2012-04-28 19:19:07 +00:00
Jakob Stoklund Olesen ae7521d1e4 Fix a problem with blocks that need to be split twice.
The code could search past the end of the basic block when there was
already a constant pool entry after the block.

Test case with giant basic block in SingleSource/UnitTests/Vector/constpool.c

llvm-svn: 155753
2012-04-28 06:21:38 +00:00
Andrew Trick 833f04962a Reapply 155668: Fix the SD scheduler to avoid gluing the same node twice.
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. It's luck ran out.

Fixes rdar://11314175: BuildSchedUnits assert.

llvm-svn: 155749
2012-04-28 01:03:23 +00:00
Jim Grosbach c6f32b3295 ARM: Thumb add(sp plus register) asm constraints.
Make sure when parsing the Thumb1 sp+register ADD instruction that
the source and destination operands match. In thumb2, just use the
wide encoding if they don't. In Thumb1, issue a diagnostic.

rdar://11219154

llvm-svn: 155748
2012-04-27 23:51:36 +00:00
Jim Grosbach 9d8f6f3d9d ARM: Tweak tADDrSP definition for consistent operand order.
Make the operand order of the instruction match that of the asm syntax.

llvm-svn: 155747
2012-04-27 23:51:33 +00:00
Derek Schuff a99b168145 Revert r155745
llvm-svn: 155746
2012-04-27 23:37:41 +00:00
Derek Schuff bbf8b83e90 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

llvm-svn: 155745
2012-04-27 23:27:17 +00:00
Jakob Stoklund Olesen 5f0d1b462c Track worst case alignment padding more accurately.
Previously, ARMConstantIslandPass would conservatively compute the
address of an aligned basic block as:

  RoundUpToAlignment(Offset + UnknownPadding)

This worked fine for the layout algorithm itself, but it could fool the
verify() function because it accounts for alignment padding twice: Once
when adding the worst case UnknownPadding, and again by rounding up the
fictional block offset. This meant that when optimizeThumb2Instructions
would shrink an instruction, the conservative distance estimate could
grow. That shouldn't be possible since the woorst case alignment padding
wss already included.

This patch drops the use of RoundUpToAlignment, and depends only on
worst case padding to compute conservative block offsets. This has the
weird effect that the computed offset for an aligned block may not be
aligned.

The important difference is that shrinking an instruction can never
cause the estimated distance between two instructions to grow. The
estimated distance is always larger than the real distance that only the
assembler knows.

<rdar://problem/11339352>

llvm-svn: 155744
2012-04-27 22:58:38 +00:00
Andrew Trick 7a773ec053 Temporarily revert r155668: Fix the SD scheduler to avoid gluing.
This definitely caused regression with ARM -mno-thumb.

llvm-svn: 155743
2012-04-27 22:55:59 +00:00
Craig Topper 0fa6c7e593 Use 'unsigned' instead of 'int' in several places when retrieving number of vector elements.
llvm-svn: 155742
2012-04-27 22:54:43 +00:00
Chad Rosier 32c2178ef3 Add x86-specific DAG combine to simplify:
x == -y --> x+y == 0
 x != -y --> x+y != 0

On x86, the generated code goes from
   negl    %esi
   cmpl    %esi, %edi
   je    .LBB0_2
to
   addl    %esi, %edi
   je    .L4

This case is correctly handled for ARM with "cmn".

Patch by Manman Ren.
rdar://11245199
PR12545

llvm-svn: 155739
2012-04-27 22:33:25 +00:00
Michael J. Spencer 6033113e35 [Support/YAMLParser] Fix ASan found bugs.
llvm-svn: 155735
2012-04-27 21:12:20 +00:00
Craig Topper 42cd8d2c00 Tidy up spacing.
llvm-svn: 155733
2012-04-27 21:05:09 +00:00
Hal Finkel 27c3246169 Don't vectorize target-specific types (ppc_fp128, x86_fp80, etc.).
Target specific types should not be vectorized. As a practical matter,
these types are already register matched (at least in the x86 case),
and codegen does not always work correctly (at least in the ppc case,
and this is not worth fixing because ppc_fp128 is currently broken and
will probably go away soon).

llvm-svn: 155729
2012-04-27 19:34:00 +00:00
David Blaikie 84e4b39995 Change recurse depth limit to uint32 to fix warning.
llvm-svn: 155727
2012-04-27 19:30:32 +00:00
Dan Gohman dae3349ac2 Miscellaneous accumulated cleanups.
llvm-svn: 155725
2012-04-27 18:56:31 +00:00
Lang Hames ea001225c1 Fix the order of the operands in the llvm.fma intrinsic patterns for ARM,
<rdar://problem/11325085>.

llvm-svn: 155724
2012-04-27 18:51:24 +00:00
Mon P Wang 6120cfb8cd Add an early bailout to IsValueFullyAvailableInBlock from deeply nested blocks.
The limit is set to an arbitrary 1000 recursion depth to avoid stack overflow
issues. <rdar://problem/11286839>.

llvm-svn: 155722
2012-04-27 18:09:28 +00:00
Dan Gohman 1ccecdb2fd Reapply r155682, making constant folding more consistent, with a fix to work
properly with how the code handles all-undef PHI nodes.

llvm-svn: 155721
2012-04-27 17:50:22 +00:00
Richard Barton 82f95ea2ad Fix ARM assembly parsing for upper case condition codes on IT instructions.
llvm-svn: 155720
2012-04-27 17:34:01 +00:00
Benjamin Kramer 913da4b261 X86: Don't emit conditional floating point moves on when targeting pre-pentiumpro architectures.
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.

Fixes PR6679. Patch by Christoph Erhardt!

llvm-svn: 155704
2012-04-27 12:07:43 +00:00
Kostya Serebryany 5a464f03d3 [asan] small optimization: do not emit "x+0" instructions
llvm-svn: 155701
2012-04-27 10:04:53 +00:00
Richard Barton f435b09eaf Refactor IT handling not to store the bottom bit of the condition code in the mask operand in the MCInst.
llvm-svn: 155700
2012-04-27 08:42:59 +00:00
NAKAMURA Takumi 6008dfdb70 Revert r155682, "Use ConstantExpr::getExtractElement when constant-folding vectors"
It broke stage2 build. stage1/clang sometimes crashed.

llvm-svn: 155699
2012-04-27 07:59:20 +00:00
Kostya Serebryany a1259778b4 [tsan] Atomic support for ThreadSanitizer, patch by Dmitry Vyukov
llvm-svn: 155698
2012-04-27 07:31:53 +00:00
Evan Cheng 1ec87ee096 Implement a bastardized ABI.
llvm-svn: 155686
2012-04-27 02:11:10 +00:00
Evan Cheng f52003de56 - thumbv6 shouldn't imply +thumb2. Cortex-M0 doesn't suppport 32-bit Thumb2
instructions.
- However, it does support dmb, dsb, isb, mrs, and msr.
rdar://11331541

llvm-svn: 155685
2012-04-27 01:27:19 +00:00
Dan Gohman 90f3798f26 Use ConstantExpr::getExtractElement when constant-folding vectors
instead of getAggregateElement. This has the advantage of being
more consistent and allowing higher-level constant folding to
procede even if an inner extract element cannot be folded.

Make ConstantFoldInstruction call ConstantFoldConstantExpression
on the instruction's operands, making it more consistent with 
ConstantFoldConstantExpression itself. This makes sure that
ConstantExprs get TargetData-aware folding before being handed
off as operands for further folding.

This causes more expressions to be folded, but due to a known
shortcoming in constant folding, this currently has the side effect
of stripping a few more nuw and inbounds flags in the non-targetdata
side of constant-fold-gep.ll. This is mostly harmless.

This fixes rdar://11324230.

llvm-svn: 155682
2012-04-27 00:54:36 +00:00
Jakob Stoklund Olesen c90abc8956 Break up getProfitableChainIncrement().
The required checks are moved to ChainInstruction() itself and the
policy decisions are moved to IVChain::isProfitableInc().

Also cache the ExprBase in IVChain to avoid frequent recomputations.

No functional change intended.

llvm-svn: 155676
2012-04-26 23:33:11 +00:00
Jakob Stoklund Olesen a0337d7bd9 Turn IVChain into a struct.
No functional change intended.

llvm-svn: 155675
2012-04-26 23:33:09 +00:00
Chad Rosier 7813dcee30 Add instcombine patterns for the following transformations:
(x & y) | (x ^ y) -> x | y 
 (x & y) + (x ^ y) -> x | y 

Patch by Manman Ren.
rdar://10770603

llvm-svn: 155674
2012-04-26 23:29:14 +00:00
Andrew Trick 03fa574af5 Fix the SD scheduler to avoid gluing the same node twice.
DAGCombine strangeness may result in multiple loads from the same
offset. They both may try to glue themselves to another load. We could
insist that the redundant loads glue themselves to each other, but the
beter fix is to bail out from bad gluing at the time we detect it.

Fixes rdar://11314175: BuildSchedUnits assert.

llvm-svn: 155668
2012-04-26 21:48:25 +00:00
Jim Grosbach 3d6c629e26 ARM: Thumb ldr(literal) base address alignment is 32-bits.
The base address for the PC-relative load is Align(PC,4), so it's the
address of the word containing the 16-bit instruction, not the address
of the instruction itself. Ugh.

rdar://11314619

llvm-svn: 155659
2012-04-26 20:48:12 +00:00
Preston Gurd 81290f4be5 Trivial change to set UseLeaForSP flag in addition to toggling
the FeatureLeaForSP feature bit when llvm auto detects Intel Atom.

Patch by Andy Zhang

llvm-svn: 155655
2012-04-26 19:52:27 +00:00
Michael J. Spencer a6c2c29152 [Support/YAML] Properly fix unitialized variable warning by inserting a
'REPLACEMENT CHARACTER' (U+FFFD) when getAsInteger fails.

llvm-svn: 155653
2012-04-26 19:27:11 +00:00
Tim Northover 3de97b7a86 Use VLD1 in NEON extenting-load patterns instead of VLDR.
On some cores it's a bad idea for performance to mix VFP and NEON instructions
and since these patterns are NEON anyway, the NEON load should be used.

llvm-svn: 155630
2012-04-26 08:46:29 +00:00
Tim Northover 6699a60b0e Test commit.
llvm-svn: 155626
2012-04-26 08:24:07 +00:00
Craig Topper 08ccfbe57b Enable detection of AVX and AVX2 support through CPUID. Add AVX/AVX2 to corei7-avx, core-avx-i, and core-avx2 cpu names.
llvm-svn: 155618
2012-04-26 06:40:15 +00:00
Chandler Carruth 739ef80fd7 Teach the reassociate pass to fold chains of multiplies with repeated
elements to minimize the number of multiplies required to compute the
final result. This uses a heuristic to attempt to form near-optimal
binary exponentiation-style multiply chains. While there are some cases
it misses, it seems to at least a decent job on a very diverse range of
inputs.

Initial benchmarks show no interesting regressions, and an 8%
improvement on SPASS. Let me know if any other interesting results (in
either direction) crop up!

Credit to Richard Smith for the core algorithm, and helping code the
patch itself.

llvm-svn: 155616
2012-04-26 05:30:30 +00:00