Commit Graph

19580 Commits

Author SHA1 Message Date
Reid Kleckner b9204a584c [WinEH] Don't forward branches across empty EH pad BBs
For really simple SEH catchpads, we tried to forward the invoke unwind
edge across the empty block.

llvm-svn: 252822
2015-11-11 23:09:31 +00:00
Geoff Berry 2ddfc5e60f [DAGCombiner] Improve zextload optimization.
Summary:
Don't fold
  (zext (and (load x), cst)) -> (and (zextload x), (zext cst))
if
  (and (load x) cst)
will match as a zextload already and has additional users.

For example, the following IR:

  %load = load i32, i32* %ptr, align 8
  %load16 = and i32 %load, 65535
  %load64 = zext i32 %load16 to i64
  store i32 %load16, i32* %dst1, align 4
  store i64 %load64, i64* %dst2, align 8

used to produce the following aarch64 code:

	ldr		w8, [x0]
	and	w9, w8, #0xffff
	and	x8, x8, #0xffff
	str		w9, [x1]
	str		x8, [x2]

but with this change produces the following aarch64 code:

	ldrh		w8, [x0]
	str		w8, [x1]
	str		x8, [x2]

Reviewers: resistor, mcrosier

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14340

llvm-svn: 252789
2015-11-11 19:42:52 +00:00
Matt Arsenault d8fed1b793 Add target preference for GatherAllAliases max depth
llvm-svn: 252775
2015-11-11 18:44:33 +00:00
Dehao Chen 54511353e3 clang-format lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
llvm-svn: 252769
2015-11-11 18:09:47 +00:00
Dehao Chen 72fdf444b7 Emit discriminator for inlined callsites.
Summary: Inlined callsites need to be emitted in debug info so that sample profile can be annotated to the correct inlined instance.

Reviewers: dnovillo, dblaikie

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D14511

llvm-svn: 252768
2015-11-11 18:08:18 +00:00
Matthias Braun 2c98d0f477 MachineInstr: addRegisterDefReadUndef() => setRegisterDefReadUndef()
This way we can not only add but also remove read undef flags.

llvm-svn: 252678
2015-11-11 00:41:58 +00:00
Matthias Braun 4353b30542 TableGen: Emit LaneMask for register classes without subregisters as ~0u
This makes it slightly easier to handle classes with and without
subregister uniformly.

llvm-svn: 252671
2015-11-10 23:23:05 +00:00
Sanjay Patel 33ec5dbe35 less indent; NFCI
llvm-svn: 252643
2015-11-10 20:09:02 +00:00
Matt Arsenault aa118e299c LegalizeDAG: Implement promote for scalar_to_vector
This allows avoiding the default Expand behavior which
introduces stack usage. Bitcast the scalar and replace
the missing elements with undef.

This is covered by existing tests and used by a future
commit which makes 64-bit vectors legal types on AMDGPU.

llvm-svn: 252632
2015-11-10 18:48:11 +00:00
Matt Arsenault a46aa641f2 LegalizeDAG: Implement promote for insert_vector_elt
This is covered by existing tests and used by a future
commit which makes 64-bit vectors legal types on AMDGPU.

llvm-svn: 252631
2015-11-10 18:48:08 +00:00
Matt Arsenault 0b7958a59b LegalizeDAG: Implement promote for extract_vector_elt
This is for AMDGPU to implement v2i64 extract as extract of
half of a v4i32.

This is covered by existing tests and used by a future
commit which makes 64-bit vectors legal types on AMDGPU.

llvm-svn: 252630
2015-11-10 18:48:04 +00:00
Sanjay Patel 766589efdc add 'MustReduceDepth' as an objective/cost-metric for the MachineCombiner
This is one of the problems noted in PR25016:
https://llvm.org/bugs/show_bug.cgi?id=25016
and:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/090998.html

The spilling problem is independent and not addressed by this patch.

The MachineCombiner was doing reassociations that don't improve or even worsen the critical path. 
This is caused by inclusion of the "slack" factor when calculating the critical path of the original
code sequence. If we don't add that, then we have a more conservative cost comparison of the old code
sequence vs. a new sequence. The more liberal calculation must be preserved, however, for the AArch64
MULADD patterns because benchmark regressions were observed without that.

The two failing test cases now have identical asm that does what we want:
a + b + c + d ---> (a + b) + (c + d)

Differential Revision: http://reviews.llvm.org/D13417

llvm-svn: 252616
2015-11-10 16:48:53 +00:00
Andy Ayers 809cbe9ea0 Support for emitting inline stack probes
For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages
between the current stack limit and the desired new stack pointer location. This implements support for
the inline expansion on x64.

For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call
is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications
that arise when introducing new machine basic blocks during prolog and epilog creation.

Added a new test case, modified an existing one to exclude non-x64 coreclr (for now).

Add test case

Fix tests

llvm-svn: 252578
2015-11-10 01:50:49 +00:00
Matt Arsenault 6d87f28afd Remove unnecessary call to getAllocatableRegClass
I'm not sure what the point of this was. I'm not sure why
you would ever define an instruction that produces an unallocatable
register class. No tests fail with this removed, and it seems like
it should be a verifier error to define such an instruction.

This was problematic for AMDGPU because it would make bad decisions
by arbitrarily changing the register class when unsetting isAllocatable
for VS_32/VS_64, which is currently set as a workaround to this problem.

AMDGPU uses the VS_32/VS_64 register classes to represent operands which
can use either VGPRs or SGPRs. When  isAllocatable is unset for these,
this would need to pick  either the SGPR or VGPR class and insert either
a copy we don't want, or an illegal copy we would need to deal with
later. A semi-arbitrary register class ordering decision is made in tablegen,
which resulted in always picking a VGPR class because it happens to have
more registers than the SGPR register class. We really just want to
use whatever register class the original register had.

llvm-svn: 252565
2015-11-10 00:30:14 +00:00
Matthias Braun 7e624d5f11 MachineVerifier: Streamline live interval related error reporting
Simply perform additional report_context() calls after a report()
instead of adding more and more overloaded variations of report().  Also
improve several instances where information was output in an ad-hoc way
probably because no matching report() overload was available.

llvm-svn: 252552
2015-11-09 23:59:33 +00:00
Matthias Braun 716b43306b MachineVerifier: Add missing linebreak
MachineInstr::print() with SkipOppers==true does not produce a
linebreak, so we have to do that in MachineVerifier::report().

llvm-svn: 252551
2015-11-09 23:59:29 +00:00
Matthias Braun 45718db0a1 MachineVerifier: MI::print has no TargetMachine overload
The code was passing a target machine pointer which degraded to a true
operand to SkipOppers.

llvm-svn: 252550
2015-11-09 23:59:25 +00:00
Matthias Braun 42b4b63056 MachineVerifier: print list of live intervals if available
llvm-svn: 252549
2015-11-09 23:59:23 +00:00
Sanjay Patel 533c10c651 add a SelectionDAG method to check if no common bits are set in two nodes; NFCI
This was suggested in:
http://reviews.llvm.org/D13956

and is a follow-on to:
http://reviews.llvm.org/rL252515
http://reviews.llvm.org/rL252519

This lets us remove logically equivalent/duplicated code from DAGCombiner and X86ISelDAGToDAG.

A corresponding function for IR instructions already exists in ValueTracking.

llvm-svn: 252539
2015-11-09 23:31:38 +00:00
David Majnemer 2652b75700 [WinEH] Don't emit CATCHRET from visitCatchPad
Instead, emit a CATCHPAD node which will get selected to a target
specific sequence.

llvm-svn: 252528
2015-11-09 23:07:48 +00:00
Reid Kleckner 64b003f05d [WinEH] Tweak funclet prologue/epilogue insertion to pass verifier
For some reason we'd never run MachineVerifier on WinEH code, and you
explicitly have to ask for it with llc. I added it to a few test cases
to get some coverage.

Fixes PR25461.

llvm-svn: 252512
2015-11-09 21:04:00 +00:00
Andrew Kaylor fdd48fa1e1 [WinEH] Re-committing r252249 (Clone funclets with multiple parents) with additional fixes for determinism problems
Differential Revision: http://reviews.llvm.org/D14454

llvm-svn: 252508
2015-11-09 19:59:02 +00:00
Oliver Stannard 563585789c [CodeGen] Always promote f16 if not legal
We don't currently have any runtime library functions for operations on
f16 values (other than conversions to and from f32 and f64), so we
should always promote it to f32, even if that is not a legal type. In
that case, the f32 values would be softened to f32 library calls.

SoftenFloatRes_FP_EXTEND now needs to check the promoted operand's type,
as it may ne a no-op or require a different library call.

getCopyFromParts and getCopyToParts now need to cope with a
floating-point value stored in a larger integer part, as is the case for
any target that needs to store an f16 value in a 32-bit integer
register.

Differential Revision: http://reviews.llvm.org/D12856

llvm-svn: 252459
2015-11-09 11:03:18 +00:00
Yaron Keren 9ffee46d45 Erase unused FunctionDIs variables after r252219.
llvm-svn: 252401
2015-11-07 10:21:25 +00:00
Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
Tom Stellard 05691a678e DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> extload
Reviewers: resistor, arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13805

llvm-svn: 252349
2015-11-06 21:58:37 +00:00
Quentin Colombet 9a8efc08d3 [ShrinkWrapping] Teach shrink-wrapping how to analyze RegMask.
Previously we were conservatively assuming that RegMask operands clobber
callee saved registers.

llvm-svn: 252341
2015-11-06 21:00:13 +00:00
Matthias Braun 9198c671e8 MachineScheduler: Add regpressure information to debug dump
llvm-svn: 252340
2015-11-06 20:59:02 +00:00
Reid Kleckner b8fd162fc5 [WinEH] Mark funclet entries and exits as clobbering all registers
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.

Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer

Subscribers: MatzeB, llvm-commits

Differential Revision: http://reviews.llvm.org/D14407

llvm-svn: 252318
2015-11-06 17:06:38 +00:00
NAKAMURA Takumi 9947cacebf Revert r252249 (and r252255, r252258), "[WinEH] Clone funclets with multiple parents"
It behaved flaky due to iterating pointer key values on std::set and std::map.

llvm-svn: 252279
2015-11-06 10:07:33 +00:00
Reid Kleckner e535c1f856 Range-for some LiveIntervals code under review
llvm-svn: 252267
2015-11-06 02:01:02 +00:00
Andrew Kaylor f477585a2b Fix build warnings
llvm-svn: 252255
2015-11-06 01:08:35 +00:00
Andrew Kaylor 29cd576554 [WinEH] Clone funclets with multiple parents
Windows EH funclets need to always return to a single parent funclet.  However, it is possible for earlier optimizations to combine funclets (probably based on one funclet having an unreachable terminator) in such a way that this condition is violated.

These changes add code to the WinEHPrepare pass to detect situations where a funclet has multiple parents and clone such funclets, fixing up the unwind and catch return edges so that each copy of the funclet returns to the correct parent funclet.

Differential Revision: http://reviews.llvm.org/D13274?id=39098

llvm-svn: 252249
2015-11-06 00:20:50 +00:00
Peter Collingbourne d4bff30370 DI: Reverse direction of subprogram -> function edge.
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.

For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.

This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.

Since this is an IR change, a bitcode upgrade has been provided.

Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.

Differential Revision: http://reviews.llvm.org/D14265

llvm-svn: 252219
2015-11-05 22:03:56 +00:00
Reid Kleckner 6ddae31045 [WinEH] Fix funclet prologues with stack realignment
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.

While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.

llvm-svn: 252210
2015-11-05 21:09:49 +00:00
Sanjay Patel 387e66e79f replace MachineCombinerPattern namespace and enum with enum class; NFCI
Also, remove an enum hack where enum values were used as indexes into an array.

We may want to make this a real class to allow pattern-based queries/customization (D13417).

llvm-svn: 252196
2015-11-05 19:34:57 +00:00
Eugene Zelenko ffec81ca00 Fix some Clang-tidy modernize warnings, other minor fixes.
Fixed warnings are: modernize-use-override, modernize-use-nullptr and modernize-redundant-void-arg.

Differential revision: http://reviews.llvm.org/D14312

llvm-svn: 252087
2015-11-04 22:32:32 +00:00
Cong Hou 23a3bf0147 Add new interfaces to MBB for manipulating successors with probabilities instead of weights. NFC.
This is part-1 of the patch that replaces all edge weights in MBB by
probabilities, which only adds new interfaces. No functional changes.

Differential revision: http://reviews.llvm.org/D13908

llvm-svn: 252083
2015-11-04 21:37:58 +00:00
Igor Laevsky 35fe692025 [StatepointLowering] Remove distinction between call and invoke safepoints
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
 allow to lower call safepoints with relocates and results in a different 
basic blocks. See test case for example.

Differential Revision: http://reviews.llvm.org/D14158

llvm-svn: 252028
2015-11-04 01:16:10 +00:00
Peter Collingbourne 94d778697a CodeGen, Target: Move Mach-O-specific symbol name logic to Mach-O lowering.
A profile of an LTO link of Chrome revealed that we were spending some
~30-50% of execution time in the function Constant::getRelocationInfo(),
which is called from TargetLoweringObjectFile::getKindForGlobal() and in turn
from TargetMachine::getNameWithPrefix().

It turns out that we only need the result of getKindForGlobal() when
targeting Mach-O, so this change moves the relevant part of the logic to
TargetLoweringObjectFileMachO.

NFCI.

Differential Revision: http://reviews.llvm.org/D14168

llvm-svn: 252014
2015-11-03 23:40:03 +00:00
Simon Pilgrim 191ac7c679 [SelectionDAG] Use existing constant nodes instead of recreating them. NFC.
llvm-svn: 251990
2015-11-03 22:21:38 +00:00
Rafael Espindola 43e2e251ea Delete dead code.
llvm-svn: 251960
2015-11-03 18:55:58 +00:00
Igor Laevsky f637b4a52e [CodegenPrepare] Do not rematerialize gc.relocates across different basic blocks
Differential Revision: http://reviews.llvm.org/D14258

llvm-svn: 251957
2015-11-03 18:37:40 +00:00
Michael Kuperstein 73dc85293f [X86] Generate .cfi_adjust_cfa_offset correctly when pushing arguments
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.

Darwin does not support this well, so don't use pushes whenever it
would be required.

Differential Revision: http://reviews.llvm.org/D13767

llvm-svn: 251904
2015-11-03 08:17:25 +00:00
Matthias Braun 6f4ed269b9 RegisterPressure: Improve assert message
llvm-svn: 251885
2015-11-03 01:53:36 +00:00
Matthias Braun 11859b5c8f RegisterPressure: Slightly nicer pressure diff dumping
llvm-svn: 251884
2015-11-03 01:53:33 +00:00
Matthias Braun 93563e7032 ScheduleDAGInstrs: Remove IsPostRA flag; NFC
ScheduleDAGInstrs doesn't behave differently before or after register
allocation. It was only used in a method of MachineSchedulerBase which
behaved differently in MachineScheduler/PostMachineScheduler. Change
this to let MachineScheduler/PostMachineScheduler just pass in a
parameter to that function.

The order of the LiveIntervals* and bool RemoveKillFlags paramters have
been switched to make out-of-tree code fail instead of unintentionally
passing a value intended for the IsPostRA flag to the (previously
following and default initialized) RemoveKillFlags.

Differential Revision: http://reviews.llvm.org/D14245

llvm-svn: 251883
2015-11-03 01:53:29 +00:00
Sanjay Patel 0ed9aeaa5f [CGP] widen switch condition and case constants to target's register width (2nd try)
This is a redo of r251849 except the tests have been split into arch-specific folders
to hopefully make the bots happy.

This is a follow-up from the discussion in D12965. The block-at-a-time limitation of
SelectionDAG also came up in D13297.

Without the InstCombine change from D12965, I don't expect this patch to make any
difference in the real world because InstCombine does not shrink cases like this in
visitSwitchInst(). But we need to have this CGP safety harness in place before
proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.

I've opted for IR regression tests in the patch because that seems like a clearer way to
test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86
will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473

Before:
BB#0:
  mr 4, 3
  extsh. 3, 4
  ble 0, .LBB0_5
 BB#1:
  cmpwi  3, 99
  bgt    0, .LBB0_9
 BB#2:
  rlwinm 4, 4, 0, 16, 31      <--- 32-bit mask/extend
  li 3, 0
  cmplwi         4, 1
  beqlr 0
 BB#3:
  cmplwi         4, 10
  bne    0, .LBB0_12
 BB#4:
  li 3, 1
  blr
.LBB0_5:
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi         3, 65436
  beq    0, .LBB0_13
 BB#6:
  cmplwi         3, 65526
  beq    0, .LBB0_15
 BB#7:
  cmplwi         3, 65535
  bne    0, .LBB0_12
 BB#8:
  li 3, 4
  blr
.LBB0_9:
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi         3, 100
  beq    0, .LBB0_14
...

After:
BB#0:
  rlwinm 4, 3, 0, 16, 31      <--- mask/extend to 32-bit and then use that for comparisons
  cmpwi  4, 999
  ble 0, .LBB0_5
 BB#1:
  lis 3, 0
  ori 3, 3, 65525
  cmpw   4, 3
  bgt    0, .LBB0_9
 BB#2:
  cmplwi         4, 1000
  beq    0, .LBB0_14
 BB#3:
  cmplwi         4, 65436
  bne    0, .LBB0_13
 BB#4:
  li 3, 6
  blr
.LBB0_5:
  li 3, 0
  cmplwi         4, 1
  beqlr 0
 BB#6:
  cmplwi         4, 10
  beq    0, .LBB0_12
 BB#7:
  cmplwi         4, 100
  bne    0, .LBB0_13
 BB#8:
  li 3, 2
  blr
.LBB0_9:
  cmplwi         4, 65526
  beq    0, .LBB0_15
 BB#10:
  cmplwi         4, 65535
  bne    0, .LBB0_13
...


Differential Revision: http://reviews.llvm.org/D13532

llvm-svn: 251857
2015-11-02 23:22:49 +00:00
Sanjay Patel dfc825eb36 revert r251849; need to move tests to arch-specific folders
llvm-svn: 251851
2015-11-02 23:05:20 +00:00
Sanjay Patel b90a078de9 [CGP] widen switch condition and case constants to target's register width
This is a follow-up from the discussion in D12965. The block-at-a-time limitation of 
SelectionDAG also came up in D13297.

Without the InstCombine change from D12965, I don't expect this patch to make any 
difference in the real world because InstCombine does not shrink cases like this in
visitSwitchInst(). But we need to have this CGP safety harness in place before
proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.

I've opted for IR regression tests in the patch because that seems like a clearer way to
test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86
will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473

Before:
BB#0:
  mr 4, 3
  extsh. 3, 4
  ble 0, .LBB0_5
 BB#1: 
  cmpwi	 3, 99
  bgt	 0, .LBB0_9
 BB#2:            
  rlwinm 4, 4, 0, 16, 31      <--- 32-bit mask/extend
  li 3, 0
  cmplwi	 4, 1
  beqlr 0
 BB#3:            
  cmplwi	 4, 10
  bne	 0, .LBB0_12
 BB#4:                      
  li 3, 1
  blr
.LBB0_5:                             
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi	 3, 65436
  beq	 0, .LBB0_13
 BB#6:                            
  cmplwi	 3, 65526
  beq	 0, .LBB0_15
 BB#7:                       
  cmplwi	 3, 65535
  bne	 0, .LBB0_12
 BB#8:                       
  li 3, 4
  blr
.LBB0_9:                       
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi	 3, 100
  beq	 0, .LBB0_14
...

After:
BB#0:        
  rlwinm 4, 3, 0, 16, 31      <--- mask/extend to 32-bit and then use that for comparisons
  cmpwi	 4, 999
  ble 0, .LBB0_5
 BB#1:          
  lis 3, 0
  ori 3, 3, 65525
  cmpw	 4, 3
  bgt	 0, .LBB0_9
 BB#2:         
  cmplwi	 4, 1000
  beq	 0, .LBB0_14
 BB#3:    
  cmplwi	 4, 65436
  bne	 0, .LBB0_13
 BB#4:       
  li 3, 6
  blr
.LBB0_5:   
  li 3, 0
  cmplwi	 4, 1
  beqlr 0
 BB#6: 
  cmplwi	 4, 10
  beq	 0, .LBB0_12
 BB#7:             
  cmplwi	 4, 100
  bne	 0, .LBB0_13
 BB#8:             
  li 3, 2
  blr
.LBB0_9:       
  cmplwi	 4, 65526
  beq	 0, .LBB0_15
 BB#10:      
  cmplwi	 4, 65535
  bne	 0, .LBB0_13
...


Differential Revision: http://reviews.llvm.org/D13532

llvm-svn: 251849
2015-11-02 22:46:24 +00:00
Cong Hou b90b9e0531 In MachineBlockPlacement, filter cold blocks off the loop chain when profile data is available.
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality.

Consider a simple example shown below:

A
|
B⇆C
|
D

When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D.

This patch filters those cold blocks off from the loop chain by comparing the ratio:

LoopBBFreq / LoopFreq

to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa.


Differential revision: http://reviews.llvm.org/D11662

llvm-svn: 251833
2015-11-02 21:24:00 +00:00
James Y Knight 646c4032e7 Fix two issues in MergeConsecutiveStores:
1) PR25154. This is basically a repeat of PR18102, which was fixed in
r200201, and broken again by r234430. The latter changed which of the
store nodes was merged into from the first to the last. Thus, we now
also need to prefer merging a later store at a given address into the
target node, instead of an earlier one.

2) While investigating that, I also realized I'd introduced a bug in
r236850. There, I removed a check for alignment -- not realizing that
nothing except the alignment check was ensuring that none of the stores
were overlapping! This is a really bogus way to ensure there's no
aliased stores.

A better solution to both of these issues is likely to always use the
code added in the 'if (UseAA)' branches which rearrange the chain based
on a more principled analysis. I'll look into whether that can be used
always, but in the interest of getting things back to working, I think a
minimal change makes sense.

llvm-svn: 251816
2015-11-02 18:48:08 +00:00
Jonas Paulsson 72640f1c9f [MachineVerifier] Analyze MachineMemOperands for mem-to-mem moves.
Since the verifier will give false reports if it incorrectly thinks MI is
loading or storing using an FI, it is necessary to scan memoperands and
find out how the FI is used in the instruction. This should be relatively
rare.

Needed to make CodeGen/SystemZ/spill-01.ll pass, which now runs with this flag.

Reviewed by Quentin Colombet.

llvm-svn: 251620
2015-10-29 08:28:35 +00:00
Matthias Braun f2f194455f Revert "ScheduleDAGInstrs: Remove IsPostRA flag"
It broke 3 arm testcases.

This reverts commit r251608.

llvm-svn: 251615
2015-10-29 05:06:41 +00:00
Matthias Braun dc7580aa88 MachineScheduler: Fix typo in debug message
Maybe I just missed the humor there ;-)

llvm-svn: 251609
2015-10-29 03:57:28 +00:00
Matthias Braun 7ffadd0087 ScheduleDAGInstrs: Remove IsPostRA flag
This was a layering violation in ScheduleDAGInstrs (and
MachineSchedulerBase) they both shouldn't know directly whether they are
used by the PostMachineScheduler or the MachineScheduler.

llvm-svn: 251608
2015-10-29 03:57:24 +00:00
Matthias Braun b0c437bc76 MachineScheduler: Use ranged for and slightly simplify the code
llvm-svn: 251607
2015-10-29 03:57:17 +00:00
Tim Northover 2d4d161519 ARM: support .watchos_version_min and .tvos_version_min.
These MachO file directives are used by linkers and other tools to provide
compatibility information, much like the existing .ios_version_min and
.macosx_version_min.

llvm-svn: 251569
2015-10-28 22:36:05 +00:00
Sanjoy Das 1d1929aace [ValueTracking] Use !range metadata more aggressively in KnownBits
Summary:
Teach `computeKnownBitsFromRangeMetadata` to use `!range` metadata more
aggressively.

Reviewers: majnemer, nlewycky, jingyue

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14100

llvm-svn: 251487
2015-10-28 03:20:15 +00:00
Sanjoy Das 4ff3cf6d92 [SelectionDAG] Don't inspect !range metadata for extended loads
Summary:
Don't call `computeKnownBitsFromRangeMetadata` for extended loads --
this can cause a mismatch between the width of the !range metadata and
the width of the APInt's accumulating `KnownZero` (and `KnownOne` in the
future).  This isn't a problem now, but will be after a future change.

Note: this can be made more aggressive in the future.

Reviewers: nlewycky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14107

llvm-svn: 251486
2015-10-28 03:20:10 +00:00
James Y Knight 14eedd189b Make the SelectionDAG graph printer use SDNode::PersistentId labels.
r248010 changed the -debug output to use short ids, but did not
similarly modify the graph printer. Change to be consistent, for ease of
cross-reference.

llvm-svn: 251465
2015-10-27 23:09:03 +00:00
Sanjay Patel bbd4c79c8f Use the 'arcp' fast-math-flag when combining repeated FP divisors
This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes. 
This was originally part of D8900.

Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and 
possibly other changes.

Differential Revision: http://reviews.llvm.org/D9708

llvm-svn: 251450
2015-10-27 20:27:25 +00:00
Cong Hou 07eeb8001e Create a new interface addSuccessorWithoutWeight(MBB*) in MBB to add successors when optimization is disabled.
When optimization is disabled, edge weights that are stored in MBB won't be used so that we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But that the weight list is empty doesn't mean disabled optimization (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights.

We should discourage using default weights when adding successors, because it is very easy for users to forget update the correct edge weights instead of using default ones (one exception is that the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimizations is disabled.

In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this as for many uses of addSuccessor() whether optimization is disabled or not is not checked. But it can guarantee that if optimization is enabled, then the weight list always has the same size of the successor list.

Differential revision: http://reviews.llvm.org/D13963

llvm-svn: 251429
2015-10-27 17:59:36 +00:00
Mehdi Amini 891c0973df Do not use "else" when both branches return (NFC)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 251398
2015-10-27 08:12:08 +00:00
Steve King fee370be72 Fix llc crash processing S/UREM for -Oz builds caused by rL250825.
When taking the remainder of a value divided by a constant, visitREM()
attempts to convert the REM to a longer but faster sequence of instructions.
This conversion calls combine() on a speculative DIV instruction. Commit
rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes.
Flow eventually hits unreachable().

This patch adds a test case and a check to prevent visitREM() from trying
to convert the REM instruction in cases where a DIVREM is possible.
See http://reviews.llvm.org/D14035

llvm-svn: 251373
2015-10-27 00:14:06 +00:00
Ivan Krasin 465fbe25c4 Fix indents. It's a follow up to r251353.
llvm-svn: 251364
2015-10-26 22:35:40 +00:00
Ivan Krasin 298639a5fd Move imported entities into DwarfCompilationUnit to speed up LTO linking.
Summary:
In particular, this CL speeds up the official Chrome linking with LTO by
1.8x.

See more details in https://crbug.com/542426

Reviewers: dblaikie

Subscribers: jevinskie

Differential Revision: http://reviews.llvm.org/D13918

llvm-svn: 251353
2015-10-26 21:36:35 +00:00
David Blaikie 7b54b525cd Remove assert(false) in favor of asserting the if conditional it is contained within.
Also adjust the code to avoid 3 redundant map lookups.

llvm-svn: 251327
2015-10-26 18:41:13 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Elena Demikhovsky 092858588a Scalarizer for masked.gather and masked.scatter intrinsics.
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added isLegalMaskedGather() isLegalMaskedScatter() APIs.

Differential Revision: http://reviews.llvm.org/D13722

llvm-svn: 251237
2015-10-25 15:37:55 +00:00
Michael Kuperstein eaa16005af [X86] Use correct calling convention for MCU psABI libcalls
When using the MCU psABI, compiler-generated library calls should pass
some parameters in-register. However, since inreg marking for x86 is currently
done by the front end, it will not be applied to backend-generated calls.

This is a workaround for PR3997, which describes a similar issue for -mregparm.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251223
2015-10-25 08:14:05 +00:00
Rafael Espindola 84921b9860 Refactor: Simplify boolean conditional return statements in lib/CodeGen.
Patch by Richard.

llvm-svn: 251213
2015-10-24 23:11:13 +00:00
Simon Pilgrim 3448cbcc51 [DAGCombiner] Tidy up ConstantFP commutation. NFCI
Move ConstantFP canonicalization of commutative instructions to start of 2-op node creation (matches integer) - simplifies constant folding code.

llvm-svn: 251203
2015-10-24 20:06:18 +00:00
Simon Pilgrim 7430804fe1 [DAGCombiner] Generalize masking of constant rotates.
We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded.

Followup to D13851.

llvm-svn: 251197
2015-10-24 18:44:52 +00:00
Simon Pilgrim d5ef318b5b [X86][XOP] Add support for lowering vector rotations
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.

Differential Revision: http://reviews.llvm.org/D13851

llvm-svn: 251188
2015-10-24 13:17:26 +00:00
Joseph Tremoulet 3d0fbf1d74 [CodeGen] Mark setjmp/catchret MBBs address-taken
Summary:
This ensures that BranchFolding (and similar) won't remove these blocks.

Also allow AsmPrinter::EmitBasicBlockStart to process MBBs which are
address-taken but do not have BBs that are address-taken, since otherwise
its call to getAddrLabelSymbolTableToEmit would fail an assertion on such
blocks.  I audited the other callers of getAddrLabelSymbolTableToEmit
(and getAddrLabelSymbol); they all have BBs known to be address-taken
except for the call through getAddrLabelSymbol from
WinException::create32bitRef; that call is actually now unreachable, so
I've removed it and updated the signature of create32bitRef.

This fixes PR25168.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13774

llvm-svn: 251113
2015-10-23 15:06:05 +00:00
Davide Italiano fbb958c24b [CodeGen] Remove usage of NDEBUG in header.
Moreover, this seems unused.

llvm-svn: 251081
2015-10-23 00:17:40 +00:00
Matthias Braun 61f4d6439c MachineScheduler: Add a way to disable the 'ReduceLatency' heuristic
llvm-svn: 251037
2015-10-22 18:07:31 +00:00
Craig Topper 8fe40e0ed5 Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. This removes the need to pass a hardcoded size in many places. NFC
llvm-svn: 251032
2015-10-22 17:05:00 +00:00
Zia Ansari 8f509a7044 [X86] - Catch extra combine opportunities for redundant imuls.
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.

I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).

Differential Revision: http://reviews.llvm.org/D13740

llvm-svn: 251028
2015-10-22 16:14:45 +00:00
David Majnemer a8f17871e4 [WinEH] Remove extraneous call to emitEHRegistrationOffsetLabel
It's a relic from the earlier implementation, let's remove it.

llvm-svn: 250964
2015-10-21 23:20:39 +00:00
Matt Arsenault 29f9663f97 LegalizeDAG: Implement promote for build_vector
This will be used in future commits for AMDGPU to promote
operations on i64 vectors into operations on 32-bit vector
components.

This will be used / tested in future AMDGPU commits.

llvm-svn: 250945
2015-10-21 21:10:10 +00:00
Elena Demikhovsky 3ad76a1acd Masked Load/Store optimization for scalar code
When we have to convert the masked.load, masked.store to scalar code, we generate a chain of conditional basic blocks.
I added optimization for constant mask vector.

Differential Revision: http://reviews.llvm.org/D13855

llvm-svn: 250893
2015-10-21 11:50:54 +00:00
Jonas Paulsson 17ad04535f Let MachineVerifier be aware of mem-to-mem instructions.
A mem-to-mem instruction (that both loads and stores), which store to an
FI, cannot pass the verifier since it thinks it is loading from the FI.

For the mem-to-mem instruction, do a looser check in visitMachineOperand()
and only check liveness at the reg-slot while analyzing a frame index operand.

Needed to make CodeGen/SystemZ/xor-01.ll pass with -verify-machineinstrs,
which now runs with this flag.

Reviewed by Evan Cheng and Quentin Colombet.

llvm-svn: 250885
2015-10-21 07:39:47 +00:00
Krzysztof Parzyszek fdb7b693a7 Tail duplication can mix incompatible registers in phi nodes
Do not tail duplicate blocks where the successor has a phi node,
and the corresponding value in that phi node uses a subregister.

http://reviews.llvm.org/D13922

llvm-svn: 250877
2015-10-21 02:40:06 +00:00
Artyom Skrobov c736863a85 Two switch blocks in VectorLegalizer::LegalizeOp already have a
default: llvm_unreachable("This action is not supported yet!");

-- so I'm adding one to the third switch block, too.

This is a follow-up fix for http://reviews.llvm.org/D13862

llvm-svn: 250830
2015-10-20 15:06:37 +00:00
Artyom Skrobov 7fd67e25aa Adding support for TargetLoweringBase::LibCall
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:

- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
  computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall

This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.

The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.

The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.

Reviewers: t.p.northover, rengolin

Subscribers: t.p.northover, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13862

llvm-svn: 250826
2015-10-20 13:14:52 +00:00
Artyom Skrobov b844fa7fc0 Combining DIV+REM->DIVREM doesn't belong in LegalizeDAG; move it over into DAGCombiner.
Summary:
In addition to moving the code over, this patch amends the DIV,REM -> DIVREM
combining to run on all affected nodes at once: if the nodes are converted
to DIVREM one at a time, then the resulting DIVREM may get legalized by the
backend into something target-specific that we won't be able to recognize
and correlate with the remaining nodes.

The motivation is to "prepare terrain" for D13862: when we set DIV and REM
to be legalized to libcalls, instead of the DIVREM, we otherwise lose the
ability to combine them together. To prevent this, we need to take the
DIV,REM -> DIVREM combining out of the lowering stage.

Reviewers: RKSimon, eli.friedman, rengolin

Subscribers: john.brawn, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13733

llvm-svn: 250825
2015-10-20 13:06:02 +00:00
Duncan P. N. Exon Smith a25ad0685a AsmPrinter: Remove implicit ilist iterator conversion, NFC
llvm-svn: 250776
2015-10-20 00:36:08 +00:00
Cong Hou 7745dbc5c4 Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:

1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header.
2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop.
3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.

Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.

Differential revision: http://reviews.llvm.org/D10717

llvm-svn: 250754
2015-10-19 23:16:40 +00:00
Sanjay Patel 69a50a1e17 [CGP] transform select instructions into branches and sink expensive operands
This was originally checked in at r250527, but reverted at r250570 because of PR25222.
There were at least 2 problems: 
1. The cost check was checking for an instruction with an exact cost of TCC_Expensive;
that should have been >=.
2. The cause of the clang stage 1 failures was illegally sinking 'call' instructions;
we can't sink instructions that may have side effects / are not safe to execute speculatively.

Fixed those conditions in sinkSelectOperand() and added test cases.

Original commit message:
This is a follow-up to the discussion in D12882.

Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).

Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.

Differential Revision: http://reviews.llvm.org/D13297

llvm-svn: 250743
2015-10-19 21:59:12 +00:00
Owen Anderson faf5187ee0 Restore the original behavior of SelectionDAG::getTargetIndex().
It looks like an extra negation snuck in as apart of restoring it.

llvm-svn: 250726
2015-10-19 19:27:40 +00:00
Benjamin Kramer 2002aadaad Put back SelectionDAG::getTargetIndex.
While technically this is untested dead code, it has out-of-tree users.
This reverts a part of r250434.

llvm-svn: 250717
2015-10-19 18:26:16 +00:00
Matthias Braun e734195ce3 Revert "RegisterPressure: allocatable physreg uses are always kills"
This reverts commit r250596.

Reverted for now as the commit triggers assert in the AMDGPU target
pending investigation.

llvm-svn: 250713
2015-10-19 17:44:22 +00:00
Elena Demikhovsky 20662e39f1 Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.

Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.

Differential Revision: http://reviews.llvm.org/D13850

llvm-svn: 250686
2015-10-19 07:43:38 +00:00
Simon Pilgrim 04d52d26f6 Use SDValue bool check. NFCI.
llvm-svn: 250653
2015-10-18 12:33:54 +00:00
Simon Pilgrim c2c154e078 Move one-use variable inside test. NFC.
llvm-svn: 250651
2015-10-18 11:47:23 +00:00
Simon Pilgrim 24057b9566 [DAG] Ensure vector constant folding uses correct scalar undef types
Minor fix to D13665 found during post-commit review.

llvm-svn: 250616
2015-10-17 16:49:43 +00:00
Matthias Braun 65e6d4a3f8 RegisterPressure: Unify the sparse sets in LiveRegsSet; NFC
Also do some cleanups comment improvements.

llvm-svn: 250598
2015-10-17 01:03:44 +00:00
Matthias Braun cdd2792aa6 RegisterPressure: allocatable physreg uses are always kills
This property was already used in the code path when no liveness
intervals are present. Unfortunately the code path that uses liveness
intervals tried to query a cached live interval for an allocatable
physreg, those are usually not computed so a conservative default was
used.

This doesn't affect any of the lit testcases. This is a foreclosure to
upcoming changes which should be NFC but without this patch this tidbit
wouldn't be NFC.

llvm-svn: 250596
2015-10-17 00:46:57 +00:00