Commit Graph

5509 Commits

Author SHA1 Message Date
Sanjay Patel cdd5ec47ed fix typos; NFC
llvm-svn: 244619
2015-08-11 16:10:41 +00:00
Sanjay Patel fec7965b36 fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244617
2015-08-11 15:56:31 +00:00
Mehdi Amini b10555cc61 Fix InstCombine test: invalid CHECK line slipped in r231270
I incorrectly wrote CHECK-NEXT with followin with ':', the check was
ignored by FileCheck.
The non-inbound GEP is folded here because the DataLayout is no longer
optional, the fold was originally guarded with a comment that said:
    We need TD information to know the pointer size unless this is inbounds.
Now we always have "TD information" and perform the fold.

Thanks Jonathan Roelofs for noticing.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 244613
2015-08-11 15:31:17 +00:00
Sanjay Patel b5c0c58737 remove unnecessary settings/attributes from test case
llvm-svn: 244612
2015-08-11 15:30:53 +00:00
James Molloy 134bec2722 Add support for floating-point minnum and maxnum
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.

matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.

C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.

ARM and AArch64 at least have different instructions for these different cases.

llvm-svn: 244580
2015-08-11 09:12:57 +00:00
Tyler Nowicki c94d6ad241 Print vectorization analysis when loop hint is specified.
This patch and a relatec clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwasyPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>’. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints.

llvm-svn: 244555
2015-08-11 01:09:15 +00:00
Sanjoy Das 7742b8ba15 Address post-commit review from r243378.
This checks that bork_directive occurs exactly twice in the test output.

llvm-svn: 244543
2015-08-11 00:20:24 +00:00
Tyler Nowicki 652b0dabe6 Extend late diagnostics to include late test for runtime pointer checks.
This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.

llvm-svn: 244523
2015-08-10 23:01:55 +00:00
Tyler Nowicki 655e573dc5 Make fp vectorization test X86 specified to avoid cost-model related problems on arm-thumb and hexagon.
llvm-svn: 244505
2015-08-10 21:14:38 +00:00
Simon Pilgrim a3a72b41de [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.

Differential Revision: http://reviews.llvm.org/D11886

llvm-svn: 244495
2015-08-10 20:21:15 +00:00
Jonathan Roelofs f45295c366 Fix a few more cases of 'CHECK[^:]*$'. NFCI
llvm-svn: 244491
2015-08-10 19:56:39 +00:00
Tyler Nowicki c1a86f5866 Late evaluation of the fast-math vectorization requirement.
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint. 

llvm-svn: 244489
2015-08-10 19:51:46 +00:00
Tyler Nowicki 4d62f2e039 Modify diagnostic messages to clearly indicate the why interleaving wasn't done.
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.

llvm-svn: 244485
2015-08-10 19:14:16 +00:00
Jonathan Roelofs 49e46ce8e2 Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.

llvm-svn: 244481
2015-08-10 19:01:27 +00:00
Mark Heffernan 8939154a22 Add new llvm.loop.unroll.enable metadata.
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.

The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".

llvm-svn: 244466
2015-08-10 17:28:08 +00:00
Fraser Cormack e29ab2bfab Prevent the scalarizer from caching incorrect entries
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D11559

llvm-svn: 244448
2015-08-10 14:48:47 +00:00
David Majnemer 4232fb3f8d [PHITransAddr] Don't assume that instruction operands are translatable
We can only PHI translate instructions.  In our attempt to PHI translate
a bitcast, we attempt to translate its operand; however, the operand
might be an argument or a global instead of an instruction.  Benignly
bail out when this happens.

This fixes PR24397.

Differential Revision: http://reviews.llvm.org/D11879

llvm-svn: 244418
2015-08-09 15:43:02 +00:00
Chen Li eafbc9dc47 [ConstantFoldTerminator] Preserve make.implicit metadata when converting SwitchInst to BranchInst
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.

Reviewers: sanjoy, weimingz, chenli

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D11841

llvm-svn: 244348
2015-08-07 19:30:12 +00:00
Simon Pilgrim 3815c16bf8 [InstCombine] Fix SSE2/AVX2 vector logical shift by constant
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.

e.g.

%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)

In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.

In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).

Differential Revision: http://reviews.llvm.org/D11760

llvm-svn: 244341
2015-08-07 18:22:50 +00:00
Sanjoy Das 366acc175e [IndVars] Fix PR24356.
Unsigned predicates increase or decrease agnostic of the signs of their
increments.

llvm-svn: 244265
2015-08-06 20:43:41 +00:00
Quentin Colombet 6443cce233 [Reassociation] Fix miscompile for va_arg arguments.
iisUnmovableInstruction() had a list of instructions hardcoded which are
considered unmovable. The list lacked (at least) an entry for the va_arg
and cmpxchg instructions.
Fix this by introducing a new Instruction::mayBeMemoryDependent()
instead of maintaining another instruction list.

Patch by Matthias Braun <matze@braunis.de>.

Differential Revision: http://reviews.llvm.org/D11577

rdar://problem/22118647

llvm-svn: 244244
2015-08-06 18:44:34 +00:00
Richard Diamond bd753c9315 Fix an alignment error in `llvm::expandAtomicRMWToCmpXchg` without breaking the build where X86 isn't enabled.
Summary: Divide the primitive size in bits by eight so the initial load's alignment is in bytes as expected. Tested with the included unit test.

Reviewers: rengolin, jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11804

llvm-svn: 244229
2015-08-06 16:55:03 +00:00
Renato Golin a02ac60469 Revert "Divide the primitive size in bits by eight so the initial load's alignment is in bytes as expected. Tested with the included unit test."
This reverts commit r244155, as it was breaking the buildbots for too long.
Should be reapplied with proper fix.

llvm-svn: 244205
2015-08-06 10:37:59 +00:00
Richard Diamond 559c1d72a9 Divide the primitive size in bits by eight so the initial load's alignment is in
bytes as expected. Tested with the included unit test.

llvm-svn: 244155
2015-08-05 22:10:57 +00:00
Chen Li 50efd9220a [LoopUnswitch] Preserve make.implicit metadata for unswitched conditions
Summary: This patch adds support to preserve make.implicit metadata for unswitched conditions in loop pre-header.

Reviewers: sanjoy, weimingz

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D11769

llvm-svn: 244132
2015-08-05 21:13:26 +00:00
Simon Pilgrim 42c611b9ae [InstCombine] Added more specific SSE2/AVX2 vector shift tests.
llvm-svn: 244022
2015-08-05 08:21:38 +00:00
Simon Pilgrim d19b9d8229 [InstCombine] Split off SSE2/AVX2 vector shift tests.
These aren't vector demanded bits tests. More tests to follow.

llvm-svn: 243963
2015-08-04 08:05:27 +00:00
Mehdi Amini c8d5783114 Update test suite to make "ninja check" succeed without native backend builtin
Requires "native" feature in most places that were failing.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243960
2015-08-04 06:32:54 +00:00
Sanjoy Das 215df9ed98 Revert "[LSR] Generate and use zero extends"
This reverts commit r243348 and r243357.  They caused PR24347.

llvm-svn: 243939
2015-08-04 01:52:05 +00:00
Chandler Carruth 87adb7a2e2 [Unroll] Improve the brute force loop unroll estimate by propagating
through PHI nodes across iterations.

This patch teaches the new advanced loop unrolling heuristics to propagate
constants into the loop from the preheader and around the backedge after
simulating each iteration. This lets us brute force solve simple recurrances
that aren't modeled effectively by SCEV. It also makes it more clear why we
need to process the loop in-order rather than bottom-up which might otherwise
make much more sense (for example, for DCE).

This came out of an attempt I'm making to develop a principled way to account
for dead code in the unroll estimation. When I implemented
a forward-propagating version of that it produced incorrect results due to
failing to propagate *cost* between loop iterations through the PHI nodes, and
it occured to me we really should at least propagate simplifications across
those edges, and it is quite easy thanks to the loop being in canonical and
LCSSA form.

Differential Revision: http://reviews.llvm.org/D11706

llvm-svn: 243900
2015-08-03 20:32:27 +00:00
Duncan P. N. Exon Smith 55ca964e94 DI: Disallow uniquable DICompileUnits
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode.  This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.

Almost all the testcases were updated with this script:

    git grep -e '= !DICompileUnit' -l -- test |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'

I imagine something similar should work for out-of-tree testcases.

llvm-svn: 243885
2015-08-03 17:26:41 +00:00
Duncan P. N. Exon Smith ed013cd221 DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.

Most of the testcase updates were generated by the following sed script:

    find test/ -name "*.ll" -o -name "*.mir" |
    xargs grep -l 'DILocalVariable' |
    xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'

There were only a handful of tests in `test/Assembly` that I needed to
update by hand.

(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`.  I've added a FIXME to that effect.)

llvm-svn: 243774
2015-07-31 18:58:39 +00:00
Wei Mi d6f7252e2e [SLP vectorizer]: Choose the best consecutive candidate to pair with a store instruction.
The patch changes the SLPVectorizer::vectorizeStores to choose the immediate
succeeding or preceding candidate for a store instruction when it has multiple
consecutive candidates. In this way it has better chance to find more slp
vectorization opportunities.

Differential Revision: http://reviews.llvm.org/D10445

llvm-svn: 243666
2015-07-30 17:40:39 +00:00
Michael Zolotukhin 9f06ef76d3 [Unroll] Handle SwitchInst properly.
Previously successor selection was simply wrong.

llvm-svn: 243545
2015-07-29 18:10:33 +00:00
Michael Zolotukhin 3a7d55b623 [Unroll] Don't crash when simplified branch condition is undef.
llvm-svn: 243544
2015-07-29 18:10:29 +00:00
Michael Zolotukhin a2069d36ce Rename test full-unroll-bad-geps.ll to full-unroll-crashers.ll.
No reason to limit it only to GEP-related crashes. More tests are to
come here.

llvm-svn: 243543
2015-07-29 18:10:23 +00:00
Sanjoy Das cfe41f050c [Statepoints] Let patchable statepoints have a symbolic call target.
Summary:
As added initially, statepoints required their call targets to be a
constant pointer null if ``numPatchBytes`` was non-zero.  This turns out
to be a problem ergonomically, since there is no way to mark patchable
statepoints as calling a (readable) symbolic value.

This change remove the restriction of requiring ``null`` call targets
for patchable statepoints, and changes PlaceSafepoints to maintain the
symbolic call target through its transformation.

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11550

llvm-svn: 243502
2015-07-28 23:50:30 +00:00
Jingyue Wu 42f1d67a45 [SCEV] Apply NSW and NUW flags via poison value analysis
Summary:
Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.

There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial.

This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this:

  for (int i = 0; i < limit; ++i) {
    sum += ptr[i + offset];
  }

Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.


There are more details in this discussion on llvmdev.
June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234
July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392

Patch by Bjarke Roune

Reviewers: eliben, atrick, sanjoy

Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits

Differential Revision: http://reviews.llvm.org/D11212

llvm-svn: 243460
2015-07-28 18:22:40 +00:00
Chih-Hung Hsieh 1e859582d6 Implement target independent TLS compatible with glibc's emutls.c.
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.

clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model,
  which will be used for old targets like Android that do not
  support ELF TLS models.

Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.

Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.

TODO: Add proper DIE for emulated TLS variables.
      Added new unit tests with emulated TLS.

Differential Revision: http://reviews.llvm.org/D10522

llvm-svn: 243438
2015-07-28 16:24:05 +00:00
Sanjoy Das 6c7a186599 FileCheck'ify some wc/grep based tests; NFCI.
llvm-svn: 243378
2015-07-28 03:50:09 +00:00
Sanjoy Das 3895a57b32 [LSR] Move X86 specific test case to X86/
rL243348 added the test case in the wrong directory.

llvm-svn: 243357
2015-07-28 00:13:42 +00:00
Sanjoy Das 93b3504aa8 [LSR] Generate and use zero extends
Summary:
If a scale or a base register can be rewritten as "Zext({A,+,1})" then
LSR will now consider a formula of that form in its normal cost
computation.

Depends on D9180

Reviewers: qcolombet, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9181

llvm-svn: 243348
2015-07-27 23:27:51 +00:00
Sanjoy Das 5dab205ced [IndVars] Make loop varying predicates loop invariant.
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"

This change re-implements D9784 with the two differences:

 1. It does not use SCEVExpander and does not generate new
    instructions.  Instead, it does a quick local search for existing
    `llvm::Value`s that it needs when modifying the `icmp`
    instruction.

 2. It is more general -- it deals with both increasing and decreasing
    induction variables.

I've added all of the tests included with D9784, and two more.

As an example on what this change does (copied from D9784):

Given C code:

```
for (int i = M; i < N; i++) // i is known not to overflow
  if (i < 0) break;
  a[i] = 0;
}
```

This transformation produces:

```
for (int i = M; i < N; i++)
  if (M < 0) break;
  a[i] = 0;
}
```

Which can be unswitched into:

```
if (!(M < 0))
  for (int i = M; i < N; i++)
    a[i] = 0;
}
```

I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine.  Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison.  I'm open to putting it in its own helper routine if you
think that is better.

Reviewers: reames, nicholas, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11278

llvm-svn: 243331
2015-07-27 21:42:49 +00:00
Simon Pilgrim 15c0a59463 [InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.

Differential Revision: http://reviews.llvm.org/D11503

llvm-svn: 243303
2015-07-27 18:52:15 +00:00
Matt Arsenault 95365ca482 Fix assert when inlining a constantexpr addrspacecast
The pointer size of the addrspacecasted pointer might not have matched,
so this would have hit an assert in accumulateConstantOffset.

I think this was here to allow constant folding of a load of an
addrspacecasted constant. Accumulating the offset through the
addrspacecast doesn't make much sense, so something else is necessary
to allow folding the load through this cast.

llvm-svn: 243300
2015-07-27 18:31:03 +00:00
Silviu Baranga de38070587 The tests added in r243270 require asserts to be enabled
llvm-svn: 243274
2015-07-27 15:22:49 +00:00
Silviu Baranga 65bdb6788b Fix the tests added in r243270. Use 2>&1 instead of |&
llvm-svn: 243273
2015-07-27 15:08:55 +00:00
Silviu Baranga 7581d22512 [ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.

This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.

No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11524

llvm-svn: 243270
2015-07-27 14:39:34 +00:00
Jingyue Wu bfefff555e Roll forward r243250
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.

llvm-svn: 243253
2015-07-26 19:10:03 +00:00
Jingyue Wu 84879b71a9 Revert r243250
breaks tests

llvm-svn: 243251
2015-07-26 18:30:13 +00:00