Commit Graph

5142 Commits

Author SHA1 Message Date
Philip Reames 6da37857d1 [RewriteStatepointsForGC] Fix a relocation bug w.r.t values defined by invoke instructions
RewriteStatepointsForGC pass emits an alloca for each GC pointer which will be relocated. It then inserts stores after def and all relocations, and inserts loads before each use as well. In the end, mem2reg is used to update IR with relocations in SSA form.

However, there is a problem with inserting stores for values defined by invoke instructions. The code didn't expect a def was a terminator instruction, and inserting instructions after these terminators resulted in malformed IR.

This patch fixes this problem by handling invoke instructions as a special case. If the def is an invoke instruction, the store will be inserted at the beginning of the normal destination block. Since return value from invoke instruction does not dominate the unwind destination block, no action is needed there.

Patch by: Chen Li
Differential Revision: http://reviews.llvm.org/D7923

llvm-svn: 231183
2015-03-04 00:13:52 +00:00
David Majnemer 1bacc0abc9 InstCombine: Ensure select condition types are identical before merging
Selection conditions may be vectors or scalars.  Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select have identical select condition types.

This fixes PR22773.

llvm-svn: 231156
2015-03-03 22:40:36 +00:00
Nadav Rotem 029c5c7fdb Teach ComputeNumSignBits about signed divisions.
http://reviews.llvm.org/D8028
rdar://20023136

llvm-svn: 231140
2015-03-03 21:39:02 +00:00
Duncan P. N. Exon Smith e274180f0e DebugInfo: Move new hierarchy into place
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464.  I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned.  Let me know if I'm wrong :).

The code changes are fairly mechanical:

  - Bumped the "Debug Info Version".
  - `DIBuilder` now creates the appropriate subclass of `MDNode`.
  - Subclasses of DIDescriptor now expect to hold their "MD"
    counterparts (e.g., `DIBasicType` expects `MDBasicType`).
  - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
    for printing comments.
  - Big update to LangRef to describe the nodes in the new hierarchy.
    Feel free to make it better.

Testcase changes are enormous.  There's an accompanying clang commit on
its way.

If you have out-of-tree debug info testcases, I just broke your build.

  - `upgrade-specialized-nodes.sh` is attached to PR22564.  I used it to
    update all the IR testcases.
  - Unfortunately I failed to find way to script the updates to CHECK
    lines, so I updated all of these by hand.  This was fairly painful,
    since the old CHECKs are difficult to reason about.  That's one of
    the benefits of the new hierarchy.

This work isn't quite finished, BTW.  The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro).  Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers.  I
also expect to make a few schema changes now that it's easier to reason
about everything.

llvm-svn: 231082
2015-03-03 17:24:31 +00:00
Peter Collingbourne da2dbf21a9 LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit sets.
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:

A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)

These bits can be laid out in a 16-byte array like this:

      Byte Offset
    0123456789ABCDEF
Bit
  7 HHHHHHHHHIIIIIII
  6 GGGGGGGGGGJJJJJJ
  5 FFFFFFFFFFFKKKKK
  4 EEEEEEEEEEEELLLL
  3 DDDDDDDDDDDDDMMM
  2 CCCCCCCCCCCCCCNN
  1 BBBBBBBBBBBBBBBO
  0 AAAAAAAAAAAAAAAA

For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.

This uses the LPT multiprocessor scheduling algorithm to lay out the bits
efficiently.

Saves ~450KB of instructions in a recent build of Chromium.

Differential Revision: http://reviews.llvm.org/D7954

llvm-svn: 231043
2015-03-03 00:49:28 +00:00
Benjamin Kramer 838752d3f6 LoopIdiom: Give globals for memset_pattern16 private linkage.
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.

llvm-svn: 231041
2015-03-03 00:17:09 +00:00
Sanjoy Das 2d38031271 Revert some changes that were made to fix PR20680.
This re-lands change r230921.  r230921 was reverted because it broke a
clang test; a checkin fixing the clang test will be commited shortly.

Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533.  SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow.  There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.

Revert "IndVarSimplify: Allow LFTR to fire more often"

This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).

Revert "IndVarSimplify: Don't let LFTR compare against a poison value"

This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).

Reviewers: majnemer, atrick, spatel

Differential Revision: http://reviews.llvm.org/D7979

llvm-svn: 231018
2015-03-02 21:41:07 +00:00
NAKAMURA Takumi 0cd23c842e Revert r230921, "Revert some changes that were made to fix PR20680.", for now.
It caused a failure on clang/test/Misc/backend-optimization-failure.cpp .

llvm-svn: 230929
2015-03-02 01:14:03 +00:00
Sanjoy Das 876bd51486 Revert some changes that were made to fix PR20680.
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533.  SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow.  There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.

Revert "IndVarSimplify: Allow LFTR to fire more often"

This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).

Revert "IndVarSimplify: Don't let LFTR compare against a poison value"

This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).

Reviewers: majnemer, atrick, spatel

Differential Revision: http://reviews.llvm.org/D7979

llvm-svn: 230921
2015-03-01 23:36:26 +00:00
Duncan P. N. Exon Smith d0c2a99f0e DebugInfo: Convert DW_OP_piece => DW_OP_bit_piece
r228631 stopped using `DW_OP_piece` inside `DIExpression`s in the IR,
but it apparently missed updating these testcases.  Caught by verifier
checks for `MDExpression` while working on moving the new hierarchy into
place.

llvm-svn: 230882
2015-02-28 23:57:16 +00:00
Duncan P. N. Exon Smith a951165e5a Fix line endings on Transforms/Inline/inline_dbg_declare.ll
llvm-svn: 230870
2015-02-28 21:38:32 +00:00
Benjamin Kramer cb570f1bc9 TRE: Just erase dead BBs and tweak the iteration loop not to increment the deleted BB iterator.
Leaving empty blocks around just opens up a can of bugs like PR22704. Deleting
them early also slightly simplifies code.

Thanks to Sanjay for the IR test case.

llvm-svn: 230856
2015-02-28 16:47:27 +00:00
Philip Reames 2e5bcbe8d5 [RewriteStatepointsForGC] Fix another order of iteration bug
It turns out the naming of inserted phis and selects is sensative to the order in which two sets are iterated.  We need to nail this down to avoid non-deterministic output and possible test failures.  

The modified test is the one I first noticed something odd in.  The change is making it more strict to report the error.  With the test change, but without the code change, the test fails roughly 1 in 5.  With the code change, I've run ~30 runs without error.

Long term, the right fix here is to adjust the naming scheme.  I'm checking in this hack to avoid any possible non-determinism in the tests over the weekend.  HJust because I only noticed one case doesn't mean it's actually the only case.  I hope to get to the right change Monday.

std->llvm data structure changes bugfix change #3

llvm-svn: 230835
2015-02-28 01:52:09 +00:00
Philip Reames a5aeaf4b4f [RewriteStatepointsForGC] Add tests for the base pointer identification algorithm
These tests cover the 'base object' identification and rewritting portion of RewriteStatepointsForGC.  These aren't completely exhaustive, but they've proven to be reasonable effective over time at finding regressions.

In the process of porting these tests over, I found my first "cleanup per llvm code style standards" bug.  We were relying on the order of iteration when testing the base pointers found for a derived pointer.  When we switched from std::set to DenseSet, this stopped being a safe assumption.  I'm suspecting I'm going to find more of those.  In particular, I'm now really wondering about the main iteration loop for this algorithm.  I need to go take a closer look at the assumptions there.

I'm not really happy with the fact these are testing what is essentially debug output (i.e. enabled via command line flags).  Suggestions for how to structure this better are very welcome.  

llvm-svn: 230818
2015-02-28 00:20:48 +00:00
David Blaikie a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie 79e6c74981 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Sanjoy Das 54ad996ca2 IRCE: add a test case for r230619.
llvm-svn: 230680
2015-02-26 20:14:32 +00:00
Hal Finkel 221f467185 [InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.

llvm-svn: 230660
2015-02-26 18:56:03 +00:00
Hal Finkel 18ee7c14fd [InstCombine] Add a test for altivec load/store intrinsic simplification
InstCombine has logic to convert aligned Altivec load/store intrinsics into
regular loads and stores. Unfortunately, there seems to be no regression test
covering this behavior. Adding one...

llvm-svn: 230632
2015-02-26 14:22:41 +00:00
Sanjoy Das e75ed92630 IRCE: generalize to handle loops with decreasing induction variables.
IRCE can now split the iteration space for loops like:

   for (i = n; i >= 0; i--)
     a[i + k] = 42; // bounds check on access

llvm-svn: 230618
2015-02-26 08:19:31 +00:00
Ramkumar Ramachandra 3408f3e296 PlaceSafepoints: use IRBuilder helpers
Use the IRBuilder helpers for gc.statepoint and gc.result, instead of
coding the construction by hand. Note that the gc.statepoint IRBuilder
handles only CallInst, not InvokeInst; retain that part of hand-coding.

Differential Revision: http://reviews.llvm.org/D7518

llvm-svn: 230591
2015-02-26 00:35:56 +00:00
Sanjay Patel cc29f4f2cb only propagate equality comparisons of FP values that we are certain are non-zero
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.

Bug noted and patch suggested by Eli Friedman.

llvm-svn: 230564
2015-02-25 22:46:08 +00:00
JF Bastien d52c990a90 InstCombine: extract instead of shuffle when performing vector/array type punning
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.

Test Plan: make check

Reviewers: jvoung, chandlerc

Subscribers: llvm-commits
llvm-svn: 230560
2015-02-25 22:30:51 +00:00
Peter Collingbourne eba7f73ff9 LowerBitSets: Align referenced globals.
This change aligns globals to the next highest power of 2 bytes, up to a
maximum of 128. This makes it more likely that we will be able to compress
bit sets with a greater alignment. In many more cases, we can now take
advantage of a new optimization also introduced in this patch that removes
bit set checks if the bit set is all ones.

The 128 byte maximum was found to provide the best tradeoff between instruction
overhead and data overhead in a recent build of Chromium. It allows us to
remove ~2.4MB of instructions at the cost of ~250KB of data.

Differential Revision: http://reviews.llvm.org/D7873

llvm-svn: 230540
2015-02-25 20:42:41 +00:00
Sanjoy Das dcc84db264 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
    
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
    
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
    
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230533
2015-02-25 20:02:59 +00:00
Sanjay Patel 40eaa8df99 Fix really obscure bug in CannotBeNegativeZero() (PR22688)
With a diabolically crafted test case, we could recurse
through this code and return true instead of false.

The larger engineering crime is the use of magic numbers. 
Added FIXME comments for those.

llvm-svn: 230515
2015-02-25 18:00:15 +00:00
Charles Davis 33d1dc0008 [IC] Turn non-null MD on pointer loads to range MD on integer loads.
Summary:
This change fixes the FIXME that you recently added when you committed
(a modified version of) my patch.  When `InstCombine` combines a load and
store of an pointer to those of an equivalently-sized integer, it currently
drops any `!nonnull` metadata that might be present.  This change replaces
`!nonnull` metadata with `!range !{ 1, -1 }` metadata instead.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7621

llvm-svn: 230462
2015-02-25 05:10:25 +00:00
Peter Collingbourne 1baeaa395a LowerBitSets: Introduce global layout builder.
The builder is based on a layout algorithm that tries to keep members of
small bit sets together. The new layout compresses Chromium's bit sets to
around 15% of their original size.

Differential Revision: http://reviews.llvm.org/D7796

llvm-svn: 230394
2015-02-24 23:17:02 +00:00
Hans Wennborg 953d6fb84e Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap"
This caused PR22674, failing this assert:

Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed.

llvm-svn: 230341
2015-02-24 16:19:29 +00:00
Sanjoy Das 82ea3d45b5 New instcombine rule: max(~a,~b) -> ~min(a, b)
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b).  I think the profitability heuristics can be made
more clever/aggressive, but this is a start.

Differential Revision: http://reviews.llvm.org/D7821

llvm-svn: 230285
2015-02-24 00:08:41 +00:00
Sanjoy Das 18c243b933 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.

This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.

Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.

NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279.  This commit
message is the correct one.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230280
2015-02-23 23:22:58 +00:00
Sanjoy Das c9cf0151cf Revert 230275.
230275 got committed with an incorrect commit message due to a mixup
on my side.  Will re-land in a few moments with the correct commit
message.

llvm-svn: 230279
2015-02-23 23:13:22 +00:00
Sanjoy Das 913dfd8f7f Fix bug 22641
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal.  {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.

Differential Revision: http://reviews.llvm.org/D7808

llvm-svn: 230275
2015-02-23 22:55:13 +00:00
Chad Rosier 543900539f Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.
This patch adds the isProfitableToHoist API.  For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.

Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>

llvm-svn: 230241
2015-02-23 19:15:16 +00:00
Mehdi Amini cd3ca6f7dd InstSimplify: simplify 0 / X if nnan and nsz
From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 230238
2015-02-23 18:30:25 +00:00
Sanjoy Das 7fc60da2f5 IRCE: use SCEVs instead of llvm::Value's for intermediate
calculations.  Semantically non-functional change.

This gets rid of some of the SCEV -> Value -> SCEV round tripping and
the Construct(SMin|SMax)Of and MaybeSimplify helper routines.

llvm-svn: 230150
2015-02-21 22:07:32 +00:00
Philip Reames 0b1b387441 [PlaceSafepoints] Adjust enablement logic to default to off and be GC configurable per GC
Previously, this pass ran over every function in the Module if added to the pass order.  With this change, it runs only over those with a GC attribute where the GC explicitly opts in.  A GC can also choose which of entry safepoint polls, backedge safepoint polls, and call safepoints it wants.  I hope to get these exposed as checks on the GCStrategy at some point, but for now, the checks are manual string comparisons.

llvm-svn: 230097
2015-02-21 00:09:09 +00:00
Benjamin Kramer 911d5b3ace LoopRotate: When reconstructing loop simplify form don't split edges from indirectbrs.
Yet another chapter in the endless story. While this looks like we leave
the loop in a non-canonical state this replicates the logic in
LoopSimplify so it doesn't diverge from the canonical form in any way.

PR21968

llvm-svn: 230058
2015-02-20 20:49:25 +00:00
Peter Collingbourne e6909c8e8b Introduce bitset metadata format and bitset lowering pass.
This patch introduces a new mechanism that allows IR modules to co-operatively
build pointer sets corresponding to addresses within a given set of
globals. One particular use case for this is to allow a C++ program to
efficiently verify (at each call site) that a vtable pointer is in the set
of valid vtable pointers for the class or its derived classes. One way of
doing this is for a toolchain component to build, for each class, a bit set
that maps to the memory region allocated for the vtables, such that each 1
bit in the bit set maps to a valid vtable for that class, and lay out the
vtables next to each other, to minimize the total size of the bit sets.

The patch introduces a metadata format for representing pointer sets, an
'@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
and builds the bitsets, and documents the new feature.

Differential Revision: http://reviews.llvm.org/D7288

llvm-svn: 230054
2015-02-20 20:30:47 +00:00
Philip Reames 2ef029c7ae Bugfix for 229954
Before calling Function::getGC to test for enablement, we need to make sure there's actually a GC at all via Function::hasGC.  Otherwise, we'd crash on functions without a GC.  Thankfully, this only mattered if you manually scheduled the pass, but still, oops. :(

llvm-svn: 230040
2015-02-20 18:56:14 +00:00
Hal Finkel 847e05f569 [InstCombine] Remove unnecessary variable indexing into single-element arrays
This change addresses a deficiency pointed out in PR22629. To copy from the bug
report:

[from the bug report]

Consider this code:

int f(int x) {
  int a[] = {12};
  return a[x];
}

GCC knows to optimize this to

movl     $12, %eax
ret

The code generated by recent Clang at -O3 is:

movslq   %edi, %rax
movl     .L_ZZ1fiE1a(,%rax,4), %eax
retq

.L_ZZ1fiE1a:
  .long    12                      # 0xc

[end from the bug report]

This definitely seems worth fixing. I've also seen this kind of code before (as
the base case of generic vector wrapper templates with one element).

The general idea is to look at the GEP feeding a load or a store, which has
some variable as its first non-zero index, and determine if that index must be
zero (or else an out-of-bounds access would occur). We can do this for allocas
and globals with constant initializers where we know the maximum size of the
underlying object. When we find such a GEP, we create a new one for the memory
access with that first variable index replaced with a constant zero.

Even if we can't eliminate the memory access (and sometimes we can't), it is
still useful because it removes unnecessary indexing calculations.

llvm-svn: 229959
2015-02-20 03:05:53 +00:00
Philip Reames 6faacf4772 Adjust enablement of RewriteStatepointsForGC
When back merging the changes in 229945 I noticed that I forgot to mark the test cases with the appropriate GC.  We want the rewriting to be off by default (even when manually added to the pass order), not on-by default.  To keep the current test working, mark them as using the statepoint-example GC and whitelist that GC.  

Longer term, we need a better selection mechanism here for both actual usage and testing.  As I migrate more tests to the in tree version of this pass, I will probably need to update the enable/disable logic as well. 

llvm-svn: 229954
2015-02-20 02:34:49 +00:00
Philip Reames d16a9b1fdc Add a pass for constructing gc.statepoint sequences w/explicit relocations
This patch consists of a single pass whose only purpose is to visit previous inserted gc.statepoints which do not have gc.relocates inserted yet, and insert them. This can be used either immediately after IR generation to perform 'early safepoint insertion' or late in the pass order to perform 'late insertion'.

This patch is setting the stage for work to continue in tree.  In particular, there are known naming and style violations in the current patch.  I'll try to get those resolved over the next week or so.  As I touch each area to make style changes, I need to make sure we have adequate testing in place.  As part of the cleanup, I will be cleaning up a collection of test cases we have out of tree and submitting them upstream. The tests included in this change are very basic and mostly to provide examples of usage.

The pass has several main subproblems it needs to address:
- First, it has identify any live pointers. In the current code, the use of address spaces to distinguish pointers to GC managed objects is hard coded, but this will become parametrizable in the near future.  Note that the current change doesn't actually contain a useful liveness analysis.  It was seperated into a followup change as the code wasn't ready to be shared.  Instead, the current implementation just considers any dominating def of appropriate pointer type to be live.
- Second, it has to identify base pointers for each live pointer. This is a fairly straight forward data flow algorithm. 
- Third, the information in the previous steps is used to actually introduce rewrites. Rather than trying to do this by hand, we simply re-purpose the code behind Mem2Reg to do this for us.

llvm-svn: 229945
2015-02-20 01:06:44 +00:00
Michael Gottesman 0fc2accb58 [objc-arc-contract] We can not move retains over instructions which can not conservatively be proven to not decrement the retain's RCIdentity.
I also cleaned up the code to make it more understandable for mere mortals.

<rdar://problem/19853758>

llvm-svn: 229937
2015-02-20 00:02:49 +00:00
Ahmed Bougacha db141ac37d [ARM] Re-re-apply VLD1/VST1 base-update combine.
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.

The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes.  When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.

However, for ARMISD nodes, we just used the MMO alignment, no matter
what.  In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.

And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTRL.A==1?).

To fix this, we can just mirror the hack done for ISD nodes:  only
take into account the MMO alignment when the access is overaligned.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

rdar://19717869, rdar://14062261.

llvm-svn: 229932
2015-02-19 23:52:41 +00:00
Rafael Espindola 8c97e19124 Avoid conversion to float when creating ConstantDataArray/ConstantDataVector.
Patch by Raoux, Thomas F!

llvm-svn: 229864
2015-02-19 16:08:20 +00:00
Igor Laevsky 55d60a4a2f Add few simple tests to check statepoint placement for invoke instructions.
Differential Revision: http://reviews.llvm.org/D7535

llvm-svn: 229842
2015-02-19 11:39:04 +00:00
Chandler Carruth b89464a9b6 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Sanjoy Das 11b279a832 Partial fix for bug 22589
Don't spend the entire iteration space in the scalar loop prologue if
computing the trip count overflows.  This change also gets rid of the
backedge check in the prologue loop and the extra check for
overflowing trip-count.

Differential Revision: http://reviews.llvm.org/D7715

llvm-svn: 229731
2015-02-18 19:32:25 +00:00
Elena Demikhovsky 6ddd998589 Minor fix after 229495.
Removed metadata and function attributes from the test.

llvm-svn: 229647
2015-02-18 08:09:28 +00:00