Commit Graph

222923 Commits

Author SHA1 Message Date
Simon Pilgrim 9904924e6b [X86][SSE] Tidyup BUILD_VECTOR operand collection. NFCI.
Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow.

Renamed from V to Ops to more closely match their purpose.

llvm-svn: 261078
2016-02-17 10:12:30 +00:00
Benjamin Kramer 98520ca73b [Hexagon] cast<> a reference instead of referencing + dereferencing.
llvm-svn: 261077
2016-02-17 09:28:45 +00:00
Jonas Hahnfeld ffed72bbeb [compiler-rt][msan] Ensure initialisation before calling __msan_unpoison
__msan_unpoison uses intercepted memset which currently leads to a SEGV
when linking with libc++ under CentOS 7.

Differential Revision: http://reviews.llvm.org/D17263

llvm-svn: 261073
2016-02-17 07:12:18 +00:00
David Blaikie 8bce5a053d llvm-dwp: Support for type units when merging DWPs into larger DWPs
llvm-svn: 261072
2016-02-17 07:00:24 +00:00
David Blaikie 376b33a0d0 Fix the hash function.
llvm-svn: 261071
2016-02-17 07:00:22 +00:00
Cong Hou bbd4e3b400 Detecte vector reduction operations just before instruction selection.
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.

To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:

1. Reduction with another vector.
2. Reduction on all elements.

in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.


Differential revision: http://reviews.llvm.org/D15250

llvm-svn: 261070
2016-02-17 06:37:04 +00:00
Rui Ueyama a2b1f45ded Make getOffset a member function of DynamicReloc<ELFT>.
Logically it belongs to DynamicReloc, and it is more readable to
be a member of the class.

llvm-svn: 261069
2016-02-17 06:08:42 +00:00
Rui Ueyama 861c731ccc Use shorter names for the .gnu.hash class.
llvm-svn: 261067
2016-02-17 05:40:03 +00:00
Rui Ueyama 91c0a5db01 Use stable_partition instead of erasing all elements and fill it again.
llvm-svn: 261066
2016-02-17 05:40:01 +00:00
Rui Ueyama c2e863a0d8 Use an accurate type instead of unsigned.
These values are offsets in the string table (which must fit in
host computer's memory space), so size_t is better than unsigned.

llvm-svn: 261065
2016-02-17 05:06:40 +00:00
Rui Ueyama 874e7aee29 Split SymbolTableSection::writeGlobalSymbols.
Previously, we added garbage-collected symbols to the symbol table
and filter them out when we were writing symbols to the file. In
this patch, garbage-collected symbols are filtered out from beginning.

llvm-svn: 261064
2016-02-17 04:56:44 +00:00
Hans Wennborg 84047896b9 Revert r260979 "[X86] Enable the LEA optimization pass by default."
Asserts are still firing in Chromium builds. PR26575.

llvm-svn: 261058
2016-02-17 02:49:59 +00:00
Xinliang David Li c902fed440 revert r261038: arm/aarch64 bot failure
llvm-svn: 261057
2016-02-17 02:39:34 +00:00
Mehdi Amini ac9d1467e2 Revert "Query the StringMap only once when creating MDString (NFC)"
This reverts commit r261030 and r261036.
(The revision was marked "approved" on phabricator, but some concerns
were raised on the mailing list. Thanks D. Blaikie for notifying me.)

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261055
2016-02-17 02:18:58 +00:00
Chandler Carruth ce55f56fbc [cmake] Revert r260742 (and r260744) to improve order file support.
This appears to be passing '-Wl,-order_file' to Linux link commands,
which then causes the linker to silently, behind the scenes, write the
output to 'rder_file' instead of somewhere else. Will work with Chris to
figure out the proper support for this, but so far there are numerous
people who can't get Clang to update when they build because of this.

llvm-svn: 261054
2016-02-17 02:13:35 +00:00
Sean Silva 4be470b71b [AttrDocs.td] Fix up some reST syntax.
llvm-svn: 261053
2016-02-17 02:08:19 +00:00
Haicheng Wu 5cf99095bb [AliasSetTracker] Teach AliasSetTracker about MemSetInst
This change is to fix the problem discussed in
http://lists.llvm.org/pipermail/llvm-dev/2016-February/095446.html.

llvm-svn: 261052
2016-02-17 02:01:50 +00:00
JF Bastien a0d5347ee9 WebAssembly: update expected failures
r261050 seems to inadvertently fix the assertion failure.

llvm-svn: 261051
2016-02-17 01:59:23 +00:00
Dan Gohman 476ffcec04 [WebAssembly] Call memcpy for large byval copies.
This fixes very slow compilation on
test/CodeGen/Generic/2010-11-04-BigByval.ll . Note that MaxStoresPerMemcpy
and friends are not yet carefully tuned so the cutoff point is currently
somewhat arbitrary. However, it's important that there be a cutoff point
so that we don't emit unbounded quantities of loads and stores.

llvm-svn: 261050
2016-02-17 01:43:37 +00:00
Evgeniy Stepanov f55ebf0e39 [msan] Extend prlimit test.
llvm-svn: 261049
2016-02-17 01:34:56 +00:00
Evgeniy Stepanov d308f92d02 [msan] Intercept prlimit.
llvm-svn: 261048
2016-02-17 01:26:57 +00:00
Xinliang David Li 13cf09dc38 Test simplification
llvm-svn: 261047
2016-02-17 00:59:01 +00:00
Xinliang David Li 534ace01ad Restrengthen tests relaxed in r259955
llvm-svn: 261046
2016-02-17 00:58:13 +00:00
Mehdi Amini a7c0940d72 Teach clang to use the ThinLTO pipeline
Summary: Use the new pipeline implemented in D17115

Reviewers: tejohnson

Subscribers: joker.eph, cfe-commits

Differential Revision: http://reviews.llvm.org/D17272

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261045
2016-02-17 00:42:20 +00:00
JF Bastien 188ca894c2 WebAssembly: update expected test failures
r261032 adds frame address support.

llvm-svn: 261044
2016-02-17 00:34:15 +00:00
Matt Arsenault 34766ca34b Add .gitignore for build directories
llvm-svn: 261043
2016-02-17 00:27:31 +00:00
Matt Arsenault 45e6eaaa05 amdgcn: Use new workitem intrinsics
llvm-svn: 261042
2016-02-17 00:27:27 +00:00
Chandler Carruth e5944d97d8 [LCG] Construct an actual call graph with call-edge SCCs nested inside
reference-edge SCCs.

This essentially builds a more normal call graph as a subgraph of the
"reference graph" that was the old model. This allows both to exist and
the different use cases to use the aspect which addresses their needs.
Specifically, the pass manager and other *ordering* constrained logic
can use the reference graph to achieve conservative order of visit,
while analyses reasoning about attributes and other properties derived
from reachability can reason about the direct call graph.

Note that this isn't necessarily complete: it doesn't model edges to
declarations or indirect calls. Those can be found by scanning the
instructions of the function if desirable, and in fact every user
currently does this in order to handle things like calls to instrinsics.
If useful, we could consider caching this information in the call graph
to save the instruction scans, but currently that doesn't seem to be
important.

An important realization for why the representation chosen here works is
that the call graph is a formal subset of the reference graph and thus
both can live within the same data structure. All SCCs of the call graph
are necessarily contained within an SCC of the reference graph, etc.

The design is to build 'RefSCC's to model SCCs of the reference graph,
and then within them more literal SCCs for the call graph.

The formation of actual call edge SCCs is not done lazily, unlike
reference edge 'RefSCC's. Instead, once a reference SCC is formed, it
directly builds the call SCCs within it and stores them in a post-order
sequence. This is used to provide a consistent platform for mutation and
update of the graph. The post-order also allows for very efficient
updates in common cases by bounding the number of nodes (and thus edges)
considered.

There is considerable common code that I'm still looking for the best
way to factor out between the various DFS implementations here. So far,
my attempts have made the code harder to read and understand despite
reducing the duplication, which seems a poor tradeoff. I've not given up
on figuring out the right way to do this, but I wanted to wait until
I at least had the system working and tested to continue attempting to
factor it differently.

This also requires introducing several new algorithms in order to handle
all of the incremental update scenarios for the more complex structure
involving two edge colorings. I've tried to comment the algorithms
sufficiently to make it clear how this is expected to work, but they may
still need more extensive documentation.

I know that there are some changes which are not strictly necessarily
coupled here. The process of developing this started out with a very
focused set of changes for the new structure of the graph and
algorithms, but subsequent changes to bring the APIs and code into
consistent and understandable patterns also ended up touching on other
aspects. There was no good way to separate these out without causing
*massive* merge conflicts. Ultimately, to a large degree this is
a rewrite of most of the core algorithms in the LCG class and so I don't
think it really matters much.

Many thanks to the careful review by Sanjoy Das!

Differential Revision: http://reviews.llvm.org/D16802

llvm-svn: 261040
2016-02-17 00:18:16 +00:00
Reid Kleckner 8de35fef3d [X86] Fix a shrink-wrapping miscompile around __chkstk
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.

llvm-svn: 261039
2016-02-17 00:17:33 +00:00
Xinliang David Li b83bedd8c2 New test case: make sure alloc bit is not set for covmap section on Linux
llvm-svn: 261038
2016-02-17 00:14:52 +00:00
Dan Gohman 1d547bf566 [WebAssembly] Use SDValue::getConstantOperandVal. NFC.
llvm-svn: 261037
2016-02-17 00:14:03 +00:00
Mehdi Amini 08ea2c7537 Fix MSVC bot: apparently visual studio does not like explicitly defaulted move ctor
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261036
2016-02-17 00:11:59 +00:00
Richard Smith c28aee6a51 Improve diagnostics for ill-formed literal operator declarations.
Patch by Erik Pilkington!

llvm-svn: 261034
2016-02-17 00:04:04 +00:00
Andrew Kaylor b68464eb78 Fix build LLVM with -D LLVM_USE_INTEL_JITEVENTS:BOOL=ON on Windows
Differential Revision: http://reviews.llvm.org/D16940

llvm-svn: 261033
2016-02-16 23:52:18 +00:00
Dan Gohman 94c6566055 [WebAssembly] Implement __builtin_frame_address.
Differential Revision: http://reviews.llvm.org/D17307

llvm-svn: 261032
2016-02-16 23:48:04 +00:00
Mehdi Amini 0520929231 Query the StringMap only once when creating MDString (NFC)
Summary: Loading IR with debug info improves MDString::get() from 19ms to 10ms.

Reviewers: dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16597

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261030
2016-02-16 23:05:56 +00:00
Mehdi Amini 1db10ac6ce Define the ThinLTO Pipeline (experimental)
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).

This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.

The ThinLTO pipeline integrates in the regular O2/O3 flow:

 - The compile phase perform the inliner with a somehow lighter
   function simplification. (TODO: tune the inliner thresholds here)
   This is intendend to simplify the IR and get rid of obvious things
   like linkonce_odr that will be inlined.
 - The link phase will run the pipeline from the start, extended with
   some specific passes that leverage the augmented knowledge we have
   during LTO. Especially after the inliner is done, a sequence of
   globalDCE/globalOpt is performed, followed by another run of the
   "function simplification" passes. It is not clear if this part
   of the pipeline will stay as is, as the split model of ThinLTO
   does not allow the same benefit as FullLTO without added tricks.

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.

Reviewers: tejohnson

Differential Revision: http://reviews.llvm.org/D17115

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261029
2016-02-16 23:02:29 +00:00
Mehdi Amini ec8bee1879 Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" (NFC)
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261028
2016-02-16 22:54:27 +00:00
Adam Nemet 7e326b1552 Fix test from r261013
llvm-svn: 261027
2016-02-16 22:50:19 +00:00
Simon Pilgrim cc8a282647 [X86][AVX] Regenerated vselect tests
llvm-svn: 261026
2016-02-16 22:33:27 +00:00
Ahmed Bougacha f3cccab1e0 [X86] Remove the now-unused X86ISD::PSIGN. NFC.
llvm-svn: 261025
2016-02-16 22:14:12 +00:00
Ahmed Bougacha af60a429c9 [X86] Generalize logic blend of (x, -x) combine to match (-x, x).
I suspect this is what let PR26110 lie dormant for so long.

llvm-svn: 261024
2016-02-16 22:14:07 +00:00
Ahmed Bougacha 132fbf5476 [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.
Currently, we sometimes miscompile this vector pattern:
    (c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
    (~c & v) | (c & -v)

When we have SSSE3, we incorrectly lower that to PSIGN, which does:
    (c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
    (c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.

Note that the PSIGN tests are also incorrect. Consider:
    %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = and <4 x i32> %a, %0
    %2 = and <4 x i32> %b.lobit, %sub
    %cond = or <4 x i32> %1, %2
    ret <4 x i32> %cond
if %b is zero:
    %b.lobit = <4 x i32> zeroinitializer
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = <4 x i32> %a
    %2 = <4 x i32> zeroinitializer
    %cond = or <4 x i32> %a, zeroinitializer
    ret <4 x i32> %a
whereas we currently generate:
    psignd %xmm1, %xmm0
    retq
which returns 0, as %xmm1 is 0.

Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate

Fixes PR26110.

Differential Revision: http://reviews.llvm.org/D17181

llvm-svn: 261023
2016-02-16 22:14:03 +00:00
Ahmed Bougacha a87c3480b5 [X86] Extract PSIGN/BLENDVP tests into vector-blend.ll. NFC.
We're going to stop generating PSIGN, so calling a test "psign"
isn't ideal. Instead, call these tests what they really are:
variable blends using logic.
Also add a test to exhibit a case we're currently missing in
the PSIGN combine.

llvm-svn: 261022
2016-02-16 22:13:59 +00:00
Ahmed Bougacha f211f685e4 [X86] Extract PSIGN/BLENDVP combine. NFC.
llvm-svn: 261021
2016-02-16 22:13:55 +00:00
Ahmed Bougacha 7502768c6d [X86] Extract ANDNP combine. NFC.
This makes it IMO more readable and reduces indentation.

llvm-svn: 261020
2016-02-16 22:13:49 +00:00
Mehdi Amini 83efea89e8 Bitcode writer: fix a typo, using getName() instead of getSourceFileName()
When emitting the source filename, the encoding of the string
was checked against the name instead of the filename.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261019
2016-02-16 22:07:03 +00:00
Artem Belevich 0a0e54c194 [CUDA] pass debug options to ptxas.
ptxas optimizations are disabled if we need to generate debug info
as ptxas does not accept '-g' otherwise.

Differential Revision: http://reviews.llvm.org/D17111

llvm-svn: 261018
2016-02-16 22:03:20 +00:00
Derek Schuff 25628d3362 [WebAssembly] Update torture test expectations
These were fixed with r260978

llvm-svn: 261017
2016-02-16 21:52:06 +00:00
Reid Kleckner 9a593ee7d2 [codeview] Bail on a DBG_VALUE register operand with no register
This apparently comes up when the register allocator decides that a
variable will become undef along a certain path.

Also improve the error message we emit when we can't map from LLVM
register number to CV register number.

llvm-svn: 261016
2016-02-16 21:49:26 +00:00