45621 Commits

Author SHA1 Message Date
Serguei Katkov 0e70206c8f This reverts commit r306272.
Revert "[MBP] do not rotate loop if it creates extra branch"

It breaks the sanitizer build bots. Need to fix this.

llvm-svn: 306276
2017-06-26 06:51:45 +00:00
Hiroshi Inoue 4484ff03df fix trivial typo in comment, NFC
llvm-svn: 306274
2017-06-26 06:32:04 +00:00
Serguei Katkov b01fff06ed [MBP] do not rotate loop if it creates extra branch
This is the last fix for the corner case of PR32214. Actually, this is not really a corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C, where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Suppose H is determined to be the best exit. If we do a loop rotation to M, B, C, H, we can introduce an extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold block to header if there is one

So if C is not a predecessor of H, then we introduce an extra branch.

This change prohibits rotation of the loop if both of the following are true:
1) Best Exit has next element in chain as successor.
2) Last element in chain is not a predecessor of first element of chain.
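
The two conditions above can be sketched as a small predicate. This is only an illustrative model in Python, not the MachineBlockPlacement code: the chain is a plain list of block names, `succs` is a hypothetical successor map, and the name `should_skip_rotation` is made up for this sketch.

```python
def should_skip_rotation(chain, succs, best_exit):
    """Return True when rotating the chain at best_exit would add a branch.

    chain     -- block names in current layout order, e.g. ["H", "M", "B", "C"]
    succs     -- hypothetical map from block name to the set of its successors
    best_exit -- the block chosen as the best exit of the loop
    """
    idx = chain.index(best_exit)
    if idx + 1 >= len(chain):
        # Assumption of this sketch: an exit that is already last needs no rotation.
        return False
    next_in_chain = chain[idx + 1]
    # 1) The best exit has the next element in the chain as a successor.
    exit_reaches_next = next_in_chain in succs.get(best_exit, set())
    # 2) The last element in the chain is not a predecessor of the first one.
    last_reaches_first = chain[0] in succs.get(chain[-1], set())
    return exit_reaches_next and not last_reaches_first
```

For the chain H, M, B, C from the description, the predicate is true when C does not branch back to H, and false once C is a predecessor of H.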

Reviewers: iteratee, xur
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34271

llvm-svn: 306272
2017-06-26 05:27:27 +00:00
Matt Arsenault 10fc062b2b AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also uncovers that the intrinsic
is broken unless there happen to be stack objects.

llvm-svn: 306264
2017-06-26 03:01:31 +00:00
Chandler Carruth 4a000883c7 [LoopSimplify] Re-instate r306081 with a bug fix w.r.t. indirectbr.
This was reverted in r306252, but I already had the bug fixed and was
just trying to form a test case.

The original commit factored the logic for forming dedicated exits
inside of LoopSimplify into a helper that could be used elsewhere and
with an approach that required fewer intermediate data structures. See
that commit for full details including the change to the statistic, etc.

The code looked fine to me and my reviewers, but in fact didn't handle
indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty.

If you have code that looks *just* right, you can end up leaking these
predecessors into a subsequent rewrite, and crash deep down when trying
to update PHI nodes for predecessors that don't exist.

I've added an assert that makes the bug much more obvious, and then
changed the code to reliably clear the vector so we don't get this bug
again in some other form as the code changes.
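
The failure mode described above (a scratch vector left dirty on an early exit) can be modeled in a few lines. This is a simplified Python sketch and not the LoopSimplify code: the function name, the `clear_on_skip` switch, and the data layout are all assumptions made for illustration.

```python
def form_dedicated_exits(exit_blocks, preds, in_loop, indirectbr_blocks,
                         clear_on_skip):
    """Return {exit block: in-loop predecessors that would be rewritten}.

    clear_on_skip=False models the bug: the scratch list keeps stale entries
    when an exit block is skipped because of an indirectbr predecessor.
    """
    in_loop_preds = []  # scratch list reused across exit blocks
    rewrites = {}
    for exit_block in exit_blocks:
        skip = False
        for pred in preds[exit_block]:
            if pred not in in_loop:
                continue
            if pred in indirectbr_blocks:
                # An indirectbr edge cannot be redirected; give up on this exit.
                skip = True
                break
            in_loop_preds.append(pred)
        if skip:
            if clear_on_skip:
                in_loop_preds.clear()
            continue
        rewrites[exit_block] = list(in_loop_preds)
        in_loop_preds.clear()
    return rewrites
```

With `clear_on_skip=False`, a predecessor collected for a skipped exit leaks into the next exit's list, which corresponds to updating PHI nodes for predecessors that do not exist.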

I've also added a test case that *does* manage to catch this while also
giving some nice positive coverage in the face of indirectbr.

The real code that found this came out of what I think is CPython's
interpreter loop, but any code with really "creative" interpreter loops
mixing indirectbr and other exit paths could manage to tickle the bug.
It was hard to reduce the original test case because in addition to
having a particular pattern of IR, the whole thing depends on the order
of the predecessors which in turn depends on use list order. The test
case added here was designed so that in multiple different predecessor
orderings it should always end up going down the same path and tripping
the same bug. I hope. At least, it tripped it for me without
manipulating the use list order which is better than anything bugpoint
could do...

llvm-svn: 306257
2017-06-25 22:45:31 +00:00
Chandler Carruth 73367b6a09 [LoopSimplify] Improve a test for loop simplify minorly. NFC.
I did some basic testing while looking for a bug in my recent change to
loop simplify and even though it didn't find the bug it seems like
a useful improvement anyways.

llvm-svn: 306256
2017-06-25 22:24:02 +00:00
Daniel Jasper 4c6cd4ccb7 Revert "[LoopSimplify] Factor the logic to form dedicated exits into a utility."
This leads to a segfault. Chandler already has a test case and should be
able to recommit with a fix soon.

llvm-svn: 306252
2017-06-25 17:58:25 +00:00
Simon Pilgrim 9956364a1f [X86] Add test case for PR15705
llvm-svn: 306246
2017-06-25 16:12:45 +00:00
Sanjay Patel 2f3ead7adc [InstCombine] add (sext i1 X), 1 --> zext (not X)
http://rise4fun.com/Alive/i8Q

A narrow bitwise logic op is obviously better than math for value tracking, 
and zext is better than sext. Typically, the 'not' will be folded into an 
icmp predicate.

The IR difference would even survive through codegen for x86, so we would see 
worse code:

https://godbolt.org/g/C14HMF

one_or_zero(int, int):                      # @one_or_zero(int, int)
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setle   %al
        retq

one_or_zero_alt(int, int):                  # @one_or_zero_alt(int, int)
        xorl    %ecx, %ecx
        cmpl    %esi, %edi
        setg    %cl
        movl    $1, %eax
        subl    %ecx, %eax
        retq
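
The fold itself can be checked exhaustively, since an i1 has only two values. The following Python sketch models fixed-width integers with a bit mask; the helper names are hypothetical and exist only for this check.

```python
def add_sext_i1(x, bits=32):
    """Model `add (sext i1 X), 1` on a `bits`-wide integer."""
    mask = (1 << bits) - 1
    sext = mask if x else 0  # sext i1: 0 stays 0, 1 becomes all ones (-1)
    return (sext + 1) & mask  # wrap around like a fixed-width add


def zext_not_i1(x, bits=32):
    """Model `zext (not X)` where X is an i1."""
    return (x ^ 1) & 1  # flip the single bit; zero extension adds no set bits


# Both inputs, at a few widths, give the same result on each side of the fold.
for width in (8, 32, 64):
    for x in (0, 1):
        assert add_sext_i1(x, width) == zext_not_i1(x, width)
```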

llvm-svn: 306243
2017-06-25 14:15:28 +00:00
Elena Demikhovsky 72f991cded AVX-512: Fixed a crash during legalization of <3 x i8> type
The compiler fails with an assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated to <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.

Differential Revision: https://reviews.llvm.org/D34503

llvm-svn: 306242
2017-06-25 13:36:20 +00:00
Igor Breger f5035d6ee5 [GlobalISel][X86] Support vector type G_EXTRACT selection.
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT is marked as legal for any type, so there is nothing to do in the legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33957

llvm-svn: 306240
2017-06-25 11:42:17 +00:00
Dorit Nuzman e0e0f1ddb0 [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative Base cost was returned, resulting in
avoiding vectorization where it is actually profitable to vectorize.
This patch starts to add costs for AVX2 for most prominent cases of
interleaved accesses (stride 3,4 chars, for now).

Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads; there is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).

Note 2: The costs in this patch do not reflect port pressure penalties which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, that may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the abovementioned work to extend
X86InterleavedAccess pass).


Differential Revision: https://reviews.llvm.org/D34023

llvm-svn: 306238
2017-06-25 08:26:25 +00:00
Xinliang David Li b67530e9b9 [PGO] Implement profile counter register promotion
Differential Revision: http://reviews.llvm.org/D34085

llvm-svn: 306231
2017-06-25 00:26:43 +00:00
Hiroshi Inoue a85d24b73d fix trivial typos in comment, NFC
llvm-svn: 306211
2017-06-24 16:00:26 +00:00
Hiroshi Inoue b300824ee7 fix trivial typos in comment, NFC
dereferencable -> dereferenceable

llvm-svn: 306210
2017-06-24 15:43:33 +00:00
Hiroshi Inoue 95f24dca98 [SelectionDAG] set dereferenceable flag when expanding memcpy/memmove
When SelectionDAG expands a memcpy (or memmove) call into a sequence of load and store instructions, it disregards the dereferenceable flag even if the source pointer is known to be dereferenceable.
This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer.
This patch makes SelectionDAG set the dereferenceable flag for the load instructions properly to avoid the assertion failure.

Differential Revision: https://reviews.llvm.org/D34467

llvm-svn: 306209
2017-06-24 15:17:38 +00:00
Rafael Espindola b05f4a7b25 Add missing %s to RUN line.
llvm-svn: 306199
2017-06-24 04:41:39 +00:00
Rafael Espindola 2c166857b3 Test the object file creation too.
This should *really* be a llvm-mc test, but the parser is broken.
See PR33579 for the parser bug.

llvm-svn: 306198
2017-06-24 04:31:45 +00:00
Vitaly Buka df19ad456e [InstCombine] Don't replace allocas with smaller globals
Summary:
InstCombine replaces large allocas with small global constants, causing buffer overflows
on valid code; see PR33372.

This fix permits this optimization only if the global is dereferenceable for the alloca size.

Fixes PR33372

Reviewers: eugenis, majnemer, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34311

llvm-svn: 306194
2017-06-24 01:35:19 +00:00
Nirav Dave 18c10c53d0 Update constants in complex-return test to prevent reduction to smaller constants
llvm-svn: 306192
2017-06-24 01:29:24 +00:00
Zachary Turner fa33282774 [llvm-pdbutil] Dump raw bytes of module symbols and debug chunks.
llvm-svn: 306179
2017-06-23 23:08:57 +00:00
Rafael Espindola 801b42de31 ARM: move some logic from processFixupValue to applyFixup.
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.

While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.

llvm-svn: 306177
2017-06-23 22:52:36 +00:00
Petar Jovanovic 53dbfb3798 Reland r306095: [mips] Fix reg positions in the aui/daui instructions
After fixing a failing test in the lld test suite (r306173),
reland r306095.

Original commit message:

  [mips] Fix register positions in the aui/daui instructions

  Swapped the position of the rt and rs register in the aui/daui
  instructions for mips32r6 and mips64r6. With this change, the format of
  the generated instructions complies with specifications and GCC.
  Patch by Milos Stojanovic.

llvm-svn: 306174
2017-06-23 22:37:19 +00:00
Reid Kleckner 45cb4fec1e [llvm-readobj] Fix COFF RVA table dumping bug
We would return an error in getVaPtr if the RVA table being dumped was
the last data in the .rdata section. Avoid the issue by subtracting one
from the offset and adding it back to get an open interval again.
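
The off-by-one can be illustrated with a toy address lookup. This is a hedged Python sketch, not the llvm-readobj implementation: `get_va_ptr` here is a stand-in with an invented signature, and sections are modeled as hypothetical `(name, start, size)` tuples.

```python
def get_va_ptr(sections, va):
    """Map a virtual address to (section name, offset), or raise if unmapped.

    Each section covers the half-open range [start, start + size).
    """
    for name, start, size in sections:
        if start <= va < start + size:
            return name, va - start
    raise LookupError(f"address {va:#x} is not inside any section")


def table_bounds(sections, table_va, table_size):
    """Return (section, begin offset, exclusive end offset) for a table.

    Assumes table_size > 0.
    """
    name, begin = get_va_ptr(sections, table_va)
    # Looking up table_va + table_size directly fails when the table ends
    # exactly at the end of the section, so look up the last byte instead
    # and add one to recover the exclusive end.
    _, last = get_va_ptr(sections, table_va + table_size - 1)
    return name, begin, last + 1
```

A table occupying the final 16 bytes of a section resolves correctly this way, while a direct lookup of its one-past-the-end address raises an error.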

llvm-svn: 306171
2017-06-23 22:12:11 +00:00
Zachary Turner c2f5b4bfd9 [llvm-pdbutil] Dump raw bytes of type and id records.
llvm-svn: 306167
2017-06-23 21:50:54 +00:00
Zachary Turner dd73968256 [llvm-pdbutil] Dump raw bytes of various DBI stream subsections.
llvm-svn: 306160
2017-06-23 21:11:54 +00:00
Vadzim Dambrouski 9e0d3878fb [MSP430] Fix data layout string.
Summary:
Without this patch some types have incorrect size and/or alignment
according to the MSP430 EABI.

Reviewers: asl, awygle

Reviewed By: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34561

llvm-svn: 306159
2017-06-23 21:11:45 +00:00
Nirav Dave cedfeb364f Add bitcast store-merge test.
llvm-svn: 306158
2017-06-23 20:52:14 +00:00
Zachary Turner 5f09852dfb [llvm-pdbutil] Show what blocks a stream occupies.
This is useful when you want to look at a specific chunk of a
stream or look for discontinuities, and you need to know the
list of blocks occupied by a stream.

llvm-svn: 306150
2017-06-23 20:28:14 +00:00
Zachary Turner 6c3e41bbd3 [llvm-pdbutil] Dump raw bytes of pdb name map.
This patch dumps the raw bytes of the pdb name map which contains
the mapping of stream name to stream index for the string table
and other reserved streams.

llvm-svn: 306148
2017-06-23 20:18:38 +00:00
Zachary Turner 6b124f29e7 [llvm-pdbutil] Add the ability to dump raw bytes from the file.
Normally we can only make sense of the content of a PDB in terms
of streams and blocks, but in some cases it may be useful to dump
bytes at a specific absolute file offset.  For example, if you
know that some interesting data is at a particular location and
you want to see some surrounding data.

llvm-svn: 306146
2017-06-23 19:54:44 +00:00
Krzysztof Parzyszek 717021772b Revert "[Hexagon] Handle decreasing of stack alignment in frame lowering"
This breaks passing of aligned function arguments.

llvm-svn: 306145
2017-06-23 19:47:04 +00:00
Chad Rosier 6db9ff64a8 [AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free".
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.

A few examples:

 add w8, w0, w1  -> cmn w0, w1             ; CMN is an alias of ADDS.
 cbz w8, .LBB0_2 -> b.eq .LBB0_2           ; single def/use of w8 removed.

 add w8, w0, w1  -> adds w8, w0, w1        ; w8 has multiple uses.
 cbz w8, .LBB1_2 -> b.eq .LBB1_2

 sub w8, w0, w1       -> subs w8, w0, w1   ; w8 has multiple uses.
 tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2

In looking at all current sub-target machine descriptions, this transformation
appears to be either positive or neutral.

Differential Revision: https://reviews.llvm.org/D34220.

llvm-svn: 306144
2017-06-23 19:20:12 +00:00
Zachary Turner 0b36c3ebd0 [llvm-pdbutil] Add a function for formatting MSF data.
The goal here is to make it possible to display absolute
file offsets when dumping bytes from an MSF.  The problem is
that when dumping bytes from an MSF, often the bytes will
cross a block boundary and encounter a discontinuity.  We
can't use the normal formatBinary() function for this because
this would just treat the sequence as entirely ascending, and
not account for out-of-order blocks.

This patch adds a formatMsfData() function to our printer, and
then uses this function to improve the output of the -stream-data
command line option for dumping bytes from a particular stream.

Test coverage is also expanded to make sure to include all possible
scenarios of offsets, sizes, and crossing block boundaries.
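
The offset arithmetic behind this can be sketched as follows. This is a Python model only, not the llvm-pdbutil code: the function names are invented, and it assumes that block N of an MSF file starts at file offset N * block_size.

```python
def stream_to_file_offset(blocks, block_size, stream_offset):
    """Translate an offset within a stream to an absolute file offset.

    blocks -- block indices occupied by the stream, in stream order
    """
    block_index, within = divmod(stream_offset, block_size)
    return blocks[block_index] * block_size + within


def contiguous_runs(blocks, block_size, stream_offset, length):
    """Split a byte range of a stream into (file_offset, run_length) pieces.

    A new piece starts at every block boundary, so each piece can be printed
    with its own absolute file offset even when blocks are out of order.
    """
    runs = []
    while length > 0:
        room = block_size - stream_offset % block_size  # bytes left in block
        run = min(length, room)
        runs.append(
            (stream_to_file_offset(blocks, block_size, stream_offset), run))
        stream_offset += run
        length -= run
    return runs
```

For a stream occupying blocks 7, 3, 9 with 4096-byte blocks, a 10-byte dump starting at stream offset 4090 splits into 6 bytes at the end of block 7 followed by 4 bytes at the start of block 3, which lies earlier in the file.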

llvm-svn: 306141
2017-06-23 18:52:13 +00:00
Sanjay Patel 3de6bad65f [x86] fix value types for SBB transform (PR33560)
I'm not sure yet why this wouldn't fail in the simple case,
but clearly I used the wrong value type with:
https://reviews.llvm.org/rL306040

...and the bug manifests with:
https://bugs.llvm.org/show_bug.cgi?id=33560

llvm-svn: 306139
2017-06-23 18:42:15 +00:00
Simon Pilgrim 19cee0d56c [X86][AVX] Regenerate i256 bitcasted store test
Check on slow/fast unaligned memory targets

llvm-svn: 306138
2017-06-23 18:34:56 +00:00
Simon Pilgrim dfa436079f Regenerate extract-store.ll tests
llvm-svn: 306131
2017-06-23 17:19:44 +00:00
Krzysztof Parzyszek bb2fcd1921 [Hexagon] Handle decreasing of stack alignment in frame lowering
llvm-svn: 306124
2017-06-23 16:53:59 +00:00
Tim Northover 4b4eec7009 GlobalISel: remove G_SEQUENCE instruction.
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.

llvm-svn: 306120
2017-06-23 16:15:55 +00:00
Tim Northover b57bf2ac79 GlobalISel: convert buildSequence to use non-deprecated instructions.
G_SEQUENCE is going away soon so as a first step the MachineIRBuilder needs to
be taught how to emulate it with alternatives. We use G_MERGE_VALUES where
possible, and a sequence of G_INSERTs if not.
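
The choice between the two replacements can be sketched as a small decision function. This is an illustrative Python model rather than the MachineIRBuilder code: `lower_sequence`, the `(name, bits, offset)` tuples, and the `"undef"` placeholder for the starting value are assumptions of the sketch, and the exact condition used for "where possible" is a guess (equal-sized pieces that tile the result with no gaps).

```python
def lower_sequence(result_bits, pieces):
    """Pick a lowering for a list of pieces placed into a wider value.

    pieces -- list of (name, bits, offset), given in increasing offset order
    Returns a list of operation tuples describing the chosen lowering.
    """
    widths = {bits for _, bits, _ in pieces}
    expected_offset = 0
    contiguous = True
    for _, bits, offset in pieces:
        if offset != expected_offset:
            contiguous = False  # a gap (or overlap) before this piece
        expected_offset = offset + bits

    if len(widths) == 1 and contiguous and expected_offset == result_bits:
        # Equal-sized pieces that exactly cover the result: one merge.
        return [("G_MERGE_VALUES", [name for name, _, _ in pieces])]

    # Gaps or odd sizes: start from an undefined value, insert piece by piece.
    ops = [("undef", result_bits)]
    for name, _, offset in pieces:
        ops.append(("G_INSERT", name, offset))
    return ops
```

Two 32-bit halves of a 64-bit value yield a single merge, while a 32-bit piece at offset 0 plus a 16-bit piece at offset 48 yields a chain of inserts.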

llvm-svn: 306119
2017-06-23 16:15:37 +00:00
Jun Bum Lim 506cfb7ab7 [InlineCost] Do not take INT_MAX when Cost is negative
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX, we use a valid upper-bound cost when overflow occurs in Cost.

Reviewers: hans, echristo, dmgreen

Reviewed By: dmgreen

Subscribers: mcrosier, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D34436

llvm-svn: 306118
2017-06-23 16:12:37 +00:00
Ulrich Weigand eaf0051ba3 [SystemZ] Remove unnecessary serialization before volatile loads
This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad
introduced by r196905.  Nothing in the semantics of the "volatile"
keyword or the definition of the z/Architecture actually requires
that volatile loads are preceded by a serialization operation, and
no other compiler on the platform actually implements this.

Since we've now seen a use case where this additional serialization
causes noticeable performance degradation, this patch removes it.

The patch still leaves in the serialization before atomic loads,
which is now implemented directly in lowerATOMIC_LOAD.  (This also
seems overkill, but that can be addressed separately.)

llvm-svn: 306117
2017-06-23 15:56:14 +00:00
Sanjay Patel 021f32fd0f [x86] auto-generate complete checks; NFC
llvm-svn: 306114
2017-06-23 15:29:49 +00:00
Sanjay Patel 02469b63c2 [x86] auto-generate complete checks; NFC
llvm-svn: 306113
2017-06-23 15:22:27 +00:00
Tom Stellard af552dc352 AMDGPU/GlobalISel: Mark 32-bit G_AND as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34349

llvm-svn: 306112
2017-06-23 15:17:17 +00:00
Sanjay Patel 563e5afa0e [x86] remove overridden target settings in test; NFC
r306109 was supposed to make this change, but I committed the wrong version.

llvm-svn: 306110
2017-06-23 15:06:30 +00:00
Sanjay Patel 8e06df4303 [x86] rename test file and auto-generate complete checks; NFC
The command-line params override the target setting in the file itself, so delete that.
Also, remove the cpu and arch because those don't matter and neither does the OS specification in the triple.

llvm-svn: 306109
2017-06-23 14:58:21 +00:00
Simon Pilgrim 859b48d2d3 [X86][AVX] Extended vector average tests
Added AVX1 tests and merged AVX1/AVX2/AVX512 checks where possible

llvm-svn: 306107
2017-06-23 14:38:00 +00:00
Jonas Paulsson 82f15a7168 [SystemZ] Fix trap issue and enable expensive checks.
The isBarrier/isTerminator flags have been removed from the SystemZ trap
instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just
an issue at -O0 and did not affect code output on benchmarks.

(As Eli pointed out: "targets are split over whether they consider their
"trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and
SystemZ do. We should probably try to be consistent here." This is still the
case, although SystemZ has switched sides.)

SystemZ now returns true in isMachineVerifierClean() :-)

These Generic tests have been modified so that they can be run with or without
EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and
CodeGen/Generic/print-machineinstrs.ll

Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman
https://bugs.llvm.org/show_bug.cgi?id=33047
https://reviews.llvm.org/D34143

llvm-svn: 306106
2017-06-23 14:30:46 +00:00
Simon Pilgrim dbd20ffee1 [X86][SSE] Dropped -mcpu from vector average tests
Use triple and attribute only for consistency 

llvm-svn: 306104
2017-06-23 14:16:50 +00:00