Commit Graph

165672 Commits

Author SHA1 Message Date
George Rimar dcf59c5480 Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text"
With compilation fix.

Original commit message:

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335336
2018-06-22 10:53:47 +00:00
Simon Pilgrim e0a6eb1f4f [IR] Use Instruction::isBinaryOp helper instead of raw enum range tests. NFCI.
llvm-svn: 335335
2018-06-22 10:48:02 +00:00
George Rimar 6d448da1be Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text"
It broke bots.

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551

llvm-svn: 335333
2018-06-22 10:27:33 +00:00
George Rimar e14485a0c6 [MC] - Add .stack_size sections into groups and link them with .text
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335332
2018-06-22 10:10:53 +00:00
Sjoerd Meijer 1043dffbd3 Recommit of r335326, with the test fixed that I missed.
llvm-svn: 335331
2018-06-22 10:03:03 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Sjoerd Meijer 7ee5b090de Reverting r335326 while I look at the test failure
llvm-svn: 335328
2018-06-22 09:17:08 +00:00
Eugene Leviant 6d711ca168 Revert r335324 due to a builtbot failure
llvm-svn: 335327
2018-06-22 08:57:01 +00:00
Sjoerd Meijer 8d2f1565b7 [ARM] ARMv6m and v8m.baseline strict align
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because it has no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.

Differential Revision: https://reviews.llvm.org/D48437

llvm-svn: 335326
2018-06-22 08:48:13 +00:00
Matt Arsenault 3f8e7a3dbc AMDGPU: Add patterns for i32/i64 local atomic load/store
Not sure why the 32/64 split is needed in the atomic_load
store hierarchies. The regular PatFrags do this, but we don't
do it for the existing handling for global.

llvm-svn: 335325
2018-06-22 08:39:52 +00:00
Eugene Leviant ea19c9473c [Evaluator] Improve evaluation of call instruction
Differential revision: https://reviews.llvm.org/D46584

llvm-svn: 335324
2018-06-22 08:29:36 +00:00
Mikhail Dvoretckii 0963562083 [X86] Changing the check for valid inputs in combineScalarToVector
Changing the logic of scalar mask folding to check for valid input types rather
than against invalid ones, making it more robust and fixing PR37879.

Differential Revision: https://reviews.llvm.org/D48366

llvm-svn: 335323
2018-06-22 08:28:05 +00:00
Chandler Carruth aa5f4d2e23 Revert r335306 (and r335314) - the Call Graph Profile pass.
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.

llvm-svn: 335320
2018-06-22 05:33:57 +00:00
Tom Stellard 6af7307650 AMDGPU/GlobalISel: Default to using TableGen'd instruction selector
Summary:
We can select all instructions that are marked as legal in a full piglit run,
so now is a good time to make the TableGen'd instruction selector default
for all opcodes.  This is NFC for a full piglit run, which is why there are
no tests.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48198

llvm-svn: 335319
2018-06-22 03:04:35 +00:00
Tom Stellard 26fac0f8e1 AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48196

llvm-svn: 335318
2018-06-22 02:54:57 +00:00
Chandler Carruth fe70b29cf7 [LegacyPM] Fix PR37888 by teaching the legacy loop pass manager how to
clear out deleted loops from the current queue beyond just the current
loop.

This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/

This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.

Differential Revision: https://reviews.llvm.org/D48470

llvm-svn: 335317
2018-06-22 02:43:41 +00:00
Tom Stellard 9a6535718e AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48195

llvm-svn: 335316
2018-06-22 02:34:29 +00:00
Tom Stellard 7712ee8891 AMDGPU/GlobalISel: Implement select() for COPY
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46151

llvm-svn: 335315
2018-06-22 00:44:29 +00:00
Chandler Carruth 0307d52bd7 Fix test failures after r335306 due to the pipeline changing.
This wasn't obvious for the author to fix because this is the first
pipeline use of the magic utility to get function analyses within
a module pass in the lagecy pass manager. Turns out that has a bug which
prevents dumping the structure of the pipeline and shows up as an
unnamed pass.

I've just left a FIXME for that as it doesn't seem likely worth fixing
and certainly shouldn't hold up getting the bots green.

llvm-svn: 335314
2018-06-22 00:32:26 +00:00
Sanjay Patel 4784e1506e [InstCombine] fix shuffle-of-binops bug
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in 
the other, so we have to check for that possibility and
bail out.

llvm-svn: 335312
2018-06-21 23:56:59 +00:00
Sanjay Patel 23408c25f3 [InstCombine] add test for shuffle-of-binops; NFC
This shows a miscompile that was missed in rL335283.

llvm-svn: 335311
2018-06-21 23:53:01 +00:00
Tom Stellard 3f1c6fe156 AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46150

llvm-svn: 335307
2018-06-21 23:38:20 +00:00
Michael J. Spencer fc93dd8e18 [Instrumentation] Add Call Graph Profile pass
This patch adds support for generating a call graph profile from Branch Frequency Info.

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:

!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335306
2018-06-21 23:31:10 +00:00
Reid Kleckner 2ef486690c [X86] Fix 32-bit mingw comdat names, only add one underscore
llvm-svn: 335304
2018-06-21 23:06:33 +00:00
Fangrui Song 53bbb90718 [gdb] Update llvm::Optional
Reviewers: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48461

llvm-svn: 335303
2018-06-21 22:34:29 +00:00
Scott Linder a3593cb44b [AMDGPU] Fix lit failures introduced in r335281
The tests do not support big-endian hosts.

llvm-svn: 335302
2018-06-21 22:30:09 +00:00
Sanjay Patel 73cde60468 [IR] fix typo in comment; NFC
llvm-svn: 335301
2018-06-21 22:25:42 +00:00
Reid Kleckner 3a2fd1c2f3 Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
MCJIT can't handle R_X86_64_GOT64 yet.

llvm-svn: 335300
2018-06-21 22:19:05 +00:00
Reid Kleckner 3286a6c896 [X86] Commit some comments that weren't in the medium code model patch
llvm-svn: 335298
2018-06-21 21:57:44 +00:00
Reid Kleckner 247fe6aeab [X86] Implement more of x86-64 large and medium PIC code models
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

Reviewers: chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335297
2018-06-21 21:55:08 +00:00
Matthew Voss 30648ab233 [GVN] Avoid casting a vector of size less than 8 bits to i8
Summary:
A reprise of D25849.

This crash was found through fuzzing some time ago and was documented in PR28879.

No check for load size has been added due to the following tests:
  - Transforms/GVN/invariant.group.ll
  - Transforms/GVN/pr10820.ll

These tests expect load sizes that are not a multiple of eight.

Thanks to @davide for the original patch.

Reviewers: nlopes, davide, RKSimon, reames, efriedma

Reviewed By: efriedma

Subscribers: davide, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D48330

llvm-svn: 335294
2018-06-21 21:43:20 +00:00
Jonas Devlieghere 0bad3f625e [dsymutil] Force mmap'ing of binaries
After the recent refactoring that introduced parallel handling of
different object, the binary holder became unique per object file. This
defeats its optimization of caching archives, leading to an archive
being opened for every binary it contains. This is obviously unfortunate
and will need to be refactored soon.

Luckily in practice, the impact of this is limited as most files are
mmap'ed instead of memcopy'd. There's a caveat however: when the memory
buffer requires a null terminator and it's a multiple of the page size,
we allocate instead of mmap'ing. If this happens for a static archive,
we end up with N copies of it in memory, where N is the number of
objects in the archive, leading to exuberant memory usage. This provided
a stopgap solution to ensure that all the files it loads are mmap in
memory by removing the requirement for a terminating null byte.

Differential revision: https://reviews.llvm.org/D48397

llvm-svn: 335293
2018-06-21 21:37:53 +00:00
Tim Shen 63f244c4f4 [SCEV] Re-apply r335197 (with Polly fixes).
Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).

I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output.

All LLVM files are already reviewed in D48338.

Reviewers: jdoerfert, bollu, efriedma

Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia

Differential Revision: https://reviews.llvm.org/D48453

llvm-svn: 335292
2018-06-21 21:29:54 +00:00
Konstantin Zhuravlyov e004b3d97b AMDGPU: Remove ability to reserve VGPRs for debugger
Differential Revision: https://reviews.llvm.org/D48234

llvm-svn: 335288
2018-06-21 20:28:19 +00:00
Reid Kleckner 13c9ee684c [mingw] Fix GCC ABI compatibility for comdat things
Summary:
GCC and the binutils COFF linker do comdats differently from MSVC.
If we want to be ABI compatible, we have to do what they do, which is to
emit unique section names like ".text$_Z3foov" instead of short section
names like ".text". Otherwise, the binutils linker gets confused and
reports multiple definition errors when two object files from GCC and
Clang containing the same inline function are linked together.

The best description of the issue is probably at
https://github.com/Alexpux/MINGW-packages/issues/1677, we don't seem to
have a good one in our tracker.

I fixed up the .pdata and .xdata sections needed everywhere other than
32-bit x86. GCC doesn't use associative comdats for those, it appears to
rely on the section name.

Reviewers: smeenai, compnerd, mstorsjo, martell, mati865

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D48402

llvm-svn: 335286
2018-06-21 20:27:38 +00:00
Sanjay Patel a76b70069d [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have a common variable operand used in a pair of binops with vector constants 
that are vector selected together, then we can constant shuffle the constant vectors 
to eliminate the shuffle instruction.

This has some tricky parts that are hopefully addressed in the tests and their 
respective comments:

  1. If the shuffle mask contains an undef element, then that lane of the result is 
     undef:
     http://llvm.org/docs/LangRef.html#shufflevector-instruction

     Therefore, we can replace the constant in that lane with an undef value except 
     for div/rem. With div/rem, an undef in the divisor would cause the whole op to 
     be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.

  2. Intersect the wrapping and FMF of the original binops for the new binop. There 
     should be no extra poison or fast-math potential in the new binop that wasn't 
     possible in the original code.

  3. Disregard other uses. Given that we're eliminating uses (shortening the 
     dependency chain), I think that's always the right IR canonicalization. But 
     I purposely chose the udiv test to demonstrate the scenario where both 
     intermediate values have other uses because that seems likely worse for 
     codegen with an expensive math op. This seems like a very rare possibility to 
     me, so I don't think it requires a backend patch first.

Differential Revision: https://reviews.llvm.org/D48401

llvm-svn: 335283
2018-06-21 20:15:09 +00:00
Scott Linder 1e8c2c705d [AMDGPU] Update assembler for HSA Code Object v3
Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
  values).
* Provide path for backwards compatibility, even with underlying descriptor
  changes.

Differential Revision: https://reviews.llvm.org/D47736

llvm-svn: 335281
2018-06-21 19:38:56 +00:00
Francis Visoiu Mistrih ac599b6951 Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions."
This reverts commit r335206.

As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.

llvm-svn: 335272
2018-06-21 19:18:36 +00:00
Simon Dardis 3505045b42 [mips] Modify comment to test new email address (NFC).
llvm-svn: 335269
2018-06-21 18:52:32 +00:00
Scott Linder 5792dd0f39 [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.

Differential Revision: https://reviews.llvm.org/D48391

llvm-svn: 335268
2018-06-21 18:48:48 +00:00
Konstantin Zhuravlyov 766c77efd7 AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z
and everything that comes with it from implementation
and v3 header files.

Leave definition in v2 header files for backwards
compatibility.

Differential Revision: https://reviews.llvm.org/D48191

llvm-svn: 335267
2018-06-21 18:36:04 +00:00
Sanjay Patel 3382dc644e [InstCombine] add tests for shuffled cmps; NFC
llvm-svn: 335266
2018-06-21 18:07:38 +00:00
Matt Davis d041f21810 [DebugInfo] Ignore DBG_VALUE instructions in PostRA Machine Sink
Summary:
The logic for handling the sinking of COPY instructions was generating
different code when building with debug flags.

The original code did not take into consideration debug instructions.  This
resulted in the registers in the DBG_VALUE instructions being treated as used,
and prevented the COPY from being sunk.  This patch avoids analyzing debug
instructions when trying to sink COPY instructions.

This patch also creates a routine from the code in MachineSinking::SinkInstruction to
perform the logic of sinking an instruction along with its debug instructions.
This functionality is used in multiple places, including the code for sinking COPY instrs.


Reviewers: junbuml, javed.absar, MatzeB, bjope

Reviewed By: bjope

Subscribers: aprantl, probinson, thegameg, jonpa, bjope, vsk, kristof.beyls, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D45637

llvm-svn: 335264
2018-06-21 17:59:52 +00:00
Sanjay Patel 3244537a3c [InstCombine] use constant pattern matchers with icmp+sext
The previous code worked with vectors, but it failed when the
vector constants contained undef elements. 
The matchers handle those cases.

llvm-svn: 335262
2018-06-21 17:51:44 +00:00
Sanjay Patel 5522e968ad [InstCombine] add vector icmp tests with undefs; NFC
llvm-svn: 335261
2018-06-21 17:37:14 +00:00
Sanjay Patel 7b0fc75f73 [InstCombine] simplify binops before trying other folds
This is outwardly NFC from what I can tell, but it should be more efficient 
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).

This should make it easier to refactor duplicated code that runs for all binops.

llvm-svn: 335258
2018-06-21 17:06:36 +00:00
Sanjay Patel 447e8ece4d [LoopVectorize] regenerate full checks; NFC
llvm-svn: 335257
2018-06-21 16:54:32 +00:00
Craig Topper 8014053cbd [X86] Update fast-isel tests for clang r335253.
The new IR fixes a mismatch in the final extractelement for the i32 intrinsics. Previously we extracted a 64-bit element even though we only wanted 32 bits.

SimplifyDemandedElts isn't able to make FP elements undef now and the shuffle mask I used prevents the use of horizontal add we had before. Not sure we should have been using horizontal add anyway. It's implemented on Intel with two port 5 shuffles and an add. So we have on less shuffle now, but an additional instruction to decode.

Differential Revision: https://reviews.llvm.org/D48347

llvm-svn: 335256
2018-06-21 16:54:18 +00:00
Paul Robinson 547adcaaac [DWARF] Warn on and ignore ".file 0" for DWARF v4 and earlier.
This had been messing with the directory table for prior versions, and
also could induce a crash when generating asm output.

llvm-svn: 335254
2018-06-21 16:42:03 +00:00
Sirish Pande b60acb9e48 Revert "[AArch64] Coalesce Copy Zero during instruction selection"
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessmizes MachineSinking by introducing a
copy, such that mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only mov instruction. This works well
if we have only one fall through branch as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

llvm-svn: 335251
2018-06-21 16:05:24 +00:00