Commit Graph

40070 Commits

Author SHA1 Message Date
James Molloy b7de497cb9 [Thumb] Don't try and emit LDRH/LDRB from the constant pool
This is not a valid encoding - these instructions cannot do PC-relative addressing.

The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.

llvm-svn: 283323
2016-10-05 14:52:13 +00:00
Douglas Katzman 8449b238ea [X86] Fix some tests that didn't assert anything
llvm-svn: 283322
2016-10-05 14:46:14 +00:00
Oren Ben Simhon 0670e5a35b Test commit permission
llvm-svn: 283319
2016-10-05 14:12:41 +00:00
Krzysztof Parzyszek e7c72cdbb0 Fix machine operand traversal in ScheduleDAGInstrs::fixupKills
llvm-svn: 283315
2016-10-05 13:15:06 +00:00
Kyle Butt 25ac35d822 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.

Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.

llvm-svn: 283292
2016-10-05 01:39:29 +00:00
Kyle Butt adabac2d57 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283274
2016-10-04 23:54:18 +00:00
Sanjay Patel bfdbea6481 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252
2016-10-04 20:46:43 +00:00
Kevin Enderby f993d6e72c Next set of additional error checks for invalid Mach-O files for the
load commands that uses the MachO::encryption_info_command and
MachO::encryption_info_command types but not used in llvm libObject
code but used in llvm tool code.

This includes just LC_ENCRYPTION_INFO and
LC_ENCRYPTION_INFO_64 load commands.

llvm-svn: 283250
2016-10-04 20:37:43 +00:00
Matthias Braun 46a5238682 AArch64: Macrofusion: Split features, add missing combinations.
AArch64InstrInfo::shouldScheduleAdjacent() determines whether two
instruction can benefit from macroop fusion on apple CPUs. The list
turned out to be incomplete:
- the "rr" variants of the instructions were missing
- even the "rs" variants can have shift value == 0 and behave like the
  "rr" variants

This also splits the MacropFusion target feature into
ArithmeticBccFusion and ArithmeticCbzFusion.

Differential Revision: https://reviews.llvm.org/D25142

llvm-svn: 283243
2016-10-04 19:28:21 +00:00
Hal Finkel bdd6735a9e Don't filter diagnostics written as YAML to the output file
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.

Differential Revision: https://reviews.llvm.org/D25224

llvm-svn: 283236
2016-10-04 18:13:45 +00:00
Adam Nemet 0428e93217 Serialize remark argument as a mapping to get proper quotation for the value.
llvm-svn: 283231
2016-10-04 17:05:04 +00:00
Alexey Bataev 7e217c2402 [SLPVectorizer] Add a test with non-vectorizable IR.
llvm-svn: 283225
2016-10-04 15:07:23 +00:00
Anna Thomas 479cbb9405 [RS4GC] Handle ShuffleVector instruction in findBasePointer
Summary:
This patch modifies the findBasePointer to handle the shufflevector instruction.

Tests run: RS4GC tests, local downstream tests.

Reviewers: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25197

llvm-svn: 283219
2016-10-04 13:48:37 +00:00
Andrey Bokhanko 6903be56d5 Fix IntegerType::MAX_INT_BITS value
IntegerType::MAX_INT_BITS is apparently not in sync with Type::SubclassData
size. This patch fixes this.

Differential Revision: https://reviews.llvm.org/D24814

llvm-svn: 283215
2016-10-04 12:43:46 +00:00
Nemanja Ivanovic 6354d23555 [Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set
This patch corresponds to review:

The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.

llvm-svn: 283212
2016-10-04 11:25:52 +00:00
Simon Dardis 86b3a1e79b [mips][fastisel] Consider soft-float an unsupported floating point mode
Treat soft-float as unsupported for fast-isel. Additionally, ensure we check
that lowering f32 arguments also considers the case of soft-float mode.

Reviewers: ehostunreach, vkalintiris, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24505

llvm-svn: 283209
2016-10-04 10:35:07 +00:00
George Rimar 67443021a4 [Object/ELF] - Do not crash on invalid sh_offset value of REL[A] section.
Previously code would access invalid memory and may crash,
patch fixes the issue.

Differential revision: https://reviews.llvm.org/D25187

llvm-svn: 283204
2016-10-04 09:25:39 +00:00
George Rimar 5cbf23664d [Object/ELF] - Avoid possible crash in getExtendedSymbolTableIndex().
When using broken input object found using AFL,
getExtendedSymbolTableIndex() crashed because ShndxTable
was empty as object does not contain SHT_SYMTAB_SHNDX section.

Differential revision: https://reviews.llvm.org/D25189

llvm-svn: 283196
2016-10-04 08:44:03 +00:00
Nemanja Ivanovic a565d9e612 Fix a test case failure on Apple PPC.
llvm-svn: 283191
2016-10-04 07:37:38 +00:00
Nemanja Ivanovic 11049f8f07 [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190
2016-10-04 06:59:23 +00:00
Kyle Butt 3ffb8529bc Revert "Codegen: Tail-duplicate during placement."
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.

Causing crashes on aarch64 build.

llvm-svn: 283172
2016-10-04 00:38:23 +00:00
Eli Friedman 74bed9d757 Make GlobalsAA ignore dead constant expressions.
Slightly improves the precision of GlobalsAA in certain situations, and
makes the behavior of optimization passes more predictable.

Differential Revision: https://reviews.llvm.org/D24104

llvm-svn: 283165
2016-10-04 00:03:55 +00:00
Kyle Butt 396bfdd707 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

llvm-svn: 283164
2016-10-04 00:00:09 +00:00
Matthias Braun d2fc0d40e4 Set some tests to an unknown vendor and OS
This avoids llc using the hosts OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.

llvm-svn: 283149
2016-10-03 21:58:20 +00:00
Mehdi Amini 762d68bbab [LTO] Fix test to not depend on the exact address of symbols, just their linkage
llvm-svn: 283148
2016-10-03 21:40:50 +00:00
Krzysztof Parzyszek c8b6ecabd8 [RDF] Fix liveness propagation through shadows
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.

llvm-svn: 283143
2016-10-03 20:17:20 +00:00
Matthias Braun eccdee9196 X86: Do not produce GOT relocations on windows
Windows has no GOT relocations the way elf/darwin has. Some people use
x86_64-pc-win32-macho to build EFI firmware; Do not produce GOT
relocations for this target.

Differential Revision: https://reviews.llvm.org/D24627

llvm-svn: 283140
2016-10-03 20:11:24 +00:00
Sanjoy Das 0359a193a7 [PruneEH] Be correct in the face IPO
This fixes one spot I had missed in r265762.  Credit goes to Philip
Reames for spotting this one!

llvm-svn: 283137
2016-10-03 19:35:30 +00:00
Konstantin Zhuravlyov 691e2e020b [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
llvm-svn: 283130
2016-10-03 18:29:01 +00:00
Hans Wennborg b4d2678c6f Jump threading: avoid trying to split edge into landingpad block (PR27840)
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.

Differential Revision: https://reviews.llvm.org/D24680

llvm-svn: 283129
2016-10-03 18:18:04 +00:00
Sanjay Patel d27a21874b [x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165
 

llvm-svn: 283119
2016-10-03 16:38:27 +00:00
Rafael Espindola d7325ee702 Don't drop the llvm. prefix when renaming.
If the llvm. prefix is dropped other parts of llvm don't see this as
an intrinsic.  This means that the number of regular symbols depends
on the context the module is loaded into, which causes LTO to abort.

Fixes PR30509.

llvm-svn: 283117
2016-10-03 15:51:42 +00:00
Nirav Dave 157891c57f Prevent out of order HashDirective lexing in AsmLexer.
Retrying after buildbot reset.

To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.

This fixes PR28921.

Reviewers: rnk, loladiro

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24839

llvm-svn: 283111
2016-10-03 13:48:27 +00:00
Matt Arsenault 40bae76620 AMDGPU: Fix missing -verify-machineinstrs in test
llvm-svn: 283107
2016-10-03 12:58:59 +00:00
Simon Pilgrim 52ab136881 [X86][SSE] Add PR30371 (shuffle constant folding) test case
llvm-svn: 283103
2016-10-03 12:16:39 +00:00
Sjoerd Meijer 4dbe73c1ed [ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.

Patch mostly by Pablo Barrio.

Differential Revision: https://reviews.llvm.org/D25077

llvm-svn: 283098
2016-10-03 10:12:32 +00:00
Alexey Bataev fe91cf3aba [CodeGen] Adding a test showing the current state of poor code gen of
search loop, by Andrey Tischenko

PR27136 shows failure to hoist constant out of loop. This test is used
as start point to fix the failure: it shows the current state of codegen
and discovers what should be fixed

Differential Revision: https://reviews.llvm.org/D25097

llvm-svn: 283091
2016-10-03 07:47:01 +00:00
Simon Pilgrim a8d2168cb0 [X86][AVX2] Add support for combining target shuffles to VPERMD/VPERMPS
llvm-svn: 283080
2016-10-02 21:07:58 +00:00
Simon Pilgrim bce1f6b491 [X86][AVX2] Missed opportunities to combine to VPERMD/VPERMPS
llvm-svn: 283077
2016-10-02 20:43:02 +00:00
Simon Pilgrim b5200971d6 [X86][AVX2] Fix typo in test names
We are testing vpermps not vpermd

llvm-svn: 283076
2016-10-02 19:31:58 +00:00
Sanjay Patel 170d7eb303 [x86] remove 'nan' strings from copysign assertions; NFC
Preemptively scrubbing these to avoid a bot fail as in PR30443:
https://llvm.org/bugs/show_bug.cgi?id=30443

I'm nearly done with a patch to fix these cases, so not trying very
hard to do better for the temporary win. 

I plan to use better checks than what the script produces for the vectorized cases.

llvm-svn: 283072
2016-10-02 17:07:24 +00:00
Sanjay Patel dfbbbcd662 [x86] add test to show unnecessary scalarization of copysign intrinsics (PR30433)
llvm-svn: 283071
2016-10-02 16:31:35 +00:00
Simon Pilgrim 03afbe783d [X86][AVX] Ensure broadcast loads respect dependencies
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.

This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.

Bug found during internal testing.

Differential Revision: https://reviews.llvm.org/D25039

llvm-svn: 283070
2016-10-02 15:59:15 +00:00
Hal Finkel a9321059b9 [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

llvm-svn: 283060
2016-10-02 02:10:20 +00:00
Martell Malone 3a4d900039 COFF: Fix short import lib import name type bitshift
As per the PE COFF spec (section 8.3, Import Name Type)
Offset: 18 Size 2 bits Name: Type
Offset: 20 Size 3 bits Name: Name Type

Offset: 20 added based on 18+2

Partially commited as rL279069

Differential Revision: https://reviews.llvm.org/D23540

llvm-svn: 283055
2016-10-01 23:10:20 +00:00
Simon Pilgrim 04e249b128 [SLPVectorizer][X86] Added fptosi/fptoui tests
llvm-svn: 283048
2016-10-01 19:35:59 +00:00
Simon Pilgrim 1ec20e9b0a [CostModel][X86] Added tests for current fptosi/fptoui costs
llvm-svn: 283047
2016-10-01 19:09:59 +00:00
Simon Pilgrim 567c4fbdae [SLPVectorizer][X86] Added fcopysign tests
llvm-svn: 283046
2016-10-01 17:00:26 +00:00
Simon Pilgrim cceeb2a4fa [SLPVectorizer][X86] Added fabs tests
llvm-svn: 283045
2016-10-01 16:54:01 +00:00
Simon Pilgrim e0ec5c1f05 [CostModel][X86] Added fcopysign costs
llvm-svn: 283044
2016-10-01 16:41:52 +00:00
Simon Pilgrim 8b021c382d [CostModel][X86] Added fabs costs
llvm-svn: 283042
2016-10-01 16:30:13 +00:00
Simon Pilgrim 1638d49f20 [X86][SSE] Add support for combining target shuffles to binary BLEND
We already had support for 1-input BLEND with zero - this adds support for 2-input BLEND as well.

llvm-svn: 283040
2016-10-01 16:04:28 +00:00
Simon Pilgrim ae17cf20ce [X86][SSE] Always combine target shuffles to MOVSD/MOVSS
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.

This required me to add a couple of scalar math movsd/moss fold patterns that hadn't been needed in the past.

llvm-svn: 283038
2016-10-01 15:33:01 +00:00
Simon Pilgrim ccdd1ff49b [X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.

We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful

I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets

llvm-svn: 283037
2016-10-01 14:26:11 +00:00
Simon Pilgrim f1c575bad7 [X86][SSE] Regenerate vselect tests and improve AVX1/AVX2 coverage
llvm-svn: 283035
2016-10-01 13:10:14 +00:00
Nirav Dave e4c6153cf1 Revert "[MC] Prevent out of order HashDirective lexing in AsmLexer."
This reverts commit r282992 which appears to be causing an LTO test failure.

llvm-svn: 283034
2016-10-01 10:57:55 +00:00
Craig Topper 5eb5ade894 [X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.

llvm-svn: 283020
2016-10-01 07:11:24 +00:00
Craig Topper 8aca90507f [AVX-512] Add VLX command lines to 128 and 256-bit shufffle tests.
llvm-svn: 283014
2016-10-01 06:01:18 +00:00
Mehdi Amini 86eeda8e20 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Matt Arsenault 3070fdf798 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Nirav Dave 9f2bd4e7ea [MC] Prevent out of order HashDirective lexing in AsmLexer.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.

This fixes PR28921.

Reviewers: rnk, loladiro

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24839

llvm-svn: 282992
2016-10-01 00:42:32 +00:00
Mehdi Amini 6610b01a27 [ASAN] Add the binder globals on Darwin to llvm.compiler.used to avoid LTO dead-stripping
The binder is in a specific section that "reverse" the edges in a
regular dead-stripping: the binder is live as long as a global it
references is live.

This is a big hammer that prevents LLVM from dead-stripping these,
while still allowing linker dead-stripping (with special knowledge
of the section).

Differential Revision: https://reviews.llvm.org/D24673

llvm-svn: 282988
2016-10-01 00:05:34 +00:00
Reid Kleckner 9cb915b7be [SEH] Emit the parent frame offset label even if there are no funclets
This avoids errors about references to undefined local labels from
unreferenced filter functions.

Fixes (sort of) PR30431

llvm-svn: 282967
2016-09-30 22:10:12 +00:00
Rui Ueyama 5d6714e593 Do not pass a superblock to PDBFileBuilder.
When we create a PDB file using PDBFileBuilder, the information
in the superblock, such as the size of the resulting file, is not
available.

Previously, PDBFileBuilder::initialize took a superblock assuming
that all the members of the struct are correct. That is useful when
you want to restore the exact information from a YAML file, but
that's probably the only use case in which that is useful.
When we are creating a PDB file on the fly, we have to backfill the
members.

This patch redefines PDBFileBuilder::initialize to take only a
block size. Now all the other members are left as default values,
so that they'll be updated when commit() is called.

Differential Revision: https://reviews.llvm.org/D25108

llvm-svn: 282944
2016-09-30 20:52:12 +00:00
Hans Wennborg b5643b47b6 X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.

Differential Revision: https://reviews.llvm.org/D24836

llvm-svn: 282920
2016-09-30 20:07:35 +00:00
Sanjay Patel f7b851fe84 [InstCombine] allow non-splat folds of select cond (ext X), C
llvm-svn: 282906
2016-09-30 19:49:22 +00:00
Dehao Chen a067b0926f Revert test change in r282894 as it's broken in some platforms.
llvm-svn: 282903
2016-09-30 19:25:23 +00:00
Gor Nishanov a263a60ad5 [Coroutines] Part15c: Fix coro-split to correctly handle definitions between coro.save and coro.suspend
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.

```
  %save = call token @llvm.coro.save(i8* null)
  %Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
  %suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
  switch i8 %suspend, label %exit [
    i8 0, label %await.ready
    i8 1, label %exit
  ]
await.ready:
  %val = load i32, i32* %Result.i19

```

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24418

llvm-svn: 282902
2016-09-30 19:24:19 +00:00
Sanjay Patel 75b2518762 [InstCombine] add tests for non-splat select(ext)
llvm-svn: 282901
2016-09-30 19:15:41 +00:00
Gor Nishanov c16219486a [Coroutines] Part15b: Fix dbg information handling in coro-split.
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.

We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24417

llvm-svn: 282899
2016-09-30 19:05:06 +00:00
Gor Nishanov 768de2c604 [Coroutines] Part 15a: Lower coro.subfn.addr in CoroCleanup
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24412

llvm-svn: 282897
2016-09-30 18:41:35 +00:00
Sanjay Patel b43712a513 clean up tests and auto-generate checks
llvm-svn: 282896
2016-09-30 18:37:34 +00:00
Michal Gorny b002f6370b cmake: Install the OCaml libraries into a more correct path
Add a OCAML_INSTALL_PATH variable that can be used to control
the install path for OCaml libraries. The new variable defaults to
${OCAML_STDLIB_PATH}, i.e. the OCaml library path obtained from
the OCaml compiler. Install libraries into "llvm" subdirectory.

This fixes two issues:

1. OCaml library directories differ between systems, and 'lib/ocaml' is
incorrect e.g. on amd64 Gentoo where OCaml is installed
in 'lib64/ocaml'. Therefore, obtain the library path from the OCaml
compiler using 'ocamlc -where' (which is already used to set
OCAML_STDLIB_PATH), which is the method used commonly in OCaml packages.

2. The top-level directory is reserved for the standard library, and has
precedence over local directory in search path. As a result, OCaml
preferred the files installed along with previous LLVM version over the
source tree when building a new version, resulting in two versions being
mixed during the build. The new layout is used commonly by other OCaml
packages, and findlib is able to find the LLVM libraries successfully.

Bug: https://bugs.gentoo.org/559134
Bug: https://bugs.gentoo.org/559624

Differential Revision: https://reviews.llvm.org/D24354

llvm-svn: 282895
2016-09-30 18:34:23 +00:00
Dehao Chen 977853b7c5 Update loop unroller cost model to make sure debug info does not affect optimization decisions.
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.

Reviewers: davidxl, mzolotukhin

Subscribers: haicheng, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D25098

llvm-svn: 282894
2016-09-30 18:30:04 +00:00
Sanjay Patel 9685b3eb57 [InstCombine] add tests for select X, (ext X), C
llvm-svn: 282891
2016-09-30 18:10:14 +00:00
Derek Schuff e9e6891b2d [WebAssembly] Make register stackification more conservative
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24942

llvm-svn: 282886
2016-09-30 18:02:54 +00:00
Etienne Bergeron 0ca0568604 [asan] Support dynamic shadow address instrumentation
Summary:
This patch is adding the support for a shadow memory with
dynamically allocated address range.

The compiler-rt needs to export a symbol containing the shadow
memory range.

This is required to support ASAN on windows 64-bits.

Reviewers: kcc, rnk, vitalybuka

Subscribers: zaks.anna, kubabrecka, dberris, llvm-commits, chrisha

Differential Revision: https://reviews.llvm.org/D23354

llvm-svn: 282881
2016-09-30 17:46:32 +00:00
Matthew Simpson 7808833e28 [LV] Build all scalar steps for non-uniform induction variables
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
2016-09-30 15:13:52 +00:00
Dylan McKay 309eba75b1 Revert "[RegAllocGreedy] Attempt to split unspillable live intervals"
It was accidentally committed.

llvm-svn: 282855
2016-09-30 14:05:15 +00:00
Dylan McKay 2a80cc688a [RegAllocGreedy] Attempt to split unspillable live intervals
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.

This patch changes this by attempting to split all live intervals before
performing recoloring.

This fixes LLVM bug PR14879.

I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.

Reviewers: qcolombet

Subscribers: MatzeB

Differential Revision: https://reviews.llvm.org/D25070

llvm-svn: 282852
2016-09-30 13:59:20 +00:00
Craig Topper 3f37a4180b Revert r282835 "[AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not."
Turns out this doesn't pass verify-machineinstrs.

llvm-svn: 282841
2016-09-30 05:35:42 +00:00
Craig Topper bc6e97b8f4 [AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not.
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.

I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.

llvm-svn: 282835
2016-09-30 04:31:33 +00:00
Piotr Padlewski d28694739c [thinlto] Don't decay threshold for hot callsites
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24976

llvm-svn: 282833
2016-09-30 03:01:17 +00:00
Matt Arsenault 5d8eb25e78 AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Reid Kleckner 147f91c88e [X86] Don't preserve Win64 SSE CSRs when SSE is disabled
Code that doesn't use floating point and doesn't use SSE (kernel code)
shouldn't save and restore SSE registers.

Fixes PR30503

llvm-svn: 282819
2016-09-30 00:17:49 +00:00
Kevin Enderby 4f229d867b Next set of additional error checks for invalid Mach-O files for the
load command that uses the MachO::entry_point_command type
but not used in llvm libObject code but used in llvm tool code.

This includes just the LC_MAIN load command.

llvm-svn: 282766
2016-09-29 21:07:29 +00:00
Reid Kleckner e45b2c7d8e [codeview] Use character types for all byte-sized integer types
The VS debugger doesn't appear to understand the 0x68 or 0x69 type
indices, which were probably intended for use on a platform where a C
'int' is 8 bits. So, use the character types instead. Clang was already
using the character types because '[u]int8_t' is usually defined in
terms of 'char'.

See the Rust issue for screenshots of what VS does:
https://github.com/rust-lang/rust/issues/36646

Fixes PR30552

llvm-svn: 282739
2016-09-29 17:55:01 +00:00
Kevin Enderby 245be3ed2a Next set of additional error checks for invalid Mach-O files for the
load command that uses the Mach::source_version_command type
but not used in llvm libObject code but used in llvm tool code.

This includes just the LC_SOURCE_VERSION load command.

llvm-svn: 282736
2016-09-29 17:45:23 +00:00
Kostya Serebryany a9b0dd0e51 [sanitizer-coverage/libFuzzer] make the guards for trace-pc 32-bit; create one array of guards per function, instead of one guard per BB. reorganize the code so that trace-pc-guard does not create unneeded globals
llvm-svn: 282735
2016-09-29 17:43:24 +00:00
Piotr Padlewski ba72b95f7b [thinlto] Add cold-callsite import heuristic
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006

Reviewers: tejohnson, mehdi_amini, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24940

llvm-svn: 282733
2016-09-29 17:32:07 +00:00
Simon Pilgrim d72330dd09 [X86] Add explicit test triple to make windows/msvc builds happier
llvm-svn: 282719
2016-09-29 15:10:09 +00:00
Craig Topper bd74f75619 [X86] Really fix the FileCheck line from r282690.
Why does Folded Spill comments print with a different number of # characters on different systems?

llvm-svn: 282693
2016-09-29 06:49:21 +00:00
Craig Topper 1b60e9d7c0 [AVX-512] Fix a check line from r282690.
llvm-svn: 282691
2016-09-29 06:37:21 +00:00
Craig Topper d875d6b9b4 [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available.
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.

Fixes one of the cases from PR29112.

llvm-svn: 282690
2016-09-29 06:07:09 +00:00
Craig Topper f91830e6ee [X86] Remove extra FileCheck lines that got left behind in r282688.
llvm-svn: 282689
2016-09-29 06:07:07 +00:00
Craig Topper 7eb0e7ce1f [AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition.
llvm-svn: 282688
2016-09-29 05:54:43 +00:00
Craig Topper e7f2611160 [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table.
llvm-svn: 282687
2016-09-29 05:54:39 +00:00
Craig Topper 7da0465062 [X86] Add 512-bit VPBROADCASTB and VPBROADCASTW tests.
llvm-svn: 282685
2016-09-29 05:54:32 +00:00
Craig Topper 816a1d7783 [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.
llvm-svn: 282684
2016-09-29 05:54:28 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Lei Liu 361615cfd0 AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model.  This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.

Reviewers: t.p.northover, peter.smith, rovka

Subscribers: salim.nasser, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24702

llvm-svn: 282661
2016-09-29 01:05:48 +00:00
Evgeny Stupachenko dc8a254663 Wisely choose sext or zext when widening IV.
Summary:
The patch fixes regression caused by two earlier patches D18777 and D18867.

Reviewers: reames, sanjoy

Differential Revision: http://reviews.llvm.org/D24280

From: Li Huang
llvm-svn: 282650
2016-09-28 23:39:39 +00:00
Kevin Enderby 76966bf066 Next set of additional error checks for invalid Mach-O files for the
load command that uses the Mach::rpath_command type
but not used in llvm libObject code but used in llvm tool code.

This includes just the LC_RPATH load command.

llvm-svn: 282649
2016-09-28 23:16:01 +00:00
Mike Aizatsky 392caa538d [sancov] introducing symbolized coverage files (.symcov)
Summary:
Answering any meaningful questions about .sancov files requires
accessing symbol information from the corresponding binary.

This change introduces a separate intermediate data structure and
format: symbolized coverage. It contains all symbol information that
is required to answer common queries:
- merging
- coverd/uncovered files and functions
- line status.

Also removing the html report functionality from sancov: generated
HTML files are too huge, and a different approach is required.
Maintaining this half-working approach in the C++ is painful.

Differential Revision: https://reviews.llvm.org/D24947

llvm-svn: 282639
2016-09-28 21:39:28 +00:00
Kevin Enderby 32359dbf6b Next set of additional error checks for invalid Mach-O files for the
other load commands that use the Mach::version_min_command type
but not used in llvm libObject code but used in llvm tool code.

This includes LC_VERSION_MIN_MACOSX, LC_VERSION_MIN_IPHONEOS,
LC_VERSION_MIN_TVOS and LC_VERSION_MIN_WATCHOS load commands.

llvm-svn: 282635
2016-09-28 21:20:45 +00:00
Krzysztof Parzyszek dcb1bcae0b IfConversion: Add implicit uses for redefined regs with live subregisters
Normally, if conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.

llvm-svn: 282626
2016-09-28 20:07:41 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Sanjay Patel 3e9e5ccf7c [InstCombine] update to use FileCheck
Also, remove unnecessary function attributes, parameters, and comments.
It looks like at least some of these tests are not minimal though...

llvm-svn: 282620
2016-09-28 19:10:16 +00:00
Simon Pilgrim fea5c7a051 [X86][AVX] Add test showing that VBROADCAST loads don't correctly respect dependencies
llvm-svn: 282613
2016-09-28 17:59:30 +00:00
Artur Pilipenko b6ce6e5dac Don't look through addrspacecast in GetPointerBaseWithConstantOffset
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.

This is similar to D13008.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D24729

llvm-svn: 282612
2016-09-28 17:57:16 +00:00
Adrian Prantl 7f5866c227 Teach LiveDebugValues about lexical scopes.
This addresses PR26055 LiveDebugValues is very slow.

Contrary to the old LiveDebugVariables pass LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.

For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:

void sink(int a);
void __attribute((always_inline)) f(int a) { sink(a); }
void foo(int i) {
   f(i);
   if (i)
     f(i);
   f(i);
}

This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.

The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.

https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!

llvm-svn: 282611
2016-09-28 17:51:14 +00:00
Artem Belevich 3e1211581c [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Marina Yatsina 76bfc6670b [x86] Accept 'retn' as an alias to 'ret[lqw]'\'ret' (At&t\Intel)
Implement 'retn' simply by aliasing it to the relevant 'ret' instruction

Commit on behalf of coby

Differential Revision: https://reviews.llvm.org/D24346

llvm-svn: 282601
2016-09-28 15:52:56 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Sanjay Patel 220a8730fb [InstSimplify] allow or-of-icmps folds with vector splat constants
llvm-svn: 282592
2016-09-28 14:27:21 +00:00
Sanjay Patel a8f9e57c74 [InstSimplify] add vector splat tests for or-of-icmps
llvm-svn: 282591
2016-09-28 14:17:35 +00:00
Sanjay Patel 1b312ad42d [InstSimplify] allow and-of-icmps folds with vector splat constants
llvm-svn: 282590
2016-09-28 13:53:13 +00:00
Guy Blank 2bdc74a471 [X86][FastISel] Use a COPY from K register to a GPR instead of a K operation
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
but, turns out that the opposite case of KORTEST using a GPR is now happening

The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.

Differential Revision: https://reviews.llvm.org/D24953

llvm-svn: 282580
2016-09-28 11:22:17 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Adam Nemet c507ac96f5 [Inliner] Port all opt remarks to new streaming API
llvm-svn: 282559
2016-09-27 23:47:03 +00:00
Adam Nemet 0427909434 Pass -S to opt in this test to avoid printing binary on mismatch
The purpose of the test is to verify diagnostics.

llvm-svn: 282558
2016-09-27 23:46:59 +00:00
Kevin Enderby 3e490ef94e Next set of additional error checks for invalid Mach-O files for the
other load commands that use the MachO::dylinker_command type
but not used in llvm libObject code but used in llvm tool code.

This includes LC_ID_DYLINKER, LC_LOAD_DYLINKER
and LC_DYLD_ENVIRONMENT load commands.

llvm-svn: 282553
2016-09-27 23:24:13 +00:00
Sanjay Patel 764ae8bd72 [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Geoff Berry b124331db7 [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Adam Nemet 1142147e41 [Inliner] Fold the analysis remark into the missed remark
There is really no reason for these to be separate.

The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed.  There,
you have to query analysis to get the full picture.

I think we should just explain the reason for missing the optimization
in the missed remark when possible.  Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.

llvm-svn: 282542
2016-09-27 21:58:17 +00:00
Michael Zolotukhin 1a554be3b6 [LoopSimplify] When simplifying phis in loop-simplify, do it only if it preserves LCSSA form.
llvm-svn: 282541
2016-09-27 21:03:45 +00:00
Adam Nemet a62b7e1a28 Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace.  GCC was complaining about this.)

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282539
2016-09-27 20:55:07 +00:00
Keith Walker 83ebef5db3 Propagate DBG_VALUE entries when there are unvisited predecessors
Variables are sometimes missing their debug location information in
blocks in which the variables should be available. This would occur
when one or more predecessor blocks had not yet been visited by the
routine which propagated the information from predecessor blocks.

This is addressed by only considering predecessor blocks which have
already been visited.

The solution to this problem was suggested by Daniel Berlin on the
LLVM developer mailing list.

Differential Revision: https://reviews.llvm.org/D24927

llvm-svn: 282506
2016-09-27 16:46:07 +00:00
Adam Nemet cc2a3fa8e8 Revert "Output optimization remarks in YAML"
This reverts commit r282499.

The GCC bots are failing

llvm-svn: 282503
2016-09-27 16:39:24 +00:00
Adam Nemet 92e928c10a Output optimization remarks in YAML
This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282499
2016-09-27 16:15:16 +00:00
Simon Dardis d2ed8abb15 [mips] Disable tail calls temporarily
Disable tail calls while the remaining bugs are fixed. Enable only for tests.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24912

llvm-svn: 282487
2016-09-27 13:15:54 +00:00
Simon Dardis 0486d585c5 [mips] Add rsqrt, recip for MIPS
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.

Reviewers: vkalintiris, zoran.jovanoic

Differential Review: https://reviews.llvm.org/D24499

llvm-svn: 282485
2016-09-27 12:25:15 +00:00
Nemanja Ivanovic 6f22b41398 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Craig Topper 71f1c64320 [X86] Add test case for PR30511 and r282341.
llvm-svn: 282473
2016-09-27 06:44:30 +00:00
Craig Topper 4ffe5d5af0 [X86] Expand all-ones-vector test to cover 256-bit and 512-bit vectors.
llvm-svn: 282472
2016-09-27 06:44:27 +00:00
Kostya Serebryany 45c144754b [sanitizer-coverage] fix a bug in trace-gep
llvm-svn: 282467
2016-09-27 01:55:08 +00:00
Kostya Serebryany 186d61801c [sanitizer-coverage] don't emit the CTOR function if nothing has been instrumented
llvm-svn: 282465
2016-09-27 01:08:33 +00:00
Ivan Krasin 4ff4f21e15 Revert r277556. Add -lowertypetests-bitsets-level to control bitsets generation
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.

Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.

Reviewers: pcc

Subscribers: kcc

Differential Revision: https://reviews.llvm.org/D24948

llvm-svn: 282461
2016-09-27 00:29:53 +00:00
Davide Italiano a9f85d68cc [CodeGen] Add support for emitting .init_array instead of .ctors on FreeBSD.
PR: 30494
llvm-svn: 282451
2016-09-26 22:53:15 +00:00
Davide Italiano f5d77f4aad [CodeGen] Switch test as FreeBSD will support .init_array soon.
llvm-svn: 282450
2016-09-26 22:38:17 +00:00
Derek Schuff 92d300eb8f [WebAssembly] Use the frame pointer instead of the stack pointer
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24889

llvm-svn: 282442
2016-09-26 21:18:03 +00:00
Kevin Enderby 90986e6c7c Next set of additional error checks for invalid Mach-O files for the
other load commands that use the Mach::linkedit_data_command type
but not used in llvm libObject code but used in llvm tool code.

This includes LC_FUNCTION_STARTS, LC_SEGMENT_SPLIT_INFO
and LC_DYLIB_CODE_SIGN_DRS load commands.

llvm-svn: 282441
2016-09-26 21:11:03 +00:00
Piotr Padlewski d9830eb79f [thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.

I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.

I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.

Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.

Reviewers: eraman, mehdi_amini, tejohnson

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24638

llvm-svn: 282437
2016-09-26 20:37:32 +00:00
Nirav Dave 6477ce2697 Add support for Code16GCC
[X86] The .code16gcc directive parses X86 assembly input in 32-bit mode and
outputs in 16-bit mode. Teach parser to switch modes appropriately.

Reviewers: dwmw2, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20109

llvm-svn: 282430
2016-09-26 19:33:36 +00:00
Evandro Menezes 055767d5f4 [AArch64] Fix test triplet
Specify proper target triplet to pass under Windows too.

llvm-svn: 282423
2016-09-26 18:09:21 +00:00
Tom Stellard 1b9748c6a2 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Daniel Berlin 1e98c04226 Remove pruning of phi nodes in MemorySSA - it makes updating harder
Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24923

llvm-svn: 282419
2016-09-26 17:22:54 +00:00
Matthew Simpson b764aba2ab [LV] Scalarize instructions marked scalar after vectorization
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.

Differential Revision: https://reviews.llvm.org/D23889

llvm-svn: 282418
2016-09-26 17:08:37 +00:00
Gor Nishanov bc0ebb383c [Coroutines] Part14: Handle coroutines with no suspend points.
Summary:
If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function.

Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24408

llvm-svn: 282414
2016-09-26 15:49:28 +00:00
Geoff Berry 256fcf975f [AArch64] Improve add/sub/cmp isel of uxtw forms.
Don't match the UXTW extended reg forms of ADD/ADDS/SUB/SUBS if the
32-bit to 64-bit zero-extend can be done for free by taking advantage
of the 32-bit defining instruction zeroing the upper 32-bits of the X
register destination.  This enables better instruction selection in a
few cases, such as:

  sub x0, xzr, x8
  instead of:
  mov x8, xzr
  sub x0, x8, w9, uxtw

  madd x0, x1, x1, x8
  instead of:
  mul x9, x1, x1
  add x0, x9, w8, uxtw

  cmp x2, x8
  instead of:
  sub x8, x2, w8, uxtw
  cmp x8, #0

  add x0, x8, x1, lsl #3
  instead of:
  lsl x9, x1, #3
  add x0, x9, w8, uxtw

Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24747

llvm-svn: 282413
2016-09-26 15:34:47 +00:00
Evandro Menezes e45de8a5ec Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.

This patch considers a limit that a target may set to the number of
destination addresses in a jump table.

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
Alexey Bataev 793c946ecb [InstCombine] Fixed bug introduced in r282237
The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.

llvm-svn: 282401
2016-09-26 13:18:59 +00:00
Andrea Di Biagio a82d52d11d [InstCombine] Teach the udiv folding logic how to handle constant expressions.
This patch fixes PR30366.

Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.

This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.

Differential Revision: https://reviews.llvm.org/D24565

llvm-svn: 282398
2016-09-26 12:07:23 +00:00
Sam Kolton 984461062f Revert "[AMDGPU] Disassembler: print label names in branch instructions"
This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550.

llvm-svn: 282396
2016-09-26 11:29:03 +00:00
Sam Kolton 1559f76257 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 282394
2016-09-26 10:05:50 +00:00
James Molloy 9abb2fa5bb [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282387
2016-09-26 07:26:24 +00:00
Zvi Rackover 839d15a194 [X86] Optimization for replacing LEA with MOV at frame index elimination time
Summary:
Replace a LEA instruction of the form 'lea (%esp), %ebx' --> 'mov %esp, %ebx'

MOV is preferable over LEA because usually there are more issue-slots available to execute MOVs than LEAs. Latest processors also support zero-latency MOVs.

Fixes pr29022.

Reviewers: hfinkel, delena, igorb, myatsina, mkuper

Differential Revision: https://reviews.llvm.org/D24705

llvm-svn: 282385
2016-09-26 06:42:07 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper 60d3ef1d72 [AVX-512] Fix some patterns predicates to properly enforce priority for various versions of CVTDQ2PD instruction.
llvm-svn: 282358
2016-09-25 16:34:02 +00:00
Craig Topper d8b2bd492c [AVX-512] Add the scalar unsigned integer to fp conversion instructions to hasUndefRegUpdate.
llvm-svn: 282356
2016-09-25 16:33:57 +00:00
Sanjay Patel 752ad8fde7 [x86] don't try to create a vector integer inst for an SSE1 target (PR30512)
This bug was introduced with:
http://reviews.llvm.org/rL272511

We need to restrict the lowering to v4f32 comparisons because that's all SSE1 can handle.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=28044

llvm-svn: 282336
2016-09-24 20:24:06 +00:00
Sanjay Patel 0b36337d61 [x86] fix FCOPYSIGN lowering to create constants instead of ConstantPool loads
This is similar to:
https://reviews.llvm.org/rL279958

By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.

We should also be able to extend this code to handle vectors.

llvm-svn: 282312
2016-09-23 23:17:29 +00:00
Petr Hosek 85b2f67613 [MC] Support .ds directives in assembler parser
These directives are already supported by GNU assembler.

Differential Revision: https://reviews.llvm.org/D24740

llvm-svn: 282303
2016-09-23 21:53:36 +00:00
Matthias Braun 729c989083 llc: Add -start-before/-stop-before options
Differential Revision: https://reviews.llvm.org/D23089

llvm-svn: 282302
2016-09-23 21:46:02 +00:00
Teresa Johnson 896fee2846 [gold] Split plugin options controlling ThinLTO and codegen parallelism.
Summary:
As suggested in D24826, use different options for ThinLTO backend
parallelism from the option controlling regular LTO code gen
parallelism. They are already split in the LTO API, and this enables
controlling them with different clang options.

Reviewers: pcc, mehdi_amini

Subscribers: dexonsmith, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24873

llvm-svn: 282290
2016-09-23 20:35:19 +00:00
Petr Hosek 4cb08ce3bc [MC] Support .dcb directives in assembler parser
These directives are already supported by GNU assembler.

Differential Revision: https://reviews.llvm.org/D24741

llvm-svn: 282283
2016-09-23 19:25:15 +00:00
Vedant Kumar 4588088050 [llvm-cov] Filter away source files that aren't in the coverage mapping
... so that they don't show up in the index. This came up because polly
contains a .git directory and some other unmapped input in its source
dir.

llvm-svn: 282282
2016-09-23 18:57:35 +00:00
Vedant Kumar bc6479850e [llvm-cov] Get rid of all invalid filename references
We used to append filenames into a vector of std::string, and then
append a reference to each string into a separate vector. This made it
easier to work with the getUniqueSourceFiles API. But it's buggy.

std::string has a small-string optimization, so you can't expect to
capture a reference to one if you're copying it into a growing vector.
Add a test that triggers this invalid reference to std::string scenario,
and kill the issue with fire by just using ArrayRef<std::string>
everywhere.

llvm-svn: 282281
2016-09-23 18:57:32 +00:00
Sanjay Patel 04949faf69 [TLI] isdigit / isascii / toascii param type should match return type (PR30484)
We crash in LibCallSimplifier if we don't check the validity of the function signature properly.

llvm-svn: 282278
2016-09-23 18:44:09 +00:00
Matthias Braun 1acb55e67c ScheduleDAG: Match enum names when printing sdep kinds
It is less confusing to have the same names in the debug print as the
enum members.

llvm-svn: 282273
2016-09-23 18:28:31 +00:00
Jun Bum Lim 3822939ba7 Enhance calcColdCallHeuristics for InvokeInst
Summary: When identifying cold blocks, consider only the edge to the normal destination if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst.

Reviewers: mcrosier, hfinkel, davidxl

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D24868

llvm-svn: 282262
2016-09-23 17:26:14 +00:00
James Molloy 85124c76fc Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r282241. It caused http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19882.

llvm-svn: 282249
2016-09-23 13:35:43 +00:00
Nemanja Ivanovic d2c3c51a70 [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.

llvm-svn: 282246
2016-09-23 13:25:31 +00:00
James Molloy 1ce54d6be2 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282241
2016-09-23 12:15:58 +00:00
George Rimar 4f82df52ae Revert r282238 "Revert r282235 "[llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section.""
Build bot issues (http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15856/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Adwarfdump-dump-gdbindex.test)
should be fixed in that version. Issue was that MSVS does not support "%zu". Though it works fine on MSCS 2015,
Bot looks running MSVS 2013 that does not like it. MSDN also says that "z" prefix is not supported: https://msdn.microsoft.com/en-us/library/tcxf1dw6.aspx
I had to use PRId64 instead.

Original commit message:

[llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section.

gold linker's --gdb-index option currently is able to create the .gdb_index section that allows GDB to locate and read the .dwo files as it needs them,
this helps reduce the total size of the object files processed by the linker.

More info about that:
https://gcc.gnu.org/wiki/DebugFission
https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html

Patch teaches dwarfdump tool to dump this section.

Differential revision: https://reviews.llvm.org/D21503

llvm-svn: 282239
2016-09-23 11:01:53 +00:00
George Rimar a348527186 Revert r282235 "[llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section."
It broke BB:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15856

llvm-svn: 282238
2016-09-23 10:12:56 +00:00
Alexey Bataev fee9078dcd [InstCombine] Fix for PR29124: reduce insertelements to shufflevector
If inserting more than one constant into a vector:

define <4 x float> @foo(<4 x float> %x) {
  %ins1 = insertelement <4 x float> %x, float 1.0, i32 1
  %ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
  ret <4 x float> %ins2
}

InstCombine could reduce that to a shufflevector:

define <4 x float> @goo(<4 x float> %x) {
 %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
 ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1

Differential Revision: https://reviews.llvm.org/D24182

llvm-svn: 282237
2016-09-23 09:14:08 +00:00
George Rimar a77bcf5e42 [llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section.
gold linker's --gdb-index option currently is able to create the .gdb_index section that allows GDB to locate and read the .dwo files as it needs them,
this helps reduce the total size of the object files processed by the linker.

More info about that:
https://gcc.gnu.org/wiki/DebugFission
https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html

Patch teaches dwarfdump tool to dump this section.

Differential revision: https://reviews.llvm.org/D21503

llvm-svn: 282235
2016-09-23 09:09:26 +00:00
Tom Stellard e88bbc34c6 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Petr Hosek 2f4ac44dc3 [MC] Support skip and count for .incbin directive
These optional arguments are supported by GNU assembler.

Differential Revision: https://reviews.llvm.org/D24714

llvm-svn: 282217
2016-09-23 00:41:06 +00:00
Sanjay Patel 30ef70b090 [InstCombine] fold X urem C -> X < C ? X : X - C when C is big (PR28672)
We already have the udiv variant of this transform, so I think this is ok for 
InstCombine too even though there is an increase in IR instructions. As the 
tests and TODO comments show, the transform can lead to follow-on combines.

This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672

Differential Revision: https://reviews.llvm.org/D24527

llvm-svn: 282209
2016-09-22 22:36:26 +00:00
Vedant Kumar 1ce90d889a [llvm-cov] Add the ability to specify directories of input source files
We've supported restricting coverage reports to a set of files for a
long time. Add support for being able to restrict by entire directories.

I suppose this supersedes D20803.

llvm-svn: 282202
2016-09-22 21:49:43 +00:00
Hans Wennborg c7957ef86c Revert r282168 "GVN-hoist: fix store past load dependence analysis (PR30216)"
and also the dependent r282175 "GVN-hoist: do not dereference null pointers"

It's causing compiler crashes building Harfbuzz (PR30499).

llvm-svn: 282199
2016-09-22 21:20:53 +00:00
Arnold Schwaighofer 0fd32c005b i386 does not support optimized swifterror handling
rdar://28432565

llvm-svn: 282186
2016-09-22 20:06:25 +00:00
Hans Wennborg c4b1d20ba2 Win64: Don't emit unwind info for "leaf" functions (PR30337)
According to MSDN (see the PR), functions which don't touch any callee-saved
registers (including %rsp) don't need any unwind info.

This patch makes LLVM not emit unwind info for such functions, to save
binary size.

Differential Revision: https://reviews.llvm.org/D24748

llvm-svn: 282185
2016-09-22 19:50:05 +00:00
Nemanja Ivanovic 8dacca943a [PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.

llvm-svn: 282182
2016-09-22 19:06:38 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Krzysztof Parzyszek b66efb855c [PPC] Set SP after loading data from stack frame, if no red zone is present
Follow-up to r280705: Make sure that the SP is only restored after all data
is loaded from the stack frame, if there is no red zone.

This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24466

llvm-svn: 282174
2016-09-22 17:22:43 +00:00
Sebastian Pop 8e6e3318c2 GVN-hoist: fix store past load dependence analysis (PR30216)
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.

The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().

Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.

Differential Revision: https://reviews.llvm.org/D24517

llvm-svn: 282168
2016-09-22 15:33:51 +00:00
Sebastian Pop f6bfc1ad8b GVN-hoist: move hoist testcase to GVNHoist dir
llvm-svn: 282161
2016-09-22 14:45:46 +00:00
Sebastian Pop 440f15b7fc GVN-hoist: only hoist relevant scalar instructions
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.

A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:

  if (isa<TerminatorInst>(I))
    return false;

is better than listing all the other less frequent instructions.

Differential Revision: https://reviews.llvm.org/D23929

llvm-svn: 282160
2016-09-22 14:45:40 +00:00
Tim Northover a5e38fa00d GlobalISel: handle stack-based parameters on AArch64.
llvm-svn: 282153
2016-09-22 13:49:25 +00:00
Anna Thomas 82c3717f54 [RS4GC] Remat in presence of phi and use live value
Summary:

Reviewers:

Subscribers:

llvm-svn: 282150
2016-09-22 13:13:06 +00:00
Artem Tamazov 2146a0a90e [AMDGPU][mc] Add support for absolute expressions in DPP modifiers.
Also added range checking for DPP attributes.
Assembler tests added as well.

Differential Revision: https://reviews.llvm.org/D24755

llvm-svn: 282145
2016-09-22 11:47:21 +00:00
Nemanja Ivanovic 6e7879c5e6 [Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.

llvm-svn: 282143
2016-09-22 09:52:19 +00:00
Sagar Thakur e74eb4e7b9 [EfficiencySanitizer] Using '$' instead of '#' for struct counter name
For MIPS '#' is the start of comment line. Therefore we get assembler errors if # is used in the structure names.

Differential: D24334
Reviewed by: zhaoqin

llvm-svn: 282141
2016-09-22 08:33:06 +00:00
Dorit Nuzman d1247a684e Fix revision 281960
llvm-svn: 282139
2016-09-22 07:56:23 +00:00
Craig Topper 202b453a8a [AVX-512] Add support for commuting VPTERNLOG instructions.
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.

We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.

This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction.

llvm-svn: 282132
2016-09-22 03:00:50 +00:00
Kevin Enderby e71e13c7d6 Next set of additional error checks for invalid Mach-O files for bad LC_UUID
load commands.  Added a missing check and made the check for more than
one like other other “more than one” checks.  And of course added test cases.

llvm-svn: 282104
2016-09-21 20:03:09 +00:00