Commit Graph

40260 Commits

Author SHA1 Message Date
Michael Ilseman 6d6b4d87a3 Revert "Add -strip-nonlinetable-debuginfo capability"
This reverts commit r283473.

Reverted until review is completed.

llvm-svn: 283478
2016-10-06 18:30:26 +00:00
Michael Ilseman d0a4db7632 Add -strip-nonlinetable-debuginfo capability
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.

The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches.  For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.

The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.

llvm-svn: 283473
2016-10-06 17:58:38 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Krzysztof Parzyszek d391d6f1c3 [Hexagon] Avoid replacing full regs with subregisters in tied operands
Doing so will result in the two-address pass generating incorrect code.

llvm-svn: 283463
2016-10-06 16:18:04 +00:00
Matt Arsenault ef5bba0136 BranchRelaxation: Account for function alignment
llvm-svn: 283462
2016-10-06 16:00:58 +00:00
Matt Arsenault 36919a4f7c Move AArch64BranchRelaxation to generic code
llvm-svn: 283459
2016-10-06 15:38:53 +00:00
Nirav Dave ee554e6155 [X86] Fix intel syntax push parsing bug
Change erroneous parsing of push immediate instructions in intel syntax
to default to pointer size by rewriting into the ATT style for matching.

This fixes PR22028.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25288

llvm-svn: 283457
2016-10-06 15:28:08 +00:00
Rafael Espindola d9525a166d Centralize sh_entsize checking.
llvm-svn: 283455
2016-10-06 15:08:10 +00:00
Rafael Espindola c3befb2e39 Refactor to use getSectionContentsAsArray.
This centralizes quite a bit of error checking.

llvm-svn: 283454
2016-10-06 14:47:04 +00:00
Sam Kolton 3381d7a216 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.
Initialize MCObjectFileInfo with some default values.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 283450
2016-10-06 13:46:08 +00:00
Hal Finkel 4d6f3088c3 [llvm-opt-report] Record VF, etc. correctly for multiple opts on one line
When there are multiple optimizations on one line, record the vectorization
factors, etc. correctly (instead of incorrectly substituting default values).

llvm-svn: 283443
2016-10-06 11:58:52 +00:00
Diana Picus 6341e46cd1 Revert "[ARM] Use __rt_div functions for divrem on Windows"
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'

It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.

llvm-svn: 283442
2016-10-06 11:24:29 +00:00
Hal Finkel 47faf3be89 [llvm-opt-report] Print line numbers starting from 1
Line numbers should start from 1, not 2.

llvm-svn: 283440
2016-10-06 11:11:11 +00:00
Zvi Rackover 08a37f46e3 Add test-cases which demontrate pr30561
llvm-svn: 283436
2016-10-06 10:04:00 +00:00
Bjorn Pettersson 3961603921 [ValueTracking] Teach computeKnownBits and ComputeNumSignBits to look through ExtractElement.
Summary:
The computeKnownBits and ComputeNumSignBits functions in ValueTracking can now do a simple look-through of ExtractElement.

Reviewers: majnemer, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24955

llvm-svn: 283434
2016-10-06 09:56:21 +00:00
James Molloy 6215fad0e9 [ARM] Constant pool promotion - fix alignment calculation
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.

Testcase added.

llvm-svn: 283423
2016-10-06 07:56:00 +00:00
James Molloy 78561c4917 [ARM] Improve testcase for r283323
We can work around a shortcoming of FileCheck by using {{\[}} to match a square
bracket before a [[ sequence.

Thanks to Eli Friedman for the heads up!

llvm-svn: 283422
2016-10-06 07:44:05 +00:00
Konstantin Zhuravlyov b4eb5d5049 [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32
Differential Revision: https://reviews.llvm.org/D25121

llvm-svn: 283415
2016-10-06 02:20:46 +00:00
Sanjay Patel edc2baddf8 [DAG] add tests to show missing checks for SDNode FMF
The AVX attribute is added to remove noise caused by SSE's destructive insts.  

llvm-svn: 283410
2016-10-05 23:20:32 +00:00
Hal Finkel 5d0fbbbca1 Fix tests for Windows
We need to match file names with both forward and backward slashes.

llvm-svn: 283407
2016-10-05 22:48:13 +00:00
Reid Kleckner bb96df602e [codeview] Truncate records to maximum record size near 64KB
If we don't truncate, LLVM asserts when the label difference doesn't fit
in a 16 bit field. This patch truncates two kinds of data: trailing null
terminated names in symbol records, and inline line tables. The inline
line table test that I have is too large (many MB), so I'm not checking
it in.

Hopefully fixes PR28264.

llvm-svn: 283403
2016-10-05 22:36:07 +00:00
Hal Finkel 5aa0248059 [llvm-opt-report] Distinguish inlined contexts when optimizations differ
How code is optimized sometimes, perhaps often, depends on the context into
which it was inlined. This change allows llvm-opt-report to track the
differences between the optimizations performed, or not, in different contexts,
and when these differ, display those differences.

For example, this code:

  $ cat /tmp/q.cpp
  void bar();
  void foo(int n) {
    for (int i = 0; i < n; ++i)
      bar();
  }

  void quack() {
    foo(4);
  }

  void quack2() {
    foo(4);
  }

will now produce this report:

  < /home/hfinkel/src/llvm/test/tools/llvm-opt-report/Inputs/q.cpp
   2         | void bar();
   3         | void foo(int n) {
   [[
    > foo(int):
   4         |   for (int i = 0; i < n; ++i)
    > quack(), quack2():
   4  U4     |   for (int i = 0; i < n; ++i)
   ]]
   5         |     bar();
   6         | }
   7         |
   8         | void quack() {
   9 I       |   foo(4);
  10         | }
  11         |
  12         | void quack2() {
  13 I       |   foo(4);
  14         | }
  15         |

Note that the tool has demangled the function names, and grouped the reports
associated with line 4. This shows that the loop on line 4 was unrolled by a
factor of 4 when inlined into the functions quack() and quack2(), but not in
the function foo(int) itself.

llvm-svn: 283402
2016-10-05 22:25:33 +00:00
Adrian Prantl b3510afcd1 Verifier: Reject any unknown named MD nodes in the llvm.dbg namespace.
This came out of a discussion in https://reviews.llvm.org/D25285.

There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.

This also removes an entirely obsolete and bitrotted testcase for PR7662.

Reapplies 283390 with a forgotten testcase.

llvm-svn: 283400
2016-10-05 22:15:37 +00:00
Adrian Prantl 497f085475 Revert "Verifier: Reject any unknown named MD nodes in the llvm.dbg namespace."
Forgot to add a testcase in r283390.

llvm-svn: 283399
2016-10-05 22:15:34 +00:00
Hal Finkel 52031b7e65 Add an llvm-opt-report tool to generate basic source-annotated optimization summaries
LLVM now has the ability to record information from optimization remarks in a
machine-consumable YAML file for later analysis. This can be enabled in opt
(see r282539), and D25225 adds a Clang flag to do the same. This patch adds
llvm-opt-report, a tool to generate basic optimization "listing" files
(annotated sources with information about what optimizations were performed)
from one of these YAML inputs.

D19678 proposed to add this capability directly to Clang, but this more-general
YAML-based infrastructure was the direction we decided upon in that review
thread.

For this optimization report, I focused on making the output as succinct as
possible while providing information on inlining and loop transformations. The
goal here is that the source code should still be easily readable in the
report. My primary inspiration here is the reports generated by Cray's tools
(http://docs.cray.com/books/S-2496-4101/html-S-2496-4101/z1112823641oswald.html).
These reports are highly regarded within the HPC community. Intel's compiler,
for example, also has an optimization-report capability
(https://software.intel.com/sites/default/files/managed/55/b1/new-compiler-optimization-reports.pdf).

  $ cat /tmp/v.c
  void bar();
  void foo() { bar(); }

  void Test(int *res, int *c, int *d, int *p, int n) {
    int i;

  #pragma clang loop vectorize(assume_safety)
    for (i = 0; i < 1600; i++) {
      res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    }

    for (i = 0; i < 16; i++) {
      res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    }

    foo();

    foo(); bar(); foo();
  }

D25225 adds -fsave-optimization-record (and
-fsave-optimization-record=filename), and this would be used as follows:

  $ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record
  $ llvm-opt-report /tmp/v.yaml > /tmp/v.lst
  $ cat /tmp/v.lst

  < /tmp/v.c
   2          | void bar();
   3          | void foo() { bar(); }
   4          |
   5          | void Test(int *res, int *c, int *d, int *p, int n) {
   6          |   int i;
   7          |
   8          | #pragma clang loop vectorize(assume_safety)
   9     V4,2 |   for (i = 0; i < 1600; i++) {
  10          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  11          |   }
  12          |
  13  U16     |   for (i = 0; i < 16; i++) {
  14          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  15          |   }
  16          |
  17 I        |   foo();
  18          |
  19          |   foo(); bar(); foo();
     I        |   ^
     I        |                 ^
  20          | }

Each source line gets a prefix giving the line number, and a few columns for
important optimizations: inlining, loop unrolling and loop vectorization. An
'I' is printed next to a line where a function was inlined, a 'U' next to an
unrolled loop, and 'V' next to a vectorized loop. These are printed on the
relevant code line when that seems unambiguous, or on subsequent lines when
multiple potential options exist (messages, both positive and negative, from
the same optimization with different column numbers are taken to indicate
potential ambiguity). When on subsequent lines, a '^' is output in the relevant
column.

Annotated source for all relevant input files are put into the listing file
(each starting with '<' and then the file name).

You can disable having the unrolling/vectorization factors appear by using the
-s flag.

Differential Revision: https://reviews.llvm.org/D25262

llvm-svn: 283398
2016-10-05 22:10:35 +00:00
Sanjay Patel 5839858584 [DAG] change test to use 'unsafe' function attribute instead of global setting
But we have node-level FMF, so the next step is to fix this at the instruction/node-level.

llvm-svn: 283393
2016-10-05 21:43:50 +00:00
Adrian Prantl 71bba7253e Verifier: Reject any unknown named MD nodes in the llvm.dbg namespace.
This came out of a discussion in https://reviews.llvm.org/D25285.

There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.

This also removes an entirely obsolete and bitrotted testcase for PR7662.

llvm-svn: 283390
2016-10-05 21:31:19 +00:00
Reid Kleckner 2b3e6428e5 [codeview] Translate bitpiece metadata to DEFRANGE_SUBFIELD* records
This allows LLVM to describe locations of aggregate variables that have
been split by SROA.

Fixes PR29141

Reviewers: amccarth, majnemer

Differential Revision: https://reviews.llvm.org/D25253

llvm-svn: 283388
2016-10-05 21:21:33 +00:00
Martin Storsjo f997759aef [ARM] Use __rt_div functions for divrem on Windows
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D24076

llvm-svn: 283383
2016-10-05 21:08:02 +00:00
James Y Knight b0a473aaf8 [Sparc] Implement UMUL_LOHI and SMUL_LOHI instead of MULHS/MULHU/MUL.
This is what the instruction-set actually provides, and the default
expansions of the others into the lohi opcodes are good.

llvm-svn: 283381
2016-10-05 20:54:17 +00:00
Yunzhong Gao ba150d6156 Improve the debug-info test created in r274263.
This patch is related to r274263 or Phabricator/D21818.
This patch aims to improve the test case added in the previous commit to verify
specifically that the stack protector pass is adding the debug line info as
intended. Before, the test only verified that the verifier pass does not crash.
The current approach is to generate the assembly output and then look for the
.loc directive.

Differential Revision: https://reviews.llvm.org/D25290

llvm-svn: 283374
2016-10-05 20:26:29 +00:00
Krzysztof Parzyszek 3b6cbd55f7 [RDF] Fix live def propagation through basic block
llvm-svn: 283371
2016-10-05 20:08:09 +00:00
Reid Kleckner f9dddec21c Improve DEBUG_VALUE assembly comments for spilled bitpieces
Previously we would give up when we saw the bitpiece DWARF expression
and print "[complex expression]" when actually we handled bitpiece
expressions outside the loop.

llvm-svn: 283355
2016-10-05 18:36:02 +00:00
Simon Dardis 299dbd6cd1 [mips][ias] fix li macro when values are negated with ~
The integrated assembler evaluates the expressions such as ~0x80000000 to
0xffffffff7fffffff early in the parsing process. This patch adds compatibility
with gas so that li loads the expected value (0x7fffffff) in those cases. This
only occurs iff all the upper 32bits are set and maintains existing checks by
not truncating the result down to 32 bits if any of the the upper bits are not
set.

Reviewers: dsanders, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D23399

llvm-svn: 283353
2016-10-05 18:26:19 +00:00
Bjorn Pettersson 12559441bd [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

llvm-svn: 283347
2016-10-05 17:40:27 +00:00
Bjorn Pettersson ddd31e5637 Test commit permission. NFC
llvm-svn: 283346
2016-10-05 17:22:11 +00:00
Simon Dardis f45a59f80b Recommit: "[mips] Add rsqrt, recip for MIPS"
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.

Reviewers: vkalintiris, zoran.jovanoic

Differential Review: https://reviews.llvm.org/D24499

llvm-svn: 283334
2016-10-05 16:11:01 +00:00
Hans Wennborg c26c03d911 Revert r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)"
This is suspected to cause a miscompile in Chromium. Reverting while
investigating.

llvm-svn: 283329
2016-10-05 15:39:27 +00:00
Simon Dardis bbfd528748 Revert "[mips] Add rsqrt, recip for MIPS"
This reverts commit r282485 which contain two patches instead of
one.

llvm-svn: 283327
2016-10-05 15:28:33 +00:00
Douglas Katzman 0411e8669b [X86] Don't randomly encode %rip where illegal
Differential Revision: https://reviews.llvm.org/D25112

llvm-svn: 283326
2016-10-05 15:23:35 +00:00
James Molloy b7de497cb9 [Thumb] Don't try and emit LDRH/LDRB from the constant pool
This is not a valid encoding - these instructions cannot do PC-relative addressing.

The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.

llvm-svn: 283323
2016-10-05 14:52:13 +00:00
Douglas Katzman 8449b238ea [X86] Fix some tests that didn't assert anything
llvm-svn: 283322
2016-10-05 14:46:14 +00:00
Oren Ben Simhon 0670e5a35b Test commit permission
llvm-svn: 283319
2016-10-05 14:12:41 +00:00
Krzysztof Parzyszek e7c72cdbb0 Fix machine operand traversal in ScheduleDAGInstrs::fixupKills
llvm-svn: 283315
2016-10-05 13:15:06 +00:00
Kyle Butt 25ac35d822 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.

Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.

llvm-svn: 283292
2016-10-05 01:39:29 +00:00
Kyle Butt adabac2d57 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283274
2016-10-04 23:54:18 +00:00
Sanjay Patel bfdbea6481 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252
2016-10-04 20:46:43 +00:00
Kevin Enderby f993d6e72c Next set of additional error checks for invalid Mach-O files for the
load commands that uses the MachO::encryption_info_command and
MachO::encryption_info_command types but not used in llvm libObject
code but used in llvm tool code.

This includes just LC_ENCRYPTION_INFO and
LC_ENCRYPTION_INFO_64 load commands.

llvm-svn: 283250
2016-10-04 20:37:43 +00:00
Matthias Braun 46a5238682 AArch64: Macrofusion: Split features, add missing combinations.
AArch64InstrInfo::shouldScheduleAdjacent() determines whether two
instruction can benefit from macroop fusion on apple CPUs. The list
turned out to be incomplete:
- the "rr" variants of the instructions were missing
- even the "rs" variants can have shift value == 0 and behave like the
  "rr" variants

This also splits the MacropFusion target feature into
ArithmeticBccFusion and ArithmeticCbzFusion.

Differential Revision: https://reviews.llvm.org/D25142

llvm-svn: 283243
2016-10-04 19:28:21 +00:00
Hal Finkel bdd6735a9e Don't filter diagnostics written as YAML to the output file
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.

Differential Revision: https://reviews.llvm.org/D25224

llvm-svn: 283236
2016-10-04 18:13:45 +00:00
Adam Nemet 0428e93217 Serialize remark argument as a mapping to get proper quotation for the value.
llvm-svn: 283231
2016-10-04 17:05:04 +00:00
Alexey Bataev 7e217c2402 [SLPVectorizer] Add a test with non-vectorizable IR.
llvm-svn: 283225
2016-10-04 15:07:23 +00:00
Anna Thomas 479cbb9405 [RS4GC] Handle ShuffleVector instruction in findBasePointer
Summary:
This patch modifies the findBasePointer to handle the shufflevector instruction.

Tests run: RS4GC tests, local downstream tests.

Reviewers: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25197

llvm-svn: 283219
2016-10-04 13:48:37 +00:00
Andrey Bokhanko 6903be56d5 Fix IntegerType::MAX_INT_BITS value
IntegerType::MAX_INT_BITS is apparently not in sync with Type::SubclassData
size. This patch fixes this.

Differential Revision: https://reviews.llvm.org/D24814

llvm-svn: 283215
2016-10-04 12:43:46 +00:00
Nemanja Ivanovic 6354d23555 [Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set
This patch corresponds to review:

The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.

llvm-svn: 283212
2016-10-04 11:25:52 +00:00
Simon Dardis 86b3a1e79b [mips][fastisel] Consider soft-float an unsupported floating point mode
Treat soft-float as unsupported for fast-isel. Additionally, ensure we check
that lowering f32 arguments also considers the case of soft-float mode.

Reviewers: ehostunreach, vkalintiris, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24505

llvm-svn: 283209
2016-10-04 10:35:07 +00:00
George Rimar 67443021a4 [Object/ELF] - Do not crash on invalid sh_offset value of REL[A] section.
Previously code would access invalid memory and may crash,
patch fixes the issue.

Differential revision: https://reviews.llvm.org/D25187

llvm-svn: 283204
2016-10-04 09:25:39 +00:00
George Rimar 5cbf23664d [Object/ELF] - Avoid possible crash in getExtendedSymbolTableIndex().
When using broken input object found using AFL,
getExtendedSymbolTableIndex() crashed because ShndxTable
was empty as object does not contain SHT_SYMTAB_SHNDX section.

Differential revision: https://reviews.llvm.org/D25189

llvm-svn: 283196
2016-10-04 08:44:03 +00:00
Nemanja Ivanovic a565d9e612 Fix a test case failure on Apple PPC.
llvm-svn: 283191
2016-10-04 07:37:38 +00:00
Nemanja Ivanovic 11049f8f07 [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190
2016-10-04 06:59:23 +00:00
Kyle Butt 3ffb8529bc Revert "Codegen: Tail-duplicate during placement."
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.

Causing crashes on aarch64 build.

llvm-svn: 283172
2016-10-04 00:38:23 +00:00
Eli Friedman 74bed9d757 Make GlobalsAA ignore dead constant expressions.
Slightly improves the precision of GlobalsAA in certain situations, and
makes the behavior of optimization passes more predictable.

Differential Revision: https://reviews.llvm.org/D24104

llvm-svn: 283165
2016-10-04 00:03:55 +00:00
Kyle Butt 396bfdd707 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

llvm-svn: 283164
2016-10-04 00:00:09 +00:00
Matthias Braun d2fc0d40e4 Set some tests to an unknown vendor and OS
This avoids llc using the hosts OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.

llvm-svn: 283149
2016-10-03 21:58:20 +00:00
Mehdi Amini 762d68bbab [LTO] Fix test to not depend on the exact address of symbols, just their linkage
llvm-svn: 283148
2016-10-03 21:40:50 +00:00
Krzysztof Parzyszek c8b6ecabd8 [RDF] Fix liveness propagation through shadows
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.

llvm-svn: 283143
2016-10-03 20:17:20 +00:00
Matthias Braun eccdee9196 X86: Do not produce GOT relocations on windows
Windows has no GOT relocations the way elf/darwin has. Some people use
x86_64-pc-win32-macho to build EFI firmware; Do not produce GOT
relocations for this target.

Differential Revision: https://reviews.llvm.org/D24627

llvm-svn: 283140
2016-10-03 20:11:24 +00:00
Sanjoy Das 0359a193a7 [PruneEH] Be correct in the face IPO
This fixes one spot I had missed in r265762.  Credit goes to Philip
Reames for spotting this one!

llvm-svn: 283137
2016-10-03 19:35:30 +00:00
Konstantin Zhuravlyov 691e2e020b [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
llvm-svn: 283130
2016-10-03 18:29:01 +00:00
Hans Wennborg b4d2678c6f Jump threading: avoid trying to split edge into landingpad block (PR27840)
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.

Differential Revision: https://reviews.llvm.org/D24680

llvm-svn: 283129
2016-10-03 18:18:04 +00:00
Sanjay Patel d27a21874b [x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165
 

llvm-svn: 283119
2016-10-03 16:38:27 +00:00
Rafael Espindola d7325ee702 Don't drop the llvm. prefix when renaming.
If the llvm. prefix is dropped other parts of llvm don't see this as
an intrinsic.  This means that the number of regular symbols depends
on the context the module is loaded into, which causes LTO to abort.

Fixes PR30509.

llvm-svn: 283117
2016-10-03 15:51:42 +00:00
Nirav Dave 157891c57f Prevent out of order HashDirective lexing in AsmLexer.
Retrying after buildbot reset.

To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.

This fixes PR28921.

Reviewers: rnk, loladiro

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24839

llvm-svn: 283111
2016-10-03 13:48:27 +00:00
Matt Arsenault 40bae76620 AMDGPU: Fix missing -verify-machineinstrs in test
llvm-svn: 283107
2016-10-03 12:58:59 +00:00
Simon Pilgrim 52ab136881 [X86][SSE] Add PR30371 (shuffle constant folding) test case
llvm-svn: 283103
2016-10-03 12:16:39 +00:00
Sjoerd Meijer 4dbe73c1ed [ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.

Patch mostly by Pablo Barrio.

Differential Revision: https://reviews.llvm.org/D25077

llvm-svn: 283098
2016-10-03 10:12:32 +00:00
Alexey Bataev fe91cf3aba [CodeGen] Adding a test showing the current state of poor code gen of
search loop, by Andrey Tischenko

PR27136 shows failure to hoist constant out of loop. This test is used
as start point to fix the failure: it shows the current state of codegen
and discovers what should be fixed

Differential Revision: https://reviews.llvm.org/D25097

llvm-svn: 283091
2016-10-03 07:47:01 +00:00
Simon Pilgrim a8d2168cb0 [X86][AVX2] Add support for combining target shuffles to VPERMD/VPERMPS
llvm-svn: 283080
2016-10-02 21:07:58 +00:00
Simon Pilgrim bce1f6b491 [X86][AVX2] Missed opportunities to combine to VPERMD/VPERMPS
llvm-svn: 283077
2016-10-02 20:43:02 +00:00
Simon Pilgrim b5200971d6 [X86][AVX2] Fix typo in test names
We are testing vpermps not vpermd

llvm-svn: 283076
2016-10-02 19:31:58 +00:00
Sanjay Patel 170d7eb303 [x86] remove 'nan' strings from copysign assertions; NFC
Preemptively scrubbing these to avoid a bot fail as in PR30443:
https://llvm.org/bugs/show_bug.cgi?id=30443

I'm nearly done with a patch to fix these cases, so not trying very
hard to do better for the temporary win. 

I plan to use better checks than what the script produces for the vectorized cases.

llvm-svn: 283072
2016-10-02 17:07:24 +00:00
Sanjay Patel dfbbbcd662 [x86] add test to show unnecessary scalarization of copysign intrinsics (PR30433)
llvm-svn: 283071
2016-10-02 16:31:35 +00:00
Simon Pilgrim 03afbe783d [X86][AVX] Ensure broadcast loads respect dependencies
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.

This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.

Bug found during internal testing.

Differential Revision: https://reviews.llvm.org/D25039

llvm-svn: 283070
2016-10-02 15:59:15 +00:00
Hal Finkel a9321059b9 [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

llvm-svn: 283060
2016-10-02 02:10:20 +00:00
Martell Malone 3a4d900039 COFF: Fix short import lib import name type bitshift
As per the PE COFF spec (section 8.3, Import Name Type)
Offset: 18 Size 2 bits Name: Type
Offset: 20 Size 3 bits Name: Name Type

Offset: 20 added based on 18+2

Partially commited as rL279069

Differential Revision: https://reviews.llvm.org/D23540

llvm-svn: 283055
2016-10-01 23:10:20 +00:00
Simon Pilgrim 04e249b128 [SLPVectorizer][X86] Added fptosi/fptoui tests
llvm-svn: 283048
2016-10-01 19:35:59 +00:00
Simon Pilgrim 1ec20e9b0a [CostModel][X86] Added tests for current fptosi/fptoui costs
llvm-svn: 283047
2016-10-01 19:09:59 +00:00
Simon Pilgrim 567c4fbdae [SLPVectorizer][X86] Added fcopysign tests
llvm-svn: 283046
2016-10-01 17:00:26 +00:00
Simon Pilgrim cceeb2a4fa [SLPVectorizer][X86] Added fabs tests
llvm-svn: 283045
2016-10-01 16:54:01 +00:00
Simon Pilgrim e0ec5c1f05 [CostModel][X86] Added fcopysign costs
llvm-svn: 283044
2016-10-01 16:41:52 +00:00
Simon Pilgrim 8b021c382d [CostModel][X86] Added fabs costs
llvm-svn: 283042
2016-10-01 16:30:13 +00:00
Simon Pilgrim 1638d49f20 [X86][SSE] Add support for combining target shuffles to binary BLEND
We already had support for 1-input BLEND with zero - this adds support for 2-input BLEND as well.

llvm-svn: 283040
2016-10-01 16:04:28 +00:00
Simon Pilgrim ae17cf20ce [X86][SSE] Always combine target shuffles to MOVSD/MOVSS
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.

This required me to add a couple of scalar math movsd/moss fold patterns that hadn't been needed in the past.

llvm-svn: 283038
2016-10-01 15:33:01 +00:00
Simon Pilgrim ccdd1ff49b [X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.

We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful

I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets

llvm-svn: 283037
2016-10-01 14:26:11 +00:00
Simon Pilgrim f1c575bad7 [X86][SSE] Regenerate vselect tests and improve AVX1/AVX2 coverage
llvm-svn: 283035
2016-10-01 13:10:14 +00:00
Nirav Dave e4c6153cf1 Revert "[MC] Prevent out of order HashDirective lexing in AsmLexer."
This reverts commit r282992 which appears to be causing an LTO test failure.

llvm-svn: 283034
2016-10-01 10:57:55 +00:00
Craig Topper 5eb5ade894 [X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.

llvm-svn: 283020
2016-10-01 07:11:24 +00:00
Craig Topper 8aca90507f [AVX-512] Add VLX command lines to 128 and 256-bit shufffle tests.
llvm-svn: 283014
2016-10-01 06:01:18 +00:00
Mehdi Amini 86eeda8e20 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Matt Arsenault 3070fdf798 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Nirav Dave 9f2bd4e7ea [MC] Prevent out of order HashDirective lexing in AsmLexer.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.

This fixes PR28921.

Reviewers: rnk, loladiro

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24839

llvm-svn: 282992
2016-10-01 00:42:32 +00:00
Mehdi Amini 6610b01a27 [ASAN] Add the binder globals on Darwin to llvm.compiler.used to avoid LTO dead-stripping
The binder is in a specific section that "reverse" the edges in a
regular dead-stripping: the binder is live as long as a global it
references is live.

This is a big hammer that prevents LLVM from dead-stripping these,
while still allowing linker dead-stripping (with special knowledge
of the section).

Differential Revision: https://reviews.llvm.org/D24673

llvm-svn: 282988
2016-10-01 00:05:34 +00:00
Reid Kleckner 9cb915b7be [SEH] Emit the parent frame offset label even if there are no funclets
This avoids errors about references to undefined local labels from
unreferenced filter functions.

Fixes (sort of) PR30431

llvm-svn: 282967
2016-09-30 22:10:12 +00:00
Rui Ueyama 5d6714e593 Do not pass a superblock to PDBFileBuilder.
When we create a PDB file using PDBFileBuilder, the information
in the superblock, such as the size of the resulting file, is not
available.

Previously, PDBFileBuilder::initialize took a superblock assuming
that all the members of the struct are correct. That is useful when
you want to restore the exact information from a YAML file, but
that's probably the only use case in which that is useful.
When we are creating a PDB file on the fly, we have to backfill the
members.

This patch redefines PDBFileBuilder::initialize to take only a
block size. Now all the other members are left as default values,
so that they'll be updated when commit() is called.

Differential Revision: https://reviews.llvm.org/D25108

llvm-svn: 282944
2016-09-30 20:52:12 +00:00
Hans Wennborg b5643b47b6 X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.

Differential Revision: https://reviews.llvm.org/D24836

llvm-svn: 282920
2016-09-30 20:07:35 +00:00
Sanjay Patel f7b851fe84 [InstCombine] allow non-splat folds of select cond (ext X), C
llvm-svn: 282906
2016-09-30 19:49:22 +00:00
Dehao Chen a067b0926f Revert test change in r282894 as it's broken in some platforms.
llvm-svn: 282903
2016-09-30 19:25:23 +00:00
Gor Nishanov a263a60ad5 [Coroutines] Part15c: Fix coro-split to correctly handle definitions between coro.save and coro.suspend
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.

```
  %save = call token @llvm.coro.save(i8* null)
  %Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
  %suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
  switch i8 %suspend, label %exit [
    i8 0, label %await.ready
    i8 1, label %exit
  ]
await.ready:
  %val = load i32, i32* %Result.i19

```

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24418

llvm-svn: 282902
2016-09-30 19:24:19 +00:00
Sanjay Patel 75b2518762 [InstCombine] add tests for non-splat select(ext)
llvm-svn: 282901
2016-09-30 19:15:41 +00:00
Gor Nishanov c16219486a [Coroutines] Part15b: Fix dbg information handling in coro-split.
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.

We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24417

llvm-svn: 282899
2016-09-30 19:05:06 +00:00
Gor Nishanov 768de2c604 [Coroutines] Part 15a: Lower coro.subfn.addr in CoroCleanup
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24412

llvm-svn: 282897
2016-09-30 18:41:35 +00:00
Sanjay Patel b43712a513 clean up tests and auto-generate checks
llvm-svn: 282896
2016-09-30 18:37:34 +00:00
Michal Gorny b002f6370b cmake: Install the OCaml libraries into a more correct path
Add a OCAML_INSTALL_PATH variable that can be used to control
the install path for OCaml libraries. The new variable defaults to
${OCAML_STDLIB_PATH}, i.e. the OCaml library path obtained from
the OCaml compiler. Install libraries into "llvm" subdirectory.

This fixes two issues:

1. OCaml library directories differ between systems, and 'lib/ocaml' is
incorrect e.g. on amd64 Gentoo where OCaml is installed
in 'lib64/ocaml'. Therefore, obtain the library path from the OCaml
compiler using 'ocamlc -where' (which is already used to set
OCAML_STDLIB_PATH), which is the method used commonly in OCaml packages.

2. The top-level directory is reserved for the standard library, and has
precedence over local directory in search path. As a result, OCaml
preferred the files installed along with previous LLVM version over the
source tree when building a new version, resulting in two versions being
mixed during the build. The new layout is used commonly by other OCaml
packages, and findlib is able to find the LLVM libraries successfully.

Bug: https://bugs.gentoo.org/559134
Bug: https://bugs.gentoo.org/559624

Differential Revision: https://reviews.llvm.org/D24354

llvm-svn: 282895
2016-09-30 18:34:23 +00:00
Dehao Chen 977853b7c5 Update loop unroller cost model to make sure debug info does not affect optimization decisions.
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.

Reviewers: davidxl, mzolotukhin

Subscribers: haicheng, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D25098

llvm-svn: 282894
2016-09-30 18:30:04 +00:00
Sanjay Patel 9685b3eb57 [InstCombine] add tests for select X, (ext X), C
llvm-svn: 282891
2016-09-30 18:10:14 +00:00
Derek Schuff e9e6891b2d [WebAssembly] Make register stackification more conservative
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24942

llvm-svn: 282886
2016-09-30 18:02:54 +00:00
Etienne Bergeron 0ca0568604 [asan] Support dynamic shadow address instrumentation
Summary:
This patch is adding the support for a shadow memory with
dynamically allocated address range.

The compiler-rt needs to export a symbol containing the shadow
memory range.

This is required to support ASAN on windows 64-bits.

Reviewers: kcc, rnk, vitalybuka

Subscribers: zaks.anna, kubabrecka, dberris, llvm-commits, chrisha

Differential Revision: https://reviews.llvm.org/D23354

llvm-svn: 282881
2016-09-30 17:46:32 +00:00
Matthew Simpson 7808833e28 [LV] Build all scalar steps for non-uniform induction variables
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
2016-09-30 15:13:52 +00:00
Dylan McKay 309eba75b1 Revert "[RegAllocGreedy] Attempt to split unspillable live intervals"
It was accidentally committed.

llvm-svn: 282855
2016-09-30 14:05:15 +00:00
Dylan McKay 2a80cc688a [RegAllocGreedy] Attempt to split unspillable live intervals
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.

This patch changes this by attempting to split all live intervals before
performing recoloring.

This fixes LLVM bug PR14879.

I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.

Reviewers: qcolombet

Subscribers: MatzeB

Differential Revision: https://reviews.llvm.org/D25070

llvm-svn: 282852
2016-09-30 13:59:20 +00:00
Craig Topper 3f37a4180b Revert r282835 "[AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not."
Turns out this doesn't pass verify-machineinstrs.

llvm-svn: 282841
2016-09-30 05:35:42 +00:00
Craig Topper bc6e97b8f4 [AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not.
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.

I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.

llvm-svn: 282835
2016-09-30 04:31:33 +00:00
Piotr Padlewski d28694739c [thinlto] Don't decay threshold for hot callsites
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24976

llvm-svn: 282833
2016-09-30 03:01:17 +00:00
Matt Arsenault 5d8eb25e78 AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Reid Kleckner 147f91c88e [X86] Don't preserve Win64 SSE CSRs when SSE is disabled
Code that doesn't use floating point and doesn't use SSE (kernel code)
shouldn't save and restore SSE registers.

Fixes PR30503

llvm-svn: 282819
2016-09-30 00:17:49 +00:00
Kevin Enderby 4f229d867b Next set of additional error checks for invalid Mach-O files for the
load command that uses the MachO::entry_point_command type
but not used in llvm libObject code but used in llvm tool code.

This includes just the LC_MAIN load command.

llvm-svn: 282766
2016-09-29 21:07:29 +00:00
Reid Kleckner e45b2c7d8e [codeview] Use character types for all byte-sized integer types
The VS debugger doesn't appear to understand the 0x68 or 0x69 type
indices, which were probably intended for use on a platform where a C
'int' is 8 bits. So, use the character types instead. Clang was already
using the character types because '[u]int8_t' is usually defined in
terms of 'char'.

See the Rust issue for screenshots of what VS does:
https://github.com/rust-lang/rust/issues/36646

Fixes PR30552

llvm-svn: 282739
2016-09-29 17:55:01 +00:00
Kevin Enderby 245be3ed2a Next set of additional error checks for invalid Mach-O files for the
load command that uses the Mach::source_version_command type
but not used in llvm libObject code but used in llvm tool code.

This includes just the LC_SOURCE_VERSION load command.

llvm-svn: 282736
2016-09-29 17:45:23 +00:00
Kostya Serebryany a9b0dd0e51 [sanitizer-coverage/libFuzzer] make the guards for trace-pc 32-bit; create one array of guards per function, instead of one guard per BB. reorganize the code so that trace-pc-guard does not create unneeded globals
llvm-svn: 282735
2016-09-29 17:43:24 +00:00
Piotr Padlewski ba72b95f7b [thinlto] Add cold-callsite import heuristic
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006

Reviewers: tejohnson, mehdi_amini, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24940

llvm-svn: 282733
2016-09-29 17:32:07 +00:00
Simon Pilgrim d72330dd09 [X86] Add explicit test triple to make windows/msvc builds happier
llvm-svn: 282719
2016-09-29 15:10:09 +00:00
Craig Topper bd74f75619 [X86] Really fix the FileCheck line from r282690.
Why does Folded Spill comments print with a different number of # characters on different systems?

llvm-svn: 282693
2016-09-29 06:49:21 +00:00
Craig Topper 1b60e9d7c0 [AVX-512] Fix a check line from r282690.
llvm-svn: 282691
2016-09-29 06:37:21 +00:00
Craig Topper d875d6b9b4 [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available.
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.

Fixes one of the cases from PR29112.

llvm-svn: 282690
2016-09-29 06:07:09 +00:00
Craig Topper f91830e6ee [X86] Remove extra FileCheck lines that got left behind in r282688.
llvm-svn: 282689
2016-09-29 06:07:07 +00:00
Craig Topper 7eb0e7ce1f [AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition.
llvm-svn: 282688
2016-09-29 05:54:43 +00:00
Craig Topper e7f2611160 [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table.
llvm-svn: 282687
2016-09-29 05:54:39 +00:00
Craig Topper 7da0465062 [X86] Add 512-bit VPBROADCASTB and VPBROADCASTW tests.
llvm-svn: 282685
2016-09-29 05:54:32 +00:00
Craig Topper 816a1d7783 [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.
llvm-svn: 282684
2016-09-29 05:54:28 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Lei Liu 361615cfd0 AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model.  This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.

Reviewers: t.p.northover, peter.smith, rovka

Subscribers: salim.nasser, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24702

llvm-svn: 282661
2016-09-29 01:05:48 +00:00
Evgeny Stupachenko dc8a254663 Wisely choose sext or zext when widening IV.
Summary:
The patch fixes regression caused by two earlier patches D18777 and D18867.

Reviewers: reames, sanjoy

Differential Revision: http://reviews.llvm.org/D24280

From: Li Huang
llvm-svn: 282650
2016-09-28 23:39:39 +00:00
Kevin Enderby 76966bf066 Next set of additional error checks for invalid Mach-O files for the
load command that uses the Mach::rpath_command type
but not used in llvm libObject code but used in llvm tool code.

This includes just the LC_RPATH load command.

llvm-svn: 282649
2016-09-28 23:16:01 +00:00
Mike Aizatsky 392caa538d [sancov] introducing symbolized coverage files (.symcov)
Summary:
Answering any meaningful questions about .sancov files requires
accessing symbol information from the corresponding binary.

This change introduces a separate intermediate data structure and
format: symbolized coverage. It contains all symbol information that
is required to answer common queries:
- merging
- coverd/uncovered files and functions
- line status.

Also removing the html report functionality from sancov: generated
HTML files are too huge, and a different approach is required.
Maintaining this half-working approach in the C++ is painful.

Differential Revision: https://reviews.llvm.org/D24947

llvm-svn: 282639
2016-09-28 21:39:28 +00:00
Kevin Enderby 32359dbf6b Next set of additional error checks for invalid Mach-O files for the
other load commands that use the Mach::version_min_command type
but not used in llvm libObject code but used in llvm tool code.

This includes LC_VERSION_MIN_MACOSX, LC_VERSION_MIN_IPHONEOS,
LC_VERSION_MIN_TVOS and LC_VERSION_MIN_WATCHOS load commands.

llvm-svn: 282635
2016-09-28 21:20:45 +00:00
Krzysztof Parzyszek dcb1bcae0b IfConversion: Add implicit uses for redefined regs with live subregisters
Normally, if conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.

llvm-svn: 282626
2016-09-28 20:07:41 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Sanjay Patel 3e9e5ccf7c [InstCombine] update to use FileCheck
Also, remove unnecessary function attributes, parameters, and comments.
It looks like at least some of these tests are not minimal though...

llvm-svn: 282620
2016-09-28 19:10:16 +00:00
Simon Pilgrim fea5c7a051 [X86][AVX] Add test showing that VBROADCAST loads don't correctly respect dependencies
llvm-svn: 282613
2016-09-28 17:59:30 +00:00
Artur Pilipenko b6ce6e5dac Don't look through addrspacecast in GetPointerBaseWithConstantOffset
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.

This is similar to D13008.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D24729

llvm-svn: 282612
2016-09-28 17:57:16 +00:00
Adrian Prantl 7f5866c227 Teach LiveDebugValues about lexical scopes.
This addresses PR26055 LiveDebugValues is very slow.

Contrary to the old LiveDebugVariables pass LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.

For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:

void sink(int a);
void __attribute((always_inline)) f(int a) { sink(a); }
void foo(int i) {
   f(i);
   if (i)
     f(i);
   f(i);
}

This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.

The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.

https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!

llvm-svn: 282611
2016-09-28 17:51:14 +00:00
Artem Belevich 3e1211581c [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Marina Yatsina 76bfc6670b [x86] Accept 'retn' as an alias to 'ret[lqw]'\'ret' (At&t\Intel)
Implement 'retn' simply by aliasing it to the relevant 'ret' instruction

Commit on behalf of coby

Differential Revision: https://reviews.llvm.org/D24346

llvm-svn: 282601
2016-09-28 15:52:56 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Sanjay Patel 220a8730fb [InstSimplify] allow or-of-icmps folds with vector splat constants
llvm-svn: 282592
2016-09-28 14:27:21 +00:00
Sanjay Patel a8f9e57c74 [InstSimplify] add vector splat tests for or-of-icmps
llvm-svn: 282591
2016-09-28 14:17:35 +00:00
Sanjay Patel 1b312ad42d [InstSimplify] allow and-of-icmps folds with vector splat constants
llvm-svn: 282590
2016-09-28 13:53:13 +00:00
Guy Blank 2bdc74a471 [X86][FastISel] Use a COPY from K register to a GPR instead of a K operation
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
but, turns out that the opposite case of KORTEST using a GPR is now happening

The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.

Differential Revision: https://reviews.llvm.org/D24953

llvm-svn: 282580
2016-09-28 11:22:17 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Adam Nemet c507ac96f5 [Inliner] Port all opt remarks to new streaming API
llvm-svn: 282559
2016-09-27 23:47:03 +00:00
Adam Nemet 0427909434 Pass -S to opt in this test to avoid printing binary on mismatch
The purpose of the test is to verify diagnostics.

llvm-svn: 282558
2016-09-27 23:46:59 +00:00
Kevin Enderby 3e490ef94e Next set of additional error checks for invalid Mach-O files for the
other load commands that use the MachO::dylinker_command type
but not used in llvm libObject code but used in llvm tool code.

This includes LC_ID_DYLINKER, LC_LOAD_DYLINKER
and LC_DYLD_ENVIRONMENT load commands.

llvm-svn: 282553
2016-09-27 23:24:13 +00:00
Sanjay Patel 764ae8bd72 [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Geoff Berry b124331db7 [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Adam Nemet 1142147e41 [Inliner] Fold the analysis remark into the missed remark
There is really no reason for these to be separate.

The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed.  There,
you have to query analysis to get the full picture.

I think we should just explain the reason for missing the optimization
in the missed remark when possible.  Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.

llvm-svn: 282542
2016-09-27 21:58:17 +00:00
Michael Zolotukhin 1a554be3b6 [LoopSimplify] When simplifying phis in loop-simplify, do it only if it preserves LCSSA form.
llvm-svn: 282541
2016-09-27 21:03:45 +00:00
Adam Nemet a62b7e1a28 Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace.  GCC was complaining about this.)

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282539
2016-09-27 20:55:07 +00:00
Keith Walker 83ebef5db3 Propagate DBG_VALUE entries when there are unvisited predecessors
Variables are sometimes missing their debug location information in
blocks in which the variables should be available. This would occur
when one or more predecessor blocks had not yet been visited by the
routine which propagated the information from predecessor blocks.

This is addressed by only considering predecessor blocks which have
already been visited.

The solution to this problem was suggested by Daniel Berlin on the
LLVM developer mailing list.

Differential Revision: https://reviews.llvm.org/D24927

llvm-svn: 282506
2016-09-27 16:46:07 +00:00
Adam Nemet cc2a3fa8e8 Revert "Output optimization remarks in YAML"
This reverts commit r282499.

The GCC bots are failing

llvm-svn: 282503
2016-09-27 16:39:24 +00:00
Adam Nemet 92e928c10a Output optimization remarks in YAML
This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282499
2016-09-27 16:15:16 +00:00
Simon Dardis d2ed8abb15 [mips] Disable tail calls temporarily
Disable tail calls while the remaining bugs are fixed. Enable only for tests.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24912

llvm-svn: 282487
2016-09-27 13:15:54 +00:00
Simon Dardis 0486d585c5 [mips] Add rsqrt, recip for MIPS
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.

Reviewers: vkalintiris, zoran.jovanoic

Differential Review: https://reviews.llvm.org/D24499

llvm-svn: 282485
2016-09-27 12:25:15 +00:00
Nemanja Ivanovic 6f22b41398 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Craig Topper 71f1c64320 [X86] Add test case for PR30511 and r282341.
llvm-svn: 282473
2016-09-27 06:44:30 +00:00
Craig Topper 4ffe5d5af0 [X86] Expand all-ones-vector test to cover 256-bit and 512-bit vectors.
llvm-svn: 282472
2016-09-27 06:44:27 +00:00
Kostya Serebryany 45c144754b [sanitizer-coverage] fix a bug in trace-gep
llvm-svn: 282467
2016-09-27 01:55:08 +00:00
Kostya Serebryany 186d61801c [sanitizer-coverage] don't emit the CTOR function if nothing has been instrumented
llvm-svn: 282465
2016-09-27 01:08:33 +00:00
Ivan Krasin 4ff4f21e15 Revert r277556. Add -lowertypetests-bitsets-level to control bitsets generation
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.

Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.

Reviewers: pcc

Subscribers: kcc

Differential Revision: https://reviews.llvm.org/D24948

llvm-svn: 282461
2016-09-27 00:29:53 +00:00
Davide Italiano a9f85d68cc [CodeGen] Add support for emitting .init_array instead of .ctors on FreeBSD.
PR: 30494
llvm-svn: 282451
2016-09-26 22:53:15 +00:00
Davide Italiano f5d77f4aad [CodeGen] Switch test as FreeBSD will support .init_array soon.
llvm-svn: 282450
2016-09-26 22:38:17 +00:00
Derek Schuff 92d300eb8f [WebAssembly] Use the frame pointer instead of the stack pointer
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24889

llvm-svn: 282442
2016-09-26 21:18:03 +00:00
Kevin Enderby 90986e6c7c Next set of additional error checks for invalid Mach-O files for the
other load commands that use the Mach::linkedit_data_command type
but not used in llvm libObject code but used in llvm tool code.

This includes LC_FUNCTION_STARTS, LC_SEGMENT_SPLIT_INFO
and LC_DYLIB_CODE_SIGN_DRS load commands.

llvm-svn: 282441
2016-09-26 21:11:03 +00:00
Piotr Padlewski d9830eb79f [thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.

I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.

I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.

Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.

Reviewers: eraman, mehdi_amini, tejohnson

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24638

llvm-svn: 282437
2016-09-26 20:37:32 +00:00
Nirav Dave 6477ce2697 Add support for Code16GCC
[X86] The .code16gcc directive parses X86 assembly input in 32-bit mode and
outputs in 16-bit mode. Teach parser to switch modes appropriately.

Reviewers: dwmw2, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20109

llvm-svn: 282430
2016-09-26 19:33:36 +00:00
Evandro Menezes 055767d5f4 [AArch64] Fix test triplet
Specify proper target triplet to pass under Windows too.

llvm-svn: 282423
2016-09-26 18:09:21 +00:00
Tom Stellard 1b9748c6a2 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Daniel Berlin 1e98c04226 Remove pruning of phi nodes in MemorySSA - it makes updating harder
Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24923

llvm-svn: 282419
2016-09-26 17:22:54 +00:00
Matthew Simpson b764aba2ab [LV] Scalarize instructions marked scalar after vectorization
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.

Differential Revision: https://reviews.llvm.org/D23889

llvm-svn: 282418
2016-09-26 17:08:37 +00:00
Gor Nishanov bc0ebb383c [Coroutines] Part14: Handle coroutines with no suspend points.
Summary:
If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function.

Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24408

llvm-svn: 282414
2016-09-26 15:49:28 +00:00
Geoff Berry 256fcf975f [AArch64] Improve add/sub/cmp isel of uxtw forms.
Don't match the UXTW extended reg forms of ADD/ADDS/SUB/SUBS if the
32-bit to 64-bit zero-extend can be done for free by taking advantage
of the 32-bit defining instruction zeroing the upper 32-bits of the X
register destination.  This enables better instruction selection in a
few cases, such as:

  sub x0, xzr, x8
  instead of:
  mov x8, xzr
  sub x0, x8, w9, uxtw

  madd x0, x1, x1, x8
  instead of:
  mul x9, x1, x1
  add x0, x9, w8, uxtw

  cmp x2, x8
  instead of:
  sub x8, x2, w8, uxtw
  cmp x8, #0

  add x0, x8, x1, lsl #3
  instead of:
  lsl x9, x1, #3
  add x0, x9, w8, uxtw

Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24747

llvm-svn: 282413
2016-09-26 15:34:47 +00:00
Evandro Menezes e45de8a5ec Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.

This patch considers a limit that a target may set to the number of
destination addresses in a jump table.

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
Alexey Bataev 793c946ecb [InstCombine] Fixed bug introduced in r282237
The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.

llvm-svn: 282401
2016-09-26 13:18:59 +00:00
Andrea Di Biagio a82d52d11d [InstCombine] Teach the udiv folding logic how to handle constant expressions.
This patch fixes PR30366.

Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.

This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.

Differential Revision: https://reviews.llvm.org/D24565

llvm-svn: 282398
2016-09-26 12:07:23 +00:00
Sam Kolton 984461062f Revert "[AMDGPU] Disassembler: print label names in branch instructions"
This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550.

llvm-svn: 282396
2016-09-26 11:29:03 +00:00
Sam Kolton 1559f76257 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 282394
2016-09-26 10:05:50 +00:00
James Molloy 9abb2fa5bb [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282387
2016-09-26 07:26:24 +00:00
Zvi Rackover 839d15a194 [X86] Optimization for replacing LEA with MOV at frame index elimination time
Summary:
Replace a LEA instruction of the form 'lea (%esp), %ebx' --> 'mov %esp, %ebx'

MOV is preferable over LEA because usually there are more issue-slots available to execute MOVs than LEAs. Latest processors also support zero-latency MOVs.

Fixes pr29022.

Reviewers: hfinkel, delena, igorb, myatsina, mkuper

Differential Revision: https://reviews.llvm.org/D24705

llvm-svn: 282385
2016-09-26 06:42:07 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper 60d3ef1d72 [AVX-512] Fix some patterns predicates to properly enforce priority for various versions of CVTDQ2PD instruction.
llvm-svn: 282358
2016-09-25 16:34:02 +00:00
Craig Topper d8b2bd492c [AVX-512] Add the scalar unsigned integer to fp conversion instructions to hasUndefRegUpdate.
llvm-svn: 282356
2016-09-25 16:33:57 +00:00
Sanjay Patel 752ad8fde7 [x86] don't try to create a vector integer inst for an SSE1 target (PR30512)
This bug was introduced with:
http://reviews.llvm.org/rL272511

We need to restrict the lowering to v4f32 comparisons because that's all SSE1 can handle.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=28044

llvm-svn: 282336
2016-09-24 20:24:06 +00:00
Sanjay Patel 0b36337d61 [x86] fix FCOPYSIGN lowering to create constants instead of ConstantPool loads
This is similar to:
https://reviews.llvm.org/rL279958

By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.

We should also be able to extend this code to handle vectors.

llvm-svn: 282312
2016-09-23 23:17:29 +00:00
Petr Hosek 85b2f67613 [MC] Support .ds directives in assembler parser
These directives are already supported by GNU assembler.

Differential Revision: https://reviews.llvm.org/D24740

llvm-svn: 282303
2016-09-23 21:53:36 +00:00
Matthias Braun 729c989083 llc: Add -start-before/-stop-before options
Differential Revision: https://reviews.llvm.org/D23089

llvm-svn: 282302
2016-09-23 21:46:02 +00:00
Teresa Johnson 896fee2846 [gold] Split plugin options controlling ThinLTO and codegen parallelism.
Summary:
As suggested in D24826, use different options for ThinLTO backend
parallelism from the option controlling regular LTO code gen
parallelism. They are already split in the LTO API, and this enables
controlling them with different clang options.

Reviewers: pcc, mehdi_amini

Subscribers: dexonsmith, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24873

llvm-svn: 282290
2016-09-23 20:35:19 +00:00
Petr Hosek 4cb08ce3bc [MC] Support .dcb directives in assembler parser
These directives are already supported by GNU assembler.

Differential Revision: https://reviews.llvm.org/D24741

llvm-svn: 282283
2016-09-23 19:25:15 +00:00
Vedant Kumar 4588088050 [llvm-cov] Filter away source files that aren't in the coverage mapping
... so that they don't show up in the index. This came up because polly
contains a .git directory and some other unmapped input in its source
dir.

llvm-svn: 282282
2016-09-23 18:57:35 +00:00
Vedant Kumar bc6479850e [llvm-cov] Get rid of all invalid filename references
We used to append filenames into a vector of std::string, and then
append a reference to each string into a separate vector. This made it
easier to work with the getUniqueSourceFiles API. But it's buggy.

std::string has a small-string optimization, so you can't expect to
capture a reference to one if you're copying it into a growing vector.
Add a test that triggers this invalid reference to std::string scenario,
and kill the issue with fire by just using ArrayRef<std::string>
everywhere.

llvm-svn: 282281
2016-09-23 18:57:32 +00:00
Sanjay Patel 04949faf69 [TLI] isdigit / isascii / toascii param type should match return type (PR30484)
We crash in LibCallSimplifier if we don't check the validity of the function signature properly.

llvm-svn: 282278
2016-09-23 18:44:09 +00:00
Matthias Braun 1acb55e67c ScheduleDAG: Match enum names when printing sdep kinds
It is less confusing to have the same names in the debug print as the
enum members.

llvm-svn: 282273
2016-09-23 18:28:31 +00:00
Jun Bum Lim 3822939ba7 Enhance calcColdCallHeuristics for InvokeInst
Summary: When identifying cold blocks, consider only the edge to the normal destination if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst.

Reviewers: mcrosier, hfinkel, davidxl

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D24868

llvm-svn: 282262
2016-09-23 17:26:14 +00:00
James Molloy 85124c76fc Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r282241. It caused http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19882.

llvm-svn: 282249
2016-09-23 13:35:43 +00:00
Nemanja Ivanovic d2c3c51a70 [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.

llvm-svn: 282246
2016-09-23 13:25:31 +00:00
James Molloy 1ce54d6be2 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282241
2016-09-23 12:15:58 +00:00
George Rimar 4f82df52ae Revert r282238 "Revert r282235 "[llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section.""
Build bot issues (http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15856/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Adwarfdump-dump-gdbindex.test)
should be fixed in that version. Issue was that MSVS does not support "%zu". Though it works fine on MSCS 2015,
Bot looks running MSVS 2013 that does not like it. MSDN also says that "z" prefix is not supported: https://msdn.microsoft.com/en-us/library/tcxf1dw6.aspx
I had to use PRId64 instead.

Original commit message:

[llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section.

gold linker's --gdb-index option currently is able to create the .gdb_index section that allows GDB to locate and read the .dwo files as it needs them,
this helps reduce the total size of the object files processed by the linker.

More info about that:
https://gcc.gnu.org/wiki/DebugFission
https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html

Patch teaches dwarfdump tool to dump this section.

Differential revision: https://reviews.llvm.org/D21503

llvm-svn: 282239
2016-09-23 11:01:53 +00:00
George Rimar a348527186 Revert r282235 "[llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section."
It broke BB:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15856

llvm-svn: 282238
2016-09-23 10:12:56 +00:00
Alexey Bataev fee9078dcd [InstCombine] Fix for PR29124: reduce insertelements to shufflevector
If inserting more than one constant into a vector:

define <4 x float> @foo(<4 x float> %x) {
  %ins1 = insertelement <4 x float> %x, float 1.0, i32 1
  %ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
  ret <4 x float> %ins2
}

InstCombine could reduce that to a shufflevector:

define <4 x float> @goo(<4 x float> %x) {
 %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
 ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1

Differential Revision: https://reviews.llvm.org/D24182

llvm-svn: 282237
2016-09-23 09:14:08 +00:00
George Rimar a77bcf5e42 [llvm-dwarfdump] - Teach dwarfdump to dump gdb-index section.
gold linker's --gdb-index option currently is able to create the .gdb_index section that allows GDB to locate and read the .dwo files as it needs them,
this helps reduce the total size of the object files processed by the linker.

More info about that:
https://gcc.gnu.org/wiki/DebugFission
https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html

Patch teaches dwarfdump tool to dump this section.

Differential revision: https://reviews.llvm.org/D21503

llvm-svn: 282235
2016-09-23 09:09:26 +00:00
Tom Stellard e88bbc34c6 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Petr Hosek 2f4ac44dc3 [MC] Support skip and count for .incbin directive
These optional arguments are supported by GNU assembler.

Differential Revision: https://reviews.llvm.org/D24714

llvm-svn: 282217
2016-09-23 00:41:06 +00:00
Sanjay Patel 30ef70b090 [InstCombine] fold X urem C -> X < C ? X : X - C when C is big (PR28672)
We already have the udiv variant of this transform, so I think this is ok for 
InstCombine too even though there is an increase in IR instructions. As the 
tests and TODO comments show, the transform can lead to follow-on combines.

This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672

Differential Revision: https://reviews.llvm.org/D24527

llvm-svn: 282209
2016-09-22 22:36:26 +00:00
Vedant Kumar 1ce90d889a [llvm-cov] Add the ability to specify directories of input source files
We've supported restricting coverage reports to a set of files for a
long time. Add support for being able to restrict by entire directories.

I suppose this supersedes D20803.

llvm-svn: 282202
2016-09-22 21:49:43 +00:00
Hans Wennborg c7957ef86c Revert r282168 "GVN-hoist: fix store past load dependence analysis (PR30216)"
and also the dependent r282175 "GVN-hoist: do not dereference null pointers"

It's causing compiler crashes building Harfbuzz (PR30499).

llvm-svn: 282199
2016-09-22 21:20:53 +00:00
Arnold Schwaighofer 0fd32c005b i386 does not support optimized swifterror handling
rdar://28432565

llvm-svn: 282186
2016-09-22 20:06:25 +00:00
Hans Wennborg c4b1d20ba2 Win64: Don't emit unwind info for "leaf" functions (PR30337)
According to MSDN (see the PR), functions which don't touch any callee-saved
registers (including %rsp) don't need any unwind info.

This patch makes LLVM not emit unwind info for such functions, to save
binary size.

Differential Revision: https://reviews.llvm.org/D24748

llvm-svn: 282185
2016-09-22 19:50:05 +00:00
Nemanja Ivanovic 8dacca943a [PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.

llvm-svn: 282182
2016-09-22 19:06:38 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Krzysztof Parzyszek b66efb855c [PPC] Set SP after loading data from stack frame, if no red zone is present
Follow-up to r280705: Make sure that the SP is only restored after all data
is loaded from the stack frame, if there is no red zone.

This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24466

llvm-svn: 282174
2016-09-22 17:22:43 +00:00
Sebastian Pop 8e6e3318c2 GVN-hoist: fix store past load dependence analysis (PR30216)
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.

The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().

Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.

Differential Revision: https://reviews.llvm.org/D24517

llvm-svn: 282168
2016-09-22 15:33:51 +00:00
Sebastian Pop f6bfc1ad8b GVN-hoist: move hoist testcase to GVNHoist dir
llvm-svn: 282161
2016-09-22 14:45:46 +00:00
Sebastian Pop 440f15b7fc GVN-hoist: only hoist relevant scalar instructions
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.

A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:

  if (isa<TerminatorInst>(I))
    return false;

is better than listing all the other less frequent instructions.

Differential Revision: https://reviews.llvm.org/D23929

llvm-svn: 282160
2016-09-22 14:45:40 +00:00
Tim Northover a5e38fa00d GlobalISel: handle stack-based parameters on AArch64.
llvm-svn: 282153
2016-09-22 13:49:25 +00:00
Anna Thomas 82c3717f54 [RS4GC] Remat in presence of phi and use live value
Summary:

Reviewers:

Subscribers:

llvm-svn: 282150
2016-09-22 13:13:06 +00:00
Artem Tamazov 2146a0a90e [AMDGPU][mc] Add support for absolute expressions in DPP modifiers.
Also added range checking for DPP attributes.
Assembler tests added as well.

Differential Revision: https://reviews.llvm.org/D24755

llvm-svn: 282145
2016-09-22 11:47:21 +00:00
Nemanja Ivanovic 6e7879c5e6 [Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.

llvm-svn: 282143
2016-09-22 09:52:19 +00:00
Sagar Thakur e74eb4e7b9 [EfficiencySanitizer] Using '$' instead of '#' for struct counter name
For MIPS '#' is the start of comment line. Therefore we get assembler errors if # is used in the structure names.

Differential: D24334
Reviewed by: zhaoqin

llvm-svn: 282141
2016-09-22 08:33:06 +00:00
Dorit Nuzman d1247a684e Fix revision 281960
llvm-svn: 282139
2016-09-22 07:56:23 +00:00
Craig Topper 202b453a8a [AVX-512] Add support for commuting VPTERNLOG instructions.
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.

We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.

This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction.

llvm-svn: 282132
2016-09-22 03:00:50 +00:00
Kevin Enderby e71e13c7d6 Next set of additional error checks for invalid Mach-O files for bad LC_UUID
load commands.  Added a missing check and made the check for more than
one like other other “more than one” checks.  And of course added test cases.

llvm-svn: 282104
2016-09-21 20:03:09 +00:00
Chad Rosier 00eb8db3a1 [LoopInterchange] Track all dependencies, not just anti dependencies.
Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.

This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.

This improves an internal workload by 2.2x.

Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'

Differential Revision: https://reviews.llvm.org/D24564

llvm-svn: 282101
2016-09-21 19:16:47 +00:00
Teresa Johnson 3f212b8908 [ThinLTO] Emit files for distributed builds for all modules
With the new LTO API in r278338, we stopped emitting the individual
index files and imports files for some modules in the distributed backend
case (thinlto-index-only plugin option).

Specifically, this is when the linker decides not to include a module in the
link, because it was in an archive library and did not have a strong
reference to it. Not creating the expected output files makes the
distributed build system implementation more difficult, in terms of
checking for the expected outputs of the thin link, and scheduling the
backend jobs. To address this, the gold-plugin will write dummy empty
.thinlto.bc and .imports files for modules not included in the link
(which LTO never sees).

Augmented a gold v1.12+ test, since that version of gold has the handling
for notifying on modules not being included in the link.

llvm-svn: 282100
2016-09-21 19:12:05 +00:00
Nico Weber a489438849 revert 281908 because 281909 got reverted
llvm-svn: 282097
2016-09-21 18:25:43 +00:00
Arnold Schwaighofer de2490d0dc Disable tail calls if there is an swifterror argument
ISel does not handle them correctly yet i.e we crash trying to emit tail call
code.

radar://28407842

llvm-svn: 282088
2016-09-21 16:53:36 +00:00
Matthew Simpson 15869f86d8 [LV] Don't emit unused scalars for uniform instructions
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.

Differential Revision: https://reviews.llvm.org/D24275

llvm-svn: 282087
2016-09-21 16:50:24 +00:00
Artem Tamazov 2e217b87cb [AMDGPU][mc] Add support for ds_add_[rtn_]f32.
Lit tests added.
Resolves https://github.com/RadeonOpenCompute/hcc/issues/122.

Differential Revision: https://reviews.llvm.org/D24765

llvm-svn: 282086
2016-09-21 16:35:44 +00:00
Dehao Chen 160fbc3f95 Change the basic block weight calculation algorithm to use max instead of voting.
Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight.

Reviewers: dnovillo

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24788

llvm-svn: 282084
2016-09-21 16:26:51 +00:00
Cameron McInally 2aa85e1210 [AVX512] Fix return types on int_x86_avx512_gatherXXX_di intrinsics
The return type should match the pass through vector type.

Differential Revision: https://reviews.llvm.org/D24744

llvm-svn: 282081
2016-09-21 16:06:10 +00:00
Hans Wennborg 1049085c78 Revert r281895 "Add @llvm.dbg.value entries for the phi node created by -mem2reg"
(And follow-up r281964.)

It caused PR30468.

llvm-svn: 282077
2016-09-21 15:55:53 +00:00
Nico Weber 903859c0e4 Revert r281715, it caused PR30475
llvm-svn: 282076
2016-09-21 15:33:24 +00:00
Arnold Schwaighofer f62ba1031f DeadArgElim: Don't mark swifterror arguments as unused
Replacing swifterror arguments with undef creates invalid IR.

rdar://28300490

llvm-svn: 282075
2016-09-21 15:29:08 +00:00
Tim Northover 9a46718378 GlobalISel: produce correct code for signext/zeroext ABI flags.
We still don't really have an equivalent of "AssertXExt" in DAG, so we don't
exploit the guarantees on the receiving side yet, but this should produce
conservatively correct code on iOS ABIs.

llvm-svn: 282069
2016-09-21 12:57:45 +00:00
Simon Dardis 9a66bbecae [mips] LLVM PR/30197 - Tail call incorrectly clobbers arguments for mips
The postRA scheduler performs alias analysis to determine if stores and loads
can moved past each other. When a function has more arguments than argument
registers for the calling convention used, excess arguments are spilled onto the
stack. LLVM by default assumes that argument slots are immutable, unless the
function contains a tail call. Without the knowledge of that a function contains
a tail call site, stores and loads to fixed stack slots may be re-ordered
causing the out-going arguments to clobber the incoming arguments before the
incoming arguments are supposed to be dead.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24077

llvm-svn: 282063
2016-09-21 09:43:40 +00:00
Diana Picus 2a3f066349 Revert "AArch64: Set shift bit of TLSLE HI12 add instruction"
This reverts commit r282057 because it broke the buildbots - see e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-42vma/builds/12063

llvm-svn: 282058
2016-09-21 08:24:41 +00:00
Lei Liu 6c87f23526 AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model.  This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.

Reviewers: t.p.northover, peter.smith, rovka

Subscribers: salim.nasser, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24702

llvm-svn: 282057
2016-09-21 07:41:41 +00:00
NAKAMURA Takumi 0d57392ef3 llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.
LLVM ERROR: Cannot select: 0x3607bf0: i32 = ExternalSymbol'__powidf2'

llvm-svn: 282053
2016-09-21 04:43:11 +00:00
Adam Nemet e3cef93727 [LV] When reporting about a specific instruction without debug location use loop's
This can occur for example if some optimization drops the debug location.

llvm-svn: 282048
2016-09-21 03:14:20 +00:00
Jacques Pienaar 98345fc0a1 [NVPTX] Check if callsite is defined when computing argument allignment
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.

Reviewers: eliben, jpienaar

Subscribers: jlebar, vchuravy, cfe-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D9168

llvm-svn: 282045
2016-09-21 01:57:57 +00:00
Michael Kuperstein 79dcc274b4 [InferAttributes] Don't access parameters that don't exist.
Check for the correct number of parameters before querying their type.
This fixes PR30455.

llvm-svn: 282038
2016-09-20 23:10:31 +00:00
Teresa Johnson 620c140a9b [ThinLTO] Always emit a summary when compiling in ThinLTO mode
Summary:
Emit an empty summary section, instead of no summary section, when
there are no global variables in the index. This ensures that LTO
will treat these files as ThinLTO inputs, instead of as regular
LTO inputs.

In addition to not being what the user likely intended when
compiling with -flto=thin, the current behavior is problematic for
distributed build systems that expect to get ThinLTO index and imports
files back for each input compiled with -flto=thin. Combining into
a single regular LTO module also reduces the backend parallelism.
And in the case where the index was suppressed due to uses in
inline assembly, combining into a single LTO module could provoke
renaming of duplicates that we were trying to prevent by suppressing
the index.

This change required a couple of fixes to handle the empty summary
section.

Reviewers: mehdi_amini

Subscribers: mehdi_amini, llvm-commits, pcc

Differential Revision: https://reviews.llvm.org/D24779

llvm-svn: 282037
2016-09-20 23:07:17 +00:00
Vedant Kumar e90797794f [llvm-cov] Demangle names for hidden instantiation views
llvm-svn: 282020
2016-09-20 21:27:48 +00:00
Xinliang David Li deda33cdbd [Profile] dump ic value profile value/site-count histogram
Differential Revision: http://reviews.google.com/D24783

llvm-svn: 282017
2016-09-20 21:04:22 +00:00
Petr Hosek 1290355f5a Mark ELF sections whose name start with .note as note
Previously, such section would be marked as SHT_PROGBITS which
makes it impossible to use an initialized C variable declaration
to emit an (allocated) ELF note. The new behavior is also consistent
with ELF assembly parser.

Differential Revision: https://reviews.llvm.org/D24692

llvm-svn: 282010
2016-09-20 20:21:13 +00:00
Kevin Enderby fc0929adb8 Next set of additional error checks for invalid Mach-O files for bad load commands
that use the Mach::dylib_command type for the load commands that are
currently used in the MachOObjectFile constructor.

This contains the missing checks for LC_ID_DYLIB, LC_ID_DYLIB, etc.
load commands and the fields for the Mach::dylib_command type.

Also checks that only an MH_DYLIB or MH_STUB_DYLIB has an
LC_ID_DYLIB load command (and others filetype don’t) and there
is not more than one of these load commands.

llvm-svn: 282008
2016-09-20 20:14:14 +00:00
Sanjay Patel db3438abcb [x86] split up tests, regenerate checks
Note that we fail to eliminate 'or' with 0!

llvm-svn: 282005
2016-09-20 19:31:30 +00:00
Evandro Menezes ba4926efde Revert "[AArch64] Use the reciprocal estimation machinery"
This reverts commit b7d42b0048f65346e9fa37fb65defeea7ce8c337 per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).

llvm-svn: 282000
2016-09-20 19:02:06 +00:00
Evandro Menezes 61a1273d27 Revert "[AArch64] Properly validate the reciprocal estimation."
This reverts commit ad8ca1528242e2a4cb363e3779309e70eb7a430e per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).

llvm-svn: 281999
2016-09-20 19:02:02 +00:00
Adrian Prantl 12fa3b3911 ASAN: Don't drop debug info attachements for global variables.
This is a follow-up to r281284. Global Variables now can have
!dbg attachements, so ASAN should clone these when generating a
sanitized copy of a global variable.

<rdar://problem/24899262>

llvm-svn: 281994
2016-09-20 18:28:42 +00:00
Adrian McCarthy c64acfd4c2 Emit S_COMPILE3 CodeView record
CodeView has an S_COMPILE3 record to identify the compiler and source language of the compiland.  This record comes first in the debug$S section for the compiland. The debuggers rely on this record to know the source language of the code.

There was a little test fallout from introducing a new record into the symbols subsection.

Differential Revision: https://reviews.llvm.org/D24317

llvm-svn: 281990
2016-09-20 17:20:51 +00:00
Saleem Abdulrasool 03ffa797ad X86: loosen an overly aggressive MachO assertion
We would assert that the FP setup CFI used esp/rsp always.  This held up in
practice when the code was generated from IR.  However, with the integrated
assembler, it is possible to have the input be user specified assembly.  In such
a case, we cannot assume that the function implementation has a compact unwind
representation.  Loosen the assertion into a check and bail if we cannot
represent the frame pointer in the compact unwinding.

Addresses PR30453!

llvm-svn: 281986
2016-09-20 17:05:04 +00:00
Tim Northover b18ea162df GlobalISel: split aggregates for PCS lowering
This should match the existing behaviour for passing complicated struct and
array types, in particular HFAs come through like that from Clang.

For C & C++ we still need to somehow support all the weird ABI flags, or at
least those that are present in the IR (signext, byval, ...), and stack-based
parameter passing.

llvm-svn: 281977
2016-09-20 15:20:36 +00:00
Simon Pilgrim 1c38e1189f [X86][SSE] Regenerate multiple combine tests
llvm-svn: 281973
2016-09-20 14:42:45 +00:00
Artem Tamazov 2e721aa8c8 [AMDGPU][mc] Add regression tests for Bug 28168
llvm-svn: 281967
2016-09-20 11:58:40 +00:00
Elena Demikhovsky d3ff7c288b AVX-512: Fixed a bug in lowering saturated operations on KNL.
The generated code is still not optimal.

Differential Revision: https://reviews.llvm.org/D24723

llvm-svn: 281966
2016-09-20 11:02:26 +00:00
Dorit Nuzman 02efef0525 Reverting revision 281960 due to test failures.
llvm-svn: 281961
2016-09-20 08:27:48 +00:00
Dorit Nuzman d3686e5269 [SROA] Preserve llvm.mem.parallel_loop_access metadata.
SROA doesn't preserve the llvm.mem.parallel_loop_access metadata when it
transforms loads/stores. This patch fixes a couple occurences of this
issue.

(Partially addresses PR28981).

Differential Revision: https://reviews.llvm.org/D23549

llvm-svn: 281960
2016-09-20 07:50:49 +00:00
Craig Topper 9820e341f9 [AVX-512] Use 512-bit vcvtps2ph/vcvtph2ps to implement fp_to_f16/f16_to_fp when F16C and VLX are not supported.
Fixes PR23941.

llvm-svn: 281958
2016-09-20 05:44:47 +00:00
Matthias Braun 4040c0f4ec BranchFolder: Fix invalid undef flags after merge.
It is legal to merge instructions with different undef flags; However we
must drop the undef flag from the merged instruction if it isn't present
everywhere.

This fixes http://llvm.org/PR30199

llvm-svn: 281957
2016-09-20 01:14:42 +00:00
Kostya Serebryany 06694d0a2f [sanitizer-coverage] add comdat to coverage guards if needed
llvm-svn: 281952
2016-09-20 00:16:54 +00:00
Sanjay Patel e9ac83dbe3 [x86] auto-generate checks
llvm-svn: 281950
2016-09-19 23:44:50 +00:00
Eric Christopher bf9a7c43b0 Move the armv8.1-a ras test to a negative with noras test as ras is
included in armv8.1-a by default and so we weren't testing anything.

llvm-svn: 281941
2016-09-19 21:55:04 +00:00
Mehdi Amini 7cf6382657 BitcodeWriter: fix emission of invoke when calling a var-arg function with operand bundles
llvm-svn: 281940
2016-09-19 21:27:04 +00:00
Simon Pilgrim 233374c4d1 [X86][SSE] Updated vector abs tests
Renamed and added v2i64 / v4i64 tests

llvm-svn: 281937
2016-09-19 20:50:35 +00:00
Dehao Chen 20866ed57e Handle early inline for hot callsites that reside in the same basic block.
Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and use the hotness for all callsites in that basic block for early inline decisions. It also fixes the test to add "-S" so theat the "CHECK-NOT" is actually checking the content.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24734

llvm-svn: 281927
2016-09-19 18:38:14 +00:00
Matthias Braun e1d9628bba LiveRangeCalc: Fix reporting of invalid vreg usage in liveness calculation
Machine programs need a definition of each vreg before reaching a use
(the definition may come from an IMPLICIT_DEF instruction). This class
of errors is not detected by the MachineVerifier because of efficiency
concerns. LiveRangeCalc used to report these problems, make it do that
again (followup to r279625).

Also use report_fatal_error() instead of llvm_unreachable() as the error
reporting is only present in asserts build anyway.

llvm-svn: 281914
2016-09-19 16:49:45 +00:00
Dehao Chen 82667d04c5 Only set branch weight during sample pgo annotation when max_weight of the branch is non-zero. Otherwise use default static profile to set branch probability.
Summary: It does not make sense to set equal weights for all unkown branches as we have static branch prediction available.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24732

llvm-svn: 281912
2016-09-19 16:33:41 +00:00
Dehao Chen 38e3731c47 Use call target count to derive the call instruction weight
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24410

llvm-svn: 281911
2016-09-19 16:06:37 +00:00
Etienne Bergeron 6ba5176862 [asan] Support dynamic shadow address instrumentation
Summary:
This patch is adding the support for a shadow memory with
dynamically allocated address range.

The compiler-rt needs to export a symbol containing the shadow
memory range.

This is required to support ASAN on windows 64-bits.

Reviewers: kcc, rnk, vitalybuka

Subscribers: kubabrecka, dberris, llvm-commits, chrisha

Differential Revision: https://reviews.llvm.org/D23354

llvm-svn: 281908
2016-09-19 15:58:38 +00:00
Nico Weber 9be7267516 Revert r281841, it does not work on Windows (PR30443).
llvm-svn: 281905
2016-09-19 15:22:04 +00:00
Diana Picus a53660e4a3 [AArch64] Fix encoding for lsl #12 in add/sub immediates
Whenever an add/sub immediate needs a fixup, we set that immediate field to zero,
which is correct, but we also set the shift bits to zero, which is not true for
instructions that use lsl #12. This patch makes sure that if lsl #12 was used,
it will appear in the encoding of the instruction.

Differential Revision: https://reviews.llvm.org/D23930

llvm-svn: 281898
2016-09-19 11:10:18 +00:00
Sam Kolton be7ffb90bf [AMDGPU] Fix s_branch with -1 offset
Summary:
In case s_branch instruction target is itself backend should emit offset -1 but instead it emit 0.
'''
label:
    s_branch label  // should emit [0xff,0xff,0x82,0xbf]
'''

Tom, Matt: why are we adjusting fixup values in applyFixup() method instead of processFixup()? processFixup() is calling adjustFixupValue() but does nothing with its result.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl

Differential Revision: https://reviews.llvm.org/D24671

llvm-svn: 281896
2016-09-19 10:20:55 +00:00
Keith Walker c941252374 Add @llvm.dbg.value entries for the phi node created by -mem2reg
When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare
entries are converted to @llvm.dbg.value entries at the place where the
store instructions existed. However no entry is created to describe
the resulting value of the phi node.

The effect of this is especially noticeable in for loops which have a
constant for the intial value; the loop control variable's location
would be described as the intial constant value in the loop body once
the -mem2reg optimization phase was run.

This change adds the creation of the @llvm.dbg.value entries to describe
variables whose location is the result of a phi node created in -mem2reg.

Also when the phi node is finally lowered to a machine instruction it
is important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.

Differential Revision: https://reviews.llvm.org/D23715

llvm-svn: 281895
2016-09-19 09:49:30 +00:00
Oliver Stannard e1f6dc59ce [Thumb] Set correct initial mapping symbol for big-endian thumb
The initial mapping symbol state is set from the triple, but we only checked
for the little-endian thumb triple, so could end up with an ARM mapping symbol
for big-endian thumb.

Differential Revision: https://reviews.llvm.org/D24553

llvm-svn: 281894
2016-09-19 09:21:45 +00:00
Tim Northover eaee28b5ca ARM: check alignment before transforming ldr -> ldm (or similar).
ldm and stm instructions always require 4-byte alignment on the pointer, but we
weren't checking this before trying to reduce code-size by replacing a
post-indexed load/store with them. Unfortunately, we were also dropping this
incormation in DAG ISel too, but that's easy enough to fix.

llvm-svn: 281893
2016-09-19 09:11:09 +00:00
Elena Demikhovsky 59f6f8c05d [X86 Codegen Test] Divided masked_memop into several files. NFC.
The masked_memop.ll became huge. I extracted AVX-512 specific tests into separate files.

llvm-svn: 281892
2016-09-19 08:58:43 +00:00
James Molloy 0efb96a8ee [SimplifyCFG] Update (AND) IR flags when CSE'ing instructions
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (50% or less chance) if this results in broken codegen or not, which is why PR30373 which is actually not the fault of the commit it was bisected down to.

Fixes PR30373.

llvm-svn: 281889
2016-09-19 08:23:08 +00:00
Craig Topper 61403201ea [X86,AVX-512] Use INSERT_SUBREG instead of SUBREG_TO_REG when the input is not the output of an instruction.
SUBREG_TO_REG is supposed to indicate that the super register has been zeroed, but we can't prove that if we don't know where it came from.

llvm-svn: 281885
2016-09-19 02:53:43 +00:00
Craig Topper b3b5033179 [AVX-512] Add support for lowering fp_to_f16 and f16_to_fp when VLX is supported regardless of whether F16C is also supported.
Still need to add support for lowering using AVX512F when neither VLX or F16C is supported.

llvm-svn: 281884
2016-09-19 02:53:37 +00:00
Vedant Kumar b8d0694caa [llvm-cov] Delete the NonCodeLines field, it was always dead
llvm-svn: 281882
2016-09-19 01:46:01 +00:00
Dean Michael Berris 4640154446 [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 281878
2016-09-19 00:54:35 +00:00
Vedant Kumar 3c46abb2ea [llvm-cov] Teach the coverage exporter about instantiation coverage
While we're at it, re-use the logic from CoverageReport to compute
summaries.

llvm-svn: 281877
2016-09-19 00:38:29 +00:00
Vedant Kumar 016111f7b9 [llvm-cov] Track function and instantiation coverage separately
These are distinct statistics which are useful to look at separately.

Example: say you have a template function "foo" with 5 instantiations
and only 3 of them are covered. Then this contributes (1/1) to the total
function coverage and (3/5) to the total instantiation coverage. I.e,
the old "Function Coverage" column has been renamed to "Instantiation
Coverage", and the new "Function Coverage" aggregates information from
the various instantiations of a function.

One benefit of making this switch is that the Line and Region coverage
columns will start making sense. Let's continue the example and assume
that the 5 instantiations of "foo" cover {2, 4, 6, 8, 10} out of 10
lines respectively. The new line coverage for "foo" is (10/10), not
(30/50).  The old scenario got confusing because we'd report that there
were more lines in a file than what was actually possible.

llvm-svn: 281875
2016-09-19 00:38:23 +00:00
Vedant Kumar 98ba34e5ad [llvm-cov] Drop another redundant 'No.' suffix
llvm-svn: 281872
2016-09-19 00:38:14 +00:00
Dehao Chen 41cde0b986 Handle Invoke during sample profiler annotation: make it inlinable.
Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24409

llvm-svn: 281870
2016-09-18 23:11:37 +00:00
Simon Pilgrim 9178059826 [CostModel][X86] Added scalar float op costs
llvm-svn: 281864
2016-09-18 21:01:20 +00:00
Simon Pilgrim 34032cba14 Rename tests
llvm-svn: 281863
2016-09-18 20:25:41 +00:00
Xinliang David Li 4ca1733a06 [Profile] Implement select instruction instrumentation in IR PGO
Differential Revision: http://reviews.llvm.org/D23727

llvm-svn: 281858
2016-09-18 18:34:07 +00:00
Elena Demikhovsky 5f8cc0c346 [Loop Vectorizer] Consecutive memory access - fixed and simplified
Amended consecutive memory access detection in Loop Vectorizer.
Load/Store were not handled properly without preceding GEP instruction.

Differential Revision: https://reviews.llvm.org/D20789

llvm-svn: 281853
2016-09-18 13:56:08 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Wei Mi ab24cd189f Change the order of the splitted store from high - low to low - high.
It is a trivial change which could make the testcase easier to be reused
for the store splitting in CodeGenPrepare.

llvm-svn: 281846
2016-09-18 06:10:32 +00:00
Simon Pilgrim aeb764a7b7 [X86][SSE] Added vector udiv combine tests
llvm-svn: 281842
2016-09-17 22:02:23 +00:00
Simon Pilgrim 2cc2900296 [X86][SSE] Added vector fcopysign combine tests
Also demonstrating the poor lowering of fcopysign...

llvm-svn: 281841
2016-09-17 21:31:34 +00:00
Teresa Johnson fbb431b292 [ThinLTO] Ensure anonymous globals renamed even at -O0
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.

Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.

Fixes PR30419.

Reviewers: mehdi_amini

Subscribers: probinson, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24701

llvm-svn: 281840
2016-09-17 20:40:16 +00:00
Simon Pilgrim d4473f1126 [X86][SSE] Added vector mul combine tests
llvm-svn: 281839
2016-09-17 20:06:16 +00:00
Simon Pilgrim 6736096ac3 [X86][SSE] Improve target shuffle mask extraction
Add ability to extract vXi64 'vzext_movl' masks on 32-bit targets

llvm-svn: 281834
2016-09-17 18:50:54 +00:00
Simon Pilgrim 06bfabbfb6 [X86][AVX] Test target shuffle combining on 32 and 64-bit targets
llvm-svn: 281833
2016-09-17 18:42:41 +00:00
Simon Pilgrim 06f85e43cf [X86][AVX2] Add target shuffle constant folding tests
llvm-svn: 281830
2016-09-17 17:42:15 +00:00
Simon Pilgrim a7785cdee6 [X86][AVX] Add target shuffle constant folding tests
llvm-svn: 281829
2016-09-17 17:41:14 +00:00
Simon Pilgrim 44f3aead3d [X86][XOP] Add target shuffle constant folding tests
llvm-svn: 281828
2016-09-17 17:40:40 +00:00
Simon Pilgrim f0c22ea1a4 [X86][SSSE3] Add target shuffle constant folding tests
llvm-svn: 281827
2016-09-17 17:40:08 +00:00
Ron Lieberman da5df7c99e [Hexagon] segv while processing SUnit with nullNodePtr
Added BoundaryNode check to isBestZeroLatency function.

llvm-svn: 281825
2016-09-17 16:21:09 +00:00
Matt Arsenault ac0fc849cf AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

llvm-svn: 281824
2016-09-17 16:09:55 +00:00
Matt Arsenault d99ef1144b AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll

llvm-svn: 281822
2016-09-17 15:44:16 +00:00
Kostya Serebryany 8ad4155745 [sanitizer-coverage] change trace-pc to use 8-byte guards
llvm-svn: 281809
2016-09-17 05:03:05 +00:00
Matt Arsenault 7b1dc2c983 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

llvm-svn: 281800
2016-09-17 02:02:19 +00:00
Tom Stellard 7998db634c AMDGPU/SI: Fix kernel argument ABI for HSA
Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24621

llvm-svn: 281789
2016-09-16 22:20:24 +00:00
Sanjay Patel f26710d97d [InstCombine] canonicalize vector select with constant vector condition to shuffle
As discussed on llvm-dev ( http://lists.llvm.org/pipermail/llvm-dev/2016-August/104210.html ): 
turn a vector select with constant condition operand into a shuffle as a canonicalization step.
Shuffles may be easier to reason about in conjunction with other shuffles and insert/extract.

Possible known (minor?) regressions from this change are filed as:
https://llvm.org/bugs/show_bug.cgi?id=28530 
https://llvm.org/bugs/show_bug.cgi?id=28531 
https://llvm.org/bugs/show_bug.cgi?id=30371

If something terrible happens to perf after this commit, feel free to revert until a backend
fix is in place.

Differential Revision: https://reviews.llvm.org/D24279

llvm-svn: 281787
2016-09-16 22:16:18 +00:00
Matt Arsenault 6408c9135c AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Evgeniy Stepanov aa84f050fc [safestack] Fix assertion failure in stack coloring.
This is a fix for PR30318.

Clang may generate IR where an alloca is already live when entering a
BB with lifetime.start. In this case, conservatively extend the
alloca lifetime all the way back to the block entry.

llvm-svn: 281784
2016-09-16 22:04:10 +00:00
Tom Stellard bbeb45aff6 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Matt Arsenault 7ccf6cd104 AMDGPU: Use SOPK compare instructions
llvm-svn: 281780
2016-09-16 21:41:16 +00:00
Tom Stellard 0b76fc4c77 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

llvm-svn: 281779
2016-09-16 21:34:26 +00:00
Sanjay Patel c96f6db246 [InstCombine] allow vector types for constant folding / computeKnownBits (PR24942)
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.

I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for 
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.

This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942

Differential Revision: https://reviews.llvm.org/D24677

llvm-svn: 281777
2016-09-16 21:20:36 +00:00
Davide Italiano 14e9e8af35 [LTO] Add ability to parse AA pipelines.
This is supposed to be a drop in replacement for what lld
provides via --lto-newpm-aa-pipeline.

llvm-svn: 281774
2016-09-16 21:03:21 +00:00
Derek Schuff 3b04b7eba1 [WebAssembly] Fix function types of CFGStackify tests
Make the function's declared type match its (lack of) return type

llvm-svn: 281773
2016-09-16 20:58:31 +00:00
Simon Pilgrim 99cc6f8043 [X86][SSE] Added vector sub combine tests
llvm-svn: 281769
2016-09-16 20:00:51 +00:00
Simon Pilgrim 28568c477d [X86][SSE] Added vector add combine tests
Some work great and others currently demonstrate the anti-vector bias prevalent in DAGCombiner

llvm-svn: 281768
2016-09-16 19:20:41 +00:00
Nirav Dave 2364748a49 Defer asm errors to post-statement failure
Recommitting after fixing AsmParser initialization and X86 inline asm
error cleanup.

Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.

As part of this many minor cleanups to the Parser:

* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
  and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
  now fixed.

These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.

Reviewers: rnk, majnemer

Subscribers: aemerson, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D24047

llvm-svn: 281762
2016-09-16 18:30:20 +00:00
Michael Kuperstein dadb71be88 Make test slightly more explicit. NFC.
llvm-svn: 281759
2016-09-16 18:20:43 +00:00
Sanjay Patel e104a5f35a auto-generate checks
llvm-svn: 281756
2016-09-16 17:54:52 +00:00
Sanjay Patel 0417c7c3ef auto-generate checks
llvm-svn: 281755
2016-09-16 17:48:16 +00:00
Mehdi Amini 55b06538b5 Fix test after renaming -name-anon-functions pass to -name-anon-globals
llvm-svn: 281752
2016-09-16 17:18:16 +00:00
Teresa Johnson a8f6520c13 [LTO] Use llvm-nm instead of nm in new tests
The use of nm in the new tests added with r281725 caused a couple
of bot failures:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15701
http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/6939

Use llvm-nm instead.

llvm-svn: 281750
2016-09-16 17:12:48 +00:00
Mehdi Amini 2cac787919 Fix NameAnonFunctions pass: for ThinLTO we need to rename global variables as well
A follow-up patch will rename this pass and the source file accordingly,
but I figured the non-NFC change will be easier to spot in isolation.

Differential Revision: https://reviews.llvm.org/D24641

llvm-svn: 281744
2016-09-16 16:56:25 +00:00
Davide Italiano dc8e07b98e [LTO] Prevent asm references to be dropped from the output.
Differential Revision:  https://reviews.llvm.org/D24617

llvm-svn: 281741
2016-09-16 16:05:25 +00:00
Ahmed Bougacha 85ef4a1c47 [AArch64][GlobalISel] Add default regbank mapping for int<>FP.
llvm-svn: 281739
2016-09-16 15:12:46 +00:00
Ahmed Bougacha 7b3b2e7f65 [AArch64][GlobalISel] Add default regbank mapping for G_FCMP.
llvm-svn: 281738
2016-09-16 15:12:43 +00:00
Ahmed Bougacha 90637f6196 [AArch64][GlobalISel] Add default regbank mapping for FP ops.
These should have all their operands - even scalars - go on FPR.

llvm-svn: 281737
2016-09-16 15:12:40 +00:00
Ahmed Bougacha 74db8faa71 [AArch64][GlobalISel] Test default regbank mapping for G_ICMP.
Also relax a RegisterBankInfo verifier check that's incompatible with
1-bit mappings.

llvm-svn: 281735
2016-09-16 14:44:54 +00:00
Ahmed Bougacha 7306313e6d [AArch64][GlobalISel] Add default regbank mappings for mixed-type ops.
We used to only support instructions with same-type operands.
Instead, use the per-register type information to map each
operand more accurately.

llvm-svn: 281734
2016-09-16 14:44:51 +00:00