Commit Graph

40094 Commits

Author SHA1 Message Date
Kyle Butt 37e676d857 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283619
2016-10-07 22:33:20 +00:00
Arnold Schwaighofer 3f25658143 swifterror: Don't compute swifterror vregs during instruction selection
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.

Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.

Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.

When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.

This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.

rdar://28300923

llvm-svn: 283617
2016-10-07 22:06:55 +00:00
Davide Italiano f6988d2980 [InstCombine] Don't unpack arrays that are too large (part 2).
This is similar to r283599, but for store instructions.
Thanks to David for pointing out!

llvm-svn: 283612
2016-10-07 21:53:09 +00:00
Davide Italiano da11412243 [InstCombine] Don't unpack arrays that are too large
Differential Revision:  https://reviews.llvm.org/D25376

llvm-svn: 283599
2016-10-07 20:57:42 +00:00
Tom Stellard 6982bb8f25 AMDGPU/SI: Add support for 8-byte relocations
Reviewers: arsenm, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25375

llvm-svn: 283593
2016-10-07 20:36:58 +00:00
Anna Thomas e76d77ace5 [RS4GC] Strengthen coverage: add more tests
Summary: Add tests for cases where we have zero coverage in RS4GC.

Reviewers: sanjoy, reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25341

llvm-svn: 283591
2016-10-07 20:34:00 +00:00
Sanjay Patel 4326c4ac8f [InstCombine] fold select X, (ext X), C
If we're going to canonicalize IR towards select of constants, try harder to create those.
Also, don't lose the metadata.

This is actually 4 related transforms in one patch:
      // select X, (sext X), C --> select X, -1, C
      // select X, (zext X), C --> select X,  1, C
      // select X, C, (sext X) --> select X, C, 0
      // select X, C, (zext X) --> select X, C, 0

Differential Revision: https://reviews.llvm.org/D25126

llvm-svn: 283575
2016-10-07 17:53:07 +00:00
Simon Pilgrim f9648b72df [X86][SSE] Reapplied: Add vector fcopysign combine tests
Now with better lowering and fix for PR30443

llvm-svn: 283569
2016-10-07 16:00:59 +00:00
Artem Tamazov 73f1ab28cd [AMDGPU][mc] Add support for buffer_load_dwordx3, buffer_store_dwordx3.
Partially fixes Bug 28232.
Lit tests added.

Differential Revision: https://reviews.llvm.org/D25367

llvm-svn: 283567
2016-10-07 15:53:16 +00:00
Matthew Simpson a371c14ffe [LV] Don't mark multi-use branch conditions uniform
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
2016-10-07 15:20:13 +00:00
Sam Kolton a3ec5c10e2 [AMDGPU] Assembler: support v_mac_f32 DPP and SDWA. Move getNamedOperandIdx to AMDGPUBaseInfo.h
Reviewers: artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25084

llvm-svn: 283560
2016-10-07 14:46:06 +00:00
Simon Pilgrim 02f623e74c [X86][SSE] Tidied up tests - use standard check prefixes
llvm-svn: 283559
2016-10-07 14:42:22 +00:00
Tom Stellard 17eb3413cd [ValueTracking] Fix crash in GetPointerBaseWithConstantOffset()
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant.  This is not true, because addresspacecast
instructions may cast the pointer to an address space with a different
pointer width.

This partial reverts r282612, which was a more conservative solution
to this problem.

Reviewers: reames, sanjoy, apilipenko

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24772

llvm-svn: 283557
2016-10-07 14:23:29 +00:00
Konstantin Zhuravlyov f74fc60a7d [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302

llvm-svn: 283555
2016-10-07 14:22:58 +00:00
Martin Storsjo 04864f45b2 [ARM] Reapply: Use __rt_div functions for divrem on Windows
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.

This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D25332

llvm-svn: 283550
2016-10-07 13:28:53 +00:00
Javed Absar fb4b6e8db9 [ARM]: Add Cortex-R52 target to LLVM
This patch adds Cortex-R52, the new ARM real-time processor, to LLVM. 
Cortex-R52 implements the ARMv8-R architecture.

llvm-svn: 283542
2016-10-07 12:06:40 +00:00
Simon Pilgrim a5d019ee95 [X86][SSE] Update register class during MOVSD/MOVSS - BLENDPD/BLENDPS commutation
MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input, the commutation code wasn't taking this into account leading to verification errors.

This patch inserts a vreg copy mi to ensure that the registers are correct.

Fix for PR30607

Differential Revision: https://reviews.llvm.org/D25280

llvm-svn: 283539
2016-10-07 11:18:38 +00:00
Alexey Bataev 6ad5da7c81 [SLPVectorizer] Fix for PR25748: reduction vectorization after loop
unrolling.

The next code is not vectorized by the SLPVectorizer:
```
 int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
 }
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.

Differential Revision: https://reviews.llvm.org/D24796

llvm-svn: 283535
2016-10-07 09:39:22 +00:00
Oliver Stannard 4df1cc0b00 [ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.

Differential Revision: https://reviews.llvm.org/D24462

llvm-svn: 283530
2016-10-07 08:48:24 +00:00
Nicolai Haehnle 87bc4c218b AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

llvm-svn: 283528
2016-10-07 08:40:14 +00:00
Matt Arsenault 93401f4b5e AMDGPU: Change check prefix in test
llvm-svn: 283521
2016-10-07 03:55:04 +00:00
Hal Finkel bd5a172d9c [llvm-opt-report] Left justify unrolling counts, etc.
In the left part of the reports, we have things like U<number>; if some of
these numbers use more digits than others, we don't want a space in between the
U and the start of the number. Instead, the space should come afterward. This
way it is clear that the number goes with the U and not any other optimization
indicator that might come later on the line.

llvm-svn: 283518
2016-10-07 01:57:06 +00:00
David Majnemer 8c03c1bade [SimplifyCFG] Correctly test for unconditional branches in GetCaseResults
GetCaseResults assumed that a terminator with one successor was an
unconditional branch.  This is not necessarily the case, it could be a
cleanupret.

Strengthen the check by querying whether or not the terminator is
exceptional.

llvm-svn: 283517
2016-10-07 01:38:35 +00:00
Hal Finkel 16d29e3111 [llvm-opt-report] Use -no-demangle to disable demangling
As this is intended to be a user-facing option, -no-demangle seems much better
than -demangle=0. Add testing for the option.

llvm-svn: 283516
2016-10-07 01:30:59 +00:00
Michael Kuperstein 5185b7dde3 [LV] Remove triples from target-independent vectorizer tests. NFC.
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.

llvm-svn: 283512
2016-10-06 23:57:25 +00:00
Dan Gohman 2726b88c03 [WebAssemby] Implement block signatures.
Per spec changes, this implements block signatures, and adds just enough
logic to produce correct block signatures at the ends of functions.

Differential Revision: https://reviews.llvm.org/D25144

llvm-svn: 283503
2016-10-06 22:29:32 +00:00
Dan Gohman 3a643e8d46 [WebAssembly] Remove loop's bottom label.
Per spec changes, loop constructs no longer have a bottom label.

https://reviews.llvm.org/D25118

llvm-svn: 283502
2016-10-06 22:10:23 +00:00
Dan Gohman 7f1bdb2e02 [WebAssembly] Remove the output operand from stores.
Per spec changes, store instructions in WebAssembly no longer have a return
value. Update the instruction descriptions.

Differential Revision: https://reviews.llvm.org/D25122

llvm-svn: 283501
2016-10-06 22:08:28 +00:00
Wolfgang Pieb e51bede1d8 Preserve the debug location when CodeGenPrepare sinks a compare instruction into the
basic block of a user.

Patch by Andrea DiBiagio.

Differential Revision: https://reviews.llvm.org/D24632

llvm-svn: 283500
2016-10-06 21:43:45 +00:00
Pirama Arumuga Nainar cc152ac794 Handle *_EXTEND_VECTOR_INREG during Integer Legalization
Summary:
These nodes need legalization for 3-element vectors.  This commit
handles the legalization and adds tests for zext and sext.

This fixes PR30614.

Reviewers: RKSimon, srhines

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25268

llvm-svn: 283496
2016-10-06 21:27:05 +00:00
Rong Xu 0e79f7d11d [PGO] Create weak alias for the renamed Comdat function
Add a weak alias to the renamed Comdat function in IR level instrumentation,
using it's original name. This ensures the same behavior w/ and w/o IR
instrumentation, even for non standard conforming code.

Differential Revision: http://reviews.llvm.org/D25339

llvm-svn: 283490
2016-10-06 20:38:13 +00:00
Michael Kuperstein e524e22846 [X86] Preserve BasePtr for LEA64_32r
When replacing FrameIndex with BasePtr, we must preserve BasePtr for
LEA64_32r since BasePtr is used later for stack adjustment if it is
the same as StackPtr.

Patch by H.J Lu <hjl.tools@gmail.com>

Differential Revision: https://reviews.llvm.org/D23575

llvm-svn: 283486
2016-10-06 19:31:27 +00:00
Simon Pilgrim bddb412896 [X86][SSE] Add f16/f80/f128 vector sitofp test cases
As discussed on D23808

llvm-svn: 283485
2016-10-06 19:29:25 +00:00
Michael Kuperstein 7cc2123847 [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.

Differential Revision: https://reviews.llvm.org/D24683

llvm-svn: 283480
2016-10-06 18:58:24 +00:00
Michael Ilseman 6d6b4d87a3 Revert "Add -strip-nonlinetable-debuginfo capability"
This reverts commit r283473.

Reverted until review is completed.

llvm-svn: 283478
2016-10-06 18:30:26 +00:00
Michael Ilseman d0a4db7632 Add -strip-nonlinetable-debuginfo capability
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.

The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches.  For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.

The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.

llvm-svn: 283473
2016-10-06 17:58:38 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Krzysztof Parzyszek d391d6f1c3 [Hexagon] Avoid replacing full regs with subregisters in tied operands
Doing so will result in the two-address pass generating incorrect code.

llvm-svn: 283463
2016-10-06 16:18:04 +00:00
Matt Arsenault ef5bba0136 BranchRelaxation: Account for function alignment
llvm-svn: 283462
2016-10-06 16:00:58 +00:00
Matt Arsenault 36919a4f7c Move AArch64BranchRelaxation to generic code
llvm-svn: 283459
2016-10-06 15:38:53 +00:00
Nirav Dave ee554e6155 [X86] Fix intel syntax push parsing bug
Change erroneous parsing of push immediate instructions in intel syntax
to default to pointer size by rewriting into the ATT style for matching.

This fixes PR22028.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25288

llvm-svn: 283457
2016-10-06 15:28:08 +00:00
Rafael Espindola d9525a166d Centralize sh_entsize checking.
llvm-svn: 283455
2016-10-06 15:08:10 +00:00
Rafael Espindola c3befb2e39 Refactor to use getSectionContentsAsArray.
This centralizes quite a bit of error checking.

llvm-svn: 283454
2016-10-06 14:47:04 +00:00
Sam Kolton 3381d7a216 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.
Initialize MCObjectFileInfo with some default values.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 283450
2016-10-06 13:46:08 +00:00
Hal Finkel 4d6f3088c3 [llvm-opt-report] Record VF, etc. correctly for multiple opts on one line
When there are multiple optimizations on one line, record the vectorization
factors, etc. correctly (instead of incorrectly substituting default values).

llvm-svn: 283443
2016-10-06 11:58:52 +00:00
Diana Picus 6341e46cd1 Revert "[ARM] Use __rt_div functions for divrem on Windows"
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'

It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.

llvm-svn: 283442
2016-10-06 11:24:29 +00:00
Hal Finkel 47faf3be89 [llvm-opt-report] Print line numbers starting from 1
Line numbers should start from 1, not 2.

llvm-svn: 283440
2016-10-06 11:11:11 +00:00
Zvi Rackover 08a37f46e3 Add test-cases which demontrate pr30561
llvm-svn: 283436
2016-10-06 10:04:00 +00:00
Bjorn Pettersson 3961603921 [ValueTracking] Teach computeKnownBits and ComputeNumSignBits to look through ExtractElement.
Summary:
The computeKnownBits and ComputeNumSignBits functions in ValueTracking can now do a simple look-through of ExtractElement.

Reviewers: majnemer, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24955

llvm-svn: 283434
2016-10-06 09:56:21 +00:00
James Molloy 6215fad0e9 [ARM] Constant pool promotion - fix alignment calculation
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.

Testcase added.

llvm-svn: 283423
2016-10-06 07:56:00 +00:00
James Molloy 78561c4917 [ARM] Improve testcase for r283323
We can work around a shortcoming of FileCheck by using {{\[}} to match a square
bracket before a [[ sequence.

Thanks to Eli Friedman for the heads up!

llvm-svn: 283422
2016-10-06 07:44:05 +00:00
Konstantin Zhuravlyov b4eb5d5049 [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32
Differential Revision: https://reviews.llvm.org/D25121

llvm-svn: 283415
2016-10-06 02:20:46 +00:00
Sanjay Patel edc2baddf8 [DAG] add tests to show missing checks for SDNode FMF
The AVX attribute is added to remove noise caused by SSE's destructive insts.  

llvm-svn: 283410
2016-10-05 23:20:32 +00:00
Hal Finkel 5d0fbbbca1 Fix tests for Windows
We need to match file names with both forward and backward slashes.

llvm-svn: 283407
2016-10-05 22:48:13 +00:00
Reid Kleckner bb96df602e [codeview] Truncate records to maximum record size near 64KB
If we don't truncate, LLVM asserts when the label difference doesn't fit
in a 16 bit field. This patch truncates two kinds of data: trailing null
terminated names in symbol records, and inline line tables. The inline
line table test that I have is too large (many MB), so I'm not checking
it in.

Hopefully fixes PR28264.

llvm-svn: 283403
2016-10-05 22:36:07 +00:00
Hal Finkel 5aa0248059 [llvm-opt-report] Distinguish inlined contexts when optimizations differ
How code is optimized sometimes, perhaps often, depends on the context into
which it was inlined. This change allows llvm-opt-report to track the
differences between the optimizations performed, or not, in different contexts,
and when these differ, display those differences.

For example, this code:

  $ cat /tmp/q.cpp
  void bar();
  void foo(int n) {
    for (int i = 0; i < n; ++i)
      bar();
  }

  void quack() {
    foo(4);
  }

  void quack2() {
    foo(4);
  }

will now produce this report:

  < /home/hfinkel/src/llvm/test/tools/llvm-opt-report/Inputs/q.cpp
   2         | void bar();
   3         | void foo(int n) {
   [[
    > foo(int):
   4         |   for (int i = 0; i < n; ++i)
    > quack(), quack2():
   4  U4     |   for (int i = 0; i < n; ++i)
   ]]
   5         |     bar();
   6         | }
   7         |
   8         | void quack() {
   9 I       |   foo(4);
  10         | }
  11         |
  12         | void quack2() {
  13 I       |   foo(4);
  14         | }
  15         |

Note that the tool has demangled the function names, and grouped the reports
associated with line 4. This shows that the loop on line 4 was unrolled by a
factor of 4 when inlined into the functions quack() and quack2(), but not in
the function foo(int) itself.

llvm-svn: 283402
2016-10-05 22:25:33 +00:00
Adrian Prantl b3510afcd1 Verifier: Reject any unknown named MD nodes in the llvm.dbg namespace.
This came out of a discussion in https://reviews.llvm.org/D25285.

There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.

This also removes an entirely obsolete and bitrotted testcase for PR7662.

Reapplies 283390 with a forgotten testcase.

llvm-svn: 283400
2016-10-05 22:15:37 +00:00
Adrian Prantl 497f085475 Revert "Verifier: Reject any unknown named MD nodes in the llvm.dbg namespace."
Forgot to add a testcase in r283390.

llvm-svn: 283399
2016-10-05 22:15:34 +00:00
Hal Finkel 52031b7e65 Add an llvm-opt-report tool to generate basic source-annotated optimization summaries
LLVM now has the ability to record information from optimization remarks in a
machine-consumable YAML file for later analysis. This can be enabled in opt
(see r282539), and D25225 adds a Clang flag to do the same. This patch adds
llvm-opt-report, a tool to generate basic optimization "listing" files
(annotated sources with information about what optimizations were performed)
from one of these YAML inputs.

D19678 proposed to add this capability directly to Clang, but this more-general
YAML-based infrastructure was the direction we decided upon in that review
thread.

For this optimization report, I focused on making the output as succinct as
possible while providing information on inlining and loop transformations. The
goal here is that the source code should still be easily readable in the
report. My primary inspiration here is the reports generated by Cray's tools
(http://docs.cray.com/books/S-2496-4101/html-S-2496-4101/z1112823641oswald.html).
These reports are highly regarded within the HPC community. Intel's compiler,
for example, also has an optimization-report capability
(https://software.intel.com/sites/default/files/managed/55/b1/new-compiler-optimization-reports.pdf).

  $ cat /tmp/v.c
  void bar();
  void foo() { bar(); }

  void Test(int *res, int *c, int *d, int *p, int n) {
    int i;

  #pragma clang loop vectorize(assume_safety)
    for (i = 0; i < 1600; i++) {
      res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    }

    for (i = 0; i < 16; i++) {
      res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    }

    foo();

    foo(); bar(); foo();
  }

D25225 adds -fsave-optimization-record (and
-fsave-optimization-record=filename), and this would be used as follows:

  $ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record
  $ llvm-opt-report /tmp/v.yaml > /tmp/v.lst
  $ cat /tmp/v.lst

  < /tmp/v.c
   2          | void bar();
   3          | void foo() { bar(); }
   4          |
   5          | void Test(int *res, int *c, int *d, int *p, int n) {
   6          |   int i;
   7          |
   8          | #pragma clang loop vectorize(assume_safety)
   9     V4,2 |   for (i = 0; i < 1600; i++) {
  10          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  11          |   }
  12          |
  13  U16     |   for (i = 0; i < 16; i++) {
  14          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  15          |   }
  16          |
  17 I        |   foo();
  18          |
  19          |   foo(); bar(); foo();
     I        |   ^
     I        |                 ^
  20          | }

Each source line gets a prefix giving the line number, and a few columns for
important optimizations: inlining, loop unrolling and loop vectorization. An
'I' is printed next to a line where a function was inlined, a 'U' next to an
unrolled loop, and 'V' next to a vectorized loop. These are printed on the
relevant code line when that seems unambiguous, or on subsequent lines when
multiple potential options exist (messages, both positive and negative, from
the same optimization with different column numbers are taken to indicate
potential ambiguity). When on subsequent lines, a '^' is output in the relevant
column.

Annotated source for all relevant input files are put into the listing file
(each starting with '<' and then the file name).

You can disable having the unrolling/vectorization factors appear by using the
-s flag.

Differential Revision: https://reviews.llvm.org/D25262

llvm-svn: 283398
2016-10-05 22:10:35 +00:00
Sanjay Patel 5839858584 [DAG] change test to use 'unsafe' function attribute instead of global setting
But we have node-level FMF, so the next step is to fix this at the instruction/node-level.

llvm-svn: 283393
2016-10-05 21:43:50 +00:00
Adrian Prantl 71bba7253e Verifier: Reject any unknown named MD nodes in the llvm.dbg namespace.
This came out of a discussion in https://reviews.llvm.org/D25285.

There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.

This also removes an entirely obsolete and bitrotted testcase for PR7662.

llvm-svn: 283390
2016-10-05 21:31:19 +00:00
Reid Kleckner 2b3e6428e5 [codeview] Translate bitpiece metadata to DEFRANGE_SUBFIELD* records
This allows LLVM to describe locations of aggregate variables that have
been split by SROA.

Fixes PR29141

Reviewers: amccarth, majnemer

Differential Revision: https://reviews.llvm.org/D25253

llvm-svn: 283388
2016-10-05 21:21:33 +00:00
Martin Storsjo f997759aef [ARM] Use __rt_div functions for divrem on Windows
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D24076

llvm-svn: 283383
2016-10-05 21:08:02 +00:00
James Y Knight b0a473aaf8 [Sparc] Implement UMUL_LOHI and SMUL_LOHI instead of MULHS/MULHU/MUL.
This is what the instruction-set actually provides, and the default
expansions of the others into the lohi opcodes are good.

llvm-svn: 283381
2016-10-05 20:54:17 +00:00
Yunzhong Gao ba150d6156 Improve the debug-info test created in r274263.
This patch is related to r274263 or Phabricator/D21818.
This patch aims to improve the test case added in the previous commit to verify
specifically that the stack protector pass is adding the debug line info as
intended. Before, the test only verified that the verifier pass does not crash.
The current approach is to generate the assembly output and then look for the
.loc directive.

Differential Revision: https://reviews.llvm.org/D25290

llvm-svn: 283374
2016-10-05 20:26:29 +00:00
Krzysztof Parzyszek 3b6cbd55f7 [RDF] Fix live def propagation through basic block
llvm-svn: 283371
2016-10-05 20:08:09 +00:00
Reid Kleckner f9dddec21c Improve DEBUG_VALUE assembly comments for spilled bitpieces
Previously we would give up when we saw the bitpiece DWARF expression
and print "[complex expression]" when actually we handled bitpiece
expressions outside the loop.

llvm-svn: 283355
2016-10-05 18:36:02 +00:00
Simon Dardis 299dbd6cd1 [mips][ias] fix li macro when values are negated with ~
The integrated assembler evaluates the expressions such as ~0x80000000 to
0xffffffff7fffffff early in the parsing process. This patch adds compatibility
with gas so that li loads the expected value (0x7fffffff) in those cases. This
only occurs iff all the upper 32bits are set and maintains existing checks by
not truncating the result down to 32 bits if any of the the upper bits are not
set.

Reviewers: dsanders, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D23399

llvm-svn: 283353
2016-10-05 18:26:19 +00:00
Bjorn Pettersson 12559441bd [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

llvm-svn: 283347
2016-10-05 17:40:27 +00:00
Bjorn Pettersson ddd31e5637 Test commit permission. NFC
llvm-svn: 283346
2016-10-05 17:22:11 +00:00
Simon Dardis f45a59f80b Recommit: "[mips] Add rsqrt, recip for MIPS"
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.

Reviewers: vkalintiris, zoran.jovanoic

Differential Review: https://reviews.llvm.org/D24499

llvm-svn: 283334
2016-10-05 16:11:01 +00:00
Hans Wennborg c26c03d911 Revert r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)"
This is suspected to cause a miscompile in Chromium. Reverting while
investigating.

llvm-svn: 283329
2016-10-05 15:39:27 +00:00
Simon Dardis bbfd528748 Revert "[mips] Add rsqrt, recip for MIPS"
This reverts commit r282485 which contain two patches instead of
one.

llvm-svn: 283327
2016-10-05 15:28:33 +00:00
Douglas Katzman 0411e8669b [X86] Don't randomly encode %rip where illegal
Differential Revision: https://reviews.llvm.org/D25112

llvm-svn: 283326
2016-10-05 15:23:35 +00:00
James Molloy b7de497cb9 [Thumb] Don't try and emit LDRH/LDRB from the constant pool
This is not a valid encoding - these instructions cannot do PC-relative addressing.

The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.

llvm-svn: 283323
2016-10-05 14:52:13 +00:00
Douglas Katzman 8449b238ea [X86] Fix some tests that didn't assert anything
llvm-svn: 283322
2016-10-05 14:46:14 +00:00
Oren Ben Simhon 0670e5a35b Test commit permission
llvm-svn: 283319
2016-10-05 14:12:41 +00:00
Krzysztof Parzyszek e7c72cdbb0 Fix machine operand traversal in ScheduleDAGInstrs::fixupKills
llvm-svn: 283315
2016-10-05 13:15:06 +00:00
Kyle Butt 25ac35d822 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.

Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.

llvm-svn: 283292
2016-10-05 01:39:29 +00:00
Kyle Butt adabac2d57 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283274
2016-10-04 23:54:18 +00:00
Sanjay Patel bfdbea6481 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252
2016-10-04 20:46:43 +00:00
Kevin Enderby f993d6e72c Next set of additional error checks for invalid Mach-O files for the
load commands that uses the MachO::encryption_info_command and
MachO::encryption_info_command types but not used in llvm libObject
code but used in llvm tool code.

This includes just LC_ENCRYPTION_INFO and
LC_ENCRYPTION_INFO_64 load commands.

llvm-svn: 283250
2016-10-04 20:37:43 +00:00
Matthias Braun 46a5238682 AArch64: Macrofusion: Split features, add missing combinations.
AArch64InstrInfo::shouldScheduleAdjacent() determines whether two
instruction can benefit from macroop fusion on apple CPUs. The list
turned out to be incomplete:
- the "rr" variants of the instructions were missing
- even the "rs" variants can have shift value == 0 and behave like the
  "rr" variants

This also splits the MacropFusion target feature into
ArithmeticBccFusion and ArithmeticCbzFusion.

Differential Revision: https://reviews.llvm.org/D25142

llvm-svn: 283243
2016-10-04 19:28:21 +00:00
Hal Finkel bdd6735a9e Don't filter diagnostics written as YAML to the output file
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.

Differential Revision: https://reviews.llvm.org/D25224

llvm-svn: 283236
2016-10-04 18:13:45 +00:00
Adam Nemet 0428e93217 Serialize remark argument as a mapping to get proper quotation for the value.
llvm-svn: 283231
2016-10-04 17:05:04 +00:00
Alexey Bataev 7e217c2402 [SLPVectorizer] Add a test with non-vectorizable IR.
llvm-svn: 283225
2016-10-04 15:07:23 +00:00
Anna Thomas 479cbb9405 [RS4GC] Handle ShuffleVector instruction in findBasePointer
Summary:
This patch modifies the findBasePointer to handle the shufflevector instruction.

Tests run: RS4GC tests, local downstream tests.

Reviewers: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25197

llvm-svn: 283219
2016-10-04 13:48:37 +00:00
Andrey Bokhanko 6903be56d5 Fix IntegerType::MAX_INT_BITS value
IntegerType::MAX_INT_BITS is apparently not in sync with Type::SubclassData
size. This patch fixes this.

Differential Revision: https://reviews.llvm.org/D24814

llvm-svn: 283215
2016-10-04 12:43:46 +00:00
Nemanja Ivanovic 6354d23555 [Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set
This patch corresponds to review:

The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.

llvm-svn: 283212
2016-10-04 11:25:52 +00:00
Simon Dardis 86b3a1e79b [mips][fastisel] Consider soft-float an unsupported floating point mode
Treat soft-float as unsupported for fast-isel. Additionally, ensure we check
that lowering f32 arguments also considers the case of soft-float mode.

Reviewers: ehostunreach, vkalintiris, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24505

llvm-svn: 283209
2016-10-04 10:35:07 +00:00
George Rimar 67443021a4 [Object/ELF] - Do not crash on invalid sh_offset value of REL[A] section.
Previously code would access invalid memory and may crash,
patch fixes the issue.

Differential revision: https://reviews.llvm.org/D25187

llvm-svn: 283204
2016-10-04 09:25:39 +00:00
George Rimar 5cbf23664d [Object/ELF] - Avoid possible crash in getExtendedSymbolTableIndex().
When using broken input object found using AFL,
getExtendedSymbolTableIndex() crashed because ShndxTable
was empty as object does not contain SHT_SYMTAB_SHNDX section.

Differential revision: https://reviews.llvm.org/D25189

llvm-svn: 283196
2016-10-04 08:44:03 +00:00
Nemanja Ivanovic a565d9e612 Fix a test case failure on Apple PPC.
llvm-svn: 283191
2016-10-04 07:37:38 +00:00
Nemanja Ivanovic 11049f8f07 [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190
2016-10-04 06:59:23 +00:00
Kyle Butt 3ffb8529bc Revert "Codegen: Tail-duplicate during placement."
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.

Causing crashes on aarch64 build.

llvm-svn: 283172
2016-10-04 00:38:23 +00:00
Eli Friedman 74bed9d757 Make GlobalsAA ignore dead constant expressions.
Slightly improves the precision of GlobalsAA in certain situations, and
makes the behavior of optimization passes more predictable.

Differential Revision: https://reviews.llvm.org/D24104

llvm-svn: 283165
2016-10-04 00:03:55 +00:00
Kyle Butt 396bfdd707 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

llvm-svn: 283164
2016-10-04 00:00:09 +00:00
Matthias Braun d2fc0d40e4 Set some tests to an unknown vendor and OS
This avoids llc using the hosts OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.

llvm-svn: 283149
2016-10-03 21:58:20 +00:00
Mehdi Amini 762d68bbab [LTO] Fix test to not depend on the exact address of symbols, just their linkage
llvm-svn: 283148
2016-10-03 21:40:50 +00:00
Krzysztof Parzyszek c8b6ecabd8 [RDF] Fix liveness propagation through shadows
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.

llvm-svn: 283143
2016-10-03 20:17:20 +00:00
Matthias Braun eccdee9196 X86: Do not produce GOT relocations on windows
Windows has no GOT relocations the way elf/darwin has. Some people use
x86_64-pc-win32-macho to build EFI firmware; Do not produce GOT
relocations for this target.

Differential Revision: https://reviews.llvm.org/D24627

llvm-svn: 283140
2016-10-03 20:11:24 +00:00
Sanjoy Das 0359a193a7 [PruneEH] Be correct in the face IPO
This fixes one spot I had missed in r265762.  Credit goes to Philip
Reames for spotting this one!

llvm-svn: 283137
2016-10-03 19:35:30 +00:00
Konstantin Zhuravlyov 691e2e020b [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
llvm-svn: 283130
2016-10-03 18:29:01 +00:00
Hans Wennborg b4d2678c6f Jump threading: avoid trying to split edge into landingpad block (PR27840)
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.

Differential Revision: https://reviews.llvm.org/D24680

llvm-svn: 283129
2016-10-03 18:18:04 +00:00
Sanjay Patel d27a21874b [x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165
 

llvm-svn: 283119
2016-10-03 16:38:27 +00:00
Rafael Espindola d7325ee702 Don't drop the llvm. prefix when renaming.
If the llvm. prefix is dropped other parts of llvm don't see this as
an intrinsic.  This means that the number of regular symbols depends
on the context the module is loaded into, which causes LTO to abort.

Fixes PR30509.

llvm-svn: 283117
2016-10-03 15:51:42 +00:00
Nirav Dave 157891c57f Prevent out of order HashDirective lexing in AsmLexer.
Retrying after buildbot reset.

To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.

This fixes PR28921.

Reviewers: rnk, loladiro

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24839

llvm-svn: 283111
2016-10-03 13:48:27 +00:00
Matt Arsenault 40bae76620 AMDGPU: Fix missing -verify-machineinstrs in test
llvm-svn: 283107
2016-10-03 12:58:59 +00:00
Simon Pilgrim 52ab136881 [X86][SSE] Add PR30371 (shuffle constant folding) test case
llvm-svn: 283103
2016-10-03 12:16:39 +00:00
Sjoerd Meijer 4dbe73c1ed [ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.

Patch mostly by Pablo Barrio.

Differential Revision: https://reviews.llvm.org/D25077

llvm-svn: 283098
2016-10-03 10:12:32 +00:00
Alexey Bataev fe91cf3aba [CodeGen] Adding a test showing the current state of poor code gen of
search loop, by Andrey Tischenko

PR27136 shows failure to hoist constant out of loop. This test is used
as start point to fix the failure: it shows the current state of codegen
and discovers what should be fixed

Differential Revision: https://reviews.llvm.org/D25097

llvm-svn: 283091
2016-10-03 07:47:01 +00:00
Simon Pilgrim a8d2168cb0 [X86][AVX2] Add support for combining target shuffles to VPERMD/VPERMPS
llvm-svn: 283080
2016-10-02 21:07:58 +00:00
Simon Pilgrim bce1f6b491 [X86][AVX2] Missed opportunities to combine to VPERMD/VPERMPS
llvm-svn: 283077
2016-10-02 20:43:02 +00:00
Simon Pilgrim b5200971d6 [X86][AVX2] Fix typo in test names
We are testing vpermps not vpermd

llvm-svn: 283076
2016-10-02 19:31:58 +00:00
Sanjay Patel 170d7eb303 [x86] remove 'nan' strings from copysign assertions; NFC
Preemptively scrubbing these to avoid a bot fail as in PR30443:
https://llvm.org/bugs/show_bug.cgi?id=30443

I'm nearly done with a patch to fix these cases, so not trying very
hard to do better for the temporary win. 

I plan to use better checks than what the script produces for the vectorized cases.

llvm-svn: 283072
2016-10-02 17:07:24 +00:00
Sanjay Patel dfbbbcd662 [x86] add test to show unnecessary scalarization of copysign intrinsics (PR30433)
llvm-svn: 283071
2016-10-02 16:31:35 +00:00
Simon Pilgrim 03afbe783d [X86][AVX] Ensure broadcast loads respect dependencies
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.

This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.

Bug found during internal testing.

Differential Revision: https://reviews.llvm.org/D25039

llvm-svn: 283070
2016-10-02 15:59:15 +00:00
Hal Finkel a9321059b9 [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

llvm-svn: 283060
2016-10-02 02:10:20 +00:00
Martell Malone 3a4d900039 COFF: Fix short import lib import name type bitshift
As per the PE COFF spec (section 8.3, Import Name Type)
Offset: 18 Size 2 bits Name: Type
Offset: 20 Size 3 bits Name: Name Type

Offset: 20 added based on 18+2

Partially commited as rL279069

Differential Revision: https://reviews.llvm.org/D23540

llvm-svn: 283055
2016-10-01 23:10:20 +00:00
Simon Pilgrim 04e249b128 [SLPVectorizer][X86] Added fptosi/fptoui tests
llvm-svn: 283048
2016-10-01 19:35:59 +00:00
Simon Pilgrim 1ec20e9b0a [CostModel][X86] Added tests for current fptosi/fptoui costs
llvm-svn: 283047
2016-10-01 19:09:59 +00:00
Simon Pilgrim 567c4fbdae [SLPVectorizer][X86] Added fcopysign tests
llvm-svn: 283046
2016-10-01 17:00:26 +00:00
Simon Pilgrim cceeb2a4fa [SLPVectorizer][X86] Added fabs tests
llvm-svn: 283045
2016-10-01 16:54:01 +00:00
Simon Pilgrim e0ec5c1f05 [CostModel][X86] Added fcopysign costs
llvm-svn: 283044
2016-10-01 16:41:52 +00:00
Simon Pilgrim 8b021c382d [CostModel][X86] Added fabs costs
llvm-svn: 283042
2016-10-01 16:30:13 +00:00
Simon Pilgrim 1638d49f20 [X86][SSE] Add support for combining target shuffles to binary BLEND
We already had support for 1-input BLEND with zero - this adds support for 2-input BLEND as well.

llvm-svn: 283040
2016-10-01 16:04:28 +00:00
Simon Pilgrim ae17cf20ce [X86][SSE] Always combine target shuffles to MOVSD/MOVSS
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.

This required me to add a couple of scalar math movsd/moss fold patterns that hadn't been needed in the past.

llvm-svn: 283038
2016-10-01 15:33:01 +00:00
Simon Pilgrim ccdd1ff49b [X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.

We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful

I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets

llvm-svn: 283037
2016-10-01 14:26:11 +00:00
Simon Pilgrim f1c575bad7 [X86][SSE] Regenerate vselect tests and improve AVX1/AVX2 coverage
llvm-svn: 283035
2016-10-01 13:10:14 +00:00
Nirav Dave e4c6153cf1 Revert "[MC] Prevent out of order HashDirective lexing in AsmLexer."
This reverts commit r282992 which appears to be causing an LTO test failure.

llvm-svn: 283034
2016-10-01 10:57:55 +00:00
Craig Topper 5eb5ade894 [X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.

llvm-svn: 283020
2016-10-01 07:11:24 +00:00
Craig Topper 8aca90507f [AVX-512] Add VLX command lines to 128 and 256-bit shufffle tests.
llvm-svn: 283014
2016-10-01 06:01:18 +00:00
Mehdi Amini 86eeda8e20 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Matt Arsenault 3070fdf798 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Nirav Dave 9f2bd4e7ea [MC] Prevent out of order HashDirective lexing in AsmLexer.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.

This fixes PR28921.

Reviewers: rnk, loladiro

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24839

llvm-svn: 282992
2016-10-01 00:42:32 +00:00
Mehdi Amini 6610b01a27 [ASAN] Add the binder globals on Darwin to llvm.compiler.used to avoid LTO dead-stripping
The binder is in a specific section that "reverse" the edges in a
regular dead-stripping: the binder is live as long as a global it
references is live.

This is a big hammer that prevents LLVM from dead-stripping these,
while still allowing linker dead-stripping (with special knowledge
of the section).

Differential Revision: https://reviews.llvm.org/D24673

llvm-svn: 282988
2016-10-01 00:05:34 +00:00
Reid Kleckner 9cb915b7be [SEH] Emit the parent frame offset label even if there are no funclets
This avoids errors about references to undefined local labels from
unreferenced filter functions.

Fixes (sort of) PR30431

llvm-svn: 282967
2016-09-30 22:10:12 +00:00
Rui Ueyama 5d6714e593 Do not pass a superblock to PDBFileBuilder.
When we create a PDB file using PDBFileBuilder, the information
in the superblock, such as the size of the resulting file, is not
available.

Previously, PDBFileBuilder::initialize took a superblock assuming
that all the members of the struct are correct. That is useful when
you want to restore the exact information from a YAML file, but
that's probably the only use case in which that is useful.
When we are creating a PDB file on the fly, we have to backfill the
members.

This patch redefines PDBFileBuilder::initialize to take only a
block size. Now all the other members are left as default values,
so that they'll be updated when commit() is called.

Differential Revision: https://reviews.llvm.org/D25108

llvm-svn: 282944
2016-09-30 20:52:12 +00:00
Hans Wennborg b5643b47b6 X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.

Differential Revision: https://reviews.llvm.org/D24836

llvm-svn: 282920
2016-09-30 20:07:35 +00:00
Sanjay Patel f7b851fe84 [InstCombine] allow non-splat folds of select cond (ext X), C
llvm-svn: 282906
2016-09-30 19:49:22 +00:00
Dehao Chen a067b0926f Revert test change in r282894 as it's broken in some platforms.
llvm-svn: 282903
2016-09-30 19:25:23 +00:00
Gor Nishanov a263a60ad5 [Coroutines] Part15c: Fix coro-split to correctly handle definitions between coro.save and coro.suspend
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.

```
  %save = call token @llvm.coro.save(i8* null)
  %Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
  %suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
  switch i8 %suspend, label %exit [
    i8 0, label %await.ready
    i8 1, label %exit
  ]
await.ready:
  %val = load i32, i32* %Result.i19

```

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24418

llvm-svn: 282902
2016-09-30 19:24:19 +00:00
Sanjay Patel 75b2518762 [InstCombine] add tests for non-splat select(ext)
llvm-svn: 282901
2016-09-30 19:15:41 +00:00
Gor Nishanov c16219486a [Coroutines] Part15b: Fix dbg information handling in coro-split.
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.

We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24417

llvm-svn: 282899
2016-09-30 19:05:06 +00:00
Gor Nishanov 768de2c604 [Coroutines] Part 15a: Lower coro.subfn.addr in CoroCleanup
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24412

llvm-svn: 282897
2016-09-30 18:41:35 +00:00
Sanjay Patel b43712a513 clean up tests and auto-generate checks
llvm-svn: 282896
2016-09-30 18:37:34 +00:00
Michal Gorny b002f6370b cmake: Install the OCaml libraries into a more correct path
Add a OCAML_INSTALL_PATH variable that can be used to control
the install path for OCaml libraries. The new variable defaults to
${OCAML_STDLIB_PATH}, i.e. the OCaml library path obtained from
the OCaml compiler. Install libraries into "llvm" subdirectory.

This fixes two issues:

1. OCaml library directories differ between systems, and 'lib/ocaml' is
incorrect e.g. on amd64 Gentoo where OCaml is installed
in 'lib64/ocaml'. Therefore, obtain the library path from the OCaml
compiler using 'ocamlc -where' (which is already used to set
OCAML_STDLIB_PATH), which is the method used commonly in OCaml packages.

2. The top-level directory is reserved for the standard library, and has
precedence over local directory in search path. As a result, OCaml
preferred the files installed along with previous LLVM version over the
source tree when building a new version, resulting in two versions being
mixed during the build. The new layout is used commonly by other OCaml
packages, and findlib is able to find the LLVM libraries successfully.

Bug: https://bugs.gentoo.org/559134
Bug: https://bugs.gentoo.org/559624

Differential Revision: https://reviews.llvm.org/D24354

llvm-svn: 282895
2016-09-30 18:34:23 +00:00
Dehao Chen 977853b7c5 Update loop unroller cost model to make sure debug info does not affect optimization decisions.
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.

Reviewers: davidxl, mzolotukhin

Subscribers: haicheng, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D25098

llvm-svn: 282894
2016-09-30 18:30:04 +00:00
Sanjay Patel 9685b3eb57 [InstCombine] add tests for select X, (ext X), C
llvm-svn: 282891
2016-09-30 18:10:14 +00:00
Derek Schuff e9e6891b2d [WebAssembly] Make register stackification more conservative
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24942

llvm-svn: 282886
2016-09-30 18:02:54 +00:00