Commit Graph

52164 Commits

Author SHA1 Message Date
Mikhail Maltsev 68f35bcc85 [ARM] Do not convert some vmov instructions
Summary:
Patch https://reviews.llvm.org/D44467 implements conversion of invalid
vmov instructions into valid ones. It turned out that some valid
instructions also get converted, for example

  vmov.i64 d2, #0xff00ff00ff00ff00 ->
  vmov.i16 d2, #0xff00

Such behavior is incorrect because according to the ARM ARM section
F2.7.7 Modified immediate constants in T32 and A32 Advanced SIMD
instructions, "On assembly, the data type must be matched in the table
if possible."

This patch fixes the isNEONmovReplicate check so that the above
instruction is not modified any more.

Reviewers: rengolin, olista01

Reviewed By: rengolin

Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits

Differential Revision: https://reviews.llvm.org/D44678

llvm-svn: 329158
2018-04-04 08:54:19 +00:00
Max Kazantsev 613af1f7ca [SCEV] Prove implications for SCEVUnknown Phis
This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis.
If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values
`L1, L2, ..., LN`, then if we prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)` then we can also
prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient
to prove the predicate for values that come from the same predecessor block.

The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0`
given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and
non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot.

Differential Revision: https://reviews.llvm.org/D44001
Reviewed By: anna

llvm-svn: 329150
2018-04-04 05:46:47 +00:00
Craig Topper 7d3aba6687 [SimplifyCFG] Teach merge conditional stores to handle cases where the PostBB has more than 2 predecessors by inserting a new block for the store.
Summary:
Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.

This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.

Reviewers: efriedma, jmolloy, ABataev

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39760

llvm-svn: 329142
2018-04-04 03:47:17 +00:00
Vlad Tsyrklevich e3446017ed Add the ShadowCallStack pass
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.

Reviewers: pcc, vitalybuka

Reviewed By: pcc

Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D44802

llvm-svn: 329139
2018-04-04 01:21:16 +00:00
Jessica Paquette 5fa2a63785 [MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.

Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.

This also adds a new test for the new redzone behaviour.

llvm-svn: 329134
2018-04-03 23:32:41 +00:00
Farhana Aleen e80aeac0f2 [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D45219

llvm-svn: 329131
2018-04-03 23:00:30 +00:00
Sanjay Patel 81b3b10a95 [InstCombine] allow more fmul folds with 'reassoc'
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().

llvm-svn: 329121
2018-04-03 22:19:19 +00:00
Jessica Paquette 642f6c61a3 [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.

This removes the requirement to pass -mno-red-zone when outlining for AArch64.

https://reviews.llvm.org/D45189

llvm-svn: 329120
2018-04-03 21:56:10 +00:00
Farhana Aleen 936947349a Revert "MSG"
This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.

This was committed by mistake.

llvm-svn: 329119
2018-04-03 21:51:45 +00:00
Jessica Paquette d506bf8e3d [MachineOutliner][NFC] Make outlined functions have internal linkage
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.

This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
 

llvm-svn: 329116
2018-04-03 21:36:00 +00:00
Farhana Aleen 3ab409dc86 MSG
llvm-svn: 329114
2018-04-03 21:20:39 +00:00
Gor Nishanov d4712715dd [coroutines] Respect alloca alignment requirements when building coroutine frame
Summary:
If an alloca need to be stored in the coroutine frame and it has an alignment specified and the alignment does not match the natural alignment of the alloca type. Insert appropriate padding into the coroutine frame to make sure that it gets requested alignment.

For example for a packet type (which natural alignment is 1), but alloca alignment is 8, we may need to insert a padding field with required number of bytes to make sure it is properly aligned.

```
%PackedStruct = type <{ i64 }>
...
  %data = alloca %PackedStruct, align 8
```

If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:

```
%f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
```

Reviewers: rnk, lewissbaker, modocache

Reviewed By: modocache

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D45221

llvm-svn: 329112
2018-04-03 20:54:20 +00:00
Florian Hahn 9467ccf447 [LoopInterchange] Add remark for calls preventing interchanging.
It also updates test/Transforms/LoopInterchange/call-instructions.ll
to use accesses where we can prove dependence after D35430.


Reviewers: sebpop, karthikthecool, blitz.opensource

Reviewed By: sebpop

Differential Revision: https://reviews.llvm.org/D45206

llvm-svn: 329111
2018-04-03 20:54:04 +00:00
Vlad Tsyrklevich d17f61ea3b Add the ShadowCallStack attribute
Summary:
Introduce the ShadowCallStack function attribute. It's added to
functions compiled with -fsanitize=shadow-call-stack in order to mark
functions to be instrumented by a ShadowCallStack pass to be submitted
in a separate change.

Reviewers: pcc, kcc, kubamracek

Reviewed By: pcc, kcc

Subscribers: cryptoad, mehdi_amini, javed.absar, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D44800

llvm-svn: 329108
2018-04-03 20:10:40 +00:00
Sanjay Patel 223ef402c9 [x86] add tests for convert-FP-to-integer with constants; NFC
We don't constant fold any of these, but we could...but if we
do, we must produce the right answer.

Unlike the IR fptosi instruction or its DAG node counterpart 
ISD::FP_TO_SINT, these are not undef for an out-of-range input.

llvm-svn: 329100
2018-04-03 18:34:56 +00:00
David Blaikie 3945b15bb3 Disable a test using environment variables that requires a real shell
llvm-svn: 329096
2018-04-03 18:19:52 +00:00
Alexey Bataev f7226ed67d [DEBUGINFO] Add option that allows to disable emission of flags in .loc directives.
Summary:
Some targets do not support extended format of .loc directive and
support only simple format: .loc <FileID> <Line> <Column>. Patch adds
MCAsmInfo flag and option that allows emit .loc directive without
additional flags.

Reviewers: echristo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45184

llvm-svn: 329089
2018-04-03 17:28:55 +00:00
Daniel Neilson 901acfab0c [InstCombine] Fold compare of int constant against a splatted vector of ints
Summary:
Folding patterns like:
  %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
  %cast = bitcast <4 x i8> %vec to i32
  %cond = icmp eq i32 %cast, 0
into:
  %ext = extractelement <4 x i8> %insvec, i32 0
  %cond = icmp eq i32 %ext, 0

Combined with existing rules, this allows us to fold patterns like:
  %insvec = insertelement <4 x i8> undef, i8 %val, i32 0
  %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
  %cast = bitcast <4 x i8> %vec to i32
  %cond = icmp eq i32 %cast, 0
into:
  %cond = icmp eq i8 %val, 0

When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.

Reviewers: spatel, anna, mkazantsev

Reviewed By: spatel

Subscribers: efriedma, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D44997

llvm-svn: 329087
2018-04-03 17:26:20 +00:00
Alexey Bataev 428e9d9d87 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 329085
2018-04-03 17:14:47 +00:00
Andrea Di Biagio 8dabf4f145 [llvm-mca] Move the logic that prints register file statistics to its own view. NFCI
Before this patch, the "BackendStatistics" view was responsible for printing the
register file usage (as well as many other statistics).

Now users can enable register file usage statistics using the command line flag
`-register-file-stats`. By default, the tool doesn't print register file
statistics.

llvm-svn: 329083
2018-04-03 16:46:23 +00:00
Florian Hahn b79217077d [LoopInterchange] Update tests so DA can handle access after D35430.
I have taken the opportunity to simplify some tests slightly and move
parts around.

It also brings back a few IR checks for interchangable loops.

Reviewers: karthikthecool, sebpop, grosser

Reviewed By: sebpop

Differential Revision: https://reviews.llvm.org/D45207

llvm-svn: 329081
2018-04-03 16:37:58 +00:00
Alexey Bataev 976aff148a [SLP] Added tests for checks of reordering of the repeated instructions,
NFC.

llvm-svn: 329080
2018-04-03 16:31:26 +00:00
Krzysztof Parzyszek 45ac73f71a [Hexagon] Remove unneeded attributes from lit test
llvm-svn: 329078
2018-04-03 16:05:20 +00:00
Benjamin Kramer 2fc3b18922 Revert "[SLP] Fix PR36481: vectorize reassociated instructions."
This reverts commit r328980 and r329046. Makes the vectorizer crash.

llvm-svn: 329071
2018-04-03 14:40:33 +00:00
Andrea Di Biagio 9da4d6db33 [MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.

A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
 - The total number of physical registers.
 - Which target registers are accessible through the register file.
 - The cost of allocating a register at register renaming stage.

Example (from this patch - see file X86/X86ScheduleBtVer2.td)

  def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>

Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.

The syntax allows to specify an empty set of register classes.  An empty set of
register classes means: this register file models all the registers specified by
the Target.  For each register class, users can specify an optional register
cost. By default, register costs default to 1.  A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".

This patch is structured in two parts.

* Part 1 - MC/Tablegen *

A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.

Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.

* Part 2 - llvm-mca related *

The second part of this patch is related to changes to llvm-mca.

The main differences are:
 1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
 2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
 3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.

The effect of point 3. is also visible in tests register-files-[1..5].s.

Differential Revision: https://reviews.llvm.org/D44980

llvm-svn: 329067
2018-04-03 13:36:24 +00:00
Chandler Carruth ff2f4fcd51 [x86] Fix a pretty obvious think-o with my asm scrubbing. You have to in
fact use regular expression syntax to use regular expressions.

Should restore the bots. Sorry for the noise on this test.

Thanks to Philip for spotting the bug!

llvm-svn: 329057
2018-04-03 10:28:56 +00:00
Chandler Carruth 44a791a57a [x86] Clean up and enhance a test around eflags copying.
This adds the basic test cases from all the EFLAGS bugs in more direct
forms. It also switches to generated check lines, and includes both
32-bit and 64-bit variations.

No functionality changing here, just setting things up to have a nice
clean asm diff in my EFLAGS patch.

llvm-svn: 329056
2018-04-03 10:04:37 +00:00
Chandler Carruth 6646becd0c [x86] Extend my goofy SP offset scrubbing for llc test cases to actually
do explicit scrubbing of the offsets of stack spills and reloads.

You can always turn this off in order to test specific stack slot usage.
We were already hiding most of this, but the new logic hides it more
generically. Notably, we should effectively hide stack slot churn in
functions that have a frame pointer now, and should also hide it when
changing a function from stack pointer to frame pointer. That transition
already changes enough to be clearly noticed in the test case diff,
showing *every* spill and reload is really noisy without benefit. See
the test case I ran this on as a classic example.

llvm-svn: 329055
2018-04-03 09:57:05 +00:00
Alexander Potapenko ac70668cff MSan: introduce the conservative assembly handling mode.
The default assembly handling mode may introduce false positives in the
cases when MSan doesn't understand that the assembly call initializes
the memory pointed to by one of its arguments.

We introduce the conservative mode, which initializes the first
|sizeof(type)| bytes for every |type*| pointer passed into the
assembly statement.

llvm-svn: 329054
2018-04-03 09:50:06 +00:00
Max Kazantsev c01e47b43f [SCEV] Make computeExitLimit more simple and more powerful
Current implementation of `computeExitLimit` has a big piece of code
the only purpose of which is to prove that after the execution of this
block the latch will be executed. What it currently checks is actually a
subset of situations where the exiting block dominates latch.

This patch replaces all these checks for simple particular cases with
domination check over loop's latch which is the only necessary condition
of taking the exiting block into consideration. This change allows to
calculate exact loop taken count for simple loops like

  for (int i = 0; i < 100; i++) {
    if (cond) {...} else {...}
    if (i > 50) break;
    . . .
  }

Differential Revision: https://reviews.llvm.org/D44677
Reviewed By: efriedma

llvm-svn: 329047
2018-04-03 05:57:19 +00:00
Yonghong Song d3b522f519 bpf: fix incorrect SELECT_CC lowering
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
  (1). In code, just swap is not enough, ConditionalCode CC
       should also be swapped, otherwise incorrect code will
       be generated.
  (2). The ConditionalCode swap should be subject to
       getHasJmpExt(). If getHasJmpExt() is False, certain
       conditional codes will not be supported and swap
       may generate incorrect code.

The original goal for this patch is to optimize jmp operations
which does not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
2018-04-03 03:56:37 +00:00
Ikhlas Ajbar b7322e8ac7 peel loops with runtime small trip counts
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.

Differential Revision: https://reviews.llvm.org/D44880

llvm-svn: 329042
2018-04-03 03:39:43 +00:00
Chandler Carruth 72eb30f7b3 [x86] Tidy up test case, generate check lines with script. NFC.
Just adds basic block labels and tidies up where comments go in the test
case and then generates fresh CHECK lines with the script. This way, the
check lines are much easier to maintain. They were already close to this
but not quite there.

llvm-svn: 329040
2018-04-03 02:19:05 +00:00
Haicheng Wu 7f0daaeb86 [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth
We use two approaches for determining the minimum bitwidth.

   * Demanded bits
   * Value tracking

If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.

But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in

%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i

are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in

%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0

it doesn't make sense to shrink %tmp15 and we can skip the value tracking.

Ideas are from Matthew Simpson!

Differential Revision: https://reviews.llvm.org/D44868

llvm-svn: 329035
2018-04-03 00:05:10 +00:00
Brian Gesiak 64521bed0d [Coroutines] Avoid assert splitting hidden coros
Summary:
When attempting to split a coroutine with 'hidden' visibility (for
example, a C++ coroutine that is inlined when compiled with the option
'-fvisibility-inlines-hidden'), LLVM would hit an assertion in
include/llvm/IR/GlobalValue.h:240: "local linkage requires default
visibility". The issue is that the visibility is copied from the source
of the function split in the `CloneFunctionInto` function, but the linkage
is not. To fix, create the new function first with external linkage,
then copy the linkage from the original function *after* `CloneFunctionInto`
is called.

Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the
explicit call to `setDSOLocal` can be removed in CoroSplit.cpp.

Test Plan: check-llvm

Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk

Reviewed By: rnk

Subscribers: llvm-commits, eric_niebler

Differential Revision: https://reviews.llvm.org/D44185

llvm-svn: 329033
2018-04-02 23:39:40 +00:00
Rafael Espindola 8c58750cc4 Align stubs for external and common global variables to pointer size.
This patch fixes PR36885: clang++ generates unaligned stub symbol
holding a pointer.

Patch by Rahul Chaudhry!

llvm-svn: 329030
2018-04-02 23:20:30 +00:00
Eric Christopher 84e2cd6c02 Remove llvm-mcmarkup.
It was never used and I've checked with the original authors.

llvm-svn: 329029
2018-04-02 23:17:55 +00:00
Reid Kleckner 298ffc609b [InstCombine] Don't strip function type casts from musttail calls
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.

Reviewers: efriedma, majnemer, pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45186

llvm-svn: 329027
2018-04-02 22:49:44 +00:00
Reid Kleckner a9e9918ee4 Treat inlining a notail call as a regular, non-tail call
Otherwise, we end up inlining a musttail call into a non-tail position,
which breaks verifier invariants.

Fixes PR31014

llvm-svn: 329015
2018-04-02 21:23:16 +00:00
Sanjay Patel cbb0450540 [InstCombine] add folds for icmp + sub (PR36969)
(A - B) >u A --> A <u B
C <u (C - D) --> C <u D

https://rise4fun.com/Alive/e7j

Name: ugt
  %sub = sub i8 %x, %y
  %cmp = icmp ugt i8 %sub, %x
=>
  %cmp = icmp ult i8 %x, %y
  
Name: ult
  %sub = sub i8 %x, %y
  %cmp = icmp ult i8 %x, %sub
=>
  %cmp = icmp ult i8 %x, %y

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=36969

llvm-svn: 329011
2018-04-02 20:37:40 +00:00
Sanjay Patel be0442eeaa [InstCombine] add tests for icmp (sub x, y), x (PR36969); NFC
llvm-svn: 329010
2018-04-02 20:23:54 +00:00
Douglas Yung 03fef3a42c Another attempt to fix papertrail-warnings.test on Windows bots by making expected message less case sensitive.
llvm-svn: 329008
2018-04-02 20:05:05 +00:00
Zachary Turner d11328a1bb [llvm-pdbutil] Add an export subcommand.
This command can dump the binary contents of a stream to a file.
This is useful when you want to do side-by-side comparisons of
a specific stream from two PDBs to examine the differences between
them.  You can export both of them to a file, then open them up
side by side in a hex editor (for example), so as to eliminate any
differences that might arise from the contents being on different
blocks in the PDB.

In subsequent patches I plan to improve the "explain" subcommand
so that you can explain the contents of a binary file that isn't
necessarily a full PDB, but one of these dumped streams, by telling
the subcommand how to interpret the contents.

llvm-svn: 329002
2018-04-02 18:35:21 +00:00
Rong Xu 5a8d4c3357 [DeadArgumentElim] Clone function level metadatas
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.

This patch clones the metadatas in the original function to the new function.

Differential Revision: https://reviews.llvm.org/D44127

llvm-svn: 328991
2018-04-02 17:27:38 +00:00
Dmitry Preobrazhensky b181c7312e [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16
See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847

Differential Revision: https://reviews.llvm.org/D45097

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328988
2018-04-02 17:09:20 +00:00
Gor Nishanov b0316d96ae [coroutines] Add support for llvm.coro.noop intrinsics
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.

Reviewers: EricWF, modocache, rnk, lewissbaker

Reviewed By: modocache

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45114

llvm-svn: 328986
2018-04-02 16:55:12 +00:00
Dmitry Preobrazhensky 6bad04ecf5 [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
Fixed a bug which caused Tablegen crash.

See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328983
2018-04-02 16:10:25 +00:00
Alexey Bataev 3decaf4275 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 328980
2018-04-02 14:51:37 +00:00
Nico Weber f492f58182 Revert r328975, it makes TableGen assert on the bots.
llvm-svn: 328978
2018-04-02 14:20:23 +00:00
Dmitry Preobrazhensky 32c450ae6a [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328975
2018-04-02 13:52:23 +00:00
Lama Saba 927468309f [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

Differential revision: https://reviews.llvm.org/D41330

Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
2018-04-02 13:48:28 +00:00
Andrea Di Biagio 6fd62feff8 [llvm-mca] Do not assume that implicit reads cannot be associated with ReadAdvance entries.
Before, the instruction builder incorrectly assumed that only explicit reads
could have been associated with ReadAdvance entries.
This patch fixes the issue and adds a test to verify it.

llvm-svn: 328972
2018-04-02 13:46:49 +00:00
Nico Weber 62ea0c562e Attempt to fix papertrail-warnings.test on Windows bots.
llvm-svn: 328971
2018-04-02 13:45:39 +00:00
Jonas Devlieghere 9e3e7a99e8 [dsymutil] Upstream emitting of papertrail warnings.
When running dsymutil as part of your build system, it can be desirable
for warnings to be part of the end product, rather than just being
emitted to the output stream. This patch upstreams that functionality.

Differential revision: https://reviews.llvm.org/D44639

llvm-svn: 328965
2018-04-02 10:40:43 +00:00
Craig Topper 96729cd64b [X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model.
Data taken from Table 16-17 in the Intel Optimization Manual.

llvm-svn: 328962
2018-04-02 06:34:16 +00:00
Craig Topper 6a814904da [X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide.
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/

llvm-svn: 328961
2018-04-02 05:54:34 +00:00
Craig Topper 8104f266a4 [X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.

llvm-svn: 328960
2018-04-02 05:33:28 +00:00
Craig Topper dc74094398 [X86] Fix the SchedRW for AVX512 shift instructions.
It was being inadvertently defaulted to an FADD scheduler class.

llvm-svn: 328959
2018-04-02 03:15:02 +00:00
Craig Topper caec723a1a [X86] Add an itinerary to BTR64rr.
llvm-svn: 328956
2018-04-02 01:12:34 +00:00
Harlan Haskins b7881bbfa2 Add C API bindings for DIBuilder 'Type' APIs
This patch adds a set of unstable C API bindings to the DIBuilder interface for
creating structure, function, and aggregate types.

This patch also removes the existing implementations of these functions from
the Go bindings and updates the Go API to fit the new C APIs.

llvm-svn: 328953
2018-04-02 00:17:40 +00:00
Petr Hosek 934e5d5436 [AArch64] Reserve x18 register on Fuchsia
This register is reserved as a platform register on Fuchsia.

Differential Revision: https://reviews.llvm.org/D45105

llvm-svn: 328950
2018-04-01 23:44:04 +00:00
Nicolai Haehnle 398c0b6701 TableGen: Support Intrinsic values in SearchableTable
Summary:
We will use this in the AMDGPU backend in a subsequent patch
in the stack to lookup target-specific per-intrinsic information.

The generic CodeGenIntrinsic machinery is used to ensure that,
even though we don't calculate actual enum values here, we do
get the intrinsics in the right order for the binary search
index.

Change-Id: If61cd5587963a4c5a1cc53df1e59c5e4dec1f9dc

Reviewers: arsenm, rampitec, b-sumner

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D44935

llvm-svn: 328937
2018-04-01 17:08:58 +00:00
Teresa Johnson 974706ebf7 [ThinLTO] Add an import cutoff for debugging/triaging
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no  limit).

Reviewers: wmi

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D45127

llvm-svn: 328934
2018-04-01 15:54:40 +00:00
David Green f80ebc8d21 [LoopRotate] Rotate loops with loop exiting latches
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didnt simplify the loop to get there.

Differential Revision: https://reviews.llvm.org/D44199

llvm-svn: 328933
2018-04-01 12:48:24 +00:00
Craig Topper db6caabccc [X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted.
llvm-svn: 328930
2018-04-01 06:29:28 +00:00
Craig Topper 3998041e80 [X86] Add test case to show failure to promote i16 subtract when the LHS is a load and the result is stored to a different address.
We mistakenly believe we might be able to fold this as a RMW operation, but that doesn't end up happening.

llvm-svn: 328929
2018-04-01 06:29:27 +00:00
Craig Topper ae2de57db0 [X86] Allow i16 subtracts to be promoted if the load is on the LHS and its not being stored.
llvm-svn: 328928
2018-04-01 06:29:25 +00:00
Craig Topper 280f631350 [X86] Add test case to show failure to promote i16 subtract because we mistakenly believe the load can be folded. NFC
The left hand side of the subtract is a load, but we cna't fold those unless we also have a store.

llvm-svn: 328927
2018-04-01 06:29:23 +00:00
Sanjay Patel 6124cae8f7 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, 
so replace a pair of casts with the equivalent node. We don't have to account for 
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 328921
2018-03-31 17:55:44 +00:00
Simon Pilgrim 3b8ad346f9 [X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries
llvm-svn: 328918
2018-03-31 09:15:54 +00:00
Puyan Lotfi 57c4f38c35 [MIR-Canon] Adding support for local idempotent instruction hoisting.
llvm-svn: 328915
2018-03-31 05:48:51 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Teresa Johnson db83aceb06 [ThinLTO] Add an option to force summary call edges cold for debugging
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.

Reviewers: eraman

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D45062

llvm-svn: 328909
2018-03-31 00:18:08 +00:00
Krzysztof Parzyszek 526fbf8e33 [Hexagon] Fix testcase
llvm-svn: 328899
2018-03-30 19:46:28 +00:00
Krzysztof Parzyszek 0f983d69a4 [Hexagon] Avoid creating invalid offsets in packetizer
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.

llvm-svn: 328897
2018-03-30 19:28:37 +00:00
Andrea Di Biagio dc97172b2f [X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and
VSQRT instructions.

There were still a few AVX instructions with an incorrect number of opcodes.
These should be fixed now.

llvm-svn: 328892
2018-03-30 18:53:47 +00:00
Peter Collingbourne d03bf12c1b DataFlowSanitizer: wrappers of functions with local linkage should have the same linkage as the function being wrapped
This patch resolves link errors when the address of a static function is taken, and that function is uninstrumented by DFSan.

This change resolves bug 36314.

Patch by Sam Kerner!

Differential Revision: https://reviews.llvm.org/D44784

llvm-svn: 328890
2018-03-30 18:37:55 +00:00
Puyan Lotfi 399b46c98d [MIR] Adding support for Named Virtual Registers in MIR.
llvm-svn: 328887
2018-03-30 18:15:54 +00:00
Andrea Di Biagio 3eaa26bb64 [X86][BtVer2] Fix the number of uOps for horizontal operations.
llvm-svn: 328886
2018-03-30 18:15:30 +00:00
Robert Widmann 478fce9ebf [LLVM-C] Finish exception instruction bindings - Round 2
Summary:
Previous revision caused a leak in the echo test that got caught by the ASAN bots because of missing free of the handlers array and was reverted in r328759.  Resubmitting the patch with that correction.

Add support for cleanupret, catchret, catchpad, cleanuppad and catchswitch and their associated accessors.

Test is modified from SimplifyCFG because it contains many diverse usages of these instructions.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits, vlad.tsyrklevich

Differential Revision: https://reviews.llvm.org/D45100

llvm-svn: 328883
2018-03-30 17:49:53 +00:00
Zachary Turner d5cf5cf637 [llvm-pdbutil] Dig deeper into the PDB and DBI streams when explaining.
This will show more detail when using `llvm-pdbutil explain` on an
offset in the DBI or PDB streams.  Specifically, it will dig into
individual header fields and substreams to give a more precise
description of what the byte represents.

llvm-svn: 328878
2018-03-30 17:16:50 +00:00
Krzysztof Parzyszek fce30c2ba3 Revert "peel loops with runtime small trip counts"
This reverts commit r328854, it breaks some Hexagon tests.

llvm-svn: 328875
2018-03-30 16:55:44 +00:00
Stanislav Mekhanoshin 74e2974ac6 [AMDGPU] Fixed some instructions latencies
Differential Revision: https://reviews.llvm.org/D45073

llvm-svn: 328874
2018-03-30 16:19:13 +00:00
Sanjay Patel e09b7dcf3d [SelectionDAG] Removing FABS folding from DAGCombiner
The code has bugs dealing with -0.0.

Since D44550 introduced FABS pattern folding in InstCombine, 
this patch removes the now-redundant code that causes 
https://bugs.llvm.org/show_bug.cgi?id=36600.

Patch by Mikhail Dvoretckii!

Differential Revision: https://reviews.llvm.org/D44683

llvm-svn: 328872
2018-03-30 15:42:52 +00:00
Krzysztof Parzyszek 4f99836a9e [Hexagon] Recognize and handle :endloop01
llvm-svn: 328870
2018-03-30 15:29:47 +00:00
Krzysztof Parzyszek 46abcb236b [Hexagon] Fix printing :mem_noshuf on compiler-generated packets
llvm-svn: 328869
2018-03-30 15:09:05 +00:00
Andrea Di Biagio 073a9d74ca [X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and
most vector logic instructions.

Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input
operand.

llvm-svn: 328867
2018-03-30 14:48:08 +00:00
Andrea Di Biagio 42d8ea22c0 [X86][BtVer2] Add tests that show how ReadAfterLd is missing for some
instructions.

In the Btver2 model, there are a few InstRW overrides that don't specify a
ReadAfterLd for the register input operand.

As a result, a few AVX variants of horizontal operations and most vector logic
operations with a folded memory operand don't have a ReadAdvance info associated
to their input register operands.

llvm-svn: 328865
2018-03-30 14:29:33 +00:00
Andrea Di Biagio 01043625cf [X86] Add llvm-mca tests for r328834.
Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend
instructions.

As Craig pointed out in D44726, some Intel models still have to be fixed.

llvm-svn: 328861
2018-03-30 13:38:37 +00:00
Andrea Di Biagio 0823090843 [X86] Add tests to verify the presence of "ReadAfterLd" after r328823.
This change adds a couple of tests to verify the change introduced by revision
328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI).

llvm-svn: 328859
2018-03-30 11:44:48 +00:00
Vlad Tsyrklevich 894c028d56 Revert "[LLVM-C] Finish exception instruction bindings"
This reverts commit r328759. It was causing LSan failures on sanitizer-x86_64-linux-bootstrap

llvm-svn: 328858
2018-03-30 06:21:28 +00:00
Michael Bedy 59e5ef793c [AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.

This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.

Reviewers: arsenm, #amdgpu

Reviewed By: arsenm, #amdgpu

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44364

llvm-svn: 328856
2018-03-30 05:03:36 +00:00
Ikhlas Ajbar a7d614c3e5 [Hexagon] add missing lit config file
llvm-svn: 328855
2018-03-30 03:32:24 +00:00
Ikhlas Ajbar 66c8ba5a50 peel loops with runtime small trip counts
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.

Differential Revision: https://reviews.llvm.org/D44880

llvm-svn: 328854
2018-03-30 03:05:34 +00:00
Eli Friedman 208fe67a78 [MachineCopyPropagation] Handle COPY with overlapping source/dest.
MachineCopyPropagation::CopyPropagateBlock has a bunch of special
handling for COPY instructions. This handling assumes that COPY
instructions do not modify the source of the copy; this is wrong if
the COPY destination overlaps the source.

To fix the bug, check explicitly for this situation, and fall back to
the generic instruction handling.

This bug can't happen for most register classes because they don't
have this sort of overlap, but there are a few register classes
where this is possible. The testcase uses the AArch64 QQQQ register
class.

Differential Revision: https://reviews.llvm.org/D44911

llvm-svn: 328851
2018-03-30 00:56:03 +00:00
Matt Arsenault 03ae399d50 AMDGPU: Support realigning stack
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.

I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.

llvm-svn: 328831
2018-03-29 21:30:06 +00:00
Evgeniy Stepanov 50635dab26 Add msan custom mapping options.
Similarly to https://reviews.llvm.org/D18865 this adds options to provide custom mapping for msan.
As discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-February/121339.html

Patch by vit9696(at)avp.su.

Differential Revision: https://reviews.llvm.org/D44926

llvm-svn: 328830
2018-03-29 21:18:17 +00:00
Craig Topper 89310f56c8 [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd

Differential Revision: https://reviews.llvm.org/D44838

llvm-svn: 328823
2018-03-29 20:41:39 +00:00
Matt Arsenault ffb132e74b AMDGPU: Increase default stack alignment
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.

llvm-svn: 328821
2018-03-29 20:22:04 +00:00
Kevin Enderby d9911f6f7b For llvm-nm and Mach-O files that are fully stripped, special case a redacted LC_MAIN
As a further refinement on:

r328274 - For llvm-nm and Mach-O files also use function starts info in some cases when printing symbols

we want to special case a redacted LC_MAIN so it is easier to find.

rdar://38978929

llvm-svn: 328820
2018-03-29 20:04:29 +00:00