Commit Graph

172194 Commits

Author SHA1 Message Date
Nicolai Haehnle 413f8691ab LegacyDivergenceAnalysis: fix uninitialized value
Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26
llvm-svn: 348051
2018-11-30 23:07:49 +00:00
Nicolai Haehnle a7b00058e0 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 348050
2018-11-30 22:55:38 +00:00
Nicolai Haehnle a9cc92c247 AMDGPU: Fix various issues around the VirtReg2Value mapping
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

llvm-svn: 348049
2018-11-30 22:55:29 +00:00
Nicolai Haehnle 56d0ed2a50 [DA] GPUDivergenceAnalysis for unstructured GPU kernels
Summary:
This is patch #3 of the new DivergenceAnalysis

  <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>

The GPUDivergenceAnalysis is intended to eventually supersede the existing
LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
incorrect results on unstructured Control-Flow Graphs:

  <https://bugs.llvm.org/show_bug.cgi?id=37185>

This patch adds the option -use-gpu-divergence-analysis to the
LegacyDivergenceAnalysis to turn it into a transparent wrapper for the
GPUDivergenceAnalysis.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D53493

llvm-svn: 348048
2018-11-30 22:55:20 +00:00
Sanjay Patel 39298cae9f [x86] add tests for undef + partial undef constant folding; NFC
Keep this file sync'd with the instsimplify version (rL348045).

llvm-svn: 348047
2018-11-30 22:54:33 +00:00
Craig Topper 502fc1bdd5 [X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer=vector-width=256 and -mprefer-vector-width=512.
This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors.

llvm-svn: 348046
2018-11-30 22:53:21 +00:00
Sanjay Patel 398728732e [InstSimplify] add tests for undef + partial undef constant folding; NFC
These tests should probably go under a separate test file because they
should fold with just -constprop, but they're similar to the scalar
tests already in here.

llvm-svn: 348045
2018-11-30 22:51:34 +00:00
Nikita Popov 219e5367d0 [ValueTracking] Make unit tests easier to write; NFC
Generalize the existing MatchSelectPatternTest class to also work
with other types of tests. This reduces the amount of boilerplate
necessary to write ValueTracking tests in general, and computeKnownBits
tests in particular.

The inherited convention is that the function must be @test and the
tested instruction %A.

Differential Revision: https://reviews.llvm.org/D55141

llvm-svn: 348043
2018-11-30 22:22:30 +00:00
Saleem Abdulrasool 5842df93dd Support: use std::is_trivially_copyable on MSVC
MSVC 2015 and newer have std::is_trivially_copyable available for use.
We should prefer that over the std::is_class to get this check be
correct.

llvm-svn: 348042
2018-11-30 22:13:42 +00:00
Jessica Paquette 1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne 35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Evandro Menezes 58e94f91a8 [TableGen] Fix negation of simple predicates
Simple predicates, such as those defined by `CheckRegOperandSimple` or
`CheckImmOperandSimple`, were not being negated when used with `CheckNot`.

This change fixes this issue by defining the previously declared methods to
handle simple predicates.

Differential revision: https://reviews.llvm.org/D55089

llvm-svn: 348034
2018-11-30 21:03:24 +00:00
Joseph Tremoulet 27b1e3bd4f [Mem2Reg] Fix nondeterministic corner case
Summary:
When mem2reg inserts phi nodes in blocks with unreachable predecessors,
it adds undef operands for those incoming edges.  When there are
multiple such predecessors, the order is currently based on the address
of the BasicBlocks.  This change fixes that by using the BBNumbers in
the sort/search predicates, as is done elsewhere in mem2reg to ensure
determinism.

Also adds a testcase with a bunch of unreachable preds, which
(nodeterministically) fails without the fix.


Reviewers: majnemer

Reviewed By: majnemer

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55077

llvm-svn: 348024
2018-11-30 19:20:02 +00:00
Scott Linder 4ed5195712 [DWARFv5] Verify all-or-nothing constraint on DIFile source
Update IR verifier to check the constraint that DIFile source is present on all
files or no files.

Differential Revision: https://reviews.llvm.org/D54953

llvm-svn: 348022
2018-11-30 19:13:38 +00:00
Jonas Devlieghere d1c9751657 [dsymutil] Gather global and local symbol addresses in the main executable.
Usually local symbols will have their address described in the debug
map. Global symbols have to have their address looked up in the symbol
table of the main executable. By playing with 'ld -r' and export lists,
you can get a symbol described as global by the debug map while actually
being a local symbol as far as the link in concerned. By gathering the
address of local symbols, we fix this issue.

Also, we prefer a global symbol in case of a name collision to preserve
the previous behavior.

Note that using the 'ld -r' tricks, people can actually cause symbol
names collisions that dsymutil has no way to figure out. This fixes the
simple case where there is only one symbol of a given name.

rdar://problem/32826621

Differential revision: https://reviews.llvm.org/D54922

llvm-svn: 348021
2018-11-30 18:56:10 +00:00
Craig Topper 4d80f199e8 [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers.
This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.

llvm-svn: 348019
2018-11-30 18:43:18 +00:00
Craig Topper 8191307d09 [X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked operation when avx512bw/avx512vl is enabled.
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But its less instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates.

llvm-svn: 348018
2018-11-30 18:43:15 +00:00
Sanjay Patel 1901a12e76 [SelectionDAG] fold FP binops with 2 undef operands to undef
llvm-svn: 348016
2018-11-30 18:38:52 +00:00
Ron Lieberman f48e43bbf7 [AMDGPU] Disable SReg Global LD/ST, perf regression
Differential Revision: https://reviews.llvm.org/D55093

llvm-svn: 348014
2018-11-30 18:29:17 +00:00
Andrea Di Biagio 7e695b97d7 [llvm-mca] Speedup the default resource selection strategy.
This patch removes a (potentially) slow while loop in
DefaultResourceStrategy::select(). A better (and faster) approach is to do some
bit manipulation in order to shrink the range of candidate resources.
On a release build, this change gives an average speedup of ~10%.

llvm-svn: 348007
2018-11-30 17:15:52 +00:00
Yonghong Song f487334622 Revert "[BTF] Add BTF DebugInfo"
This reverts commit 9c6b970db8bc63b28ce58a129bb1580a6a3c6caf.

llvm-svn: 348004
2018-11-30 16:54:43 +00:00
Sanjay Patel 1cfb796b58 [x86] add tests for fake vector FP ops; NFC
llvm-svn: 348002
2018-11-30 16:50:08 +00:00
Yonghong Song 81b77e9159 [BTF] Add BTF DebugInfo
This patch adds BPF Debug Format (BTF) as a standalone
LLVM debuginfo. The BTF related sections are directly
generated from IR. The BTF debuginfo is generated
only when the compilation target is BPF.

What is BTF?
============

First, the BPF is a linux kernel virtual machine
and widely used for tracing, networking and security.
  https://www.kernel.org/doc/Documentation/networking/filter.txt
  https://cilium.readthedocs.io/en/v1.2/bpf/

BTF is the debug info format for BPF, introduced in the below
linux patch
  69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)
in the patch set mentioned in the below lwn article.
  https://lwn.net/Articles/752047/

The BTF format is specified in the above github commit.
In summary, its layout looks like
  struct btf_header
  type subsection (a list of types)
  string subsection (a list of strings)

With such information, the kernel and the user space is able to
pretty print a particular bpf map key/value. One possible example below:
  Withtout BTF:
    key: [ 0x01, 0x01, 0x00, 0x00 ]
  With BTF:
    key: struct t { a : 1; b : 1; c : 0}
  where struct is defined as
    struct t { char a; char b; short c; };

How BTF is generated?
=====================

Currently, the BTF is generated through pahole.
  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69
and available in pahole v1.12
  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4

Basically, the bpf program needs to be compiled with -g with
dwarf sections generated. The pahole is enhanced such that
a .BTF section can be generated based on dwarf. This format
of the .BTF section matches the format expected by
the kernel, so a bpf loader can just take the .BTF section
and load it into the kernel.
  8a138aed4a

The .BTF section layout is also specified in this patch:
with file include/llvm/BinaryFormat/BTF.h.

What use cases this patch tries to address?
===========================================

Currently, only the bpf instruction stream is required to
pass to the kernel. The kernel verifies it, jits it if configured
to do so, attaches it to a particular kernel attachment point,
and later executes when a particular event happens.

This patch tries to expand BTF to support two more use cases below:
  (1). BPF supports subroutine calls.
       During performance analysis, it would be good to
       differentiate which call is hot instead of just
       providing a virtual address. This would require to
       pass a unique identifier for each subroutine to
       the kernel, the subroutine name is a natual choice.
  (2). If a particular jitted instruction is hot, we want
       user to know which source line this jitted instruction
       belongs to. This would require the source information
       is available to various profiling tools.

Note that in a single ELF file,
  . there may be multiple loadable bpf programs,
  . for a particular to-be-loaded bpf instruction stream,
    its instructions may come from multiple PROGBITS sections,
    the bpf loader needs to merge them together to a single
    consecutive insn stream before loading to the kernel.
For example:
  section .text: subroutines funcFoo
  section _progA: calling funcFoo
  section _progB: calling funcFoo
The bpf loader could construct two loadable bpf instruction
streams and load them into the kernel:
  . _progA funcFoo
  . _progB funcFoo
So per ELF section function offset and instruction offset
will need to be adjusted before passing to the kernel, and
the kernel essentially expect only one code section regardless
of how many in the ELF file.

What do we propose and Why?
===========================

To support the above two use cases, we propose to
add an additional section, .BTF.ext, to the ELF file
which is the input of the bpf loader. A different section
is preferred since loader may need to manipulate it before
loading part of its data to the kernel.

The .BTF.ext section has a similar header to the .BTF section
and it contains two subsections for func_info and line_info.
  . the func_info maps the func insn byte offset to a func
    type in the .BTF type subsection.
  . the line_info maps the insn byte offset to a line info.
  . both func_info and line_info subsections are organized
    by ELF PROGBITS AX sections.

pahole is not a good place to implement .BTF.ext as
pahole is mostly for structure hole information and more
importantly, we want to pass the actual code to the kernel.
  . bpf program typically is small so storage overhead
    should be small.
  . in bpf land, it is totally possible that
    an application loads the bpf program into the
    kernel and then that application quits, so
    holding debug info by the user space application
    is not practical as you may not even know who
    loads this bpf program.
  . having source codes directly kept by kernel
    would ease deployment since the original source
    code does not need ship on every hosts and
    kernel-devel package does not need to be
    deployed even if kernel headers are used.

LLVM is a good place to implement.
  . The only reliable time to get the source code is
    during compilation time. This will result in both more
    accurate information and easier deployment as
    stated in the above.
  . Another consideration is for JIT. The project like bcc
    (https://github.com/iovisor/bcc)
    use MCJIT to compile a C program into bpf insns and
    load them to the kernel. The llvm generated BTF sections
    will be readily available for such cases as well.

Design and implementation of emiting .BTF/.BTF.ext sections
===========================================================

The BTF debuginfo format is defined. Both .BTF and .BTF.ext
sections are generated directly from IR when both
"-target bpf" and "-g" are specified. Note that
dwarf sections are still generated as dwarf is used
by user space tools like llvm-objdump etc. for BPF target.

This patch also contains tests to verify generated
.BTF and .BTF.ext sections for all supported types, func_info
and line_info subsections. The patch is also tested
against linux kernel bpf sample tests and selftests.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D53736

llvm-svn: 347999
2018-11-30 16:22:59 +00:00
Than McIntosh 0e0a8a3fee [CodeGen] Prefer static frame index for STATEPOINT liveness args
Summary:
If a given liveness arg of STATEPOINT is at a fixed frame index
(e.g. a function argument passed on stack), prefer to use this
fixed location even the address is also in a register. If we use
the register it will generate a spill, which is not necessary
since the fixed frame index can be directly recorded in the stack
map.

Patch by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, niravd, reames

Reviewed By: reames

Subscribers: cherryyz, reames, anna, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53889

llvm-svn: 347998
2018-11-30 16:22:41 +00:00
Alexey Bataev 3689747619 [SLP]PR39774: Update references of the replaced external instructions.
Summary:
An additional fix for PR39774. Need to update the references for the
RedcutionRoot instruction when it is replaced during the vectorization
phase to avoid compiler crash on reduction vectorization.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55017

llvm-svn: 347997
2018-11-30 15:14:20 +00:00
Nico Weber 5595925044 [gn build] Add build files for llvm/lib/Bitcode/Reader and llvm/lib/MC/MCParser.
Differential Revision: https://reviews.llvm.org/D55087

llvm-svn: 347995
2018-11-30 14:49:46 +00:00
Valery Pykhtin 3d9afa273f [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

llvm-svn: 347993
2018-11-30 14:21:56 +00:00
Nicolai Haehnle 445b0b6260 TableGen/ISel: Allow PatFrag predicate code to access captured operands
Summary:
This simplifies writing predicates for pattern fragments that are
automatically re-associated or commuted.

For example, a followup patch adds patterns for fragments of the form
(add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are
automatically commuted to (add $z, (shl $x, $y)), which makes it basically
impossible to refer to $x, $y, and $z generically in the PredicateCode.

With this change, the PredicateCode can refer to $x, $y, and $z simply
as `Operands[i]`.

Test confirmed that there are no changes to any of the generated files
when building all (non-experimental) targets.

Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c

Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D51994

llvm-svn: 347992
2018-11-30 14:15:13 +00:00
Alex Bradbury 4830fdd21a [RISCV] Add additional CSR instruction aliases (imm. operands)
This patch adds CSR instructions aliases for the cases where the instruction 
takes an immediate operand but the alias doesn't have the i suffix. This is 
necessary for gas/gcc compatibility.

gas doesn't do a similar conversion for fsflags or fsrm, so this should be 
complete.

Differential Revision: https://reviews.llvm.org/D55008
Patch by Luís Marques.

llvm-svn: 347991
2018-11-30 14:10:52 +00:00
Renato Golin de4b88e5ac Fix parenthesis warning in IVDescriptors
llvm-svn: 347990
2018-11-30 13:54:36 +00:00
Renato Golin 135e72e1b9 Add a new reduction pattern match
Adding a new reduction pattern match for vectorizing code similar
to TSVC s3111:

for (int i = 0; i < N; i++)
  if (a[i] > b)
    sum += a[i];

This patch adds support for fadd, fsub and fmull, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.

The difference from the previous patch(https://reviews.llvm.org/D49168)
is as follows:
 - Added check of fast-math property of fp-instruction to the
   previous patch
 - Fix/add some pattern for if-reduction.ll


Differential Revision: https://reviews.llvm.org/D54464

Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>
     and Masakazu Ueno <masakazu.ueno@linaro.org>

llvm-svn: 347989
2018-11-30 13:40:10 +00:00
Alex Bradbury 26403def69 [RISCV] Add UNIMP instruction (32- and 16-bit forms)
This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit 
form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. 
The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, 
but still follows the 16-bit instruction form (i.e. bits 0-1 != 11).

Until recently unimp was undocumented and supported just by binutils, which 
printed unimp for either the 16 or 32-bit form. Both forms are now documented 
<https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports 
c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>.

Differential Revision: https://reviews.llvm.org/D54316
Patch by Luís Marques.

llvm-svn: 347988
2018-11-30 13:39:17 +00:00
Alex Bradbury fca95cfee9 [SelectionDAG] Support result type promotion for FLT_ROUNDS_
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_.

Differential Revision: https://reviews.llvm.org/D53820

llvm-svn: 347986
2018-11-30 13:18:33 +00:00
Andrea Di Biagio d20cdccb70 [llvm-mca] Simplify code in class Scheduler. NFCI
llvm-svn: 347985
2018-11-30 12:49:30 +00:00
Alex Bradbury bd24c7b045 [SelectionDAG] Support promotion of PREFETCH operands
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the operands of ISD::PREFETCH.

Differential Revision: https://reviews.llvm.org/D53281

llvm-svn: 347980
2018-11-30 10:06:31 +00:00
Max Kazantsev 9cf417db78 [LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783
Terminator folding transform lacks MemorySSA update for memory Phis,
while they exist within MemorySSA analysis. They need exactly the same
type of updates as regular Phis. Failing to update them properly ends up
with inconsistent MemorySSA and manifests in various assertion failures.

This patch adds Memory Phi updates to this transform.

Thanks to @jonpa for finding this!

Differential Revision: https://reviews.llvm.org/D55050
Reviewed By: asbirlea

llvm-svn: 347979
2018-11-30 10:06:23 +00:00
Alex Bradbury 36e0fd1d39 [SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the operand.

Differential Revision: https://reviews.llvm.org/D53279

llvm-svn: 347978
2018-11-30 10:02:06 +00:00
Alex Bradbury e0e62e97df [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend 
operands when it is able to do so. For some targets this is more expensive 
than a sign-extension, which is also a valid choice. Introduce the 
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger 
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == 
MVT::i64, as it can be performed using a single instruction.

Differential Revision: https://reviews.llvm.org/D52978

llvm-svn: 347977
2018-11-30 09:56:54 +00:00
Max Kazantsev deaa3e2068 [NFC] Simplify and reduce tests for PR39783
llvm-svn: 347976
2018-11-30 09:51:25 +00:00
Alex Bradbury bc96a98ed0 [RISCV] Introduce codegen patterns for instructions introduced in RV64I
As discussed in the RFC 
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit 
RISC-V has i64 as the only legal integer type.  This patch introduces patterns 
to support codegen of the new instructions 
introduced in RV64I: addiw, addiw, subw, sllw, slliw, srlw, srliw, sraw, 
sraiw, ld, sd.

Custom selection code is needed for srliw as SimplifyDemandedBits will remove 
lower bits from the mask, meaning the obvious pattern won't work:

def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
          (SRLIW GPR:$rs1, uimm5:$shamt)>;
This is sufficient to compile and execute all of the GCC torture suite for 
RV64I other than those files using frameaddr or returnaddr intrinsics 
(LegalizeDAG doesn't know how to promote the operands - a future patch 
addresses this).

When promoting i32 sltu/sltiu operands, it would be more efficient to use 
sign-extension rather than zero-extension for RV64. A future patch adds a hook 
to allow this.

Differential Revision: https://reviews.llvm.org/D52977

llvm-svn: 347973
2018-11-30 09:38:44 +00:00
Alex Bradbury f612fadc51 [docs][AtomicExpandPass] Document the alternate lowering strategy for part-word atomicrmw/cmpxchg
D47882, D48130 and D48131 introduce a new lowering strategy for part-word 
atomicrmw/cmpxchg and uses it to lower these operations for the RISC-V target. 
Rather than having AtomicExpandPass produce the LL/SC loop in the IR level, it 
instead calculates the necessary mask values and inserts a target-specific 
intrinsic, which is lowered at a much later stage (after register allocation). 
This ensures that architecture-specific restrictions for forward-progress in 
LL/SC loops can be guaranteed.

This patch documents this new AtomicExpandPass functionality. See the previous 
llvm-dev RFC for more info 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D52234

llvm-svn: 347971
2018-11-30 09:23:24 +00:00
Craig Topper a2133061c0 [X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle.
llvm-svn: 347967
2018-11-30 08:32:05 +00:00
Craig Topper 6e4b266a0d [X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up.
Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the uppers bits and no shifts. So just emit that directly.

llvm-svn: 347966
2018-11-30 08:32:01 +00:00
Sjoerd Meijer ecc7dcb879 [ARM] Don't expand sdiv when optimising for minsize
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:

sdiv %1, i32 4

gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.

Differential Revision: https://reviews.llvm.org/D54546

llvm-svn: 347965
2018-11-30 08:14:28 +00:00
Hsiangkai Wang 957578ddf7 [CodeGen] Fix bugs in BranchFolderPass when debug labels are generated.
Skip DBG_VALUE and DBG_LABEL in branch folding algorithms.

The bug is reported in
https://bugs.chromium.org/p/chromium/issues/detail?id=898160.

Differential Revision: https://reviews.llvm.org/D54199

llvm-svn: 347964
2018-11-30 08:07:29 +00:00
Hsiangkai Wang d72f6f133a [NFC] Refine doxygen format.
Differential Revision: https://reviews.llvm.org/D54568

llvm-svn: 347963
2018-11-30 08:07:24 +00:00
Jonas Paulsson b1d014883c [SystemZ::TTI] i8/i16 operands extension costs revisited
Three minor changes to these extra costs:

* For ICmp instructions, instead of adding 2 all the time for extending each
  operand, this is only done if that operand is neither a load or an
  immediate.

* The operands extension costs for divides removed, because we now use a high
  cost already for the divide (20).

* The costs for lhsr/ashr extra costs removed as this did not seem useful.

Review: Ulrich Weigand
https://reviews.llvm.org/D55053

llvm-svn: 347961
2018-11-30 07:09:34 +00:00
Craig Topper 0850e8a6b6 [X86] Fix a couple types in SimplifyDemandedVectorEltsForTargetNode. NFCI
We had a EVT variable capturing the result of getSimpleValueType which returns an MVT. Another place using EVT that could have been MVT. And an 'int' that should be 'unsigned'.

llvm-svn: 347959
2018-11-30 06:23:55 +00:00
Alexander Shaposhnikov 6e4dc6f23f [llvm-objcopy] Move elf-specific tests into subfolder
In this diff the elf-specific tests are moved into the subfolder llvm-objcopy/ELF
(the change was discussed in the comments on https://reviews.llvm.org/D54674).
A separate code reivew wasn't sent for this change 
since Phabricator is failing to create such a large diff.

Test plan: 
make check-all
make check-llvm-tools
make check-llvm-tools-llvm-objcopy

llvm-svn: 347958
2018-11-30 05:43:39 +00:00
Mircea Trofin 5e0b21fb45 Fix build warnings introduced in rL347938
Summary:
Suppressed warnings in release builds due to variable used
only in assert statement.

Subscribers: llvm-commits, eraman, mgorny

Differential Revision: https://reviews.llvm.org/D55100

llvm-svn: 347939
2018-11-30 01:53:17 +00:00
Mircea Trofin f1a49e8525 Revert "Revert r347596 "Support for inserting profile-directed cache prefetches""
Summary:
This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9.

Fix: correct  the use of DenseMap.

Reviewers: davidxl, hans, wmi

Reviewed By: wmi

Subscribers: mgorny, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D55088

llvm-svn: 347938
2018-11-30 01:01:52 +00:00
Shoaib Meenai 3eb01f99fb [CMake] build correctly if build path contains whitespace
The add_llvm_symbol_exports function in AddLLVM.cmake creates command
line link flags with paths containing CMAKE_CURRENT_BINARY_DIR, but that
will break if CMAKE_CURRENT_BINARY_DIR contains whitespace. This patch
adds quotes to those paths.

Fixes PR39843.

Patch by John Garvin.

Differential Revision: https://reviews.llvm.org/D55081

llvm-svn: 347937
2018-11-30 00:30:53 +00:00
Warren Ristow 72d1f3a285 [SCEV] Guard movement of insertion point for loop-invariants
r320789 suppressed moving the insertion point of SCEV expressions with
dev/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D54713

llvm-svn: 347934
2018-11-30 00:02:54 +00:00
Nico Weber 7a2c5856a9 [gn build] merge r346978 and r347741.
llvm-svn: 347929
2018-11-29 23:03:17 +00:00
Nico Weber bd00f04a39 [gn build] Set +x bit on .py files in llvm/utils/gn/build.
Also add a shebang line to write_cmake_config.py.

llvm-svn: 347928
2018-11-29 22:56:40 +00:00
Nico Weber 8a2454cab8 [gn build] Add template for running llvm-tblgen and use it to add build file for llvm/lib/IR.
Also adds a boring build file for llvm/lib/BinaryFormat (needed by llvm/lib/IR).

lib/IR marks Attributes and IntrinsicsEnum as public_deps (because IR's public
headers include the generated .inc files), so projects depending on lib/IR will
implicitly depend on them being generated. As a consequence, most targets won't
have to explicitly list a dependency on these tablegen steps (contrast with
intrinsics_gen in the cmake build).

This doesn't yet have the optimization where tablegen's output is only updated
if it's changed.

Differential Revision: https://reviews.llvm.org/D55028#inline-486755

llvm-svn: 347927
2018-11-29 22:53:21 +00:00
Nico Weber 12ccfed3ff [gn build] Add a script checking if sources in BUILD.gn and CMakeLists.txt files match.
Also fix a missing file in lib/Support/BUILD.gn found by the script.

The script is very stupid and assumes that CMakeLists.txt follow the standard
LLVM CMakeLists.txt formatting with one cpp source file per line. Despite its
simplicity, it works well in practice.

It would be nice if it also checked deps and maybe automatically applied its
suggestions.

Differential Revision: https://reviews.llvm.org/D54930

llvm-svn: 347925
2018-11-29 22:25:31 +00:00
Thomas Lively 66ea30c7bc [WebAssembly] Expand unavailable integer operations for vectors
Summary:
Expands for vector types all of the integer operations that are
expanded for scalars because they are not supported at all by
WebAssembly.

This CL has no tests because such tests would really be testing the
target-independent expansion, but I'm happy to add tests if reviewers
think it would be helpful.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55010

llvm-svn: 347923
2018-11-29 22:01:01 +00:00
Jonas Devlieghere ccf7d4b4aa Produce an error on non-encodable offsets for darwin ARM scattered relocations.
Scattered ARM relocations for Mach-O's only have 24 bits available to
encode the offset. This is not checked but just truncated and can result
in corrupt binaries after linking because the relocations are applied to
the wrong offset. This patch will check and error out in those
situations instead of emitting a wrong relocation.

Patch by: Sander Bogaert (dzn)

Differential revision: https://reviews.llvm.org/D54776

llvm-svn: 347922
2018-11-29 21:58:23 +00:00
Paul Robinson 49f51bcce3 Comment tweak requested in code review. NFC
I forgot to do this before committing D54755.

llvm-svn: 347918
2018-11-29 21:13:51 +00:00
Sanjay Patel 8d27144251 [DAGCombiner] narrow truncated binops
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc 
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86, 
we mostly seem to be missing LEA opportunities, but there are likely vector folds 
missing too). I think those should be considered existing bugs because this is the 
same transform that we do as an IR canonicalization in instcombine. We just need 
more tests to make those visible independent of this patch.

Differential Revision: https://reviews.llvm.org/D54640

llvm-svn: 347917
2018-11-29 20:58:26 +00:00
Martin Storsjo c1410635bf [obj2yaml] [COFF] Write RVA instead of VA for sections, fix roundtripping executables
yaml2obj writes the yaml value as is to the output file.

Differential Revision: https://reviews.llvm.org/D54965

llvm-svn: 347916
2018-11-29 20:53:57 +00:00
Alex Bradbury 66d9a752b9 [RISCV] Implement codegen for cmpxchg on RV32IA
Utilise a similar ('late') lowering strategy to D47882. The changes to 
AtomicExpandPass allow this strategy to be utilised by other targets which 
implement shouldExpandAtomicCmpXchgInIR.

All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. 
This is conservative but correct.

Differential Revision: https://reviews.llvm.org/D48131

llvm-svn: 347914
2018-11-29 20:43:42 +00:00
Craig Topper 73c1d75d58 [X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead.
This seems to produce the same results on the tests we have.

llvm-svn: 347912
2018-11-29 20:18:58 +00:00
David Stuttard c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
Artur Pilipenko eba2365f23 Introduce MaxUsesToExplore argument to capture tracking
Currently CaptureTracker gives up if it encounters a value with more than 20 
uses. The motivation for this cap is to keep it relatively cheap for 
BasicAliasAnalysis use case, where the results can't be cached. Although, other 
clients of CaptureTracker might be ok with higher cost. This patch introduces an 
argument for PointerMayBeCaptured functions to specify the max number of uses to 
explore. The motivation for this change is a downstream user of CaptureTracker, 
but I believe upstream clients of CaptureTracker might also benefit from more 
fine grained cap.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D55042

llvm-svn: 347910
2018-11-29 20:08:12 +00:00
Francis Visoiu Mistrih 0b8dd4488e [MachineScheduler] Order FI-based memops based on stack direction
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.

llvm-svn: 347906
2018-11-29 20:03:19 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Craig Topper 6cd0b17078 [X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization
This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get pass the legal type check in the generic DAG combiner. Otherwise we end up scalarizing.

I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size?Widen the constant vector by padding with 1?

Differential Revision: https://reviews.llvm.org/D54919

llvm-svn: 347898
2018-11-29 19:13:38 +00:00
Sanjay Patel d802270808 [InstSimplify] fold select with implied condition
This is an almost direct move of the functionality from InstCombine to 
InstSimplify. There's no reason not to do this in InstSimplify because 
we never create a new value with this transform.

(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)

I've changed 1 of the conditions for the fold (1 of the blocks for the 
branch must be the block we started with) into an assert because I'm not 
sure how that could ever be false.

We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.

The 3-way compare changes show that InstCombine has some kind of 
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.

llvm-svn: 347896
2018-11-29 18:44:39 +00:00
Krzysztof Parzyszek a26a848da3 [TableGen] Examine entire subreg compositions to detect ambiguity
When tablegen detects that there exist two subregister compositions that
result in the same value for some register, it will emit a warning. This
kind of an overlap in compositions should only happen when it is caused
by a user-defined composition. It can happen, however, that the user-
defined composition is not identically equal to another one, but it does
produce the same value for one or more registers. In such cases suppress
the warning.
This patch is to silence the warning when building the System Z backend
after D50725.

Differential Revision: https://reviews.llvm.org/D50977

llvm-svn: 347894
2018-11-29 18:20:08 +00:00
Volkan Keles 4fe0080984 [GlobalISel] LegalizationArtifactCombiner: Combine aext([asz]ext x) -> [asz]ext x
Summary:
Replace `aext([asz]ext x)` with `aext/sext/zext x` in order to
reduce the number of instructions generated to clean up some
legalization artifacts.

Reviewers: aditya_nandakumar, dsanders, aemerson, bogner

Reviewed By: aemerson

Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D54174

llvm-svn: 347893
2018-11-29 18:19:24 +00:00
Fangrui Song e4ee066190 [llvm-objcopy] Delete redundant !Config.xx.empty() when followed by positive is_contained() check
Summary: The original intention of !Config.xx.empty() was probably to emphasize the thing that is currently considered, but I feel the simplified form is actually easier to understand and it is also consistent with the call sites in other llvm components.

Reviewers: alexshap, rupprecht, jakehehrlich, jhenderson, espindola

Reviewed By: alexshap, rupprecht

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D55040

llvm-svn: 347891
2018-11-29 17:32:51 +00:00
Serge Guelton 644aa88235 Avoid redundant reference to isPodLike in SmallVect/Optional implementation
NFC, preparatory work for isPodLike cleaning.

Differential Revision: https://reviews.llvm.org/D55005

llvm-svn: 347890
2018-11-29 17:21:54 +00:00
John Brawn a7eb2c863f [LICM] Reapply r347776 "Make LICM able to hoist phis" with fix
This commit caused a large compile-time slowdown in some cases when NDEBUG is
off due to the dominator tree verification it added. Fix this by only doing
dominator tree and loop info verification when something has been hoisted.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347889
2018-11-29 17:10:00 +00:00
Teresa Johnson 93f9996278 [ThinLTO] Import local variables from the same module as caller
Summary:
We can sometimes end up with multiple copies of a local variable that
have the same GUID in the index. This happens when there are local
variables with the same name that are in different source files having the
same name/path at compile time (but compiled into different bitcode objects).

In this case make sure we import the copy in the caller's module.
This enables importing both of the variables having the same GUID
(but which will have different promoted names since the module paths,
and therefore the module hashes, will be distinct).

Importing the wrong copy is particularly problematic for read only
variables, since we must import them as a local copy whenever
referenced. Otherwise we get undefs at link time.

Note that the llvm-lto.cpp and ThinLTOCodeGenerator changes are needed
for testing the distributed index case via clang, which will be sent as
a separate clang-side patch shortly. We were previously not doing the
dead code/read only computation before computing imports when testing
distributed index generation (like it was for testing importing and
other ThinLTO mechanisms alone).

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55047

llvm-svn: 347886
2018-11-29 17:02:42 +00:00
James Y Knight 3c24bd09bb git-llvm: Fix incremental population of svn tree.
"svn update --depth=..." is, annoyingly, not a specification of the
desired depth, but rather a _limit_ added on top of the "sticky" depth
in the working-directory. However, if the directory doesn't exist yet,
then it sets the sticky depth of the new directory entries.

Unfortunately, the svn command-line has no way of expanding the depth
of a directory from "empty" to "files", without also removing any
already-expanded subdirectories. The way you're supposed to increase
the depth of an existing directory is via --set-depth, but
--set-depth=files will also remove any subdirs which were already
requested.

This change avoids getting into the state of ever needing to increase
the depth of an existing directory from "empty" to "files" in the
first place, by:

1. Use svn update --depth=files, not --depth=immediates.

The latter has the effect of checking out the subdirectories and
marking them as depth=empty. The former excludes sub-directories from
the list of entries, which avoids the problem.

2. Explicitly populate missing parent directories.

Using --parents seemed nice and easy, but it marks the parent dirs as
depth=empty. Instead, check out parents explicitly if they're missing.

llvm-svn: 347883
2018-11-29 16:46:34 +00:00
Sanjay Patel 515b91cdef [SimplifyCFG] auto-generate complete checks; NFC
llvm-svn: 347882
2018-11-29 16:28:37 +00:00
Sanjay Patel 81449c6b0e [InstCombine] auto-generate complete checks; NFC
llvm-svn: 347881
2018-11-29 16:26:03 +00:00
Graham Sellers 04f7a4d2d2 [AMDGPU] Add and update scalar instructions
This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.

A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).

Differential: https://reviews.llvm.org/D54714
llvm-svn: 347877
2018-11-29 16:05:38 +00:00
David Stuttard 535c1af0bf Fix: Add support for TFE/LWE in image intrinsic
My change svn-id: 347871 caused a buildbot failure due to an unused
variable def (used in an assert).

Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336
llvm-svn: 347876
2018-11-29 15:56:36 +00:00
Hans Wennborg e632286d24 Revert r347823 "[TextAPI] Switch back to a custom Platform enum."
It broke the Windows buildbots, e.g.
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/21829/steps/test/logs/stdio

This also reverts the follow-ups: r347824, r347827, and r347836.

llvm-svn: 347874
2018-11-29 15:47:24 +00:00
Joseph Tremoulet 926ee459c4 [CallSiteSplitting] Report edge deletion to DomTreeUpdater
Summary:
When splitting musttail calls, the split blocks' original terminators
get removed; inform the DTU when this happens.

Also add a testcase that fails an assertion in the DTU without this fix.


Reviewers: fhahn, junbuml

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55027

llvm-svn: 347872
2018-11-29 15:27:04 +00:00
David Stuttard de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Sanjay Patel 8242c82de4 [CVP] tidy processCmp(); NFC
1. The variables were confusing: 'C' typically refers to a constant, but here it was the Cmp.
2. Formatting violations.
3. Simplify code to return true/false constant.

llvm-svn: 347868
2018-11-29 14:41:21 +00:00
Martin Storsjo bfd1d27585 Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix"
This reverts commits r347776 and r347778.

The first one, r347776, caused significant compile time regressions
for certain input files, see PR39836 for details.

llvm-svn: 347867
2018-11-29 14:39:39 +00:00
Sanjay Patel 83d1d3f167 [CVP] auto-generate complete test checks; NFC
llvm-svn: 347866
2018-11-29 14:28:47 +00:00
Hans Wennborg 6e3be9d12e Revert r347596 "Support for inserting profile-directed cache prefetches"
It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for
repro.

This also reverts the follow-ups:
Revert r347724 "Do not insert prefetches with unsupported memory operands."
Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596"
Revert r347607 "Add new passes to X86 pipeline tests"

llvm-svn: 347864
2018-11-29 13:58:02 +00:00
Petr Pavlu 6bb80512db [GlobalISel] Fix insertion of stack-protector epilogue
* Tell the StackProtector pass to generate the epilogue instrumentation
  when GlobalISel is enabled because GISel currently does not implement
  the same deferred epilogue insertion as SelectionDAG.
* Update StackProtector::InsertStackProtectors() to find a stack guard
  slot by searching for the llvm.stackprotector intrinsic when the
  prologue was not created by StackProtector itself but the pass still
  needs to generate the epilogue instrumentation. This fixes a problem
  when the pass would abort because the stack guard AllocInst pointer
  was null when generating the epilogue -- test
  CodeGen/AArch64/GlobalISel/arm64-irtranslator-stackprotect.ll.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347862
2018-11-29 13:22:53 +00:00
Petr Pavlu e6406d568c [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.

The EnableGlobalISel flag had previouly only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347861
2018-11-29 12:56:32 +00:00
Martin Storsjo a876b5c0f5 [llvm-rc] Support EXSTYLE statement.
Patch by Jacek Caban!

Differential Revision: https://reviews.llvm.org/D55020

llvm-svn: 347858
2018-11-29 12:17:39 +00:00
Andrea Di Biagio 373a4ccf6c [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666).
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.

A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues.  Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`.  Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).

At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model.  If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.

With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are not correctly classified either as "load
queue full" or "store queue full".

About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage.  This is the main reason why for the modified tests, the load/store
queues gets full before PdEx is full.

Differential Revision: https://reviews.llvm.org/D54957

llvm-svn: 347857
2018-11-29 12:15:56 +00:00
Nicolai Haehnle 7bed696915 AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.

Use a simple fixpoint iteration instead.

In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.

Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}

Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54231

llvm-svn: 347853
2018-11-29 11:06:26 +00:00
Nicolai Haehnle ab43bf60fe AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points
Summary:
There is one obsolete reference to using -1 as an indication of "unknown",
but this isn't actually used anywhere.

Using unsigned makes robust wrapping checks easier.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam

Differential Revision: https://reviews.llvm.org/D54230

llvm-svn: 347852
2018-11-29 11:06:21 +00:00
Nicolai Haehnle f96456c611 AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54229

llvm-svn: 347851
2018-11-29 11:06:18 +00:00
Nicolai Haehnle d1f45dad84 AMDGPU/InsertWaitcnts: Simplify pending events tracking
Summary:
Instead of storing the "score" (last time point) of the various relevant
events, only store whether an event is pending or not.

This is sufficient, because whenever only one event of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).

This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK.
I do not understand what this special handling ever attempted to achieve.
It has existed ever since the original port from an internal code base,
so my best guess is that it solved a problem related to EXEC handling in
that internal code base.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54228

llvm-svn: 347850
2018-11-29 11:06:14 +00:00
Nicolai Haehnle ae369d70c3 AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types
Summary:
It hides the type casting ugliness, and I happened to have to add a new
such loop (in a later patch).

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54227

llvm-svn: 347849
2018-11-29 11:06:11 +00:00
Nicolai Haehnle 1a94cbb3f5 AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

llvm-svn: 347848
2018-11-29 11:06:06 +00:00
Martin Storsjo c9a23ccd9b [CODE_OWNERS] Add myself as code owner for MinGW
llvm-svn: 347847
2018-11-29 10:58:15 +00:00
Max Kazantsev a63b275285 [NFC] Add two XFAIL tests from PR39783
llvm-svn: 347845
2018-11-29 09:38:22 +00:00