Commit Graph

Pavel Labath 581d79a440 [Object] Add basic minidump support
Summary:
This patch adds basic support for reading minidump files. It contains
the definitions of various important minidump data structures (header,
stream directory), and of one minidump stream (SystemInfo). The ability
to read other streams will be added in follow-up patches. However, all
streams can be read even now as raw data, which means lldb's minidump
support (where this code is taken from) can be immediately rebased on
top of this patch as soon as it lands.

As we don't have any support for generating minidump files (yet), this
tests the code via unit tests with some small handcrafted binaries in
the form of c char arrays.

Reviewers: Bigcheese, jhenderson, zturner

Subscribers: srhines, dschuff, mgorny, fedor.sergeev, lemo, clayborg, JDevlieghere, aprantl, lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59291

llvm-svn: 356652
2019-03-21 09:18:59 +00:00
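For readers unfamiliar with the format, here is a minimal sketch of the fixed-size structures the commit above refers to (the header and a stream directory entry), following the publicly documented Microsoft minidump layout; the struct and field names below are illustrative, not necessarily the ones used in llvm/Object.

```
#include <cstdint>

// Sketch of the on-disk minidump layout (little-endian), per the public
// Microsoft documentation. Names here are illustrative only.
struct MinidumpHeader {
  uint32_t Signature;          // 'MDMP' read as a little-endian uint32: 0x504d444d
  uint32_t Version;            // low 16 bits carry the format version
  uint32_t NumberOfStreams;    // entries in the stream directory
  uint32_t StreamDirectoryRVA; // file offset of the stream directory
  uint32_t Checksum;
  uint32_t TimeDateStamp;
  uint64_t Flags;
};
static_assert(sizeof(MinidumpHeader) == 32, "header is 32 bytes on disk");

struct MinidumpLocationDescriptor {
  uint32_t DataSize; // size of the stream's raw data in bytes
  uint32_t RVA;      // file offset of the stream's raw data
};

struct MinidumpDirectoryEntry {
  uint32_t StreamType; // e.g. the SystemInfo stream is type 7
  MinidumpLocationDescriptor Location;
};
static_assert(sizeof(MinidumpDirectoryEntry) == 12, "directory entry is 12 bytes");
```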
Fangrui Song ebfb7852be [BasicAA] Use DenseMap::try_emplace after D59151. NFC
llvm-svn: 356651
2019-03-21 08:47:40 +00:00
Mikael Holmen 5b1754f93d Silence warning about unused variable in builds without asserts [NFC]
llvm-svn: 356648
2019-03-21 07:54:44 +00:00
Craig Topper 8de7bc0bff [ScalarizeMaskedMemIntrinsics] Reverse some if conditions to reduce indentations to remove curly braces.
Pre-commit for D59180

llvm-svn: 356646
2019-03-21 05:54:37 +00:00
Craig Topper 72d888ba9f [InstCombine] Add test case for PR41164. NFC
llvm-svn: 356645
2019-03-21 05:33:10 +00:00
Alina Sbirlea 4fdbd822fc [BasicAA] Reduce the number of map searches [NFCI].
Summary:
This is a refactoring patch.
- Reduce the number of map searches by reusing the iterator.
- Add asserts to check that the entry is in the cache, as this is something BasicAA relies on to avoid infinite recursion.

Reviewers: chandlerc, aschwaighofer

Subscribers: sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59151

llvm-svn: 356644
2019-03-21 05:02:05 +00:00
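The pattern behind this refactoring (and the DenseMap::try_emplace change above), sketched here with std::unordered_map standing in for DenseMap and with made-up names: do one hash lookup, keep the iterator, and reuse it instead of searching the map again.

```
#include <string>
#include <unordered_map>

using AliasCache = std::unordered_map<std::string, int>;

// Stand-in for the expensive analysis; assumed not to insert into Cache,
// otherwise the iterator below could be invalidated by a rehash.
static int expensiveQuery(const std::string &Key) {
  return static_cast<int>(Key.size());
}

int getOrCompute(AliasCache &Cache, const std::string &Key) {
  // One lookup instead of find() followed by insert(): try_emplace either
  // returns the existing entry or inserts a placeholder value.
  auto [It, Inserted] = Cache.try_emplace(Key, 0);
  if (!Inserted)
    return It->second;              // cache hit, no second search
  It->second = expensiveQuery(Key); // reuse the iterator, no Cache[Key] lookup
  return It->second;
}
```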
Philip Reames 60212be619 [instcombine] Add some todos, and arrange code for readibility
llvm-svn: 356642
2019-03-21 03:23:40 +00:00
George Burgess IV ae84e9ab49 [MSSA] Delete move ctor; remove dynamic never-moved verification
Code archaeology in D59315 revealed that MSSA should never be moved.
Rather than trying to check dynamically that this hasn't happened in the
verify() functions of Walkers, it's likely best to just delete its move
constructor.

Since all these verify() functions did is check that MSSA hasn't moved,
this allows us to remove these verify functions.

I can readd the verification checks if someone's super concerned about
us trying to `memcpy` MemorySSA or something somewhere, but I imagine we
have other problems if we're trying anything like that...

llvm-svn: 356641
2019-03-21 03:11:34 +00:00
Craig Topper 8d46403b8e [X86] Add CMPXCHG8B feature flag. Set it for all CPUs except i386/i486 including 'generic'. Disable use of CMPXCHG8B when this flag isn't set.
CMPXCHG8B was introduced on i586/pentium generation.

If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG.

Differential Revision: https://reviews.llvm.org/D59576

llvm-svn: 356631
2019-03-20 23:35:49 +00:00
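A way to observe the distinction from C++ (a sketch; the exact lowering depends on target flags): a 64-bit std::atomic on 32-bit x86 is lock-free only when CMPXCHG8B is available; otherwise, as described above, AtomicExpandPass turns the operations into __atomic_* library calls. Compiling the snippet with something like clang -m32 -march=i486 versus -march=i586 should show the difference.

```
#include <atomic>
#include <cstdint>
#include <cstdio>

std::atomic<uint64_t> Counter{0};

int main() {
  // On 32-bit x86 this reports true only when the target has CMPXCHG8B
  // (i586/pentium and later); for i386/i486 the operations become
  // __atomic_* library calls instead of inline lock cmpxchg8b loops.
  std::printf("64-bit atomic lock-free: %d\n",
              static_cast<int>(Counter.is_lock_free()));
  Counter.fetch_add(1, std::memory_order_relaxed);
  return 0;
}
```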
Michael Trent 02a2ce9a4b Fix Mach-O bind and rebase validation errors in libObject
Summary:
llvm-objdump (via libObject) validates DYLD_INFO rebase and bind
entries against the basic structure found in the Mach-O file before
evaluating the contents of those entries. Certain malformed Mach-Os can
defeat the validation check and force llvm-objdump (libObject) to crash.

The previous logic verified that a rebase or bind started in a valid Mach-O
section, but did not verify that the section wholly contained the
fixup. It also generally allowed rebases or binds to start immediately
after a valid section even if that range is not itself part of a valid
section. Finally, bind and rebase opcodes that indicate more than one
fixup (apply N times...) were not completely validated: only the first
and final fixups were checked.

The previous logic also rejected certain binaries as false positives.
Some bind and rebase opcodes can modify the state machine such that the
next bind or rebase will fail. libObject will reject these opcodes as
invalid in order to be helpful and print an error message associated
with the instruction that caused the problem, even though the binary is
not actually illegal until it consumes the invalid state in the state
machine. In other words, libObject may reject a Mach-O binary that
Apple's dynamic linker may consider legal. The original version of
macho-rebase-add-addr-uleb-too-big is an example of such a binary.

I have replaced the existing checkSegAndOffset and checkCountAndSkip
functions with a single function, checkSegAndOffsets, which validates
all of the fixups realized by a DYLD_INFO opcode. checkSegAndOffsets
verifies that a Mach-O section fully contains each fixup. Every fixup
realized by an opcode is validated, and some (but not all!)
inconsistencies in the state machine are allowed until a fixup is
realized. This means that libObject may fail on an opcode that realizes
a fixup, not on the opcode that introduced the arithmetic error.

Existing test cases have been modified to reflect the changes in error
messages returned by libObject. What's more, the test case for 
macho-rebase-add-addr-uleb-too-big has been modified so that it actually
triggers the error condition; the new code in libObject considers the
original test binary "legal".

rdar://47797757

Reviewers: lhames, pete, ab

Reviewed By: pete

Subscribers: rupprecht, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59574

llvm-svn: 356629
2019-03-20 23:21:16 +00:00
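A hypothetical illustration of the containment check the commit describes (a fixup must lie wholly inside a section, with overflow-safe arithmetic); this is not the actual libObject code, and the names are made up.

```
#include <cstdint>

struct SectionRange {
  uint64_t Addr; // section start address
  uint64_t Size; // section size in bytes
};

// Returns true if the fixup [FixupAddr, FixupAddr + FixupSize) is wholly
// contained in the section. Written to avoid overflow in the additions,
// in the spirit of the checkSegAndOffsets change described above.
bool containsFixup(const SectionRange &Sec, uint64_t FixupAddr,
                   uint64_t FixupSize) {
  if (FixupAddr < Sec.Addr)
    return false;
  uint64_t Offset = FixupAddr - Sec.Addr;
  if (Offset > Sec.Size)
    return false;
  return FixupSize <= Sec.Size - Offset;
}
```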
Thomas Lively 5098f8589d [WebAssembly][NFC] Fix formatting error from rL356610
llvm-svn: 356622
2019-03-20 22:34:34 +00:00
Tim Renouf 2327c231d6 [AMDGPU] Do not generate spurious PAL metadata
My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata"
accidentally caused a spurious PAL metadata .note record to be emitted
for any AMDGPU output. That caused failures in the lld test
amdgpu-relocs.s. Fixed.

Differential Revision: https://reviews.llvm.org/D59613

Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e
llvm-svn: 356621
2019-03-20 22:02:09 +00:00
Nikita Popov 03dbfc2eef [InstCombine] Add additional sub nsw inference tests; NFC
nsw can be determined based on known bits here, but currently
isn't.

llvm-svn: 356620
2019-03-20 21:42:17 +00:00
Stanislav Mekhanoshin 0a11829ab2 Allow machine dce to remove uses in the same instruction
Machine DCE cannot remove a dead definition if there are non-dbg uses.
A use however can be in the same instruction:

  dead %0 = INST %0

Such instructions are sometimes created by the detect-dead-lanes pass.

Allow this instruction to be deleted despite the use if the only use
belongs to the same instruction.

Differential Revision: https://reviews.llvm.org/D59565

llvm-svn: 356619
2019-03-20 21:42:05 +00:00
Craig Topper 0367553304 [X86] Call lowerShuffleAsBitMask for 512-bit vectors in lowerShuffleAsBlend.
This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before
falling back to move immediate, GPR to k-register, and masked op.

I had to make some changes to support v8i64 when i64 is not a legal type, and to
support floating point types.

This trades a load for the move immediate and GPR move, which is higher latency.
But it's probably better for register pressure not to have to hop through other
register classes. The load+and should play better with LICM and
rematerialization, I think.

Differential Revision: https://reviews.llvm.org/D59479

llvm-svn: 356618
2019-03-20 21:30:20 +00:00
Michael Liao bbcb95a64e [AMDGPU] Fix dependency on `BinaryFormat`
Summary: - The linking is broken when this library is built as a shared one.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59610

llvm-svn: 356617
2019-03-20 21:22:27 +00:00
Matt Arsenault 2065206a9d AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.

llvm-svn: 356611
2019-03-20 20:41:34 +00:00
Thomas Lively f6f4f84378 [WebAssembly] Target features section
Summary:
Implements a new target features section in assembly and object files
that records what features are used, required, and disallowed in
WebAssembly objects. The linker uses this information to ensure that
all objects participating in a link are feature-compatible and records
the set of used features in the output binary for use by optimizers
and other tools later in the toolchain.

The "atomics" feature is always required or disallowed to prevent
linking code with stripped atomics into multithreaded binaries. Other
features are marked used if they are enabled globally or on any
function in a module.

Future CLs will add linker flags for ignoring feature compatibility
checks and for specifying the set of allowed features, implement using
the presence of the "atomics" feature to control the type of memory
and segments in the linked binary, and add front-end flags for
relaxing the linkage policy for atomics.

Reviewers: aheejin, sbc100, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, mgrang, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59173

llvm-svn: 356610
2019-03-20 20:26:45 +00:00
Michael Liao eea5177d30 [AMDGPU] Fix clamp bit DAG operand
Summary:
- Should use `targetconstant` instead of a `constant` operand for the clamp
  bit, which is expected to be an immediate operand. Under certain
  conditions, such as when a common `i1 false` constant is used in another
  place and selected before the instruction with the clamp bit, a register
  operand may be added instead of an immediate one. Use `targetconstant` to
  enforce that.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59608

llvm-svn: 356608
2019-03-20 20:18:56 +00:00
Pete Couperus b062239d63 [ARC] Add ARCOptAddrMode pass to generate postincrement loads/stores.
Build on newly introduced ARC postincrement loads/stores from r356200.

Patch By Denis Antrushin! <denis@synopsys.com>

Differential Revision: https://reviews.llvm.org/D59409

llvm-svn: 356606
2019-03-20 20:06:21 +00:00
Evandro Menezes f7522cab39 [AArch64] Fix formatting (NFC)
Indent macro instances properly.

llvm-svn: 356604
2019-03-20 19:57:59 +00:00
Konstantin Zhuravlyov 88268e3e36 AMDHSA: Fix COMPUTE_PGM_RSRC2.USER_SGPR calculation when parsing ISA assembly
It must match https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state

Differential Revision: https://reviews.llvm.org/D59570

llvm-svn: 356603
2019-03-20 19:44:47 +00:00
Eli Friedman 638be660d7 [ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1.
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".

For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply.  We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.

This patch adds a special case to handle that construct.  At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).

This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.

The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.

Differential Revision: https://reviews.llvm.org/D59568

llvm-svn: 356601
2019-03-20 19:40:45 +00:00
Rafael Auler 6dc53ccb0b [Linker] Fix crash handling appending linkage
Summary:
When linking two llvm.used arrays, if the resulting merged
array ends up with duplicated elements (with the same name) but with
different types, the IRLinker was crashing. This was supposed to be
legal, as the IRLinker bitcasts elements to match types in these
situations.

This bug was exposed by D56928 in clang to support attribute used
in member functions of class templates. Crash happened when self-hosting
with LTO. Since LLVM depends on the used attribute to generate code
for the dump() method, which is ubiquitous in the code base, many input bc
files had a definition of this method referenced in their llvm.used array.
Some of these classes got optimized, changing the type of the first
parameter (this) in the dump method, leading to a scenario with a
pool of valid definitions but some with a different type, triggering
this bug.

This is a memory bug: ValueMapper depends on (calls) the materializer
provided by IRLinker, and this materializer was freely calling RAUW
methods whenever a global definition was updated in the temporary merged
output file. However, replaceAllUsesWith may or may not destroy
constants that use this global. If the linked definition has a type
mismatch regarding the new def and the old def, the materializer would
bitcast the old type to the new type and the elements of the llvm.used
array, which already uses bitcast to i8*, would end up with elements
cascading two bitcasts. RAUW would then indirectly call the
constantfolder to update the constant to the new ref, which would,
instead of updating the constant, destroy it to be able to create
a new constant that folds the two bitcasts into one. The problem is that
ValueMapper works with pointers to the same constants that may be
getting destroyed by RAUW. Obviously, RAUW can update references in the
Module so they no longer use the old, destroyed constant, but it can't update
ValueMapper's internal pointers to these constants, which are now
invalid.

The approach here is to move the task of RAUWing old definitions
outside of the materializer.

Test Plan:
Added LIT test case, tested clang self-hosting with D56928 and
verified it works

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59552

llvm-svn: 356597
2019-03-20 19:20:07 +00:00
Alina Sbirlea f69f807321 [NFC] Fix brace indentation.
llvm-svn: 356596
2019-03-20 19:18:55 +00:00
Robert Lougher f2158a8ef0 Resubmit r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines"
Failing LLD tests have been fixed in r356593.

llvm-svn: 356594
2019-03-20 19:08:18 +00:00
Tim Renouf e7bd52f86e [AMDGPU] Added MsgPack format PAL metadata
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.

The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.

Differential Revision: https://reviews.llvm.org/D57028

Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
2019-03-20 18:47:21 +00:00
Philip Reames e4588bbf80 Simplify operands of masked stores and scatters based on demanded elements
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.

Differential Revision: https://reviews.llvm.org/D57247

llvm-svn: 356590
2019-03-20 18:44:58 +00:00
Alina Sbirlea 5baa72ea74 [LICM & MemorySSA] Don't sink/hoist stores in the presence of ordered loads.
Summary:
Before this patch, if any Use existed in the loop with a defining
access in the loop, we conservatively decided not to move the store.
What this approach was missing is that ordered loads are not Uses; they're Defs
in MemorySSA. So, even when the clobbering walker does not find that
volatile load to interfere, we still cannot hoist a store past a
volatile load.
Resolves PR41140.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59564

llvm-svn: 356588
2019-03-20 18:33:37 +00:00
Nikita Popov 00b5ecab5d [ValueTracking] Compute range for abs without nsw
This is a small followup to D59511. The code that was moved into
computeConstantRange() there is a bit overly conservative: if the
abs is not nsw, it does not compute any range. However, abs without
nsw still has a well-defined contiguous unsigned range from 0 to
SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX
range, but if we're already here we might as well specify it...

Differential Revision: https://reviews.llvm.org/D59563

llvm-svn: 356586
2019-03-20 18:16:02 +00:00
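A quick way to convince yourself of the [0, SIGNED_MIN] claim, at 8 bits and with wrap-safe unsigned arithmetic (a standalone sketch, not LLVM code): abs of the most negative value wraps back to itself, so the unsigned range of a non-nsw abs is [0, 128] rather than the nsw case's [0, 127].

```
#include <cassert>
#include <cstdint>

int main() {
  uint8_t Max = 0;
  for (int X = -128; X <= 127; ++X) {
    // abs without nsw: negate with wrapping (two's complement) semantics.
    uint8_t U = static_cast<uint8_t>(X);
    uint8_t Abs = (X < 0) ? static_cast<uint8_t>(0u - U) : U;
    if (Abs > Max)
      Max = Abs;
  }
  // abs(-128) wraps to 128 (0x80), so the unsigned range is [0, 128],
  // i.e. [0, SIGNED_MIN] -- one wider than the nsw case's [0, 127].
  assert(Max == 128);
  return 0;
}
```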
Nikita Popov 37cf25c3c6 [InstCombine] Fold add nuw + uadd.with.overflow
Fold add nuw and uadd.with.overflow with constants if the
addition does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59471

llvm-svn: 356584
2019-03-20 18:00:27 +00:00
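The reasoning behind the fold, checked with __builtin_add_overflow (a sketch of the argument, not the InstCombine code): if X + C1 cannot wrap (the nuw) and C1 + C2 does not wrap, then uadd.with.overflow(X + C1, C2) and uadd.with.overflow(X, C1 + C2) agree in both the sum and the overflow bit.

```
#include <cassert>
#include <cstdint>
#include <limits>

int main() {
  const uint32_t C1 = 100, C2 = 50; // C1 + C2 does not overflow
  for (uint64_t X64 = 0; X64 <= std::numeric_limits<uint32_t>::max() - C1;
       X64 += 0x01000193) { // large stride just to keep the loop short
    uint32_t X = static_cast<uint32_t>(X64); // X + C1 cannot wrap: the "nuw"
    uint32_t A, B;
    bool OvA = __builtin_add_overflow(X + C1, C2, &A); // uaddo(X + C1, C2)
    bool OvB = __builtin_add_overflow(X, C1 + C2, &B); // uaddo(X, C1 + C2)
    assert(A == B && OvA == OvB);
  }
  return 0;
}
```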
Jordan Rupprecht becd797a97 [Remarks] Fix mismatched delete due to missing virtual destructor
This fixes an asan failure introduced in r356519.

llvm-svn: 356583
2019-03-20 17:44:24 +00:00
Tim Renouf d737b551e9 [AMDGPU] Factored PAL metadata handling out into its own class
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
  .note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
  record.

Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821

Differential Revision: https://reviews.llvm.org/D57027

Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
llvm-svn: 356582
2019-03-20 17:42:00 +00:00
Nico Weber e8062d20c9 Remove HAVE_REALPATH from config.h
Its last use was removed in r352916.
No behavior change.

Differential Revision: https://reviews.llvm.org/D59601

llvm-svn: 356579
2019-03-20 17:26:11 +00:00
Dmitry Preobrazhensky 04bd1185ad [AMDGPU][MC] Corrected checks for DS offset0 range
See bug 40889: https://bugs.llvm.org/show_bug.cgi?id=40889

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59313

llvm-svn: 356576
2019-03-20 17:13:58 +00:00
Sanjay Patel a2250e923b [CGP] fix formatting; NFC
llvm-svn: 356572
2019-03-20 16:47:53 +00:00
Clement Courbet 1cb6430228 Fix sanitizer failures for 356550.
Mark bcmp as having optimized codegen, so that asan can detect it and
mark users as nobuiltin.

llvm-svn: 356568
2019-03-20 16:14:59 +00:00
Nico Weber 9920b98c71 gn build: Add build files for some clang-tools-extra
Adds clang-change-namespace, clang-move, clang-query,
clang-reorder-fields.

Differential Revision: https://reviews.llvm.org/D59554

llvm-svn: 356567
2019-03-20 16:14:16 +00:00
Sanjay Patel d1ce455f7b [CGP] convert chain of 'if' to 'switch'; NFC
This should be extended, but CGP does some strange things,
so I'm intentionally not changing the potential order of
any transforms yet.

llvm-svn: 356566
2019-03-20 15:53:06 +00:00
Nico Weber 6112b76b2f gn build: Merge r356508
llvm-svn: 356563
2019-03-20 15:41:25 +00:00
Dmitry Preobrazhensky 137976fae2 [AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, private_base, private_limit, pops_exiting_wave_id
See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D59290

llvm-svn: 356561
2019-03-20 15:40:52 +00:00
Nico Weber 9e7af8d026 gn build: Merge r356519
llvm-svn: 356560
2019-03-20 15:36:11 +00:00
Sanjay Patel fb44f99b73 [CGP][x86] add tests for usubo regression (PR41129); NFC
llvm-svn: 356559
2019-03-20 15:02:35 +00:00
Sjoerd Meijer 7bb785cbc3 Follow up of rL356555
Pacify buildbot that complained about a member function not marked with
override.

llvm-svn: 356557
2019-03-20 14:33:39 +00:00
Sjoerd Meijer 633fb0f266 [TTI] getMemcpyCost
This adds a new function, getMemcpyCost, to TTI so that the cost of a memcpy can be
modeled and queried. The default implementation returns Expensive, but targets
can override this function to model the cost more accurately.

Differential Revision: https://reviews.llvm.org/D59252

llvm-svn: 356555
2019-03-20 14:15:46 +00:00
George Rimar 0373bedb41 [llvm-objcopy] - Use replaceSectionReferences to update the sections for symbols in symbol table.
If the compression was used and we had a symbol not involved in relocation,
we never updated its section and it was silently removed from the output.

Differential revision: https://reviews.llvm.org/D59542

llvm-svn: 356554
2019-03-20 13:57:47 +00:00
Simon Pilgrim 51f65171e9 Remove out of date comment. NFCI.
DAGCombiner::convertBuildVecZextToZext just requires the extractions to be sequential; they don't have to start from the 0th index.

llvm-svn: 356552
2019-03-20 12:24:15 +00:00
Clement Courbet 238af52ded [ExpandMemCmp] Trigger on bcmp too.
Summary: Fixes 41150.

Reviewers: gchatelet

Subscribers: hiraditya, llvm-commits, ckennelly, sbenza, jyknight

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59593

llvm-svn: 356550
2019-03-20 11:51:11 +00:00
Simon Pilgrim 2acca37a2d [X86] Use getConstantOperandAPInt to detect out-of-range shifts.
llvm-svn: 356549
2019-03-20 11:41:52 +00:00
Andrea Di Biagio 624f5deff4 [X86] Remove X86 specific dag nodes for RDTSC/RDTSCP/RDPMC. NFCI
This patch removes the following dag node opcodes from namespace X86ISD:

RDTSC_DAG,
RDTSCP_DAG,
RDPMC_DAG

The logic that expands RDTSC/RDPMC/XGETBV intrinsics is basically the same. The
only differences are:

    RDTSC/RDTSCP don't implicitly read ECX.
    RDTSCP also implicitly writes ECX.

I moved the common expansion logic into a helper function with the goal to get
rid of code repetition. That helper is now used for the expansion of
RDTSC/RDTSCP/RDPMC/XGETBV intrinsics.

No functional change intended.

Differential Revision: https://reviews.llvm.org/D59547

llvm-svn: 356546
2019-03-20 11:21:15 +00:00
Sylvestre Ledru ba92e9bb11 [perf][DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
Summary: Fix the build failure when perf jit is enabled

Reviewers: avl, dblaikie

Reviewed By: avl

Subscribers: modocache, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59189

llvm-svn: 356542
2019-03-20 10:02:18 +00:00
David Stuttard fc2a747345 [AMDGPU] Allow MIMG with no uses in adjustWritemask in isel
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.

The instruction will be removed anyway.

Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58964

llvm-svn: 356540
2019-03-20 09:29:55 +00:00
Philip Reames 484d07c828 [instcombine] Add todos describing missing transforms for masked.* intrinsics
llvm-svn: 356536
2019-03-20 03:36:05 +00:00
Craig Topper fda1f96d28 [X86] Remove X32 check lines from a test that doesn't have an X32 FileCheck prefix. Regenerate the test using update_llc_test_checks. NFC
llvm-svn: 356535
2019-03-20 03:13:28 +00:00
Douglas Yung 16a8c54127 Retry to add workaround to build scoped enums with VS2015. NFCI.
We need this as we still have internal build bots on VS2015.

llvm-svn: 356534
2019-03-20 01:52:40 +00:00
Douglas Yung 30ff436319 Revert "Add workaround to build scoped enums with VS2015. NFCI."
This reverts commit 6080a6fb19 (r356532).

Clang does not accept this syntax, so reverting this until I can find something that works across all compilers.

llvm-svn: 356533
2019-03-20 00:41:12 +00:00
Douglas Yung 6080a6fb19 Add workaround to build scoped enums with VS2015. NFCI.
We need this as we still have internal build bots on VS2015.

llvm-svn: 356532
2019-03-20 00:26:56 +00:00
Craig Topper 97d104cbee [X86] Re-disable cmpxchg16b for 32-bit mode assembly parsing.
This was broken recently when I factored the 64 bit mode check into hasCmpxchg16 without thinking about the AssemblerPredicate.

llvm-svn: 356531
2019-03-19 23:57:16 +00:00
Eli Friedman 2596e8b3e7 [ARM] Make sure to save/restore LR when we use tBfar.
This change does two things. One, it ensures compilation will abort
instead of miscompiling if ARMFrameLowering::determineCalleeSaves
chooses not to save LR in a case where it's necessary.  Two, it changes
the way we estimate the size of a function to be more conservative in
the presence of constant pool entries and jump tables.

EstimateFunctionSizeInBytes probably still isn't really conservative
enough, but I'm not sure how we can come up with a reliable estimate
before constant islands runs.

Differential Revision: https://reviews.llvm.org/D59439

llvm-svn: 356527
2019-03-19 21:48:08 +00:00
Amara Emerson 761ca2e53b [AArch64][GlobalISel] Add an optimization to select vector DUP instructions.
This adds pattern matching for the insert+shufflevector sequence so we can
generate dup instructions instead of the current TBL sequence.

Differential Revision: https://reviews.llvm.org/D59558

llvm-svn: 356526
2019-03-19 21:43:05 +00:00
Amara Emerson 18e2c5724a [AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal.
llvm-svn: 356525
2019-03-19 21:43:02 +00:00
Reid Kleckner e7effeed76 Remove MSVC compat hack since the inline keyword was added in 2015
Our minimum MSVC toolchain requirement is greater than 2015, so we don't
need this conditional macro anymore.  New versions of MSVC apparently
have a header, xkeycheck.h, to check that keywords haven't been
redefined.

Fixes PR41144

llvm-svn: 356524
2019-03-19 21:40:59 +00:00
Francis Visoiu Mistrih 5616718c08 [Remarks] Fix gcc build for r356519
Fails here:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/20046/steps/build%20stage%201/logs/stdio

llvm-svn: 356522
2019-03-19 21:32:03 +00:00
Florian Hahn 91d96b3a6a [DwarfDebug] Add triple to test.
llvm-svn: 356521
2019-03-19 21:18:59 +00:00
Nikita Popov 2dd1566e8b [InstSimplify] Add additional cmp of abs without nsw tests; NFC
llvm-svn: 356520
2019-03-19 21:12:21 +00:00
Francis Visoiu Mistrih 5a05cc0eeb Reland "[Remarks] Add a new Remark / RemarkParser abstraction"
This adds a Remark class that allows us to share code when working with
remarks.

The C API has been updated to reflect this. Instead of the parser
generating C structs, it's now using a C++ object that is used through
opaque pointers in C. This gives us much more flexibility on what
changes we can make to the internal state of the object and interacts
much better with scenarios where the library is used through dlopen.

* C API updates:
  * move from C structs to opaque pointers and functions
  * the remark type is now an enum instead of a string
* unit tests updates:
  * use mostly the C++ API
  * keep one test for the C API
  * rename to YAMLRemarksParsingTest
* a typo was fixed: AnalysisFPCompute -> AnalysisFPCommute.
* a new error message was added: "expected a remark tag."
* llvm-opt-report has been updated to use the C++ parser instead of the
C API

Differential Revision: https://reviews.llvm.org/D59049

Original llvm-svn: 356491

llvm-svn: 356519
2019-03-19 21:11:07 +00:00
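For context, the general shape of the opaque-pointer C API style the commit moves to, sketched with made-up names (this is deliberately not the actual LLVM-C remarks API): C callers only ever see a typedef'd pointer and accessor functions, so the C++ side can change its internal state freely, which also behaves well when the library is used through dlopen.

```
#include <string>

// --- What the C header exposes (names are hypothetical) ---
extern "C" {
typedef struct RemarkOpaque *RemarkRef; // callers only ever see this pointer

RemarkRef remarkCreate(const char *PassName);
const char *remarkGetPassName(RemarkRef R);
void remarkDispose(RemarkRef R);
}

// --- C++ implementation: free to evolve without breaking C users ---
namespace {
struct Remark {
  std::string PassName;
  int Type; // an enum in the real design, rather than a string
};
Remark *unwrap(RemarkRef R) { return reinterpret_cast<Remark *>(R); }
RemarkRef wrap(Remark *R) { return reinterpret_cast<RemarkRef>(R); }
} // namespace

extern "C" RemarkRef remarkCreate(const char *PassName) {
  Remark *R = new Remark;
  R->PassName = PassName;
  R->Type = 0;
  return wrap(R);
}

extern "C" const char *remarkGetPassName(RemarkRef R) {
  return unwrap(R)->PassName.c_str();
}

extern "C" void remarkDispose(RemarkRef R) { delete unwrap(R); }
```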
Robert Lougher c67a759c99 Revert r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines"
Due to buildbot failures (LLD tests).

llvm-svn: 356516
2019-03-19 20:54:20 +00:00
Florian Hahn 1663c9466f [DwarfDebug] Skip entries too big for the 16 bit size field in Dwarf < 5.
Nothing prevents entries from being bigger than the 16 bit size field in
Dwarf < 5. For entries that are too big, just emit an empty entry
instead of crashing.

This fixes PR41038.

Reviewers: probinson, aprantl, davide

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D59518

llvm-svn: 356514
2019-03-19 20:37:06 +00:00
Robert Lougher de548ccab9 [TailCallElim] Add tailcall elimination pass to LTO pipelines
LTO provides additional opportunities for tailcall elimination due to
link-time inlining and visibility of nocapture attribute. Testing showed
negligible impact on compilation times.

Differential Revision: https://reviews.llvm.org/D58391

llvm-svn: 356511
2019-03-19 20:24:28 +00:00
Philip Reames 70537abe52 Demanded elements support for masked.load and masked.gather
Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.

Differential Revision: https://reviews.llvm.org/D57372

llvm-svn: 356510
2019-03-19 20:10:00 +00:00
Matt Arsenault cf55a657f0 CodeGen: Refactor regallocator command line and target selection
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.

llvm-svn: 356506
2019-03-19 19:33:12 +00:00
Matt Arsenault 3c98cdd218 RegAllocFast: Do not allocate registers for undef uses
Do not actually allocate a register for an undef use. Previously we
would create unnecessary reload instructions for undef uses where the
register wasn't live.

Patch by Matthias Braun

llvm-svn: 356501
2019-03-19 19:16:04 +00:00
Matt Arsenault c2e35a6f32 RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs
The 2nd loop calculates spill costs but reports free registers as cost
0 anyway, so there is little benefit from having a separate early
loop.

Surprisingly this is not NFC, as many registers are marked regDisabled
so the first loop often picks up later registers unnecessarily instead
of the first one available in the allocation order...

Patch by Matthias Braun

llvm-svn: 356499
2019-03-19 19:01:34 +00:00
Simon Pilgrim 77482120da Fix for ABS legalization on PPC buildbot.
llvm-svn: 356498
2019-03-19 18:55:46 +00:00
Philip Reames db65a5b776 Allow unordered loads to be considered invariant in CodeGen
The actual code change is fairly straightforward, but exercising it isn't. First, it turned out we weren't adding the appropriate flags in SelectionDAG. Second, it turned out that we've got some optimization gaps, so obvious test cases don't work.

My first attempt (in atomic-unordered.ll) points out a deficiency in our peephole-opt folding logic which I plan to fix separately. Instead, I'm exercising this through MachineLICM.

Differential Revision: https://reviews.llvm.org/D59375

llvm-svn: 356494
2019-03-19 18:27:18 +00:00
Francis Visoiu Mistrih 064774f753 Revert "[Remarks] Add a new Remark / RemarkParser abstraction"
This reverts commit 51dc6a8c84cd6a58562e320e1828a0158dbbf750.

Breaks
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/20034/steps/build%20stage%201/logs/stdio.

llvm-svn: 356492
2019-03-19 18:21:43 +00:00
Francis Visoiu Mistrih 9ef60a2539 [Remarks] Add a new Remark / RemarkParser abstraction
This adds a Remark class that allows us to share code when working with
remarks.

The C API has been updated to reflect this. Instead of the parser
generating C structs, it's now using a C++ object that is used through
opaque pointers in C. This gives us much more flexibility on what
changes we can make to the internal state of the object and interacts
much better with scenarios where the library is used through dlopen.

* C API updates:
  * move from C structs to opaque pointers and functions
  * the remark type is now an enum instead of a string
* unit tests updates:
  * use mostly the C++ API
  * keep one test for the C API
  * rename to YAMLRemarksParsingTest
* a typo was fixed: AnalysisFPCompute -> AnalysisFPCommute.
* a new error message was added: "expected a remark tag."
* llvm-opt-report has been updated to use the C++ parser instead of the
C API

Differential Revision: https://reviews.llvm.org/D59049

llvm-svn: 356491
2019-03-19 18:09:51 +00:00
Nikita Popov 208381953b [ValueTracking] Use computeConstantRange() for unsigned add/sub overflow
Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by
intersecting the computeConstantRange() result into the ConstantRange
created from computeKnownBits(). This allows us to detect some
additional never/always overflows conditions that can't be determined
from known bits.

This revision also adds basic handling for constants to
computeConstantRange(). Non-splat vectors will be handled in a followup.

The signed case will also be handled in a followup, as it needs some
more groundwork.

Differential Revision: https://reviews.llvm.org/D59386

llvm-svn: 356489
2019-03-19 17:53:56 +00:00
Peter Collingbourne e092f806f0 gn build: Merge r356387.
llvm-svn: 356485
2019-03-19 17:30:59 +00:00
Peter Collingbourne 0dd018d944 gn build: Merge r356451.
llvm-svn: 356484
2019-03-19 17:30:50 +00:00
Simon Pilgrim e744f513c4 [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - handle repeated shift amounts
If a value with multiple uses is only ever used for SSE shift amounts then we know that only the bottom 64-bits are needed.

llvm-svn: 356483
2019-03-19 17:23:25 +00:00
Philip Reames 2153c4b828 [AtomicExpand] Fix a crash bug when lowering unordered loads to cmpxchg
Add tests for wider atomic loads and stores.  In the process, fix a crasher where we apparently handled unordered stores, but not loads, when lowering to cmpxchg idioms.

llvm-svn: 356482
2019-03-19 17:20:49 +00:00
Simon Atanasyan db4601e60a [MIPS][microMIPS] Enable dynamic stack realignment
Dynamic stack realignment was disabled on microMIPS by checking whether the
target has standard encoding. We simply change the condition to skip
Mips16 only.

Patch by Mirko Brkusanin.

Differential Revision: http://reviews.llvm.org/D59499

llvm-svn: 356478
2019-03-19 17:01:24 +00:00
Jordan Rupprecht f74d45a775 [NFC] Fix unused variable in release builds
This was introduced in rL356468.

llvm-svn: 356477
2019-03-19 16:52:40 +00:00
Justin Bogner b353d6887e [DAGCombine] Fix a miscompile when reducing BUILD_VECTORs to a shuffle
In r311255 we added a case where we split vectors whose elements are
all derived from the same input vector so that we could shuffle it
more efficiently. In doing so, createBuildVecShuffle was taught to
adjust for the fact that all indices would be based off of the first
vector when this happens, but it's possible for the code that checked
that to fire incorrectly if we happen to have a BUILD_VECTOR of
extracts from subvectors and don't hit this new optimization.

Instead of trying to detect if we've split the vector by checking if
we have extracts from the same base vector, we can just pass that
information into createBuildVecShuffle, avoiding the miscompile.

Differential Revision: https://reviews.llvm.org/D59507

llvm-svn: 356476
2019-03-19 16:52:00 +00:00
Simon Pilgrim 7a8e5051f4 Fix unused variable warning. NFCI.
llvm-svn: 356474
2019-03-19 16:49:59 +00:00
Philip Reames 376c87fcd4 [Tests] Update to newer ISA
There are some issues w/missed opts on older platforms, but that's not the purpose of this test.  Using a newer ISA points out that some TODOs are already handled, and allows addition of tests to exercise other issues (future patch).

llvm-svn: 356473
2019-03-19 16:46:56 +00:00
Sanjay Patel 5b820323ca [InstCombine] fold logic-of-nan-fcmps (PR41069)
Combine 2 fcmps that are checking for nan-ness:
   and (fcmp ord X, 0), (and (fcmp ord Y, 0), Z) --> and (fcmp ord X, Y), Z
   or  (fcmp uno X, 0), (or  (fcmp uno Y, 0), Z) --> or  (fcmp uno X, Y), Z

This is an exact match for a minimal reassociation pattern.
If we want to handle this more generally that should go in
the reassociate pass and allow removing this code.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=41069

llvm-svn: 356471
2019-03-19 16:39:17 +00:00
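The scalar intuition behind the fold, checked in plain C++ (a sketch): fcmp ord X, 0.0 is "X is not NaN" and fcmp uno X, 0.0 is "X is NaN", so combining two such checks is exactly an ordered/unordered comparison of X and Y, which std::isunordered expresses directly.

```
#include <cassert>
#include <cmath>
#include <limits>

int main() {
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  const double Vals[] = {0.0, -1.5, 42.0, NaN};
  for (double X : Vals)
    for (double Y : Vals) {
      // and (fcmp ord X, 0), (fcmp ord Y, 0)  ==  fcmp ord X, Y
      bool OrdPair = !std::isnan(X) && !std::isnan(Y);
      assert(OrdPair == !std::isunordered(X, Y));
      // or (fcmp uno X, 0), (fcmp uno Y, 0)  ==  fcmp uno X, Y
      bool UnoPair = std::isnan(X) || std::isnan(Y);
      assert(UnoPair == static_cast<bool>(std::isunordered(X, Y)));
    }
  return 0;
}
```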
Neil Henning 47c2bd2b34 [AMDGPU] Add convergent attribute to WWM.
Add the convergent attribute to the WWM intrinsic to stop it ever being
sunk out of cfg.

Differential Revision: https://reviews.llvm.org/D59536

llvm-svn: 356470
2019-03-19 16:32:24 +00:00
Simon Pilgrim a56f2822d0 [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect
These changes are related to PR37743 and include:

    SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.

    Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.

    Add promoting the integer ABS node in the LegalizeIntegerType.

    Expand-based legalization of integer result for the ABS nodes.

    Expand-based legalization of ABS vector operations.

    Add some integer abs testcases for different typesizes for Thumb arch

    Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
        tmp = (SRA, Hi, 31)
        Lo = (UADDO tmp, Lo)
        Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
        Lo = (XOR tmp, Lo)

    The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
        (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

    Change integer abs testcases for codegen with the ABS node support for AArch64.
        Indicate that the ABS is legal for the i64 type when the NEON is supported.
        Change the integer abs testcases to show changing of codegen.

    Add combine and legalization of ABS nodes for Thumb arch.

    Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D49837

llvm-svn: 356468
2019-03-19 16:24:55 +00:00
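The i64 expansion quoted above is the usual branchless absolute value, |x| = (x + s) ^ s with s = x >> 63 (arithmetic shift), split across two 32-bit halves. A standalone sanity check of that identity (a sketch, independent of the SelectionDAG code, assuming arithmetic right shift on signed values):

```
#include <cassert>
#include <cstdint>

// Branchless abs of the 64-bit value Hi:Lo, mirroring the expansion:
// tmp = Hi >>s 31; add tmp to both halves with carry; xor both halves with tmp.
uint64_t absFromHalves(uint32_t Hi, uint32_t Lo) {
  uint32_t Tmp = static_cast<uint32_t>(static_cast<int32_t>(Hi) >> 31);
  uint64_t LoSum = static_cast<uint64_t>(Lo) + Tmp;     // UADDO
  uint32_t Carry = static_cast<uint32_t>(LoSum >> 32);
  uint32_t NewLo = static_cast<uint32_t>(LoSum) ^ Tmp;  // XOR tmp, Lo
  uint32_t NewHi = (Hi + Tmp + Carry) ^ Tmp;            // XOR tmp, ADDCARRY
  return (static_cast<uint64_t>(NewHi) << 32) | NewLo;
}

int main() {
  const int64_t Tests[] = {0, 1, -1, 123456789, -123456789, INT64_MIN + 1};
  for (int64_t X : Tests) {
    uint64_t U = static_cast<uint64_t>(X);
    uint64_t Abs = absFromHalves(static_cast<uint32_t>(U >> 32),
                                 static_cast<uint32_t>(U));
    assert(Abs == static_cast<uint64_t>(X < 0 ? -U : U));
  }
  return 0;
}
```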
Jordan Rupprecht 4a6b9f2316 [llvm-ar] Support N [count] modifier
Summary:
GNU ar supports the 'N' count modifier for the extract (x) and delete (d) operations. When an archive contains multiple members with the same name, this can be used to extract (or delete) them individually. For example:

```
$ llvm-ar t archive.a
foo
foo
$ llvm-ar x archive.a
-> Writes foo twice, overwriting it the second time :( :(
$ llvm-ar xN 1 archive.a foo && mv foo foo.1
$ llvm-ar xN 2 archive.a foo && mv foo foo.2
-> Write foo twice, renaming it in between invocations to preserve all versions
```

Reviewers: ruiu, MaskRay

Reviewed By: ruiu, MaskRay

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59503

llvm-svn: 356466
2019-03-19 16:09:54 +00:00
Ryan Taylor 00e063ab92 [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics
Summary:
Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer

Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59265

llvm-svn: 356465
2019-03-19 16:07:00 +00:00
Neil Henning e85f6bd64f [AMDGPU] Ban i8 min3 promotion.
I found this really weird WWM-related case whereby through the WWM
transformations our isel lowering was trying to promote 2 min's into a
min3 for the i8 type, which our hardware doesn't support.

The new min3_i8.ll test case would previously spew the error:

PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68

Before the simple fix to our isel lowering to not do it for i8 MVT's.

Differential Revision: https://reviews.llvm.org/D59543

llvm-svn: 356464
2019-03-19 15:50:24 +00:00
Teresa Johnson bda581b831 [InstCombine] Add missing test for icmp transformation (NFC)
This was split out of D59378. There was no testing for the EQ case in
foldICmpWithDominatingICmp, add one here.

llvm-svn: 356463
2019-03-19 15:43:56 +00:00
Simon Atanasyan af40d4371d [mips] Fix crash on recursive using of .set
Switching to `MCParserUtils::parseAssignmentExpression` for parsing
assignment expressions in the `.set` directive reduces code and allows
printing an error message instead of crashing in case of incorrect
recursive use of `.set`.

Fix for the bug https://bugs.llvm.org/show_bug.cgi?id=41053.

Differential Revision: http://reviews.llvm.org/D59452

llvm-svn: 356461
2019-03-19 15:15:35 +00:00
Markus Lavin 00160e226f [DebugInfo] Move test files added in r356451
Moved the X86-dependent .ll tests added in r356451 from
test/DebugInfo/Generic to test/DebugInfo/X86.

llvm-svn: 356460
2019-03-19 15:15:28 +00:00
Simon Pilgrim 8ee477a2ab [InstSimplify] SimplifyICmpInst - icmp eq/ne %X, undef -> undef
As discussed on PR41125 and D59363, we have a mismatch between icmp eq/ne cases with an undef operand:

When the other operand is constant we fold to undef (handled in ConstantFoldCompareInstruction)
When the other operand is non-constant we fold to a bool constant based on isTrueWhenEqual (handled in SimplifyICmpInst).

Neither is really wrong, but this patch changes the logic in SimplifyICmpInst to consistently fold to undef.

The NewGVN test change is annoying (as with most heavily reduced tests) but AFAICT I have kept the purpose of the test based on rL291968.

Differential Revision: https://reviews.llvm.org/D59541

llvm-svn: 356456
2019-03-19 14:08:23 +00:00
Petar Jovanovic 38a6187396 [DebugInfoMetadata] Move main subprogram DIFlag into DISPFlags
Moving subprogram specific flags into DISPFlags makes IR code more readable.
In addition, we provide free space in DIFlags for other
'non-subprogram-specific' debug info flags.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D59288

llvm-svn: 356454
2019-03-19 13:49:03 +00:00
Sanjay Patel 423b958306 [InstCombine] add FMF to tests for extra coverage; NFC
ninf is probably the only relevant possible flag here
(nnan allows simplification and nsz never makes a difference).

llvm-svn: 356453
2019-03-19 13:39:29 +00:00
Markus Lavin b86ce219f4 [DebugInfo] Introduce DW_OP_LLVM_convert
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.

The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.

For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.

This is a recommit of r356442 with trivial fixes for the failing tests.

Differential Revision: https://reviews.llvm.org/D56587

llvm-svn: 356451
2019-03-19 13:16:28 +00:00
Simon Pilgrim 9497b2b2f7 [InstCombine] Regenerate + add icmp with undef tests
Better test coverage for PR41125 and D59363

llvm-svn: 356448
2019-03-19 11:44:22 +00:00
Markus Lavin ad78768d59 Revert "[DebugInfo] Introduce DW_OP_LLVM_convert"
This reverts commit 1cf4b593a7ebd666fc6775f3bd38196e8e65fafe.

Build bots found failing tests not detected locally.

Failing Tests (3):
  LLVM :: DebugInfo/Generic/convert-debugloc.ll
  LLVM :: DebugInfo/Generic/convert-inlined.ll
  LLVM :: DebugInfo/Generic/convert-linked.ll

llvm-svn: 356444
2019-03-19 09:17:28 +00:00
Serge Guelton d2f2f33ef2 Use response file when generating LLVM-C.dll
As discovered in D56774 the command line gets to long, so use a response file
to give the script the libs. This change has been tested and is confirmed
working for me.

Committed on behalf of Jakob Bornecrantz.
Differential Revision: https://reviews.llvm.org/D56781

llvm-svn: 356443
2019-03-19 09:14:09 +00:00
Markus Lavin cd8a940b37 [DebugInfo] Introduce DW_OP_LLVM_convert
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.

The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.

For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.

Differential Revision: https://reviews.llvm.org/D56587

llvm-svn: 356442
2019-03-19 08:48:19 +00:00
Heejin Ahn c60bc94afc [WebAssembly] Small improvements in FixIrreducibleControlFlow (NFC)
Summary:
- Make some class member methods const
- Delete unnecessary includes
- Use a simpler form of `BuildMI`

Reviewers: kripken

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59454

llvm-svn: 356440
2019-03-19 05:26:33 +00:00
Heejin Ahn 1045b41510 [WebAssembly] Improve readability of irreducibility tests
Summary:
This adds `preds` comment lines to BB names for readability, while also
fixing some existing incorrect comment lines. It also deletes a few
unnecessary attributes. Autogenerated by `opt`.

Reviewers: kripken

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59456

llvm-svn: 356439
2019-03-19 05:10:39 +00:00
Heejin Ahn 34dc1f2483 [WebAssembly] Rename methods according to instruction name changes (NFC)
Reviewers: tlively, sbc100

Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59469

llvm-svn: 356438
2019-03-19 05:07:33 +00:00
Heejin Ahn 9203d21838 [WebAssembly] Add immarg attribute to intrinsics
Summary:
After r355981, intrinsic arguments that are immediate values should be
marked as `ImmArg`.

Reviewers: dschuff, tlively

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59447

llvm-svn: 356437
2019-03-19 05:02:30 +00:00
Thomas Lively 0200d62ec7 [WebAssembly] Lower SIMD nnan setcc nodes
Summary:
Adds patterns to lower all the remaining setcc modes: lt, gt,
le, and ge. Fixes PR40912.

Reviewers: aheejin, sbc100, dschuff

Reviewed By: dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, jdoerfert, llvm-commits, srj

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59519

llvm-svn: 356431
2019-03-19 00:55:34 +00:00
Nikita Popov 3e9770d2dc Revert "[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()"
This reverts commit 106f0cdefb.

This change impacts the AMDGPU smed3.ll and umed3.ll codegen tests.

llvm-svn: 356424
2019-03-18 22:26:27 +00:00
Kostya Serebryany 9aac4c1be3 [libFuzzer] document -len_control
llvm-svn: 356422
2019-03-18 22:20:47 +00:00
Craig Topper 1dd518da7d [X86] Add coverage for 16-bit and 64-bit versions of bsf/bsr/bt/btc/btr/bts in the assembly tests that are supposed to provide full coverage. Add coverage for cwtl/cltq/cwtd/cqto as well.
llvm-svn: 356420
2019-03-18 22:06:19 +00:00
Craig Topper b24bdf626a [X86] Disable CQTO and CLTQ instructions in the assembly parser outside 64-bit mode.
llvm-svn: 356419
2019-03-18 22:06:14 +00:00
Nikita Popov 106f0cdefb [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This was suggested by spatel as an alternative
approach to D59378. I've also added the infinite looping test from
that revision here.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 356415
2019-03-18 21:35:19 +00:00
Nikita Popov 930341ba30 [InstCombine] Add tests for add nuw + uaddo; NFC
Baseline tests for D59471 (InstCombine of `add nuw` and `uaddo` with
constants).

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59472

llvm-svn: 356414
2019-03-18 21:35:09 +00:00
Craig Topper e732bc6bea [X86] Allow any 8-bit immediate to be used with BT/BTC/BTR/BTS not just sign extended 8-bit immediates.
We need to allow [128,255] in addition to [-128, 127] to match gas.

llvm-svn: 356413
2019-03-18 21:33:59 +00:00
Amara Emerson a140276a1e [GlobalISel] Include missing change from r356396
Forgot to add a change to relax some asserts in r356396.

llvm-svn: 356411
2019-03-18 21:29:21 +00:00
Sam Clegg b7708ec87f [WebAssembly] Don't override default implementation of isOffsetFoldingLegal. NFC.
The default implementation does what we want and is going to be more compatible
with the dynamic linking (-fPIC) support that is planned.

This is NFC because currently we only build wasm with
`-relocation-model=static` which in turn means that the default
`isOffsetFoldingLegal` always returns true today.

Differential Revision: https://reviews.llvm.org/D54661

llvm-svn: 356410
2019-03-18 21:21:12 +00:00
Nikita Popov f89343bc47 [ValueTracking][InstSimplify] Move abs handling into computeConstantRange(); NFC
This is preparation for D59506. The InstructionSimplify abs handling
is moved into computeConstantRange(), which is the general place for
such calculations. This is NFC and doesn't affect the existing tests
in test/Transforms/InstSimplify/icmp-abs-nabs.ll.

Differential Revision: https://reviews.llvm.org/D59511

llvm-svn: 356409
2019-03-18 21:20:03 +00:00
Nikita Popov 05baa9ee1a [InstSimplify] Add additional icmp of min/max tests; NFC
These are baseline tests for D59506.

llvm-svn: 356408
2019-03-18 21:19:56 +00:00
Craig Topper f086e562f9 [X86] Use relocImm in the ROL8ri/ROL16ri/ROL32ri/ROL64ri patterns to be consistent with the ROR patterns.
llvm-svn: 356407
2019-03-18 20:43:15 +00:00
Craig Topper 0b9c640fe0 [X86] Replace uses of i64immSExt32_su with i64relocImmSExt32_su.
For the i8, i16, and i32 instructions we were using a relocImm. Presumably we should for i64 as well.

llvm-svn: 356406
2019-03-18 20:43:09 +00:00
Michael Liao efb4f9e568 [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59501

llvm-svn: 356405
2019-03-18 20:40:09 +00:00
Jake Ehrlich 5049c3422d [llvm-objcopy] Make .build-id linking atomic
This change makes linking into .build-id atomic and safe to use.
Some users under particular workflows are reporting that this races
more than half the time under particular conditions.

llvm-svn: 356404
2019-03-18 20:35:18 +00:00
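A standard pattern for making file creation in a shared directory race-free, in the spirit of the change (a sketch using POSIX calls; this is not the actual llvm-objcopy code): create the content under a unique temporary name, then atomically rename it into place, so concurrent readers only ever see either the old entry or a complete new one.

```
#include <cstdio>     // std::rename, std::remove
#include <string>
#include <unistd.h>   // POSIX: mkstemp, write, close -- sketch assumes a POSIX host

// Atomically publish Contents at Path: readers that race with this only ever
// see the previous file or the complete new one, never a partial write.
bool writeFileAtomically(const std::string &Path, const std::string &Contents) {
  std::string Tmp = Path + ".tmpXXXXXX"; // unique name in the same directory
  int FD = mkstemp(&Tmp[0]);             // fills in the XXXXXX in place
  if (FD < 0)
    return false;
  ssize_t Written = write(FD, Contents.data(), Contents.size());
  bool OK = (close(FD) == 0) &&
            Written == static_cast<ssize_t>(Contents.size());
  // rename(2) over an existing path on the same filesystem is atomic.
  if (OK && std::rename(Tmp.c_str(), Path.c_str()) == 0)
    return true;
  std::remove(Tmp.c_str());
  return false;
}
```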
Nikita Popov c1d4fc8a62 [InstCombine] Improve with.overflow intrinsic tests; NFC
- Do not use unnamed values in saddo tests
- Add tests for canonicalization of a constant arg0

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59476

llvm-svn: 356403
2019-03-18 20:08:35 +00:00
Sam Clegg 1d716acf76 Restore comment regarding why Reloc::PIC_ can't be PIC
The original change back in rL29307 explained this but it was
lost somewhere along the way.

Differential Revision: https://reviews.llvm.org/D59445

llvm-svn: 356402
2019-03-18 20:04:34 +00:00
Alexandre Ganea 9af9f500d1 Fix flat-error-unsupported-gpu-hsa test
Differential Revision: https://reviews.llvm.org/D59505

llvm-svn: 356400
2019-03-18 19:38:04 +00:00
Tim Renouf cfdfba996b [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.

This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.

Differential Revision: https://reviews.llvm.org/D59267

Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
llvm-svn: 356399
2019-03-18 19:35:44 +00:00
Tim Renouf 2e94f6e584 [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.

This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.

To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.

Differential Revision: https://reviews.llvm.org/D59191

Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398
2019-03-18 19:25:39 +00:00
Amara Emerson 8627178d46 Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy()
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().

For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.

This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.

llvm-svn: 356396
2019-03-18 19:20:10 +00:00
Alexandre Ganea 4aeea4cc42 [DebugInfo][PDB] Don't write empty debug streams
Before, empty debug streams were written as 8 bytes (4 bytes signature + 4 bytes for the GlobalRefs count).

With this patch, unused empty streams aren't emitted anymore. Modules now encode 65535 as an 'unused stream' value, by convention.
Also fix the * Linker * contrib section which wasn't correctly emitted previously.

Differential Revision: https://reviews.llvm.org/D59502

llvm-svn: 356395
2019-03-18 19:13:23 +00:00
Tim Renouf 8723a56551 [MsgPack][AMDGPU] Fix unflushed raw_string_ostream bugs on windows expensive checks bot
This fixes a couple of unflushed raw_string_ostream bugs in recent
commits that only show up on a bot building on windows with expensive
checks.

Differential Revision: https://reviews.llvm.org/D59396

Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf
llvm-svn: 356394
2019-03-18 19:00:46 +00:00
Craig Topper f07062a798 [X86] Rename imm8_su/imm16_su/imm32_su to relocImm8_su/relocImm16_su/relocImm32_su/ to accurately reflect what they are.
llvm-svn: 356393
2019-03-18 18:54:06 +00:00
Warren Ristow ad7d0ded2e [SCEV] Guard movement of insertion point for loop-invariants
This reinstates r347934, along with a tweak to address a problem with
PHI node ordering that that commit created (or exposed). (That commit
was reverted at r348426, due to the PHI node issue.)

Original commit message:

r320789 suppressed moving the insertion point of SCEV expressions with
div/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D57428

llvm-svn: 356392
2019-03-18 18:52:35 +00:00
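The hazard being guarded against, shown in plain C++ (a sketch): the division below is loop-invariant, but it is only ever executed under a non-zero guard, so expanding it at the loop header would introduce a divide-by-zero.

```
#include <cstddef>

// n / d is loop-invariant, but it is only reachable when d != 0. Hoisting
// the division to the loop header (above the guard) would introduce a
// division by zero whenever d == 0, which is why the insertion point of
// such SCEV expansions must not be moved past the guard.
long sumOfGuardedQuotients(const long *a, std::size_t len, long n, long d) {
  long sum = 0;
  for (std::size_t i = 0; i < len; ++i) {
    if (d != 0)            // the guard
      sum += a[i] + n / d; // loop-invariant division, but guarded
  }
  return sum;
}
```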
Adhemerval Zanella 270249de2b [AArch64] Small fix for getIntImmCost
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instructions used in immediate materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58461

llvm-svn: 356391
2019-03-18 18:50:58 +00:00
Adhemerval Zanella a3cefa5d64 [AArch64] Optimize floating point materialization
This patch follows some ideas from r352866 to optimize the floating
point materialization even further. It changes isFPImmLegal to
consider up to 2 mov instructions, or up to 5 in case the subtarget has
fused literals.

The rationale is that the cost is the same for mov+fmov vs. adrp+ldr, but
the mov+fmov sequence is always better because of the reduced d-cache
pressure. The timings are still the same if you consider that movw+movk+fmov
vs. adrp+ldr will be fused (although one instruction longer).

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58460

llvm-svn: 356390
2019-03-18 18:45:57 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Adhemerval Zanella 8a595b1d2e [AArch64] Refactor floating point materialization. NFC
It splits the logic of actual instruction emission away from the logic
that figures out the appropriate sequence in AArch64ExpandPseudo::expandMOVImm.
The new function AArch64_IMM::expandMOVImm, which returns the list of
instructions to materialize the immediate constant, is implemented in a
separate unit because it will be used in a subsequent patch to optimize
floating point materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58915

llvm-svn: 356387
2019-03-18 18:23:23 +00:00
Craig Topper c2b35ebc1d [X86] Remove the _alt forms of (V)CMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Similar to previous change done for VPCOM and VPCMP

Differential Revision: https://reviews.llvm.org/D59468

llvm-svn: 356384
2019-03-18 17:59:59 +00:00
Sanjay Patel 08b5e68ef6 [InstCombine] add/adjust test for NaN checks; NFC
llvm-svn: 356383
2019-03-18 17:37:05 +00:00
Nirav Dave 55c921f4bf [DAG] Cleanup unused node in SimplifySelectCC.
Delete temporarily constructed node uses for analysis after its use,
holding onto original input nodes. Ideally this would be rewritten
without making nodes, but this appears relatively complex.

Reviewers: spatel, RKSimon, craig.topper

Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57921

llvm-svn: 356382
2019-03-18 17:02:38 +00:00
Michael Liao c131e0e2ae [MVT] Fix typos in comment. NFC.
llvm-svn: 356381
2019-03-18 16:57:40 +00:00
Neil Henning 523dab0788 [AMDGPU] Add an experimental buffer fat pointer address space.
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer representing the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D58957

llvm-svn: 356373
2019-03-18 14:44:28 +00:00
Sanjay Patel 6063393536 [InstCombine] allow general vector constants for funnel shift to shift transforms
Follow-up to:
rL356338
rL356369

We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.

llvm-svn: 356372
2019-03-18 14:27:51 +00:00
George Rimar faf308b11a [llvm-objcopy] - Calculate the string table section sizes correctly.
This fixes the https://bugs.llvm.org/show_bug.cgi?id=40980.

Previously if string optimization occurred as a result of
StringTableBuilder's finalize() method, the size wasn't updated.

This hopefully also makes the interaction between sections during the
finalization process a bit clearer.

Differential revision: https://reviews.llvm.org/D59488

llvm-svn: 356371
2019-03-18 14:27:41 +00:00
Sanjay Patel 84de8a30a0 [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Follow-up to:
rL356338

Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.

llvm-svn: 356369
2019-03-18 14:10:11 +00:00
Simon Pilgrim f9ab4f5f4e [SystemZ] Remove icmp undef from reduced tests
Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @uweigand (Ulrich Weigand)

llvm-svn: 356368
2019-03-18 13:55:28 +00:00
Sanjay Patel d7f1539322 [InstCombine] add funnel shift tests with arbitrary constants; NFC
llvm-svn: 356367
2019-03-18 13:35:51 +00:00
Roman Lebedev 23629385f1 [llvm-exegesis] Separate tool options into three categories.
Results in much nicer -help output:
```
$ ./bin/llvm-exegesis -help
USAGE: llvm-exegesis [options]

OPTIONS:

Color Options:

  -color                                         - Use colors in output (default=autodetect)

General options:

  -enable-cse-in-irtranslator                    - Should enable CSE in irtranslator
  -enable-cse-in-legalizer                       - Should enable CSE in Legalizer

Generic Options:

  -help                                          - Display available options (-help-hidden for more)
  -help-list                                     - Display list of available options (-help-list-hidden for more)
  -version                                       - Display the version of this program

llvm-exegesis analysis options:

  -analysis-clustering-epsilon=<number>          - dbscan epsilon for benchmark point clustering
  -analysis-clusters-output-file=<string>        -
  -analysis-display-unstable-clusters            - if there is more than one benchmark for an opcode, said benchmarks may end up not being clustered into the same cluster if the measured performance characteristics are different. by default all such opcodes are filtered out. this flag will instead show only such unstable opcodes
  -analysis-inconsistencies-output-file=<string> -
  -analysis-inconsistency-epsilon=<number>       - epsilon for detection of when the cluster is different from the LLVM schedule profile values
  -analysis-numpoints=<uint>                     - minimum number of points in an analysis cluster

llvm-exegesis benchmark options:

  -ignore-invalid-sched-class                    - ignore instructions that do not define a sched class
  -mode=<value>                                  - the mode to run
    =latency                                     -   Instruction Latency
    =inverse_throughput                          -   Instruction Inverse Throughput
    =uops                                        -   Uop Decomposition
    =analysis                                    -   Analysis
  -num-repetitions=<uint>                        - number of time to repeat the asm snippet
  -opcode-index=<int>                            - opcode to measure, by index
  -opcode-name=<string>                          - comma-separated list of opcodes to measure, by name
  -snippets-file=<string>                        - code snippets to measure

llvm-exegesis options:

  -benchmarks-file=<string>                      - File to read (analysis mode) or write (latency/uops/inverse_throughput modes) benchmark results. “-” uses stdin/stdout.
  -mcpu=<string>                                 - cpu name to use for pfm counters, leave empty to autodetect
```
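
The grouping above is typically achieved with LLVM's cl::OptionCategory support. A minimal sketch of the mechanism (the category objects, the chosen option, and the default value here are illustrative, not the tool's actual declarations):
```
#include "llvm/Support/CommandLine.h"

namespace {
// Hypothetical category objects mirroring the -help sections shown above.
llvm::cl::OptionCategory BenchmarkOptions("llvm-exegesis benchmark options");
llvm::cl::OptionCategory AnalysisOptions("llvm-exegesis analysis options");

// Tagging an option with cl::cat() places it under that heading in -help.
llvm::cl::opt<unsigned> NumRepetitions(
    "num-repetitions",
    llvm::cl::desc("number of times to repeat the asm snippet"),
    llvm::cl::cat(BenchmarkOptions),
    llvm::cl::init(10000) /* illustrative default */);
} // namespace
```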

llvm-svn: 356364
2019-03-18 11:32:37 +00:00
David Stenberg 8a2e4af7e7 [DebugInfo] Ignore bitcasts when lowering stack arg dbg.values
Summary:
Look past bitcasts when looking for parameter debug values that are
described by frame-index loads in `EmitFuncArgumentDbgValue()`.
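
A minimal sketch of the "look past bitcasts" step (a hypothetical helper for illustration, not the actual EmitFuncArgumentDbgValue() code):
```
#include "llvm/IR/Instructions.h"

// Peel any chain of bitcasts before checking whether the value is a
// frame-index load that describes a parameter (sketch only).
static const llvm::Value *lookPastBitCasts(const llvm::Value *V) {
  while (const auto *BC = llvm::dyn_cast<llvm::BitCastInst>(V))
    V = BC->getOperand(0);
  return V;
}
```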

In the attached test case we would be left with an undef `DBG_VALUE`
for the parameter without this patch.

A similar fix was done for parameters passed in registers in D13005.

This fixes PR40777.

Reviewers: aprantl, vsk, jmorse

Reviewed By: aprantl

Subscribers: bjope, javed.absar, jdoerfert, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D58831

llvm-svn: 356363
2019-03-18 11:27:32 +00:00
Christof Douma 8cfd91dcc7 [AArch64] Fix bug 35094 atomicrmw on Armv8.1-A+lse
Fixes https://bugs.llvm.org/show_bug.cgi?id=35094

The Dead register definition pass should leave alone the atomicrmw
instructions on AArch64 (LTE extension). The reason is the following
statement in the Arm ARM:

"The ST<OP> instructions, and LD<OP> instructions where the destination
register is WZR or XZR, are not regarded as doing a read for the purpose
of a DMB LD barrier."

A good example was given in the gcc thread by Will Deacon (linked in the
bugzilla ticket 35094):

    P0 (atomic_int* y,atomic_int* x) {
      atomic_store_explicit(x,1,memory_order_relaxed);
      atomic_thread_fence(memory_order_release);
      atomic_store_explicit(y,1,memory_order_relaxed);
    }

    P1 (atomic_int* y,atomic_int* x) {
      atomic_fetch_add_explicit(y,1,memory_order_relaxed);  // STADD
      atomic_thread_fence(memory_order_acquire);
      int r0 = atomic_load_explicit(x,memory_order_relaxed);
    }

    P2 (atomic_int* y) {
      int r1 = atomic_load_explicit(y,memory_order_relaxed);
    }

    My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
    this test has executed. However, if the relaxed add in P1 compiles to
    STADD and the subsequent acquire fence is compiled as DMB LD, then we
    don't have any ordering guarantees in P1 and the forbidden result could
    be observed.

Change-Id: I419f9f9df947716932038e1100c18d10a96408d0
llvm-svn: 356360
2019-03-18 09:21:06 +00:00
Craig Topper ba898da132 [X86] Hopefully fix a tautological compare warning in printVecCompareInstr.
llvm-svn: 356359
2019-03-18 07:05:01 +00:00
Alex Bradbury 60444ad16f [RISCV] Add ImmArg to intrinsics
llvm-svn: 356358
2019-03-18 06:01:27 +00:00
Craig Topper d94db9364d [X86] Add ADD8ri_DB and ADD8rr_DB to the autogenerated load folding table.
These were added in r355423.

We only use the autogenerated table to assist with the maintenance of the
manual table. These entries are already in the manual table.

llvm-svn: 356357
2019-03-18 05:48:19 +00:00
Craig Topper b4c49255aa [X86] Make ADD*_DB post-RA pseudos and expand them in expandPostRAPseudo.
These are used to help convert OR->LEA when needed to avoid avoid a copy. They
aren't need after register allocation.

Happens to remove an ugly goto from X86MCCodeEmitter.cpp

llvm-svn: 356356
2019-03-18 05:48:18 +00:00
Craig Topper 860a27208e [X86] Add tab character to the custom printing of VPCMP and VPCOM instructions.
All the other instructions are printed with a preceding tab.

llvm-svn: 356355
2019-03-18 02:53:11 +00:00
Matt Arsenault 4873056ced Remove immarg from llvm.expect
The LangRef claimed this was required to be a constant, but this
appears to be wrong.

Fixes bug 41079.

llvm-svn: 356353
2019-03-17 23:16:18 +00:00
Craig Topper 04cc28fe13 [X86] Merge printf32mem/printi32mem into a single printdwordmem. Do the same for all other printing functions.
The only thing the print methods currently need to know is the string to print for the memory size in intel syntax.

This patch merges the functions based on this string. If we ever need something else in the future, it's easy to split them back out.

This reduces the number of cases in the assembly printers. It shrinks the intel printer to only use 7 bytes per instruction instead of 8.
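
As a rough model of the merge (plain C++ for illustration only, not the generated X86 printer code), the per-width print methods collapse into one helper parameterized by the Intel-syntax size keyword:
```
#include <ostream>
#include <string>

// Sketch: one print helper instead of printf32mem/printi32mem/etc.; the
// caller supplies the Intel-syntax size keyword ("dword", "qword", ...).
void printMemWithSize(std::ostream &OS, const std::string &SizeKeyword,
                      const std::string &MemRef) {
  OS << SizeKeyword << " ptr " << MemRef; // e.g. "dword ptr [rax]"
}
```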

llvm-svn: 356352
2019-03-17 22:57:21 +00:00
Tim Renouf c4e128e221 [CodeGen] Defined MVTs v3i32, v3f32, v5i32, v5f32
AMDGPU would like to use these MVTs.

Differential Revision: https://reviews.llvm.org/D58901

Change-Id: I6125fea810d7cc62a4b4de3d9904255a1233ae4e
llvm-svn: 356351
2019-03-17 22:56:38 +00:00
Tim Renouf c302b9b5fe [CodeGen] Prepare for introduction of v3 and v5 MVTs
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:

* Exclude non-legal non-power-of-2 vector types from ComputeRegisterProp
  mechanism in TargetLoweringBase::getTypeConversion.

* Cope with SETCC and VSELECT for odd-width i1 vector when the other
  vectors are legal type.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58899

Change-Id: Ib5f23377dbef511be3a936211a0b9f94e46331f8
llvm-svn: 356350
2019-03-17 21:43:12 +00:00
David Green baa94ef03b [ARM] Check that CPSR does not have other uses
Fix up rL356335 by checking that CPSR is not read between
the compare and the branch.

llvm-svn: 356349
2019-03-17 21:36:15 +00:00
Matt Arsenault 884a18d792 RegAllocFast: Add hint to debug printing
llvm-svn: 356348
2019-03-17 21:31:40 +00:00
Matt Arsenault e0c1f9e76d AMDGPU: Partially fix default device for HSA
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.

Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).

Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.

llvm-svn: 356347
2019-03-17 21:31:35 +00:00
Nikita Popov 5e7b62de05 [ConstantRange] Add assertion for KnownBits validity; NFC
Following the suggestion in D59475.

llvm-svn: 356346
2019-03-17 21:25:32 +00:00
Nikita Popov 322e2dbee1 [ValueTracking] Use ConstantRange overflow check for signed add; NFC
This is the same change as rL356290, but for signed add. It replaces
the existing ripple logic with the overflow logic in ConstantRange.

This is NFC in that it should return NeverOverflow in exactly the
same cases as the previous implementation. However, it does make
computeOverflowForSignedAdd() more powerful by now also determining
AlwaysOverflows conditions. As none of its consumers handle this yet,
this has no impact on optimization. Making use of AlwaysOverflows
in with.overflow folding will be handled as a followup.
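
The range check itself can be sketched in plain C++ (a conceptual illustration, not the ConstantRange implementation, which also has to handle wrapped ranges): compute the exact sum range in a wider type and compare it against the representable range.
```
#include <cstdint>

enum class Overflow { Never, May, Always };

// Conceptual sketch: given the signed 32-bit ranges [AMin, AMax] and
// [BMin, BMax], decide whether A + B can overflow by doing the math in 64 bits.
Overflow signedAddOverflow(int32_t AMin, int32_t AMax,
                           int32_t BMin, int32_t BMax) {
  const int64_t Lo = int64_t(AMin) + BMin; // smallest possible sum
  const int64_t Hi = int64_t(AMax) + BMax; // largest possible sum
  if (Lo >= INT32_MIN && Hi <= INT32_MAX)
    return Overflow::Never;  // every sum fits in i32
  if (Hi < INT32_MIN || Lo > INT32_MAX)
    return Overflow::Always; // no sum fits in i32
  return Overflow::May;
}
```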

Differential Revision: https://reviews.llvm.org/D59450

llvm-svn: 356345
2019-03-17 21:25:26 +00:00
Craig Topper affead9ad0 [X86] Remove the _alt forms of AVX512 VPCMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Similar to the previous patch for VPCOM.

Differential Revision: https://reviews.llvm.org/D59398

llvm-svn: 356344
2019-03-17 21:21:40 +00:00
Craig Topper 12509d87f3 [X86] Remove the _alt forms of XOP VPCOM instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Previously we had a regular form of the instruction used when the immediate was 0-7, and an _alt form that allowed the full 8-bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly, the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.

The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b".

The _alt form on the other hand parsed and printed like any other instruction with no specialness.

With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.

I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printers. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized, nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd, which have 32 immediate values, 3 encoding flavors, 3 register sizes, etc., this didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.

I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.

Differential Revision: https://reviews.llvm.org/D59398

llvm-svn: 356343
2019-03-17 21:21:37 +00:00
Tim Renouf e30aa6a136 [AMDGPU] Prepare for introduction of v3 and v5 MVTs
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:

* Fixed assumptions of power-of-2 vector type in kernel arg handling,
  and added v5 kernel arg tests and v3/v5 shader arg tests.

* Added v5 tests for cost analysis.

* Added vec3/vec5 arg test cases.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58928

Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
2019-03-17 21:04:16 +00:00
Tim Renouf d1477e989c [ARM] Fixed an assumption of power-of-2 vector MVT
I am about to introduce some non-power-of-2 width vector MVTs. This
commit fixes a power-of-2 assumption that my forthcoming change would
otherwise break, as shown by test/CodeGen/ARM/vcvt_combine.ll and
vdiv_combine.ll.

Differential Revision: https://reviews.llvm.org/D58927

Change-Id: I56a282e365d3874ab0621e5bdef98a612f702317
llvm-svn: 356341
2019-03-17 20:48:54 +00:00
Simon Pilgrim 10ba65cc48 [AMDGPU] Regenerate some f16/i16 tests.
Prep work for D51589

llvm-svn: 356340
2019-03-17 20:36:12 +00:00
Nikita Popov ef2d979943 [ConstantRange] Add fromKnownBits() method
Following the suggestion in D59450, I'm moving the code for constructing
a ConstantRange from KnownBits out of ValueTracking, which also allows us
to test this code independently.

I'm adding this method to ConstantRange rather than KnownBits (which
would have been a bit nicer API wise) to avoid creating a dependency
from Support to IR, where ConstantRange lives.
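
For the unsigned interpretation the construction is simple enough to sketch directly (conceptual only; the real method also has to handle the signed case):
```
#include <cstdint>
#include <utility>

// Conceptual sketch: with known-one mask One and known-zero mask Zero, every
// possible value has all One bits set and all Zero bits clear, so the
// unsigned range is [One, ~Zero] (inclusive).
std::pair<uint64_t, uint64_t> unsignedRangeFromKnownBits(uint64_t Zero,
                                                         uint64_t One) {
  return {One, ~Zero};
}
```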

Differential Revision: https://reviews.llvm.org/D59475

llvm-svn: 356339
2019-03-17 20:24:02 +00:00
Sanjay Patel b3bcd95771 [InstCombine] canonicalize rotate right by constant to rotate left
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.
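
The canonicalization relies on a simple identity: rotating right by a constant C equals rotating left by BitWidth - C (modulo the bit width), so a constant rotate-right can be rewritten as a rotate-left with the complementary amount. A small self-contained check of the identity (illustrative only):
```
#include <cassert>
#include <cstdint>

// rotr(x, C) == rotl(x, (8 - C) % 8) for 8-bit values.
uint8_t rotl8(uint8_t X, unsigned C) {
  C %= 8;
  return uint8_t((X << C) | (X >> ((8 - C) % 8)));
}
uint8_t rotr8(uint8_t X, unsigned C) {
  C %= 8;
  return uint8_t((X >> C) | (X << ((8 - C) % 8)));
}

int main() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C = 0; C < 8; ++C)
      assert(rotr8(uint8_t(X), C) == rotl8(uint8_t(X), (8 - C) % 8));
  return 0;
}
```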

llvm-svn: 356338
2019-03-17 19:08:00 +00:00
Sanjay Patel a3a2f9424e [InstCombine] add tests for rotate by constant using funnel intrinsics; NFC
llvm-svn: 356337
2019-03-17 18:50:39 +00:00
David Green e0b48a8015 [ARM] Search backwards for CMP when combining into CBZ
The constant island pass currently only looks at the instruction immediately
before a branch for a CMP to fold into a CBZ/CBNZ. This extends it to search
backwards for the instruction that defines CPSR. We need to ensure that the
register is not overridden between the CMP and the branch.

Differential Revision: https://reviews.llvm.org/D59317

llvm-svn: 356336
2019-03-17 16:11:22 +00:00
David Green 30673299d4 [ARM] Add some CBZ constant island tests. NFC
llvm-svn: 356335
2019-03-17 16:00:21 +00:00
Nikita Popov 9a4453592b [DAGCombine] Fold (x & ~y) | y patterns
Fold (x & ~y) | y and its four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.
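
The identity can be checked exhaustively for a small bit width (illustrative only):
```
#include <cassert>

// (x & ~y) | y keeps every bit of y plus whatever x contributes elsewhere,
// which is exactly x | y; verify over all 8-bit pairs.
int main() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      assert(((X & ~Y) | Y) == (X | Y));
  return 0;
}
```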

This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.

Differential Revision: https://reviews.llvm.org/D59174

llvm-svn: 356333
2019-03-17 15:45:38 +00:00
Sanjay Patel 6a6e808b69 [TargetLowering] improve the default expansion of uaddsat/usubsat
This is a subset of what was proposed in:
D59006
...and may overlap with test changes from:
D59174
...but it seems like a good general optimization to turn selects
into bitwise-logic when possible because we never know exactly
what can happen at this stage of DAG combining depending on how
the target has defined things.
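
As a rough scalar model of the select-free form (illustrative C++, not the actual DAG expansion): the compare becomes an all-ones/all-zeros mask that is combined with plain bitwise logic.
```
#include <cstdint>

// Sketch of a select-free expansion of unsigned saturating add/sub on i32.
uint32_t uaddsat(uint32_t X, uint32_t Y) {
  uint32_t Sum = X + Y;
  uint32_t OverflowMask = 0u - uint32_t(Sum < X); // all-ones iff the add wrapped
  return Sum | OverflowMask;                      // saturate to UINT32_MAX
}

uint32_t usubsat(uint32_t X, uint32_t Y) {
  uint32_t Diff = X - Y;
  uint32_t NoWrapMask = 0u - uint32_t(X >= Y); // all-ones iff no wrap occurred
  return Diff & NoWrapMask;                    // saturate to 0
}
```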

Differential Revision: https://reviews.llvm.org/D59066

llvm-svn: 356332
2019-03-17 14:57:40 +00:00
Alex Bradbury 997947961a [RISCV][NFC] Factor out matchRegisterNameHelper in RISCVAsmParser.cpp
Contains common logic to match a string to a register name.

llvm-svn: 356330
2019-03-17 12:02:32 +00:00
Alex Bradbury b18e314a7c [RISCV] Fix RISCVAsmParser::ParseRegister and add tests
RISCVAsmParser::ParseRegister is called from AsmParser::parseRegisterOrNumber,
which in turn is called when processing CFI directives. The RISC-V
implementation wasn't setting RegNo, and so was incorrect. This patch address
that and adds cfi directive tests that demonstrate the fix. A follow-up patch
will factor out the register parsing logic shared between ParseRegister and
parseRegister.

llvm-svn: 356329
2019-03-17 12:00:58 +00:00
Simon Pilgrim 3b0a6c69ee [DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097)
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.

llvm-svn: 356327
2019-03-16 17:36:26 +00:00
Yonghong Song 6db6b56a5c [BPF] Add BTF Var and DataSec Support
Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added.

BTF_KIND_VAR has the following specification:
   btf_type.name: var name
   btf_type.info: type kind
   btf_type.type: var type
   // btf_type is followed by one u32
   u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections)

Not all globals are supported in this patch. The following globals are supported:
  . static variables with or without section attributes
  . global variables with section attributes

The inclusion of globals with section attributes
is for future potential extraction of key/value
type id's from map definition.

BTF_KIND_DATASEC has the following specification:
  btf_type.name: section name associated with variable or
                 one of .data/.bss/.readonly
  btf_type.info: type kind and vlen for # of variables
  btf_type.size: 0
  #vlen number of the following:
    u32: id of corresponding BTF_KIND_VAR
    u32: in-section offset of the var
    u32: the size of memory var occupied

At the time of debug info emission, the data section
size is unknown, so the btf_type.size = 0 for
BTF_KIND_DATASEC. The loader can patch it during
loading time.

The in-section offset of the var is only available
for static variables. For global variables, the
loader needs to assign the global variable symbol value in
the symbol table to the in-section offset.

The size of memory is used to specify the amount of
memory a variable occupies. Typically, it equals
the type size, but for certain structures, e.g.,
  struct tt {
    int a;
    int b;
    char c[];
   };
   static volatile struct tt s2 = {3, 4, "abcdefghi"};
The static variable s2 has size of 20.

Note that for the BTF_KIND_DATASEC name, the section name
does not contain the object name. The compiler does have
the input module name. For example, two cases below:
   . clang -target bpf -O2 -g -c test.c
     The compiler knows the input file (module) is test.c
     and can generate sec name like test.data/test.bss etc.
   . clang -target bpf -O2 -g -emit-llvm -c test.c -o - |
     llc -march=bpf -filetype=obj -o test.o
     The llc compiler has the input file as stdin, and
     would generate something like stdin.data/stdin.bss etc.
     which does not really make sense.

For any user-specified section name, e.g.,
  static volatile int a __attribute__((section("id1")));
  static volatile const int b __attribute__((section("id2")));
The DataSec with name "id1" and "id2" does not contain
information whether the section is readonly or not.
The loader needs to check the corresponding elf section
flags for such information.

A simple example:
  -bash-4.4$ cat t.c
  int g1;
  int g2 = 3;
  const int g3 = 4;
  static volatile int s1;
  struct tt {
   int a;
   int b;
   char c[];
  };
  static volatile struct tt s2 = {3, 4, "abcdefghi"};
  static volatile const int s3 = 4;
  int m __attribute__((section("maps"), used)) = 4;
  int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; }
  -bash-4.4$ clang -target bpf -O2 -g -S t.c
Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m).
4 BTF_KIND_DATASEC's are generated with names
".data", ".bss", ".rodata" and "maps".

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D59441

llvm-svn: 356326
2019-03-16 15:36:31 +00:00
Simon Pilgrim f2c53b5d6c [X86][SSE] Constant fold PEXTRB/PEXTRW/EXTRACT_VECTOR_ELT nodes.
Replaces existing i1-only fold.

llvm-svn: 356325
2019-03-16 15:02:00 +00:00
Simon Pilgrim 0f472e1d01 [X86] Add SimplifyDemandedBitsForTargetNode support for PEXTRB/PEXTRW
Improved constant folding for PEXTRB/PEXTRW will be added in a future commit

llvm-svn: 356324
2019-03-16 14:29:50 +00:00
Heejin Ahn 66ce419468 [WebAssembly] Make rethrow take an except_ref type argument
Summary:
In the new wasm EH proposal, `rethrow` takes an `except_ref` argument.
This change was missing in r352598.

This patch adds the `llvm.wasm.rethrow.in.catch` intrinsic. This is an
intrinsic that will eventually be lowered to the wasm `rethrow`
instruction, but it can appear only within a catchpad or a
cleanuppad scope. Also, this intrinsic needs to be invokable - otherwise
the EH pad successor for it will not be correctly generated in clang.

This also adds lowering logic for this intrinsic in
`SelectionDAGBuilder::visitInvoke`. This routine is basically a
specialized and simplified version of
`SelectionDAGBuilder::visitTargetIntrinsic`, but we can't use it
because it is only for `CallInst`s.

This deletes the previous `llvm.wasm.rethrow` intrinsic and related
tests, which was meant to be used within a `__cxa_rethrow` library
function. Turned out this needs some more logic, so the intrinsic for
this purpose will be added later.

LateEHPrepare takes a result value of `catch` and inserts it into
matching `rethrow` as an argument.

`RETHROW_IN_CATCH` is a pseudo instruction that serves as a link between
`llvm.wasm.rethrow.in.catch` and the real wasm `rethrow` instruction. To
generate a `rethrow` instruction, we need an `except_ref` argument,
which is generated from the `catch` instruction. But `catch` instructions are
added in the LateEHPrepare pass, so we use `RETHROW_IN_CATCH`, which takes
no argument, until we are able to correctly lower it to `rethrow` in
LateEHPrepare.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59352

llvm-svn: 356316
2019-03-16 05:38:57 +00:00
Heejin Ahn b47a18cd4b [WebAssembly] Method order change in LateEHPrepare (NFC)
Summary:
Currently the order of these methods does not matter, but the following
CL needs to have this order changed. Merging the order change and the
semantics change within a CL complicates the diff, so submitting the
order change first.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59342

llvm-svn: 356315
2019-03-16 04:46:05 +00:00
Peter Collingbourne c51470e7ef gn build: Merge r356305.
llvm-svn: 356314
2019-03-16 03:46:02 +00:00
Heejin Ahn a41250c7be [WebAssembly] Irreducible control flow rewrite
Summary:
Rewrite WebAssemblyFixIrreducibleControlFlow to a simpler and cleaner
design, which directly computes reachability and other properties
itself. This avoids previous complexity and bugs. (The new graph
analyses are very similar to how the Relooper algorithm would find loop
entries and so forth.)

This fixes a few bugs, including where we had a false positive and
thought fannkuch was irreducible when it was not, which made us much
larger and slower there, and a reverse bug where we missed
irreducibility. On fannkuch, we used to be 44% slower than asm2wasm and
are now 4% faster.

Reviewers: aheejin

Subscribers: jdoerfert, mgrang, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D58919

Patch by Alon Zakai (kripken)

llvm-svn: 356313
2019-03-16 03:00:19 +00:00
Fangrui Song a957f47e0a [ADT] Make SmallVector emplace_back return a reference
This follows the C++17 std::vector change and can simplify immediate
back() calls.
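
A typical simplification this enables (illustrative snippet):
```
#include "llvm/ADT/SmallVector.h"

struct Entry { int Key = 0; int Value = 0; };

// Before: Vec.emplace_back(); Vec.back().Key = 42;
// Now the back() call can be dropped because emplace_back returns a reference.
void addEntry(llvm::SmallVector<Entry, 8> &Vec) {
  Entry &E = Vec.emplace_back();
  E.Key = 42;
}
```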

llvm-svn: 356312
2019-03-16 02:41:45 +00:00
Amara Emerson 7097e83dab [GlobalISel] Make isel verification checks of vregs run under NDEBUG only.
llvm-svn: 356309
2019-03-16 01:02:10 +00:00
Peter Collingbourne 3dea548017 gn build: Add missing dependency to check-clang target.
llvm-svn: 356306
2019-03-15 22:47:34 +00:00
Fedor Sergeev 6a9c2f4f98 [TimePasses] allow -time-passes reporting into a custom stream
The TimePassesHandler object (the implementation of time-passes for the new pass manager)
gains the ability to report into a stream customizable per instance (per pipeline).

The intended use is to specify a separate time-passes output stream per compilation,
setting up the TimePasses member of StandardInstrumentations during PassBuilder setup.
That allows getting independent, non-overlapping pass-timing reports for parallel
independent compilations (in JIT-like setups).
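
A rough sketch of that setup, assuming a setOutStream-style setter on TimePassesHandler (the setter name is inferred from this description, not verified against the header):
```
#include "llvm/IR/PassTimingInfo.h"
#include "llvm/Support/raw_ostream.h"

// Hedged sketch: one TimePassesHandler per compilation pipeline, each routing
// its timing report to its own stream. setOutStream() is an assumption based
// on the description above and may differ from the actual interface.
void setUpPerPipelineTiming(llvm::TimePassesHandler &TimePasses,
                            llvm::raw_ostream &PipelineReportStream) {
  TimePasses.setOutStream(PipelineReportStream);
}
```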

By default it still puts timing reports into the info-output-file stream
(created by CreateInfoOutputFile every time a report is requested).

A unit test was added for the non-default case, and it also exposed that print() does not work
as declared - it did not reset the timers, leading to yet another report being printed into the default stream.
print() was fixed to actually reset the timers, as its comments already declared.

Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D59366

llvm-svn: 356305
2019-03-15 22:15:23 +00:00
Amara Emerson 3739a20875 [GlobalISel] Allow MachineIRBuilder to build subregister copies.
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().

Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.

Differential Revision: https://reviews.llvm.org/D59434

llvm-svn: 356304
2019-03-15 21:59:50 +00:00
Eli Friedman 68d9a60573 [ARM] Add MachineVerifier logic for some Thumb1 instructions.
tMOVr and tPUSH/tPOP/tPOP_RET have register constraints which can't be
expressed in TableGen, so check them explicitly. I've unfortunately run
into issues with both of these recently; hopefully this saves some time
for someone else in the future.

Differential Revision: https://reviews.llvm.org/D59383

llvm-svn: 356303
2019-03-15 21:44:49 +00:00
Roman Lebedev 9f37790608 [X86] X86ISelLowering::combineSextInRegCmov(): also handle i8 CMOV's
Summary:
As noted by @andreadb in https://reviews.llvm.org/D59035#inline-525780

If we have `sext (trunc (cmov C0, C1) to i8)`,
we can instead do `cmov (sext (trunc C0 to i8)), (sext (trunc C1 to i8))`

Reviewers: craig.topper, andreadb, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits, andreadb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59412

llvm-svn: 356301
2019-03-15 21:18:05 +00:00
Roman Lebedev b6e376ddfa [X86] Promote i8 CMOV's (PR40965)
Summary:
@mclow.lists brought up this issue up in IRC, it came up during
implementation of libc++ `std::midpoint()` implementation (D59099)
https://godbolt.org/z/oLrHBP

Currently LLVM X86 backend only promotes i8 CMOV if it came from 2x`trunc`.
This differential proposes to always promote i8 CMOV.

There are several concerns here:
* Is this actually more performant, or is it just the ASM that looks cuter?
* Does this result in partial register stalls?
* What about branch predictor?

# Indeed, performance should be the main point here.
Let's look at a simple microbenchmark: {F8412076}
```
#include "benchmark/benchmark.h"

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

// Future preliminary libc++ code, from Marshall Clow.
namespace std {
template <class _Tp>
__inline _Tp midpoint(_Tp __a, _Tp __b) noexcept {
  using _Up = typename std::make_unsigned<typename remove_cv<_Tp>::type>::type;

  int __sign = 1;
  _Up __m = __a;
  _Up __M = __b;
  if (__a > __b) {
    __sign = -1;
    __m = __b;
    __M = __a;
  }
  return __a + __sign * _Tp(_Up(__M - __m) >> 1);
}
}  // namespace std

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct RandRand {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    return std::make_pair(getVectorOfRandomNumbers<T>(count),
                          getVectorOfRandomNumbers<T>(count));
  }
};
struct ZeroRand {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    return std::make_pair(std::vector<T>(count, T(0)),
                          getVectorOfRandomNumbers<T>(count));
  }
};

template <class T, class Gen>
void BM_StdMidpoint(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    for (size_t i = 0; i < Length; i++) {
      const auto calculated = std::midpoint(a[i], b[i]);
      benchmark::DoNotOptimize(calculated);
    }
  }
  state.SetComplexityN(Length);
  state.counters["midpoints"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["midpoints/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = 2 * 1024 * 1024;
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

// Both of the values are random.
// The comparison is unpredictable.
BENCHMARK_TEMPLATE(BM_StdMidpoint, int32_t, RandRand)
    ->Apply(CustomArguments<int32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, RandRand)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int64_t, RandRand)
    ->Apply(CustomArguments<int64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, RandRand)
    ->Apply(CustomArguments<uint64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int16_t, RandRand)
    ->Apply(CustomArguments<int16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, RandRand)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int8_t, RandRand)
    ->Apply(CustomArguments<int8_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, RandRand)
    ->Apply(CustomArguments<uint8_t>);

// One value is always zero, and another is bigger or equal than zero.
// The comparison is predictable.
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, ZeroRand)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, ZeroRand)
    ->Apply(CustomArguments<uint64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, ZeroRand)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, ZeroRand)
    ->Apply(CustomArguments<uint8_t>);
```

```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks ./llvm-cmov-bench-OLD ./llvm-cmov-bench-NEW
RUNNING: ./llvm-cmov-bench-OLD --benchmark_out=/tmp/tmp5a5qjm
2019-03-06 21:53:31
Running ./llvm-cmov-bench-OLD
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.78, 1.81, 1.36
----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters<...>
----------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072      300398 ns       300404 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.25083G/s midpoints=305.398M midpoints/sec=436.319M/s
BM_StdMidpoint<int32_t, RandRand>_BigO          2.29 N          2.29 N
BM_StdMidpoint<int32_t, RandRand>_RMS              2 %             2 %
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072     300433 ns       300433 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.25052G/s midpoints=305.398M midpoints/sec=436.278M/s
BM_StdMidpoint<uint32_t, RandRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, RandRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<int64_t, RandRand>/65536       169857 ns       169858 ns         4121 bytes_read/iteration=1024k bytes_read/sec=5.74929G/s midpoints=270.074M midpoints/sec=385.828M/s
BM_StdMidpoint<int64_t, RandRand>_BigO          2.59 N          2.59 N
BM_StdMidpoint<int64_t, RandRand>_RMS              3 %             3 %
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536      169770 ns       169771 ns         4125 bytes_read/iteration=1024k bytes_read/sec=5.75223G/s midpoints=270.336M midpoints/sec=386.026M/s
BM_StdMidpoint<uint64_t, RandRand>_BigO         2.59 N          2.59 N
BM_StdMidpoint<uint64_t, RandRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<int16_t, RandRand>/262144      591169 ns       591179 ns         1182 bytes_read/iteration=1024k bytes_read/sec=1.65189G/s midpoints=309.854M midpoints/sec=443.426M/s
BM_StdMidpoint<int16_t, RandRand>_BigO          2.25 N          2.25 N
BM_StdMidpoint<int16_t, RandRand>_RMS              1 %             1 %
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144     591264 ns       591274 ns         1184 bytes_read/iteration=1024k bytes_read/sec=1.65162G/s midpoints=310.378M midpoints/sec=443.354M/s
BM_StdMidpoint<uint16_t, RandRand>_BigO         2.25 N          2.25 N
BM_StdMidpoint<uint16_t, RandRand>_RMS             1 %             1 %
<...>
BM_StdMidpoint<int8_t, RandRand>/524288      2983669 ns      2983689 ns          235 bytes_read/iteration=1024k bytes_read/sec=335.156M/s midpoints=123.208M midpoints/sec=175.718M/s
BM_StdMidpoint<int8_t, RandRand>_BigO           5.69 N          5.69 N
BM_StdMidpoint<int8_t, RandRand>_RMS               0 %             0 %
<...>
BM_StdMidpoint<uint8_t, RandRand>/524288     2668398 ns      2668419 ns          262 bytes_read/iteration=1024k bytes_read/sec=374.754M/s midpoints=137.363M midpoints/sec=196.479M/s
BM_StdMidpoint<uint8_t, RandRand>_BigO          5.09 N          5.09 N
BM_StdMidpoint<uint8_t, RandRand>_RMS              0 %             0 %
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072     300887 ns       300887 ns         2331 bytes_read/iteration=1024k bytes_read/sec=3.24561G/s midpoints=305.529M midpoints/sec=435.619M/s
BM_StdMidpoint<uint32_t, ZeroRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, ZeroRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536      169634 ns       169634 ns         4102 bytes_read/iteration=1024k bytes_read/sec=5.75688G/s midpoints=268.829M midpoints/sec=386.338M/s
BM_StdMidpoint<uint64_t, ZeroRand>_BigO         2.59 N          2.59 N
BM_StdMidpoint<uint64_t, ZeroRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144     592252 ns       592255 ns         1182 bytes_read/iteration=1024k bytes_read/sec=1.64889G/s midpoints=309.854M midpoints/sec=442.62M/s
BM_StdMidpoint<uint16_t, ZeroRand>_BigO         2.26 N          2.26 N
BM_StdMidpoint<uint16_t, ZeroRand>_RMS             1 %             1 %
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288      987295 ns       987309 ns          711 bytes_read/iteration=1024k bytes_read/sec=1012.85M/s midpoints=372.769M midpoints/sec=531.028M/s
BM_StdMidpoint<uint8_t, ZeroRand>_BigO          1.88 N          1.88 N
BM_StdMidpoint<uint8_t, ZeroRand>_RMS              1 %             1 %
RUNNING: ./llvm-cmov-bench-NEW --benchmark_out=/tmp/tmpPvwpfW
2019-03-06 21:56:58
Running ./llvm-cmov-bench-NEW
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.17, 1.46, 1.30
----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters<...>
----------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072      300878 ns       300880 ns         2324 bytes_read/iteration=1024k bytes_read/sec=3.24569G/s midpoints=304.611M midpoints/sec=435.629M/s
BM_StdMidpoint<int32_t, RandRand>_BigO          2.29 N          2.29 N
BM_StdMidpoint<int32_t, RandRand>_RMS              2 %             2 %
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072     300231 ns       300226 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.25276G/s midpoints=305.398M midpoints/sec=436.578M/s
BM_StdMidpoint<uint32_t, RandRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, RandRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<int64_t, RandRand>/65536       170819 ns       170777 ns         4115 bytes_read/iteration=1024k bytes_read/sec=5.71835G/s midpoints=269.681M midpoints/sec=383.752M/s
BM_StdMidpoint<int64_t, RandRand>_BigO          2.60 N          2.60 N
BM_StdMidpoint<int64_t, RandRand>_RMS              3 %             3 %
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536      171705 ns       171708 ns         4106 bytes_read/iteration=1024k bytes_read/sec=5.68733G/s midpoints=269.091M midpoints/sec=381.671M/s
BM_StdMidpoint<uint64_t, RandRand>_BigO         2.62 N          2.62 N
BM_StdMidpoint<uint64_t, RandRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<int16_t, RandRand>/262144      592510 ns       592516 ns         1182 bytes_read/iteration=1024k bytes_read/sec=1.64816G/s midpoints=309.854M midpoints/sec=442.425M/s
BM_StdMidpoint<int16_t, RandRand>_BigO          2.26 N          2.26 N
BM_StdMidpoint<int16_t, RandRand>_RMS              1 %             1 %
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144     614823 ns       614823 ns         1180 bytes_read/iteration=1024k bytes_read/sec=1.58836G/s midpoints=309.33M midpoints/sec=426.373M/s
BM_StdMidpoint<uint16_t, RandRand>_BigO         2.33 N          2.33 N
BM_StdMidpoint<uint16_t, RandRand>_RMS             4 %             4 %
<...>
BM_StdMidpoint<int8_t, RandRand>/524288      1073181 ns      1073201 ns          650 bytes_read/iteration=1024k bytes_read/sec=931.791M/s midpoints=340.787M midpoints/sec=488.527M/s
BM_StdMidpoint<int8_t, RandRand>_BigO           2.05 N          2.05 N
BM_StdMidpoint<int8_t, RandRand>_RMS               1 %             1 %
BM_StdMidpoint<uint8_t, RandRand>/524288     1071010 ns      1071020 ns          653 bytes_read/iteration=1024k bytes_read/sec=933.689M/s midpoints=342.36M midpoints/sec=489.522M/s
BM_StdMidpoint<uint8_t, RandRand>_BigO          2.05 N          2.05 N
BM_StdMidpoint<uint8_t, RandRand>_RMS              1 %             1 %
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072     300413 ns       300416 ns         2330 bytes_read/iteration=1024k bytes_read/sec=3.2507G/s midpoints=305.398M midpoints/sec=436.302M/s
BM_StdMidpoint<uint32_t, ZeroRand>_BigO         2.29 N          2.29 N
BM_StdMidpoint<uint32_t, ZeroRand>_RMS             2 %             2 %
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536      169667 ns       169669 ns         4123 bytes_read/iteration=1024k bytes_read/sec=5.75568G/s midpoints=270.205M midpoints/sec=386.257M/s
BM_StdMidpoint<uint64_t, ZeroRand>_BigO         2.59 N          2.59 N
BM_StdMidpoint<uint64_t, ZeroRand>_RMS             3 %             3 %
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144     591396 ns       591404 ns         1184 bytes_read/iteration=1024k bytes_read/sec=1.65126G/s midpoints=310.378M midpoints/sec=443.257M/s
BM_StdMidpoint<uint16_t, ZeroRand>_BigO         2.26 N          2.26 N
BM_StdMidpoint<uint16_t, ZeroRand>_RMS             1 %             1 %
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288     1069421 ns      1069413 ns          655 bytes_read/iteration=1024k bytes_read/sec=935.092M/s midpoints=343.409M midpoints/sec=490.258M/s
BM_StdMidpoint<uint8_t, ZeroRand>_BigO          2.04 N          2.04 N
BM_StdMidpoint<uint8_t, ZeroRand>_RMS              0 %             0 %
Comparing ./llvm-cmov-bench-OLD to ./llvm-cmov-bench-NEW
Benchmark                                                   Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072                 +0.0016         +0.0016        300398        300878        300404        300880
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072                -0.0007         -0.0007        300433        300231        300433        300226
<...>
BM_StdMidpoint<int64_t, RandRand>/65536                  +0.0057         +0.0054        169857        170819        169858        170777
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536                 +0.0114         +0.0114        169770        171705        169771        171708
<...>
BM_StdMidpoint<int16_t, RandRand>/262144                 +0.0023         +0.0023        591169        592510        591179        592516
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144                +0.0398         +0.0398        591264        614823        591274        614823
<...>
BM_StdMidpoint<int8_t, RandRand>/524288                  -0.6403         -0.6403       2983669       1073181       2983689       1073201
<...>
BM_StdMidpoint<uint8_t, RandRand>/524288                 -0.5986         -0.5986       2668398       1071010       2668419       1071020
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072                -0.0016         -0.0016        300887        300413        300887        300416
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536                 +0.0002         +0.0002        169634        169667        169634        169669
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144                -0.0014         -0.0014        592252        591396        592255        591404
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288                 +0.0832         +0.0832        987295       1069421        987309       1069413
```

What can we tell from the benchmark?
* `BM_StdMidpoint<[u]int8_t, RandRand>` indeed has the worst performance.
* All `BM_StdMidpoint<uint{8,16,32}_t, ZeroRand>` variants are performant, even the 8-bit case.
  That is because there we are computing the midpoint between zero and some random number,
  thus if the branch predictor is in use, it is in the optimal situation.
* Promoting 8-bit CMOV did improve performance of `BM_StdMidpoint<[u]int8_t, RandRand>`, by -59%..-64%.

# What about branch predictor?
* `BM_StdMidpoint<uint8_t, ZeroRand>` was faster than `BM_StdMidpoint<uint{16,32,64}_t, ZeroRand>`,
  which may mean that a well-predicted branch is better than `cmov`.
* Promoting 8-bit CMOV degraded the performance of `BM_StdMidpoint<uint8_t, ZeroRand>`;
  `cmov` is up to +10% worse than a well-predicted branch.
* However, I do not believe this is a concern. If the branch is well predicted, then the PGO
  will also say that it is well predicted, and LLVM will happily expand the cmov back into a branch:
  https://godbolt.org/z/P5ufig

# What about partial register stalls?
I'm not really able to answer that.
What I can say is that if the branch is unpredictable (if it is predictable, then use PGO and you'll have a branch),
in ~50% of cases you will have to pay the branch misprediction penalty.
```
$ grep -i MispredictPenalty X86Sched*.td
X86SchedBroadwell.td:  let MispredictPenalty = 16;
X86SchedHaswell.td:  let MispredictPenalty = 16;
X86SchedSandyBridge.td:  let MispredictPenalty = 16;
X86SchedSkylakeClient.td:  let MispredictPenalty = 14;
X86SchedSkylakeServer.td:  let MispredictPenalty = 14;
X86ScheduleBdVer2.td:  let MispredictPenalty = 20; // Minimum branch misdirection penalty.
X86ScheduleBtVer2.td:  let MispredictPenalty = 14; // Minimum branch misdirection penalty
X86ScheduleSLM.td:  let MispredictPenalty = 10;
X86ScheduleZnver1.td:  let MispredictPenalty = 17;
```
... which can be as small as 10 cycles and as large as 20 cycles.
Partial register stalls do not seem to be an issue for AMD CPUs.
For Intel CPUs, they should be around ~5 cycles?
Is that actually an issue here? I'm not sure.

In short, I'd say this is an improvement, at least on this microbenchmark.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40965 | PR40965 ]].

Reviewers: craig.topper, RKSimon, spatel, andreadb, nikic

Reviewed By: craig.topper, andreadb

Subscribers: jfb, jdoerfert, llvm-commits, mclow.lists

Tags: #llvm, #libc

Differential Revision: https://reviews.llvm.org/D59035

llvm-svn: 356300
2019-03-15 21:17:53 +00:00
Nikita Popov 1a26144ff5 [AArch64] Turn BIC immediate creation into a DAG combine
Switch BIC immediate creation for vector ANDs from custom lowering
to a DAG combine, which gives generic DAG combines a change to
apply first. In particular this avoids (and x, -1) being turned into
a (bic x, 0) instead of being eliminated entirely.

Differential Revision: https://reviews.llvm.org/D59187

llvm-svn: 356299
2019-03-15 21:04:34 +00:00
Changpeng Fang 989ec59c9f AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges.
Summary:
At the exit of the loop, the compiler uses a register to remember and accumulate
the number of threads that have already exited. When all active threads exit the
loop, this register is used to restore the exec mask, and the execution continues
for the post loop code.

When there is a "continue" in the loop, the compiler made a mistake to reset the
register to 0 when the "continue" backedge is taken. This will result in some
threads not executing the post loop code as they are supposed to.

This patch fixed the issue.

Reviewers:
  nhaehnle, arsenm

Differential Revision:
  https://reviews.llvm.org/D59312

llvm-svn: 356298
2019-03-15 21:02:48 +00:00
Alex Langford 5ac90b8ba1 [CMake] Correct CMake message mode
Summary:
This wasn't actually printing out a CMake warning; it was prepending
"WARN" to the message.

Reviewers: zturner

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59432

llvm-svn: 356297
2019-03-15 20:43:53 +00:00
Craig Topper af856db961 [X86] Strip the SAE bit from the rounding mode passed to the _RND opcodes. Use TargetConstant to save a conversion in the isel table.
The asm parser generates the immediate without the SAE bit. So for consistency we should generate the MCInst the same way from CodeGen.

Since they are now both the same, remove the masking from the printer and replace with an llvm_unreachable.

Use a target constant since we're rebuilding the node anyway. Then we don't have to have isel convert it. Saves about 500 bytes from the isel table.

llvm-svn: 356294
2019-03-15 19:59:35 +00:00
Philip Reames 68a2e4d48b [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep with at least one undef in each lane with an undef. We can also do the same for vector intrinsics. Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.

Differential Revision: https://reviews.llvm.org/D57468

llvm-svn: 356293
2019-03-15 19:54:06 +00:00