Commit Graph

64649 Commits

Author SHA1 Message Date
Sam Tebbs 8b2df85d02 [ARM] Select vmla
This patch adds vmla selection.

Differential revision: https://reviews.llvm.org/D66297

llvm-svn: 370704
2019-09-03 08:17:46 +00:00
Craig Topper 9dc8c448ed [X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target.
Use Custom lowering instead. Fall back to default expansion only
when the scalar FP type belongs in an XMM register. This improves
lowering for i32 to fp80, and also i32 to double on SSE1 only.

llvm-svn: 370699
2019-09-03 05:57:18 +00:00
Craig Topper f255f44336 [X86] Add an exhaustive test for i32 fptosi/fptoui across different triples and features.
llvm-svn: 370698
2019-09-03 05:57:14 +00:00
Craig Topper dcecc7ea46 [X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets.
Reuse the same code to promote all i32 uint_to_fp on 64-bit targets
to simplify the X86ISelLowering constructor.

llvm-svn: 370693
2019-09-03 02:51:10 +00:00
Simon Pilgrim cacf4db571 [CostModel][X86] Add scalar sext/zext cost tests
llvm-svn: 370684
2019-09-02 21:02:51 +00:00
Craig Topper 45cd185109 [X86] Enable fp128 as a legal type with SSE1 rather than with MMX.
FP128 values are passed in xmm registers so should be asssociated
with an SSE feature rather than MMX which uses a different set
of registers.

llc enables sse1 and sse2 by default with x86_64. But does not
enable mmx. Clang enables all 3 features by default.

I've tried to add command lines to test with -sse
where possible, but any test that returns a value in an xmm
register fails with a fatal error with -sse since we have no
defined ABI for that scenario.

llvm-svn: 370682
2019-09-02 20:16:30 +00:00
David Green a5fd8d8f47 [ARM] MVE predicate bitcast test and VPSEL adjustment. NFC
llvm-svn: 370678
2019-09-02 19:03:35 +00:00
David Green a95ec59fa5 [ARM] Use MQPR not QPR for MVE registers
We should be using MQPR, and if we don't we can get COPYs and PHIs created for
QPR. These get folded into instructions, failing verification checks.

Differential revision: https://reviews.llvm.org/D66214

llvm-svn: 370676
2019-09-02 17:18:23 +00:00
Robert Lougher 13190c4225 [TargetLowering][PS4] Add sincos(f) lib functions when target is PS4
PS4 supports sincosf and sincos. Adding the library functions enables
the sin(f)+cos(f) -> sincos(f) optimization.

Differential Revision: https://reviews.llvm.org/D67009

llvm-svn: 370675
2019-09-02 16:53:32 +00:00
Ulrich Weigand b21e245711 [SystemZ] Support constrained fpto[su]i intrinsics
Now that constrained fpto[su]i intrinsic are available,
add codegen support to the SystemZ backend.

In addition to pure back-end changes, I've also needed
to add the strict_fp_to_[su]int and any_fp_to_[su]int
pattern fragments in the obvious way.

llvm-svn: 370674
2019-09-02 16:49:29 +00:00
Kerry McLaughlin da4ef9b4c8 [SVE][Inline-Asm] Support for SVE asm operands
Summary:
Adds the following inline asm constraints for SVE:
  - w: SVE vector register with full range, Z0 to Z31
  - x: Restricted to registers Z0 to Z15 inclusive.
  - y: Restricted to registers Z0 to Z7 inclusive.

This change also adds the "z" modifier to interpret a register as an SVE register.

Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened

Reviewed By: sdesmalen

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66302

llvm-svn: 370673
2019-09-02 16:12:31 +00:00
George Rimar 78e8011a29 Recommit r370661 "[llvm-nm] - Add a test case for case when we dump a symbol that belongs to a section with a broken sh_name."
Fix: add a 'consumeError()' call to ObjectFile.cpp.
This error was never checked.

Original commit message:

It adds a test case for a problem fixed by D66976 <https://reviews.llvm.org/D66976>.

It was introduced by me in D66089 <https://reviews.llvm.org/D66089>.
The error reported was never consumed because of a wrong variable name used,
so it could fail when LLVM_ENABLE_ABI_BREAKING_CHECKS is used.

Differential revision: https://reviews.llvm.org/D67002

llvm-svn: 370669
2019-09-02 14:57:35 +00:00
Sanjay Patel 4e54cf3e0e [DAGCombiner] try to form test+set out of shift+mask patterns
The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340
https://bugs.llvm.org/show_bug.cgi?id=42697

As discussed there, we could view this as a failure of IR canonicalization,
but then we would need to implement a backend fixup with target overrides
to get this right in all cases. Instead, we can just view this as a codegen
opportunity. It's not even clear for x86 exactly when we should favor
test+set; some CPUs have better theoretical throughput for the ALU ops than
bt/test.

This patch is made more complicated than I expected because there's an early
DAGCombine for 'and' that can change types of the intermediate ops via
trunc+anyext.

Differential Revision: https://reviews.llvm.org/D66687

llvm-svn: 370668
2019-09-02 14:52:09 +00:00
Jay Foad 6e18266aa4 Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.

Reviewers: nhaehnle, arsenm, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65813

llvm-svn: 370667
2019-09-02 14:40:57 +00:00
Dmitry Preobrazhensky 4aa90ea58e [AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null
See AMD SWDEV-157286

Reviewers: atamazov, arsenm

Differential Revision: https://reviews.llvm.org/D65229

llvm-svn: 370665
2019-09-02 14:19:52 +00:00
Thomas Preud'homme a291b950db [FileCheck] Forbid using var defined on same line
Summary:
Commit r366897 introduced the possibility to set a variable from an
expression, such as [[#VAR2:VAR1+3]]. While introducing this feature, it
introduced extra logic to allow using such a variable on the same line
later on. Unfortunately that extra logic is flawed as it relies on a
mapping from variable to expression defining it when the mapping is from
variable definition to expression. This flaw causes among other issues
PR42896.

This commit avoids the problem by forbidding all use of a variable
defined on the same line, and removes the now useless logic. Redesign
will be done in a later commit because it will require some amount of
refactoring first for the solution to be clean. One example is the need
for some sort of transaction mechanism to set a variable temporarily and
from an expression and rollback if the CHECK pattern does not match so
that diagnostics show the right variable values.

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: JonChesterfield, rogfer01, hfinkel, kristina, rnk, tra, arichardson, grimar, dblaikie, probinson, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66141

llvm-svn: 370663
2019-09-02 14:04:00 +00:00
George Rimar b567ce7680 Revert r370661 "[llvm-nm] - Add a test case for case when we dump a symbol that belongs to a section with a broken sh_name"
It broke BB:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/16955/steps/test/logs/stdio

Expected<T> must be checked before access or destruction.
Unchecked Expected<T> contained error:
a section [index 1] has an invalid sh_name (0xffff) offset which goes past the end of the section name string tableStack dump:
0.	Program arguments: /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/test/tools/llvm-nm/Output/format-sysv-section.test.tmp2.o --format=sysv 
 #0 0x00000000008af7c4 PrintStackTraceSignalHandler(void*) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8af7c4)
 #1 0x00000000008ad8be llvm::sys::RunSignalHandlers() (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8ad8be)
 #2 0x00000000008afbd8 SignalHandler(int) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8afbd8)
 #3 0x00007f0a6b989730 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12730)
 #4 0x00007f0a6b48d7bb raise (/lib/x86_64-linux-gnu/libc.so.6+0x377bb)
 #5 0x00007f0a6b478535 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22535)
 #6 0x000000000042004b llvm::Expected<llvm::StringRef>::fatalUncheckedExpected() const (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x42004b)
 #7 0x00000000008367f5 (/sv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8367f5)
 #8 0x0000000000817b80 llvm::object::IRObjectFile::findBitcodeInObject(llvm::object::ObjectFile const&) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x817b80)
 #9 0x0000000000838416 llvm::object::SymbolicFile::createSymbolicFile(llvm::MemoryBufferRef, llvm::file_magic, llvm::LLVMContext*) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x838416)
#10 0x00000000007f36cb llvm::object::createBinary(llvm::MemoryBufferRef, llvm::LLVMContext*) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x7f36cb)
#11 0x0000000000413123 dumpSymbolNamesFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x413123)
#12 0x0000000000412e38 main (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x412e38)
#13 0x00007f0a6b47a09b __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409b)
#14 0x00000000004120da _start (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x4120da)
FileCheck error: '-' is empty.
FileCheck command line:  /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/test/tools/llvm-nm/format-sysv-section.test --check-prefix=ERR

--

llvm-svn: 370662
2019-09-02 14:03:50 +00:00
George Rimar 4b9233cafb [llvm-nm] - Add a test case for case when we dump a symbol that belongs to a section with a broken sh_name.
It adds a test case for a problem fixed by D66976.

It was introduced by me in D66089.
The error reported was never consumed because of a wrong variable name used,
so it could fail when LLVM_ENABLE_ABI_BREAKING_CHECKS is used.

Differential revision: https://reviews.llvm.org/D67002

llvm-svn: 370661
2019-09-02 13:54:45 +00:00
Dmitry Preobrazhensky 9c68eddbbe [AMDGPU][MC][GFX10] Enabled null with 64-bit operands
See Bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745

Reviewers: atamazov, arsenm

https://reviews.llvm.org/D65231

llvm-svn: 370660
2019-09-02 13:42:25 +00:00
Sanjay Patel 561c39994b [InstCombine] recognize bswap disguised as shufflevector
bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8} --> bswap (bitcast X to i{N*8})

In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
...we have a more complicated case where SLP is making a mess of bswap. This patch won't
do anything for that currently, but we need to improve bswap recognition in instcombine,
SLP, and/or a standalone pass to avoid that problem.

This is limited using the data-layout so we don't try to do this transform with actual
vector types. The backend does not appear to have folds to convert in either direction,
so we don't want to mess up something that is actually better lowered as a shuffle.

On x86, we're trading something like this:

  vmovd	%edi, %xmm0
  vpshufb	LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
  vmovd	%xmm0, %eax

For:

  movl	%edi, %eax
  bswapl	%eax

Differential Revision: https://reviews.llvm.org/D66965

llvm-svn: 370659
2019-09-02 13:33:20 +00:00
Martin Storsjo 491fc23a60 [test] [llvm-dlltool] Improve test strictness a little. NFC.
llvm-svn: 370657
2019-09-02 13:28:21 +00:00
Martin Storsjo 1cec6b2970 [llvm-dlltool] Handle external and internal names with differing decoration
Also add a missed part of the test from SVN r369747.

Differential Revision: https://reviews.llvm.org/D66996

llvm-svn: 370656
2019-09-02 13:28:16 +00:00
Dmitry Preobrazhensky fe2ee4c46a [AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions
See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744

Reviewers: atamazov, arsenm

Differential Revision: https://reviews.llvm.org/D65228

llvm-svn: 370652
2019-09-02 12:50:05 +00:00
Andrea Di Biagio 528f68144b [X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions.
On BtVer2 conditional SIMD stores are heavily microcoded.
The latency is directly proportional to the number of packed elements extracted
from the input vector. Also, according to micro-benchmarks, most of the
computation seems to be done in the integer unit.

Only a minority of the uOPs is executed by the FPU. The observed behaviour on
the FPU looks similar to this:
 - The input MASK value is moved to the Integer Unit
   -- [ a VMOVMSK-like uOP-executed on JFPU0].
 - In parallel, each element of the input XMM/YMM is extracted and then sent to
   the IntegerUnit through JFPU1.

As expected, a (conditional) store is executed for every extracted element.
Interestingly, a (speculative) load is executed for every extracted element too.
It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the
integer unit for every contionally stored element.
VMASKMOVDQU is a special case: the number of speculative loads is always 2
(presumably, one load per quadword). That means, extra shifts and masking is
performed on (one of) the loaded quadwords before each conditional store (that
also explains the big number of non-FP uOPs retired).

This patch replaces the existing writes for conditional SIMD stores (i.e.
WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes:

  WriteFMaskedStore32  [ XMM Packed Single ]
  WriteFMaskedStore32Y [ YMM Packed Single ]
  WriteFMaskedStore64  [ XMM Packed Double ]
  WriteFMaskedStore64Y [ YMM Packed Double ]

Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe
both RM and MR variants for conditional SIMD moves in a single tablegen
definition.
Instances of that class are then passed in input to multiclass avx_movmask_rm
when constructing MASKMOVPS/PD definitions.

Since this patch introduces new writes, I had to update all the X86 scheduling
models.

Differential Revision: https://reviews.llvm.org/D66801

llvm-svn: 370649
2019-09-02 12:32:28 +00:00
Jeremy Morse 22493f66f1 [DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations
The missing line added by this patch ensures that only spilt variable
locations are candidates for being restored from the stack. Otherwise,
register or constant-value information can be interpreted as a spill
location, through a union.

The added regression test replicates a scenario where this occurs: the
stack load from [rsp] causes the register-location DBG_VALUE to be
"restored" to rsi, when it should be left alone. See PR43058 for details.

Un x-fail a test that was suffering from this from a previous patch.

Differential Revision: https://reviews.llvm.org/D66895

llvm-svn: 370648
2019-09-02 12:28:36 +00:00
James Henderson 43e9ead1ed [llvm-strings][test] Merge two closely related tests
This is a follow-up to feedback on D66015.

Reviewed by: grimar

Differential Revision: https://reviews.llvm.org/D67069

llvm-svn: 370643
2019-09-02 11:42:30 +00:00
Piotr Sobczak 252a584cbd [AMDGPU] Add test
Summary:
Add test checking that the redundant immediate MOV instruction
(by-product of handling phi nodes) is not found in the generated code.

Reviewers: arsenm, anton-afanasyev, craig.topper, rtereshin, bogner

Reviewed By: arsenm

Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, wdng, jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63860

llvm-svn: 370634
2019-09-02 10:02:54 +00:00
George Rimar 86cc736df1 [yaml2obj] - Allow overriding sh_name fields of the sections.
This is in line with the previous changes which allowed to
override the sh_offset/sh_size and useful for writing test cases.

Differential revision: https://reviews.llvm.org/D66998

llvm-svn: 370633
2019-09-02 09:47:17 +00:00
Djordje Todorovic 5c6b82a756 [DWARFVerifier] Verify GNU extensions of call site DWARF symbols
Verify that the call site DWARF symbols (added during the implementation
of the debug entry values feature) are generated properly.

Differential Revision: https://reviews.llvm.org/D66865

llvm-svn: 370631
2019-09-02 09:20:46 +00:00
Amara Emerson 453ef4e376 [AArch64][GlobalISel] Fix zext narrowScalar to use the right type when creating
the merges.

Fixes PR43171.

llvm-svn: 370627
2019-09-02 08:18:55 +00:00
Craig Topper 3ab210862a [X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load
MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.

Differential Revision: https://reviews.llvm.org/D67017

llvm-svn: 370620
2019-09-01 22:14:36 +00:00
Sanjay Patel c882208367 [DAGCombiner] improve throughput of shift+logic+shift
The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Differential Revision: https://reviews.llvm.org/D67021

llvm-svn: 370617
2019-09-01 18:38:15 +00:00
Roman Lebedev ff21e3f055 [ConstantFolding] Fix 'undef' folding for @llvm.[us]{add,sub}.with.overflow ops (PR43188)
As we have already established/fixed in
  https://bugs.llvm.org/show_bug.cgi?id=42209
  https://reviews.llvm.org/D63065
  https://reviews.llvm.org/rL363522
the InstSimplify handling for @llvm.with.overflow ops with undefs
is correct. Therefore if ConstantFolding produces different results,
then it is wrong.

This duplication of code hints at the need for some refactoring,
but for now address the brokenness of ConstantFolding by
copying the known-good handling from rL363522.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43188

llvm-svn: 370608
2019-09-01 11:56:52 +00:00
David Green 8469a39af3 [ARM] Remove MVE masked loads/stores
These were never enabled correctly and are causing other problems. Taking them
out for the moment, whilst we work on the issues.

This reverts r370329.

llvm-svn: 370607
2019-09-01 10:11:40 +00:00
Shiva Chen adfdcb9c26 [TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs
Summary:
  This fixes the bugzilla id 43183 which triggerd by the following commit:
  [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall

llvm-svn: 370604
2019-09-01 04:52:54 +00:00
Puyan Lotfi 75a8a212d4 [GlobalISel][NFC] Regression test cases for aarch64 legalizer (s128 sext+icmp).
There were legalizer asserts in aarch64 globalisel (in debug mode) with s128
sext+icmp before r367060 and r366943 landed. These are just a couple reduced
mir and ir regression tests that came from a build where these were encountered.

llvm-svn: 370602
2019-09-01 00:45:28 +00:00
David Bolvansky ff0ad3c43d [InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n
Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.

GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96

Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert

Reviewed By: efriedma

Subscribers: hjl.tools, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65737

llvm-svn: 370593
2019-08-31 18:19:05 +00:00
Simon Pilgrim f8d1d00190 [X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170)
EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.

llvm-svn: 370592
2019-08-31 16:21:31 +00:00
Simon Pilgrim 20be06db97 [X86][AVX512] Regenerate tests with common prefixes
llvm-svn: 370591
2019-08-31 16:04:39 +00:00
Sanjay Patel 11704d0f51 [AArch64][x86] increase value type coverage in tests; NFC
This goes with D67021.

llvm-svn: 370590
2019-08-31 15:49:16 +00:00
Amaury Sechet 82825ab882 [DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate.
Summary: The combiner transforms (shl X, 1) into (add X, X).

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66882

llvm-svn: 370578
2019-08-31 11:40:02 +00:00
Nikita Popov a91f729279 [CVP] Add tests for simplified with.overflow + icmp; NFC
These tests are based on D19867.

llvm-svn: 370574
2019-08-31 09:58:42 +00:00
Nikita Popov b9e668f2e7 [CVP] Generate simpler code for elided with.overflow intrinsics
Use a { iN undef, i1 false } struct as the base, and only insert
the first operand, instead of using { iN undef, i1 undef } as the
base and inserting both. This is the same as what we do in InstCombine.

Differential Revision: https://reviews.llvm.org/D67034

llvm-svn: 370573
2019-08-31 09:58:37 +00:00
Wei Mi 798e59b81f [SampleFDO] Add profile symbol list section to discriminate function being
cold versus function being newly added.

This is the second half of https://reviews.llvm.org/D66374.

Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.

During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.

Differential Revision: https://reviews.llvm.org/D66766

llvm-svn: 370563
2019-08-31 02:27:26 +00:00
Thomas Lively d0d9317061 [WebAssembly] Add SIMD QFMA/QFMS
Summary:
Adds clang builtins and LLVM intrinsics for these experimental
instructions. They are not implemented in engines yet, but that is ok
because the user must opt into using them by calling the builtins.

Reviewers: aheejin, dschuff

Reviewed By: aheejin

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67020

llvm-svn: 370556
2019-08-31 00:12:29 +00:00
Alina Sbirlea 3d03769ba0 [MemorySSA] Rename all phi entries.
When renaming Phis incoming values, there may be multiple edges incoming
from the same block (switch). Rename all.

llvm-svn: 370548
2019-08-30 23:02:53 +00:00
Wei Mi 5ef5829fb0 [GVN] Verify value equality before doing phi translation for call instruction
This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605.

Basically, current phi translatation translates an old value number to an new
value number for a call instruction based on the literal equality of call
expression, without verifying there is no clobber in between. This is incorrect.

To get a finegrain check, use MachineDependence analysis to do the job. However,
this is still not ideal. Although given a call instruction,
`MemoryDependenceResults::getCallDependencyFrom` returns identical call
instructions without clobber in between using MemDepResult with its DepType to
be `Def`. However, identical is too strict here and we want it to be relaxed a
little to consider phi-translation -- callee is the same, param operands can be
different. That means changing the semantic of `MemDepResult::Def` and I don't
know the potential impact.

So currently the patch is still conservative to only handle
MemDepResult::NonFuncLocal, which means the current call has no function local
clobber. If there is clobber, even if the clobber doesn't stand in between the
current call and the call with the new value, we won't do phi-translate.

Differential Revision: https://reviews.llvm.org/D67013

llvm-svn: 370547
2019-08-30 23:01:22 +00:00
Reid Kleckner 657a06c619 [MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives
llvm-svn: 370540
2019-08-30 22:25:55 +00:00
Reid Kleckner a33474d595 [X86] Print register names in .seh_* directives
Also improve assembler parser register validation for .seh_ directives.
This requires moving X86-specific seh directive handling into the x86
backend, which addresses some assembler FIXMEs.

Differential Revision: https://reviews.llvm.org/D66625

llvm-svn: 370533
2019-08-30 21:23:05 +00:00
Sanjay Patel cfe959709f [x86] add tests for shift-logic-shift; NFC
llvm-svn: 370529
2019-08-30 20:51:51 +00:00
Sanjay Patel 82847b50e9 [AArch64] add tests for shift-logic-shift; NFC
llvm-svn: 370528
2019-08-30 20:48:43 +00:00
Reid Kleckner 0bb1630685 [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn
Users have complained llvm.trap produce two ud2 instructions on Win64,
one for the trap, and one for unreachable. This change fixes that.

TrapUnreachable was added and enabled for Win64 in r206684 (April 2014)
to avoid poorly understood issues with the Windows unwinder.

There seem to be two major things in play:
- the unwinder
- C++ EH, _CxxFrameHandler3 & co

The unwinder disassembles forward from the return address to scan for
epilogues. Inserting a ud2 had the effect of stopping the unwinder, and
ensuring that it ran the EH personality function for the current frame.
However, it's not clear what the unwinder does when the return address
happens to be the last address of one function and the first address of
the next function.

The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out
what the current EH state number is. It does this by consulting the
ip2state table, which maps from PC to state number. This seems to go
wrong when the return address is the last PC of the function or catch
funclet.

I'm not sure precisely which system is involved here, but in order to
address these real or hypothetical problems, I believe it is enough to
insert int3 after a call site if it would otherwise be the last
instruction in a function or funclet.  I was able to reproduce some
similar problems locally by arranging for a noreturn call to appear at
the end of a catch block immediately before an unrelated function, and I
confirmed that the problems go away when an extra trailing int3
instruction is added.

MSVC inserts int3 after every noreturn function call, but I believe it's
only necessary to do it if the call would be the last instruction. This
change inserts a pseudo instruction that expands to int3 if it is in the
last basic block of a function or funclet. I did what I could to run the
Microsoft compiler EH tests, and the ones I was able to run showed no
behavior difference before or after this change.

Differential Revision: https://reviews.llvm.org/D66980

llvm-svn: 370525
2019-08-30 20:46:39 +00:00
Sanjay Patel a39ef6dea6 [Thumb2] tighten CHECK lines in test; NFC
The sequence between the function call and the asm start
may change without affecting what this test is looking for,
but we should have a better idea about what that sequence
looks like.

llvm-svn: 370518
2019-08-30 20:15:01 +00:00
Craig Topper 4b61b6476b [X86] Fix mul test cases in avx512-broadcast-unfold.ll to not get canonicalized to fadd. Remove the fsub test cases which were also testing fadd.
Not sure how to prevent an fsub by constant getting turned into an fadd by negative constant.

llvm-svn: 370515
2019-08-30 20:04:23 +00:00
Craig Topper a707ced18f [X86] Regenerate the test cases added in r370506.
Something weird happened with the v2i64/v2f64 test cases which
don't use broadcast. So they should already be hoisted, but
weren't in the version I submitted in r370506. This fixes that.
Not sure if something changed or I screwed up.

llvm-svn: 370507
2019-08-30 19:42:48 +00:00
Craig Topper 2396919200 [X86] Add test caes for opportunities for machine LICM to unfold broadcasted constant pool loads.
MachineLICM is able to unfold loads to move an invariant load out
a loop, but X86 infrastructure currently lacks the ability to do
this when avx512 embedded broadcasting is used.

This test adds examples for the basic float point operations,
add, mul, and, or, and xor.

llvm-svn: 370506
2019-08-30 19:26:06 +00:00
Jinsong Ji fb4b86af92 [PowerPC][NFC] Avoid checking non-relevant .cfi instructions
Summary:
This is brought up in
https://reviews.llvm.org/D64662?id=209923#inline-599490

CFI information are non-relevant to quite some testcases,
we should get rid of checking them when its unecessary.

This patch avoid generating cfi info in testcases that are not
testing prolog/epilog or exception handling.

Reviewers: kbarton, hfinkel, nemanjai, #powerpc

Reviewed By: hfinkel

Subscribers: MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67016

llvm-svn: 370505
2019-08-30 19:24:25 +00:00
Puyan Lotfi d719c50655 [llvm-ifs][IFS] llvm Interface Stubs merging + object file generation tool.
This tool merges interface stub files to produce a merged interface stub file
or a stub library. Currently it for stub library generation it can produce an
ELF .so stub file, or a TBD file (experimental). It will be used by the clang
-emit-interface-stubs compilation pipeline to merge and assemble the per-CU
stub files into a stub library.

The new IFS format is as follows:

--- !experimental-ifs-v1
IfsVersion:      1.0
Triple:          <llvm triple>
ObjectFileFormat: <ELF | TBD>
Symbols:
  _ZSymbolName: { Type: <type>, etc... }
...

Differential Revision: https://reviews.llvm.org/D66405

llvm-svn: 370499
2019-08-30 18:26:05 +00:00
Craig Topper 18e8d02e8c [X86] Pass v32i16/v64i8 in zmm registers on KNL target.
gcc and icc pass these types in zmm registers in zmm registers.

This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.

Fixes PR42957

Differential Revision: https://reviews.llvm.org/D66708

llvm-svn: 370495
2019-08-30 17:35:08 +00:00
Evgeniy Stepanov 04647f5e22 MemTag: unchecked load/store optimization.
Summary:
MTE allows memory access to bypass tag check iff the address argument
is [SP, #imm]. This change takes advantage of this to demote uses of
tagged addresses to regular FrameIndex operands, reducing register
pressure in large functions.

MO_TAGGED target flag is used to signal that the FrameIndex operand
refers to memory that might be tagged, and needs to be handled with
care. Such operand must be lowered to [SP, #imm] directly, without a
scratch register.

The transformation pass attempts to predict when the offset will be
out of range and disable the optimization.
AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
this prediction has been wrong, but it is quite inefficient and should
be avoided.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66457

llvm-svn: 370490
2019-08-30 17:23:02 +00:00
Johannes Doerfert 3fac668d83 [Attributor] Use existing function information for the call site
Summary:
Instead of recomputing information for call sites we now use the
function information directly. This is always valid and once we have
call site specific information we can improve here.

This patch also bootstraps attributes that are created on-demand through
an initial update call. Information that is known will then directly be
available in the new attribute without causing an iteration delay.

The tests show how this improves the iteration count.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66781

llvm-svn: 370480
2019-08-30 15:24:52 +00:00
Johannes Doerfert 81df452d82 [Attributor] Manifest load/store alignment generally
Summary:
Any pointer could have load/store users not only floating ones so we
move the manifest logic for alignment into the AAAlignImpl class.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66922

llvm-svn: 370479
2019-08-30 15:22:28 +00:00
Piotr Sobczak 67b979466a [InstCombine][AMDGPU] Simplify tbuffer loads
Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66926

llvm-svn: 370475
2019-08-30 14:20:04 +00:00
George Rimar 4e71702cd4 [yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther".
Currenly we can encode the 'st_other' field of symbol using 3 fields.
'Visibility' is used to encode STV_* values.
'Other' is used to encode everything except the visibility, but it can't handle arbitrary values.
'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpfull enough.

'st_other' field is used to encode symbol visibility and platform-dependent
flags and values. Problem to encode it is that it consists of Visibility part (STV_* values)
which are enumeration values and the Other part, which is different and inconsistent.

For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16.
(Like comment in ELFDumper says: "Someones in their infinite wisdom decided to make
STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...)

And for PPC64 the Other part might actually encode any value.

This patch implements custom logic for handling the st_other and removes
'Visibility' and 'StOther' fields.

Here is an example of a new YAML style this patch allows:

- Name:  foo
  Other: [ 0x4 ]
- Name:  bar
  Other: [ STV_PROTECTED, 4 ]
- Name:  zed
  Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ]

Differential revision: https://reviews.llvm.org/D66886

llvm-svn: 370472
2019-08-30 13:39:22 +00:00
Simon Atanasyan 68f73bf262 [mips] Merge common checkings under the same check prefix. NFC
llvm-svn: 370467
2019-08-30 12:15:12 +00:00
Luis Marques c2b3d527fa [RISCV] Fix a couple of tests' CHECKs
llvm-svn: 370466
2019-08-30 12:11:47 +00:00
Amaury Sechet 485760f4c0 [X86] Add tests for rotate matching. NFC
llvm-svn: 370464
2019-08-30 11:35:28 +00:00
Chris Jackson fa1fe93789 [llvm-objcopy] Allow the visibility of symbols created by --binary and
--add-symbol to be specified with --new-symbol-visibility

llvm-svn: 370458
2019-08-30 10:17:16 +00:00
Hideto Ueno 6381b143f6 [Attributor] Implement AANoAliasCallSiteArgument initialization
Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66927

llvm-svn: 370456
2019-08-30 10:00:32 +00:00
Roman Lebedev 5c9f3cfec7 [LoopIdiomRecognize] BCmp loop idiom recognition
Summary:
@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

llvm-svn: 370454
2019-08-30 09:51:23 +00:00
David Stenberg b35d4699d0 [LiveDebugValues] Insert entry values after bundles
Summary:
Change LiveDebugValues so that it inserts entry values after the bundle
which contains the clobbering instruction. Previously it would insert
the debug value after the bundle head using insertAfter(), breaking the
bundle.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D66888

llvm-svn: 370448
2019-08-30 09:06:50 +00:00
Martin Storsjo 9438221785 [COFF] Add a ResourceSectionRef method for getting resource contents
This allows llvm-readobj to print the contents of each resource
when printing resources from an object file or executable, like it
already does for plain .res files.

This requires providing the whole COFFObjectFile to ResourceSectionRef.

This supports both object files and executables. For executables,
the DataRVA field is used as is to look up the right section.

For object files, ideally we would need to complete linking of them
and fix up all relocations to know what the DataRVA field would end up
being. In practice, the only thing that makes sense for an RVA field
is an ADDR32NB relocation. Thus, find a relocation pointing at this
field, verify that it has the expected type, locate the symbol it
points at, look up the section the symbol points at, and read from the
right offset in that section.

This works both for GNU windres object files (which use one single
.rsrc section, with all relocations against the base of the .rsrc
section, with the original value of the DataRVA field being the
offset of the data from the beginning of the .rsrc section) and
cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections,
and one symbol per data entry, with the original pre-relocated DataRVA
field being set to zero).

Differential Revision: https://reviews.llvm.org/D66820

llvm-svn: 370433
2019-08-30 06:55:49 +00:00
Petar Avramovic e96892a8aa [MIPS GlobalISel] Lower uitofp
Add custom lowering for G_UITOFP for MIPS32.

Differential Revision: https://reviews.llvm.org/D66930

llvm-svn: 370432
2019-08-30 05:51:12 +00:00
Petar Avramovic 6412b56513 [MIPS GlobalISel] Lower fptoui
Add lower for G_FPTOUI. Algorithm is similar to the SDAG version
in TargetLowering::expandFP_TO_UINT.
Lower G_FPTOUI for MIPS32.

Differential Revision: https://reviews.llvm.org/D66929

llvm-svn: 370431
2019-08-30 05:44:02 +00:00
Dan Gohman 8cfeeaf9de [CodeGen] Fix lowering for returning the result of an extractvalue
When the number of return values exceeds the number of registers available,
SelectionDAGBuilder::visitRet transforms a function's return to use a
pointer to a buffer to hold return values. When the returned value is an
operator such as extractvalue, the value may have a non-zero result number.
Add that number to the indexing when obtaining the values to store.

This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.

Differential Revision: https://reviews.llvm.org/D66978

llvm-svn: 370430
2019-08-30 04:33:22 +00:00
Jinsong Ji 54a1ad5bd7 [PowerPC][NFC] Use -mtriple in RUN line, remove target triple in tls.ll
To avoid confusion, especially when -mtriple are also added for PPC32.

llvm-svn: 370427
2019-08-30 02:57:33 +00:00
Fangrui Song 7704b54389 [PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO
Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs,
ppc32 just uses LDgotTprelL32, so it does not make lots of sense to use
_LO without a paired _HA.

Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO to match GCC, and
get better linker relocation check. Note, R_PPC_GOT_TPREL16_{HA,LO}
don't have good linker support:

(a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}.
(b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation:

  // a.o
  addis 3, 3, tsd_tls@got@tprel@ha
  lwz 3, tsd_tls@got@tprel@l(3)
  add 3, 3, tsd_tls@tls
  // b.o
  .section .tdata,"awT"; .globl tsd_tls; tsd_tls:

  // ld/ld-new a.o b.o
  internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section

Reviewed By: adalava

Differential Revision: https://reviews.llvm.org/D66925

llvm-svn: 370426
2019-08-30 02:20:49 +00:00
Dan Gohman da84b688f9 [WebAssembly] Make __attribute__((used)) not imply export.
Add an WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't
need to imply exporting. When targeting Emscripten, have
WASM_SYMBOL_NO_STRIP imply exporting.

Differential Revision: https://reviews.llvm.org/D62542

llvm-svn: 370415
2019-08-29 22:40:00 +00:00
Philip Reames 452e5647a5 [Tests] Precommit a few cases where we're missing oppurtunities for block local simplications off assumes.
llvm-svn: 370414
2019-08-29 22:08:17 +00:00
Jinsong Ji 1ed7d2119e [PowerPC] Support extended mnemonics mffprwz etc.
Summary:
Reported in https://github.com/opencv/opencv/issues/15413.

We have serveral extended mnemonics for Move To/From Vector-Scalar Register Instructions
eg: mffprd,mtfprd etc.

We only support one of them, this patch add the others.

Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc

Reviewed By: hfinkel

Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66963

llvm-svn: 370411
2019-08-29 21:53:59 +00:00
Jessica Paquette 04e657be28 [AArch64][GlobalISel] Select arithmetic extended register patterns
This teaches GISel to select patterns which fold an extend plus optional shift
into the addressing mode. In particular, adds and subs.

Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td
and create GISel equivalents.

Add some equivalent functions to the ones in AArch64ISelDAGToDAG:

- `selectArithExtendedRegister`
- `narrowExtendRegIfNeeded`
- `getExtendTypeForInst`

`getExtendTypeForInst` includes the checks for loads and stores. This will be
used for WRO addressing modes in loads + stores.

Teach selectCopy to properly handle subregister copies on the same bank in
order to support `narrowExtendRegIfNeeded`. The extended register must be a
GPR32, so we need to support same-bank subregister copies.

Fix a bug in getSubRegForClass which would cause registers on things like
GPR32common to end up getting ssub. Just change the check to look for FPR32
rather than GPR32.

For tests:

- Add select-arith-extended-reg.mir
- Update addsub_ext.ll to include GlobalISel checks

Differential Revision: https://reviews.llvm.org/D66835

llvm-svn: 370410
2019-08-29 21:53:58 +00:00
Reid Kleckner 5b79e603d3 [X86] Don't emit unreachable stack adjustments
Summary:
This is a minor improvement on our past attempts to do this. Fixes
PR43155.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66905

llvm-svn: 370409
2019-08-29 21:24:41 +00:00
Reid Kleckner 81e458d001 Allow '@' to appear in x86 mingw symbols
Summary:
There is no reason to differ in assembler behavior here between -msvc
and -gnu targets. Without this setting, the text after the '@' is
interpreted as a symbol variable, like foo@IMGREL.

Reviewers: mstorsjo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66974

llvm-svn: 370408
2019-08-29 21:15:02 +00:00
Sanjay Patel 33541fafde [InstCombine] add possible bswap as widening shuffle test; NFC
Goes with the proposal in D66965.

llvm-svn: 370407
2019-08-29 20:57:50 +00:00
Simon Pilgrim 3d705a1fa4 [X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159)
ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element.

llvm-svn: 370404
2019-08-29 20:22:08 +00:00
Julian Lettner af78899457 [ASan] Version mismatch check follow-up
Follow-up for:
[ASan] Make insertion of version mismatch guard configurable
3ae9b9d5e4

This tiny change makes sure that this test passes on our internal bots
as well.

llvm-svn: 370403
2019-08-29 20:20:05 +00:00
Matt Arsenault cbd1782c79 AMDGPU/GlobalISel: Legalize sin/cos
llvm-svn: 370402
2019-08-29 20:06:48 +00:00
Jordan Rupprecht f9f81289e6 Revert [MBP] Disable aggressive loop rotate in plain mode
This reverts r369664 (git commit 51f48295cb)

It causes many benchmark regressions, internally and in llvm's benchmark suite.

llvm-svn: 370398
2019-08-29 19:03:58 +00:00
Alina Sbirlea 4b87023bae Revert enabling MemorySSA.
Breaks sanitizers bots.

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370397
2019-08-29 19:01:23 +00:00
Craig Topper 5a43fdd313 [X86] Remove what little support we had for MPX
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define

I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.

gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX

Differential Revision: https://reviews.llvm.org/D66669

llvm-svn: 370393
2019-08-29 18:09:02 +00:00
Matt Arsenault caff0a88dd GlobalISel: Add known bits to InstructionSelector
AMDGPU uses this for some addressing mode selection patterns. The
analysis run itself doesn't do anything so it seems easier to just
always require this than adding a way to opt in.

llvm-svn: 370388
2019-08-29 17:24:32 +00:00
Alina Sbirlea 6289ee941d [MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests.
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.

This patch enables the use of MemorySSA in the loop pass manager.

Passes that currently use MemorySSA:
 - EarlyCSE
Passes that use MemorySSA after this patch:
 - EarlyCSE
 - LICM
 - SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
 - LoopInstSimplify
 - LoopSimplifyCFG
 - LoopUnswitch
 - LoopRotate
 - LoopSimplify
 - LCSSA
Loop passes that do *not* update MemorySSA:
 - IndVarSimplify
 - LoopDelete
 - LoopIdiom
 - LoopSink
 - LoopUnroll
 - LoopInterchange
 - LoopUnrollAndJam
 - LoopVectorize
 - LoopReroll
 - IRCE

Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry

Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370384
2019-08-29 17:08:13 +00:00
Jessica Paquette ba04f5fac1 [GlobalISel][AArch64] Select llvm.aarch64.stxr* intrinsics.
Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td.

This allows us to select these intrinsics.

Differential Revision: https://reviews.llvm.org/D65779

llvm-svn: 370382
2019-08-29 16:55:55 +00:00
Sanjay Patel 63411910a2 [InstCombine] add tests for bswap disguised as shuffle; NFC
Somewhat motivating case In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146

But that's a lot more complicated.

llvm-svn: 370381
2019-08-29 16:48:00 +00:00
Jessica Paquette b8b23a1648 [GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.*
Remove manual selection code for this intrinsic and use a GISelPredicateCode
instead.

This allows us to fully select this intrinsic without any tricky custom C++
matching.

Differential Revision: https://reviews.llvm.org/D65780

llvm-svn: 370380
2019-08-29 16:45:19 +00:00
Jessica Paquette 87720ac8c8 [AArch64][GlobalISel] Select @llvm.aarch64.ldxr.* intrinsics
Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the
ldxr_* definitions, which allows us to import them.

Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D66898

llvm-svn: 370378
2019-08-29 16:33:01 +00:00
Jessica Paquette c327daeea5 [AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics
Add a GISelPredicateCode to ldaxr_*. This allows us to import the patterns for
@llvm.aarch64.ldaxr.*, and thus select them.

Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these
intrinsics involves the same check.

Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D66897

llvm-svn: 370377
2019-08-29 16:16:38 +00:00
Michael Liao 001871dee8 [SimplifyCFG] Skip sinking common lifetime markers of `alloca`.
Summary:
- Similar to the workaround in fix of PR30188, skip sinking common
  lifetime markers of `alloca`. They are mostly left there after
  inlining functions in branches.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66950

llvm-svn: 370376
2019-08-29 16:12:05 +00:00
Jinsong Ji 8b0317ad7d [PowerPC][NFC] Update fp-int-conversions-direct-moves.ll using script
Also add -ppc-asm-full-reg-names,-ppc-vsr-nums-as-vr.

llvm-svn: 370375
2019-08-29 15:38:02 +00:00
Roman Lebedev 05ef49515e [NFC][SimplifyCFG] 'Safely extract low bits' pattern will also benefit from -phi-node-folding-threshold=3
This is the naive implementation of x86 BZHI/BEXTR instruction:
it takes input and bit count, and extracts low nbits up to bit width.
I.e. unlike shift it does not have any UB when nbits >= bitwidth.
Which means we don't need a while PHI here, simple select will do.
And if it's a select, it should then be trivial to fix codegen
to select it to BEXTR/BZHI.

See https://bugs.llvm.org/show_bug.cgi?id=34704

llvm-svn: 370369
2019-08-29 14:46:49 +00:00