Commit Graph

55430 Commits

Author SHA1 Message Date
Craig Topper 484b342c68 [X86] Add constant folding for AVX512 versions of scalar floating point to integer conversion intrinsics.
Summary:
We've supported constant folding for sse versions for many years. This patch adds support for the avx512 versions including unsigned with the default rounding mode. We could probably do more with other roundings modes and SAE in the future.

The test cases are largely based on the sse.ll test cases. But I did add some test cases to ensure the unsigned versions don't accept negative values. Also checked the bounds of f64->i32 conversions to make sure unsigned has a larger positive range than signed.

Reviewers: RKSimon, spatel, chandlerc

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50553

llvm-svn: 339529
2018-08-12 22:09:54 +00:00
Matt Arsenault 3763f307bd AMDGPU: Cleanup min/max legacy tests
Also add some more tests in preparation for
a future patch.

llvm-svn: 339526
2018-08-12 19:29:53 +00:00
Matt Arsenault 1201301b94 DAG: Check no-signed-zeros instead of unsafe-fp-math
Addresses fixme, although this should still be checking individual
operand flags.

llvm-svn: 339525
2018-08-12 19:09:12 +00:00
David Bolvansky cd57242587 [NFC] Fixed build, updated tests
llvm-svn: 339524
2018-08-12 18:32:53 +00:00
David Bolvansky ddfe408f9a [NFC] Renamed test file
llvm-svn: 339523
2018-08-12 17:43:27 +00:00
David Bolvansky 01d98cc03f [InstCombine] Fold Select with binary op - non-commutative opcodes
Summary:
Basic version was merged - https://reviews.llvm.org/D49954

This adds support for FP & non-commutative opcodes

Precommited tests: https://reviews.llvm.org/rL338727

Reviewers: spatel, lebedev.ri

Reviewed By: spatel

Subscribers: jfb

Differential Revision: https://reviews.llvm.org/D50190

llvm-svn: 339520
2018-08-12 17:30:07 +00:00
Sanjay Patel dc185ee275 [InstCombine] fix/enhance fadd/fsub factorization
(X * Z) + (Y * Z) --> (X + Y) * Z
  (X * Z) - (Y * Z) --> (X - Y) * Z
  (X / Z) + (Y / Z) --> (X + Y) / Z
  (X / Z) - (Y / Z) --> (X - Y) / Z

The existing code that implemented these folds failed to 
optimize vectors, and it transformed code with multiple 
uses when it should not have.

llvm-svn: 339519
2018-08-12 15:48:26 +00:00
Sanjay Patel ce104b6c16 [InstCombine] move/add tests for fadd/fsub factorization; NFC
llvm-svn: 339518
2018-08-12 15:06:15 +00:00
Matt Arsenault 13b0db9285 AMDGPU: Check NSZ MI flag when folding omod
I'm not sure the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.

llvm-svn: 339513
2018-08-12 08:44:25 +00:00
Matt Arsenault b5acec1f79 AMDGPU: Use splat vectors for undefs when folding canonicalize
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.

Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.

llvm-svn: 339512
2018-08-12 08:42:54 +00:00
Matt Arsenault 3ead7d7389 AMDGPU: Fix packing undef parts of build_vector
llvm-svn: 339511
2018-08-12 08:42:46 +00:00
David Green f7111d1ece [UnJ] Improve explicit loop count checks
Try to improve the computed counts when it has been explicitly set by a pragma
or command line option. This moves the code around, so that first call to
computeUnrollCount to get a sensible count and override that if explicit unroll
and jam counts are specified.

Also added some extra debug messages for when unroll and jamming is disabled.

Differential Revision: https://reviews.llvm.org/D50075

llvm-svn: 339501
2018-08-11 07:37:31 +00:00
Craig Topper 570d47a010 [X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG.
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.

This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test.

llvm-svn: 339496
2018-08-11 05:33:00 +00:00
Tom Stellard 69bf876b49 [gold] Fix Tests cases on i686
Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50583

llvm-svn: 339492
2018-08-11 01:08:34 +00:00
Tom Stellard 8adc86a7dc AMDGPU/GlobalISel: Define instruction mapping for G_INSERT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49625

llvm-svn: 339491
2018-08-11 00:51:54 +00:00
Philip Reames 85afd1a9a0 [LICM] Hoist assumes out of loops
If we have an assume which is known to execute and whose operand is invariant, we can lift that into the pre-header. So long as we don't change which paths the assume executes on, this is a legal transformation. It's likely to be a useful canonicalization as other transforms only look for dominating assumes.

Differential Revision: https://reviews.llvm.org/D50364

llvm-svn: 339481
2018-08-10 22:21:56 +00:00
Wouter van Oortmerssen ab26bd0647 [WebAssembly] Added default stack-only instruction mode for MC.
Summary:
Moved Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register to stack based instructions, and removed the registers.
Fixes to related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on what the
format they were expecting so far.
Translated one test to stack format as example: reg-stackify-stack.ll

tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*

Reviewers: dschuff, sunfish

Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D50568

llvm-svn: 339474
2018-08-10 21:32:47 +00:00
Eli Friedman e1687a89e8 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030

llvm-svn: 339472
2018-08-10 21:21:53 +00:00
Zachary Turner 29ec67b62f [MS Demangler] Support extern "C" functions.
There are two cases we need to support with extern "C"
functions.  The first is the case of a '9' indicating that
the function has no prototype.  This occurs when we mangle
a symbol inside of an extern "C" function, but not the
function itself.

The second case is when we have an overloaded extern "C"
functions.  In this case we emit $$J0 to indicate this.
This patch adds support for both of these cases.

llvm-svn: 339471
2018-08-10 21:09:05 +00:00
Sanjay Patel 0b62b01129 [InstCombine] add tests for fsub factorization; NFC
The tests show that;
1. The fold doesn't fire for vectors, but it should.
2. The fold fires regardless of uses, but it shouldn't.

llvm-svn: 339470
2018-08-10 21:00:27 +00:00
Sanjay Patel 3950095edf [InstCombine] add tests to show disabling of libcall/intrinsic shrinking; NFC
llvm-svn: 339467
2018-08-10 20:12:36 +00:00
Zachary Turner 909b819cf9 Resubmit r339450 - [MS Demangler] Add conversion operator tests
This was broken because of a malformed check line.  Incidentally,
this exposed a case where we crash when we should just be returning
an error, so we should fix that.  The demangler shouldn't crash due
to user input.

llvm-svn: 339466
2018-08-10 20:08:46 +00:00
Zachary Turner 073620bc3b [MS Demangler] Demangle cv qualifiers on template args.
Before we wouldn't properly demangle something like
Foo<const int>.  Template args have a special escape sequence
'$$C' that is optional, but if it is present contains
qualifiers.  So we need to check for this and only if it
present, demangle qualifiers before demangling the type.

With this fix, we re-enable some tests that were previously
marked FIXME.

llvm-svn: 339465
2018-08-10 19:57:36 +00:00
Matt Arsenault 940e6075e4 AMDGPU: More canonicalized operations
llvm-svn: 339464
2018-08-10 19:20:17 +00:00
Sanjay Patel 8988b8d92c revert r339450 - [MS Demangler] Add conversion operator tests
Something here causes an assertion failure that killed a bunch of bots.
Example:
http://lab.llvm.org:8011/builders/reverse-iteration/builds/7021/steps/check_all/logs/stdio

llvm-svn: 339463
2018-08-10 19:20:16 +00:00
Matt Arsenault 3dcf4ce435 AMDGPU: Combine and of seto/setuo and fp_class
Clear the nan (or non-nan) test bits from the mask.

llvm-svn: 339462
2018-08-10 18:58:56 +00:00
Matt Arsenault d35f46caf1 AMDGPU: Turn class x, p_zero|n_zero into fcmp oeq x, 0
The library does use this for some reason.

llvm-svn: 339461
2018-08-10 18:58:49 +00:00
Matt Arsenault 8ad00d30fa AMDGPU: Match isfinite pattern to class instructions
llvm-svn: 339460
2018-08-10 18:58:41 +00:00
Sanjay Patel 12a2911f62 [InstCombine] add/update tests for selectBinOpIdentity; NFC
This includes a test that would have exposed the bug in rL339439
which was reverted at rL339446. The compare can be integer while
the binop is FP or vice-versa, so we need to use the binop type
when we ask for the identity constant.

llvm-svn: 339453
2018-08-10 17:20:24 +00:00
Zachary Turner d664117794 [MS Demangler] Add conversion operator tests.
The mangled names were added in the original commit, but
the demangled equivalents weren't, so nothing was actually
being checked.

llvm-svn: 339450
2018-08-10 16:55:59 +00:00
Evgeniy Stepanov 453e7ac785 [hwasan] Add -hwasan-with-ifunc flag.
Summary: Similar to asan's flag, it can be used to disable the use of ifunc to access hwasan shadow address.

Reviewers: vitalybuka, kcc

Subscribers: srhines, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D50544

llvm-svn: 339447
2018-08-10 16:21:37 +00:00
David Bolvansky 5099835541 [InstCombine][NFC] Added tests for select with binop fold
llvm-svn: 339441
2018-08-10 15:29:09 +00:00
Zachary Turner a17721cf5d [MS Demangler] Properly demangle conversion operators.
These were completely broken before.  We need to handle
the 'B' operator tag.

llvm-svn: 339436
2018-08-10 15:04:56 +00:00
Zachary Turner e89f2fa657 [MS Demangler] Disable a couple of tests.
The check lines are marked FIXME but not the mangled names.
This is causing an error.

llvm-svn: 339435
2018-08-10 14:53:33 +00:00
Zachary Turner dbefc6cd4e [MS Demangler] Fix several issues related to templates.
These were uncovered when porting the mangling tests in
ms-templates.cpp from clang/CodeGenCXX over to demangling
tests.  The main issues fixed here are surrounding integer
literal signed and unsignedness, empty array dimensions,
and pointer and reference non-type template parameters.

Differential Revision: https://reviews.llvm.org/D50512

llvm-svn: 339434
2018-08-10 14:31:04 +00:00
Sam Parker 8c4b964c5a [ARM] Disallow zexts in ARMCodeGenPrepare
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable for now and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we converting to, it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes as it unnecessarily complicated having the checks
are different stages.
    
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.

Differential Revision: https://reviews.llvm.org/D50518

llvm-svn: 339432
2018-08-10 13:57:13 +00:00
Hans Wennborg d4090be340 Rename the cfguard module flag to cfguardtable
The previous name sounds like it inserts cfguard implementation, but it
really just emits the table of address-taken functions. Change the name
to better reflect that.

Clang will be updated in the next commit.

llvm-svn: 339419
2018-08-10 09:48:53 +00:00
Max Kazantsev 4e9def57c7 [NFC] Add tests that demonstrate that MustExecute is fundamentally broken
llvm-svn: 339417
2018-08-10 09:20:46 +00:00
Alexander Potapenko 75a954330b [MSan] Shrink the register save area for non-SSE builds
If code is compiled for X86 without SSE support, the register save area
doesn't contain FPU registers, so `AMD64FpEndOffset` should be equal to
`AMD64GpEndOffset`.

llvm-svn: 339414
2018-08-10 08:06:43 +00:00
George Burgess IV ff08c80efc [MemorySSA] "Fix" lifetime intrinsic handling
MemorySSA currently creates MemoryAccesses for lifetime intrinsics, and
sometimes treats them as clobbers. This may/may not be the best way
forward, but while we're doing it, we should consider
MayAlias/PartialAlias to be clobbers.

The ideal fix here is probably to remove all of this reasoning about
lifetimes from MemorySSA + put it into the passes that need to care. But
that's a wayyy broader fix that needs some consensus, and we have
miscompiles + a release branch today, and this should solve the
miscompiles just as well.

differential revision is D43269. Landing without an explicit LGTM (and
without using the special please-autoclose-this syntax) so we can still
use that revision as a place to decide what the right fix here is.

llvm-svn: 339411
2018-08-10 05:14:43 +00:00
David Bolvansky 909889b2cb [InstCombine] Transform str(n)cmp to memcmp
Summary:
Motivation examples:
int strcmp_memcmp() {
    char buf[12];
    return strcmp(buf, "key") == 0;
}

int strcmp_memcmp2() {
    char buf[12];
    return strcmp(buf, "key") != 0;
}

int strncmp_memcmp() {
    char buf[12];
    return strncmp(buf, "key", 3) == 0;
}

can be turned to memcmp.

See test file for more cases.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D50233

llvm-svn: 339410
2018-08-10 04:32:54 +00:00
Heejin Ahn 5831e9cc79 [WebAssembly] Gate i64x2 and f64x2 on -wasm-enable-unimplemented
Summary:
i64x2 and f64x2 operations are not implemented in V8, so we normally
do not want to emit them. However, they are in the SIMD spec proposal,
so we still want to be able to test them in the toolchain. This patch
adds a flag to enable their emission.

Reviewers: aheejin, dschuff

Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D50423

Patch by Thomas Lively (tlively)

llvm-svn: 339407
2018-08-09 23:58:51 +00:00
Craig Topper 9a8136f7b4 [X86] Qualify one of the heuristics in combineMul to only apply to positive multiply amounts.
This seems to slightly help the performance of one of our internal benchmarks. We probably need better heuristics here.

llvm-svn: 339406
2018-08-09 23:27:42 +00:00
Jordan Rupprecht 88ed5e59bd [llvm-objcopy] NFC: Add some color to error()
llvm-svn: 339404
2018-08-09 22:52:03 +00:00
Matt Arsenault d54b7f0592 ValueTracking: Start enhancing isKnownNeverNaN
llvm-svn: 339399
2018-08-09 22:40:08 +00:00
Sanjay Patel c6944f795d [InstSimplify] move minnum/maxnum with Inf folds from instcombine
llvm-svn: 339396
2018-08-09 22:20:44 +00:00
Ana Pazos 10de234905 [RISC-V] Fixed alias for addi x2, x2, 0
A missing check for non-zero immediate in MCOperandPredicate
caused c.addi16sp sp, 0 to be selected which is not a valid
instruction.

llvm-svn: 339381
2018-08-09 20:51:53 +00:00
Philip Reames ca256d93fb [LICM] hoist fences out of loops w/o memory operations
The motivating case is an otherwise dead loop with a fence in it. At the moment, this goes all the way through the optimizer and we end up emitting an entirely pointless loop on x86. This case may seem a bit contrived, but we've seen it in real code as the result of otherwise reasonable lowering strategies combined w/thread local memory optimizations (such as escape analysis).

To handle this simple case, we can teach LICM to hoist must execute fences when there is no other memory operation within the loop.

Differential Revision: https://reviews.llvm.org/D50489

llvm-svn: 339378
2018-08-09 20:18:42 +00:00
Sanjay Patel 55accd7dd3 [InstCombine] allow fsub+fmul FMF folds for vectors
llvm-svn: 339368
2018-08-09 18:42:12 +00:00
Krzysztof Parzyszek 75c2ca3638 [Hexagon] Map ISD::TRAP to J2_trap0(#0)
llvm-svn: 339365
2018-08-09 18:03:45 +00:00
Alina Sbirlea bf9fe79397 SCEV should forget all loops containing a deleted block.
Summary:
LoopSimplifyCFG should update ScEv for all loops after a block is deleted.
If the deleted block "Succ" is part of L, then it is part of all parent loops, so forget topmost loop.

Reviewers: greened, mkazantsev, sanjoy

Subscribers: jlebar, javed.absar, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D50422

llvm-svn: 339363
2018-08-09 17:53:26 +00:00
Paul Semel 7a3dc2c184 [llvm-objcopy] Add --prefix-symbols option
Differential Revision: https://reviews.llvm.org/D50381

llvm-svn: 339362
2018-08-09 17:49:04 +00:00
Sanjay Patel 373790293e [InstCombine] add vector tests for fsub+fmul; NFC
llvm-svn: 339361
2018-08-09 17:40:27 +00:00
Reid Kleckner 80c6ec11d9 [GlobalOpt] Don't apply fastcc if it would break inalloca invariants
The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.

At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.

Fixes PR38487

llvm-svn: 339360
2018-08-09 17:29:26 +00:00
Sanjay Patel 15d1501aae [SelectionDAG] try harder to convert funnel shift to rotate
Similar to rL337966 - if the DAGCombiner's rotate matching was 
working as expected, I don't think we'd see any test diffs here.

AArch only goes right, and PPC only goes left. 
x86 has both, so no diffs there.

Differential Revision: https://reviews.llvm.org/D50091

llvm-svn: 339359
2018-08-09 17:26:22 +00:00
Paul Semel a42dec7a1b [llvm-objcopy] Add --dump-section
Differential Revision: https://reviews.llvm.org/D49979

llvm-svn: 339358
2018-08-09 17:05:21 +00:00
Michael Berg ca38254601 extend folding fsub/fadd to fneg for FMF
Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it the flag that most closely represents the translation

Reviewers: spatel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D50195

llvm-svn: 339357
2018-08-09 17:00:03 +00:00
Evandro Menezes 9a92fe0c9e [ARM] Replace processor check with feature
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check.  Otherwise, NFC.

llvm-svn: 339354
2018-08-09 16:13:24 +00:00
Sjoerd Meijer 806f70d229 [ARM] FP16: codegen support for VTRN
Differential Revision: https://reviews.llvm.org/D50454

llvm-svn: 339340
2018-08-09 12:45:09 +00:00
Simon Pilgrim 511c3fc529 [X86][SSE] Remove PMULDQ/PMULUDQ by zero
Exposed by D50328

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339337
2018-08-09 12:37:36 +00:00
Simon Pilgrim 01ae462fef [X86][SSE] Combine (some) target shuffles with multiple uses
As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses.

This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool.

However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move.

This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit.

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339335
2018-08-09 12:30:02 +00:00
Jonas Hahnfeld 20526bf483 [NVPTX] Select atomic loads and stores
According to PTX ISA .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where
 - 'read's and 'write's are lowered to atomic loads and stores, and
 - an update of float or double types are lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)

Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So using these instructions still results in an error.

Differential Revision: https://reviews.llvm.org/D50391

llvm-svn: 339316
2018-08-09 07:45:49 +00:00
Roger Ferrer Ibanez 577a97e2b9 [RISCV] Add "lla" pseudo-instruction to assembler
This pseudo-instruction is similar to la but uses PC-relative addressing
unconditionally. This is, la is only different to lla when using -fPIC. This
pseudo-instruction seems often forgotten in several specs but it is definitely
mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both in
page 37 of the "RISC-V Reader" book but also in function macro found in
gas/config/tc-riscv.c.

This is a very first step towards adding PIC support for Linux in the RISC-V
backend.

The lla pseudo-instruction expands to a sequence of auipc + addi with a couple
of pc-rel relocations where the second points to the first one. This is
described in
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses

For now, this patch only introduces support of that pseudo instruction at the
assembler parser.

Differential Revision: https://reviews.llvm.org/D49661

llvm-svn: 339314
2018-08-09 07:08:20 +00:00
Philip Reames 954eab1087 [LICM] Add tests for future hoisting of fence instructions [NFC]
The main interesting case is a fence in an otherwise dead loop or one containing only arithmetic.  This can happen as a result of DSE or other transforms from seemingly reasonable initial IR.  

llvm-svn: 339310
2018-08-09 04:21:02 +00:00
Petr Hosek eb46c95c3e [CMake] Use normalized Windows target triples
Changes the default Windows target triple returned by
GetHostTriple.cmake from the old environment names (which we wanted to
move away from) to newer, normalized ones. This also requires updating
all tests to use the new systems names in constraints.

Differential Revision: https://reviews.llvm.org/D47381

llvm-svn: 339307
2018-08-09 02:16:18 +00:00
Paul Robinson 508b081514 [DWARF] Verifier now handles .debug_types sections.
Differential Revision: https://reviews.llvm.org/D50466

llvm-svn: 339302
2018-08-08 23:50:22 +00:00
Sanjay Patel f9a80fe87a [x86] add test for commuted variant for fsub fold; NFC
llvm-svn: 339300
2018-08-08 23:06:59 +00:00
Sanjay Patel e47dc1a405 [DAGCombiner] loosen constraints for fsub+fadd fold
isNegatibleForFree() should not matter here (as the test diffs show)
because it's always a win to replace an fsub+fadd with fneg. The
problem in D50195 persists because either (1) we are doing these
folds in the wrong order or (2) we're missing another fold for fadd.

llvm-svn: 339299
2018-08-08 23:04:43 +00:00
Petr Hosek 7b27454477 [ADT] Normalize empty triple components
LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".

This addresses PR37129.

Differential Revision: https://reviews.llvm.org/D50219

llvm-svn: 339294
2018-08-08 22:23:57 +00:00
Sanjay Patel f8937c8406 [x86] add tests for fsub+fadd with FMF; NFC
These are related to the block of code under review in D50195.

llvm-svn: 339293
2018-08-08 22:18:16 +00:00
Jonas Devlieghere 49ff4d9041 [DWARF] Unclamp line table version on Darwin for v5 and later.
On Darwin we pin the DWARF line tables to version 2. Stop doing so for
DWARF v5 and later.

Differential revision: https://reviews.llvm.org/D49381

llvm-svn: 339288
2018-08-08 21:16:50 +00:00
Eli Friedman 5b45a39056 [ARM] Avoid spilling lr with Thumb1 tail calls.
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop".  However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.

The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)

Differential Revision: https://reviews.llvm.org/D49459

llvm-svn: 339283
2018-08-08 20:03:10 +00:00
Ties Stuij 0244aa67d6 revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved'
llvm-svn: 339276
2018-08-08 17:19:32 +00:00
Zachary Turner d346cba91b [MS Demangler] Create a new backref context for template instantiations.
Template manglings use a fresh back-referencing context, so we
need to do the same.  This fixes several existing tests which are
marked as FIXME, so those are now actually run.

llvm-svn: 339275
2018-08-08 17:17:04 +00:00
Krzysztof Parzyszek 1df7059150 [Hexagon] Diagnose misaligned absolute loads and stores
Differential Revision: https://reviews.llvm.org/D50405

llvm-svn: 339272
2018-08-08 17:00:09 +00:00
Matt Arsenault 935f3b70fe AMDGPU: Error more gracefully on libcalls
I think this is the only situation where the callsite
will have a null instruction.

llvm-svn: 339271
2018-08-08 16:58:39 +00:00
Matt Arsenault e719139b10 AMDGPU: Fix shifts for i128
llvm-svn: 339270
2018-08-08 16:58:33 +00:00
Jonas Devlieghere 8511777d3a [WASM] Fix overflow when reading custom section
When reading a custom WASM section, it was possible that its name
extended beyond the size of the section. This resulted in a bogus value
for the section size due to the size overflowing.

Fixes heap buffer overflow detected by OSS-fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=8190

Differential revision: https://reviews.llvm.org/D50387

llvm-svn: 339269
2018-08-08 16:34:03 +00:00
Jonas Devlieghere caacedb03e [DebugInfo] Fine tune emitting flags as part of the producer
When using APPLE extensions, don't duplicate the compiler invocation's
flags both in AT_producer and AT_APPLE_flags.

Differential revision: https://reviews.llvm.org/D50453

llvm-svn: 339268
2018-08-08 16:33:22 +00:00
Sanjay Patel fe839695a8 [InstCombine] fold fadd+fsub with common operand
This is a sibling to the simplify from:
https://reviews.llvm.org/rL339174

llvm-svn: 339267
2018-08-08 16:19:22 +00:00
Sanjay Patel 2054dd79c2 [InstCombine] fold fsub+fsub with common operand
This is a sibling to the simplify from:
rL339171

llvm-svn: 339266
2018-08-08 16:04:48 +00:00
Sanjay Patel abd4767a0d [InstCombine] add tests for fsub folds; NFC
The scalar cases are handled in instcombine's internal
reassociation pass for FP ops, but it misses the vector types.

These patterns are similar to what was handled in InstSimplify in:
https://reviews.llvm.org/rL339171
https://reviews.llvm.org/rL339174
https://reviews.llvm.org/rL339176
...but we can't use instsimplify on these because we require negation
of the original operand.

llvm-svn: 339263
2018-08-08 15:44:56 +00:00
Zaara Syeda b2595b988b [PowerPC] Improve codegen for vector loads using scalar_to_vector
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:

LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64

Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950

llvm-svn: 339260
2018-08-08 15:20:43 +00:00
Ties Stuij 52f3631f4b [CodeGen] emit inline asm clobber list warnings for reserved
Summary:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.

For example:


```
extern int bar(int[]);

int foo(int i) {
  int a[i]; // VLA
  asm volatile(
      "mov r7, #1"
    :
    :
    : "r7"
  );

  return 1 + bar(a);
}
```

Compiled for thumb, this gives:
```
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
        .fnstart
@ %bb.0:                                @ %entry
        .save   {r4, r5, r6, r7, lr}
        push    {r4, r5, r6, r7, lr}
        .setfp  r7, sp, #12
        add     r7, sp, #12
        .pad    #4
        sub     sp, #4
        movs    r1, #7
        add.w   r0, r1, r0, lsl #2
        bic     r0, r0, #7
        sub.w   r0, sp, r0
        mov     sp, r0
        @APP
        mov.w   r7, #1
        @NO_APP
        bl      bar
        adds    r0, #1
        sub.w   r4, r7, #12
        mov     sp, r4
        pop     {r4, r5, r6, r7, pc}
...
```

r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.

This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward.  Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.

The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.

If we find a reserved register, we print a warning:
```
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
      "mov r7, #1"
      ^
```

Reviewers: eli.friedman, olista01, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49727

llvm-svn: 339257
2018-08-08 15:15:59 +00:00
Alex Bradbury 07224dfb47 [RISCV] Add mnemonic alias: move, sbreak and scall.
Further improve compatibility with the GNU assembler.

Differential Revision: https://reviews.llvm.org/D50217
Patch by Kito Cheng.

llvm-svn: 339255
2018-08-08 14:53:45 +00:00
Simon Pilgrim 164e8b0b5c [TargetLowering] BuildUDIV - Add support for divide by one (PR38477)
Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike.

I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x

llvm-svn: 339254
2018-08-08 14:51:19 +00:00
Alex Bradbury 7d8d87c143 [RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], sra[w], slt and sltu with immediate
Match the GNU assembler in supporting immediate operands for these 
instructions even when the reg-reg mnemonic is used.

Differential Revision: https://reviews.llvm.org/D50046
Patch by Kito Cheng.

llvm-svn: 339252
2018-08-08 14:45:44 +00:00
Sjoerd Meijer 1919ecfd0b [ARM][NFC] Replaced tab-characters in test file vtrn.ll
llvm-svn: 339251
2018-08-08 14:42:11 +00:00
Sanjay Patel a194b2d2ff [InstCombine] fold fneg into constant operand of fmul/fdiv
This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform. 
FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg.

I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a 
potentially expensive fdiv or fmul.

Differential Revision: https://reviews.llvm.org/D50417

llvm-svn: 339248
2018-08-08 14:29:08 +00:00
Simon Pilgrim 9f5b8f093e [X86][SSE] PR38477 test is more cleanly tested with udiv instead of urem
Making the test use urem relies on it calling udiv-like combines, but the real issue is with the udiv so we're better off using that directly.

llvm-svn: 339247
2018-08-08 14:11:44 +00:00
Roman Lebedev a677651a5a [InstCombine] De Morgan: sink 'not' into 'xor' (PR38446)
Summary:
https://rise4fun.com/Alive/IT3

Comes up in the [most ugliest]  `signed int` -> `signed char`  case of
`-fsanitize=implicit-conversion` (https://reviews.llvm.org/D50250)
Previously, we were stuck with `not`: {F6867736}
But now we are able to completely get rid of it: {F6867737}
(FIXME: why are we loosing the metadata? that seems wrong/strange.)

Here, we only want to do that it we will be able to completely
get rid of that 'not'.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: vsk, erichkeane, llvm-commits

Differential Revision: https://reviews.llvm.org/D50301

llvm-svn: 339243
2018-08-08 13:31:19 +00:00
Sjoerd Meijer f8c394f0f5 [ARM] FP16: codegen support for VEXT
Differential Revision: https://reviews.llvm.org/D50427

llvm-svn: 339241
2018-08-08 13:26:38 +00:00
Sjoerd Meijer db5908deb9 [ARM] FP16: vector vmov and vdup support
This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50329

llvm-svn: 339238
2018-08-08 13:11:31 +00:00
Sjoerd Meijer 920a453485 [ARM] FP16: vector VMUL variants
This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50326

llvm-svn: 339232
2018-08-08 10:27:34 +00:00
Simon Pilgrim 5477f11ba3 [X86][SSE] Add divide-by-one exact sdiv vector test
Based on PR38477, we need to ensure we're testing for divide-by-one in non-uniform vectors

llvm-svn: 339231
2018-08-08 10:16:43 +00:00
Simon Pilgrim a10cfcc1db [TargetLowering] BuildUDIV - Early out for divide by one (PR38477)
We're not handling the UDIV by one special case properly - for now just early out.

llvm-svn: 339229
2018-08-08 10:00:54 +00:00
Sjoerd Meijer b33a4c02cc [ARM] FP16: support vector INT_TO_FP and FP_TO_INT
This adds codegen support for the different vcvt_f16 variants.

Differential Revision: https://reviews.llvm.org/D50393

llvm-svn: 339227
2018-08-08 09:45:34 +00:00
Thomas Preud'homme 4107b31df2 Support inline asm with multiple 64bit output in 32bit GPR
Summary: Extend fix for PR34170 to support inline assembly with multiple output operands that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR).

Reviewers: bogner, t.p.northover, lattner, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, tra, eraman, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45437

llvm-svn: 339225
2018-08-08 09:35:26 +00:00
Roman Lebedev c6a00f545c [NFC][InstCombine] Cleanup demorgan-sink-not-into-xor.ll test
We are only going to do it if it is free to do.

llvm-svn: 339223
2018-08-08 08:46:07 +00:00
Sjoerd Meijer b264944ed5 [ARM] FP16: support the vector vmin and vmax variants
Differential Revision: https://reviews.llvm.org/D50238

llvm-svn: 339221
2018-08-08 07:20:15 +00:00
Max Kazantsev c9dca6df78 [NFC] Add some tests on mustexec
llvm-svn: 339219
2018-08-08 04:40:47 +00:00
Zachary Turner 58d29cf590 [MS Demangler] Properly handle backreferencing of special names.
Function template names are not stored in the backref table,
but non-template function names are.  The general pattern seems
to be that when you are demangling a symbol name, if the name
starts with '?' it does not go into the backreference table,
otherwise it does.  Note that this even handles the general case
of operator names (template or otherwise) not going into the
back-reference table, anonymous namespaces not going into the
backreference table, etc.

It's important that we apply this check *only* for the
unqualified portion of a name, and only for symbol names.
For example, this does not apply to type names (such as class
templates) and we need to make sure that these still do go
into the backref table.

Differential Revision: https://reviews.llvm.org/D50394

llvm-svn: 339211
2018-08-08 00:43:31 +00:00
Sanjay Patel 979423c996 [InstCombine] add tests for fneg fold including FMF; NFC
llvm-svn: 339203
2018-08-07 23:24:25 +00:00
Sanjay Patel bac052ef52 [InstCombine] fix FP constant in test; NFC
Too many digits...

llvm-svn: 339200
2018-08-07 23:03:29 +00:00
Michael Berg 2e60ad2e58 [NFC] adding tests for Y - (X + Y) --> -X
llvm-svn: 339197
2018-08-07 22:52:57 +00:00
Sanjay Patel 25887da162 [InstCombine] add tests for fneg of fmul/fdiv with constant; NFC
llvm-svn: 339195
2018-08-07 22:30:43 +00:00
Jan Vesely 7b2c98ab59 AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension.
This has been broken since the addition of i16 support.
It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
        v_mad_i16 v2, v10, v9, v11
        v_med3_i32 v2, v2, v8, v7

Fixes mad_sat(char) piglit on VI.

Differential Revision: https://reviews.llvm.org/D49836

llvm-svn: 339190
2018-08-07 21:54:37 +00:00
Derek Schuff 51ed131ed2 [WebAssembly] Update SIMD binary arithmetic
Add missing SIMD types (v2f64) and binary ops. Also adds
tablegen support for automatically prepending prefix byte to SIMD
opcodes.

Differential Revision: https://reviews.llvm.org/D50292

Patch by Thomas Lively

llvm-svn: 339186
2018-08-07 21:24:01 +00:00
Krzysztof Parzyszek e7ce247dd7 [Hexagon] Allow use of gather intrinsics even with no-packets
Vgather requires must be in a packet with a store, which contradicts
the no-packets feature. As a consequence, gather/scatter could not be
used with no-packets. Relax this, and allow gather packets as exceptions
to the no-packets requirements.

llvm-svn: 339177
2018-08-07 20:33:47 +00:00
Sanjay Patel 9b07347033 [InstSimplify] fold fsub+fadd with common operand
llvm-svn: 339176
2018-08-07 20:32:55 +00:00
Sanjay Patel 4364d604c2 [InstSimplify] fold fadd+fsub with common operand
llvm-svn: 339174
2018-08-07 20:23:49 +00:00
Heejin Ahn 7fb68d2679 [WebAssembly] CFG sort support for exception handling
Summary:
This patch extends CFGSort pass to support exception handling. Once it
places a loop header, it does not place blocks that are not dominated by
the loop header until all the loop blocks are sorted. This patch extends
the same algorithm to exception 'catch' part, using the information
calculated by WebAssemblyExceptionInfo class.

Reviewers: dschuff, sunfish

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D46500

llvm-svn: 339172
2018-08-07 20:19:23 +00:00
Sanjay Patel f7a8fb2dee [InstSimplify] fold fsub+fsub with common operand
llvm-svn: 339171
2018-08-07 20:14:27 +00:00
Sanjay Patel 50976393ed [InstSimplify] add tests for fadd/fsub; NFC
Instcombine gets some, but not all, of these cases via
it's internal reassociation transforms. It fails in
all cases with vector types.

llvm-svn: 339168
2018-08-07 19:49:13 +00:00
Alexey Bataev 0edcd0278d [SLP] Fix insert point for reused extract instructions.
Summary:
Reworked the previously committed patch to insert shuffles for reused
extract element instructions in the correct position. Previous logic was
incorrect, and might lead to the crash with PHIs and EH instructions.

Reviewers: efriedma, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50143

llvm-svn: 339166
2018-08-07 19:21:05 +00:00
Wei Mi b1ef2cc53d [SampleFDO] Fix a bug in getOrCompHotCountThreshold/getOrCompColdCountThreshold
getOrCompHotCountThreshold/getOrCompColdCountThreshold introduced in
https://reviews.llvm.org/D45377 contain a bad mistake and will only return 1 or 0
instead of the true hot/cold cutoff value. The patch fixes the mistake. But the
mistake seems not causing big performance difference according to internal server
benchmarks testing.

Differential Revision: https://reviews.llvm.org/D50370

llvm-svn: 339162
2018-08-07 18:13:10 +00:00
Philip Reames c792e197b4 [LICM] Strengthen assume hoisting tests [NFC]
As requested in review of https://reviews.llvm.org/D50364

llvm-svn: 339159
2018-08-07 17:54:36 +00:00
Craig Topper 49ed49fcb1 [SelectionDAG] When splitting scatter nodes during DAGCombine, create a serial chain dependency.
Scatter could have multiple identical indices. We need to maintain sequential order. We get this right in LegalizeVectorTypes, but not in this code.

Differential Revision: https://reviews.llvm.org/D50374

llvm-svn: 339157
2018-08-07 17:35:02 +00:00
Florian Hahn 950576bdf8 [GVN,NewGVN] Keep nonnull if K does not move.
In combineMetadata, we should be able to preserve K's nonnull metadata,
if K does not move. This condition should hold for all replacements by
NewGVN/GVN, but I added a bunch of assertions to verify that.

Fixes PR35038.

There probably are additional kinds of metadata that could be preserved
using similar reasoning. This is follow-up work.

Reviewers: dberlin, davide, efriedma, nlopes

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D47339

llvm-svn: 339149
2018-08-07 15:36:11 +00:00
Sjoerd Meijer b39cd886b9 [ARM] FP16: codegen support for VACGT
Differential Revision: https://reviews.llvm.org/D50236

llvm-svn: 339148
2018-08-07 15:11:47 +00:00
Andrew V. Tischenko 1fe3375620 [X86] MCA tests for XCHG*, XADD* and CMPXCHG* instructions
Differential Revision: https://reviews.llvm.org/D49912

llvm-svn: 339145
2018-08-07 14:36:43 +00:00
Sanjay Patel 948ff87d7d [InstSimplify] move minnum/maxnum with common op fold from instcombine
llvm-svn: 339144
2018-08-07 14:36:27 +00:00
Sanjay Patel b06d283909 [InstSimplify] add tests for minnum/maxnum with shared op; NFC
llvm-svn: 339142
2018-08-07 14:13:40 +00:00
Sanjay Patel b802d18df7 [InstSimplify] move misplaced minnum/maxnum tests; NFC
llvm-svn: 339141
2018-08-07 14:12:08 +00:00
Jonas Devlieghere 42243df3b9 Fix inconsistency with/without debug information (-g)
This fixes an inconsistency in code generation when compiling with or
without debug information (-g). When debug information is available in
an empty block, the original test would fail, resulting in possibly
different code.

Patch by: Jeroen Dobbelaere

Differential revision: https://reviews.llvm.org/D49467

llvm-svn: 339129
2018-08-07 12:14:01 +00:00
Aleksandar Beserminji 949a17c016 [mips] Handle branch expansion corner cases
When potential jump instruction and target are in the same segment, use
jump instruction with immediate field.

In cases where offset does not fit immediate value of a bc/j instructions,
offset is stored into register, and then jump register instruction is used.

Differential Revision: https://reviews.llvm.org/D48019

llvm-svn: 339126
2018-08-07 10:45:45 +00:00
Pavel Labath 2f0881160c [DebugInfo] Reduce debug_str_offsets section size
Summary:
The accelerator tables use the debug_str section to store their strings.
However, they do not support the indirect method of access that is
available for the debug_info section (DW_FORM_strx et al.).

Currently our code is assuming that all strings can/will be referenced
indirectly, and puts all of them into the debug_str_offsets section.
This is generally true for regular (unsplit) dwarf, but in the DWO case,
most of the strings in the debug_str section will only be used from the
accelerator tables. Therefore the contents of the debug_str_offsets
section will be largely unused and bloating the main executable.

This patch rectifies this by teaching the DwarfStringPool to
differentiate between strings accessed directly and indirectly. When a
user inserts a string into the pool it has to declare whether that
string will be referenced directly or not. If at least one user requsts
indirect access, that string will be assigned an index ID and put into
debug_str_offsets table. Otherwise, the offset table is skipped.

This approach reduces the overall binary size (when compiled with
-gdwarf-5 -gsplit-dwarf) in my tests by about 2% (debug_str_offsets is
shrunk by 99%).

Reviewers: probinson, dblaikie, JDevlieghere

Subscribers: aprantl, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D49493

llvm-svn: 339122
2018-08-07 09:54:52 +00:00
Simon Pilgrim 7e18938793 [TargetLowering] Add support for non-uniform vectors to BuildUDIV
This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators.

It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV.

Differential Revision: https://reviews.llvm.org/D49248

llvm-svn: 339121
2018-08-07 09:51:34 +00:00
Simon Pilgrim 974a5a7d94 [X86][SSE] Add more non-uniform exact sdiv vector tests covering all/none ashr paths
llvm-svn: 339120
2018-08-07 09:31:22 +00:00
George Rimar 65a6828b17 [yaml2obj] - Add a support for changing EntSize.
I was trying to add a test case for LLD and found that it
is impossible to set sh_entsize via yaml.
The patch implements the missing part.

Differential revision: https://reviews.llvm.org/D50235

llvm-svn: 339113
2018-08-07 08:11:38 +00:00
Sjoerd Meijer a2ddddfd3e [ARM][NFC] Replaced tab characters in test file vfcmp.ll.
llvm-svn: 339111
2018-08-07 08:05:15 +00:00
Heejin Ahn e8653bb89a [WebAssembly] Enable atomic expansion for unsupported atomicrmws
Summary:
Wasm does not have direct counterparts to some of LLVM IR's atomicrmw
instructions (min, max, umin, umax, and nand). This enables atomic
expansion using cmpxchg instruction within a loop for those atomicrmw
instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49440

llvm-svn: 339084
2018-08-07 00:22:22 +00:00
Matt Arsenault 08f3fe4fae AMDGPU: cvt_pk_rtz_f16 canonicalizes
llvm-svn: 339078
2018-08-06 23:01:31 +00:00
Matt Arsenault e94ee833f9 AMDGPU: Handle some vector operations in isCanonicalized
llvm-svn: 339077
2018-08-06 22:45:51 +00:00
Stella Stamenova cc2404c01d [lit, python] Always add quotes around the python path in lit
Summary:
The issue with the python path is that the path to python on Windows can contain spaces. To make the tests always work, the path to python needs to be surrounded by quotes.

This change updates several configuration files which specify the path to python as a substitution and also remove quotes from existing tests.

Reviewers: asmith, zturner, alexshap, jakehehrlich

Reviewed By: zturner, alexshap, jakehehrlich

Subscribers: mehdi_amini, nemanjai, eraman, kbarton, jakehehrlich, steven_wu, dexonsmith, stella.stamenova, delcypher, llvm-commits

Differential Revision: https://reviews.llvm.org/D50206

llvm-svn: 339073
2018-08-06 22:37:44 +00:00
Matt Arsenault a29e76244a AMDGPU: Push fcanonicalize through partially constant build_vector
This usually avoids some re-packing code, and may
help find canonical sources.

llvm-svn: 339072
2018-08-06 22:30:44 +00:00
Peter Collingbourne 69dd7cd45e MC: Redirect .addrsig directives referring to private (.L) symbols to the section symbol.
This matches our behaviour for regular (i.e. relocated) references to
private symbols and therefore avoids needing to unnecessarily write
address-significant .L symbols to the object file's symbol table,
which can interfere with stack traces.

Fixes check-cfi after r339050.

llvm-svn: 339066
2018-08-06 21:59:58 +00:00
Matt Arsenault d49ab0b214 AMDGPU: Treat more custom operations as canonicalizing
Everything should quiet, and I think everything should
flush.

I assume the min3/med3/max3 follow the same rules
as regular min/max for flushing, which should at
least be conservatively correct.

There are still more operations that need to
be handled.

llvm-svn: 339065
2018-08-06 21:58:11 +00:00
Matt Arsenault ce6d61fba8 AMDGPU: Conversions always produce canonical results
Not sure why this was checking for denormals for f16.
My interpretation of the IEEE standard is conversions
should produce a canonical result, and the ISA manual
says denormals are created when appropriate.

llvm-svn: 339064
2018-08-06 21:51:52 +00:00
Philip Reames 94b29601ef [LICM] Further strengthen tests for hoisting guards and invariant.starts [NFC]
llvm-svn: 339062
2018-08-06 21:39:43 +00:00
Matt Arsenault f8768bfc84 AMDGPU: Fix implementation of isCanonicalized
If denormals are enabled, denormals are canonical.
Also fix a few other issues. minnum/maxnum are supposed
to canonicalize. Temporarily improve workaround for the
instruction behavior change in gfx9.

Handle selects and fcopysign.

The tests were also largely broken, since they were
checking for a flush used on some targets after the
store of the result.

llvm-svn: 339061
2018-08-06 21:38:27 +00:00
Philip Reames 9d7bb2f700 [LICM] Strengthen invariant.start hoisting tests [NFC]
llvm-svn: 339057
2018-08-06 21:18:34 +00:00
Reid Kleckner 15e91c3235 [X86] Fix assertion in subreg extraction
This assert fires when attempting to extract a subregister from the
global PIC base register. This virtual register SD node is not in the
VRBaseMap, so we shouldn't call getVR to look it up there. If this is a
RegisterSDNode, we should be able to use the virtual register directly.

Fixes PR38385

llvm-svn: 339056
2018-08-06 21:16:16 +00:00
Philip Reames 81c7dc93d2 [LICM] Add tests highlighting missing hoists for intrinsics [NFC]
llvm-svn: 339054
2018-08-06 21:06:15 +00:00
Evandro Menezes 6e137cb9f0 [SLC] Fix shrinking of pow()
Properly shrink `pow()` to `powf()` as a binary function and, when no other
simplification applies, do not discard it.

Differential revision: https://reviews.llvm.org/D50113

llvm-svn: 339046
2018-08-06 19:40:17 +00:00
Alexandre Ganea 741cc3531a [llvm-pdbutil] Support PDBs without a DBI stream
Differential Revision: https://reviews.llvm.org/D50258

llvm-svn: 339045
2018-08-06 19:35:00 +00:00
Easwaran Raman 10fd92dd94 [X86] Recognize a splat of negate in isFNEG
Summary:
Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB)
instructions in more cases. For example, the following sequence
a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)

generates an fsub and fma without this patch and an fnma with this
change.

Reviewers: craig.topper

Subscribers: llvm-commits, davidxl, wmi

Differential Revision: https://reviews.llvm.org/D48467

llvm-svn: 339043
2018-08-06 19:23:38 +00:00
Craig Topper 0076477a4c [X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make sure the store isn't volatile
If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source

Differential Revision: https://reviews.llvm.org/D50270

llvm-svn: 339041
2018-08-06 18:44:26 +00:00
Craig Topper f8a8c746e3 [X86] Add test cases to show bad use of "and $0" and "orl $-1" for minsize when the store is volatile
If the store is volatile we shouldn't be adding a little that didn't exist in the source.

llvm-svn: 339040
2018-08-06 18:44:21 +00:00
Wei Mi 3c1c088500 [RegisterCoalescer] Delay live interval update work until the rematerialization
for all the uses from the same def is done.

We run into a compile time problem with flex generated code combined with
`-fno-jump-tables`. The cause is that machineLICM hoists a lot of invariants
outside of a big loop, and drastically increases the compile time in global
register splitting and copy coalescing.  https://reviews.llvm.org/D49353
relieves the problem in global splitting. This patch is to handle the problem
in copy coalescing.

About the situation where the problem in copy coalescing happens. After
machineLICM, we have several defs outside of a big loop with hundreds or
thousands of uses inside the loop. Rematerialization in copy coalescing
happens for each use and everytime rematerialization is done, shrinkToUses
will be called to update the huge live interval. Because we have 'n' uses
for a def, and each live interval update will have at least 'n' complexity,
the total update work is n^2.

To fix the problem, we try to do the live interval update work in a collective
way. If a def has many copylike uses larger than a threshold, each time
rematerialization is done for one of those uses, we won't do the live interval
update in time but delay that work until rematerialization for all those uses
are completed, so we only have to do the live interval update work once.

Delaying the live interval update could potentially change the copy coalescing
result, so we hope to limit that change to those defs with many
(like above a hundred) copylike uses, and the cutoff can be adjusted by the
option -mllvm -late-remat-update-threshold=xxx.

Differential Revision: https://reviews.llvm.org/D49519

llvm-svn: 339035
2018-08-06 17:30:45 +00:00
Matt Arsenault 0d1b3934e2 AMDGPU: Fold v_lshl_or_b32 with 0 src0
Appears from expansion of some packed cases.

llvm-svn: 339025
2018-08-06 15:40:20 +00:00
Matt Arsenault 56b31d8d75 ValueTracking: Handle canonicalize in CannotBeNegativeZero
Also fix apparently missing test coverage for any of the
handling here.

llvm-svn: 339023
2018-08-06 15:16:26 +00:00
Matt Arsenault dbf77c5b41 AMDGPU: Rename check prefixes in test
Will avoid noisy diff in future change.

llvm-svn: 339022
2018-08-06 15:16:12 +00:00
Bryan Chan e023706471 [AArch64] Fix assertion failure on widened f16 BUILD_VECTOR
Summary:
Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the
same type. This fixes an assertion failure in VerifySDNode.

Reviewers: SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50202

llvm-svn: 339013
2018-08-06 14:14:41 +00:00
Tim Northover 9956e4a24b ARM-MachO: don't add Thumb bit for addend to non-external relocation.
ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes
out that part of any addend in an object file. But it only does that for
symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting
that extra bit in other cases.

llvm-svn: 339007
2018-08-06 11:32:44 +00:00
Max Kazantsev 2dbbd64cb7 Re-enable "[ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND"
The patch was reverted because of bug detected by sanitizer. The bug is fixed,
respective tests added.

Differential Revision: https://reviews.llvm.org/D50172

llvm-svn: 339005
2018-08-06 11:14:18 +00:00
Max Kazantsev 3271f379a9 Revert rL338990 to see if it causes sanitizer failures
Multiple failues reported by sanitizer-x86_64-linux, seem to be caused by this
patch. Reverting to see if they sustain without it.

Differential Revision: https://reviews.llvm.org/D50172

llvm-svn: 338994
2018-08-06 08:10:28 +00:00
Max Kazantsev 34b0666be9 [ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND
`isKnownNonNullFromDominatingCondition` is able to prove non-null basing on `br` or `guard`
by `%p != null` condition, but is unable to do so basing on `(%p != null) && %other_cond`.
This patch allows it to do so.

Differential Revision: https://reviews.llvm.org/D50172
Reviewed By: reames

llvm-svn: 338990
2018-08-06 06:11:36 +00:00
Max Kazantsev eded4abef8 [GuardWidening] Widen guards with conditions of frequently taken dominated branches
If there is a frequently taken branch dominated by a guard, and its condition is available
at the point of the guard, we can widen guard with condition of this branch and convert
the branch into unconditional:

  guard(cond1)
  if (cond2) {
    // taken in 99.9% cases
    // do something
  } else {
    // do something else    
  }

Converts to

  guard(cond1 && cond2)
  // do something

Differential Revision: https://reviews.llvm.org/D49974
Reviewed By: reames

llvm-svn: 338988
2018-08-06 05:49:19 +00:00
David Bolvansky b7fcd10700 [NFC] Fixed inliner tests - 2
llvm-svn: 338973
2018-08-05 16:53:36 +00:00
David Bolvansky 2f1f3b10ad [NFC] Fixed inliner tests
llvm-svn: 338972
2018-08-05 16:30:46 +00:00
David Bolvansky c0aa4b75a4 Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338969
2018-08-05 14:53:08 +00:00
Eric Christopher 9855a5a0a1 Revert "Add a warning if someone attempts to add extra section flags to sections"
There are a bunch of edge cases and inconsistencies in how we're emitting sections
cause this warning to fire and it needs more work.

This reverts commit r335558.

llvm-svn: 338968
2018-08-05 14:23:37 +00:00
Roman Lebedev 365fa96055 [NFC][InstCombine] Add tests for sinking 'not' into 'xor' (PR38446)
https://rise4fun.com/Alive/IT3

Comes up in the [most ugliest]  signed int -> signed char  case of
-fsanitize=implicit-conversion (https://reviews.llvm.org/D50250)

Not sure if we want to do it always, or only when it is free to invert.

llvm-svn: 338967
2018-08-05 10:15:04 +00:00
Roman Lebedev 656a478e98 [NFC][InstCombine] Regenerate set.ll test
llvm-svn: 338965
2018-08-05 08:53:40 +00:00
Craig Topper fb33181038 [X86] Remove stale comments from a test. NFC
The 16-bit case was recently fixed so this comment no longer applies.

llvm-svn: 338964
2018-08-05 06:25:01 +00:00
David Bolvansky b82a5ec1b6 [InstCombine] [NFC] Tests for strcmp to memcmp transformation
llvm-svn: 338963
2018-08-05 05:46:56 +00:00
Chijun Sima 8b5de48d62 [TailCallElim] Preserve DT and PDT
Summary:
Previously, in the NewPM pipeline, TailCallElim recalculates the DomTree when it modifies any instruction in the Function.
For example,
```
CallInst *CI = dyn_cast<CallInst>(&I);
...
CI->setTailCall();
Modified = true;
...
if (!Modified || ...)
  return PreservedAnalyses::all();
```
After applying this patch, the DomTree only recalculates if needed (plus an extra insertEdge() + an extra deleteEdge() call).

When optimizing SQLite with `-passes="default<O3>"` pipeline of the newPM, the number of DomTree recalculation decreases by 6.2%, the number of nodes visited by DFS decreases by 2.9%. The time used by DomTree will decrease approximately 1%~2.5% after applying the patch.
 
Statistics:
```
Before the patch:
 23010 dom-tree-stats               - Number of DomTree recalculations
489264 dom-tree-stats               - Number of nodes visited by DFS -- DomTree
After the patch:
 21581 dom-tree-stats               - Number of DomTree recalculations
475088 dom-tree-stats               - Number of nodes visited by DFS -- DomTree
```

Reviewers: kuhar, dmgreen, brzycki, grosser, davide

Reviewed By: kuhar, brzycki

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49982

llvm-svn: 338954
2018-08-04 08:13:47 +00:00
Chijun Sima eacad79777 [ADCE] Remove the need of DomTree
Summary: ADCE doesn't need to query domtree.

Reviewers: kuhar, brzycki, dmgreen, davide, grosser

Reviewed By: kuhar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49988

llvm-svn: 338950
2018-08-04 02:50:12 +00:00
Aditya Nandakumar e07b3b737b [GISel]: Add Opcodes for CTLZ/CTTZ/CTPOP
https://reviews.llvm.org/D48600

Added IRTranslator support to translate these known intrinsics into GISel opcodes.

llvm-svn: 338944
2018-08-04 01:22:12 +00:00
Craig Topper 3c869cb5e5 [X86] Add isel patterns for atomic_load+sub+atomic_sub.
Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register.

llvm-svn: 338930
2018-08-03 22:08:30 +00:00
Craig Topper 84319d1b42 [X86] Add test cases to show missed opportunity to use RMW for atomic_load+sub+atomic_store.
llvm-svn: 338929
2018-08-03 22:08:28 +00:00
Reid Kleckner 8e40702c1c [X86] Re-generate abi-isel.ll checks with update_llc_test_checks.py
These tests were clearly auto-generated when they were converted to
FileCheck back in r80019 (2009), but we didn't have a fancy script to
keep them up to date then. I've reviewed the diff, and we should be
generating the exact same code sequences we used to.

After this, I plan to commit a change that changes our output slightly,
but in a way that is still correct. It will generate a large diff, and I
want it to be clearly correct, so I am regenerating these checks in
preparation for that.

llvm-svn: 338928
2018-08-03 21:58:25 +00:00
Reid Kleckner 5578b53c92 [X86] Make abi-isel.ll like update_llc_test_checks.py output
- Remove -asm-verbose=0 from every llc command. The tests still pass.
- Reorder the RUN lines to match CHECKs.
- Use -LABEL like update_llc_test_checks.py does.

llvm-svn: 338927
2018-08-03 21:58:12 +00:00
Reid Kleckner 13a9035190 [X86] Layout tests exactly as update_llc_test_checks.py would
Put the LLVM IR at the bottom of the function instead of the top.  In my
next patch, I will run update_llc_test_checks.py on this file, and I
want to only highlight the diffs in the CHECK lines. Hopefully by doing
this change first, the patch will be more understandable.

llvm-svn: 338926
2018-08-03 21:57:59 +00:00
Craig Topper d7391eefdf [X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns and the normal instructions instead
At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering.

So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions.

Differential Revision: https://reviews.llvm.org/D50212

llvm-svn: 338925
2018-08-03 21:40:44 +00:00
Craig Topper 8c41136ca3 [X86] Autogenerate complete checks. NFC
llvm-svn: 338921
2018-08-03 20:58:14 +00:00
Anastasis Grammenos 4dfe279e00 [TRE][DebugInfo] Preserve Debug Location in new branch instruction
There are two branch instructions created
so the new test covers them both.

Differential Revision: https://reviews.llvm.org/D50263

llvm-svn: 338917
2018-08-03 20:27:13 +00:00
Craig Topper c4960582ec [SelectionDAG] Teach LegalizeVectorTypes to widen the mask input to a masked store.
The mask operand is visited before the data operand so we need to be able to widen it.

Fixes PR38436.

llvm-svn: 338915
2018-08-03 20:14:18 +00:00
Matt Arsenault c3dc8e65e2 DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

llvm-svn: 338910
2018-08-03 18:27:52 +00:00
Artem Belevich 0a11b6366a [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH").
Summary:
libdevice in recent CUDA versions relies on __nvvm_reflect() to select
GPU-specific bitcode. This patch addresses the requirement.

Reviewers: jlebar

Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D50207

llvm-svn: 338908
2018-08-03 18:05:24 +00:00
Craig Topper feb2a58860 [X86] Add a DAG combine for the __builtin_parity idiom used by clang to enable better codegen
Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag.

Even when popcnt is supported, its still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and.

I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful.

Differential Revision: https://reviews.llvm.org/D50165

llvm-svn: 338907
2018-08-03 18:00:29 +00:00
Craig Topper b0ad9b9fd7 [X86] Add test cases for the current codegen of __builtin_parity.
Will be improved in a follow commit

llvm-svn: 338906
2018-08-03 18:00:23 +00:00
Joel Galenson cfe5bc158d Fix crash in bounds checking.
In r337830 I added SCEV checks to enable us to insert fewer bounds checks.  Unfortunately, this sometimes crashes when multiple bounds checks are added due to SCEV caching issues.  This patch splits the bounds checking pass into two phases, one that computes all the conditions (using SCEV checks) and the other that adds the new instructions.

Differential Revision: https://reviews.llvm.org/D49946

llvm-svn: 338902
2018-08-03 17:12:23 +00:00
Nicholas Wilson e408a89a3a [WebAssembly] Cleanup of the way globals and global flags are handled
Differential Revision: https://reviews.llvm.org/D44030

llvm-svn: 338894
2018-08-03 14:33:37 +00:00
Jonas Devlieghere 3a92c5c1d3 [DebugInfo/Verifier] Don't emit error for missing module in index
We don't expect module names to be present in the index. This patch adds
DW_TAG_module to the blacklist.

Differential revision: https://reviews.llvm.org/D50237

llvm-svn: 338878
2018-08-03 12:01:43 +00:00
Jonas Paulsson f107b7275c [SystemZ] Improve handling of instructions which expand to several groups
Some instructions expand to more than one decoder group.

This has been hitherto ignored, but is handled with this patch.

Review: Ulrich Weigand
https://reviews.llvm.org/D50187

llvm-svn: 338849
2018-08-03 10:43:05 +00:00
Sjoerd Meijer d62c5ec2fe [ARM] FP16: support vector zip and unzip
This is addressing PR38404.

Differential Revision: https://reviews.llvm.org/D50186

llvm-svn: 338835
2018-08-03 09:24:29 +00:00
Simon Pilgrim 4014fb1049 [X86] Add example of 'zero shift' guards on rotation patterns (PR34924)
Basic pattern that leaves an unnecessary select on a rotation by zero result. This variant is trivial - the more general case with a compare+branch to prevent execution of undefined shifts is more tricky.

llvm-svn: 338833
2018-08-03 09:20:02 +00:00
Sjoerd Meijer 9b30213828 [ARM] FP16: support VFMA
This is addressing PR38404.

llvm-svn: 338830
2018-08-03 09:12:56 +00:00
Craig Topper a7a12399a1 [X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead.
There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function.

The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change.

llvm-svn: 338824
2018-08-03 07:01:10 +00:00
Craig Topper e902b7d0b0 [X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions.
Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place.

To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows use to reuse all the existing patterns for v2i64.

I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously.

llvm-svn: 338821
2018-08-03 06:12:56 +00:00
Hiroshi Inoue 73f8b255b6 [InstSimplify] fold extracting from std::pair (2/2)
This is the second patch of the series which intends to enable jump threading for an inlined method whose return type is std::pair<int, bool> or std::pair<bool, int>. 
The first patch is https://reviews.llvm.org/rL338485.

This patch handles code sequences that merges two values using `shl` and `or`, then extracts one value using `and`.

Differential Revision: https://reviews.llvm.org/D49981

llvm-svn: 338817
2018-08-03 05:39:48 +00:00
Craig Topper a80352c04e [X86] When post-processing the DAG to remove zero extending moves for YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded.
If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX.

llvm-svn: 338812
2018-08-03 04:49:42 +00:00
Craig Topper ded14af7aa [X86] Autogenerate complete checks. NFC
llvm-svn: 338811
2018-08-03 04:49:41 +00:00
Craig Topper 55697276dc [X86] Autogenerate complete checks. NFC
llvm-svn: 338802
2018-08-03 01:28:12 +00:00
Craig Topper b99281c9b8 [X86] Autogenerate complete checks. NFC
llvm-svn: 338799
2018-08-03 01:20:32 +00:00
Craig Topper 2c095444a4 [X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an atomic load and atomic store.
This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC.

llvm-svn: 338795
2018-08-03 00:37:34 +00:00
Philip Reames 5937368d4f [LICM] Remove unneccessary safety check to increase sinking effectiveness
This one requires a bit of explaination.  It's not every day you simply delete code to implement an optimization.  :)

The transform in question is sinking an instruction from a loop to the uses in loop exiting blocks.  We know (from LCSSA) that all of the uses outside the loop must be phi nodes, and after predecessor splitting, we know all phi users must have a single operand.  Since the use must be strictly dominated by the def, we know from the definition of dominance/ssa that the exit block must execute along a (non-strict) subset of paths which reach the def.  As a result, duplicating a potentially faulting instruction can not *introduce* a fault that didn't previously exist in the program.  

The full story is that this patch builds on "rL338671: [LICM] Factor out fault legality from canHoistOrSinkInst [NFC]" which pulled this logic out of a common helper routine.  As best I can tell, this check was originally added to the helper function for hoisting legality, later an incorrect fastpath for loads/calls was added, and then the bug was fixed by duplicating the fault safety check in the hoist path.  This left the redundant check in the common code to pessimize sinking for no reason.  I split it out in an NFC, and am not removing the unneccessary check.  I wanted there to be something easy to revert in case I missed something.

Reviewed by: Anna Thomas (in person)

llvm-svn: 338794
2018-08-03 00:21:56 +00:00
Dave Lee 3fb120f12e objdump: Better handling of Mach-O universal binaries
Summary:
With Mach-O, there is a flag requirement discrepancy between working with
universal binaries and thin binaries. Many flags that don't require the `-macho`
flag (for example `-private-headers` and `-disassemble`) fail to work on
universal binaries unless `-macho` is given. When this happens, the error
message is unhelpful, stating:

    The file was not recognized as a valid object file.

Which can lead to confusion.

This change allows generic flags to be used on universal binaries with and
without the `-macho` flag. This means flags that can be used for thin files can
be used consistently with fat files too.

To do this, the universal binary support within `ParseInputMachO()` is extracted
into a new function. This new function is called directly from `DumpInput()`
when the input binary is universal. Additionally the `-arch` flag validation in
`ParseInputMachO()` was extracted to be reused.

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: keith, llvm-commits

Differential Revision: https://reviews.llvm.org/D48702

llvm-svn: 338792
2018-08-03 00:06:38 +00:00
Eli Friedman 1ba5e9ac24 [GlobalMerge] Allow merging globals with explicit section markings.
At least on ELF, it's impossible to tell from the object file whether
two globals with the same section marking were merged: the merged global
uses "private" linkage to hide its symbol, and the aliases look like
regular symbols. I can't think of any other reason to disallow it.
(Of course, we can only merge globals in the same section.)

The weird alignment handling matches AsmPrinter; our alignment handling
for global variables should probably be refactored.

Differential Revision: https://reviews.llvm.org/D49822

llvm-svn: 338791
2018-08-02 23:54:16 +00:00
Tim Renouf abd85fb1f5 [AMDGPU] Reworked SIFixWWMLiveness
Summary:
I encountered some problems with SIFixWWMLiveness when WWM is in a loop:

1. It sometimes gave invalid MIR where there is some control flow path
   to the new implicit use of a register on EXIT_WWM that does not pass
   through any def.

2. There were lots of false positives of registers that needed to have
   an implicit use added to EXIT_WWM.

3. Adding an implicit use to EXIT_WWM (and adding an implicit def just
   before the WWM code, which I tried in order to fix (1)) caused lots
   of the values to be spilled and reloaded unnecessarily.

This commit is a rework of SIFixWWMLiveness, with the following changes:

1. Instead of considering any register with a def that can reach the WWM
   code and a def that can be reached from the WWM code, it now
   considers three specific cases that need to be handled.

2. A register that needs liveness over WWM to be synthesized now has it
   done by adding itself as an implicit use to defs other than the
   dominant one.

Also added the following fixmes:

FIXME: We should detect whether a register in one of the above
categories is already live at the WWM code before deciding to add the
implicit uses to synthesize its liveness.

FIXME: I believe this whole scheme may be flawed due to the possibility
of the register allocator doing live interval splitting.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46756

Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94
llvm-svn: 338783
2018-08-02 23:31:32 +00:00
Craig Topper 63873db5c4 [X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction.
There was a FIXMe in the td file about a type inference issue that was easy to fix.

llvm-svn: 338782
2018-08-02 23:30:38 +00:00
Craig Topper 2deeeae2a5 [X86] Add NEG and NOT test cases to atomic_mi.ll in preparation for fixing the FIXME in X86InstrCompiler.td to make these work for atomic load/store.
llvm-svn: 338781
2018-08-02 23:30:31 +00:00
Tim Renouf f1c7b92a6a [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor
Summary:
This fixes a problem where a load from global+idx generated incorrect
code on <=gfx7 when the index is divergent.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47383

Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed
llvm-svn: 338779
2018-08-02 22:53:57 +00:00
Zachary Turner 666de23fbf [MS Demangler] Fix some tests that are no longer broken.
These were fixed with earlier patches, but had not yet been
re-enabled.

llvm-svn: 338778
2018-08-02 22:37:40 +00:00
Krzysztof Parzyszek d91a9e27a9 [Hexagon] Simplify CFG after atomic expansion
This will remove suboptimal branching from the generated ll/sc loops.
The extra simplification pass affects a lot of testcases, which have
been modified to accommodate this change: either by modifying the
test to become immune to the CFG simplification, or (less preferablt)
by adding option -hexagon-initial-cfg-clenaup=0.

llvm-svn: 338774
2018-08-02 22:17:53 +00:00
Heejin Ahn 4128cb0b6b [WebAssembly] Support for atomic.wait / atomic.wake instructions
Summary:
This adds support for atomic.wait / atomic.wake instructions in the wasm
thread proposal.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49395

llvm-svn: 338770
2018-08-02 21:44:24 +00:00
Craig Topper db89ec1185 [X86] Autogenerate complete checks. NFC
llvm-svn: 338765
2018-08-02 20:28:45 +00:00
Krzysztof Parzyszek 90f3249ce2 [SCEV] Properly solve quadratic equations
Differential Revision: https://reviews.llvm.org/D48283

llvm-svn: 338758
2018-08-02 19:13:35 +00:00
Sam Clegg 41d7047de5 [WebAssembly] Ensure bitcasts that would result in invalid wasm are removed by FixFunctionBitcasts
Rather than allowing invalid bitcasts to be lowered to wasm
call instructions that won't validate, generate wrappers that
contain unreachable thereby delaying the error until runtime.

Differential Revision: https://reviews.llvm.org/D49517

llvm-svn: 338744
2018-08-02 17:38:06 +00:00
Craig Topper 0423881820 [X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes
These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference.

Unfortunately, this really only works in cases where the input register is killed by the instruction. If its not killed, the two address isntruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with tied operand to trick the two address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs.

Differential Revision: https://reviews.llvm.org/D50157

llvm-svn: 338735
2018-08-02 16:48:01 +00:00
Simon Pilgrim ef494e1722 [X86][SSE] Add uniform/non-uniform exact sdiv vector tests covering all paths
Regenerated tests and tested on 64-bit (AVX2) as well.

llvm-svn: 338729
2018-08-02 15:34:51 +00:00
David Bolvansky 67647bcfbe [InstCombine] [NFC] Tests for select with binop fold
llvm-svn: 338727
2018-08-02 14:59:23 +00:00
Sanjay Patel 3f6e9a71f7 [InstSimplify] move minnum/maxnum with undef fold from instcombine
llvm-svn: 338719
2018-08-02 14:33:40 +00:00
Sjoerd Meijer 8e7fab0443 [ARM][NFC] Follow up of r338568
I disabled more tests than necessary, this enables them.

llvm-svn: 338717
2018-08-02 14:04:48 +00:00
Sanjay Patel f9a0d593e9 [ValueTracking] fix maxnum miscompile for cannotBeOrderedLessThanZero (PR37776)
This adds the NAN checks suggested in PR37776:
https://bugs.llvm.org/show_bug.cgi?id=37776

If both operands to maxnum are NAN, that should get constant folded, so we don't 
have to handle that case. This is the same assumption as other FP ops in this
function. Returning 'false' is always conservatively correct.

Copying from the bug report:

Currently, we have this for "when is cannotBeOrderedLessThanZero 
(mustBePositiveOrNaN) true for maxnum":
               L
        -------------------
        | Pos | Neg | NaN |
   ------------------------
   |Pos |  x  |  x  |  x  |
   ------------------------
 R |Neg |  x  |     |  x  |
   ------------------------
   |NaN |  x  |  x  |  x  |
   ------------------------


The cases with (Neg & NaN) are wrong. We should have:

                L
        -------------------
        | Pos | Neg | NaN |
   ------------------------
   |Pos |  x  |  x  |  x  |
   ------------------------
 R |Neg |  x  |     |     |
   ------------------------
   |NaN |  x  |     |  x  |
   ------------------------

Differential Revision: https://reviews.llvm.org/D50081

llvm-svn: 338716
2018-08-02 13:46:20 +00:00
Matt Arsenault 1f3977a856 DAG: Fix vector widening fcanonicalize
llvm-svn: 338715
2018-08-02 13:43:53 +00:00
Matt Arsenault 36cdcfadcf AMDGPU: Fix scalarizing v4f16 fcanonicalize
llvm-svn: 338714
2018-08-02 13:43:42 +00:00
Ben Dunbobbin d498dcdbbf [llvm-ar] Fix help text test. NFC.
Missed from @338703

llvm-svn: 338709
2018-08-02 12:27:01 +00:00
Simon Pilgrim 090d58b2b5 [X86][SSE] Add more UDIV nonuniform-constant vector tests
Ensure we cover all paths for vector data as requested on D49248

llvm-svn: 338698
2018-08-02 10:53:53 +00:00
Alexander Ivchenko 49168f6778 [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value
This is logical continuation of https://reviews.llvm.org/D46018 (r332449)

Differential Revision: https://reviews.llvm.org/D49660

llvm-svn: 338685
2018-08-02 08:33:31 +00:00
David Green ea60446c6d [AArch64] Add support for got relocated LDR's
As a part of adding the tiny codemodel, we need to support ldr's with :got:
relocations on them. This seems to be mostly already done, just needs the
relocation type support.

Differential Revision: https://reviews.llvm.org/D50137

llvm-svn: 338673
2018-08-02 06:24:40 +00:00
Philip Reames 24b13cb06d [LICM] Expand tests to highlight an oddity in sinking implementation
llvm-svn: 338670
2018-08-02 03:54:29 +00:00
Lei Liu b9a7b7a84d Fix FCOPYSIGN expansion
In expansion of FCOPYSIGN, the shift node is missing when the two
operands of FCOPYSIGN are of the same size. We should always generate
shift node (if the required shift bit is not zero) to put the sign
bit into the right position, regardless of the size of underlying
types.

Differential Revision: https://reviews.llvm.org/D49973

llvm-svn: 338665
2018-08-02 01:54:12 +00:00
Nemanja Ivanovic e1a525ed06 [PowerPC] Do not round values prior to converting to integer
Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements
feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector
loses precision. This patch removes the code that adds these nodes
to true f64 operands. It also adds patterns required to ensure
the code is still vectorized rather than converting individual
elements and inserting into a vector.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38342

Differential Revision: https://reviews.llvm.org/D50121

llvm-svn: 338658
2018-08-02 00:03:22 +00:00
Lei Liu 8e422b8403 [AArch64] DWARF: do not generate AT_location for thread local
AArch64 ELF ABI does not define a static relocation type for TLS offset within
a module, which makes it impossible for compiler to generate a valid
DW_AT_location content for thread local variables. Currently LLVM generates an
invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS
variable. That causes trouble for linker because thread local variable does
not have an absolute address at link time. AArch64 GCC solves the problem by
not generating DW_AT_location for thread local variables. We should do the
same in LLVM.

Differential Revision: https://reviews.llvm.org/D43860

llvm-svn: 338655
2018-08-01 23:46:49 +00:00
George Burgess IV 213d1d23ef Reland r338431: "Add DebugCounters to DivRemPairs"
(Previously reverted in r338442)

I'm told that the breakage came from us using an x86 triple on configs
that didn't have x86 enabled. This is remedied by moving the
debugcounter test to an x86 directory (where there's also a
opt-bisect-isel.ll test for similar reasons).

I can't repro the reverse-iteration failure mentioned in the revert with
this patch, so I assume that a misconfiguration on my end is what caused
that.

Original commit message:

    Add DebugCounters to DivRemPairs

    For people who don't use DebugCounters, NFCI.

    Patch by Zhizhou Yang!

    Differential Revision: https://reviews.llvm.org/D50033

llvm-svn: 338653
2018-08-01 23:14:14 +00:00
Sanjay Patel 28c7e41c09 [InstSimplify] move minnum/maxnum with same arg fold from instcombine
llvm-svn: 338652
2018-08-01 23:05:55 +00:00
Reid Kleckner a30a6d2c29 Load from the GOT for external symbols in the large, PIC code model
Do the same handling for external symbols that we do for jump table
symbols and global values.

Fixes one of the cases in PR38385

llvm-svn: 338651
2018-08-01 22:56:05 +00:00
John Baldwin c5d7e04052 [ASAN] Use the correct shadow offset for ASAN on FreeBSD/mips64.
Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D49939

llvm-svn: 338650
2018-08-01 22:51:13 +00:00
Matt Arsenault 709374d186 AMDGPU: Improve hack for packing conversion ops
Mutate the node type during selection when it
doesn't matter. This avoids an intermediate bitcast
node on targets with legal i16/f16.

Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16,
which I assume are OK.

llvm-svn: 338619
2018-08-01 20:13:58 +00:00
Matt Arsenault 55ab9213d3 AMDGPU: Partially fix handling of packed amdgpu_ps arguments
Fixes annoying limitations when writing tests.
Also remove more leftover code for manually scalarizing arguments
and return values.

llvm-svn: 338618
2018-08-01 19:57:34 +00:00
Heejin Ahn b3724b7169 [WebAssembly] Support for a ternary atomic RMW instruction
Summary: This adds support for a ternary atomic RMW instruction: cmpxchg.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49195

llvm-svn: 338617
2018-08-01 19:40:28 +00:00
Alexey Bataev d4dd7215f6 [DEBUGINFO] Disable emission of the dwarf sections, but allow directives.
Summary:
Added an option that allows to emit only '.loc' and '.file' kind debug
directives, but disables emission of the DWARF sections. Required for
NVPTX target to support profiling. It requires '.loc' and '.file'
directives, but does not require any DWARF sections for the profiler.

Reviewers: probinson, echristo, dblaikie

Subscribers: aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D46021

llvm-svn: 338616
2018-08-01 19:38:20 +00:00
Craig Topper c985d42903 [X86] Canonicalize the pattern for __builtin_ffs in a similar way to '__builtin_ffs + 5'
We now emit a move of -1 before the cmov and do the addition after the cmov just like the case with an extra addition.

This may be slightly worse for code size, but is more consistent with other compilers. And we might be able to hoist the mov -1 outside of loops.

llvm-svn: 338613
2018-08-01 18:38:46 +00:00
Craig Topper ffb8eb30ff [X86] Add test cases for the patterns used by __builtin_ffs.
We previously had tests for "__builtin_ffs + 5", but the SelectinoDAG without an extra addition came out slightly different.

llvm-svn: 338612
2018-08-01 18:38:43 +00:00
Jan Vesely 93b252799b AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS
Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8)

llvm-svn: 338610
2018-08-01 18:36:07 +00:00
Zachary Turner 44ebbc216a [MS Demangler] Properly demangle templated operators.
After we detected the presence of a template via ?$ we would proceed by
only demangling a simple unqualified name. This means we would fail on
templated operators (and perhaps other yet-to-be-determined things)

This was discovered while doing some refactoring to store richer
semantic information about the demangled types to pave the way for
overhauling the way we handle backreferences. (Specifically, we need to
defer recording or resolving back-references until a symbol has been
completely demangled, because we need to use information that only
occurs later in the mangled string to decide whether a back-reference
should be recorded.)

Differential Revision: https://reviews.llvm.org/D50145

llvm-svn: 338608
2018-08-01 18:32:47 +00:00
Vlad Tsyrklevich ab016e00ec [X86] FastISel fall back on !absolute_symbol GVs
Summary:
D25878, which added support for !absolute_symbol for normal X86 ISel,
did not add support for materializing references to absolute symbols for
X86 FastISel. This causes build failures because FastISel generates
PC-relative relocations for absolute symbols. Fall back to normal ISel
for references to !absolute_symbol GVs. Fix for PR38200.

Reviewers: pcc, craig.topper

Reviewed By: pcc

Subscribers: hiraditya, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D50116

llvm-svn: 338599
2018-08-01 17:44:37 +00:00
Simon Pilgrim b911d6721d [llvm-mca][x86] Add CMPXCHG instruction resource tests
I've put CMPXCHG8B/CMPXCHG16B in the same file, even though technically they are under separate CPUID bits all targets seem to support both (or neither).

llvm-svn: 338595
2018-08-01 17:25:11 +00:00
Sanjay Patel d5ae183034 [x86] remove stale FIXME note from test; NFC
This was fixed with rL338592.

llvm-svn: 338593
2018-08-01 17:18:50 +00:00
Sanjay Patel 8aac22e06a [SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type
The bug is visible in the constant-folded x86 tests. We can't use the
negated shift amount when the type is not power-of-2:
https://rise4fun.com/Alive/US1r

...so in that case, use the regular lowering that includes a select
to guard against a shift-by-bitwidth. This path is improved by only
calculating the modulo shift amount once now.

Also, improve the rotate (with power-of-2 size) lowering to use
a negate rather than subtract from bitwidth. This improves the
codegen whether we have a rotate instruction or not (although
we can still see that we're not matching to a legal rotate in
all cases).

llvm-svn: 338592
2018-08-01 17:17:08 +00:00
Sanjay Patel 6d302c93cc [x86] add tests to show miscompile for funnel shift with weird size; NFC
llvm-svn: 338587
2018-08-01 16:59:54 +00:00
Simon Pilgrim 5c4fb14e07 [llvm-mca][x86] Add PREFETCHW instruction resource tests
These aren't just available via 3DNow! so test for them separately as well.

llvm-svn: 338584
2018-08-01 16:34:39 +00:00
Simon Pilgrim dcfa732b2f [llvm-mca][x86] Add PCLMUL instruction resource tests
Renamed the btver2 file that already contained them - the other targets were only testing the AVX versions

llvm-svn: 338583
2018-08-01 16:25:50 +00:00
Jordan Rupprecht d67c1e129b [llvm-objcopy] Add support for --rename-section flags from gnu objcopy
Summary:
Add support for --rename-section flags from gnu objcopy.

Not all flags appear to have an effect for ELF objects, but allowing them would allow easier drop-in replacement. Other unrecognized flags are rejected.

This was only tested by comparing flags printed by "readelf -e <.o>" against the output of gnu vs llvm objcopy, it hasn't been tested to be valid beyond that.

Reviewers: jakehehrlich, alexshap

Subscribers: llvm-commits, paulsemel, alexshap

Differential Revision: https://reviews.llvm.org/D49870

llvm-svn: 338582
2018-08-01 16:23:22 +00:00
Andrea Di Biagio 7f3bf5c1f9 [llvm-mca] Correctly update the rank in `Scheduler::select()`.
Found by inspection.

llvm-svn: 338579
2018-08-01 16:06:33 +00:00
Simon Pilgrim 34ac6533f4 [llvm-mca][x86] Add SET/TEST instruction resource tests
llvm-svn: 338576
2018-08-01 15:29:47 +00:00
Sjoerd Meijer 590e4e8dde [ARM] Armv8.2-A FP16 vector intrinsics tests
Clang support for the Armv8.2-A FP16 vector intrinsic was committed in
rC328277, but this was never followed up, i.e. the LLVM part is missing.

I've raised PR38404, and this is the first step to address this. I.e.,
this adds tests for the Armv8.2-A FP16 vector intrinsic, and thus shows
which intrinsics already work, and which need further work.

Differential Revision: https://reviews.llvm.org/D50142

llvm-svn: 338568
2018-08-01 14:43:59 +00:00
Simon Pilgrim e364e57ac9 [llvm-mca][x86] Add LEA instruction resource tests
We already added these to btver2, now add them to other targets, even though none of their models treat them specially (yet).

llvm-svn: 338565
2018-08-01 14:25:33 +00:00
Simon Pilgrim 6754913e95 [llvm-mca][x86] Add more x86-64 system instruction resource tests
CPUID, IN/OUT, INS/OUTS, INT, PAUSE, SCAS, UD2, XLAT

llvm-svn: 338563
2018-08-01 14:18:09 +00:00
Cameron McInally 04ae85859d [FPEnv] Widen illegal width StrictFP vector operations as needed
Differential Revision: https://reviews.llvm.org/D49806

llvm-svn: 338562
2018-08-01 14:17:19 +00:00
Bryan Chan 67106b5e08 [AArch64] Fix FCCMP with FP16 operands
Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection.

Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50115

llvm-svn: 338554
2018-08-01 13:50:29 +00:00
Simon Pilgrim 5f41ab79c0 [llvm-mca][x86] Add CLFLUSHOPT instruction resource tests
llvm-svn: 338550
2018-08-01 13:34:17 +00:00
Simon Pilgrim bd014f4d91 [llvm-mca][x86] Add CMPS/LODS/MOVS/STOS string instruction resource tests
llvm-svn: 338532
2018-08-01 13:14:45 +00:00
Jonas Devlieghere 8acb74e01f [MC] Report fatal error for DWARF types for non-ELF object files
Getting the DWARF types section is only implemented for ELF object
files. We already disabled emitting debug types in clang (r337717), but
now we also report an fatal error (rather than crashing) when trying to
obtain this section in MC. Additionally we ignore the generate debug
types flag for unsupported target triples.

See PR38190 for more information.

Differential revision: https://reviews.llvm.org/D50057

llvm-svn: 338527
2018-08-01 12:53:06 +00:00
Ryan Taylor 894c8fd0e2 [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to zero, if so remove lod and change opcode to equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

llvm-svn: 338523
2018-08-01 12:12:01 +00:00
Ulrich Weigand 58a9786e81 [SystemZ, TableGen] Fix shift count handling
The DAG combiner logic to simplify AND masks in shift counts is invalid.
While it is true that the SystemZ shift instructions ignore all but the
low 6 bits of the shift count, it is still invalid to simplify the AND
masks while the DAG still uses the standard shift operators (which are
*not* defined to match the SystemZ instruction behavior).

Instead, this patch performs equivalent operations during instruction
selection. For completely removing the AND, this now happens via
additional DAG match patterns implemented by a multi-alternative
PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG
patterns were already mostly OK, they just needed an output XForm to
actually truncate the immediate value.

Unfortunately, the latter change also exposed a bug in TableGen: it
seems XForms are currently only handled correctly for direct operands of
the outermost operation node. This patch also fixes that bug by simply
recurring through the whole pattern. This should be NFC for all other
targets.

Differential Revision: https://reviews.llvm.org/D50096

llvm-svn: 338521
2018-08-01 11:57:58 +00:00
Simon Pilgrim 18d025a732 [llvm-mca][x86] Add STC + STD instruction resource tests
llvm-svn: 338514
2018-08-01 11:00:11 +00:00
Petar Jovanovic 64c10ba8e2 [MIPS GlobalISel] Select global address
Select G_GLOBAL_VALUE for position dependent code.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D49803

llvm-svn: 338499
2018-08-01 09:03:23 +00:00
David Bolvansky fbbb83c782 Revert "Enrich inline messages", tests fail
llvm-svn: 338496
2018-08-01 08:02:40 +00:00
David Bolvansky 7f36cd9d96 Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338494
2018-08-01 07:37:16 +00:00
Martin Storsjo d4590c38ab [AArch64] Disallow the MachO specific .loh directive for windows
Also add a test for it being unsupported for linux.

Differential Revision: https://reviews.llvm.org/D49929

llvm-svn: 338493
2018-08-01 06:50:18 +00:00
Victor Leschuk 64e0c56717 [DWARF] Basic support for producing DWARFv5 .debug_addr section
This revision implements support for generating DWARFv5 .debug_addr section.
The implementation is pretty straight-forward: we just check the dwarf version
and emit section header if needed.

Reviewers: aprantl, dblaikie, probinson

Reviewed by: dblaikie

Differential Revision: https://reviews.llvm.org/D50005

llvm-svn: 338487
2018-08-01 05:48:06 +00:00
Hiroshi Inoue 02f79eae06 [InstSimplify] fold extracting from std::pair (1/2)
This patch intends to enable jump threading when a method whose return type is std::pair<int, bool> or std::pair<bool, int> is inlined.
For example, jump threading does not happen for the if statement in func.

std::pair<int, bool> callee(int v) {
  int a = dummy(v);
  if (a) return std::make_pair(dummy(v), true);
  else return std::make_pair(v, v < 0);
}

int func(int v) {
  std::pair<int, bool> rc = callee(v);
  if (rc.second) {
    // do something
  }

SROA executed before the method inlining replaces std::pair by i64 without splitting in both callee and func since at this point no access to the individual fields is seen to SROA.
After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc).

This series of patch add patterns in InstructionSimplify to fold extraction of members of std::pair. To help jump threading, actually we need to optimize the code sequence spanning multiple BBs.
These patches does not handle phi by itself, but these additional patterns help NewGVN pass, which calls instsimplify to check opportunities for simplifying instructions over phi, apply phi-of-ops optimization to result in successful jump threading. 
SimplifyDemandedBits in InstCombine, can do more general optimization but this patch aims to provide opportunities for other optimizers by supporting a simple but common case in InstSimplify.

This first patch in the series handles code sequences that merges two values using shl and or and then extracts one value using lshr.

Differential Revision: https://reviews.llvm.org/D48828

llvm-svn: 338485
2018-08-01 04:40:32 +00:00
Jatin Bhateja 36432a70c1 [X86] Adding more test patterns for lea-opt (PR37939)
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50128

llvm-svn: 338483
2018-08-01 03:53:27 +00:00
Chandler Carruth 2ce191e220 [x86] Fix a really subtle miscompile due to a somewhat glaring bug in
EFLAGS copy lowering.

If you have a branch of LLVM, you may want to cherrypick this. It is
extremely unlikely to hit this case empirically, but it will likely
manifest as an "impossible" branch being taken somewhere, and will be
... very hard to debug.

Hitting this requires complex conditions living across complex control
flow combined with some interesting memory (non-stack) initialized with
the results of a comparison. Also, because you have to arrange for an
EFLAGS copy to be in *just* the right place, almost anything you do to
the code will hide the bug. I was unable to reduce anything remotely
resembling a "good" test case from the place where I hit it, and so
instead I have constructed synthetic MIR testing that directly exercises
the bug in question (as well as the good behavior for completeness).

The issue is that we would mistakenly assume any SETcc with a valid
condition and an initial operand that was a register and a virtual
register at that to be a register *defining* SETcc...

It isn't though....

This would in turn cause us to test some other bizarre register,
typically the base pointer of some memory. Now, testing this register
and using that to branch on doesn't make any sense. It even fails the
machine verifier (if you are running it) due to the wrong register
class. But it will make it through LLVM, assemble, and it *looks*
fine... But wow do you get a very unsual and surprising branch taken in
your actual code.

The fix is to actually check what kind of SETcc instruction we're
dealing with. Because there are a bunch of them, I just test the
may-store bit in the instruction. I've also added an assert for sanity
that ensure we are, in fact, *defining* the register operand. =D

llvm-svn: 338481
2018-08-01 03:01:58 +00:00
Chandler Carruth 014047a99a [x86/slh] Add unwind info to several tests to make it more obvious that
we aren't incorrectly generating any of it when doing SLH.

There was a bug that only occured with SLH that very much looked like it
could be caused by bad unwind info, and so this was a prime suspect.
Turns out that everything is fine, but this way we'll *see* if we end
up, for example, putting things we shouldn't inside the prolog.

llvm-svn: 338480
2018-08-01 03:01:10 +00:00
Hsiangkai Wang 5c63af0d04 [DebugInfo] Generate fixups as emitting DWARF .debug_line.
It is necessary to generate fixups in .debug_line as relaxation is
enabled due to the address delta may be changed after relaxation.

DWARF will record the mappings of lines and addresses in
.debug_line section. It will encode the information using special
opcodes, standard opcodes and extended opcodes in Line Number
Program. I use DW_LNS_fixed_advance_pc to encode fixed length
address delta and DW_LNE_set_address to encode absolute address
to make it possible to generate fixups in .debug_line section.

Differential Revision: https://reviews.llvm.org/D46850

llvm-svn: 338477
2018-08-01 02:18:06 +00:00
Amara Emerson 6cdfe29d8e [GlobalISel][IRTranslator] Use RPO traversal when visiting blocks to translate.
Previously we were just visiting the blocks in the function in IR order, which
is rather arbitrary. Therefore we wouldn't always visit defs before uses, but
the translation code relies on this assumption in some places.

Only codegen change seen in tests is an elision of a redundant copy.

Fixes PR38396

llvm-svn: 338476
2018-08-01 02:17:42 +00:00
Konstantin Zhuravlyov bb30ef7af4 AMDGPU: Add clamp bit to dot intrinsics
Differential Revision: https://reviews.llvm.org/D49874

llvm-svn: 338470
2018-08-01 01:31:30 +00:00
Evandro Menezes eb159e56a6 [PATCH] [SLC] Test simplification of pow() for vector types (NFC)
Add test case for the simplification of `pow()` for vector types that D50035
enables.

llvm-svn: 338463
2018-08-01 00:30:43 +00:00
Reid Kleckner b32ff46ff7 Revert r338354 "[ARM] Revert r337821"
Disable ARMCodeGenPrepare by default again. It is causing verifier
failues in V8 that look like:

  Duplicate integer as switch case
  switch i32 %trunc, label %if.end13 [
    i32 0, label %cleanup36
    i32 0, label %if.then8
  ], !dbg !4981
  i32 0
  fatal error: error in backend: Broken function found, compilation aborted!

I will continue reducing the test case and send it along.

llvm-svn: 338452
2018-07-31 23:09:42 +00:00
David L. Jones 9fb2e3ceaa [WebAssembly] Fix debug info tests after r338437.
After r338437, debug_ranges are no longer emitted. Previously, this was only
done for DWARF version 5 and above.

llvm-svn: 338448
2018-07-31 22:24:14 +00:00
Victor Leschuk 58d3399d8a [DWARF] Support for .debug_addr (consumer)
This patch implements basic support for parsing
  and dumping DWARFv5 .debug_addr section.

llvm-svn: 338447
2018-07-31 22:19:19 +00:00
Fangrui Song 87b4b8f7b4 [llvm-objcopy] Make --strip-debug strip .gdb_index
Summary:
See binutils-gdb/bfd/elf.c, GNU objcopy also strips .stab* (STABS)
.line* (DWARF 1) .gnu.linkonce.wi.* (linkonce section for .debug_info) but
I'm not sure we need to be compatible with it.

Reviewers: dblaikie, alexshap, jakehehrlich, jhenderson

Reviewed By: alexshap, jakehehrlich

Subscribers: aprantl, JDevlieghere, jakehehrlich, llvm-commits

Differential Revision: https://reviews.llvm.org/D50100

llvm-svn: 338443
2018-07-31 21:26:35 +00:00
George Burgess IV 497e8fad51 Revert r338431: "Add DebugCounters to DivRemPairs"
This reverts r338431; the test it added is making buildbots unhappy.
Locally, I can repro the failure on reverse-iteration builds.

llvm-svn: 338442
2018-07-31 21:18:44 +00:00
Matt Arsenault 118c47b6d1 AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests
R600 breaks on too many things to usefully test changes
with ieee_mode on vs. off.

llvm-svn: 338435
2018-07-31 20:38:42 +00:00
George Burgess IV 907f4f6a74 Add DebugCounters to DivRemPairs
For people who don't use DebugCounters, NFCI.

Patch by Zhizhou Yang!

Differential Revision: https://reviews.llvm.org/D50033

llvm-svn: 338431
2018-07-31 20:07:46 +00:00
Alexandre Ganea b92d3ad762 [CodeView] Add coverage test for r338308 (Fixed crash in type merging)
llvm-svn: 338423
2018-07-31 19:30:03 +00:00
Matt Arsenault feedabfde7 AMDGPU: Break 64-bit arguments into 32-bit pieces
llvm-svn: 338421
2018-07-31 19:29:04 +00:00
Matt Arsenault 0395da7842 AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls
This improves code for the same reasons as scalarizing 32-bit
element vectors.

llvm-svn: 338418
2018-07-31 19:17:47 +00:00
Alexandre Ganea ee8a720051 [CodeView] Minimal support for S_UNAMESPACE records
Differential Revision: https://reviews.llvm.org/D50007

llvm-svn: 338417
2018-07-31 19:15:50 +00:00
Matt Arsenault 9ced1e0d80 AMDGPU: Scalarize vector argument types to calls
When lowering calling conventions, prefer to decompose vectors
into the constitute register types. This avoids artifical constraints
to satisfy a wide super-register.

This improves code quality because now optimizations don't need to
deal with the super-register constraint. For example the immediate
folding code doesn't deal with 4 component reg_sequences, so by
breaking the register down earlier the existing immediate folding
code is able to work.

This also avoids the need for the shader input processing code
to manually split vector types.

llvm-svn: 338416
2018-07-31 19:05:14 +00:00
Vlad Tsyrklevich 48ed9acede Revert "[DebugInfo] Generate DWARF debug information for labels."
This reverts commits r338390 and r338398, they were causing LSan
failures on the ASan bot.

llvm-svn: 338408
2018-07-31 18:10:37 +00:00
Simon Pilgrim 5d9b00d15b [X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151)
As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering.

Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks but v8i16/v16i16 aren't great either if PBLENDW fails) so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW.

Differential Revision: https://reviews.llvm.org/D49562

llvm-svn: 338407
2018-07-31 18:05:56 +00:00
Simon Pilgrim 1f4b9cb6fe [llvm-mca][x86] Add 32-bit instruction resource tests
These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?).

llvm-svn: 338404
2018-07-31 17:33:08 +00:00
Zachary Turner d30700f82d Resubmit r338340 "[MS Demangler] Better demangling of template arguments."
This broke the build with GCC, but has since been fixed.

llvm-svn: 338403
2018-07-31 17:16:44 +00:00
Craig Topper bef126fb71 [X86] Add pattern matching for PMADDUBSW
Summary:
Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned.

A C example that triggers this pattern

```
static const int N = 128;

int8_t A[2*N];
uint8_t B[2*N];
int16_t C[N];

void foo() {
  for (int i = 0; i != N; ++i)
    C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
}
```

Reviewers: RKSimon, spatel, zvi

Reviewed By: RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49829

llvm-svn: 338402
2018-07-31 17:12:08 +00:00
Craig Topper d03d44e0b9 [X86] Add test cases that could use PMADDUBSW.
llvm-svn: 338401
2018-07-31 17:12:06 +00:00
Francis Visoiu Mistrih ae8002c1cf [X86] Preserve more liveness information in emitStackProbeInline
This commit fixes two issues with the liveness information after the
call:

1) The code always spills RCX and RDX if InProlog == true, which results
in an use of undefined phys reg.
2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to
the basic blocks that use them, therefore they are seen undefined.

https://llvm.org/PR38376

Differential Revision: https://reviews.llvm.org/D50020

llvm-svn: 338400
2018-07-31 16:41:12 +00:00
Hsiangkai Wang 68c6860434 [DebugInfo] Fix build failed in 'clang-cmake-armv8-full'.
Builder clang-cmake-armv8-full failed due to the assembly 'comment'
notation is not '#' in the target. So, I use CHECK-SAME to avoid to
check the comment notation in the same line in the test case.

llvm-svn: 338398
2018-07-31 16:22:09 +00:00
Ewan Crawford d83beb804c Fix InstCombine address space assert
Workaround bug where the InstCombine pass was asserting on the IR added in lit
test, where we have a bitcast instruction after a GEP from an addrspace cast.

The second bitcast in the test was getting combined into
`bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)*`, which looks like it should
be an addrspace cast instruction instead. Otherwise if control flow is allowed
to continue as it is now we create a GEP instruction
`<badref> = getelementptr inbounds <16 x i32>, <16 x i32>* %0, i32 0`. However
because the type of this instruction doesn't match the address space we hit an
assert when replacing the bitcast with that GEP.

```
void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
```

Differential Revision: https://reviews.llvm.org/D50058

llvm-svn: 338395
2018-07-31 15:53:03 +00:00
Sanjay Patel a35781fdf9 [InstCombine] regenerate checks and add tests for D50035; NFC
llvm-svn: 338392
2018-07-31 15:07:32 +00:00
Anastasis Grammenos ac3f8028da [DebugInfo][LCSSA] Preserve debug location in lcssa phis
Summary:
When inserting lcssa Phi Nodes in the exit block
mak sure to preserve the original instructions DL.

Reviewers: vsk

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D50009

llvm-svn: 338391
2018-07-31 14:54:52 +00:00
Hsiangkai Wang cbc58ada99 [DebugInfo] Generate DWARF debug information for labels.
There are two forms for label debug information in DWARF format.

1. Labels in a non-inlined function:

DW_TAG_label
  DW_AT_name
  DW_AT_decl_file
  DW_AT_decl_line
  DW_AT_low_pc

2. Labels in an inlined function:

DW_TAG_label
  DW_AT_abstract_origin
  DW_AT_low_pc

We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.

The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.

We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.

It also generates label debug information under global isel.

Differential Revision: https://reviews.llvm.org/D45556

llvm-svn: 338390
2018-07-31 14:48:32 +00:00
David Bolvansky ab79414f7b Revert Enrich inline messages
llvm-svn: 338389
2018-07-31 14:47:22 +00:00
Sanjay Patel 57d617d676 [InstCombine] auto-generate checks; NFC
llvm-svn: 338388
2018-07-31 14:27:30 +00:00
David Bolvansky b562dbabda Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338387
2018-07-31 14:25:24 +00:00