Commit Graph

60699 Commits

Author SHA1 Message Date
Hubert Tong ab2eb2bfac [XCOFF] Add functionality for parsing AIX XCOFF object file headers
Summary:
1. Add functionality for parsing AIX XCOFF object files headers.
2. Only support 32-bit AIX XCOFF object files in this patch.
3. Print out the AIX XCOFF object file header in YAML format.

Reviewers: sfertile, hubert.reinterpretcast, jasonliu, mstorsjo, zturner, rnk

Reviewed By: sfertile, hubert.reinterpretcast

Subscribers: jsji, mgorny, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59419

Patch by Digger Lin

llvm-svn: 357663
2019-04-04 00:53:21 +00:00
Craig Topper 051bd16faf [X86] Remove CustomInserters for RDPKRU/WRPKRU. Use some custom lowering and new ISD opcodes instead.
These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers.

This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes
that take the extra zeroes as inputs. The zeros will then go through isel on their own to select
the MOV32r0 pseudo. Then we just need to mention the physical registers directly
in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary
copies to/from physical registers.

llvm-svn: 357659
2019-04-04 00:28:49 +00:00
Craig Topper 52cac4b79f [X86] Remove CustomInserter pseudos for MONITOR/MONITORX/CLZERO. Use custom instruction selection instead.
This custom inserter existed so we could do a weird thing where we pretended that the instructions support
a full address mode instead of taking a pointer in EAX/RAX. I think was largely so we could be pointer
size agnostic in the isel pattern.

To make this work we would then put the address into an LEA into EAX/RAX in front of the instruction after
isel. But the LEA is overkill when we just have a base pointer. So we end up using the LEA as a slower MOV
instruction.

With this change we now just do custom selection during isel instead and just assign the incoming address
of the intrinsic into EAX/RAX based on its size. After the intrinsic is selected, we can let isel take
care of selecting an LEA or other operation to do any address computation needed in this basic block.

I've also split the instruction into a 32-bit mode version and a 64-bit mode version so the implicit
use is properly sized based on the pointer. Without this we get comments in the assembly output about
killing eax and defing rax or vice versa depending on whether we define the instruction to use EAX/RAX.

llvm-svn: 357652
2019-04-03 23:28:30 +00:00
Craig Topper 477008bd50 [X86] Remove dead CHECK lines for a test. NFC
llvm-svn: 357651
2019-04-03 23:28:18 +00:00
Craig Topper 437b45a1f8 [X86] Autogenerate checks. NFC
llvm-svn: 357650
2019-04-03 23:28:11 +00:00
Nico Weber 1672581e96 llvm-undname: Fix a crash-on-invalid
Found by oss-fuzz, fixes issue 13260 on oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60207

llvm-svn: 357649
2019-04-03 23:27:18 +00:00
Nico Weber a9886f8278 llvm-undame: Fix an assert-on-invalid
Found by oss-fuzz, fixes issue 12432 on os-fuzz.

Differential Revision: https://reviews.llvm.org/D60206

llvm-svn: 357648
2019-04-03 23:23:32 +00:00
Nico Weber 321de48a94 llvm-undname: Fix an assert-on-invalid
Found by oss-fuzz, fixes issues 12428 and 12429 on oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60204

llvm-svn: 357647
2019-04-03 23:19:39 +00:00
Nico Weber c7444ddfe5 llvm-undname: Fix a crash-on-invalid
Found by oss-fuzz, fixes issues 12435 and 12438 on oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60202

llvm-svn: 357646
2019-04-03 23:15:56 +00:00
Sanjay Patel c9a012e4ea [x86] fold shuffles of h-ops that have an undef operand
If an operand is undef, we can assume it's the same as the
other operand.

llvm-svn: 357644
2019-04-03 22:40:35 +00:00
Sanjay Patel 61b5e3c6a9 [x86] eliminate movddup of horizontal op
This pattern would show up as a regression if we more
aggressively convert vector FP ops to scalar ops.

There's still a missed optimization for the v4f64 legal
case (AVX) because we create that h-op with an undef operand.
We should probably just duplicate the operands for that
pattern to avoid trouble.

llvm-svn: 357642
2019-04-03 22:15:29 +00:00
Sanjay Patel 0b874c7c60 [x86] add another test for disguised h-op; NFC
llvm-svn: 357636
2019-04-03 21:10:55 +00:00
Matt Arsenault 396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Sanjay Patel 8c9ceecdc6 [x86] add test for disguised horizontal op; NFC
llvm-svn: 357630
2019-04-03 20:34:22 +00:00
Jonas Devlieghere 2156797cf0 [dwarfdump] Remove bogus verifier error
The standard doesn't require a DW_TAG_variable, DW_TAG_formal_parameter
or DW_TAG_constant to A DW_AT_type attribute describing the type of the
variable. It only specifies that it *can* have one.

llvm-svn: 357628
2019-04-03 19:57:13 +00:00
Taewook Oh a960f89962 [ProfileSummary] Count callsite samples when computing total samples.
Summary: Currently ProfileSummaryBuilder doesn't count into callsite samples when computing total samples. Considering that ProfileSummaryInfo is used to checked the hotness of not only body samples but also callsite samples (from SampleProfileLoader), I think the callsite sample counts should be considered when computing total samples.

Reviewers: eraman, danielcdh, wmi

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59835

llvm-svn: 357627
2019-04-03 19:54:43 +00:00
Krzysztof Parzyszek 4841643a1d [X86] Extend boolean arguments to inline-asm according to getBooleanType
Differential Revision: https://reviews.llvm.org/D60208

llvm-svn: 357615
2019-04-03 17:43:14 +00:00
Simon Pilgrim 15919ad306 [X86][AVX] combineHorizontalPredicateResult - split any/allof v16i16/v32i8 reduction on AVX1
Perform the 2 x 128-bit lo/hi OR/AND on the vectors before calling PMOVMSKB on the 128-bit result.

llvm-svn: 357611
2019-04-03 17:28:34 +00:00
Simon Pilgrim 9e28dddf55 [X86][AVX] combineHorizontalPredicateResult - support v16i16/v32i8 reduction on AVX1
Use getPMOVMSKB helper which splits v32i8 MOVMSK calls on pre-AVX2 targets.

llvm-svn: 357608
2019-04-03 17:17:13 +00:00
Paul Semel 0c27bc2e1f [DWARF] check whether the DIE is valid before querying for information
Differential Revision: https://reviews.llvm.org/D60147

llvm-svn: 357607
2019-04-03 17:13:45 +00:00
Jessica Paquette e794121cd0 [AArch64][GlobalISel] Legalize G_FEXP2
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.

Differential Revision: https://reviews.llvm.org/D60165

llvm-svn: 357605
2019-04-03 16:58:32 +00:00
Sanjay Patel 8055034666 [x86] make stack folding tests immune to unrelated transforms; NFC
llvm-svn: 357604
2019-04-03 16:33:24 +00:00
George Rimar d4e5500cfa [llvm-readobj] - Fix 2 test cases.
https://reviews.llvm.org/D60122 (r357595) changed the
symbols description format.

This change fix two more new test cases to fix BB:
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/16205/steps/test-stage1-compiler/logs/stdio

llvm-svn: 357598
2019-04-03 15:11:19 +00:00
Ulrich Weigand 35dfd1b7df [SystemZ] Improve codegen for certain SADDO-immediate cases
When performing an add-with-overflow with an immediate in the
range -2G ... -4G, code currently loads the immediate into a
register, which generally takes two instructions.

In this particular case, it is preferable to load the negated
immediate into a register instead, which always only requires
one instruction, and then perform a subtract.

llvm-svn: 357597
2019-04-03 15:09:19 +00:00
George Rimar 6da44ad75d [yaml2obj][obj2yaml] - Change how symbol's binding is descibed when parsing/dumping.
Currently, YAML has the following syntax for describing the symbols:

Symbols:
  Local:
    LocalSymbol1:
    ...
    LocalSymbol2:
    ...
  ...
  Global:
    GlobalSymbol1:
  ...
  Weak:
  ...
  GNUUnique:

I.e. symbols are grouped by their bindings. That is not very convenient,
because:

It does not allow to set a custom binding, what can be useful for producing
broken/special outputs for test cases. Adding a new binding would require to
change a syntax (what we observed when added GNUUnique recently).

It does not allow to change the order of the symbols in .symtab/.dynsym,
i.e. currently all Local symbols are placed first, then Global, Weak and GNUUnique
are following, but we are not able to change the order.

It is not consistent. Binding is just one of the properties of the symbol,
we do not group them by other properties.

It makes the code more complex that it can be. This patch shows it can be simplified
with the change performed.

The patch changes the syntax to just:

Symbols:
  Symbol1:
  ...
  Symbol2:
  ...
...

With that, we are able to work with the binding field just like with any other symbol property.

Differential revision: https://reviews.llvm.org/D60122

llvm-svn: 357595
2019-04-03 14:53:42 +00:00
James Henderson f5b181e16d [NFC] Address missed review comment for test
Reviewed by: grimar

Differential Revision: https://reviews.llvm.org/D60200

llvm-svn: 357594
2019-04-03 14:50:50 +00:00
Sanjay Patel 281cf28329 [x86] remove duplicate tests
Accidentally double-committed these.

llvm-svn: 357593
2019-04-03 14:45:45 +00:00
Sanjay Patel 393458f3ed [x86] add negative tests for FP scalarization; NFC
These go with the proposal in D60150.

llvm-svn: 357592
2019-04-03 14:41:28 +00:00
Sanjay Patel 04848090cd [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357591
2019-04-03 14:41:24 +00:00
James Henderson d931cf3e46 [llvm-objcopy] Make section rename/set flags case-insensitive
This fixes https://bugs.llvm.org/show_bug.cgi?id=41305. GNU objcopy
--set-section-flags/--rename-section flags are case-insensitive, so this
patch updates llvm-objcopy to match.

Reviewed by: grimar

Differential Revision: https://reviews.llvm.org/D60200

llvm-svn: 357590
2019-04-03 14:40:27 +00:00
Sanjay Patel eb5ffc7842 [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357587
2019-04-03 14:36:47 +00:00
Petar Avramovic afa3afa384 [MIPS GlobalISel] Select floating point arithmetic operations
Select 32 and 64 bit floating point add, sub, mul and div for MIPS32.

Differential Revision: https://reviews.llvm.org/D60191

llvm-svn: 357584
2019-04-03 14:12:59 +00:00
Javed Absar 5820db93c9 [AArch64] Update v8.5a MTE LDG/STG instructions
The latest MTE specification adds register Xt to the STG instruction family:
  STG [Xn, #offset] -> STG Xt, [Xn, #offset]
The tag written to memory is taken from Xt rather than Xn.
Also, the LDG instruction also was changed to read return address from Xt:
  LDG Xt, [Xn, #offset].
This patch includes those changes and tests.
Specification is at: https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60188

llvm-svn: 357583
2019-04-03 14:12:13 +00:00
Sanjay Patel 00dae6b22d [DAGCombiner] loosen restrictions for moving shuffles after vector binop
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).

As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.

llvm-svn: 357580
2019-04-03 13:42:06 +00:00
Xing GUO 8f6166a72e [llvm-readobj] Add GNU style dumper for .gnu.version section
Summary: Currently, `llvm-readobj` do not support GNU style dumper for symbol versioning sections. In this patch, I would like to implement dumper for `.gnu.version` section

Reviewers: jhenderson, rupprecht, grimar

Reviewed By: jhenderson, rupprecht

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59877

llvm-svn: 357578
2019-04-03 13:32:49 +00:00
James Henderson ef93be84d3 [llvm-nm]Add support for --no-demangle
GNU nm has --no-demangle, so llvm-nm should too. It disables the
--demangle switch. The patch also allows --demangle to be specified
multiple times (the last of all --no-demangle/--demangle switches
takes precedence).

Reviewed by: grimar, rupprecht, mattd

Differential Revision: https://reviews.llvm.org/D60134

llvm-svn: 357575
2019-04-03 12:57:46 +00:00
Simon Pilgrim 143279e61f [X86] Regenerate LEA codegen tests
llvm-svn: 357573
2019-04-03 12:33:16 +00:00
Clement Courbet 26a8ed3ac9 [X86] Make the post machine scheduler macrofusion-aware.
Summary:
Given that X86 does not use this currently, this is an NFC. I'll
experiment with enabling and will report numbers.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60185

llvm-svn: 357568
2019-04-03 09:37:30 +00:00
Clement Courbet 5bfa946d69 [X86][NFC] Add tests for misched macro-fusion.
llvm-svn: 357565
2019-04-03 08:21:54 +00:00
David Bolvansky 937720e75b [InstCombine] Simplify ctpop with bitreverse/bswap
Summary: Fixes PR41337

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60148

llvm-svn: 357564
2019-04-03 08:08:44 +00:00
Hans Wennborg 94b867dc7c Revert r357256 "[DAGCombine] Improve Lifetime node chains."
As it caused a pathological compile-time regressionin V8, see PR41352.

> Improve both start and end lifetime nodes chain dependencies.
>
> Reviewers: courbet
>
> Reviewed By: courbet
>
> Subscribers: hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D59795

This also reverts the follow-up r357309:

> [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.
>
> Avoid EXPENSIVE_CHECK failure. NFCI.

llvm-svn: 357563
2019-04-03 07:41:58 +00:00
Chen Zheng 4178c15330 [PowerPC]add testcase for ppcctrloops pass shortloop check
llvm-svn: 357560
2019-04-03 03:11:34 +00:00
Matt Arsenault f426ddbfc7 AMDGPU: Assume ECC is enabled by default if supported
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.

llvm-svn: 357558
2019-04-03 01:58:57 +00:00
Matt Arsenault 03e7492876 InstSimplify: Fold round intrinsics from sitofp/uitofp
https://godbolt.org/z/gEMRZb

llvm-svn: 357549
2019-04-03 00:25:06 +00:00
Craig Topper 16683a3ef8 [X86] Update the test case for v4i1 bitselect in combine-bitselect.ll to not have an infinite loop in IR.
In fact we don't even need a loop at all. I backed out the bug fix this was testing for and verified that this new case hit the same issue.

This should stop D59626 from deleting some of this code by realizing it was dead due to the loop.

llvm-svn: 357544
2019-04-03 00:05:03 +00:00
Craig Topper ca9eb68541 [X86] Autogenerate complete checks. NFC
llvm-svn: 357543
2019-04-03 00:04:57 +00:00
Matt Arsenault 2065680b47 AMDGPU: Don't use the default cpu in a few tests
Avoids unnecessary test changes in a future commit.

llvm-svn: 357539
2019-04-03 00:00:58 +00:00
Jessica Paquette ed23352379 [GlobalISel] Add IRTranslator support for llvm.stacksave and llvm.stackrestore
Also update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60140

llvm-svn: 357538
2019-04-02 22:46:31 +00:00
Stanislav Mekhanoshin ea2e227926 X86: regenerate speculative-load-hardening-indirect.ll tests. NFC.
llvm-svn: 357537
2019-04-02 22:44:46 +00:00
Sanjay Patel 8e6d41aeb2 [x86] add more tests for FP scalarization; NFC
llvm-svn: 357523
2019-04-02 20:24:06 +00:00
David Bolvansky 9f179b2c65 [InstCombine] Added tests for PR41337
llvm-svn: 357522
2019-04-02 20:21:26 +00:00
David Bolvansky 5ba60b22a4 [InstCombine] Simplify ctlz/cttz with bitreverse
Summary: Fixes PR41273

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60096

llvm-svn: 357521
2019-04-02 20:13:28 +00:00
Jessica Paquette 22c6215c7e [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*)
This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.

Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D60100

llvm-svn: 357518
2019-04-02 19:57:26 +00:00
David Bolvansky 9bba938de4 [InstCombine] Added tests for PR41273
llvm-svn: 357508
2019-04-02 18:33:54 +00:00
Vedant Kumar 9da8a68d6b [ArgPromotion] Set debug location at updated callsites
Set the correct debug location on instructions which load arguments in
preparation for a call to an arg-promoted function.

This prevents location cascade from misattributing the line/scope of one
of these loads to the location of the instruction preceding the call.

Differential Revision: https://reviews.llvm.org/D60113

llvm-svn: 357500
2019-04-02 17:42:17 +00:00
Vedant Kumar c6bceec01a [DebugInfo] Fix pr41180 : Loop Vectorization Debugify Failure
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180

In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.

The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.

A regression test has been added that covers these issues.

Patch by Orlando Cazalet-Hyams!

Differential Revision: https://reviews.llvm.org/D59944

llvm-svn: 357499
2019-04-02 17:28:34 +00:00
Craig Topper 0d3a533270 [X86] Allow FixupLEAs to form INC/DEC under OptSize not just MinSize
This matches our usual INC/DEC heuristic used during isel.

llvm-svn: 357497
2019-04-02 17:13:03 +00:00
Stefan Pintilie fa6cd5ceb9 [PowerPC] Fix reversed bit issue in DCMX mask for "xvtstdcdp" and "xvtstdcsp" P9 implementation
Did experiments on power 9 machine, checked the outputs for NaN & Infinity+
cases with corresponding DCMX bit set. Confirmed the DCMX mask bit for NaN and
infinity+ are reversed.

This patch fixes the issue.

Patch by Victor Huang.

Differential Revision: https://reviews.llvm.org/D59384

llvm-svn: 357494
2019-04-02 16:56:01 +00:00
Philip Reames d3d5d76a7b [WideableCond] Fix a nasty bug in detection of "explicit guards"
The code was failing to actually check for the presence of the call to widenable_condition.  The whole point of specifying the widenable_condition intrinsic was allowing widening transforms.  A normal branch is not widenable.  A normal branch leading to a deopt is not widenable (in general).

I added a test case via LoopPredication, but GuardWidening has an analogous bug.  Those are the only two passes actually using this utility just yet. Noticed while working on LoopPredication for non-widenable branches; POC in D60111.

llvm-svn: 357493
2019-04-02 16:51:43 +00:00
Jordan Rupprecht 017deaf1ae [llvm-objcopy] Change SHT_NOBITS to SHT_PROBITS for some --set-section-flags
Summary:
Some flags accepted by --set-section-flags and --rename-section can change a SHT_NOBITS section to a SHT_PROGBITS section. Note that none of them can change a SHT_PROGBITS to SHT_NOBITS.

The full list (found via experimentation of individually setting each flag) that does this is: contents, load, noload, code, data, rom, and debug.

This was found by testing llvm-objcopy with the gnu binutils test suite, specifically this test case: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=binutils/testsuite/binutils-all/copy-1.d;h=f2b0d9e90df738c2891b4d5c7b62f62894b556ca;hb=HEAD

Reviewers: jhenderson, grimar, jakehehrlich, alexshap, espindola

Reviewed By: jhenderson

Subscribers: emaste, arichardson, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59958

llvm-svn: 357492
2019-04-02 16:49:56 +00:00
Joseph Tremoulet fb4d9f7287 [SimplifyCFG] Don't split musttail call from ret
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.

Reviewers: fedor.sergeev, majnemer

Reviewed By: majnemer

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60080

llvm-svn: 357485
2019-04-02 15:48:58 +00:00
Taewook Oh 6a27c48be2 [SampleProfile] Repeat indirect call promotion only when the target is actually hot.
Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets.

Reviewers: danielcdh, wmi

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59940

llvm-svn: 357484
2019-04-02 15:48:21 +00:00
Joseph Tremoulet b69afa8e9b [PruneEH] Don't split musttail call from ret
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.

Reviewers: fedor.sergeev, majnemer

Reviewed By: majnemer

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60079

llvm-svn: 357483
2019-04-02 15:47:11 +00:00
Jonas Paulsson f76fe45426 [SystemZ] Improve instruction selection of 64 bit shifts and rotates.
For shift and rotate instructions that only use the last 6 bits of the shift
amount, a shift amount of (x*64-s) can be substituted with (-s). This saves
one instruction and a register:

  lhi     %r1, 64
  sr      %r1, %r3
  sllg    %r2, %r2, 0(%r1)
  =>
  lcr     %r1, %r3
  sllg    %r2, %r2, 0(%r1)

Review: Ulrich Weigand
llvm-svn: 357481
2019-04-02 15:36:30 +00:00
James Henderson 38cb238f75 [llvm-objcopy]Allow llvm-objcopy to be used on an ELF file with no section headers
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=41293 and
https://bugs.llvm.org/show_bug.cgi?id=41045. llvm-objcopy assumed that
it could always read a section header string table. This isn't the case
when the sections were previously all stripped, and the e_shstrndx field
was set to 0. This patch fixes this. It also fixes a double space in an
error message relating to this issue, and prevents llvm-objcopy from
adding extra space for non-existent section headers, meaning that
--strip-sections on the output of a previous --strip-sections run
produces identical output, simplifying the test.

Reviewed by: rupprecht, grimar

Differential Revision: https://reviews.llvm.org/D59989

llvm-svn: 357475
2019-04-02 14:11:13 +00:00
Simon Atanasyan 2634a141fd [mips] Use AltOrders to prevent using odd FP-registers
To disable using of odd floating-point registers (O32 ABI and
-mno-odd-spreg command line option) such registers and their
super-registers added to the set of reserved registers. In general, it
works. But there is at least one problem - in case of enabled machine
verifier pass some floating-point tests failed because live ranges of
register units that are reserved is not empty and verification pass
failed with "Live segment doesn't end at a valid instruction" error
message.

There is D35985 patch which tries to solve the problem by explicit
removing of register units. This solution did not get approval.

I would like to use another approach for prevent using odd floating
point registers - define `AltOrders` and `AltOrderSelect` for MIPS
floating point register classes. Such `AltOrders` contains reduced set
of registers. At first glance, such solution does not break any test
cases and allows enabling machine instruction verification for all MIPS
test cases.

Differential Revision: http://reviews.llvm.org/D59799

llvm-svn: 357472
2019-04-02 13:57:32 +00:00
Alex Bradbury f8078f6b1d [RISCV] Support assembling @plt symbol operands
This patch allows symbols appended with @plt to parse and assemble with the
R_RISCV_CALL_PLT relocation.

Differential Revision: https://reviews.llvm.org/D55335
Patch by Lewis Revill.

llvm-svn: 357470
2019-04-02 12:47:20 +00:00
Pavel Labath 3cee663e71 Add minidump support to obj2yaml
Summary:
This patch adds the code needed to parse a minidump file into the
MinidumpYAML model, and the necessary glue code so that obj2yaml can
recognise the minidump files and process them.

Reviewers: jhenderson, zturner, clayborg

Subscribers: mgorny, lldb-commits, amccarth, markmentovai, aprantl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59634

llvm-svn: 357469
2019-04-02 11:58:37 +00:00
Simon Pilgrim 64bd87ad4b [X86][AVX] Add test case showing failure to fold broadcast load if its also used as a scalar
llvm-svn: 357465
2019-04-02 10:31:00 +00:00
Sander de Smalen 7f23e0a62f Enforce StackID definition in PEI
There are various places in LLVM where the definition of StackID is not
properly honoured, for example in PEI where objects with a StackID > 0 are
allocated on the default stack (StackID0). This patch enforces that PEI
only considers allocating objects to StackID 0.

Reviewers: arsenm, thegameg, MatzeB

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60062

llvm-svn: 357460
2019-04-02 09:46:52 +00:00
Hans Wennborg b669fea42f SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.

That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.

Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 357452
2019-04-02 08:01:38 +00:00
Craig Topper 536383a354 [X86] Add test cases to fixup-lea.ll for optsize and no size optimization. Add +/-slow-incdec command lines
We only form inc/dec in FixupLEAs under minsize today, but all other locations in the compiler for inc/dec with optsize.

llvm-svn: 357446
2019-04-02 00:54:22 +00:00
Craig Topper c133015975 [X86] Autogenerate complete checks. NFC
llvm-svn: 357445
2019-04-02 00:54:15 +00:00
Matt Arsenault fa0a2c529b InstSimplify: Add missing case from r357386
llvm-svn: 357443
2019-04-02 00:46:19 +00:00
Michael Liao 9bef688bc2 [AMDGPU] Add more test cases of D59608.
Summary: - Add more test cases.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60071

llvm-svn: 357442
2019-04-02 00:36:37 +00:00
Matt Arsenault 294e07cf03 AMDGPU: Fix test filename
llvm-svn: 357441
2019-04-02 00:36:04 +00:00
Eli Friedman 3813fe0bda [ARM] Optimize expressions like "return x != 0;" for Thumb1.
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.

While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.

Differential Revision: https://reviews.llvm.org/D59616

llvm-svn: 357437
2019-04-02 00:01:23 +00:00
Eli Friedman 73af6ef2e7 [ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz.
It's a little tricky to make this issue show up because
prologue/epilogue emission normally likes to push at least two
registers... but it doesn't when lr is force-spilled due to function
length.  Not sure if that really makes sense, but I decided not to touch
it for now.

Differential Revision: https://reviews.llvm.org/D59385

llvm-svn: 357436
2019-04-01 23:55:57 +00:00
Jessica Paquette e44c20a68d [AArch64][GlobalISe] Select STRQui for stores into v264s instead of scalarizing
This improves selection for vector stores into v2s64s. Before we just
scalarized them, but we can just use a STRQui instead.

Differential Revision: https://reviews.llvm.org/D60083

llvm-svn: 357432
2019-04-01 22:19:13 +00:00
Yi Kong f2baddb0fc [llvm-objcopy] Add --keep-symbols option
Differential Revision: https://reviews.llvm.org/D60054

llvm-svn: 357418
2019-04-01 18:12:43 +00:00
Caroline Tice 2a67c91076 Commit accidentally omitted test case.
This test case was approved as part of
https://reviews.llvm.org/D49434, but was accidentally
omitted from the final commit.

llvm-svn: 357409
2019-04-01 16:29:40 +00:00
Philip Reames 05e3e554b4 [LoopPred] Be uniform about proving generated conditions
We'd been optimizing the case where the predicate was obviously true, do the same for the false case.  Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard.  Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful.

llvm-svn: 357408
2019-04-01 16:26:08 +00:00
Bixia Zheng 6c21ccd245 [NVPTX] Fix the codegen for llvm.round.
Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is not correct as llvm.round should round away from zero. This change
replaces llvm.round with a round away from zero implementation through
target specific custom lowering.

Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding CUDA runnable tests to check for the values produced by
llvm.round to test-suites/External/CUDA.

Reviewers: tra

Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59947

llvm-svn: 357407
2019-04-01 16:10:26 +00:00
Philip Reames d109e2a7c3 [LoopPred] Delete the old condition expressions if unused
LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around.  This would get cleaned up by other passes of course, but we might as well do it eagerly.  That also makes the test output less confusing.  

llvm-svn: 357406
2019-04-01 16:05:15 +00:00
Philip Reames 7eee62b5d4 [Tests] Autogen all the LoopPredication tests
I'm about to make some changes to the pass which cause widespread - but uninteresting - test diffs.  Prepare the tests for easy updating.

llvm-svn: 357404
2019-04-01 15:35:30 +00:00
Philip Reames 9ef7708bbb [Tests] Add tests for a possible loop predication transform variant
As highlighted by tests, if one of the operands is loop variant, but guaranteed to have the same value on all iterations, we have a missed oppurtunity.

llvm-svn: 357403
2019-04-01 15:32:07 +00:00
Neil Henning 0a30f33ce2 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

llvm-svn: 357400
2019-04-01 15:19:52 +00:00
David Spickett 3d233d5d4d [AArch64] Add v8.5-a Memory Tagging STZGM instruction
This instruction writes a block of allocation tags
and stores zero to the associated data locations.

It differs from STGM by 1 bit and has the same
arguments.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60065

llvm-svn: 357397
2019-04-01 14:56:37 +00:00
David Spickett 9142b8ef1b [AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.

The specfication can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60064

llvm-svn: 357395
2019-04-01 14:52:18 +00:00
Alex Bradbury da20f5ca74 [RISCV] Generate address sequences suitable for mcmodel=medium
This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
codemodel may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.

Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.

Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.

llvm-svn: 357393
2019-04-01 14:42:56 +00:00
David Spickett efe376add6 [AArch64] Add v8.5-a Memory Tagging GMID_EL1 register
The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

llvm-svn: 357392
2019-04-01 14:41:14 +00:00
Mikael Holmen 150a7ec2dc [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
Summary:
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

Reviewers: reames, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60058

llvm-svn: 357389
2019-04-01 14:10:10 +00:00
Mikael Holmen 3e527cd823 Revert "[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder"
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.

I'll recommit with a better commit message with reference to the
phabricator review.

llvm-svn: 357387
2019-04-01 14:06:45 +00:00
Matt Arsenault 0276b94356 InstSimplify: Add baseline test for upcoming change
llvm-svn: 357386
2019-04-01 14:03:44 +00:00
Mikael Holmen d66a47f90a [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

llvm-svn: 357385
2019-04-01 13:48:56 +00:00
Clement Courbet 7e062c9b1f [X86] Make post-ra scheduling macrofusion-aware.
Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59688

llvm-svn: 357384
2019-04-01 13:48:50 +00:00
Sanjay Patel 97d1bc4454 [InstCombine] eliminate commuted select-shuffles + binop (PR41304)
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:

LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2

PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.

As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.

Differential Revision: https://reviews.llvm.org/D60048

llvm-svn: 357382
2019-04-01 13:36:40 +00:00
Clement Courbet d9f6ee1c3c [X86MacroFusion][NFC] Add more tests.
In preparation for D59688.

llvm-svn: 357381
2019-04-01 13:18:34 +00:00
Krasimir Georgiev 7af32444b9 [X86] Fix a test from r357317
Summary:
The missing `<` causes the lld command to override the test file, which fails in
environments marking the test files as readonly.

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60060

llvm-svn: 357380
2019-04-01 11:42:54 +00:00
Simon Pilgrim e8c3136994 [X86][SSE] Add fcmp constant folding tests
Initial test coverage for D60006

llvm-svn: 357379
2019-04-01 10:54:04 +00:00
Luis Marques 3091884e25 [RISCV] Add seto pattern expansion
Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and 
`fcmp ord` would be inefficient due to an unoptimized double negation.

Differential Revision: https://reviews.llvm.org/D59699

llvm-svn: 357378
2019-04-01 09:54:14 +00:00
Alex Bradbury ca81a56f65 [RISCV] Don't evaluatePCRelLo if a relocation will be forced (e.g. due to linker relaxation)
A pcrel_lo will point to the associated pcrel_hi fixup which in turn points to
the real target. RISCVMCExpr::evaluatePCRelLo will work around this
indirection in order to allow the fixup to be evaluate properly. However, if
relocations are forced (e.g. due to linker relaxation is enabled) then its
evaluation is undesired and will result in a relocation with the wrong target.

This patch modifies evaluatePCRelLo so it will not try to evaluate if the
fixup will be forced as a relocation. A new helper method is added to
RISCVAsmBackend to query this.

Differential Revision: https://reviews.llvm.org/D59686

llvm-svn: 357374
2019-04-01 02:38:27 +00:00
Sanjay Patel 7ac1186b58 [InstCombine] add tests for inverted select-shuffles + binop (PR41304); NFC
llvm-svn: 357368
2019-03-31 15:45:47 +00:00
Sanjay Patel e1bc360fc6 [x86] allow movmsk with 2-element reductions
One motivation for making this change is that the lack of using movmsk is likely
a main source of perf difference between clang and gcc on the C-Ray benchmark as
shown here:
https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5
...but this change alone isn't enough to solve that problem.

The 'all-of' examples show what is likely the worst case trade-off: we end up with
an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of'
examples look clearly better using movmsk because we've traded 2 vector instructions
for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.

If we examine the llvm-mca output for these cases, it appears that even though the
'all-of' movmsk variant looks worse on paper, it would perform better on both
Haswell and Jaguar.

  $ llvm-mca -mcpu=haswell no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      504
  Total uOps:        400

  Dispatch Width:    4
  uOps Per Cycle:    0.79
  IPC:               0.79
  Block RThroughput: 1.0

  $ llvm-mca -mcpu=haswell movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      358
  Total uOps:        600

  Dispatch Width:    4
  uOps Per Cycle:    1.68
  IPC:               1.68
  Block RThroughput: 1.5

  $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      407
  Total uOps:        400

  Dispatch Width:    2
  uOps Per Cycle:    0.98
  IPC:               0.98
  Block RThroughput: 2.0

  $ llvm-mca -mcpu=btver2 movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      311
  Total uOps:        600

  Dispatch Width:    2
  uOps Per Cycle:    1.93
  IPC:               1.93
  Block RThroughput: 3.0

Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if
that's true, then we're also almost certainly making the wrong transform already for
reductions with >2 elements, so that should be fixed independently.

Differential Revision: https://reviews.llvm.org/D59997

llvm-svn: 357367
2019-03-31 15:11:34 +00:00
Sanjay Patel b276dd195a [InstCombine] canonicalize select shuffles by commuting
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.

Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.

We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.

So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.

Differential Revision: https://reviews.llvm.org/D60016

llvm-svn: 357366
2019-03-31 15:01:30 +00:00
Luqman Aden 7c67dbdc65 [NFC][InstCombine] Add tests for combining icmp of no-wrap sub w/ constant.
llvm-svn: 357360
2019-03-31 08:58:50 +00:00
Simon Pilgrim ec56621a5c [SystemZ] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @uweigand (Ulrich Weigand)

llvm-svn: 357355
2019-03-30 20:24:26 +00:00
Simon Pilgrim 513e6b9d58 [MIPS] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @atanasyan (Simon Atanasyan)

llvm-svn: 357354
2019-03-30 20:16:16 +00:00
Craig Topper e4a0fc7d75 [X86] Teach isel for RMW binops to handle negate
Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A)

Differential Revision: https://reviews.llvm.org/D60007

llvm-svn: 357353
2019-03-30 18:59:17 +00:00
Alex Bradbury 0b2803ee65 [RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs
This patch adds support for the RISC-V hard float ABIs, building on top of
rL355771, which added basic target-abi parsing and MC layer support. It also
builds on some re-organisations and expansion of the upstream ABI and calling
convention tests which were recently committed directly upstream.

A number of aspects of the RISC-V float hard float ABIs require frontend
support (e.g. flattening of structs and passing int+fp for fp+fp structs in a
pair of registers), and will be addressed in a Clang patch.

As can be seen from the tests, it would be worthwhile extending
RISCVMergeBaseOffsets to handle constant pool as well as global accesses.

Differential Revision: https://reviews.llvm.org/D59357

llvm-svn: 357352
2019-03-30 17:59:30 +00:00
Simon Pilgrim 10c9032c02 [X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316)
Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits.

llvm-svn: 357351
2019-03-30 17:12:29 +00:00
Alex Bradbury b5498cbf64 [RISCV] Add RV64 CHECK lines to test/CodeGen/RISCV/vararg.ll and prepare for hard float tests
vararg.ll previously missed RV64 tests. This patch also prepares for using
vararg.ll to test handling of varargs for the ilp32f/ilp32d/lp64f/lp64d hard
float ABIs. In these ABIs, varargs are passed as in either the ilp32 or lp64
ABI. Due to some slight codegen differences, different check lines are needed
for when RV32D is enabled.

llvm-svn: 357350
2019-03-30 15:53:38 +00:00
Simon Pilgrim cfdf09ba7d [X86][SSE] Add PAVG test case from PR41316
llvm-svn: 357346
2019-03-30 13:53:11 +00:00
Heejin Ahn c4ac74fb49 [WebAssembly] Fix unwind destination mismatches in CFG stackify
Summary:
Linearing the control flow by placing `try`/`end_try` markers can create
mismatches in unwind destinations. This patch resolves these mismatches
by wrapping those instructions with an incorrect unwind destination with
a nested `try`/`catch`/`end_try` and branching to the right destination
within the new catch block.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D48345

llvm-svn: 357343
2019-03-30 11:04:48 +00:00
Heejin Ahn e9fd9073e4 [WebAssembly] Run ExplicitLocals pass after CFGStackify
Summary:
While this does not change any final output, this will greatly simplify
ixing unwind destination mismatches in CFGStackify (D48345), because we
have to create some new registers there.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59652

llvm-svn: 357342
2019-03-30 09:29:57 +00:00
Alex Bradbury 9681b01c21 [RISCV] Add DAGCombine for (SplitF64 (ConstantFP x))
The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in a FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.

llvm-svn: 357341
2019-03-30 09:15:47 +00:00
Alex Bradbury 98b8ecde64 [RISCV][NFC] Remove floating point operations from test/CodeGen/RISCV/vararg.ll
This minimises differences in output when compiling with hardware floating
point support, which will be done in a future patch (to demonstrate the same
vararg calling convention is used).

llvm-svn: 357339
2019-03-30 05:24:42 +00:00
Heejin Ahn 7e7aad1510 [WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG
Summary:
Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for it is outside the loop. (We
can't merge these into one because this will creates another loop cycle
between blocks inside and blocks outside) This patch fixes this and
creates at most 2 routing blocks per entry.

This also renames variable `Split` to `Routing`, which I think is a bit
clearer.

Reviewers: kripken

Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59462

llvm-svn: 357337
2019-03-30 01:31:11 +00:00
Thomas Lively 5f0c4c67bb [WebAssembly] Add mutable globals feature
Summary:
This feature is not actually used for anything in the WebAssembly
backend, but adding it allows users to get it into the target features
sections of their objects, which makes these objects
future-compatible.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60013

llvm-svn: 357321
2019-03-29 22:00:18 +00:00
Sanjoy Das 32fd32bc6f [SCEV] Check the cache in get{S|U}MaxExpr before doing any work
Summary:
This lets us avoid e.g. checking if A >=s B in getSMaxExpr(A, B) if we've
already established that (A smax B) is the best we can do.

Fixes PR41225.

Reviewers: asbirlea

Subscribers: mcrosier, jlebar, bixia, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60010

llvm-svn: 357320
2019-03-29 22:00:12 +00:00
Alina Sbirlea f085cc5aa7 [MemorySSA] Limit clobber walks.
Summary: This patch limits all getClobberingMemoryAccess() walks to MaxCheckLimit.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59569

llvm-svn: 357319
2019-03-29 21:56:09 +00:00
Jessica Paquette d3ffd47df9 [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

llvm-svn: 357318
2019-03-29 21:39:36 +00:00
Amara Emerson d413f41de6 [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
We need XMM registers to handle varargs with the Win64 ABI. Before we would
silently generate bad code resulting in an assertion failure elsewhere in the
backend.

llvm-svn: 357317
2019-03-29 21:30:51 +00:00
Alina Sbirlea e589067e61 [MemorySSA] Don't optimize incomplete phis.
Summary:
MemoryPhis cannot be optimized out until they are complete.
Resolves PR41254.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59966

llvm-svn: 357315
2019-03-29 21:16:31 +00:00
Heejin Ahn 67f74aceab [WebAssembly] Handle END_LOOP in unreachable BB in CFGStackify
Summary:
This fixes crashes when a BB in which an END_LOOP is to be placed is
unreachable and does not have any predecessors. Fixes PR41307.

Reviewers: dschuff

Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60004

llvm-svn: 357303
2019-03-29 19:36:51 +00:00
Matt Arsenault 055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Simon Pilgrim d395bc1cc2 [Hexagon] Remove fcmp undef from reduced tests
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @kparzysz (Krzysztof Parzyszek)

llvm-svn: 357301
2019-03-29 19:14:52 +00:00
Craig Topper 103fbbbfca [X86] Add test cases showing failure to use RMW form of negate when only flags are used. NFC
llvm-svn: 357300
2019-03-29 19:09:37 +00:00
Simon Pilgrim 759cbee744 [SystemZ] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357295
2019-03-29 18:23:08 +00:00
Simon Pilgrim 05e2621342 [MIPS] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357294
2019-03-29 18:22:18 +00:00
Simon Pilgrim a3fb3d5583 [ARM] Regenerate execute-only float comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357293
2019-03-29 18:21:19 +00:00
Sanjay Patel 01c07b1a45 [InstCombine] autogenerate complete checks; NFC
llvm-svn: 357291
2019-03-29 17:51:39 +00:00
Sanjay Patel 2bff8b4272 [InstCombine] regenerate test checks; NFC
llvm-svn: 357288
2019-03-29 17:47:51 +00:00
Simon Pilgrim dee8a14389 [AArch64] Regenerate half precision tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357286
2019-03-29 17:46:06 +00:00
Nirav Dave fe59e14031 [DAGCombine] Prune unnused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Simon Pilgrim b4b98a528b [ARM] Regenerate vector comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357281
2019-03-29 17:35:11 +00:00
Simon Pilgrim 4e00a93558 [X86] Fix some tests using fcmp with undef arguments
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357278
2019-03-29 17:20:27 +00:00
Jordan Rupprecht 871baa2551 [llvm-readobj] Add some generic notes (e.g. NT_VERSION)
Summary: Support reading notes that don't have a standard note name.

Reviewers: MaskRay

Reviewed By: MaskRay

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59969

llvm-svn: 357271
2019-03-29 16:48:19 +00:00
Jordan Rupprecht 342aaa14b1 [llvm-readelf] Allow prefix flags for -p and -x
Summary: This allows syntax like `llvm-readelf -p.data1 -x.data2`.

Reviewers: jhenderson

Reviewed By: jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59965

llvm-svn: 357270
2019-03-29 16:43:13 +00:00
Simon Pilgrim 6a75c36ea9 [SLP] Add support for commutative icmp/fcmp predicates
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.

This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).

Differential Revision: https://reviews.llvm.org/D59992

llvm-svn: 357266
2019-03-29 15:28:25 +00:00
Simon Atanasyan f26f56d6d3 [mips] Fix lowering a signed immediate for *.d MSA instructions
The `lowerMSASplatImm` function zero-extends `i32` immediates while
building constant. If target type is `i64`, negative immediate loses
the sign. As a result, for example `__builtin_msa_ldi_d(-1)` lowered
to series of instruction loads incorrect value 0xffffffff to the `$w0`
register instead of single `ldi.d $w0, -1` instruction.

The fix zero-extends unsigned immediates and signed-extend signed
immediates.

Differential Revision: http://reviews.llvm.org/D59884

llvm-svn: 357264
2019-03-29 15:15:22 +00:00
Dmitry Preobrazhensky d6827ce3a3 [AMDGPU][MC] Corrected conversion rules for inlinable constants to match rules for literals
See bug 40806: https://bugs.llvm.org/show_bug.cgi?id=40806

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59786

llvm-svn: 357262
2019-03-29 14:50:20 +00:00
Sanjay Patel 12685d0f7c [DAGCombiner] simplify shuffle of shuffle
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).

We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).

It looks like we miss this pattern in IR too.

In one of the zext examples here, we have shuffle masks like this:

Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>

...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.

Differential Revision: https://reviews.llvm.org/D59961

llvm-svn: 357258
2019-03-29 14:20:38 +00:00
Nirav Dave 9259de217e [DAGCombine] Improve Lifetime node chains.
Improve both start and end lifetime nodes chain dependencies.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59795

llvm-svn: 357256
2019-03-29 14:09:47 +00:00
Sanjay Patel 665a385035 [DAGCombiner] fold sext into decrement
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.

  %z = zext i8 %x to i32
  %dec = add i32 %z, -1
  %r = sext i32 %dec to i64
  =>
  %z2 = zext i8 %x to i64
  %r = add i64 %z2, -1

https://rise4fun.com/Alive/kPP

The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.

But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.

llvm-svn: 357254
2019-03-29 13:49:08 +00:00
Hans Wennborg 800b12f90a Switch lowering: exploit unreachable fall-through when lowering case range cluster
In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.

  switch i32 %i, label %default [
    i32 1,  label %bb1
    i32 2,  label %bb1
    i32 3,  label %bb1
    i32 4,  label %bb2
    i32 5,  label %bb2
    i32 6,  label %bb2
  ]
  default: unreachable

llvm-svn: 357252
2019-03-29 13:40:05 +00:00
Sanjay Patel 881bcbe094 [x86] add tests for decrement+sext; NFC
llvm-svn: 357251
2019-03-29 13:34:48 +00:00
Dmitry Preobrazhensky 7f33574be3 [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes
See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59878

llvm-svn: 357249
2019-03-29 12:16:04 +00:00
Andrea Di Biagio e074ac60b4 [MCA] Add an experimental MicroOpQueue stage.
This patch adds an experimental stage named MicroOpQueueStage.
MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically,
a decoupling queue between 'decode' and 'dispatch').  Users can specify a queue
size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage
- can be used to simulate a different throughput from the decoders).

This stage is added to the default pipeline between the EntryStage and the
DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By
default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag
-micro-op-queue-size.

Throughput from the decoder can be simulated via another hidden flag named
-decoder-throughput.  That flag allows us to quickly experiment with different
frontend throughputs.  For targets that declare a loop buffer, flag
-decoder-throughput allows users to do multiple runs, each time simulating a
different throughput from the decoders.

This stage can/will be extended in future. For example, we could add a "buffer
full" event to notify bottlenecks caused by backpressure. flag
-decoder-throughput would probably go away if in future we delegate to another
stage (DecoderStage?) the simulation of a (potentially variable) throughput from
the decoders. For now, flag -decoder-throughput is "good enough" to run some
simple experiments.

Differential Revision: https://reviews.llvm.org/D59928

llvm-svn: 357248
2019-03-29 12:15:37 +00:00
Konstantin Zhuravlyov 2b766ed774 AMDGPU: Make sram-ecc off by default for Vega20
Differential Revision: https://reviews.llvm.org/D59718

llvm-svn: 357247
2019-03-29 12:04:18 +00:00
James Henderson 814ab373ac [llvm-readelf]Merge dynamic and static relocation printing to avoid code duplication
The majority of the printRelocation and printDynamicRelocation functions
were identical. This patch factors this all out into a new function.
There are a couple of minor differences to do with printing of symbols
without names, but I think these are harmless, and in some cases a small
improvement.

Reviewed by: grimar, rupprecht, Higuoxing

Differential Revision: https://reviews.llvm.org/D59823

llvm-svn: 357246
2019-03-29 11:47:19 +00:00
Simon Pilgrim aeaf7fcdde [X86] Add X86TargetLowering::isCommutativeBinOp override.
We currently just have test coverage for PMULUDQ - will add more in the future.

llvm-svn: 357244
2019-03-29 11:25:58 +00:00
Simon Pilgrim 62f0d1650a [SLP] Add support for swapping icmp/fcmp predicates to permit vectorization
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.

Differential Revision: https://reviews.llvm.org/D59956

llvm-svn: 357243
2019-03-29 10:41:00 +00:00
Kang Zhang 05f78b35ae [PowerPC] Add the support for __builtin_setrnd()
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2).

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D59405

llvm-svn: 357241
2019-03-29 08:45:24 +00:00
Matt Arsenault 5fddf09187 AMDGPU/GlobalISel: Insert waterfall loop for vector indexing
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.

llvm-svn: 357235
2019-03-29 03:54:56 +00:00
Zi Xuan Wu 1445b77e8c [PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place
A shift and add/sub sequence combination is faster in place of a multiply by constant. 
Because the cycle or latency of multiply is not huge, we only consider such following
worthy patterns.

```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```

And the cycles or latency is subtarget-dependent so that we need consider the
subtarget to determine to do or not do such transformation. 
Also data type is considered for different cycles or latency to do multiply.

Differential Revision: https://reviews.llvm.org/D58950

llvm-svn: 357233
2019-03-29 03:08:39 +00:00
Thomas Lively 3f34e1b883 [WebAssembly] Merge used feature sets, update atomics linkage policy
Summary:
It does not currently make sense to use WebAssembly features in some functions
but not others, so this CL adds an IR pass that takes the union of all used
feature sets and applies it to each function in the module. This allows us to
prevent atomics from being lowered away if some function has opted in to using
them. When atomics is not enabled anywhere, we detect whether there exists any
atomic operations or thread local storage that would be stripped and disallow
linking with objects that contain atomics if and only if atomics or tls are
stripped. When atomics is enabled, mark it as used but do not require it of
other objects in the link. These changes allow libraries that do not use atomics
to be built once and linked into both single-threaded and multithreaded
binaries.

Reviewers: aheejin, sbc100, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59625

llvm-svn: 357226
2019-03-29 00:14:01 +00:00
Jordan Rupprecht 1dc28b6d2b [llvm-readobj] Fix formatting of unknown note types
llvm-svn: 357221
2019-03-28 23:08:06 +00:00
Puyan Lotfi 6c82695753 [yaml2obj] Fixing opening empty yaml files.
Essentially echo "" | yaml2obj crashes. This patch attempts to trim whitespace
and determine if the yaml string in the file is empty or not. If the input is
empty then it will not properly print out an error message and return an error
code.

Differential Revision: https://reviews.llvm.org/D59964

A    test/tools/yaml2obj/empty.yaml
M    tools/yaml2obj/yaml2obj.cpp

llvm-svn: 357219
2019-03-28 22:55:08 +00:00
Florian Hahn 45682fd633 [LSR] Fix signed overflow in GenerateCrossUseConstantOffsets.
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.

Proof: https://rise4fun.com/Alive/Plqc

Reviewers: qcolombet, gilr, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59218

llvm-svn: 357217
2019-03-28 22:17:29 +00:00
Yonghong Song 360a4e2ca6 [BPF] add proper multi-dimensional array support
For multi-dimensional array like below
  int a[2][3];
the previous implementation generates BTF_KIND_ARRAY type
like below:
  . element_type: int
  . index_type: unsigned int
  . number of elements: 6

This is not the best way to represent arrays, esp.,
when converting BTF back to headers and users will see
  int a[6];
instead.

This patch generates proper support for multi-dimensional arrays.
For "int a[2][3]", the two BTF_KIND_ARRAY types will be
generated:
  Type #n:
    . element_type: int
    . index_type: unsigned int
    . number of elements: 3
  Type #(n+1):
    . element_type: #n
    . index_type: unsigned int
    . number of elements: 2

The linux kernel already supports such a multi-dimensional
array representation properly.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D59943

llvm-svn: 357215
2019-03-28 21:59:49 +00:00
Eli Friedman 3dd72ea810 [MC] Fix floating-point literal lexing.
This patch has three related fixes to improve float literal lexing:

1. Make AsmLexer::LexDigit handle floats without a decimal point more
   consistently.
2. Make AsmLexer::LexFloatLiteral print an error for floats which are
   apparently missing an "e".
3. Make APFloat::convertFromString use binutils-compatible exponent
   parsing.

Together, this fixes some cases where a float would be incorrectly
rejected, fixes some cases where the compiler would crash, and improves
diagnostics in some cases.

Patch by Brandon Jones.

Differential Revision: https://reviews.llvm.org/D57321

llvm-svn: 357214
2019-03-28 21:12:28 +00:00
Eli Friedman 96f295e23b [InterleavedAccessPass] Don't increase the number of bytes loaded.
Even if the interleaving transform would otherwise be legal, we shouldn't
introduce an interleaved load that is wider than the original load: it might
have undefined behavior.

It might be possible to perform some sort of mask-narrowing transform in
some cases (using a narrower interleaved load, then extending the
results using shufflevectors).  But I haven't tried to implement that,
at least for now.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 .

Differential Revision: https://reviews.llvm.org/D59954

llvm-svn: 357212
2019-03-28 20:44:50 +00:00
Simon Pilgrim ceb3de5d25 [SLP][X86] Add tests showing failure to commute icmp/fcmp by swapping predicate
By swapping icmp/fcmp predicates we can commute their operands to improve vectorization

llvm-svn: 357204
2019-03-28 19:13:38 +00:00
Simon Pilgrim 66b5e322fc [SLP][X86] Add tests showing failure to commute icmp/fcmp operands
Some predicates are fully commutative - we should be able to easily commute their operands to improve vectorization

llvm-svn: 357202
2019-03-28 19:03:53 +00:00
Craig Topper c25c9b4d16 [X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate
For 64-bit operations we should consider if the immediate can be made to fit
in an unsigned 32-bits immedate. For OR/XOR this allows us to load the immediate
with MOV32ri instead of movabsq. For AND this allows us to fold the immediate.

Differential Revision: https://reviews.llvm.org/D59867

llvm-svn: 357196
2019-03-28 18:05:37 +00:00
Petar Avramovic 1af05df3de [MIPS GlobalISel] Select float constants
Select 32 and 64 bit float constants for MIPS32.

Differential Revision: https://reviews.llvm.org/D59933

llvm-svn: 357183
2019-03-28 16:58:12 +00:00
Sanjay Patel ffa8d3def7 [DAGCombiner] fold sext into negation
As noted in D59818:
  %z = zext i8 %x to i32
  %neg = sub i32 0, %z
  %r = sext i32 %neg to i64
  =>
  %z2 = zext i8 %x to i64
  %r = sub i64 0, %z2

https://rise4fun.com/Alive/KzSR

llvm-svn: 357178
2019-03-28 15:46:02 +00:00
Sanjay Patel e781528278 [x86] add vector test for sext of negate; NFC
llvm-svn: 357177
2019-03-28 15:30:09 +00:00
Sanjay Patel 5bbf6f0bd8 [x86] avoid cmov in movmsk reduction
This is probably the least important of our movmsk problems, but I'm starting
at the bottom to reduce distractions.

We were creating a select_cc which bypasses the select and bitmask codegen
optimizations that we have now. If we produce a compare+negate instead, we
allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov.
There's no partial register update danger in these sequences because we always
produce the zero-register xor ahead of the 'set' if needed.

There seems to be a missing fold for sext of a bool bit here:

negl %ecx
movslq %ecx, %rax

...but that's an independent transform.

Differential Revision: https://reviews.llvm.org/D59818

llvm-svn: 357172
2019-03-28 14:16:13 +00:00
Clement Courbet 699dc025a6 [X86MacroFusion] Handle branch fusion (AMD CPUs).
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.

See D59688 for context.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59872

llvm-svn: 357171
2019-03-28 14:12:46 +00:00
Matt Arsenault a353fd572a AMDGPU: Make exec mask optimzations more resistant to block splits
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.

llvm-svn: 357170
2019-03-28 14:01:39 +00:00
Roman Lebedev c325be6cef [X86] AMD Piledriver (BdVer2): fine-tune some latencies
Based on llvm-exegesis measurements.

Now that llvm-exegesis is ~2 magnitudes faster, and is a bit smarter,
it is now possible to continue cleanup of the scheduler model.

With this, there are no more latency inconsistencies for the
opcodes that produce stable measurements, and only a few inconsistencies
for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis
measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.)

llvm-svn: 357169
2019-03-28 13:40:34 +00:00
Simon Pilgrim 38a0616c1d [DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.

Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.

Differential Revision: https://reviews.llvm.org/D59654

llvm-svn: 357161
2019-03-28 11:34:21 +00:00
Diana Picus 13ef0c5309 [ARM GlobalISel] Run regbankselect test for Thumb. NFCI
This should just work, since ARM mode and Thumb2 mode are at the same
level of support now and should map the same to GPR and FPR.

llvm-svn: 357159
2019-03-28 10:57:29 +00:00
George Rimar 4111299584 [yaml2obj][obj2yaml] - Teach yaml2obj/obj2yaml tools about STB_GNU_UNIQUE symbols.
yaml2obj/obj2yaml does not support the symbols with STB_GNU_UNIQUE yet.
Currently, obj2yaml fails with llvm_unreachable when met such a symbol.

I faced it when investigated the https://bugs.llvm.org/show_bug.cgi?id=41196.

Differential revision: https://reviews.llvm.org/D59875

llvm-svn: 357158
2019-03-28 10:52:14 +00:00
Pierre Gousseau a833c2bd3e [asan] Add options -asan-detect-invalid-pointer-cmp and -asan-detect-invalid-pointer-sub options.
This is in preparation to a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract.
Disabled by default as this is still an experimental feature.

Reviewed By: morehouse, vitalybuka

Differential Revision: https://reviews.llvm.org/D59220

llvm-svn: 357157
2019-03-28 10:51:24 +00:00
Florian Hahn e21ed594d8 [VPlan] Determine Vector Width programmatically.
With this change, the VPlan native path is triggered with the directive:

   #pragma clang loop vectorize(enable)

There is no need to specify the vectorize_width(N) clause.

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D57598

llvm-svn: 357156
2019-03-28 10:37:12 +00:00
Simon Pilgrim 22be913ac0 [X85][AVX] Add missing vXi16 broadcast fold patterns
Now that D59484 has landed its easier to add these.

Added missing AVX512BW v32i16 equivalents while I was at it.

llvm-svn: 357155
2019-03-28 10:25:13 +00:00
Diana Picus 52495c472f [ARM GlobalISel] Fix G_STORE with s1
G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
Zero out the undefined bits before writing.

llvm-svn: 357154
2019-03-28 09:09:36 +00:00
Diana Picus 4d512df300 [ARM GlobalISel] Fix selection of G_SELECT
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.

llvm-svn: 357153
2019-03-28 09:09:27 +00:00
Roman Lebedev c2423fe689 [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.

But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.

The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.

The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.

This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59820

llvm-svn: 357152
2019-03-28 08:55:01 +00:00
Piotr Sobczak f896785cb7 [SelectionDAG] Add 2 tests for selection across basic blocks
Summary:
Add tests for selection across basic block boundary:
 * one test containing a buffer load, where part of the offset
   computation is placed in the predecessor of the load
 * similar test, but containing two buffer loads and shared
   computations

Please note that the behaviour being tested will be updated in
a subsequent commit.

This commit was extracted from https://reviews.llvm.org/D59535.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59690

llvm-svn: 357149
2019-03-28 07:06:26 +00:00
Eric Christopher 0a2d0c1f5f Add reproduction instructions to llvm-objdump's embedded source test.
llvm-svn: 357142
2019-03-28 01:56:16 +00:00
Chandler Carruth 923ff550b9 [NewPM] Fix a nasty bug with analysis invalidation in the new PM.
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.

However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly *but* it coming off the worklist. Unclear how
important this use case really is, but I wanted to call it out.

Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.

The motivating test case now works cleanly and is added for
ArgumentPromotion.

Thanks for the review from Philip and Wei!

Differential Revision: https://reviews.llvm.org/D59869

llvm-svn: 357137
2019-03-28 00:51:36 +00:00
Craig Topper 929932954d [X86] Add test cases from PR27202.
llvm-svn: 357132
2019-03-27 23:12:19 +00:00
Sanjay Patel 1df0bb6264 [x86] improve AVX lowering of vector zext
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.

I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.

From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.

Differential Revision: https://reviews.llvm.org/D59777

llvm-svn: 357129
2019-03-27 22:42:11 +00:00
Daniel Sanders 495156dc6a test/CodeGen/X86/codegen-prepare-replacephi.mir requires a default triple
llvm-svn: 357122
2019-03-27 20:43:47 +00:00
Nirav Dave 6b741a8038 [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes
Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations.

Reviewers: courbet, jyknight

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59897

llvm-svn: 357121
2019-03-27 20:37:08 +00:00
Justin Bogner b1650f0da9 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)

This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.

Patch by Guillaume Marques. Thanks!

Differential Revision: https://reviews.llvm.org/D56201

llvm-svn: 357120
2019-03-27 20:35:56 +00:00
Evgeniy Stepanov 67646d0570 Fix llvm-rc tests.
Summary:
Follow-up for D56743.
* Add more "--" in llvm-rc invocations.
* Add llvm-rc to the tools list. This uses full path to llvm-rc in test
  RUN lines (llvm-lit -v), making them copy-pasteable.

Reviewers: mstorsjo, zturner

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59858

llvm-svn: 357118
2019-03-27 20:15:08 +00:00
Nirav Dave c6dfaa0e83 Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."
This patch appears to trigger very large compile time increases in
halide builds.

llvm-svn: 357116
2019-03-27 19:54:41 +00:00
Eli Friedman c388bfa230 [ARM] Don't confuse the scheduler for very large VLDMDIA etc.
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations.  This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .

Differential Revision: https://reviews.llvm.org/D59834

llvm-svn: 357109
2019-03-27 18:33:30 +00:00
Matt Arsenault 2e9ddcc30e RegPressure: Fix crash on blocks with only dbg_value
If there were only dbg_values in the block, recede would hit the
beginning of the block and try to use thet dbg_value as a real
instruction.

llvm-svn: 357105
2019-03-27 18:14:02 +00:00
Nikita Popov 7462303e06 [InstCombine] Use uadd.sat and usub.sat for canonicalization
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.

rL357012 already introduced use of uadd.sat for the add+umin pattern.

Differential Revision: https://reviews.llvm.org/D58872

llvm-svn: 357103
2019-03-27 17:56:15 +00:00
Amara Emerson 381188f1f3 [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions.
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).

This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.

As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.

Differential Revision: https://reviews.llvm.org/D59892

llvm-svn: 357101
2019-03-27 17:47:42 +00:00
Clement Courbet f8666b0649 [X86MacroFusion][NFC] Add a bulldozer test.
llvm-svn: 357099
2019-03-27 17:44:16 +00:00
Matt Arsenault 7b14b2425d Reapply "AMDGPU: Scavenge register instead of findUnusedReg"
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.

This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.

llvm-svn: 357098
2019-03-27 17:31:29 +00:00
Matt Arsenault 86e4fc0504 AMDGPU: Add testcase I meant to merge into r357093
llvm-svn: 357097
2019-03-27 17:31:26 +00:00
Craig Topper 7c9afc35bc [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.

When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.

This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.

Fixes PR41055

Differential Revision: https://reviews.llvm.org/D59391

llvm-svn: 357096
2019-03-27 17:29:34 +00:00