Commit Graph

61110 Commits

Author SHA1 Message Date
Jessica Paquette 0aa9b453c4 [GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s
This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.

We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from
selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.

This adds legalizer support for those 3, which gives us selection via the
importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and
add a GISel line to arm64-vabs.ll.

Differential Revision: https://reviews.llvm.org/D60881

llvm-svn: 358715
2019-04-18 21:15:48 +00:00
Jessica Paquette 3b5119c684 [GlobalISel][AArch64] Legalize v8s8 loads
Add legalizer support for loads of v8s8 and update legalize-load-store.mir.

Differential Revision: https://reviews.llvm.org/D60877

llvm-svn: 358714
2019-04-18 21:13:58 +00:00
Nico Weber a0ac65c98f llvm-undname: Fix two more asserts-on-invalid, found by oss-fuzz
llvm-svn: 358708
2019-04-18 19:52:32 +00:00
Nico Weber 502cf4bd19 llvm-undname: Fix two asserts-on-invalid
llvm-svn: 358707
2019-04-18 19:30:21 +00:00
Philip Reames 137995d8da [GuardWidening] Wire up a NPM version of the LoopGuardWidening pass
llvm-svn: 358704
2019-04-18 19:17:14 +00:00
Quentin Colombet ea3364bf85 [BlockExtractor] Extend the file format to support the grouping of basic blocks
Prior to this patch, each basic block listed in the extrack-blocks-file
would be extracted to a different function.

This patch adds the support for comma separated list of basic blocks
to form group.

When the region formed by a group is not extractable, e.g., not single
entry, all the blocks of that group are left untouched.

Let us see this new format in action (comments are not part of the
file format):
;; funcName bbName[,bbName...]
   foo      bb1        ;; Extract bb1 in its own function
   foo      bb2,bb3    ;; Extract bb2,bb3 in their own function
   bar      bb1,bb4    ;; Extract bb1,bb4 in their own function
   bar      bb2        ;; Extract bb2 in its own function

Assuming all regions are extractable, this will create one function and
thus one call per region.

Differential Revision: https://reviews.llvm.org/D60746

llvm-svn: 358701
2019-04-18 18:28:30 +00:00
Roland Froese a5dd08cac2 [PowerPC] Add some PPC vec cost tests to prep for D60160 NFC
llvm-svn: 358699
2019-04-18 18:12:09 +00:00
Simon Pilgrim 4171a91e92 [X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask
combineVectorTruncationWithPACKUS is currently splitting the upper bit bit masking into 128-bit subregs and then concatenating them back together.

This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356).

This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit.

My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great.

Differential Revision: https://reviews.llvm.org/D60375#inline-539623

llvm-svn: 358692
2019-04-18 17:23:09 +00:00
Philip Reames adf288c5d9 [LoopPred] Fix a blatantly obvious bug in r358684
The bug is that I didn't check whether the operand of the invariant_loads were themselves invariant.  I don't know how this got missed in the patch and review.  I even had an unreduced test case locally, and I remember handling this case, but I must have lost it in one of the rebases.  Oops.

llvm-svn: 358688
2019-04-18 17:01:19 +00:00
Sanjay Patel 51fa60bcbb [x86] add tests for improved insertelement to index 0 (PR41512); NFC
Patch proposal in D60852.

llvm-svn: 358687
2019-04-18 16:58:50 +00:00
Philip Reames 92a7177e6b [LoopPredication] Allow predication of loop invariant computations (within the loop)
The purpose of this patch is to eliminate a pass ordering dependence between LoopPredication and LICM. To understand the purpose, consider the following snippet of code inside some loop 'L' with IV 'i'
A = _a.length;
guard (i < A)
a = _a[i]
B = _b.length;
guard (i < B);
b = _b[i];
...
Z = _z.length;
guard (i < Z)
z = _z[i]
accum += a + b + ... + z;

Today, we need LICM to hoist the length loads, LoopPredication to make the guards loop invariant, and TrivialUnswitch to eliminate the loop invariant guard to establish must execute for the next length load. Today, if we can't prove speculation safety, we'd have to iterate these three passes 26 times to reduce this example down to the minimal form.

Using the fact that the array lengths are known to be invariant, we can short circuit this iteration. By forming the loop invariant form of all the guards at once, we remove the need for LoopPredication from the iterative cycle. At the moment, we'd still have to iterate LICM and TrivialUnswitch; we'll leave that part for later.

As a secondary benefit, this allows LoopPred to expose peeling oppurtunities in a much more obvious manner.  See the udiv test changes as an example.  If the udiv was not hoistable (i.e. we couldn't prove speculation safety) this would be an example where peeling becomes obviously profitable whereas it wasn't before.

A couple of subtleties in the implementation:
- SCEV's isSafeToExpand guarantees speculation safety (i.e. let's us expand at a new point).  It is not a precondition for expansion if we know the SCEV corresponds to a Value which dominates the requested expansion point.
- SCEV's isLoopInvariant returns true for expressions which compute the same value across all iterations executed, regardless of where the original Value is located.  (i.e. it can be in the loop)  This implies we have a speculation burden to prove before expanding them outside loops.
- invariant_loads and AA->pointsToConstantMemory are two cases that SCEV currently does not handle, but meets the SCEV definition of invariance.  I plan to sink this part into SCEV once this has baked for a bit.

Differential Revision: https://reviews.llvm.org/D60093

llvm-svn: 358684
2019-04-18 16:33:17 +00:00
Nicolai Haehnle 523f90a2ba [SDA] Bug fix: Use IPD outside the loop as divergence bound
Summary:
The immediate post dominator of the loop header may be part of the divergent loop.
Since this /was/ the divergence propagation bound the SDA would not detect joins of divergent paths outside the loop.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: mmasten, arsenm, jvesely, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59042

llvm-svn: 358681
2019-04-18 16:17:35 +00:00
Pavel Labath 7429d86f36 MinidumpYAML: Add support for ModuleList stream
Summary:
This patch adds support for yaml (de)serialization of the minidump
ModuleList stream. It's a fairly straight forward-application of the
existing patterns to the ModuleList structures defined in previous
patches.

One thing, which may be interesting to call out explicitly is the
addition of "new" allocation functions to the helper BlobAllocator
class. The reason for this was, that there was an emerging pattern of a
need to allocate space for entities, which do not have a suitable
lifetime for use with the existing allocation functions. A typical
example of that was the "size" of various lists, which is only available
as a temporary returned by the .size() method of some container. For
these cases, one can use the new set of allocation functions, which
will take a temporary object, and store it in an allocator-managed
buffer until it is written to disk.

Reviewers: amccarth, jhenderson, clayborg, zturner

Subscribers: lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60405

llvm-svn: 358672
2019-04-18 14:57:31 +00:00
Jordan Rupprecht 2b32902a88 [llvm-objcopy] Add -B mips
llvm-svn: 358667
2019-04-18 14:22:37 +00:00
George Rimar a630b34057 [yaml2elf/obj2yaml] - Allow normal parsing/dumping of the .rela.dyn section
.rela.dyn is a section that has sh_info normally
set to zero. And Info is an optional field in the description
of the relocation section in YAML.

But currently, yaml2obj would fail to produce the object when
Info is not explicitly listed.

The patch fixes the issue.

Differential revision: https://reviews.llvm.org/D60820

llvm-svn: 358656
2019-04-18 11:02:07 +00:00
Simon Pilgrim 8f87e53462 [X86][SSE] Lower ICMP EQ(AND(X,C),C) -> SRA(SHL(X,LOG2(C)),BW-1) iff C is power-of-2.
This replaces the MOVMSK combine introduced at D52121/rL342326

(movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))

with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra.

Differential Revision: https://reviews.llvm.org/D60625

llvm-svn: 358651
2019-04-18 09:58:59 +00:00
James Henderson 66a9d0f8c6 [llvm-objcopy][llvm-strip] Add switch to allow removing referenced sections
llvm-objcopy currently emits an error if a section to be removed is
referenced by another section. This is a reasonable thing to do, but is
different to GNU objcopy. We should allow users who know what they are
doing to have a way to produce the invalid ELF. This change adds a new
switch --allow-broken-links to both llvm-strip and llvm-objcopy to do
precisely that. The corresponding sh_link field is then set to 0 instead
of an error being emitted.

I cannot use llvm-readelf/readobj to test the link fields because they
emit an error if any sections, like the .dynsym, cannot be properly
loaded.

Reviewed by: rupprecht, grimar

Differential Revision: https://reviews.llvm.org/D60324

llvm-svn: 358649
2019-04-18 09:13:30 +00:00
Kang Zhang 009a21d2fd [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS()
Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177

When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.

This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
 `getScalarSizeInBits() / 8`.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60811

llvm-svn: 358644
2019-04-18 07:24:15 +00:00
Tim Renouf 7c55c8d8c3 [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0))
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.

Differential Revision: https://reviews.llvm.org/D60633

Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
2019-04-18 05:27:01 +00:00
Sanjay Patel fb363a778f [x86] try to widen 'shl' as part of LEA formation
The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ

%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0

...so if we can fold the shift op into LEA in the 1st pattern, then we
should be able to do the same in the 2nd pattern (unnecessary 'movzbl'
is a separate bug I think).

We don't want to do this any sooner though because that would conflict
with generic transforms that try to narrow the width of the shift.

Differential Revision: https://reviews.llvm.org/D60789

llvm-svn: 358622
2019-04-17 22:38:51 +00:00
Amara Emerson daf6e66ac5 [GlobalISel] Add legalization support for non-power-2 loads and stores
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.

This matches how SelectionDAG handles these operations.

Differential Revision: https://reviews.llvm.org/D59971

llvm-svn: 358613
2019-04-17 21:30:07 +00:00
Kit Barton 3cdf87940f Add basic loop fusion pass.
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
  1. Adjacent (no code between them)
  2. Control flow equivalent (if one loop executes, the other loop executes)
  3. Identical bounds (both loops iterate the same number of iterations)
  4. No negative distance dependencies between the loop bodies.

The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.

The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.

Differential Revision: https://reviews.llvm.org/D55851

llvm-svn: 358607
2019-04-17 18:53:27 +00:00
Nick Desaulniers a2077bab40 [AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.

Reviewers: peter.smith, echristo

Reviewed By: echristo

Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60803

llvm-svn: 358603
2019-04-17 18:22:48 +00:00
Steven Wu 05a358cdcd [ThinLTO] Fix ThinLTOCodegenerator to export llvm.used symbols
Summary:
Reapply r357931 with fixes to ThinLTO testcases and llvm-lto tool.

ThinLTOCodeGenerator currently does not preserve llvm.used symbols and
it can internalize them. In order to pass the necessary information to the
legacy ThinLTOCodeGenerator, the input to the code generator is
rewritten to be based on lto::InputFile.

Now ThinLTO using the legacy LTO API will requires data layout in
Module.

"internalize" thinlto action in llvm-lto is updated to run both
"promote" and "internalize" with the same configuration as
ThinLTOCodeGenerator. The old "promote" + "internalize" option does not
produce the same output as ThinLTOCodeGenerator.

This fixes: PR41236
rdar://problem/49293439

Reviewers: tejohnson, pcc, kromanova, dexonsmith

Reviewed By: tejohnson

Subscribers: ormris, bd1976llvm, mehdi_amini, inglorion, eraman, hiraditya, jkorous, dexonsmith, arphaman, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60421

llvm-svn: 358601
2019-04-17 17:38:09 +00:00
Nikita Popov 2039581002 [LVI][CVP] Constrain values in with.overflow branches
If a branch is conditional on extractvalue(op.with.overflow(%x, C), 1)
then we can constrain the value of %x inside the branch based on
makeGuaranteedNoWrapRegion(). We do this by extending the edge-value
handling in LVI. This allows CVP to then fold comparisons against %x,
as illustrated in the tests.

Differential Revision: https://reviews.llvm.org/D60650

llvm-svn: 358597
2019-04-17 16:57:42 +00:00
Dmitry Preobrazhensky 394d0a1637 [AMDGPU][MC] Corrected handling of "-" before expressions
See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60622

llvm-svn: 358596
2019-04-17 16:56:34 +00:00
Sanjay Patel 1964962b49 [ARM] tighten test checks; NFC
llvm-svn: 358594
2019-04-17 16:51:09 +00:00
Rhys Perry c2814e12e7 AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.

Reviewers: arsen, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60824

llvm-svn: 358592
2019-04-17 16:31:52 +00:00
Sanjay Patel 1f2c81af72 [ARM] make test checks more thorough; NFC
This will change with the proposal in D60214.
Unfortunately, the triple is not supported for auto-generation
via script, and the multiple RUN lines have diffs on this test,
but I can't tell exactly what is required by this test.
PR7162 was an assert/crash, so hopefully, this is good enough.

llvm-svn: 358587
2019-04-17 16:02:07 +00:00
Florian Hahn 893aea58ea [LoopUnroll] Allow unrolling if the unrolled size does not exceed loop size.
Summary:
In the following cases, unrolling can be beneficial, even when
optimizing for code size:
 1) very low trip counts
 2) potential to constant fold most instructions after fully unrolling.

We can unroll in those cases, by setting the unrolling threshold to the
loop size. This might highlight some cost modeling issues and fixing
them will have a positive impact in general.

Reviewers: vsk, efriedma, dmgreen, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D60265

llvm-svn: 358586
2019-04-17 15:57:43 +00:00
Dmitry Preobrazhensky 20d52e3aa2 [AMDGPU][MC] Corrected parsing of registers
See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60621

llvm-svn: 358581
2019-04-17 14:44:01 +00:00
Tim Renouf 59e8bd3093 [AMDGPU] Flag new raw/struct atomic ops as source of divergence
Differential Revision: https://reviews.llvm.org/D60731

Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d
llvm-svn: 358579
2019-04-17 14:04:31 +00:00
Simon Pilgrim 9daacec816 [CostModel][X86] Add bool anyof/allof reduction costs
On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest......

Differential Revision: https://reviews.llvm.org/D60403

llvm-svn: 358574
2019-04-17 10:58:19 +00:00
Jordan Rupprecht b0b65cae59 [llvm-objcopy] Support full list of bfd targets that lld uses.
Summary:
This change takes the full list of bfd targets that lld supports (see `ScriptParser.cpp`), including generic handling for `*-freebsd` targets (which uses the same settings but with a FreeBSD OSABI). In particular this adds mips support for `--output-target` (but not yet via `--binary-architecture`).

lld and llvm-objcopy use their own different custom data structures, so I'd prefer to check this in as-is (add support directly in llvm-objcopy, including all the test coverage) and do a separate NFC patch(s) that consolidate the two by putting this mapping into libobject.

See [[ https://bugs.llvm.org/show_bug.cgi?id=41462 | PR41462 ]].

Reviewers: jhenderson, jakehehrlich, espindola, alexshap, arichardson

Reviewed By: arichardson

Subscribers: fedor.sergeev, emaste, sdardis, krytarowski, atanasyan, llvm-commits, MaskRay, arichardson

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60773

llvm-svn: 358562
2019-04-17 07:42:31 +00:00
Roman Lebedev 0080645846 [CVP] processOverflowIntrinsic(): don't crash if constant-holding happened
As reported by Mikael Holmén in post-commit review in
https://reviews.llvm.org/D60791#1469765

llvm-svn: 358559
2019-04-17 06:35:07 +00:00
Craig Topper 5ca2e04c7a [X86] Autogenerate complete checks. NFC
llvm-svn: 358556
2019-04-17 06:09:16 +00:00
Eric Christopher e29874eaa0 Revert "Add basic loop fusion pass." Per request.
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358553
2019-04-17 04:55:24 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Kit Barton ab70da0728 Add basic loop fusion pass.
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
  1. Adjacent (no code between them)
  2. Control flow equivalent (if one loop executes, the other loop executes)
  3. Identical bounds (both loops iterate the same number of iterations)
  4. No negative distance dependencies between the loop bodies.

The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.

The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.

Phabricator: https://reviews.llvm.org/D55851
llvm-svn: 358543
2019-04-17 01:37:00 +00:00
Sanjay Patel d5bc5ca3e4 [x86] adjust LEA tests for better coverage; NFC
The scale can 1, 2, or 3.

llvm-svn: 358539
2019-04-16 23:10:41 +00:00
Sanjay Patel e08783e2f5 [EarlyCSE] detect equivalence of selects with inverse conditions and commuted operands (PR41101)
This is 1 of the problems discussed in the post-commit thread for:
rL355741 / http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190311/635516.html
and filed as:
https://bugs.llvm.org/show_bug.cgi?id=41101

Instcombine tries to canonicalize some of these cases (and there's room for improvement
there independently of this patch), but it can't always do that because of extra uses.
So we need to recognize these commuted operand patterns here in EarlyCSE. This is similar
to how we detect commuted compares and commuted min/max/abs.

Differential Revision: https://reviews.llvm.org/D60723

llvm-svn: 358523
2019-04-16 20:41:20 +00:00
Nikita Popov 52b24ee932 [CVP] Simplify umulo and smulo that cannot overflow
If a umul.with.overflow or smul.with.overflow operation cannot
overflow, simplify it to a simple mul nuw / mul nsw. After the
refactoring in D60668 this is just a matter of removing an
explicit check against multiplications.

Differential Revision: https://reviews.llvm.org/D60791

llvm-svn: 358521
2019-04-16 20:31:41 +00:00
Simon Pilgrim 82ffa88a04 [SLP] Refactoring of the operand reordering code.
This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.

This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59973

llvm-svn: 358519
2019-04-16 19:27:00 +00:00
Nikita Popov 5a30177906 [CVP] Add tests for non-overflowing mulo; NFC
Should be simplified to simple mul.

llvm-svn: 358517
2019-04-16 19:25:35 +00:00
Simon Pilgrim d769bb1e58 [X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops
Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register.

This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data.

Differential Revision: https://reviews.llvm.org/D60562

llvm-svn: 358516
2019-04-16 19:18:53 +00:00
Nikita Popov 5ecd6a48b9 [InstCombine] Prune fshl/fshr with masked operands
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.

In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.

Patch by Shawn Landden.

Differential Revision: https://reviews.llvm.org/D60660

llvm-svn: 358515
2019-04-16 19:05:49 +00:00
Nikita Popov f700081a7d [InstCombine] Add tests for fshl/fshr with masked operands; NFC
Baseline tests for D60660.

Patch by Shawn Landden.

Differential Revision: https://reviews.llvm.org/D60688

llvm-svn: 358514
2019-04-16 19:05:40 +00:00
Sanjay Patel f136c46bd6 [x86] add more tests for LEA formation; NFC
Promoting the shift to the wider type should allow LEA.

llvm-svn: 358513
2019-04-16 18:58:03 +00:00
Philip Reames c44b68e2b7 [Tests] Add branch_weights to latches so that test is not effected by future profitability patch to LoopPredication
llvm-svn: 358506
2019-04-16 16:32:59 +00:00
Fangrui Song 29cca27140 [llvm-objdump] Test tabs in disassemble-align.s with a more visible character
Summary: Apply rupprecht's suggestion in D60376

Reviewers: rupprecht

Reviewed By: rupprecht

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60777

llvm-svn: 358504
2019-04-16 15:58:42 +00:00
Luis Marques 20d2424016 [RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTS
When not optimizing for minimum size (-Oz) we custom lower wide shifts
(SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall.

Differential Revision: https://reviews.llvm.org/D59477

llvm-svn: 358498
2019-04-16 14:38:32 +00:00
Ulrich Weigand 452060ab87 [SystemZ] Add missing intrinsics to intrinsics-immarg.ll
As of r356091, support for the ImmArg intrinsics was added,
including a SystemZ test case.  However, that test case doesn't
actually verify all SystemZ intrinsics with immediate arguments,
only a subset.  The rest of them actually works correctly, there's
just no test for them.  This patch add all missing intrinsics.

llvm-svn: 358495
2019-04-16 14:35:18 +00:00
Nico Weber c035c243da llvm-undname: Fix nullptr deref on invalid structor names in template args
Similar to r358421: A StructorIndentifierNode has a Class field which
is read when printing it, but if the StructorIndentifierNode appears in
a template argument then demangleFullyQualifiedSymbolName() which sets
Class isn't called. Since StructorIndentifierNodes are always leaf
names, we can just reject them as well.

Found by oss-fuzz.

llvm-svn: 358491
2019-04-16 14:10:34 +00:00
Nico Weber aa18ae862d llvm-undname: Tweak arena allocator
- Make `allocUnalignedBuffer` look more like `allocArray` and `alloc`.
  No behavior change.
- Change `Head->Used < Head->Capacity` to `Head->Used <= Head->Capacity`
  in `allocArray` and `alloc`. No intended behavior change, might be a
  minuscule memory usage improvement. Noticed this since it was the logic
  used in `allocUnalignedBuffer`.
- Don't let `allocArray` alloc too small buffers for names that have
  more than 512 levels of nesting (in 64-bit builds). Fixes a heap
  buffer overflow found by oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60774

llvm-svn: 358489
2019-04-16 13:52:30 +00:00
Nico Weber 5961b0203a llvm-undname: add a missing CHECK: to a passing test
llvm-svn: 358488
2019-04-16 13:30:50 +00:00
Nico Weber ff92e715d3 Fix llvm-undname tests after r358485
llvm-svn: 358487
2019-04-16 13:18:51 +00:00
Hans Wennborg 21eb771dcb Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 358483
2019-04-16 12:13:25 +00:00
Hans Wennborg 6ae05777b8 Asan use-after-scope: don't poison allocas if there were untraced lifetime intrinsics in the function (PR41481)
If there are any intrinsics that cannot be traced back to an alloca, we
might have missed the start of a variable's scope, leading to false
error reports if the variable is poisoned at function entry. Instead, if
there are some intrinsics that can't be traced, fail safe and don't
poison the variables in that function.

Differential revision: https://reviews.llvm.org/D60686

llvm-svn: 358478
2019-04-16 07:54:20 +00:00
Fangrui Song fa860ff733 [llvm-objdump] Align instructions to a tab stop in disassembly output
This relands D60376/rL358405, with the difference: sed 'y/\t/ /' -> tr '\t' ' '
BSD sed doesn't support escape characters for the 'y' command.
I didn't use it in rL358405 because it was not listed at
https://llvm.org/docs/GettingStarted.html#software but it
should be available.

Original description:

In GNU objdump, -w/--wide aligns instructions in the disassembly output.
This patch does the same to llvm-objdump. However, we always use the
wide format (-w/--wide is ignored), because the narrow format
(instructions are misaligned) is probably not very useful.

In llvm-readobj, we made a similar decision: always use the wide format,
accept but ignore -W/--wide.

To save some columns, we change the tab before hex bytes (controlled by
--[no-]show-raw-insn) to a space.

llvm-svn: 358474
2019-04-16 03:56:55 +00:00
Fangrui Song 051a699ed6 [llvm-objdump] Simplify PrintHelpMessage() logic
This relands rL358418. It missed one test that should also use -macho
Note, all the other -private-header -exports-trie tests are used
together with -macho.

llvm-svn: 358472
2019-04-16 02:37:29 +00:00
Alex Lorenz d9d0c3e138 Revert r358405: "[llvm-objdump] Align instructions to a tab stop in disassembly output"
The test fails on darwin due to a sed error:

sed: 1: "y/\t/ /": transform strings are not the same length
llvm-svn: 358459
2019-04-15 22:36:12 +00:00
Amara Emerson 02a90ea73d [AArch64][GlobalISel] Don't do extending loads combine for non-pow-2 types.
Since non-pow-2 types are going to get split up into multiple loads anyway,
don't do the [SZ]EXTLOAD combine for those and save us trouble later in
legalization.

llvm-svn: 358458
2019-04-15 22:34:08 +00:00
Quentin Colombet fda0426888 [LSR] Rewrite misses some fixup locations if it splits critical edge
If LSR split critical edge during rewriting phi operands and
phi node has other pending fixup operands, we need to
update those pending fixups. Otherwise formulae will not be
implemented completely and some instructions will not be eliminated.

llvm.org/PR41445

Differential Revision: https://reviews.llvm.org/D60645

Patch by: Denis Bakhvalov <denis.bakhvalov@intel.com>

llvm-svn: 358457
2019-04-15 22:23:46 +00:00
Sanjay Patel 800a0c3e4b [EarlyCSE] add more tests for double-negated select condition; NFC
llvm-svn: 358454
2019-04-15 21:51:51 +00:00
Craig Topper 0495f29e42 [X86] Limit the 'x' inline assembly constraint to zmm0-15 when used for a 512 type.
The 'v' constraint is used to select zmm0-31. This makes 512 bit consistent with 128/256-bit.a

llvm-svn: 358450
2019-04-15 21:06:32 +00:00
Craig Topper 77439bb128 [X86] Fix a stack folding test to have a full xmm2-31 clobber list instead of stopping at xmm15. Add an additional dependency to keep instruction below inline asm block.
llvm-svn: 358449
2019-04-15 21:06:23 +00:00
Matt Arsenault 101abd219b AMDGPU: Fix unreachable when counting register usage of SGPR96
llvm-svn: 358447
2019-04-15 20:51:12 +00:00
Matt Arsenault fbdd2a1887 AMDGPU: Fix printed format of SReg_96
These are artificial, so I think this should only come up with inline
asm comments.

llvm-svn: 358446
2019-04-15 20:42:18 +00:00
Sanjay Patel 5ae05d810c [EarlyCSE] add test for select condition double-negation; NFC
llvm-svn: 358444
2019-04-15 20:25:31 +00:00
Alex Lorenz 16256123d0 Revert r358418: "[llvm-objdump] Simplify PrintHelpMessage() logic"
This reverts commit r358418 as it broke `test/Object/objdump-export-list`
on Darwin.

llvm-svn: 358443
2019-04-15 20:16:19 +00:00
Philip Reames af808ee2ee [Tests] Add a few more tests for LoopPredication w/invariant loads
Making sure to cover an important legality cornercase.

llvm-svn: 358439
2019-04-15 19:45:27 +00:00
Craig Topper 3d9b47c770 [X86] Block i32/i64 for 'k' and 'Yk' in getRegForInlineAsmConstraint without avx512bw.
32 and 64 bit k-registers require avx512bw. If we don't block this properly, it leads to a crash.

llvm-svn: 358436
2019-04-15 18:39:45 +00:00
Sanjay Patel 8ae68f2648 [x86] update test checks; NFC
llvm-svn: 358432
2019-04-15 17:38:47 +00:00
Wolfgang Pieb 4fe42214e2 [DEBUGINFO] Prevent Instcombine from dropping debuginfo when removing zexts
Zexts can be treated like no-op casts when it comes to assessing whether their
removal affects debug info.

Reviewer: aprantl

Differential Revision: https://reviews.llvm.org/D60641

llvm-svn: 358431
2019-04-15 17:36:29 +00:00
Don Hinton b85f74a283 [CommandLineParser] Add DefaultOption flag
Summary: Add DefaultOption flag to CommandLineParser which provides a
default option or alias, but allows users to override it for some
other purpose as needed.

Also, add `-h` as a default alias to `-help`, which can be seamlessly
overridden by applications like llvm-objdump and llvm-readobj which
use `-h` as an alias for other options.

(relanding after revert, r358414)
Added DefaultOptions.clear() to reset().

Reviewers: alexfh, klimek

Reviewed By: klimek

Subscribers: kristina, MaskRay, mehdi_amini, inglorion, dexonsmith, hiraditya, llvm-commits, jhenderson, arphaman, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D59746

llvm-svn: 358428
2019-04-15 17:18:10 +00:00
Craig Topper 8e364c680f [X86] Restore the pavg intrinsics.
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.

This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.

I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.

Differential Revision: https://reviews.llvm.org/D60674

llvm-svn: 358427
2019-04-15 17:17:35 +00:00
Sean Fertile 8d856488a8 Add slbfee instruction.
llvm-svn: 358425
2019-04-15 17:08:43 +00:00
Hiroshi Yamauchi 09e539fcae [PGO] Profile guided code size optimization.
Summary:
Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html 

Note it doesn't currently touch loop passes.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59514

llvm-svn: 358422
2019-04-15 16:49:00 +00:00
Nico Weber 64041d7b90 llvm-undname: Fix nullptr deref on invalid conversion operator names in template args
A ConversionOperatorIdentifierNode has a TargetType which is read when
printing it, but if the ConversionOperatorIdentifierNode appears in a
template argument there's nothing that can provide the TargetType.
Normally the COIN is a symbol (leaf) name and takes its TargetType from the
symbol's type, but in a template argument context the COIN can only be
either a non-leaf name piece or a type, and must hence be invalid.

Similar to the COIN check in demangleDeclarator().

Found by oss-fuzz.

llvm-svn: 358421
2019-04-15 16:42:44 +00:00
Sanjay Patel 0e0bb0e24a [EarlyCSE] add tests for selects with commuted operands (PR41101); NFC
llvm-svn: 358420
2019-04-15 16:01:05 +00:00
Philip Reames fbe64a2cfb [LoopPred] Hoist and of predicated checks where legal
If we have multiple range checks which can be predicated, hoist the and of the results outside the loop.  This minorly cleans up the resulting IR, but the main motivation is as a building block for D60093.

llvm-svn: 358419
2019-04-15 15:53:25 +00:00
Fangrui Song 204339a234 [llvm-objdump] Simplify PrintHelpMessage() logic
llvm-svn: 358418
2019-04-15 15:52:32 +00:00
Ilya Biryukov 70921d4a86 Revert r358337: "[CommandLineParser] Add DefaultOption flag"
The change causes test failures under asan. Reverting to unbreak our
integrate.

llvm-svn: 358414
2019-04-15 14:43:50 +00:00
Sanjay Patel c71433335a [EarlyCSE] regenerate test checks; NFC
llvm-svn: 358407
2019-04-15 14:02:37 +00:00
Fangrui Song b688a200e4 [llvm-objdump] Align instructions to a tab stop in disassembly output
Summary:
In GNU objdump, -w/--wide aligns instructions in the disassembly output.
This patch does the same to llvm-objdump. However, we always use the
wide format (-w/--wide is ignored), because the narrow format
(instructions are misaligned) is probably not very useful.

In llvm-readobj, we made a similar decision: always use the wide format,
accept but ignore -W/--wide.

To save some columns, we change the tab before hex bytes (controlled by
--[no-]show-raw-insn) to a space.

Reviewers: rupprecht, jhenderson, grimar

Reviewed By: jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60376

llvm-svn: 358405
2019-04-15 13:32:41 +00:00
Sanjay Patel 5e13cd2e61 [InstCombine] canonicalize fdiv after fmul if reassociation is allowed
(X / Y) * Z --> (X * Z) / Y

This can allow other optimizations/reassociations as shown in the test diffs.

llvm-svn: 358404
2019-04-15 13:23:38 +00:00
Eugene Leviant 4918738c07 [llvm-readelf] Correctly dump symbols whose section id is SHN_XINDEX
Differential revision: https://reviews.llvm.org/D60614

llvm-svn: 358396
2019-04-15 11:21:47 +00:00
Stephen Tozer 19bb1d5739 [llvm-readobj] Reapply: Improve error message for --string-dump
This is a resubmission of a previous patch that caused test failures,
with the fixes for the relevant tests included.

Fixes bug 40630: https://bugs.llvm.org/show_bug.cgi?id=40630

This patch changes the error message when the section specified by
--string-dump cannot be found by including the name of the section in
the error message and changing the prefix text to not imply that the
file itself was invalid. As part of this change some uses of
std::error_code have been replaced with the llvm Error class to better
encapsulate the error info (rather than passing File strings around),
and the WithColor class replaces string literal error prefixes.

llvm-svn: 358395
2019-04-15 11:17:48 +00:00
Simon Tatham 301ed1cb49 [TableGen] Include schedule model name in diagnostic.
If you have more than one schedule model in your TableGen target
definitions, then the diagnostic "No schedule information for
instruction 'foo'" is rather unhelpful, because it doesn't tell you
_which_ schedule model is missing the necessary information (or, as it
might be, missing the UnsupportedFeatures definition that would stop
it thinking it needed it).

Extended the message to include the name of the schedule model that
it's complaining about.

Reviewers: nhaehnle, hfinkel, javedabsar, efriedma, javed.absar

Reviewed By: javed.absar

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60559

llvm-svn: 358389
2019-04-15 10:06:26 +00:00
Yevgeny Rouban 3992e9d229 Codegen: Fixed perf branch_weights in couple of tests. NFC.
This is need to pass future checks of perf branch_weights metadata.

llvm-svn: 358384
2019-04-15 09:30:31 +00:00
Serguei Katkov f54328372b [NewPM] Add Option handling for SimplifyCFG
This patch enables passing options to SimplifyCFGPass via the passes pipeline.

Reviewers: chandlerc, fedor.sergeev, leonardchan, philip.pfaffe
Reviewed By: fedor.sergeev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60675

llvm-svn: 358379
2019-04-15 08:57:53 +00:00
Bjorn Pettersson 60569363a5 [SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.

This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.

Reviewers: spatel, RKSimon, nikic, lebedev.ri

Reviewed By: nikic

Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60460

llvm-svn: 358372
2019-04-15 07:19:11 +00:00
Craig Topper abd87ff48b [X86] Regenerate checks for domain-reassignment.mir
Apparently there are some stray IMPLICIT_DEF operations that weren't in the
checks. Not sure if they've always been there or something changed at some
point.

llvm-svn: 358371
2019-04-15 05:22:47 +00:00
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Nico Weber ae050d214b llvm-undname: Fix oss-fuzz-foudn crash-on-invalid with incomplete special table nodes
llvm-svn: 358367
2019-04-14 23:32:37 +00:00
Nico Weber 63fe2593ae llvm-undname: Fix another crash-on-invalid found by oss-fuzz
llvm-svn: 358363
2019-04-14 23:08:12 +00:00
Craig Topper 3c57976447 [X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG.
We had many tablegen patterns for these instructions. And due to the
commutability of the patterns, tablegen expands them to even more patterns. All
together VPTESTMD patterns accounted for more the 50K of the 610K isel table.
This had gotten bad when we stopped canonicalizing AND to vXi64. This required
a pattern for every combination of bitcast input type.

This change moves the matching to custom code where it is easier to look through
the bitcasts without being concerned with the specific types.

The test changes are because we are now stricter with one use checks as its
required to make load folding legal. We now require the AND and any BITCAST to
only have a single use. This prevents forming VPTESTM and a VPAND with the same
inputs.

We now support broadcast loads for 128/256 patterns without VLX. We'll widen to
512-bit like and still fold the broadcast since the amount of memory read
doesn't change.

There are a few tests that got slightly longer because are now prefering
load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were
able to share the XOR with multiple VPTESTM instructions.

llvm-svn: 358359
2019-04-14 18:26:11 +00:00
Craig Topper b17e5ec61b [X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use.
We're better of emitting a single compare + kand rather than a compare for the
other use and a masked compare.

I'm looking into using custom instruction selection for VPTESTM to reduce the
ridiculous number of permutations of patterns in the isel table. Putting a one
use check on all masked compare folding makes load fold matching in the custom
code easier.

llvm-svn: 358358
2019-04-14 18:26:06 +00:00
Craig Topper 476dd06854 [X86] Update bool_reduction_v8f32 test cases from vector-compare-any_of.ll and vector-compare-all_of.ll to be proper reductions.
One of the shuffles was used twice. While the intended shuffle wasn't connected.

llvm-svn: 358346
2019-04-14 04:20:42 +00:00
Philip Reames 0eeb2cd491 [Tests] Add tests for D60659, and make adjustments to others to make diff clear
Three related changes:
1) auto-gen several test files
2) Add the new tests at the bottom of said files
3) Adjust a couple of other test files not to use stores to constants when trying to test constexpr address handling

llvm-svn: 358344
2019-04-13 22:12:56 +00:00
Bill Wendling 191f1487b6 [X86] Use PC-relative mode for the kernel code model
Summary:
The Linux kernel uses PC-relative mode, so allow that when the code model is
"kernel".

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits, kees, nickdesaulniers

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60643

llvm-svn: 358343
2019-04-13 21:39:28 +00:00
Nikita Popov 040871db48 [CVP] Add tests for range of with.overflow result; NFC
Test range of with.overflow result in the no-overflow branch.

llvm-svn: 358341
2019-04-13 19:43:51 +00:00
Don Hinton 7d2021defc [CommandLineParser] Add DefaultOption flag
Summary: Add DefaultOption flag to CommandLineParser which provides a
default option or alias, but allows users to override it for some
other purpose as needed.

Also, add `-h` as a default alias to `-help`, which can be seamlessly
overridden by applications like llvm-objdump and llvm-readobj which
use `-h` as an alias for other options.

Reviewers: alexfh, klimek

Reviewed By: klimek

Subscribers: MaskRay, mehdi_amini, inglorion, dexonsmith, hiraditya, llvm-commits, jhenderson, arphaman, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D59746

llvm-svn: 358337
2019-04-13 16:55:28 +00:00
Nikita Popov 41e284b9c3 [CVP] Fix inverted predicates in test; NFC
Checked the wrong direction in the umul tests... fix predicated to
line up with the test name.

llvm-svn: 358331
2019-04-13 11:47:36 +00:00
Nikita Popov 25c1aa15a7 [CVP] Add tests for with.overflow used as condition; NFC
llvm-svn: 358330
2019-04-13 11:40:16 +00:00
Chen Zheng 87dd0e06dc [InstCombine] Canonicalize (-X srem Y) to -(X srem Y).
Differential Revision: https://reviews.llvm.org/D60647

llvm-svn: 358328
2019-04-13 09:21:22 +00:00
Chen Zheng fc59a0326b [InstCombine] [NFC] add testcases for canonicalizing (-X srem Y) to -(X srem Y).
llvm-svn: 358327
2019-04-13 07:34:55 +00:00
Philip Reames e03301a3b3 [StackMaps] Update llvm-readobj to parse V3 Stackmaps
This updates the StackMap parser in the llvm-readobj tool to parse version 3 StackMaps, which were bumped in https://reviews.llvm.org/D32629.

Version 3 StackMaps differ in that they have a uint16 sized "location size" field which was added to the Location block in a StackMap record. The record has additional padding for alignment. This was a backwards incompatible change resulting in a StackMap version bump.

Patch By: jacob.hughes@kcl.ac.uk (with a rewrite of tests by me)
Differential Revision: https://reviews.llvm.org/D59020

llvm-svn: 358325
2019-04-13 03:55:13 +00:00
Philip Reames eea989a909 [StackMaps] Add location size to llvm-readobj -stackmap output
The size field of a location can be different for each entry, so it is useful to have this displayed in the output of llvm-readobj -stackmap. Below is an example of how the output would look:

Record ID: 2882400000, instruction offset: 16
   3 locations:
     #1: Constant 1, size: 8
     #2: Constant 2, size: 8
     #3: Constant 3, size: 8
   0 live-outs: [ ]

Patch By: jacob.hughes@kcl.ac.uk (with heavy modification by me)
Differential Revision: https://reviews.llvm.org/D59169

llvm-svn: 358324
2019-04-13 03:08:45 +00:00
Amara Emerson 93e58d2396 [AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash.
This enables the simple copy combine that already exists in the CombinerHelper.
However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a
set of MIs to process, and so would end up causing a crash when deleted MIs were
being added to the combiner worklist again.

Differential Revision: https://reviews.llvm.org/D60579

llvm-svn: 358318
2019-04-13 00:33:25 +00:00
Thomas Lively fef8de66a6 [WebAssembly] Add DataCount section to object files
Summary:
This ensures that object files will continue to validate as
WebAssembly modules in the presence of bulk memory operations. Engines
that don't support bulk memory operations will not recognize the
DataCount section and will report validation errors, but that's ok
because object files aren't supposed to be run directly anyway.

Reviewers: aheejin, dschuff, sbc100

Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60623

llvm-svn: 358315
2019-04-12 22:27:48 +00:00
Amara Emerson bdb5e4e4ca [GlobalISel] Fix a crash when handling an invalid MVT during call lowering.
This crash was introduced in r358032 as we try to construct an EVT from an MVT
in order to find the register type for the calling conv. Fall back instead of
trying to do this with an invalid MVT coming from i256.

llvm-svn: 358314
2019-04-12 22:05:46 +00:00
Alina Sbirlea f9f073a861 [MemorySSA] Add previous def to cache when found, even if trivial.
Summary:
When inserting a new Def, MemorySSA may be have non-minimal number of Phis.
While inserting, the walk to find the previous definition may cleanup minimal Phis.
When the last definition is trivial to obtain, we do not cache it.

It is possible while getting the previous definition for a Def to get two different answers:
- one that was straight-forward to find when walking the first path (a trivial phi in this case), and
- another that follows a cleanup of the trivial phi, it determines it may need additional Phi nodes, it inserts them and returns a new phi in the same position as the former trivial one.
While the Phis added for the second path are all redundant, they are not complete (the walk is only done upwards), and they are not properly cleaned up afterwards.

A way to fix this problem is to cache the straight-forward answer we got on the first walk.
The caching is only kept for the duration of a getPreviousDef call, and for Phis we use TrackingVH, so removing the trivial phi will lead to replacing it with the next dominating phi in the cache.
Resolves PR40749.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60634

llvm-svn: 358313
2019-04-12 21:58:52 +00:00
Amara Emerson 2806fd01a1 [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element.
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.

llvm-svn: 358312
2019-04-12 21:31:21 +00:00
Thomas Lively 9e27514996 [WebAssembly] Add mutable-globals to bleeding-edge CPU
Summary: This brings the backend in line with Clang.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60594

llvm-svn: 358310
2019-04-12 20:39:53 +00:00
Alina Sbirlea 57769382b1 [MemorySSA] Small fix for the clobber limit.
Summary:
After introducing the limit for clobber walking, `walkToPhiOrClobber` would assert that the limit is at least 1 on entry.
The test included triggered that assert.

The callsite in `tryOptimizePhi` making the calls to `walkToPhiOrClobber` is structured like this:
```
while (true) {
   if (getBlockingAccess()) { // calls walkToPhiOrClobber
   }
   for (...) {
     walkToPhiOrClobber();
   }
}
```

The cleanest fix is to check if the limit was reached inside `walkToPhiOrClobber`, and give an allowence of 1.
This approach not make any alias() calls (no calls to instructionClobbersQuery), so the performance condition is enforced.
The limit is set back to 0 if not used, as this provides info on the fact that we stopped before reaching a true clobber.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60479

llvm-svn: 358303
2019-04-12 18:48:46 +00:00
Philip Reames b091cc081d [InstCombine] Fix a nasty miscompile introduced w/masked.gather demanded elts
This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372).

The problem is that the original patch removed pointer operands where the load results we're demanded, but without considering the legality of the load itself.  If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address.  The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an used register.  

llvm-svn: 358299
2019-04-12 18:26:56 +00:00
Nikita Popov 00a0d5d1de [CVP] Set NSW/NUW flags when simplifying with.overflow
When CVP determines that a with.overflow intrinsic cannot overflow,
it currently inserts a simple add/sub. As we already determined that
there can be no overflow, we should add the appropriate NUW/NSW flag.

Differential Revision: https://reviews.llvm.org/D60585

llvm-svn: 358298
2019-04-12 18:18:17 +00:00
Philip Reames 7a60cd38af [Tests] Checkin a test demonstrating a miscompile so that patch which fixes it shows a clear diff
llvm-svn: 358296
2019-04-12 18:11:58 +00:00
Lang Hames c7c1f21525 Simplify decoupling between RuntimeDyld/RuntimeDyldChecker, add 'got_addr' util.
This patch reduces the number of functions in the interface between RuntimeDyld
and RuntimeDyldChecker by combining "GetXAddress" and "GetXContent" functions
into "GetXInfo" functions that return a struct describing both the address and
content. The GetStubOffset function is also replaced with a pair of utilities,
GetStubInfo and GetGOTInfo, that fit the new scheme. For RuntimeDyld both of
these functions will return the same result, but for the new JITLink linker
(https://reviews.llvm.org/D58704) these will provide the addresses of PLT stubs
and GOT entries respectively.

For JITLink's use, a 'got_addr' utility has been added to the rtdyld-check
language, and the syntax of 'got_addr' and 'stub_addr' has been changed: both
functions now take two arguments, a 'stub container name' and a target symbol
name. For llvm-rtdyld/RuntimeDyld the stub container name is the object file
name and section name, separated by a slash. E.g.:

rtdyld-check: *{8}(stub_addr(foo.o/__text, y)) = y

For the upcoming llvm-jitlink utility, which creates stubs on a per-file basis
rather than a per-section basis, the container name is just the file name. E.g.:

jitlink-check: *{8}(got_addr(foo.o, y)) = y
llvm-svn: 358295
2019-04-12 18:07:28 +00:00
Brendon Cahoon 4df216cd62 [Hexagon] Fix reuse bug in Vector Loop Carried Reuse pass
The Hexagon Vector Loop Carried Reuse pass was allowing reuse between
two shufflevectors with different masks. The reason is that the masks
are not instruction objects, so the code that checks each operand
just skipped over the operands.

This patch fixes the bug by checking if the operands are the same
when they are not instruction objects. If the objects are not the
same, then the code assumes that reuse cannot occur.

Differential Revision: https://reviews.llvm.org/D60019

llvm-svn: 358292
2019-04-12 16:37:12 +00:00
Sanjay Patel 5e4ad39af7 [DAGCombiner] narrow shuffle of concatenated vectors
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)

The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.

The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).

Differential Revision: https://reviews.llvm.org/D60545

llvm-svn: 358291
2019-04-12 16:31:56 +00:00
Simon Pilgrim 6c8f4ada36 [X86][SSE] Recognise vXi1 boolean anyof/allof reduction patterns
Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions.

This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result.

This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation.

Differential Revision: https://reviews.llvm.org/D60610

llvm-svn: 358286
2019-04-12 14:22:57 +00:00
Hans Wennborg 4e6b857922 Revert r358268 "[DebugInfo] DW_OP_deref_size in PrologEpilogInserter."
It causes clang to crash while building Chromium. See https://crbug.com/952230
for reproducer.

> The PrologEpilogInserter need to insert a DW_OP_deref_size before
> prepending a memory location expression to an already implicit
> expression to avoid having the existing expression act on the memory
> address instead of the value behind it.
>
> The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
> big-endian targets need to read the right size as simply truncating a
> larger read would yield the wrong result (LSB bytes are not at the lower
> address).
>
> Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 358281
2019-04-12 12:54:52 +00:00
Eugene Leviant 88089fed9c [llvm-objcopy] Fill .symtab_shndx section correctly
Differential revision: https://reviews.llvm.org/D60555

llvm-svn: 358278
2019-04-12 11:59:30 +00:00
Kang Zhang 2446f843ae [PowerPC] Add initialization for some ppc passes
Summary:

Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60248

llvm-svn: 358271
2019-04-12 09:59:40 +00:00
Jeremy Morse 32afe6a1f8 [DebugInfo] Fix pr41175 Dead Store Elimination missing debug loc
Bug: https://bugs.llvm.org/show_bug.cgi?id=41175

In the bug test case the DSE pass is shortening the range of memory that a
memset is working on. A getelementptr is generated so that the new
starting address can be passed to memset. This instruction was not given
a DebugLoc.

To fix the bug, copy the DebugLoc from the memset instruction.

Patch by Orlando Cazalet-Hyams!

Differential Revision: https://reviews.llvm.org/D60556

llvm-svn: 358270
2019-04-12 09:47:35 +00:00
Markus Lavin 138c76129b [DebugInfo] DW_OP_deref_size in PrologEpilogInserter.
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.

The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).

Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 358268
2019-04-12 08:23:55 +00:00
Fangrui Song d5c404246f [ConstantFold] Don't evaluate FP or FP vector casts or truncations when simplifying icmp
Fix PR41476

llvm-svn: 358262
2019-04-12 07:34:30 +00:00
Eric Christopher b6926bdcff Revert "[PowerPC] Add initialization for some ppc passes"
This reverts commit 6f8f98ce8d as it
is breaking nearly every bot.

llvm-svn: 358260
2019-04-12 07:16:58 +00:00
Craig Topper 3b1239d2a8 [TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes.
If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding.

Differential Revision: https://reviews.llvm.org/D60358

llvm-svn: 358257
2019-04-12 06:49:28 +00:00
Kang Zhang 6f8f98ce8d [PowerPC] Add initialization for some ppc passes
Summary:

Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60248

llvm-svn: 358256
2019-04-12 06:35:15 +00:00
Zi Xuan Wu ac79ef8f0e [PowerPC] More precise exploitation of P9 maddld instruction when operands are constant
There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes
they are constant. If there is constant operand, it takes extra li to 
materialize the operand, and one more extra register too. So it's not 
profitable to use maddld to optimize mul-add pattern.

Differential Revision: https://reviews.llvm.org/D60181

llvm-svn: 358253
2019-04-12 05:21:31 +00:00
Nico Weber 03db625c13 llvm-undname: Fix out-of-bounds read on invalid intrinsic function code
Found by inspection.

llvm-svn: 358239
2019-04-11 23:11:33 +00:00
Nico Weber e5b62654a5 llvm-undname: Don't crash on incomplete enum tag manglings
Found by inspection.

llvm-svn: 358238
2019-04-11 22:59:25 +00:00
Nico Weber b4f33bbbb0 llvm-undname: Fix crash on incomplete virtual this adjusts
Found by oss-fuzz.

Also remove an else-after-return, this part has no behavior change.

llvm-svn: 358237
2019-04-11 22:47:18 +00:00
Nico Weber f2d8f09d5d llvm-undname: Fix crash on invalid name in a template parameter pointer to member arg
Found by oss-fuzz.

llvm-svn: 358234
2019-04-11 22:23:35 +00:00
Brendon Cahoon 57c3d4bed3 [Pipeliner] Fix incorrect loop carried dependence calculation
The isLoopCarriedDep function does not correctly compute loop
carried dependences when the array index offset is negative
or the stride is smallar than the access size.

Patch by Denis Antrushin.

Differential Revision: https://reviews.llvm.org/D60135

llvm-svn: 358233
2019-04-11 21:57:51 +00:00
Nikita Popov 6ffa1511ea [CVP] Generate full test checks for overflows.ll; NFC
llvm-svn: 358229
2019-04-11 21:10:39 +00:00
Rong Xu 959ef16859 [PGO] Better handling of profile hash mismatch
We currently assume profile hash conflicts will be caught by an upfront
check and we assert for the cases that escape the check. The assumption
is not always true as there are chances of conflict. This patch prints
a warning and skips annotating the function for the escaped cases,.

Differential Revision: https://reviews.llvm.org/D60154

llvm-svn: 358225
2019-04-11 20:54:17 +00:00
Amara Emerson 7e9355f870 [AArch64][GlobalISel] Flesh out vector load/store support for more types.
Some of these were legalizing into smaller vector types unnecessarily,
others were simply not supported yet.

llvm-svn: 358223
2019-04-11 20:40:01 +00:00
Amara Emerson b956051415 [AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers.
Loads and store of values with type like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.

This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.

Differential Revision: https://reviews.llvm.org/D60534

llvm-svn: 358221
2019-04-11 20:32:24 +00:00
Aaron Smith 994023a3f1 [DebugInfo] Combine Trivial and NonTrivial flags
Summary:
Companion to https://reviews.llvm.org/D59347


Reviewers: rnk, zturner, probinson, dblaikie, deadalnix

Subscribers: aprantl, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59348

llvm-svn: 358220
2019-04-11 20:25:10 +00:00
Craig Topper 68a5d619a4 [X86] Restrict vselect handling in scalarizeExtEltFP to only case to pre type legalization where the setcc result type is vXi1.
If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1.

The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand.

llvm-svn: 358218
2019-04-11 19:57:44 +00:00
Craig Topper a3635b94c4 [X86] Add 32-bit command line to extractelement-fp.ll so I can add a test case for a 32-bit only crasher. NFC
This is a bit ugly for ABI reasons about how floats/doubles are returned.

llvm-svn: 358217
2019-04-11 19:57:24 +00:00
Craig Topper 586fad50ac [X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead.
This patch adds patterns for turning bitcasted atomic load/store into movss/sd.

It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part.

This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled.

Differential Revision: https://reviews.llvm.org/D60394

llvm-svn: 358215
2019-04-11 19:19:52 +00:00
Craig Topper f7e548c076 Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
With correct test checks this time.

If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ

This matches what gcc and icc do for this case and removes an existing FIXME.

llvm-svn: 358214
2019-04-11 19:19:42 +00:00
Craig Topper 8200880c9a Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
I seem to have messed up the test checks.

llvm-svn: 358212
2019-04-11 19:04:38 +00:00
Craig Topper 1c2dfc3100 [X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2
If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness.

This matches what gcc and icc do for this case and removes an existing FIXME.

Differential Revision: https://reviews.llvm.org/D60156

llvm-svn: 358211
2019-04-11 18:40:21 +00:00
Craig Topper 1fe5a9963d [X86] Pre-commit i64 volatile test case for D60156. NFC
llvm-svn: 358210
2019-04-11 18:40:08 +00:00
Simon Pilgrim 8d083c5e0b [ConstantFold] ExtractConstantBytes - handle shifts on large integer types
Use APInt instead of getZExtValue from the ConstantInt until we can confirm that the shift amount is in range.

Reduced from OSS-Fuzz #14169 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14169

llvm-svn: 358192
2019-04-11 16:39:31 +00:00
Simon Pilgrim 40b647ae8e [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV3 mask support
Completes SimplifyDemandedVectorElts's basic variable shuffle mask support which should help D60512 + D60562 

llvm-svn: 358186
2019-04-11 15:29:15 +00:00
Serge Guelton 3742bb89f8 Make llvm-nm -help great again
Only display help from the llvm-nm category instead of all llvm options, which make it much more usable.
There's still an issue with -s, which is probably a bug in llvm::cl and worth another commit.

Differential Revision: https://reviews.llvm.org/D60411

llvm-svn: 358185
2019-04-11 15:22:48 +00:00
Roger Ferrer Ibanez b621f04135 [RISCV] Diagnose invalid second input register operand when using %tprel_add
RISCVMCCodeEmitter::expandAddTPRel asserts that the second operand must be
x4/tp. As we are not currently checking this in the RISCVAsmParser, the assert
is easy to trigger due to wrong assembly input.

This patch does a late check of this constraint.

An alternative could be using a singleton register class for x4/tp similar to
the current one for sp. Unfortunately it does not result in a good diagnostic.
Because add is an overloaded mnemonic, if no matching is possible, the
diagnostic of the first failing alternative seems to be used as the diagnostic
itself. This means that this case the %tprel_add is diagnosed as an invalid
operand (because the real add instruction only has 3 operands).

Differential Revision: https://reviews.llvm.org/D60528

llvm-svn: 358183
2019-04-11 15:13:12 +00:00
Simon Pilgrim a41275a398 [X86][AVX] Tweak X86ISD::VPERMV3 demandedelts test
Original test was too dependent on the order of the combines that could cause the inserted element being demanded after all

llvm-svn: 358182
2019-04-11 15:09:03 +00:00
Simon Pilgrim 34686b6e97 [X86][AVX] Add X86ISD::VPERMV3 demandedelts test
llvm-svn: 358175
2019-04-11 14:48:46 +00:00
Simon Pilgrim 8a25154fa7 [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV mask support
llvm-svn: 358174
2019-04-11 14:35:45 +00:00
Simon Pilgrim b237b54c2d [X86][AVX] Add X86ISD::VPERMV demandedelts test
llvm-svn: 358173
2019-04-11 14:26:32 +00:00
Sanjay Patel c0f4a35e68 [DAGCombiner][x86] scalarize inserted vector FP ops
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...

The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.

Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.

It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.

We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.

Differential Revision: https://reviews.llvm.org/D60514

llvm-svn: 358172
2019-04-11 14:21:57 +00:00
Diogo N. Sampaio 8ddfd46c61 [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64
Summary:  Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64

Reviewers: pbarrio, DavidSpickett, LukeGeeson

Reviewed By: LukeGeeson

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60259

llvm-svn: 358171
2019-04-11 14:19:43 +00:00
Simon Pilgrim 6f3866c6fb [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask support
llvm-svn: 358170
2019-04-11 14:15:01 +00:00
Simon Pilgrim 886e32e0f2 [X86][AVX] Add X86ISD::VPERMILPV demandedelts tests
llvm-svn: 358168
2019-04-11 14:09:35 +00:00
Simon Pilgrim cb5218ad48 [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask support
llvm-svn: 358167
2019-04-11 14:04:19 +00:00
Simon Pilgrim 7021dec26e [X86][XOP] Add X86ISD::VPERMIL2 demandedelts test
llvm-svn: 358166
2019-04-11 13:52:43 +00:00
Simon Pilgrim e468cc7f14 [X86] SimplifyDemandedVectorElts - add VPPERM support
We need to add support for all variable shuffle mask ops, but VPPERM is the only one that already has test coverage.

llvm-svn: 358165
2019-04-11 13:30:38 +00:00
Andrea Di Biagio 2050dff996 [MCA] Remove wrong comments from a test. NFC
llvm-svn: 358160
2019-04-11 10:15:04 +00:00
Roman Lebedev fbb823891d [llvm-exegesis] Fix serialization/deserialization of special NoRegister register (PR41448)
Summary:
A *lot* of instructions have this special register.
It seems this never really worked, but i finally noticed it only
because it happened to break for `CMOV16rm` instruction.

We serialized that register as "" (empty string), which is naturally
'ignored' during deserialization, so we re-create a `MCInst` with
too few operands.

And when we then happened to try to resolve variant sched class
for this mis-serialized instruction, and the variant predicate
tried to read an operand that was out of bounds since we got less operands,
we crashed.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=41448 | PR41448 ]].

Reviewers: craig.topper, courbet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60517

llvm-svn: 358153
2019-04-11 07:20:50 +00:00
Shiva Chen 7cc03bd064 [RISCV] Put data smaller than eight bytes to small data section
Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset
could covert most of the small data section. Linker relaxation could transfer
the multiple data accessing instructions to a gp base with signed twelve-bit
offset instruction.

Differential Revision: https://reviews.llvm.org/D57493

llvm-svn: 358150
2019-04-11 04:59:13 +00:00
Fangrui Song 6a285dfe71 [DWARF] Set discriminator to 0 for DW_LNS_copy
Summary:
Make DW_LNS_copy set the discriminator register to 0, to conform to
DWARF 4 & 5: "Then it sets the discriminator register to 0, and sets the
basic_block, prologue_end and epilogue_begin registers to false."

Because all of DW_LNE_end_sequence, DN_LNS_copy, and special opcodes reset
discriminator to 0, we can move discriminator=0 to appendRowToMatrix.

Also, make DW_LNS_copy print before appending the row, as it is similar
to a address+=0,line+=0 special opcode, which prints before appending
the row.

Reviewers: dblaikie, probinson, aprantl

Reviewed By: dblaikie

Subscribers: danielcdh, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60364

llvm-svn: 358148
2019-04-11 02:02:44 +00:00
Erik Pilkington cb5c7bd9eb Fix a hang when lowering __builtin_dynamic_object_size
If the ObjectSizeOffsetEvaluator fails to fold the object size call, then it may
litter some unused instructions in the function. When done repeatably in
InstCombine, this results in an infinite loop. Fix this by tracking the set of
instructions that were inserted, then removing them on failure.

rdar://49172227

Differential revision: https://reviews.llvm.org/D60298

llvm-svn: 358146
2019-04-10 23:42:11 +00:00
Amara Emerson 213e0bde04 [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.
The existing isel support already works for p0 once the legalizer accepts it.

llvm-svn: 358144
2019-04-10 23:06:14 +00:00
Amara Emerson a7ff111b04 [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.
llvm-svn: 358143
2019-04-10 23:06:11 +00:00
Amara Emerson ae878dab03 [AArch64][GlobalISel] Scalarize vector SDIV.
llvm-svn: 358142
2019-04-10 23:06:08 +00:00
Craig Topper 10048060f6 [X86] Add SSE1 command line to atomic-fp.ll and atomic-non-integer.ll. NFC
llvm-svn: 358141
2019-04-10 22:35:32 +00:00
Craig Topper a3ee7e2b3e [X86] Autogenerate complete checks. NFC
llvm-svn: 358140
2019-04-10 22:35:24 +00:00
Craig Topper 61f31cbcb2 [X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl
foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work.

This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part.

This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded.

Differential Revision: https://reviews.llvm.org/D60532

llvm-svn: 358139
2019-04-10 21:42:08 +00:00
Craig Topper 4a32ce39b7 [X86] Make _Int instructions the preferred instructon for the assembly parser and disassembly parser to remove inconsistencies between VEX and EVEX.
Many of our instructions have both a _Int form used by intrinsics and a form
used by other IR constructs. In the EVEX space the _Int versions usually cover
all the capabilities include broadcasting and rounding. While the other version
only covers simple register/register or register/load forms. For this reason
in EVEX, the non intrinsic form is usually marked isCodeGenOnly=1.

In the VEX encoding space we were less consistent, but usually the _Int version
was the isCodeGenOnly version.

This commit makes the VEX instructions match the EVEX instructions. This was
done by manually studying the AsmMatcher table so its possible I missed some
cases, but we should be closer now.

I'm thinking about using the isCodeGenOnly bit to simplify the EVEX2VEX
tablegen code that disambiguates the _Int and non _Int versions. Currently it
checks register class sizes and Record the memory operands come from. I have
some other changes I was looking into for D59266 that may break the memory check.

I had to make a few scheduler hacks to keep the _Int versions from being treated
differently than the non _Int version.

Differential Revision: https://reviews.llvm.org/D60441

llvm-svn: 358138
2019-04-10 21:29:41 +00:00
David Green deb3342018 [ARM] Add an extra test for constant hoist. NFC
llvm-svn: 358128
2019-04-10 19:18:58 +00:00
Craig Topper cacb70c94b [X86] Add test case for LEA formation regression seen with D60358. NFC
If we have an (add X, (and (aext (shl Y, C1)), C2)), we can pull the shift through and+aext to fold into an LEA with the.
Assuming C1 is small enough and C2 masks off all of the extend bits.

This pattern showed up in D60358. And we need to handle it to prevent a regression.

llvm-svn: 358124
2019-04-10 19:09:06 +00:00
Roman Lebedev 5d9f656bb7 [TableGen] Introduce !listsplat 'binary' operator
Summary:
```
``!listsplat(a, size)``
    A list value that contains the value ``a`` ``size`` times.
    Example: ``!listsplat(0, 2)`` results in ``[0, 0]``.
```

I plan to use this in X86ScheduleBdVer2.td for LoadRes handling.

This is a little bit controversial because unlike every other binary operator
the types aren't identical.

Reviewers: stoklund, javed.absar, nhaehnle, craig.topper

Reviewed By: javed.absar

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60367

llvm-svn: 358117
2019-04-10 18:26:36 +00:00
David Green 4e3fd7757a [ARM] Add an extra constant hoisting test. NFC
This adds a simple extra test for constant hoisting to show it's
usefulness with constant addresses like those seen in memory
mapped registers in embedded systems.

llvm-svn: 358114
2019-04-10 18:05:57 +00:00
David Green 0861c87b06 Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113
2019-04-10 18:00:41 +00:00
Nico Weber 5f6eb1817a llvm-undname: Fix another crash-on-invalid
This fixes a regression from https://reviews.llvm.org/D60354. We used to

  SymbolNode *Symbol = demangleEncodedSymbol(MangledName, QN);
  if (Symbol) {
    Symbol->Name = QN;
  }

but changed that to
  SymbolNode *Symbol = demangleEncodedSymbol(MangledName, QN);
  if (Error)
    return nullptr;
  Symbol->Name = QN;

and one branch somewhere returned a nullptr without setting Error.

Looking at the code changed in r340083 and r340710 that branch looks
like a remnant from an earlier attempt to demangle RTTI descriptors
that has since been rewritten -- so just remove this branch. It
shouldn't change behavior for correctly mangled symbols.

llvm-svn: 358112
2019-04-10 17:31:34 +00:00
Matt Arsenault 7187272b2b GlobalISel: Support legalizing G_CONSTANT with irregular breakdown
llvm-svn: 358109
2019-04-10 17:27:53 +00:00
Craig Topper 35fe07916a [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Matt Arsenault 9e0eeba569 GlobalISel: Handle odd breakdowns for bit ops
llvm-svn: 358105
2019-04-10 17:07:56 +00:00
Nikita Popov 0a8228fd28 [InstCombine] Handle ssubo always overflow
Following D60483 and D60497, this adds support for AlwaysOverflows
handling for ssubo. This is the last case we can handle right now.

Differential Revision: https://reviews.llvm.org/D60518

llvm-svn: 358100
2019-04-10 16:32:15 +00:00
Nikita Popov 7a543c3758 [InstCombine] ssubo X, C -> saddo X, -C
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D60061

llvm-svn: 358099
2019-04-10 16:27:36 +00:00
Simon Pilgrim 37d8d55823 [X86][AVX] getTargetConstantBitsFromNode - extract bits from X86ISD::SUBV_BROADCAST
llvm-svn: 358096
2019-04-10 16:24:47 +00:00
Nikita Popov ef23e88480 [InstCombine] Handle saddo always overflow
Followup to D60483: Handle AlwaysOverflow conditions for saddo as
well.

Differential Revision: https://reviews.llvm.org/D60497

llvm-svn: 358095
2019-04-10 16:18:01 +00:00
Diogo N. Sampaio aae424a2d2 [AArch64] Add lowering pattern for scalar fp16 facge and facgt
Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands.

Reviewers: olista01, javed.absar, pbarrio

Reviewed By: javed.absar

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60212

llvm-svn: 358083
2019-04-10 13:34:18 +00:00
Diogo N. Sampaio 651463e4a8 [ARM] [FIX] Add missing f16 vector operations lowering
Summary:
Add missing <8xhalf> shufflevectors pattern, when using concat_vector dag node.
As well, allows <8xhalf> and <4xhalf> vldup1 operations.

These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.

Reviewers: olista01, pbarrio, LukeGeeson, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60319

llvm-svn: 358081
2019-04-10 13:28:06 +00:00
Xing GUO 8ab7414580 [llvm-readobj] Should declare `ListScope` for `verneed` entries.
Summary: YAML mappings require keys to be unique. See: https://yaml.org/spec/1.2/spec.html#id2764652

Reviewers: jhenderson, grimar, rupprecht, espindola, ruiu

Reviewed By: ruiu

Subscribers: ruiu, emaste, arichardson, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60438

llvm-svn: 358078
2019-04-10 12:47:21 +00:00
David Stenberg b96943b6a0 [DebugInfo] Track multiple registers in DbgEntityHistoryCalculator
Summary:
When calculating the debug value history, DbgEntityHistoryCalculator
would only keep track of register clobbering for the latest debug value
per inlined entity. This meant that preceding register-described debug
value fragments would live on until the next overlapping debug value,
ignoring any potential clobbering. This patch amends
DbgEntityHistoryCalculator so that it keeps track of all registers that
a inlined entity's currently live debug values are described by.

The DebugInfo/COFF/pieces.ll test case has had to be changed since
previously a register-described fragment would incorrectly outlive its
basic block.

The parent patch D59941 is expected to increase the coverage slightly,
as it makes sure that location list entries are inserted after clobbered
fragments, and this patch is expected to decrease it, as it stops
preceding register-described from living longer than they should. All in
all, this patch and the preceding patch has a negligible effect on the
output from `llvm-dwarfdump -statistics' for a clang-3.4 binary built
using the RelWithDebInfo build profile. "Scope bytes covered" increases
by 0.5%, and "variables with location" increases from 2212083 to
2212088, but it should improve the accuracy quite a bit.

This fixes PR40283.

Reviewers: aprantl, probinson, dblaikie, rnk, bjope

Reviewed By: aprantl

Subscribers: llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D59942

llvm-svn: 358073
2019-04-10 11:28:28 +00:00
David Stenberg 5ffec6deef [DebugInfo] Improve handling of clobbered fragments
Summary:
Currently the DbgValueHistorymap only keeps track of clobbered registers
for the last debug value that it has encountered. This could lead to
preceding register-described debug values living on longer in the
location lists than they should. See PR40283 for an example.  This
patch does not introduce tracking of multiple registers, but changes
the DbgValueHistoryMap structure to allow for that in a follow-up
patch. This patch is not NFC, as it at least fixes two bugs in
DwarfDebug (both are covered in the new clobbered-fragments.mir test):

* If a debug value was clobbered (its End pointer set), the value would
  still be added to OpenRanges, meaning that the succeeding location list
  entries could potentially contain stale values.

* If a debug value was clobbered, and there were non-overlapping
  fragments that were still live after the clobbering, DwarfDebug would
  not create a location list entry starting directly after the
  clobbering instruction. This meant that the location list could have
  a gap until the next debug value for the variable was encountered.

Before this patch, the history map was represented by <Begin, End>
pairs, where a new pair was created for each new debug value. When
dealing with partially overlapping register-described debug values, such
as in the following example:

  DBG_VALUE $reg2, $noreg, !1, !DIExpression(DW_OP_LLVM_fragment, 32, 32)
  [...]
  DBG_VALUE $reg3, $noreg, !1, !DIExpression(DW_OP_LLVM_fragment, 64, 32)
  [...]
  $reg2 = insn1
  [...]
  $reg3 = insn2

the history map would then contain the entries `[<DV1, insn1>, [<DV2, insn2>]`.
This would leave it up to the users of the map to be aware of
the relative order of the instructions, which e.g. could make
DwarfDebug::buildLocationList() needlessly complex. Instead, this patch
makes the history map structure monotonically increasing by dropping the
End pointer, and replacing that with explicit clobbering entries in the
vector. Each debug value has an "end index", which if set, points to the
entry in the vector that ends the debug value. The ending entry can
either be an overlapping debug value, or an instruction which clobbers
the register that the debug value is described by. The ending entry's
instruction can thus either be excluded or included in the debug value's
range. If the end index is not set, the debug value that the entry
introduces is valid until the end of the function.

Changes to test cases:

 * DebugInfo/X86/pieces-3.ll: The range of the first DBG_VALUE, which
   describes that the fragment (0, 64) is located in RDI, was
   incorrectly ended by the clobbering of RAX, which the second
   (non-overlapping) DBG_VALUE was described by. With this patch we
   get a second entry that only describes RDI after that clobbering.

 * DebugInfo/ARM/partial-subreg.ll: This test seems to indiciate a bug
   in LiveDebugValues that is caused by it not being aware of fragments.
   I have added some comments in the test case about that. Also, before
   this patch DwarfDebug would incorrectly include a register-described
   debug value from a preceding block in a location list entry.

Reviewers: aprantl, probinson, dblaikie, rnk, bjope

Reviewed By: aprantl

Subscribers: javed.absar, kristof.beyls, jdoerfert, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D59941

llvm-svn: 358072
2019-04-10 11:28:20 +00:00
Diana Picus b6e83b98f9 [ARM GlobalISel] Select G_FCONSTANT for VFP3
Make it possible to TableGen code for FCONSTS and FCONSTD.

We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.

There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.

llvm-svn: 358063
2019-04-10 09:14:32 +00:00
Diana Picus 3533ad6801 [ARM GlobalISel] Select G_FCONSTANT into pools
Put all floating point constants into constant pools and load their
values from there.

llvm-svn: 358062
2019-04-10 09:14:24 +00:00
Diana Picus 165846b031 [ARM GlobalISel] Map G_FCONSTANT
llvm-svn: 358061
2019-04-10 09:14:16 +00:00
David Stenberg fab4bdf4b9 Add REQUIRES: asserts to test using -debug-only
llvm-svn: 358057
2019-04-10 08:44:57 +00:00
Florian Hahn db1a69c250 [VPLAN] Minor improvement to testing and debug messages.
1. Use computed VF for stress testing.
2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D59952

llvm-svn: 358056
2019-04-10 08:17:28 +00:00
Nikita Popov 09020ec2a7 [InstCombine] Handle usubo always overflow
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).

Differential Revision: https://reviews.llvm.org/D60483

llvm-svn: 358052
2019-04-10 07:10:53 +00:00
Chen Zheng 5e13ff1da2 [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y).
Differential Revision: https://reviews.llvm.org/D60395

llvm-svn: 358050
2019-04-10 06:52:09 +00:00
Akira Hatanaka 9ca9d32b6b [ObjC][ARC] Convert the retainRV marker that is passed as a named
metadata into a module flag in the auto-upgrader and make the ARC
contract pass read the marker as a module flag.

This is needed to fix a bug where ARC contract wasn't inserting the
retainRV marker when LTO was enabled, which caused objects returned
from a function to be auto-released.

rdar://problem/49464214

Differential Revision: https://reviews.llvm.org/D60303

llvm-svn: 358047
2019-04-10 06:20:20 +00:00
Craig Topper 391d5caa10 [X86] Move the 2 byte VEX optimization for MOV instructions back to the X86AsmParser::processInstruction where it used to be. Block when {vex3} prefix is present.
Years ago I moved this to an InstAlias using VR128H/VR128L. But now that we support {vex3} pseudo prefix, we need to block the optimization when it is set to match gas behavior.

llvm-svn: 358046
2019-04-10 05:43:20 +00:00
Jim Lin a49c95e02a [Sparc] Fix incorrect MI insertion position for spilling f128.
Summary:
Obviously, new built MI (sethi+add or sethi+xor+add) for constructing large offset
should be inserted before new created MI for storing even register into memory.
So the insertion position should be *StMI instead of II.

before fixed:

std %f0, [%g1+80]
sethi 4, %g1        <<<
add %g1, %sp, %g1   <<< this two instructions should be put before "std %f0, [%g1+80]".
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

after fixed:

sethi 4, %g1
add %g1, %sp, %g1
std %f0, [%g1+80]
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

Reviewers: venkatra, jyknight

Reviewed By: jyknight

Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60397

llvm-svn: 358042
2019-04-10 01:56:32 +00:00
Craig Topper 9ca3a95f79 [X86] Support the EVEX versions vcvt(t)ss2si and vcvt(t)sd2si with the {evex} pseudo prefix in the assembler.
The EVEX versions are ambiguous with the VEX versions based on operands alone so we had explicitly dropped
them from the AsmMatcher table. Unfortunately, when we add them they incorrectly show in the table before
their VEX counterparts. This is different how the prioritization normally works.

To fix this we have to explicitly reject the instructions unless the {evex} prefix has been seen.

llvm-svn: 358041
2019-04-10 01:29:59 +00:00
Amara Emerson 9bf092d719 [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.

For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.

Differential Revision: https://reviews.llvm.org/D60436

llvm-svn: 358035
2019-04-09 21:22:43 +00:00
Amara Emerson 888dd5d198 [AArch64][GlobalISel] Legalize vector G_ICMP.
Selection support will be coming in a later patch.

Differential Revision: https://reviews.llvm.org/D60435

llvm-svn: 358034
2019-04-09 21:22:40 +00:00
Amara Emerson 92d74f19cf [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.
This is needed for some future support for vector ICMP.

Differential Revision: https://reviews.llvm.org/D60433

llvm-svn: 358033
2019-04-09 21:22:37 +00:00
Amara Emerson 2b523f8162 [GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032
2019-04-09 21:22:33 +00:00
Nikita Popov c176b708e4 [InstCombine] Add with.overflow always overflow tests; NFC
The uadd and umul cases are currently handled, the usub, sadd, ssub
and smul cases are not. usub, sadd and ssub already have the
necessary ValueTracking support, smul doesn't.

llvm-svn: 358031
2019-04-09 20:02:23 +00:00
Craig Topper ba55a40fd0 [AArch64] Add test case to show missed opportunity to remove a shift before tbnz when the shift has been zero extended from i32 to i64. NFC
This pattern showed up in D60358 and it was suggested I had a test and fix that separately.

llvm-svn: 358030
2019-04-09 19:23:37 +00:00
Craig Topper 8e2871cd2c [X86] Add support for {vex2}, {vex3}, and {evex} to the assembler to match gas. Use {evex} to improve the one our 32-bit AVX512 tests.
These can be used to force the encoding used for instructions.

{vex2} will fail if the instruction is not VEX encoded, but otherwise won't do anything since we prefer vex2 when possible. Might need to skip use of the _REV MOV instructions for this too, but I haven't done that yet.

{vex3} will force the instruction to use the 3 byte VEX encoding or fail if there is no VEX form.

{evex} will force the instruction to use the EVEX version or fail if there is no EVEX version.

Differential Revision: https://reviews.llvm.org/D59266

llvm-svn: 358029
2019-04-09 18:45:15 +00:00
Craig Topper 61e77b11d1 [DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate.
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.

Differential Revision: https://reviews.llvm.org/D60020

llvm-svn: 358027
2019-04-09 18:33:56 +00:00
Nikita Popov 2f5e9de8d1 Revert "[InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y)."
This reverts commit 1383a91689.

sdiv-canonicalize.ll fails after this revision. The fold needs to be
moved outside the branch handling constant operands. However when this
is done there are further test changes, so I'm reverting this in the
meantime.

llvm-svn: 358026
2019-04-09 18:32:38 +00:00
Nikita Popov 4b2323d1a3 [ValueTracking] Use computeConstantRange() for signed sub overflow determination
This is the same change as D60420 but for signed sub rather than
signed add: Range information is intersected into the known bits
result, allows to detect more no/always overflow conditions.

Differential Revision: https://reviews.llvm.org/D60469

llvm-svn: 358020
2019-04-09 17:01:49 +00:00
Simon Pilgrim d7cc0ec581 [TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support
llvm-svn: 358019
2019-04-09 16:52:21 +00:00
Chen Zheng 1383a91689 [InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y).
Differential Revision: https://reviews.llvm.org/D60395

llvm-svn: 358017
2019-04-09 16:34:31 +00:00
Stanislav Mekhanoshin 913ba8eeb4 Revert LIS handling in MachineDCE
One of out of tree targets has regressed with this patch. Reverting
it for now and let liveness to be fully reconstructed in case pass
was used after the LIS is created to resolve the regression.

Differential Revision: https://reviews.llvm.org/D60466

llvm-svn: 358015
2019-04-09 16:13:53 +00:00
Nikita Popov 10edd2b79d [ValueTracking] Use computeConstantRange() in signed add overflow determination
This is D59386 for the signed add case. The computeConstantRange()
result is now intersected into the existing known bits information,
allowing to detect additional no-overflow/always-overflow conditions
(though the latter isn't used yet).

This (finally...) covers the motivating case from D59071.

Differential Revision: https://reviews.llvm.org/D60420

llvm-svn: 358014
2019-04-09 16:12:59 +00:00
Sanjay Patel 49d9d17a77 [InstCombine] prevent possible miscompile with sdiv+negate of vector op
Similar to:
rL358005

Forego folding arbitrary vector constants to fix a possible miscompile bug.
We can enhance the transform if we do want to handle the more complicated
vector case.

llvm-svn: 358013
2019-04-09 15:13:03 +00:00
Sanjay Patel d5173f5acf [InstCombine] add tests for sdiv with negated dividend and constant divisor; NFC
llvm-svn: 358010
2019-04-09 14:48:44 +00:00
Sanjay Patel 7563b65ad4 [InstCombine] add tests for sdiv-by-int-min; NFC
llvm-svn: 358008
2019-04-09 14:27:07 +00:00
Sanjay Patel d469954d61 [InstCombine] auto-generate complete test checks; NFC
llvm-svn: 358007
2019-04-09 14:27:03 +00:00
Sanjay Patel f62dcea7ed [InstCombine] prevent possible miscompile with negate+sdiv of vector op
// 0 - (X sdiv C)  -> (X sdiv -C)  provided the negation doesn't overflow.

This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.

Differential Revision: https://reviews.llvm.org/D60426

llvm-svn: 358005
2019-04-09 14:09:06 +00:00
Sanjay Patel a230bb5fc0 [InstCombine] add tests/comments for negate+sdiv; NFC
llvm-svn: 358003
2019-04-09 13:41:29 +00:00
Chen Zheng 11cf397292 [InstCombine] add more testcases for canonicalize (-X s/ Y) to -(X s/ Y).
llvm-svn: 358000
2019-04-09 12:47:29 +00:00
Simon Pilgrim 345eacd555 [TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.

llvm-svn: 357992
2019-04-09 10:27:59 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Chen Zheng 19ce6719bc [PowerPC] initialize SchedModel according to platform.
Differential Revision: https://reviews.llvm.org/D60177

llvm-svn: 357962
2019-04-09 01:25:25 +00:00
Peter Collingbourne df57979ba7 hwasan: Enable -hwasan-allow-ifunc by default.
It's been on in Android for a while without causing problems, so it's time
to make it the default and remove the flag.

Differential Revision: https://reviews.llvm.org/D60355

llvm-svn: 357960
2019-04-09 00:25:59 +00:00
Sanjay Patel 74ccef1f4f [InstCombine] add tests for negate+sdiv; NFC
PR41425:
https://bugs.llvm.org/show_bug.cgi?id=41425

llvm-svn: 357953
2019-04-08 22:55:10 +00:00
Shoaib Meenai 867131a96c [BinaryFormat] Update Mach-O ARM64E CPU subtype and dumping
The new value is taken from <mach/machine.h> in the MacOSX10.14 SDK from
Xcode 10.1. Update llvm-objdump and llvm-readobj accordingly.

Differential Revision: https://reviews.llvm.org/D58636

llvm-svn: 357945
2019-04-08 21:37:08 +00:00
Sanjay Patel 773e04c883 [InstCombine] peek through fdiv to find a squared sqrt
A more general canonicalization between fdiv and fmul would not
handle this case because that would have to be limited by uses
to prevent 2 values from becoming 3 values:
(x/y) * (x/y) --> (x*x) / (y*y)

(But we probably should still have that limited -- but more general --
canonicalization independently of this change.)

llvm-svn: 357943
2019-04-08 21:23:50 +00:00
Simon Pilgrim 9f74df7d5b [TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling
Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts.

llvm-svn: 357942
2019-04-08 20:59:38 +00:00
Sanjay Patel bf1417d7e4 [InstCombine] add extra-use tests for fmul+sqrt; NFC
llvm-svn: 357939
2019-04-08 20:37:34 +00:00
Nikita Popov 15abd74de7 [InstCombine] Add more tests for signed saturing math overflow; NFC
Overflow conditions for sadd.sat and ssub.sat which can be determined
based on constant ranges, but not necessarily known bits.

llvm-svn: 357938
2019-04-08 20:02:47 +00:00
Nico Weber 63b97d2a67 llvm-undname: Fix more crashes and asserts on invalid inputs
For functions whose callers don't check that enough input is present,
add checks at the start of the function that enough input is there and
set Error otherwise.

For functions that return AST objects, return nullptr instead of
incomplete AST objects with nullptr fields if an error occurred during
the function.

Introduce a new function demangleDeclarator() for the sequence
demangleFullyQualifiedSymbolName(); demangleEncodedSymbol() and
use it in the two places that had this sequence. Let this new function
check that ConversionOperatorIdentifiers have a valid TargetType.

Some of the bad inputs found by oss-fuzz, others by inspection.

Differential Revision: https://reviews.llvm.org/D60354

llvm-svn: 357936
2019-04-08 19:46:53 +00:00
Adrian Prantl 6ed5706a2b Add LLVM IR debug info support for Fortran COMMON blocks
COMMON blocks are a feature of Fortran that has no direct analog in C languages, but they are similar to data sections in assembly language programming. A COMMON block is a named area of memory that holds a collection of variables. Fortran subprograms may map the COMMON block memory area to their own, possibly distinct, non-empty list of variables. A Fortran COMMON block might look like the following example.

    COMMON /ALPHA/ I, J

    For this construct, the compiler generates a new scope-like DI construct (!DICommonBlock) into which variables (see I, J above) can be placed. As the common block implies a range of storage with global lifetime, the !DICommonBlock refers to a !DIGlobalVariable. The Fortran variable that comprise the COMMON block are also linked via metadata to offsets within the global variable that stands for the entire common block.

    @alpha_ = common global %alphabytes_ zeroinitializer, align 64, !dbg !27, !dbg !30, !dbg !33
    !14 = distinct !DISubprogram(…)
    !20 = distinct !DICommonBlock(scope: !14, declaration: !25, name: "alpha")
    !25 = distinct !DIGlobalVariable(scope: !20, name: "common alpha", type: !24)
    !27 = !DIGlobalVariableExpression(var: !25, expr: !DIExpression())
    !29 = distinct !DIGlobalVariable(scope: !20, name: "i", file: !3, type: !28)
    !30 = !DIGlobalVariableExpression(var: !29, expr: !DIExpression())
    !31 = distinct !DIGlobalVariable(scope: !20, name: "j", file: !3, type: !28)
    !32 = !DIExpression(DW_OP_plus_uconst, 4)
    !33 = !DIGlobalVariableExpression(var: !31, expr: !32)

    The DWARF generated for this is as follows.

    DW_TAG_common_block:
    DW_AT_name: alpha
    DW_AT_location: @alpha_+0
    DW_TAG_variable:
    DW_AT_name: common alpha
    DW_AT_type: array of 8 bytes
    DW_AT_location: @alpha_+0
    DW_TAG_variable:
    DW_AT_name: i
    DW_AT_type: integer*4
    DW_AT_location: @Alpha+0
    DW_TAG_variable:
    DW_AT_name: j
    DW_AT_type: integer*4
    DW_AT_location: @Alpha+4

Patch by Eric Schweitz!

Differential Revision: https://reviews.llvm.org/D54327

llvm-svn: 357934
2019-04-08 19:13:55 +00:00
Steven Wu f41e70d6eb Revert [ThinLTO] Fix ThinLTOCodegenerator to export llvm.used symbols
This reverts r357931 (git commit 8b70a5c11e)

llvm-svn: 357932
2019-04-08 18:53:21 +00:00
Steven Wu 8b70a5c11e [ThinLTO] Fix ThinLTOCodegenerator to export llvm.used symbols
Summary:
ThinLTOCodeGenerator currently does not preserve llvm.used symbols and
it can internalize them. In order to pass the necessary information to the
legacy ThinLTOCodeGenerator, the input to the code generator is
rewritten to be based on lto::InputFile.

This fixes: PR41236
rdar://problem/49293439

Reviewers: tejohnson, pcc, dexonsmith

Reviewed By: tejohnson

Subscribers: mehdi_amini, inglorion, eraman, hiraditya, jkorous, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60226

llvm-svn: 357931
2019-04-08 18:24:10 +00:00
Brian M. Rzycki 887865c1ad [JumpThreading] Fix incorrect fold conditional after indirectbr/callbr
Fixes bug 40992: https://bugs.llvm.org/show_bug.cgi?id=40992

There is potential for miscompiled code emitted from JumpThreading when
analyzing a block with one or more indirectbr or callbr predecessors. The
ProcessThreadableEdges() function incorrectly folds conditional branches
into an unconditional branch.

This patch prevents incorrect branch folding without fully pessimizing
other potential threading opportunities through the same basic block.

This IR shape was manually fed in via opt and is unclear if clang and the
full pass pipeline will ever emit similar code shapes.

Thanks to Matthias Liedtke for the bug report and simplified IR example.

Differential Revision: https://reviews.llvm.org/D60284

llvm-svn: 357930
2019-04-08 18:20:35 +00:00
Andrea Di Biagio f6a60f1f80 [llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI
It makes more sense to print out the number of micro opcodes that are issued
every cycle rather than the number of instructions issued per cycle.
This behavior is also consistent with the dispatch-stats: numbers from the two
views can now be easily compared.

llvm-svn: 357919
2019-04-08 16:05:54 +00:00
Simon Pilgrim 86844a865e [X86][AVX] Add PR34380 shuffle test cases
llvm-svn: 357914
2019-04-08 14:05:42 +00:00
Sanjay Patel 50c3b290ed [x86] make 8-bit shl undesirable
I was looking at a potential DAGCombiner fix for 1 of the regressions in D60278, and it caused severe regression test pain because x86 TLI lies about the desirability of 8-bit shift ops.

We've hinted at making all 8-bit ops undesirable for the reason in the code comment:

// TODO: Almost no 8-bit ops are desirable because they have no actual
//       size/speed advantages vs. 32-bit ops, but they do have a major
//       potential disadvantage by causing partial register stalls.

...but that leads to massive diffs and exposes all kinds of optimization holes itself.

Differential Revision: https://reviews.llvm.org/D60286

llvm-svn: 357912
2019-04-08 13:58:50 +00:00
Sanjay Patel b33938df7a [InstCombine] remove overzealous assert for shuffles (PR41419)
As the TODO indicates, instsimplify could be improved.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=41419

llvm-svn: 357910
2019-04-08 13:28:29 +00:00
Simon Pilgrim b4f1bfa659 [InstCombine][X86] Expand MOVMSK to generic IR (PR39927)
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:

e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32

Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW

Differential Revision: https://reviews.llvm.org/D60256

llvm-svn: 357909
2019-04-08 13:17:51 +00:00
Chen Zheng 923c7c9daa [InstCombine] sdiv exact flag fixup.
Differential Revision: https://reviews.llvm.org/D60396

llvm-svn: 357904
2019-04-08 12:08:03 +00:00
Roman Lebedev a82235843b [llvm-exegesis][X86] Randomize CMOVcc/SETcc OPERAND_COND_CODE CondCodes
Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60066

llvm-svn: 357898
2019-04-08 10:11:00 +00:00
Chen Zheng edf91ed855 [InstCombine] add more testcases for sdiv exact flag fixup.
llvm-svn: 357894
2019-04-08 09:19:42 +00:00
Chen Zheng d3b1d74624 [InstCombine] add testcases for sdiv exact flag fixing - NFC.
llvm-svn: 357884
2019-04-08 05:49:15 +00:00
Chen Zheng c84107612a [InstCombine]add testcase for sdiv canonicalizetion - NFC
llvm-svn: 357883
2019-04-08 03:07:32 +00:00
Craig Topper afb6b42691 [X86] Split floating point tests out of atomic-mi.ll into atomic-fp.ll. Add avx and avx512f command lines. NFC
llvm-svn: 357882
2019-04-08 01:54:27 +00:00
Craig Topper 8aeefe3149 [X86] Add avx and avx512f command lines to atomic-non-integer.ll. NFC
llvm-svn: 357881
2019-04-08 01:54:24 +00:00
Craig Topper 424417da79 [X86] Use (SUBREG_TO_REG (MOV32rm)) for extloadi64i8/extloadi64i16 when the load is 4 byte aligned or better and not volatile.
Summary:
Previously we would use MOVZXrm8/MOVZXrm16, but those are longer encodings.

This is similar to what we do in the loadi32 predicate.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60341

llvm-svn: 357875
2019-04-07 19:19:44 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Simon Pilgrim 07adb6abda [X86][SSE] SimplifyDemandedBitsForTargetNode - Add initial PACKSS support
In the case where we only want the sign bit (e.g. when using PACKSS truncation of comparison results for MOVMSK) then we can just demand the sign bit of the source operands.

This makes use of the fact that PACKSS saturates out of range values to the min/max int values - so the sign bit is always preserved.

Differential Revision: https://reviews.llvm.org/D60333

llvm-svn: 357859
2019-04-07 10:40:01 +00:00
Fangrui Song 47a7662e29 [llvm-objdump] Fix split of source lines; don't ltrim source lines
If the file does not end with a newline, it may be dropped. Fix the
splitting algorithm.

Also delete an unnecessary SourceCache lookup.

llvm-svn: 357858
2019-04-07 10:16:46 +00:00
Fangrui Song 545ed223a6 [llvm-objdump] Simplify disassembleObject
* Use std::binary_search to replace some std::lower_bound
* Use llvm::upper_bound to replace some std::upper_bound
* Use format_hex and support::endian::read{16,32}

llvm-svn: 357853
2019-04-07 05:32:16 +00:00
Craig Topper 399102b464 [X86] When converting (x << C1) AND C2 to (x AND (C2>>C1)) << C1 during isel, try using andl over andq by favoring 32-bit unsigned immediates.
llvm-svn: 357848
2019-04-06 19:00:11 +00:00
Craig Topper f9b9f8d2e4 [X86] Use a signed mask in foldMaskedShiftToScaledMask to enable a shorter immediate encoding.
This function reorders AND and SHL to enable the SHL to fold into an LEA. The
upper bits of the AND will be shifted out by the SHL so it doesn't matter what
mask value we use for these bits. By using sign bits from the original mask in
these upper bits we might enable a shorter immediate encoding to be used.

llvm-svn: 357846
2019-04-06 18:00:50 +00:00
Craig Topper 82448bc09e [X86] Add test cases to show missed opportunities to use a sign extended 8 or 32 bit immediate AND when reversing SHL+AND to form an LEA.
When we shift the AND mask over we should shift in sign bits instead of zero bits. The scale in the LEA will shift these bits out so it doesn't matter whether we mask the bits off or not. Using sign bits will potentially allow a sign extended immediate to be used.

Also add some other test cases for cases that are currently optimal.

llvm-svn: 357845
2019-04-06 18:00:45 +00:00
Craig Topper 9d7379c250 [X86] Autogenerate complete checks. NFC
llvm-svn: 357844
2019-04-06 18:00:41 +00:00
Simon Pilgrim ec28615f7f [X86] Add AVX-target expandload and compressstore tests
llvm-svn: 357842
2019-04-06 14:40:52 +00:00
Roman Lebedev 404bdb1c9e [llvm-exegesis][X86] Handle CMOVcc/SETcc OPERAND_COND_CODE OperandType
Summary:
D60041 / D60138 refactoring changed how CMOV/SETcc opcodes
are handled. concode is now an immediate, with it's own operand type.

This at least allows to not crash on the opcode.
However, this still won't generate all the snippets
with all the condcode enumerators. D60066 does that.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60057

llvm-svn: 357841
2019-04-06 14:16:26 +00:00
Simon Pilgrim d23611f9ad [X86] Split expandload and compressstore tests
llvm-svn: 357840
2019-04-06 14:14:54 +00:00
Simon Pilgrim 18a8a64c9f [X86][SSE] Add more exhaustive masked load/store tests
Reordered/renamed some existing tests to match the cleaned up order

llvm-svn: 357839
2019-04-06 14:01:37 +00:00
Simon Pilgrim 2ea8dbf564 [CostModel][X86] Add more exhaustive masked load/store/gather/scatter/expand/compress cost tests
llvm-svn: 357838
2019-04-06 12:08:37 +00:00
Sanjay Patel c538c50113 [InstCombine] add more tests for fmul+fdiv+sqrt; NFC
llvm-svn: 357816
2019-04-05 20:54:35 +00:00
Francis Visoiu Mistrih 9d9d1b6b2b [X86] Enable tail calls for CallingConv::Swift
It's currently only enabled on AArch64 (enabled in r281376).

llvm-svn: 357809
2019-04-05 20:18:25 +00:00
Francis Visoiu Mistrih ab051a378c [X86] Preserve operand flag when expanding TCRETURNri
The expansion of TCRETURNri(64) would not keep operand flags like
undef/renamable/etc. which can result in machine verifier issues.

Also add plumbing to be able to use `-run-pass=x86-pseudo`.

llvm-svn: 357808
2019-04-05 20:18:21 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Craig Topper 80aa2290fb [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

Reviewed By: RKSimon

Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60228

llvm-svn: 357802
2019-04-05 19:28:09 +00:00
Craig Topper 7323c2bf85 [X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri

Reviewed By: andreadb

Subscribers: hiraditya, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60138

llvm-svn: 357801
2019-04-05 19:27:49 +00:00
Craig Topper e0bfeb5f24 [X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate.
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.

This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.

Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.

This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.

I plan to make similar changes for SETcc and Jcc.

Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60041

llvm-svn: 357800
2019-04-05 19:27:41 +00:00
Guozhi Wei 36fc9c3107 [LCG] Add aliased functions as LCG roots
Current LCG doesn't check aliased functions. So if an internal function has a public alias it will not be added to CG SCC, but it is still reachable from outside through the alias.
So this patch adds aliased functions to SCC.

Differential Revision: https://reviews.llvm.org/D59898

llvm-svn: 357795
2019-04-05 18:51:08 +00:00
Sanjay Patel 79df4454e1 [InstCombine] add tests for fdiv+fmul; NFC
llvm-svn: 357782
2019-04-05 17:00:57 +00:00
Sanjay Patel 7e3e7f8040 [InstCombine] add tests for sqrt+fdiv+fmul; NFC
Examples based on recent llvm-dev thread. These are specific
patterns of more general enhancements that would solve these.

llvm-svn: 357780
2019-04-05 16:52:57 +00:00
Sanjay Patel 9965f5aa70 [InstCombine] add test to show reassociation that creates a denormal constant; NFC
llvm-svn: 357776
2019-04-05 16:42:21 +00:00
Stephen Tozer bbeca849d7 Revert "[llvm-readobj] Improve error message for --string-dump"
This reverts commit 681b0798db.

Reverted due to causing build failures: llvm-svn: 357772

llvm-svn: 357774
2019-04-05 16:32:25 +00:00
Stephen Tozer 681b0798db [llvm-readobj] Improve error message for --string-dump
Fixes bug 40630: https://bugs.llvm.org/show_bug.cgi?id=40630

This patch changes the error message when the section specified by
--string-dump cannot be found by including the name of the section in
the error message and changing the prefix text to not imply that the
file itself was invalid. As part of this change some uses of
std::error_code have been replaced with the llvm Error class to better
encapsulate the error info (rather than passing File strings around),
and the WithColor class replaces string literal error prefixes.

Differential Revision: https://reviews.llvm.org/D59946

llvm-svn: 357772
2019-04-05 16:15:50 +00:00
Clement Courbet 1d8c9dfe03 [ExpandMemCmp][NFC] Add tests for `memcmp(p, q, n) < 0` case.
llvm-svn: 357767
2019-04-05 15:03:25 +00:00
Simon Pilgrim 17586cda4a [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC
Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........

Differential Revision: https://reviews.llvm.org/D60006

llvm-svn: 357765
2019-04-05 14:56:21 +00:00
Matt Arsenault 4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Sanjay Patel 50a8652785 [DAGCombiner][x86] scalarize splatted vector FP ops
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.

For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.

Other targets should likely enable the hook in a similar way.

Differential Revision: https://reviews.llvm.org/D60150

llvm-svn: 357760
2019-04-05 13:32:17 +00:00
Simon Pilgrim faa5b939f0 [X86][AVX] Add PR34584 masked store test cases
llvm-svn: 357757
2019-04-05 11:34:30 +00:00
Simon Pilgrim 329e63b915 [X86] Add SSE/AVX1/AVX2 masked trunc+store tests
llvm-svn: 357756
2019-04-05 11:22:28 +00:00
Roger Ferrer Ibanez e011e4f89c [RISCV] Implement adding a displacement to a BlockAddress
Recent change rL357393 uses MachineInstrBuilder::addDisp to add a based on a
BlockAddress but this case was not implemented.

This patch adds the missing case and a test for RISC-V that exercises the new
case.

Differential Revision: https://reviews.llvm.org/D60136

llvm-svn: 357752
2019-04-05 08:40:57 +00:00
Pavel Labath 51d9fa0a22 Minidump: Add support for reading/writing strings
Summary:
Strings in minidump files are stored as a 32-bit length field, giving
the length of the string in *bytes*, which is followed by the
appropriate number of UTF16 code units. The string is also supposed to
be null-terminated, and the null-terminator is not a part of the length
field. This patch:
- adds support for reading these strings out of the minidump file (this
  implementation does not depend on proper null-termination)
- adds support for writing them to a minidump file
- using the previous two pieces implements proper (de)serialization of
  the CSDVersion field of the SystemInfo stream. Previously, this was
  only read/written as hex, and no attempt was made to access the
  referenced string -- now this string is read and written correctly.

The changes are tested via yaml2obj|obj2yaml round-trip as well as a
unit test which checks the corner cases of the string deserialization
logic.

Reviewers: jhenderson, zturner, clayborg

Subscribers: llvm-commits, aprantl, markmentovai, amccarth, lldb-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59775

llvm-svn: 357749
2019-04-05 08:06:26 +00:00
Piotr Sobczak 0376ac1d94 [SelectionDAG] Compute known bits of CopyFromReg
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745
2019-04-05 07:44:09 +00:00
Craig Topper 94f1772b1e [X86] Promote i16 SRA instructions to i32
We already promote SRL and SHL to i32.

This will introduce sign extends sometimes which might be harder to deal with than the zero we use for promoting SRL. I ran this through some of our internal benchmark lists and didn't see any major regressions.

I think there might be some DAG combine improvement opportunities in the test changes here.

Differential Revision: https://reviews.llvm.org/D60278

llvm-svn: 357743
2019-04-05 06:32:50 +00:00
Serguei Katkov c39636cc2c [FastISel] Fix crash for gc.relocate lowring
Lowering safepoint checks that all gc.relocaes observed in safepoint
must be lowered. However Fast-Isel is able to skip dead gc.relocate.

To resolve this issue we just ignore dead gc.relocate in the check.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60184

llvm-svn: 357742
2019-04-05 05:41:08 +00:00
David Callahan f498bdcebf Include invoke'd functions for recursive extract
Summary: When recursively extracting a function from a bit code file, include functions mentioned in InvokeInst as well as CallInst

Reviewers: loladiro, espindola, volkan

Reviewed By: loladiro

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60231

llvm-svn: 357735
2019-04-04 23:30:47 +00:00
James Y Knight a040174418 Revert [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
It unnecessarily breaks previously-working code which used varargs,
but didn't pass any float/double arguments (such as EDK2).

Also revert the fixup on top of that:
Revert [X86] Fix a test from r357317

This reverts r357317 (git commit d413f41de6)
This reverts r357380 (git commit 7af32444b9)

llvm-svn: 357718
2019-04-04 19:05:48 +00:00
Sam Clegg 2a7cac932b [WebAssembly] Add new explicit relocation types for PIC relocations
See https://github.com/WebAssembly/tool-conventions/pull/106

Differential Revision: https://reviews.llvm.org/D59907

llvm-svn: 357710
2019-04-04 17:43:50 +00:00
Don Hinton 98e3954fe9 [llvm-objcopy] [llvm-symbolizer] Fix failing tests
Summary: Fix failing tests that matched substrings in path.

Reviewers: evgeny777, mattd, espindola, alexshap, rupprecht, jhenderson

Reviewed By: jhenderson

Subscribers: Bulletmagnet, emaste, arichardson, jakehehrlich, MaskRay, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60170

llvm-svn: 357709
2019-04-04 17:35:41 +00:00
Adrian Prantl ce2d45e7ba llvm-dwarfdump: Support alternative architecture names in the -arch filter
<rdar://problem/47918606>

llvm-svn: 357706
2019-04-04 15:48:40 +00:00
Sanjay Patel 17648b848e [x86] eliminate unnecessary broadcast of horizontal op
This is another pattern that comes up if we more aggressively
scalarize FP ops.

llvm-svn: 357703
2019-04-04 14:46:13 +00:00
Lewis Revill aa79a3fe8e [RISCV] Support assembling TLS add and associated modifiers
This patch adds support in the MC layer for parsing and assembling the
4-operand add instruction needed for TLS addressing. This also involves
parsing the %tprel_hi, %tprel_lo and %tprel_add operand modifiers.

Differential Revision: https://reviews.llvm.org/D55341

llvm-svn: 357698
2019-04-04 14:13:37 +00:00