Commit Graph

62300 Commits

Author SHA1 Message Date
Amara Emerson d3144a4abc [AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64.
We already get support for G_ZEXTLOAD to s32 from the importer, but it can't
deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual
selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing
with G_ZEXTLOAD isn't much work.

Also add tests to check the imported pattern selections to s32 work.

llvm-svn: 362681
2019-06-06 07:58:37 +00:00
Amara Emerson d940e20051 [AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666.
The changes weren't staged so ended up just re-commiting the unmodified reverted change.

llvm-svn: 362677
2019-06-06 07:33:47 +00:00
Craig Topper 9226ba6b37 [X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes.
This is intended to enable the use of an immediate blend or
more optimal instruction. But if the passthru is zero we don't
need any additional instructions.

llvm-svn: 362675
2019-06-06 05:41:27 +00:00
Craig Topper cf44372137 [X86] Add test case for masked load with constant mask and all zeros passthru.
avx/avx2 masked loads only support all zeros for passthru in hardware.
So we have to emit a blend for all other values. We have an optimization
that tries to optimize this blend if the mask is constant. But we
don't need to perform this optimization if the passthru value is zero
which doesn't need the blend at all.

llvm-svn: 362674
2019-06-06 05:41:22 +00:00
Amara Emerson c37ff0d138 Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp""
When looking through copies, make sure to not try to find the vreg def of a physreg.
Normally getVRegDef will return nullptr in this case, but if there happens to be
multiple defs then it will assert.

This fixes PR42129.

llvm-svn: 362666
2019-06-05 23:46:16 +00:00
Matt Arsenault 34c8b835b1 AMDGPU: Don't fix emergency stack slot at offset 0
This forced the caller to be aware of this, which is an ugly ABI
feature.

Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.

Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.

Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.

llvm-svn: 362665
2019-06-05 22:37:50 +00:00
Cameron McInally c72fbe5dc1 [MSAN] Add unary FNeg visitor to the MemorySanitizer
Differential Revision: https://reviews.llvm.org/D62909

llvm-svn: 362664
2019-06-05 22:37:05 +00:00
Ulrich Weigand 6c5d5ce551 Allow target to handle STRICT floating-point nodes
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).

This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.

To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:

- A new MCID flag "mayRaiseFPException" that the target should set on any
  instruction that possibly can raise FP exception according to the
  architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
  instruction resulting from expansion of any constrained FP intrinsic.

Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).

Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.

The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.

This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.

Reviewed By: andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D55506

llvm-svn: 362663
2019-06-05 22:33:10 +00:00
Petr Hosek 2f94203e23 Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"
This reverts commit r362435 as this triggers ICE, see PR42129 for details.

llvm-svn: 362662
2019-06-05 22:27:31 +00:00
Matt Arsenault b812b7a45e AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

llvm-svn: 362661
2019-06-05 22:20:47 +00:00
Joseph Tremoulet acb5609063 [EarlyCSE] Add tests for negated min/max/abs [NFC]
Summary:
I'm planning to update the hashing logic to recognize their equivalence
in a subsequent change (D62644).

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62918

llvm-svn: 362657
2019-06-05 21:30:10 +00:00
Matt Arsenault 663d762c9a NewGVN: Handle addrspacecast
The AllConstant check needs to be moved out of the if/else if chain to
avoid a test regression. The "there is no SimplifyZExt" comment
puzzles me, since there is SimplifyCastInst. Additionally, the
Simplify* calls seem to not see the operand as constant, so this needs
to be tried if the simplify failed.

llvm-svn: 362653
2019-06-05 21:15:52 +00:00
Tim Northover 8d7f118ab2 InstCombine: correctly change byval type attribute alongside call args.
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.

llvm-svn: 362643
2019-06-05 20:38:17 +00:00
Matt Arsenault 4fb580c314 AMDGPU: Remove amdgpu-max-work-group-size attribute
This has been deprecated for a long time, and mesa recently switched
to amdgpu-flat-work-group-size.

llvm-svn: 362641
2019-06-05 20:32:32 +00:00
Dan Gohman 53572d0470 [WebAssembly] Limit PIC support to the Emscripten target
The current PIC support currently only works with Emscripten, so
disable it for other targets.

This is the PIC portion of https://reviews.llvm.org/D62542.

Reviewed By: dschuff, sbc100

llvm-svn: 362638
2019-06-05 20:01:01 +00:00
Simon Pilgrim 036fa5346f [X86][SSE] Add vector tests to cover more isNegatibleForFree/GetNegatedExpression cases (PR42105)
Some already combine correctly, but vector constant analysis is weak.

llvm-svn: 362633
2019-06-05 18:55:54 +00:00
Cameron McInally 8b83a9c6b1 [NFC][Reassociate] Fix mistake in 468b2ad
Missed 2 'fast fsub(0.0,X) -> fneg(X)' changes.

llvm-svn: 362631
2019-06-05 18:50:07 +00:00
Cameron McInally 5162266515 [NFC][Reassociate] Add unary fneg tests to fast-basictest.ll
llvm-svn: 362630
2019-06-05 18:35:54 +00:00
Craig Topper d0fff89b81 [X86] Add the vector integer min/max instructions to isAssociativeAndCommutative.
As far as I know these should be freely reassociatable just like
the floating point MAXC/MINC instructions.

The *reduce* test changes are largely regressions and caused by
the "generic" CPU we default to not having a scheduler model.

The machine-combiner-int-vec.ll test shows the positive benefits
of this change.

Differential Revision: https://reviews.llvm.org/D62787

llvm-svn: 362629
2019-06-05 18:25:09 +00:00
Philip Reames 13dd125043 [Tests] Add poison inference tests for indvars showing both existing transforms, and some room for improvement
llvm-svn: 362628
2019-06-05 18:00:59 +00:00
Cameron McInally 0a31726d20 [NFC][Reassociate] Regenerate CHECKs for fast-basictest.ll
llvm-svn: 362627
2019-06-05 18:00:27 +00:00
Sanjay Patel 2bf82879bd [x86] split more 256-bit stores of concatenated vectors
As suggested in D62498 - collectConcatOps() matches both
concat_vectors and insert_subvector patterns, and we see
more test improvements by using the more general match.

llvm-svn: 362620
2019-06-05 16:40:57 +00:00
Simon Pilgrim a0e350e640 [X86][SSE] Add additional nt-load test cases as discussed on D62910
llvm-svn: 362616
2019-06-05 16:11:57 +00:00
George Rimar 5da702308c [llvm-readobj] - Remove TODOs from gnu-hash-symbols.test and demangle.test test cases.
We can remove this TODOs now.

Differential revision: https://reviews.llvm.org/D62846

llvm-svn: 362614
2019-06-05 15:29:50 +00:00
Dinar Temirbulatov 15c657d13d [SLP] Fix regression in broadcasts caused by operand reordering patch D59973.
This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 .
The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found.
Please see the lit test for some examples.

Committed on behalf of @vporpo (Vasileios Porpodas)
    
Differential Revision: https://reviews.llvm.org/D62427

llvm-svn: 362613
2019-06-05 15:26:28 +00:00
Roman Lebedev 253086230f [NFC][Codegen][X86] Add AVX2 runline for '(X & (C l>> Y)) ==/!= 0' tests
llvm-svn: 362606
2019-06-05 14:08:11 +00:00
Roman Lebedev 54bd6c840e UpdateTestChecks: hexagon support
Summary:
These tests are being affected by an upcoming patch,
so having an understandable (autogenerated) diff is helpful.

This target, again, prefers `-march`:
```
llvm/test/CodeGen/Hexagon$ grep -r triple | wc -l
467
llvm/test/CodeGen/Hexagon$ grep -r march | wc -l
1167
```

Reviewers: RKSimon, kparzysz

Reviewed By: kparzysz

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62867

llvm-svn: 362605
2019-06-05 14:08:01 +00:00
Petar Avramovic 22e99c434f [MIPS GlobalISel] Select fcmp
Select floating point compare for MIPS32.

Differential Revision: https://reviews.llvm.org/D62721

llvm-svn: 362603
2019-06-05 14:03:13 +00:00
George Rimar 66296dc3e4 [yaml2obj] - Change how we handle implicit sections.
We have a few sections that can be added implicitly to the output:
".dynsym", ".dynstr", ".symtab", ".strtab" and ".shstrtab".

Problem appears when such section is listed explicitly in YAML.
In that case it's content is written twice:
first time during writing of regular sections listed in the document
and second time during special handling.

Because of that their file offsets can become unexpectedly broken:
(yaml file for sample below lists .dynsym explicitly before .text.foo)

Before patch:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .dynsym           DYNSYM           0000000000000100  00000250
       0000000000000030  0000000000000018   A       6     0     8
  [ 2] .text.foo         PROGBITS         0000000000000200  00000200
       0000000000000000  0000000000000000  AX       0     0     0

After patch:
Section Headers:
  [Nr] Name         Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .dynsym           DYNSYM           0000000000000100  00000200
       0000000000000030  0000000000000018   A       6     0     8
  [ 2] .text.foo         PROGBITS         0000000000000200  00000230
       0000000000000000  0000000000000000  AX       0     0     0

This patch reorganizes our code and fixes the issue described.

Differential revision: https://reviews.llvm.org/D62809

llvm-svn: 362602
2019-06-05 13:16:53 +00:00
Simon Pilgrim 886a55eaa0 [X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y))
We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type.

This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size.

llvm-svn: 362599
2019-06-05 12:56:53 +00:00
George Rimar b42196661b [llvm-objdump] - Disassemble non-executable sections if specifically requested.
This is https://bugs.llvm.org/show_bug.cgi?id=41897.

Previously -d + -j .data had no effect, that wasn't consistent with GNU,
which proccesses .data in that case. With this patch we follow this behavior.

Diffeential revision: https://reviews.llvm.org/D62848

llvm-svn: 362596
2019-06-05 11:37:53 +00:00
Simon Pilgrim ddfbfd6172 [X86][SSE] Add some nt-store test cases inspired by PR42123
llvm-svn: 362594
2019-06-05 10:55:55 +00:00
Yevgeny Rouban a3e16719c4 Resubmit "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst"
This reverts commit 5b32f60ec3.
The fix is in commit 4f9e68148b.

This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126

llvm-svn: 362583
2019-06-05 05:46:40 +00:00
Johannes Doerfert aade782a98 [Attributor] Pass infrastructure and fixpoint framework
NOTE: Note that no attributes are derived yet. This patch will not go in
      alone but only with others that derive attributes. The framework is
      split for review purposes.

This commit introduces the Attributor pass infrastructure and fixpoint
iteration framework. Further patches will introduce abstract attributes
into this framework.

In a nutshell, the Attributor will update instances of abstract
arguments until a fixpoint, or a "timeout", is reached. Communication
between the Attributor and the abstract attributes that are derived is
restricted to the AbstractState and AbstractAttribute interfaces.

Please see the file comment in Attributor.h for detailed information
including design decisions and typical use case. Also consider the class
documentation for Attributor, AbstractState, and AbstractAttribute.

Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames

Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59918

llvm-svn: 362578
2019-06-05 03:02:24 +00:00
Johannes Doerfert 76467c4d7f [NFC][FnAttrs] Stress tests for attribute deduction
This commit is a preparation of upcoming patches on attribute deduction.
It will shorten the diffs and make it clear what we inferred before.

Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes

Subscribers: bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59903

llvm-svn: 362577
2019-06-05 03:00:06 +00:00
Nemanja Ivanovic 7c842fadf1 [PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible
Generally speaking, we lower to an optimal rotate sequence for nodes visible in
the SDAG. However, there are instances where the two rotates are not visible at
ISEL time - most notably those in a very common sequence when lowering switch
statements to jump tables.

A common situation is a switch on a 32-bit integer. This value has to have the
upper 32 bits cleared and because jump table offsets are word offsets, the value
needs to be shifted left by 2 bits. We currently emit the clear and the left
shift as two separate instructions, but this is not needed as we can lower it to
a single RLDIC.

This patch just cleans that up.

Differential revision: https://reviews.llvm.org/D60402

llvm-svn: 362576
2019-06-05 02:36:40 +00:00
Nemanja Ivanovic cfb6c82172 [PowerPC][NFC] Add codegen test for consecutive stores of vector elements
NFC commit of a test case in order for the subsequent review to show differences
in codegen.

Differential revision: https://reviews.llvm.org/D62843

llvm-svn: 362573
2019-06-05 02:09:03 +00:00
Fangrui Song f090e6f7b6 [llvm-objdump/llvm-readobj/obj2yaml/yaml2obj] Support DT_PPC_GOT and DT_PPC_OPT
In glibc, DT_PPC_GOT indicates that PowerPC32 Secure PLT ABI is used.
I plan to use it in D62464.

DT_PPC_OPT currently indicates if a TLSDESC inspired TLS optimization is
enabled.

Reviewed By: grimar, jhenderson, rupprecht

Differential Revision: https://reviews.llvm.org/D62851

llvm-svn: 362569
2019-06-05 01:36:48 +00:00
Nemanja Ivanovic fe97754acf Initial support for IBM MASS vector library
This is the LLVM portion of patch https://reviews.llvm.org/D59881.
The clang portion is to follow.

llvm-svn: 362568
2019-06-05 01:31:43 +00:00
Amara Emerson 5e312be0fa [AArch64] FastISel: fix test to specify -fast-isel when -fast-isel-abort=1 is used.
This test has been inadvertently been GISel, and now assert due to incompatible flags.

llvm-svn: 362559
2019-06-04 23:11:42 +00:00
Cameron McInally 5c7245b830 [Scalarizer] Add UnaryOperator visitor to scalarization pass
Differential Revision: https://reviews.llvm.org/D62858

llvm-svn: 362558
2019-06-04 23:01:36 +00:00
Alex Brachet 375d5fb9ca [test][llvm-objcopy] Test llvm-objcopy with standard streams
Differential Revision: https://reviews.llvm.org/D62817

llvm-svn: 362556
2019-06-04 22:17:27 +00:00
Amara Emerson 2d37cb82f0 [AArch64][GlobalISel] Make extloads to i64 legal.
Although we had the support in the prelegalizer combiner to generate the
G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as
lowering back to separate ops.

llvm-svn: 362553
2019-06-04 21:51:34 +00:00
Craig Topper 1648cb17e4 [X86] Add avx512bw to the avx512 machine-combiner-int-vec.ll to ensure we use zmm for v32i16/v64i8.
llvm-svn: 362552
2019-06-04 21:47:50 +00:00
Craig Topper 8362518c6e [X86] Add vector min/max reassociation tests to machine-combiner-int-vec.ll. NFC
llvm-svn: 362550
2019-06-04 21:26:46 +00:00
Craig Topper 2fb7306f82 [X86] Add 512-bit test cases to machine-combiner-int-vec.ll. NFC
llvm-svn: 362549
2019-06-04 21:26:36 +00:00
Thomas Lively 3d9ca00e74 [WebAssembly] Fix ISel crash on sext_inreg/extract type mismatch
Summary:
Adjusts the index and adds a bitcast around the vector operand of
EXTRACT_VECTOR_ELT so that its lane type matches the source type of
its parent sext_inreg. Without this bitcast the ISel patterns do not
match and ISel fails.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62646

llvm-svn: 362547
2019-06-04 21:08:20 +00:00
Johannes Doerfert 6b432dca5d [SelectionDAG][FIX] Allow "returned" arguments to be bit-casted
Summary:
An argument that is return by a function but bit-casted before can still
be annotated as "returned". Make sure we do not crash for this case.

Reviewers: sunfish, stephenwlin, niravd, arsenm

Subscribers: wdng, hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59917

llvm-svn: 362546
2019-06-04 20:34:43 +00:00
Nico Weber 1dce82636c llvm-undname: Correctly demangle vararg parameters
FunctionSignatureNode already had an IsVariadic field,
but it wasn't used anywhere yet. Set it and use it.

llvm-svn: 362541
2019-06-04 19:10:08 +00:00
Nico Weber 4638548468 llvm-undname: More coverage-related cleanups
- The loop in demangleFunctionParameterList() only exits
  on Error, @, and Z. All 3 cases were handled, so the
  rest of the function is DEMANGLE_UNREACHABLE.

- The loop in demangleTemplateParameterList() always returns
  on Error, so there's no need to check for that in the loop
  header and after the loop.

- Add test cases for invalid function parameter manglings.

- Add a (redundant) test case for a simple template parameter
  list mangling.

- Add a test case pointing out that varargs functions aren't
  demangled correctly.

llvm-svn: 362540
2019-06-04 18:49:05 +00:00
Nemanja Ivanovic aed7227b71 Revert r362472 as it is breaking PPC build bots
The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots.
Reverting it to bring the bots back to green.

llvm-svn: 362539
2019-06-04 18:48:43 +00:00
Alina Sbirlea bfceed49ce [Utils] Clean another duplicated util method.
Summary:
Following the cleanup in D48202, method foldBlockIntoPredecessor has the
same behavior. Replace its uses with MergeBlockIntoPredecessor.
Remove foldBlockIntoPredecessor.

Reviewers: chandlerc, dmgreen

Subscribers: jlebar, javed.absar, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62751

llvm-svn: 362538
2019-06-04 18:45:15 +00:00
Nico Weber 878df1c2a9 llvm-undname: Add test coverage for demangleInitFiniStub()
llvm-svn: 362536
2019-06-04 18:06:28 +00:00
Craig Topper 09a4415803 [DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1)
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.

Fixes PR42118

Differential Revision: https://reviews.llvm.org/D62828

llvm-svn: 362533
2019-06-04 17:44:18 +00:00
Philip Reames 0cdaf3a09f [Tests] Autogen a test so future changes are visible
Oddly, I had to change a value name from "tmp0" to "bc0" to get the autogened test to pass.  I'm putting this down to an oddity of update_test_checks or FileCheck, but don't understand it.

llvm-svn: 362532
2019-06-04 17:29:55 +00:00
Roman Lebedev 925553ec91 [NFC][Codegen][PowerPC] Autogenerate shift-cmp.ll test
Being affected by upcoming patch

llvm-svn: 362529
2019-06-04 17:05:34 +00:00
Roman Lebedev 78ec94e4ec [NFC][Codegen][AMDGPU] Autogenerate commute-shifts.ll test
Being affected by upcoming patch

llvm-svn: 362528
2019-06-04 17:05:06 +00:00
Sanjay Patel 606eb2367f [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is a reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 362524
2019-06-04 16:40:04 +00:00
Peter Smith f15e3d856f [AArch64][ELF] Add support for PLT decoding with BTI instructions present
Arm Architecture v8.5a introduces Branch Target Identification (BTI). When
enabled all indirect branches must target a bti instruction of the
appropriate form. As PLT sequences may sometimes be the target of an
indirect branch and PLT[0] always is, a static linker may need to generate
PLT sequences that contain "bti c" as the first instruction. In effect:
bti     c
adrp    x16, page offset to .got.plt
...
Instead of:
adrp    x16, page offset to .got.plt
...
At present the PLT decoding assumes the adrp will always be the first
instruction. This patch adds support for a single "bti c" to prefix it. A
test binary has been uploaded with such a PLT sequence. A forthcoming LLD
patch will make heavy use of the PLT decoding code.

Differential Revision: https://reviews.llvm.org/D62598

llvm-svn: 362523
2019-06-04 16:35:40 +00:00
Nico Weber d98a0a362f llvm-undname: Yet more coverage for error paths
- For error returns in demangleSpecialTableNode(),
  demangleLocalStaticGuard(), RTTITypeDescriptor,
  demangleRttiBaseClassDescriptorNode(), demangleUnsigned(),
  demangleUntypedVariable() (via RttiBaseClassArray)

- For ?_A and ?_P which are handled at early levels of the
  demangler but are not implemented in a later stage; this
  is now more obvious

- Replace a "default:" with an explicit list of cases, to
  get -Wswitch check we list all cases

llvm-svn: 362520
2019-06-04 16:25:28 +00:00
Nikita Popov df621bdfc8 [LVI][CVP] Add support for urem, srem and sdiv
The underlying ConstantRange functionality has been added in D60952,
D61207 and D61238, this just exposes it for LVI.

I'm switching the code from using a whitelist to a blacklist, as
we're down to one unsupported operation here (xor) and writing it
this way seems more obvious :)

Differential Revision: https://reviews.llvm.org/D62822

llvm-svn: 362519
2019-06-04 16:24:09 +00:00
Philip Reames af11a4376c [Tests] Update a test to consistently use new pass manager and FileCheck the result
llvm-svn: 362518
2019-06-04 16:19:34 +00:00
Philip Reames 78e71c4d09 [Tests] Autogen tests so that diffs for a future change are understandable
llvm-svn: 362516
2019-06-04 16:15:19 +00:00
Nico Weber dc2a8c7d7f llvm-undname: Add coverage for startsWithLocalScopePattern()
llvm-svn: 362515
2019-06-04 15:47:25 +00:00
Nico Weber c1a0e6fe6b llvm-undname: More no-op changes to increase test coverage
- Add test coverage around invalid anon namespaces and
  for error paths in demanglePrimitiveType() and in
  demangleFullyQualifiedTypeName()

- Use DEMANGLE_UNREACHABLE in two more unreachable places

llvm-svn: 362514
2019-06-04 15:38:00 +00:00
James Henderson 7f3135037d [llvm-symbolizer] Flush output on bad input
One way of using llvm-symbolizer is to interactively within a process
write a line from a parent process to llvm-symbolizer's stdin, and then
read the output, then write the next line, read, etc. This worked as
long as all the lines were good. However, this didn't work prior to this
patch if any of the inputs were bad inputs, because the output is not
flushed after a bad input, meaning the parent process is sat waiting for
output, whilst llvm-symbolizer is sat waiting for input. This patch
flushes the output after every invocation of symbolizeInput when reading
from stdin. It also removes unnecessary flushing when llvm-symbolizer is
not reading addresses from stdin, which should give a slight performance
boost in these situations.

Reviewed by: ikudrin

Differential Revision: https://reviews.llvm.org/D62371

llvm-svn: 362511
2019-06-04 15:34:58 +00:00
Jinsong Ji 3144d7a2da [PowerPC] P9 Scheduling Model: dispatching rule fixes
This is to address some of the problems in existing P9 resource modeling,
especially about the dispatching rules.

Instead of using a hypothetical DISPATCHER , we try to use the number of
actual dispatch slots, and define SchedWriteRes to model dispatch rules,
then update instruction classes according to dispatch rules.

All the dispatch rules and instruction classes update are made according
to POWER9 User Manual.

Differential Revision: https://reviews.llvm.org/D61873

llvm-svn: 362509
2019-06-04 15:22:23 +00:00
Sanjay Patel 1e63dd0b44 [SelectionDAG][x86] limit post-legalization store merging by type
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.

Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.

llvm-svn: 362507
2019-06-04 15:15:59 +00:00
Nico Weber 880d21d3cb llvm-undname: Several behavior-preserving changes to increase coverage
- Replace `Error = true` in a few branches that are truly unreachable
  with DEMANGLE_UNREACHABLE

- Remove early return early in startsWithLocalScopePattern() because
  it's redundant with the next two early returns

- Remove unreachable `case '0'` (it's handled in the branch below)

- Remove an unused bool return

- Add test coverage for several early error returns, mostly in
  array type parsing

llvm-svn: 362506
2019-06-04 15:13:30 +00:00
Sanjay Patel d6de9426ee [x86] add test for store merging/splitting; NFC
This is a reduction of a test that would infinite loop with D62498.

llvm-svn: 362502
2019-06-04 14:40:37 +00:00
Shawn Landden 2ee9a827ad [SimplifyCFG] fix last commit
llvm-svn: 362501
2019-06-04 14:32:52 +00:00
Shawn Landden 7f22fecac2 [SimplifyCFG] NFC; remove bogus test case
Even if one bit is defined, the code is not clear what it is suppose to do.

The test wants to assert that some bits are undef, but that's not what the IR does and I don't think it's even possible to do that in any meaningful way. It was added in D12497, so @reames might want to double check.

Differential Revision: https://reviews.llvm.org/D60859

llvm-svn: 362499
2019-06-04 14:17:46 +00:00
Roman Lebedev 2e49e8196d [NFC][Codegen] D62818 - also add tests with X being constant
For X86, these may be a 'BT' pattern, and in general, can cause
the transform to deadlock.

llvm-svn: 362494
2019-06-04 11:44:50 +00:00
Peter Smith 49d7221f71 [AArch64][ELF][llvm-readobj] Add support for BTI and PAC dynamic tags
ELF for the 64-bit Arm Architecture defines two processor-specific dynamic
tags:
DT_AARCH64_BTI_PLT 0x70000001, d_val
DT_AARCH64_PAC_PLT 0x70000003, d_val

These presence of these tags indicate that PLT sequences have been
protected using Branch Target Identification and Pointer Authentication
respectively. The presence of both indicates that the PLT sequences have
been protected with both Branch Target Identification and Pointer
Authentication.

This patch adds the tags and tests for llvm-readobj and yaml2obj.

As some of the processor specific dynamic tags overlap, this patch splits
them up, keeping their original default value if they were not previously
mentioned explicitly in a switch case.

Differential Revision: https://reviews.llvm.org/D62596

llvm-svn: 362493
2019-06-04 11:44:33 +00:00
Peter Smith 580c6d31c0 [AARCH64][ELF][llvm-readobj] Support for AArch64 .note.gnu.property
ELF for the 64-bit Arm Architecture defines a processor specific property
type GNU_PROPERTY_AARCH64_FEATURE_1_AND as GNU_PROPERTY_LOPROC. This
property works in a similar way to the existing X86 processor specific
property GNU_PROPERTY_GNU_X86_FEATURE_1_AND.

Two feature bits are defined for GNU_PROPERTY_AARCH64_FEATURE_1_AND:
- GNU_PROPERTY_AARCH64_FEATURE_1_BTI 0x1
- GNU_PROPERTY_AARCH64_FEATURE_1_PAC 0x2

This patch defines the property, feature bits and implements support for
printing in llvm-readobj.

Differential Revision: https://reviews.llvm.org/D62595

llvm-svn: 362490
2019-06-04 11:28:22 +00:00
Roman Lebedev 3dce0326fe [DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952)
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.

As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.

This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/vMd3

Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62774

llvm-svn: 362488
2019-06-04 11:06:21 +00:00
Roman Lebedev be6ce7b3f2 [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold
Summary:
All changes except ARM look **great**.
https://rise4fun.com/Alive/R2M

The regression `test/CodeGen/ARM/addsubcarry-promotion.ll`
is recovered fully by D62392 + D62450.

Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma

Reviewed By: efriedma

Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62266

llvm-svn: 362487
2019-06-04 11:06:08 +00:00
Simon Pilgrim ad298f86b7 [SelectionDAG] ComputeNumSignBits - support constant pool values from target
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.

The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.

It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.

Differential Revision: https://reviews.llvm.org/D62777

llvm-svn: 362486
2019-06-04 10:49:06 +00:00
Owen Reynolds 5d5078e341 [llvm-ar] Reapply Fix relative thin archive path handling
Includes a fix for an introduced build failure due to a post c++11 use of std::mismatch. 

This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command.

Differential Revision: https://reviews.llvm.org/D59491

llvm-svn: 362484
2019-06-04 10:13:03 +00:00
Simon Pilgrim 3018d505a3 [SelectionDAG] Add fpto[us]i(undef) --> undef constant fold
Follow up to D62807.

Differential Revision: https://reviews.llvm.org/D62811

llvm-svn: 362483
2019-06-04 10:04:55 +00:00
Mikhail Maltsev 08da01b496 [ARM] Add FP16 vector insert/extract patterns
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.

Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
  works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
  and will not be implementable in pure tablegen

Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).

Without these patterns the ARM backend would sometimes fail during
instruction selection.

This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
  instruction
* an FP16 load and insertion into a single VLD1 instruction

Differential Revision: https://reviews.llvm.org/D62651

llvm-svn: 362482
2019-06-04 09:39:55 +00:00
QingShan Zhang 11de0e71b0 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

>
*((i32)p) = BSWAP(val);

Differential Revision: https://reviews.llvm.org/D61843

llvm-svn: 362472
2019-06-04 08:53:53 +00:00
QingShan Zhang 72667b4e48 [NFC] Update the test to check the endianness after the CodeGenPrepare instead of checking the assembly instructions.
llvm-svn: 362471
2019-06-04 08:45:07 +00:00
Simon Tatham ac02445524 [ARM] Turn some undefined encoding bits into 0s.
The family of 32-bit Thumb instruction encodings that include t2ORR,
t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15.
The Tablegen descriptions of those instructions listed them as ?. This
change tightens that up by making them into 0 + Unpredictable.

In the specific case of t2ORR, we tighten it up still further by
making the zero bit mandatory. This change comes from Arm v8.1-M, in
which encodings with that bit equal to 1 will now be used for
different instructions.


Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma

Reviewed By: dmgreen, efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60705

llvm-svn: 362470
2019-06-04 08:28:48 +00:00
Chen Zheng a050b25544 [PowerPC] add testcases for reordering LSR and PPCCTRLoops - NFC
llvm-svn: 362468
2019-06-04 06:48:14 +00:00
Roman Lebedev b3650868f6 [NFC][X86] Fixup FileCheck prefixes - drop duplicates
llvm-svn: 362460
2019-06-03 23:00:51 +00:00
Craig Topper ac062bbad8 [X86] Add test cases for 32 and 64 bit versions of PR42118. NFC
llvm-svn: 362457
2019-06-03 22:34:15 +00:00
Roman Lebedev 6dc8ce323e [NFC][Codegen] Add tests for hoisting and-by-const from "logical shift", when then eq-comparing with 0
This was initially reported as: https://reviews.llvm.org/D62818

https://rise4fun.com/Alive/oPH

llvm-svn: 362455
2019-06-03 22:30:18 +00:00
Craig Topper 099f4a9fa8 Revert r362451 "foo" and r362452 "[X86] Add test cases for 32 and 64 bit versions of PR42118. NFC"
I failed to squash these properly

llvm-svn: 362453
2019-06-03 22:14:54 +00:00
Craig Topper 17728e7c15 [X86] Add test cases for 32 and 64 bit versions of PR42118. NFC
llvm-svn: 362452
2019-06-03 22:11:40 +00:00
Craig Topper 27a546610c foo
llvm-svn: 362451
2019-06-03 22:11:30 +00:00
Cameron McInally 89f9af5487 [SCCP] Add UnaryOperator visitor to SCCP for unary FNeg
Differential Revision: https://reviews.llvm.org/D62819

llvm-svn: 362449
2019-06-03 21:53:56 +00:00
Michael Berg 6ff978ee05 Propagate fmf for setcc in SDAG for select folds
llvm-svn: 362448
2019-06-03 21:53:26 +00:00
Matt Arsenault 0ceda9fb5c AMDGPU: Disable stack realignment for kernels
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.

TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.

Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.

llvm-svn: 362447
2019-06-03 21:33:22 +00:00
Jessica Paquette 7500c97ce4 [AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp
Instead of emitting all of the test stuff for a compare when it's only used by
a select, instead, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit all of the test instructions
etc.

For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.

Also add a test.

Differential Revision: https://reviews.llvm.org/D62695

llvm-svn: 362446
2019-06-03 20:47:20 +00:00
Craig Topper dcf865f0ca [X86] Fix the pattern for merge masked vcvtps2pd.
r362199 fixed it for zero masking, but not zero masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.

llvm-svn: 362440
2019-06-03 19:29:14 +00:00
Michael Berg 0b7f98da65 Propagate fmf for setcc/select folds
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.

Reviewers: qcolombet, spatel

Reviewed By: qcolombet

Subscribers: nemanjai, jsji

Differential Revision: https://reviews.llvm.org/D62552

llvm-svn: 362439
2019-06-03 19:12:15 +00:00
Nemanja Ivanovic bad43d8f49 [PowerPC] Look through copies for compare elimination
We currently miss the opportunities for optmizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.

This patch simply lets the optimization peek through copies.

Differential revision: https://reviews.llvm.org/D59633

llvm-svn: 362438
2019-06-03 19:09:15 +00:00
Matt Arsenault 8dbeb9256c TTI: Improve default costs for addrspacecast
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.

Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.

llvm-svn: 362436
2019-06-03 18:41:34 +00:00
Andrew Kaylor 4172dbab5d Fix a crash when the default of a switch is removed
This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash.

Differential Revision: https://reviews.llvm.org/D62560

llvm-svn: 362427
2019-06-03 17:54:15 +00:00
Philip Reames 83645d214d [Tests] Add LFTR tests for multiple exit loops (try 2)
(Recommit after fixing a keymash in the run line.  Sorry for breakage.)

This is preparation for D62625 <https://reviews.llvm.org/D62625>

llvm-svn: 362426
2019-06-03 17:41:12 +00:00
Dmitri Gribenko b46934eeb8 Revert "[Tests] Add LFTR tests for multiple exit loops"
This reverts commit r362417.  There's a syntax error in the RUN line.

llvm-svn: 362418
2019-06-03 16:58:11 +00:00
Philip Reames 2fcd2bd0df [Tests] Add LFTR tests for multiple exit loops
This is preparation for D62625

llvm-svn: 362417
2019-06-03 16:46:03 +00:00
Simon Pilgrim 985f2f48bd [WebAssembly] Remove fptosi(undef) and fptoui(undef) from reduced test case.
Pre-commit for D62811 - which adds DAG fpto[us]i(undef) --> undef constant fold

llvm-svn: 362414
2019-06-03 16:21:58 +00:00
Dmitri Gribenko 857de979a7 Revert "[llvm-ar] Fix relative thin archive path handling"
This reverts commit r362407.  It broke compilation of
llvm/lib/Object/ArchiveWriter.cpp:

error: type 'llvm::sys::path::const_iterator' does not provide a call
operator

llvm-svn: 362413
2019-06-03 16:21:37 +00:00
Owen Reynolds fade9cbed7 [llvm-ar] Fix relative thin archive path handling
This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command.

Differential Revision: https://reviews.llvm.org/D59491

llvm-svn: 362407
2019-06-03 15:26:07 +00:00
Michal Gorny 9158d57d19 [llvm] [test] Remove non-portable EISDIR test from macho-disassemble-g-dsym.test
Remove the test checking error message for 'is a directory'.  It does
not seem to serve any real purpose, and it relies on matching platform
error strings which are unpredictable and makes the test fragile.
Furthermore, it fails on NetBSD where read() works on directories,
and therefore does not return EISDIR at all.

Fixes r362141.

Differential Revision: https://reviews.llvm.org/D62773

llvm-svn: 362404
2019-06-03 14:50:03 +00:00
Dmitry Preobrazhensky 9111f35f02 [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292

Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D62660

llvm-svn: 362400
2019-06-03 13:51:24 +00:00
Simon Pilgrim cb7e4e8193 [SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205)
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction

Differential Revision: https://reviews.llvm.org/D62807

llvm-svn: 362397
2019-06-03 13:02:07 +00:00
Simon Pilgrim 74467814f2 [SystemZ] Remove sitofp(undef) from reduced test case.
Pre-commit for D62807 - which adds DAG [us]itofp(undef) --> 0 constant fold

llvm-svn: 362396
2019-06-03 12:58:36 +00:00
Cullen Rhodes 3901dd3e41 [AArch64][SVE2] Add CPU and arch directive tests
Summary:
This patch adds tests for directives .arch, .arch_extension and .cpu for
all features defined in Arm SVE2 architecture extension.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62602

llvm-svn: 362378
2019-06-03 10:42:02 +00:00
George Rimar ab93e6e0fe [llvm-readobj] - Convert gnu-sections.test to use YAML.
gnu-sections.test currently use relocs.obj.elf-x86_64 and
relocs.obj.elf-i386 precompiled objects as an inputs.

These inputs actually initially were introduced to test the
dump of relocations and have almost nothing common with dumping
sections.

Patch converts the test to use yaml2obj. That allows to remove
relocs.obj.elf-i386 binary.
(relocs.obj.elf-x86_64 is still used by another test and can't be removed atm).

Differential revision: https://reviews.llvm.org/D62659

llvm-svn: 362377
2019-06-03 09:58:41 +00:00
George Rimar 1115a199aa [llvm-readobj/llvm-readelf] - Remove gnu-relocations.test completely.
rL362089 introduced a set of yaml based reloc-types-*.test test cases
(instead of huge reloc-types.test that used a lot of precompiled binaries)
These test cases checks LLVM-styled dumping of the relocations.

gnu-relocations.test was a test case to check GNU styled relocations dumping.
It did that only for elf-x86 and elf-x86_64 targets. It did not test all of the
relocations though.

Now, after rL362089, it does not make sence to keep it.
This patch updates reloc-types-elf-i386.test and reloc-types-elf-x64.test tests
with llvm-readelf calls to check GNU styled output in one place.
It removes gnu-relocations.test completely.

One of intentions of doing this is also to get rid of relocs.obj.elf-i386 and
relocs.obj.elf-x86_64 precompiled objects completely (they are used in other tests still).

Differential revision: https://reviews.llvm.org/D62655

llvm-svn: 362374
2019-06-03 09:52:32 +00:00
Nikola Prica 2d0106a110 [LiveDebugValues] Close range for previous variable's location when adding newly deduced location
When LiveDebugValues deduces new variable's location from spill, restore or
register copy instruction it should close old variable's location. Otherwise
we can have multiple block output locations for same variable. That could lead
to inserting two DBG_VALUEs for same variable to the beginning of the successor
block which results to ignoring of first DBG_VALUE.

Reviewers: aprantl, jmorse, wolfgangp, dstenb

Reviewed By: aprantl

Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D62196

llvm-svn: 362373
2019-06-03 09:48:29 +00:00
Diogo N. Sampaio df92f84110 [ARM][FIX] Ran out of registers due tail recursion
Summary:
- pr42062
When compiling for MinSize,
ARMTargetLowering::LowerCall decides to indirect
multiple calls to a same function. However,
it disconsiders the limitation that thumb1
indirect calls require the callee to be in a
register from r0 to r3 (llvm limiation).
If all those registers are used by arguments, the
compiler dies with "error: run out of registers
during register allocation".
This patch tells the function
IsEligibleForTailCallOptimization if we intend to
perform indirect calls, as to avoid tail call
optimization.

Reviewers: dmgreen, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62683

llvm-svn: 362366
2019-06-03 08:58:05 +00:00
Sam Parker a0bd6f8a1a [AArch64] Check for simple type in FPToUInt
DAGCombiner was hitting a SimpleType assertion when trying to combine
a v3f32 before type legalization.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916

Differential Revision: https://reviews.llvm.org/D62734

llvm-svn: 362365
2019-06-03 08:49:17 +00:00
Roman Lebedev bcd542881d [NFC][X86] extract-{low,}bits.ll: one more pattern c with truncation
llvm-svn: 362364
2019-06-03 08:44:09 +00:00
Jim Lin 20b14dacbb [AVR] Fix incorrect source regclass of LDWRdPtr
Summary:
LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z.
So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS.

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: dylanmckay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62300

llvm-svn: 362351
2019-06-03 02:31:07 +00:00
Florian Hahn e71963c850 Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.

Reviewers: niravd, spatel, craig.topper, rupprecht

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D62633

llvm-svn: 362350
2019-06-03 01:30:19 +00:00
Nico Weber 3cbb8b8391 llvm-undname: Add coverage for some error paths
llvm-svn: 362346
2019-06-02 23:48:28 +00:00
Nico Weber 54362477c7 llvm-undname; Add more test coverage for demangleFunctionClass()
Also add two FC_Far that seem to be missing, by symmetry from
the public and protected cases. (But FC_Far isn't really a thing
anymore, so this doesn't really have an observable effect.)

llvm-svn: 362344
2019-06-02 23:26:57 +00:00
Craig Topper 50b35caf30 [DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask.
Similar to what was done for masked load and gather.

llvm-svn: 362342
2019-06-02 22:52:38 +00:00
Craig Topper 5f79d74946 [X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin
Need to cast only to Constant instead of ConstantVector to allow
ConstantAggregateZero.

llvm-svn: 362341
2019-06-02 22:52:34 +00:00
Simon Pilgrim 8a32ca381d [CostModel][X86] Improve masked load/store AVX1/AVX2 costs
A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range.

e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;

e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;

Differential Revision: https://reviews.llvm.org/D61257

llvm-svn: 362338
2019-06-02 20:37:02 +00:00
Craig Topper a7bc31ebc6 [DAGCombiner] Replace masked loads with a zero mask with the passthru value
Similar to what was recently done for gathers in r362015.

llvm-svn: 362337
2019-06-02 18:58:46 +00:00
Nico Weber 869308dd55 Add demangling test coverage for unsigned short, unsigned long
llvm-svn: 362332
2019-06-02 17:29:26 +00:00
Nico Weber dfe02bc4e9 Add mangling test coverage for non-volatile const member pointers
llvm-svn: 362331
2019-06-02 17:23:53 +00:00
Roman Lebedev 420f5df1c3 [NFC][X86] extract-{low,}bits.ll: one more pattern a with truncation
llvm-svn: 362330
2019-06-02 17:11:21 +00:00
Nico Weber d0d32c35d9 Add test coverage for __pascal mangling
llvm-svn: 362329
2019-06-02 16:47:07 +00:00
Simon Pilgrim 71a39bcf68 [X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921)
Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.)

llvm-svn: 362327
2019-06-02 15:47:49 +00:00
Simon Pilgrim b0dc262ffb [X86] Add AVX2 'fast-variable-shuffle' PHADD tests (PR39921)
Haswell etc. will combine shuffles to a extract_subvector(permd(x)) before isHorizontalBinOp can match it.

llvm-svn: 362326
2019-06-02 15:33:28 +00:00
Roman Lebedev 2065ddfd79 [NFC][X86] extract-lowbits.ll: add one more pattern a with truncation
We are also free to interpret this as 'BZHI'/'BEXTR'.
https://rise4fun.com/Alive/dD6

llvm-svn: 362325
2019-06-02 15:07:49 +00:00
Simon Pilgrim ffb4d2bff7 [DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020)
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.

PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.

Differential Revision: https://reviews.llvm.org/D62783

llvm-svn: 362323
2019-06-02 11:56:39 +00:00
Nikita Popov eb37509832 [IndVarSimplify] Add tests for saturating math on IV; NFC
These saturating math ops can be replaced with simple math.

llvm-svn: 362320
2019-06-02 08:49:35 +00:00
Roman Lebedev 0bfa9359b0 [NFC][X86] extract-lowbits.ll: add patterns with truncation too
If we look past truncations of X too eagerly (D62786), we may
end up with 64-bit 'BEXTR', even though 32-bit-one would suffice.

llvm-svn: 362319
2019-06-02 08:05:24 +00:00
Craig Topper fe699c32a2 [X86] Simplify the CHECK lines in vector-reduce-and/or/xor-widen.ll in similar way to r362308.
Forgot to do the widen forms when I was doing the others.

llvm-svn: 362310
2019-06-02 00:43:02 +00:00
Craig Topper 396a915c26 [X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative.
llvm-svn: 362309
2019-06-02 00:42:58 +00:00
Craig Topper 4721fad972 [X86] Simplify the CHECK lines in vector-reduce-and/or/xor.
The AVX512BW and AVX512VL checks were never used. And AVX512 is the same
as AVX on all tests that weren't already split for AVX1 and AVX2.

llvm-svn: 362308
2019-06-02 00:07:52 +00:00
Craig Topper eeaecc63e9 [X86] Add avx512 command lines and test cases to machine-combiner.ll
llvm-svn: 362307
2019-06-02 00:07:48 +00:00
Craig Topper 7cebf0af40 [InlineCost] Don't add the soft float function call cost for the fneg idiom, fsub -0.0, %x
Summary: Fneg can be implemented with an xor rather than a function call so we don't need to add the function call overhead. This was pointed out in D62699

Reviewers: efriedma, cameron.mcinally

Reviewed By: efriedma

Subscribers: javed.absar, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62747

llvm-svn: 362304
2019-06-01 19:40:07 +00:00
Simon Pilgrim cd1878d0f9 [AMDGPU] Regenerate SDIV tests for an upcoming patch
llvm-svn: 362303
2019-06-01 18:27:06 +00:00
Simon Pilgrim 0d4a040510 [X86][AVX] Add tests for CONCAT(MOVDDUP(x),MOVDDUP(y))
llvm-svn: 362300
2019-06-01 14:05:46 +00:00
Simon Atanasyan 25694e0084 [mips] Extend range of register indexes accepted by cfcmsa/ctcmsa
The `cfcmsa` and `ctcmsa` instructions accept index of MSA control
register. The MIPS64 SIMD Architecture define eight MSA control
registers. But register index for `cfcmsa` and `ctcmsa` instructions
might be any number in 0..31 range. If the index is greater then 7,
`cfcmsa` writes zero to the destination registers and `ctcmsa` does
nothing [1].

[1] MIPS Architecture for Programmers Volume IV-j:
    The MIPS64 SIMD Architecture Module
https://www.mips.com/?do-download=the-mips64-simd-architecture-module

Differential Revision: https://reviews.llvm.org/D62597

llvm-svn: 362299
2019-06-01 13:55:18 +00:00
Dylan McKay 45eb4c7e55 [AVR] Disable register coalescing to the PTRDISPREGS class
If we would allow register coalescing on PTRDISPREGS class then register
allocator can lock Z register to some virtual register. Larger instructions
requiring a memory acces then fail during the register allocation phase since
there is no available register to hold a pointer if Y register was already
taken for a stack frame. This patch prevents it by keeping Z register
spillable. It does it by not allowing coalescer to lock it.

Original discussion on https://github.com/avr-rust/rust/issues/128.

llvm-svn: 362298
2019-06-01 12:38:56 +00:00
Simon Pilgrim e6d1a80370 [SLPVectorizer][X86] Add other tests described in PR28474
llvm-svn: 362297
2019-06-01 12:35:03 +00:00
Simon Pilgrim 2ef83571f2 [SLPVectorizer][X86] This test was from PR28474
llvm-svn: 362296
2019-06-01 12:10:29 +00:00
Roman Lebedev 1aaa23c0fc [NFC][Codegen] shift-amount-mod.ll: drop innermost operation
I have initially added it in for test to display both
whether the binop w/ constant is sinked or hoisted.
But as it can be seen from the 'sub (sub C, %x), %y'
test, that actually conceals the issues it is supposed to test.

At least two more patterns are unhandled:
* 'add (sub C, %x), %y' - D62266
* 'sub (sub C, %x), %y'

llvm-svn: 362295
2019-06-01 11:08:29 +00:00
Nikita Popov 46d4dba6e6 [IndVarSimplify] Fixup nowrap flags during LFTR (PR31181)
Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix
for LFTR poison handling issues in general.

When LFTR moves a condition from pre-inc to post-inc, it may now
depend on value that is poison due to nowrap flags. To avoid this,
we clear any nowrap flag that SCEV cannot prove for the post-inc
addrec.

Additionally, LFTR may switch to a different IV that is dynamically
dead and as such may be arbitrarily poison. This patch will correct
nowrap flags in some but not all cases where this happens. This is
related to the adoption of IR nowrap flags for the pre-inc addrec.
(See some of the switch_to_different_iv tests, where flags are not
dropped or insufficiently dropped.)

Finally, there are likely similar issues with the handling of GEP
inbounds, but we don't have a test case for this yet.

Differential Revision: https://reviews.llvm.org/D60935

llvm-svn: 362292
2019-06-01 09:40:18 +00:00
Nikita Popov 2b1d799a59 [IndVarSimplify] Add additional PR33181 tests; NFC
Two more tests with a switch to a dynamically dead IV, with poison
occuring on the first or second iteration.

llvm-svn: 362291
2019-06-01 09:40:09 +00:00
Dylan McKay 038e3b9f57 Extend the DWARFExpression address handling to support 16-bit addresses
This allows the DWARFExpression class to handle addresses without
crashing on targets with 16-bit pointers like AVR.

This is required in order to generate assembly from clang via the '-S'
flag.

This fixes an error with the following message:

clang: llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h:132: llvm::DWARFExpression::DWARFExpression(llvm::DataExtractor, uint16_t, uint8_t):
       Assertion `AddressSize == 8 || AddressSize == 4' failed.
llvm-svn: 362290
2019-06-01 09:18:26 +00:00
Craig Topper c288a19bb7 [X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables.
llvm-svn: 362288
2019-06-01 06:20:59 +00:00
Tom Tan 2258ecc2aa [COFF, ARM64] Fix location of ARM64 CodeView test
ARM64 CodeView test was incorrectly put under test/DebugInfo/COFF folder which
runs for all all architectures. This fix moves it to a subfolder AArch64 with
lit.local.cfg which specify it supports AArch64 only.

llvm-svn: 362283
2019-06-01 02:38:08 +00:00
Philip Reames 099eca832e [LoopPred] Handle a subset of NE comparison based latches
At the moment, LoopPredication completely bails out if it sees a latch of the form:
%cmp = icmp ne %iv, %N
br i1 %cmp, label %loop, label %exit
OR
%cmp = icmp ne %iv.next, %NPlus1
br i1 %cmp, label %loop, label %exit

This is unfortunate since this is exactly the form that LFTR likes to produce. So, go ahead and recognize simple cases where we can.

For pre-increment loops, we leverage the fact that LFTR likes canonical counters (i.e. those starting at zero) and a (presumed) range fact on RHS to discharge the check trivially.

For post-increment forms, the key insight is in remembering that LFTR had to insert a (N+1) for the RHS. CVP can hopefully prove that add nsw/nuw (if there's appropriate range on N to start with). This leaves us both with the post-inc IV and the RHS involving an nsw/nuw add, and SCEV can discharge that with no problem.

This does still need to be extended to handle non-one steps, or other harder patterns of variable (but range restricted) starting values. That'll come later.

Differential Revision: https://reviews.llvm.org/D62748

llvm-svn: 362282
2019-06-01 00:31:58 +00:00
Tom Tan eb4d6142dc [COFF, ARM64] Add CodeView register mapping
CodeView has its own register map which is defined in cvconst.h. Missing this
mapping before saving register to CodeView causes debugger to show incorrect
value for all register based variables, like variables in register and local
variables addressed by register (stack pointer + offset).

This change added mapping between LLVM register and CodeView register so the
correct register number will be stored to CodeView/PDB, it aso fixed the
mapping from CodeView register number to register name based on current
CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed.

Differential Revision: https://reviews.llvm.org/D62608

llvm-svn: 362280
2019-05-31 23:43:31 +00:00
Reid Kleckner eddd6c25b5 [codeview] Revert inline line table change of r362264
Testing with debuggers shows that our previous behavior was correct.
The reason I thought MSVC did things differently is that MSVC prefers to
use the 0xB combined code offset and code length update opcode when
inline sites are discontiguous.

Keep the test changes, and update the llvm-pdbutil inline line table
dumper to account for this new interpretation of the opcodes.

llvm-svn: 362277
2019-05-31 22:55:03 +00:00
Matt Arsenault 302eedcbfa AMDGPU: Fix not adding ImplicitBufferPtr as a live-in
Fixes missing test from r293000.

llvm-svn: 362275
2019-05-31 22:47:36 +00:00
Erik Pilkington abb2a93c53 [SimplifyLibCalls] Fold more fortified functions into non-fortified variants
When the object size argument is -1, no checking can be done, so calling the
_chk variant is unnecessary. We already did this for a bunch of these
functions.

rdar://50797197

Differential revision: https://reviews.llvm.org/D62358

llvm-svn: 362272
2019-05-31 22:41:36 +00:00
Philip Reames fa6bcd0b96 [Tests] Better represent the postinc form produced by LFTR in LoopPred tests
llvm-svn: 362270
2019-05-31 22:22:29 +00:00
Reid Kleckner e98cf5fe47 [codeview] Fix inline line table accuracy for discontiguous segments
After improving the inline line table dumper in llvm-pdbutil and looking
at MSVC's inline line tables, it is clear that setting the length of the
inlined code region does not update the code offset. This means that the
delta to the beginning of a new discontiguous inlined code region should
be calculated relative to the last code offset, excluding the length.
Implementing this is a one line fix for MC: simply don't update
LastLabel.

While I'm updating these test cases, switch them to use llvm-objdump -d
and llvm-pdbutil. This allows us to show offsets of each instruction and
correlate the line table offsets to the actual code.

llvm-svn: 362264
2019-05-31 20:55:31 +00:00
Nikita Popov 7bafae55c0 Reapply [CVP] Simplify non-overflowing saturating add/sub
If we can determine that a saturating add/sub will not overflow based
on range analysis, convert it into a simple binary operation. This is
a sibling transform to the existing with.overflow handling.

Reapplying this with an additional check that the saturating intrinsic
has integer type, as LVI currently does not support vector types.

Differential Revision: https://reviews.llvm.org/D62703

llvm-svn: 362263
2019-05-31 20:48:26 +00:00
Nikita Popov d435093056 [CVP] Add vector saturating add test; NFC
Extra test for the assertion failure from D62703.

llvm-svn: 362262
2019-05-31 20:42:13 +00:00
Nikita Popov 23a02f6a5f [CVP] Fix assertion failure on vector with.overflow
Noticed on D62703. LVI only handles plain integers, not vectors of
integers. This was previously not an issue, because vector support
for with.overflow is only a relatively recent addition.

llvm-svn: 362261
2019-05-31 20:42:07 +00:00
Philip Reames f711d59427 [Tests] Add ne icmp tests w/preinc forms for LoopPredication
Turns out this is substaintially easier to match then the post increment form, so let's start there.

llvm-svn: 362260
2019-05-31 20:34:57 +00:00
Cameron McInally 5594ee0a3e [NFC][InstCombine] Add unary FNeg tests to AMDGPU/amdgcn-intrinsics.ll
llvm-svn: 362255
2019-05-31 19:12:59 +00:00
Nikita Popov ccb63e0bfe Revert "[CVP] Simplify non-overflowing saturating add/sub"
This reverts commit 1e692d1777.

Causes assertion failure in builtins-wasm.c clang test.

llvm-svn: 362254
2019-05-31 19:04:47 +00:00
Cameron McInally 51e0de6954 [NFC][InstCombine] Add unary FNeg to cos-1.ll cos-2.ll cos-sin-intrinsic.ll
llvm-svn: 362253
2019-05-31 18:54:44 +00:00
Puyan Lotfi 3ea6b24f41 [MIR-Canon] Don't do vreg skip for independent instructions if there are none.
We don't want to create vregs if there is nothing to use them for. That causes
verifier errors.

Differential Revision: https://reviews.llvm.org/D62740

llvm-svn: 362247
2019-05-31 17:34:25 +00:00
Philip Reames 8dda4a1675 [Tests] Add tests for loop predication of loops w/ne latch conditions
llvm-svn: 362244
2019-05-31 16:54:38 +00:00
Nikita Popov 1e692d1777 [CVP] Simplify non-overflowing saturating add/sub
If we can determine that a saturating add/sub will not overflow
based on range analysis, convert it into a simple binary operation.
This is a sibling transform to the existing with.overflow handling.

Differential Revision: https://reviews.llvm.org/D62703

llvm-svn: 362242
2019-05-31 16:46:05 +00:00
Kevin P. Neal ac79007205 Revert revert of r362112 with minor SystemZ test file corrections.
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes

This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	https://reviews.llvm.org/D62546

llvm-svn: 362241
2019-05-31 16:32:12 +00:00
Stanislav Mekhanoshin fbbe5230f4 [AMDGPU] Use InliningThresholdMultiplier for inline hint
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.

Differential Revision: https://reviews.llvm.org/D62707

llvm-svn: 362239
2019-05-31 16:19:26 +00:00
Cameron McInally 8ff009a461 [NFC][InstCombine] Add unary FNeg tests to fabs.ll
llvm-svn: 362238
2019-05-31 16:17:04 +00:00
Guozhi Wei c3a24e93d5 [PPC] Correctly adjust branch probability in PPCReduceCRLogicals
In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%.

    condc = conda || condb
    br condc, label %target, label %fallthrough

It can be transformed to following,

    br conda, label %target, label %newbb
  newbb:
    br condb, label %target, label %fallthrough

Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%.

This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged.

Differential Revision: https://reviews.llvm.org/D62430

llvm-svn: 362237
2019-05-31 16:11:17 +00:00
Cameron McInally 6d2a4712f3 [NFC][InstCombine] Add unary FNeg tests to fcmp.ll
llvm-svn: 362234
2019-05-31 15:40:03 +00:00
Cameron McInally aea3149e6c [NFC][InstCombine] Add unary FNeg tests to fdiv.ll
llvm-svn: 362231
2019-05-31 15:10:34 +00:00
Simon Pilgrim db6a1d4f24 [AMDGPU] Regenerate add/sub shrink constant tests for an upcoming patch
llvm-svn: 362230
2019-05-31 15:06:51 +00:00
Simon Pilgrim 27d6ea9698 [AMDGPU] Regenerate CTLZ tests for an upcoming patch
llvm-svn: 362229
2019-05-31 15:06:14 +00:00
Cameron McInally 66c25def00 [NFC][InstCombine] Add unary FNeg tests to fma.ll
llvm-svn: 362227
2019-05-31 14:49:31 +00:00
George Rimar 60d88e0e90 [llvm-readobj] - Remove excessive `dynamic.test`
dynamic.test is a test that checks dumping of
dynamic tags. It uses precompiled objects as inputs
and it is completely excessive nowadays:

Now we have elf-dynamic-tags-machine-specific.test
and elf-dynamic-tags.test. 
(https://github.com/llvm-mirror/llvm/blob/master/test/tools/llvm-readobj/elf-dynamic-tags-machine-specific.test)
(https://github.com/llvm-mirror/llvm/blob/master/test/tools/llvm-readobj/elf-dynamic-tags.test)

First is used to check target specific tags and second tests the common flags.
These tests use YAML, which is much better than using precompiled binaries.

Note that new reviews tend to update the YAML based
tests to add new tags, e.g. see D62596.

With this patch it became possible to remove
dynamic-table-so.aarch64 binary from the inputs folder.
(other binaries are still used in other tests).

Differential revision: https://reviews.llvm.org/D62728

llvm-svn: 362224
2019-05-31 13:16:21 +00:00
Roman Lebedev 39390d8317 [InstCombine] 'C-(C2-X) --> X+(C-C2)' constant-fold
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.

https://rise4fun.com/Alive/ftR

llvm-svn: 362217
2019-05-31 09:47:16 +00:00
Roman Lebedev 886c4ef35a [InstCombine] 'add (sub C1, X), C2 --> sub (add C1, C2), X' constant-fold
https://rise4fun.com/Alive/qJQ

llvm-svn: 362216
2019-05-31 09:47:04 +00:00
Cullen Rhodes 0fc3a07398 [AArch64][SVE2] Asm: support WHILE instructions
Summary:
Patch adds support for the following instructions:
    * WHILEGE, WHILEGT, WHILEHS, WHILEHI, WHILEWR, WHILERW

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62601

llvm-svn: 362215
2019-05-31 09:13:55 +00:00
Cullen Rhodes 087d1337f8 [AArch64][SVE2] Asm: support TBL/TBX instructions
Summary:
A three sources variant of the TBL instruction is added to the existing
SVE instruction in SVE2. This is implemented with minor changes to the
existing TableGen class. TBX is a new instruction with its own
definition.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62600

llvm-svn: 362214
2019-05-31 09:06:53 +00:00
Cullen Rhodes 2e870011b6 [AArch64][SVE2] Asm: support SVE2 store instructions
Summary:
Patch adds support for the following instructions:
    * STNT1B, STNT1H, STNT1S, STNT1D

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62599

llvm-svn: 362213
2019-05-31 08:59:40 +00:00
Petar Avramovic f317debdb8 [MIPS GlobalISel] Add detailed tests for lower call
Test different operand types of callee and their behavior whether
relocation model is pic or not.
Possible operand types are:
Register (function pointer),
External symbol (used for libcalls e.g. __udivdi3 or memcpy),
Global address.

Global address has different handling depending on relocation model
and linkage type. Register and external symbol do not.

Differential Revision: https://reviews.llvm.org/D62590

llvm-svn: 362212
2019-05-31 08:40:08 +00:00
Petar Avramovic efcd3c0009 [MIPS GlobalISel] Handle position independent code
Handle position independent code for MIPS32.
When callee is global address, lower call will emit callee
as G_GLOBAL_VALUE and add target flag if needed.
Support $gp in getRegBankFromRegClass().
Select G_GLOBAL_VALUE, specially handle case when
there are target flags attached by lowerCall.

Differential Revision: https://reviews.llvm.org/D62589

llvm-svn: 362210
2019-05-31 08:27:06 +00:00
Roman Lebedev d1d915b8da [NFC][InstCombine] Copy add/sub constant-folding tests from codegen
Last three patterns are missed.

llvm-svn: 362209
2019-05-31 08:24:07 +00:00
Roman Lebedev 7c1ac8269a [NFC][Codegen] Add/sub constant-folding: add scalar tests too
Just for completeness.

llvm-svn: 362208
2019-05-31 08:23:48 +00:00
Petar Avramovic f4a6dd28b6 [MIPS GlobalISel] Lower call for callee that is register
Lower call for callee that is register for MIPS32.
Register should contain callee function address.

Differential Revision: https://reviews.llvm.org/D62585

llvm-svn: 362204
2019-05-31 08:06:17 +00:00
Craig Topper 31d00d80a2 [X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.

Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.

This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?

llvm-svn: 362203
2019-05-31 07:38:26 +00:00
Craig Topper cded573710 [X86] Add test cases for failure to use 128-bit masked vcvtdq2pd when load starts as v2i32.
llvm-svn: 362202
2019-05-31 07:38:22 +00:00
Craig Topper 67d43e0744 [X86] Add test cases for a volatile load shrinking bug involving cvtdq2pd. NFC
Similar to PR42079

llvm-svn: 362201
2019-05-31 07:38:18 +00:00
Craig Topper cb0ad5accb [X86] Copy a test case from avx512-cvt.ll to avx512-cvt-widen.ll. NFC
llvm-svn: 362200
2019-05-31 07:38:14 +00:00
Craig Topper b79cc5f802 [X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead.
DAG combine will usually fold fpextend+load to an fp extload anyway. So the
256 and 512 patterns were probably unnecessary. The 128 bit pattern was special
in that it looked for a v4f32 load, but then used it in an instruction that
only loads 64-bits. This is bad if the load happens to be volatile. We could
probably make the patterns volatile aware, but that's more work for something
that's probably rare. The peephole pass might kick in and save us anyway. We
might also be able to fix this with some additional DAG combines.

This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be
used. Previously we looked for the unlikely vselect+fpextend+load.

llvm-svn: 362199
2019-05-31 06:21:53 +00:00
Craig Topper 73b07284df [X86] Add test to show missed opportunity to use masked vcvtps2pd for vselect+extload.
llvm-svn: 362198
2019-05-31 06:21:49 +00:00
Craig Topper 8cb076ec6e [X86] Add test case for PR42079. NFC
llvm-svn: 362197
2019-05-31 06:21:45 +00:00
Puyan Lotfi 0d63cef180 [MIR-Canon] Skip the first N vreg names lazily.
This consolidates the vreg skip code into one function (SkipVRegs()).
SkipVRegs() now knows if it should skip as if it is the first initialization or
subsequent skips.

The first skip is also done the first time createVirtualRegister is called by
the cursor instead of by the cursor's constructor. This prevents verifier
errors on machine functions that have no vregs (where the verifier will
complain that there are vregs when the function uses none).

Differential Revision: https://reviews.llvm.org/D62717

llvm-svn: 362195
2019-05-31 06:02:38 +00:00
Craig Topper 23066033a1 [X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions.
This makes the 5 address operands come first. And the data operand comes last.

This matches the operand order the instruction is created with. It's also the
expected order in X86MCInstLower. So everything appeared to work, but the
operands didn't match their declared type.

Fixes a -verify-machineinstrs failure.

Also remove the isel patterns from these instructions since they should only
be used for stack spills and reloads. I'm not even sure what types the patterns
were looking for to match.

llvm-svn: 362193
2019-05-31 05:20:27 +00:00
Puyan Lotfi 2a901401fe [MIR-Canon] Hardening propagateLocalCopies.
This is am almost NFC, it does the following:
- If there is no register class for a COPY's src or dst, bail.
- Fixes uses iterator invalidation bug.

Differential Revision: https://reviews.llvm.org/D62713

llvm-svn: 362191
2019-05-31 04:49:58 +00:00
Pengfei Wang 2e67d0c842 [X86] Add VP2INTERSECT instructions
Support Intel AVX512 VP2INTERSECT instructions in llvm

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62366

llvm-svn: 362188
2019-05-31 02:50:41 +00:00
Douglas Yung f1e300ca1a Fix test to add missing '|' to regex.
llvm-svn: 362168
2019-05-30 22:20:31 +00:00
Michael Trent 5e1881f9b2 Update the tests in r362121 / r362141 to allow for Windows-specific error
messages: "Is a directory" instead of "is a directory"

This should resolve the errors being reported on clang-x64-windows-msvc.

llvm-svn: 362167
2019-05-30 22:11:29 +00:00
Amy Huang dd3a9caf47 Add enums as global variables in the IR metadata.
Summary:
Keeps track of the enums that were used by saving them as DIGlobalVariables,
since CodeView emits debug info for global constants.

Reviewers: rnk

Subscribers: aprantl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62635

llvm-svn: 362166
2019-05-30 22:04:11 +00:00
Roman Lebedev 31f1939848 [NFC][ARM] Add a test that potentially causes endless combine loop with D62266
llvm-svn: 362159
2019-05-30 21:41:21 +00:00
Puyan Lotfi daaecf98c9 [MIR-Canon] Fixing case where MachineFunction is empty.
In cases where the machine function is empty: bail on the RPO traversal.

Differential Revision: https://reviews.llvm.org/D62617

llvm-svn: 362158
2019-05-30 21:37:25 +00:00
Nikita Popov 751be7d51a [CVP] Add tests for non-overflowing saturating math; NFC
llvm-svn: 362153
2019-05-30 21:03:17 +00:00
Roman Lebedev a4e3b50e26 [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 362146
2019-05-30 20:37:49 +00:00
Roman Lebedev 57aa36ff91 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 362145
2019-05-30 20:37:39 +00:00
Roman Lebedev 63b4741534 [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 362144
2019-05-30 20:37:29 +00:00
Roman Lebedev 05ad5fd213 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 362143
2019-05-30 20:37:18 +00:00
Roman Lebedev 1d9ec7a81b [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 362142
2019-05-30 20:36:54 +00:00
Michael Trent c58130bc84 Write new tests for r362121
Summary:
The tests for r362121 ran dsymutil against a test binary every time.
This caused problems on lld-x86_64-ubuntu-fast as dsymutil required
a lipo tool be available to process those binaries.

This change rewrites the new test cases in macho-disassemble-g-dsym
to use bespoke test binaries (exe and dwarf) simplifying the test's
runtime dependencies.

The changes to tools/llvm-objdump/MachODump.cpp are unchanged from
r362121

Reviewers: pete, lhames, JDevlieghere

Reviewed By: pete

Subscribers: smeenai, aprantl, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62694

llvm-svn: 362141
2019-05-30 20:09:09 +00:00
Lang Hames 0e124b37bd [RuntimeDyld] Apply padding and alignment bumps to all sections with stubs, and
increase the MachO/x86-64 stub alignment to 8.

Stub alignment should be guaranteed for any section containing RuntimeDyld
stubs/GOT-entries. To do this we should pad and align all sections containing
stubs, not just code sections.

This commit also bumps the MachO/x86-64 stub alignment to 8, so that GOT entries
will be aligned.

llvm-svn: 362139
2019-05-30 19:59:20 +00:00
Cameron McInally 04a38b924e [NFC][InstCombine] Add unary FNeg tests to fmul.ll
llvm-svn: 362137
2019-05-30 19:42:25 +00:00
Matt Arsenault e0a4da8c0a AMDGPU/GlobalISel: Add wave scratch offset argument
Avoids crashing in PEI in a future change.

llvm-svn: 362136
2019-05-30 19:33:18 +00:00
Roman Lebedev 7eb8b5b5dd [DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold
Summary: https://rise4fun.com/Alive/B0A

Reviewers: t.p.northover, RKSimon, spatel, craig.topper

Reviewed By: RKSimon

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62691

llvm-svn: 362135
2019-05-30 19:27:51 +00:00
Roman Lebedev 691b5e2ecc [DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold
Summary: https://rise4fun.com/Alive/Mb1M

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62689

llvm-svn: 362134
2019-05-30 19:27:42 +00:00
Roman Lebedev 0a3dbbcdfb [DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62664

llvm-svn: 362133
2019-05-30 19:27:32 +00:00
Roman Lebedev cc9a9cf237 [DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold
Summary:
This was the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, spatel, craig.topper, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62662

llvm-svn: 362131
2019-05-30 19:27:19 +00:00
Tim Northover b7141207a4 Reapply: IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362128
2019-05-30 18:48:23 +00:00
Tim Renouf 7fecdf36cc [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.

That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.

Differential Revision: https://reviews.llvm.org/D62572

Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
2019-05-30 18:46:34 +00:00
Craig Topper 778e445c58 [LoopVectorize] Add FNeg instruction support
Differential Revision: https://reviews.llvm.org/D62510

llvm-svn: 362124
2019-05-30 18:19:35 +00:00
Michael Trent 5d5f629922 Reverting change r362121 due to lld-x86_64-ubuntu-fast test failures
llvm-svn: 362123
2019-05-30 18:17:10 +00:00
Puyan Lotfi 0f4446b270 [MIR-Canon] Add support for rewriting VRegs that are typed but don't have an RC.
There were crashes (addrspace-memoperands.mir was only one of them) in MIR that
had operands that came from before register classes were set. With these
operands, creating a replacement vreg (for MIR-Canon's renaming) needs to use
the vreg type rather than the RegisterClass which is not present.

Differential Revision: https://reviews.llvm.org/D62543

llvm-svn: 362122
2019-05-30 18:06:28 +00:00
Michael Trent 50daaa5f6b Support Universal dSYM files in llvm-objdump
Summary:
Commonly programmers use llvm-objdump to disassemble Mach-O target
binaries with Mach-O dSYMS. While llvm-objdump allows programmers to
disassemble Universal binaries, it previously did not recognize
Universal dSYM files. This change updates llvm-objdump to support
passing in Universal files via the -dsym option. Now, when
disassembling a Mach-O file either as a stand alone file or as an entry
in a Universal binariy, llvm-objdump will search through a Universal
dSYM for a Mach-O matching the architecture flag of the file being
disassembled.

Reviewers: pete, lhames

Reviewed By: pete

Subscribers: rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62642

llvm-svn: 362121
2019-05-30 17:56:05 +00:00
Kevin P. Neal d3db7b40b0 Revert r362112, it broke the bots with the message "Unsupported vector argument or return type"
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362117
2019-05-30 17:10:21 +00:00
Roman Lebedev 2ae4b33181 [NFC][Codegen] Potential add/sub constant folding: fixup non-splat tests
llvm-svn: 362114
2019-05-30 16:50:43 +00:00
Kevin P. Neal 2e1807678d [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362112
2019-05-30 16:44:47 +00:00
Roman Lebedev 700fdb1070 [NFC][Codegen] Add better test coverage for potential add/sub constant folding
This adds hopefully-full test coverage for all the possible permutations:
First op is one of:
* x + c1
* x - c1
* c1 - x

Second op is one of:
* + c2
* - c2
* c2 -

And thus 3*3=9 patterns.
Some of them show missed constant-folds.

Without previous patch (the revert), these tests were causing endless
dagcombine loop. I really should have thought about this first :S

llvm-svn: 362110
2019-05-30 16:07:19 +00:00
Roman Lebedev 019d270e43 [DAGCombine] Revert of recommit of "binop-with-const hoisting" patches
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.

Some cases may highlight missing constant-folds.

Reverts r361871,r361872,r361873,r361874.

llvm-svn: 362109
2019-05-30 16:07:11 +00:00
Roman Lebedev 8f220a5d2c [NFC][Codegen] Add add+sub/sub+add constant-fold tests for from D62257
add+sub/sub+add when second operands are constants should be folded
into a single add, just like with add+add.

llvm-svn: 362093
2019-05-30 13:02:11 +00:00
Roman Lebedev e8578953ac [LoopIdiom] Basic OptimizationRemarkEmitter handling
Summary:
I'm adding ORE to memset/memcpy formation, with tests,
but mainly this is split off from D61144.

Reviewers: reames, anemet, thegameg, craig.topper

Reviewed By: thegameg

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62631

llvm-svn: 362092
2019-05-30 13:02:06 +00:00
Sjoerd Meijer 930dee2c0b [ARM] add target arch definitions for 8.1-M and MVE
This adds:
- LLVM subtarget features to make all the new instructions conditional on,
- CPU and FPU names for use on clang's command line, with default FPUs set
  so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right
  FPU features,
- architecture extension names "mve" and "mve.fp",
- ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE
  (a new actual tag).

Patch mostly by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60698

llvm-svn: 362090
2019-05-30 12:57:04 +00:00
George Rimar 31e6d8feea [llvm-readobj] - Rewrite reloc-types.test to use YAML. NFCI.
This change rewrites and splits reloc-types.test
to use yaml2obj instead of precompiled binaries.
That allowed to remove 7 precompiled objects from the inputs.

I took the existent objects, used obj2yaml on them, simplified the result and
used yaml2obj in the test case with the result.

Notes:
* I converted, but did not remove relocs.obj.elf-i386, relocs.obj.elf-x86_64 or relocs.obj.elf-mips objects
because found they are used in other tests. 
* I was unable to convert relocs.obj.elf-ppc64, because obj2yaml hangs on this file for me.
* I was unable to convert relocs.obj.macho-arm, relocs.obj.macho-i386 and relocs.obj.macho-x86_64
because the output produced by obj2yaml does not seem to be correct.
* Because of the above I did not remove the script for creating all
of those objects: test\tools\llvm-readobj\Inputs\relocs.py

Differential revision: https://reviews.llvm.org/D62594

llvm-svn: 362089
2019-05-30 12:39:05 +00:00
Sjoerd Meijer 7eb95d672d [ARM] Introduce separate features for FP registers
The MVE extension in Arm v8.1-M permits the use of some move, load and
store isntructions which access the FP registers, even if there's no
actual FP support in the processor (in particular, if you have the
integer-only version of MVE).

Therefore, we need separate subtarget features to condition those
instructions on, which are implied by both FP and MVE but are not part
of either.

Patch mostly by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60694

llvm-svn: 362088
2019-05-30 12:37:05 +00:00
Simon Pilgrim 9e7be9b745 [CostModel][X86] Add bool vector and/or/xor cost tests
llvm-svn: 362083
2019-05-30 10:41:04 +00:00
George Rimar c372f41c18 [llvm-readobj/llvm-readelf] - Implement GNU style dumper of the SHT_GNU_verdef section.
It was not implemented yet, we had only LLVM style dumper implemented.
Section description is here: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html

Differential revision: https://reviews.llvm.org/D62520

llvm-svn: 362082
2019-05-30 10:36:52 +00:00
Simon Pilgrim 32aac1727a [X86][SSE] Improve bool vector extload (PR26091)
We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it.

Differential Revision: https://reviews.llvm.org/D62449

llvm-svn: 362081
2019-05-30 10:25:20 +00:00
George Rimar e3406c42a4 [llvm-readobj/llvm-readelf] - Implement GNU style dumper of the SHT_GNU_verneed section.
It was not implemented yet, we had only LLVM style dumper implemented.
Section description is here: https://refspecs.linuxfoundation.org/LSB_2.0.1/LSB-Core/LSB-Core/symverrqmts.html

Differential revision: https://reviews.llvm.org/D62516

llvm-svn: 362080
2019-05-30 10:14:41 +00:00
Eugene Leviant fa147c97d6 [llvm-objcopy] Remove %p format specifiers
On 32-bit machines %p expects 32 bit values, however
addresses in llvm-objcopy are always 64 bits.

llvm-svn: 362074
2019-05-30 09:09:01 +00:00
Cullen Rhodes 7fad428931 [AArch64][SVE2] Asm: support SVE2 vector splice (constructive)
Summary:
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62530

llvm-svn: 362073
2019-05-30 08:51:39 +00:00
Cullen Rhodes ebe23041f0 [AArch64][SVE2] Asm: support SVE2 load instructions
Summary:
Patch adds support for the following instructions:
    * LDNT1SB, LDNT1B, LDNT1SH, LDNT1H, LDNT1SW, LDNT1W, LDNT1D

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62528

llvm-svn: 362072
2019-05-30 08:44:27 +00:00
Cullen Rhodes 455c529f77 [AArch64][SVE2] Asm: support FCVTX/FLOGB instructions
Summary:

Patch completes SVE2 support for:

    SVE Floating Point Unary Operations - Predicated Group

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62526

llvm-svn: 362071
2019-05-30 08:35:12 +00:00
Cullen Rhodes 028413f5ae [AArch64][SVE2] Asm: add ext (immediate offset, constructive) instruction
Summary:
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62518

llvm-svn: 362070
2019-05-30 08:25:17 +00:00
Craig Topper a807495fd1 [LoopVectorize] Precommit tests for D62510. NFC
llvm-svn: 362060
2019-05-30 06:48:13 +00:00
Florian Hahn e4cfa89915 [LV] Inform about exactly reason of loop illegality
Currently, only the following information is provided by LoopVectorizer
in the case when the CF of the loop is not legal for vectorization:

 LV: Can't vectorize the instructions or CFG
    LV: Not vectorizing: Cannot prove legality.

But this information is not enough for the root cause analysis; what is
exactly wrong with the loop should also be printed:

 LV: Not vectorizing: The exiting block is not the loop latch.

Patch by Pavel Samolysov.

Reviewers: mkuper, hsaito, rengolin, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D62311

llvm-svn: 362056
2019-05-30 05:03:12 +00:00
Pengfei Wang 1f67d94279 [X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Patch by Tianqing Wang (tianqing)

Differential Revision: https://reviews.llvm.org/D62281

llvm-svn: 362053
2019-05-30 03:59:16 +00:00
Amy Huang 325003be02 CodeView - add static data members to global variable debug info.
Summary:
Add static data members to IR debug info's list of global variables
so that they are emitted as S_CONSTANT records.

Related to https://bugs.llvm.org/show_bug.cgi?id=41615.

Reviewers: rnk

Subscribers: aprantl, cfe-commits, llvm-commits, thakis

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62167

llvm-svn: 362038
2019-05-29 21:45:34 +00:00
Reid Kleckner 86bad3f924 [llvm-pdbutil] Dump inline call site line table annotations
This ports and improves on some existing llvm-readobj -codeview dumping
functionality that llvm-pdbutil lacked.

Helpful for comparing inline line tables between MSVC and clang.

llvm-svn: 362037
2019-05-29 21:26:25 +00:00
Matt Arsenault 79b3ea701c LoopVersioningLICM: Respect convergent and noduplicate
llvm-svn: 362031
2019-05-29 20:47:59 +00:00
Tim Northover 71ee3d0237 Revert "IR: add optional type to 'byval' function parameters"
The IRLinker doesn't delve into the new byval attribute when mapping types, and
this breaks LTO.

llvm-svn: 362029
2019-05-29 20:46:38 +00:00
Roman Lebedev 68908c9017 UpdateTestChecks: Lanai triple support
Summary:
The assembly structure most resembles the SPARC pattern:
```
        .globl  f6                      ! -- Begin function f6
        .p2align        2
        .type   f6,@function
f6:                                     ! @f6
        .cfi_startproc
! %bb.0:
        st      %fp, [--%sp]
<...>
        ld      -8[%fp], %fp
.Lfunc_end0:
        .size   f6, .Lfunc_end0-f6
        .cfi_endproc
                                        ! -- End function
```
Test being affected by upcoming patch, so regenerate it.

Reviewers: RKSimon, jpienaar

Reviewed By: RKSimon

Subscribers: jyknight, fedor.sergeev, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62545

llvm-svn: 362019
2019-05-29 20:03:00 +00:00
Benjamin Kramer 107f8d9873 [DAGCombiner] Replace gathers with a zero mask with the passthru value
These can be created by the legalizer when splitting a larger gather.

See https://llvm.org/PR42055 for a motivating example.

Differential Revision: https://reviews.llvm.org/D62613

llvm-svn: 362015
2019-05-29 19:24:19 +00:00
Tim Northover 6e07f16fae IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362012
2019-05-29 19:12:48 +00:00
Nikita Popov 5382803b04 [InstCombine] Optimize always overflowing signed saturating add/sub
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.

Differential Revision: https://reviews.llvm.org/D62544

llvm-svn: 362006
2019-05-29 18:37:13 +00:00
Aakanksha Patil d5443f8c21 AMDGPU: Return address lowering
The patch computes the return address for the current function.

Differential revision: https://reviews.llvm.org/D59666

llvm-svn: 362001
2019-05-29 18:20:11 +00:00
Eugene Leviant c98b288b03 Yet another attempt to fix buildbot after r361949
Looks like %p format specifier of createStringError behaves
differently on different platforms

llvm-svn: 361993
2019-05-29 17:14:48 +00:00
Craig Topper e3a76fa1e2 [X86] Fix machineverifier error on avx512f-256-set0.mir
Previously the pass ran the entire llc pipeline which caused the IR to be recodegened.

This commit restricts it to just running the postrapseudos pass and checking the results of that instead of the final assembly.

llvm-svn: 361991
2019-05-29 17:02:27 +00:00
Matt Arsenault f80c4241b3 CallSiteSplitting: Respect convergent and noduplicate
llvm-svn: 361990
2019-05-29 16:59:48 +00:00
Teresa Johnson 5b2088d1fa [ThinLTO] Use original alias visibility when importing
Summary:
When we import an alias, we do so by making a clone of the aliasee. Just
as this clone uses the original alias name and linkage, it should also
use the same visibility (not the aliasee's visibility). Otherwise,
linker behavior is affected (e.g. if the aliasee was hidden, but the
alias is not, the resulting imported clone should not be hidden,
otherwise the linker will make the final symbol hidden which is
incorrect).

Reviewers: wmi

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62535

llvm-svn: 361989
2019-05-29 16:50:46 +00:00
Cameron McInally 98a797c224 [NFC][InstCombine] Add a unary FNeg test to fsub.ll.
llvm-svn: 361988
2019-05-29 16:50:14 +00:00
Kevin P. Neal 308b7139b1 Partial revert of revert of r361827: Add constrained intrinsic tests for powerpc64le.
The powerpc64-"nonle" tests are removed. They fail because of a bug that
Drew is currently working on that affects multiple targets.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Hal Finkel, Kevin P. Neal
Approved by:	Hal Finkel
Differential Revision:	http://reviews.llvm.org/D62388

llvm-svn: 361985
2019-05-29 16:29:31 +00:00
Cameron McInally 28f384a7c7 [NFC][InstCombine] Add unary FNeg tests to fpcast.ll and fpextend.ll
llvm-svn: 361973
2019-05-29 15:29:35 +00:00
Cameron McInally 4ebbc4d73a [NFC][InstCombine] Add unary FNeg tests to fsub.ll known-never-nan.ll
llvm-svn: 361971
2019-05-29 15:21:28 +00:00
Simon Atanasyan 909c8c2b0d [mips] Use reg-exp in tests to tolerate register indexes changing. NFC
llvm-svn: 361966
2019-05-29 14:59:07 +00:00
Matt Arsenault 36e7254441 SpeculateAroundPHIs: Respect convergent
llvm-svn: 361957
2019-05-29 13:14:39 +00:00
Matt Arsenault 9ffd8b5a6f AMDGPU/GlobalISel: Remove unnecesssary REQUIREs
This has been a mandatory part of the build for a while.

llvm-svn: 361956
2019-05-29 13:14:35 +00:00
Graham Hunter f4fc01f8dd [SVE][IR] Scalable Vector IR Type
* Adds a 'scalable' flag to VectorType
* Adds an 'ElementCount' class to VectorType to pass (possibly scalable) vector lengths, with overloaded operators.
* Modifies existing helper functions to use ElementCount
* Adds support for serializing/deserializing to/from both textual and bitcode IR formats
* Extends the verifier to reject global variables of scalable types
* Updates documentation

See the latest version of the RFC here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124396.html

Reviewers: rengolin, lattner, echristo, chandlerc, hfinkel, rkruppe, samparker, SjoerdMeijer, greened, sebpop

Reviewed By: hfinkel, sebpop

Differential Revision: https://reviews.llvm.org/D32530

llvm-svn: 361953
2019-05-29 12:22:54 +00:00
Eugene Leviant a6fb183c98 [llvm-objcopy] Implement IHEX writer
Differential revision: https://reviews.llvm.org/D60270

llvm-svn: 361949
2019-05-29 11:37:16 +00:00
George Rimar 5b363c14d7 [llvm-readobj] - Repair the test case.
I forgot to change the test tag in r361932.
Now it is fixed.

llvm-svn: 361945
2019-05-29 11:01:07 +00:00
George Rimar 8ac7b2d07b [llvm-readelf] - Allow dumping of the .dynamic section even if there is no PT_DYNAMIC header.
It is now possible after D61937 was landed and was discussed
in it's review comments. It is not consistent with GNU, which
does not output .dynamic section content in this case for
no visible reason.

Differential revision: https://reviews.llvm.org/D62179

llvm-svn: 361943
2019-05-29 10:31:46 +00:00
Cullen Rhodes 6c04ef3d48 [AArch64][SVE2] Asm: support SVE Bitwise Logical - Unpredicated Group
Summary:
Patch adds support for the following instructions:
    * EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR

Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the
preferred disassembly is .D.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62387

llvm-svn: 361936
2019-05-29 09:03:27 +00:00
Cullen Rhodes 75dfbdc2da [AArch64][SVE2] Asm: support Floating Point Widening Multiply-Add
Summary:
Patch adds support for the indexed and unpredicated vectors forms of the
FMLALB, FMLALT, FMLSLB and FMLSLT instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62386

llvm-svn: 361935
2019-05-29 08:53:06 +00:00
Cullen Rhodes 4f58ad4e72 [AArch64][SVE2] Asm: support SVE2 Floating Point Pairwise Group
Summary:
Patch adds support for the following instructions:

SVE2 floating-point pairwise operations:
    * FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62383

llvm-svn: 361933
2019-05-29 08:40:33 +00:00
George Rimar 65dde1e0db [llvm-readobj/llvm-readelf] - Simplify the elf-versioninfo.test test case.
This removes 2 precompiled objects from the test case and replaces
them with a single YAML. That allowed to simplify and clean up the test,
remove excessive checks.

Differential revision: https://reviews.llvm.org/D62529

llvm-svn: 361932
2019-05-29 08:28:47 +00:00
Fangrui Song ed6fa44f23 [llvm-readobj] -u: don't crash when dumping SHT_ARM_EXIDX if .symtab doesn't exist
Reviewed By: kongyi

Differential Revision: https://reviews.llvm.org/D62567

llvm-svn: 361929
2019-05-29 06:18:34 +00:00
Peter Collingbourne 31fda09b2d Add IR support, ELF section and user documentation for partitioning feature.
The partitioning feature was proposed here:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html

This is mostly just documentation. The feature itself will be contributed
in subsequent patches.

Differential Revision: https://reviews.llvm.org/D60242

llvm-svn: 361923
2019-05-29 03:29:01 +00:00
Fangrui Song 656afe370d [X86] Fix x86-64 call *foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL
D18885 emitted 5 bytes for call *foo@tlsdesc(%rax). It should use the
2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning
of the call instruction.

The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work:

    0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 7 <.text+0x7>
                         3: R_X86_64_GOTPC32_TLSDESC     a-0x4
    7:   ff 10                   callq  *(%rax)
                         7: R_X86_64_TLSDESC_CALL        a

=>

    0:   48 c7 c0 fc ff ff ff    mov    $0xfffffffffffffffc,%rax
    7:   66 90                   xchg   %ax,%ax

Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is
seen.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D62512

llvm-svn: 361910
2019-05-29 02:02:59 +00:00
Sanjay Patel 19f703e0d7 [AArch64] auto-generate complete test checks; NFC
llvm-svn: 361908
2019-05-29 01:37:44 +00:00
Sanjay Patel 860736cc3c [AArch64] auto-generate complete test checks; NFC
llvm-svn: 361906
2019-05-29 01:35:10 +00:00
Quentin Colombet a6f57ad2c9 [RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes
To determine the list of clobbered registers, the RegUsageInfoCollector pass
uses the list of callee saved registers provided by the target and then augments
it with the list of registers which have all their subregisters saved. It then
basically does the difference between all the registers and the saved registers
to come up with what is clobbered (plus it checks that the register is defined
within that functions).

The patch fixes a bug where when register does not have any subregister lane,
hence when checking if any of its subregister are not saved, we would find none
and think the register is saved as well.

That's obviously wrong.

The code was actually kind of checking for something like that with the
CoveredBySubRegs bit. What this bit says is that a register is completely
covered by its subregisters.
We required that this bit was set, to check that a register was saved by its
subregister lanes, since without this bit, we potentially would miss to check
some part of the register.

However, this bit is used de facto on registers that don't have any
subregisters (e.g., on ARM) and the code was not prepared for that.

This patch fixes this by checking that a register has subregisters before
declaring it saved when none of its lanes are modified.

llvm-svn: 361901
2019-05-28 23:43:12 +00:00
Alexander Shaposhnikov 88aed8da61 [tools] Introduce llvm-lipo
This diff starts the implementation of llvm-lipo 
which is supposed to be a drop-in replacement for the well-known tool lipo.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D61927

llvm-svn: 361896
2019-05-28 23:22:12 +00:00
Jessica Paquette b73ea75b38 [AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0
Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and
factor out opcode selection for G_FCMP into its own function.

Add a test to show that we don't do this with other immediates.

Differential Revision: https://reviews.llvm.org/D62539

llvm-svn: 361888
2019-05-28 22:52:49 +00:00
Heejin Ahn 5514658591 [WebAssembly] Support for atomic fences
Summary:
This adds support for translation of LLVM IR fence instruction. We
convert a singlethread fence to a pseudo compiler barrier which becomes
0 instructions in final binary, and a thread fence to an idempotent
atomicrmw instruction to a memory address.

Reviewers: dschuff, jfb, sunfish, tlively

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D50277

llvm-svn: 361884
2019-05-28 22:09:12 +00:00
Rong Xu e88173abc0 [PGO] Handle cases of failing to split critical edges
Fix PR41279 where critical edges to EHPad are not split.
The fix is to not instrument those critical edges. We used to be able to know
the size of counters right after MST is computed. With this, we have to
pre-collect the instrument BBs to know the size, and then instrument them.

Differential Revision: https://reviews.llvm.org/D62439

llvm-svn: 361882
2019-05-28 21:45:56 +00:00
Nikita Popov 5b32f60ec3 Revert "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst"
This reverts commit 53f2f32865.

As reported on D62126, this causes assertion failures if the switch
has incorrect branch_weights metadata, which may happen as a result
of other transforms not handling it correctly yet.

llvm-svn: 361881
2019-05-28 21:28:24 +00:00
Konstantin Zhuravlyov fe23ed2c68 AMDGPU: Temporary drop s_mul_hi_i/u32 patterns
It introduces performance regressions in several applications.

This has already been submitted downstream.

llvm-svn: 361879
2019-05-28 21:18:34 +00:00
Adhemerval Zanella 34d8daae53 [AArch64] Handle ISD::LRINT and ISD::LLRINT
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.

Reviewed By: SjoerdMeijer, mstorsjo

Differential Revision: https://reviews.llvm.org/D62018

llvm-svn: 361877
2019-05-28 21:04:29 +00:00
Adhemerval Zanella 6d7bf5e8df [CodeGen] Add lrint/llrint builtins
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62017

llvm-svn: 361875
2019-05-28 20:47:44 +00:00
Roman Lebedev dfc34f0211 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361874
2019-05-28 20:40:10 +00:00
Roman Lebedev d485c6bc9f [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361873
2019-05-28 20:40:03 +00:00
Roman Lebedev 96c9986199 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361872
2019-05-28 20:39:55 +00:00
Roman Lebedev 2feb7e56e2 [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361871
2019-05-28 20:39:39 +00:00
Peter Collingbourne 0dac476072 Change ELF tools to allow multiple sections per file.
This is how multi-partition combined output files are going to look. If we
see multiple sections, the tools will just read the first one.

Differential Revision: https://reviews.llvm.org/D62349

llvm-svn: 361869
2019-05-28 20:01:25 +00:00
Michael Liao 5fc1dfa784 [AMDGPU] Correct the handling of inlineasm output registers.
Summary:
- There's a regression due to the cross-block RC assignment. Use the
  proper way to derive the output register RC in inline asm.

Reviewers: rampitec, alex-t

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62537

llvm-svn: 361868
2019-05-28 19:37:09 +00:00
Roman Lebedev 272d70c366 Revert DAGCombine "hoist binop with const" folds
Appear to introduce test-suite compile-time hang.

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825

This reverts r361852,r361853,r361854,r361855,r361856

llvm-svn: 361865
2019-05-28 19:04:21 +00:00
Nikita Popov 2941eb6864 [InstCombine] Add tests for signed saturating always overflow; NFC
llvm-svn: 361864
2019-05-28 18:59:28 +00:00
Roman Lebedev caeec8501e [NFC][MIPS] Autogenerater madd-msub.ll test
Being affected by upcoming patch

llvm-svn: 361860
2019-05-28 18:31:36 +00:00
Roman Lebedev 7669665432 [DAGCombine] (x - C) - y -> (x - y) - C fold
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361856
2019-05-28 17:54:21 +00:00
Roman Lebedev 8c9b3e4e4a [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361855
2019-05-28 17:54:13 +00:00