Commit Graph

65953 Commits

Author SHA1 Message Date
Nikita Popov c08666abaf [InstCombine] Add tests for uadd/sub.sat(a, b) == 0; NFC
llvm-svn: 375372
2019-10-20 19:50:31 +00:00
Roman Lebedev 49483a3bc2 [InstCombine] Shift amount reassociation in shifty sign bit test (PR43595)
Summary:
This problem consists of several parts:
* Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`.
  This is trivial, and easy to do, we have a fold for it.
* Shift amount reassociation - if we have two identical shifts,
  and we can simplify-add their shift amounts together,
  then we likely can just perform them as a single shift.
  But this is finicky, has one-use restrictions,
  and shift opcodes must be identical.

But there is a super-pattern where both of these work together
to produce a sign bit test from two shifts + comparison.
We do indeed already handle this in most cases.
But since we get that fold transitively, it has one-use restrictions.
And what's worse, in this case the right-shifts aren't required to be
identical, and we can't handle that transitively:

If the total shift amount is bitwidth-1, only the sign bit will remain
in the output value. But if we look at this from the perspective of
two shifts, we can't fold - we can't possibly know what bit pattern
we'd produce via two shifts; it will be *some* kind of mask
produced from the original sign bit, but we just can't tell its shape:
https://rise4fun.com/Alive/cM0 https://rise4fun.com/Alive/9IN

But it will *only* contain sign bit and zeros.
So from the perspective of sign bit test, we're good:
https://rise4fun.com/Alive/FRz https://rise4fun.com/Alive/qBU
Superb!
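
(Illustrative C++ sketch of the identity the fold relies on, using logical
shifts; `twoShiftTest` is a hypothetical helper, not the actual IR fold.)

  #include <cassert>
  #include <cstdint>

  // ((x >> a) >> b) == 0 with a + b == 31 (bitwidth - 1) and logical right
  // shifts asks exactly "is the sign bit clear?".
  static bool twoShiftTest(uint32_t x, unsigned a, unsigned b) {
    return ((x >> a) >> b) == 0; // assumes a + b == 31
  }

  int main() {
    for (uint64_t v = 0; v <= 0xFFFFFFFFull; v += 0x10001u) {
      uint32_t x = (uint32_t)v;
      bool SignClear = (x >> 31) == 0;
      // Any split of the total shift amount gives the same answer.
      assert(twoShiftTest(x, 7, 24) == SignClear);
      assert(twoShiftTest(x, 30, 1) == SignClear);
    }
    return 0;
  }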

So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a
pseudo-analysis mode that ignores extra uses and only checks
whether a) those are two right shifts and b) they end up with a bitwidth(x)-1
shift amount, and returns either the original value that we are sign-checking,
or null.

There is no functional change to
the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`.

All that being said, as discussed in the review, this yet again
increases the usage of instsimplify in instcombine as a utility.
Some day that may need to be reevaluated.

https://bugs.llvm.org/show_bug.cgi?id=43595

Reviewers: spatel, efriedma, vsk

Reviewed By: spatel

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68930

llvm-svn: 375371
2019-10-20 19:38:50 +00:00
Matt Arsenault e5be543a55 AMDGPU: Increase vcc liveness scan threshold
Avoids a test regression in a future patch. Also add debug printing on
this case, so I waste less time debugging folds in the future.

llvm-svn: 375367
2019-10-20 17:44:17 +00:00
Matt Arsenault 7cd57dcd5b AMDGPU: Split flat offsets that don't fit in DAG
We handle it this way for some other address spaces.

Since r349196, SILoadStoreOptimizer has been trying to do this. This
is after SIFoldOperands runs, which can change the addressing
patterns. It's simpler to just split this earlier.

llvm-svn: 375366
2019-10-20 17:34:44 +00:00
Matt Arsenault bba8fd7132 AMDGPU: Add baseline tests for flat offset splitting
llvm-svn: 375364
2019-10-20 16:33:21 +00:00
George Rimar 2779987d0e [yaml2obj][obj2yaml] - Do not create a symbol table by default.
This patch tries to resolve problems faced in D68943
and uses some of the code written by Konrad Wilhelm Kleine
in that patch.

Previously, the yaml2obj tool always created a .symtab section.
This patch changes that. With it we only create one when
there is a "Symbols:" tag in the YAML document or when
we need to create it because it is used by other sections.

obj2yaml follows the new behavior and does not print "Symbols:"
anymore when there is no symbol table.

Differential revision: https://reviews.llvm.org/D69041

llvm-svn: 375361
2019-10-20 14:47:17 +00:00
Matt Arsenault 8a8b317460 AMDGPU: Don't error on calls to null or undef
Calls to constants should probably be generally handled.

llvm-svn: 375356
2019-10-20 07:46:04 +00:00
Philip Reames 1d509201e2 [SCEV] Simplify umin/max of zext and sext of the same value
This is a common idiom which arises after induction variables are widened, and we have two or more exit conditions.  Interestingly, we don't have instcombine or instsimplify support for this either.
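
(Illustrative C++ check of the underlying identity, not the SCEV code itself:
for the same narrow value, the zero-extension is never unsigned-greater than
the sign-extension, so the umin is the zext and the umax is the sext.)

  #include <cassert>
  #include <cstdint>

  int main() {
    // Exhaustively check the identity for i16 widened to i64.
    for (uint32_t i = 0; i <= 0xFFFF; ++i) {
      uint16_t x = (uint16_t)i;
      uint64_t z = (uint64_t)x;                   // zext i16 -> i64
      uint64_t s = (uint64_t)(int64_t)(int16_t)x; // sext i16 -> i64
      assert(z <= s); // so umin(z, s) == z and umax(z, s) == s
    }
    return 0;
  }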

Differential Revision: https://reviews.llvm.org/D69006

llvm-svn: 375349
2019-10-19 17:23:02 +00:00
Sanjay Patel a298964d22 [TargetLowering][DAGCombine][MSP430] add/use hook for Shift Amount Threshold (1/2)
Provides a TLI hook to allow targets to relax the emission of shifts, thus enabling
codegen improvements on targets with no multiple shift instructions and cheap selects
or branches.

Contributes to a Fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559

Patch by: @joanlluch (Joan LLuch)

Differential Revision: https://reviews.llvm.org/D69116

llvm-svn: 375347
2019-10-19 16:57:02 +00:00
Sanjay Patel 0a15981a84 [MSP430] Shift Amount Threshold in DAGCombine (Baseline Tests); NFC
Patch by: @joanlluch (Joan LLuch)

Differential Revision: https://reviews.llvm.org/D69099

llvm-svn: 375345
2019-10-19 16:29:32 +00:00
Simon Pilgrim b5088aa944 [X86][SSE] lowerV16I8Shuffle - tryToWidenViaDuplication - undef unpack args
tryToWidenViaDuplication lowers using the shuffle_v8i16(unpack_v16i8(shuffle_v8i16(x),shuffle_v8i16(x))) pattern, but the unpack only needs the even/odd 16i8 args if the original v16i8 shuffle mask references the even/odd elements - which isn't true for many extension style shuffles.

llvm-svn: 375342
2019-10-19 13:18:02 +00:00
Simon Pilgrim 6ada70d1b5 [X86][SSE] LowerUINT_TO_FP_i64 - only use HADDPD for size/fast-hops
We were always generating a single source HADDPD, but really we should only do this if shouldUseHorizontalOp says it's a good idea.

Differential Revision: https://reviews.llvm.org/D69175

llvm-svn: 375341
2019-10-19 11:53:48 +00:00
Matt Arsenault 1aae510893 AMDGPU: Remove optnone from a test
It's not clear why the test had this. I'm unable to break the original
case with the original patch reverted with or without optnone.

This avoids a failure in a future commit.

llvm-svn: 375321
2019-10-19 01:34:59 +00:00
Matt Arsenault d4274f0174 LiveIntervals: Fix handleMoveUp with subreg def moving across a def
If a subregister def was moved across another subregister def and
another use, the main range was not correctly updated. The end point
of the moved interval ended too early and missed the use from the
other lanes in the subreg def.

llvm-svn: 375300
2019-10-18 23:24:25 +00:00
Stanislav Mekhanoshin 0fab220eb5 [AMDGPU] move PHI nodes to AGPR class
If all uses of a PHI are in AGPR register class we should
avoid unneeded copies via VGPRs.

Differential Revision: https://reviews.llvm.org/D69200

llvm-svn: 375297
2019-10-18 22:48:45 +00:00
Wei Mi 8c8ec1f686 [SampleFDO] Add profile remapping support for profile on-demand loading used
by ExtBinary format profile

Profile on-demand loading was added for ExtBinary format profile in rL374233,
but currently profile on-demand loading doesn't work well with profile
remapping. The patch adds the support.

Suppose a function in the current module has outline instance in the profile.
The function name in the module is different from the name of the outline
instance, but remapper knows the two names are equal. When loading profile
on-demand, the outline instance has to be loaded with remapper's help.

At the same time SampleProfileReaderItaniumRemapper is changed from a proxy
of SampleProfileReader to a helper member in SampleProfileReader.

Differential Revision: https://reviews.llvm.org/D68901

llvm-svn: 375295
2019-10-18 22:35:20 +00:00
Jay Foad a9aa4ec6a3 [AMDGPU] Remove -amdgpu-spill-sgpr-to-smem.
Summary: The implementation was never completed and never used except in tests.

Reviewers: arsenm, mareko

Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69163

llvm-svn: 375293
2019-10-18 21:48:22 +00:00
Reid Kleckner 52d765544b [X86] Fix register parsing in .seh_* in Intel syntax
Previously, the parser checked for a '%' prefix to indicate a register.
In Intel syntax mode, LLVM does not print a '%' prefix on registers, so
LLVM could not parse its own assembly output. Instead, require that
register numbers be integer literals, or at least start with an integer
literal, which is consistent with .cfi_* directive register parsing.

llvm-svn: 375287
2019-10-18 21:01:41 +00:00
Roman Lebedev 7c4fa28e5c [NFC][CVP] Some tests for `mul` no-wrap deduction
llvm-svn: 375285
2019-10-18 20:36:19 +00:00
Thomas Lively 393d0f799f [WebAssembly] Allow multivalue signatures in object files
Summary:
Also changes the wasm YAML format to reflect the possibility of having
multiple return types and to put the returns after the params for
consistency with the binary encoding.

Reviewers: aheejin, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, arphaman, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69156

llvm-svn: 375283
2019-10-18 20:27:30 +00:00
Roman Lebedev 284b6d7f4d [CVP] After proving that @llvm.with.overflow()/@llvm.sat() don't overflow, also try to prove other no-wrap
Summary:
CVP, unlike InstCombine, does not run until exhaustion.
It only does a single pass.

When dealing with those special binops, if we prove that they can
safely be demoted into their usual binop form,
we do set the no-wrap we deduced. But when dealing with usual binops,
we try to deduce both no-wraps.

So if we convert e.g. @llvm.uadd.with.overflow() to `add nuw`,
we won't attempt to check whether it can be `add nuw nsw`.

This patch proposes to call `processBinOp()` on the newly-created binop,
which is identical to what we already do for div/rem.

Reviewers: nikic, spatel, reames

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69183

llvm-svn: 375273
2019-10-18 19:32:47 +00:00
Matt Arsenault f9a42ed0a7 AMDGPU: Relax 32-bit SGPR register class
Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This
will allow the register coalescer to do a better job eliminating
copies to m0.

For GlobalISel, as a terrible hack, use SGPR_32 for things that should
use SCC until booleans are solved.

llvm-svn: 375267
2019-10-18 18:26:37 +00:00
Austin Kerbow 2f41a023af AMDGPU: Fix SMEM WAR hazard for gfx10 readlane
Summary: Hazard recognizer fails to see hazard with V_READLANE_B32_gfx10.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69172

llvm-svn: 375265
2019-10-18 18:20:30 +00:00
Roman Lebedev d532f12c82 [NFC][CVP] Add @llvm.*.sat tests where we could prove both no-overflows
llvm-svn: 375260
2019-10-18 17:18:12 +00:00
Joseph Tremoulet a50272f826 Update MinidumpYAML to use minidump::Exception for exception stream
Reviewers: labath, jhenderson, clayborg, MaskRay, grimar

Reviewed By: grimar

Subscribers: lldb-commits, grimar, MaskRay, hiraditya, llvm-commits

Tags: #llvm, #lldb

Differential Revision: https://reviews.llvm.org/D68657

llvm-svn: 375242
2019-10-18 14:56:19 +00:00
Dmitry Preobrazhensky 6c7d7eebda [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32
See https://bugs.llvm.org/show_bug.cgi?id=43608

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69096

llvm-svn: 375241
2019-10-18 14:49:53 +00:00
Eugene Leviant be78734371 One more attempt to fix PS4 buildbot after r375219
PS4 buildbot seems to be dropping variable names for some reason

llvm-svn: 375237
2019-10-18 14:11:19 +00:00
Eugene Leviant 7e8f79cdc1 Attempt to fix PS4 buildbot after r375219
llvm-svn: 375235
2019-10-18 13:52:51 +00:00
Nemanja Ivanovic dd7021d466 Revert r375152 as it is causing failures on EXPENSIVE_CHECKS bot
llvm-svn: 375233
2019-10-18 13:38:46 +00:00
Dmitry Preobrazhensky 7d325fe57b [AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwa
See https://bugs.llvm.org/show_bug.cgi?id=43607

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69095

llvm-svn: 375231
2019-10-18 13:31:53 +00:00
Victor Campos ffcd7698ae [AArch64] Adding support for PMMIR_EL1 register
Summary:
The PMMIR_EL1 register is present in Armv8.4 with PMU extension.
This patch adds support for it.

Reviewers: t.p.northover, dnsampaio

Reviewed By: dnsampaio

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68940

llvm-svn: 375228
2019-10-18 12:40:29 +00:00
Graham Hunter 84da2596f9 [AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.

Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.

At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.

I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.

Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy

Reviewed By: jmolloy

Differential Revision: https://reviews.llvm.org/D47775

llvm-svn: 375222
2019-10-18 11:48:35 +00:00
Eugene Leviant eb34c3e8a4 [ThinLTOCodeGenerator] Add support for index-based WPD
Differential revision: https://reviews.llvm.org/D68950

llvm-svn: 375219
2019-10-18 10:54:14 +00:00
David Green 651f07908a [AArch64] Don't combine callee-save and local stack adjustment when optimizing for size
For arm64, D18619 introduced the ability to combine bumping the stack pointer
upfront in case it needs to be bumped for both the callee-save area as well as
the local stack area.

That diff already remarks that "This change can cause an increase in
instructions", but argues that even when that happens, it should still be a
performance benefit because the number of micro-ops is reduced.

We have observed that this code-size increase can be significant in practice.
This diff disables combining stack bumping for methods that are marked as
optimize-for-size.

Example of a prologue with the behavior before this diff (combining stack bumping when possible):
  sub        sp, sp, #0x40
  stp        d9, d8, [sp, #0x10]
  stp        x20, x19, [sp, #0x20]
  stp        x29, x30, [sp, #0x30]
  add        x29, sp, #0x30
  [... compute x8 somehow ...]
  stp        x0, x8, [sp]

And after this diff, if the method is marked as optimize-for-size:
  stp        d9, d8, [sp, #-0x30]!
  stp        x20, x19, [sp, #0x10]
  stp        x29, x30, [sp, #0x20]
  add        x29, sp, #0x20
  [... compute x8 somehow ...]
  stp        x0, x8, [sp, #-0x10]!

Note that without combining the stack bump there are two auto-decrements,
nicely folded into the stp instructions, whereas otherwise there is a single
sub sp, ... instruction, but not folded.

Patch by Nikolai Tillmann!

Differential Revision: https://reviews.llvm.org/D68530

llvm-svn: 375217
2019-10-18 10:35:46 +00:00
Simon Pilgrim ef04598e14 [X86] Regenerate memcmp tests and add X64-AVX512 common prefix
Should help make the changes in D69157 clearer

llvm-svn: 375215
2019-10-18 09:59:51 +00:00
David Green e6f313b380 [Codegen] Alter the default promotion for saturating adds and subs
The default promotion for the add_sat/sub_sat nodes currently does:
    ANY_EXTEND iN to iM
    SHL by M-N
    [US][ADD|SUB]SAT
    L/ASHR by M-N

If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
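
(Illustrative scalar C++ sketch of the "do the math wide, then clamp" shape
described above; the actual change operates on DAG nodes during type
legalization, and `sadd_sat_i8` is a hypothetical helper.)

  #include <algorithm>
  #include <cassert>
  #include <cstdint>

  // Saturating i8 add done manually in a wider type: add in i32 (which cannot
  // overflow for i8 inputs), then clamp to the i8 saturating bounds.
  static int8_t sadd_sat_i8(int8_t a, int8_t b) {
    int32_t wide = (int32_t)a + (int32_t)b;
    wide = std::min(wide, (int32_t)INT8_MAX);
    wide = std::max(wide, (int32_t)INT8_MIN);
    return (int8_t)wide;
  }

  int main() {
    assert(sadd_sat_i8(100, 100) == 127);    // saturates high
    assert(sadd_sat_i8(-100, -100) == -128); // saturates low
    assert(sadd_sat_i8(5, -3) == 2);         // normal case unchanged
    return 0;
  }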

Differential Revision: https://reviews.llvm.org/D68926

llvm-svn: 375211
2019-10-18 09:47:48 +00:00
Kerry McLaughlin 0c7cc383e5 [AArch64][SVE] Implement unpack intrinsics
Summary:
Implements the following intrinsics:
  - int_aarch64_sve_sunpkhi
  - int_aarch64_sve_sunpklo
  - int_aarch64_sve_uunpkhi
  - int_aarch64_sve_uunpklo

This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.

This patch includes tests for the Subdivide2Argument type added by D67549

Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka

Reviewed By: greened

Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D67550

llvm-svn: 375210
2019-10-18 09:40:16 +00:00
Bjorn Pettersson 6456252dbf [InstCombine] Fix miscompile bug in canEvaluateShuffled
Summary:
Add restrictions in canEvaluateShuffled to prevent us from, for example,
transforming

  %0 = insertelement <2 x i16> undef, i16 %a, i32 0
  %1 = srem <2 x i16> %0, <i16 2, i16 1>
  %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0>

into

   %1 = insertelement <2 x i16> undef, i16 %a, i32 1
   %2 = srem <2 x i16> %1, <i16 undef, i16 2>

as having an undef denominator makes the srem undefined (for all
vector elements).

Fixes: https://bugs.llvm.org/show_bug.cgi?id=43689

Reviewers: spatel, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69038

llvm-svn: 375208
2019-10-18 07:42:02 +00:00
Bjorn Pettersson 459134064d [InstCombine] Pre-commit of test case showing miscompile bug in canEvaluateShuffled
Adding the reproducer from https://bugs.llvm.org/show_bug.cgi?id=43689,
showing that instcombine is doing a bad transform. It transforms

  %0 = insertelement <2 x i16> undef, i16 %a, i32 0
  %1 = srem <2 x i16> %0, <i16 2, i16 1>
  %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0>

into

   %1 = insertelement <2 x i16> undef, i16 %a, i32 1
   %2 = srem <2 x i16> %1, <i16 undef, i16 2>

The undef denominator makes the whole srem undefined.

llvm-svn: 375207
2019-10-18 07:41:53 +00:00
David Zarzycki 7b9fd37fa1 [X86] Emit KTEST when possible
https://reviews.llvm.org/D69111

llvm-svn: 375197
2019-10-18 03:45:52 +00:00
Philip Reames 3266eac714 [Test] Precommit test for D69006
llvm-svn: 375190
2019-10-17 23:32:35 +00:00
Jordan Rupprecht 98a2ae7dad Reland [llvm-objdump] Use a counter for llvm-objdump -h instead of the section index.
This relands r374931 (reverted in r375088). It fixes 32-bit builds by using the right format string specifier for uint64_t (PRIu64) instead of `%d`.

Original description:

When listing the index in `llvm-objdump -h`, use a zero-based counter instead of the actual section index (e.g. shdr->sh_index for ELF).

While this is effectively a noop for now (except one unit test for XCOFF), the index values will change in a future patch that filters certain sections out (e.g. symbol tables). See D68669 for more context. Note: the test case in `test/tools/llvm-objdump/X86/section-index.s` already covers the case of incrementing the section index counter when sections are skipped.

Reviewers: grimar, jhenderson, espindola

Reviewed By: grimar

Subscribers: emaste, sbc100, arichardson, aheejin, arphaman, seiya, llvm-commits, MaskRay

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68848

llvm-svn: 375178
2019-10-17 21:55:43 +00:00
Jordan Rupprecht edeebad771 [llvm-objcopy] Add support for shell wildcards
Summary: GNU objcopy accepts the --wildcard flag to allow wildcard matching on symbol-related flags. (Note: it's implicitly true for section flags).

The basic syntax allows *, ?, \, and [], which work similarly to how they work in a shell. Additionally, starting a wildcard with ! negates it, so names it matches are excluded from the given flag.

Use an updated GlobPattern in libSupport to handle these patterns. It does not fully match the `fnmatch` used by GNU objcopy since named character classes (e.g. `[[:digit:]]`) are not supported, but this should support most existing use cases (mostly just `*` is what's used anyway).
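
(Sketch only, assuming the llvm::GlobPattern interface (create/match) in
llvm/Support/GlobPattern.h; the llvm-objcopy plumbing around negation and
flag handling is not shown.)

  #include "llvm/Support/Error.h"
  #include "llvm/Support/GlobPattern.h"
  #include <cstdio>

  int main() {
    // Build a glob for any mangled name containing "foo".
    llvm::Expected<llvm::GlobPattern> Pat = llvm::GlobPattern::create("_Z*foo*");
    if (!Pat) {
      llvm::consumeError(Pat.takeError());
      return 1;
    }
    std::printf("%d\n", Pat->match("_Z3fooi") ? 1 : 0); // prints 1
    return 0;
  }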

Reviewers: jhenderson, MaskRay, evgeny777, espindola, alexshap

Reviewed By: MaskRay

Subscribers: nickdesaulniers, emaste, arichardson, hiraditya, jakehehrlich, abrachet, seiya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66613

llvm-svn: 375169
2019-10-17 20:51:00 +00:00
Sanjay Patel e3905dee00 [x86] add test for setcc to shift transform; NFC
llvm-svn: 375158
2019-10-17 19:32:24 +00:00
Nemanja Ivanovic 8a3d7c9cbd [PowerPC] Turn on CR-Logical reducer pass
Quite a while ago, we implemented a pass that will reduce the number of
CR-logical operations we emit. It does so by converting a CR-logical operation
into a branch. We have kept this off by default because it seemed to cause a
significant regression with one benchmark.
However, that regression turned out to be due to a completely unrelated
reason - AADB introducing a self-copy that is a priority-setting nop and it was
just exacerbated by this pass.

Now that we understand the reason for the only degradation, we can turn this
pass on by default. We have long since fixed the cause for the degradation.

Differential revision: https://reviews.llvm.org/D52431

llvm-svn: 375152
2019-10-17 18:24:28 +00:00
Sanjay Patel 990c43380b [PowerPC] add tests for popcount with zext; NFC
llvm-svn: 375142
2019-10-17 17:44:04 +00:00
Reid Kleckner fc69ad0988 [codeview] Workaround for PR43479, don't re-emit instr labels
Summary:
In the long run we should come up with another mechanism for marking
call instructions as heap allocation sites, and remove this workaround.
For now, we've had two bug reports about this, so let's apply this
workaround. SLH (the other client of instruction labels) probably has
the same bug, but the solution there is more likely to be to mark the
call instruction as not duplicatable, which doesn't work for debug info.

Reviewers: akhuang

Subscribers: aprantl, hiraditya, aganea, chandlerc, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69068

llvm-svn: 375137
2019-10-17 17:28:31 +00:00
Roman Lebedev 08de59bed5 [NFC][InstCombine] Tests for "fold variable mask before variable shift-of-trunc" (PR42563)
https://bugs.llvm.org/show_bug.cgi?id=42563

llvm-svn: 375135
2019-10-17 17:20:12 +00:00
Xiangling Liao ffe2ec5170 [AIX] TOC pseudo expansion for 64bit large + 64bit small + 32bit large models
This patch provides support for pseudo ops including ADDIStocHA8, ADDIStocHA, LWZtocL,
LDtoc, and LDtocL for AIX, lowering them from MIR to assembly.

Differential Revision: https://reviews.llvm.org/D68341

llvm-svn: 375113
2019-10-17 13:20:25 +00:00
Daniil Fukalov 3972057511 [AMDGPU] Improve code size cost model
Summary:
Added estimation for zero size insertelement, extractelement
and llvm.fabs operators.
Updated the default values of the inline/unroll parameters.

Reviewers: rampitec, arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68881

llvm-svn: 375109
2019-10-17 12:15:35 +00:00
Sam Parker 8e6a638c74 [ARM][MVE] Enable truncating masked stores
Allow us to generate truncating masked stores which take v4i32 and
v8i16 vectors and can store to v4i8, v4i16 and v8i8 memory.
Removed support for unaligned masked stores.

Differential Revision: https://reviews.llvm.org/D68461

llvm-svn: 375108
2019-10-17 12:11:18 +00:00
Fangrui Song a69cc92cb5 [llvm-ar] Implement the O modifier: display member offsets inside the archive
Since GNU ar 2.31, the 't' operation prints member offsets beside file
names if the 'O' modifier is specified. 'O' is ignored for thin
archives.

Reviewed By: gbreynoo, ruiu

Differential Revision: https://reviews.llvm.org/D69087

llvm-svn: 375106
2019-10-17 11:34:29 +00:00
Fangrui Song 9dce25a9fa [llvm-objcopy] --add-symbol: fix crash if SHT_SYMTAB does not exist
Exposed by D69041. If SHT_SYMTAB does not exist, ELFObjcopy.cpp:handleArgs will crash due
to a null pointer dereference.

  for (const NewSymbolInfo &SI : Config.ELF->SymbolsToAdd) {
    ...
    Obj.SymbolTable->addSymbol(

Fix this by creating .symtab and .strtab on demand in ELFBuilder<ELFT>::readSections,
if --add-symbol is specified.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D69093

llvm-svn: 375105
2019-10-17 11:21:54 +00:00
Roman Lebedev fda3243fdd [LoopIdiom] BCmp: check, not assert that loop exits exit out of the loop (PR43687)
We can't normally stumble into that assertion because a tautological
*conditional* `br` in the loop body is required, one that always
branches to the loop latch. But that should always have been folded
to an unconditional branch before we get it.
However, that is not guaranteed if the pass is run standalone.
So let's just promote the assertion into a proper check.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43687

llvm-svn: 375100
2019-10-17 11:01:29 +00:00
George Rimar 9b8e5316f2 [llvm-readobj] - Refine the LLVM-style output to be consistent.
Our LLVM-style output was inconsistent.
This patch changes the output in the following way:

SHT_GNU_verdef { -> VersionDefinitions [
SHT_GNU_verneed { -> VersionRequirements [
Version symbols [ -> VersionSymbols [
EH_FRAME Header [ -> EHFrameHeader {

Differential revision: https://reviews.llvm.org/D68636

llvm-svn: 375095
2019-10-17 10:23:48 +00:00
Oliver Stannard 3b598b9c86 Reland: Dead Virtual Function Elimination
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.

Original commit message:

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
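
(Illustrative C++ only, not from the patch: the shape of code this targets.
`unused` is kept alive today only because the vtable, referenced from the
constructor, names it.)

  struct Base {
    virtual int used() { return 1; }
    virtual int unused() { return 2; } // never called anywhere in the program
    virtual ~Base() {}
  };

  struct Derived : Base {
    int used() override { return 3; }
    int unused() override { return 4; }
  };

  int callIt(Base *B) { return B->used(); } // the only virtual call site

  int main() {
    Derived D;
    return callIt(&D);
  }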

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to always be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 375094
2019-10-17 09:58:57 +00:00
Mikhail Maltsev b6534b2a26 [Analysis] Don't assume that unsigned overflow can't happen in EmitGEPOffset (PR42699)
Summary:
Currently when computing a GEP offset using the function EmitGEPOffset
for the following instruction

  getelementptr inbounds i32, i32* %p, i64 %offs

we get

  mul nuw i64 %offs, 4

Unfortunately we cannot assume that unsigned wrapping won't happen
here because %offs is allowed to be negative.

Making such assumptions can lead to miscompilations: see the new test
test24_neg_offs in InstCombine/icmp.ll. Without the patch InstCombine
would generate the following comparison:

   icmp eq i64 %offs, 4611686018427387902; 0x3ffffffffffffffe

Whereas the correct value to compare with is -2.

This patch replaces the NUW flag with NSW in the multiplication
instructions generated by EmitGEPOffset and adjusts the test suite.

https://bugs.llvm.org/show_bug.cgi?id=42699

Reviewers: chandlerc, craig.topper, ostannard, lebedev.ri, spatel, efriedma, nlopes, aqjune

Reviewed By: lebedev.ri

Subscribers: reames, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68342

llvm-svn: 375089
2019-10-17 08:59:06 +00:00
Hans Wennborg 312c4a6e24 Revert r374931 "[llvm-objdump] Use a counter for llvm-objdump -h instead of the section index."
This broke llvm-objdump in 32-bit builds, see e.g.
http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10925

> Summary:
> When listing the index in `llvm-objdump -h`, use a zero-based counter instead of the actual section index (e.g. shdr->sh_index for ELF).
>
> While this is effectively a noop for now (except one unit test for XCOFF), the index values will change in a future patch that filters certain sections out (e.g. symbol tables). See D68669 for more context. Note: the test case in `test/tools/llvm-objdump/X86/section-index.s` already covers the case of incrementing the section index counter when sections are skipped.
>
> Reviewers: grimar, jhenderson, espindola
>
> Reviewed By: grimar
>
> Subscribers: emaste, sbc100, arichardson, aheejin, arphaman, seiya, llvm-commits, MaskRay
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D68848

llvm-svn: 375088
2019-10-17 08:52:29 +00:00
Sam Parker 3ff961cabd [ARM][MVE] Change VPST to use, not def, VPR
Unlike VPT, VPST just uses the current value of VPR.P0.

Differential Revision: https://reviews.llvm.org/D69037

llvm-svn: 375087
2019-10-17 08:46:31 +00:00
James Molloy 12092a9691 [DFAPacketizer] Use DFAEmitter. NFC.
Summary:
This is a NFC change that removes the NFA->DFA construction and emission logic from DFAPacketizerEmitter and instead uses the generic DFAEmitter logic. This allows DFAPacketizer to use the Automaton class from Support and remove a bunch of logic there too.

After this patch, DFAPacketizer is mostly logic for grepping Itineraries and collecting functional units, with no state machine logic. This will allow us to modernize by removing the 16-functional-unit limit and supporting non-itinerary functional units. This is all for followup patches.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68992

llvm-svn: 375086
2019-10-17 08:34:29 +00:00
Sam Parker 39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
Eugene Leviant 943afb57aa [ThinLTO] Import virtual method with single implementation in hybrid mode
Differential revision: https://reviews.llvm.org/D68782

llvm-svn: 375083
2019-10-17 07:46:18 +00:00
Daniel Sanders 329e748c8c [gicombiner] Add the run-time rule disable option
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.

The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.

Depends on D68426

Reviewers: volkan, bogner

Reviewed By: volkan

Subscribers: hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68438

llvm-svn: 375067
2019-10-17 00:37:04 +00:00
Quentin Colombet c319afc903 [GISel][CombinerHelper] Add concat_vectors(build_vector, build_vector) => build_vector
Teach the combiner helper how to flatten concat_vectors of build_vectors
into a build_vector.

Add this combine as part of AArch64 pre-legalizer combiner.

Differential Revision: https://reviews.llvm.org/D69071

llvm-svn: 375066
2019-10-17 00:34:32 +00:00
Jonas Devlieghere 8cdc842c51 [dsymutil] Print warning/error for unknown/missing arguments.
After changing dsymutil to use libOption, we lost error reporting for
missing required arguments (input files). Additionally, we stopped
complaining about unknown arguments. This patch fixes both and adds a
test.

llvm-svn: 375044
2019-10-16 21:48:41 +00:00
Shoaib Meenai 6d1891c508 [AArch64] Fix offset calculation
r374772 changed Offset to be an int64_t but left NewOffset as an int.
Scale is unsigned, so in the calculation `Offset - NewOffset * Scale`,
`NewOffset * Scale` was promoted to unsigned and was then zero-extended
to 64 bits, leading to an incorrect computation which manifested as an
out-of-memory when building the Swift standard library for Android
aarch64. Promote NewOffset to int64_t to fix this, and promote
EmittableOffset as well, since its one user passes it to a function
which takes an int64_t anyway.
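
(Illustration of the promotion pitfall in plain C++, with hypothetical values
chosen only to show the effect; the fix is simply to widen before multiplying.)

  #include <cstdint>
  #include <cstdio>

  int main() {
    int64_t Offset = 0;
    int NewOffset = -1;  // hypothetical values
    unsigned Scale = 8;

    // NewOffset * Scale is computed in unsigned 32-bit arithmetic and then
    // zero-extended to 64 bits, so the negative product becomes huge.
    int64_t Buggy = Offset - NewOffset * Scale;
    // Widening NewOffset first keeps the product signed, as in the fix.
    int64_t Fixed = Offset - (int64_t)NewOffset * Scale;

    std::printf("%lld vs %lld\n", (long long)Buggy, (long long)Fixed);
    // prints: -4294967288 vs 8
    return 0;
  }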

Test case based on a suggestion by Sander de Smalen!

Differential Revision: https://reviews.llvm.org/D69018

llvm-svn: 375043
2019-10-16 21:41:05 +00:00
Matt Arsenault 34ed76e180 GlobalISel: Implement lower for G_SADDO/G_SSUBO
Port directly from SelectionDAG, minus the path using
ISD::SADDSAT/ISD::SSUBSAT.

llvm-svn: 375042
2019-10-16 20:46:32 +00:00
Martin Storsjo a4f6b59846 [Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC.
This allows making a couple llvm-symbolizer tests run in all
environments.

Differential Revision: https://reviews.llvm.org/D68133

llvm-svn: 375041
2019-10-16 20:38:44 +00:00
Philip Reames ac77947315 Remove a stale comment, noted in post commit review for rL375038
llvm-svn: 375040
2019-10-16 20:27:10 +00:00
Philip Reames d4346584fa [IndVars] Fix a miscompile in off-by-default loop predication implementation
The problem is that we can have two loop exits, 'a' and 'b', where 'a' and 'b' would exit at the same iteration, 'a' precedes 'b' along some path, and 'b' is predicated while 'a' is not. In this case (see the previously submitted test case), we end up causing the loop to exit through 'b' whereas it should have exited through 'a'.

This only applies to loop exits where the exit counts are not provably unequal, but that isn't as much of a restriction as it appears. If we could order the exit counts, we'd have already removed one of the two exits. In theory, we might be able to prove inequality w/o ordering, but I didn't really explore that piece. Instead, I went for the obvious restriction and ensured we didn't predicate exits following non-predicatable exits.

Credit goes to Evgeny Brevnov for figuring out the problematic case. Fuzzing probably also found it (failures seen), but due to some silly infrastructure problems I hadn't gotten to the results before Evgeny hand reduced it from a benchmark (he manually enabled the transform). Once this is fixed, I'll try to filter through the fuzzer failures to see if there's anything additional lurking.

Differential Revision https://reviews.llvm.org/D68956

llvm-svn: 375038
2019-10-16 19:58:26 +00:00
Stanislav Mekhanoshin edcd5815ce [AMDGPU] Do not combine dpp mov reading physregs
We cannot be sure physregs will stay unchanged.

Differential Revision: https://reviews.llvm.org/D69065

llvm-svn: 375033
2019-10-16 19:28:25 +00:00
Stanislav Mekhanoshin 3d99310c15 [AMDGPU] Do not combine dpp with physreg def
We will remove dpp mov along with the physreg def otherwise.

Differential Revision: https://reviews.llvm.org/D69063

llvm-svn: 375030
2019-10-16 18:48:54 +00:00
Jordan Rupprecht a86bd22515 [llvm-ar] Implement the V modifier as an alias for --version
Summary: Also update the help modifier (h) so that it works as a modifier and not just as a standalone `h`. For example, `llvm-ar h` prints the help message, but `llvm-ar xh` currently prints `unknown option h`.

Reviewers: MaskRay, gbreynoo

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69007

llvm-svn: 375028
2019-10-16 18:39:52 +00:00
Sanjay Patel 8cc6d42e8d [SLP] avoid reduction transform on patterns that the backend can load-combine (2nd try)
The 1st attempt at this modified the cost model in a bad way to avoid the vectorization,
but that caused problems for other users (the loop vectorizer) of the cost model.

I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognize patterns that could be combined later, but not perform the optimization itself
(it's not a vector combine anyway, so it's probably out of scope for SLP).

Here, we add a cost-independent bailout with a conservative pattern match for a
multi-instruction sequence that can probably be reduced later.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 375025
2019-10-16 18:06:24 +00:00
Joel E. Denny 6ce2d81032 [lit] Fix a test case that r374652 missed
llvm-svn: 375023
2019-10-16 17:56:12 +00:00
Joel E. Denny 2622419c78 [lit] Fix internal diff's --strip-trailing-cr and use it
Using GNU diff, `--strip-trailing-cr` removes a `\r` appearing before
a `\n` at the end of a line.  Without this patch, lit's internal diff
only removes `\r` if it appears as the last character.  That seems
useless.  This patch fixes that.
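
(Illustrative sketch of the new behaviour, assuming lines keep their trailing
newline; `stripTrailingCR` is a hypothetical helper in C++, not lit's actual
Python implementation.)

  #include <cassert>
  #include <string>

  // Drop a '\r' that appears immediately before the final '\n' of a line,
  // which is what --strip-trailing-cr is meant to do.
  static std::string stripTrailingCR(std::string Line) {
    if (Line.size() >= 2 && Line[Line.size() - 2] == '\r' && Line.back() == '\n')
      Line.erase(Line.size() - 2, 1);
    return Line;
  }

  int main() {
    assert(stripTrailingCR("foo\r\n") == "foo\n");
    assert(stripTrailingCR("foo\n") == "foo\n");
    return 0;
  }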

This patch also adds `--strip-trailing-cr` to some tests that fail on
Windows bots when D68664 is applied.  Based on what I see in the bot
logs, I think the following is happening.  In each test there, lit
diff is comparing a file with `\r\n` line endings to a file with `\n`
line endings.  Without D68664, lit diff reads those files in text
mode, which in Windows causes `\r\n` to be replaced with `\n`.
However, with D68664, lit diff reads the files in binary mode instead
and thus reports that every line is different, just as GNU diff does
(at least under Ubuntu).  Adding `--strip-trailing-cr` to those tests
restores the previous behavior while permitting the behavior of lit
diff to be more like GNU diff.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D68839

llvm-svn: 375020
2019-10-16 17:21:57 +00:00
Adrian Prantl a9cfde1f6a [DWARF5] Added support for DW_AT_noreturn attribute to be emitted for
C++ class member functions.

Patch by Sourabh Singh Tomar!

Differential Revision: https://reviews.llvm.org/D68697

llvm-svn: 375012
2019-10-16 16:30:38 +00:00
Mark Murray b6dd128621 [AArch64,Assembler] Compiler support for ID_MMFR5_EL1
Summary: Add read-only system register ID_MMFR5_EL1 and unit tests.

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69039

llvm-svn: 375010
2019-10-16 15:59:06 +00:00
David Green fe2d15b39b [Codegen] Adjust saturation test. NFC.
Add some extra sat tests and adjust some of the existing tests to use signext where it would naturally be.

llvm-svn: 375009
2019-10-16 15:50:42 +00:00
Jiong Wang ec51851026 bpf: fix wrong truncation elimination when there is back-edge/loop
Currently, the BPF backend is doing truncation elimination. If a truncation
is performed on a value defined by narrow loads, then it could be redundant
given that BPF loads zero-extend the destination register implicitly.

When the definition of the truncated value is a merging value (PHI node)
that could come from different code paths, then checks need to be done on
all possible code paths.

The above-described optimization was introduced in r306685; however, it doesn't
work when there is a back-edge, for example when a loop is used inside BPF
code.

For example for the following code, a zero-extended value should be stored
into b[i], but the "and reg, 0xffff" is wrongly eliminated which then
generates corrupted data.

void cal1(unsigned short *a, unsigned long *b, unsigned int k)
{
  unsigned short e;

  e = *a;
  for (unsigned int i = 0; i < k; i++) {
    b[i] = e;
    e = ~e;
  }
}

The reason is that r306685 was trying to do the PHI node checks inside the isel
DAG2DAG phase, and the checks are done on MachineInstr. This is actually
wrong, because MachineInstr is still being built during the isel phase and the
associated information is not complete yet. A quick search shows that no
target other than BPF accesses MachineInstr info during the isel phase.

For a PHI node, when you reach it during the isel phase, it may have all
predecessors linked, but not successors. It seems successors are linked to
the PHI node only when doing SelectionDAGISel::FinishBasicBlock, and this
happens later than the PreprocessISelDAG hook.

Previously, BPF programs didn't allow loops, which is probably why this bug
was not exposed.

This patch therefore fixes the bug by the following approach:
 - The existing truncation elimination code and the associated
   "load_to_vreg_" records are removed.
 - Instead, implement truncation elimination using MachineSSA pass, this
   is where all information are built, and keep the pass together with other
   similar peephole optimizations inside BPFMIPeephole.cpp. Redundant move
   elimination logic is updated accordingly.
 - Unit testcase included + no compilation errors for kernel BPF selftest.

Patch Review
===
Patch was sent to and reviewed by BPF community at:

  https://lore.kernel.org/bpf

Reported-by: David Beckett <david.beckett@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 375007
2019-10-16 15:27:59 +00:00
Luis Marques 1893f9a458 [RISCV] Add MachineInstr immediate verification
Summary:
This patch implements the `TargetInstrInfo::verifyInstruction` hook for RISC-V. Currently the hook verifies the machine instruction's immediate operands, to check if the immediates are within the expected bounds. Without the hook invalid immediates are not detected except when doing assembly parsing, so they are silently emitted (including being truncated when emitting object code).

The bounds information is specified in tablegen by using the `OperandType` definition, which sets the `MCOperandInfo`'s `OperandType` field. Several RISC-V-specific immediate operand types were created, which extend the `MCInstrDesc`'s `OperandType` `enum`.
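
(Illustrative sketch, not the actual hook: the kind of range check performed
for a RISC-V 12-bit signed immediate; `isValidSImm12` is a hypothetical helper.)

  #include <cassert>
  #include <cstdint>

  // A 12-bit signed immediate (e.g. the I-type immediate) must fit in
  // [-2048, 2047]; anything else should be rejected by the verifier.
  static bool isValidSImm12(int64_t Imm) {
    return Imm >= -2048 && Imm <= 2047;
  }

  int main() {
    assert(isValidSImm12(2047));
    assert(!isValidSImm12(2048)); // would previously be silently truncated
    assert(isValidSImm12(-2048));
    assert(!isValidSImm12(-2049));
    return 0;
  }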

To have the hook called with `llc` pass it the `-verify-machineinstrs` option. For Clang add the cmake build config `-DLLVM_ENABLE_EXPENSIVE_CHECKS=True`, or temporarily patch `TargetPassConfig::addVerifyPass`.

Review concerns:

- The patch adds immediate operand type checks that cover at least the base ISA. There are several other operand types for the C extension and one type for the F/D extensions that were left out of this initial patch because they introduced further design concerns that I felt were best evaluated separately.

- Invalid register classes (e.g. passing a GPR register where a GPRC is expected) are already caught, so were not included.

- This design makes the more abstract `MachineInstr` verification depend on MC layer definitions, which arguably is not the cleanest design, but is in line with how things are done in other parts of the target and LLVM in general.

- There is some duplication of logic already present in the `MCOperandPredicate`s. Since the `MachineInstr` and `MCInstr` notions of immediates are fundamentally different, this is currently necessary.

Reviewers: asb, lenary

Reviewed By: lenary

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67397

llvm-svn: 375006
2019-10-16 15:06:02 +00:00
David Stuttard 2d6a2303f8 [AMDGPU] Fix-up cases where writelane has 2 SGPR operands
Summary:
Even though writelane doesn't have the same constraints as other valu
instructions, it still can't violate the >1 SGPR operand constraint.

Due to later register propagation (e.g. fixing up vgpr operands via
readfirstlane) changing writelane to only have a single SGPR is tricky.

This implementation puts a new check after SIFixSGPRCopies that prevents
multiple SGPRs being used in any writelane instructions.

The algorithm used is to check for trivial copy prop of suitable constants into
one of the SGPR operands and perform that if possible. If this isn't possible
put an explicit copy of Src1 SGPR into M0 and use that instead (this is
allowable for writelane as the constraint is for SGPR read-port and not
constant-bus access).

Reviewers: rampitec, tpr, arsenm, nhaehnle

Reviewed By: rampitec, arsenm, nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D51932

Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea
llvm-svn: 375004
2019-10-16 14:37:39 +00:00
Owen Reynolds 28a3b2aeb4 [llvm-ar] Make paths case insensitive when on windows
When on windows gnu-ar treats member names as case insensitive. This
commit implements the same behaviour.

Differential Revision: https://reviews.llvm.org/D68033

llvm-svn: 375002
2019-10-16 14:07:57 +00:00
Piotr Sobczak 79769a4475 [InstCombine][AMDGPU] Fix crash with v3i16/v3f16 buffer intrinsics
Summary:
This is something of a workaround to avoid a crash later on in type
legalizer (WidenVectorResult()).
Also added some f16 tests, including a non-working v3f16 case with
a FIXME.

Reviewers: arsenm, tpr, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68865

llvm-svn: 374993
2019-10-16 11:14:01 +00:00
Sjoerd Meijer 5a13188966 Revert "[HardwareLoops] Optimisation remarks"
while I investigate the PPC build bot failures.

This reverts commit ad76375156.

llvm-svn: 374992
2019-10-16 10:55:06 +00:00
Mikhail Maltsev 95b5d459a0 [ARM] Add a register class for GPR pairs without SP and use it. NFCI
Summary:
Currently Thumb2InstrInfo.cpp uses a register class which is
auto-generated by tablegen. Such an approach is fragile because
auto-generated classes might change when other register classes are
added. For example, before https://reviews.llvm.org/D62667
we were using GPRPair_with_gsub_1_in_rGPRRegClass, but had to
change it to GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass
because the former class stopped being generated (this did not change
the functionality though).

This patch adds a register class consisting of even-odd GPR register
pairs from (R0, R1) to (R10, R11), which excludes (R12, SP) and uses
it in Thumb2InstrInfo.cpp instead of
GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass.

Reviewers: ostannard, simon_tatham, dmgreen, efriedma

Reviewed By: simon_tatham

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69026

llvm-svn: 374990
2019-10-16 10:40:57 +00:00
Piotr Sobczak 02baaca742 [AMDGPU] Extend the SI Load/Store optimizer
Summary:
Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle
different flavours of image_load and image_sample instructions.

When the instructions of the same subclass differ only in dmask, merge
them and update dmask accordingly.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64911

llvm-svn: 374984
2019-10-16 10:17:02 +00:00
Sam Parker 1c3ca61294 [ARM][ParallelDSP] Change smlad insertion order
Instead of inserting everything after the 'root' of the reduction,
insert all instructions as close to their operands as possible. This
can help reduce register pressure.

Differential Revision: https://reviews.llvm.org/D67392

llvm-svn: 374981
2019-10-16 09:37:03 +00:00
Sjoerd Meijer ad76375156 [HardwareLoops] Optimisation remarks
This adds the initial plumbing to support optimisation remarks in
the IR hardware-loop pass.

I have left a todo in a comment where we can improve the reporting,
and will iterate on that now that we have this initial support in.

Differential Revision: https://reviews.llvm.org/D68579

llvm-svn: 374980
2019-10-16 09:09:55 +00:00
Craig Topper 7b49e8ac35 [LegalizeTypes] Don't call PromoteTargetBoolean from SplitVecOp_VSETCC.
PromoteTargetBoolean calls getSetccResultType to get the return
type. But we were passing it the setcc result type rather than the
setcc input type. This causes an issue on X86 with avx512vl where
the setcc result type for vXf16 vectors is vXi16 while the
result type for vXi16 vectors is vXi1.

There's really no guarantee that getSetccResultType is the type
we need here. So now we just grab the extend type from
getExtendForContent and extend to the original result VT of the
node we're splitting.

llvm-svn: 374970
2019-10-16 02:50:04 +00:00
Jonas Devlieghere 20c692a445 [dsymutil] Support and relocate base address selection entries for debug_loc
Since r374600 clang emits base address selection entries. Currently
dsymutil does not support these entries and incorrectly interprets them
as location list entries.

This patch adds support for base address selection entries in dsymutil
and makes sure they are relocated correctly.

Thanks to Dave for coming up with the test case!

Differential revision: https://reviews.llvm.org/D69005

llvm-svn: 374957
2019-10-15 23:43:37 +00:00
Lang Hames c85d0aaa2a [JITLink] Switch to slab allocation for InProcessMemoryManager, re-enable test.
InProcessMemoryManager used to make separate memory allocation calls for each
permission level (RW, RX, RO), which could lead to target-out-of-range errors
if data and code were placed too far apart (this was the source of failures in
the JITLink/AArch64 testcase when it was first landed).

This patch updates InProcessMemoryManager to allocate a single slab which is
subdivided between text and data. This should guarantee that accesses remain
in-range provided that individual object files do not exceed 1Mb in size.
This patch also re-enables the JITLink/AArch64 testcase.

llvm-svn: 374948
2019-10-15 21:06:57 +00:00
Digger Lin 34d4bff3d6 [XCOFF]implement parsing relocation information for 32-bit xcoff object file
Summary:
    Parsing the relocation entry information for 32-bit XCOFF object files,
including dealing with relocation overflow.

Reviewers: hubert.reinterpretcast, jasonliu, sfertile, xingxue.

Subscribers: hiraditya, rupprecht, seiya

Differential Revision: https://reviews.llvm.org/D67008

llvm-svn: 374946
2019-10-15 20:42:11 +00:00
Austin Kerbow 527e9f9a3f AMDGPU: Fix infinite searches in SIFixSGPRCopies
Summary:
Two conditions could lead to infinite loops when processing PHI nodes in
SIFixSGPRCopies.

The first condition involves a REG_SEQUENCE that uses registers defined by both
a PHI and a COPY.

The second condition arises when a physical register is copied to a virtual
register which is then used in a PHI node. If the same virtual register is
copied to the same physical register, the result is an endless loop.

%0:sgpr_64 = COPY $sgpr0_sgpr1
%2 = PHI %0, %bb.0, %1, %bb.1
$sgpr0_sgpr1 = COPY %0

Reviewers: alex-t, rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68970

llvm-svn: 374944
2019-10-15 19:59:45 +00:00
Digger Lin 1875dcc478 [llvm-readobj][xcoff] implement parsing overflow section header.
SUMMARY:
In XCOFF, if the number of relocation entries or line number entries
overflows (is larger than or equal to 65535), there will be an overflow section for it.
The overflow section is interpreted differently from a generic section header;
this patch implements parsing of the overflow section.

Reviewers: hubert.reinterpretcast,sfertile,jasonliu
Subscribers: rupprecht, seiya

Differential Revision: https://reviews.llvm.org/D68575

llvm-svn: 374941
2019-10-15 19:28:11 +00:00
Dmitry Mikulin f14642f2f1 Added support for "#pragma clang section relro=<name>"
Differential Revision: https://reviews.llvm.org/D68806

llvm-svn: 374934
2019-10-15 18:31:10 +00:00
Thomas Lively 2cb27072ce [WebAssembly] Allow multivalue types in block signature operands
Summary:
Renames `ExprType` to the more apt `BlockType` and adds a variant for
multivalue blocks. Currently non-void blocks are only generated at the
end of functions where the block return type needs to agree with the
function return type, and that remains true for multivalue
blocks. That invariant means that the actual signature does not need
to be stored in the block signature `MachineOperand` because it can be
inferred by `WebAssemblyMCInstLower` from the return type of the
parent function. `WebAssemblyMCInstLower` continues to lower block
signature operands to immediates when possible but lowers multivalue
signatures to function type symbols. The AsmParser and Disassembler
are updated to handle multivalue block types as well.

Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68889

llvm-svn: 374933
2019-10-15 18:28:22 +00:00
Jordan Rupprecht eb501b1fc1 [llvm-objdump] Use a counter for llvm-objdump -h instead of the section index.
Summary:
When listing the index in `llvm-objdump -h`, use a zero-based counter instead of the actual section index (e.g. shdr->sh_index for ELF).

While this is effectively a noop for now (except one unit test for XCOFF), the index values will change in a future patch that filters certain sections out (e.g. symbol tables). See D68669 for more context. Note: the test case in `test/tools/llvm-objdump/X86/section-index.s` already covers the case of incrementing the section index counter when sections are skipped.

Reviewers: grimar, jhenderson, espindola

Reviewed By: grimar

Subscribers: emaste, sbc100, arichardson, aheejin, arphaman, seiya, llvm-commits, MaskRay

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68848

llvm-svn: 374931
2019-10-15 18:13:20 +00:00
Vedant Kumar c7ec51a7c3 [llvm-profdata] Reinstate tools/llvm-profdata/malformed-ptr-to-counter-array.test
I removed this test to unblock the ARM bots while looking into failures
(r374915), and am reinstating it now with a fix.

I believe the problem was that the counter ptr address I used,
'\0\0\6\0\1\0\0\1', set the high bits of the pointer, not the low bits
like I wanted. On x86_64 this superficially looks like it tests r370826,
but it doesn't, as it would have been caught before r370826. However, on
ARM (or, 32-bit hosts more generally), I suspect the high bits were
cleared, and you get a 'valid' profile.

I verified that setting the *low* bits of the pointer does trigger the
new condition:

-// Note: The CounterPtr here is off-by-one. This should trigger a malformed profile error.
-RUN: printf '\0\0\6\0\1\0\0\1' >> %t.profraw
+// Note: The CounterPtr here is off-by-one.
+//
+// Octal '\11' is 9 in decimal: this should push CounterOffset to 1. As there are two counters,
+// the profile reader should error out.
+RUN: printf '\11\0\6\0\1\0\0\0' >> %t.profraw

This reverts commit c7cf5b3e4b918c9769fd760f28485b8d943ed968.

llvm-svn: 374927
2019-10-15 17:53:48 +00:00
Digger Lin fdfd6ab12e [XCOFF] Output object text section header and symbol entry for program code.
This is the remaining part of rG41ca91f2995b: [AIX][XCOFF] Output XCOFF
object text section header and symbol entry for program code.

SUMMARY:
The original form of this patch was provided by Stefan Pintillie.

1. The patch outputs the program code section header, the symbol entry for
 program code (PR), and instructions into the raw text section.
2. The patch handles the alignment and layout of the CSection within the text
 section.
3. The patch also reorganizes the code, moving some of it into a function
 (XCOFFObjectWriter::writeSymbolTableEntryForControlSection).

Additional: We cannot add a test for raw text section data in this patch; outputting
raw text section data needs a function description patch first.

Reviewers: hubert.reinterpretcast, sfertile, jasonliu, xingxue.
Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsjji.

Differential Revision: https://reviews.llvm.org/D66969

llvm-svn: 374923
2019-10-15 17:40:41 +00:00