Commit Graph

48136 Commits

Author SHA1 Message Date
Craig Topper 3dc37cc592 [X86] Add a bunch of -mcpu strings to the cpus.ll test.
We were missing most of the "core" aliases as well as skylake, cannonlake, and knights landing.

llvm-svn: 315606
2017-10-12 18:55:57 +00:00
Artem Belevich 3bafc2f0d9 [NVPTX] Implemented wmma intrinsics and instructions.
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.

Differential Revision: https://reviews.llvm.org/D38645

llvm-svn: 315601
2017-10-12 18:27:55 +00:00
Reid Kleckner 1a7e387849 [codeview] Don't emit FPO data in funclet prologues
Attempt 3 to work around bugs in FPO data with funclets.

llvm-svn: 315600
2017-10-12 18:20:35 +00:00
Justin Bogner 754a1a8a6f llvm-isel-fuzzer: Work around BUILD_SHARED_LIBS testing issues
Building with BUILD_SHARED_LIBS makes it tricky to copy around
executables at will, since they won't be able to find the LLVM
libraries any more. This makes testing a feature that's based on the
executable name problematic, so we'll just disable these two tests in
that configuration.

We could potentially fix this by symlinking the lib directory into the
test directory, but that wouldn't work on windows, and losing testing
on windows would be far worse than losing testing on a configuration
that's barely even supported.

llvm-svn: 315599
2017-10-12 18:10:22 +00:00
Artem Belevich 786ca6a166 [TableGen] Allow intrinsics to have up to 8 return values.
Differential Revision: https://reviews.llvm.org/D38633

llvm-svn: 315598
2017-10-12 17:40:00 +00:00
Sanjay Patel e272be7c9a [ValueTracking] return zero when there's conflict in known bits of a shift (PR34838)
Poison allows us to return a better result than undef.

llvm-svn: 315595
2017-10-12 17:31:46 +00:00
Bruno Cardoso Lopes 326fdcbff8 Reintroduce "[SCCP] Propagate integer range info for parameters in IPSCCP."
This is r315288 & r315294, which were reverted due to stage2 bot
failures.

Summary:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.

For the following function, f() can be optimized to `ret i32 2` with
this change

  source_filename = "sccp.c"
  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"

  ; Function Attrs: norecurse nounwind readnone uwtable
  define i32 @main() local_unnamed_addr #0 {
  entry:
    %call = tail call fastcc i32 @f(i32 1)
    %call1 = tail call fastcc i32 @f(i32 47)
    %add3 = add nsw i32 %call, %call1
    ret i32 %add3
  }

  ; Function Attrs: noinline norecurse nounwind readnone uwtable
  define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
  entry:
    %c1 = icmp sle i32 %x, 100

    %cmp = icmp sgt i32 %x, 300
    %. = select i1 %cmp, i32 1, i32 2
    ret i32 %.
  }

  attributes #1 = { noinline }

Reviewers: davide, sanjoy, efriedma, dberlin

Reviewed By: davide, dberlin

Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D36656

llvm-svn: 315593
2017-10-12 16:54:11 +00:00
Lei Huang 0724fea2da [PowerPC] Add profitablilty check for conversion to mtctr loops
Add profitability checks for modifying counted loops to use the mtctr instruction.

The latency of mtctr is only justified if there are more than 4 comparisons that
will be removed as a result.  Usually counted loops are formed relatively early
and before unrolling, so most low trip count loops often don't survive.  However
we want to ensure that if they do, we do not mistakenly update them to mtctr loops.

Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.

Differential Revision: https://reviews.llvm.org/D38212

llvm-svn: 315592
2017-10-12 16:43:33 +00:00
Tim Renouf c8ffffe462 [AMDGPU] For amdpal, widen interpolation mode workaround
Summary:
The interpolation mode workaround ensures that at least one
interpolation mode is enabled in PSInputAddr. It does not also check
PSInputEna on the basis that the user might enable bits in that
depending on run-time state.

However, for amdpal os type, the user does not enable some bits after
compilation based on run-time states; the register values being
generated here are the final ones set in the hardware. Therefore, apply
the workaround to PSInputAddr and PSInputEnable together. (The case
where a bit is set in PSInputAddr but not in PSInputEnable is where the
frontend set up an input arg for a particular interpolation mode, but
nothing uses that input arg. Really we should have an earlier pass that
removes such an arg.)

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37758

llvm-svn: 315591
2017-10-12 16:16:41 +00:00
Mikael Holmen a079ef68e3 [RegisterCoalescer] Don't set read-undef in pruneValues, only clear
Summary:
The comments in the code said

 // Remove <def,read-undef> flags. This def is now a partial redef.

but the code didn't just remove read-undef, it could introduce new ones which
could cause errors.

E.g. if we have something like

%vreg1<def> = IMPLICIT_DEF
%vreg2:subreg1<def, read-undef> = op %vreg3, %vreg4
%vreg2:subreg2<def> = op %vreg6, %vreg7

and we merge %vreg1 and %vreg2 then we should not set undef on the second subreg
def, which the old code did.

Now we solve this by actually do what the code comment says. We remove
read-undef flags rather than remove or introduce them.

Reviewers: qcolombet, MatzeB

Reviewed By: MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38616

llvm-svn: 315564
2017-10-12 06:21:28 +00:00
Justin Bogner 9ea7fbd1e8 Re-commit "llvm-isel-fuzzer: Handle a subset of backend flags in the exec name"
Here we add a secondary option parser to llvm-isel-fuzzer (and provide
it for use with other fuzzers). With this, you can copy the fuzzer to
a name like llvm-isel-fuzzer=aarch64-gisel for a fuzzer that fuzzer
AArch64 with GlobalISel enabled, or fuzzer=x86_64 to fuzz x86, with no
flags required. This should be useful for running these in OSS-Fuzz.

Note that this handrolls a subset of cl::opts to recognize, rather
than embedding a complete command parser for argv[0]. If we find we
really need the flexibility of handling arbitrary options at some
point we can rethink this.

This re-applies 315545 using "=" instead of ":" as a separator for
arguments.

llvm-svn: 315557
2017-10-12 04:35:32 +00:00
Hans Wennborg 022829d84c Revert r315545 "llvm-isel-fuzzer: Handle a subset of backend flags in the executable name"
It broke some tests on Windows:

Failing Tests (4):
    LLVM :: tools/llvm-isel-fuzzer/execname-options.ll
    LLVM :: tools/llvm-isel-fuzzer/missing-triple.ll
    LLVM :: tools/llvm-isel-fuzzer/x86-empty-bc.ll
    LLVM :: tools/llvm-isel-fuzzer/x86-empty.ll

> llvm-isel-fuzzer: Handle a subset of backend flags in the executable name
>
> Here we add a secondary option parser to llvm-isel-fuzzer (and provide
> it for use with other fuzzers). With this, you can copy the fuzzer to
> a name like llvm-isel-fuzzer:aarch64-gisel for a fuzzer that fuzzer
> AArch64 with GlobalISel enabled, or fuzzer:x86_64 to fuzz x86, with no
> flags required. This should be useful for running these in OSS-Fuzz.
>
> Note that this handrolls a subset of cl::opts to recognize, rather
> than embedding a complete command parser for argv[0]. If we find we
> really need the flexibility of handling arbitrary options at some
> point we can rethink this.

llvm-svn: 315554
2017-10-12 03:32:09 +00:00
Hongbin Zheng d36f2030e2 [SimplifyIndVar] Replace IVUsers with loop invariant whenever possible
Differential Revision: https://reviews.llvm.org/D38415

llvm-svn: 315551
2017-10-12 02:54:11 +00:00
Justin Bogner a5969ce15f llvm-isel-fuzzer: Handle a subset of backend flags in the executable name
Here we add a secondary option parser to llvm-isel-fuzzer (and provide
it for use with other fuzzers). With this, you can copy the fuzzer to
a name like llvm-isel-fuzzer:aarch64-gisel for a fuzzer that fuzzer
AArch64 with GlobalISel enabled, or fuzzer:x86_64 to fuzz x86, with no
flags required. This should be useful for running these in OSS-Fuzz.

Note that this handrolls a subset of cl::opts to recognize, rather
than embedding a complete command parser for argv[0]. If we find we
really need the flexibility of handling arbitrary options at some
point we can rethink this.

llvm-svn: 315545
2017-10-12 01:57:49 +00:00
Wei Mi 1736efd16a Revert r307036 because of PR34919.
llvm-svn: 315540
2017-10-12 00:24:52 +00:00
Konstantin Zhuravlyov c3beb6a075 AMDGPU/NFC: Minor clean ups in PAL metadata
- Move PAL metadata definitions to AMDGPUMetadata
  - Make naming consistent with HSA metadata

Differential Revision: https://reviews.llvm.org/D38745

llvm-svn: 315523
2017-10-11 22:41:09 +00:00
Konstantin Zhuravlyov a63b0f9d20 AMDGPU/NFC: Rename code object metadata as HSA metadata
- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
  - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
  - Introduce HSAMD namespace
  - Other minor name changes in function and test names

llvm-svn: 315522
2017-10-11 22:18:53 +00:00
Reid Kleckner ddf413f3e1 Really fix llvm-rc include-paths.test
llvm-svn: 315515
2017-10-11 21:27:54 +00:00
Reid Kleckner ade90cbd79 Attempt to fix failing llvm-rc include-paths.text
llvm-svn: 315514
2017-10-11 21:25:03 +00:00
Reid Kleckner 9cdd4df81a [codeview] Implement FPO data assembler directives
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.

The directives are:
  .cv_fpo_proc _foo
  .cv_fpo_pushreg ebp/ebx/etc
  .cv_fpo_setframe ebp/esi/etc
  .cv_fpo_stackalloc 200
  .cv_fpo_endprologue
  .cv_fpo_endproc
  .cv_fpo_data _foo

I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.

I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28

Once we have cdb integration in debuginfo-tests, we can add integration
tests there.

Reviewers: majnemer, hans

Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D38776

llvm-svn: 315513
2017-10-11 21:24:33 +00:00
Krzysztof Parzyszek c4a9a8d8e0 [Hexagon] Make sure that new-value jump is packetized with producer
llvm-svn: 315510
2017-10-11 21:20:43 +00:00
Florian Hahn e52abba277 [MachineCombiner] Fix initialisation of LastUpdate for incremental update.
Summary:
Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled.

Patch by Paul Walker.

Reviewers: fhahn, Gerolf, efriedma, MatzeB

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38734

llvm-svn: 315502
2017-10-11 20:25:58 +00:00
Lei Huang 263dc4ef3a [PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary.
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores.  Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.

Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.

This patch address both issues.

Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:

1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
   check for the offset being a multiple of 16 and will convert it to an
   X-Form instruction if it isn't.

Differential Revision : https://reviews.llvm.org/D38758

llvm-svn: 315500
2017-10-11 20:20:58 +00:00
Zachary Turner fa0ca6cbd0 [llvm-rc] Use proper search algorithm for finding resources.
Previously we would only look in the current directory for a
resource, which might not be the same as the directory of the
rc file.  Furthermore, MSVC rc supports a /I option, and can
also look in the system environment.  This patch adds support
for this search algorithm.

Differential Revision: https://reviews.llvm.org/D38740

llvm-svn: 315499
2017-10-11 20:12:09 +00:00
Sanjay Patel 6c0aef77aa [x86] avoid infinite loop from SoftenFloatOperand (PR34866)
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.

Differential Revision: https://reviews.llvm.org/D38771

llvm-svn: 315485
2017-10-11 18:24:21 +00:00
Jake Ehrlich f03384dce7 Reland "[llvm-objcopy] Add support for --strip-sections to remove all section headers leaving only program headers and loadable segment data"
ubsan caught an issue I made where I was converting a null pointer to a
reference.

elf utils implements a particularly extreme form of stripping that I'd
like to support. eu-strip has an option called "strip-sections" that
removes all section headers and leaves only program headers and the
segment data. I have implemented this option partly as a test but mainly
because in Fuchsia we would like to use this option to minimize the size
of our executables. The other strip options that are on my list include
--strip-all and --strip-debug. This is a preliminary implementation that
I'd like to start using in Fuchsia builds if possible. This change
implements such a stripping option for llvm-objcopy

Differential Revision: https://reviews.llvm.org/D38335

llvm-svn: 315484
2017-10-11 18:09:18 +00:00
Lei Huang f9c7f7fed4 [NFC] update test case so checks are not order dependent when not needed
llvm-svn: 315482
2017-10-11 18:04:41 +00:00
Rafael Espindola 1a0e5a1933 Convert an ErrorOr to Expected.
getRelocationAddend should never be called on non SHT_RELA sections,
but changing that requires changing RelocVisitor.h.

llvm-svn: 315473
2017-10-11 16:56:33 +00:00
Krzysztof Parzyszek 8f174dde92 [Pipeliner] Improve serialization order for post-increments
The pipeliner is generating a serial sequence that causes poor
register allocation when a post-increment instruction appears
prior to the use of the post-increment register. This occurs when
there is a circular set of dependences involved with a sequence
of instructions in the same cycle. In this case, there is no
serialization of the parallel semantics that will not cause an
additional register to be allocated.

This patch fixes the problem by changing the instructions so that
the post-increment instruction is used by the subsequent
instruction, which enables the register allocator to make a
better decision and not require another register.

Patch by Brendon Cahoon.

llvm-svn: 315466
2017-10-11 15:51:44 +00:00
Sanjay Patel 8d565a233d [InstCombine] add baseline tests for D38531; NFC
llvm-svn: 315461
2017-10-11 14:29:17 +00:00
Sanjay Patel 34fd5eaaf0 [DAGCombiner] convert insertelement of bitcasted vector into shuffle
Eg:
insert v4i32 V, (v2i16 X), 2 --> shuffle v8i16 V', X', {0,1,2,3,8,9,6,7}

This is a generalization of the IR fold in D38316 to handle insertion into a non-undef vector. 
We may want to abandon that one if we can't find value in squashing the more specific pattern sooner.

We're using the existing legal shuffle target hook to avoid AVX512 horror with vXi1 shuffles.

There may be room for improvement in the shuffle lowering here, but that would be follow-up work.

Differential Revision: https://reviews.llvm.org/D38388

llvm-svn: 315460
2017-10-11 14:12:16 +00:00
Jonas Devlieghere ec053332cf Revert "[dsymutil] Timestmap verification for __swift_ast"
This reverts commit r315456.

llvm-svn: 315458
2017-10-11 13:51:30 +00:00
Jonas Devlieghere 8acb2e3ac4 [dsymutil] Timestmap verification for __swift_ast
This patch adds timestamp verification for swiftmodule files.

 - A new flag is provided to allows us to continue testing of the code
   for embedding the__swift_ast. (git doesn't maintain timestamps)
 - Adds a new test for fat (arm) binaries.

Differential revision: https://reviews.llvm.org/D38686

llvm-svn: 315456
2017-10-11 13:34:52 +00:00
Simon Dardis 442ee63468 [mips] Add missing tests from rL315451
llvm-svn: 315454
2017-10-11 11:45:06 +00:00
Uriel Korach 782f28bf2f [X86] Added tests for TESTM and TESTNM (NFC)
Adding this test files now so after another commit that will add a new pattern for
TESTM and TESTNM instructions will show the improvemnts that have been done.

Change-Id: If3908b7f91897d764053312365a2bc1de78b291d
llvm-svn: 315443
2017-10-11 08:39:25 +00:00
Max Kazantsev 3b81809e06 [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
to hoist across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
  int array[LEN];
  ...
  guard(0 <= index && index < LEN);
  use(array[index]);

Differential Revision: https://reviews.llvm.org/D37460

llvm-svn: 315440
2017-10-11 08:10:43 +00:00
Max Kazantsev 0c8dd052b8 [LICM] Disallow sinking of unordered atomic loads into loops
Sinking of unordered atomic load into loop must be disallowed because it turns
a single load into multiple loads. The relevant section of the documentation
is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for
Optimizers section. Here is the full text of this section:

> Notes for optimizers
> In terms of the optimizer, this **prohibits any transformation that
> transforms a single load into multiple loads**, transforms a store into
> multiple stores, narrows a store, or stores a value which would not be
> stored otherwise. Some examples of unsafe optimizations are narrowing
> an assignment into a bitfield, rematerializing a load, and turning loads
> and stores into a memcpy call. Reordering unordered operations is safe,
> though, and optimizers should take advantage of that because unordered
> operations are common in languages that need them.

Patch by Daniil Suchkov!

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D38392

llvm-svn: 315438
2017-10-11 07:26:45 +00:00
Max Kazantsev 25d8655dc2 [IRCE] Do not process empty safe ranges
IRCE should not apply when the safe iteration range is proved to be empty.
In this case we do unneeded job creating pre/post loops and then never
go to the main loop.

This patch makes IRCE not apply to empty safe ranges, adds test for this
situation and also modifies one of existing tests where it used to happen
slightly.

Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D38577

llvm-svn: 315437
2017-10-11 06:53:07 +00:00
Davide Italiano e2138fe41b [GVN] Don't replace constants with constants.
This fixes PR34908. Patch by Alex Crichton!

Differential Revision:  https://reviews.llvm.org/D38765

llvm-svn: 315429
2017-10-11 04:21:51 +00:00
Jake Ehrlich d9a283463a Revert "[llvm-objcopy] Add support for --strip-sections to remove all section headers leaving only program headers and loadable segment data"
This reverts commit rL315412

llvm-svn: 315417
2017-10-11 02:42:29 +00:00
Jake Ehrlich b5152447ba [llvm-objcopy] Add support for --strip-sections to remove all section headers leaving only program headers and loadable segment data
elf utils implements a particularly extreme form of stripping that I'd
like to support. eu-strip has an option called "strip-sections" that
removes all section headers and leaves only program headers and the
segment data. I have implemented this option partly as a test but mainly
because in Fuchsia we would like to use this option to minimize the size
of our executables. The other strip options that are on my list include
--strip-all and --strip-debug. This is a preliminary implementation that
I'd like to start using in Fuchsia builds if possible. This change
implements such a stripping option for llvm-objcopy

Differential Revision: https://reviews.llvm.org/D38335

llvm-svn: 315412
2017-10-11 01:59:06 +00:00
Craig Topper 6ce20bd184 [X86] Add 128-bit version of vbroadcasti32x2 to shuffle comment decoding.
llvm-svn: 315395
2017-10-11 00:11:53 +00:00
Jake Ehrlich fcc05627d4 [llvm-objcopy] Add ability to remove multiple sections by name
This change adds the ability to use the "-R"/"-remove-section" option
multiple times.

Differential Revision: https://reviews.llvm.org/D38332

llvm-svn: 315385
2017-10-10 23:02:43 +00:00
Craig Topper bb0e316dc7 [X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load.
We already have these patterns for AVX512VL, but not AVX1 or 2.

llvm-svn: 315382
2017-10-10 22:40:31 +00:00
Rafael Espindola 8f1f7b1442 Make the ELFFile constructor private.
With this all clients have to use the new create method which returns
an Expected.

Fixes a crash on invalid input.

llvm-svn: 315376
2017-10-10 22:17:49 +00:00
Rafael Espindola ef421f9c18 Make the ELFObjectFile constructor private.
This forces every user to use the new create method that returns an
Expected. This in turn propagates better error messages.

llvm-svn: 315371
2017-10-10 21:21:16 +00:00
Dehao Chen 3f56a05ae5 Use the first instruction's count to estimate the funciton's entry frequency.
Summary: In the current implementation, we only have accurate profile count for standalone symbols. For inlined functions, we do not have entry count data because it's not available in LBR. In this patch, we use the first instruction's frequency to estimiate the function's entry count, especially for inlined functions. This may be inaccurate due to debug info in optimized code. However, this is a better estimate than the static 80/20 estimation we have in the current implementation.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D38478

llvm-svn: 315369
2017-10-10 21:13:50 +00:00
Sanjay Patel b74063d21f [x86] fix prefix typos for CHECK lines; NFC
llvm-svn: 315368
2017-10-10 21:12:47 +00:00
Simon Dardis b994128d14 [mips] Correct the instruction predicates for microMIPSr3
Rather than using the AdditionalPredicates mechanism to guard
the microMIPS instructions, use the existing predicates to properly
guard those instructions.

This also resolves a case where an instruction pattern was incorrectly
available for microMIPS32R6, which caused a register allocation failure
as the registers specified in the pattern were not available.

Reviewers: nitesh.jain, atanasyan

Differential Revision: https://reviews.llvm.org/D38451

llvm-svn: 315362
2017-10-10 20:52:53 +00:00
Matt Arsenault f42074b699 AMDGPU: Fix missing skipFunction calls
llvm-svn: 315361
2017-10-10 20:48:36 +00:00