Commit Graph

191298 Commits

Author SHA1 Message Date
LLVM GN Syncbot f722284cdf [gn build] Port b8a847c0a3 2020-02-05 01:27:20 +00:00
LLVM GN Syncbot 2406a06e67 [gn build] Port 7531a5039f 2020-02-05 01:27:19 +00:00
Francis Visoiu Mistrih 7531a5039f [Remarks] Extend the RemarkStreamer to support other emitters
This extends the RemarkStreamer to allow for other emitters (e.g.
frontends, SIL, etc.) to emit remarks through a common interface.

See changes in llvm/docs/Remarks.rst for motivation and design choices.

Differential Revision: https://reviews.llvm.org/D73676
2020-02-04 17:16:02 -08:00
Alina Sbirlea 1c03cc5a39 [NFCI] Update according to style.
clang-tidy + clang-format
2020-02-04 17:11:36 -08:00
Reid Kleckner f3bacd0738 Fix some more -Wrange-loop-analysis warnings in AArch64TargetParser 2020-02-04 16:57:49 -08:00
Craig Topper 016f42e3dc [X86] Add custom lowering for lrint/llrint to either cvtss2si/cvtsd2si or fist.
lrint/llrint are defined as rounding using the current rounding
mode. Numbers that can't be converted raise FE_INVALID and an
implementation defined value is returned. They may also write to
errno.

I believe this means we can use cvtss2si/cvtsd2si or fist to
convert as long as -fno-math-errno is passed on the command line.
Clang will leave them as libcalls if errno is enabled so they
won't become ISD::LRINT/LLRINT in SelectionDAG.

For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si
but we can use fist since it can write to a 64-bit memory location.
Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq
targets?

gcc also does this optimization.

I think we might be able to do this with STRICT_LRINT/LLRINT as
well, but I've left that for future work.

Differential Revision: https://reviews.llvm.org/D73859
2020-02-04 16:15:40 -08:00
Reid Kleckner 7982db5dc6 [Support] Fix warnings in ARMTargetParser.cpp 2020-02-04 15:48:22 -08:00
Craig Topper c67773bebe [X86] Give KSET0* and KSET1* pseudos the same scheduler resource usage as KXOR/KXNOR.
These aren't recognized as idioms by the CPU so they still use
execution resources. We just use the pseudo to force the input
register to k0.
2020-02-04 15:22:54 -08:00
Reid Kleckner 2d89e0a098 [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Kiran Chandramohan a969e051a5 [OpenMP] Add Flush directive to OpenMPIRBuilder
Add support for Flush in the OMPIRBuilder. This patch also adds changes
to clang to use the OMPIRBuilder when '-fopenmp-enable-irbuilder'
commandline option is used.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D70712
2020-02-04 22:48:02 +00:00
Matt Arsenault 4f9f5d09de AMDGPU: Fix isAlwaysUniform for simple asm SGPR results
We were handling the case where the result was a struct with an
extracted SGPR component, but not for the simple case.
2020-02-04 13:34:14 -08:00
Simon Pilgrim 83d0db59d6 Fix "expression is redundant [misc-redundant-expression]" warning (PR44768)
Be more specific that getOperandConstraint should return -1 or a uint8_t value
2020-02-04 21:24:21 +00:00
Matt Arsenault 12fe9b26ec AMDGPU/GlobalISel: Select G_SEXT_INREG 2020-02-04 13:23:53 -08:00
Matt Arsenault 0693e827ed AMDGPU/GlobalISel: Do a better job splitting 64-bit G_SEXT_INREG
We don't need to expand to full shifts for the > 32-bit case. This
just switches to a sext_inreg of the high half.
2020-02-04 13:23:53 -08:00
Matt Arsenault 05f2a04ba7 AMDGPU/GlobalISel: Legalize G_SEXT_INREG
Split the VALU 64-bit case in RegBankSelect.
2020-02-04 13:23:53 -08:00
Austin Kerbow 0f116fd9d8 [AMDGPU] Fix infinite loop with fma combines
https://reviews.llvm.org/D72312 introduced an infinite loop which involves
DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine.

fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ...

This only breaks with types where 'isFNegFree' returns flase, e.g. v4f32.
Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math',
and no source mods allowed on one of the users of the Op.

This fix makes changes to indicate that it is not free to negate a fma if it
has users with source mods.

Differential Revision: https://reviews.llvm.org/D73939
2020-02-04 13:11:09 -08:00
Matt Arsenault 9b0ce8edfa AMDGPU/GlobalISel: Remove extension legality hacks
The legalization has improved since this was added, and the tests
relying on this no longer need it.
2020-02-04 12:50:47 -08:00
Craig Topper e195ff98f6 Recommit "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places."
This time with correct types for the data result from the SUB.

Original commit message:

Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable
CSE unless the RHS is 0. optimizeCompareInstr called by the peephole
pass can turn subs with unused results into cmps to clean this up.

This commit makes other places that create X86ISD::CMP have the
same behavior.
2020-02-04 12:19:34 -08:00
Teresa Johnson 7f37a8026f [InlineCost] Add flag to allow changing the default inline cost
Summary:
It can be useful to tune the default inline threshold without overriding other inlining thresholds (e.g. in code compiled for size).

The existing `-inline-threshold` flag overrides other thresholds, so it is insufficient in codebases where there is a mix of code compiled for size and speed.

Patch by Michael Holman <michael.holman@microsoft.com>

Reviewers: eraman, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mtrofin, davidxl, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73217
2020-02-04 12:06:20 -08:00
Matt Arsenault 5d2749938c AMDGPU/GlobalISel: Custom lower G_FEXP 2020-02-04 11:50:55 -08:00
Matt Arsenault b461436d01 AMDGPU/GlobalISel: Legalize s16 G_FEXP2 2020-02-04 11:50:55 -08:00
Matt Arsenault 23b76096b7 CodeGenPrepare: Reorder check for cold and shouldOptimizeForSize
shouldOptimizeForSize is showing up in a profile, spending around 10%
of the pass time in one function. This should probably not be so slow,
but the much cheaper attribute check should be done first anyway.
2020-02-04 11:23:13 -08:00
Matt Arsenault 1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Fangrui Song 531fad736e [test] yaml2obj -docnum => --docnum=
Make usage more consistent, and make it possible to enable LongOptionsUseDoubleDash.
2020-02-04 10:33:21 -08:00
Matt Arsenault 75fcdfa1fc AMDGPU: Cleanup SMRD buffer selection
The usage of the Imm out argument from SelectSMRDOffset is pretty
confusing. Stop trying to reject CI immediates in the case where the
offset field can be used. It's not an illegal way to encode the
immediate, so just prefer the better encoding pattern with
AddedComplexity.

We probably don't even really need the different opcodes for the
different offset types anymore, but that will be more work to cleanup.

The SMRD non-buffer load patterns could also use a cleanup to be done
separately.
2020-02-04 10:28:08 -08:00
Matt Arsenault de8451fe4d GlobalISel: Fold SmallVector resizes into constructors 2020-02-04 10:28:08 -08:00
Simon Pilgrim f25a2a3de5 [X86] Fix missing load latencies (PR36894)
We weren't account for load latencies in the SSE42/AES/CLMUL schedule classes
2020-02-04 18:18:29 +00:00
Matt Arsenault 33081d2361 Try to fix buildbot failure 2020-02-04 13:12:46 -05:00
Hiroshi Yamauchi 803dd6fe6b [BFI] Add a debug check for unknown block queries.
Summary:
Add a debug check for frequency queries for unknown blocks (typically blocks
that are created after BFI is computed but their frequencies are not
communicated to BFI.)

This is useful for detecting and debugging missed BFI updates.

This is debug build only and disabled behind a flag.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73920
2020-02-04 10:05:28 -08:00
Sanjay Patel dc42ff6697 [InstCombine] add FIXME comment to shuffle transform; NFC
Existing tests:
rG5d04e008f708
rG2a191cf8500f
...should verify that the underlying analysis doesn't improve
too much without updating this user code.
2020-02-04 13:02:06 -05:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Fangrui Song 8ff86fcf4c [X86] -fpatchable-function-entry=N,0: place patch label after ENDBR{32,64}
Similar to D73680 (AArch64 BTI).

A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR.

Also, add 32-bit tests and test that .cfi_startproc is at the function
entry. The line information has a general implementation and is tested
by AArch64/patchable-function-entry-empty.mir

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D73760
2020-02-04 09:42:36 -08:00
David Spickett a05566c994 [ARM] Correct missing newline after outputting .tlsdescseq directive.
Differential Revision: https://reviews.llvm.org/D73972
2020-02-04 17:38:09 +00:00
Cameron McInally 2eaa9d991d [NFC][LangRef][FPEnv] Fix whitespace for denormal-fp-math/denormal-fp-math-f32
Fix incorrect spacing for `denormal-fp-math` and `denormal-fp-math-f32`. No
other changes.
2020-02-04 11:21:03 -06:00
Yonghong Song 6d07802d63 [BPF] handle typedef of struct/union for CO-RE relocations
Linux commit
  1cf5b23988 (diff-289313b9fec99c6f0acfea19d9cfd949)
uses "#pragma clang attribute push (__attribute__((preserve_access_index)),
      apply_to = record)"
to apply CO-RE relocations to all records including the following pattern:
  #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
  typedef struct {
    int a;
  } __t;
  #pragma clang attribute pop
  int test(__t *arg) { return arg->a; }

The current approach to use struct/union type in the relocation record will
result in an anonymous struct, which make later type matching difficult
in bpf loader. In fact, current BPF backend will fail the above program
with assertion:
  clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:796: ...
     Assertion `TypeName.size()' failed.

clang will change to use the type of the base of the member access
which will preserve the typedef modifier for the
preserve_{struct,union}_access_index intrinsics in the above example.
Here we adjust BPF backend to accept that the debuginfo
type metadata may be 'typedef' and handle them properly.

Differential Revision: https://reviews.llvm.org/D73902
2020-02-04 08:53:03 -08:00
Justin Hibbits b8dc54cf39 PowerPC: Remove redundancy in ternary for predicate selection
rG2c4620ad57b8 inadvertently added redundancies in selection of GT and
LE predicates for SPE.  Correct this.

Partially addresses PR 44768.
2020-02-04 10:38:21 -06:00
David Spickett 95c95a94d7 [ARM][AsmParser] Make assembly directives case insensitive
Differential Revision: https://reviews.llvm.org/D73469
2020-02-04 16:34:39 +00:00
Kazushi (Jam) Marukawa 3ed12232b0 [VE] half fptrunc+store&load+fpext
Summary:
fp16 (half) load+fpext and fptrunc+store isel legalization and tests.
Also, ExternalSymbolSDNode operand printing (tested by fp16 lowering).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73899
2020-02-04 17:16:09 +01:00
Jonas Paulsson e943329ba0 [SystemZ] Add 'REQUIRES:' or '-mtriple' to some newly added tests.
Needed to fix buildbots.
2020-02-04 10:52:10 -05:00
Jonas Paulsson 563e84790f [SystemZ] Support -msoft-float
This is needed when building the Linux kernel.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D72189
2020-02-04 10:32:45 -05:00
Nico Weber 191a9a78b3 Revert "DebugInfo: Add missing test coverage for DW_OP_convert in loclists"
This reverts commit 5327b917e3.
Already fails on non-Linux at this commit.
2020-02-04 10:10:04 -05:00
Nico Weber f75301d16d Revert "DebugInfo: Check DW_OP_convert in loclists with Split DWARF"
and follow-ups.

This reverts commit 1ced28cbe7.
This reverts commit 4f281f0474.
This reverts commit 552a8fe12b.

The test fails on non-Linux.
2020-02-04 10:06:46 -05:00
Jeremy Morse 41206b61e3 [DebugInfo] Re-instate LiveDebugVariables scope trimming
This patch reverts part of r362750 / D62650, which stopped
LiveDebugVariables from trimming leading variable location ranges down
to only covering those instructions that are in scope. I've observed some
circumstances where the number of DBG_VALUEs in a function can be
amplified in an un-necessary way, to cover more instructions that are
out of scope, leading to very slow compile times. Trimming the range
of instructions that the variables cover solves the slow compile times.

The specific problem that r362750 tries to fix is addressed by the
assignment to RStart that I've added. Any variable location that begins
at the first instruction of a block will now be considered to begin at the
start of the block. While these sound the same, the have different
SlotIndexes, and the register allocator may shoehorn additional
instructions in between the two. The test added in the past
(wrong_debug_loc_after_regalloc.ll) still works with this modification.

live-debug-variables.ll has a range trimmed to not cover the prologue of
the function, while dbg-addr-dse.ll has a DBG_VALUE sink past one
instruction with no DebugLoc, which is expected behaviour.

Differential Revision: https://reviews.llvm.org/D73691
2020-02-04 14:51:06 +00:00
Mikhail Maltsev 65b3b6c0ac [ARM] Make ARM::ArchExtKind use 64-bit underlying type (part 2), NFCI
Summary:
After following Simon's suggestion about additional testing posted at
https://reviews.llvm.org/D73906, I found several more places that
need to be updated.

Reviewers: simon_tatham, dmgreen, ostannard, eli.friedman

Reviewed By: simon_tatham

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73963
2020-02-04 14:48:10 +00:00
Sanjay Patel 2a191cf850 [InstCombine] add more splat tests with undef elements; NFC 2020-02-04 09:13:08 -05:00
Sanjay Patel 5d04e008f7 [InstCombine] add splat tests with undef elements; NFC 2020-02-04 07:59:12 -05:00
Sanjay Patel 0cf0be993c [InstCombine] fix operands of shouldChangeType() for casted phi transform
This is a bug noted in the recent D72733 and seen
in the similar transform just above the changed source code.

I added tests with illegal types and zexts to show the bug -
we could transform legal phi ops to illegal, etc. I did not add
tests with trunc because we won't see any diffs on those patterns.
That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to
do those transforms independently of datalayout. It can also create
more casts than are present in existing code.

There are some existing regression tests that do not include a
datalayout that would be altered by this fix. I assumed that the
lack of a datalayout in those regression files is an oversight, so
I added the minimal layout (make i32 legal) necessary to preserve
behavior on those tests.

Differential Revision: https://reviews.llvm.org/D73907
2020-02-04 07:45:48 -05:00
Florian Hahn 8c681f5e47 [Matrix] Mark matrix memory intrinsics as argmemonly/write|read mem.
matrix.columnwise.load and matrix.columnwise.store only access memory
through the argument pointers. Also matrix.columnwise.store only writes
memory.
2020-02-04 12:32:45 +00:00
Georgii Rymar bec54e464e [yaml2obj/obj2yaml] - Add support for the SHT_LLVM_CALL_GRAPH_PROFILE section.
This is a LLVM specific section that is well described here:
https://llvm.org/docs/Extensions.html#sht-llvm-call-graph-profile-section-call-graph-profile

This patch teaches yaml2obj and obj2yaml about how to work with it.

Differential revision: https://reviews.llvm.org/D73788
2020-02-04 15:13:20 +03:00
Mikhail Maltsev 7128aace60 [ARM] Make ARM::ArchExtKind use 64-bit underlying type, NFCI
Summary:
This patch changes the underlying type of the ARM::ArchExtKind
enumeration to uint64_t and adjusts the related code.

The goal of the patch is to prepare the code base for a new
architecture extension.

Reviewers: simon_tatham, eli.friedman, ostannard, dmgreen

Reviewed By: dmgreen

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits, pbarrio

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73906
2020-02-04 11:24:18 +00:00