Commit Graph

67313 Commits

Author SHA1 Message Date
Fangrui Song eb16435b5e Migrate function attribute "no-frame-pointer-elim-non-leaf" to "frame-pointer"="non-leaf" as cleanups after D56351 2019-12-24 16:05:15 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Craig Topper c06e53119b [X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions.
On 32-bit targets we can't use the scalar instruction so we
insert the scalar into a vector and use packed conversions.
Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid
some complexity creating target specific ISD opcodes for
v4f32->v2i64. But this causes extra vzeroupper instructions and
possibly frequency throttling on Intel CPUs.

This patch changes this to create a 128-bit vector and uses a
target specific ISD opcode if needed.
2019-12-24 11:20:10 -08:00
Craig Topper a21beccea2 [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.
Differential Revision: https://reviews.llvm.org/D71850
2019-12-24 10:07:04 -08:00
Matt Arsenault df5c2159d0 AMDGPU/GlobalISel: Legalize some 16-bit round instructions 2019-12-24 09:53:01 -05:00
Matt Arsenault e351256c0d GlobalISel: Define equivalent node for G_INTRINSIC_TRUNC 2019-12-24 09:53:01 -05:00
Matt Arsenault 9035fa6b54 AMDGPU/GlobalISel: Lower llvm.amdgcn.else 2019-12-24 09:53:01 -05:00
David Blaikie fccac1ec16 DebugInfo: Correct the form of DW_AT_macro_info in .dwo files (sec_offset, rather than data4) 2019-12-24 01:23:21 -08:00
Georgii Rymar 301cb91428 [llvm-readobj] - Remove an excessive helper for printing dynamic tags.
This removes the `getTypeString` from readeobj source because it
almost duplicates the existent method: `ELFFile<ELFT>::getDynamicTagAsString`.

Side effect: now it prints "<unknown:>0xHEXVALUE" instead of "(unknown)" for unknown values.
llvm-readelf before this patch printed:

```
0x0000000012345678 (unknown) 0x8765432187654321
0x000000006abcdef0 (unknown) 0x9988776655443322
0x0000000076543210 (unknown) 0x5555666677778888
```

and now it prints:

```
0x0000000012345678 (<unknown:>0x12345678) 0x8765432187654321
0x000000006abcdef0 (<unknown:>0x6abcdef0) 0x9988776655443322
0x0000000076543210 (<unknown:>0x76543210) 0x5555666677778888
```

GNU reaedlf prints different thing:

```
0x0000000012345678 (<unknown>: 12345678) 0x8765432187654321
0x000000006abcdef0 (Operating System specific: 6abcdef0) 0x9988776655443322
0x0000000076543210 (Processor Specific: 76543210) 0x5555666677778888
```

I am not sure we want to follow GNU here. Even if we do, it should be separate
patch probably. The new output looks better and closer to GNU anyways,
and the code is a bit simpler.

Differential revision: https://reviews.llvm.org/D71835
2019-12-24 11:55:45 +03:00
Sourabh Singh Tomar 0a72515d33 [DebugInfo] Fix v4 macinfo for dwo files.
Dwo files must contain have DW_AT_macro_info attribute, when macro information is emitted. Adjusted the test case
for the same.
2019-12-24 12:50:34 +05:30
David Blaikie 199700a5cf DebugInfo: Support dumping any exprloc as an expression
Now that DWARFv5 provides a way to identify DWARF expressions based on
form, rather than only by attribute - use it to always provide pretty
printing for any exprloc attribute, not only the attributes known to
contain expressions.
2019-12-23 19:18:47 -08:00
Igor Kudrin 6f635f9092 [DWARF] Check that all fields of a Unit Header are read.
Tests "dwarfdump-rnglists-dwarf64.s" and "dwarfdump-rnglists.s" were
malformed because they had missing required DWO ID fields in split
compilation unit headers. The patch fixes the tests and checks
the reading of a unit header more thoroughly.

Differential Revision: https://reviews.llvm.org/D71704
2019-12-24 09:38:20 +07:00
Sanjay Patel 25cf5d97ac [InstCombine] add test for copysign; NFC 2019-12-23 17:54:31 -05:00
Sanjay Patel 9a77c20954 [InstCombine] add tests for not(select ...); NFC 2019-12-23 17:14:32 -05:00
Ulrich Weigand 0d3f782e41 [FPEnv][X86] More strict int <-> FP conversion fixes
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:

- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
  compare can raise exceptions and therefore needs to be a strict compare.
  I've made it signaling (even though quiet would also be correct) as
  signaling is the more usual default for an LT. This code exists both
  in common code and in the X86 target.

- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
  it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
  of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
  that ends up not chosen. I've fixed the algorithm to use only a single
  STRICT_SINT_TO_FP instead.

- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
  the wrong thing because it calls getOperationAction using the result VT.
  But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
  be called using the operand VT.

- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
  STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D71840
2019-12-23 21:11:45 +01:00
David Blaikie e028cee66a MC: Ensure test only reads from the Inputs directory 2019-12-23 11:08:26 -08:00
Jay Foad c7c05b0c8a [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer
Summary:
The only useful information the UndefValue conveys is the address space,
which MachinePointerInfo can represent directly without referring to an
IR value.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71838
2019-12-23 15:58:19 +00:00
Luís Marques 5b1d0dc6bf [RISCV][NFC] Fix use of missing attribute groups in tests 2019-12-23 15:39:04 +00:00
czhengsz 79b3325be0 [PowerPC] NFC - fix the testcase bug of folding rlwinm 2019-12-23 10:28:22 -05:00
Sanjay Patel 8cefc37be5 [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.

Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D63815
2019-12-23 10:11:45 -05:00
Florian Hahn 8d6f59b78a [Matrix] Use fmuladd for matrix.multiply if allowed.
If the matrix.multiply calls have the contract fast math flag, we can
use fmuladd. This als adds a command line option to force fmuladd
generation. We can retire this option once there is a clang-level
option.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70951
2019-12-23 14:49:14 +01:00
Florian Hahn 109e4e3851 [Matrix] Add forward shape propagation and first shape aware lowerings.
This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to make use of
the shape information to break up larger vector operations and to
eliminate unnecessary conversion operations between columnwise matrixes
and flattened vectors: if shape information is available for an
instruction, lower the operation to a set of instructions operating on
columns. For example, a store of a matrix is broken down into separate
stores for each column. For users that do not have shape
information (e.g. because they do not yet support shape information
aware lowering), we pack the result columns into a flat vector and
update those users.

It also adds shape aware lowering for the first non-intrinsic
instruction: vector stores.

Example:

For
  %c  = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
  store <4 x double> %c, <4 x double>* %Ptr

We generate the code below without shape propagation. Note %9 which
combines the columns of the transposed matrix into a flat vector.

  %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
  %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
  %1 = extractelement <2 x double> %split, i64 0
  %2 = insertelement <2 x double> undef, double %1, i64 0
  %3 = extractelement <2 x double> %split1, i64 0
  %4 = insertelement <2 x double> %2, double %3, i64 1
  %5 = extractelement <2 x double> %split, i64 1
  %6 = insertelement <2 x double> undef, double %5, i64 0
  %7 = extractelement <2 x double> %split1, i64 1
  %8 = insertelement <2 x double> %6, double %7, i64 1
  %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  store <4 x double> %9, <4 x double>* %Ptr

With this patch, we propagate the 2x2 shape information from the
transpose to the store and we generate the code below. Note that we
store the columns directly and do not need an extra shuffle.

  %9 = bitcast <4 x double>* %Ptr to double*
  %10 = bitcast double* %9 to <2 x double>*
  store <2 x double> %4, <2 x double>* %10, align 8
  %11 = getelementptr double, double* %9, i32 2
  %12 = bitcast double* %11 to <2 x double>*
  store <2 x double> %8, <2 x double>* %12, align 8

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70897
2019-12-23 13:51:56 +01:00
Georgii Rymar f027e1a68d [yaml2obj] - Allow using an arbitrary value for OSABI.
There was no way to set an unsupported or unknown OS ABI.
With this patch it is possible to use any numeric value.

Differential revision: https://reviews.llvm.org/D71765
2019-12-23 13:29:52 +03:00
Georgii Rymar 1f98577556 [yaml2obj] - Add support for ELFOSABI_LINUX.
ELFOSABI_LINUX is an alias for ELFOSABI_GNU.
It is not that obvious probably.

Differential revision: https://reviews.llvm.org/D71764
2019-12-23 13:25:58 +03:00
Georgii Rymar 2cebc1a717 [yaml2obj] - Add testing for OSABI field.
We have no such testing. This makes impossible
to add support for new ELFOSABI_* tags.

Differential revision: https://reviews.llvm.org/D71763
2019-12-23 13:18:18 +03:00
Martin Storsjö 5a751e747d [AArch64] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.

Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71721
2019-12-23 12:13:49 +02:00
Martin Storsjö b774aa1011 [ARM] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Instead check the shouldAssumeDSOLocal method and load the address
from a COFF stub.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71720
2019-12-23 12:13:49 +02:00
Georgii Rymar cc522bc4e3 [llvm-readobj][test] - Stop using Inputs/trivial.obj.elf-x86-64.
This rewrites a few tests to stop using the
trivial.obj.elf-x86-64 precompiled object
and removes it.

Differential revision: https://reviews.llvm.org/D71662
2019-12-23 13:10:26 +03:00
QingShan Zhang 6d5e35e89d [Power9] Remove the PPCISD::XXREVERSE as it has completely the same semantics of ISD::BSWAP
The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP.
We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse.

Differential Revision: https://reviews.llvm.org/D70657
2019-12-23 07:44:33 +00:00
QingShan Zhang 9d1071eac4 [NFC][Test][PowerPC] Add more tests for 'and mask' 2019-12-23 06:59:14 +00:00
Jim Lin da0fe5db99 [AVR] Fix codegen for rotate instructions
Summary:
    This patch introduces the ROLBRd and RORBRd pseudo-instructions,
    which implemenent the "traditional" rotate operations; instead of
    the AVR rotate instructions that use the carry bit.

    The code is not optimized at all. Especially when dealing with
    loops of rotate instructions, this codegen should be improved some
    day.

Related bug: 41358 <https://bugs.llvm.org/show_bug.cgi?id=41358>

//Note//: This is my first submitted patch.

Reviewers: dylanmckay, Jim

Reviewed By: dylanmckay

Subscribers: hiraditya, llvm-commits, dylanmckay, dsprenkels

Tags: #llvm

Patched by dsprenkels (Daan Sprenkels)

Differential Revision: https://reviews.llvm.org/D60365
2019-12-23 11:41:28 +08:00
Kai Luo 9681dc9627 [PowerPC] Exploit `vrl(b|h|w|d)` to perform vector rotation
Summary:
Currently, we set legalization action of `ISD::ROTL` vectors as
`Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)`
to lower `ISD::ROTL` directly.

Differential Revision: https://reviews.llvm.org/D71324
2019-12-23 03:04:43 +00:00
Shengchen Kan fb53396c49 [NFC] Remove unnecessary blank and rename align-branch-64-5b.s to align-branch-64-6a.s 2019-12-23 10:22:02 +08:00
czhengsz 7259f04dde [SCEV] add testcase for get accurate range for addrecexpr with nuw flag 2019-12-22 20:58:19 -05:00
Carl Ritson 2791667d2e [DAGCombiner] Check term use before applying aggressive FSUB optimisations
Summary:
Without this check unnecessary FMA instructions are generated when the FSUB terms are reused.
This also has the side-effect that the same value is computed to different levels of precision, which can create undesirable effects if the results are used together in subsequent computation.

Reviewers: arsenm, nhaehnle, foad, tpr, dstuttard, spatel

Reviewed By: arsenm

Subscribers: jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71656
2019-12-23 09:37:58 +09:00
Valentin Churavy fb0ccff6e5
[SelectionDAG] Copy FP flags when visiting a binary instruction.
Summary:
We noticed in Julia that the sequence below no longer turned into
a sequence of FMA instructions in LLVM 7+, but it did in LLVM 6.

```
    %29 = fmul contract <4 x double> %wide.load, %wide.load16
    %30 = fmul contract <4 x double> %wide.load13, %wide.load17
    %31 = fmul contract <4 x double> %wide.load14, %wide.load18
    %32 = fmul contract <4 x double> %wide.load15, %wide.load19
    %33 = fadd fast <4 x double> %vec.phi, %29
    %34 = fadd fast <4 x double> %vec.phi10, %30
    %35 = fadd fast <4 x double> %vec.phi11, %31
    %36 = fadd fast <4 x double> %vec.phi12, %32
```

Unlike Clang, Julia doesn't set the `unsafe-fp-math=true` function
attribute, but rather emits more local instruction flags.

This partially undoes https://reviews.llvm.org/D46854 and if required I can try to minimize the test further.

Reviewers: spatel, mcberg2017

Reviewed By: spatel

Subscribers: chriselrod, merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71495
2019-12-22 14:29:36 -05:00
Reid Kleckner b2c1ba5b1f Revert "[ARM][TypePromotion] Enable by default"
This reverts commit ee7579409b.

It causes crashes during ThinLTO. I suspect the issue is related to
races on the global TypeSize variable, which is 80 at the time of the
crash.
2019-12-22 11:27:11 -08:00
Craig Topper a4aa40cebc [X86] Autogenerate complete checks. NFC 2019-12-22 11:18:37 -08:00
Craig Topper fa303ea5d3 [X86] Fix typo of intrinsic name in test cases. NFC
These said test_f32_olt_s for the type of an overloaded intrinsic.
But the parser doesn't use that part of the name and just uses
the types of the arguments.
2019-12-22 11:18:32 -08:00
Philip Reames be051f4312 [Test] Add examples of problematic assembler auto-padding
This is in the context of the automatic padding work for the jcc erratum mitigation.  These are example cases we need to *not* pad for correctness.  Exact mechanism to suppress is still TBD, but saving the tests which have come up.
2019-12-22 09:01:04 -08:00
Sanjay Patel 9cdcd81d3f [InstCombine] enhance fold for copysign with known sign arg
This is another optimization suggested in PRPR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
2019-12-22 10:07:01 -05:00
Eric Astor dc5b614fa9 [ms] [X86] Use "P" modifier on operands to call instructions in inline X86 assembly.
Summary:
This is documented as the appropriate template modifier for call operands.
Fixes PR44272, and adds a regression test.

Also adds support for operand modifiers in Intel-style inline assembly.

Reviewers: rnk

Reviewed By: rnk

Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71677
2019-12-22 09:16:34 -05:00
Sanjay Patel 0b38af89e2 [AArch64] match splat of bitcasted extract subvector to DUPLANE
This is another potential regression exposed by D63815.

Here we peek through a bitcast to find an extract subvector and
scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'

Differential Revision: https://reviews.llvm.org/D71672
2019-12-22 08:37:03 -05:00
Sanjay Patel 79c7fa31f3 [InstCombine] check alloc size in bitcast of geps fold (PR44321)
We missed a constraint in D44833
when folding a bitcast into a GEP with vector/array types.
If the alloc sizes specified by the datalayout don't match,
this could miscompile as shown in:
https://bugs.llvm.org/show_bug.cgi?id=44321

Differential Revision: https://reviews.llvm.org/D71771
2019-12-21 10:31:21 -05:00
Sanjay Patel 19f9f374d9 [SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms
As discussed in PR44330:
https://bugs.llvm.org/show_bug.cgi?id=44330
...the transform from pow(X, -0.5) libcall/intrinsic to
reciprocal square root can result in small deviations from
the expected result due to differences in the pow()
implementation and/or the extra rounding step from the division.

This patch proposes to allow that difference with either the
'approximate functions' or 'reassociate' FMF:
http://llvm.org/docs/LangRef.html#fast-math-flags

In practice, this likely means that the code is compiled with
all of 'fast' (-ffast-math), but I have preserved the existing
specializations for -0.0/-INF that enable generating safe code
if those special values are allowed simultaneously with
allowing approximation/reassociation.

The question about whether a similar restriction is needed for
the non-reciprocal case -- pow(X, 0.5) -- is deferred. That
transform is allowed without FMF currently, and this patch does
not change that behavior.

Differential Revision: https://reviews.llvm.org/D71706
2019-12-21 10:00:53 -05:00
Florian Hahn d269255b95 [AArch64] Respect reserved registers while renaming in LdSt opt.
We cannot pick reserved registers as rename registers.

Fixes https://bugs.llvm.org/show_bug.cgi?id=44358
2019-12-21 15:10:07 +01:00
Matt Arsenault f9677c4757 Mips: Make test resistant to future changes
This seems to have been relying on extra spills being inserted in
these blocks to increase the code size to trigger branch
relaxation. This broke when these spills were avoided. Add some asm to
pad the size of the blocks to make it not matter.
2019-12-21 04:56:20 -05:00
Matt Arsenault 42a26445f9 AMDGPU/GlobalISel: Fix misuse of div_scale intrinsics
Confusingly, the intrinsic operands do not match the
instruction/custom node. The order is shuffled, and the 3rd operand is
an immediate to select operands.

I'm not 100% sure I did this right, but fdiv still doesn't select end
to end and it will be easier to tell when it does. This at least
avoids an assertion in RegBankSelect and allows hitting the fallback
on selection.
2019-12-21 04:55:36 -05:00
Matt Arsenault dff3f8d742 AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xor 2019-12-21 04:55:36 -05:00
Michael Trent b4dfa74a5d Constrain the macho-stabs test added in f72d001e09 to run on systems configured with an x86 backend.
Summary: This fixes a failure on the Builder clang-cmake-armv7-quick bot.

Reviewers: lhames, jhenderson

Reviewed By: lhames

Subscribers: kristof.beyls, rupprecht, seiya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71792
2019-12-20 17:40:37 -08:00