Commit Graph

3576 Commits

Author SHA1 Message Date
Jessica Paquette 7f6fe7c02c [GlobalISel][AArch64] Select llvm.aarch64.crypto.sha1h
This was falling back and gives us a reason to create a selectIntrinsic function
which we would need eventually anyway. Update arm64-crypto.ll to show that we
correctly select it.

Also factor out the code for finding an intrinsic ID.

llvm-svn: 359501
2019-04-29 20:58:17 +00:00
Cullen Rhodes 2c0d5043a7 [AArch64][SVE] Asm: add aliases for unpredicated bitwise logical instructions
This patch adds aliases for element sizes .B/.H/.S to the
AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts
these instructions with all element sizes up to 64-bit (.D). The
preferred disassembly is .D.

llvm-svn: 359457
2019-04-29 15:27:27 +00:00
Jessica Paquette 76f64b665b [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extracts
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.

Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.

llvm-svn: 359351
2019-04-26 21:53:13 +00:00
Nick Desaulniers 7ab164c4a4 [AsmPrinter] refactor to support %c w/ GlobalAddress'
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.

Refactors a few subclasses to support the target independent %a, %c, and
%n.

The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void

Reviewed By: void

Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60887

llvm-svn: 359337
2019-04-26 18:45:04 +00:00
Jessica Paquette 67ab9eb193 [AArch64][GlobalISel] Select G_BSWAP for vectors of s32 and s64
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.

Update select-bswap.mir and arm64-rev.ll to reflect the changes.

llvm-svn: 359331
2019-04-26 18:00:01 +00:00
Hans Wennborg 5d5ee4aff7 Fix alignment in AArch64InstructionSelector::emitConstantPoolEntry()
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.

Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)

Differential revision: https://reviews.llvm.org/D61124

llvm-svn: 359287
2019-04-26 08:31:00 +00:00
Jessica Paquette f54258c888 [GlobalISel][AArch64] Make G_EXTRACT_VECTOR_ELT legal for v8s16s
This case was missing before, so we couldn't legalize it.

Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.

llvm-svn: 359231
2019-04-25 20:00:57 +00:00
Jessica Paquette 8184b6e7f6 [GlobalISel][AArch64] Add generic legalization rule for extends
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).

Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.

Differential Revision: https://reviews.llvm.org/D60889

llvm-svn: 359222
2019-04-25 18:42:00 +00:00
Jessica Paquette ba55767f51 [GlobalISel][AArch64] Legalize G_FNEARBYINT
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.

Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.

llvm-svn: 359204
2019-04-25 16:44:40 +00:00
Jessica Paquette 4fe7574d5d [AArch64][GlobalISel] Select G_INTRINSIC_ROUND
Add selection support for G_INTRINSIC_ROUND, add a selection test, and add
check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll.

llvm-svn: 359046
2019-04-23 23:03:03 +00:00
Jessica Paquette 9766bf1854 [AArch64][GlobalISel] Mark G_INTRINSIC_ROUND as a pre-isel floating point opcode
Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its
input and output are assigned the correct register bank.

Add a regbankselect test to verify that we get what we expect here.

llvm-svn: 359044
2019-04-23 22:47:00 +00:00
Jessica Paquette 3cc6d1f542 [AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND
Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing
switch case to AArch64LegalizerInfo.cpp.

llvm-svn: 359033
2019-04-23 21:11:57 +00:00
Jessica Paquette 991cb39242 [AArch64][GlobalISel] Actually select G_INTRINSIC_TRUNC
Apparently FileCheck wasn't actually matching the fallback check lines in
arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for
G_INTRINSIC_TRUNC there.

Actually hook it up into AArch64InstructionSelector.cpp and write a proper
selection test.

I guess I'll figure out the FileCheck magic to make the fallback checks work
properly in arm64-vfloatintrinsics.ll.

llvm-svn: 359030
2019-04-23 20:46:19 +00:00
Jessica Paquette ede0b2e695 [AArch64][GlobalISel] Teach regbankselect about G_INTRINSIC_TRUNC
Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test.

Update arm64-vfloatintrinsics.ll now that we can select it.

llvm-svn: 359022
2019-04-23 18:20:47 +00:00
Jessica Paquette 56342642a0 [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC
Same patch as G_FCEIL etc.

Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.

llvm-svn: 359021
2019-04-23 18:20:44 +00:00
Jessica Paquette df5ce782ad [AArch64][GlobalISel] Legalize G_FMA for more vector types
Same as G_FCEIL, G_FABS, etc. Just move it into that rule.

Add a legalizer test for G_FMA, which we didn't have before and update
arm64-vfloatintrinsics.ll.

llvm-svn: 359015
2019-04-23 17:37:56 +00:00
Jessica Paquette e50e6d2563 [AArch64][GlobalISel] Add G_FMA to isPreISelGenericFloatingPointOpcode
Noticed an unnecessary fallback in arm64-vmul caused by this.

Also add a regbankselect test for G_FMA.

llvm-svn: 359013
2019-04-23 17:17:06 +00:00
Simon Pilgrim e7a68fd93e Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 358969
2019-04-23 11:11:34 +00:00
Javed Absar 1cdc3dbc58 [AArch64] Add support for MTE intrinsics
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486

llvm-svn: 358963
2019-04-23 09:39:58 +00:00
Amara Emerson 4286652556 Revert r358800. Breaks Obsequi from the test suite.
The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.

llvm-svn: 358829
2019-04-20 21:25:00 +00:00
Amara Emerson eac69e9377 Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.

llvm-svn: 358800
2019-04-19 23:54:44 +00:00
Jessica Paquette d5c69e0836 [GlobalISel][AArch64] Legalize + select G_FRINT
Exactly the same as G_FCEIL, G_FABS, etc.

Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.

Differential Revision: https://reviews.llvm.org/D60895

llvm-svn: 358799
2019-04-19 23:41:52 +00:00
Eli Friedman 1810339bc3 [AArch64] Fix checks for AArch64MCExpr::VK_SABS flag.
VK_SABS is part of the SymLoc bitfield in the variant kind which should
be compared for equality, not by checking the VK_SABS bit.

As far as I know, the existing code happened to produce the correct
results in all cases, so this is just a cleanup.

Patch by Stephen Crane.

Differential Revision: https://reviews.llvm.org/D60596

llvm-svn: 358788
2019-04-19 21:58:10 +00:00
Amara Emerson 36c5baef49 Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"
This introduces some runtime failures which I'll need to investigate further.

llvm-svn: 358771
2019-04-19 17:42:13 +00:00
Jessica Paquette dfd87f6fa1 [GlobalISel][AArch64] Legalize vector G_FPOW
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.

Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.

Differential Revision: https://reviews.llvm.org/D60218

llvm-svn: 358764
2019-04-19 16:28:08 +00:00
Bjorn Pettersson 238c9d6308 [CodeGen] Add "const" to MachineInstr::mayAlias
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).

The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60856

llvm-svn: 358744
2019-04-19 09:08:38 +00:00
Jessica Paquette 0aa9b453c4 [GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s
This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.

We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from
selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.

This adds legalizer support for those 3, which gives us selection via the
importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and
add a GISel line to arm64-vabs.ll.

Differential Revision: https://reviews.llvm.org/D60881

llvm-svn: 358715
2019-04-18 21:15:48 +00:00
Jessica Paquette 3b5119c684 [GlobalISel][AArch64] Legalize v8s8 loads
Add legalizer support for loads of v8s8 and update legalize-load-store.mir.

Differential Revision: https://reviews.llvm.org/D60877

llvm-svn: 358714
2019-04-18 21:13:58 +00:00
Nick Desaulniers 9609ce2f33 [AsmPrinter] hoist %a output template to base class for ARM+Aarch64
Summary:
X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do
basically the same thing (Aarch64 did not correctly handle immediates,
ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses
%a with an immediate) for a flag that should be target independent
anyways.

Reviewers: echristo, peter.smith

Reviewed By: echristo

Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60841

llvm-svn: 358618
2019-04-17 22:21:10 +00:00
Amara Emerson daf6e66ac5 [GlobalISel] Add legalization support for non-power-2 loads and stores
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.

This matches how SelectionDAG handles these operations.

Differential Revision: https://reviews.llvm.org/D59971

llvm-svn: 358613
2019-04-17 21:30:07 +00:00
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Amara Emerson d189680baa [GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.

This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.

llvm-svn: 358368
2019-04-15 04:53:46 +00:00
Amara Emerson 93e58d2396 [AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash.
This enables the simple copy combine that already exists in the CombinerHelper.
However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a
set of MIs to process, and so would end up causing a crash when deleted MIs were
being added to the combiner worklist again.

Differential Revision: https://reviews.llvm.org/D60579

llvm-svn: 358318
2019-04-13 00:33:25 +00:00
Amara Emerson 2806fd01a1 [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element.
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.

llvm-svn: 358312
2019-04-12 21:31:21 +00:00
Eric Christopher 6b06c6a5ef Add explicit dependencies on MCSection.h and MCDwarf.h to the .cpp
files rather than rely on transitive includes from MCStreamer.h.

llvm-svn: 358263
2019-04-12 07:40:01 +00:00
Amara Emerson 7e9355f870 [AArch64][GlobalISel] Flesh out vector load/store support for more types.
Some of these were legalizing into smaller vector types unnecessarily,
others were simply not supported yet.

llvm-svn: 358223
2019-04-11 20:40:01 +00:00
Amara Emerson b956051415 [AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers.
Loads and store of values with type like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.

This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.

Differential Revision: https://reviews.llvm.org/D60534

llvm-svn: 358221
2019-04-11 20:32:24 +00:00
Diogo N. Sampaio 8ddfd46c61 [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64
Summary:  Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64

Reviewers: pbarrio, DavidSpickett, LukeGeeson

Reviewed By: LukeGeeson

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60259

llvm-svn: 358171
2019-04-11 14:19:43 +00:00
Amara Emerson 213e0bde04 [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.
The existing isel support already works for p0 once the legalizer accepts it.

llvm-svn: 358144
2019-04-10 23:06:14 +00:00
Amara Emerson a7ff111b04 [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.
llvm-svn: 358143
2019-04-10 23:06:11 +00:00
Amara Emerson ae878dab03 [AArch64][GlobalISel] Scalarize vector SDIV.
llvm-svn: 358142
2019-04-10 23:06:08 +00:00
Craig Topper 35fe07916a [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Nick Desaulniers 5277b3ff25 [AsmPrinter] refactor to remove remove AsmVariant. NFC
Summary:
The InlineAsm::AsmDialect is only required for X86; no architecture
makes use of it and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.

Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.

This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.

Reviewers: craig.topper

Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60488

llvm-svn: 358101
2019-04-10 16:38:43 +00:00
Diogo N. Sampaio aae424a2d2 [AArch64] Add lowering pattern for scalar fp16 facge and facgt
Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands.

Reviewers: olista01, javed.absar, pbarrio

Reviewed By: javed.absar

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60212

llvm-svn: 358083
2019-04-10 13:34:18 +00:00
Clement Courbet 48e2eb0b27 [NFC] Fix unused variable warning.
llvm-svn: 358080
2019-04-10 13:18:05 +00:00
Amara Emerson 9bf092d719 [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.

For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.

Differential Revision: https://reviews.llvm.org/D60436

llvm-svn: 358035
2019-04-09 21:22:43 +00:00
Amara Emerson 888dd5d198 [AArch64][GlobalISel] Legalize vector G_ICMP.
Selection support will be coming in a later patch.

Differential Revision: https://reviews.llvm.org/D60435

llvm-svn: 358034
2019-04-09 21:22:40 +00:00
Amara Emerson 92d74f19cf [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.
This is needed for some future support for vector ICMP.

Differential Revision: https://reviews.llvm.org/D60433

llvm-svn: 358033
2019-04-09 21:22:37 +00:00
Amara Emerson 2b523f8162 [GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032
2019-04-09 21:22:33 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Sander de Smalen 772e4734d9 [AArch64][AsmParser] Fix .arch_extension directive parsing
This patch fixes .arch_extension directive parsing to handle a wider
range of architecture extension options. The existing parser was parsing
extensions as an identifier which breaks for extensions containing a
"-", such as the "tlb-rmi" extension.

The extension is now parsed as a string. This is consistent with the
extension parsing in the .arch and .cpu directive parsing.

Patch by Cullen Rhodes (c-rhodes)

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D60118

llvm-svn: 357677
2019-04-04 09:11:17 +00:00
Jessica Paquette e794121cd0 [AArch64][GlobalISel] Legalize G_FEXP2
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.

Differential Revision: https://reviews.llvm.org/D60165

llvm-svn: 357605
2019-04-03 16:58:32 +00:00
Javed Absar 5820db93c9 [AArch64] Update v8.5a MTE LDG/STG instructions
The latest MTE specification adds register Xt to the STG instruction family:
  STG [Xn, #offset] -> STG Xt, [Xn, #offset]
The tag written to memory is taken from Xt rather than Xn.
Also, the LDG instruction also was changed to read return address from Xt:
  LDG Xt, [Xn, #offset].
This patch includes those changes and tests.
Specification is at: https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60188

llvm-svn: 357583
2019-04-03 14:12:13 +00:00
Jessica Paquette 22c6215c7e [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*)
This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.

Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D60100

llvm-svn: 357518
2019-04-02 19:57:26 +00:00
Jessica Paquette e44c20a68d [AArch64][GlobalISe] Select STRQui for stores into v264s instead of scalarizing
This improves selection for vector stores into v2s64s. Before we just
scalarized them, but we can just use a STRQui instead.

Differential Revision: https://reviews.llvm.org/D60083

llvm-svn: 357432
2019-04-01 22:19:13 +00:00
David Spickett 3d233d5d4d [AArch64] Add v8.5-a Memory Tagging STZGM instruction
This instruction writes a block of allocation tags
and stores zero to the associated data locations.

It differs from STGM by 1 bit and has the same
arguments.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60065

llvm-svn: 357397
2019-04-01 14:56:37 +00:00
David Spickett 9142b8ef1b [AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.

The specfication can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60064

llvm-svn: 357395
2019-04-01 14:52:18 +00:00
David Spickett efe376add6 [AArch64] Add v8.5-a Memory Tagging GMID_EL1 register
The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

llvm-svn: 357392
2019-04-01 14:41:14 +00:00
Jessica Paquette d3ffd47df9 [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

llvm-svn: 357318
2019-03-29 21:39:36 +00:00
Evandro Menezes 0f797b8732 [CodeGen] Refactor the option for the maximum jump table size
Refactor the option `max-jump-table-size` to default to the maximum
representable number.  Essentially, NFC.

llvm-svn: 357280
2019-03-29 17:28:11 +00:00
Amara Emerson 8a02aea6fc [AArch64][GlobalISel] Make G_PHI of v2s64, v4s32, v2s32 legal.
llvm-svn: 357108
2019-03-27 18:31:46 +00:00
Sander de Smalen e1eab42f65 [AArch64][SVE] Asm: error on unexpected SVE vector register type suffix
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.

The following are examples of what was previously valid:

    movprfx z0.b, z0.b
    movprfx z0.b, z0.s
    movprfx z0, z0.s

These instructions are now erroneous.

Patch by Cullen Rhodes (c-rhodes)

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D59636

llvm-svn: 357094
2019-03-27 17:23:38 +00:00
Sander de Smalen 90d1b551e1 [AArch64] NFC: Cleanup isAArch64FrameOffsetLegal
Cleanup isAArch64FrameOffsetLegal by:
- Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo().
- Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction
  has an unscaled variant.
- Simplifying the logic that calculates the offset to fit the immediate.

Reviewers: paquette, evandro, eli.friedman, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59636

llvm-svn: 357064
2019-03-27 13:16:19 +00:00
Sander de Smalen 46edefe3c4 [AArch64] Adds cases for LDRSHWui and LDRSHXui to getMemOpInfo
This patch also adds cases PRFUMi and PRFMui.
This change was discussed in https://reviews.llvm.org/D59635.

llvm-svn: 357059
2019-03-27 10:39:03 +00:00
Eli Friedman 92d0d13366 [AArch64] Prefer "mov" over "orr" to materialize constants.
This is generally more readable due to the way the assembler aliases
work.

(This causes a lot of test changes, but it's not really as scary as it
looks at first glance; it's just mechanically changing a bunch of checks
for orr to check for mov instead.)

Differential Revision: https://reviews.llvm.org/D59720

llvm-svn: 356954
2019-03-25 21:25:28 +00:00
Evandro Menezes 4a7739b681 [AArch64, ARM] Add support for Exynos M5
Add Exynos M5 support and test cases.

llvm-svn: 356793
2019-03-22 18:42:14 +00:00
Amara Emerson c10b24691a [AArch64] Split the neon.addp intrinsic into integer and fp variants.
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.

This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.

This is a more permanent solution to PR40968.

Differential Revision: https://reviews.llvm.org/D59655

llvm-svn: 356722
2019-03-21 22:31:37 +00:00
Evandro Menezes e5e77815b4 [AArch64] Update for Exynos
Fix the feature set for Exynos M4 by removing support for `+fp16fml` and fix test case.

llvm-svn: 356698
2019-03-21 18:54:58 +00:00
Oliver Stannard defdb1070f [AArch64] Allow -mattr=tpidr-el[1|2|3]
Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.

Patch by Philip Derrin!

Differential revision: https://reviews.llvm.org/D54685

llvm-svn: 356657
2019-03-21 11:30:17 +00:00
Amara Emerson 761ca2e53b [AArch64][GlobalISel] Add an optimization to select vector DUP instructions.
This adds pattern matching for the insert+shufflevector sequence so we can
generate dup instructions instead of the current TBL sequence.

Differential Revision: https://reviews.llvm.org/D59558

llvm-svn: 356526
2019-03-19 21:43:05 +00:00
Amara Emerson 18e2c5724a [AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal.
llvm-svn: 356525
2019-03-19 21:43:02 +00:00
Amara Emerson 8627178d46 Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy()
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().

For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.

This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.

llvm-svn: 356396
2019-03-18 19:20:10 +00:00
Adhemerval Zanella 270249de2b [AArch64] Small fix for getIntImmCost
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instruction used in immediate materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58461

llvm-svn: 356391
2019-03-18 18:50:58 +00:00
Adhemerval Zanella a3cefa5d64 [AArch64] Optimize floating point materialization
This patch follows some ideas from r352866 to optimize the floating
point materialization even further. It changes isFPImmLegal to
considere up to 2 mov instruction or up to 5 in case subtarget has
fused literals.

The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but
the mov+fmov sequence is always better because of the reduced d-cache
pressure. The timings are still the same if you consider movw+movk+fmov
vs. adrp+ldr will be fused (although one instruction longer).

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58460

llvm-svn: 356390
2019-03-18 18:45:57 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Adhemerval Zanella 8a595b1d2e [AArch64] Refactor floating point materialization. NFC
It splits the login of actual instruction emission away from the logic
that figures out the appropriate sequence on AArch64ExpandPseudo::expandMOVImm.
The new function AArch64_IMM::expandMOVImm, which return the list of the 
instructions to materialize the immediate constant, is implemented on a 
separated unit because it will be used in a subsequent patch to optimize
floating point materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58915

llvm-svn: 356387
2019-03-18 18:23:23 +00:00
Christof Douma 8cfd91dcc7 [AArch64] Fix bug 35094 atomicrmw on Armv8.1-A+lse
Fixes https://bugs.llvm.org/show_bug.cgi?id=35094

The Dead register definition pass should leave alone the atomicrmw
instructions on AArch64 (LTE extension). The reason is the following
statement in the Arm ARM:

"The ST<OP> instructions, and LD<OP> instructions where the destination
register is WZR or XZR, are not regarded as doing a read for the purpose
of a DMB LD barrier."

A good example was given in the gcc thread by Will Deacon (linked in the
bugzilla ticket 35094):

    P0 (atomic_int* y,atomic_int* x) {
      atomic_store_explicit(x,1,memory_order_relaxed);
      atomic_thread_fence(memory_order_release);
      atomic_store_explicit(y,1,memory_order_relaxed);
    }

    P1 (atomic_int* y,atomic_int* x) {
      atomic_fetch_add_explicit(y,1,memory_order_relaxed);  // STADD
      atomic_thread_fence(memory_order_acquire);
      int r0 = atomic_load_explicit(x,memory_order_relaxed);
    }

    P2 (atomic_int* y) {
      int r1 = atomic_load_explicit(y,memory_order_relaxed);
    }

    My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
    this test has executed. However, if the relaxed add in P1 compiles to
    STADD and the subsequent acquire fence is compiled as DMB LD, then we
    don't have any ordering guarantees in P1 and the forbidden result could
    be observed.

Change-Id: I419f9f9df947716932038e1100c18d10a96408d0
llvm-svn: 356360
2019-03-18 09:21:06 +00:00
Amara Emerson 3739a20875 [GlobalISel] Allow MachineIRBuilder to build subregister copies.
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().

Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.

Differential Revision: https://reviews.llvm.org/D59434

llvm-svn: 356304
2019-03-15 21:59:50 +00:00
Nikita Popov 1a26144ff5 [AArch64] Turn BIC immediate creation into a DAG combine
Switch BIC immediate creation for vector ANDs from custom lowering
to a DAG combine, which gives generic DAG combines a change to
apply first. In particular this avoids (and x, -1) being turned into
a (bic x, 0) instead of being eliminated entirely.

Differential Revision: https://reviews.llvm.org/D59187

llvm-svn: 356299
2019-03-15 21:04:34 +00:00
Amara Emerson d55016b276 [AArch64][GlobalISel] Regbankselect: Fix G_BUILD_VECTOR trying to use s16 gpr sources.
Since we can't insert s16 gprs as we don't have 16 bit GPR registers, we need to
teach RBS to assign them to the FPR bank so our selector works.

llvm-svn: 356282
2019-03-15 18:00:01 +00:00
Jessica Paquette 7d6784f522 [AArch64][GlobalISel] Add isel support for G_UADDO on s32s and s64s
This adds instruction selection support for G_UADDO on s32s and s64s.

Also
- Add an instruction selection test
- Update the arm64-xaluo.ll test to show that we generate the correct assembly

Differential Revision: https://reviews.llvm.org/D58734

llvm-svn: 356214
2019-03-14 22:54:29 +00:00
Amara Emerson d61b89be8d [AArch64][GlobalISel] Implement selection for G_UNMERGE of vectors to vectors.
This re-uses the previous support for extract vector elt to extract the
subvectors.

Differential Revision: https://reviews.llvm.org/D59390

llvm-svn: 356213
2019-03-14 22:48:18 +00:00
Amara Emerson 2ff2298c3e [AArch64][GlobalISel] Add some support for G_CONCAT_VECTORS.
Handles concatenating 2 x v2s32 and 2 x v4s16

Differential Revision: https://reviews.llvm.org/D59390

llvm-svn: 356212
2019-03-14 22:48:15 +00:00
Jessica Paquette 5aff1f475c [GlobalISel][AArch64] Add partial selection support for G_INSERT_VECTOR_ELT
This adds support for inserting elements into packed vectors. It also adds
two tests: one for selection, and one for regbank select.

Unpacked vectors will come in a follow-up.

Differential Revision: https://reviews.llvm.org/D59325

llvm-svn: 356182
2019-03-14 18:01:30 +00:00
Jessica Paquette 85ace6269f [AArch64][GlobalISel] Gardening: Simplify subregister copy in selectBuildVector
NFC. Some more preliminary factoring for G_INSERT_VECTOR_ELT.

Also better code-reuse, etc., etc.

Differential Revision: https://reviews.llvm.org/D59323

llvm-svn: 356107
2019-03-13 23:29:54 +00:00
Jessica Paquette 16d67a3e32 [GlobalISel][AArch64] Gardening: Factor out vector inserts
Factor out the vector insert code in `selectBuildVector`. Replace part of it
with `emitScalarToVector`, since it was pretty much equivalent.

This will make implementing G_INSERT_VECTOR_ELT easier.

Differential Revision: https://reviews.llvm.org/D59322

llvm-svn: 356106
2019-03-13 23:22:23 +00:00
Jessica Paquette bb1aced80d [GlobalISel][AArch64] Gardening: Factor out code to find lane indices
Some more refactoring for G_INSERT_VECTOR_ELT.

Factor out the code used to find a lane index from `selectExtractElt`. Put it
into a more general-purpose `getConstantValueForReg` function.

This will be shared with the code for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59324

llvm-svn: 356101
2019-03-13 21:19:29 +00:00
Jessica Paquette 607774c960 Recommit "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without
running into any problematic intrinsics.

Also add a fix for lane copies, which don't support index 0.

llvm-svn: 355871
2019-03-11 22:18:01 +00:00
Jessica Paquette 42d16501e6 [GlobalISel][AArch64] Always fall back on aarch64.neon.addp.*
Overloaded intrinsics aren't necessarily safe for instruction selection. One
such intrinsic is aarch64.neon.addp.*.

This is a temporary workaround to ensure that we always fall back on that
intrinsic. Eventually this will be replaced with a proper solution.

https://bugs.llvm.org/show_bug.cgi?id=40968

Differential Revision: https://reviews.llvm.org/D59062

llvm-svn: 355865
2019-03-11 20:51:17 +00:00
Nikita Popov aa7cfa75f9 [SDAG][AArch64] Legalize VECREDUCE
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.

Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.

This also includes a few more changes to make this work somewhat
reasonably:

 * Add support for expanding VECREDUCE in SDAG. Usually
   experimental.vector.reduce is expanded prior to codegen, but if the
   target does have native vector reduce, it may of course still be
   necessary to expand due to legalization issues. This uses a shuffle
   reduction if possible, followed by a naive scalar reduction.
 * Allow the result type of integer VECREDUCE to be larger than the
   vector element type. For example we need to be able to reduce a v8i8
   into an (nominally) i32 result type on AArch64.
 * Use the vector operand type rather than the scalar result type to
   determine the action, so we can control exactly which vector types are
   supported. Also change the legalize vector op code to handle
   operations that only have vector operands, but no vector results, as
   is the case for VECREDUCE.
 * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
   explicitly specify for which vector types the reductions are supported.

This does not handle anything related to VECREDUCE_STRICT_*.

Differential Revision: https://reviews.llvm.org/D58015

llvm-svn: 355860
2019-03-11 20:22:13 +00:00
Stanislav Mekhanoshin e98944ed47 Use bitset for assembler predicates
AMDGPU target run out of Subtarget feature flags hitting the limit of 64.
AssemblerPredicates uses at most uint64_t for their representation.
At the same time CodeGen has exhausted this a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.

This patch completes transition to the bitset for feature bits extending
it to asm matcher and MC code emitter.

Differential Revision: https://reviews.llvm.org/D59002

llvm-svn: 355839
2019-03-11 17:04:35 +00:00
Amara Emerson 7a05d1c1f1 [AArch64][GlobalISel] Fix i1 arguments not being zero-extended as required by ABI.
Fixes PR41001.

llvm-svn: 355745
2019-03-08 22:17:00 +00:00
Mitch Phillips 790edbc16e [HWASan] Save + print registers when tag mismatch occurs in AArch64.
Summary:
This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance.

In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable.

The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format:

Registers where the failure occurred (pc 0x0055555561b4):
    x0  0000000000000014  x1  0000007ffffff6c0  x2  1100007ffffff6d0  x3  12000056ffffe025
    x4  0000007fff800000  x5  0000000000000014  x6  0000007fff800000  x7  0000000000000001
    x8  12000056ffffe020  x9  0200007700000000  x10 0200007700000000  x11 0000000000000000
    x12 0000007fffffdde0  x13 0000000000000000  x14 02b65b01f7a97490  x15 0000000000000000
    x16 0000007fb77376b8  x17 0000000000000012  x18 0000007fb7ed6000  x19 0000005555556078
    x20 0000007ffffff768  x21 0000007ffffff778  x22 0000000000000001  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
    x28 0000000000000000  x29 0000007ffffff6f0  x30 00000055555561b4

... and prints after the dump of memory tags around the buggy address.

Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging.

Reviewers: pcc, eugenis

Reviewed By: pcc, eugenis

Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58857

llvm-svn: 355738
2019-03-08 21:22:35 +00:00
Abderrazek Zaafrani 5ced596198 [AArch64] Improve FP16 instruction selection for vector round and vector conver from half instructions
https://reviews.llvm.org/D58855

llvm-svn: 355545
2019-03-06 20:30:06 +00:00
Amara Emerson 21f44dfe9c [AArch64] Remove a stray test from the AArch64 directory.
llvm-svn: 355534
2019-03-06 18:54:07 +00:00
Jessica Paquette 00d5847b5c Revert "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
This broke test-suite::aarch64_neon_intrinsics.test

Reverting while I look into it.

Example failure:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-quick/builds/17740

llvm-svn: 355408
2019-03-05 15:47:00 +00:00
Jessica Paquette caf62b1d47 [GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT
This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases
where the index is defined by a G_CONSTANT.

It also factos out the lane copy opcode selection part into its own function,
`getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and
`selectExtractElt`.

Differential Revision: https://reviews.llvm.org/D58469

llvm-svn: 355344
2019-03-04 22:35:32 +00:00
Jessica Paquette 0632e12f89 [GlobalISel][AArch64] Legalize vector G_SELECT
Just scalarize it, and add a test showing it works.

Differential Revision: https://reviews.llvm.org/D58747

llvm-svn: 355339
2019-03-04 21:12:46 +00:00
Amara Emerson 8acb0d9c82 Re-commit r355104: "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
The code to materialize a mask from a constant pool load tried to use a 128 bit
LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted
in a link failure in the NEON tests in the test suite since the LDR address was
unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is
64 bit, before converting back to a 128 bit register for the TBL.

llvm-svn: 355326
2019-03-04 19:16:00 +00:00
Jonas Hahnfeld 65a401f6a9 [AArch64/ARM] Fix two compiler warnings in InstructionSelector, NFCI
1) GCC complains that KnownValid is set but not used.
2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral
   and non-enumeral type in conditional expression". Solve this by casting
   to unsigned which is the final type anyway.

Differential Revision: https://reviews.llvm.org/D58834

llvm-svn: 355304
2019-03-04 08:51:32 +00:00
Eli Friedman d19a7060c6 [AArch64] [Windows] Don't skip constructing UnwindHelp.
In certain cases, the first non-frame-setup instruction in a function is
a branch.  For example, it could be a cbz on an argument.  Make sure we
correctly allocate the UnwindHelp, and find an appropriate register to
use to initialize it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40184

Differential Revision: https://reviews.llvm.org/D58752

llvm-svn: 355136
2019-02-28 20:38:45 +00:00
Abderrazek Zaafrani abfd10807c [AArch64] Improve FP16 vector convert from short instructions.
https://reviews.llvm.org/D58563

llvm-svn: 355134
2019-02-28 20:21:46 +00:00
Amara Emerson 8d70e6425c Revert "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
Seems to break some neon intrinsics tests.

llvm-svn: 355115
2019-02-28 18:47:29 +00:00
Amara Emerson 85c3afd7f6 [AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1.
This extends the existing support for shufflevector to handle cases like
<2 x float>, which we can implement by concating the vectors and using a TBL1.

Differential Revision: https://reviews.llvm.org/D58684

llvm-svn: 355104
2019-02-28 16:43:11 +00:00
Abderrazek Zaafrani 2fc498a652 [AArch64] Generate FP16 vector compare instructions.
https://reviews.llvm.org/D58561

llvm-svn: 355050
2019-02-28 00:31:38 +00:00
Amara Emerson 6bcfa1c419 [AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC.
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.

Differential Revision: https://reviews.llvm.org/D58528

llvm-svn: 354807
2019-02-25 18:52:54 +00:00
Luke Cheeseman 59f77e7891 [AArch64] Add support for Cortex-A76 and Cortex-A76AE
- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-a/cortex-a76

llvm-svn: 354788
2019-02-25 15:08:27 +00:00
Amara Emerson 1abe05c0dd Re-land "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR""
Thanks to Richard Trieu for pointing out that the failures were due to a
use-after-free of an ArrayRef.

llvm-svn: 354616
2019-02-21 20:20:16 +00:00
David Spickett 8ac2b181a1 [AArch64] Print instruction before atomic semantic annotations
Commit r353303 added annotations when acquire semantics
were dropped from an instruction.

printAnnotation was called before printInstruction.
So if you didn't set a separate comment output stream
you got <comment><instr> instead of <instr><comment>
as expected.

To fix this move the new printAnnotation to after
the instruction is printed.

Differential Revision: https://reviews.llvm.org/D58059

llvm-svn: 354565
2019-02-21 10:42:49 +00:00
Amara Emerson 71f2a5e60f Revert "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR"
This reverts r354521 because it broke the bots, but passes on Darwin somehow.

llvm-svn: 354532
2019-02-21 00:31:13 +00:00
Amara Emerson a946d057b4 [AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR
This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and
implements them with a very pessimistic TBL2 instruction in the selector.

For TBL2, support is also needed to generate constant pool entries and load from
them in order to materialize the mask register.

Currently supports <2 x s64> and <4 x s32> result types.

Differential Revision: https://reviews.llvm.org/D58466

llvm-svn: 354521
2019-02-20 22:11:39 +00:00
Jessica Paquette b53e0f4b81 [GlobalISel][AArch64] Legalize + select some llvm.ctlz.* intrinsics
Legalize/select llvm.ctlz.*

Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.

Differential Revision: https://reviews.llvm.org/D58155

llvm-svn: 354299
2019-02-18 23:33:24 +00:00
Matt Arsenault 530d05e94a GlobalISel: Add alignment to LegalityQuery MMOs
This allows targets to specify the minimum alignment required for the
load/store.

llvm-svn: 354071
2019-02-14 22:41:09 +00:00
Petr Hosek fcbec02ea6 [AArch64] Support reserving arbitrary general purpose registers
This is a follow up to D48580 and D48581 which allows reserving
arbitrary general purpose registers with the exception of registers
with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM
(X0, X19). This change also generalizes some of the existing logic to
rely entirely on values generated from tablegen.

Differential Revision: https://reviews.llvm.org/D56305

llvm-svn: 353957
2019-02-13 17:28:47 +00:00
Nikita Popov a3be17ea1c [AArch64] Expand v8i8 cttz (PR39729)
Fix for https://bugs.llvm.org/show_bug.cgi?id=39729.

Rather than adding just a case for v8i8 I'm setting cttz to expand
for all vector types.

Differential Revision: https://reviews.llvm.org/D58008

llvm-svn: 353872
2019-02-12 18:55:53 +00:00
Jessica Paquette 0e71e73faa [GlobalISel][AArch64] Select llvm.bswap* for non-vector types
This teaches the IRTranslator to emit G_BSWAP when it runs into
Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in
AArch64.

Add a select-bswap.mir test, and add global isel checks to a couple existing
tests in test/CodeGen/AArch64.

This doesn't handle every bswap case, since some of these rely on known bits
stuff. This just lets us handle the naive case.

Differential Revision: https://reviews.llvm.org/D58081

llvm-svn: 353861
2019-02-12 17:28:17 +00:00
Jessica Paquette e57fe23f70 [AArch64][GlobalISel] Add isel support for a couple vector exts/truncs
Add support for

- v4s16 <-> v4s32
- v2s64 <-> v2s32

And update tests that use them to show that we generate the correct
instructions.

Differential Revision: https://reviews.llvm.org/D57832

llvm-svn: 353732
2019-02-11 18:56:39 +00:00
Jessica Paquette ebdb021031 [GlobalISel][AArch64] Select G_FFLOOR
This teaches the legalizer about G_FFLOOR, and lets us select G_FFLOOR in
AArch64.

It updates the existing floating point tests, and adds a select-floor.mir test.

Differential Revision: https://reviews.llvm.org/D57486

llvm-svn: 353722
2019-02-11 17:22:58 +00:00
Benjamin Kramer 711950c116 Move some classes into anonymous namespaces. NFC.
llvm-svn: 353710
2019-02-11 15:16:21 +00:00
Eli Friedman 29c0609301 [AArch64] Fix condition for "high-vector" DUP optimizations.
AArch64 NEON has a bunch of instructions with a "2" suffix that extract
the top half of the source vectors, instead of the bottom half.  We have
some DAGCombines to try to take advantage of that.  However, they
assumed that any EXTRACT_VECTOR was extracting the high half of the
vector in question.

This issue has apparently existed since the AArch64 backend was merged.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40632 .

Differential Revision: https://reviews.llvm.org/D57862

llvm-svn: 353486
2019-02-08 00:23:35 +00:00
Matt Arsenault fbec8fe93b GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.

This shows a disadvantage of targets not specifically which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.

llvm-svn: 353455
2019-02-07 19:37:44 +00:00
Tim Northover 638110a208 AArch64: implement copy for paired GPR registers.
When doing 128-bit atomics using CASP we might need to copy a GPRPair to a
different register, but that was unimplemented up to now.

llvm-svn: 353383
2019-02-07 10:35:34 +00:00
Tim Northover 474f5d9b55 AArch64: enforce even/odd register pairs for CASP instructions.
ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not CodeGen before.

llvm-svn: 353308
2019-02-06 15:26:35 +00:00
Tim Northover 71025a2f3e AArch64: annotate atomics with dropped acquire semantics when printing.
A quirk of the v8.1a spec is that when the writeback regiser for an atomic
read-modify-write instruction is wzr/xzr, the instruction no longer enforces
acquire ordering. However, it's still written with the misleading 'a' mnemonic.

So this adds an annotation when disassembling such instructions, mentioning the
change.

llvm-svn: 353303
2019-02-06 15:07:59 +00:00
Aditya Nandakumar fef7619b05 [NFC][GlobalISel]: Add a convenience method to MachineInstrBuilder to simplify getOperand(i).getReg()
https://reviews.llvm.org/D57608

It's a common pattern in GISel to have a MachineInstrBuilder from which we get various regs
(commonly MIB->getOperand(0).getReg()). This adds a helper method and the above can be
replaced with MIB.getReg(0).

llvm-svn: 353223
2019-02-05 22:14:40 +00:00
Oliver Stannard 78dc38ec94 [AArch64][Outliner] Don't outline BTI instructions
We can't outline BTI instructions, because they need to be the very first
instruction executed after an indirect call or branch. If we outline them, then
an indirect call might go to the branch to the outlined function, which will
fault.

Differential revision: https://reviews.llvm.org/D57753

llvm-svn: 353190
2019-02-05 17:21:57 +00:00
Matt Arsenault 86504fb20e AArch64/GlobalISel: Don't clamp from 2 to 2
This is equivalent to clampMaxNumElements, but saves a check.

llvm-svn: 353188
2019-02-05 16:57:18 +00:00
Florian Hahn 3b251963c3 [CGP] Add support for sinking operands to their users, if they are free.
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.

I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.

The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.

Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.

Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.

This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.

As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D57377

llvm-svn: 353152
2019-02-05 10:27:40 +00:00
Andrea Di Biagio edbf06a767 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Mandeep Singh Grang dc1e778369 [AArch64] Fix unused variable [NFC]
llvm-svn: 352940
2019-02-01 23:42:34 +00:00
Mandeep Singh Grang 70d484d94e [COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects
Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present.

Reviewers: rnk, efriedma, ssijaric, TomTan

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D57183

llvm-svn: 352923
2019-02-01 21:41:33 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Adhemerval Zanella b3ccc5550d [AArch64] Optimize floating point materialization
This patch changes isFPImmLegal to return if the value can be enconded
as the immediate operand of a logical instruction besides checking if
for immediate field for fmov.

This optimizes some floating point materization, inclusive values
used on isinf lowering.

Reviewed By: rengolin, efriedma, evandro

Differential Revision: https://reviews.llvm.org/D57044

llvm-svn: 352866
2019-02-01 12:26:06 +00:00
James Y Knight 13680223b9 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.

Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352827
2019-02-01 02:28:03 +00:00
James Y Knight fadf25068e Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."
This reverts commit f47d6b38c7 (r352791).

Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.

llvm-svn: 352800
2019-01-31 21:51:58 +00:00
James Y Knight f47d6b38c7 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352791
2019-01-31 20:35:56 +00:00
Sjoerd Meijer f7cc34cae8 [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS
And instead just generate a libcall. My motivating example on ARM was a simple:
  
  shl i64 %A, %B

for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.

Differential Revision: https://reviews.llvm.org/D57386

llvm-svn: 352736
2019-01-31 08:07:30 +00:00
Matt Arsenault 2a64598ef2 GlobalISel: Fix creating MMOs with align 0
llvm-svn: 352712
2019-01-31 01:38:47 +00:00
Jessica Paquette 84bedac7e9 [GlobalISel][AArch64] Select G_FEXP
This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
allows us to select G_FEXP.

It...

- Updates the legalizer-info tests
- Adds a test for legalizing exp
- Updates the existing fp tests to show that we can now select G_FEXP

https://reviews.llvm.org/D57483

llvm-svn: 352692
2019-01-30 23:46:15 +00:00
Jessica Paquette 10f59405ae [GlobalISel][AArch64] Select G_FABS
This adds instruction selection support for G_FABS in AArch64. It also updates
the existing basic FP tests, adds a selection test for G_FABS.

https://reviews.llvm.org/D57418

llvm-svn: 352684
2019-01-30 22:54:21 +00:00
Jessica Paquette 0154bd1385 [GlobalISel][AArch64] Add instruction selection support for @llvm.log2
This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it encounters
it.

It updates the existing floating point tests to show that we don't fall back on
the intrinsic, and select the correct instructions. It also adds a legalizer
test for G_FLOG2.

https://reviews.llvm.org/D57357

llvm-svn: 352673
2019-01-30 21:16:04 +00:00
Jessica Paquette 22457f8e9b [GlobalISel][AArch64] Add instruction selection support for @llvm.sqrt
This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer
test for G_FSQRT, a selection test for it, and updates existing floating point
tests.

https://reviews.llvm.org/D57361

llvm-svn: 352671
2019-01-30 21:03:52 +00:00
Amara Emerson 102c9ed768 [AArch64][GlobalISel] Unmerge into scalars from a vector should use FPR bank.
This currently shows up as a selection fallback since the dest regs were given
GPR banks but the source was a vector FPR reg.

Differential Revision: https://reviews.llvm.org/D57408

llvm-svn: 352545
2019-01-29 21:19:33 +00:00
Martin Storsjo f5884d255e [COFF, ARM64] Don't put jump table into a separate COFF section for EK_LabelDifference32
Windows ARM64 has PIC relocation model and uses jump table kind
EK_LabelDifference32. This produces jump table entry as
".word LBB123 - LJTI1_2" which represents the distance between the block
and jump table.

A new relocation type (IMAGE_REL_ARM64_REL32) is needed to do the fixup
correctly if they are in different COFF section.

This change saves the jump table to the same COFF section as the
associated code. An ideal fix could be utilizing IMAGE_REL_ARM64_REL32
relocation type.

Patch by Tom Tan!

Differential Revision: https://reviews.llvm.org/D57277

llvm-svn: 352465
2019-01-29 09:36:48 +00:00
Reid Kleckner 96c581d7d0 [AArch64] Include AArch64GenCallingConv.inc once
Summary:
Avoids duplicating generated static helpers for calling convention
analysis.

This also means you can modify AArch64CallingConv.td without recompiling
the AArch64ISelLowering.cpp monolith, so it provides faster incremental
rebuilds.

Saves 12K in llc.exe, but adds a new object file, which is large.

Reviewers: efriedma, t.p.northover

Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56948

llvm-svn: 352430
2019-01-28 21:28:40 +00:00
Jessica Paquette 2d73ecd0a3 [GlobalISel][AArch64] Add legalization for G_FLOG
This adds support for legalizing G_FLOG into a RTLib call.

It adds a legalizer test, and updates the existing floating point tests.

https://reviews.llvm.org/D57347

llvm-svn: 352429
2019-01-28 21:27:23 +00:00
Jessica Paquette c49428a97d [GlobalISel][AArch64] Add instruction selection support for @llvm.log10
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.

https://reviews.llvm.org/D57341

llvm-svn: 352418
2019-01-28 19:53:14 +00:00
Francis Visoiu Mistrih 556ea7d2e0 [AArch64] Add 'apple-latest' CPU alias
The 'apple-latest' alias is supposed to provide a CPU that contains the
latest Apple processor model supported by LLVM.

This is supposed to be used by tools like lldb to provide a target that
supports most of the CPU features.

For now, this is mapped to Cyclone.

Differential Revision: https://reviews.llvm.org/D56384

llvm-svn: 352412
2019-01-28 19:27:33 +00:00
Jessica Paquette 7db82d7257 [GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.

https://reviews.llvm.org/D57197
3/3

llvm-svn: 352402
2019-01-28 18:34:18 +00:00
Amara Emerson fd31bf95c1 [AArch64][GlobalISel] Teach RBS about G_FNEG default mapping.
llvm-svn: 352340
2019-01-28 03:21:14 +00:00
Amara Emerson 0bfa2faccc [AArch64][GlobalISel] Add some missing vector support for FP arithmetic ops.
Moved the fneg lowering legalization test from AArch64 to X86, as we want to
specify that it's already legal.

llvm-svn: 352338
2019-01-28 02:28:22 +00:00
Amara Emerson 92ffb305cc [AArch64][GlobalISel] Add some vector support for fp <-> int conversions.
Some unrelated, but benign, test changes as well due to the test update script.

llvm-svn: 352337
2019-01-28 02:27:59 +00:00
Jessica Paquette 1f9bc2854f [GlobalISel][AArch64][NFC] Fix incorrect comment in selectUnmergeValues
s/scalar/vector/

llvm-svn: 352243
2019-01-25 21:28:27 +00:00
Simon Pilgrim dea6174b0b Fix gcc -Wparentheses warning. NFCI.
llvm-svn: 352193
2019-01-25 11:38:40 +00:00
Matt Arsenault 990f507704 GlobalISel: Add convenience mutatations to scalarize
llvm-svn: 352143
2019-01-25 00:51:00 +00:00
Benjamin Kramer 653020d3cc [GlobalISel][AArch64] Avoid unused variable warning for variable only used in assert
llvm-svn: 352133
2019-01-24 23:45:07 +00:00
Benjamin Kramer 1411ecf08b [GlobalISel][AArch64] Avoid unused function warnings in Release builds
llvm-svn: 352129
2019-01-24 23:39:47 +00:00
Jessica Paquette 76c40f827d Suppress unused capture warning in CheckCopy
Werror bots didn't like the lambda + assert thing in my previous commit.

Capture everything to suppress the error.

Example failure here:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/29393

llvm-svn: 352124
2019-01-24 22:51:31 +00:00
Jessica Paquette 245047dfe8 [GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.

To do this, this patch...

- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
  G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors

It also adds

- A legalizer test for the 16-bit vector ceil which verifies that we create a
  G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
  full fp16 is supported
- A test for selecting G_UNMERGE_VALUES

And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.

https://reviews.llvm.org/D56682

llvm-svn: 352113
2019-01-24 22:00:41 +00:00
Benjamin Kramer 4ebed81fc4 [AArch64] Fix out of bounds strlen
CFIInst is not zero-terminated. This is one of more annoying functional
differences between StringRef and ArrayRef.

Found by asan.

llvm-svn: 351955
2019-01-23 14:51:21 +00:00
Kristof Beyls 3ff5dfd735 [SLH] AArch64: correctly pick temporary register to mask SP
As part of speculation hardening, the stack pointer gets masked with the
taint register (X16) before a function call or before a function return.
Since there are no instructions that can directly mask writing to the
stack pointer, the stack pointer must first be transferred to another
register, where it can be masked, before that value is transferred back
to the stack pointer.
Before, that temporary register was always picked to be x17, since the
ABI allows clobbering x17 on any function call, resulting in the
following instruction pattern being inserted before function calls and
returns/tail calls:

mov x17, sp
and x17, x17, x16
mov sp, x17
However, x17 can be live in those locations, for example when the call
is an indirect call, using x17 as the target address (blr x17).

To fix this, this patch looks for an available register just before the
call or terminator instruction and uses that.

In the rare case when no register turns out to be available (this
situation is only encountered twice across the whole test-suite), just
insert a full speculation barrier at the start of the basic block where
this occurs.

Differential Revision: https://reviews.llvm.org/D56717

llvm-svn: 351930
2019-01-23 08:18:39 +00:00
Peter Collingbourne 73078ecd38 hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00
Matt Arsenault 30989e492b GlobalISel: Allow shift amount to be a different type
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.

X86 uses i8, but seemed to be hacking around this before.

llvm-svn: 351882
2019-01-22 21:42:11 +00:00
Matt Arsenault 39508331ef Reapply "IR: Add fp operations to atomicrmw"
This reapplies commits r351778 and r351782 with
RISCV test fixes.

llvm-svn: 351850
2019-01-22 18:18:02 +00:00
Chandler Carruth 285fe716c5 Revert r351778: IR: Add fp operations to atomicrmw
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving there no clear test pattern to use. Generally it seems bad for
hte IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!) and nothing I did simply would
fix it so I'm reverting back to green.

This also required reverting the RISCV build fix in r351782.

llvm-svn: 351796
2019-01-22 10:29:58 +00:00
Matt Arsenault bfdba5e4fc IR: Add fp operations to atomicrmw
Add just fadd/fsub for now.

llvm-svn: 351778
2019-01-22 03:32:36 +00:00
Eli Friedman 23e60c7893 [AArch64] Add patterns for zext/sext of shift amount.
Not sure this is the best fix, but it saves an instruction for certain
constructs involving variable shifts.

Differential Revision: https://reviews.llvm.org/D55572

llvm-svn: 351768
2019-01-22 00:21:35 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Sanjin Sijaric 4d1450298c Fix the buildbot failure introduced by r351404
EXPENSIVE_CHECKS buildbots are failing due to r351404.

Add x1 as live in to the funclet basic block for SEH funclets, as well as
-verify-machineinstrs to the test case that triggered the failure.

llvm-svn: 351472
2019-01-17 20:24:14 +00:00
Matt Arsenault 0cb08e448a Allow FP types for atomicrmw xchg
llvm-svn: 351427
2019-01-17 10:49:01 +00:00
Sanjin Sijaric 685565ae9a [SEH] [ARM64] Retrieve the frame pointer from SEH funclets
The Windows ARM64 runtime passes the establisher frame to funclets as the first
argument.

llvm-svn: 351404
2019-01-17 00:24:38 +00:00
Mandeep Singh Grang 33c49c0c82 [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.

We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.

Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

Reviewed By: rnk, efriedma

Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53540

llvm-svn: 351370
2019-01-16 19:52:59 +00:00
Aditya Nandakumar 500e3ead9f [GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803

This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.

llvm-svn: 351283
2019-01-16 00:40:37 +00:00
Evandro Menezes f793fe1402 [AArch64] Adjust the feature set for Exynos
Enable the fusion of arithmetic and logic instructions for Exynos M4.

llvm-svn: 351149
2019-01-15 01:53:49 +00:00
Eli Friedman 33aecc8182 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected.  This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.

While I'm here, also add some error checking so we don't generate
illegal ABS nodes.

Differential Revision: https://reviews.llvm.org/D56616

llvm-svn: 351141
2019-01-15 00:15:24 +00:00
Evandro Menezes bf59cb02c3 [AArch64] Add new target feature to fuse arithmetic and logic operations
This feature enables the fusion of some arithmetic and logic instructions
together.

Differential revision: https://reviews.llvm.org/D56572

llvm-svn: 351139
2019-01-14 23:54:36 +00:00
Evandro Menezes 7d7e3256cd [AArch64] Improve Exynos predicates
Expand the predicate using shifted arithmetic and logic instructions to also
consider the respective not shifted instructions.

llvm-svn: 350976
2019-01-11 22:39:47 +00:00
Evandro Menezes 0c14c87d00 [AArch64] Add pipeline model for Exynos M4
Add the scheduling and cost model for Exynos M4.

llvm-svn: 350960
2019-01-11 19:36:25 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Bryan Chan 7ce5775e62 [AArch64] Fix operation actions for FP16 vector intrinsics
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.

Reviewers: t.p.northover, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56296

llvm-svn: 350825
2019-01-10 15:02:37 +00:00
Mandeep Singh Grang 859cb2e35d [AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.

Reviewers: rnk, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56037

llvm-svn: 350798
2019-01-10 04:59:44 +00:00
Kristof Beyls c650ff77eb Initial AArch64 SLH implementation.
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.

Differential Revision: https://reviews.llvm.org/D55929

llvm-svn: 350729
2019-01-09 15:13:34 +00:00
Diogo N. Sampaio 1eb31c8e94 [AArch64] Move feature predctrl to predres
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang

Differential Revision: https://reviews.llvm.org/D56484

llvm-svn: 350702
2019-01-09 11:24:15 +00:00
Matt Arsenault 3dddb163dd GlobalISel: Implement fewerElements for implicit_def
llvm-svn: 350697
2019-01-09 07:51:52 +00:00
Evandro Menezes 39c97bf6cd [AArch64] Adjust the cost model for Exynos
Improve the modeling of ALU instructions.

llvm-svn: 350663
2019-01-08 22:29:58 +00:00
Tim Northover 964eea7ad2 AArch64: avoid splitting vector truncating stores.
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.

Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.

llvm-svn: 350620
2019-01-08 13:30:27 +00:00
Mandeep Singh Grang f286bee9fe [MC] [AArch64] Support resolving signed fixups for :abs_g0_s: etc.
Summary: This patch is a follow-up to D55896.

Reviewers: efriedma, mstorsjo

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56029

llvm-svn: 350606
2019-01-08 04:48:00 +00:00
Evandro Menezes 9f53bea536 [AArch64] Adjust the cost model for Exynos M3
Improve the modeling of ASIMD loads and stores.

llvm-svn: 350434
2019-01-04 21:02:25 +00:00
Evandro Menezes 0f67746c92 [AArch64] Add new scheduling predicates
Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode.

llvm-svn: 350332
2019-01-03 17:28:09 +00:00
Martin Storsjo 74d93f9b24 [AArch64] Accept "sve" as arch feature in assembler
Differential Revision: https://reviews.llvm.org/D56128

llvm-svn: 350174
2018-12-31 10:22:04 +00:00
Martin Storsjo 2018777836 [AArch64] Implement the .arch_extension directive
Differential Revision: https://reviews.llvm.org/D56131

llvm-svn: 350169
2018-12-30 21:06:32 +00:00
Diogo N. Sampaio 9123f82cc4 [AArch64] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also moves to FeatureSB the old FeatureSpecRestrict.

Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman	

Differential Revision: https://reviews.llvm.org/D55921

llvm-svn: 350126
2018-12-28 17:14:58 +00:00
Jessica Paquette 453ab1db5b [GlobalISel][AArch64] Add support for widening G_FCEIL
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.

This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.

llvm-svn: 349927
2018-12-21 17:05:26 +00:00
Evandro Menezes 96c11eceb2 [AArch64] Refactor Exynos predicate (NFC)
Change order of conditions in predicate.

llvm-svn: 349918
2018-12-21 15:51:34 +00:00
Simon Pilgrim 148957f336 [AArch64] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349908
2018-12-21 15:05:10 +00:00
Luke Cheeseman 41a9e53500 [Dwarf/AArch64] Return address signing B key dwarf support
- When signing return addresses with -msign-return-address=<scope>{+<key>},
  either the A key instructions or the B key instructions can be used. To
  correctly authenticate the return address, the unwinder/debugger must know
  which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
  unwind the stack. To accomplish this, the unwinder/debugger must be able to
  first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
  inclusion of a 'B' character. Functions that are signed using the B key
  variant of the instructions should have and FDE whose associated CIE has a 'B'
  in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
  high level language into assembly and then, as a second step, into an object
  file. To achieve this, I have introduced a new assembly directive
  '.cfi_b_key_frame ', that tells the assembler the current frame uses return
  address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
  augmentation string.

Differential Revision: https://reviews.llvm.org/D51798

llvm-svn: 349895
2018-12-21 10:45:08 +00:00
Jessica Paquette a6b9c68a85 [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

Add it to the switch, and add a test verifying that this happens.

llvm-svn: 349822
2018-12-20 21:14:15 +00:00
Eli Friedman 48397102d0 [MC] [AArch64] Correctly resolve ":abs_g1:3" etc.
We have to treat constructs like this as if they were "symbolic", to use
the correct codepath to resolve them.  This mostly only affects movz
etc. because the other uses of classifySymbolRef conservatively treat
everything that isn't a constant as if it were a symbol.

Differential Revision: https://reviews.llvm.org/D55906

llvm-svn: 349800
2018-12-20 19:46:14 +00:00
Eli Friedman 4648209e16 [MC] [AArch64] Support resolving fixups for abs_g0 etc.
This requires a bit more code than other fixups, to distingush between
abs_g0/abs_g1/etc.  Actually, I think some of the other fixups are
missing some checks, but I won't try to address that here.

I haven't seen any real-world code that uses a construct like this, but
it clearly should work, and we're considering using it in the
implementation of localescape/localrecover on Windows (see
https://reviews.llvm.org/D53540). I've verified that binutils produces
the same code as llvm-mc for the testcase.

This currently doesn't include support for the *_s variants (that
requires a bit more work to set the opcode).

Differential Revision: https://reviews.llvm.org/D55896

llvm-svn: 349799
2018-12-20 19:38:07 +00:00
Amara Emerson 321bfb210a Fix build errors introduced by r349712 on aarch64 bots.
llvm-svn: 349723
2018-12-20 03:27:42 +00:00
Amara Emerson 8cb186ce17 [AArch64][GlobalISel] Implement selection og G_MERGE of two s32s into s64.
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.

Until then, add selection support anyway which is a significant proportion of
our current fallbacks on CTMark.

rdar://46491420

llvm-svn: 349712
2018-12-20 01:11:04 +00:00
Evandro Menezes 374ccf6768 [AArch64] Improve Exynos predicates
Expand the predicate `ExynosResetPred` to include all forms of immediate
moves.

llvm-svn: 349686
2018-12-19 22:24:36 +00:00
Evandro Menezes ff827d737a [AArch64] Use canonical copy idiom
Use only the canonical form of the alias for register transfers in the
`IsCopyIdiomPred` predicate.

llvm-svn: 349685
2018-12-19 22:24:31 +00:00
Jessica Paquette 3560e93dc1 [GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.

It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.

llvm-svn: 349664
2018-12-19 19:01:36 +00:00
Evandro Menezes 5d409b2278 [AArch64] Improve the Exynos M3 pipeline model
llvm-svn: 349652
2018-12-19 17:37:51 +00:00
Evandro Menezes f03c45d582 [AArch64] Simplify the Exynos M3 pipeline model
llvm-svn: 349569
2018-12-18 23:19:57 +00:00
Evandro Menezes 4e39fa4474 [AArch64] Fix instructions order (NFC)
llvm-svn: 349568
2018-12-18 23:19:55 +00:00
Luke Cheeseman f57d7d8237 [AArch64] - Return address signing dwarf support
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
  do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
  string printing later on

Differential Revision: https://reviews.llvm.org/D55774

llvm-svn: 349472
2018-12-18 10:37:42 +00:00
Kristof Beyls e66bc1f756 Introduce control flow speculation tracking pass for AArch64
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
  in registers.
- using intrinsics to mask of data in registers as indicated by the
  programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
  relative abundance of registers, one register is reserved (X16) to be
  the taint register. X16 is expected to not clash with other register
  reservation mechanisms with very high probability because:
  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  . The only way to request X16 to be used as a programmer is through
    inline assembly. In the rare case a function explicitly demands to
    use X16/W16, this pass falls back to hardening against speculation
    by inserting a DSB SYS/ISB barrier pair which will prevent control
    flow speculation.
- It is easy to insert mask operations at this late stage as we have
  mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
  and contains all-zeros when miss-speculation is detected. Therefore, when
  masking, an AND instruction (which only changes the register to be masked,
  no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
  select instruction (CSEL) to evaluate the flags that were also used to
  make conditional branch direction decisions. Speculation of the CSEL
  instruction can be limited with a CSDB instruction - so the combination of
  CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
  aren't speculated. When conditional branch direction gets miss-speculated,
  the semantics of the inserted CSEL instruction is such that the taint
  register will contain all zero bits.
  One key requirement for this to work is that the conditional branch is
  followed by an execution of the CSEL instruction, where the CSEL
  instruction needs to use the same flags status as the conditional branch.
  This means that the conditional branches must not be implemented as one
  of the AArch64 conditional branches that do not use the flags as input
  (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
  selectors to not produce these instructions when speculation hardening
  is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
  the taint register X16 to be encoded in the SP register as value 0.

Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
  x86-slh-lfence option. This may be more useful for the intrinsics-based
  approach than for the SLH approach to masking.
  Note that this pass already inserts the full speculation barriers if the
  function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
  could be done for some indirect branches, such as switch jump tables.

Differential Revision: https://reviews.llvm.org/D54896

llvm-svn: 349456
2018-12-18 08:50:02 +00:00
Martin Storsjo 8f0cb9c3a8 [AArch64] [MinGW] Allow enabling SEH exceptions
The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.

Differential Revision: https://reviews.llvm.org/D55748

llvm-svn: 349451
2018-12-18 08:32:37 +00:00
Tim Northover 256a16d031 FastIsel: take care to update iterators when removing instructions.
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointint at dangling
memory.

llvm-svn: 349365
2018-12-17 17:25:53 +00:00
Alexandros Lamprineas 490ae11717 [AArch64] Re-run load/store optimizer after aggressive tail duplication
The Load/Store Optimizer runs before Machine Block Placement. At O3 the
Tail Duplication Threshold is set to 4 instructions and this can create
new opportunities for the Load/Store Optimizer. It seems worthwhile to
run it once again.

llvm-svn: 349338
2018-12-17 10:45:43 +00:00
Evandro Menezes ea9d90083f [AArch64] Simplify the scheduling predicates (NFC)
The instruction encodings make it unnecessary to distinguish extended W-form
from X-form instructions.

llvm-svn: 349185
2018-12-14 20:04:58 +00:00
Evandro Menezes 6fe51ac973 [AArch64] Fix Exynos predicates (NFC)
Fix the logic in the definition of the `ExynosShiftExPred` as a more
specific version of `ExynosShiftPred`.  But, since `ExynosShiftExPred` is
not used yet, this change has NFC.

llvm-svn: 349091
2018-12-13 23:19:46 +00:00
Arnaud A. de Grandmaison dfe861087d [AArch64] Catch some more CMN opportunities.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33486

llvm-svn: 349022
2018-12-13 10:31:32 +00:00
Mandeep Singh Grang 802dc40f41 [COFF, ARM64] Emit COFF function header
Summary:
Emit COFF header when printing out the function. This is important as the
header contains two important pieces of information: the storage class for the
symbol and the symbol type information. This bit of information is required for
the linker to correctly identify the type of symbol that it is dealing with.

This patch mimics X86 and ARM COFF behavior for function header emission.

Reviewers: rnk, mstorsjo, compnerd, TomTan, ssijaric

Reviewed By: mstorsjo

Subscribers: dmajor, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55535

llvm-svn: 348875
2018-12-11 18:36:14 +00:00
Aditya Nandakumar cef44a2342 [GISel]: Refactor MachineIRBuilder to allow passing additional parameters to build Instrs
https://reviews.llvm.org/D55294

Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as

B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to

buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like

B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual and other
builders now only have to override buildInstr (for say constant
folding/cseing) is straightforward.

Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.

llvm-svn: 348815
2018-12-11 00:48:50 +00:00
Amara Emerson 5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Evandro Menezes 53f0d41dc4 [AArch64] Refactor the Exynos scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, for the Exynos processors.

Differential revision: https://reviews.llvm.org/D55345

llvm-svn: 348774
2018-12-10 17:17:26 +00:00
Evandro Menezes 1ec1a0d342 [AArch64] Refactor the scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  Augment the
number of helper predicates used by processor specific predicates.

Differential revision: https://reviews.llvm.org/D55375

llvm-svn: 348768
2018-12-10 16:24:30 +00:00
David Green ca29c271d2 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Evandro Menezes 799b76eae2 [AArch64] Fix Exynos predicate
Fix predicate for arithmetic instructions with shift and/or extend.

llvm-svn: 348510
2018-12-06 18:25:37 +00:00
Diogo N. Sampaio 9c9067316b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html


Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

--

It was reverted in rL348249 due a	build bot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

The problem seems to be that FileCheck behaves
different in windows and linux. This new patch
splits the test file in multiple,
and does more exact pattern matching attempting
to circumvent the issue.

llvm-svn: 348493
2018-12-06 15:39:17 +00:00
Matthias Braun d041212c07 AArch64: Fix invalid CCMP emission
The code emitting AND-subtrees used to check whether any of the operands
was an OR in order to figure out if the result needs to be negated.
However the OR could be hidden in further subtrees and not immediately
visible.

Change the code so that canEmitConjunction() determines whether the
result of the generated subtree needs to be negated. Cleanup emission
logic to use this. I also changed the code a bit to make all negation
decisions early before we actually emit the subtrees.

This fixes http://llvm.org/PR39550

Differential Revision: https://reviews.llvm.org/D54137

llvm-svn: 348444
2018-12-06 01:40:23 +00:00
Aditya Nandakumar f75d4f329c [GISel]: Provide standard interface to observe changes in GISel passes
https://reviews.llvm.org/D54980

This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.

Reviewed by: vkeles.

llvm-svn: 348406
2018-12-05 20:14:52 +00:00
Evandro Menezes 4b5707550c [AArch64] Reword description of feature (NFC)
Reword the description of the feature that enables custom handling of cheap
instructions.

llvm-svn: 348398
2018-12-05 18:42:57 +00:00
Saleem Abdulrasool efd2cb8a0d AArch64: support funclets in fastcall and swift_call
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64.  This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function.  This was detected
while trying to self-host clang on Windows ARM64.

llvm-svn: 348337
2018-12-05 07:09:20 +00:00
Amara Emerson 8547f4fb7f [AArch64][GlobalISel] Re-enable selection of volatile loads.
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.

Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.

The old test case should still work as expected.

llvm-svn: 348320
2018-12-05 00:03:09 +00:00
Saleem Abdulrasool a9248fdfab AArch64: clean up some whitespace in Windows CC (NFC)
Drive by clean up for Windows ARM64 variadic CC (NFC).

llvm-svn: 348310
2018-12-04 22:19:29 +00:00
Simon Pilgrim 58d44235e5 Fix -Wparentheses warning. NFCI.
llvm-svn: 348254
2018-12-04 12:24:10 +00:00
Simon Pilgrim 1a2e0200ac Revert rL348121 from llvm/trunk: [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

........

This has been causing buildbots failures for the past 24 hours: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

llvm-svn: 348249
2018-12-04 10:55:48 +00:00
Sanjin Sijaric dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Jessica Paquette 2f5833ecd9 [MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.

llvm-svn: 348219
2018-12-04 00:31:47 +00:00
Jessica Paquette 2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Pablo Barrio a17f855698 [AArch64] Add command-line option for SSBS
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.

Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html

Reviewers: olista01, samparker, aemerson

Reviewed By: samparker

Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D54629

llvm-svn: 348137
2018-12-03 14:00:47 +00:00
Diogo N. Sampaio 3c7d062b6b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

llvm-svn: 348121
2018-12-03 11:08:13 +00:00
Jessica Paquette 9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Jessica Paquette 1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne 35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Francis Visoiu Mistrih 0b8dd4488e [MachineScheduler] Order FI-based memops based on stack direction
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.

llvm-svn: 347906
2018-11-29 20:03:19 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Petr Pavlu e6406d568c [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.

The EnableGlobalISel flag had previouly only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347861
2018-11-29 12:56:32 +00:00
Francis Visoiu Mistrih 879087ce5b [MachineScheduler] Add support for clustering mem ops with FI base operands
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:

```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
  unsigned long long args[8];
  args[0] = key;
  args[1] = index;
  use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
  dst[0] = a;
  dst[1] = b;
}
```

The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.

This adds support for this.

Differential Revision: https://reviews.llvm.org/D54847

llvm-svn: 347747
2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih d7eebd6d83 [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

llvm-svn: 347746
2018-11-28 12:00:20 +00:00
Evandro Menezes 9ef79c884a [TableGen] Refactor macro names (NFC)
Make the names for the macros for `TargetInstrInfo` uniform.

llvm-svn: 347706
2018-11-27 20:58:27 +00:00
David Blaikie 1fecbec5fa AArch64ISelLowering: Remove a return-of-assignment to allow NRVO
Patch by Arthur O'Dwyer!

llvm-svn: 347609
2018-11-26 22:57:18 +00:00
Evandro Menezes 6a38a5effe [AArch64] Refactor the scheduling predicates (3/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasExtendedReg()`.

Differential revision: https://reviews.llvm.org/D54822

llvm-svn: 347599
2018-11-26 21:47:46 +00:00