Commit Graph

42886 Commits

Author SHA1 Message Date
Davide Italiano 9b8e3d308f [Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86)
Patch by Fedor Sergeev.

Differential Revision:  https://reviews.llvm.org/D33868

llvm-svn: 305948
2017-06-21 20:36:32 +00:00
Krzysztof Parzyszek fd048cc0ec [Hexagon] Handle more types of immediate operands in expand-condsets
llvm-svn: 305943
2017-06-21 19:21:30 +00:00
Lei Huang 84dbbfdeb9 [PowerPC] define target hook isReallyTriviallyReMaterializable()
Define target hook isReallyTriviallyReMaterializable() to explicitly specify
PowerPC instructions that are trivially rematerializable.  This will allow
the MachineLICM pass to accurately identify PPC instructions that should always
be hoisted.

Differential Revision: https://reviews.llvm.org/D34255

llvm-svn: 305932
2017-06-21 17:17:56 +00:00
Dmitry Preobrazhensky 851a3d9f05 [AMDGPU][MC][GFX9] Corrected VOP3P relevant code to fix disassembler failures
See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509

Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin

Differential Revision: https://reviews.llvm.org/D34360

llvm-svn: 305923
2017-06-21 16:00:54 +00:00
Dmitry Preobrazhensky dc4ac823ec [AMDGPU][MC] Corrected V_*QSAD* instructions to check that dest register is different than any of the src
See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D34003

llvm-svn: 305915
2017-06-21 14:41:34 +00:00
Sanjay Patel cec6a500a8 [x86] fix formatting; NFC
llvm-svn: 305914
2017-06-21 14:27:11 +00:00
Christof Douma c1c28051d2 [AARCH64][LSE] Preliminary support for ARMv8.1 LSE Atomics.
Implemented support to AArch64 codegen for ARMv8.1 Large System
Extensions atomic instructions. Where supported, these instructions can
provide atomic operations with higher performance.

Currently supported operations include: fetch_add, fetch_or, fetch_xor,
fetch_smin, fetch_min/max (signed and unsigned), swap, and
compare_exchange.

This implementation implies sequential-consistency ordering, more
relaxed ordering is under development.

Subtarget->hasLSE is currently supported for Cavium ThunderX2T99.

Patch by Ananth Jasty.

Differential Revision: https://reviews.llvm.org/D33586

Change-Id: I82f6d3d64255622791ceb0715b7ab9f4dc4d4b2c
llvm-svn: 305893
2017-06-21 10:58:31 +00:00
Florian Hahn 8552e591a1 [AArch64] Add early exit to promoteLoadFromStore.
There should be at most a single kill flag for the
promoted operand between the store/load pair.
Discussed in https://reviews.llvm.org/D34402.

llvm-svn: 305889
2017-06-21 09:51:52 +00:00
Strahinja Petrovic d280ea4f76 [MIPS] Fix for selecting of DINS/INS instruction
This patch adds one more condition in selection DINS/INS
instruction, which fixes MultiSource/Applications/JM/ldecod/
for mips32r2 (and mips64r2 n32 abi).

Differential Revision: https://reviews.llvm.org/D33725

llvm-svn: 305888
2017-06-21 09:25:51 +00:00
Sam Kolton 549c89d2c9 [AMDGPU] SDWA: merge VI and GFX9 pseudo instructions
Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9.

Reviewers: dp, arsenm, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov

Differential Revision: https://reviews.llvm.org/D34026

llvm-svn: 305886
2017-06-21 08:53:38 +00:00
Florian Hahn 80e485179e [AArch64] Preserve register flags when promoting a load from store.
Summary:
This patch updates promoteLoadFromStore to use the store MachineOperand as the
source operand of the of the new instruction instead of creating a new
register MachineOperand. This way, the existing register flags are
preserved. 

This fixes PR33468 (https://bugs.llvm.org/show_bug.cgi?id=33468). 


Reviewers: MatzeB, t.p.northover, junbuml

Reviewed By: MatzeB

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34402

llvm-svn: 305885
2017-06-21 08:47:23 +00:00
Rafael Espindola 3ac4c09daf clang-format a region.
It will make a followup patch easier to read.

llvm-svn: 305865
2017-06-20 22:53:29 +00:00
Matt Arsenault 67cd347e93 AMDGPU: Allow vectorization of packed types
llvm-svn: 305844
2017-06-20 20:38:06 +00:00
Stanislav Mekhanoshin a9d846c6ef [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.

Differential Revison: https://reviews.llvm.org/D34291

llvm-svn: 305840
2017-06-20 20:33:44 +00:00
Matt Arsenault 9698f1c862 AMDGPU: Start adding global_* instructions
llvm-svn: 305838
2017-06-20 19:54:14 +00:00
Matt Arsenault ff3f912e74 AMDGPU: Do operand folding in program order
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.

llvm-svn: 305821
2017-06-20 18:56:32 +00:00
Matt Arsenault 76858f5a1d AMDGPU: Preserve undef when folding register operands
If the source was a copy of an undef register, this would
produce a read of an undefined register which is a verifier
error.

llvm-svn: 305816
2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin 465a1ff193 [AMDGPU] Eliminate SGPR to VGPR copy when possible
SGPRs are generally cheaper, so try to use them over VGPRs.

Differential Revision: https://reviews.llvm.org/D34130

llvm-svn: 305815
2017-06-20 18:32:42 +00:00
Matt Arsenault 7f67b35901 AMDGPU: Fix crash with undef vreg input operand
llvm-svn: 305814
2017-06-20 18:28:02 +00:00
Hiroshi Inoue e7a35539c5 [PowerPC] fix trivial typos in comment, NFC
llvm-svn: 305813
2017-06-20 17:53:33 +00:00
Sanjay Patel 0656629b87 [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)

llvm-svn: 305802
2017-06-20 15:58:30 +00:00
Simon Pilgrim 4822b5b649 [X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements
Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this

llvm-svn: 305801
2017-06-20 15:19:02 +00:00
Simon Pilgrim b233c0a5d2 [X86][SSE] Dropped old INSERT_VECTOR_ELT lowering TODO
Target shuffle combining now supports the matching of INSERT_VECTOR_ELT/PINSRW/PINSRB for merging multiple insertions into shuffles/bitmasks.

llvm-svn: 305788
2017-06-20 10:33:34 +00:00
Igor Breger 0dee0f458e [GlobalISel][X86] fix compilation error ( -Werror=unused-function )
llvm-svn: 305786
2017-06-20 09:40:57 +00:00
Igor Breger 1dcd5e8dc8 [GlobalISel][X86] Get correct RegClass for given RegBank.
Summary:
In some cases RegClass depends on target feature. Hight (16-31) vector registers exist only if AVX512f available.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: t.p.northover, guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33952

Conflicts:
	test/CodeGen/X86/GlobalISel/select-memop-scalar.mir

llvm-svn: 305784
2017-06-20 09:15:10 +00:00
Alexandros Lamprineas 2b2b420563 [ARM] Support constant pools in data when generating execute-only code.
Resubmission of r305387, which was reverted at r305390. The Address
Sanitizer caught a stack-use-after-scope of a Twine variable. This
is now fixed by passing the Twine directly as a function parameter.

The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.

Differential Revision: https://reviews.llvm.org/D33773

llvm-svn: 305776
2017-06-20 07:20:52 +00:00
Matt Arsenault c595185f8f AMDGPU: Fix scratch wave offset relative FI expansion
The offset may not be an inline immediate, so this needs
to be materialized into a register. The post-RA run of
SIShrinkInstructions is able to fold it later if it can.

llvm-svn: 305761
2017-06-19 23:47:21 +00:00
Stanislav Mekhanoshin 50c2f251f5 [AMDGPU] Add infer address spaces pass before SROA
It adds it for the target after inlining but before SROA where
we can get most out of it.

Differential Revision: https://reviews.llvm.org/D34366

llvm-svn: 305759
2017-06-19 23:17:36 +00:00
Eugene Zelenko 8361b0a9bb [Target] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 305757
2017-06-19 22:43:19 +00:00
Geoff Berry 5e46600e3a [AArch64][Falkor] Fix MOVZ sched predicate to not assert on non-imm operands (e.g. blockaddress).
llvm-svn: 305752
2017-06-19 21:57:44 +00:00
Geoff Berry e9972cabbd [AArch64][Kryo] Add missing write latency for LDAXP, LDXP second destination.
Fixes PR33491 and PR33512.

llvm-svn: 305751
2017-06-19 21:57:42 +00:00
Geoff Berry 3cc4b9f780 [AArch64][Falkor] Refine load/store increment latencies.
Also fix LDXP & LDAXP write latency to avoid similar assert as PR33491 and PR33512.

llvm-svn: 305750
2017-06-19 21:56:21 +00:00
Matt Arsenault e0e68a757e AMDGPU: Cleanup CreateLiveInRegister
llvm-svn: 305748
2017-06-19 21:52:45 +00:00
Nico Weber 4c5c02a448 Revert r305382, it caused PR33513.
llvm-svn: 305735
2017-06-19 19:48:59 +00:00
Hans Wennborg ca69fc1cb7 Revert r304824 "Fix PR23384 (part 3 of 3)"
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.

> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>

llvm-svn: 305720
2017-06-19 17:57:15 +00:00
Florian Hahn fd44ca6c76 [AArch64] Fix order of checks in shouldScheduleAdjacent.
We need to check the opcode of FirstMI before accessing the operands. This
caused a buildbot failure during bootstrapping on AArch64.

llvm-svn: 305694
2017-06-19 13:45:41 +00:00
Tom Stellard ff63ee0db5 AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34129

llvm-svn: 305692
2017-06-19 13:15:45 +00:00
Igor Breger bd2dedaa38 [GlobalISel][X86] Fold FI/G_GEP into LDR/STR instruction addressing mode.
Summary: Implement some of the simplest addressing modes.It should help to test ABI.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33888

llvm-svn: 305691
2017-06-19 13:12:57 +00:00
Florian Hahn 5f746c8e27 Recommit rL305677: [CodeGen] Add generic MacroFusion pass
Use llvm::make_unique to avoid ambiguity with MSVC.

This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305690
2017-06-19 12:53:31 +00:00
Diana Picus 78aaf7db04 [ARM] GlobalISel: Support G_ICMP for s8 and s16
Widen to s32 (like all other binary ops).

llvm-svn: 305683
2017-06-19 11:47:28 +00:00
Florian Hahn e16d3106f3 Revert r305677 [CodeGen] Add generic MacroFusion pass.
This causes Windows buildbot failures do an ambiguous call.

llvm-svn: 305681
2017-06-19 11:26:15 +00:00
Florian Hahn ee1b096f8a [CodeGen] Add generic MacroFusion pass.
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB

Reviewed By: MatzeB

Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305677
2017-06-19 10:51:38 +00:00
Diana Picus 621894ac76 [ARM] GlobalISel: Support G_ICMP for i32 and pointers
Add support throughout the pipeline:
- mark as legal for s32 and pointers
- map to GPRs
- lower to a sequence of instructions, which moves 0 or 1 into the
  result register based on the flags set by a CMPrr

We have copied from FastISel a helper function which maps CmpInst
predicates into ARMCC codes. Ideally, we should be able to move it
somewhere that both FastISel and GlobalISel can use.

llvm-svn: 305672
2017-06-19 09:40:51 +00:00
Eric Christopher c70d07b7ea Rework logic and comment out the default relocation models for PPC.
llvm-svn: 305630
2017-06-17 02:25:56 +00:00
Eric Christopher 5ec30ef4e4 Turn a large if block into a smaller early return for clarity.
llvm-svn: 305629
2017-06-17 02:25:55 +00:00
Eric Christopher ded727c5a8 Remove the old and unused PPC32 and PPC64TargetMachine classes.
llvm-svn: 305628
2017-06-17 02:25:53 +00:00
Eric Christopher 3b78693b9a Remove unused forward declaration.
llvm-svn: 305627
2017-06-17 02:25:51 +00:00
Eric Christopher a5b50cd645 Tidy up some calls to getRegister for readability.
llvm-svn: 305626
2017-06-17 02:25:49 +00:00
Tim Shen d123c194e0 [PPC] Remove isBarrier from CFENCE8's definition.
Summary:
This is my misunderstanding on isBarrier. It's not for memory barriers,
but for other control flow purposes. lwsync doesn't have it either.

This fixes a simple crash with -verify-machineinstrs like below:

  define void @Foo() {
  entry:
    %tmp = load atomic i64, i64* undef acquire, align 8
    unreachable
  }

I deliberately don't want to check in the test, since there is little
chance to regress on such a mistake. Such a test adds noise to the code
base.

I plan to check in first, since it fixes a crash, and the fix is obvious.

Reviewers: kbarton, echristo

Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D34314

llvm-svn: 305624
2017-06-17 01:25:34 +00:00
Sam Clegg 9d24fb7ff3 [WebAssembly] Use __stack_pointer global when writing wasm binary
This ensures that symbolic relocations are generated for stack
pointer manipulations.

These relocations are of type R_WEBASSEMBLY_GLOBAL_INDEX_LEB.
This change also adds support for reading relocations of this
type in WasmObjectFile.cpp.

Since its a globally imported symbol this does mean that
the get_global/set_global instruction won't be valid until
the objects are linked that global used in no longer an
imported global.

Differential Revision: https://reviews.llvm.org/D34172

llvm-svn: 305616
2017-06-16 23:59:10 +00:00
Yonghong Song a63178f756 bpf: fix a strict-aliasing issue
Davide Italiano reported the following issue if llvm
is compiled with gcc -Wstrict-aliasing -Werror:
.....
lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFISelDAGToDAG.cpp.o
../lib/Target/BPF/BPFISelDAGToDAG.cpp: In member function ‘virtual
void {anonymous}::BPFDAGToDAGISel::PreprocessISelDAG()’:
../lib/Target/BPF/BPFISelDAGToDAG.cpp:264:26: warning: dereferencing
type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
       val = *(uint16_t *)new_val;
.....

The error is caused by my previous commit (revision 305560).

This patch fixed the issue by introducing an union to avoid
type casting.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305608
2017-06-16 23:28:04 +00:00
Yonghong Song ac2e25026f bpf: avoid load from read-only sections
If users tried to have a structure decl/init code like below
   struct test_t t = { .memeber1 = 45 };
It is very likely that compiler will generate a readonly section
to hold up the init values for variable t. Later load of t members,
e.g., t.member1 will result in a read from readonly section.

BPF program cannot handle relocation. This will force users to
write:
  struct test_t t = {};
  t.member1 = 45;
This is just inconvenient and unintuitive.

This patch addresses this issue by implementing BPF PreprocessISelDAG.
For any load from a global constant structure or an global array of
constant struct, it attempts to
translate it into a constant directly. The traversal of the
constant struct and other constant data structures are similar
to where the assembler emits read-only sections.

Four different unit test cases are also added to cover
different scenarios.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305560
2017-06-16 15:41:16 +00:00
Yonghong Song 5c65439617 bpf: set missing types in insn tablegen file
o This is discovered during my study of 32-bit subregister
  support.
o This is no impact on current functionality since we
  only support 64-bit registers.
o Searching the web, looks like the issue has been discovered
  before, so fix it now.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305559
2017-06-16 15:30:55 +00:00
Simon Dardis 5852c4c108 Revert "[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP"
This reverts commit r305455. This commit was reported as breaking one of
the sanitizer buildbots. Reverting until lab.llvm.org comes back online.

llvm-svn: 305557
2017-06-16 14:00:33 +00:00
Krzysztof Parzyszek 3a40b34123 [Hexagon] Don't kill live registers when creating mux out of tfr
The second part of r305300: when placing the mux at the later location,
make sure that it won't use any register that was killed between the
two original instructions. Remove any such kills and transfer them to
the mux.

llvm-svn: 305553
2017-06-16 12:24:03 +00:00
Alfred Huang f9b521fdaf [AMDGPU] Testing commit access only, no real change
llvm-svn: 305523
2017-06-15 23:02:55 +00:00
Alexander Timofeev 0f9c84cd93 DivergencyAnalysis patch for review
llvm-svn: 305494
2017-06-15 19:33:10 +00:00
Lei Huang b4733ca8c5 [MachineLICM] Hoist TOC-based address instructions
Add condition for MachineLICM to safely hoist instructions that utilize
non constant registers that are reserved.

On PPC, global variable access is done through the table of contents (TOC)
which is always in register X2.  The ABI reserves this register in any
functions that have calls or access global variables.

A call through a function pointer involves saving, changing and restoring
this register around the call and thus MachineLICM does not consider it to
be invariant. We can however guarantee the register is preserved across the
call and thus is invariant.

Differential Revision: https://reviews.llvm.org/D33562

llvm-svn: 305490
2017-06-15 18:29:59 +00:00
Hiroshi Inoue 7a08bb1458 [PowerPC] fix potential verification errors on CFENCE8
This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs.

Differential Revision: https://reviews.llvm.org/D34208

llvm-svn: 305479
2017-06-15 16:51:28 +00:00
Simon Dardis 24ca9da2de [mips] Fix documentation of member variable. NFCI.
llvm-svn: 305478
2017-06-15 16:28:28 +00:00
Simon Pilgrim 07cfc80186 Remove trailing whitespace. NFCI.
llvm-svn: 305476
2017-06-15 16:20:27 +00:00
Simon Pilgrim 4d432b2c6b [X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors
We can use this with v16i16/v32i16 as well.

Found during fuzz testing.

llvm-svn: 305472
2017-06-15 14:52:30 +00:00
Nirav Dave 85e92223b4 [AArch64] Add indexed check to splitStores. NFC.
Add explicit check for unhandled cases in preparation for delaying
splitStores to post-legalization.

llvm-svn: 305471
2017-06-15 14:47:44 +00:00
Simon Pilgrim b98cb3808c Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
This is causing windows buildbot failures

llvm-svn: 305470
2017-06-15 14:39:34 +00:00
Ayman Musa 56912cda71 [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 305465
2017-06-15 13:02:37 +00:00
Diana Picus 02e11010b2 [ARM] GlobalISel: Add support for i32 modulo
Add support for modulo for targets that have hardware division and for
those that don't. When hardware division is not available, we have to
choose the correct libcall to use. This is generally straightforward,
except for AEABI.

The AEABI variant is trickier than the other libcalls because it
returns { quotient, remainder }, instead of just one value like the
other libcalls that we've seen so far. Therefore, we need to use custom
lowering for it. However, we don't want to have too much special code,
so we refactor the target-independent code in the legalizer by adding a
helper for replacing an instruction with a libcall. This helper is used
by the legalizer itself when dealing with simple calls, and also by the
custom ARM legalization for the more complicated AEABI divmod calls.

llvm-svn: 305459
2017-06-15 10:53:31 +00:00
Diana Picus 8fd1601d32 [ARM] GlobalISel: Lower only homogeneous struct args
Lowering mixed struct args, params and returns used G_INSERT, which is a
bit more convoluted to support through the entire pipeline. Since they
don't occur that often in practice, it's probably wiser to leave them
out until later.

Meanwhile, we can lower homogeneous structs using G_MERGE_VALUES, which
has good support in the legalizer. These occur e.g. as the return of
__aeabi_idivmod, so it's nice to be able to support them.

llvm-svn: 305458
2017-06-15 09:42:02 +00:00
Florian Hahn 0a26d2c298 [AArch64] Enable FeatureFuseAES for the generic processor model.
Summary:
Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back
gives a double digit speedup on benchmarks using those instructions on
Cortex-A processors. In GCC, this optimization is part of the generic
processor model as well.

This change should not have a major performance impact on processors
that do not optimize AES instruction pairs, although I only had access
to Cortex-A processors for benchmarking.


Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover

Reviewed By: evandro

Subscribers: sbaranga, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D33836

llvm-svn: 305457
2017-06-15 09:31:23 +00:00
Zoran Jovanovic d9299293ad [mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Differential Revision: https://reviews.llvm.org/D33887

llvm-svn: 305455
2017-06-15 09:14:33 +00:00
Sanjay Patel 83cb007940 [x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively()
This is a follow-up to https://reviews.llvm.org/D34174 / https://reviews.llvm.org/rL305398.

We mentioned replacing the multiplies with shifts, but the real win seems to be in
bypassing the extra ops in the common case when the RootRatio and OpRatio are one.

This gives us another 1-2% overall win for the test in PR32037:
https://bugs.llvm.org/show_bug.cgi?id=32037

llvm-svn: 305414
2017-06-14 20:37:11 +00:00
Lei Huang f689f69fea Test commit - NFC.
Modified a comment to confirm commit access functionality.

llvm-svn: 305402
2017-06-14 17:25:55 +00:00
Sanjay Patel ce0b99563a [x86] replace div/rem with shift/mask for better shuffle combining perf
We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that, 
so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask.

This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 ) 
about 9% faster overall.

Differential Revision: https://reviews.llvm.org/D34174

llvm-svn: 305398
2017-06-14 17:00:57 +00:00
Alexandros Lamprineas 1c15ee2631 Revert "[ARM] Support constant pools in data when generating execute-only code."
This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0.

I've upset a buildbot which runs the address sanitizer:
ERROR: AddressSanitizer: stack-use-after-scope
lib/Target/ARM/ARMISelLowering.cpp:2690
That Twine variable is used illegally.

llvm-svn: 305390
2017-06-14 15:00:08 +00:00
Simon Dardis 9790e39f45 [mips] Fix multiprecision arithmetic.
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.

For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.

Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.

Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.

This revolves PR32713 and PR33424.

Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33494

llvm-svn: 305389
2017-06-14 14:46:30 +00:00
Alexandros Lamprineas c582d6e133 [ARM] Support constant pools in data when generating execute-only code.
The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.

Differential Revision: https://reviews.llvm.org/D33773

llvm-svn: 305387
2017-06-14 13:22:41 +00:00
Simon Dardis 941a49b6d6 [mips] Fix machine verifier errors in the long branch pass
This patch fixes two systemic machine verifier errors in the long
branch pass. The first is the incorrect basic block successors
and the second was the incorrect construction of several jump
instructions.

This partially resolves PR27458 and the associated PR32146.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33378

llvm-svn: 305382
2017-06-14 12:16:47 +00:00
Nemanja Ivanovic 7855185bbb Revert r304907 as it is causing some failures that I cannot reproduce.
Reverting this until a test case can be provided to aid the investigation.

llvm-svn: 305372
2017-06-14 07:05:42 +00:00
Davide Italiano 36559b2527 [AMDGPU] Remove now dead defaultOffsetS13(). NFCI.
Fixes the GCC7 build with -Werror.

llvm-svn: 305329
2017-06-13 22:24:24 +00:00
Sam Clegg ae03c1e724 [WebAssembly] Cleanup WebAssemblyWasmObjectWriter
Differential Revision: https://reviews.llvm.org/D34131

llvm-svn: 305316
2017-06-13 18:51:50 +00:00
Geoff Berry 13d5dcb093 [AArch64][Falkor] Fix sched details for FDIV, FSQRT, SDIV, UDIV
llvm-svn: 305310
2017-06-13 17:43:39 +00:00
Kit Barton 0b216305db Test commit - NFC.
Modified a comment to confirm commit access functionality.

llvm-svn: 305309
2017-06-13 17:35:29 +00:00
Krzysztof Parzyszek b3a8d20e27 [Hexagon] Generate store-immediate instructions for stack objects
Store-immediate instructions have a non-extendable offset. Since the
actual offset for a stack object is not known until much later, only
generate these stores when the stack size (at the time of instruction
selection) is small.

llvm-svn: 305305
2017-06-13 17:10:16 +00:00
Krzysztof Parzyszek c83c267b84 [Hexagon] Generate multiply-high instruction in isel
llvm-svn: 305302
2017-06-13 16:21:57 +00:00
Yonghong Song 7e9d2cb553 bpf: clang-format on BPFAsmPrinter.cpp
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305301
2017-06-13 16:17:20 +00:00
Krzysztof Parzyszek de2ac17b7b [Hexagon] Don't kill live registers when creating mux out of tfr
When a mux instruction is created from a pair of complementary conditional
transfers, it can be placed at the location of either the earlier or the
later of the transfers. Since it will use the operands of the original
transfers, putting it in the earlier location may hoist a kill of a source
register that was originally further down. Make sure the kill flag is
removed if the register is still used afterwards.

llvm-svn: 305300
2017-06-13 16:07:36 +00:00
Simon Dardis c38d391f56 [MIPS] BuildCondBr should preserve MO flags
While simplifying branches in the MachineInstr representation, the
routine BuildCondBr must preserve flags on register MachineOperands. In
particular, it must preserve the <undef> flag.

This fixes a bug that is unlikely to occur in any real scenario, but
which bugpoint is likely to introduce.

Patch By Nick Johnson!

Reviewers: ahatanak, sdardis

Differential Revision: https://reviews.llvm.org/D34041

llvm-svn: 305290
2017-06-13 14:11:29 +00:00
Krzysztof Parzyszek 9bd4d91037 [Hexagon] Stop pmpy recognition when shift conversion fails
The conversion of shifts from right shifts to left shifts may fail.
In such case, the pmpy recognition cannot proceed.

llvm-svn: 305289
2017-06-13 13:51:49 +00:00
Oliver Stannard 852fbd2fea [ARM] Add scheduling classes for VFNM[AS]
The VFNM[AS] instructions did not have scheduling information attached, which
was causing assertion failures with the Cortex-A57 scheduling model and
-fp-contract=fast, because the Cortex-A57 sched model claims to be complete.

Differential Revision: https://reviews.llvm.org/D34139

llvm-svn: 305288
2017-06-13 13:04:32 +00:00
Simon Pilgrim 9ff06a0c7e Strip UTF8 BOM that got added in rL305091
Seems my recent move to VS2017 has resulted in a few text editor issues.....

llvm-svn: 305285
2017-06-13 10:17:57 +00:00
Simon Pilgrim 2b3b717768 [X86][SSE] Refactor getTargetConstantBitsFromNode to avoid large APInts (PR32037)
Much of PR32037's compile time regression is due to getTargetConstantBitsFromNode always creating large (>64bit) APInts during the bitcasting from the source data to the destination bitwidth.

This commit avoids this bitcast stage if the data is already the correct bitwidth.

llvm-svn: 305284
2017-06-13 10:13:48 +00:00
NAKAMURA Takumi 3807ab24c6 PPCISelLowering.cpp: Fix warnings in r305214. [-Wdocumentation]
llvm-svn: 305277
2017-06-13 07:34:32 +00:00
Craig Topper 8b8767662c [AVX-512] Mark masked VPCMP instructions as commutable.
llvm-svn: 305276
2017-06-13 07:13:50 +00:00
Craig Topper e1d8103d8f [AVX-512] Mark masked version of vpcmpeq as being commutable.
llvm-svn: 305275
2017-06-13 07:13:47 +00:00
Craig Topper 42d0339257 [X86] Add masked integer compare instructions to load folding tables.
llvm-svn: 305274
2017-06-13 07:13:44 +00:00
Sam Clegg 7736855dee [WebAssembly] Fix symbol type for addresses of external functions
These symbols were previously not being marked as functions
so were appearing as globals instead, and with the incorrect
relocation type.

Without this fix, objects that take address of external
functions include them as global imports rather than function
imports which then fails at link time.

Differential Revision: https://reviews.llvm.org/D34068

llvm-svn: 305263
2017-06-13 01:42:21 +00:00
Tom Stellard ee6e6452df AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D33992

llvm-svn: 305232
2017-06-12 20:54:56 +00:00
Tim Northover 7a61316e89 AArch64: don't try to emit an add (shifted reg) for SP.
The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr
rather than the SP, so we need to use different variants.

Situations where this actually comes up are rare enough (see test-case) that I
think falling back to DAG is fine.

llvm-svn: 305230
2017-06-12 20:49:53 +00:00
Tony Jiang 1a8eec141a [PowerPC] Match vec_revb builtins to P9 instructions.
Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.

Differential Revision: https://reviews.llvm.org/D33690

llvm-svn: 305214
2017-06-12 18:24:36 +00:00
Tony Jiang 30a49d1a3d [Power9] Added support for the modsw, moduw, modsd, modud hardware instructions.
Note that if we need the result of both the divide and the modulo then we
compute the modulo based on the result of the divide and not using the new
hardware instruction.

Commit on behalf of STEFAN PINTILIE.
Differential Revision: https://reviews.llvm.org/D33940

llvm-svn: 305210
2017-06-12 17:58:42 +00:00
Matt Arsenault 05c26472fa AMDGPU: Don't add same implicit use multiple times
For the last component, the same register use
was added as an implicit use and another implicit kill use.

llvm-svn: 305205
2017-06-12 17:19:20 +00:00