Commit Graph

20569 Commits

Author SHA1 Message Date
Yonghong Song ac2e25026f bpf: avoid load from read-only sections
If users tried to have a structure decl/init code like below
   struct test_t t = { .memeber1 = 45 };
It is very likely that compiler will generate a readonly section
to hold up the init values for variable t. Later load of t members,
e.g., t.member1 will result in a read from readonly section.

BPF program cannot handle relocation. This will force users to
write:
  struct test_t t = {};
  t.member1 = 45;
This is just inconvenient and unintuitive.

This patch addresses this issue by implementing BPF PreprocessISelDAG.
For any load from a global constant structure or an global array of
constant struct, it attempts to
translate it into a constant directly. The traversal of the
constant struct and other constant data structures are similar
to where the assembler emits read-only sections.

Four different unit test cases are also added to cover
different scenarios.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305560
2017-06-16 15:41:16 +00:00
Daniel Neilson 3faabbbe85 [Atomics] Rename and change prototype for atomic memcpy intrinsic
Summary:

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.

Reviewers: reames, sanjoy, efriedma

Reviewed By: reames

Subscribers: mzolotukhin, anna, llvm-commits, skatkov

Differential Revision: https://reviews.llvm.org/D33240

llvm-svn: 305558
2017-06-16 14:43:59 +00:00
Simon Dardis 5852c4c108 Revert "[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP"
This reverts commit r305455. This commit was reported as breaking one of
the sanitizer buildbots. Reverting until lab.llvm.org comes back online.

llvm-svn: 305557
2017-06-16 14:00:33 +00:00
Krzysztof Parzyszek 3a40b34123 [Hexagon] Don't kill live registers when creating mux out of tfr
The second part of r305300: when placing the mux at the later location,
make sure that it won't use any register that was killed between the
two original instructions. Remove any such kills and transfer them to
the mux.

llvm-svn: 305553
2017-06-16 12:24:03 +00:00
Matthias Braun a42c537912 RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64
problems reported in the stage2 build last time, which I cannot
reproduce right now.

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

llvm-svn: 305516
2017-06-15 22:14:55 +00:00
Alexander Timofeev 0f9c84cd93 DivergencyAnalysis patch for review
llvm-svn: 305494
2017-06-15 19:33:10 +00:00
Lei Huang b4733ca8c5 [MachineLICM] Hoist TOC-based address instructions
Add condition for MachineLICM to safely hoist instructions that utilize
non constant registers that are reserved.

On PPC, global variable access is done through the table of contents (TOC)
which is always in register X2.  The ABI reserves this register in any
functions that have calls or access global variables.

A call through a function pointer involves saving, changing and restoring
this register around the call and thus MachineLICM does not consider it to
be invariant. We can however guarantee the register is preserved across the
call and thus is invariant.

Differential Revision: https://reviews.llvm.org/D33562

llvm-svn: 305490
2017-06-15 18:29:59 +00:00
Arnold Schwaighofer ae9312c487 ISel: Fix FastISel of swifterror values
The code assumed that we process instructions in basic block order.  FastISel
processes instructions in reverse basic block order. We need to pre-assign
virtual registers before selecting otherwise we get def-use relationships wrong.

This only affects code with swifterror registers.

rdar://32659327

llvm-svn: 305484
2017-06-15 17:34:42 +00:00
Hiroshi Inoue 7a08bb1458 [PowerPC] fix potential verification errors on CFENCE8
This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs.

Differential Revision: https://reviews.llvm.org/D34208

llvm-svn: 305479
2017-06-15 16:51:28 +00:00
Simon Pilgrim 4d432b2c6b [X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors
We can use this with v16i16/v32i16 as well.

Found during fuzz testing.

llvm-svn: 305472
2017-06-15 14:52:30 +00:00
Simon Pilgrim b98cb3808c Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
This is causing windows buildbot failures

llvm-svn: 305470
2017-06-15 14:39:34 +00:00
Ayman Musa 56912cda71 [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 305465
2017-06-15 13:02:37 +00:00
Diana Picus 02e11010b2 [ARM] GlobalISel: Add support for i32 modulo
Add support for modulo for targets that have hardware division and for
those that don't. When hardware division is not available, we have to
choose the correct libcall to use. This is generally straightforward,
except for AEABI.

The AEABI variant is trickier than the other libcalls because it
returns { quotient, remainder }, instead of just one value like the
other libcalls that we've seen so far. Therefore, we need to use custom
lowering for it. However, we don't want to have too much special code,
so we refactor the target-independent code in the legalizer by adding a
helper for replacing an instruction with a libcall. This helper is used
by the legalizer itself when dealing with simple calls, and also by the
custom ARM legalization for the more complicated AEABI divmod calls.

llvm-svn: 305459
2017-06-15 10:53:31 +00:00
Diana Picus 8fd1601d32 [ARM] GlobalISel: Lower only homogeneous struct args
Lowering mixed struct args, params and returns used G_INSERT, which is a
bit more convoluted to support through the entire pipeline. Since they
don't occur that often in practice, it's probably wiser to leave them
out until later.

Meanwhile, we can lower homogeneous structs using G_MERGE_VALUES, which
has good support in the legalizer. These occur e.g. as the return of
__aeabi_idivmod, so it's nice to be able to support them.

llvm-svn: 305458
2017-06-15 09:42:02 +00:00
Florian Hahn 0a26d2c298 [AArch64] Enable FeatureFuseAES for the generic processor model.
Summary:
Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back
gives a double digit speedup on benchmarks using those instructions on
Cortex-A processors. In GCC, this optimization is part of the generic
processor model as well.

This change should not have a major performance impact on processors
that do not optimize AES instruction pairs, although I only had access
to Cortex-A processors for benchmarking.


Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover

Reviewed By: evandro

Subscribers: sbaranga, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D33836

llvm-svn: 305457
2017-06-15 09:31:23 +00:00
Zoran Jovanovic d9299293ad [mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Differential Revision: https://reviews.llvm.org/D33887

llvm-svn: 305455
2017-06-15 09:14:33 +00:00
Alexandros Lamprineas 1c15ee2631 Revert "[ARM] Support constant pools in data when generating execute-only code."
This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0.

I've upset a buildbot which runs the address sanitizer:
ERROR: AddressSanitizer: stack-use-after-scope
lib/Target/ARM/ARMISelLowering.cpp:2690
That Twine variable is used illegally.

llvm-svn: 305390
2017-06-14 15:00:08 +00:00
Simon Dardis 9790e39f45 [mips] Fix multiprecision arithmetic.
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.

For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.

Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.

Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.

This revolves PR32713 and PR33424.

Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33494

llvm-svn: 305389
2017-06-14 14:46:30 +00:00
Alexandros Lamprineas c582d6e133 [ARM] Support constant pools in data when generating execute-only code.
The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.

Differential Revision: https://reviews.llvm.org/D33773

llvm-svn: 305387
2017-06-14 13:22:41 +00:00
Florian Hahn ffc498dfcc Align definition of DW_OP_plus with DWARF spec [3/3]
Summary:
This patch is part of 3 patches that together form a single patch, but must be introduced in stages in order not to break things.
 
The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack.
 
This is done in three stages:
• The first patch (LLVM) adds support for DW_OP_plus_uconst.
• The second patch (Clang) contains changes all its uses from DW_OP_plus to DW_OP_plus_uconst.
• The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with its DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions.

Patch by Sander de Smalen.

Reviewers: echristo, pcc, aprantl

Reviewed By: aprantl

Subscribers: fhahn, javed.absar, aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D33894

llvm-svn: 305386
2017-06-14 13:14:38 +00:00
Simon Dardis 941a49b6d6 [mips] Fix machine verifier errors in the long branch pass
This patch fixes two systemic machine verifier errors in the long
branch pass. The first is the incorrect basic block successors
and the second was the incorrect construction of several jump
instructions.

This partially resolves PR27458 and the associated PR32146.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33378

llvm-svn: 305382
2017-06-14 12:16:47 +00:00
Nemanja Ivanovic 7855185bbb Revert r304907 as it is causing some failures that I cannot reproduce.
Reverting this until a test case can be provided to aid the investigation.

llvm-svn: 305372
2017-06-14 07:05:42 +00:00
Daniel Sanders 4e52366c2a [globalisel][legalizer] G_LOAD/G_STORE NarrowScalar should not emit G_GEP x, 0.
Summary:
When legalizing G_LOAD/G_STORE using NarrowScalar, we should avoid emitting
	%0 = G_CONSTANT ty 0
	%1 = G_GEP %x, %0
since it's cheaper to not emit the redundant instructions than it is to fold them
away later.

Reviewers: qcolombet, t.p.northover, ab, rovka, aditya_nandakumar, kristof.beyls

Reviewed By: qcolombet

Subscribers: javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D32746

llvm-svn: 305340
2017-06-13 23:42:32 +00:00
Krzysztof Parzyszek b3a8d20e27 [Hexagon] Generate store-immediate instructions for stack objects
Store-immediate instructions have a non-extendable offset. Since the
actual offset for a stack object is not known until much later, only
generate these stores when the stack size (at the time of instruction
selection) is small.

llvm-svn: 305305
2017-06-13 17:10:16 +00:00
Krzysztof Parzyszek c83c267b84 [Hexagon] Generate multiply-high instruction in isel
llvm-svn: 305302
2017-06-13 16:21:57 +00:00
Krzysztof Parzyszek de2ac17b7b [Hexagon] Don't kill live registers when creating mux out of tfr
When a mux instruction is created from a pair of complementary conditional
transfers, it can be placed at the location of either the earlier or the
later of the transfers. Since it will use the operands of the original
transfers, putting it in the earlier location may hoist a kill of a source
register that was originally further down. Make sure the kill flag is
removed if the register is still used afterwards.

llvm-svn: 305300
2017-06-13 16:07:36 +00:00
Simon Dardis c38d391f56 [MIPS] BuildCondBr should preserve MO flags
While simplifying branches in the MachineInstr representation, the
routine BuildCondBr must preserve flags on register MachineOperands. In
particular, it must preserve the <undef> flag.

This fixes a bug that is unlikely to occur in any real scenario, but
which bugpoint is likely to introduce.

Patch By Nick Johnson!

Reviewers: ahatanak, sdardis

Differential Revision: https://reviews.llvm.org/D34041

llvm-svn: 305290
2017-06-13 14:11:29 +00:00
Krzysztof Parzyszek 9bd4d91037 [Hexagon] Stop pmpy recognition when shift conversion fails
The conversion of shifts from right shifts to left shifts may fail.
In such case, the pmpy recognition cannot proceed.

llvm-svn: 305289
2017-06-13 13:51:49 +00:00
Oliver Stannard 852fbd2fea [ARM] Add scheduling classes for VFNM[AS]
The VFNM[AS] instructions did not have scheduling information attached, which
was causing assertion failures with the Cortex-A57 scheduling model and
-fp-contract=fast, because the Cortex-A57 sched model claims to be complete.

Differential Revision: https://reviews.llvm.org/D34139

llvm-svn: 305288
2017-06-13 13:04:32 +00:00
Craig Topper 8b8767662c [AVX-512] Mark masked VPCMP instructions as commutable.
llvm-svn: 305276
2017-06-13 07:13:50 +00:00
Craig Topper e1d8103d8f [AVX-512] Mark masked version of vpcmpeq as being commutable.
llvm-svn: 305275
2017-06-13 07:13:47 +00:00
Craig Topper 42d0339257 [X86] Add masked integer compare instructions to load folding tables.
llvm-svn: 305274
2017-06-13 07:13:44 +00:00
Tom Stellard ee6e6452df AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D33992

llvm-svn: 305232
2017-06-12 20:54:56 +00:00
Tim Northover 7a61316e89 AArch64: don't try to emit an add (shifted reg) for SP.
The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr
rather than the SP, so we need to use different variants.

Situations where this actually comes up are rare enough (see test-case) that I
think falling back to DAG is fine.

llvm-svn: 305230
2017-06-12 20:49:53 +00:00
Tony Jiang 1a8eec141a [PowerPC] Match vec_revb builtins to P9 instructions.
Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.

Differential Revision: https://reviews.llvm.org/D33690

llvm-svn: 305214
2017-06-12 18:24:36 +00:00
Tony Jiang 30a49d1a3d [Power9] Added support for the modsw, moduw, modsd, modud hardware instructions.
Note that if we need the result of both the divide and the modulo then we
compute the modulo based on the result of the divide and not using the new
hardware instruction.

Commit on behalf of STEFAN PINTILIE.
Differential Revision: https://reviews.llvm.org/D33940

llvm-svn: 305210
2017-06-12 17:58:42 +00:00
Sanjay Patel 5e7b7b7503 [x86] regenerate checks with update_llc_test_checks.py
The dream of a unified check-line auto-generator for all phases of compilation is dead.
The llc script has already diverged to be better at its goal, so having 2 scripts that
do almost the same thing is just causing confusion.

We can rip out the llc ability in update_test_checks.py next and rename it, so it will
be clear that we have one script for llc check auto-generation and another for opt.

llvm-svn: 305206
2017-06-12 17:31:36 +00:00
Geoff Berry 06c9dc3d9c [SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno
Summary:
This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU
target triples.  This optimization was being inhibited when -ffast-math
wasn't set because sincos in GLibC does not set errno, while sin and cos
do.  However, this optimization will only run if the attributes on the
sin/cos calls include readnone, which is how clang represents the fact
that it doesn't care about the errno values set by these functions (via
the -fno-math-errno flag).

Reviewers: hfinkel, bogner

Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond

Differential Revision: https://reviews.llvm.org/D32921

llvm-svn: 305204
2017-06-12 17:15:41 +00:00
Matt Arsenault d9b77848f2 AMDGPU: Teach isLegalAddressingMode about flat offsets
Also fix reporting r+r as a valid addressing mode without
offsets.

llvm-svn: 305203
2017-06-12 17:06:35 +00:00
Sanjay Patel 9d13a18845 [x86] regenerate checks with update_llc_test_checks.py
The dream of a unified check-line auto-generator for all phases of compilation is dead.
The llc script has already diverged to be better at its goal, so having 2 scripts that
do almost the same thing is just causing confusion for newcomers. I plan to fix up more
x86 tests in a next commit. We can rip out the llc ability in update_test_checks.py after
that. 

llvm-svn: 305202
2017-06-12 17:05:43 +00:00
Matt Arsenault db7c6a8731 AMDGPU: Start selecting flat instruction offsets
llvm-svn: 305201
2017-06-12 16:53:51 +00:00
Matt Arsenault fd02314113 AMDGPU: Start adding offset fields to flat instructions
llvm-svn: 305194
2017-06-12 15:55:58 +00:00
Than McIntosh 14d61436c0 StackColoring: smarter check for slot overlap
Summary:
The old check for slot overlap treated 2 slots `S` and `T` as
overlapping if there existed a CFG node in which both of the slots could
possibly be active. That is overly conservative and caused stack blowups
in Rust programs. Instead, check whether there is a single CFG node in
which both of the slots are possibly active *together*.

Fixes PR32488.

Patch by Ariel Ben-Yehuda <ariel.byd@gmail.com>

Reviewers: thanm, nagisa, llvm-commits, efriedma, rnk

Reviewed By: thanm

Subscribers: dotdash

Differential Revision: https://reviews.llvm.org/D31583

llvm-svn: 305193
2017-06-12 14:56:02 +00:00
Craig Topper 69fead95c7 [AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables.
llvm-svn: 305180
2017-06-12 04:57:31 +00:00
Sanjay Patel dcbfbb11d9 [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load
I was looking closer at the x86 test diffs in D33866, and the first change seems like it 
shouldn't happen in the first place. So this patch will resolve that.

Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for 
any given CPU model, so we should be able to interchange those without affecting perf. 
But as we can see in some of the diffs here, using vperm2f128 allows load folding, so 
we should take that opportunity to reduce code size and register pressure.

A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 
was introduced with AVX1, we should be selecting it in all of the same situations that we 
would with AVX2. If there's some reason that an AVX1 CPU would not want to use this 
instruction, that should be fixed up in a later pass.

Differential Revision: https://reviews.llvm.org/D33938

llvm-svn: 305171
2017-06-11 21:18:58 +00:00
Amaury Sechet 2127452ff7 [DAGCombine] Make sure we check the ResNo from UADDO before combining
Summary: UADDO has 2 result, and one must check the result no before doing any kind of combine. Without it, the transform is invalid.

Reviewers: joerg

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34088

llvm-svn: 305162
2017-06-11 11:36:38 +00:00
Simon Pilgrim 8622f51e94 [X86][SSE] Extended PR32368 to SSE/AVX1/AVX2
llvm-svn: 305154
2017-06-10 21:13:01 +00:00
Simon Pilgrim 46619359db [X86][AVX512] Added test case for PR32368
llvm-svn: 305153
2017-06-10 20:58:43 +00:00
Wei Ding 7c3e5115a5 AMDGPU : Fix ISA Version Definitions.
Differential Revision: http://reviews.llvm.org/D28531

llvm-svn: 305137
2017-06-10 03:53:19 +00:00
Sanjay Patel dd96270472 [PowerPC] add memcmp test with one constant operand and equality cmp; NFC
llvm-svn: 305131
2017-06-09 23:15:14 +00:00
I-Jui (Ray) Sung 21fde385fa [AArch64] Add fallback in FastISel fp16 conversions
Summary:
- Fix assertion failures on F16 to/from int types in FastISel by falling
  back to regular ISel
- Add a testcase of various conversion cases with FastISel (-O0)

Reviewers: kristof.beyls, jmolloy, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33734

llvm-svn: 305127
2017-06-09 22:40:50 +00:00
Stanislav Mekhanoshin 1a61ab8172 [AMDGPU] Add intrinsics for alignbit and alignbyte instructions
Differential Revision: https://reviews.llvm.org/D34046

llvm-svn: 305098
2017-06-09 19:03:00 +00:00
Simon Pilgrim 3d37b1a277 [X86][SSE] Add support for PACKSS nodes to faux shuffle extraction
If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle

llvm-svn: 305091
2017-06-09 17:29:52 +00:00
Simon Dardis 212cccb2f4 Reland "[SelectionDAG] Enable target specific vector scalarization of calls and returns"
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.

The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.

Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.

By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.

Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".

This patch enables the MIPS backend to take either form for vector types.

The previous version of this patch had a "conditional move or jump depends on
uninitialized value".

Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D27845

llvm-svn: 305083
2017-06-09 14:37:08 +00:00
David Stuttard 82618baa0f [AMDGPU] Fix for issue in alloca to vector promotion pass
Summary:
Alloca promotion pass not dealing with non-canonical input

Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical)

Also added some test cases in non-canonical form to check that it no longer crashes

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31710

llvm-svn: 305079
2017-06-09 14:16:22 +00:00
Nirav Dave 43a4d8122f Prevent RemoveDeadNodes from deleted already deleted node.
This prevents against assertion errors like PR32659 which occur from a
replacement deleting a node after it's been added to the list argument
of RemoveDeadNodes. The specific failure from PR32659 does not
currently happen, but it is still potentially possible. The underlying
cause is that the callers of the change dfunction builds up a list of
nodes to delete after having moved their uses and it possible that a
move of a later node will cause a previously deleted nodes to be
deleted.

Reviewers: bkramer, spatel, davide

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33731

llvm-svn: 305070
2017-06-09 12:57:35 +00:00
Oliver Stannard ad0973557c [ARM] Add scheduling info for VFMS
The scalar VFMS instructions did not have scheduling information attached (but
VFMA did), which was causing assertion failures with the Cortex-A57 scheduling
model and -fp-contract=fast.

Differential Revision: https://reviews.llvm.org/D34040

llvm-svn: 305064
2017-06-09 09:19:09 +00:00
Matthias Braun 1ee25e0c3f RegAllocPBQP: Do not assign reserved physical register
(0) RegAllocPBQP: Since getRawAllocationOrder() may return a collection that includes reserved physical registers, iterate to find an un-reserved physical register.

(1) VirtRegMap: Enforce the invariant: "no reserved physical registers" in assignVirt2Phys(). Previously, this was checked only after the fact in VirtRegRewriter::rewrite.

(2) MachineVerifier: updated the test per MatzeB's review.

(3) +testcase

Patch by Nick Johnson<Nicholas.Paul.Johnson@deshawresearch.com>!

Differential Revision: https://reviews.llvm.org/D33947

llvm-svn: 305016
2017-06-08 21:30:54 +00:00
Krzysztof Parzyszek 8a7fb0fe51 [Hexagon] Skip mux generation when predicate register is undefined
llvm-svn: 305014
2017-06-08 20:56:36 +00:00
Sanjay Patel 5e370850d4 [CGP] don't expand a memcmp with nobuiltin attribute
This matches the behavior used in the SDAG when expanding memcmp.

For reference, we're intentionally treating the earlier fortified call transforms differently after:
https://bugs.llvm.org/show_bug.cgi?id=23093
https://reviews.llvm.org/rL233776

One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers:
https://reviews.llvm.org/D19781
https://reviews.llvm.org/D19801

Differential Revision: https://reviews.llvm.org/D34043

llvm-svn: 305007
2017-06-08 19:47:25 +00:00
Matt Arsenault 3c7581bbeb AMDGPU: Use correct register names in inline assembly
Fixes using physical registers in inline asm from clang.

llvm-svn: 305004
2017-06-08 19:03:20 +00:00
Guozhi Wei f31c56df2a [PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.

This patch fixed PR32442.

Differential Revision: https://reviews.llvm.org/D31407

llvm-svn: 305001
2017-06-08 18:27:24 +00:00
Mark Searles e5c7832311 [AMDGPU] Force qsads instrs to use different dest register than source registers
The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes.

Differential Revision: https://reviews.llvm.org/D33783

llvm-svn: 304998
2017-06-08 18:21:19 +00:00
Zaara Syeda 79acbbe513 [Power9] Exploit vector integer extend instructions
This patch adds build vector patterns to exploit the vector integer
extend instructions:
vextsb2w - Vector Extend Sign Byte To Word
vextsb2d - Vector Extend Sign Byte To Doubleword
vextsh2w - Vector Extend Sign Halfword To Word
vextsh2d - Vector Extend Sign Halfword To Doubleword
vextsw2d - Vector Extend Sign Word To Doubleword

Differential Revision: https://reviews.llvm.org/D33510

llvm-svn: 304992
2017-06-08 17:14:36 +00:00
Sanjay Patel 0edcd1d717 [PowerPC] add memcmp test with nobuiltin attr; NFC
In SDAG, we don't expand libcalls with a nobuiltin attribute.
It's not clear if that's correct from the existing code comment:
"Don't do the check if marked as nobuiltin for some reason."

...adding a test here either way to show that there is currently
a different behavior implemented in the CGP-based expansion.

llvm-svn: 304991
2017-06-08 17:09:18 +00:00
Sanjay Patel b5a03797df [x86] remove unused param from tests; NFC
llvm-svn: 304989
2017-06-08 17:02:39 +00:00
Sanjay Patel e7c5041c2a [CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion
The test diff for PowerPC shows we can better optimize if this case is one block.

For x86, there's would be a substantial difference if CGP expansion was enabled because branches are assumed 
cheap and SDAG can't optimize across blocks. 

Instead of this:

_cmp_eq8:
  movq  (%rdi), %rax
  cmpq  (%rsi), %rax
  je  LBB23_1
## BB#2:                                ## %res_block
  movl  $1, %ecx
  jmp LBB23_3
LBB23_1:
  xorl  %ecx, %ecx
LBB23_3:                                ## %endblock
  xorl  %eax, %eax
  testl %ecx, %ecx
  sete  %al
  retq

We get this:

cmp_eq8:   
  movq  (%rdi), %rcx
  xorl  %eax, %eax
  cmpq  (%rsi), %rcx
  sete  %al
  retq

And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). 
If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable 
CGP memcmp() expansion for x86. Ie, we'll bypass the power-of-2 special cases currently optimized in SDAG because we 
can lower the IR produced here optimally.

Differential Revision: https://reviews.llvm.org/D34005

llvm-svn: 304987
2017-06-08 16:53:18 +00:00
Andrew V. Tischenko 8cb1d0931f Add scheduler classes to integer/float horizontal operations.
This patch will close PR32801.
Differential Revision: https://reviews.llvm.org/D33203

llvm-svn: 304986
2017-06-08 16:44:13 +00:00
Sanjay Patel 2ab6ee0dc4 [x86] add tests for memcmp expansion; NFC
We already had a test to demonstrate PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

I'm adding tests for general memcmp expansion (see D34005 / D33963) and:
https://bugs.llvm.org/show_bug.cgi?id=33329

...plus non-power-of-2 sizes, so we can see what that looks like currently or if expanded.

llvm-svn: 304979
2017-06-08 15:01:29 +00:00
Andrew V. Tischenko e0531025f8 This patch closes PR28513: an optimization of multiplication by different constants.
The initial patch was rejected: I fixed the issue and re-apply it.

llvm-svn: 304972
2017-06-08 10:20:13 +00:00
Diana Picus dbd4589042 [ARM] GlobalISel: Add more tests. NFC
Add a couple of tests to increase coverage for the TableGen'erated code,
in particular for rules where 2 generic instructions may be combined
into a single machine instruction.

llvm-svn: 304971
2017-06-08 09:47:30 +00:00
Krzysztof Parzyszek 5ba13825f0 [Hexagon] Generate 'inbounds' GEPs in HexagonCommonGEP
llvm-svn: 304937
2017-06-07 20:04:33 +00:00
Sanjay Patel 8ce1e3b759 [CGP] avoid zext/trunc of a memcmp expansion compare
This could be viewed as another shortcoming of the DAGCombiner:
when both operands of a compare are zexted from the same source
type, we should be able to compare the original types.

The effect on PowerPC perf is likely unnoticeable, but there's a
visible regression for x86 if we feed the suboptimal IR for memcmp
expansion to the DAG:

_cmp_eq4_zexted_to_i64:
  movl  (%rdi), %ecx
  movl  (%rsi), %edx
  xorl  %eax, %eax
  cmpq  %rdx, %rcx
  sete  %al

_cmp_eq4_better:
  movl  (%rdi), %ecx
  xorl  %eax, %eax
  cmpl  (%rsi), %ecx
  sete  %al

llvm-svn: 304923
2017-06-07 16:16:45 +00:00
Petar Jovanovic 2f5f8e947a [mips][dsp] Modify repl.ph to accept signed immediate values
Changed immediate type for repl.ph from uimm10 to simm10 as per the specs.
Repl.qb still accepts uimm8. Both instructions now mimic the behaviour of
GNU as.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D33594

llvm-svn: 304918
2017-06-07 14:48:46 +00:00
Guy Blank e1888d4388 [X86] Add test to demonstrate inefficient lowering of v48i8 shuffle.
llvm-svn: 304915
2017-06-07 14:29:10 +00:00
Tom Stellard 2860a428f7 AMDGPU/GlobalISel: Mark 32-bit G_SELECT as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33949

llvm-svn: 304910
2017-06-07 13:54:51 +00:00
Sanjay Patel 6e8e7cc70e [x86] avoid flipping sign bits for vector icmp by using known bits
If we know that both operands of an unsigned integer vector comparison are non-negative, 
then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality
integer vector compare predicate provided by SSE/AVX).

We're intentionally not changing the condition code to signed in order to preserve the
existing transforms that use min/max/psubus below here.

This should solve PR33276:
https://bugs.llvm.org/show_bug.cgi?id=33276

Differential Revision: https://reviews.llvm.org/D33862

llvm-svn: 304909
2017-06-07 13:46:34 +00:00
Nemanja Ivanovic d8623f0825 [PowerPC] Eliminate integer compare instructions - vol. 5
Adds handling for i64 SETNE comparison (both sign and zero extended).

Differential Revision: https://reviews.llvm.org/D33720

llvm-svn: 304907
2017-06-07 13:18:06 +00:00
Petar Jovanovic 3c039d968e [mips] do not use FastISel when -mxgot is present
The clang compiler by default uses FastISel when invoked with -O0, which
is also the default. In that case, passing of -mxgot does not get honored,
i.e. the code path that is to deal with large got is not taken.
Clang produces same output regardless of -mxgot being present or not.
This change checks whether -mxgot is passed as an option, and turns off
FastISel if it is.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D33593

llvm-svn: 304906
2017-06-07 12:59:53 +00:00
Diana Picus 0b4190a9d6 [ARM] GlobalISel: Purge G_SEQUENCE
According to the commit message from r296921, G_MERGE_VALUES and
G_INSERT are to be preferred over G_SEQUENCE. Therefore, stop generating
G_SEQUENCE in the ARM backend and remove the code dealing with it.

This boils down to the code breaking up double values for the soft float
calling convention. Use G_MERGE_VALUES + G_UNMERGE_VALUES instead of
G_SEQUENCE + G_EXTRACT for it. This maps very nicely to VMOVDRR +
VMOVRRD and simplifies the code in the instruction selector.

There's one occurence of G_SEQUENCE left in arm-irtranslator.ll, but
that is part of the target-independent code for translating constant
structs. Therefore, it is beyond the scope of this commit.

llvm-svn: 304902
2017-06-07 12:35:05 +00:00
Nemanja Ivanovic bb67f847d6 [PowerPC] Eliminate integer compare instructions - vol. 3
Adds handling for i32 SETNE comparison (both sign and zero extended).

Differential Revision: https://reviews.llvm.org/D33718

llvm-svn: 304901
2017-06-07 12:23:41 +00:00
Diana Picus 0196427b03 [ARM] GlobalISel: Support G_XOR
Same as the other binary operators:
- legalize to 32 bits
- map to GPRs
- select to EORrr via TableGen'erated code

llvm-svn: 304898
2017-06-07 11:57:30 +00:00
Simon Dardis 7c96ba1920 evert "[mips] Fix test mips64fpldst.ll with machine verifier enabled"
This reverts commit r301394. It broke some internal buildbots, reverting
while the issue is being investigated.

llvm-svn: 304896
2017-06-07 11:21:37 +00:00
Simon Pilgrim 58f5be2771 [X86][SSE] Fix an issue with PEXTRW/PEXTRB indices during shuffle combining
We were checking that the index was in range of the destination vector type, not the (larger) source vector type

llvm-svn: 304894
2017-06-07 10:30:35 +00:00
Diana Picus eeb0aad8e4 [ARM] GlobalISel: Support G_OR
Same as the other binary operators:
- legalize to 32 bits
- map to GPRs
- select ORRrr thanks to TableGen'erated code

llvm-svn: 304890
2017-06-07 10:14:23 +00:00
Diana Picus 8445858a93 [ARM] GlobalISel: Support G_AND
This is identical to the support for the other binary operators:
- widen to s32
- map into GPR
- select ANDrr (via TableGen'erated code)

llvm-svn: 304885
2017-06-07 09:17:41 +00:00
Sanjay Patel f57015d4cc [CGP / PowerPC] use direct compares if there's only one load per block in memcmp() expansion
I'd like to enable CGP memcmp expansion for x86, but the output from CGP would regress the 
special cases (memcmp(x,y,N) != 0 for N=1,2,4,8,16,32 bytes) that we already handle.

I'm not sure if we'll actually be able to produce the optimal code given the block-at-a-time 
limitation in the DAG. We might have to just avoid those special-cases here in CGP. But 
regardless of that, I think this is a win for the more general cases.

http://rise4fun.com/Alive/cbQ

Differential Revision: https://reviews.llvm.org/D33963

llvm-svn: 304849
2017-06-07 00:17:08 +00:00
Sanjay Patel 7a52296f1f [PowerPC] auto-generate full checks and increase test coverage
3 of the tests were testing exactly the same thing: memcmp(x, y, 16) != 0.
I changed that to test 4, 7, and 16 bytes, so we can see how those differ.

llvm-svn: 304838
2017-06-06 22:06:07 +00:00
Evgeny Stupachenko ff9d80cc7a Added tests for X86InterleavedStore.
Reviewers: RKSimon, DavidKreitzer

Differential Revision: https://reviews.llvm.org/D33684

Patch by: Aleen Farhana <Farhana.aleen@gmail.com>

llvm-svn: 304834
2017-06-06 21:08:00 +00:00
Matthias Braun 7e23fc05c1 llc: Add ability to parse mir from stdin
- Add -x <language> option to switch between IR and MIR inputs.
- Change MIR parser to read from stdin when filename is '-'.
- Add a simple mir roundtrip test.

llvm-svn: 304825
2017-06-06 20:06:57 +00:00
Evgeny Stupachenko 3b88291581 Fix PR23384 (part 3 of 3)
Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
2017-06-06 20:04:16 +00:00
Matthias Braun 8b5f9e4438 MIRPrinter: Avoid assert() when printing empty INLINEASM strings.
CodeGen uses MO_ExternalSymbol to represent the inline assembly strings.
Empty strings for symbol names appear to be invalid. For now just
special case the output code to avoid hitting an `assert()` in
`printLLVMNameWithoutPrefix()`.

This fixes https://llvm.org/PR33317

llvm-svn: 304815
2017-06-06 19:00:58 +00:00
Petar Jovanovic 64fb7a8ebd [mips] Add madd4 subtarget feature
Addition of a feature and a predicate used to control generation of madd.fmt
and similar instructions.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D33400

llvm-svn: 304801
2017-06-06 15:33:01 +00:00
Simon Pilgrim f7113fd270 [X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it non-temporal (PR32744)
Extension to D33728

llvm-svn: 304798
2017-06-06 14:18:39 +00:00
Tom Stellard 8cd60a5067 AMDGPU/GlobalISel: Mark 32-bit G_ICMP as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33890

llvm-svn: 304797
2017-06-06 14:16:50 +00:00
Vivek Pandya 56d87ef5d7 [Improve CodeGen Testing] This patch renables MIRPrinter print fields which have value equal to its default.
If -simplify-mir option is passed then MIRPrinter will not print such fields.
This change also required some lit test cases in CodeGen directory to be changed.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D32304

llvm-svn: 304779
2017-06-06 08:16:19 +00:00
Chandler Carruth 11134628e2 [x86] Stop this test from dirtying the source tree when run.
The output isn't used anyways.

llvm-svn: 304766
2017-06-06 03:24:22 +00:00
Chandler Carruth d7120758ba [x86] Add the test for folding stack spills into pextrw.
This is a negative test as pextrw doesn't write to all 32-bits of the
spilled GPR. This fold ended up happening when D32684 was landed and
covers the regression that motivated reverting it in r304762.

llvm-svn: 304763
2017-06-06 02:16:01 +00:00
Chandler Carruth 41ed4034dd [x86] Revert the X86FoldTablesEmitter due to more miscompiles.
In testing, we've found yet another miscompile caused by the new tables.
And this one is even less clear how to fix (we could teach it to fold
a 16-bit load instead of the 32-bit load it wants, or block folding
entirely).

Also, the approach to excluding instructions seems increasingly to not
scale well.

I have left a more detailed analysis on the review log for the original
patch (https://reviews.llvm.org/D32684) along with suggested path
forward. I will land an additional test case that I wrote which covers
the code that was miscompiling (folding into the output of `pextrw`) in
a subsequent commit to keep this a pure revert.

For each commit reverted here, I've restricted the revert to the
non-test code touching the x86 fold table emission until the last commit
where I did revert the test updates. This means the *new* test cases
added for `insertps` and `xchg` remain untouched (and continue to pass).

Reverted commits:
r304540: [X86] Don't fold into memory operands into insertps in the ...
r304347: [TableGen] Adapt more places to getValueAsString now ...
r304163: [X86] Don't fold away the memory operand of an xchg.
r304123: Don't capture a temporary std::string in a StringRef.
r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..."

Original commit was in r304088, and after a string of fixes was reverted
previously in r304121 to fix build bots, and then re-landed in r304122.

llvm-svn: 304762
2017-06-06 02:15:31 +00:00
Matthias Braun 7bda195812 CodeGen: Refactor MIR parsing
When parsing .mir files immediately construct the MachineFunctions and
put them into MachineModuleInfo.

This allows us to get rid of the delayed construction (and delayed error
reporting) through the MachineFunctionInitialzier interface.

Differential Revision: https://reviews.llvm.org/D33809

llvm-svn: 304758
2017-06-06 00:44:35 +00:00
Matthias Braun c7c06f158c CodeGen/LLVMTargetMachine: Refactor ISel pass construction; NFCI
- Move ISel (and pre-isel) pass construction into TargetPassConfig
- Extract AsmPrinter construction into a helper function

Putting the ISel code into TargetPassConfig seems a lot more natural and
both changes together make make it easier to build custom pipelines
involving .mir in an upcoming commit. This moves MachineModuleInfo to an
earlier place in the pass pipeline which shouldn't have any effect.

llvm-svn: 304754
2017-06-06 00:26:13 +00:00
Sanjay Patel d47e64398e [x86] fix over-specific triple; NFC
There's nothing darwin-specific in these tests, and using that 
setting causes extra phantom diffs when the auto-generated check 
lines are regenerated today.

llvm-svn: 304753
2017-06-06 00:18:11 +00:00
Quentin Colombet c668935d85 [InlineSpiller] Don't spill fully undef values
Althought it is not wrong to spill undef values, it is useless and harms
both code size and runtime. Before spilling a value, check that its
content actually matters.

http://www.llvm.org/PR33311

llvm-svn: 304752
2017-06-05 23:51:27 +00:00
Matt Arsenault 9e5b5053d1 RenameIndependentSubregs: Fix handling of undef tied operands
If a tied source operand was undef, it would be replaced but not
update the other tied operand, which would end up using different
virtual registers.

llvm-svn: 304747
2017-06-05 22:58:57 +00:00
Volkan Keles ebe6bb9006 [GlobalISel] IRTranslator: Add MachineMemOperand to target memory intrinsics
Reviewers: qcolombet, ab, t.p.northover, aditya_nandakumar, dsanders

Reviewed By: qcolombet

Subscribers: rovka, kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D33724

llvm-svn: 304743
2017-06-05 22:17:17 +00:00
Davide Italiano fb4d5c095b [SelectionDAG] Update the dominator after splitting critical edges.
Running `llc -verify-dom-info` on the attached testcase results in a
crash in the verifier, due to a stale dominator tree.

i.e.

  DominatorTree is not up to date!
  Computed:
  =============================--------------------------------
  Inorder Dominator Tree:
    [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,7}
      [2] %lor.lhs.false.i61.i.i.i {1,2}
      [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,6}
        [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5}

  Actual:
  =============================--------------------------------
  Inorder Dominator Tree:
    [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,9}
      [2] %lor.lhs.false.i61.i.i.i {1,2}
      [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,8}
        [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5}
        [3] %safe_mod_func_int8_t_s_s.exit.i.i.i.lor.lhs.false.i61.i.i.i_crit_edge {6,7}

This is because in `SelectionDAGIsel` we split critical edges without
updating the corresponding dominator for the function (and we claim
in `MachineFunctionPass::getAnalysisUsage()` that the domtree is preserved).

We could either stop preserving the domtree in `getAnalysisUsage`
or tell `splitCriticalEdge()` to update it.
As the second option is easy to implement, that's the one I chose.

Differential Revision:  https://reviews.llvm.org/D33800

llvm-svn: 304742
2017-06-05 22:16:41 +00:00
Simon Pilgrim 807b708d13 [X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Missed SSE41 non-temporal load case in previous commit

Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304722
2017-06-05 16:45:32 +00:00
Simon Pilgrim b2ef948628 [X86][AVX1] Split 256-bit vector non-temporal loads to keep it non-temporal (PR32744)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304718
2017-06-05 16:02:01 +00:00
Simon Pilgrim a25bf0b6b9 [X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304717
2017-06-05 15:43:03 +00:00
Diana Picus 0091cc3528 [ARM] GlobalISel: Constrain callee register on indirect calls
When lowering calls, we generate instructions with machine opcodes
rather than generic ones. Therefore, we need to constrain the register
classes of the operands.

Also enable the machine verifier on the arm-irtranslator.ll test, since
that would've caught this issue.

Fixes (part of) PR32146.

llvm-svn: 304712
2017-06-05 12:54:53 +00:00
Javed Absar b16d146838 Add support for #pragma clang section
This patch provides a means to specify section-names for global variables,
functions and static variables, using #pragma directives.
This feature is only defined to work sensibly for ELF targets.
One can specify section names as:
#pragma clang section bss="myBSS" data="myData" rodata="myRodata" text="myText"
One can "unspecify" a section name with empty string e.g.
#pragma clang section bss="" data="" text="" rodata=""

Reviewers: Roger Ferrer, Jonathan Roelofs, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D33413

llvm-svn: 304704
2017-06-05 10:09:13 +00:00
Stanislav Mekhanoshin 286a4225b9 [AMDGPU] Fix SIFoldOperands crash with clamp
Fixes bug #33302. Pass did not account that Src1 of max instruction
can be an immediate.

Differential Revision: https://reviews.llvm.org/D33884

llvm-svn: 304696
2017-06-05 01:03:04 +00:00
Simon Pilgrim 46dd55f1e1 [X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities
We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type:

e.g. for v4f32:

Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0>
      : unpcklps 1, 3 ==> Y: <?, ?, 3, 1>
Step 2: unpcklps X, Y ==>    <3, 2, 1, 0>

The issue is because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc.

Instead, this patch unpacks progressively larger sequential vector elements together:

e.g. for v4f32:

Step 1: unpcklps 0, 2 ==> X: <?, ?, 1, 0>
      : unpcklps 1, 3 ==> Y: <?, ?, 3, 2>
Step 2: unpcklpd X, Y ==>    <3, 2, 1, 0>

This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree.

Differential Revision: https://reviews.llvm.org/D33864

llvm-svn: 304688
2017-06-04 20:12:04 +00:00
Igor Breger 3bfba2c569 [GlobalISel][X86] merge irtranslator-call test files. NFC
llvm-svn: 304683
2017-06-04 12:41:10 +00:00
Stanislav Mekhanoshin 0330660403 [AMDGPU] Untangle SDWA pass from SIShrinkInstructions
Remove dependency of SDWA pass on SIShrinkInstructions.
The goal is to move SDWA even higher in the stack to avoid second run
of MachineLICM, MachineCSE and SIFoldOperands.

Also added handling to preserve original src modifiers.

Differential Revision: https://reviews.llvm.org/D33860

llvm-svn: 304665
2017-06-03 17:39:47 +00:00
Amaury Sechet 39fbe3bb60 Regenerate expectations for trunc-to-bool.ll . NFC
llvm-svn: 304660
2017-06-03 11:35:40 +00:00
Simon Pilgrim f93debb40c [X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combining
Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. 

llvm-svn: 304659
2017-06-03 11:12:57 +00:00
Tom Stellard e042412ef1 AMDGPU/GlobalISel: Mark 1-bit integer constants as legal
Summary:
These are mostly legal, but will probably need special lowering for some
cases.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D33791

llvm-svn: 304628
2017-06-03 01:13:33 +00:00
Stanislav Mekhanoshin f154b4f52c [AMDGPU] Preserve operand order in SIFoldOperands
SIFoldOperands can commute operands even if no folding was done.
This change is to preserve IR is no folding was done.

Differential Revision: https://reviews.llvm.org/D33802

llvm-svn: 304625
2017-06-03 00:41:52 +00:00
Quentin Colombet 1ee8616ca0 [SystemZ] Simplify test case. NFC
Remove useless successors information.

llvm-svn: 304615
2017-06-02 23:40:58 +00:00
Sanjay Patel 56641ac497 [x86] fix over-specific triple; NFC
There's nothing darwin-specific in these tests, and using
that setting causes extra phantom diffs when the auto-generated 
check lines are regenerated today.

llvm-svn: 304614
2017-06-02 23:40:46 +00:00
Philip Reames 80135bdf9e Canonicalize a test via utils/update_test_checks.py
Turns out I might not have further changes to make here, but with the way I'd written the tests, even I couldn't tell that.  :(

llvm-svn: 304613
2017-06-02 23:27:36 +00:00
Sanjay Patel 4cad0f0477 [x86] add tests for unsigned vector compares with known signbits; NFC (PR33276)
llvm-svn: 304612
2017-06-02 23:24:28 +00:00
Matthias Braun 0021d46a1c RegisterScavenging: Add ScavengerTest pass
This pass allows to run the register scavenging independently of
PrologEpilogInserter to allow targeted testing.

Also adds some basic register scavenging tests.

llvm-svn: 304606
2017-06-02 23:01:42 +00:00
Quentin Colombet 2145cf3f07 [RABasic] Properly update the LiveRegMatrix when LR splitting occur
Prior to this patch we used to not touch the LiveRegMatrix while doing
live-range splitting. In other words, when live-range splitting was
occurring, the LiveRegMatrix was not reflecting the changes.
This is generally fine because it means the query to the LiveRegMatrix
will be conservately correct. However, when decisions are taken based on
what is going to happen on the interferences (e.g., when we spill a
register and know that it is going to be available for another one), we
might hit an assertion that the color used for the assignment is still
in use.

This patch makes sure the changes on the live-ranges are properly
reflected in the LiveRegMatrix, so the assertions don't break.
An alternative could have been to remove the assertion, but it would
make the invariants of the code and the general reasoning more
complicated in my opnion.

http://llvm.org/PR33057

llvm-svn: 304603
2017-06-02 22:46:31 +00:00
Quentin Colombet ebbaed6d3c [RABasic] Properly initialize the pass
Use the initializeXXX method to initialize the RABasic pass in the
pipeline. This enables us to take advantage of the .mir infrastructure.

llvm-svn: 304602
2017-06-02 22:46:26 +00:00
Ahmed Bougacha 018a68f9e4 [X86] Correctly broadcast NaN-like integers as float on AVX.
Since r288804, we try to lower build_vectors on AVX using broadcasts of
float/double.  However, when we broadcast integer values that happen to
have a NaN float bitpattern, we lose the NaN payload, thereby changing
the integer value being broadcast.

This is caused by ConstantFP::get, to which we pass the splat i32 as
a float (by bitcasting it using bitsToFloat).  ConstantFP::get takes
a double parameter, so we end up lossily converting a single-precision
NaN to double-precision.

Instead, avoid any kinds of conversions by directly building an APFloat
from the splatted APInt.

Note that this also fixes another piece of code (broadcast of
subvectors), that currently isn't susceptible to the same problem.

Also note that we could really just use APInt and ConstantInt
throughout: the constant pool type doesn't matter much.  Still, for
consistency, use the appropriate type.

llvm-svn: 304590
2017-06-02 20:02:59 +00:00
Amaury Sechet 04ffaca604 Regenerate expectation for wide-fma-contraction.ll . NFC
llvm-svn: 304586
2017-06-02 19:15:04 +00:00
Konstantin Zhuravlyov be6c0ca5e2 AMDGPU: Make auto waitcnt before barrier a feature
Differential Revision: https://reviews.llvm.org/D33793

llvm-svn: 304571
2017-06-02 17:40:26 +00:00
Philip Reames 94cc4a29ed Add placeholder for more extensive verification of psuedo ops
This initial patch doesn't actually do much useful. It's just to show where the new code goes. Once this is in, I'll extend the verification logic to check more useful properties.

For those curious, the more complicated version of this patch already found one very suspicious thing.

Differential Revision: https://reviews.llvm.org/D33819

llvm-svn: 304564
2017-06-02 16:36:37 +00:00
Amaury Sechet 5746e7356a Update select.ll expected results. NFC
llvm-svn: 304557
2017-06-02 16:07:43 +00:00
Alexander Timofeev 3f70b619a9 AMDGPUAnnotateUniformValue should always treat volatile loads as divergent
llvm-svn: 304554
2017-06-02 15:25:52 +00:00
Mark Searles 70359ac60d [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.
-enable-si-insert-waitcnts=1 becomes the default
-enable-si-insert-waitcnts=0 to use old pass

Differential Revision: https://reviews.llvm.org/D33730

llvm-svn: 304551
2017-06-02 14:19:25 +00:00
Zoran Jovanovic 2aae0649a1 [mips][microMIPS] Extending size reduction pass with LBU16, LHU16, SB16 and SH16
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
LBU instruction is transformed into 16-bit instruction LBU16
LHU instruction is transformed into 16-bit instruction LHU16
SB instruction is transformed into 16-bit instruction SB16
SH instruction is transformed into 16-bit instruction SH16
Differential Revision: https://reviews.llvm.org/D33091

llvm-svn: 304550
2017-06-02 14:14:21 +00:00
Krzysztof Parzyszek 066e8b56a0 [Hexagon] Return 0 from getDotNewPredOp when .new opcode does not exist
This allows using this function to test if an instruction can be converted
to a .new form.

llvm-svn: 304549
2017-06-02 14:07:06 +00:00
Amaury Sechet 2e1fed9ef8 Regenerate sse3.ll test results. NFC
llvm-svn: 304548
2017-06-02 14:02:49 +00:00
Amaury Sechet 8e370f14cb Regenerate and-sink.ll test results. NFC
llvm-svn: 304547
2017-06-02 14:02:46 +00:00
Amaury Sechet f0c066f140 Regenerate shrink-compare.ll test results. NFC
llvm-svn: 304546
2017-06-02 14:02:43 +00:00
Benjamin Kramer 19092d783c [X86] Don't fold into memory operands into insertps in the generated folding tables.
insertps behaves differently, the register form selects from an input
register based on the immediate operand while the memory form just loads
the given address. We have custom code to change the immediate in cases
where that's legal, so completely remove insertps from the generated
tables.

llvm-svn: 304540
2017-06-02 10:50:22 +00:00
John Brawn 6671616cde [GlobalMerge] Don't merge globals that may be preempted
When a global may be preempted it needs to be accessed directly, instead of
indirectly through a MergedGlobals symbol, for the preemption to work.

This fixes PR33136.

Differential Revision: https://reviews.llvm.org/D33727

llvm-svn: 304537
2017-06-02 10:24:14 +00:00
Diana Picus e7aa90987d [ARM] GlobalISel: Support struct params/returns
Very very similar to the support for arrays. As with arrays, we don't
support returning large structs that wouldn't fit in R0-R3. Most
front-ends would likely use sret arguments for that anyway.

The only significant difference is that when splitting a struct, we need
to make sure we set the correct original alignment on each member,
otherwise it may get split incorrectly between stack and registers.

llvm-svn: 304536
2017-06-02 10:16:48 +00:00
Javed Absar 4ae7e81233 [ARM] Cortex-A57 scheduling model for ARM backend (AArch32)
This patch implements the Cortex-A57 scheduling model.
The main code is in ARMScheduleA57.td, ARMScheduleA57WriteRes.td.
Small changes in cpp,.h files to support required scheduling predicates.

Scheduling model implemented according to:
 http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf.

Patch by : Andrew Zhogin (submitted on his behalf, as requested).
Rewiewed by: Renato Golin, Diana Picus, Javed Absar, Kristof Beyls.
Differential Revision: https://reviews.llvm.org/D28152

llvm-svn: 304530
2017-06-02 08:53:19 +00:00
Amaury Sechet 9a6fdc0bd5 Specify triple for xor-icmp.ll .
llvm-svn: 304526
2017-06-02 07:45:22 +00:00
Amaury Sechet 968dda7f81 Regenerate expectations for xor-icmp.ll . NFC
llvm-svn: 304525
2017-06-02 07:25:02 +00:00
Yaxun Liu a618acf923 [AMDGPU] Fix kernel arg segment size for amdgizcl
Differential Revision: https://reviews.llvm.org/D33307

llvm-svn: 304482
2017-06-01 21:31:53 +00:00
Nirav Dave 4952871630 [SDAG] Fix CombineTo ordering in visitZERO_EXTEND and visitSIGN_EXTEND
Reorder CombineTo Calls to prevent references to stale/deleted SDNodes which caused undue assertions.

Reviewers: dbabokin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31625

llvm-svn: 304460
2017-06-01 19:33:50 +00:00
Krzysztof Parzyszek 3cf16576d5 [Hexagon] Fix dependence check in the packetizer
An incorrect check in the packetizer lead to an attempt to convert
an unconditional branch to a .new (conditional) form.

llvm-svn: 304442
2017-06-01 18:02:40 +00:00
Krzysztof Parzyszek 51fd5405d5 [Hexagon] Handle long-running simplification loop in idiom recognition
The initial assumption was that the simplification would converge to a
fixed point relatvely quickly. Turns out that there are legitimate situa-
tions where the complexity of the code causes it to take a large number
of iterations.

Two main changes:
- Instead of aborting upon hitting the limit, simply return nullptr.
- Reduce the limit to 10,000 from 100,000.

llvm-svn: 304441
2017-06-01 18:00:47 +00:00
Amaury Sechet 94eb633dd2 Fix addcarry-crash.ll
llvm-svn: 304415
2017-06-01 14:24:31 +00:00
Amaury Sechet b761959993 Add regression test for the addcarry crash. See D33770 for context.
llvm-svn: 304414
2017-06-01 14:09:56 +00:00