Commit Graph

Davide Italiano 5fc5d0a406 [X86] Don't try to scale down if that exceeds the bitwidth.
Fixes the crash reported in PR33844.

llvm-svn: 308503
2017-07-19 18:09:46 +00:00
Tim Northover d59fbec8e2 GlobalISel: select G_EXTRACT and G_INSERT instructions on AArch64.
llvm-svn: 308493
2017-07-19 16:47:07 +00:00
Javed Absar 2cb0c95031 [ARM] Unify handling of M-Class system registers
This patch cleans up and fixes issues in the M-Class system register handling:

1. It defines the system registers and the encoding (SYSm values) in one place:
   a new ARMSystemRegister.td using SearchableTable, thereby removing the
   hand-coded values which existed in multiple places.

2. Some system registers that do not exist, e.g. BASEPRI_MAX_NS, were being allowed!
   Ref: ARMv6/7/8M architecture reference manual.

Reviewed by: @t.p.northover, @olist01, @john.brawn
Differential Revision: https://reviews.llvm.org/D35209

llvm-svn: 308456
2017-07-19 12:57:16 +00:00
Simon Pilgrim e5c7925c5e [X86][XOP] Use default AVX2 lowering for v4i64 ashr by splat constants
XOP shifts only support 128-bit vectors, so we were ending up with less optimal codegen requiring constants.

llvm-svn: 308430
2017-07-19 10:29:31 +00:00
Balaram Makam b05a55787a [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into the loop header, it results in additional backedges and
LoopSimplify turns it into a nested loop, which prevents later optimizations
from being applied (e.g., loop unrolling and loop interleaving).

This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and, if so,
deferring its elimination to LateSimplifyCFG.

Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605

Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl

Reviewed By: efriedma

Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35411

llvm-svn: 308422
2017-07-19 08:53:34 +00:00
Chandler Carruth bb83558f00 Revert r308273 to reinstate part of r308100.
That part was reverted because the underlying change necessitating it
(r308025) was reverted in r308271.

Nirav re-landed r308025 again in r308350, so re-landing this fix.

llvm-svn: 308418
2017-07-19 04:15:30 +00:00
Craig Topper 106b5b6856 AMD znver1 Initial Scheduler model
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Categorizes the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
        a. Instruction types
        b. Registers used
5. Since itineraries are not yet added per instruction, throughput information is bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.

Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.

Reviewers: craig.topper, RKSimon

Subscribers: javed.absar, shivaram, ddibyend, vprasad

Differential Revision: https://reviews.llvm.org/D35293

llvm-svn: 308411
2017-07-19 02:45:14 +00:00
Mandeep Singh Grang d857b4ca98 [COFF, ARM64] Reserve X18 register by default
Reviewers: compnerd, rnk, ruiu, mstorsjo

Reviewed By: mstorsjo

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35531

llvm-svn: 308358
2017-07-18 20:41:33 +00:00
Nirav Dave d839749ae8 [DAG] Improve Aliasing of operations to static alloca
Re-recommitting after landing the DAG extension-crash fix.

Recommitting after adding a check to avoid miscomputing alias information
on addresses with the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until after DAG construction, causing DAGCombiner's
alias check to treat accesses to static allocas as frequently aliasing.
Modify isAlias to treat accesses between static allocas, and between
static allocas and other frame objects, as aliasing.
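
As an illustration only (not from the original commit; names are hypothetical),
the kind of code this affects is a function with adjacent stores into a
fixed-frame stack object (a static alloca), which may be merged once their
frame-index offsets can actually be compared:

  /* Hypothetical C sketch: two adjacent 4-byte stores into the frame may be
     merged into one wider store only if the alias check can reason about the
     frame-index offsets instead of treating every access as aliasing. */
  struct pair { int a, b; };
  int use(struct pair *);
  int f(void) {
    struct pair local;   /* a static alloca, i.e. a fixed stack object */
    local.a = 1;         /* adjacent stores into the frame ...         */
    local.b = 2;         /* ... with offsets known relative to the SP  */
    return use(&local);
  }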

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation even though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308350
2017-07-18 20:06:24 +00:00
James Y Knight dda87cab7d [Sparc] Added software multiplication/division feature
Added a feature to the Sparc back-end that replaces the integer multiply and
divide instructions with calls to .mul/.sdiv/.udiv. This is a step towards
having full v7 support.

Patch by: Eric Kedaigle
Differential Revision: https://reviews.llvm.org/D35500

llvm-svn: 308343
2017-07-18 19:08:38 +00:00
Nirav Dave 07871007aa [DAG] Avoid deleting nodes before combining them.
When replacing a node and its operand, replacing the operand node may
cause the deletion of the original node, leading to an assertion
failure. Guard these replacements to avoid this, without relying on
inspecting the DELETED_NODE opcode in the various extend
DAGCombiner cases.

Fixes PR32515.

Reviewers: dbabokin, RKSimon, davide, chandlerc

Subscribers: chandlerc, llvm-commits

Differential Revision: https://reviews.llvm.org/D34095

llvm-svn: 308330
2017-07-18 17:39:15 +00:00
Matt Arsenault 254ad3de5c AMDGPU: Annotate necessity of flat-scratch-init
As an approximation of the existing handling to avoid
regressions. Fixes using too many registers with calls
on subtargets with the SGPR allocation bug.

llvm-svn: 308326
2017-07-18 16:44:58 +00:00
Matt Arsenault 1cc47f8413 AMDGPU: Figure out private memory regs after lowering
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.

This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.

Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.

llvm-svn: 308325
2017-07-18 16:44:56 +00:00
Geoff Berry 9962faed2b [AArch64][Falkor] Avoid HW prefetcher tag collisions (step 2)
Summary:
Avoid HW prefetcher instruction tag collisions in loops by inserting
MOVs to change the base address register of strided loads.

Reviewers: t.p.northover, mcrosier

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D35366

llvm-svn: 308324
2017-07-18 16:14:22 +00:00
Simon Pilgrim 964a1f1fb0 [X86][AVX] Regenerate shift test to show constant broadcast comment
llvm-svn: 308323
2017-07-18 16:07:12 +00:00
Simon Pilgrim 483927aefb [x86, CGP] increase memcmp() expansion up to 4 load pairs
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.

Notes:

    Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.

    There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.

    We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:

https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall through the CGP expansion.
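
For illustration only (not part of the commit; the function name is made up),
a small constant-length comparison of the shape below is the kind of call this
expansion targets; on 32-bit x86 an 8-byte compare becomes two 4-byte load
pairs compared inline instead of a libc call:

  #include <string.h>
  /* Hypothetical example: a short, constant-length memcmp used as an
     equality check, which CGP can now expand to scalar loads + compares. */
  int same_header(const char *a, const char *b) {
    return memcmp(a, b, 8) == 0;
  }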

Committed on behalf of Sanjay Patel

Differential Revision: https://reviews.llvm.org/D35067

llvm-svn: 308322
2017-07-18 15:55:30 +00:00
Sumanth Gundapaneni d5aa0f3464 [Hexagon] Emit lookup tables in text section based on a flag
The flag "-hexagon-emit-lut-text" (defaulted to false) is added to decide
on where to keep the switch generated lookup table.
Differential Revision: https://reviews.llvm.org/D34818

llvm-svn: 308316
2017-07-18 15:31:37 +00:00
Nicolai Haehnle a253e4c028 AMDGPU: Fix crash when folding immediates into multiple uses
Summary:
When an immediate is folded by constant folding, we re-scan the entire
use list for two reasons:

1. The constant folding may have created a new use of the same reg.
2. The constant folding may have removed an additional use in the list
   we're currently traversing (e.g., constant folding an S_ADD_I32 c, c).

However, this could previously lead to a crash when an unrelated use was
added twice into the FoldList. Since we re-scan the whole list anyway, we
might as well just clear the FoldList again before we do so.

Using a MIR test to show this because real code seems to trigger the issue
only in connection with some really subtle control flow structures.

Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35416

llvm-svn: 308314
2017-07-18 14:54:41 +00:00
Simon Pilgrim c2cbb525ec [X86] Add optsize and minsize memcmp tests (D35067)
llvm-svn: 308311
2017-07-18 14:26:07 +00:00
Sam Kolton 4685b70a77 [AMDGPU] resubmit r308179: CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
llvm-svn: 308310
2017-07-18 14:23:26 +00:00
Simon Pilgrim 420e5eadc2 [X86] Added cmov target to memcmp test
As discussed by @spatel on D35067:

"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."

llvm-svn: 308309
2017-07-18 14:19:34 +00:00
Daniel Sanders 40b66d646e [globalisel][tablegen] Enable the import of rules involving fma.
Summary:
G_FMA was recently added to GlobalISel which enables the import of rules
involving fma. Add the mapping to allow it.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35130

llvm-svn: 308308
2017-07-18 14:10:07 +00:00
Simon Pilgrim 4793a11df9 [DAGCombine] Fix issue with out of bound constant rotation (PR33828)
Take the modulo of rotations by a constant greater than or equal to the bit-width.
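
A minimal sketch of the identity the combine relies on (function name
hypothetical, not from the commit): rotating an N-bit value by C is equivalent
to rotating by C modulo N, so an out-of-range constant amount can be reduced
instead of asserting:

  #include <stdint.h>
  /* rotl32(x, 37) must behave exactly like rotl32(x, 5). */
  static uint32_t rotl32(uint32_t x, unsigned c) {
    c %= 32;                                   /* reduce out-of-range amounts */
    return c ? (x << c) | (x >> (32 - c)) : x;
  }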

llvm-svn: 308302
2017-07-18 12:31:46 +00:00
Stefan Maksimovic 58f225b371 [mips] Alter register classes for MSA pseudo f16 instructions
This change introduces additional machine instructions in functions
dealing with the expansion of msa pseudo f16 instructions due to
register classes being inappropriate when checked with machine
verifier.

Differential Revision: https://reviews.llvm.org/D34276

llvm-svn: 308301
2017-07-18 12:05:35 +00:00
Simon Pilgrim 0636fbd737 [X86][AVX512] Add ISD::ROTL/ISD::ROTR constant folding tests
llvm-svn: 308295
2017-07-18 11:18:38 +00:00
Simon Pilgrim 8d0fc91adc [X86] Add test case for PR32282
llvm-svn: 308286
2017-07-18 10:09:40 +00:00
Diana Picus da25d5b8b0 [ARM] GlobalISel: Support G_(S|U)REM for s8 and s16
Widen to s32, and then do whatever Lowering/Custom/Libcall action the
subtarget wants.

llvm-svn: 308285
2017-07-18 10:07:01 +00:00
Florian Hahn 3530094de6 [AArch64] Use 16 bytes as preferred function alignment on Cortex-A73.
Summary:
Using 16 byte alignment is beneficial on Cortex-A73, similar to
Cortex-A72 (added in D34961).

Reviewers: mcrosier, t.p.northover, aadg, silviu.baranga

Reviewed By: t.p.northover

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35493

llvm-svn: 308283
2017-07-18 09:31:18 +00:00
Chandler Carruth 3a9968184a Revert part of r308100 since the cause (r308025) was also reverted.
The commit r308100 updated WebAssembly tests for r308025. In one case it
merely made the test more resilient but in another case it made
a substantive update. Because r308025 was reverted in r308271, these
changes to the test also need to be reverted. They should be folded into
the recommit of r308025 when it is ready.

llvm-svn: 308273
2017-07-18 08:20:50 +00:00
Chandler Carruth 0781d52cb3 [x86] Add a missing triple, without which the CPU won't parse.
Notably, this is failing on our PPC build bots:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/8338/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Apr33772.ll

llvm-svn: 308272
2017-07-18 08:16:32 +00:00
Chandler Carruth a15e080b05 Revert r308025 due to uncovering a crash in SelectionDAG. This is filed
with a minimal test case in http://llvm.org/PR33833.

Original commit message:
  Improve Aliasing of operations to static alloca

llvm-svn: 308271
2017-07-18 07:53:47 +00:00
Chandler Carruth 9a7442d088 Revert r308179 which causes tablegen to spam stderr on every build.
Original commit log:
[AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions

llvm-svn: 308270
2017-07-18 07:40:47 +00:00
Craig Topper f54a500101 [X86] Prevent an assertion failure if a gather intrinsic is passed a non-constant scale value.
This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure.

Fixes PR33772.

llvm-svn: 308267
2017-07-18 06:49:23 +00:00
Matt Arsenault e15855d9e3 AMDGPU: Annotate features from x work item/group IDs.
This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.

llvm-svn: 308226
2017-07-17 22:35:50 +00:00
Martin Storsjo 2f24e93481 [AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well
Rename the enum value from X86_64_Win64 to plain Win64.

The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.

Differential Revision: https://reviews.llvm.org/D34474

llvm-svn: 308208
2017-07-17 20:05:19 +00:00
Ulrich Weigand f2968d58cb [SystemZ] Add support for IBM z14 processor (3/3)
This adds support for the new 128-bit vector float instructions of z14.
Note that these instructions actually only operate on the f128 type,
since each 128-bit vector register can hold only one 128-bit
float value.  However, this is still preferable to the legacy 128-bit
float instructions, since those operate on pairs of floating-point
registers (so we can hold at most 8 values in registers), while the
new instructions use single vector registers (so we hold up to 32
values in registers).

Adding support includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions.  This includes allocating the f128
  type now to the VR128BitRegClass instead of FP128BitRegClass.
- Scheduler description support for the instructions.

Note that for a small number of operations, we have no new vector
instructions (like integer <-> 128-bit float conversions), and so
we use the legacy instruction and then reformat the operand
(i.e. copy between a pair of floating-point registers and a
vector register).

llvm-svn: 308196
2017-07-17 17:44:20 +00:00
Ulrich Weigand 33435c4c9c [SystemZ] Add support for IBM z14 processor (2/3)
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.

In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.

llvm-svn: 308195
2017-07-17 17:42:48 +00:00
Ulrich Weigand 2b3482fe85 [SystemZ] Add support for IBM z14 processor (1/3)
This patch series adds support for the IBM z14 processor.  This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.

Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.

llvm-svn: 308194
2017-07-17 17:41:11 +00:00
Mandeep Singh Grang ed64963f1e [llvm] Remove redundant check-prefix=CHECK from tests. NFC.
Reviewers: t.p.northover, oren_ben_simhon, niravd, mcrosier

Reviewed By: oren_ben_simhon, mcrosier

Subscribers: nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35466

llvm-svn: 308193
2017-07-17 17:32:45 +00:00
Krzysztof Parzyszek 5eef92eb7f [Hexagon] Remove custom lowering of loads of v4i16
The target-independent lowering works fine, except for concatenating 32-bit
words. Add a pattern to generate A2_combinew instead of 64-bit asl/or.

llvm-svn: 308186
2017-07-17 15:45:45 +00:00
Simon Pilgrim 948eca371e [X86] Add LEA scheduling tests
llvm-svn: 308180
2017-07-17 14:37:17 +00:00
Sam Kolton a2b9e2f755 [AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
Summary:
Previously, CodeGen checked the first src operand type to determine if omod is supported by an instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but doesn't support omod.
Changed the .td files to check the dst operand type instead of the src operand.

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D35350

llvm-svn: 308179
2017-07-17 14:23:38 +00:00
Simon Pilgrim 1cbe8c2ca5 [X86][AVX512] Add lowering of vXi32/vXi64 ISD::ROTL/ISD::ROTR
Add support for lowering to ISD::ROTL/ISD::ROTR, including rotate by immediate

Differential Revision: https://reviews.llvm.org/D35463

llvm-svn: 308177
2017-07-17 14:11:30 +00:00
Simon Pilgrim 105a3716bb Fixed line endings. NFCI.
llvm-svn: 308175
2017-07-17 13:58:20 +00:00
Simon Pilgrim 11199b2ee5 [X86][AVX] Fix typo in vector rotate tests
Was preventing rotate matching

llvm-svn: 308171
2017-07-17 10:35:51 +00:00
Simon Pilgrim 5aa70e7fe5 [X86][AVX512] Add constant splat vector rotate tests for D35463
llvm-svn: 308169
2017-07-17 10:09:48 +00:00
Simon Pilgrim 701e25edce [X86][AVX512] Regenerate shift tests
llvm-svn: 308168
2017-07-17 09:53:45 +00:00
Dylan McKay 5c8a50bddd [AVR] Add/remove XFAILs to get the backend passing Generic CodeGen tests
A few tests have since been fixed, and a few since now fail.

llvm-svn: 308151
2017-07-16 23:33:50 +00:00
Andrew Zhogin 67a64041b9 [DAGCombiner] Recognise vector rotations with non-splat constants
Fixes PR33691.

Differential revision: https://reviews.llvm.org/D35381

llvm-svn: 308150
2017-07-16 23:11:45 +00:00
Dylan McKay 2c59215ae3 [AVR] Fix a typo in the tests
llvm-svn: 308148
2017-07-16 22:31:07 +00:00
Konstantin Zhuravlyov 2ec725c9d8 AMDGPU: Fix amdgpu-flat-work-group-size/amdgpu-waves-per-eu check
Differential Revision: https://reviews.llvm.org/D35433

llvm-svn: 308147
2017-07-16 19:38:47 +00:00
Simon Pilgrim 2899ec88fc [X86][AVX512] Add 512-bit vector rotate tests
llvm-svn: 308146
2017-07-16 19:26:49 +00:00
Amjad Aboud 4563c062b1 [X86] X86::CMOV to Branch heuristic based optimization.
The LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s), which will later be lowered into an X86::CMOV instruction, assuming no other optimization eliminates the SelectInst.
However, it is not always profitable to emit an X86::CMOV instruction. For example, a branch is preferable to an X86::CMOV instruction when:
1. The branch is well predicted
2. The condition operand is expensive to compute, compared to the true-value and false-value operands

In the CodeGenPrepare pass there is a shallow optimization that tries to convert a SelectInst into a branch, but it is not enough.
This commit implements a machine optimization pass that converts X86::CMOV instruction(s) into branches, based on a conservative heuristic.
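
A hedged illustration of case 2 above (not from the patch; names are
hypothetical): the condition depends on a long-latency load while both values
are already available, so a well-predicted branch can beat a cmov that has to
wait for the load:

  /* Hypothetical example: the heuristic may prefer a branch here because the
     condition operand (*p > 0) is expensive relative to the select values. */
  int pick(const int *p, int a, int b) {
    return (*p > 0) ? a : b;
  }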

Differential Revision: https://reviews.llvm.org/D34769

llvm-svn: 308142
2017-07-16 17:39:56 +00:00
Simon Pilgrim dad2aef037 [X86] Add F16C scheduling tests
llvm-svn: 308138
2017-07-16 14:34:18 +00:00
Simon Pilgrim 6f26f3d07f [X86] Add POPCNT scheduling tests
llvm-svn: 308137
2017-07-16 14:22:39 +00:00
Simon Pilgrim b884b208ee [X86] Add BMI2 scheduling tests
llvm-svn: 308136
2017-07-16 14:09:15 +00:00
Simon Pilgrim dfb6eb279f [X86] Add BMI1 scheduling tests
llvm-svn: 308135
2017-07-16 13:59:44 +00:00
Simon Pilgrim 7194513268 [X86] Add LZCNT scheduling tests
llvm-svn: 308133
2017-07-16 13:40:44 +00:00
Simon Pilgrim 73ef87978f [X86][SSE4A] Add EXTRQ/INSERTQ values to BTVER2 scheduling model
llvm-svn: 308132
2017-07-16 12:06:06 +00:00
Simon Pilgrim 7d43bcfd2d [X86][AVX] Regenerate tests with constant broadcast comments
llvm-svn: 308131
2017-07-16 11:43:16 +00:00
Simon Pilgrim e47df64a18 [X86][AVX] Regenerate vector tzcnt tests with constant broadcast comments
llvm-svn: 308130
2017-07-16 11:40:23 +00:00
Simon Pilgrim 17f20f48c2 [X86][AVX] Regenerate vector idiv tests with constant broadcast comments
llvm-svn: 308129
2017-07-16 11:38:14 +00:00
Simon Pilgrim 77ce072f6b [X86][AVX] Regenerate combine tests with constant broadcast comments
llvm-svn: 308128
2017-07-16 11:36:11 +00:00
Hiroshi Inoue 7f46baff2c fix typos in comments; NFC
llvm-svn: 308127
2017-07-16 08:11:56 +00:00
Simon Pilgrim f9ea0959d9 [X86][AVX] Regenerate tests with constant broadcast comments
llvm-svn: 308110
2017-07-15 21:17:35 +00:00
Simon Pilgrim c2221ee767 [X86][AVX] Regenerate tests with constant broadcast comments
llvm-svn: 308109
2017-07-15 20:28:09 +00:00
Chandler Carruth 85c82841ba [wasm] Update two tests for r308025 which causes scheduling changes due
to the newly improved AA information.

llvm-svn: 308100
2017-07-15 15:44:36 +00:00
Simon Atanasyan f217c7b7e2 [mips] Handle the `long-calls` feature flags in the MIPS backend
If the `long-calls` feature flag is enabled, disable use of the `jal`
instruction. Instead, call a function by first loading its
address into a register and then jumping through the contents of that register.
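
As a hedged sketch (not from the commit; the function name is made up): with
`long-calls` enabled, even a simple direct call like the one below is no longer
emitted as a `jal`, whose 26-bit target must stay within the current 256 MB
region; the callee's address is loaded into a register and the call goes
through that register instead:

  /* Hypothetical example: a callee that may be linked far away from the caller. */
  extern void far_away(void);
  void f(void) { far_away(); }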

Differential revision: https://reviews.llvm.org/D35168

llvm-svn: 308087
2017-07-15 07:14:25 +00:00
Matt Arsenault b34635550a AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
2017-07-15 05:52:59 +00:00
Yonghong Song 9276ef05c8 bpf: generate better lowering code for certain select/setcc instructions
Currently, for code like below,
===
  inner_map = bpf_map_lookup_elem(outer_map, &port_key);
  if (!inner_map) {
    inner_map = &fallback_map;
  }
===
the compiler generates (pseudo) code like the below:
===
  I1: r1 = bpf_map_lookup_elem(outer_map, &port_key);
  I2: r2 = 0
  I3: if (r1 == r2)
  I4:   r6 = &fallback_map
  I5: ...
===

During the kernel verification process, after I1, r1 holds the state
map_ptr_or_null. If the I3 condition is not taken
(path [I1, I2, I3, I5]), r1 should supposedly become map_ptr.
Unfortunately, the kernel does not recognize this pattern
and r1 remains map_ptr_or_null at insn I5. This will cause a
verification failure later on.

The kernel, however, is able to recognize the pattern "if (r1 == 0)"
properly and gives r1 the map_ptr state in the above case.

LLVM here generates suboptimal code which causes the kernel verification
failure. This patch fixes the issue by changing BPF insn pattern
matching and lowering to generate proper code when the right-hand
operand of the above condition is a constant. A test case
is also added.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 308080
2017-07-15 05:41:42 +00:00
Yi Kong 3b680d8d81 [AArch64] Avoid selecting XZR inline ASM memory operand
Restrict the register class to PointerRegClass for memory operands.

Also fix the PointerRegClass for AArch64 from GPR64 to GPR64sp, since
XZR cannot hold a memory pointer while SP can.

Fixes PR33134.
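
A minimal sketch (not from the commit; names are hypothetical) of the kind of
inline-asm memory operand involved: the "m" constraint must be satisfied with a
register that can serve as a base address, which XZR cannot, while SP can:

  /* Hypothetical example: an inline-asm memory operand on AArch64. */
  int load_first(const int *p) {
    int v;
    __asm__("ldr %w0, %1" : "=r"(v) : "m"(*p));
    return v;
  }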

Differential Revision: https://reviews.llvm.org/D34999

llvm-svn: 308060
2017-07-14 21:46:16 +00:00
Geoff Berry b1e8714af9 [AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1)
Summary:
This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor.  It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.

A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.

Reviewers: mcrosier, t.p.northover

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34963

llvm-svn: 308059
2017-07-14 21:44:12 +00:00
Alfred Huang 5b27072f57 [AMDGPU] Do not insert an instruction into worklist twice in movetovalu
In moveToVALU(), when the move to the vector ALU is performed, all
instructions in the use chain will be visited. We do not want the same
node to be pushed onto the visit worklist more than once.

Differential Revision: https://reviews.llvm.org/D34726

llvm-svn: 308039
2017-07-14 17:56:55 +00:00
Krzysztof Parzyszek 9c084fc55d [Hexagon] Add intrinsics for data cache operations
This is the LLVM part, adding definitions for
  void @llvm.hexagon.Y2.dccleana(i8*)
  void @llvm.hexagon.Y2.dccleaninva(i8*)
  void @llvm.hexagon.Y2.dcinva(i8*)
  void @llvm.hexagon.Y2.dczeroa(i8*)
  void @llvm.hexagon.Y4.l2fetch(i8*, i32)
  void @llvm.hexagon.Y5.l2fetch(i8*, i64)
The clang part will follow.

llvm-svn: 308032
2017-07-14 15:58:48 +00:00
Nirav Dave a8f63af9d1 Improve Aliasing of operations to static alloca
Recommitting after adding a check to avoid miscomputing alias information
on addresses with the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until after DAG construction, causing DAGCombiner's
alias check to treat accesses to static allocas as frequently aliasing.
Modify isAlias to treat accesses between static allocas, and between
static allocas and other frame objects, as aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation even though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308025
2017-07-14 13:56:21 +00:00
Zoran Jovanovic 0e03935182 Reverting commit 308011.
llvm-svn: 308017
2017-07-14 10:52:22 +00:00
Zoran Jovanovic d374c5993b [mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Function InRange is changed to avoid left shifting of negative values, since 
that caused some sanitizer tests to fail (so the previous patch 
Differential Revision: https://reviews.llvm.org/D34511

llvm-svn: 308011
2017-07-14 10:13:11 +00:00
Diana Picus 87a7067983 [ARM] GlobalISel: Support G_BRCOND
Insert a TSTri to set the flags and a Bcc to branch based on their
values. This is a bit inefficient in the (common) cases where the
condition for the branch comes from a compare right before the branch,
since we set the flags both as part of the compare lowering and as part
of the branch lowering. We're going to live with that until we settle on
a principled way to handle this kind of situation, which occurs with
other patterns as well (combines might be the way forward here).

llvm-svn: 308009
2017-07-14 09:46:06 +00:00
Sam Parker 2893448576 [ARM] Allow rematerialization of ARM Thumb literal pool loads
Constants are crucial for code size in the ARM Thumb-1 instruction
set. The 16 bit instruction size often does not offer enough space
for immediate arguments. This means that additional instructions are
frequently used to load constants into registers. Since constants are
hoisted, this can lead to significant register spillage if they are
used multiple times in a single function. This can be avoided by
rematerialization, i.e. recomputing a constant instead of reloading
it from the stack. This patch fixes the rematerialization of literal
pool loads in the ARM Thumb instruction set.
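
An illustration under stated assumptions (the constant and function names are
hypothetical, not from the patch): a wide constant that does not fit in a
Thumb-1 immediate is loaded from the literal pool; if it is live across a call,
rematerializing that literal-pool load avoids spilling and reloading it:

  int g(int);
  int f(int x) {
    const int mask = 0x12345678;  /* too wide for a Thumb-1 immediate        */
    int a = g(x & mask);          /* mask is live across the call            */
    return a & mask;              /* second use: recompute rather than reload */
  }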

Patch by Philip Ginsbach

Differential Revision: https://reviews.llvm.org/D33936

llvm-svn: 308004
2017-07-14 08:23:56 +00:00
Matt Arsenault 23e4df6a59 AMDGPU: Detect kernarg segment pointer
This is necessary to pass the kernarg segment pointer
to callee functions. Also don't unconditionally enable
for kernels.

llvm-svn: 307978
2017-07-14 00:11:13 +00:00
Stanislav Mekhanoshin dc2890a887 [AMDGPU] fcanonicalize optimization for GFX9+
Since GFX9 supports denorm modes for v_min_f32/v_max_f32, it
is possible to further optimize fcanonicalize and remove it
when applied to min/max, given that their operands are known not to be
an sNaN or that sNaNs are not supported.

Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.

Differential Revision: https://reviews.llvm.org/D35335

llvm-svn: 307976
2017-07-13 23:59:15 +00:00
Matt Arsenault 6b93046f29 AMDGPU: Annotate call graph with used features
Previously this wouldn't detect used features indirectly
used in callee functions.

llvm-svn: 307967
2017-07-13 21:43:42 +00:00
Andrew Zhogin af3d5fe83b [X86][tests] Added rotate_vec.ll CodeGen test. NFC precommit for bug 33691 fix.
llvm-svn: 307937
2017-07-13 18:57:40 +00:00
Nemanja Ivanovic 3c7e276d24 [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.

Differential Revision: https://reviews.llvm.org/D35007

llvm-svn: 307934
2017-07-13 18:17:10 +00:00
Martin Storsjo 68266faa31 [AArch64] Implement support for windows style vararg functions
Pass parameters properly in calls to such functions (pass all
floats in integer registers), and handle va_start properly (allocate
stack immediately below the arguments on the stack, to save the
register arguments into a single continuous array).
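
A small, hedged example (not from the commit): when targeting AArch64 Windows,
floating-point arguments to a variadic callee like the one below are passed in
integer registers, and va_start points at the contiguous save area described
above:

  #include <stdarg.h>
  /* Hypothetical variadic function; doubles are fetched with va_arg as usual. */
  double sum(int n, ...) {
    va_list ap;
    va_start(ap, n);
    double s = 0;
    for (int i = 0; i < n; i++)
      s += va_arg(ap, double);
    va_end(ap);
    return s;
  }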

Differential Revision: https://reviews.llvm.org/D35006

llvm-svn: 307928
2017-07-13 17:03:12 +00:00
Matthew Simpson 06e6a6bdff [AArch64] Add preliminary support for ARMv8.1 SUB/AND atomics
This patch is a follow-up to r305893 and adds preliminary support for the
fetch_sub and fetch_and operations.

llvm-svn: 307913
2017-07-13 15:01:23 +00:00
Simon Dardis 250256f9c9 Reland "[mips] Fix multiprecision arithmetic."
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.

For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.

Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.

Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.

This resolves PR32713 and PR33424.

Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33494

The previous version of this patch was too aggressive in producing fused
integer multiply-add instructions.

llvm-svn: 307906
2017-07-13 11:28:05 +00:00
Diana Picus c452175642 [ARM] GlobalISel: Support G_BR
This boils down to not crashing in reg bank select due to the lack of
register operands on this instruction, and adding some tests. The
instruction selection is already covered by the TableGen'erated code.

llvm-svn: 307904
2017-07-13 11:09:34 +00:00
Simon Pilgrim bb85cb16e3 [DAGCombiner] Fix issue with rotate combines asserting if the constant value types differ from the result type.
llvm-svn: 307900
2017-07-13 10:41:49 +00:00
Dylan McKay 9fb04071a2 [AVR] Fix indirect calls to function pointers
Patch by Carl Peto.

llvm-svn: 307888
2017-07-13 08:09:36 +00:00
Geoff Berry 6748abe24d [MIR] Add support for printing and parsing target MMO flags
Summary: Add target hooks for printing and parsing target MMO flags.
Targets may override getSerializableMachineMemOperandTargetFlags() to
return a mapping from string to flag value for target MMO values that
should be serialized/parsed in MIR output.

Add implementation of this hook for AArch64 SuppressPair MMO flag.

Reviewers: bogner, hfinkel, qcolombet, MatzeB

Subscribers: mcrosier, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34962

llvm-svn: 307877
2017-07-13 02:28:54 +00:00
Matt Arsenault ce34ac588e AMDGPU: Fix converting unanalyzable global loads to SMRD
Not all memory dependence queries succeed, so this needs to
be conservative if it fails.

llvm-svn: 307861
2017-07-12 23:06:18 +00:00
Sanjay Patel ac29895173 [x86] add select-of-constant tests; NFC
We're using cmov in these cases, but we could reduce to simpler ops.

llvm-svn: 307859
2017-07-12 22:42:39 +00:00
Daniel Neilson 965613ef1b Add element atomic memset intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D34885

llvm-svn: 307854
2017-07-12 21:57:23 +00:00
Stanislav Mekhanoshin 5680b0ca9f [AMDGPU] fcanonicalize elimination optimization
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
It is possible to omit this multiplication if the source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Differential Revision: https://reviews.llvm.org/D35218

llvm-svn: 307848
2017-07-12 21:20:28 +00:00
Sanjay Patel 4450e73b5e [x86] improve SBB optimizations for SETB/SETA with subtract
This is another step towards removing a combine that turns sext
into select of constants and preparing the backend for an IR
future where select is the canonical form.

Earlier commits in this area:
https://reviews.llvm.org/rL306040
https://reviews.llvm.org/rL306072
https://reviews.llvm.org/rL307404 (https://reviews.llvm.org/D34652)
https://reviews.llvm.org/rL307471

llvm-svn: 307821
2017-07-12 17:56:46 +00:00
Sanjay Patel 6d6c06879c [x86] add tests for improving sbb transforms; NFC
We're subtracting X from X the hard way...

llvm-svn: 307819
2017-07-12 17:44:50 +00:00
Justin Bogner 4fc696635d GlobalISel: Handle selection of G_IMPLICIT_DEF in AArch64
A generic variant of IMPLICIT_DEF was added in r306875, but this
survives to selection and hits a `Cannot Select`. Add handling that
converts the node to a regular IMPLICIT_DEF.

llvm-svn: 307817
2017-07-12 17:32:32 +00:00
Evandro Menezes 14ba3d7730 [CodeGen] Add dependency printer
Add SDep printer to make debugging sessions more productive.

Differential revision: https://reviews.llvm.org/D35144

llvm-svn: 307799
2017-07-12 15:30:59 +00:00
Davide Italiano a63981aaa9 [X86/FastIsel] Fall-back to SelectionDAG when lowering soft-floats.
FastIsel can't handle them, so we would end up crashing during
register class selection.
Fixes PR26522.

Differential Revision:  https://reviews.llvm.org/D35272

llvm-svn: 307797
2017-07-12 15:26:06 +00:00
Daniel Neilson 57226ef33c Add element atomic memmove intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34884

llvm-svn: 307796
2017-07-12 15:25:26 +00:00
Simon Pilgrim 8dfbc772d7 [X86][SSE] Fix file check prefix warning breaking buildbots
llvm-svn: 307790
2017-07-12 13:41:13 +00:00
Kamil Rytarowski cce21c1dfe Make shell redirection construct portable
Summary:
NetBSD shell sh(1) does not support ">& /dev/null" construct.
This is bashism. The portable and POSIX solution is to use:
"> /dev/null 2>&1".

This change fixes 22 Unexpected Failures on NetBSD/amd64
for the "check-llvm" target.

Sponsored by <The NetBSD Foundation>

Reviewers: joerg, dim, rnk

Reviewed By: joerg, rnk

Subscribers: rnk, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D35277

llvm-svn: 307789
2017-07-12 13:24:46 +00:00
John Brawn 97cc283117 [ARM] Adjust ifcvt heuristic for the diamond ifcvt case
When we have a diamond ifcvt, the fallthrough block will have a branch at the end
of it that disappears when predicated, so discount it from the predication cost.

Differential Revision: https://reviews.llvm.org/D34952

llvm-svn: 307788
2017-07-12 13:23:10 +00:00
Simon Pilgrim ebbb969d21 [X86][SSE] Add 512-bit (iX bitcast(vXi1)) test cases
Improves test coverage for pre-AVX512 targets as well

llvm-svn: 307783
2017-07-12 12:44:10 +00:00
Diana Picus 21014df5e0 [ARM] GlobalISel: Select s64 G_FCMP
Very similar to how we select s32 G_FCMP, the only thing that is
different is the exact opcodes that we use.

llvm-svn: 307763
2017-07-12 09:01:54 +00:00
Michael Zuckerman fce5c67920 [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess.
Adding a base test for AVX512.

llvm-svn: 307761
2017-07-12 08:01:44 +00:00
Matthias Braun 053b084263 Specify complete target triple in test
This should fix the problems on the greendragon build.

llvm-svn: 307747
2017-07-12 01:16:50 +00:00
Konstantin Zhuravlyov bb80d3e1d3 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.

Differential Revision: https://reviews.llvm.org/D21723

llvm-svn: 307722
2017-07-11 22:23:00 +00:00
Sanjay Patel 7c026cb1af [x86] auto-generate full checks; NFC
llvm-svn: 307718
2017-07-11 22:04:36 +00:00
Michael Zuckerman 1fe5628aa0 reverting 307677.
llvm-svn: 307698
2017-07-11 19:46:11 +00:00
Tony Jiang 892f8c42dc [PPC] Fix one test case regression for patch https://reviews.llvm.org/D34337.
llvm-svn: 307691
2017-07-11 19:07:10 +00:00
Michael Zuckerman 4b6d01a008 [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess.
Base test for AVX512.
Adding a new base test to trunk before committing changes to the test.

llvm-svn: 307677
2017-07-11 17:17:49 +00:00
Krzysztof Parzyszek f67cd8259d [Hexagon] Do not rely on callee-saved info in hasFP
llvm-svn: 307675
2017-07-11 17:11:54 +00:00
Tony Jiang d5acad053b [PPC] Fix two bugs in frame lowering.
1. The program storage region of the red zone available to compilers is 288
 bytes rather than 244 bytes.
2. The formula for the negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1); see the sketch below.
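
A minimal sketch of item 2 (function name hypothetical, not from the patch):
masking rounds a negative stack offset toward minus infinity, whereas the old
form rounded it toward zero and could under-allocate:

  /* n must be a power of two.  align_down(-20, 16) == -32,
     while (-20 + 15) & ~15 == -16, which is not enough space. */
  static int align_down(int x, int n) {
    return x & ~(n - 1);
  }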

Differential Revision: https://reviews.llvm.org/D34337

llvm-svn: 307672
2017-07-11 16:42:20 +00:00
Krzysztof Parzyszek c86e2ef3f5 [Hexagon] Add support for nontemporal loads and stores on HVX
Patch by Michael Wu.

Differential Revision: https://reviews.llvm.org/D35104

llvm-svn: 307671
2017-07-11 16:39:33 +00:00
Diana Picus 1e33c9c166 [ARM] GlobalISel: Tighten G_FCMP selection test. NFC
Use CHECK-NEXT for the comparison sequence, to make sure we don't get
any unexpected instructions in the middle of our flag manipulation
efforts.

llvm-svn: 307656
2017-07-11 12:34:33 +00:00
Guy Blank 509d1b2a5a [X86][AVX512] regenerate avx512-insert-extract.ll
llvm-svn: 307654
2017-07-11 11:51:49 +00:00
Diana Picus 069da27f49 [ARM] GlobalISel: Add reg mapping for s64 G_FCMP
Map the result into GPR and the operands into FPR.

llvm-svn: 307653
2017-07-11 11:47:45 +00:00
Diana Picus 84baba20db [ARM] GlobalISel: Tighten legalizer tests. NFC
Make sure that all the legalizer tests where the original instruction
needs to be removed check for the removal. We do this by adding
CHECK-NOT lines before and after the replacement sequence. This won't
catch pathological cases where the instruction remains somewhere in the
middle of the instruction sequence that's supposed to replace it, but
hopefully that won't occur in practice (since ideally we'd be setting
the insert point for the new instruction sequence either before or after
the original instruction and not fiddle with it while building the
sequence).

llvm-svn: 307647
2017-07-11 10:52:08 +00:00
Diana Picus 443135c6eb [ARM] GlobalISel: Fix oversight in G_FCMP legalization
We used to forget to erase the original instruction when replacing a
G_FCMP true/false. Fix this bug and make sure the tests check for it.

llvm-svn: 307639
2017-07-11 09:43:51 +00:00
Daniel Sanders fe12c0fa56 [globalisel][tablegen] Correct matching of intrinsic ID's.
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.

The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.

llvm-svn: 307634
2017-07-11 08:57:29 +00:00
Diana Picus b57bba8316 [ARM] GlobalISel: Legalize s64 G_FCMP
Same as the s32 version, for both hard and soft float.

llvm-svn: 307633
2017-07-11 08:50:01 +00:00
Serguei Katkov 0e831c996c Revert Revert [MBP] do not rotate loop if it creates extra branch
This is a second attempt to land this patch.

The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.

This is a last fix for the corner case of PR32214. Actually this is not really a corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C, where
H is the header, with a viable fallthrough from the pre-header and an exit from the loop,
M - some middle block,
B - the backedge to the header, but also with an exit from the loop,
C - some cold block of the loop.

Let's say H is determined to be the best exit. If we do a loop rotation to M, B, C, H we can introduce an extra branch.
Let's compute the change in the number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is one
-1 branch from cold block to header if there is one

So if C is not a predecessor of H then we introduce an extra branch.

This change prohibits rotation of the loop if both of the following are true:
  The best exit has the next element in the chain as a successor.
  The last element in the chain is not a predecessor of the first element of the chain.

Reviewers: iteratee, xur, sammccall, chandlerc	
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745

llvm-svn: 307631
2017-07-11 08:34:58 +00:00
Igor Breger 324d3791f8 [GlobalISel][X86] Use correct AND instructions.
AND8ri8 is not supported in 64-bit mode.

llvm-svn: 307630
2017-07-11 08:04:51 +00:00
Serguei Katkov 0b7b59ada3 [CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all the instructions computing the address for a memory instruction are in the
same block as the memory instruction itself.

However, if any of these instructions is placed after the memory instruction, then
the address calculation will not be folded into the memory instruction.

The added test case shows an example.

Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862

llvm-svn: 307628
2017-07-11 06:24:44 +00:00
Dylan McKay 9cf1dc1e0f [AVR] Use the generic branch relaxer
llvm-svn: 307617
2017-07-11 04:17:13 +00:00
Matthias Braun b38736706e Revert "[DAG] Improve Aliasing of operations to static alloca"
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.

This reverts commit r307546.

llvm-svn: 307589
2017-07-10 20:51:30 +00:00
Matt Arsenault 9cff06f37b AMDGPU: Allow SIShrinkInstructions to fold FrameIndexes
llvm-svn: 307576
2017-07-10 20:04:35 +00:00
Matt Arsenault 6c29c5acfe AMDGPU: Allow SIShrinkInstructions to work in non-SSA
Immediates can be folded as long as the immediate is a vreg.

Also undo commuting instructions if it didn't fold an immediate.

llvm-svn: 307575
2017-07-10 19:53:57 +00:00
Krzysztof Parzyszek df4a05d6fb [Hexagon] Fix check for HMOTF_ConstExtend operand flag
This fixes https://llvm.org/PR33718.

llvm-svn: 307566
2017-07-10 18:38:52 +00:00
Krzysztof Parzyszek 0ac065f318 [Hexagon] Handle Hexagon-specific machine operand target flags in MIR
llvm-svn: 307564
2017-07-10 18:31:02 +00:00
Tony Jiang acefbcf38e [PPC CodeGen] Expand the bitreverse.i64 intrinsic.
Differential Revision: https://reviews.llvm.org/D34908
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093

llvm-svn: 307563
2017-07-10 18:11:23 +00:00
Lei Huang 168d14b143 [PowerPC] Reduce register pressure by not materializing a constant just for use as an index register for X-Form loads/stores.
For this example:
float test (int *arr) {
    return arr[2];
}

We currently generate the following code:
  li r4, 8
  lxsiwax f0, r3, r4
  xscvsxdsp f1, f0

With this patch, we will now generate:
  addi r3, r3, 8
  lxsiwax f0, 0, r3
  xscvsxdsp f1, f0

Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027

llvm-svn: 307553
2017-07-10 16:44:45 +00:00
Andrew V. Tischenko ae9d6db769 [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573).
The new version of the model is definitely faster.

Differential Revision:
https://reviews.llvm.org/D35198

llvm-svn: 307552
2017-07-10 16:36:03 +00:00
Nirav Dave 163e1ad9dc [DAG] Improve Aliasing of operations to static alloca
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until after DAG construction, causing DAGCombiner's
alias check to treat accesses to static allocas as frequently aliasing.
Modify isAlias to treat accesses between static allocas, and between
static allocas and other frame objects, as aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation even though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 307546
2017-07-10 15:39:41 +00:00
Gadi Haber f4d154c089 This patch completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target.
The SandyBridge architects have provided us with more accurate information about each instruction's latency, number of uOps and used ports, and I used it to replace the existing estimated SNB instruction scheduling and to add missing scheduling information.

Please note that the patch extensively affects the X86 MC instr scheduling for SNB.

Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.

The updated and extended information about each instruction includes the following details:
• static latency of the instruction
• number of uOps the instruction consists of
• all ports used by the instruction's uOps

For example, the following code dictates that the instructions ADC64mr, ADC8mr, SBB64mr and SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use port 4, ports 2 or 3, port 0, and ports 0, 1 or 5:

def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];

}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;

Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script.

Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb

Differential Revision:  https://reviews.llvm.org/D35019#inline-304691

llvm-svn: 307529
2017-07-10 09:53:16 +00:00
Igor Breger d8b51e134e [GlobalISel][X86] Support G_LOAD/G_STORE i1.
Summary: Support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35178

llvm-svn: 307527
2017-07-10 09:26:09 +00:00
Igor Breger d48c5e4855 [GlobalISel][X86] extend G_ZEXT support.
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16, and i8 to i16, as legal.
Support G_ZEXT i1 to i8/i16 instruction selection (C++ code).
This patch is required to support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D35177

llvm-svn: 307526
2017-07-10 09:07:34 +00:00
Davide Italiano c4b0ccd049 [X86] Relax an assertion when legalizing vector types.
WidenVSELECTAndMask can fold (and it folds in this case), so we
get a BUILD_VECTOR of constants as the mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end, but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.

Fixes PR33715.

llvm-svn: 307508
2017-07-09 19:22:48 +00:00
Dylan McKay 448c56e2a5 [AVR] Fix test errors due to tied operands not matching
Broken due to r307259.

llvm-svn: 307503
2017-07-09 16:36:35 +00:00
Simon Pilgrim 55a4b6700f Handle ConstantExpr correctly in SelectionDAGBuilder
This change fixes a bug in SelectionDAGBuilder::visitInsertValue and SelectionDAGBuilder::visitExtractValue where constant expressions (InsertValueConstantExpr and ExtractValueConstantExpr) would be treated as non-constant instructions (InsertValueInst and ExtractValueInst). This bug resulted in an incorrect memory access, which manifested as an assertion failure in SDValue::SDValue.

Fixes PR#33094.

Submitted on behalf of @Praetonus (Benoit Vey)

Differential Revision: https://reviews.llvm.org/D34538

llvm-svn: 307502
2017-07-09 16:01:04 +00:00
Simon Pilgrim 8247687e0f [X86][AVX512] Regenerate AVX512VL comparison tests.
Show poor codegen on KNL targets as mentioned on D35179

llvm-svn: 307500
2017-07-09 15:47:43 +00:00
Igor Breger 769cd05232 [GlobalISel][X86] Add legalizer tests for G_LOAD/G_STORE operations. NFC.
llvm-svn: 307494
2017-07-09 07:25:57 +00:00
Igor Breger b80b44b7b9 [FastISel] fix a fallback diagnostic.
Summary: FastISel was marked as having failed even in cases where instruction selection succeeded.

Reviewers: qcolombet, zvi, rovka, ab

Reviewed By: zvi

Subscribers: javed.absar, ab, qcolombet, bogner, llvm-commits

Differential Revision: https://reviews.llvm.org/D34438

llvm-svn: 307489
2017-07-09 05:55:20 +00:00
Hiroshi Inoue 713b5ba2de fix trivial typos; NFC
sucessor -> successor 

llvm-svn: 307488
2017-07-09 05:54:44 +00:00
Sanjay Patel 18ee908ca2 [x86] add SBB optimization for SETBE (ule) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. Selecting 0 or -1 
needs extra attention to produce the optimal code as shown here.

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
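
As an illustrative sketch (hand-written, value names hypothetical; not taken from the patch's tests), the general shape of the ule select-of-constants this combine targets is:

  %cmp = icmp ule i32 %x, %y
  %sel = select i1 %cmp, i32 -1, i32 0
  ; selecting 0 or -1 on an unsigned comparison is the case that can use an sbb-based sequence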

Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)

As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.

llvm-svn: 307471
2017-07-08 14:04:48 +00:00
Quentin Colombet 868ef847a6 [RegAllocFast] Don't insert kill flags of super-register for partial kill
When reusing a register for a new definition, the fast register allocator
used to insert a kill flag at the previous last use of that register to
inform later passes that this register is free between the redef and the
last use. However, this may be wrong when subregisters are involved.
Indeed, a partial redef would trigger a kill of the full super
register, potentially wrongly marking all the other subregisters as
free. Given that we don't track which lanes are still live, we cannot set the
kill flag in such a case.

Note: This bug has been latent for about 7 years (r104056).

llvm.org/PR33677

llvm-svn: 307428
2017-07-07 19:25:45 +00:00
Quentin Colombet 81551148b7 [RegAllocFast] Add the proper initialize method to use the .mir infrastructure
NFC

llvm-svn: 307427
2017-07-07 19:25:42 +00:00
Tony Jiang c260e0eb56 [PPC CodeGen] Expand the bitreverse.i32 intrinsic.
Differential Revision: https://reviews.llvm.org/D33572
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093
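
For reference, a hand-written example (value names illustrative) of the intrinsic call this patch expands:

  %r = call i32 @llvm.bitreverse.i32(i32 %x)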

llvm-svn: 307413
2017-07-07 16:41:55 +00:00
Sanjay Patel dd36f75733 [x86] add SBB optimization for SETAE (uge) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. DAGCombiner already 
has the foundation to allow the transforms, so we just need to fill in the holes 
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072

Differential Revision: https://reviews.llvm.org/D34652

llvm-svn: 307404
2017-07-07 14:56:20 +00:00
Andrew V. Tischenko a2ab3ed0df NFC: I simply added CHECK-LABEL to prevent false matches in the tests.
llvm-svn: 307397
2017-07-07 13:41:33 +00:00
Florian Hahn d4550baf3b [AArch64] Use 16 bytes as preferred function alignment on Cortex-A57.
Summary:
This change gives a 0.89% speedup in execution time, a 0.94% improvement
in benchmark scores and a 0.62% increase in binary size on a Cortex-A57.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.

The software optimization guide for the Cortex-A57 recommends 16 byte
branch alignment.

Reviewers: t.p.northover, mcrosier, javed.absar, kristof.beyls, sbaranga

Reviewed By: kristof.beyls

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D34954

llvm-svn: 307389
2017-07-07 10:43:01 +00:00
Florian Hahn e3666ec9d6 [AArch64] Use 16 bytes as preferred function alignment on Cortex-A72.
Summary:
This change gives a 0.34% speedup in execution time, a 0.61% improvement
in benchmark scores and a 0.57% increase in binary size on a Cortex-A72.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.

The software optimization guide for the Cortex-A72 recommends 16 byte
branch alignment.


Reviewers: t.p.northover, kristof.beyls, rengolin, sbaranga, mcrosier, javed.absar

Reviewed By: kristof.beyls

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D34961

llvm-svn: 307380
2017-07-07 10:15:49 +00:00
Florian Hahn 9872a6aaad [AArch64] Add test case for preferred function alignment (NFC).
Reviewers: evandro, joelkevinjones, mcrosier

Reviewed By: joelkevinjones, mcrosier

Subscribers: mcrosier, aemerson, llvm-commits, rengolin, evandro, javed.absar, joelkevinjones, kristof.beyls

Differential Revision: https://reviews.llvm.org/D34951

llvm-svn: 307369
2017-07-07 09:17:53 +00:00
Diana Picus 5b91653840 [ARM] GlobalISel: Select hard G_FCMP for s32
We lower to a sequence consisting of:
- MOVi 0 into a register
- VCMPS to do the actual comparison and set the VFP flags
- FMSTAT to move the flags out of the VFP unit
- MOVCCi to either use the "zero register" that we have previously set
  with the MOVi, or move 1 into the result register, based on the values
  of the flags

As was the case with soft-float, for some predicates (one, ueq) we
actually need two comparisons instead of just one. When that happens, we
generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of
using the result of the first MOVCCi as the "zero register" for the
second one. This is a bit overkill, since one comparison followed by
two non-flag-setting conditional moves should be enough. In any case,
the backend manages to CSE one of the comparisons away so it doesn't
matter much.
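
For instance (hand-written IR, names illustrative), a predicate that takes the two-comparison path described above:

  %cmp = fcmp one float %a, %b   ; ordered-and-not-equal needs two VCMPS-FMSTAT-MOVCCi sequences
  %res = zext i1 %cmp to i32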

Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not
VCMPES. This makes the code a lot simpler, and it also seems correct
since the LLVM Lang Ref defines simple true/false returns if the
operands are QNaNs. For SNaNs, even VCMPS throws an Invalid Operand
exception, so they won't be slipping through unnoticed.

Implementation-wise, this introduces a template so we can share the same
code that we use for handling integer comparisons, since the only
differences are in the details (exact opcodes to be used etc). Hopefully
this will be easy to extend to s64 G_FCMP.

llvm-svn: 307365
2017-07-07 08:39:04 +00:00
Matthias Braun eeb1516884 RegisterScavenging: Fix PR33687
When scavenging for a use in instruction MI, we will reload after
that instruction and hence cannot spill uses/defs of this instruction.

This fixes http://llvm.org/PR33687

llvm-svn: 307352
2017-07-07 03:02:18 +00:00
Sean Fertile 9cd1cdf814 Extend memcpy expansion in Transform/Utils to handle wider operand types.
Adds loop expansions for known-sized and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and match the existing behaviour
if they aren't overridden by the target.
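
As a rough illustration (hand-written; names hypothetical), an unknown-size call that this expansion can turn into a load/store loop with target-chosen operand types:

  declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1)

  define void @copy(i8* %dst, i8* %src, i64 %n) {
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i32 1, i1 false)
    ret void
  }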

Differential revision: https://reviews.llvm.org/D32536

llvm-svn: 307346
2017-07-07 02:00:06 +00:00
Michael Kuperstein 20d8e4ef76 Reverting r307326 because it breaks clang tests.
llvm-svn: 307334
2017-07-06 23:24:39 +00:00
Wei Mi 20526b2725 [ConstHoisting] choose to hoist when frequency is the same.
The patch adjusts the strategy of frequency-based consthoisting:
Previously, when the candidate block had the same frequency as the existing
blocks containing a const, the const would not be hoisted to the candidate block.
For that case, we now change the strategy to hoist the const only if the existing
blocks have more than one member. This is helpful for reducing code size.

Differential Revision: https://reviews.llvm.org/D35084

llvm-svn: 307328
2017-07-06 22:32:27 +00:00
Michael Kuperstein b9fc48da83 [NVPTX] Add lowering of i128 params.
The patch adds support for lowering i128 params. The changes are quite trivial,
supporting i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].

Currently, NVPTX can't deal with 128-bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases by emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
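
A minimal hand-written example (function name illustrative) of the kind of signature this lowering handles:

  define void @store128(i128 %x, i128* %p) {
    store i128 %x, i128* %p
    ret void
  }
  ; the i128 param is now lowered like a 16-byte aggregate, i.e. .param .b8 _ [16]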

Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)

llvm-svn: 307326
2017-07-06 22:18:54 +00:00
Matt Arsenault 9aa45f047f AMDGPU: Add macro fusion schedule DAG mutation
Try to increase opportunities to shrink vcc uses.

llvm-svn: 307313
2017-07-06 20:57:05 +00:00
Matt Arsenault 60b91e0ba2 AMDGPU: Remove unnecessary IR from MIR tests
llvm-svn: 307311
2017-07-06 20:56:57 +00:00
Stanislav Mekhanoshin 9d7b1c9ddb [AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math,
we are producing rather long code for fdiv via the amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, and is thus believed to be fast.

An fdiv instruction can also have the fast math flag, either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both the metadata and the fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

The current implementation ignores the fast flag and favors the metadata. An
instruction with just the fast flag would be lowered to the fastest rcp +
mul, but that never happens in practice because of the mutual clang and BE
behavior described above.

This change allows an "fdiv fast" to be always lowered as rcp + mul.
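
Conceptually (a sketch, not the exact emitted sequence), the fast path amounts to:

  %rcp = call float @llvm.amdgcn.rcp.f32(float %0)
  %div = fmul fast float %v, %rcp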

Differential Revision: https://reviews.llvm.org/D34844

llvm-svn: 307308
2017-07-06 20:34:21 +00:00
Simon Pilgrim a80cb1d7a7 [X86][SSE] Tests for bitcasting iX integers to vXi1 boolean vectors
Including sign/zero extension to legal types

llvm-svn: 307301
2017-07-06 19:33:10 +00:00
Simon Pilgrim 0fee3372c9 [X86][SSE] Dropped -mcpu from bitcast+setcc tests
Use triple and attribute only for consistency

Added SSE2/AVX tests on 256-bit vectors to test PACKSS behaviour

llvm-svn: 307289
2017-07-06 18:27:34 +00:00
Wei Mi 90707394e3 [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
When the formulae search space is huge, LSR uses a series of heuristics to keep
pruning the search space until the number of possible solutions is within a
certain limit.

The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register that is used by the most LSRUses and deletes the other
formulae that don't use the register. This is an effective way to prune the search
space, but quite often not a good way to keep the best solution. We have seen cases
before where the heuristic pruned the best formula candidate out of the search space.

To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is that, in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose from, we have a better opportunity to find the best solution.
That is why we believe pruning the search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. We use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criterion is to select the formula using fewer non-shared registers, and the second
criterion is to select the formula with the lower cost computed by RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.

Testing shows we get 1.8% and 2% improvements on two internal benchmarks on x86. LLVM
nightly testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.

Differential Revision: https://reviews.llvm.org/D34583

llvm-svn: 307269
2017-07-06 15:52:14 +00:00
Simon Pilgrim 713600747e [X86][SSE4A] Add support for shuffle combining to INSERTQI.
llvm-svn: 307268
2017-07-06 15:34:17 +00:00
Simon Pilgrim 03641df383 [X86][SSE4A] Add test showing missed opportunities to combine INSERTQI shuffle
llvm-svn: 307265
2017-07-06 14:52:24 +00:00
Sanjay Patel 2a341620e7 [x86] fix over-specified triple and auto-generate checks; NFC
llvm-svn: 307262
2017-07-06 14:15:15 +00:00
Mikael Holmen 9c3e2eac6a [MachineVerifier] Add check that tied physregs aren't different.
Summary: Added MachineVerifier code to check register ties more thoroughly, especially so that physical registers that are tied are the same. This may help e.g. when creating MIR files.

Original patch by Jesper Antonsson

Reviewers: stoklund, sanjoy, qcolombet

Reviewed By: qcolombet

Subscribers: qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D34394

llvm-svn: 307259
2017-07-06 13:18:21 +00:00
Simon Pilgrim cc0f785dca [X86][SSE4A] Add support for shuffle combining to EXTRQ.
llvm-svn: 307254
2017-07-06 12:22:58 +00:00
Simon Pilgrim 40c0ae200f [X86][SSE4A] Add scheduling tests for SSE4A instructions
llvm-svn: 307251
2017-07-06 11:26:43 +00:00
David Stuttard 7528d4bd42 [RegisterCoalescer] Fix for SubRange join unreachable
Summary:
During remat, some subranges might end up having invalid segments which caused problems for later
coalescing.

Added a check to remove segments that are invalidated as part of the remat.

See http://llvm.org/PR33524

Subscribers: MatzeB, qcolombet

Differential Revision: https://reviews.llvm.org/D34391

llvm-svn: 307247
2017-07-06 10:07:57 +00:00
Diana Picus c3a9c34761 [ARM] GlobalISel: Map s32 G_FCMP in reg bank select
Map hard G_FCMP operands to FPR and the result to GPR.

llvm-svn: 307245
2017-07-06 09:57:46 +00:00
Diana Picus d0104eaae8 [ARM] GlobalISel: Legalize G_FCMP for s32
This covers both hard and soft float.

Hard float is easy, since it's just Legal.

Soft float is more involved, because there are several different ways to
handle it based on the predicate: one and ueq need not only one, but two
libcalls to get a result. Furthermore, we have large differences between
the values returned by the AEABI and GNU functions.

AEABI functions return a nice 1 or 0, representing true and false
respectively. GNU functions generally return a value that needs to be compared
against 0 (e.g. for ogt, the value returned by the libcall is > 0 for
true).  We could introduce redundant comparisons for AEABI as well, but
they don't seem easy to remove afterwards, so we do different processing
based on whether or not the result really needs to be compared against
something (and just truncate if it doesn't).

llvm-svn: 307243
2017-07-06 09:09:33 +00:00
Diana Picus cd460c89c4 [ARM] GlobalISel: Widen s1, s8, s16 G_CONSTANT
Get the legalizer to widen small constants.

llvm-svn: 307239
2017-07-06 08:04:16 +00:00
Vadim Chugunov e6f76558c7 Fix libcall expansion creating DAG nodes with invalid type post type legalization.
If we are lowering a libcall after legalization, we'll split the return type into a pair of legal values.

Patch by Jatin Bhateja and Eli Friedman.

Differential Revision: https://reviews.llvm.org/D34240

llvm-svn: 307207
2017-07-05 22:01:49 +00:00
Simon Pilgrim ac78daf517 [DAGCombiner] Fold (rot x, 0) -> x
llvm-svn: 307184
2017-07-05 18:27:11 +00:00
Simon Pilgrim 49123d4bb0 [X86] Test bitfield loadstore tests on i686 as well
llvm-svn: 307182
2017-07-05 18:09:30 +00:00
Sean Fertile d44cb1838f [PowerPC] Make sure that we remove dead PHI nodes after the PPCCTRLoops pass.
Committing on behalf of Stefan Pintilie.
Differential Revision: https://reviews.llvm.org/D34829

llvm-svn: 307180
2017-07-05 17:57:57 +00:00
Andrew Zhogin 45d192823e [DAGCombiner] visitRotate patch to optimize pair of ROTR/ROTL instructions into one with combined shift operand.
For two ROTR operations with shifts C1 and C2, the combined shift operand will be (C1 + C2) % bitsize.
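
For example, with illustrative values on a 32-bit rotate:

  rotr (rotr x, 20), 24  ->  rotr x, (20 + 24) % 32  =  rotr x, 12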

Differential revision: https://reviews.llvm.org/D12833

llvm-svn: 307179
2017-07-05 17:55:42 +00:00
Simon Pilgrim 55006b407b [X86][SSE] Dropped -mcpu from bitcast+setcc mask tests
Use triple and attribute only for consistency

llvm-svn: 307176
2017-07-05 17:30:30 +00:00
Tony Jiang aa5a6a1c30 [Power9] Exploit vector extract with variable index.
This patch adds exploitation of the new Power 9 instructions which extract
variable elements from vectors (a small IR example follows the list):
VEXTUBLX
VEXTUBRX
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
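
A hand-written IR example (names illustrative) of a variable-index extract that these instructions can serve:

  %elt = extractelement <16 x i8> %vec, i32 %idx   ; %idx is not a compile-time constant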

Differential Revision: https://reviews.llvm.org/D34032
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)

llvm-svn: 307174
2017-07-05 16:55:00 +00:00
Tony Jiang 9a91a18110 [Power9] Exploit vector integer extend instructions when indices aren't correct.
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign
extended vector extract elements where the indices used by the vector extract
are not correct. We can still use the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.

Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009

llvm-svn: 307169
2017-07-05 16:00:38 +00:00
Nirav Dave 65b7ab1be4 [Hexagon] Preclude non-memory test from being optimized away. NFC.
llvm-svn: 307153
2017-07-05 13:08:03 +00:00
Igor Breger 55e2f5963a [GlobalIsel] allow x86_fp80 values to be dumped.
Summary:
Otherwise the fallback path fails with an assertion on x86_64 targets,
when "x86_fp80" is encountered.

Reviewers: t.p.northover, zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34975

llvm-svn: 307140
2017-07-05 11:11:10 +00:00
Nemanja Ivanovic 5fd4ea36fd Add the missing triple to the test case added as part of r307120.
llvm-svn: 307122
2017-07-05 05:14:43 +00:00
Nemanja Ivanovic 845a7968bc [PowerPC] Fix for PR33636
Remove casts to a constant when a node can be an undef.

Differential Revision: https://reviews.llvm.org/D34808

llvm-svn: 307120
2017-07-05 04:51:29 +00:00
Nirav Dave b320ef9fab Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
Relanding after rewriting the undef.ll test to avoid host-dependent
endianness.

As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.

Tests of note:

  * test/CodeGen/X86/build-vector* - Improved.
  * test/CodeGen/BPF/undef.ll - Improved store alignment allows an
    additional store merge

  * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
    case we already do not handle well. Here, the DAG is improved, but
    scheduling causes a code size degradation.

Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D34472

llvm-svn: 307114
2017-07-05 01:21:23 +00:00
Dylan McKay a24aa19900 Revert "[AVR] Add the branch selection pass from the GitHub repository"
This reverts commit 602ef067c1d58ecb425d061f35f2bc4c7e92f4f3.

llvm-svn: 307111
2017-07-05 00:50:56 +00:00
Dylan McKay f115c7f917 [AVR] Add the branch selection pass from the GitHub repository
We should rewrite this using the generic branch relaxation pass, but for
the moment having this pass is better than hitting an assertion error.

llvm-svn: 307109
2017-07-05 00:41:19 +00:00
Gadi Haber 689426e3cb NFC.
Made some updates to the half.ll test under CodeGen to make it friendly to the update_llc_test_checks.py tool as follows:
1. Removing the llc flag -asm-verbose=false
2. Grouping the multiple check-prefix directives
3. Applying the update_llc_test_checks.py tool on the test

This change is needed to easily update scheduling changes in an upcoming patch.

Reviewers: zvi, RKSimon, craig.topper 

Differential Revision: https://reviews.llvm.org/D34934

llvm-svn: 307108
2017-07-04 21:51:05 +00:00
Andrew Zhogin 2f8be0552e [ARM][test] Added test/CodeGen/ARM/ror.ll test. NFC precommit for D12833.
llvm-svn: 307103
2017-07-04 19:50:22 +00:00
Simon Pilgrim ac3e7f3f57 [X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shuffles
With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types

llvm-svn: 307099
2017-07-04 18:11:02 +00:00
Alexander Timofeev 982aee6a38 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307097
2017-07-04 17:32:00 +00:00
Anna Thomas 505941e7d6 [FastISel] Move gc intrinsic test to X86 directory
Move from the generic to the X86 directory since gc intrinsics are only supported on
X86 64-bit.
Add a target triple as well.
Fixes build failure in i686-linux-RA caused by rL307084.

llvm-svn: 307086
2017-07-04 15:24:08 +00:00
Anna Thomas a66a98cc74 [FastISel][SelectionDAG]Teach fastISel about GC intrinsics
Summary:
We are crashing in LLC at O0 when gc intrinsics are present in the block.
The reason is that FastISel performs basic block ISel by modifying GC.relocates
to be the first instruction in the block. This can cause us to visit the GC
relocate before its corresponding GC.statepoint is visited, which is incorrect.
When we lower the statepoint, we record the base and derived pointers, along
with the gc.relocates. After this we can visit the gc.relocate.

This patch avoids fastISel from incorrectly creating the block with gc.relocate
as the first instruction.

Reviewers: qcolombet, skatkov, qikon, reames

Reviewed by: skatkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34421

llvm-svn: 307084
2017-07-04 15:09:09 +00:00
Simon Pilgrim d128222f0c [X86] Add combine tests for vector rotates
Reference tests for D12833

llvm-svn: 307073
2017-07-04 12:33:53 +00:00
Gadi Haber 4980790e81 NFC commit.
Converting the Codegen test "extractelement-legalization-store-ordering.ll" to be "update_llc_test_checks" friendly.

The changes to the test are needed for an upcoming scheduling patch.

Reviewers: zvi, RKSimon

Differential Revision: https://reviews.llvm.org/D34935

llvm-svn: 307066
2017-07-04 07:18:03 +00:00