Commit Graph

23501 Commits

Author SHA1 Message Date
Marek Olsak 7d92b7e23a AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Author: Bas Nieuwenhuizen

https://reviews.llvm.org/D42881

llvm-svn: 324353
2018-02-06 15:17:55 +00:00
Krzysztof Parzyszek 88f11003a0 [Hexagon] Split HVX operations on vector pairs
Vector pairs are legal types, but not every operation can work on pairs.
For those operations that are legal for single vectors, generate a concat
of their results on pair halves.

llvm-svn: 324350
2018-02-06 14:24:57 +00:00
Krzysztof Parzyszek 69f1d7e370 [Hexagon] Handle lowering of SETCC via setCondCodeAction
It was expanded directly into instructions earlier. That was to avoid
loads from a constant pool for a vector negation: "xor x, splat(i1 -1)".
Implement ISD opcodes QTRUE and QFALSE to denote logical vectors of
all true and all false values, and handle setcc with negations through
selection patterns.

llvm-svn: 324348
2018-02-06 14:16:52 +00:00
Simon Pilgrim ae00a71f55 [X86][SSE] Add PACKUS support for truncation of clamped values
Followup to D42544 that matches PACKUSWB cases for non-AVX512, SSE and PACKUSDW cases will have to wait until we can add support for general SMIN/SMAX matching.

llvm-svn: 324347
2018-02-06 14:07:46 +00:00
Tim Renouf 807ecc3d66 [AMDGPU] do not generate .AMDGPU.config for amdpal os type
Summary:
Now we generate PAL metadata for the amdpal os type, there is no need to
generate the .AMDGPU.config section.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37760

Change-Id: I303c5fad66656ce97293da60621afac6595b4c18
llvm-svn: 324346
2018-02-06 13:39:38 +00:00
Simon Pilgrim 90a237bf83 [X86][SSE] Add PACKSS support for truncation of clamped values
Followup to D42544 that matches PACKSSWB cases for non-AVX512, SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching.

llvm-svn: 324339
2018-02-06 12:16:10 +00:00
Sjoerd Meijer 89ea2648bb [ARM] Armv8.2-A FP16 code generation (part 3/3)
This adds most of the FP16 codegen support, but these areas need further work:

- FP16 literals and immediates are not properly supported yet (e.g. literal
  pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
  added.

This will be addressed in follow-up patches.

Differential Revision: https://reviews.llvm.org/D42849

llvm-svn: 324321
2018-02-06 08:43:56 +00:00
Clement Courbet 333be329c4 Revert "[MergeICmps] Enable the MergeICmps Pass by default."
Breaks clang-ppc64be-linux-multistage buildbot.

This reverts commit 515bab711f308c2e8299c49dd8c84ea6a2e0b60e.

llvm-svn: 324319
2018-02-06 08:40:18 +00:00
Clement Courbet 7d09780fa2 [MergeICmps] Enable the MergeICmps Pass by default.
Summary: Now that PR33325 is fixed, this should always improve the generated code.

Reviewers: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42793

llvm-svn: 324317
2018-02-06 07:20:33 +00:00
Craig Topper 94235556aa [X86] Modify a few tests to not use icmps that are provably false.
These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero.

I plan to teach DAG combine to optimize these so need to stop using them.

llvm-svn: 324315
2018-02-06 06:44:05 +00:00
Konstantin Zhuravlyov 8818d13ed2 AMDGPU/MemoryModel: Fix monotonic atomic loads
Those should have glc bit set for system and agent synchronization scopes

llvm-svn: 324314
2018-02-06 04:06:04 +00:00
Derek Schuff dc51fb4919 [WebAssembly] Fix test expectations after r324274
Wasm uses the expand action for several FP compare ops, and that behavior
changed.

llvm-svn: 324305
2018-02-06 01:21:17 +00:00
Reid Kleckner acb31b92ee Update test expectations after reverting PLT change
llvm-svn: 324304
2018-02-06 00:56:06 +00:00
Reid Kleckner 697d1bc236 Revert "Don't assume a null GV is local for ELF and MachO."
This reverts r323297.

It breaks building grub.

llvm-svn: 324301
2018-02-06 00:47:14 +00:00
Craig Topper 9198efceb8 [X86] Auto-generate complete checks. NFC
llvm-svn: 324295
2018-02-05 23:57:03 +00:00
Craig Topper 9c6c7c5e9b [X86] Relax restrictions on what setcc condition codes can be folded with a sext when AVX512 is enabled.
We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX.

llvm-svn: 324294
2018-02-05 23:57:01 +00:00
Sanjay Patel d7c702b451 [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base 
address offsets.

SPEC2017 on Ryzen shows no significant perf difference.

Differential Revision: https://reviews.llvm.org/D42607

llvm-svn: 324289
2018-02-05 23:43:05 +00:00
Francis Visoiu Mistrih 3c748e55d5 [PEI] Fix failing test caused by r324283
X86FrameLowering sets stack size to 0 if redzone is enabled.

llvm-svn: 324285
2018-02-05 23:06:47 +00:00
Paul Robinson 0a22709f06 [DWARF] Regularize dumping strings from line tables.
The major visible difference here is that in line-table dumps,
directory and file names are wrapped in double-quotes; previously,
directory names got single quotes and file names were not quoted at
all.

The improvement in this patch is that when a DWARF v5 line table
header has indirect strings, in a verbose dump these will all have
their section[offset] printed as well as the name itself.  This
matches the format used for dumping strings in the .debug_info
section.

Differential Revision: https://reviews.llvm.org/D42802

llvm-svn: 324270
2018-02-05 20:43:15 +00:00
Nirav Dave eedb663221 [X86] Teach DAG unfoldMemoryOperand to reconvert CMPs to tests
Summary:
Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold.
This fixes a regression from upcoming D41293 change.

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42808

llvm-svn: 324261
2018-02-05 18:58:58 +00:00
Craig Topper 9a06f24704 [X86] Artificially lower the complexity of the scalar ANDN patterns so that AND with immediate will match first.
This allows the immediate to folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings since the and can sign extend an immediate.

This also allows us to match an and to a movzx after a not.

This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT.

llvm-svn: 324260
2018-02-05 18:31:04 +00:00
Craig Topper 57e0643160 [X86] Teach X86DAGToDAGISel::shrinkAndImmediate to preserve upper 32 zeroes of a 64 bit mask.
If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the impliciting zeroing of 32 bit ops.

This patch teachs shrinkAndImmediate not to break that optimization.

Differential Revision: https://reviews.llvm.org/D42899

llvm-svn: 324249
2018-02-05 16:54:07 +00:00
Krzysztof Parzyszek 02947b7112 [Hexagon] Use V6_vmpyih for halfword multiplication
Unlike V6_vmpyhv, it produces the result in the exact form that is
expected without the need for a shuffle.

llvm-svn: 324241
2018-02-05 15:40:06 +00:00
Hiroshi Inoue c5ab1ab797 [PowerPC] Check hot loop exit edge in PPCCTRLoops
PPCCTRLoops transform loops using mtctr/bdnz instructions if loop trip count is known and big enough to compensate for the cost of mtctr.
But if there is a loop exit edge which is known to be frequently taken (by builtin_expect or by PGO), we should not transform the loop to avoid the cost of mtctr instruction. Here is an example of a loop with hot exit edge:

for (unsigned i = 0; i < TripCount; i++) {
  // do something
  if (__builtin_expect(check(), 1))
    break;
  // do something
}

Differential Revision: https://reviews.llvm.org/D42637

llvm-svn: 324229
2018-02-05 12:25:29 +00:00
Craig Topper 5a2bd99a9e [X86] Add isel patterns for selecting masked SUBV_BROADCAST with bitcasts. Remove combineBitcastForMaskedOp.
Add test cases for the merge masked versions to make sure we have all those covered.

llvm-svn: 324210
2018-02-05 08:37:37 +00:00
Craig Topper 25ceba7f30 [X86] Remove X86ISD::SHUF128 from combineBitcastForMaskedOp. Use isel patterns instead.
We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking.

The test changes are because we also match the bitconvert even if there is no masking. This leads to unnecessary isel pattern, but it requires more multiclass hackery in tablegen to get rid of it.

llvm-svn: 324205
2018-02-05 06:00:23 +00:00
Craig Topper 0398ccd0c9 [X86] Auto-generate full checks. NFC
llvm-svn: 324202
2018-02-04 23:48:51 +00:00
Zvi Rackover 2401d20285 X86 Tests: Add shuffle that can be improved by widening elements. NFC
To be improved by D42044

llvm-svn: 324200
2018-02-04 19:31:14 +00:00
Craig Topper 8d511a65af [X86] Add DAG combine to turn (bitcast (and/or/xor (bitcast X), Y)) -> (and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations.
This reduces the number of transitions between k-registers and GPRs, reducing the number of instructions.

There's still some room for improvement to remove more transitions, but this is a good start.

llvm-svn: 324184
2018-02-04 01:43:48 +00:00
Simon Pilgrim 7aec5063a5 [MIPS] Regenerate vector tests with update script
Hopefully help make this a lot more maintainable

llvm-svn: 324180
2018-02-03 22:11:22 +00:00
Simon Pilgrim 8fb1dd8a1d [X86][SSE] Don't chain shuffles together in schedule tests
This is necessary to prevent the shuffles from being combined/simplified in an upcoming patch.

llvm-svn: 324178
2018-02-03 21:20:19 +00:00
Craig Topper 071ad9c6e0 [X86] Remove and autoupgrade kand/kandn/kor/kxor/kxnor/knot intrinsics.
Clang already stopped using these a couple months ago.

The test cases aren't great as there is nothing forcing the operations to stay in k-registers so some of them moved back to scalar ops due to the bitcasts being moved around.

llvm-svn: 324177
2018-02-03 20:18:25 +00:00
Alex Bradbury 7c11527b03 [RISCV] Update two RISCV codegen tests after rL323991
From the discussion in D41835 it looks possible the change will be backed out, 
but for now let's fix the RISCV tests.

llvm-svn: 324172
2018-02-03 13:02:30 +00:00
Craig Topper e7e147f52c [X86] Add avx512 command line to ptest.ll to demonstrate that 512-bit vectors are not handled by LowerVectorAllZeroTest.
llvm-svn: 324130
2018-02-02 20:12:45 +00:00
Craig Topper bd2f6e9570 Partially revert r324124 [X86] Add tests for missed opportunities to use ptest for all ones comparison.
Turns out I misunderstood the flag behavior of PTEST because I read the documentation for KORTEST which is different than PTEST/KTEST and made a bad assumption.

Keep the test rename though cause that's useful.

llvm-svn: 324129
2018-02-02 20:12:44 +00:00
Craig Topper 9c936f88b1 [X86] Add tests for missed opportunities to use ptest for all ones comparison.
Also rename the test from pr12312.ll to ptest.ll so its more recognizable.

llvm-svn: 324124
2018-02-02 19:34:10 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Clement Courbet a43e9653bb Add llc tests for comparison chains.
See https://reviews.llvm.org/D42793#996098 for context.

llvm-svn: 324099
2018-02-02 15:54:17 +00:00
Simon Pilgrim 1cb9bc6b6c [X86][SSE] Force double domain for SHUFPD stack folding tests
llvm-svn: 324094
2018-02-02 14:55:20 +00:00
Sjoerd Meijer 986d64ad73 [ARM] fixed some tabs/whitespaces in test. NFC.
llvm-svn: 324074
2018-02-02 11:51:06 +00:00
Jonas Paulsson 422dfbf7cc [SelectionDAG] Consider endianness in scalarizeVectorStore().
When handling vectors with non byte-sized elements, reverse the order of the
elements in the built integer if the target is Big-Endian.

SystemZ tests updated.

Review: Eli Friedman, Ulrich Weigand.
https://reviews.llvm.org/D42786

llvm-svn: 324063
2018-02-02 08:48:02 +00:00
Jonas Paulsson 0e50b6ed80 [SystemZ] Update test case (NFC)
test/CodeGen/SystemZ/vec-trunc-to-i1.ll was marked as a temporary
FAIL when it was previously updated when it needed one more COPY.
This was however wrong, since the loop body had been reduced
significantly, and it was actually an improvement.

Review: Ulrich Weigand.
llvm-svn: 324060
2018-02-02 07:52:02 +00:00
Craig Topper 76c5ce5184 [X86] Legalize (v64i1 (bitcast (i64 X))) on 32-bit targets by extracting 32-bit halves from i32, bitcasting each to v32i1, and concatenating.
This prevents the scalarization that would otherwise occur.

llvm-svn: 324057
2018-02-02 05:59:33 +00:00
Craig Topper 5570e03b21 [X86] Legalize (i64 (bitcast (v64i1 X))) on 32-bit targets by extracting to v32i1 and bitcasting to i32.
This saves a trip through memory and seems to open up other combining opportunities.

llvm-svn: 324056
2018-02-02 05:59:31 +00:00
Shiva Chen bbf4c5c25e [RISCV] Define getSetCCResultType for setting vector setCC type
To avoid trigger "No default SetCC type for vectors!" Assertion

Differential Revision: https://reviews.llvm.org/D42675

llvm-svn: 324054
2018-02-02 02:43:18 +00:00
Amara Emerson 572f6cecf1 [AArch64][GlobalISel] Fix old use of % sigil in test.
My rebase had missed the new $ sigil we're using.

llvm-svn: 324051
2018-02-02 02:14:42 +00:00
Amara Emerson 58aea52bc4 [GlobalISel] Constrain the dest reg of IMPLICT_DEF.
This fixes a crash where the user is a COPY, which deliberately does not
constrain its source operands, resulting in a vreg without a reg class escaping
selection.

Differential Revision: https://reviews.llvm.org/D42697

llvm-svn: 324047
2018-02-02 01:44:43 +00:00
Matthias Braun ca0abaebfb SplitKit: Fix liveness recomputation in some remat cases.
Example situation:
```
BB0:
  %0 = ...
  use %0
  ; ...
  condjump BB1
  jmp BB2

BB1:
  %0 = ...   ; rematerialized def from above (from earlier split step)
  jmp BB2

BB2:
  ; ...
  use %0
```

%0 will have a live interval with 3 value numbers (for the BB0, BB1 and
BB2 parts). Now SplitKit tries and succeeds in rematerializing the value
number in BB2 (This only works because it is a secondary split so
SplitKit is can trace this back to a single original def).

We need to recompute all live ranges affected by a value number that we
rematerialize. The case that we missed before is that when the value
that is rematerialized is at a join (Phi VNI) then we also have to
recompute liveness for the predecessor VNIs.

rdar://35699130

Differential Revision: https://reviews.llvm.org/D42667

llvm-svn: 324039
2018-02-02 00:08:19 +00:00
Simon Pilgrim d1379c6df1 Fix check-prefixes typo and line endings.
llvm-svn: 324024
2018-02-01 22:32:41 +00:00
Simon Pilgrim 808a0e1589 [X86][SSE] Add SSE41 to variable permute tests
llvm-svn: 324017
2018-02-01 22:05:44 +00:00
Simon Pilgrim 26bf800625 [X86][XOP] Add XOP to variable permute tests
llvm-svn: 324015
2018-02-01 21:57:37 +00:00
Nemanja Ivanovic 77e34f15c9 [PowerPC] Tell VSX swap removal that scalar conversions are lane-sensitive
This is a rather non-controversial change. We were missing these instructions
from the list of instructions that are lane-sensitive. These two put the result
into lane 0 (BE) or 3 (LE) regardless of the input. This patch fixes PR36068.

llvm-svn: 324005
2018-02-01 21:09:04 +00:00
Craig Topper a5944aade1 [DAGCombiner] When folding (insert_subvector undef, (bitcast (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output
We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits.

Fixes PR36199

Differential Revision: https://reviews.llvm.org/D42809

llvm-svn: 324002
2018-02-01 20:48:50 +00:00
Amara Emerson cbc02c71a4 [GlobalISel] Fix assert failure when legalizing non-power-2 loads.
Until we support extending loads properly we're going to fall back for these.
We already handle stores in the same way, so this is just being consistent.

llvm-svn: 324001
2018-02-01 20:47:03 +00:00
Geoff Berry 94503c7bc3 [MachineCopyPropagation] Extend pass to do COPY source forwarding
Summary:
This change extends MachineCopyPropagation to do COPY source forwarding
and adds an additional run of the pass to the default pass pipeline just
after register allocation.

This version of this patch uses the newly added
MachineOperand::isRenamable bit to avoid forwarding registers is such a
way as to violate constraints that aren't captured in the
Machine IR (e.g. ABI or ISA constraints).

This change is a continuation of the work started in D30751.

Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar

Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits

Differential Revision: https://reviews.llvm.org/D41835

llvm-svn: 323991
2018-02-01 18:54:01 +00:00
Changpeng Fang 29fcf883fb AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature.
Reviewers:
  Matt and Brian

Differential Revision:
  https://reviews.llvm.org/D42548

llvm-svn: 323988
2018-02-01 18:41:33 +00:00
Simon Pilgrim 1a8cefc328 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - add support for scaling index vectors
This allows us to use PSHUFB for v8i16/v4i32 and VPERMD/PERMPS for v4i64/v4f64 variable shuffles.

Differential Revision: https://reviews.llvm.org/D42487

llvm-svn: 323987
2018-02-01 18:10:30 +00:00
Sanjay Patel 702c19cc3e [AArch64] add tests with sqrt estimate and ieee denorms; NFC
As noted in D42323, we're not checking for denorms as we should.

llvm-svn: 323985
2018-02-01 17:57:45 +00:00
Sanjay Patel f42381fd7e [AArch64] auto-generate complete checks; NFC
llvm-svn: 323984
2018-02-01 17:44:50 +00:00
Craig Topper 7e910a9e85 [X86] Turn X86ISD::AND nodes that have no flag users back into ISD::AND just before isel to enable test instruction matching
Summary:
EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass.

This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel.

In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working.

Reviewers: spatel, RKSimon, niravd, deadalnix

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42764

llvm-svn: 323982
2018-02-01 17:08:39 +00:00
Sanjay Patel 657e5d8d41 [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994)
As shown in the example in PR34994:
https://bugs.llvm.org/show_bug.cgi?id=34994
...we can return a very wrong answer (inf instead of 0.0) for square root when 
using a reciprocal square root estimate instruction.

Here, I've conditionalized the filtering out of denorms based on the function 
having "denormal-fp-math"="ieee" in its attributes. The other options for this 
attribute are 'preserve-sign' and 'positive-zero'.

So we don't generate this extra code by default with just '-ffast-math' (because 
then there's no denormal attribute string at all), but it works if you specify 
'-ffast-math -fdenormal-fp-math=ieee' from clang. 

As noted in the review, there may be other problems in clang that affect the 
results depending on platform (Linux x86 at least), but this should allow 
creating the desired codegen.

Differential Revision: https://reviews.llvm.org/D42323

llvm-svn: 323981
2018-02-01 16:57:18 +00:00
Nirav Dave 18f7f60e17 [SelectionDAG] Fix UpdateChains handling of TokenFactors
Summary:
In Instruction Selection UpdateChains replaces all matched Nodes'
chain references including interior token factors and deletes them.
This may allow nodes which depend on these interior nodes but are not
part of the set of matched nodes to be left with a dangling dependence.
Avoid this by doing the replacement for matched non-TokenFactor nodes.

Fixes PR36164.

Reviewers: jonpa, RKSimon, bogner

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42754

llvm-svn: 323977
2018-02-01 16:11:59 +00:00
Simon Pilgrim eb50b6d060 [X86][SSE] Add PR26491 horizontal add test
llvm-svn: 323973
2018-02-01 15:30:02 +00:00
Simon Pilgrim afc7c63bc2 [X86][AVX512DQ] Add DQ var permute 256 tests as requested on D42487
llvm-svn: 323970
2018-02-01 14:44:50 +00:00
Sjoerd Meijer 9d9a86535e [ARM] FullFP16 LowerReturn Fix
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).

Differential Revision: https://reviews.llvm.org/D42743

llvm-svn: 323968
2018-02-01 13:48:40 +00:00
Aleksandar Beserminji a330c208f2 [mips] Include EVA instructions in Std2MicroMips mapping tables
This patch includes EVA instructions in the Std2MicroMips mapping
tables, which is required for direct object emission.

Differential Revision: https://reviews.llvm.org/D41771

llvm-svn: 323958
2018-02-01 12:53:26 +00:00
Matt Arsenault df0f25070c DAG: Fix not truncating when promoting bswap/bitreverse
These need to convert back to the original type, like any
other promotion.

llvm-svn: 323932
2018-01-31 23:54:16 +00:00
Evgeniy Stepanov 7746899f48 Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations"
Miscompiles code. Testcase pending.

This reverts commit r323869.

llvm-svn: 323929
2018-01-31 22:55:19 +00:00
Puyan Lotfi 43e94b15ea Followup on Proposal to move MIR physical register namespace to '$' sigil.
Discussed here:

http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html

In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.

llvm-svn: 323922
2018-01-31 22:04:26 +00:00
Chandler Carruth 0dcee4fe7a [x86] Make the retpoline thunk insertion a machine function pass.
Summary:
This removes the need for a machine module pass using some deeply
questionable hacks. This should address PR36123 which is a case where in
full LTO the memory usage of a machine module pass actually ended up
being significant.

We should revert this on trunk as soon as we understand and fix the
memory usage issue, but we should include this in any backports of
retpolines themselves.

Reviewers: echristo, MatzeB

Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42726

llvm-svn: 323915
2018-01-31 20:56:37 +00:00
Krzysztof Parzyszek 1108ee2496 [Hexagon] Implement HVX codegen for vector shifts
llvm-svn: 323914
2018-01-31 20:49:24 +00:00
Krzysztof Parzyszek 9eb085e6cf [Hexagon] Handle ANY_EXTEND_VECTOR_INREG in lowering
llvm-svn: 323912
2018-01-31 20:48:11 +00:00
Krzysztof Parzyszek b843f75179 [Hexagon] Handle SETCC on vector pairs in lowering
llvm-svn: 323911
2018-01-31 20:46:55 +00:00
Marek Olsak d4bb329d0e AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078

llvm-svn: 323909
2018-01-31 20:18:11 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Geoff Berry 82203c4149 [MachineOutliner] Freeze registers in new functions
Summary:
Call MRI.freezeReservedRegs() on functions created during outlining so
that calls to isReserved() by the verifier called after this pass won't
assert.

Reviewers: MatzeB, qcolombet, paquette

Subscribers: mcrosier, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42749

llvm-svn: 323905
2018-01-31 20:15:16 +00:00
Amaury Sechet f9a9e9a251 [X86] Generate testl instruction through truncates.
Summary:
This was introduced in D42646 but ended up being reverted because the original implementation was buggy.

Depends on D42646

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42741

llvm-svn: 323899
2018-01-31 19:20:06 +00:00
Chih-Hung Hsieh 60d1e79ffb [Analysis] Disable calls to *_finite and other glibc-only functions on Android.
Since r322087, glibc's finite lib calls are generated when possible.
However, they are not supported on Android. This change also
disables other functions not available on Android.

Differential Revision: http://reviews.llvm.org/D42668

llvm-svn: 323898
2018-01-31 19:12:50 +00:00
Krzysztof Parzyszek 82a83391d3 [Hexagon] Handle BUILD_VECTOR from undef values in buildHvxVectorReg
llvm-svn: 323889
2018-01-31 16:52:15 +00:00
Amaury Sechet f89f188ddb [X86] Avoid using high register trick for test instruction
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323888
2018-01-31 16:48:54 +00:00
Krzysztof Parzyszek 8cc636c592 [Hexagon] Only process bitcasts of vsplats when selecting const vectors
Selecting of constant HVX vectors involves some "manual processing",
which mishandled an unrelated BITCAST operation causing a selection
error.

llvm-svn: 323887
2018-01-31 16:48:20 +00:00
Petar Jovanovic 540f4cd10a [DWARF] Allow duplication of tails with CFI instructions
This commit came as a result for revert of patch r317579 (originally
committed as r317100). The patch made CFI instructions duplicable, because
their existence in the epilogue block was affecting the Tail duplication
pass. However, duplicating blocks with CFI instructions was an issue for
compact unwind info on Darwin, which is why the patch was reverted.

This patch allows duplicating tails with CFI instructions, though they are
not duplicable, by copying them 'manually'.


Patch by Djordje Kovacevic.

Differential Revision: https://reviews.llvm.org/D40979

llvm-svn: 323883
2018-01-31 15:57:57 +00:00
Nirav Dave c3a1e16db1 [DAG] Prevent NodeId pruning of TokenFactors in Instruction Selection.
Summary:
Instruction Selection preserves relative orders of all nodes save
TokenFactors which we treat specially. As a result Node Ids for
TokenFactors may violate the topological ordering and should not be
considered as valid pruning candidates in predecessor search.

Fixes PR35316.

Reviewers: RKSimon, hfinkel

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42701

llvm-svn: 323880
2018-01-31 15:23:17 +00:00
Marina Yatsina 3f34f33148 Fix build error in r323870
Change-Id: I15a8b27764a4d817cfbe48836bf09dc6520934b7
llvm-svn: 323874
2018-01-31 14:18:37 +00:00
Florian Hahn c68428b5dc [MachineCombiner] Add check for optimal pattern order.
In D41587, @mssimpso discovered that the order of some patterns for
AArch64 was sub-optimal. I thought a bit about how we could avoid that
case in the future. I do not think there is a need for evaluating all
patterns for now. But this patch adds an extra (expensive) check, that
evaluates the latencies of all patterns, and ensures that the latency
saved decreases for subsequent patterns.

This catches the sub-optimal order fixed in D41587, but I am not
entirely happy with the check, as it only applies to sub-optimal
patterns seen while building with EXPENSIVE_CHECKS on. It did not
discover any other sub-optimal pattern ordering.

Reviewers: Gerolf, spatel, mssimpso

Reviewed By: Gerolf, mssimpso

Differential Revision: https://reviews.llvm.org/D41766

llvm-svn: 323873
2018-01-31 13:54:30 +00:00
Marina Yatsina cd5bc4a2cd Take into account the cost of local intervals when selecting split candidate.
When selecting a split candidate for region splitting, the register allocator tries to predict which candidate will have the cheapest spill cost.
Global splitting may cause the creation of local intervals, and they might spill.

This patch makes RA take into account the spill cost of local split intervals in use blocks (we already take into account the spill cost in through blocks).
A flag ("-condsider-local-interval-cost") controls weather we do this advanced cost calculation (it's on by default for X86 target, off for the rest).

Differential Revision: https://reviews.llvm.org/D41585

Change-Id: Icccb8ad2dbf13124f5d97a18c67d95aa6be0d14d
llvm-svn: 323870
2018-01-31 13:31:08 +00:00
Pablo Barrio 2e442a7831 [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Marten Svanfeldt.

Reviewers: fhahn, pbarrio

Reviewed By: pbarrio

Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 323869
2018-01-31 13:20:10 +00:00
Amaury Sechet 99dbec52be Add a regression test for problems caused by D42646 . NFC
llvm-svn: 323868
2018-01-31 13:02:01 +00:00
Jonas Paulsson cc5fe73669 [SystemZ] Check the bitwidth before calling isInt/isUInt.
Since these methods will assert if the integer does not fit into 64 bits,
it is necessary to do this check before calling them in
supportedAddressingMode().

Review: Ulrich Weigand.
llvm-svn: 323866
2018-01-31 12:41:25 +00:00
Sjoerd Meijer 98d5359ea2 [ARM] Armv8.2-A FP16 code generation (part 2/3)
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Differential Revision: https://reviews.llvm.org/D42580

llvm-svn: 323861
2018-01-31 10:18:29 +00:00
Jonas Paulsson e6a8329e9f [PowerPC] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Nemanja Ivanovic
llvm-svn: 323858
2018-01-31 09:26:51 +00:00
Roger Ferrer Ibanez aea4208720 [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR.
In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do
copies CPSR ↔ GPR but not all Thumb1 targets implement them.

The schedule can attempt, before attempting a copy, to clone the instructions
but it does not currently do that for nodes with input glue. In this patch we
introduce a target-hook to let the hook decide if a glued machinenode is still
eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS .

As a follow-up of this change we should actually implement the copies for the
Thumb1 targets that do implement them and restrict the hook to the targets that
can't really do such copy as these clones are not ideal.

This change fixes PR35836.

Differential Revision: https://reviews.llvm.org/D42051

llvm-svn: 323857
2018-01-31 09:23:43 +00:00
Eli Friedman 804d7ab811 Revert r323559 due to EXPENSIVE_CHECKS regression.
I have a fix for the issue (https://reviews.llvm.org/D42655) but
it's taking a while to get reviewed, so reverting in the meantime.

llvm-svn: 323841
2018-01-31 00:40:42 +00:00
Craig Topper f98baa7065 [X86] Add more madd reduction tests with wider vectors.
We had no test case exercising 512-bit vpmaddwd usage.

llvm-svn: 323840
2018-01-31 00:30:32 +00:00
Krzysztof Parzyszek 119856430e [RDF] Clear the renamable flag when copy propagating reserved registers
llvm-svn: 323831
2018-01-30 23:19:44 +00:00
Yaxun Liu c00d81e697 LLParser: add an argument for overriding data layout and do not check alloca addr space
Sometimes users do not specify data layout in LLVM assembly and let llc set the
data layout by target triple after loading the LLVM assembly.

Currently the parser checks alloca address space no matter whether the LLVM
assembly contains data layout definition, which causes false alarm since the
default data layout does not contain the correct alloca address space.

The parser also calls verifier to check debug info and updating invalid debug
info. Currently there is no way to let the verifier to check debug info only.
If the verifier finds non-debug-info issues the parser will fail.

For llc, the fix is to remove the check of alloca addr space in the parser and
disable updating debug info, and defer the updating of debug info and
verification to be after setting data layout of the IR by target.

For other llvm tools, since they do not override data layout by target but
instead can override data layout by a command line option, an argument for
overriding data layout is added to the parser. In cases where data layout
overriding is necessary for the parser, the data layout can be provided by
command line.

Differential Revision: https://reviews.llvm.org/D41832

llvm-svn: 323826
2018-01-30 22:32:39 +00:00
Evandro Menezes b7d1729787 [AArch64] Expand testing of zero cycle zeroing
Make sure that r321824 doesn't change zeroing.

Differential revision: https://reviews.llvm.org/D42089

llvm-svn: 323816
2018-01-30 21:14:11 +00:00
Martin Storsjo cc981d285d [GlobalISel] Bail out on calls to dllimported functions
Differential Revision: https://reviews.llvm.org/D42568

llvm-svn: 323811
2018-01-30 19:50:58 +00:00
Martin Storsjo 708498a164 [AArch64] Properly handle dllimport of variables when using fast-isel
Differential Revision: https://reviews.llvm.org/D42567

llvm-svn: 323810
2018-01-30 19:50:51 +00:00
Krzysztof Parzyszek 39a9842f3c [Hexagon] Handle non-aligned offsets in globals in extender optimization
Instructions like memd(r0+##global+1) are legal as long as the entire
address is properly aligned. Assuming that "global" is aligned at an
8-byte boundary, the expression "global+1" appears to be misaligned.
Handle such cases in HexagonConstExtenders, and make sure that any non-
extended offsets generated are still aligned accordingly.

llvm-svn: 323799
2018-01-30 18:12:37 +00:00
Krzysztof Parzyszek 96a284114e Revert: [Hexagon] Make sure that offset on globals matches alignment requirements
This reverts r323562, since it wasn't actually necessary. Constant-
extended offsets do not need to be aligned, as long as the effective
address is aligned.

Keep the testcase, with a modification which checks that such offsets
are not unnecessarily avoided.

llvm-svn: 323798
2018-01-30 18:10:27 +00:00
Geoff Berry 1d53101387 [AMDGPU] isRenamable fixes to support copy forwarding
Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will
be marked as not renamable, to avoid copy forwarding violating the
constraint that only one operand may use the constant bus.

These changes fix a few mis-compiles when copy forwarding is enabled in
MachineCopyPropagation by D41835 (and were reviewed as part of that change).

llvm-svn: 323794
2018-01-30 17:37:39 +00:00
Mark Searles 94ae3b2f9b [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\
teps/build_Lld/logs/stdio :
        /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable]
          static int32_t InstCnt = 0;
                                              "
This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc.

llvm-svn: 323791
2018-01-30 17:17:06 +00:00
Mark Searles d6d5a2571f [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).

Differential Revision: https://reviews.llvm.org/D40091

llvm-svn: 323788
2018-01-30 16:49:38 +00:00
Evandro Menezes f1d01645a7 [AArch64] Add new target feature to fuse address generation with load or store
This feature enables the fusion of the address generation and a
corresponding load or store together.

Differential revision: https://reviews.llvm.org/D42393

llvm-svn: 323782
2018-01-30 16:28:01 +00:00
Simon Dardis daaeaba665 [mips] Fix incorrect sign extension for fpowi libcall
PR36061 showed that during the expansion of ISD::FPOWI, that there
was an incorrect zero extension of the integer argument which for
MIPS64 would then give incorrect results. Address this with the
existing mechanism for correcting sign extensions.

This resolves PR36061.

Thanks to James Cowgill for reporting the issue!

Reviewers: atanasyan, hfinkel

Differential Revision: https://reviews.llvm.org/D42537

llvm-svn: 323781
2018-01-30 16:24:10 +00:00
Zaara Syeda 1f59ae311b Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778
2018-01-30 16:17:22 +00:00
Simon Pilgrim 05f6014257 [X86][AVX512] Add VBMI target shuffle-trunc tests
llvm-svn: 323776
2018-01-30 16:01:41 +00:00
Evandro Menezes 16d7d81e5d [AArch64] Update test cases for Exynos M3
Update any test case relevant for Exynos M3.

llvm-svn: 323775
2018-01-30 15:40:27 +00:00
Evandro Menezes 9f9daa1f14 [AArch64] Add pipeline model for Exynos M3
Add the scheduling and cost model for Exynos M3.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323773
2018-01-30 15:40:16 +00:00
Andrei Elovikov ce256fd5f7 [X86FixupBWInsts] mir-simplify fixup-bw-inst.mir test. NFC.
llvm-svn: 323762
2018-01-30 14:25:12 +00:00
Eric Liu 0b69b5ed85 Revert "[X86] Avoid using high register trick for test instruction"
This reverts commit r323690. This causes crash in llc. See the original commit thread for details.

llvm-svn: 323761
2018-01-30 14:18:33 +00:00
Simon Pilgrim cbc2d1e111 [X86] Add test case for PR32690
llvm-svn: 323760
2018-01-30 14:15:51 +00:00
Amaury Sechet 25c9ee0fec Change simple-register-allocation-read-undef.mir so that it doesn't fail if the file path contains 'dead' . NFC
llvm-svn: 323748
2018-01-30 11:07:36 +00:00
Diana Picus f72e865372 [ARM GlobalISel] Add inst selector tests for G_SITOFP and G_UITOFP
These are handled by the TableGen'erated code.

llvm-svn: 323732
2018-01-30 09:15:27 +00:00
Diana Picus 2a5b962030 [ARM GlobalISel] Map G_SITOFP and G_UITOFP
Straightforward mapping (integer operand to GPR, floating point operand
to FPR).

llvm-svn: 323731
2018-01-30 09:15:23 +00:00
Diana Picus 517531e5a5 [ARM GlobalISel] Legalize G_SITOFP and G_UITOFP
Legal if we have hardware support, libcall otherwise.

Also add supporting code to the legalizer helper for libcalls.

llvm-svn: 323730
2018-01-30 09:15:17 +00:00
Diana Picus f5ad62d921 [ARM GlobalISel] Add inst selector tests for G_FPTOSI and G_FPTOUI
The work is done by the TableGen'erated code.

llvm-svn: 323728
2018-01-30 07:55:02 +00:00
Diana Picus a2da03022c [ARM GlobalISel] Map G_FPTOSI and G_FPTOUI
Straightforward mapping (integer operand goes to GPR, floating point
operand goes to FPR).

llvm-svn: 323727
2018-01-30 07:54:58 +00:00
Diana Picus 4ed0ee7b5f [ARM GlobalISel] Legalize G_FPTOSI and G_FPTOUI
Legal if we have hardware support for floating point, libcalls
otherwise.

Also add the necessary support for libcalls in the legalizer helper.

llvm-svn: 323726
2018-01-30 07:54:52 +00:00
Craig Topper dbf0bc75e4 [X86] Auto-generate complete checks. NFC
llvm-svn: 323724
2018-01-30 07:02:29 +00:00
Dan Gohman 832092ca12 [SelectionDAG]: Ignore "returned" in the presence of an implicit sret.
When a function return value can't be directly lowered, such as
returning an i128 on WebAssembly, as indicated by the CanLowerReturn
target hook, SelectionDAGBuilder can translate it to return the
value through a hidden sret-like argument.

If such a function has an argument with the "returned" attribute,
the attribute can't be automatically lowered, because the function
no longer has a normal return value. For now, just discard the
"returned" attribute.

This fixes PR36128.

llvm-svn: 323715
2018-01-30 00:14:40 +00:00
Quentin Colombet 72f6d59841 [RAFast] Don't dereference MBB::end
When RAFast sees liveins in on a basic block, it uses that information
to initialize the availability of the registers. The called
method uses an instruction as one of its argument and in the liveins
case, RAFast was dereferencing MBB::begin which can be MBB::end for
empty basic block.

Change the API of definePhysReg to use MachineBasicBlock::iterator
instead of MachineInstr so that we don't dereference an
invalid iterator while making the call.

rdar://problem/36952401

llvm-svn: 323710
2018-01-29 23:42:37 +00:00
Craig Topper 571231a7fe [X86] Use VMOVDQA64 for aligned vXi32 stores.
I meant to do this with the unaligned stores in r322820, but looks like I missed it.

llvm-svn: 323708
2018-01-29 23:27:23 +00:00
Marek Olsak 48057b554c AMDGPU: Allow a SGPR for the conditional KILL operand
Patch by: Bas Nieuwenhuizen

Just use the _e64 variant if needed. This should be possible as per

def : Pat <
  (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
  (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
> ;

I don't think we can get an immediate for the other operand for which we
need the second 32-bit word.

https://reviews.llvm.org/D42302

llvm-svn: 323706
2018-01-29 23:19:10 +00:00
Craig Topper a8f87a36f1 [X86] Add FeaturePOPCNTFalseDeps to skylake server CPU to match skylake client.
llvm-svn: 323700
2018-01-29 21:56:48 +00:00
Daniel Sanders 08464524c3 [ARM][GISel] PR35965 Constrain RegClasses of nested instructions built from Dst Pattern
Summary:
Apparently, we missed on constraining register classes of VReg-operands of all the instructions
built from a destination pattern but the root (top-level) one. The issue exposed itself
while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped
into COPY_TO_REGCLASS, so top-level COPY_TO_REGCLASS gets properly constrained,
while nested VTOSIZS (or rather its destination virtual register to be exact) does not.

Fixing this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI.

https://bugs.llvm.org/show_bug.cgi?id=35965
rdar://problem/36886530

Patch by Roman Tereshin

Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan

Reviewed By: dsanders, qcolombet, rovka

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42565

llvm-svn: 323692
2018-01-29 21:09:12 +00:00
Amaury Sechet 015184b79e [X86] Avoid using high register trick for test instruction
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.

The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 .

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323690
2018-01-29 20:54:33 +00:00
Amaury Sechet 4cbca08d71 [X86] Add test case to ensure testw is generated when optimizing for size. NFC
llvm-svn: 323687
2018-01-29 20:22:46 +00:00
Jun Bum Lim fc7d56d949 Revert "AArch64: Omit callframe setup/destroy when not necessary"
This reverts commit r322917 due to multiple performance regressions in spec2006
and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially
motivated this change.

llvm-svn: 323683
2018-01-29 19:56:42 +00:00
Rafael Espindola e899a0b824 Improve testcase.
We now test that pic and static produce different results for bar.
The function names were demangled.
The attributes are written inline.

llvm-svn: 323680
2018-01-29 19:37:27 +00:00
Geoff Berry d37dc77b6e [AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regs
Summary:
Fix a few places that were modifying code after register
allocation to set the renamable bit correctly to avoid failing the
validation added in D42449.

llvm-svn: 323675
2018-01-29 18:47:48 +00:00
Craig Topper eb13ebdb99 [X86] Don't create SHRUNKBLEND when the condition is used by the true or false operand of the vselect.
Fixes PR34592.

Differential Revision: https://reviews.llvm.org/D42628

llvm-svn: 323672
2018-01-29 17:56:57 +00:00
Craig Topper 63db1c117a [X86] Add test case for pr34592
llvm-svn: 323671
2018-01-29 17:56:55 +00:00
Amaury Sechet 9827d8ed15 Add test case for truncated and promotion to test. NFC
llvm-svn: 323663
2018-01-29 16:13:01 +00:00
Jonas Devlieghere 865de57bde [Sparc] Account for bias in stack readjustment
Summary: This was broken long ago in D12208, which failed to account for
the fact that 64-bit SPARC uses a stack bias of 2047, and it is the
*unbiased* value which should be aligned, not the biased one. This was
seen to be an issue with Rust.

Patch by: jrtc27 (James Clarke)

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: jacob_hansen, JDevlieghere, fhahn, fedor.sergeev, llvm-commits

Differential Revision: https://reviews.llvm.org/D39425

llvm-svn: 323643
2018-01-29 12:10:32 +00:00
Andrei Elovikov c560a18c7f [X86FixupBWInsts] Fix miscompilation if sibling sub-register is live.
Summary: The issues was found during D40524.

Reviewers: andrew.w.kaylor, craig.topper, MatzeB

Reviewed By: andrew.w.kaylor

Subscribers: aivchenk, llvm-commits

Differential Revision: https://reviews.llvm.org/D42533

llvm-svn: 323635
2018-01-29 09:26:04 +00:00
Oliver Stannard a9d2e004d2 [AArch64] Generate the CASP instruction for 128-bit cmpxchg
The Large System Extension added an atomic compare-and-swap instruction
that operates on a pair of 64-bit registers, which we can use to
implement a 128-bit cmpxchg.

Because i128 is not a legal type for AArch64 we have to do all of the
instruction selection in C++, and the instruction requires even/odd
register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG
nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM
backend.

Differential revision: https://reviews.llvm.org/D42104

llvm-svn: 323634
2018-01-29 09:18:37 +00:00
Craig Topper 62b62356fa [X86] Make foldLogicOfSetCCs work better for vectors pre legal types/operations
Summary:
There's a check in the code to only check getSetCCResultType after LegalOperations or if the type is MVT::i1. But the i1 check is only allowing scalar types through. I think it should check that the scalar type is MVT::i1 so that it will work for vectors.

The changed test already does this combine with AVX512VL where getSetCCResultType returns vXi1. But with avx512f and no VLX getSetCCResultType returns a type matching the width of the input type.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42619

llvm-svn: 323631
2018-01-29 07:52:55 +00:00
Hiroshi Inoue c8e9245816 [NFC] fix trivial typos in comments and documents
"to to" -> "to"

llvm-svn: 323628
2018-01-29 05:17:03 +00:00
Craig Topper 3913a4dd56 [X86] Fix a crash that can occur in combineExtractVectorElt due to not checking the width of a ConstantSDNode before calling getConstantOperandVal.
llvm-svn: 323614
2018-01-28 07:29:35 +00:00
Craig Topper 15d69739e2 [X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm eq/ne with immallzeros.
llvm-svn: 323612
2018-01-28 00:56:30 +00:00
Craig Topper 5e4b45361f [X86] Add patterns for using masked vptestnmd for 256-bit vectors without VLX.
We can widen the mask and extract it back down.

llvm-svn: 323610
2018-01-27 23:49:14 +00:00
Craig Topper 540daee124 [X86] Add test to demonstrate missed opportunity to merge kand into testnm when using 512-bit instruction due to lack of VLX.
llvm-svn: 323609
2018-01-27 23:49:11 +00:00
Justin Bogner 6e36f8250c Add triples or specify REQUIRES: default_triple to some tests
These were all failing when building the X86 backend but specifying
LLVM_DEFAULT_TARGET_TRIPLE=''.

llvm-svn: 323608
2018-01-27 23:31:09 +00:00
Simon Pilgrim 442aefdd22 [X86][AVX512] Add avx512dq fp2int/int2fp tests (PR31630)
llvm-svn: 323607
2018-01-27 22:08:27 +00:00
Craig Topper 247016a735 [X86] Use vptestm/vptestnm for comparisons with zero to avoid creating a zero vector.
We can use the same input for both operands to get a free compare with zero.

We already use this trick in a couple places where we explicitly create PTESTM with the same input twice. This generalizes it.

I'm hoping to remove the ISD opcodes and move this to isel patterns like we do for scalar cmp/test.

llvm-svn: 323605
2018-01-27 20:19:09 +00:00
Craig Topper 513d3fa674 [X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel.
Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute.

I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar.

llvm-svn: 323604
2018-01-27 20:19:02 +00:00
Simon Pilgrim 9c4fbad1d2 Regenerate test. NFCI
llvm-svn: 323603
2018-01-27 19:49:46 +00:00
Simon Pilgrim fe3fac805a [X86][SSE] Simplify demanded elements from BROADCAST shuffle source.
If broadcasting from another shuffle, attempt to simplify it.

We can probably generalize this a lot more (embedding in combineX86ShufflesRecursively), but BROADCAST is one of the more troublesome as it accepts inputs of different sizes to the result.

llvm-svn: 323602
2018-01-27 19:48:13 +00:00
Amaury Sechet c131a3e548 Regenerate test result for vastart-defs-eflags.ll. NFC.
llvm-svn: 323596
2018-01-27 17:52:32 +00:00
Amaury Sechet 0510b0f3d0 Regenerate test result for testb-je-fusion.ll. NFC.
llvm-svn: 323595
2018-01-27 17:19:16 +00:00
Amaury Sechet fb10aff542 Regenerate test result for stateppint-vector.ll. NFC.
llvm-svn: 323594
2018-01-27 17:16:26 +00:00
Amaury Sechet c207243405 Regenrate brcond.ll test results. NFC
llvm-svn: 323593
2018-01-27 16:57:15 +00:00
Amaury Sechet a078c2e5cb Regenrate test results for avx-brcond.ll . NFC
llvm-svn: 323592
2018-01-27 16:44:00 +00:00
Simon Pilgrim a01a52431f [X86][SSE] Regenerate fp2int/int2fp tests
Cleanup check prefixes and check full codegen

llvm-svn: 323591
2018-01-27 16:39:12 +00:00
Amaury Sechet 1ae296da36 Regenerate test results for and-su.ll . NFC
llvm-svn: 323588
2018-01-27 16:00:10 +00:00
Simon Pilgrim 516cee12cc [X86][SSE] Add broadcast from v2i32 memory tests (PR34394)
llvm-svn: 323587
2018-01-27 15:54:57 +00:00
Craig Topper 2c570eaa00 [TargetLowering] Teach TargetLowering::SimplifySetCC to simplify setcc of vXi1 vectors into logic ops.
This transform was already being done for setcc of scalar i1. This extends it to vectors.

llvm-svn: 323585
2018-01-27 09:10:58 +00:00
Craig Topper c80f0ced84 [SelectionDAG] Make DAGTypeLegalizer::PromoteSetCCOperands handle SETEQ/SETNE correctly for vector types.
The code was using getValueSizeInBits and combining with the result of a call to DAG.ComputeNumSignBits. But for vector types getValueSizeInBits returns the width of the full vector while ComputeNumSignBits is going to give a number no larger than the width of a single element. So we should be using getScalarValueSizeInBits to get the element width.

llvm-svn: 323583
2018-01-27 08:41:03 +00:00
Amara Emerson 77a5c96560 [GlobalISel][Legalizer] Convert the FP constants to the right APFloat type for G_FCONSTANT.
We weren't converting the immediate ConstantFP during legalization, which caused
the wrong bit patterns to be emitted for half type FP constants.

Fixes PR36106.

llvm-svn: 323582
2018-01-27 07:07:20 +00:00
Craig Topper 8a444ee67c [X86] Use vpternlog to implement vector not under AVX512.
Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation.

This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner.

llvm-svn: 323572
2018-01-26 22:17:40 +00:00
Krzysztof Parzyszek 90ca4e8b0c [Hexagon] Generate constant splats instead of loads from constant pool
llvm-svn: 323568
2018-01-26 21:54:56 +00:00
Krzysztof Parzyszek d4273abb69 [Hexagon] Make sure that offset on globals matches alignment requirements
A correctly aligned address may happen to be separated into a variable
part and a constant part, where the constant part does not match the
alignment needed in a load/store that uses this address. Such a constant
cannot be used as an immediate offset in an indexed instruction.

When lowering a global address, make sure that if there is an offset
folded into the global, the offset is valid for all uses in load/store
instructions.

llvm-svn: 323562
2018-01-26 21:20:04 +00:00
Krzysztof Parzyszek 95614acc24 [Hexagon] Replace multiple vector extracts with store-load combinations
llvm-svn: 323561
2018-01-26 21:17:14 +00:00
Eli Friedman 29108843ff [LivePhysRegs] Preserve pristine regs in blocks with no successors.
One common source of blocks with no successors is calls to noreturn
functions; we want to preserve pristine registers in case they throw an
exception.

The whole pristine register thing is messy (we should really prefer to
explicitly model registers), but this fills a hole in the model for now.

Fixes https://bugs.llvm.org/show_bug.cgi?id=36073.

Differential Revision: https://reviews.llvm.org/D42509

llvm-svn: 323559
2018-01-26 20:23:00 +00:00
Craig Topper d4795b700d [X86] Allow any_extend to be combined with setcc on VLX targets.
For VLX target getSetccResultType returns vXi1 which prevents the target independent DAG combine from doing this tranform itself.

llvm-svn: 323555
2018-01-26 20:02:52 +00:00
Simon Pilgrim 8e9becbd81 [X86][AVX512] Add combining support for X86ISD::VTRUNCS
Similar to the existing support for X86ISD::VTRUNCUS.

Differential Revision: https://reviews.llvm.org/D42544

llvm-svn: 323553
2018-01-26 20:01:12 +00:00
Krzysztof Parzyszek 1a1edbfb04 [Hexagon] Fix an incorrect assertion in HexagonConstExtenders
llvm-svn: 323548
2018-01-26 19:20:50 +00:00
Simon Pilgrim 1b14bdc0b8 [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v4i32/v4f32
Extension to D42431, adding support for v4i32/v4f32 as well as v2i64/v2f64 now that D42308 has landed

llvm-svn: 323542
2018-01-26 17:19:59 +00:00
Simon Pilgrim 76ede609f6 [X86][SSE] Don't colaesce v4i32 extracts
We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends.

This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to 0'th index. I don't think either of these are likely these days as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code tends to make it very difficult for various vector and load combines.

Differential Revision: https://reviews.llvm.org/D42308

llvm-svn: 323541
2018-01-26 17:11:34 +00:00
Nirav Dave 9896238dc9 [DAG] Teach findBaseOffset to interpret indexes of indexed memory operations
Indexed outputs are addition / subtractions and can be interpreted as such.

llvm-svn: 323539
2018-01-26 16:51:27 +00:00
Alexander Richardson 1f9636f3ef [MIPS] Don't crash on unsized extern types with -mgpopt
Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel.

Reviewers: atanasyan, sdardis, emaste

Reviewed By: sdardis

Subscribers: krytarowski, llvm-commits

Differential Revision: https://reviews.llvm.org/D42571

llvm-svn: 323536
2018-01-26 15:56:14 +00:00
Simon Pilgrim f531cf8964 [DAGCombine] reduceBuildVecToShuffle - ensure EXTRACT_VECTOR_ELT index is in range
From OSS Fuzz Test Case #5688

llvm-svn: 323535
2018-01-26 15:50:20 +00:00
Simon Pilgrim 65ec923805 [X86][SSE] Add tests for vector truncation with PACKUS style signed saturation
PACKUS - truncates signed value, saturating to [0,unsigned_max_trunc]

llvm-svn: 323531
2018-01-26 14:58:50 +00:00
Francis Visoiu Mistrih e4718e84e8 [MIR] Add support for addrspace in MIR
Add support for printing / parsing the addrspace of a MachineMemOperand.

Fixes PR35970.

Differential Revision: https://reviews.llvm.org/D42502

llvm-svn: 323521
2018-01-26 11:47:28 +00:00
Daniil Fukalov 6e1dc68117 [AMDGPU] fix LDS f32 intrinsics
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic

Reviewed by: b-sumner

Differential Revision: https://reviews.llvm.org/D42383

llvm-svn: 323516
2018-01-26 11:09:38 +00:00
Momchil Velikov d2cc6fd90b [ARM] Accept a subset of Thumb GPR register class when emitting an SP-relative
load instruction

The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR`
register class. The function serves to emit a `tLDRspi` instruction and
certainly any subset of the `tGPR` register class is a valid destination of the
load.

Differential revision: https://reviews.llvm.org/D42535

llvm-svn: 323514
2018-01-26 10:20:58 +00:00
Andrei Elovikov cbc5a688f3 [X86FixupBWInsts] Prefer positive checks in the test. NFC
Reviewers: andrew.w.kaylor, craig.topper, MatzeB

Reviewed By: andrew.w.kaylor

Subscribers: aivchenk, llvm-commits

Differential Revision: https://reviews.llvm.org/D42531

llvm-svn: 323513
2018-01-26 09:50:32 +00:00
Sjoerd Meijer 011de9c0ca [ARM] Armv8.2-A FP16 code generation (part 1/3)
This is the groundwork for Armv8.2-A FP16 code generation .

Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
  return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce) {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.

When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.

As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.

I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.

This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.


Thanks to Sam Parker and Oliver Stannard for their help and reviews!


Differential Revision: https://reviews.llvm.org/D38315

llvm-svn: 323512
2018-01-26 09:26:40 +00:00
Hiroshi Inoue 0909ca132f [NFC] fix trivial typos in comments and documents
"in in" -> "in", "on on" -> "on" etc.

llvm-svn: 323508
2018-01-26 08:15:29 +00:00
Serguei Katkov 9fe0524ee6 [CGP] Re-enable Select in complex addressing mode.
Switch Select handling on after fixing two bugs: rL323192 and rL323497.

llvm-svn: 323498
2018-01-26 06:26:56 +00:00
Serguei Katkov 1ce7137c99 [X86] Fix killed flag handling in X86FixupLea pass
When pass creates a MOV instruction for 
lea (%base,%index,1), %dst => mov %base,%dst; add %index,%dst
modification it should clean the killed flag for base
if base is equal to index.

Otherwise verifier complains about usage of killed register in add instruction.

Reviewers: lsaba, zvi, zansari, aaboud
Reviewed By: lsaba
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42522

llvm-svn: 323497
2018-01-26 04:49:26 +00:00
Shoaib Meenai d8fd16b08f [CodeGen] Ignore private symbols in llvm.used for COFF
Similar to the existing handling for internal symbols, private symbols
are also not visible to the linker and should be ignored.

llvm-svn: 323483
2018-01-26 00:15:25 +00:00
Krzysztof Parzyszek b2c458e648 [Hexagon] SETEQ and SETNE are valid integer condition codes
llvm-svn: 323452
2018-01-25 18:07:27 +00:00
Simon Pilgrim fb01d06669 [X86][SSE] Add tests for vector truncation with signed saturation
AVX512 isn't using X86ISD::VTRUNCS and SSE/AVX isn't using PACKSS/PACKUS

llvm-svn: 323428
2018-01-25 14:56:21 +00:00
Simon Pilgrim e59bf81e74 [X86][SSE] Add tests for vector truncation with unsigned saturation
AVX512 tends to do a good job, but there are some missed opportunities with SSE/AVX

llvm-svn: 323422
2018-01-25 14:28:55 +00:00
Zvi Rackover 0fb9638e3c X86 Tests: Add AVX+XOP config to SDIV combine tests
As pointed out in D42479, XOP also needs to be covered as it supports
vector shifts with variable shift amount.

llvm-svn: 323418
2018-01-25 14:07:33 +00:00
Craig Topper b369cdbaad [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency to some of them in SkylakeClient model.
The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results.

This changes them all to use explicit instrs instead.

While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.

llvm-svn: 323406
2018-01-25 06:57:42 +00:00
Amara Emerson 5ee0398849 [GlobalISel] Add a requires: asserts to a test.
llvm-svn: 323384
2018-01-24 22:40:25 +00:00
Amara Emerson 4f84f8862b [AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load.
The tablegen imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in future.

There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.

Fixes/works around PR36018.

llvm-svn: 323371
2018-01-24 20:35:37 +00:00
Amara Emerson f386e2b081 [GlobalISel] Don't fall back to FastISel.
Apparently checking the pass structure isn't enough to ensure that we don't fall
back to FastISel, as it's set up as part of the SelectionDAGISel.

llvm-svn: 323369
2018-01-24 19:59:29 +00:00
Simon Pilgrim 9f551ad604 [X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros
As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD:
On SNB it will result in code that isn't any faster, but not any slower so we may as well keep it.
On KNL it only has half the throughput, so I've disabled it on there - ideally there'd be a better way than this.

Differential Revision: https://reviews.llvm.org/D42258

llvm-svn: 323367
2018-01-24 19:20:02 +00:00
Simon Pilgrim 21f17d4098 [X86][SSE] Add slow-pmulld attribute (silvermont-style) test
Requested by @zvi on D42258

llvm-svn: 323364
2018-01-24 19:09:11 +00:00
Nicolai Haehnle 4afb64e4c6 Revert r321751, "StructurizeCFG: Fix broken backedge detection"
It causes regressions in various OpenGL test suites.

Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.

Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355
2018-01-24 18:02:05 +00:00
Weiming Zhao 665784f170 [ARM] Expand long shifts for Thumb1 to __aeabi_ calls
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.

Reviewers: samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42401

llvm-svn: 323354
2018-01-24 18:00:57 +00:00
Craig Topper 05af43fbad [X86] Fix some inconsistencies in the itineraries and Sched for (V)PEXTRW/(V)PINSRW
The weirdest being that PEXTRWrr was tagged as a memory operation.

llvm-svn: 323353
2018-01-24 17:58:57 +00:00
Craig Topper b85b484fee [X86] Adjust names of PINSRW/PEXTRW intructions between MMX/SSE/AVX/AVX512 for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI
llvm-svn: 323352
2018-01-24 17:58:51 +00:00
Krzysztof Parzyszek cf3ad5841b [Hexagon] Run late copy propagation and dead code elimination passes
llvm-svn: 323346
2018-01-24 17:48:11 +00:00
Zvi Rackover 22bfa7e574 X86 Tests: Add more sdiv combine cases. NFC
Add cases with vector non-splat pow2 contant divider.

llvm-svn: 323329
2018-01-24 15:02:16 +00:00
Pablo Barrio 9b3d4c01a0 [AArch64] Avoid unnecessary vector byte-swapping in big-endian
Summary:
Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but same vector size. This is not a
problem in little-endian but, when in big-endian, it requires
additional byte reversals required to preserve the lane ordering
while keeping the right endianness of the data inside each lane.
For example:

%1 = load <4 x half>, <4 x half>* %p

results in the following assembly:

ld1 { v0.2s }, [x1]
rev32 v0.4h, v0.4h

This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:

ld1 { v0.4h }, [x1]

Reviewers: olista01, SjoerdMeijer, efriedma

Reviewed By: efriedma

Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D42235

llvm-svn: 323325
2018-01-24 14:13:47 +00:00
Sven van Haastregt e8404780c3 [DAGCombiner] Bail out if vector size is not a multiple
For the included test case, the DAG transformation
  concat_vectors(scalar, undef) -> scalar_to_vector(sclr)
would attempt to create a v2i32 vector for a v9i8
concat_vector.  Bail out to avoid creating a bitcast with
mismatching sizes later on.

Differential Revision: https://reviews.llvm.org/D42379

llvm-svn: 323312
2018-01-24 09:53:47 +00:00
Martin Storsjo 4ed94a06ac [ARM] Call __chkstk for dynamic stack allocation in all windows environments
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.

On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" function - so
within LLVM they're not distinguishable from each other.

Differential Revision: https://reviews.llvm.org/D42292

llvm-svn: 323308
2018-01-24 06:40:11 +00:00
Martin Storsjo e8248f2e10 [GlobalMerge] Don't merge dllexport globals
Merging such globals loses the dllexport attribute. Add a test
to check that normal globals still are merged.

Differential Revision: https://reviews.llvm.org/D42127

llvm-svn: 323307
2018-01-24 06:40:04 +00:00
Hiroshi Inoue 501931b117 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323302
2018-01-24 05:04:35 +00:00
Rafael Espindola 432a587cf0 Don't assume a null GV is local for ELF and MachO.
This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.

With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.

llvm-svn: 323297
2018-01-24 02:11:18 +00:00
Aditya Nandakumar f2aa2af24e [GISel]: Remove redundant copies at the end of ISel
https://reviews.llvm.org/D42402

A lot of these copies are useless (copies b/w VRegs having the same
regclass) and should be cleaned up.

llvm-svn: 323291
2018-01-24 01:35:26 +00:00
Matthias Braun 70fd374d1e AArch64: Cyclone: Remove SlowMisaligned128Store tuning flag
Remove FeatureSlowMisaligned128Store from cyclone flags.
This flag causes splitting of 16 byte wide stores into 2 stored of 8
bytes. This was useful on older apple CPUs which were slow for 16byte
stores that were not aligned on 16byte. As the compiler often cannot
predict the actual alignment, the splitting was choosen.

This has been a topic for a lot of debate as the splitting also
decreases performance for some benchmarks. Measuring the effects on
newer apple chips (rdar://35525421) shows that it harms more cases than
it helps. So it is time to retire this workaround.

llvm-svn: 323289
2018-01-24 00:39:53 +00:00
Tim Shen 7abe9887b0 [PPC] Avoid incorrect fp-i128-fp lowering.
Summary:
Fix an issue that's similar to what D41411 fixed:
  float(__int128(float_var)) shouldn't be optimized to xscvdpsxds +
  xscvsxdsp, as they mean (float)(int64_t)float_var.

Reviewers: jtony, hfinkel, echristo

Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D42400

llvm-svn: 323270
2018-01-23 22:06:57 +00:00
Craig Topper 002657731b [X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS and instructions to get them picked up by the scheduler model regexs.
All other intrinsic instructions put the _Int on the end. This make these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.

llvm-svn: 323261
2018-01-23 21:37:51 +00:00
Simon Pilgrim 2cc74ed2be [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v2i64/v2f64
Minor refactor to make it possible for LowerBUILD_VECTORAsVariablePermute to be used with a wider variety of shuffles op and types.

I'd have liked to add v4i32/v4f32 support as well but we don't see v4i32 index extractions at the moment (which is why I created D42308)

After this I intend to begin adding scaling support for PSHUFB (v8i16, v4i32, v2i64)) and VPERMPS (v4f64, v4i64).

Differential Revision: https://reviews.llvm.org/D42431

llvm-svn: 323260
2018-01-23 21:33:24 +00:00
Evgeniy Stepanov d5a6fdbe95 [safestack] Inline safestack pointer access when possible.
Summary:
This adds an -mllvm flag that forces the use of a runtime function call to
get the unsafe stack pointer, the same that is currently used on non-x86, non-aarch64 android.
The call may be inlined.

Reviewers: pcc

Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37405

llvm-svn: 323259
2018-01-23 21:27:07 +00:00
Krzysztof Parzyszek d5e8a260bb [Hexagon] Add patterns for sext_inreg of HVX vector types
llvm-svn: 323250
2018-01-23 19:56:16 +00:00
Krzysztof Parzyszek 3780a0e1fa [Hexagon] Implement basic vector operations on vectors vNi1
In addition to that, make sure that there are no boolean vector types that
are associated with multiple register classes. Specifically, remove v32i1
and v64i1 from integer register classes. These types will correspond to
results of vector comparisons, and as such should belong to the vector
predicate class. Having them in scalar registers as well makes legalization
ambiguous.

llvm-svn: 323229
2018-01-23 17:53:59 +00:00
Simon Pilgrim 6ff241fc99 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - extract subvector from oversized index vectors
llvm-svn: 323223
2018-01-23 17:02:15 +00:00
Dan Gohman 5464941a6a [WebAssembly] Add mem.* intrinsics.
The grow_memory and current_memory instructions are expected to be
officially renamed to mem.grow and mem.size. Introduce new intrinsics
with the new names. These new names aren't yet official, so for now,
use them at your own risk.

Also, take this opportunity to add arguments for the currently unused
immediate field in those instructions.

llvm-svn: 323222
2018-01-23 17:02:02 +00:00
Dan Gohman f2c1cae5cb [WebAssembly] Switch to *-wasm as the default target triple.
This makes wasm32-unknown-unknown-wasm the default, which supports
the .o file writer and the new linking ABI. To enable s2wasm-compatible
output, use the wasm32-unknown-unknown-elf triple.

llvm-svn: 323220
2018-01-23 16:55:44 +00:00
Alexander Ivchenko e642231bfc [x86] Reautogenerate a bunch of tests for D42287. NFC
llvm-svn: 323215
2018-01-23 16:08:15 +00:00
Yaxun Liu 8b7454a8dd CodeGen: Fix assertion in ScheduleDAGMILive::scheduleMI due to llvm.dbg.value
Fix a bug in ScheduleDAGMILive::scheduleMI which causes BotRPTracker not tracking CurrentBottom in some rare cases involving llvm.dbg.value.

This issues causes amdgcn target to assert when compiling some user codes with -g.

Differential Revision: https://reviews.llvm.org/D42394

llvm-svn: 323214
2018-01-23 16:04:53 +00:00
Craig Topper c58c2b5c9b [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector.
The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code.

This patch is a little larger than it should be due to differences between the DQI handling between the two today.

llvm-svn: 323212
2018-01-23 15:56:36 +00:00
Simon Pilgrim 0c9f77a9f9 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the source vector is not larger than the destination
We might be able to support this in the future with VPERMV3, OR(PSHUFB, PSHUFB) etc.

llvm-svn: 323210
2018-01-23 15:51:03 +00:00
Alexander Ivchenko 347921a281 [x86] Mostly reautogenerate a bunch of tests that affect D37775. NFC
Tests required minor manual tweaks:
CodeGen/MIR/X86/generic-instr-type.mir
CodeGen/X86/GlobalISel/select-copy.mir
CodeGen/X86/GlobalISel/select-ext.mir
CodeGen/X86/GlobalISel/select-intrinsic-x86-flags-read-u32.mir
CodeGen/X86/GlobalISel/select-phi.mir
CodeGen/X86/GlobalISel/select-trunc.mir
CodeGen/X86/GlobalISel/select-frameIndex.mir

And following tests are split into 32/64 versions:
CodeGen/X86/GlobalISel/legalize-GV.mir
CodeGen/X86/GlobalISel/select-frameIndex.mir

llvm-svn: 323209
2018-01-23 15:48:50 +00:00
Simon Pilgrim e2905c8a0c [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the index vector has the correct number of elements
llvm-svn: 323206
2018-01-23 15:13:37 +00:00
Tim Northover f9b560aa8e AArch64: get type from correct result when forming BFX
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one. Hopefully that's all of them.

llvm-svn: 323205
2018-01-23 15:11:27 +00:00
Tim Northover 9f3003d08f AArch64: get type from correct result when forming BFI/BFM
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one.

llvm-svn: 323202
2018-01-23 14:37:03 +00:00
Craig Topper 76adcc86cd [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8.
Summary:
For the most part its better to keep v32i1 as a mask type of a narrower width than trying to promote it to a ymm register.

I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.

There are still some regressions in here. I definitely saw some around shuffles. I think we probably should move vXi1 shuffle from lowering to a DAG combine where I think the extend and truncate we have to emit would be better combined.

I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc))

Overall this removes something like 13000 CHECK lines from lit tests.

Reviewers: zvi, RKSimon, delena, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42031

llvm-svn: 323201
2018-01-23 14:25:39 +00:00
Simon Pilgrim 8ea1a0c690 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering
As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector, VPERMV works in reverse which is probably where the confusion comes from.

Differential Revision: https://reviews.llvm.org/D42380

llvm-svn: 323190
2018-01-23 11:39:06 +00:00
MinSeong Kim 27f77b4300 [Analysis] Disable exp/exp2/pow finite lib calls on Android with -ffast-math.
Summary:
Since r322087, glibc's finite lib calls are generated when possible.
However, glibc is not supported on Android. Therefore this change
enables llvm to finely distinguish between linux and Android for
unsupported library calls. The change also include some regression
tests.

Reviewers: srhines, pirama

Reviewed By: srhines

Subscribers: kongyi, chh, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42288

llvm-svn: 323187
2018-01-23 11:11:36 +00:00
Stefan Maksimovic 98749e0249 [mips] Properly select abs and sqrt instructions
- Alter abs for micromips to have both AFGR64 and FGR64
  variants, same as sqrt
- Remove sqrt and abs from MicroMips32r6InstrInfo.td,
  use micromips FGR64 variants
- Restrict non-micromips abs/sqrt with NotInMicroMips
  predicate

Differential revision: https://reviews.llvm.org/D41439

llvm-svn: 323184
2018-01-23 10:09:39 +00:00
Craig Topper c92edd994e [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx
Summary:
If we can match as a zero extend there's no need to flip the order to get an encoding benefit. As movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes unless we get lucky in the register allocator and its on AL/AX/EAX which have a 2 byte encoding.

This patch was more impressive before r322957 went in. It removed some of the same Ands that got deleted by that patch.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42313

llvm-svn: 323175
2018-01-23 05:45:52 +00:00
Craig Topper e5aea25980 [X86] Remove 'NOREX' comment from the printing of _NOREX instructions.
Some of the NOREX instructions are used in 32-bit mode making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction.

llvm-svn: 323174
2018-01-23 05:37:00 +00:00
Chandler Carruth c58f2166ab Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

llvm-svn: 323155
2018-01-22 22:05:25 +00:00
Mark Searles 7687d42052 [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register.
- Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

llvm-svn: 323153
2018-01-22 21:46:43 +00:00
Petar Jovanovic 29aced1bae [mips] add warnings for using dsp and msa flags with inappropriate revisions
Dsp and dspr2 require MIPS revision 2, while msa requires revision 5. Adding
warnings for cases when these flags are used with earlier revision.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D40490

llvm-svn: 323131
2018-01-22 16:43:30 +00:00
Carey Williams da15b5b116 [AArch64] optimise v4f16 fcmps to utilise vector instructions
Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported.
Generating FCTVL(s) rather than a longer series of FCVTs.

Differential Revision: https://reviews.llvm.org/D41772

llvm-svn: 323118
2018-01-22 14:16:11 +00:00
Simon Pilgrim a2b157bde4 [X86][AVX] Add test case for PR34370
llvm-svn: 323106
2018-01-22 12:27:22 +00:00
Simon Pilgrim 17682a86da [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied)
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

Reapplied after rL322279 was reverted at rL322335 due to PR35918, underlying issue was fixed at rL322644.

llvm-svn: 323104
2018-01-22 12:05:17 +00:00
Marina Yatsina 77a21dbad4 Break false dependencies for POPCNT, LZCNT, TZCNT
Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

This is the final of multiple patches that fix this bugzilla.
Most of the patches are intended at refactoring the existent code.

Reviews of the refactoring done to enable this change:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333

Differential Revision: https://reviews.llvm.org/D40334

Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
llvm-svn: 323096
2018-01-22 10:07:01 +00:00
Marina Yatsina 6fc2aaae8d Separate ExecutionDepsFix into 4 parts:
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.

This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

Additional refactoring changes will follow.

This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.

This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40330

Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
2018-01-22 10:05:23 +00:00
Craig Topper 7fddf2bfef [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations
Summary:
This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.

We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42265

llvm-svn: 323048
2018-01-20 18:50:09 +00:00
Jonas Paulsson 9cee52732f Move new test from Generic to SystemZ.
A few build bots failed with r323042 because they are not configured to
build the SystemZ target.

llvm-svn: 323044
2018-01-20 16:57:06 +00:00
Jonas Paulsson 7ad28863fb [SelectionDAG] Fix codegen of vector stores with non byte-sized elements.
This was completely broken, but hopefully fixed by this patch.

In cases where it is needed, a vector with non byte-sized elements is stored
by extracting, zero-extending, shift:ing and or:ing the elements into an
integer of the same width as the vector, which is then stored.

Review: Eli Friedman, Ulrich Weigand
https://reviews.llvm.org/D42100#inline-369520
https://bugs.llvm.org/show_bug.cgi?id=35520

llvm-svn: 323042
2018-01-20 16:05:10 +00:00
Craig Topper a6c6d65b72 [X86] Add some more v32i1 shuffle tests with shuffles between mask creation and mask usage rather than being just shuffling input arguments.
The existing tests just tested shuffles of v32i1 inputs, but arguments are promoted to v32i8. So it wasn't a good demonstration of v32i1 shuffle handling.

The new test cases use compares and selects to get k-register operations around the shuffle.

This is prep work for demonstrating changes from D42031.

llvm-svn: 323031
2018-01-20 08:13:35 +00:00
Craig Topper 4f14153494 [X86] Add test cases for failures to use movzx due to various issues with demanded bits.
D42265 and D42313 should help with some of these.

llvm-svn: 323030
2018-01-20 07:50:57 +00:00
Saleem Abdulrasool d3169b478c test: fix ARM tests harder
Remove the missed check update for the removal of the x86 specific
vector call on ARM.

llvm-svn: 323023
2018-01-20 01:26:46 +00:00
Saleem Abdulrasool 6f831d54f8 test: move ARM test from x86
The ARM backend is not guaranteed to be present on x86, move the test to
the ARM tests.

llvm-svn: 323021
2018-01-20 01:03:11 +00:00
Saleem Abdulrasool 99f479abcf CodeGen: handle llvm.used properly for COFF
`llvm.used` contains a list of pointers to named values which the
compiler, assembler, and linker are required to treat as if there is a
reference that they cannot see.  Ensure that the symbols are preserved
by adding an explicit `-include` reference to the linker command.

llvm-svn: 323017
2018-01-20 00:28:02 +00:00
Craig Topper 08bd14803c [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size.
This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.

The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet.

If the preference is lower than 256 we may still use a 256 bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping element count constant which is easiest done by switching between 256 and 128 bit.

The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.

Differential Revision: https://reviews.llvm.org/D41895

llvm-svn: 323016
2018-01-20 00:26:12 +00:00
Sanjay Patel 4127e77e13 [x86] add tests for sqrt estimate that should respect denorms; NFC (PR34994)
llvm-svn: 323003
2018-01-19 22:47:49 +00:00
Craig Topper b0959fce1b [X86] Autogenerate complete checks on a couple tests. NFC
llvm-svn: 322997
2018-01-19 22:04:20 +00:00