Commit Graph

2785 Commits

Author SHA1 Message Date
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Eli Friedman 1a9a887a29 Add ARM support to update_llc_test_checks.py
Just the minimal support to get it working at the moment.

Includes checks for test/CodeGen/ARM/vzip.ll as an example.

Differential Revision: https://reviews.llvm.org/D27829

llvm-svn: 290144
2016-12-19 23:09:51 +00:00
Diana Picus 3bae486b7f [ARM] GlobalISel: Add more checks to test
llvm-svn: 290108
2016-12-19 14:08:11 +00:00
Diana Picus 58265c4788 [ARM] GlobalISel: Minor style fixup in test
llvm-svn: 290107
2016-12-19 14:08:06 +00:00
Diana Picus 97ae95c3a8 [ARM] GlobalISel: Lower i8 and i16 register args
This allows lowering i8 and i16 arguments if they can fit in the registers. Note
that the lowering is incomplete - ABI extensions are handled in a subsequent
patch.

(Last part of)
Differential Revision: https://reviews.llvm.org/D27704

llvm-svn: 290106
2016-12-19 14:08:02 +00:00
Diana Picus 5a724452a0 [ARM] GlobalISel: Allow i8 and i16 adds
Teach the instruction selector and legalizer that it's ok to have adds with 8 or
16-bit integers.

This is the second part of https://reviews.llvm.org/D27704

llvm-svn: 290105
2016-12-19 14:07:56 +00:00
Diana Picus 36aa09fa3c [ARM] GlobalISel: Select i8 and i16 copies
Teach the instruction selector that it's ok to copy small values from physical
registers.

First part of https://reviews.llvm.org/D27704

llvm-svn: 290104
2016-12-19 14:07:50 +00:00
Diana Picus 1437f6d710 [ARM] GlobalISel: Lower more than 4 arguments
This adds support for lowering more than 4 arguments (although still i32 only).
It uses the handleAssignments / ValueHandler infrastructure extracted from
the AArch64 backend in r288658.

Differential Revision: https://reviews.llvm.org/D27195

llvm-svn: 290098
2016-12-19 11:55:41 +00:00
Diana Picus 519807f7be [ARM] GlobalISel: Support loading from the stack
Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit
scalars only). This will be useful for functions that need to pass arguments on
the stack.

First part of https://reviews.llvm.org/D27195.

llvm-svn: 290096
2016-12-19 11:26:31 +00:00
Matthias Braun 9439d3dc2c Move test to correct directory
See also test/CodeGen/MIR/README

llvm-svn: 290032
2016-12-17 02:16:26 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Eli Friedman f624ec27b7 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289972
2016-12-16 18:44:08 +00:00
Diana Picus 812caee65a [ARM] GlobalISel: Select add i32, i32
Add the minimal support necessary to select a function that returns the sum of
two i32 values.

This includes some support for argument/return lowering of i32 values through
registers, as well as the handling of copy and add instructions throughout the
GlobalISel pipeline.

Differential Revision: https://reviews.llvm.org/D26677

llvm-svn: 289940
2016-12-16 12:54:46 +00:00
Nico Weber 11c2e6ee3a Revert 279703, it caused PR31404.
llvm-svn: 289923
2016-12-16 04:51:25 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Chandler Carruth 4154062b69 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

llvm-svn: 289916
2016-12-16 04:05:22 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Eli Friedman 379294676d Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793

llvm-svn: 289882
2016-12-15 22:41:40 +00:00
Eli Friedman 34505083c6 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787

llvm-svn: 289874
2016-12-15 21:36:59 +00:00
Sjoerd Meijer 96e10b5a9e [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:

bic r5, r7, #31
cbz r5, .LBB2_10

got rewritten into:

lsrs  r5, r7, #5
beq .LBB2_10

The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. And also, compared to the original commit, some regression
tests didn't need changing anymore because of this extra check.

For completeness, this was the original commit message:

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.

Differential Revision: https://reviews.llvm.org/D27761

llvm-svn: 289794
2016-12-15 09:38:59 +00:00
Prakhar Bahuguna 52a7dd7d78 [ARM] Implement execute-only support in CodeGen
This implements execute-only support for ARM code generation, which
prevents the compiler from generating data accesses to code sections.
The following changes are involved:

* Add the CodeGen option "-arm-execute-only" to the ARM code generator.
* Add the clang flag "-mexecute-only" as well as the GCC-compatible
  alias "-mpure-code" to enable this option.
* When enabled, literal pools are replaced with MOVW/MOVT instructions,
  with VMOV used in addition for floating-point literals. As the MOVT
  instruction is required, execute-only support is only available in
  Thumb mode for targets supporting ARMv8-M baseline or Thumb2.
* Jump tables are placed in data sections when in execute-only mode.
* The execute-only text section is assigned section ID 0, and is
  marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'.
  This also overrides selection of ELF sections for globals.

llvm-svn: 289784
2016-12-15 07:59:08 +00:00
Eli Friedman db07ebbab6 Add testcases for some shuffle bugs.
See https://llvm.org/bugs/show_bug.cgi?id=31301 and
https://llvm.org/bugs/show_bug.cgi?id=31364 .

llvm-svn: 289751
2016-12-15 01:47:15 +00:00
Eli Friedman cbed30c501 [ARM] Split 128-bit vectors in BUILD_VECTOR lowering
Given that INSERT_VECTOR_ELT operates on D registers anyway, combining
64-bit vectors into a 128-bit vector is basically free. Therefore, try
to split BUILD_VECTOR nodes before giving up and lowering them to a series
of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically
better lowerings; see testcases for examples. Inspired by similar code
in the x86 backend for AVX.

Differential Revision: https://reviews.llvm.org/D27624

llvm-svn: 289706
2016-12-14 20:44:38 +00:00
Eli Friedman 10576e73c9 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289703
2016-12-14 20:25:26 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Evandro Menezes 54eb192b25 [ARM] Fix typo in checking prefix
llvm-svn: 289617
2016-12-14 00:02:03 +00:00
Evandro Menezes aeec780e42 Add support for Samsung Exynos M3 (NFC)
llvm-svn: 289613
2016-12-13 23:31:41 +00:00
Alina Sbirlea 77c5eaaeda Generalize strided store pattern in interleave access pass
Summary:
This patch aims to generalize matching of the strided store accesses to more general masks.
The more general rule is to have consecutive accesses based on the stride:
[x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...]
All elements in the masks need not form a contiguous space, there may be gaps.
As before, undefs are allowed and filled in with adjacent element loads.

Reviewers: HaoLiu, mssimpso

Subscribers: mkuper, delena, llvm-commits

Differential Revision: https://reviews.llvm.org/D23646

llvm-svn: 289573
2016-12-13 19:32:36 +00:00
Artur Pilipenko c93cc5955f [DAGCombiner] Match load by bytes idiom and fold it into a single load
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: hfinkel, RKSimon, filcab

Differential Revision: https://reviews.llvm.org/D26149

llvm-svn: 289538
2016-12-13 14:21:14 +00:00
Sanjoy Das 3336f681e3 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time an struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in an ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
Matthias Braun 2c7d52a540 Move .mir tests to appropriate directories
test/CodeGen/MIR should contain tests that intent to test the MIR
printing or parsing. Tests that test something else should be in
test/CodeGen/TargetName even when they are written in .mir.

As a rule of thumb, only tests using "llc -run-pass none" should be in
test/CodeGen/MIR.

llvm-svn: 289254
2016-12-09 19:08:15 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Weiming Zhao b38cfced8d Summary: Currently there is no way to disable deprecated warning from asm like this
clang  -target arm deprecated-asm.s -c
  deprecated-asm.s:30:9: warning: use of SP or PC in the list is deprecated
       stmia   r4!, {r12-r14}

We have to have an option what can disable it.

Patched by Yin Ma!

Reviewers: joey, echristo, weimingz

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D27219

llvm-svn: 288734
2016-12-05 23:55:13 +00:00
Wolfgang Pieb 42f92a7225 When instructions are hoisted out of loops by MachineLICM, remove their debug loc.
This prevents erratic stepping behavior as well as incorrect source attribution
for sample profiling.

Reviewers: dblakie

Subscribers: llvm-commit

Differential Revision: https://reviews.llvm.org/D27290

llvm-svn: 288442
2016-12-02 00:37:57 +00:00
Oleg Ranevskyy e2ae41519f [ARM] Fix for 64-bit CAS expansion on ARM32 with -O0
Summary:
This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion.

Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments though if the first one is >= than the second. This might cause the comparison logic to detect false equality.

Example of the broken C++ code:
```
std::atomic<long long> at(2);

long long ll = 1;
std::atomic_compare_exchange_strong(&at, &ll, 3);
```
Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3.

The patch replaces SBC with CMPEQ.

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, llvm-commits, asl

Differential Revision: https://reviews.llvm.org/D27315

llvm-svn: 288433
2016-12-01 22:58:35 +00:00
Kuba Mracek 06995e866b [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Pablo Barrio c41e856f53 [ARM] Relax restriction on variadic functions for tailcall optimization
Summary:
Variadic functions can be treated in the same way as normal functions
with respect to the number and types of parameters.

Reviewers: grosbach, olista01, t.p.northover, rengolin

Subscribers: javed.absar, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26748

llvm-svn: 287219
2016-11-17 10:56:58 +00:00
Tim Northover 397f9d9d05 ARM: fix CodeGen for 64-bit shifts.
One half of the shifts obviously needed conditional selection based on whether
the shift amount is more than 32-bits, but leaving the other half as the
natural shift isn't acceptable either: it's undefined behaviour to shift a
32-bit value by more than 31.

llvm-svn: 287149
2016-11-16 20:54:28 +00:00
Javed Absar f043dac25d [ARM] Add machine scheduler for Cortex-R52
This patch adds the Sched Machine Model for Cortex-R52.

Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.

Reviewers: rengolin, jmolloy

Differential Revision: https://reviews.llvm.org/D26500

llvm-svn: 286949
2016-11-15 11:34:54 +00:00
Tim Northover 46a6f0fbf0 Recommit: ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.

llvm-svn: 286881
2016-11-14 20:28:24 +00:00
Tim Northover 1b66f39cf2 Revert "ARM: sort register lists by encoding in push/pop instructions."
This reverts commit 286866. It broke a bot, something to do with exactly which
templates std::sort accepts.

llvm-svn: 286867
2016-11-14 19:05:28 +00:00
Tim Northover e908ea844c ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

llvm-svn: 286866
2016-11-14 19:02:17 +00:00
Adrian Prantl 554fd99dd5 Revert "Use private linkage for MergedGlobals variables" on Darwin.
This is a partial revert of r244615 (http://reviews.llvm.org/D11942),
which caused a major regression in debug info quality.

Turning the artificial __MergedGlobal symbols into private symbols
(l__MergedGlobal) means that the linker will not include them in the
symbol table of the final executable. Without a symbol table entry
dsymutil is not be able to process the debug info for any of the
merged globals and thus drops the debug info for all of them.

This patch is enabling the old behavior for all MachO targets while
leaving all other targets unaffected.

rdar://problem/29160481
https://reviews.llvm.org/D26531

llvm-svn: 286607
2016-11-11 17:50:09 +00:00
Diana Picus 22274934f4 [ARM] Add plumbing for GlobalISel
Add GlobalISel skeleton, up to the point where we can select a ret void.

llvm-svn: 286573
2016-11-11 08:27:37 +00:00
Matthias Braun 325cd2c98a ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instruction
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.

Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.

Differential Revision: https://reviews.llvm.org/D25140

llvm-svn: 286544
2016-11-11 01:34:21 +00:00
James Molloy b03e0879fc [Thumb1] Move padding earlier when synthesizing TBBs off of the PC
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.

If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.

In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.

llvm-svn: 286107
2016-11-07 13:38:21 +00:00
Saleem Abdulrasool 804e12eeb5 ARM: lower fpowi appropriately for Windows ARM
This handles the last case of the builtin function calls that we would
generate code which differed from Microsoft's ABI.  Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.

Addresses PR30825!

llvm-svn: 286082
2016-11-06 19:46:54 +00:00