Commit Graph

34511 Commits

Author SHA1 Message Date
Ahmed Bougacha a87c3480b5 [X86] Extract PSIGN/BLENDVP tests into vector-blend.ll. NFC.
We're going to stop generating PSIGN, so calling a test "psign"
isn't ideal. Instead, call these tests what they really are:
variable blends using logic.
Also add a test to exhibit a case we're currently missing in
the PSIGN combine.

llvm-svn: 261022
2016-02-16 22:13:59 +00:00
Derek Schuff f8f8f093aa [WebAssemly] Don't move calls or stores past intervening loads
The register stackifier currently checks for intervening stores (and
loads that may alias them) but doesn't account for the fact that the
instruction being moved may affect intervening loads.

Differential Revision: http://reviews.llvm.org/D17298

llvm-svn: 261014
2016-02-16 21:44:19 +00:00
Adam Nemet 106fedab6f [LTO] Support Statistics
Summary:
I thought -Xlinker -mllvm -Xlinker -stats worked at some point but maybe
it never did.

For clang, I believe that stats are printed from cc1_main.  This patch
also prints them for LTO, specifically right after codegen happens.

I only looked at the C API for LTO briefly to see if this is a good
place.  Probably there are still cases where this wouldn't be printed
but it seems to be working for the common case.  I also experimented
putting this in the LTOCodeGenerator destructor but that didn't trigger
for me because ld64 does not destroy the LTOCodeGenerator.

Reviewers: dexonsmith, joker.eph

Subscribers: rafael, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D17302

llvm-svn: 261013
2016-02-16 21:41:51 +00:00
Reid Kleckner 6e0d5f573c [codeview] Fix assertion on non-memory, non-register DBG_VALUE instructions
Eventually we should find a way to describe constant variables, but it
is not obvious how to do this at the moment.

llvm-svn: 261010
2016-02-16 21:14:51 +00:00
Colin LeMahieu ecef1d9cbc [Hexagon] Adding relocation for code size, cold path optimization allowing a 23-bit 4-byte aligned relocation to be a valid instruction encoding.
The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes.

Another way is to put a .word32 and mix code and data within a function.  The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware.

This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits.  Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register.

llvm-svn: 261006
2016-02-16 20:38:17 +00:00
Jun Bum Lim b389d9b9af [AArch64] Add pass to remove redundant copy after RA
Summary:
This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :
  BB0:
    cbz x0, .BB1
  BB1:
    mov x0, xzr

Jun

Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier

Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16203

llvm-svn: 261004
2016-02-16 20:02:39 +00:00
Derek Schuff aadc89c25d [WebAssembly] Insert COPY_LOCAL between CopyToReg and FrameIndex DAG nodes
CopyToReg nodes don't support FrameIndex operands. Other targets select
the FI to some LEA-like instruction, but since we don't have that, we
need to insert some kind of instruction that can take an FI operand and
produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy
copy_local between Op and its FI operand. This results in a redundant
copy which we should optimize away later (maybe in the post-FI-lowering
peephole pass).

Differential Revision: http://reviews.llvm.org/D17213

llvm-svn: 260987
2016-02-16 18:18:36 +00:00
Philip Reames 845435c86a Revert 260705, it appears to be causing pr26628
The root issue appears to be a confusion around what makeNoWrapRegion actually does.   It seems likely we need two versions of this function with slightly different semantics.

llvm-svn: 260981
2016-02-16 17:14:30 +00:00
Andrey Turetskiy eab4e68650 [X86] Enable the LEA optimization pass by default.
Differential Revision: http://reviews.llvm.org/D16877

llvm-svn: 260979
2016-02-16 16:41:38 +00:00
Dan Gohman 442bfcec00 [WebAssembly] Switch from RPO sorting to topological sorting.
WebAssembly doesn't require full RPO; topological sorting is sufficient and
can preserve more of the MachineBlockPlacement ordering. Unfortunately, this
still depends a lot on heuristics, because while we use the
MachineBlockPlacement ordering as a guide, we can't use it in places where
it isn't topologically ordered. This area will require further attention.

llvm-svn: 260978
2016-02-16 16:22:41 +00:00
Dan Gohman 8aa237c3ca [WebAssembly] Create new registers instead of reusing old ones in RegStackify.
This avoids some complications updating LiveIntervals to be aware of the new
register lifetimes, because we can just compute new intervals from scratch
rather than describe how the old ones have been changed.

llvm-svn: 260971
2016-02-16 15:17:21 +00:00
Rafael Espindola 944f655e05 Reapply r260489.
Original commit message:

[readobj] Dump DT_JMPREL relocations when outputting dynamic relocations.

The bits of r260488 it depends on have been committed.

llvm-svn: 260970
2016-02-16 15:16:00 +00:00
Dan Gohman aa7429112e [WebAssembly] Implement support for custom NaN bit patterns.
llvm-svn: 260968
2016-02-16 15:14:23 +00:00
Rafael Espindola c70aedab0e Introduce a getAsRange helper.
This requires making an error message a bit more generic, but that seems
a reasonable tradeoff.

Extracted from r260488 but simplified a bit.

llvm-svn: 260967
2016-02-16 14:50:39 +00:00
Rafael Espindola 6009db696b This reverts commit r260488 and r260489.
Original messages:
    Revert "[readobj] Handle ELF files with no section table or with no program headers."
    Revert "[readobj] Dump DT_JMPREL relocations when outputting dynamic relocations."

r260489 depends on r260488 and among other issues r260488 deleted error
handling code.

llvm-svn: 260962
2016-02-16 14:17:48 +00:00
Andrey Turetskiy 1052ac2311 [X86] PR26575: Fix LEA optimization pass.
Add a missing check for a type of address displacement operand of the load/store instruction being a candidate for LEA substitution.

Ref: https://llvm.org/bugs/show_bug.cgi?id=26575

Differential Revision: http://reviews.llvm.org/D17261

llvm-svn: 260959
2016-02-16 12:47:45 +00:00
Amaury Sechet f447a6b5f4 Make sure the functions' range is empty before going through it in the LLVM C API test
llvm-svn: 260947
2016-02-16 08:37:01 +00:00
Junmo Park 6ebdc14cf1 [SCEVExpander] Make findExistingExpansion smarter
Summary:
Extending findExistingExpansion can use existing value in ExprValueMap.
This patch gives 0.3~0.5% performance improvements on 
benchmarks(test-suite, spec2000, spec2006, commercial benchmark)
   
Reviewers: mzolotukhin, sanjoy, zzheng

Differential Revision: http://reviews.llvm.org/D15559

llvm-svn: 260938
2016-02-16 06:46:58 +00:00
Amaury Sechet 5590967610 Restore the capability to manipulate datalayout from the C API
Summary:
This consist in variosu addition to the C API:

  LLVMTargetDataRef LLVMGetModuleDataLayout(LLVMModuleRef M);
  void LLVMSetModuleDataLayout(LLVMModuleRef M, LLVMTargetDataRef DL);
  LLVMTargetDataRef LLVMCreateTargetMachineData(LLVMTargetMachineRef T);

Reviewers: joker.eph, Wallbraker, echristo

Subscribers: axw

Differential Revision: http://reviews.llvm.org/D17255

llvm-svn: 260936
2016-02-16 05:11:24 +00:00
Zia Ansari 30a02384f7 Implemented stack symbol table ordering/packing optimization to improve data locality and code size from SP/FP offset encoding.
Differential Revision: http://reviews.llvm.org/D15393

llvm-svn: 260917
2016-02-15 23:44:13 +00:00
Simon Pilgrim 7c920e611c [X86][SSE2] Regenerated sse2 tests
llvm-svn: 260900
2016-02-15 17:57:40 +00:00
Krzysztof Parzyszek 04bf43bd83 [Hexagon] Missed testcase update in r260895
llvm-svn: 260897
2016-02-15 16:15:02 +00:00
Scott Egerton d1aeb05654 [mips] Implemented the .hword directive.
Summary:
In order to pass the tests, this required marking R_MIPS_16 relocations
as needing to point to the symbol and not the section.

Reviewers: vkalintiris, dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17200

llvm-svn: 260896
2016-02-15 16:11:51 +00:00
Krzysztof Parzyszek 73f1a40626 [Hexagon] Use zero-extending loads for anyext
llvm-svn: 260895
2016-02-15 16:01:01 +00:00
Silviu Baranga ec7063ac77 [LV] Add support for insertelt/extractelt processing during type truncation
Summary:
While shrinking types according to the required bits, we can
encounter insert/extract element instructions. This will cause us to
reach an llvm_unreachable statement.

This change adds support for truncating insert/extract element
operations, and adds a regression test.

Reviewers: jmolloy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17078

llvm-svn: 260893
2016-02-15 15:38:17 +00:00
Simon Pilgrim 766a659eb5 [X86] More thorough partial-register division checks
For when grep counts are just not enough...

llvm-svn: 260891
2016-02-15 14:09:35 +00:00
Simon Pilgrim a62170834d [X86] Regenerated 64/128 bit multiply tests
llvm-svn: 260890
2016-02-15 14:04:05 +00:00
Simon Pilgrim 9513b3c4c7 [X86][SSE] More thorough testing of all-ones vectors re-materialization
llvm-svn: 260889
2016-02-15 13:50:48 +00:00
Simon Pilgrim 02d3b6a82d [X86][SSE] Regenerated uint2fp special case tests
llvm-svn: 260888
2016-02-15 13:41:41 +00:00
NAKAMURA Takumi eda01dcd57 Make llvm/test/tools/llvm-symbolizer/pdb/pdb.test Py3-compatible.
llvm-svn: 260887
2016-02-15 13:19:13 +00:00
Simon Pilgrim 4e4989a64a [X86][SSE] Regenerated fast isel intrinsics tests
llvm-svn: 260885
2016-02-15 12:32:16 +00:00
Scott Egerton 2c2a2f5119 Reverted r260879 as it caused test failures in lld.
llvm-svn: 260880
2016-02-15 10:04:38 +00:00
Scott Egerton baec95a88c [mips] Removed the SHF_ALLOC flag from the .pdr section.
Summary:
This section is used for debug information and has no need to be
in memory at runtime. With this patch, LLVM now emits the same flags as 
the GNU assembler. This patch also fixes an error when compiling 
the Linux kernel, The error is that there are relocations within the 
.pdr section in a VDSO.

Reviewers: vkalintiris, dsanders

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D17199

llvm-svn: 260879
2016-02-15 09:34:15 +00:00
Igor Breger 4dc7d390db AVX512: Change store size of kmask. Store size of v8i1, v4i1 , v2i1 and i1 are changed to 16 bits.
If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes.

Differential Revision: http://reviews.llvm.org/D17138

llvm-svn: 260878
2016-02-15 08:25:28 +00:00
Simon Pilgrim 834931554b [X86][AVX] Fixed copy+paste typo in shuffle test
llvm-svn: 260852
2016-02-14 18:11:52 +00:00
Chandler Carruth bece8d517d [PM/AA] Wire BasicAA's new pass manager class up to the pass registry.
This ensures that all of the various pieces are working. The next patch
will wire up commandline-driven alias analysis chain building and allow
BasicAA to work with the AAManager.

llvm-svn: 260838
2016-02-13 23:46:24 +00:00
Chandler Carruth 6f5770b10f [PM/AA] Actually wire the AAManager I built for the new pass manager
into the new pass manager and fix the latent bugs there.

This lets everything live together nicely, but it isn't really useful
yet. I never finished wiring the AA layer up for the new pass manager,
and so subsequent patches will change this to do that wiring and get AA
stuff more fully integrated into the new pass manager. Turns out this is
necessary even to get functionattrs ported over. =]

llvm-svn: 260836
2016-02-13 23:32:00 +00:00
Simon Pilgrim 08ba012973 [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.

On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.

This patch has several benefits:

 * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
 * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
 * Matching the repeating shuffle makes use of a lot of existing shuffle lowering.

There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review.

Differential Revision: http://reviews.llvm.org/D16537

llvm-svn: 260834
2016-02-13 21:54:04 +00:00
Sanjay Patel e9bf993cee [x86-64] allow mfence even with -mno-sse (PR23203)
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.

I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)

Differential Revision: http://reviews.llvm.org/D17219

llvm-svn: 260828
2016-02-13 17:26:29 +00:00
Chandler Carruth 632d208c78 [attrs] Move the norecurse deduction to operate on the node set rather
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.

This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.

This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.

llvm-svn: 260813
2016-02-13 08:47:51 +00:00
Matt Arsenault f2ddbf00ed AMDGPU: Prepare for reducing private element size.
Tests for the new scalarize all private access options will be
included with a future commit.

The only functional change is to make the split/scalarize behavior
for private access of > 4 element vectors to be consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.

llvm-svn: 260804
2016-02-13 04:18:53 +00:00
Tom Stellard 4409051d00 AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic
This intrinsic will be used to expose dpp functionality to higher-level
languages. It will map to the dpp version of v_mov_b32.

llvm-svn: 260792
2016-02-13 02:09:49 +00:00
Davide Italiano ff11b90752 [llvm-size] Make error handling uniform.
llvm-svn: 260786
2016-02-13 01:38:16 +00:00
Matt Arsenault ce56a0ef54 AMDGPU: Add intrinsics for sin/cos
These provide direct access to the hardware instruction without
the unit version required like llvm.sin/llvm.cos lowering requires.

llvm-svn: 260782
2016-02-13 01:19:56 +00:00
Matt Arsenault 79963e80b8 AMDGPU: Rename intrinsic to better match instruction name
Also fixes missing f32 test.

llvm-svn: 260780
2016-02-13 01:03:00 +00:00
Pirama Arumuga Nainar 7476bc89e9 Don't combine fp_round (fp_round x) if f80 to f16 is generated
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.

fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2.  This prevents selection of
native f16 conversion instructions from f32 or f64.  Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.

Reviewers: ab

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D17221

llvm-svn: 260769
2016-02-13 00:08:05 +00:00
Tom Stellard bc4497b13c AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Reviewers: arsenm

Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16603

llvm-svn: 260765
2016-02-12 23:45:29 +00:00
Yunzhong Gao 0de36ec169 Disable the vzeroupper insertion pass on PS4.
Differential Revision: http://reviews.llvm.org/D16837

llvm-svn: 260764
2016-02-12 23:37:57 +00:00
Krzysztof Parzyszek 7793ddb043 [Hexagon] Optimize stack slot spills
Replace spills to memory with spills to registers, if possible. This
applies mostly to predicate registers (both scalar and vector), since
they are very limited in number. A spill of a predicate register may
happen even if there is a general-purpose register available. In cases
like this the stack spill/reload may be eliminated completely.

This optimization will consider all stack objects, regardless of where
they came from and try to match the live range of the stack slot with
a dead range of a register from an appropriate register class.

llvm-svn: 260758
2016-02-12 22:53:35 +00:00
David Majnemer df3857c7d4 [llvm-pdbdump] Start to decode some streams
We can decode a little bit of the first stream now.

llvm-svn: 260754
2016-02-12 22:27:44 +00:00