llvm-project

Commit Graph

Author	SHA1	Message	Date
Ahmed Bougacha	a87c3480b5	[X86] Extract PSIGN/BLENDVP tests into vector-blend.ll. NFC. We're going to stop generating PSIGN, so calling a test "psign" isn't ideal. Instead, call these tests what they really are: variable blends using logic. Also add a test to exhibit a case we're currently missing in the PSIGN combine. llvm-svn: 261022	2016-02-16 22:13:59 +00:00
Derek Schuff	f8f8f093aa	[WebAssembly] Don't move calls or stores past intervening loads The register stackifier currently checks for intervening stores (and loads that may alias them) but doesn't account for the fact that the instruction being moved may affect intervening loads. Differential Revision: http://reviews.llvm.org/D17298 llvm-svn: 261014	2016-02-16 21:44:19 +00:00
Adam Nemet	106fedab6f	[LTO] Support Statistics Summary: I thought -Xlinker -mllvm -Xlinker -stats worked at some point but maybe it never did. For clang, I believe that stats are printed from cc1_main. This patch also prints them for LTO, specifically right after codegen happens. I only looked at the C API for LTO briefly to see if this is a good place. Probably there are still cases where this wouldn't be printed but it seems to be working for the common case. I also experimented putting this in the LTOCodeGenerator destructor but that didn't trigger for me because ld64 does not destroy the LTOCodeGenerator. Reviewers: dexonsmith, joker.eph Subscribers: rafael, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D17302 llvm-svn: 261013	2016-02-16 21:41:51 +00:00
Reid Kleckner	6e0d5f573c	[codeview] Fix assertion on non-memory, non-register DBG_VALUE instructions Eventually we should find a way to describe constant variables, but it is not obvious how to do this at the moment. llvm-svn: 261010	2016-02-16 21:14:51 +00:00
Colin LeMahieu	ecef1d9cbc	[Hexagon] Adding relocation for code size, cold path optimization allowing a 23-bit 4-byte aligned relocation to be a valid instruction encoding. The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes. Another way is to put a .word32 and mix code and data within a function. The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware. This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits. Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register. llvm-svn: 261006	2016-02-16 20:38:17 +00:00
Jun Bum Lim	b389d9b9af	[AArch64] Add pass to remove redundant copy after RA Summary: This change will add a pass to remove unnecessary zero copies in target blocks of cbz/cbnz instructions. E.g., the copy instruction in the code below can be removed because the cbz jumps to BB1 when x0 is zero : BB0: cbz x0, .BB1 BB1: mov x0, xzr Jun Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin Differential Revision: http://reviews.llvm.org/D16203 llvm-svn: 261004	2016-02-16 20:02:39 +00:00
Derek Schuff	aadc89c25d	[WebAssembly] Insert COPY_LOCAL between CopyToReg and FrameIndex DAG nodes CopyToReg nodes don't support FrameIndex operands. Other targets select the FI to some LEA-like instruction, but since we don't have that, we need to insert some kind of instruction that can take an FI operand and produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy copy_local between Op and its FI operand. This results in a redundant copy which we should optimize away later (maybe in the post-FI-lowering peephole pass). Differential Revision: http://reviews.llvm.org/D17213 llvm-svn: 260987	2016-02-16 18:18:36 +00:00
Philip Reames	845435c86a	Revert 260705, it appears to be causing pr26628 The root issue appears to be a confusion around what makeNoWrapRegion actually does. It seems likely we need two versions of this function with slightly different semantics. llvm-svn: 260981	2016-02-16 17:14:30 +00:00
Andrey Turetskiy	eab4e68650	[X86] Enable the LEA optimization pass by default. Differential Revision: http://reviews.llvm.org/D16877 llvm-svn: 260979	2016-02-16 16:41:38 +00:00
Dan Gohman	442bfcec00	[WebAssembly] Switch from RPO sorting to topological sorting. WebAssembly doesn't require full RPO; topological sorting is sufficient and can preserve more of the MachineBlockPlacement ordering. Unfortunately, this still depends a lot on heuristics, because while we use the MachineBlockPlacement ordering as a guide, we can't use it in places where it isn't topologically ordered. This area will require further attention. llvm-svn: 260978	2016-02-16 16:22:41 +00:00
Dan Gohman	8aa237c3ca	[WebAssembly] Create new registers instead of reusing old ones in RegStackify. This avoids some complications updating LiveIntervals to be aware of the new register lifetimes, because we can just compute new intervals from scratch rather than describe how the old ones have been changed. llvm-svn: 260971	2016-02-16 15:17:21 +00:00
Rafael Espindola	944f655e05	Reapply r260489. Original commit message: [readobj] Dump DT_JMPREL relocations when outputting dynamic relocations. The bits of r260488 it depends on have been committed. llvm-svn: 260970	2016-02-16 15:16:00 +00:00
Dan Gohman	aa7429112e	[WebAssembly] Implement support for custom NaN bit patterns. llvm-svn: 260968	2016-02-16 15:14:23 +00:00
Rafael Espindola	c70aedab0e	Introduce a getAsRange helper. This requires making an error message a bit more generic, but that seems a reasonable tradeoff. Extracted from r260488 but simplified a bit. llvm-svn: 260967	2016-02-16 14:50:39 +00:00
Rafael Espindola	6009db696b	This reverts commit r260488 and r260489. Original messages: Revert "[readobj] Handle ELF files with no section table or with no program headers." Revert "[readobj] Dump DT_JMPREL relocations when outputting dynamic relocations." r260489 depends on r260488 and among other issues r260488 deleted error handling code. llvm-svn: 260962	2016-02-16 14:17:48 +00:00
Andrey Turetskiy	1052ac2311	[X86] PR26575: Fix LEA optimization pass. Add a missing check for a type of address displacement operand of the load/store instruction being a candidate for LEA substitution. Ref: https://llvm.org/bugs/show_bug.cgi?id=26575 Differential Revision: http://reviews.llvm.org/D17261 llvm-svn: 260959	2016-02-16 12:47:45 +00:00
Amaury Sechet	f447a6b5f4	Make sure the functions' range is empty before going through it in the LLVM C API test llvm-svn: 260947	2016-02-16 08:37:01 +00:00
Junmo Park	6ebdc14cf1	[SCEVExpander] Make findExistingExpansion smarter Summary: Extending findExistingExpansion can use existing value in ExprValueMap. This patch gives 0.3~0.5% performance improvements on benchmarks(test-suite, spec2000, spec2006, commercial benchmark) Reviewers: mzolotukhin, sanjoy, zzheng Differential Revision: http://reviews.llvm.org/D15559 llvm-svn: 260938	2016-02-16 06:46:58 +00:00
Amaury Sechet	5590967610	Restore the capability to manipulate datalayout from the C API Summary: This consists of various additions to the C API: LLVMTargetDataRef LLVMGetModuleDataLayout(LLVMModuleRef M); void LLVMSetModuleDataLayout(LLVMModuleRef M, LLVMTargetDataRef DL); LLVMTargetDataRef LLVMCreateTargetMachineData(LLVMTargetMachineRef T); Reviewers: joker.eph, Wallbraker, echristo Subscribers: axw Differential Revision: http://reviews.llvm.org/D17255 llvm-svn: 260936	2016-02-16 05:11:24 +00:00
Zia Ansari	30a02384f7	Implemented stack symbol table ordering/packing optimization to improve data locality and code size from SP/FP offset encoding. Differential Revision: http://reviews.llvm.org/D15393 llvm-svn: 260917	2016-02-15 23:44:13 +00:00
Simon Pilgrim	7c920e611c	[X86][SSE2] Regenerated sse2 tests llvm-svn: 260900	2016-02-15 17:57:40 +00:00
Krzysztof Parzyszek	04bf43bd83	[Hexagon] Missed testcase update in r260895 llvm-svn: 260897	2016-02-15 16:15:02 +00:00
Scott Egerton	d1aeb05654	[mips] Implemented the .hword directive. Summary: In order to pass the tests, this required marking R_MIPS_16 relocations as needing to point to the symbol and not the section. Reviewers: vkalintiris, dsanders Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17200 llvm-svn: 260896	2016-02-15 16:11:51 +00:00
Krzysztof Parzyszek	73f1a40626	[Hexagon] Use zero-extending loads for anyext llvm-svn: 260895	2016-02-15 16:01:01 +00:00
Silviu Baranga	ec7063ac77	[LV] Add support for insertelt/extractelt processing during type truncation Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 llvm-svn: 260893	2016-02-15 15:38:17 +00:00
Simon Pilgrim	766a659eb5	[X86] More thorough partial-register division checks For when grep counts are just not enough... llvm-svn: 260891	2016-02-15 14:09:35 +00:00
Simon Pilgrim	a62170834d	[X86] Regenerated 64/128 bit multiply tests llvm-svn: 260890	2016-02-15 14:04:05 +00:00
Simon Pilgrim	9513b3c4c7	[X86][SSE] More thorough testing of all-ones vectors re-materialization llvm-svn: 260889	2016-02-15 13:50:48 +00:00
Simon Pilgrim	02d3b6a82d	[X86][SSE] Regenerated uint2fp special case tests llvm-svn: 260888	2016-02-15 13:41:41 +00:00
NAKAMURA Takumi	eda01dcd57	Make llvm/test/tools/llvm-symbolizer/pdb/pdb.test Py3-compatible. llvm-svn: 260887	2016-02-15 13:19:13 +00:00
Simon Pilgrim	4e4989a64a	[X86][SSE] Regenerated fast isel intrinsics tests llvm-svn: 260885	2016-02-15 12:32:16 +00:00
Scott Egerton	2c2a2f5119	Reverted r260879 as it caused test failures in lld. llvm-svn: 260880	2016-02-15 10:04:38 +00:00
Scott Egerton	baec95a88c	[mips] Removed the SHF_ALLOC flag from the .pdr section. Summary: This section is used for debug information and has no need to be in memory at runtime. With this patch, LLVM now emits the same flags as the GNU assembler. This patch also fixes an error when compiling the Linux kernel, where there are relocations within the .pdr section in a VDSO. Reviewers: vkalintiris, dsanders Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D17199 llvm-svn: 260879	2016-02-15 09:34:15 +00:00
Igor Breger	4dc7d390db	AVX512: Change store size of kmask. The store size of v8i1, v4i1, v2i1 and i1 is changed to 16 bits. If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878	2016-02-15 08:25:28 +00:00
Simon Pilgrim	834931554b	[X86][AVX] Fixed copy+paste typo in shuffle test llvm-svn: 260852	2016-02-14 18:11:52 +00:00
Chandler Carruth	bece8d517d	[PM/AA] Wire BasicAA's new pass manager class up to the pass registry. This ensures that all of the various pieces are working. The next patch will wire up commandline-driven alias analysis chain building and allow BasicAA to work with the AAManager. llvm-svn: 260838	2016-02-13 23:46:24 +00:00
Chandler Carruth	6f5770b10f	[PM/AA] Actually wire the AAManager I built for the new pass manager into the new pass manager and fix the latent bugs there. This lets everything live together nicely, but it isn't really useful yet. I never finished wiring the AA layer up for the new pass manager, and so subsequent patches will change this to do that wiring and get AA stuff more fully integrated into the new pass manager. Turns out this is necessary even to get functionattrs ported over. =] llvm-svn: 260836	2016-02-13 23:32:00 +00:00
Simon Pilgrim	08ba012973	[X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834	2016-02-13 21:54:04 +00:00
Sanjay Patel	e9bf993cee	[x86-64] allow mfence even with -mno-sse (PR23203) As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828	2016-02-13 17:26:29 +00:00
Chandler Carruth	632d208c78	[attrs] Move the norecurse deduction to operate on the node set rather than the SCC object, and have it scan the instruction stream directly rather than relying on call records. This makes the behavior of this routine consistent between libc routines and LLVM intrinsics for libc routines. We can go and start teaching it about those being norecurse, but we should behave the same for the intrinsic and the libc routine rather than differently. I chatted with James Molloy and the inconsistency doesn't seem intentional and likely is due to intrinsic calls not being modelled in the call graph analyses. This also fixes a bug where we would deduce norecurse on optnone functions, when generally we try to handle optnone functions as-if they were replaceable and thus unanalyzable. llvm-svn: 260813	2016-02-13 08:47:51 +00:00
Matt Arsenault	f2ddbf00ed	AMDGPU: Prepare for reducing private element size. Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804	2016-02-13 04:18:53 +00:00
Tom Stellard	4409051d00	AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792	2016-02-13 02:09:49 +00:00
Davide Italiano	ff11b90752	[llvm-size] Make error handling uniform. llvm-svn: 260786	2016-02-13 01:38:16 +00:00
Matt Arsenault	ce56a0ef54	AMDGPU: Add intrinsics for sin/cos These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires. llvm-svn: 260782	2016-02-13 01:19:56 +00:00
Matt Arsenault	79963e80b8	AMDGPU: Rename intrinsic to better match instruction name Also fixes missing f32 test. llvm-svn: 260780	2016-02-13 01:03:00 +00:00
Pirama Arumuga Nainar	7476bc89e9	Don't combine fp_round (fp_round x) if f80 to f16 is generated Summary: This patch skips DAG combine of fp_round (fp_round x) if it results in an fp_round from f80 to f16. fp_round from f80 to f16 always generates an expensive (and as yet, unimplemented) libcall to __truncxfhf2. This prevents selection of native f16 conversion instructions from f32 or f64. Moreover, the first (value-preserving) fp_round from f80 to either f32 or f64 may become a NOP in platforms like x86. Reviewers: ab Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D17221 llvm-svn: 260769	2016-02-13 00:08:05 +00:00
Tom Stellard	bc4497b13c	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765	2016-02-12 23:45:29 +00:00
Yunzhong Gao	0de36ec169	Disable the vzeroupper insertion pass on PS4. Differential Revision: http://reviews.llvm.org/D16837 llvm-svn: 260764	2016-02-12 23:37:57 +00:00
Krzysztof Parzyszek	7793ddb043	[Hexagon] Optimize stack slot spills Replace spills to memory with spills to registers, if possible. This applies mostly to predicate registers (both scalar and vector), since they are very limited in number. A spill of a predicate register may happen even if there is a general-purpose register available. In cases like this the stack spill/reload may be eliminated completely. This optimization will consider all stack objects, regardless of where they came from and try to match the live range of the stack slot with a dead range of a register from an appropriate register class. llvm-svn: 260758	2016-02-12 22:53:35 +00:00
David Majnemer	df3857c7d4	[llvm-pdbdump] Start to decode some streams We can decode a little bit of the first stream now. llvm-svn: 260754	2016-02-12 22:27:44 +00:00
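
Note on 442bfcec00 ([WebAssembly] Switch from RPO sorting to topological sorting): the message describes using the MachineBlockPlacement ordering as a guide while still producing a topological order. The Python sketch below illustrates that general idea only. It is not the LLVM implementation; the function name `guided_topological_order`, the block names, and the assumption of a loop-free graph are all hypothetical choices made for the illustration.

```python
import heapq


def guided_topological_order(successors, preferred):
    """Topologically sort an acyclic graph, following `preferred` where edges allow.

    successors: dict mapping each block to a list of its successor blocks.
    preferred:  list of all blocks in the layout order we would like to keep.
    """
    # Rank of each block in the preferred layout; lower rank is emitted first.
    rank = {block: i for i, block in enumerate(preferred)}

    # Count the predecessors of each block that have not been emitted yet.
    pending = {block: 0 for block in preferred}
    for block in preferred:
        for succ in successors.get(block, ()):
            pending[succ] += 1

    # Blocks with no unemitted predecessors are ready; keep them in a heap
    # keyed by preferred rank so the guide ordering breaks ties.
    ready = [(rank[b], b) for b in preferred if pending[b] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, block = heapq.heappop(ready)
        order.append(block)
        for succ in successors.get(block, ()):
            pending[succ] -= 1
            if pending[succ] == 0:
                heapq.heappush(ready, (rank[succ], succ))

    if len(order) != len(preferred):
        # This sketch assumes no loops; a real CFG needs extra handling.
        raise ValueError("graph has a cycle")
    return order


# A diamond-shaped control flow graph (hypothetical block names).
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

# The guide places "exit" before "then", which is not topologically valid,
# so only that part of the guide ordering is overridden.
layout = guided_topological_order(cfg, ["entry", "else", "exit", "then"])
```

When the guide ordering is already topological, the sketch returns it unchanged, which matches the stated motivation of preserving as much of the placement ordering as possible.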


34511 Commits