llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Zolotukhin	7d6293a0d3	[InstCombine] Add optimization of redundant insertvalue instructions. rdar://problem/11861387 llvm-svn: 208214	2014-05-07 14:30:18 +00:00
Evgeniy Stepanov	c14fc42137	[msan] Fix -fsanitize=memory -fno-integrated-as. llvm-svn: 208211	2014-05-07 14:10:51 +00:00
Tim Northover	88a51d983e	AArch64/ARM64: optimise vector selects & enable test When performing a scalar comparison that feeds into a vector select, it's actually better to do the comparison on the vector side: the scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP" since the vector comparisons are all mask based. llvm-svn: 208210	2014-05-07 14:10:27 +00:00
James Molloy	d3c401a2d0	[ARM64-BE] Fix fast-isel, and add appropriate RUN lines to appropriate tests. llvm-svn: 208200	2014-05-07 12:33:55 +00:00
James Molloy	36132057da	[ARM64-BE] Fix variable-argument saving. llvm-svn: 208199	2014-05-07 12:33:48 +00:00
James Molloy	4049e4fd77	[ARM64-BE] Implement the lane-twiddling logic at AAPCS boundaries for big endian. The AAPCS states that values passed in registers must have a value as though they had been loaded with "LDR". LDR is equivalent to "LD1.64 vX.1D" - that is, loading scalars to vector registers and loading 1-element vectors is equivalent. The logic implemented here is to ensure that at all call boundaries and during formal argument lowering all vectors are treated as their bitwidth-based floating point scalar counterpart, which is always one of f64 or f128 (v2i32 -> f64, v4i32 -> f128 etc). A BITCAST is inserted so that the appropriate REV will be generated during code generation. llvm-svn: 208198	2014-05-07 12:33:41 +00:00
James Molloy	30e0e11eb4	[ARM64-BE] Implement the crazy bitcast handling for big endian vectors. Because we've canonicalised on using LD1/ST1, every time we do a bitcast between vector types we must do an equivalent lane reversal. Consider a simple memory load followed by a bitconvert then a store. v0 = load v2i32 v1 = BITCAST v2i32 v0 to v4i16 store v4i16 v2 In big endian mode every memory access has an implicit byte swap. LDR and STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that is, they treat the vector as a sequence of elements to be byte-swapped. The two pairs of instructions are fundamentally incompatible. We've decided to use LD1/ST1 only to simplify compiler implementation. LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV. This makes the original code sequence: v0 = load v2i32 v1 = REV v2i32 (implicit) v2 = BITCAST v2i32 v1 to v4i16 v3 = REV v4i16 v2 (implicit) store v4i16 v3 But this is now broken - the value stored is different to the value loaded due to lane reordering. To fix this, on every BITCAST we must perform two other REVs: v0 = load v2i32 v1 = REV v2i32 (implicit) v2 = REV v2i32 v3 = BITCAST v2i32 v2 to v4i16 v4 = REV v4i16 v5 = REV v4i16 v4 (implicit) store v4i16 v5 This means an extra two instructions, but actually in most cases the two REV instructions can be combined into one. For example: (REV64_2s (REV64_4h X)) === (REV32_4h X) There is also no 128-bit REV instruction. This must be synthesized with an EXT instruction. Most bitconverts require some sort of conversion. The only exceptions are: a) Identity conversions - vNfX <-> vNiX b) Single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX Even though there are hundreds of changed lines, I have a fairly high confidence that they are somewhat correct. The changes to add two REV instructions per bitcast were pretty mechanical, and once I'd done that I threw the resulting .td at a script I wrote which combined the two REVs together (and added an EXT instruction, for f128) based on an instruction description I gave it. This was much less prone to error than doing it all manually, plus my brain would not just have melted but would have vapourised. llvm-svn: 208194	2014-05-07 11:28:53 +00:00
James Molloy	ccc7f982c1	[ARM64-BE] Make big endian (scalar) argument passing work correctly. This completes the port of r204814 (cpirker "AArch64_BE function argument passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues found during later development along the way. The biggest of these was that the alignment fixup logic wasn't replicated into all the places it should have been. llvm-svn: 208192	2014-05-07 11:28:36 +00:00
Tim Northover	df723343fa	AArch64/ARM64: run test on ARM64 too. llvm-svn: 208188	2014-05-07 10:47:04 +00:00
Tim Northover	76a94e6ead	AArch64/ARM64: put annotation in test It makes finding already covered tests much easier with "grep -L arm64". llvm-svn: 208187	2014-05-07 10:47:00 +00:00
Tim Northover	2d7cacd86b	AArch64/ARM64: disable test directory if ARM64 not present llvm-svn: 208186	2014-05-07 10:42:06 +00:00
Daniel Sanders	314e80e5f8	[tablegen] Add !listconcat operator with semantics similar to !strconcat Summary: It concatenates two or more lists. In addition to the !strconcat semantics the lists must have the same element type. My overall aim is to make it easy to append to Instruction.Predicates rather than override it. This can be done by concatenating lists passed as arguments, or by concatenating lists passed in additional fields. Reviewers: dsanders Reviewed By: dsanders Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D3506 llvm-svn: 208183	2014-05-07 10:13:19 +00:00
Evgeniy Stepanov	3819f02819	[asan] Add a flag to control asm instrumentation. With this change, asm instrumentation is disabled by default. llvm-svn: 208167	2014-05-07 07:54:11 +00:00
Joerg Sonnenberger	cf86ce136c	Allow using normal .eh_frame based unwinding on ARM. Use the same encodings as x86. Use this exception model for NetBSD. llvm-svn: 208166	2014-05-07 07:49:34 +00:00
Saleem Abdulrasool	acd0338c61	ARM: fix WoA PEI instruction selection The ARM::BLX instruction is an ARM mode instruction. The Windows on ARM target is limited to Thumb instructions. Correctly use the thumb mode tBLXr instruction. This would manifest as an errant write into the object file as the instruction is 4-bytes in length rather than 2. The result would be a corrupted object file that would eventually result in an executable that would crash at runtime. llvm-svn: 208152	2014-05-07 03:03:27 +00:00
Justin Bogner	cf27e1b996	llvm-cov: Handle missing source files as GCOV does If the source files referenced by a gcno file are missing, gcov outputs a coverage file where every line is simply /EOF/. This also occurs for lines in the coverage that are past the end of a file that is found. This change mimics gcov. llvm-svn: 208149	2014-05-07 02:11:23 +00:00
Justin Bogner	1a18d7caa3	llvm-cov: Implement --no-output In gcov, there's a -n/--no-output option, which disables the writing of any .gcov files, so that it emits only the summary info on stdout. This implements the same behaviour in llvm-cov. llvm-svn: 208148	2014-05-07 02:11:18 +00:00
Joerg Sonnenberger	818e725158	If a function needs a frame pointer, but r11 (aka fp) has not been used, remove it from the list of unspilled registers. Otherwise the following attempt to keep the stack aligned by picking an extra GPR register to spill will not work as it picks up r11. llvm-svn: 208129	2014-05-06 20:43:01 +00:00
Diego Novillo	dd49157db1	Do not make -pass-remarks additive. Summary: When I initially introduced -pass-remarks, I thought it would be a neat idea to make it additive. So, if one used it as: $ llc -pass-remarks=inliner --pass-remarks=loop.* the compiler would build the regular expression '(inliner)\|(loop.*)'. The more I think about it, the more I regret it. This is not how other flags work. The standard semantics are right-to-left overrides. This is how clang interprets -Rpass. And I think the two should be compatible in this respect. Reviewers: qcolombet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3614 llvm-svn: 208122	2014-05-06 19:14:00 +00:00
Benjamin Kramer	1625bfccbe	TTI: Estimate @llvm.fmuladd cost as fmul + fadd when FMA's aren't legal on the target. llvm-svn: 208115	2014-05-06 18:36:23 +00:00
Andrea Di Biagio	c14ccc9184	[X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa). Before this patch, the backend always emitted a store+load sequence to bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting i64 node was then used to build a v2i32 vector. With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free" bitcast to type MVT::v4i32. The elements of the resulting v4i32 are then extracted to build a v2i32 vector (which is illegal and therefore promoted to MVT::v2i64). This is in general cheaper than emitting a stack store+load sequence to bitconvert the operand from type f64 to type i64. llvm-svn: 208107	2014-05-06 17:09:03 +00:00
Renato Golin	c7aea40ec6	Implementing named register intrinsics This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104	2014-05-06 16:51:25 +00:00
Rafael Espindola	52dc5d828f	Special case aliases in GlobalValue::getAlignment. An alias has the address of what it points to, so it also has the same alignment. This allows a few optimizations to see past aliases for free. llvm-svn: 208103	2014-05-06 16:48:58 +00:00
Tim Northover	618850b6a5	AArch64/ARM64: implement diagnosis of unpredictable loads & stores llvm-svn: 208091	2014-05-06 14:15:14 +00:00
Tim Northover	2ac82426f8	AArch64/ARM64: add two more MC tests to ARM64 set. llvm-svn: 208085	2014-05-06 12:50:58 +00:00
Tim Northover	d450746dc9	AArch64/ARM64: enable MC-level diagnostic tests for NEON insts. Obviously we can't expect the two backends to produce identical diagnostics, since what's possible depends quite a bit on how the .td files are structured. I think the ARM64 diagnostics are basically of the same quality in all the changed cases, so I've split the CHECK lines. llvm-svn: 208084	2014-05-06 12:50:55 +00:00
Tim Northover	15641cd4e1	AArch64/ARM64: make NEON vector list parsing a bit more robust It doesn't change the results, but it seems silly not to diagnose obvious problems early on. llvm-svn: 208083	2014-05-06 12:50:51 +00:00
Tim Northover	0f54f309bb	AArch64/ARM64: produce more informative diagnostic assembling some immediates No tests here, they'll be added when the entire neon-diagnostics.s test from AArch64 is enabled. llvm-svn: 208079	2014-05-06 11:18:53 +00:00
Christian Pirker	fdce7cea93	ARM: For thumb fixups store halfwords high first and low second llvm-svn: 208076	2014-05-06 10:05:11 +00:00
Kevin Qin	1353c3405d	[ARM64] Enable alignment control option in front-end for ARM64. This is the modification in llvm part. llvm-svn: 208074	2014-05-06 09:48:52 +00:00
Reid Kleckner	4a406d32e9	Fix i128 div/mod on mingw64 The Win64 docs are very clear that anything larger than 8 bytes is passed by reference, and GCC MinGW64 honors that for __modti3 and friends. Patch by Jameson Nash! llvm-svn: 208029	2014-05-06 01:20:42 +00:00
Nick Lewycky	5ef6bc8815	Improve 'tail' call marking in TRE. A bootstrap of clang goes from 375k calls marked tail in the IR to 470k, however this improvement does not carry into an improvement of the call/jmp ratio on x86. The most common pattern is a tail call + br to a block with nothing but a 'ret'. The number of tail call to loop conversions remains the same (1618 by my count). The new algorithm does a local scan over the use-def chains to identify local "alloca-derived" values, as well as points where the alloca could escape. Then, a visit over the CFG marks blocks as being before or after the allocas have escaped, and annotates the calls accordingly. llvm-svn: 208017	2014-05-05 23:59:03 +00:00
Tom Stellard	45b3dcd35b	R600: Expand i64 ISD:SUB llvm-svn: 208005	2014-05-05 21:47:15 +00:00
Filipe Cabecinhas	fe59062b75	Revert "Optimize shufflevector that copies an i64/f64 and zeros the rest." This reverts commit 207992. I misread the phab number on the LGTM. llvm-svn: 207993	2014-05-05 19:40:36 +00:00
Filipe Cabecinhas	263d98c19f	Optimize shufflevector that copies an i64/f64 and zeros the rest. Summary: Also ran clang-format on the function. The code added is the last else if block. Reviewers: nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3518 llvm-svn: 207992	2014-05-05 19:36:28 +00:00
Michael Zolotukhin	e37f33c466	Move test from r207969 to another folder and rename it. llvm-svn: 207984	2014-05-05 18:10:15 +00:00
Yi Jiang	a4821fc9fb	Always set alignment of vectorized LD/ST in SLP-Vectorizer. <rdar://problem/16812145> llvm-svn: 207983	2014-05-05 17:59:14 +00:00
Joerg Sonnenberger	302be7e891	Fix spelling. llvm-svn: 207982	2014-05-05 17:58:46 +00:00
Duncan P. N. Exon Smith	1789fb6493	LTO: -internalize sets visibility to default Visibility is meaningless when the linkage is local. Change `-internalize` to reset the visibility to `default`. <rdar://problem/16141113> llvm-svn: 207979	2014-05-05 17:40:44 +00:00
Rafael Espindola	595f54205c	Remove the -disable-cfi option. This also adds a release note about it. If this stays I will clean up MC next week. llvm-svn: 207977	2014-05-05 17:33:26 +00:00
Adam Nemet	47c4e4e46d	[Test] Remove substitution for clang clang should not be used in the llvm tests. The topic was discussed in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html llvm-svn: 207976	2014-05-05 17:17:27 +00:00
Rafael Espindola	82ad91915e	Modify test to not use -disable-cfi. llvm-svn: 207974	2014-05-05 16:47:07 +00:00
Rafael Espindola	665bd05095	Move test to the ARM64 directory. llvm-svn: 207972	2014-05-05 16:14:37 +00:00
Rafael Espindola	f463b63448	Convert a CodeGen test into a MC test. llvm-svn: 207971	2014-05-05 15:34:13 +00:00
Michael Zolotukhin	4e030e8fb4	Fix test from r207966 and add a comment there. llvm-svn: 207969	2014-05-05 14:46:53 +00:00
Michael Zolotukhin	0c380a30d8	Add regression test for r207692. llvm-svn: 207966	2014-05-05 14:05:25 +00:00
Saleem Abdulrasool	e8a7afef86	CodeGen: correct memset emittance for WoA Windows on ARM does not conform to AEABI. However, memset would be emitted using the AEABI signature, resulting in inverted parameters. Handle this special case appropriately. llvm-svn: 207943	2014-05-04 23:13:21 +00:00
Saleem Abdulrasool	9c4716e4b6	CodeGen: strengthen WoA AEABI avoidance tests Add additional test cases for WoA AEABI avoidance checking. llvm-svn: 207942	2014-05-04 23:13:18 +00:00
Saleem Abdulrasool	729c7a08fb	MC: support FK_SecRel_4 for Windows on ARM Add handling for FK_SecRel_4 (4-byte section relative relocations). These are used by the generation of DWARF debug information (the abbreviations use section relative relocations). This will also be used in generation of CodeView line tables. llvm-svn: 207941	2014-05-04 23:13:15 +00:00
Benjamin Kramer	9130cb8547	LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling. Otherwise we use the same threshold as for complete unrolling, which is way too high. This made us unroll any loop smaller than 150 instructions by 8 times, but only if someone specified -march=core2 or better, which happens to be the default on darwin. llvm-svn: 207940	2014-05-04 19:12:38 +00:00
Arnold Schwaighofer	cd566c423a	SLPVectorizer: Bring back the insertelement patch (r205965) with fixes We can't assume a vectorized tree is rooted in an instruction. The IRBuilder could have constant folded it. When we rebuild the build_vector (the series of InsertElement instructions) use the last original InsertElement instruction. The vectorized tree root is guaranteed to be before it. Also, we can't assume that the n-th InsertElement inserts the n-th element into a vector. This reverts r207746 which reverted the revert of the revert of r205018 or so. Fixes the test case in PR19621. llvm-svn: 207939	2014-05-04 17:10:15 +00:00
Elena Demikhovsky	e73333a50f	AVX-512: minor change in rndscale intrinsic llvm-svn: 207937	2014-05-04 13:35:37 +00:00
Saleem Abdulrasool	82b69fa105	X86: repair export compatibility with MinGW/cygwin Both MinGW and cygwin (i686) construct export directives without the global leader prefix. This is mostly due to the fact that they use GNU ld which does not correctly handle the export directive. This apparently has been broken for a while. However, this was recently reported as being broken by mingwandroid and diorcety of the msys2 project. Remove the global leader prefix if targeting MinGW or cygwin, otherwise, retain the global leader prefix. Add an explicit test for cygwin's behaviour of export directives. llvm-svn: 207926	2014-05-04 00:03:48 +00:00
Rafael Espindola	3d082fa507	Fix pr19645. The fix itself is fairly simple: move getAccessVariant to MCValue so that we replace the old weak expression evaluation with the far more general EvaluateAsRelocatable. This then requires that EvaluateAsRelocatable stop when it finds a non-trivial reference kind. And that in turn requires the ELF writer to look harder for weak references. Last but not least, this found a case where we were being bug-for-bug compatible with gas and accepting an invalid input. I reported pr19647 to track it. llvm-svn: 207920	2014-05-03 19:57:04 +00:00
Joey Gouly	b0afd1b929	[ARM64] Correctly select ANDWri in FastISel. http://reviews.llvm.org/D3598 llvm-svn: 207917	2014-05-03 17:27:06 +00:00
Karthik Bhat	ddd0cb5ecf	Vectorize intrinsic math function calls in SLPVectorizer. This patch adds support to recognize and vectorize intrinsic math functions in SLPVectorizer. Review: http://reviews.llvm.org/D3560 and http://reviews.llvm.org/D3559 llvm-svn: 207901	2014-05-03 09:59:54 +00:00
Adam Nemet	6a56c37b95	[LSR] Add llc testcase for r207271/r207569. See PR19608 for the details but to summarize it was easy to modify the .ll file to get the desired def-use ordering. llvm-svn: 207887	2014-05-02 23:49:01 +00:00
Chandler Carruth	271635da0d	[sanitizers] Propagate the sanitizer options through to the lit context. This makes it really easy to debug leaks FYI: ASAN_OPTIONS=detect_leaks=1 ./bin/llvm-lit -v <path to test> llvm-svn: 207874	2014-05-02 21:47:35 +00:00
Justin Bogner	c475e1bc77	llvm-cov: Fix handling of line zero appearing in a line table Reading line tables in llvm-cov was pretty broken, but would happen to work as long as no line in the table was 0. It's not clear to me whether a line of zero should show up in these tables, but deciding to read a string in the middle of the line table is certainly the wrong thing to do if it does. I've also added some comments, as trying to figure out what this block of code was doing was fairly unpleasant. llvm-svn: 207866	2014-05-02 20:01:24 +00:00
Daniel Sanders	6ef0a2f1be	[tablegen] !strconcat accepts more than two arguments but this wasn't documented or tested. Summary: * Updated the documentation * Added a test for >2 arguments * Added a check for the lexical concatenation * Made the existing test a bit stricter. Reviewers: t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D3485 llvm-svn: 207865	2014-05-02 19:25:52 +00:00
Nico Weber	4b2acde21a	Teach GlobalDCE how to remove empty global_ctor entries. This moves most of GlobalOpt's constructor optimization code out of GlobalOpt into Transforms/Utils/CDtorUtils.{h,cpp}. The public interface is a single function OptimizeGlobalCtorsList() that takes a predicate returning which constructors to remove. GlobalOpt calls this with a function that statically evaluates all constructors, just like it did before. This part of the change is behavior-preserving. Also add a call to this from GlobalDCE with a filter that removes global constructors that contain a "ret" instruction and nothing else – this fixes PR19590. llvm-svn: 207856	2014-05-02 18:35:25 +00:00
Akira Hatanaka	f76388dd7e	[GVN] Pass the phi-translated address of a load instead of the untranslated address to AnalyzeLoadFromClobberingLoad. This fixes a bug in load-PRE where PRE is applied to a load that is not partially redundant. <rdar://problem/16638765>. llvm-svn: 207853	2014-05-02 17:59:17 +00:00
Saleem Abdulrasool	734bca04ff	MC: place .file records into the correct section .file records are supposed to have a section identifier of 65534 (IMAGE_SCN_DEBUG) rather than 0. This is spelt out clearly within the PE/COFF specification. Fix this minor oversight with the implementation for support for .file records. llvm-svn: 207851	2014-05-02 17:45:24 +00:00
Tim Northover	820e041a3c	DAGCombine: prevent formation of illegal ConstantFP nodes. llvm-svn: 207850	2014-05-02 17:25:02 +00:00
Tom Stellard	3dbf1f8df0	R600: Expand vector sin and cos. v2: move code to AMDGPUISelLowering.cpp squash with tests (both EG and SI) Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 207845	2014-05-02 15:41:47 +00:00
Tom Stellard	605e116e8e	R600: Expand TruncStore i64 -> {i16,i8} llvm-svn: 207844	2014-05-02 15:41:46 +00:00
Tim Northover	d7360900a8	AArch64/ARM64: add patterns for post-indexed ST1 ops. llvm-svn: 207840	2014-05-02 14:54:27 +00:00
Tim Northover	d0b07e133b	AArch64/ARM64: support indexed loads/stores on vector types. While post-indexed LD1/ST1 instructions do exist for vector loads, this patch makes use of the more flexible addressing-modes in LDR/STR instructions. llvm-svn: 207838	2014-05-02 14:54:15 +00:00
Benjamin Kramer	42d262f410	Allow SelectionDAG::FoldConstantArithmetic to work when it's called with a vector VT but scalar values. llvm-svn: 207835	2014-05-02 12:35:22 +00:00
Nick Lewycky	718ada97bc	Fold strlen(expr ? "str1" : "str2") to expr ? len1 : len2. This fires about 330 times in a bootstrap of clang. llvm-svn: 207828	2014-05-02 04:11:45 +00:00
Michael J. Spencer	1f10c5ea94	[IR] Make {extract,insert}element accept an index of any integer type. Given the following C code llvm currently generates suboptimal code for x86-64: __m128 bss4( const __m128 ptr, size_t i, size_t j ) { float f = ptr[i][j]; return (__m128) { f, f, f, f }; } ================================================= define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float> nocapture readonly %ptr, i64 %i, i64 %j) #0 { %a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i %a2 = load <4 x float>* %a1, align 16, !tbaa !1 %a3 = trunc i64 %j to i32 %a4 = extractelement <4 x float> %a2, i32 %a3 %a5 = insertelement <4 x float> undef, float %a4, i32 0 %a6 = insertelement <4 x float> %a5, float %a4, i32 1 %a7 = insertelement <4 x float> %a6, float %a4, i32 2 %a8 = insertelement <4 x float> %a7, float %a4, i32 3 ret <4 x float> %a8 } ================================================= shlq $4, %rsi addq %rdi, %rsi movslq %edx, %rax vbroadcastss (%rsi,%rax,4), %xmm0 retq ================================================= The movslq is unneeded, but is present because of the trunc to i32 and then sext back to i64 that the backend adds for vbroadcastss. We can't remove it because it changes the meaning. The IR that clang generates is already suboptimal. What clang really should emit is: %a4 = extractelement <4 x float> %a2, i64 %j This patch makes that legal. A separate patch will teach clang to do it. Differential Revision: http://reviews.llvm.org/D3519 llvm-svn: 207801	2014-05-01 22:12:39 +00:00
Reed Kotler	bab3f23da6	Add basic functionality for assignment of ints. This creates a lot of core infrastructure in which to add, with little effort, quite a bit more to mips fast-isel Test Plan: simplestore.ll Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3527 llvm-svn: 207790	2014-05-01 20:39:21 +00:00
Rafael Espindola	ea9f9d4030	Don't propagate StorageClass and ComplexType to aliases. This matches gas' behaviour on COFF. I think that this yak is now sufficiently shaved for aliases with offset to work. llvm-svn: 207786	2014-05-01 19:02:03 +00:00
Eli Bendersky	a108a65df2	Add an optimization that does CSE in a group of similar GEPs. This optimization merges the common part of a group of GEPs, so we can compute each pointer address by adding a simple offset to the common part. The optimization is currently only enabled for the NVPTX backend, where it has a large payoff on some benchmarks. Review: http://reviews.llvm.org/D3462 Patch by Jingyue Wu. llvm-svn: 207783	2014-05-01 18:38:36 +00:00
David Blaikie	748be6c376	DebugInfo: Correct the attribute type kind. Post commit review feedback from Paul Robinson regarding r207777. llvm-svn: 207782	2014-05-01 18:31:21 +00:00
David Blaikie	0f82c225b8	PR19623: Implement typedefs of void. This is the LLVM portion that will allow Clang and other frontends to emit typedefs of void by providing a null type for the typedef's underlying type. llvm-svn: 207777	2014-05-01 17:56:13 +00:00
Matt Arsenault	06028dd7be	R600/SI: Fix verifier error with pseudo store instructions. Use i32 instead of specifying SReg_32. When this is the pseudo INDIRECT_BASE_ADDR, this would give a bogus verifier error. llvm-svn: 207770	2014-05-01 16:37:52 +00:00
Rafael Espindola	575f79a409	Compute the correct section for zed = foo + 1 in COFF. This fixes pr19147. There are a few more related issues to fix, but the testcase in the bug now passes. llvm-svn: 207763	2014-05-01 13:37:57 +00:00
Bradley Smith	3567cc1b42	[ARM64] Prefer generation of bzero on Darwin only llvm-svn: 207760	2014-05-01 13:11:59 +00:00
Rafael Espindola	4a04294882	Don't force symbols to be globals in .thumb_set. We currently force symbols to be globals in .thumb_set. The intent seems to be that given .thumb_set foo, bar we emit an undefined symbol to bar if it is never defined. The side effect is that we mark bar as global, even if it is defined, which gas does not. Producing an undefined reference to bar is a general difference from MC and gas. For example, given a = b gas will produce an undefined reference to b, MC will not. I would be surprised if any code depends on this, but if it does, we should fix the general difference, not special case .thumb_set. llvm-svn: 207757	2014-05-01 12:45:43 +00:00
Tim Northover	05017b1f8c	AArch64/ARM64: rewrite test to use FileCheck & add ARM64 lines llvm-svn: 207754	2014-05-01 12:30:01 +00:00
Tim Northover	4ec135fa2e	AArch64/ARM64: port basic disassembly tests to ARM64. llvm-svn: 207753	2014-05-01 12:29:56 +00:00
Tim Northover	534acbdf73	AArch64/ARM64: print BFM instructions as BFI or BFXIL The canonical form of the BFM instruction is always one of the more explicit extract or insert operations, which makes reading output much easier. llvm-svn: 207752	2014-05-01 12:29:38 +00:00
Richard Barton	3db1d580b3	Correction to assert statement to allow 32-bit unsigned numbers with the top bit set. This fixes an ARM assembler crash - regression test added. llvm-svn: 207747	2014-05-01 11:37:44 +00:00
Chandler Carruth	18c2fbb143	Revert r205965, which essentially reverts r205018 for the second time. =[ Turns out that this was the root cause of PR19621. We found a crasher only recently (likely due to improvements elsewhere in the SLP vectorizer) but the reduced test case failed all the way back to here. I've confirmed that reverting this patch both fixes the reduced test case in PR19621 and the actual source file that led to it, so it seems to really be rooted here. I've replied to the commit thread with discussion of my (feeble) attempts to debug this. Didn't make it very far, so reverting now that we have a good test case so that things can get back to healthy while the debugging carries on. llvm-svn: 207746	2014-05-01 11:24:11 +00:00
Simon Atanasyan	c48c58437d	[llvm-readobj] Add support for Mips specific ELF header e_flags. llvm-svn: 207744	2014-05-01 11:07:19 +00:00
Bradley Smith	f57d5ca234	[ARM64] Conditionalize CPU specific system registers on subtarget features llvm-svn: 207742	2014-05-01 10:25:36 +00:00
Matheus Almeida	d92a3fa212	[mips] Move expansion of .cpsetup to target streamer. Summary: There are two functional changes: 1) The directive is not expanded for the ASM->ASM code path. 2) If PIC is not set, there's no expansion for the ASM->OBJ code path (same behaviour as GAS). Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3482 llvm-svn: 207741	2014-05-01 10:24:46 +00:00
Daniel Sanders	88fbbcaa30	[mips] Removed two-operand alias for sllv, sr[al]v, rotrv, dsllv, dsr[al]v, and drotrv GAS doesn't actually accept these particular cases. The mnemonic without the trailing 'v' still supports two-operand aliases. llvm-svn: 207740	2014-05-01 10:08:36 +00:00
Oliver Stannard	7eacbd5a71	Record the DWARF version in MCContext Record the DWARF version in MCContext, and use it when emitting the dwarf version into the debug info. llvm-svn: 207739	2014-05-01 08:46:02 +00:00
Rafael Espindola	ff68cb7f4c	Start fixing pr19147. This makes the coff writer compute the correct symbol value for the test in pr19147. The section is still incorrect, that will be fixed in a followup patch. llvm-svn: 207728	2014-05-01 00:10:17 +00:00
David Blaikie	899ae61fee	Revert "Emit DW_AT_object_pointer once, on the declaration, for each function." Breaks GDB buildbot (http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/14517) GCC emits DW_AT_object_pointer /everywhere/ (declaration, abstract definition, inlined subroutine), but it looks like GDB relies on it being somewhere other than the declaration, at least. I'll experiment further & can hopefully still remove it from the inlined_subroutine. This reverts commit r207705. llvm-svn: 207719	2014-04-30 22:58:19 +00:00
David Blaikie	44078b3260	DebugInfo: Omit DW_AT_artificial on DW_TAG_formal_parameters in DW_TAG_inlined_subroutines. They just don't need to be there - they're inherited from the abstract definition. In theory I would like them to be inherited from the declaration, but the DWARF standard doesn't quite say that... we can probably do it anyway but I'm less confident about that so I'll leave it for a separate commit. llvm-svn: 207717	2014-04-30 22:41:33 +00:00
James Molloy	cbb9791e3b	Move a testcase from ELF to ARM64, incorrectly placed in r207627. llvm-svn: 207706	2014-04-30 21:31:11 +00:00
David Blaikie	3b2a53a437	Emit DW_AT_object_pointer once, on the declaration, for each function. This effectively reverts r164326, but adds some comments and justification and ensures we /don't/ emit the DW_AT_object_pointer on the (abstract and concrete) definitions. (while still preserving it on standalone definitions involving ObjC Blocks) This does increase the size of member function declarations from 7 to 11 bytes, unfortunately, but still seems like the Right Thing to do so that callers that see only the declaration still have the information about the object pointer. That said, I don't know what, if any, DWARF consumers don't have a heuristic to guess this in the case of normal C++ member functions - perhaps we can remove it entirely. llvm-svn: 207705	2014-04-30 21:29:41 +00:00
Alexey Samsonov	c717e7803c	Don't expect to find fpcmp and PerfectShuffle when running lit tests llvm-svn: 207704	2014-04-30 21:26:35 +00:00
Weiming Zhao	7f6daf1799	[ARM64] Prevent bit extraction from being adjusted by a following shift For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx more difficult. For example: Given %shr = lshr i64 %x, 4 %and = and i64 %shr, 15 %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and %0 = load i64* %arrayidx With current shift folding, it takes 3 instrs to compute base address: lsr x8, x0, #1 and x8, x8, #0x78 add x8, x9, x8 If using ubfx, it only needs 2 instrs: ubfx x8, x0, #4, #4 add x8, x9, x8, lsl #3 This fixes bug 19589 llvm-svn: 207702	2014-04-30 21:07:24 +00:00
James Molloy	3ef43692b4	Add a testcase for r207627. llvm-svn: 207697	2014-04-30 20:06:26 +00:00
Hans Wennborg	59f0cba30f	Use the new StringTableBuilder in yaml2elf http://reviews.llvm.org/D3574 llvm-svn: 207694	2014-04-30 19:38:09 +00:00
Michael Zolotukhin	1f4a960ccf	[X86] Never hoist the shift value of a shift instruction. There is no need to check if we want to hoist the immediate value of a shift instruction. Simply return TCC_Free right away. This change is like r206101, but for X86. rdar://problem/16190769 llvm-svn: 207692	2014-04-30 19:17:32 +00:00
Carlo Kok	307625c974	[IPO/MergeFunctions] changes so it doesn't try to bitcast a struct return type but instead recreates it with insert/extract value. llvm-svn: 207679	2014-04-30 17:53:04 +00:00
David Majnemer	91db08bfe4	IR: Conservatively verify inalloca arguments Summary: Try to spot obvious mismatches with inalloca use. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3572 llvm-svn: 207676	2014-04-30 17:22:00 +00:00
Matheus Almeida	e844872830	[mips] Add instruction alias (negu). Summary: negu $reg is equivalent to negu $reg, $reg. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3510 llvm-svn: 207673	2014-04-30 16:53:49 +00:00
Matheus Almeida	b7be52343d	[mips] Add instruction alias (sltu). Summary: The pattern sltu $r1, $r2, $imm is found in handwritten assembly which is just a shorthand version of sltiu $r1, $r2, $imm. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3508 llvm-svn: 207671	2014-04-30 16:29:56 +00:00
Hans Wennborg	83e6e1e926	ELFObjectWriter: deduplicate suffixes in strtab We already do this for shstrtab, so might as well do it for strtab. This extracts the string table building code into a separate class. The idea is to use it for other object formats too. I mostly wanted to do this for the general principle, but it does save a little bit on object file size. I tried this on a clang bootstrap and saved 0.54% on the sum of object file sizes (1.14 MB out of 212 MB for a release build). Differential Revision: http://reviews.llvm.org/D3533 llvm-svn: 207670	2014-04-30 16:25:02 +00:00
Tim Northover	a8c577e454	ARM64: print fp immediates without using scientific notation. llvm-svn: 207669	2014-04-30 16:13:34 +00:00
Tim Northover	7346f062b6	AArch64/ARM64: implement remaining TLS relocations (purely MC). llvm-svn: 207668	2014-04-30 16:13:26 +00:00
Tim Northover	b8fb7f4193	AArch64/ARM64: add specific diagnostic for MRS/MSR and enable tests. llvm-svn: 207667	2014-04-30 16:13:20 +00:00
Tim Northover	3c9a9401d5	AArch64/ARM64: accept and print floating-point immediate 0 as "#0.0" It's been decided that in the future, the floating-point immediate in instructions like "fcmeq v0.2s, v1.2s, #0.0" will be canonically "0.0", which has been implemented on AArch64 already but not ARM64. This fixes that issue. llvm-svn: 207666	2014-04-30 16:13:07 +00:00
Matheus Almeida	56df6ff2c5	[mips] Add instruction alias (dsll and dsrl). Summary: The pattern dsll/dsrl $rd, $rt, $rs is found in handwritten assembly which is just a shorthand version of dsllv/dsrlv $rd, $rt, $rs. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3486 llvm-svn: 207664	2014-04-30 16:00:49 +00:00
Rafael Espindola	b36f6189a5	Relax the test a bit. It is not relevant where the symbol and section names are stored, just their values. llvm-svn: 207662	2014-04-30 15:32:21 +00:00
Tom Stellard	1bd80725b3	R600/SI: Use VALU instructions for copying i1 values We can't use SALU instructions for this since they ignore the EXEC mask and are always executed. This fixes several OpenCV tests. llvm-svn: 207661	2014-04-30 15:31:33 +00:00
Tom Stellard	0c354f25c9	R600/SI: Teach moveToVALU how to handle some SMRD instructions llvm-svn: 207660	2014-04-30 15:31:29 +00:00
Chad Rosier	864e35db0a	[ARM64][fast-isel] Fast-isel doesn't know how to handle f128. llvm-svn: 207659	2014-04-30 15:29:57 +00:00
Rafael Espindola	194924e64b	Rename the test, it is testing the symver directive. llvm-svn: 207658	2014-04-30 15:27:44 +00:00
Matheus Almeida	312ac02491	[mips] Add instruction alias (sll and srl). Summary: The pattern sll/srl $rd, $rt, $rs is found in handwritten assembly which is just a shorthand version of sllv/srlv $rd, $rt, $rs. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3483 llvm-svn: 207657	2014-04-30 15:23:04 +00:00
Sasa Stankovic	7b061a42b1	[mips] Fix MipsLongBranch pass to work when the offset from the branch to the target cannot be determined accurately. This is the case for NaCl where the sandboxing instructions are added in MC layer, after the MipsLongBranch pass. It is also the case when the code has inline assembly. Instead of calculating offset in the MipsLongBranch pass, use %hi(sym1 - sym2) and %lo(sym1 - sym2) expressions that are resolved during the fixup. This patch also deletes microMIPS test file test/CodeGen/Mips/micromips-long-branch.ll and implements microMIPS CHECKs in a much simpler way in a file test/CodeGen/Mips/longbranch.ll, together with MIPS32 and MIPS64. llvm-svn: 207656	2014-04-30 15:06:25 +00:00
Matheus Almeida	bbd5e85e21	[mips] Update tests with encoding information for slt, slti, sltiu and sltu. Summary: Also renamed non-portable register names (e.g. $t2) so that we don't end up with a different encoding for what appears to be an equivalent instruction. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3505 llvm-svn: 207655	2014-04-30 14:52:57 +00:00
Tim Northover	3ffee2340e	ARM64: enable AArch64's basic-a64-instructions test llvm-svn: 207650	2014-04-30 13:37:10 +00:00
Tim Northover	0ac99404f0	ARM64: print lsr instead of lsrv for variable shifts (etc) The canonical syntax for shifts by a variable amount does not end with 'v', but that syntax should be supported as an alias (presumably for legacy reasons). llvm-svn: 207649	2014-04-30 13:37:07 +00:00
Tim Northover	7030f05b4f	ARM64: use 32-bit operations for uxtb & uxth Testing will be enabled shortly with basic-a64-instructions.s llvm-svn: 207648	2014-04-30 13:37:02 +00:00
Tim Northover	a307769b15	AArch64/ARM64: copy support for bCC instead of b.CC across. llvm-svn: 207646	2014-04-30 13:36:56 +00:00
Tim Northover	20ad359b77	AArch64/ARM64: use HS instead of CS & LO instead of CC. On instructions using the NZCV register, a couple of conditions have dual representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and unsigned-lower/carry-clear). The first of these is more descriptive in most circumstances, so we should print it. llvm-svn: 207644	2014-04-30 13:14:03 +00:00
Daniel Sanders	e296a0fce5	[mips][msa] Fix vector insertions where the index is variable Summary: This isn't supported directly so we rotate the vector by the desired number of elements, insert to element zero, then rotate back. The i64 case generates rather poor code on MIPS32. There is an obvious optimisation to be made in future (do both insert.w's inside a shared rotate/unrotate sequence) but for now it's sufficient to select valid code instead of aborting. Depends on D3536 Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3537 llvm-svn: 207640	2014-04-30 12:09:32 +00:00
Tim Northover	f9941a9dc6	ARM64: accept ELF-relocated load/store insts without a #. E.g. we print "ldr x0, [x0, :lo12:symbol]" so we need to accept that syntax too. llvm-svn: 207639	2014-04-30 12:00:20 +00:00
Matheus Almeida	525bc4f708	[mips] Add support for .cpload. Summary: This directive is used for setting up $gp in the beginning of a function. It expands to three instructions if PIC is enabled: lui $gp, %hi(_gp_disp) addiu $gp, $gp, %lo(_gp_disp) addu $gp, $gp, $reg _gp_disp is a special symbol that the linker sets to the distance between the lui instruction and the context pointer (_gp). Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3480 llvm-svn: 207637	2014-04-30 11:28:42 +00:00
Matheus Almeida	c0284d118f	[mips] Emit all three relocation operations for each relocation entry on Mips64 big-endian systems. Summary: The N64 ABI allows up to three operations to be specified per relocation record independently of the endianness. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3529 llvm-svn: 207636	2014-04-30 11:21:10 +00:00
Tim Northover	970c4a8d35	ARM64: use hex immediates for movz/movk instructions Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to piece together an immediate in 16-bit chunks, hex is probably the most appropriate format. llvm-svn: 207635	2014-04-30 11:19:40 +00:00
Tim Northover	4b2f8a990e	ARM64: hexify printing various immediate operands This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they accept weird shifts which are more naturally understandable in hex notation). Also changes BRK/HINT etc, which is probably a neutral change, but easier than the alternative. llvm-svn: 207634	2014-04-30 11:19:28 +00:00
Tim Northover	cfd6e66544	ARM64: print canonical syntax for add/sub (imm) instructions. Since these instructions only accept a 12-bit immediate, possibly shifted left by 12, the canonical syntax used by the architecture reference manual is "#N {, lsl #12 }". We should accept an immediate that has already been shifted, (e.g. Also, print a comment giving the full addend since it can be helpful. llvm-svn: 207633	2014-04-30 11:19:15 +00:00
James Molloy	7c39df37b2	[ARM64] Ensure arm64_be is dealt with when emitting debug info. This is a partial port of r204816 (cpirker "Elf support for MC-JIT runtime dynamic linker") from AArch64 to ARM64. llvm-svn: 207625	2014-04-30 10:15:35 +00:00
Tim Northover	41cec5c3cb	ARM64: make sure FastISel uses a GPR64 source in 64-bit extensions. llvm-svn: 207620	2014-04-30 09:32:01 +00:00
Saleem Abdulrasool	25947c318b	ARM: support stack probe emission for Windows on ARM This introduces the stack lowering emission of the stack probe function for Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where any page allocation which crosses a page boundary of the following guard page will cause a page fault. This page fault must be handled by the kernel to ensure that the page is faulted in. If this does not occur and a write accesses any memory beyond that, the page fault will go unserviced, resulting in an abnormal program termination. The watermark for the stack probe appears to be at 4080 bytes (for accommodating the stack guard canaries and stack alignment) when SSP is enabled. Otherwise, the stack probe is emitted on the page size boundary of 4096 bytes. llvm-svn: 207615	2014-04-30 07:05:07 +00:00
Saleem Abdulrasool	0aca1c30c6	ARM: print COFF function header for Windows on ARM Emit the COFF header when printing out the function. This is important as the header contains two important pieces of information: the storage class for the symbol and the symbol type information. This bit of information is required for the linker to correctly identify the type of symbol that it is dealing with. llvm-svn: 207613	2014-04-30 06:14:25 +00:00
Saleem Abdulrasool	f8222631a5	ARM: partially handle 32-bit relocations for WoA IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise relocation is not split up and reordered. When expanding the mov32imm pseudo-instruction, create a bundle if the machine operand is referencing an address. This helps ensure that the relocatable address load is not reordered by subsequent passes. Unfortunately, this only partially handles the case as the Constant Island Pass occurs after the instructions are unbundled and does not properly handle bundles. That is a more fundamental issue with the pass itself and beyond the scope of this change. llvm-svn: 207608	2014-04-30 04:54:58 +00:00
Reid Kleckner	fb69308568	Implement X86 code generation for musttail Currently, musttail codegen is relying on sibcall optimization, and reporting a fatal error if it fails. Sibcall optimization fails when stack arguments need to be modified, which is insufficient for musttail. The logic for moving arguments in memory safely is already implemented for GuaranteedTailCallOpt. This change merely arranges for musttail calls to use it. No functional change for GuaranteedTailCallOpt. Reviewers: espindola Differential Revision: http://reviews.llvm.org/D3493 llvm-svn: 207598	2014-04-29 23:55:41 +00:00
Tom Stellard	919bb6b83f	R600/SI: Custom lower SI_IF and SI_ELSE to avoid machine verifier errors SI_IF and SI_ELSE are terminators which also produce a value. For these instructions ISel always inserts a COPY to move their value to another basic block. This COPY ends up between SI_(IF\|ELSE) and the S_BRANCH* instruction at the end of the block. This breaks MachineBasicBlock::getFirstTerminator() and also the machine verifier which assumes that terminators are grouped together at the end of blocks. To solve this we coalesce the copy away right after ISel to make sure there are no instructions in between terminators at the end of blocks. llvm-svn: 207591	2014-04-29 23:12:53 +00:00
Tom Stellard	58ac7440e6	R600/SI: Only select SALU instructions in the entry or exit block SALU instructions ignore control flow, so it is not always safe to use them within branches. This is a partial solution to this problem until we can come up with something better. llvm-svn: 207590	2014-04-29 23:12:48 +00:00
Tom Stellard	676f571999	R600: optimize the UDIVREM 64 algorithm This is a squash of several optimization commits: - calculate DIV_Lo and DIV_Hi separately - use BFE_U32 if we are operating on 32bit values - use precomputed constants instead of shifting in UDIVREM - skip the first 32 iterations of udivrem v2: Check whether BFE is supported before using it Patch by: Jan Vesely Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 207589	2014-04-29 23:12:46 +00:00
Rafael Espindola	85f3610222	Also handle ConstantAggregateZero when optimizing vpermilvar*. llvm-svn: 207582	2014-04-29 22:20:40 +00:00
Rafael Espindola	eb7bdbd0ce	Two fixes to the vpermilvar optimization. The instcombine logic to handle vpermilvar's pd and 256 variants was incorrect. The _256 variants have indexes into the individual 128 bit lanes and in all cases it also has to mask out unused bits. llvm-svn: 207577	2014-04-29 20:41:54 +00:00
Diego Novillo	cd64780d18	Fix vectorization remarks. This patch changes the vectorization remarks to also inform when vectorization is possible but not beneficial. Added tests to exercise some loop remarks. llvm-svn: 207574	2014-04-29 20:06:10 +00:00
Yi Jiang	1a3f18b161	Continue slp vectorization even if the BB already has a vectorized store radar://16641956 llvm-svn: 207572	2014-04-29 19:37:20 +00:00
Reed Kotler	67077b3032	Add Simple return instruction to Mips fast-isel Reviewers: dsanders Reviewed by: dsanders Differential Revision: http://reviews.llvm.org/D3430 llvm-svn: 207565	2014-04-29 17:57:50 +00:00
Tilmann Scheller	4418dda5ef	[ARM64] Disable regression tests for the old JIT. Since the ARM64 backend doesn't implement support for the old JIT those tests are failing when the regression tests are run on an AArch64 host. llvm-svn: 207530	2014-04-29 15:02:40 +00:00
Daniel Sanders	6857800b67	[mips][msa] Use CHECK-LABEL in basic_operations*.ll Differential Revision: http://reviews.llvm.org/D3536 llvm-svn: 207529	2014-04-29 14:28:58 +00:00
Joerg Sonnenberger	dd18d5b0f6	Parse and create GOT_PREL relocations. llvm-svn: 207526	2014-04-29 13:42:02 +00:00
Daniel Sanders	b3268e71e2	[mips][msa] Fix element extraction where the index is variable. Summary: This isn't supported directly so we splat the vector element and extract the most convenient copy. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3530 llvm-svn: 207524	2014-04-29 13:31:37 +00:00
Rafael Espindola	b60c829a2a	Centralize the handling of the thumb bit. This patch centralizes the handling of the thumb bit around MCStreamer::isThumbFunc and makes isThumbFunc handle aliases. This fixes a corner case, but the main advantage is having just one way to check if a MCSymbol is thumb or not. This should still be refactored to be ARM only, but at least now it is just one predicate that has to be refactored instead of 3 (isThumbFunc, ELF_Other_ThumbFunc, and SF_ThumbFunc). llvm-svn: 207522	2014-04-29 12:46:50 +00:00
Tim Northover	aacce57d61	ARM: fix test after change to indirect symbol emission. llvm-svn: 207519	2014-04-29 10:13:10 +00:00
Tim Northover	9e7782dcf3	X86: emit hidden stubs into a proper non_lazy_symbol_pointer section. rdar://problem/16660411 llvm-svn: 207518	2014-04-29 10:06:10 +00:00
Tim Northover	2372301bcf	ARM: emit hidden stubs into a proper non_lazy_symbol_pointer section. rdar://problem/16660411 llvm-svn: 207517	2014-04-29 10:06:05 +00:00
Benjamin Kramer	e1ab3f062e	AArch64: Mark vector long multiplication as expand. There are no patterns for this. This was already fixed for ARM64 but I forgot to apply it to AArch64 too. llvm-svn: 207515	2014-04-29 09:37:54 +00:00
Elena Demikhovsky	299cf511c4	AVX-512: optimized a shuffle pattern to VINSERTI64x4. Added intrinsics for VPERMT2PS/PD/D/Q instructions. llvm-svn: 207513	2014-04-29 09:09:15 +00:00
Zinovy Nis	d373fec199	[OPENMP][LV][D3423] Respect Hints.Force meta-data for loops in LoopVectorizer llvm-svn: 207512	2014-04-29 08:55:11 +00:00
Hao Liu	6db3410071	[ARM64]Fix a bug about incorrect operand order in an EXT instruction, which was introduced by r207485. llvm-svn: 207500	2014-04-29 07:51:19 +00:00
Hao Liu	cf37110920	[ARM64]Fix a bug when lowering shuffle vector to an EXT instruction. E.g. Mask like <-1, -1, 1, ...> will generate incorrect EXT index. llvm-svn: 207485	2014-04-29 01:50:36 +00:00
Chandler Carruth	c71b2c3c7f	Revert r207271 for now. This commit introduced a test case that ran clang directly from the LLVM test suite! That doesn't work. I've followed up on the review thread to try and get a viable solution sorted out, but trying to get the tree clean here. llvm-svn: 207462	2014-04-28 23:07:49 +00:00
Rafael Espindola	bc91d7e25a	Add an option for evaluating past symbols. When evaluating an assembly expression for a relocation, we want to stop at MCSymbols that are in the symbol table, even if they are variables. This is needed since the semantics may require that the relocation use them. That is not the case when computing the value of a symbol in the symbol table. There are no relocations in this case and we have to keep going until we hit a section or find out that the expression doesn't have an assembly time value. llvm-svn: 207445	2014-04-28 20:53:11 +00:00
David Blaikie	d8f0ac7b4a	DwarfDebug: Omit DW_AT_object_pointer on inlined_subroutines While refactoring out constructScopeDIE into two functions I realized we were emitting DW_AT_object_pointer in the inlined subroutine when we didn't need to (GCC doesn't, and the abstract subprogram definition has the information already). So here's the refactoring and the bug fix. This is one step of refactoring to remove some subtle memory ownership semantics. It turns out the original constructScopeDIE returned ownership in its return value in some cases and not in others. The split into two functions now separates those two semantics - further cleanup (unique_ptr, etc) will follow. llvm-svn: 207441	2014-04-28 20:27:02 +00:00
Duncan P. N. Exon Smith	c5a3139ebd	Reapply "blockfreq: Approximate irreducible control flow" This reverts commit r207287, reapplying r207286. I'm hoping that declaring an explicit struct and instantiating `addBlockEdges()` directly works around the GCC crash from r207286. This is a lot more boilerplate, though. llvm-svn: 207438	2014-04-28 20:02:29 +00:00
Hans Wennborg	e36e116826	InstCombine: don't drop 'inalloca' in PromoteCastOfAllocation (PR19569) llvm-svn: 207426	2014-04-28 17:40:03 +00:00
Chad Rosier	0def8e2652	[ARM64] Fix an issue where we were always assuming a copy was coming from a D subregister. llvm-svn: 207423	2014-04-28 16:21:50 +00:00
Rafael Espindola	3b5ee55804	Don't include an invalid symbol in the symbol table. The symbol table itself has no relocations, so it is not possible to represent things like a = undefined + 1 With the patch we just omit these variables. That matches the behaviour of the gnu assembler. llvm-svn: 207419	2014-04-28 13:39:57 +00:00
Rafael Espindola	407f5be3cc	List the entire symbol table in this test. This will allow us to extend this test to show that other symbols don't show up in the symbol table. llvm-svn: 207418	2014-04-28 13:26:35 +00:00
Rafael Espindola	9645090181	Produce an error instead of a crash in an expr we cannot represent. llvm-svn: 207414	2014-04-28 12:40:50 +00:00
Tim Northover	7b839f833d	ARM64: diagnose use of v16-v31 in certain indexed NEON instructions. Someone couldn't bear to have a completely orthogonal set of floating-point registers, so we've got some instructions that only accept v0-v15 (coming in ARMv9, V128_prime: you're allowed v2, v3, v5, v7, ...). Anyway, we were permitting even the out of range registers during assembly (CodeGen handled it correctly). This adds a diagnostic. llvm-svn: 207412	2014-04-28 11:27:43 +00:00
Chandler Carruth	e01fd5f63a	[inliner] Significantly improve the compile time in cases like PR19499 by avoiding inlining massive switches merely because they have no instructions in them. These switches still show up where we fail to form lookup tables, and in those cases they are actually going to cause a very significant code size hit anyways, so inlining them is not the right call. The right way to fix any performance regressions stemming from this is to enhance the switch-to-lookup-table logic to fire in more places. This makes PR19499 about 5x less bad. It uncovers a second compile time problem in that test case that is unrelated (surprisingly!). llvm-svn: 207403	2014-04-28 08:52:44 +00:00
Hao Liu	9a342778b9	[ARM64]Fix a bug where UQSHL/SQSHL with a constant i64 shift amount cannot be selected. llvm-svn: 207399	2014-04-28 07:34:27 +00:00
Chandler Carruth	0ef74f571c	Update tests to use the new format of printing a TimeValue. It's a bit odd to have the output of 'llvm-ar tv' depend on the format of TimeValue::str(), but that's what we have today. If anyone needs the output to remain compatible with GNU ar or old versions of llvm-ar, just shout and I'll switch the code to manually format its times. Note that there isn't a portable format -- Mac and GNU have different formats at least (thanks Rafael!) so... llvm-svn: 207387	2014-04-28 01:24:32 +00:00
Rafael Espindola	466d66358d	Add emitThumbSet to the arm target streamer. This fixes the asm printer implementation and lets the parser be unaware of what .thumb_set is. llvm-svn: 207381	2014-04-27 20:23:58 +00:00
Benjamin Kramer	ce4b3fee72	X86TTI: Adjust sdiv cost now that we can lower it on plain SSE2. Includes a fix for a horrible typo that caused all SDIV costs to be slightly off :) llvm-svn: 207371	2014-04-27 18:47:54 +00:00
Benjamin Kramer	3693e77cb4	X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts. This is more expensive than pmuldq but still cheaper than scalarizing the whole thing. llvm-svn: 207370	2014-04-27 18:47:41 +00:00
Saleem Abdulrasool	7c34b6dd65	MC: duplicate .file test for WoA (SVN r207341) Since the COFF tests are dependent on X86, duplicate the test for ARM. Use the default check prefix. llvm-svn: 207365	2014-04-27 16:10:57 +00:00
NAKAMURA Takumi	079173d560	Revert r206989, "Mark llvm/test/BugPoint/compile-custom.ll as XFAIL:vg_leak." It has been fixed since r207265. llvm-svn: 207355	2014-04-27 11:59:33 +00:00
Benjamin Kramer	99767ddf0b	Update test not to check for a shuffle of an all-zero vector. llvm-svn: 207354	2014-04-27 11:54:45 +00:00
Benjamin Kramer	6bca8ef667	SelectionDAG: Aggressively fold shuffles of constant splats. llvm-svn: 207352	2014-04-27 11:41:06 +00:00
Saleem Abdulrasool	c31d1528af	tests: Windows ARM now supports object emission Update lit.cfg with the fact that LLVM can now generate WoA PE/COFF objects. llvm-svn: 207347	2014-04-27 04:29:36 +00:00
Saleem Abdulrasool	710f944830	COFF: move ARM COFF test to ARM directory The COFF tests all assume X86. Just move the new COFF tests under ARM to appease the build bots. llvm-svn: 207346	2014-04-27 04:29:32 +00:00
Saleem Abdulrasool	84b952b677	Add WoA object file emission support Introduce support for WoA PE/COFF object file emission from LLVM. Add the new target specific PE/COFF Streamer (ARMWinCOFFStreamer) that handles the ARM specific behaviour of PE/COFF object emission. ARM exception information is not yet emitted and is a TODO item. The ARM specific object writer (ARMWinCOFFObjectWriter) handles the ARM specific relocation handling in conjunction with the WinCOFFObjectWriter in the MC layer. The MC layer needs to be updated to deal with the relocation adjustments. Branch relocations are adjusted by 4 bytes (unlike their ELF counterparts). Minor tweaks to switch multiple conditional checks into equivalent switch statements. The ObjectFileInfo is updated to relax the object file setup for Windows COFF. Move the architecture checks into an assertion. Windows COFF is currently only supported on x86, x86_64, and ARM (thumb). Rather than defaulting to ELF, we will refuse to generate an object file. This is better though as you do not get an (arbitrary) object file which is different from the request. llvm-svn: 207345	2014-04-27 03:48:22 +00:00
Benjamin Kramer	da4841b3a9	DAGCombiner: Simplify code a bit, make more transforms work with vectors. llvm-svn: 207338	2014-04-26 23:09:49 +00:00
David Blaikie	2b4669de8a	DebugInfo: Fix and test a regression caused by r207263 causing the DW_AT_object_pointer to go missing on blocks Noticed by inspection. Test coverage added. llvm-svn: 207333	2014-04-26 22:12:18 +00:00
David Blaikie	9c34526c17	Include C++ source for debug info test case committed in r207323 llvm-svn: 207324	2014-04-26 18:25:07 +00:00
David Blaikie	e12b49a6e8	DWARF Type Units: Avoid emitting type units under fission if the type requires an address. Since there's no way to ensure the type unit in the .dwo and the type unit skeleton in the .o are correlated, this cannot work. This implementation is a bit inefficient for a few reasons, called out in comments. llvm-svn: 207323	2014-04-26 17:27:38 +00:00
Benjamin Kramer	7c3722724b	X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably cheap now. Turn vectorization back on. llvm-svn: 207320	2014-04-26 14:53:05 +00:00
Benjamin Kramer	6d2dff61f9	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Benjamin Kramer	c9827ab103	X86: Add patterns for MULHU/MULHS of v8i16 and v16i16. This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317	2014-04-26 13:01:03 +00:00
Benjamin Kramer	4dae598bc8	DAGCombiner: Turn divs of vector splats into vectorized multiplications. Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv. I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply so this should be fixed. llvm-svn: 207315	2014-04-26 12:06:28 +00:00
Michael Zolotukhin	1a97a7bcbf	Revert r206749 till a final decision about the intrinsics is made. llvm-svn: 207313	2014-04-26 09:56:41 +00:00
Gerolf Hoflehner	af7a87d2e3	RecursivelyDeleteTriviallyDeadInstructions() could remove more than 1 instruction. The caller needs to be aware of this and adjust instruction iterators accordingly. rdar://16679376 Repaired r207302. llvm-svn: 207309	2014-04-26 05:58:11 +00:00
Juergen Ributzka	a6bda8bae2	[DAG] During DAG legalization keep opaque constants even after expanding. The included test case would return the incorrect results, because the expansion of a shift with a constant shift amount of 0 would generate undefined behavior. This is because ExpandShiftByConstant assumes that all shifts by constants with a value of 0 have already been optimized away. This doesn't happen for opaque constants and usually this isn't a problem, because opaque constants won't take this code path - they are not supposed to. In the case that the opaque constant has to be expanded by the legalizer, the legalizer would drop the opaque flag. In this case we hit the limitations of ExpandShiftByConstant and create incorrect code. This commit fixes the legalizer by not dropping the opaque flag when expanding opaque constants and adding an assertion to ExpandShiftByConstant to catch this unsupported case in the future. This fixes <rdar://problem/16718472> llvm-svn: 207304	2014-04-26 02:58:04 +00:00
Gerolf Hoflehner	c46e9b0423	Revert commit r207302 since build failures have been reported. llvm-svn: 207303	2014-04-26 02:03:17 +00:00
Gerolf Hoflehner	34210108b3	RecursivelyDeleteTriviallyDeadInstructions() could remove more than 1 instruction. The caller needs to be aware of this and adjust instruction iterators accordingly. rdar://16679376 llvm-svn: 207302	2014-04-26 01:19:16 +00:00
Quentin Colombet	ea18933d97	[X86] Implement TargetLowering::getScalingFactorCost hook. Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301	2014-04-26 01:11:26 +00:00
Andrea Di Biagio	8cc9059ce8	[InstCombine][X86] Teach how to fold calls to SSE2/AVX2 packed logical shift right intrinsics. A packed logical shift right with a shift count bigger than or equal to the element size always produces a zero vector. In all other cases, it can be safely replaced by a 'lshr' instruction. llvm-svn: 207299	2014-04-26 01:03:22 +00:00
Filipe Cabecinhas	d71f110fe9	Appease the almighty buildbots. llvm-svn: 207295	2014-04-26 00:02:37 +00:00
Filipe Cabecinhas	363b570d2a	Optimization for certain shufflevector by using insertps. Summary: If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower certain shufflevectors to an insertps instruction: When most of the shufflevector result's elements come from one vector (and keep their index), and one element comes from another vector or a memory operand. Added tests for insertps optimizations on shufflevector. Added support and tests for v4i32 vector optimization. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3475 llvm-svn: 207291	2014-04-25 23:51:17 +00:00
Duncan P. N. Exon Smith	42292ceaa9	Revert "blockfreq: Approximate irreducible control flow" This reverts commit r207286. It causes an ICE on the cmake-llvm-x86_64-linux buildbot [1]: llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function: llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035 [1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio llvm-svn: 207287	2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith	384d0e8ad4	blockfreq: Approximate irreducible control flow Previously, irreducible backedges were ignored. With this commit, irreducible SCCs are discovered on the fly, and modelled as loops with multiple headers. This approximation specifies the headers of irreducible sub-SCCs as its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if we insert a new block that intercepts all the edges to the headers. All backedges and entries to the irreducible SCC point to this imaginary block. This imaginary block has an edge (with even probability) to each header block. The result is now reasonable enough that I've added a number of testcases for irreducible control flow. I've outlined in `BlockFrequencyInfoImpl.h` ways to improve the approximation. <rdar://problem/14292693> llvm-svn: 207286	2014-04-25 23:08:57 +00:00
Adrian Prantl	232897feaa	Unbreak the gdb buildbot by not lowering dbg.declare intrinsics for arrays. llvm-svn: 207284	2014-04-25 23:00:25 +00:00

24200 Commits