llvm-project

Commit Graph

Author	SHA1	Message	Date
Elena Demikhovsky	f58f838495	Changed basic cost of store operation on X86. A store operation takes 2 uops on X86 processors. The exact cost calculation affects several optimization passes, including loop unrolling. This change compensates for the performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks. Differential Revision: https://reviews.llvm.org/D35888 llvm-svn: 311285	2017-08-20 12:34:29 +00:00
Aditya Kumar	a525fffd07	[Loop Vectorize] Added separate metadata. Added separate metadata to indicate when the loop has already been vectorized, instead of setting width and count to 1. Patch written by Divya Shanmughan and Aditya Kumar. Differential Revision: https://reviews.llvm.org/D36220 llvm-svn: 311281	2017-08-20 10:32:41 +00:00
Igor Breger	88a3d5c855	[GlobalISel][X86] Support call ABI. Summary: Support call ABI. For now, only the Linux C and X86_64_SysV calling conventions are supported. Variadic functions are not supported. Reviewers: zvi, guyblank, oren_ben_simhon Reviewed By: oren_ben_simhon Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34602 llvm-svn: 311279	2017-08-20 09:25:22 +00:00
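A minimal sketch of the integer-argument part of the x86-64 System V convention mentioned above, assuming only INTEGER-class arguments (ints and pointers) and ignoring floating-point, aggregate, and variadic cases. `ArgLocation` and `assignIntegerArgs` are hypothetical names for illustration, not LLVM's X86 call-lowering API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: where one INTEGER-class argument is passed.
struct ArgLocation {
  bool inRegister;          // true if the argument is passed in a GPR
  std::string reg;          // register name when inRegister is true
  std::size_t stackOffset;  // byte offset in the outgoing argument area otherwise
};

// Simplified model of x86-64 System V assignment for ints/pointers only.
std::vector<ArgLocation> assignIntegerArgs(std::size_t numArgs) {
  // The first six INTEGER-class arguments use these registers, in order.
  static const char *const kArgRegs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
  std::vector<ArgLocation> locs;
  std::size_t nextStackOffset = 0;
  for (std::size_t i = 0; i < numArgs; ++i) {
    if (i < 6) {
      locs.push_back({true, kArgRegs[i], 0});
    } else {
      // Remaining arguments go on the stack in consecutive 8-byte slots.
      locs.push_back({false, "", nextStackOffset});
      nextStackOffset += 8;
    }
  }
  assert(locs.size() == numArgs);
  return locs;
}
```

For eight integer arguments, the first six land in `rdi`, `rsi`, `rdx`, `rcx`, `r8`, and `r9`, and the last two take consecutive 8-byte stack slots.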
Igor Breger	b3a860a5e8	[GlobalISel][X86] Support asymmetric copy from/to GPR physical register. This case is usually generated by ABI lowering; it requires performing a truncate/anyext. llvm-svn: 311278	2017-08-20 07:14:40 +00:00
Sam Elliott	7fe0aaa140	Revert "Emit only A Single Opt Remark When Inlining". Reverting due to a clang build failure. llvm-svn: 311274	2017-08-20 06:55:10 +00:00
Sam Elliott	785dd75369	Emit only A Single Opt Remark When Inlining Summary: This updates the Inliner to only add a single Optimization Remark when Inlining, rather than an Analysis Remark and an Optimization Remark. Fixes https://bugs.llvm.org/show_bug.cgi?id=33786 Reviewers: anemet, davidxl, chandlerc Reviewed By: anemet Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D36054 llvm-svn: 311273	2017-08-20 06:43:34 +00:00
Sam Elliott	b0c9753691	Keep Optimization Remark Yaml in NewPM Summary: The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds. So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33951 Reviewers: anemet, chandlerc Reviewed By: anemet, chandlerc Subscribers: javed.absar, chandlerc, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D36906 llvm-svn: 311271	2017-08-20 01:30:45 +00:00
Chandler Carruth	9ef881efab	[x86] Fix an even stranger corner case where we have multiple levels of cmov self-referencing. Pointed out by Amjad Aboud in code review; test case slightly simplified from the one he posted. llvm-svn: 311267	2017-08-19 23:35:50 +00:00
Craig Topper	a0319bb434	[AVX512] Use alignedstore256 in a pattern that's emitting a 256-bit movaps from an extract subvector operation. llvm-svn: 311263	2017-08-19 22:02:02 +00:00
Martin Storsjo	91522ffa12	[ARM] Check the right order for halves of VZIP/VUZP if both parts are used. This is exactly the same fix as in SVN r247254. In that commit, the fix was applied only to isVTRNMask and isVTRN_v_undef_Mask, but the same issue is present for VZIP/VUZP as well. This fixes PR33921. Differential Revision: https://reviews.llvm.org/D36899 llvm-svn: 311258	2017-08-19 19:47:48 +00:00
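To make the "right order for halves" concrete, here is a simplified sketch, assuming a shuffle that uses both results of a single VZIP or VUZP is written as one mask of length 2*N over two N-element inputs. The helpers `isDoubleVZIPMask` and `isDoubleVUZPMask` are hypothetical and much simpler than LLVM's own mask checks (no undef handling); they accept the mask only when its first N entries form result 0 and its last N entries form result 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices 0..n-1 select from the first input, n..2n-1 from the second.
// VZIP result 0 interleaves the low halves, result 1 the high halves.
bool isDoubleVZIPMask(const std::vector<int> &mask, int n) {
  assert(n > 0 && n % 2 == 0);
  if (mask.size() != static_cast<std::size_t>(2 * n)) return false;
  for (int half = 0; half < 2; ++half) {  // half 0 must be result 0
    for (int j = 0; j < n; j += 2) {
      int base = half * (n / 2) + j / 2;
      if (mask[half * n + j] != base || mask[half * n + j + 1] != base + n)
        return false;
    }
  }
  return true;
}

// VUZP result 0 takes the even lanes of (a:b), result 1 the odd lanes.
bool isDoubleVUZPMask(const std::vector<int> &mask, int n) {
  assert(n > 0);
  if (mask.size() != static_cast<std::size_t>(2 * n)) return false;
  for (int half = 0; half < 2; ++half)    // half 0 must be result 0
    for (int j = 0; j < n; ++j)
      if (mask[half * n + j] != 2 * j + half)
        return false;
  return true;
}
```

With N = 4, `{0, 4, 1, 5, 2, 6, 3, 7}` is accepted as a VZIP mask, while the same halves in swapped order, `{2, 6, 3, 7, 0, 4, 1, 5}`, are rejected.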
Teresa Johnson	b225ad05af	Fix bot failures by requiring x86 target. The tests added in r311254 require a target triple since they run through code generation. Fix bot failures by requiring an x86 target. llvm-svn: 311257	2017-08-19 19:15:04 +00:00
Jatin Bhateja	6b4c205685	[DAGCombiner] Extending pattern detection for vector shuffle. Summary: If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the vector efficiently based on the maximum vector access index. Reviewers: zvi, delena, RKSimon, thakis Reviewed By: RKSimon Subscribers: chandlerc, eladcohen, llvm-commits Differential Revision: https://reviews.llvm.org/D35788 llvm-svn: 311255	2017-08-19 18:08:59 +00:00
Teresa Johnson	73305f82e9	[ThinLTO] Fix ThinLTO crash. Summary: Follow-up to the fix in r311023, which fixed the case where the combined index is written to disk. The same samplePGO logic exists for the in-memory index when computing imports, so we need to filter out GlobalVariable summaries there too. Reviewers: davidxl Subscribers: inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D36919 llvm-svn: 311254	2017-08-19 18:04:25 +00:00
Jatin Bhateja	66f7958e91	Revert rL311247 to rectify the commit message. Summary: This reverts commit rL311247. Differential Revision: https://reviews.llvm.org/D36927 llvm-svn: 311252	2017-08-19 17:59:58 +00:00
Jatin Bhateja	6f0d0d23b0	Merge branch 'arcpatch-D35788' llvm-svn: 311247	2017-08-19 17:00:04 +00:00
Jatin Bhateja	1c56863739	Revert rL311242 "Extension of shuffle vector pattern detection, updating post rebase." Summary: This reverts commit rL311242. Differential Revision: https://reviews.llvm.org/D36924 llvm-svn: 311246	2017-08-19 16:40:06 +00:00
Jatin Bhateja	313f97dd84	Extension of shuffle vector pattern detection, updating post rebase. llvm-svn: 311242	2017-08-19 15:58:36 +00:00
Victor Leschuk	ee7d232a41	revert failing test llvm-svn: 311238	2017-08-19 12:24:41 +00:00
Victor Leschuk	ba0954c4e2	Add temporary test to verify that win10 builder hangs on error llvm-svn: 311236	2017-08-19 12:02:39 +00:00
Chandler Carruth	4f3aa29a46	[Inliner] Fix a nasty bug when inlining a non-recursive trace of a function into itself. We tried to fix this before in r306495, but that got reverted as the assert was actually hit. This fixes the original bug (which we seem to have lost track of with the revert) by blocking a second remapping when the function being inlined is also the caller and the remapping could succeed, but erroneously. Before this change, the included test case would actually load from an inlined copy of the alloca, failing to load the stored value and miscompiling. Many thanks to Richard Smith for diagnosing a user miscompile to this bug, to Kyle for the first attempt and initial analysis, and to David Li for remembering the issue and how to fix it and suggesting the patch. I'm just stitching it together and landing it. =] llvm-svn: 311229	2017-08-19 06:56:11 +00:00
Chandler Carruth	2a80fddf67	[Inliner] Clean up a test case a bit to make it more clear what is being tested and why. llvm-svn: 311228	2017-08-19 06:06:44 +00:00
Chandler Carruth	93a645525c	[x86] Teach the cmov converter to aggressively convert cmovs with memory operands into control flow. We have periodically seen performance problems with cmov where one operand comes from memory. On modern x86 processors with strong branch predictors and speculative execution, this tends to be much better done with a branch than with a cmov. We routinely see cmov stalling while the load is completed rather than continuing, and if there are subsequent branches, they cannot be speculated in turn. Also, in many (even simple) cases, macro fusion causes the control flow version to be fewer uops. Consider the IACA output for the initial sequence of code in a very hot function in one of our internal benchmarks that motivates this, and notice the micro-op reduction provided.

Before, SNB:
```
Throughput Analysis Report
--------------------------
Block Throughput: 2.20 Cycles       Throughput Bottleneck: Port1

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    |           | 1.0 |           |           |     |     | CP | mov rcx, rdi
|   0*   |           |     |           |           |     |     |    | xor edi, edi
|   2^   |    0.1    | 0.6 | 0.5   0.5 | 0.5   0.5 |     | 0.4 | CP | cmp byte ptr [rsi+0xf], 0xf
|   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |    | mov rax, qword ptr [rsi]
|   3    |    1.8    | 0.6 |           |           |     | 0.6 | CP | cmovbe rax, rdi
|   2^   |           |     | 0.5   0.5 | 0.5   0.5 |     | 1.0 |    | cmp byte ptr [rcx+0xf], 0x10
|   0F   |           |     |           |           |     |     |    | jb 0xf
Total Num Of Uops: 9
```

After, SNB:
```
Throughput Analysis Report
--------------------------
Block Throughput: 2.00 Cycles       Throughput Bottleneck: Port5

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    |    0.5    | 0.5 |           |           |     |     |    | mov rax, rdi
|   0*   |           |     |           |           |     |     |    | xor edi, edi
|   2^   |    0.5    | 0.5 | 1.0   1.0 |           |     |     |    | cmp byte ptr [rsi+0xf], 0xf
|   1    |    0.5    | 0.5 |           |           |     |     |    | mov ecx, 0x0
|   1    |           |     |           |           |     | 1.0 | CP | jnbe 0x39
|   2^   |           |     |           | 1.0   1.0 |     | 1.0 | CP | cmp byte ptr [rax+0xf], 0x10
|   0F   |           |     |           |           |     |     |    | jnb 0x3c
Total Num Of Uops: 7
```

The difference even manifests in a throughput cycle rate difference on Haswell.

Before, HSW:
```
Throughput Analysis Report
--------------------------
Block Throughput: 2.00 Cycles       Throughput Bottleneck: FrontEnd

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   0*   |           |     |           |           |     |     |     |     |    | mov rcx, rdi
|   0*   |           |     |           |           |     |     |     |     |    | xor edi, edi
|   2^   |           |     | 0.5   0.5 | 0.5   0.5 |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
|   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |     |     |    | mov rax, qword ptr [rsi]
|   3    |    1.0    | 1.0 |           |           |     |     | 1.0 |     |    | cmovbe rax, rdi
|   2^   |    0.5    |     | 0.5   0.5 | 0.5   0.5 |     |     | 0.5 |     |    | cmp byte ptr [rcx+0xf], 0x10
|   0F   |           |     |           |           |     |     |     |     |    | jb 0xf
Total Num Of Uops: 8
```

After, HSW:
```
Throughput Analysis Report
--------------------------
Block Throughput: 1.50 Cycles       Throughput Bottleneck: FrontEnd

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   0*   |           |     |           |           |     |     |     |     |    | mov rax, rdi
|   0*   |           |     |           |           |     |     |     |     |    | xor edi, edi
|   2^   |           |     | 1.0   1.0 |           |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
|   1    |           | 1.0 |           |           |     |     |     |     |    | mov ecx, 0x0
|   1    |           |     |           |           |     |     | 1.0 |     |    | jnbe 0x39
|   2^   |    1.0    |     |           | 1.0   1.0 |     |     |     |     |    | cmp byte ptr [rax+0xf], 0x10
|   0F   |           |     |           |           |     |     |     |     |    | jnb 0x3c
Total Num Of Uops: 6
```

Note that this cannot be usefully restricted to inner loops. Much of the hot code we see hitting this is not in an inner loop, or not in a loop at all. The optimization still remains effective and indeed critical for some of our code. I have run a suite of internal benchmarks with this change. I saw a few very significant improvements and very few minor regressions, but overall this change rarely has a significant effect. However, the improvements were very significant, and in quite important routines responsible for a great deal of our C++ CPU cycles. The gains pretty clearly outweigh the regressions for us. I also ran the test-suite and SPEC2006. Only 11 binaries changed at all, and none of them showed any regressions. Amjad Aboud at Intel also ran this over their benchmarks and saw no regressions. Differential Revision: https://reviews.llvm.org/D36858 llvm-svn: 311226	2017-08-19 05:01:19 +00:00
Dinar Temirbulatov	7aff8cfa55	[SLPVectorizer] Tighten up VLeft, VRight declaration, remove unnecessary testcase test/Transforms/SLPVectorizer/X86/reorder.ll, NFCI. llvm-svn: 311223	2017-08-19 03:15:07 +00:00
Dinar Temirbulatov	e3ce1b455e	[SLPVectorizer] Add opcode parameter to reorderAltShuffleOperands, reorderInputsAccordingToOpcode functions. Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36766 llvm-svn: 311221	2017-08-19 02:54:20 +00:00
Adrian Prantl	2116dd360a	Filter out non-constant DIGlobalVariableExpressions reachable via the CU. They won't affect the DWARF output, but they will mess with the sorting of the fragments. This fixes the crash reported in PR34159. https://bugs.llvm.org/show_bug.cgi?id=34159 llvm-svn: 311217	2017-08-19 01:15:06 +00:00
Eric Beckmann	91d8af5386	llvm-mt: Merge manifest namespaces. mt.exe performs a tree merge where certain element nodes are combined into one. This introduces the possibility of XML namespaces conflicting with each other. The original mt.exe has a hierarchy whereby certain namespace names can override others, and nodes that would then end up in ambiguous namespaces have their namespaces explicitly defined. This patch handles this namespace-merging process. llvm-svn: 311215	2017-08-19 00:37:41 +00:00
Xinliang David Li	709ffe178e	[Profile] backward propagate profile info in JumpThreading Differential Revision: http://reviews.llvm.org/D36864 llvm-svn: 311208	2017-08-18 23:00:05 +00:00
Amjad Aboud	88ffa3afe2	[InstCombine] Teach ComputeNumSignBitsImpl to handle integer multiply instruction. Differential Revision: https://reviews.llvm.org/D36679 llvm-svn: 311206	2017-08-18 22:56:55 +00:00
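The reasoning behind a sign-bit bound for multiplication can be sketched on plain 32-bit integers. This is an illustrative, standalone version under our own assumptions, not the code from the patch: an operand with s sign bits fits in (bits - s + 1) signed bits, and the product of a p-bit and a q-bit signed value fits in p + q signed bits. `numSignBits` and `mulSignBitsLowerBound` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit (including the sign bit
// itself) of a 32-bit value; 32 for both 0 and -1.
unsigned numSignBits(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  if (x < 0) u = ~u;  // turn leading ones into leading zeros
  unsigned count = 0;
  for (uint32_t bit = 1u << 31; bit != 0 && (u & bit) == 0; bit >>= 1)
    ++count;
  return count;
}

// Lower bound on the sign bits of a * b, given the sign bits of each operand.
unsigned mulSignBitsLowerBound(unsigned signBitsA, unsigned signBitsB,
                               unsigned bits = 32) {
  assert(signBitsA >= 1 && signBitsA <= bits);
  assert(signBitsB >= 1 && signBitsB <= bits);
  // Significant ("valid") bits of the product are at most the sum of the
  // operands' significant bits.
  unsigned validBits = (bits - signBitsA + 1) + (bits - signBitsB + 1);
  return validBits > bits ? 1 : bits - validBits + 1;
}
```

For example, -128 has 25 sign bits, so the bound for `-128 * -128` is 17; the actual product, 16384, has exactly 17 sign bits, so the bound is tight in that case.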
Max Kazantsev	0aaf8c16ac	[IRCE] Fix buggy behavior in Clamp. The Clamp function was too optimistic when choosing a signed or unsigned min/max function for calculations. In fact, `!IsSignedPredicate` guarantees that `Smallest` and `Greatest` can be compared safely using unsigned predicates, but we did not check this for `S`, which can in theory be negative. This patch makes Clamp use signed min/max when it fails to prove that `S` is non-negative, and adds a test where such a situation may lead to an incorrect calculation of conditions. Differential Revision: https://reviews.llvm.org/D36873 llvm-svn: 311205	2017-08-18 22:50:29 +00:00
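A small sketch of why the choice matters, using plain 64-bit integers as a hypothetical stand-in for the SCEV expressions IRCE actually builds. `clampSigned`, `clampUnsigned`, and `clamp` are illustrative names, not the pass's API; `clamp` mirrors the fix as described, falling back to signed min/max unless `S` is proven non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp s into [smallest, greatest] with signed comparisons.
int64_t clampSigned(int64_t s, int64_t smallest, int64_t greatest) {
  assert(smallest <= greatest);
  return std::max(smallest, std::min(s, greatest));
}

// Clamp with unsigned comparisons: only correct when all three values are
// non-negative, because a negative s becomes a huge unsigned number.
int64_t clampUnsigned(int64_t s, int64_t smallest, int64_t greatest) {
  uint64_t us = static_cast<uint64_t>(s);
  uint64_t lo = static_cast<uint64_t>(smallest);
  uint64_t hi = static_cast<uint64_t>(greatest);
  return static_cast<int64_t>(std::max(lo, std::min(us, hi)));
}

// Hypothetical model of the fixed behavior: unsigned min/max is used only
// for an unsigned predicate AND when s is known to be non-negative.
int64_t clamp(int64_t s, int64_t smallest, int64_t greatest,
              bool isSignedPredicate, bool sKnownNonNegative) {
  if (!isSignedPredicate && sKnownNonNegative)
    return clampUnsigned(s, smallest, greatest);
  return clampSigned(s, smallest, greatest);
}
```

With `S = -5` and the range [0, 100], unsigned min/max reinterprets `S` as 2^64 - 5 and clamps it to 100, whereas signed min/max correctly yields 0.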
Jonas Devlieghere	a2faf7b60f	[llvm-dwarfdump] Hide .debug_str and DIE reference offsets in brief mode This patch hides the .debug_str offset and DIE reference offsets into the CU when llvm-dwarfdump is invoked with -brief. Differential Revision: https://reviews.llvm.org/D36835 llvm-svn: 311201	2017-08-18 21:35:44 +00:00
Simon Pilgrim	f36cca88fb	[X86][ADX] Regenerate ADX intrinsics tests llvm-svn: 311198	2017-08-18 21:21:14 +00:00
Ana Pazos	6210f27dfc	[PGO] Fixed assertion due to mismatched memcpy size type. Summary: Memcpy intrinsics can have a size argument of any integer type, such as i32 or i64. Fixed the size type along with its value when cloning the intrinsic. Reviewers: davidxl, xur Reviewed By: davidxl Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D36844 llvm-svn: 311188	2017-08-18 19:17:08 +00:00
Tim Northover	14302fcb24	ARM: use an external relocation for calls from MachO ARM mode. The internal (__text-relative) relocation risks the offset not being encodable if the destination is Thumb. llvm-svn: 311187	2017-08-18 19:13:56 +00:00
Matt Morehouse	5c7fc76983	[SanitizerCoverage] Add stack depth tracing instrumentation. Summary: Augment SanitizerCoverage to insert maximum stack depth tracing for use by libFuzzer. The new instrumentation is enabled by the flag -fsanitize-coverage=stack-depth and is compatible with the existing trace-pc-guard coverage. The user must also declare the following global variable in their code: `thread_local uintptr_t __sancov_lowest_stack`. https://bugs.llvm.org/show_bug.cgi?id=33857 Reviewers: vitalybuka, kcc Reviewed By: vitalybuka Subscribers: kubamracek, hiraditya, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D36839 llvm-svn: 311186	2017-08-18 18:43:30 +00:00
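A rough, hand-written sketch of what "maximum stack depth tracing" means, assuming a downward-growing stack (as on x86-64 and AArch64), so the deepest point reached corresponds to the lowest stack address. The declaration of `__sancov_lowest_stack` comes from the commit message; its initial value here, and the helpers `recordStackDepth` and `depthProbe`, are assumptions for illustration. The real check is inserted by the compiler under `-fsanitize-coverage=stack-depth`, and this sketch does not claim to reproduce its exact code.

```cpp
#include <cassert>
#include <cstdint>

// Declaration required by the commit; initialized to UINTPTR_MAX here
// (an assumption) so that any real stack address compares lower.
thread_local uintptr_t __sancov_lowest_stack = UINTPTR_MAX;

// Hypothetical manual version of the per-function check: record the
// lowest stack address observed so far on this thread.
inline void recordStackDepth() {
  volatile char marker = 0;  // its address approximates the current stack pointer
  uintptr_t sp = reinterpret_cast<uintptr_t>(&marker);
  if (sp < __sancov_lowest_stack)
    __sancov_lowest_stack = sp;
}

// Hypothetical recursive workload that calls the hook in every frame.
int depthProbe(int n) {
  recordStackDepth();
  return n <= 0 ? 0 : 1 + depthProbe(n - 1);
}
```

A fuzzer can compare `__sancov_lowest_stack` across inputs: an input that drives it lower reached a deeper stack than any earlier input.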
Marek Sokolowski	5cd3d5c8d6	Reapply: [llvm-rc] Add basic RC scripts parsing ability. For now, the parser supports a limited set of statements and resources. This will be extended in the following patches. Thanks to Nico Weber (thakis) for his original work in this area. This patch was originally submitted as r311175 and got reverted in r311177 because of problems with compilation under gcc. Differential Revision: https://reviews.llvm.org/D36340 llvm-svn: 311184	2017-08-18 18:24:17 +00:00
Jonas Devlieghere	e101b07a1d	[Debug info] Transfer DI to fragment expressions for split integer values. This patch teaches the SDag type legalizer how to split up debug info for integer values that are split into a hi and lo part. (re-commit) Differential Revision: https://reviews.llvm.org/D36805 llvm-svn: 311181	2017-08-18 18:07:00 +00:00
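A simplified sketch of the bookkeeping involved, assuming a little-endian target where the lo part occupies the lower bit offsets. `Fragment` and `splitFragment` are hypothetical names; in LLVM the bit range is carried by a `DW_OP_LLVM_fragment` operation (offset and size, both in bits) in a `DIExpression`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model of a fragment: a bit range of a source variable.
struct Fragment {
  uint64_t offsetInBits;
  uint64_t sizeInBits;
};

// When an integer value is split into lo and hi halves, each half describes
// a sub-range of whatever bit range the original value described.
std::pair<Fragment, Fragment> splitFragment(Fragment whole) {
  assert(whole.sizeInBits % 2 == 0 && "only even-sized values split evenly");
  uint64_t half = whole.sizeInBits / 2;
  Fragment lo{whole.offsetInBits, half};         // little-endian: lo first
  Fragment hi{whole.offsetInBits + half, half};
  return {lo, hi};
}
```

Splitting a 64-bit value gives the fragments (offset 0, size 32) and (offset 32, size 32); splitting the hi half again gives (32, 16) and (48, 16).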
Marek Sokolowski	f276f52014	Revert "[llvm-rc] Add basic RC scripts parsing ability." This reverts commit r311175. This broke compilation on some buildbots. llvm-svn: 311177	2017-08-18 17:25:55 +00:00
Marek Sokolowski	dbc16476c1	[llvm-rc] Add basic RC scripts parsing ability. For now, the parser supports a limited set of statements and resources. This will be extended in the following patches. Thanks to Nico Weber (thakis) for his original work in this area. Differential Revision: https://reviews.llvm.org/D36340 llvm-svn: 311175	2017-08-18 17:05:47 +00:00
Simon Pilgrim	879ce046ad	[X86][BMI2] Added scheduling test for RORX/SARX/SHLX/SHRX instructions llvm-svn: 311171	2017-08-18 16:26:39 +00:00
Simon Pilgrim	358aeae7b8	[X86][AES] Add scheduling latency/throughput tests for AES instructions llvm-svn: 311167	2017-08-18 15:26:51 +00:00
Simon Pilgrim	9eb0869e91	[X86][PCLMUL] Add scheduling latency/throughput test for PCLMULQDQ instruction Added it to the SSE42 tests as targets seem to always have both llvm-svn: 311166	2017-08-18 15:08:30 +00:00
Simon Pilgrim	ccaec26175	[X86][SHA] Add scheduling latency/throughput tests for SHA instructions llvm-svn: 311164	2017-08-18 14:55:50 +00:00
Simon Pilgrim	7f506f7d72	[X86][MOVBE] Add scheduling latency/throughput tests for MOVBE instructions llvm-svn: 311163	2017-08-18 14:44:31 +00:00
Simon Pilgrim	320f89782a	[X86][BMI2] Added scheduling test for MULX instructions llvm-svn: 311159	2017-08-18 13:22:18 +00:00
Sjoerd Meijer	ec9581e5e0	[AArch64] Do not promote f16 when subtarget HasFullFP16. Armv8.2-A adds FP16 support, i.e. f16 is no longer just a storage-only type: data processing can also be performed on 16-bit floating-point quantities. All the necessary (tablegen) groundwork of adding the ARMv8.2-A FP16 (scalar) instructions was done in D15014. To take advantage of this, this patch avoids promotion of f16 to f32 types when the subtarget supports FullFP16, which enables instruction selection of these FP16 instructions. Differential Revision: https://reviews.llvm.org/D36396 llvm-svn: 311154	2017-08-18 10:51:14 +00:00
Diana Picus	42ea77d5c2	Revert "GlobalISel (AArch64): fix ABI at border between GPRs and SP." This reverts commit e8fd20964798ca6d46d2729dd3a789707a6416da in an attempt to appease the GlobalISel buildbot, which fails in the test-suite with errors like `fpcmp: files differ without tolerance allowance`. llvm-svn: 311151	2017-08-18 09:31:21 +00:00
Geoff Berry	bd47e8a4f7	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" round 2 This reverts commit r311135. sanitizer-x86_64-linux-android buildbot is timing out with just this patch applied. llvm-svn: 311142	2017-08-18 01:43:11 +00:00
Richard Smith	c0541dfa3e	Increase tail dup threshold for -O3 from 3 to 4. We see a modest performance improvement from this slightly higher tail dup threshold. Differential Revision: https://reviews.llvm.org/D36775 llvm-svn: 311139	2017-08-17 23:38:41 +00:00
Craig Topper	1fae3ae6f0	[X86] Remove SSE/AVX patterns for AND/XOR/OR/ANDN that checked for the inputs being bitcasted from floating point types. There's really no reason to do this; we should just let isel pick the integer version and let the execution dependency fixing pass take care of moving to FP if necessary. It's not very reliable to look for bitcasts at the edges of patterns. If for some reason one input was bitcasted and the other wasn't, or if one was a v4f32 bitcast and one was a v2f64 bitcast, we would have fallen back to the integer pattern anyway. llvm-svn: 311138	2017-08-17 23:20:57 +00:00
Tim Northover	48fff995d6	GlobalISel (AArch64): fix ABI at border between GPRs and SP. If a struct would end up half in GPRs and half on SP the ABI says it should actually go entirely on the stack. We were getting this wrong in GlobalISel before, causing compatibility issues. llvm-svn: 311137	2017-08-17 23:14:01 +00:00
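A heavily simplified sketch of the rule described above, following our reading of the AAPCS64 parameter-passing rules: an argument that needs more GPRs than remain in x0-x7 goes entirely on the stack, and no later argument is assigned to a GPR either. It assumes every argument needs one or two GPRs (composites larger than 16 bytes are passed by reference) and ignores alignment and floating-point registers. `Loc` and `assignArgs` are hypothetical names, not GlobalISel's API.

```cpp
#include <cassert>
#include <vector>

enum class Loc { Registers, Stack };

// gprsNeeded[i] is how many 8-byte GPRs argument i would occupy (1 or 2).
std::vector<Loc> assignArgs(const std::vector<unsigned> &gprsNeeded) {
  const unsigned kNumArgGPRs = 8;  // x0-x7
  unsigned nextGPR = 0;            // AAPCS64 calls this the NGRN
  std::vector<Loc> result;
  for (unsigned n : gprsNeeded) {
    assert(n >= 1 && n <= 2);
    if (nextGPR + n <= kNumArgGPRs) {
      result.push_back(Loc::Registers);
      nextGPR += n;
    } else {
      // Never split an argument between registers and stack: place the whole
      // argument on the stack and mark all argument GPRs as used.
      result.push_back(Loc::Stack);
      nextGPR = kNumArgGPRs;
    }
  }
  return result;
}
```

With seven integer arguments followed by a 16-byte struct and another integer, the struct does not fit in the single remaining register (x7), so both it and the final integer are passed on the stack.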

46905 Commits