FeatureSlowUAMem32.
The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.
The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.
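As a rough illustration (function name and attribute contents are mine,
written in current IR syntax), this is approximately what Clang produces
for AVX2 code guarded by a dynamic CPU feature test: AVX2 is enabled per
function on top of the generic CPU, and with SlowUAMem32 inherited from
that CPU the unaligned 32-byte loads below tend to get split into
16-byte halves instead of staying single unaligned vector loads:

  define <8 x float> @sum8(ptr %p, ptr %q) #0 {
    %a = load <8 x float>, ptr %p, align 4    ; unaligned 32-byte load
    %b = load <8 x float>, ptr %q, align 4    ; unaligned 32-byte load
    %r = fadd <8 x float> %a, %b
    ret <8 x float> %r
  }

  attributes #0 = { "target-features"="+avx2" }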
The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/
It would be nice to have the target attribute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine-grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.
llvm-svn: 311740
The comment for this code indicated that it should work similarly to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.
Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to have been historical where we had different logic here. =/
Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
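As a rough sketch (not taken from the patch), the interesting shape is
a value with both a store use and a flag use; previously the store use
alone made us bail to an explicit test instead of reusing the flags
produced by the arithmetic itself:

  define i1 @and_store_and_flags(i32 %x, i32 %y, ptr %p) {
    %a = and i32 %x, %y       ; value has both a store use and a flag use
    store i32 %a, ptr %p
    %z = icmp eq i32 %a, 0    ; could reuse the flags set by the 'and'
    ret i1 %z
  }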
The only thing close to a regression I see is using:
notl %r
testl %r, %r
instead of:
xorl $-1, %r
But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.
I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.
Many thanks to Craig for help figuring out this nasty stuff.
Differential Revision: https://reviews.llvm.org/D37096
llvm-svn: 311737
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming into
the backend and transform it into better DAG ops.
Steps in this patch:
1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
enable this transform. Other targets will probably want that anyway to distinguish scalars
from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.
2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
vector ext is often free.
Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
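A minimal sketch of the kind of pattern targeted here, with constants
chosen purely for illustration: a vselect between constant splats that
differ by one can become an extend of the condition plus the smaller
constant:

  define <4 x i32> @sel_consts(<4 x i1> %c) {
    %r = select <4 x i1> %c, <4 x i32> <i32 42, i32 42, i32 42, i32 42>,
                             <4 x i32> <i32 41, i32 41, i32 41, i32 41>
    ; roughly: add (zext <4 x i1> %c to <4 x i32>), splat of 41
    ret <4 x i32> %r
  }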
Differential Revision: https://reviews.llvm.org/D36840
llvm-svn: 311731
Mostly this involved giving unnamed values names and running the IR
through `opt` to re-format it, while merging in any important comments
from the original. I then deleted pointless comments and inlined the
function attributes for ease of reading and editing.
All of this is to make it much easier to see the instructions being
generated here and evaluate any updates to the tests.
llvm-svn: 311634
When one operand is a user of another in a promoted binary operation,
we may replace and delete the returned value before returning,
triggering an assertion. Reorder node replacements to prevent this.
Fixes PR34137.
Landing on behalf of Nirav.
Differential Revision: https://reviews.llvm.org/D36581
llvm-svn: 311623
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.
This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.
See also PR22780, for making DIExpression not be an MDNode.
Reviewers: aprantl, dexonsmith, dblaikie
Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37075
llvm-svn: 311594
There are no 512-bit blend instructions so we shouldn't create SHRUNKBLEND for them.
On a side note, it looks like there may be a missed opportunity for constant folding TESTM when LHS and RHS are equal.
This fixes PR34139.
Differential Revision: https://reviews.llvm.org/D36992
llvm-svn: 311572
Summary:
This change achieves two things:
- Redefine the Custom Event handling instrumentation points emitted by
the compiler to not require dynamic relocation of references to the
__xray_CustomEvent trampoline.
- Remove the synthetic reference we emit at the end of a function that
we used to keep auxiliary sections alive in favour of SHF_LINK_ORDER
associated with the section where the function is defined.
To achieve the custom event handling change, we've had to introduce the
concept of sled versioning -- this will need to be supported by the
runtime to allow us to understand how to turn on/off the new version of
the custom event handling sleds. That change has to land first before we
change the way we write the sleds.
To remove the synthetic reference, we rely on a relatively new linker
feature that preserves the sections that are associated with each other.
This allows us to limit the effects on the .text section of ELF
binaries.
Because we're still using absolute references that are resolved at
runtime for the instrumentation map (and function index map), we mark
these sections writable. In the future we can re-define the entries in
the map to use relative relocations instead that can be statically
determined by the linker. That change will be a bit more invasive so we
defer this for later.
Depends on D36816.
Reviewers: dblaikie, echristo, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36615
llvm-svn: 311525
The output of this test changed after the fix in r311520 to have
-run-pass=block-placement behave like it does in a normal pipeline.
Adjust the test.
llvm-svn: 311521
I've replaced the two OS-specific runs with a generic run because
there's no functional difference in the resulting output that
we're checking. Also, the script still doesn't work with a Win
target.
llvm-svn: 311463
ISD::isConstantSplatVector can shrink the reported splat to the smallest repeating width, but we don't check the size of the resulting APInt at all, so we can misinterpret the results.
This patch just adds a flag to prevent the APInt from changing width.
Fixes PR34271.
Differential Revision: https://reviews.llvm.org/D36996
llvm-svn: 311429
Summary: With masked operations, it's possible for an operation node like fadd, fsub, etc. to be used by multiple different vselects. Since the pattern matching will start at the vselect, we need to make sure the operation node itself is only used once before we can fold a load. Otherwise we'll end up folding the same load into multiple instructions.
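A rough IR sketch of the hazard (shapes and names are illustrative, not
from the patch): one fadd with a loaded operand feeds two different
selects, and matching from each select without a one-use check on the
fadd would fold the same load twice:

  define void @two_masks(<8 x float> %x, ptr %p,
                         <8 x i1> %m1, <8 x i1> %m2,
                         <8 x float> %pt1, <8 x float> %pt2,
                         ptr %out1, ptr %out2) {
    %ld  = load <8 x float>, ptr %p
    %add = fadd <8 x float> %x, %ld            ; one op, two select users
    %r1  = select <8 x i1> %m1, <8 x float> %add, <8 x float> %pt1
    %r2  = select <8 x i1> %m2, <8 x float> %add, <8 x float> %pt2
    store <8 x float> %r1, ptr %out1
    store <8 x float> %r2, ptr %out2
    ret void
  }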
Reviewers: RKSimon, spatel, zvi, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36938
llvm-svn: 311342
widely used processors.
This occurred to me when I saw that we were generating 'inc' and 'dec'
for Haswell and newer, where we shouldn't. However, there were a few "X
is slow" things that we should probably just set.
I've avoided any of the "X is fast" features because most of those would
be pretty serious regressions on processors where X isn't actually fast.
The slow things are likely to be negligible costs on processors where
these aren't slow and a significant win when they are slow.
In retrospect this seems somewhat obvious. Not sure why we didn't do
this a long time ago.
Differential Revision: https://reviews.llvm.org/D36947
llvm-svn: 311318
rather than doing a separate comparison.
This both saves an explicit comparison and avoids the use of `xadd`
which introduces register constraints and other challenges to the
generated code.
The motivating case is from atomic reference counts where `1` is the
sentinel rather than `0` for whatever reason. This can and should be
lowered efficiently on x86 by just using a different flag; however, the
x86 code only handled the `0` case.
There remains some further opportunities here that are currently hidden
due to canonicalization. I've included test cases that show these and
FIXMEs. However, I don't at the moment have any production use cases and
they seem substantially harder to address.
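For reference, a minimal sketch of the motivating shape (names are
mine): the flags of the decrement already tell us whether the old value
was the sentinel `1`, i.e. the count just hit zero, so no `xadd` plus
separate compare is needed:

  define i1 @release_ref(ptr %refcount) {
    %old = atomicrmw sub ptr %refcount, i64 1 seq_cst
    %was_last = icmp eq i64 %old, 1   ; old == 1 => new count is 0 (ZF set)
    ret i1 %was_last
  }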
Differential Revision: https://reviews.llvm.org/D36945
llvm-svn: 311317
There's no functional difference between the AVX512DQ instructions if we're not masking.
This change unifies test checks and removes extra isel entries. Something similar was done recently for subvector inserts and extracts.
llvm-svn: 311308
Summary: Support the call ABI. For now, only the Linux C and X86_64_SysV calling conventions are supported. Variadic functions are not supported.
Reviewers: zvi, guyblank, oren_ben_simhon
Reviewed By: oren_ben_simhon
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34602
llvm-svn: 311279
Summary:
If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the
vector efficiently based on the maximum vector access index.
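An illustrative IR-level sketch (the transform itself runs on the
BUILD_VECTOR node in the DAG): every lane comes from indices 0-3 of the
wide source, so the maximum access index shows that only the low
<4 x i32> subvector is actually needed:

  define <4 x i32> @low_lanes(<16 x i32> %v) {
    %e0 = extractelement <16 x i32> %v, i32 0
    %e1 = extractelement <16 x i32> %v, i32 1
    %e2 = extractelement <16 x i32> %v, i32 2
    %e3 = extractelement <16 x i32> %v, i32 3
    %b0 = insertelement <4 x i32> undef, i32 %e0, i32 0
    %b1 = insertelement <4 x i32> %b0, i32 %e1, i32 1
    %b2 = insertelement <4 x i32> %b1, i32 %e2, i32 2
    %b3 = insertelement <4 x i32> %b2, i32 %e3, i32 3
    ret <4 x i32> %b3
  }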
Reviewers: zvi, delena, RKSimon, thakis
Reviewed By: RKSimon
Subscribers: chandlerc, eladcohen, llvm-commits
Differential Revision: https://reviews.llvm.org/D35788
llvm-svn: 311255
We see a modest performance improvement from this slightly higher tail dup threshold.
Differential Revision: https://reviews.llvm.org/D36775
llvm-svn: 311139
There's really no reason to do this; we should just let isel pick the integer version and let the execution dependency fixing pass take care of moving to FP if necessary.
It's not very reliable to look for bitcasts at the edges of patterns. If for some reason one input was bitcasted and the other wasn't, or if one was a v4f32 bitcast and one was a v2f64 bitcast, we would have fallen back to the integer pattern anyway.
llvm-svn: 311138
Two issues identified by buildbots were addressed:
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
It also extends the pass so that it can be run during register
allocation, after physical registers have been assigned but before the
virtual registers have been rewritten, which allows it to remove virtual
register COPY LiveIntervals that become dead once all of their uses have
been forwarded.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
llvm-svn: 311135
We've discussed canonicalizing to this form in IR, so the backend
should be prepared to lower these in ways better than what we see
here in most cases.
llvm-svn: 311103
The SelectionDAGBuilder translates various conditional branches into
CaseBlocks which are then translated into SDNodes. If a conditional
branch results in multiple CaseBlocks only the first CaseBlock is
translated into SDNodes immediately, the rest of the CaseBlocks are
put in a queue and processed when all LLVM IR instructions in the
basic block have been processed.
When a CaseBlock is transformed into SDNodes the SelectionDAGBuilder
is queried for the current LLVM IR instruction and the resulting
SDNodes are annotated with the debug info of the current
instruction (if it exists and has debug metadata).
When the deferred CaseBlocks are processed, the SelectionDAGBuilder
does not have a current LLVM IR instruction, and the resulting SDNodes
will not have any debug info. As DwarfDebug::beginInstruction() outputs
a .loc directive for the first instruction in a labeled
block (typically the case for something coming from a CaseBlock), this
tends to produce a line-0 directive.
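As an illustration of one such input (no debug metadata shown, shapes
are mine): a branch on an 'or' of two compares is split by the
SelectionDAGBuilder into two CaseBlocks, and before this patch only the
first one inherited the branch's debug location:

  define void @split_branch(i32 %a, i32 %b) {
  entry:
    %c1 = icmp eq i32 %a, 0
    %c2 = icmp eq i32 %b, 0
    %c = or i1 %c1, %c2                  ; lowered as two CaseBlocks
    br i1 %c, label %then, label %exit
  then:
    call void @callee()
    br label %exit
  exit:
    ret void
  }

  declare void @callee()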
This patch changes the handling of CaseBlocks to store the current
instruction's debug info into the CaseBlock when it is created (and the
SelectionDAGBuilder knows the current instruction) and to always use
the stored debug info when translating a CaseBlock to SDNodes.
Patch by Frej Drejhammar!
Differential Revision: https://reviews.llvm.org/D36671
llvm-svn: 311097
There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences.
There is however value in enabling the masked version of the instructions with DQI.
This required introducing some new multiclasses to enable this splitting.
Differential Revision: https://reviews.llvm.org/D36661
llvm-svn: 311091