llvm-project

Commit Graph

Author	SHA1	Message	Date
Alex Bradbury	07f1c62371	[RISCV] Add codegen support for RV64A In order to support codegen RV64A, this patch: * Introduces masked atomics intrinsics for atomicrmw operations and cmpxchg that use the i64 type. These are ultimately lowered to masked operations using lr.w/sc.w, but we need to use these alternate intrinsics for RV64 because i32 is not legal * Modifies RISCVExpandPseudoInsts.cpp to handle PseudoAtomicLoadNand64 and PseudoCmpXchg64 * Modifies the AtomicExpandPass hooks in RISCVTargetLowering to sext/trunc as needed for RV64 and to select the i64 intrinsic IDs when necessary * Adds appropriate patterns to RISCVInstrInfoA.td * Updates test/CodeGen/RISCV/atomic-*.ll to show RV64A support This ends up being a fairly mechanical change, as the logic for RV32A is effectively reused. Differential Revision: https://reviews.llvm.org/D53233 llvm-svn: 351422	2019-01-17 10:04:39 +00:00
Sanjin Sijaric	b694030647	[ARM64][Windows] Share unwind codes between epilogues There are cases where we have multiple epilogues that have the exact same unwind code sequence. In that case, the epilogues can share the same unwind codes in the .xdata section. This should get us past the assert "SEH unwind data splitting not yet implemented" in many cases. We still need to add support for generating multiple .pdata/.xdata sections for those functions that need to be split into fragments. Differential Revision: https://reviews.llvm.org/D56813 llvm-svn: 351421	2019-01-17 09:45:17 +00:00
Max Kazantsev	ee61308595	[NFC] Factor out some local vars llvm-svn: 351416	2019-01-17 06:20:42 +00:00
Thomas Lively	cbda16eb8e	[WebAssembly] Parse llvm.ident into producers section llvm-svn: 351413	2019-01-17 02:29:55 +00:00
Vedant Kumar	a9906c1e5e	[MergeFunc] Prevent silent miscompile of vararg functions The function merging pass miscompiles identical vararg functions. The forwarding thunk it emits doesn't forward the full variable-length list of arguments. Disable merging for vararg functions for now. I've filed llvm.org/PR40345 to track the issue. rdar://47326238 llvm-svn: 351411	2019-01-17 02:15:05 +00:00
Thomas Lively	3cfcc94c09	Revert "[WebAssembly] Parse llvm.ident into producers section" This reverts commit eccdbba3a02a33e13b5262e92200a33e2ead873d. llvm-svn: 351410	2019-01-17 00:39:49 +00:00
Vedant Kumar	e21ab22115	[FunctionComparator] Consider tail call kinds Essentially, do not treat `call` and `musttail call` as the same thing. As a drive-by, fold CallInst and InvokeInst handling together using the CallSite helper. Differential Revision: https://reviews.llvm.org/D56815 llvm-svn: 351405	2019-01-17 00:29:14 +00:00
Sanjin Sijaric	685565ae9a	[SEH] [ARM64] Retrieve the frame pointer from SEH funclets The Windows ARM64 runtime passes the establisher frame to funclets as the first argument. llvm-svn: 351404	2019-01-17 00:24:38 +00:00
Thomas Lively	a56c23c5ba	[WebAssembly] Parse llvm.ident into producers section Summary: Everything before the word "version" is the tool, and everything after the word "version" is the version. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56742 llvm-svn: 351399	2019-01-16 23:46:14 +00:00
Wei Mi	79c4408aa2	Fix a mistake in rL351392. PGOInstrGen should be initialized to "" instead of false. llvm-svn: 351397	2019-01-16 23:31:40 +00:00
Jonas Devlieghere	669edb5ce5	[AsmPrinter] Collapse .loc 0 0 directives Currently we do not always collapse subsequent .loc 0 0 directives. The reason is that we were checking for a PrevInstLoc which is not set when we emit a line-0 record. We should only check the LastAsmLine, which seems to be created exactly for this purpose. // When we emit a line-0 record, we don't update PrevInstLoc; so look at // the last line number actually emitted, to see if it was line 0. unsigned LastAsmLine = Asm->OutStreamer->getContext().getCurrentDwarfLoc().getLine(); Differential revision: https://reviews.llvm.org/D56767 llvm-svn: 351395	2019-01-16 23:26:29 +00:00
Wei Mi	c876e3d42b	[PGO] Make pgo related options in opt more consistent. Currently we have pgo options defined in PassManagerBuilder.cpp only for instrument pgo, but not for sample pgo. We also have pgo options defined in NewPMDriver.cpp in opt only for new pass manager and for all kinds of pgo. They have some inconsistency. To make the options more consistent and make tests writing easier, the patch let old pass manager to share the same pgo options with new pass manager in opt, and removes the options in PassManagerBuilder.cpp. Differential Revision: https://reviews.llvm.org/D56749 llvm-svn: 351392	2019-01-16 23:19:02 +00:00
Sam Clegg	3a2028f4b0	[WebAssembly] Remove expected failure from known_gcc_test_failures.txt. NFC. Differential Revision: https://reviews.llvm.org/D56809 llvm-svn: 351388	2019-01-16 22:26:59 +00:00
Reid Kleckner	ca16e9d8a3	[X86] Sink complex MCU CC helper to .cpp file from .h file, NFC llvm-svn: 351384	2019-01-16 22:05:36 +00:00
Craig Topper	59abdf5f3f	[X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef. While the intrinsics are defined to return 0 for out of range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits. This was previously believed safe because undefs frequently get turned into 0 either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, its ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in bounds shift amounts are the same, we might fold it to single element broadcast from the constant pool. This would not put 0s in the elements with out of bounds shift amounts. We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change. Patch by zhutianyang and Craig Topper Differential Revision: https://reviews.llvm.org/D56695 llvm-svn: 351381	2019-01-16 21:46:32 +00:00
Craig Topper	5ea3120718	[X86] Use X86ISD::BLENDV for blendv intrinsics. Replace vselect with blendv just before isel table lookup. Remove vselect isel patterns. This cleans up the duplication we have with both intrinsic isel patterns and vselect isel patterns. This should also allow the intrinsics to get SimplifyDemandedBits support for the condition. I've switched the canonical pattern in isel to use the X86ISD::BLENDV node instead of VSELECT. Since it always seemed weird to move from BLENDV with its relaxed rules on condition bits to VSELECT which has strict rules about all bits of the condition element being the same. Its more correct to go from VSELECT to BLENDV. Differential Revision: https://reviews.llvm.org/D56771 llvm-svn: 351380	2019-01-16 21:46:28 +00:00
Changpeng Fang	fe9269f804	AMDGPU: Adjust the chain for loads writing to the HI part of a register. Summary: For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part of the register to maintain the appropriate order. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D56454 llvm-svn: 351379	2019-01-16 21:32:53 +00:00
Craig Topper	e5b7cc8aa0	[X86] Add a one use check to the setcc inversion code in combineVSelectWithAllOnesOrZeros If we're going to generate a new inverted setcc, we should make sure we will be able to remove the old setcc. Differential Revision: https://reviews.llvm.org/D56765 llvm-svn: 351378	2019-01-16 21:29:29 +00:00
Mandeep Singh Grang	33c49c0c82	[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally Summary: This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler. We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape. Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric Reviewed By: rnk, efriedma Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53540 llvm-svn: 351370	2019-01-16 19:52:59 +00:00
Krzysztof Parzyszek	4121eaf0a5	[Hexagon] Do not promote terminator instructions in Hexagon loop idioms llvm-svn: 351369	2019-01-16 19:40:27 +00:00
Andrea Di Biagio	c5f0f5309e	[X86][BtVer2] Update latency of horizontal operations. On Jaguar, horizontal adds/subs have local forwarding disable. That means, we pay a compulsory extra cycle of write-back stage, and the value is not available until the end of that stage. This patch changes the latency of horizontal operations by adding an extra cycle. With this patch, latency numbers now match what is reported by perf. I plan to send another patch to also 'fix' the latency of shuffle operations (on Jaguar, local forwarding is disabled for vector shuffles too). Differential Revision: https://reviews.llvm.org/D56777 llvm-svn: 351366	2019-01-16 18:18:01 +00:00
Simon Pilgrim	5a2bbe267a	[X86] getFauxShuffleMask - bail for non-byte aligned shuffle types Remove the existing assertion and just return false for unexpected shuffle value types (<X x i1> mainly....). Found while updating combineX86ShufflesRecursively to run within SimplifyDemandedVectorElts/SimplifyDemandedBits. llvm-svn: 351365	2019-01-16 18:15:31 +00:00
Jeremy Morse	7dcea5ae3b	[DebugInfo] Allow creation of DBG_VALUEs in blocks where the operand is not used dbg.value intrinsics can appear in blocks where their operand is not used, meaning the operand never receives an SDNode, and thus no DBG_VALUE will be created. Get around this by looking to see whether the operand has already been allocated a virtual register. This allows dbg.values of Phi node and Values that are used across basic blocks to successfully be translated into DBG_VALUEs. Differential Revision: https://reviews.llvm.org/D56678 llvm-svn: 351358	2019-01-16 17:25:27 +00:00
Simon Pilgrim	524daea429	[X86] Add combineX86ShufflesRecursively helper. NFCI. combineX86ShufflesRecursively is pretty cumbersome with a lot of arguments that only matter later in recursion. This commit adds a wrapper version that only takes the initial root Op to simplify calls that don't need to worry about these. An early, cleanup step towards merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts/SimplifyDemandedBits. llvm-svn: 351352	2019-01-16 16:01:42 +00:00
Marek Olsak	c5cec5e1fa	AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351	2019-01-16 15:43:53 +00:00
Alexey Bataev	18809a6bbb	[SLP] Fix PR40310: The reduction nodes should stay scalar. Summary: Sometimes the SLP vectorizer tries to vectorize the horizontal reduction nodes during regular vectorization. This may happen inside of the loops, when there are some vectorizable PHIs. Patch fixes this by checking if the node is the reduction node and thus it must not be vectorized, it must be gathered. Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56783 llvm-svn: 351349	2019-01-16 15:39:52 +00:00
Sanjay Patel	0dbecd05ed	[x86] lower shuffle of extracts to AVX2 vperm instructions I was trying to prevent shuffle regressions while matching more horizontal ops and ended up here: shuf (extract X, 0), (extract X, 4), Mask --> extract (shuf X, undef, Mask'), 0 The affected tests were added for: https://bugs.llvm.org/show_bug.cgi?id=34380 This patch won't change the examples in the bug report itself, but we should be able to extend this to catch more types. Differential Revision: https://reviews.llvm.org/D56756 llvm-svn: 351346	2019-01-16 14:15:18 +00:00
Anton Korobeynikov	cbdb4effae	[MSP430] Emit a separate section for every interrupt vector This is LLVM part of D56663 Linker scripts shipped by TI require to have every interrupt vector in a separate section with a specific name: SECTIONS { __interrupt_vector_XX : { KEEP (*(__interrupt_vector_XX )) } > VECTXX ... } Follow the requirement emit the section for every vector which contain address of interrupt handler: .section __interrupt_vector_XX,"ax",@progbits .word %isr% Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56664 llvm-svn: 351345	2019-01-16 14:03:41 +00:00
Gabor Buella	3ec170c85a	Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker For the given test SROA detects possible replacement and creates a correct alloca. After that SROA is adding lifetime markers for this new alloca. The function getNewAllocaSlicePtr is trying to deduce the pointer type based on the original alloca, which is split, to use it later in lifetime intrinsic. For the test we ended up with such code (rA is initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation) ``` %rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]* <----- this one causing the assertion and is an extra bitcast %5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8* call void @llvm.lifetime.start.p0i8(i64 4, i8* %5) ``` isAllocaPromotable code assumes that a user of alloca may go into lifetime marker through bitcast but it must be the only one bitcast to i8* type. In the test it's not a i8* type, return false and throw the assertion. As we are creating a pointer, which will be used in lifetime markers only, the proposed fix is to create a bitcast to i8* immediately to avoid extra bitcast creation. The test is a greatly simplified to just reproduce the assertion. Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com> Reviewers: chandlerc, craig.topper Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D55934 llvm-svn: 351325	2019-01-16 12:06:17 +00:00
Philip Pfaffe	81101de585	[MSan] Apply the ctor creation scheme of TSan Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan. Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan Subscribers: hiraditya, bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D56734 llvm-svn: 351322	2019-01-16 11:14:07 +00:00
Florian Hahn	e94470f1cc	[SelectionDAG] Update check in createOperands to reflect max() is a valid value. The value returned by max() is the last valid value, adjust the comparison accordingly. The code added in D55073 creates TokenFactors with max() operands. Reviewers: aemerson, efriedma, RKSimon, craig.topper Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D56738 llvm-svn: 351318	2019-01-16 10:06:04 +00:00
Pavel Labath	1ad53ca2b0	[Support] Remove error return value from one overload of fs::make_absolute Summary: The version of make_absolute which accepted a specific directory to use as the "base" for the computation could never fail, even though it returned a std::error_code. The reason for that seems to be historical -- the CWD flavour (which can fail due to failure to retrieve CWD) was there first, and the new version was implemented by extending that. This removes the error return value from the non-CWD overload and reimplements the CWD version on top of that. This enables us to remove some dead code where people were pessimistically trying to handle the errors returned from this function. Reviewers: zturner, sammccall Subscribers: hiraditya, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D56599 llvm-svn: 351317	2019-01-16 09:55:32 +00:00
Philip Pfaffe	685c76d7a3	[NewPM][TSan] Reiterate the TSan port Summary: Second iteration of D56433 which got reverted in rL350719. The problem in the previous version was that we dropped the thunk calling the tsan init function. The new version keeps the thunk which should appease dyld, but is not actually OK wrt. the current semantics of function passes. Hence, add a helper to insert the functions only on the first time. The helper allows hooking into the insertion to be able to append them to the global ctors list. Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan Subscribers: hiraditya, bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D56538 llvm-svn: 351314	2019-01-16 09:28:01 +00:00
Sam Parker	dd8cd6d26b	[DAGCombine] Fix ReduceLoadWidth for shifted offsets ReduceLoadWidth can trigger using a shifted mask is used and this requires that the function return a shl node to correct for the offset. However, the way that this was implemented meant that the returned result could be an existing node, which would be incorrect. This fixes the method of inserting the new node and replacing uses. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 351310	2019-01-16 08:40:12 +00:00
Dan Gohman	9299637d3c	[WebAssembly] COWS has been renamed to WASI. llvm-svn: 351297	2019-01-16 05:23:52 +00:00
Tom Stellard	3d36e5c3e6	Only promote args when function attributes are compatible Summary: Check to make sure that the caller and the callee have compatible function arguments before promoting arguments. This uses the same TargetTransformInfo queries that are used to determine if attributes are compatible for inlining. The goal here is to avoid breaking ABI when a called function's ABI depends on a target feature that is not enabled in the caller. This is a very conservative fix for PR37358. Ideally we would have a more sophisticated check for ABI compatiblity rather than checking if the attributes are compatible for inlining. Reviewers: echristo, chandlerc, eli.friedman, craig.topper Reviewed By: echristo, chandlerc Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits Differential Revision: https://reviews.llvm.org/D53554 llvm-svn: 351296	2019-01-16 05:15:31 +00:00
Serguei Katkov	a5b0e5585b	[InstCombine]Avoid introduction of unaligned mem access InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair. It might result in generation of unaligned atomic load/store which later in backend will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is and handle it at backend. Reviewers: reames, anna, apilipenko, mkazantsev Reviewed By: reames Subscribers: t.p.northover, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56582 llvm-svn: 351295	2019-01-16 04:36:26 +00:00
Sam Clegg	56c587adfd	[WebAssembly] Store section alignment as a power of 2 This change bumps for version number of the wasm object file metadata. See https://github.com/WebAssembly/tool-conventions/pull/92 Differential Revision: https://reviews.llvm.org/D56758 llvm-svn: 351285	2019-01-16 01:34:48 +00:00
Aditya Nandakumar	500e3ead9f	[GISel]: Add support for CSEing continuously during GISel passes. https://reviews.llvm.org/D52803 This patch adds support to continuously CSE instructions during each of the GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by the CSEMIRBuilder. llvm-svn: 351283	2019-01-16 00:40:37 +00:00
Mandeep Singh Grang	436735c3fe	[EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp Summary: Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc. Refer D53541 for the context. Clang counterpart D56748. Reviewers: rnk, efriedma Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56747 llvm-svn: 351281	2019-01-16 00:37:13 +00:00
Craig Topper	0e420e6a62	[X86] Rename SHRUNKBLEND ISD node to BLENDV. That's really what it is. If we didn't use intrinsics for BLENDVPS/BLENDVPD/PBLENDVB all the way to isel, this is the node we would use. llvm-svn: 351278	2019-01-16 00:20:30 +00:00
Craig Topper	34ac509ac8	[X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer. We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic block at a time isel in bad ways. llvm-svn: 351275	2019-01-15 23:36:25 +00:00
Changpeng Fang	20fe3d2f35	AMDGPU: Raise the priority of MAD24 in instruction selection. Summary: We have seen performance regression when v_add3 is generated. The major reason is that the v_mad pattern is broken when v_add3 is generated. We also see the register pressure increased. While we could not properly estimate register pressure during instruction selection, we can give mad a higher priority. In this work, we raise the priority for mad24 in selection and resolve the performance regression. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D56745 llvm-svn: 351273	2019-01-15 23:12:36 +00:00
Jonas Devlieghere	1a0ce65ad3	[VFS] Move RedirectingFileSystem interface into header (NFC) This moves the RedirectingFileSystem into the header so it can be extended. This is needed in LLDB we need a way to obtain the external path to deal with FILE* and file descriptor APIs. Discussion on the mailing list: http://lists.llvm.org/pipermail/llvm-dev/2018-November/127755.html Differential revision: https://reviews.llvm.org/D54277 llvm-svn: 351265	2019-01-15 22:36:41 +00:00
Roman Lebedev	fb4eed381d	X86DAGToDAGISel::matchBitExtract() with truncation (PR36419) Summary: Previously in D54095 i have added support for extraction of `lshr` from `X` if we are to produce `BEXTR`. That was good, but the fix was partial, there was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 \| PR36419 ]]. That pattern can also appear, roughly, when you have a large (64-bit) storage, and the consume bits from it. It will not be unexpected if you will be doing further computations in 32-bit width. And then the current code breaks, as the tests show. The basic idea/pattern here is following: 1. We have `i64` input 2. We perform `i64` right-shift on it. 3. We `trunc`ate that shifted value 4. We do all further work (masking) in `i32` Since we see `trunc`ation and not `lshr`, we give up, and stop trying to extract that right-shift. BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64` and truncate the result after masking: https://rise4fun.com/Alive/K4B ``` Name: @bextr64_32_b1 -> @bextr64_32_b0 %shiftedval = lshr i64 %val, %numskipbits %truncshiftedval = trunc i64 %shiftedval to i32 %widenumlowbits1 = zext i8 %numlowbits to i32 %notmask1 = shl nsw i32 -1, %widenumlowbits1 %mask1 = xor i32 %notmask1, -1 %res = and i32 %truncshiftedval, %mask1 => %shiftedval = lshr i64 %val, %numskipbits %widenumlowbits = zext i8 %numlowbits to i64 %notmask = shl nsw i64 -1, %widenumlowbits %mask = xor i64 %notmask, -1 %wideres = and i64 %shiftedval, %mask %res = trunc i64 %wideres to i32 ``` Thus, we are again able to extract that `lshr` into `BEXTR`'s control. Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a good idea: ``` $ cat /tmp/old.s # bextr64_32_b1 # LLVM-EXEGESIS-LIVEIN RSI # LLVM-EXEGESIS-LIVEIN EDX # LLVM-EXEGESIS-LIVEIN RDI movq %rsi, %rcx shrq %cl, %rdi shll $8, %edx bextrl %edx, %edi, %eax $ cat /tmp/old.s \| ./bin/llvm-exegesis -mode=latency -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o --- mode: latency key: instructions: - 'MOV64rr RCX RSI' - 'SHR64rCL RDI RDI' - 'SHL32ri EDX EDX i_0x8' - 'BEXTR32rr EAX EDI EDX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: latency, value: 0.6638, per_snippet_value: 2.6552 } error: '' info: '' assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3 ... $ cat /tmp/old.s \| ./bin/llvm-exegesis -mode=uops -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o --- mode: uops key: instructions: - 'MOV64rr RCX RSI' - 'SHR64rCL RDI RDI' - 'SHL32ri EDX EDX i_0x8' - 'BEXTR32rr EAX EDI EDX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: PdFPU0, value: 0, per_snippet_value: 0 } - { key: PdFPU1, value: 0, per_snippet_value: 0 } - { key: PdFPU2, value: 0, per_snippet_value: 0 } - { key: PdFPU3, value: 0, per_snippet_value: 0 } - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 } error: '' info: '' assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3 ... ``` vs ``` $ cat /tmp/new.s # bextr64_32_b1 # LLVM-EXEGESIS-LIVEIN RDX # LLVM-EXEGESIS-LIVEIN SIL # LLVM-EXEGESIS-LIVEIN RDI shlq $8, %rdx movzbl %sil, %eax orq %rdx, %rax bextrq %rax, %rdi, %rax $ cat /tmp/new.s \| ./bin/llvm-exegesis -mode=latency -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o --- mode: latency key: instructions: - 'SHL64ri RDX RDX i_0x8' - 'MOVZX32rr8 EAX SIL' - 'OR64rr RAX RAX RDX' - 'BEXTR64rr RAX RDI RAX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: latency, value: 0.7454, per_snippet_value: 2.9816 } error: '' info: '' assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3 ... $ cat /tmp/new.s \| ./bin/llvm-exegesis -mode=uops -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o --- mode: uops key: instructions: - 'SHL64ri RDX RDX i_0x8' - 'MOVZX32rr8 EAX SIL' - 'OR64rr RAX RAX RDX' - 'BEXTR64rr RAX RDI RAX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: PdFPU0, value: 0, per_snippet_value: 0 } - { key: PdFPU1, value: 0, per_snippet_value: 0 } - { key: PdFPU2, value: 0, per_snippet_value: 0 } - { key: PdFPU3, value: 0, per_snippet_value: 0 } - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 } error: '' info: '' assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3 ... ``` ^ latency increased (worse). Except //maybe// not really. Like with all synthetic benchmarks, they //may// be misleading. Let's take a look on some actual real-world hotpath. In this case it's 'my' [[ https://github.com/darktable-org/rawspeed \| RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ `e3316dc851/src/librawspeed/decompressors/VC5Decompressor.cpp (L814)` \| GoPro VC5 decompressor ]]: ``` raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM 2018-12-22 21:23:03 Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench Run on (8 X 4012.81 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 3.41, 2.41, 2.03 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GOPR9172.GPR/threads:8/real_time_mean 40 ms 40 ms 128 0.322244 7.96974 12M 37.4457M 298.534M 3.12047 24.8778 0.040465 GOPR9172.GPR/threads:8/real_time_median 39 ms 39 ms 128 0.312606 7.99155 12M 38.387M 306.788M 3.19891 25.5656 0.039115 GOPR9172.GPR/threads:8/real_time_stddev 4 ms 3 ms 128 0.0271557 0.130575 0 2.4941M 21.3909M 0.207842 1.78257 3.81081m RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9 2018-12-22 21:23:08 Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench Run on (8 X 4013.1 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 3.78, 2.50, 2.06 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GOPR9172.GPR/threads:8/real_time_mean 39 ms 39 ms 128 0.311533 7.97323 12M 38.6828M 308.471M 3.22356 25.706 0.0390928 GOPR9172.GPR/threads:8/real_time_median 38 ms 38 ms 128 0.304231 7.99005 12M 39.4437M 315.527M 3.28698 26.294 0.0380316 GOPR9172.GPR/threads:8/real_time_stddev 3 ms 3 ms 128 0.0229149 0.133814 0 2.26225M 19.1421M 0.188521 1.59517 3.13671m Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------------------------- GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 GOPR9172.GPR/threads:8/real_time_mean -0.0339 -0.0316 40 39 40 39 GOPR9172.GPR/threads:8/real_time_median -0.0277 -0.0274 39 38 39 38 GOPR9172.GPR/threads:8/real_time_stddev -0.1769 -0.1267 4 3 3 3 ``` I.e. this results in //roughly// -3% improvements in perf. While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 \| PR36419 ]], it won't address it fully. Reviewers: RKSimon, craig.topper, andreadb, spatel Reviewed By: craig.topper Subscribers: courbet, llvm-commits Differential Revision: https://reviews.llvm.org/D56052 llvm-svn: 351253	2019-01-15 21:31:18 +00:00
David Callahan	d129d3e93f	treat invoke like call Summary: InvokeInst should be treated like CallInst and assigned a separate discriminator. This is particularly import when an Invoke is converted to a Call during compilation and so can invalidate sample profile data collected wtih different link time optimizations Reviewers: twoh, Kader, danielcdh, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56491 llvm-svn: 351251	2019-01-15 21:26:51 +00:00
Matt Morehouse	19ff35c481	[SanitizerCoverage] Don't create comdat for interposable functions. Summary: Comdat groups override weak symbol behavior, allowing the linker to keep the comdats for weak symbols in favor of comdats for strong symbols. Fixes the issue described in: https://bugs.chromium.org/p/chromium/issues/detail?id=918662 Reviewers: eugenis, pcc, rnk Reviewed By: pcc, rnk Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56516 llvm-svn: 351247	2019-01-15 21:21:01 +00:00
Craig Topper	82015b633b	[X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen. I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics. I'll work on scatter and gatherpf/scatterpf next. Differential Revision: https://reviews.llvm.org/D56527 llvm-svn: 351234	2019-01-15 20:12:33 +00:00
Anton Korobeynikov	c9e9e28487	[MSP430] Recognize '{' as a line separator msp430-as supports multiple assembly statements on the same line separated by a '{' character. llvm-svn: 351233	2019-01-15 20:10:46 +00:00
Craig Topper	99fcbf67d0	[Nios2] Remove Nios2 backend As mentioned here http://lists.llvm.org/pipermail/llvm-dev/2019-January/129121.html This backend is incomplete and has not been maintained in several months. Differential Revision: https://reviews.llvm.org/D56691 llvm-svn: 351231	2019-01-15 19:59:19 +00:00

1 2 3 4 5 ...

119675 Commits