llvm-project

Commit Graph

Author	SHA1	Message	Date
Hiroshi Yamauchi	f0c2cfe4d0	[PGO] Guard the memcmp/bcmp size value profiling instrumentation behind flag. Summary: Follow up D79751 and put the instrumentation / value collection side (in addition to the optimization side) behind the flag as well. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80646	2020-05-28 10:07:04 -07:00
Nikita Popov	e0e5c64460	[SDAG] Don't require LazyBlockFrequencyInfo at optnone While LazyBlockFrequencyInfo itself is lazy, the dominator tree and loop info analyses it requires are not. Drop the dependency on this pass in SelectionDAGIsel at O0. This makes for a ~0.6% O0 compile-time improvement. Differential Revision: https://reviews.llvm.org/D80387	2020-05-28 18:48:33 +02:00
alex-t	b726d071b4	[AMDGPU] Reject moving PHI to VALU if the only VGPR input originated from move immediate Summary: PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence. In some cases uniform PHI need to be converted to return VGPR to prevent the odd number of moves values from VGPR to SGPR and back. PHI should certainly return VGPR if it has at least one VGPR input. This change adds the exception. We don't want to convert uniform PHI to VGPRs in case the only VGPR input is a VGPR to SGPR COPY and definition of the source VGPR in this COPY is move immediate. bb.0: %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec %2:sreg_32 = ..... bb.1: %3:sreg_32 = PHI %1, %bb.3, %2, %bb.1 S_BRANCH %bb.3 bb.3: %1:sreg_32 = COPY %0 S_BRANCH %bb.2 Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80434	2020-05-28 19:25:51 +03:00
Jean-Michel Gorius	f5192d7fb7	[x86] Propagate memory operands during call frame optimization Summary: Propagate memory operands when folding load instructions into instructions that directly operate on memory. The original revision has been split. See D80140 for the other part of the changes. Reviewers: craig.topper, rnk, lebedev.ri, efriedma Reviewed By: craig.topper Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80062	2020-05-28 17:45:53 +02:00
Matt Arsenault	06019e3125	AMDGPU: Add missing test for s_denorm_mode scheduling Forgot to add this file to `1a9e0d7092`	2020-05-28 11:07:22 -04:00
Matt Arsenault	d6671ee90c	InferAddressSpaces: Handle ptrmask intrinsic This one is slightly odd since it counts as an address expression, which previously could never fail. Allow the existing TTI hook to return the value to use, and re-use it for handling how to handle ptrmask. Handles the no-op addrspacecasts for AMDGPU. We could probably do something better based on analysis of the mask value based on the address space, but leave that for now.	2020-05-28 10:04:02 -04:00
Matt Arsenault	0da4353938	AMDGPU: Add baseline test for ptrmask infer address space	2020-05-28 10:04:02 -04:00
Simon Pilgrim	1ddac9563d	[X86][SSE] Peek though MOVMSK source sign bits using SimplifyMultipleUseDemandedBits Allows SimplifyDemandedBitsForTargetNode to peek through multi-use ops where MOVMSK only demands the signbit of each vector element.	2020-05-28 13:42:24 +01:00
Alok Kumar Sharma	7716681cfd	Fixed bot failure after `d20bf5a725` There was a failure on windows bit due to format mismatch on different(Hex and Decimal) platforms even if meaning of output is same. For example on X86 linux => DW_OP_plus_uconst 0x70, DW_OP_deref, DW_OP_lit4, DW_OP_mul ^ on X86 Windows-gnu => DW_AT_location (DW_OP_fbreg +112, DW_OP_deref, DW_OP_lit4, DW_OP_mul) : error: CHECK-SAME: expected string not found in input ; CHECK-SAME: DW_OP_plus_uconst 0x70, DW_OP_deref, DW_OP_lit4, DW_OP_mul ^ <stdin>:28:17: note: scanning from here DW_AT_location (DW_OP_fbreg +112, DW_OP_deref, DW_OP_lit4, DW_OP_mul) ^ <stdin>:28:18: note: possible intended match here DW_AT_location (DW_OP_fbreg +112, DW_OP_deref, DW_OP_lit4, DW_OP_mul) Now the test is limited to x86 using REQUIRED and -mtriple. http://45.33.8.238/win/16214/step_11.txt	2020-05-28 18:01:38 +05:30
Dmitry Preobrazhensky	f47e27e260	[AMDGPU][MC][GFX908] Corrected src0 of v_accvgpr_write to accept only VGPRs and inline constants. This change disables use of special SGPR registers like scc, vccz, execz, etc as operands of v_accvgpr_write. See bug 45414: https://bugs.llvm.org/show_bug.cgi?id=45414 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80530	2020-05-28 15:10:55 +03:00
Dmitry Preobrazhensky	45251ef534	[AMDGPU][MC] Corrected v_writelane_b32 to fix a decoding bug Corrected vdst_in to match vdst operand type. See bug 45193: https://bugs.llvm.org/show_bug.cgi?id=45193 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80636	2020-05-28 14:43:49 +03:00
Dmitry Preobrazhensky	bab5dadfcd	[AMDGPU][MC][DISASSEMBLER] Corrected decoder to consume each code fragment only once Summary: disabled disassembly of successfully decoded fragments of code. See detailed bug description: https://bugs.llvm.org/show_bug.cgi?id=46101 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80637	2020-05-28 14:20:18 +03:00
Georgii Rymar	ad07d5f394	[yaml2obj] - Implement the "SectionHeaderTable" tag. With the "SectionHeaderTable" it is now possible to reorder entries in the section header table. It also allows to stop emitting the table. Differential revision: https://reviews.llvm.org/D80002	2020-05-28 13:42:43 +03:00
Florian Hahn	ab95ac0132	[AArch64] Precommit new fp extraction/insertion test.	2020-05-28 11:13:47 +01:00
Cullen Rhodes	8a397b66b2	[AArch64][SVE] Add support for spilling/filling ZPR2/3/4 Summary: This patch enables the register allocator to spill/fill lists of 2, 3 and 4 SVE vectors registers to/from the stack. This is implemented with pseudo instructions that get expanded to individual LDR_ZXI/STR_ZXI instructions in AArch64ExpandPseudoInsts. Patch by Sander de Smalen. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75988	2020-05-28 10:02:57 +00:00
Victor Campos	c010d4d195	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Differential Revision: https://reviews.llvm.org/D70072	2020-05-28 10:52:43 +01:00
Thomas Preud'homme	23ac16cf9b	FileCheck [10/12]: Add support for signed numeric values Summary: This patch is part of a patch series to add support for FileCheck numeric expressions. This specific patch adds support signed numeric values, thus allowing negative numeric values. As such, the patch adds a new class to represent a signed or unsigned value and add the logic for type promotion and type conversion in numeric expression mixing signed and unsigned values. It also adds the %d format specifier to represent signed value. Finally, it also adds underflow and overflow detection when performing a binary operation. Copyright: - Linaro (changes up to diff 183612 of revision D55940) - GraphCore (changes in later versions of revision D55940 and in new revision created off D55940) Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson Reviewed By: jhenderson, arichardson Subscribers: MaskRay, hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, kristina, hfinkel, rogfer01, JonChesterfield Tags: #llvm Differential Revision: https://reviews.llvm.org/D60390	2020-05-28 10:44:21 +01:00
Cullen Rhodes	e533a176b3	[TableGen] Fix non-standard escape warnings for braces in InstAlias Summary: TableGen interprets braces ('{}') in the asm string of instruction aliases as variants but when defining aliases with literal braces they have to be escaped to prevent them being removed. Braces are escaped with '\\', for example: def FooBraces : InstAlias<"foo \\{$imm\\}", (foo IntOperand:$imm)>; Although when TableGen is emitting the assembly writer (-gen-asm-writer) the AsmString that gets emitted is: AsmString = "foo \{$\x01\}"; In c/c++ braces don't need to be escaped which causes compilation warnings: warning: use of non-standard escape character '\{' [-Wpedantic] This patch fixes the issue by unescaping the flattened alias asm string in the asm writer, by replacing '\{\}' with '{}'. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D79991	2020-05-28 09:36:24 +00:00
Alok Kumar Sharma	d20bf5a725	[DebugInfo] Upgrade DISubrange to support Fortran dynamic arrays This patch upgrades DISubrange to support Fortran requirements. Summary: Below are the updates/addition of fields. lowerBound - Now accepts signed integer or DIVariable or DIExpression, earlier it accepted only signed integer. upperBound - This field is now added and accepts signed integer or DIVariable or DIExpression. stride - This field is now added and accepts signed integer or DIVariable or DIExpression. This is required to describe bounds of array which are known at runtime. Testing: unit test cases added (hand-written) check clang check llvm check debug-info Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D80197	2020-05-28 13:46:41 +05:30
Kazushi (Jam) Marukawa	5921782f74	[VE] Implements minimum MC layer for VE (3/4) Summary: Define ELF binary code for VE and modify code where should use this new code. Depends on D79544. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D79545	2020-05-28 10:07:48 +02:00
Xing GUO	3c3a6e26e7	[ObjectYAML][MachO] Add error handling in MachOEmitter. Currently, `yaml2macho` doesn't support error handling. This patch helps improve it. Differential Revision: https://reviews.llvm.org/D80535	2020-05-28 09:54:46 +08:00
Layton Kifer	2bf3fe9b6d	[TRE] Allow elimination when the returned value is non-constant Currently we can only eliminate call return pairs that either return the result of the call or a dynamic constant. This patch removes that limitation. Differential Revision: https://reviews.llvm.org/D79660	2020-05-27 16:55:03 -07:00
Stanislav Mekhanoshin	7392bbc301	AMDGPU/GlobalISel: Fixed insert element for non-standard vectors Differential Revision: https://reviews.llvm.org/D80653	2020-05-27 16:26:22 -07:00
Matt Arsenault	5e007fe998	AMDGPU: Support non-entry block static sized allocas OpenMP emits these for some reason, so handle them. Assume these use 4096 bytes by default, with a flag to override this. Also change the related stack assumption for calls to have a flag.	2020-05-27 18:46:10 -04:00
Stanislav Mekhanoshin	8aa81aaebe	AMDGPU/GlobalISel: Fixed handling of non-standard vectors We do not have register classes for all possible vector sizes, so round it up for extract vector element. Also fixes selection of G_MERGE_VALUES when vectors are not a power of two. This has required to refactor getRegSplitParts() in way that it can handle not just power of two vectors. Ideally we would like RegSplitParts to be generated by tablegen. Differential Revision: https://reviews.llvm.org/D80457	2020-05-27 15:44:09 -07:00
Michael Liao	fa342b5c80	Enable `align <n>` to be used in the intrinsic definition. - This allow us to specify the (minimal) alignment on an intrinsic's arguments and, more importantly, the return value. Differential Revision: https://reviews.llvm.org/D80422	2020-05-27 16:38:18 -04:00
Michael Liao	03481287ca	Refactor argument attribute specification in intrinsic definition. NFC. - Argument attribute needs specifiying through `ArgIndex<n>` (corresponding to `FirstArgIndex`) to distinguish explicitly from the index number from the overloaded type list. - In addition, `RetIndex` (corresponding to `ReturnIndex`) and `FuncIndex` (corresponding to `FunctionIndex`) are introduced for us to associate attributes on the return value and potentially function itself. Differential Revision: https://reviews.llvm.org/D80422	2020-05-27 16:37:53 -04:00
Juneyoung Lee	54b6457240	[TargetPassConfig] Add CanonicalizeFreezeInLoops before LSR Summary: This patch adds CanonicalizeFreezeInLoops before LSR. Relevant patch: https://reviews.llvm.org/D77523 Reviewers: spatel, efriedma, jdoerfert, fhahn, nikic, reames, xbolva00 Reviewed By: nikic Subscribers: xbolva00, nikic, lebedev.ri, hiraditya, llvm-commits, sanwou01, nlopes Tags: #llvm Differential Revision: https://reviews.llvm.org/D77524	2020-05-28 05:21:12 +09:00
Jessica Paquette	c593bf5342	[GlobalISel] Don't combine instructions which are fed by memory instructions. If we have a memory instruction (e.g. a load), we shouldn't combine it away in some trivial combine. It's possible that, say, a call lives between the instructions. This could modify the value loaded, making the load instructions not safe to fold. Differential Revision: https://reviews.llvm.org/D80053	2020-05-27 12:48:58 -07:00
alex-t	eb1092ada3	[AMDGPU] Fix for the lost CarryOut/CarryIn register operands in S_ADD/SUB_CO_PSEUDO. Summary: This fixes the `5b898bddff` bug when the carry-in and carry-out registers became lost in lowering S_ADD/SUB_CO_PSEUDO. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: msearles, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80158	2020-05-27 22:41:04 +03:00
Craig Topper	8e7e6a8d6b	[X86] Restore selection of MULX on BMI2 targets. Looking back over gcc and icc behavior it looks like icc does use mulx32 on 32-bit targets and mulx64 on 64-bit targets. It's also used when dividing i32 by constant on 32-bit targets and i64 by constant on 64-bit targets. gcc uses it multiplies producing a 64 bit result on 32-bit targets and 128-bit results on a 64-bit target. gcc does not appear to use it for division by constant. After this patch clang is closer to the icc behavior. This basically reverts `d1c61861dd`, but there were no strong feelings at the time. Fixes PR45518. Differential Revision: https://reviews.llvm.org/D80498	2020-05-27 12:01:18 -07:00
Sanjay Patel	48cb380abd	[InstCombine] add tests for vector demanded elements of select condition; NFC	2020-05-27 14:49:36 -04:00
Matt Arsenault	4b4496312e	AMDGPU: Start adding MODE register uses to instructions This is the groundwork required to implement strictfp. For now, this should be NFC for regular instructions (many instructions just gain an extra use of a reserved register). Regalloc won't rematerialize instructions with reads of physical registers, but we were suffering from that anyway with the exec reads. Should add it for all the related FP uses (possibly with some extras). I did not add it to either the gpr index mode instructions (or every single VALU instruction) since it's a ridiculous feature already modeled as an arbitrary side effect. Also work towards marking instructions with FP exceptions. This doesn't actually set the bit yet since this would start to change codegen. It seems nofpexcept is currently not implied from the regular IR FP operations. Add it to some MIR tests where I think it might matter.	2020-05-27 14:47:00 -04:00
John Fastabend	13f6c81c5d	[BPF] simplify zero extension with MOV_32_64 The current pattern matching for zext results in the following code snippet being produced, w1 = w0 r1 <<= 32 r1 >>= 32 Because BPF implementations require zero extension on 32bit loads this both adds a few extra unneeded instructions but also makes it a bit harder for the verifier to track the r1 register bounds. For example in this verifier trace we see at the end of the snippet R2 offset is unknown. However, if we track this correctly we see w1 should have the same bounds as r8. R8 smax is less than U32 max value so a zero extend load should keep the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to an off=0, as seen in R7 should create a max offset of 800. However at the end of the snippet we note the R2 max offset is 0xffffFFFF. R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv800 R10=fp0 fp-8=mmmm???? 58: (1c) w9 -= w8 59: (bc) w1 = w8 60: (67) r1 <<= 32 61: (77) r1 >>= 32 62: (bf) r2 = r7 63: (0f) r2 += r1 64: (bf) r1 = r6 65: (bc) w3 = w9 66: (b7) r4 = 0 67: (85) call bpf_get_stack#67 R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R4_w=inv0 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R10=fp0 fp-8=mmmm???? After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we get correct bounds on R2 umax_value=800. Further it reduces 3 insns to 1. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Differential Revision: https://reviews.llvm.org/D73985	2020-05-27 11:26:39 -07:00
Lei Huang	2368bf52cd	[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm Summary: This patch simply adds support for the new CPU in anticipation of Power10. There isn't really any functionality added so there are no associated test cases at this time. Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc Reviewed By: stefanp, nemanjai, amyk, #powerpc Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo Tags: #clang, #powerpc, #llvm Differential Revision: https://reviews.llvm.org/D80020	2020-05-27 13:14:25 -05:00
Matt Arsenault	07cd19efa2	AMDGPU: Fix dropping MI flags when rewriting instructions All 3 passes that change instruction encodings were dropping MI flags. This avoids scheduling regressions caused by setting mayRaiseFPExceptions on FP instructions for non-strictfp functions.	2020-05-27 13:27:06 -04:00
Fangrui Song	5b4cd2d4c4	[X86] Assemble movzb 1280(%rbx, %r12), %r12 after D80608 ffmpeg/libavcodec/x86/h264_cabac.c inline assembly may produce movzb 1280(%rbx, %r12), %r12 After D80608, llvm-mc errors: error: unknown use of instruction mnemonic without a size suffix	2020-05-27 09:55:55 -07:00
Philip Reames	1af3705c7f	Start migrating away from statepoint's inline length prefixed argument bundles In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format preexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles. This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles. Differential Revision: https://reviews.llvm.org/D8059	2020-05-27 09:16:10 -07:00
Alex Richardson	3be5e53f20	[FileCheck] Allow parenthesized expressions With this change it is possible to write FileCheck expressions such as [[#(VAR+1)-2]]. Currently, the only supported arithmetic operators are plus and minus, so this is not particularly useful yet. However, in our CHERI fork we have tests that benefit from having multiplication in FileCheck expressions. Allowing parenthesized expressions is the simplest way for us to work around the current lack of operator precedence in FileCheck expressions. Reviewed By: thopre, jhenderson Differential Revision: https://reviews.llvm.org/D77383	2020-05-27 16:31:39 +01:00
Lei Huang	559845f8fe	Revert "[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm" This reverts commit `7eb666b155`.	2020-05-27 09:40:21 -05:00
Ties Stuij	78bd0c0e5e	[AArch64][BFloat] add BFloat instruction support for AArch64 Summary: Add support for lowering various BFloat related SelDAG nodes: - load/store (ldrh/strh) - concat - dup/duplane - bitconvert/bitcast - insert_subvector/insert_subreg This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile Reviewers: ab, t.p.northover, john.brawn, fpetrogalli, sdesmalen, LukeGeeson Reviewed By: fpetrogalli Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79712	2020-05-27 15:36:54 +01:00
Georgii Rymar	4ab03e62fd	[llvm-readobj] - Do not crash when an invalid .eh_frame_hdr is dumped using --unwind. When the p_offset/p_filesz of the PT_GNU_EH_FRAME is invalid (e.g larger than the file size) then llvm-readobj might crash. This patch fixes the issue. I've introduced `ELFFile<ELFT>::getSegmentContent` method, which is very similar to `ELFFile<ELFT>::getSectionContentsAsArray` one. Differential revision: https://reviews.llvm.org/D80380	2020-05-27 16:41:09 +03:00
David Green	70d4a20299	[UnJ] Update LI for inner nested loops This makes sure to correctly register the loop info of the children of unroll and jammed loops. It re-uses some code from the unroller for registering subloops. Differential Revision: https://reviews.llvm.org/D80619	2020-05-27 14:36:38 +01:00
Matt Arsenault	833996cef1	AMDGPU: Fix backwards s_cselect_* operands The vector equivalent has backwards operands, but the scalar version does not. The passes that use these hooks aren't enabled by default, so this doesn't really change anything.	2020-05-27 09:26:09 -04:00
Guillaume Chatelet	5b84ee4f61	[Alignment] Fix misaligned interleaved loads Summary: Tentatively fixing https://bugs.llvm.org/show_bug.cgi?id=45957 Reviewers: craig.topper, nlopes Subscribers: hiraditya, llvm-commits, RKSimon, jdoerfert, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D80276	2020-05-27 12:12:22 +00:00
Victor Campos	c7593b0f0d	[ARM] Fix rewrite of frame index in Thumb2's address mode i8s4 Summary: In Thumb2's frame index rewriting process, the address mode i8s4, which is used by LDRD and STRD instructions, is handled by taking the immediate offset operand and multiplying it by 4. This behaviour is wrong, however. In this specific address mode, the MachineInstr's immediate operand is already in the expected form. By consequence of that, multiplying it once more by 4 yields a flawed offset value, four times greater than it should be. Differential Revision: https://reviews.llvm.org/D80557	2020-05-27 13:09:13 +01:00
Guillaume Chatelet	6e1eff7858	[NFC] Updating tests Summary: Updating IR now that alignment is explicitly set. This is a prerequisite to D80276. Reviewers: efriedma Subscribers: llvm-commits, craig.topper Tags: #llvm Differential Revision: https://reviews.llvm.org/D80549	2020-05-27 12:02:46 +00:00
Daniil Suchkov	706b22e3e4	[SimpleLoopUnswitch] Drop uses of instructions before block deletion Currently if instructions defined in a block are used in unreachable blocks and SimpleLoopUnswitch attempts deleting the block, it triggers assertion "Uses remain when a value is destroyed!". This patch fixes it by replacing all uses of instructions from BB with undefs before BB deletion. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D80551	2020-05-27 18:25:18 +07:00
Georgii Rymar	fc98447af6	[llvm-readobj] - Do not skip building of the GNU hash table histogram. When the `--elf-hash-histogram` is used, the code first tries to build a histogram for the .hash table and then for the .gnu.hash table. The problem is that dumper might return early when unable or do not need to build a histogram for the .hash. This patch reorders the code slightly to fix the issue and adds a test case. Differential revision: https://reviews.llvm.org/D80204	2020-05-27 13:46:41 +03:00
Simon Pilgrim	410667f1b7	[X86][SSE] Convert PTEST to MOVMSK for allsign bits vector results If we are using PTEST to check 'allsign bits' vector elements we can use MOVMSK to extract the signbits directly and perform the comparison on the scalar value. For vXi16 cases, as we don't have a MOVMSK for this type, we must mask each signbit out of a PMOVMSKB v2Xi8 result, which folds into the TEST comparison. If this allows us to remove a vector op (via the SimplifyMultipleUseDemandedBits call) this is consistently faster than a PTEST (https://godbolt.org/z/ziJUst). I'm investigating whether we ever get regressions without the SimplifyMultipleUseDemandedBits call, even if this means we don't remove a vector op, but that has exposed some other poor codegen issues that I'm still investigating and would have to wait for a later patch. Suggested on PR42035 to avoid unnecessary ashr(x,bw-1)/pcmpgt(0,x) sign splat patterns feeding into ptest. Differential Revision: https://reviews.llvm.org/D80563	2020-05-27 11:06:16 +01:00
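
The idea behind the last entry above (`410667f1b7`) can be illustrated with a small scalar model. This is only a sketch in Python, not LLVM or X86 code: the helper names `movmsk` and `ptest_is_zero` are hypothetical, and lanes are modelled as Python integers that are either 0 or -1 (the "allsign bits" case the commit message describes). Under that assumption, checking whether the packed sign-bit mask is zero gives the same answer as a PTEST-style "is the whole vector zero" check.

```python
def movmsk(lanes, bits):
    """Pack the sign bit of each lane into a scalar, lane 0 in bit 0."""
    mask = 0
    for i, lane in enumerate(lanes):
        sign = (lane >> (bits - 1)) & 1  # arithmetic shift keeps the sign bit
        mask |= sign << i
    return mask


def ptest_is_zero(lanes, bits):
    """Model of testing a vector against itself: true if every bit is zero."""
    lane_mask = (1 << bits) - 1
    return all((lane & lane_mask) == 0 for lane in lanes)


def all_lanes_set(lanes, bits):
    """For all-sign-bits lanes, 'every lane is -1' is a full sign-bit mask."""
    return movmsk(lanes, bits) == (1 << len(lanes)) - 1


if __name__ == "__main__":
    v = [0, -1, 0, 0]  # all-sign-bits lanes: each is 0 or -1
    print(bin(movmsk(v, 32)))
    print(ptest_is_zero(v, 32), movmsk(v, 32) == 0)
```

The equivalence holds only because each lane is all zeros or all ones; for arbitrary lane values the sign-bit mask says nothing about the remaining bits.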

71565 Commits