llvm-project

Commit Graph

Author	SHA1	Message	Date
Tom Stellard	a894043910	Revert "AMDGPU/GlobalISel: Implement select for G_INSERT" This reverts commit r344310. The test case was failing on some bots. llvm-svn: 344317	2018-10-11 23:36:46 +00:00
Matthias Braun	d6131c9633	X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early at a point where the pattern isn't expanded yet. However after ConstantHoisting hoisted some immediate the result may not expand anymore. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315	2018-10-11 23:14:35 +00:00
Tom Stellard	4733be6e7b	AMDGPU/GlobalISel: Implement select for G_INSERT Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 344310	2018-10-11 22:49:54 +00:00
Ana Pazos	0a5fcefa31	[RISCV] Fix disassembling of fence instruction with invalid field Summary: An instruction with 0 in the fence field was being disassembled as fence , iorw. It now prints "unknown" instead, to match GAS behavior. This bug was uncovered by an LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D51828 llvm-svn: 344309	2018-10-11 22:49:13 +00:00
Richard Trieu	dfd1760b5f	Inline variable into assert to avoid unused variable warning. llvm-svn: 344308	2018-10-11 22:42:41 +00:00
Craig Topper	35d513c7e4	[X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector. On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success. This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain. There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately. Differential Revision: https://reviews.llvm.org/D52528 llvm-svn: 344291	2018-10-11 20:36:06 +00:00
Thomas Lively	f04bed8e79	[WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] (fixed) llvm-svn: 344287	2018-10-11 20:21:22 +00:00
Sumanth Gundapaneni	a4a9155e4f	[Hexagon] Restrict compound instructions with constant value. Having a constant value operand in the compound instruction is not always profitable. This patch improves coremark by ~4% on Hexagon. Differential Revision: https://reviews.llvm.org/D53152 llvm-svn: 344284	2018-10-11 19:48:15 +00:00
Thomas Lively	ab37189f7e	[WebAssembly] Revert rL344180, which was breaking expensive checks llvm-svn: 344280	2018-10-11 18:45:48 +00:00
Krzysztof Parzyszek	5d3a6f76a8	[Hexagon] Eliminate potential sources of non-determinism in HCE Also, avoid comparing GUIDs when ordering global addresses, because the source file location can cause a different GUID to be calculated. As a result, a pair of symbols can compare "less" in one directory, but "greater" in another. llvm-svn: 344271	2018-10-11 18:26:02 +00:00
Craig Topper	fb2ac8969e	[X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode. This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about. We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR. I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that. I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet. Differential Revision: https://reviews.llvm.org/D53126 llvm-svn: 344270	2018-10-11 18:06:07 +00:00
Diogo N. Sampaio	352a2fa1e7	[AARCH64][FIX] Emit data symbol for constant pool data The ARM64 ELF emitter would omit printing the data symbol for zero-filled constant data. This patch overrides the emitFill method so as to ensure that the symbol is correctly printed. Differential revision: https://reviews.llvm.org/D53132 llvm-svn: 344248	2018-10-11 14:10:32 +00:00
Roman Lebedev	4225f4adff	[X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern Summary: As discussed in D48491, we can't really do this in the TableGen, since we need to produce two instructions. This only implements one single pattern. The other 3 patterns will be in follow-ups. I'm not sure yet if we want to also fuse shift into here (i.e `(x >> start) & ...`) Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D52304 llvm-svn: 344224	2018-10-11 07:51:13 +00:00
Thomas Lively	7fa7e6a284	[WebAssembly][NFC] Use intrinsic dag nodes directly Summary: Instead of custom lowering to WebAssemblyISD nodes first. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53119 llvm-svn: 344211	2018-10-11 00:49:24 +00:00
Thomas Lively	2ebacb107b	[WebAssembly] Saturating float to int intrinsics Summary: Although the saturating float to int instructions are already emitted from normal IR, the fpto{s,u}i instructions produce poison values if the argument cannot fit in the result type. These intrinsics are therefore necessary to get guaranteed defined saturating behavior. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53004 llvm-svn: 344204	2018-10-11 00:01:25 +00:00
Craig Topper	b5421c498d	[X86] Prevent non-temporal loads from folding into instructions by blocking them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate. Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check. This saves about 5K out of ~600K on the generated isel table. llvm-svn: 344189	2018-10-10 21:48:34 +00:00
George Burgess IV	6ef8002c2c	Replace most users of UnknownSize with LocationSize::unknown(); NFC Moving away from UnknownSize is part of the effort to migrate us to LocationSizes (e.g. the cleanup promised in D44748). This doesn't entirely remove all of the uses of UnknownSize; some uses require tweaks to assume that UnknownSize isn't just some kind of int. This patch is intended to just be a trivial replacement for all places where LocationSize::unknown() will Just Work. llvm-svn: 344186	2018-10-10 21:28:44 +00:00
Thomas Lively	eff0542c56	[WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] Summary: By moving that line into the `I` multiclass. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53093 llvm-svn: 344180	2018-10-10 20:40:54 +00:00
Roman Lebedev	33d84c6dac	[X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering Summary: As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=38938 \| PR38938 ]], we fail to emit `BEXTR` if the mask is shifted. We can't deal with that in `X86DAGToDAGISel` `before the address mode for the inc is selected`, and we can't really do it in the normal DAGCombine, because we don't have generic `ISD::BitFieldExtract` node, and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back. So it would seem X86ISelLowering is the place to handle this. This patch only moves the matchBEXTRFromAnd() from X86DAGToDAGISel to X86ISelLowering. It does not add support for the 'shifted mask' pattern. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52426 llvm-svn: 344179	2018-10-10 20:40:12 +00:00
Thomas Lively	103f0161b3	[WebAssembly][NFC] Use vnot patfrag to simplify v128.not Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53097 llvm-svn: 344175	2018-10-10 19:09:16 +00:00
Sanjay Patel	6cca8af227	[x86] allow single source horizontal op matching (PR39195) This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in: rL343727 As noted in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature bit to deal with it here in the DAG. Differential Revision: https://reviews.llvm.org/D52997 llvm-svn: 344141	2018-10-10 13:39:59 +00:00
Simon Pilgrim	5cb3a82892	[TargetLowering] Add root node back to work list after successful SimplifyDemandedBits/SimplifyDemandedVectorElts Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful. Differential Revision: https://reviews.llvm.org/D53026 llvm-svn: 344132	2018-10-10 10:44:15 +00:00
Jonas Paulsson	bf66f38705	[SystemZ] Temporarily disable high VFs with integer div/rem. Until mischeduler is clever enough to avoid spilling in a vectorized loop with many (scalar) DLRs it is better to avoid high vectorization factors (8 and above). llvm-svn: 344129	2018-10-10 09:30:29 +00:00
Craig Topper	02c62aa58a	[X86] Remove FeatureRTM from Skylake processor list Summary: There are a LOT of Skylakes and later without TSX-NI. Examples: - SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz- - KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz- - KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz- - CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz This feature seems to be present only on high-end desktop and server chips (I can't find any SKX without). This commit leaves it disabled for all processors, but can be re-enabled for specific builds with -mrtm. Patch by Thiago Macieira Reviewers: erichkeane, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53041 llvm-svn: 344116	2018-10-10 07:43:35 +00:00
Jonas Paulsson	2c8b33770c	[SystemZ] Take better care when computing needed vector registers in TTI. A new function getNumVectorRegs() is better to use for the number of needed vector registers instead of getNumberOfParts(). This is to make sure that the number of vector registers (and typically operations) required for a vector type is accurate. getNumberOfParts(), which was previously used, works by splitting the vector type until it is legal, and gives incorrect results for types with a non-power-of-two number of elements (rare). A new static function getScalarSizeInBits() also checks for a pointer type and returns 64U for it (since otherwise it gets a value of 0). It is used in a few places where Ty may be a pointer. Review: Ulrich Weigand llvm-svn: 344115	2018-10-10 07:36:27 +00:00
QingShan Zhang	bc1586352e	[PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8 An ISD::SIGN_EXTEND_INREG operation on v2i16 or v2i8 types causes an assert because these types are registered as custom operations. As a result, the type legalization phase enters the custom hook, which does not handle ISD::SIGN_EXTEND_INREG and falls through to an unreachable assert. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52449 llvm-svn: 344109	2018-10-10 02:33:48 +00:00
Thomas Lively	108e98ec32	[WebAssembly] Fix fneg lowering Summary: Subtraction from zero and floating point negation do not have the same semantics, so fix lowering. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52948 llvm-svn: 344107	2018-10-10 01:09:09 +00:00
Heejin Ahn	5d900954bd	[WebAssembly] Improve comments for SIMD instruction definitions llvm-svn: 344106	2018-10-10 01:04:02 +00:00
Thomas Lively	409f5840a7	[WebAssembly] Handle V128 register class in explicit locals pass Summary: Also add tests to catch crashes in passes that are not normally run in tests. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52959 llvm-svn: 344094	2018-10-09 23:33:16 +00:00
Rong Xu	5c7bf1a756	[X86] Fix sanitizer bot failure from 344085 Fix the memory issue exposed by sanitizer. llvm-svn: 344092	2018-10-09 23:10:56 +00:00
Heejin Ahn	d9a6de3c38	[WebAssembly] Improve readability of SIMD instructions (NFC) Summary: - Categorize instructions into the categories as in the SIMD spec - Move SIMD-related definition to WebAssemblyInstrSIMD.td - Put definition and use of patterns together - Add newlines here and there Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53045 llvm-svn: 344086	2018-10-09 22:23:39 +00:00
Rong Xu	3d2efdfdea	Recommit r343993: [X86] condition branches folding for three-way conditional codes Fix the memory issue exposed by sanitizer. llvm-svn: 344085	2018-10-09 22:03:40 +00:00
Nemanja Ivanovic	87873d04c3	[PowerPC] Implement hasBitPreservingFPLogic for types that can be supported This is the PPC-specific non-controversial part of https://reviews.llvm.org/D44548 that simply enables this combine for PPC since PPC has these instructions. This commit will allow the target-independent portion to be truly target independent. llvm-svn: 344077	2018-10-09 20:35:15 +00:00
Craig Topper	f6d8400869	[X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32. This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step. Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast. There are a couple of test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way than just relying on a bitcast to hide it. This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but that's showing some other issues. llvm-svn: 344071	2018-10-09 19:05:50 +00:00
Sanjay Patel	f5fac1826a	[x86] use demanded bits to simplify masked store codegen As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we can expose codegen shortcomings as seen here with masked store. Replace a hard-coded PCMPGT simplification with the more general demanded bits call to improve things. Differential Revision: https://reviews.llvm.org/D52964 llvm-svn: 344048	2018-10-09 14:04:14 +00:00
Simon Atanasyan	d465318c6d	[mips] Set pointer size to 4 bytes for N32 ABI CodePointerSize and CalleeSaveStackSlotSize values are used in DWARF generation. In the case of MIPS it's incorrect to check for Triple::isMIPS64() only, because this function returns true for the N32 ABI too. Currently we do not have a method to recognize N32 if it's specified by a command line option and is not a part of a target triple. So we check for Triple::GNUABIN32 only. It's better than nothing. Differential revision: https://reviews.llvm.org/D52874 llvm-svn: 344039	2018-10-09 11:29:45 +00:00
Nemanja Ivanovic	4c0b110e3e	[PowerPC] Remove self-copies in pre-emit peephole There are occasionally instances where AADB rewrites registers in such a way that a reg-reg copy becomes a self-copy. Such an instruction is obviously redundant and can be removed. This patch does precisely that. Note that this will not remove various nops that we insert (which are themselves just self-copies). The reason those are left alone is that all of them have their own opcodes (that just encode to a self-copy). What prompted this patch is the fact that these self-copies sometimes end up using registers that make the instruction a priority-setting nop, thereby having a significant effect on performance. Differential revision: https://reviews.llvm.org/D52432 llvm-svn: 344036	2018-10-09 10:54:04 +00:00
Simon Pilgrim	720db8ed7b	[X86][AVX1] Enable _EXTEND_VECTOR_INREG lowering of 256-bit vectors As discussed on D52964, this adds 256-bit _EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling. Differential Revision: https://reviews.llvm.org/D52980 llvm-svn: 344019	2018-10-09 07:42:01 +00:00
Petar Jovanovic	aa97890d66	[MIPS GlobalISel] Legalize i64 add Custom legalize s64 G_ADD for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D52652 llvm-svn: 344007	2018-10-08 23:59:37 +00:00
Rong Xu	47fd015163	[X86] Revert r343993 condition branches folding for three-way conditional codes Some buildbots failed. llvm-svn: 343998	2018-10-08 22:08:43 +00:00
Craig Topper	ff9f02580d	[X86] Prefer isTypeLegal over checking isSimple in a DAG combine. Simple types are a superset of the types that any in-tree LLVM target could possibly have as legal. This means the behavior of using isSimple to check for a supported type for X86 could change over time. For example, this code would change if a v256i1 type was added to MVT in the future. llvm-svn: 343995	2018-10-08 20:02:59 +00:00
Rong Xu	67b1b328f7	[X86] condition branches folding for three-way conditional codes This patch implements a pass that optimizes condition branches on x86 by taking advantage of the three-way conditional code generated by compare instructions. Currently, it tries to hoist EQ and NE conditional branches to a dominating conditional branch where the same EQ/NE conditional code is computed. An example: bb_0: cmp %0, 19 jg bb_1 jmp bb_2 bb_1: cmp %0, 40 jg bb_3 jmp bb_4 bb_4: cmp %0, 20 je bb_5 jmp bb_6 Here we could combine the two compares in bb_0 and bb_4 and have the following code: bb_0: cmp %0, 20 jg bb_1 jl bb_2 jmp bb_5 bb_1: cmp %0, 40 jg bb_3 jmp bb_6 For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height for bb_6 is also reduced. bb_4 is gone after the optimization. This optimization is motivated by the branch pattern generated by the switch lowering: we always have a pivot-1 compare for the inner nodes and we do a pivot compare against the leaf (like the above pattern). This pass currently is enabled on Intel's Sandybridge and later arches. Some reviewers pointed out that on some arches (like AMD Jaguar), this pass may increase branch density to the point where it hurts the performance of the branch predictor. Differential Revision: https://reviews.llvm.org/D46662 llvm-svn: 343993	2018-10-08 18:52:39 +00:00
Scott Linder	823549a6ec	[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Recommits r341413 with changes to update the MachineDominatorTree when present. Differential Revision: https://reviews.llvm.org/D51742 llvm-svn: 343992	2018-10-08 18:47:01 +00:00
Simon Pilgrim	6fc8d05565	[X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964. Differential Revision: https://reviews.llvm.org/D52970 llvm-svn: 343991	2018-10-08 18:40:50 +00:00
Sanjay Patel	43bf9917cc	[x86] make horizontal binop matching clearer; NFCI The instructions are complicated, so this code will probably never be very obvious, but hopefully this makes it better. As shown in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...we need to improve the matching to not miss cases where we're h-opping on 1 source vector, and that should be a small patch after this rearranging. llvm-svn: 343989	2018-10-08 18:08:02 +00:00
Tom Stellard	14d8807d9a	AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions Summary: The 32-bit variants do not exist on VI+. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52958 llvm-svn: 343985	2018-10-08 17:49:29 +00:00
Neil Henning	6641657453	[AMDGPU] Add an AMDGPU-specific atomic optimizer. This commit adds a new IR-level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, recording the value to be optimized. - Running through the atomic operations we can optimize and, depending on whether the value is uniform or divergent, using wavefront-wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then letting only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly, recombining the result from the single lane to each lane of the wavefront, and calculating each lane's individual offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 llvm-svn: 343973	2018-10-08 15:49:19 +00:00
Oliver Stannard	367b4741f4	[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled When branch target identification is enabled, we can only do indirect tail-calls through x16 or x17. This means that the outliner can't transform a BLR instruction at the end of an outlined region into a BR. Differential revision: https://reviews.llvm.org/D52869 llvm-svn: 343969	2018-10-08 14:12:08 +00:00
Oliver Stannard	c922116a51	[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI When branch target identification is enabled, all indirectly-callable functions start with a BTI C instruction. This instruction can only be the target of certain indirect branches (direct branches and fall-through are not affected): - A BLR instruction, in either a protected or unprotected page. - A BR instruction in a protected page, using x16 or x17. - A BR instruction in an unprotected page, using any register. Without BTI, we can use any non call-preserved register to hold the address for an indirect tail call. However, when BTI is enabled, the code being compiled might be loaded into a BTI-protected page, where only x16 and x17 can be used for indirect tail calls. Legacy code without this restriction can still indirectly tail-call BTI-protected functions, because it will be loaded into an unprotected page, so any register is allowed. Differential revision: https://reviews.llvm.org/D52868 llvm-svn: 343968	2018-10-08 14:09:15 +00:00
Oliver Stannard	250e5a5b65	[AArch64][v8.5A] Branch Target Identification code-generation pass The Branch Target Identification extension, introduced to AArch64 in Armv8.5-A, adds the BTI instruction, which is used to mark valid targets of indirect branches. When enabled, the processor will trap if an instruction in a protected page tries to perform an indirect branch to any instruction other than a BTI. The BTI instruction uses encodings which were NOPs in earlier versions of the architecture, so BTI-enabled code will still run on earlier hardware, just without the extra protection. There are 3 variants of the BTI instruction, which are valid targets for different kinds of branches: - BTI C can be targeted by call instructions, and is intended to be used at function entry points. These are the BLR instruction, as well as BR with x16 or x17. These BR instructions are allowed for use in PLT entries, and we can also use them to allow indirect tail-calls. - BTI J can be targeted by BR only, and is intended to be used by jump tables. - BTI JC acts as both a BTI C and a BTI J instruction, and can be targeted by any BLR or BR instruction. Note that RET instructions are not restricted by branch target identification; the reason for this is that return addresses can be protected more effectively using return address signing. Direct branches and calls are also unaffected, as it is assumed that an attacker cannot modify executable pages (if they could, they wouldn't need to do a ROP/JOP attack). This patch adds a MachineFunctionPass which: - Adds a BTI C at the start of every function which could be indirectly called (either because it is address-taken, or externally visible so could be address-taken in another translation unit). - Adds a BTI J at the start of every basic block which could be indirectly branched to. This could be either done by a jump table, or by taking the address of the block (e.g. using the GCC labels-as-values extension). 
We only need to use BTI JC when a function is indirectly-callable, and takes the address of the entry block. I've not been able to trigger this from C or IR, but I've included a MIR test just in case. Using BTI C at function entries relies on the fact that no other code in BTI-protected pages uses indirect tail-calls, unless they use x16 or x17 to hold the address. I'll add that code-generation restriction as a separate patch. Differential revision: https://reviews.llvm.org/D52867 llvm-svn: 343967	2018-10-08 14:04:24 +00:00
Alexander Ivchenko	1aedf203dd	[GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM Support G_UDIV/G_UREM/G_SREM. The instruction selection code is taken from FastISel with only minor tweaks to adapt for GlobalISel. Differential Revision: https://reviews.llvm.org/D49781 llvm-svn: 343966	2018-10-08 13:40:34 +00:00
Neil Henning	57f5d0a885	[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with. To fix this I've: - Added an ArrayRef<Type *> member to both CreateIntrinsic overloads. - Used that array to pass into the Intrinsic::getDeclaration call. - Added a CreateUnaryIntrinsic to replace the most common use of CreateIntrinsic where the type was auto-deduced from operand 0. - Added a bunch more unit tests to test CreateIntrinsic calls that weren't being tested (including the FMF flag that wasn't checked). This was suggested as part of the AMDGPU specific atomic optimizer review (https://reviews.llvm.org/D51969). Differential Revision: https://reviews.llvm.org/D52087 llvm-svn: 343962	2018-10-08 10:32:33 +00:00
Peter Smith	6f36cd4d76	[ARM] Account for implicit IT when calculating inline asm size When deciding if it is safe to optimize a conditional branch to a CBZ or CBNZ the offsets of the BasicBlocks from the start of the function are estimated. For inline assembly the generic getInlineAsmLength() function is used to get a worst case estimate of the inline assembly by multiplying the number of instructions by the max instruction size of 4 bytes. This unfortunately doesn't take into account the generation of Thumb implicit IT instructions. In edge cases such as when all the instructions in the block are 4-bytes in size and there is an implicit IT then the size is underestimated. This can cause an out of range CBZ or CBNZ to be generated. The patch takes a conservative approach and assumes that every instruction in the inline assembly block may have an implicit IT. Fixes pr31805 Differential Revision: https://reviews.llvm.org/D52834 llvm-svn: 343960	2018-10-08 09:38:28 +00:00
Oliver Stannard	9ecdac8ee0	[AArch64] Fix verifier error when outlining indirect calls The MachineOutliner for AArch64 transforms indirect calls into indirect tail calls, replacing the call with the TCRETURNri pseudo-instruction. This pseudo lowers to a BR, but has the isCall and isReturn flags set. The problem is that TCRETURNri takes a tcGPR64 as the register argument, to prevent indirect tail-calls from using caller-saved registers. The indirect calls transformed by the outliner could use caller-saved registers. This is fine, because the outliner ensures that the register is available at all call sites. However, this causes a verifier failure when the register is not in tcGPR64. The fix is to add a new pseudo-instruction like TCRETURNri, but which accepts any GPR. Differential revision: https://reviews.llvm.org/D52829 llvm-svn: 343959	2018-10-08 09:18:48 +00:00
Simon Pilgrim	9fa1c66421	[X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion llvm-svn: 343926	2018-10-06 22:13:44 +00:00
Simon Pilgrim	a30e8d23e2	[X86][AVX] Ensure resolveTargetShuffleInputs shuffle masks are the correct width Don't handle ZERO_EXTEND style shuffles until we support bitcasts. Found by inspection. llvm-svn: 343924	2018-10-06 17:18:41 +00:00
Simon Pilgrim	62d199f4e5	[X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself llvm-svn: 343922	2018-10-06 14:51:14 +00:00
Simon Pilgrim	0cc0a24b55	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises. llvm-svn: 343919	2018-10-06 13:49:31 +00:00
Simon Pilgrim	ae78d709b4	[X86] Use the SimplifyDemandedBits wrappers where possible. NFCI. Leave the wrapper to handle TargetLowering::TargetLoweringOpt and CommitTargetLoweringOpt. llvm-svn: 343918	2018-10-06 13:29:08 +00:00
Alex Bradbury	639df9e4c0	[RISCV] Compress addiw rd, x0, simm6 to c.li rd, simm6 A pattern was present for addi rd, x0, simm6 but not addiw which is semantically identical when the source register is x0. This patch addresses that, and the benefit can be seen in rv64c-aliases-valid.s. llvm-svn: 343911	2018-10-06 06:09:46 +00:00
Tom Stellard	251ee083a3	AMDGPU: Consolidate SMRD TableGen patterns Summary: Merge the SMRD patterns for CI into the same multiclass as the patterns for other sub-targets. This removes some duplicate code and will make it easier for some future GlobalISel changes I would like to do. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52557 llvm-svn: 343909	2018-10-06 03:32:43 +00:00
Matthias Braun	81578e9f77	X86, AArch64, ARM: Do not attach debug location to spill/reload instructions This rebases and recommits r343520. hwasan should be fixed now and this shouldn't break the tests anymore. Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343895	2018-10-05 22:00:13 +00:00
Simon Pilgrim	dc97118efe	[X86][AVX] Limit getFauxShuffleMask INSERT_SUBVECTOR support to 2 inputs rL343853 didn't limit the number of subinputs, but we don't currently support faux shuffles with more than 2 total inputs, so put a limiter in place until this is fixed. Found by Artem Dergachev. llvm-svn: 343891	2018-10-05 21:44:19 +00:00
Craig Topper	0ed892da70	[X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits. The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8-bits this isn't an issue. This avoids creating a zero extend that probably won't go away. The movmskb related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8-bits into a 32-bit immediate so it was still bad. llvm-svn: 343871	2018-10-05 18:13:36 +00:00
Simon Pilgrim	f09fc3bc12	[X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957) Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957). This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level. I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place. Differential Revision: https://reviews.llvm.org/D52886 llvm-svn: 343868	2018-10-05 17:57:29 +00:00
Simon Pilgrim	6c5ab48fe7	[X86][AVX] getFauxShuffleMask - add support for INSERT_SUBVECTOR subvector shuffles Decode subvector shuffles from INSERT_SUBVECTOR(SRC0, SHUFFLE(EXTRACT_SUBVECTOR(SRC1))). This was found necessary while investigating PR39161. llvm-svn: 343853	2018-10-05 14:41:00 +00:00
Jonas Paulsson	faad1b3056	[TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints() Finally all targets are enabling multiple regalloc hints, so the hook to disable this can now be removed. NFC. Review: Simon Pilgrim https://reviews.llvm.org/D52316 llvm-svn: 343851	2018-10-05 14:23:11 +00:00
Tom Stellard	7c65078f04	AMDGPU/GlobalISel: Add support for G_INTTOPTR Summary: This is a no-op. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52916 llvm-svn: 343839	2018-10-05 04:34:09 +00:00
Thomas Lively	4b47d08e52	[WebAssembly] Saturating arithmetic intrinsics Summary: Depends on D52805. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52813 llvm-svn: 343833	2018-10-05 00:45:20 +00:00
Yury Delendik	409b439152	[WebAssembly] Ignore DBG_VALUE in WebAssemblyCFGStackify pass when looking for block start Summary: Fixes https://bugs.llvm.org/show_bug.cgi?id=39158 and a regression caused by D49034. It is possible, though, that the problem existed before and was exposed by the additional DBG_VALUEs. Reviewers: sunfish, dschuff, aheejin Reviewed By: aheejin Subscribers: sbc100, aheejin, llvm-commits, alexcrichton, jgravelle-google Differential Revision: https://reviews.llvm.org/D52837 llvm-svn: 343827	2018-10-04 23:31:00 +00:00
Ana Pazos	9d6c55323f	[RISCV] Support named operands for CSR instructions. Reviewers: asb, mgrang Reviewed By: asb Subscribers: jocewei, mgorny, jfb, PkmX, MartinMosbeck, brucehoult, the_o, rkruppe, rogfer01, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones Differential Revision: https://reviews.llvm.org/D46759 llvm-svn: 343822	2018-10-04 21:50:54 +00:00
Craig Topper	7d2155e3f9	[X86][LegalizeVectorOps] Use MERGE_VALUES to return two results from LowerLoad. Remove special case code in LegalizeVectorOps that allowed us to only return one result. Previously we replaced the chain use ourselves and returned the data result. LegalizeVectorOps then detected that we'd done this and assumed the chain had already been handled. This commit instead returns a MERGE_VALUES node with two results joined from nodes. This allows LegalizeVectorOps to do all the replacements for us without any special casing. The MERGE_VALUES will be removed by DAG combine. llvm-svn: 343817	2018-10-04 21:24:24 +00:00
Heejin Ahn	b68d591475	[WebAssembly] Don't modify preds/succs iterators while erasing from them Summary: This caused out-of-bound bugs. Found by `-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52902 llvm-svn: 343814	2018-10-04 21:03:35 +00:00
Konstantin Zhuravlyov	aa067cb9fb	AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa The isAmdCodeObjectV2 is a misleading name which actually checks whether the os is amdhsa or mesa. Also add a test to make sure we do not generate old kernel header for code object v3. Differential Revision: https://reviews.llvm.org/D52897 llvm-svn: 343813	2018-10-04 21:02:16 +00:00
Martin Storsjo	37b742e208	[COFF] [X86] Don't use llvm_unreachable for unsupported relocation types This can happen if assembling a reference to _GLOBAL_OFFSET_TABLE_. While it doesn't make sense to try to assemble that for COFF, the fact that we previously used llvm_unreachable meant that the code had undefined behaviour if something tried to assemble that. The configure script of libgmp would try to assemble such a snippet (which should signal a failure). If llvm is built without assertions, the undefined behaviour meant a (near) infinite loop. Differential Revision: https://reviews.llvm.org/D52903 llvm-svn: 343811	2018-10-04 20:43:38 +00:00
Matthias Braun	0c67a4e958	AArch64: Fix XSeqPairs/WSeqPairs problems - Fix spill/reloads of XSeqPairs failing with vregs (only physregs worked correctly) - Add missing spill/reload code for WSeqPairs class Differential Revision: https://reviews.llvm.org/D52761 llvm-svn: 343799	2018-10-04 17:02:53 +00:00
Farhana Aleen	4bc597bff5	[AMDGPU] Match signed dot4/8 pattern. Summary: This patch matches signed dot4 and dot8 pattern. Author: FarhanaAleen Reviewed By: msearles Differential Revision: https://reviews.llvm.org/D52520 llvm-svn: 343798	2018-10-04 16:57:37 +00:00
Alex Bradbury	5bf3b20e99	[RISCV] Remove overzealous is64Bit checks lowerGlobalAddress, lowerBlockAddress, and insertIndirectBranch contain overzealous checks for is64Bit. These functions are all safe as-implemented for RV64. llvm-svn: 343781	2018-10-04 14:30:03 +00:00
David Greene	4f916df29e	[X86] Set correct MMO offset on scalarized load pieces When scalarizing a load, be sure to update the offset in the MachineMemOperand for each scalar load. llvm-svn: 343776	2018-10-04 14:07:59 +00:00
Simon Pilgrim	991b0d24ff	Fix MSVC "not all control paths return a value" warning. NFCI. llvm-svn: 343765	2018-10-04 10:25:52 +00:00
Alex Bradbury	e96b7c88a3	[RISCV] Bugfix for floats passed on the stack with the ILP32 ABI on RV32F f32 values passed on the stack would previously cause an assertion in unpackFromMemLoc. This would only trigger in the presence of the F extension making f32 a legal type. Otherwise the f32 would be legalized. This patch fixes that by keeping LocVT=f32 when a float is passed on the stack. It also adds test coverage for this case, and tests that also demonstrate lw/sw/flw/fsw will be selected when most profitable, i.e. there is no unnecessary i32<->f32 conversion in registers. llvm-svn: 343756	2018-10-04 07:28:49 +00:00
Craig Topper	8b3c46f0a8	[X86] Merge matchANDXORWithAllOnesAsANDNP into combineANDXORWithAllOnesIntoANDNP. NFCI It's the only caller and the logic is pretty easy to combine. llvm-svn: 343754	2018-10-04 06:13:27 +00:00
Alex Bradbury	0e16766b76	[RISCV][NFC] Fix naming of RISCVISelLowering::{LowerRETURNADDR,LowerFRAMEADDR} Rename to lowerRETURNADDR, lowerFRAMEADDR in order to be consistent with the LLVM coding style and the other functions in this file. llvm-svn: 343752	2018-10-04 05:27:50 +00:00
Alex Bradbury	5ac0a2fc48	[RISCV] Handle redundant SplitF64+BuildPairF64 pairs in a DAGCombine r343712 performed this optimisation during instruction selection. As Eli Friedman pointed out in post-commit review, implementing this as a DAGCombine might allow opportunities for further optimisations. llvm-svn: 343741	2018-10-03 23:30:16 +00:00
Thomas Lively	5d461c96bd	[WebAssembly] Bitselect intrinsic and instruction Summary: Depends on D52755. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52805 llvm-svn: 343739	2018-10-03 23:02:23 +00:00
Alex Bradbury	1dbfdeb6e5	[RISCV][NFC] Refactor LocVT<->ValVT conversion in RISCVISelLowering There was some duplicated logic for using the LocInfo of a CCValAssign in order to convert from the ValVT to LocVT or vice versa. Resolve this by factoring out convertLocVTFromValVT from unpackFromRegLoc. Also rename packIntoRegLoc to the more appropriate convertValVTToLocVT and call these helper functions consistently. llvm-svn: 343737	2018-10-03 22:53:25 +00:00
Derek Schuff	77a7a38006	[WebAssembly] Refactor WasmSignature and use it for MCSymbolWasm MCContext does not destroy MCSymbols on shutdown. So, rather than putting SmallVectors (which may heap-allocate) inside MCSymbolWasm, use an unowned pointer to a WasmSignature instead. The signatures are now owned by the AsmPrinter. Also uses WasmSignature instead of param and result vectors in TargetStreamer, and leaves some TODOs for further simplification. Differential Revision: https://reviews.llvm.org/D52580 llvm-svn: 343733	2018-10-03 22:22:48 +00:00
Craig Topper	a65c2dbfd6	[X86] Stop promoting vector ISD::SELECT to vXi64. The additional patterns needed for this aren't overwhelming and introducing extra bitcasts during lowering limits our ability to do computeNumSignBits. Not that I have a good example of that for select. I'm just becoming increasingly grumpy about promotion of AND/OR/XOR. SELECT was just a lot easier to fix. llvm-svn: 343723	2018-10-03 21:10:29 +00:00
Craig Topper	c39dc41b63	[X86] Add CMOV_VK2/VK4 pseudos and remove lowering code that turned v2i1/v4i1 SELECT into v8i1. llvm-svn: 343713	2018-10-03 20:28:43 +00:00
Alex Bradbury	ce9049952f	[RISCV][NFCI] Handle redundant splitf64+buildpairf64 pairs during instruction selection Although we can't write a tablegen pattern to remove redundant splitf64+buildf64 pairs due to the multiple return values, we can handle it with some C++ selection code. This is simpler than removing them after instruction selection through RISCVDAGToDAGISel::PostprocessISelDAG, as was done previously. llvm-svn: 343712	2018-10-03 20:12:10 +00:00
Craig Topper	703fbde3cb	[X86] Add CMOV pseudos for VR128X and VR256X register classes. Use them when AVX512VL is enabled. This allows the phi nodes to be generated with the correct register class when expanded. llvm-svn: 343710	2018-10-03 19:48:26 +00:00
Craig Topper	4b62c2dbda	[X86] Don't break CMOV pseudo instructions down by type. Just by register class. The register class is all that's important for the pseudo instructions. We can use patterns to handle the different types. llvm-svn: 343709	2018-10-03 19:48:23 +00:00
Simon Pilgrim	aabd99c27a	[X86] PUSH/POP 'mem-mem' instructions are not RMW - these are 2 different addresses This patch adds a 'WriteCopy' [WriteLoad, WriteStore] schedule sequence instead to better model the behaviour Found by @andreadb during llvm-mca testing on btver2 which was crashing on "zero uop" WriteRMW only instructions llvm-svn: 343708	2018-10-03 19:02:38 +00:00
Simon Pilgrim	b80d27a916	[X86] Move Atomic binops to use WriteALURMW schedule class These were being tagged as <WriteALULd, WriteRMW> instead of properly using the RMW sequence llvm-svn: 343705	2018-10-03 18:38:28 +00:00
Simon Pilgrim	0b451a2983	[X86][Btver2] Fix MMX PSHUFB schedule Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343701	2018-10-03 18:18:50 +00:00
Simon Pilgrim	a400612aed	[X86] Move Atomic CMPXCHG to WriteCMPXCHGRMW schedule class llvm-svn: 343700	2018-10-03 18:05:01 +00:00
Simon Pilgrim	2c59475c06	[X86] Add SkylakeClient uops counter - same as the other Intel models. llvm-svn: 343697	2018-10-03 16:45:26 +00:00
Nirav Dave	925b64be64	[X86] Correctly use SSE registers if no-x87 is selected. Fix use of SSE1 registers for f32 ops in no-x87 mode. Notably, allow use of SSE instructions for f32 operations in 64-bit mode (but not 32-bit which is disallowed by calling convention). Also avoid translating memset/memcopy/memmove into SSE registers without X87 for 32-bit mode. This fixes PR38738. Reviewers: nickdesaulniers, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52555 llvm-svn: 343689	2018-10-03 14:13:30 +00:00
Alex Bradbury	d33ffe9bb1	[RISCV][NFC] Refactor RISCVDAGToDAGISel::Select Introduce and use a switch on the opcode. llvm-svn: 343688	2018-10-03 13:13:13 +00:00
Alex Bradbury	d934032e48	[RISCV] Gate float<->int and double<->int conversion patterns on IsRV32 The patterns as defined are correct only when XLen==32. This is another preparatory patch for a set of patches that flesh out RV64 codegen. llvm-svn: 343679	2018-10-03 11:35:22 +00:00

49446 Commits