llvm-project

Commit Graph

Author	SHA1	Message	Date
Sander de Smalen	7f23e0a62f	Enforce StackID definition in PEI There are various places in LLVM where the definition of StackID is not properly honoured, for example in PEI where objects with a StackID > 0 are allocated on the default stack (StackID0). This patch enforces that PEI only considers allocating objects to StackID 0. Reviewers: arsenm, thegameg, MatzeB Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60062 llvm-svn: 357460	2019-04-02 09:46:52 +00:00
Craig Topper	c5903c935c	[X86] Use unsigned type for opcodes throughout X86FixupLEAs. All of the interfaces related to opcode in MachineInstr and MCInstrInfo refer to opcodes as unsigned. llvm-svn: 357444	2019-04-02 00:50:58 +00:00
Eli Friedman	3813fe0bda	[ARM] Optimize expressions like "return x != 0;" for Thumb1. There's an existing optimization for x != C, but somehow it was missing a special case for 0. While I'm here, also cleaned up the code/comments a bit: the second value produced by the MERGE_VALUES was actually dead, since a CMOV only produces one result. Differential Revision: https://reviews.llvm.org/D59616 llvm-svn: 357437	2019-04-02 00:01:23 +00:00
Eli Friedman	73af6ef2e7	[ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz. It's a little tricky to make this issue show up because prologue/epilogue emission normally likes to push at least two registers... but it doesn't when lr is force-spilled due to function length. Not sure if that really makes sense, but I decided not to touch it for now. Differential Revision: https://reviews.llvm.org/D59385 llvm-svn: 357436	2019-04-01 23:55:57 +00:00
Jessica Paquette	e44c20a68d	[AArch64][GlobalISel] Select STRQui for stores into v2s64s instead of scalarizing This improves selection for vector stores into v2s64s. Before we just scalarized them, but we can just use a STRQui instead. Differential Revision: https://reviews.llvm.org/D60083 llvm-svn: 357432	2019-04-01 22:19:13 +00:00
Craig Topper	4307172b84	[X86] Classify the AVX512 rounding control operand as X86::OPERAND_ROUNDING_CONTROL instead of MCOI::OPERAND_IMMEDIATE. Add an assert on legal values of rounding control in the encoder and remove an explicit mask. This should allow llvm-exegesis to intelligently constrain the rounding mode. The mask in the encoder shouldn't be necessary any more. We used to allow codegen to use 8-11 for rounding mode and the assembler would use 0-3 to mean the same thing so we masked here and in the printer. Codegen now matches the assembler and the printer was updated, but I forgot to update the encoder. llvm-svn: 357419	2019-04-01 19:08:15 +00:00
Bixia Zheng	6c21ccd245	[NVPTX] Fix the codegen for llvm.round. Summary: Previously, we translated llvm.round to PTX cvt.rni, which rounds to the even integer when the source is equidistant between two integers. This is not correct as llvm.round should round away from zero. This change replaces llvm.round with a round-away-from-zero implementation through target-specific custom lowering. Modify a few affected tests to not check for cvt.rni. Instead, we check for the use of a few constants used in implementing round. We are also adding CUDA runnable tests to check for the values produced by llvm.round to test-suites/External/CUDA. Reviewers: tra Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59947 llvm-svn: 357407	2019-04-01 16:10:26 +00:00
Neil Henning	0a30f33ce2	[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400	2019-04-01 15:19:52 +00:00
David Spickett	3d233d5d4d	[AArch64] Add v8.5-a Memory Tagging STZGM instruction This instruction writes a block of allocation tags and stores zero to the associated data locations. It differs from STGM by 1 bit and has the same arguments. The specification can be found here: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60065 llvm-svn: 357397	2019-04-01 14:56:37 +00:00
Alex Bradbury	44668ae7c7	[RISCV] Attach VK_RISCV_CALL to symbols upon creation This patch replaces the addition of VK_RISCV_CALL in RISCVMCCodeEmitter by creating the RISCVMCExpr when tail/call are parsed, or in the codegen case when the callee symbols are created. This required adding a new CallSymbol operand to allow only adding VK_RISCV_CALL to tail/call instructions. This patch will allow further expansion of parsing and codegen to easily include PLT symbols which must generate the R_RISCV_CALL_PLT relocation. Differential Revision: https://reviews.llvm.org/D55560 Patch by Lewis Revill. llvm-svn: 357396	2019-04-01 14:53:17 +00:00
David Spickett	9142b8ef1b	[AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions The STGV/LDGV instructions were replaced with STGM/LDGM. The encodings remain the same but there is no longer writeback so there are no unpredictable encodings to check for. The specification can be found here: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60064 llvm-svn: 357395	2019-04-01 14:52:18 +00:00
Alex Bradbury	da20f5ca74	[RISCV] Generate address sequences suitable for mcmodel=medium This patch adds an implementation of a PC-relative addressing sequence to be used when -mcmodel=medium is specified. With absolute addressing, a 'medium' codemodel may cause addresses to be out of range. This is because while 'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as opposed to 'small', which implies the first 2 GiB only. Note that LLVM/Clang currently specifies code models differently to GCC, where small and medium imply the same functionality as GCC's medlow and medany respectively. Differential Revision: https://reviews.llvm.org/D54143 Patch by Lewis Revill. llvm-svn: 357393	2019-04-01 14:42:56 +00:00
David Spickett	efe376add6	[AArch64] Add v8.5-a Memory Tagging GMID_EL1 register The latest version of the MTE spec added a system register 'GMID_EL1'. It contains the block size used by the LDGM and STGM instructions and is read only. The specification can be found here: https://developer.arm.com/docs/ddi0596/c llvm-svn: 357392	2019-04-01 14:41:14 +00:00
Matt Arsenault	ebf90db084	X86: Fix override warning llvm-svn: 357388	2019-04-01 14:08:26 +00:00
Clement Courbet	7e062c9b1f	[X86] Make post-ra scheduling macrofusion-aware. Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59688 llvm-svn: 357384	2019-04-01 13:48:50 +00:00
Luis Marques	3091884e25	[RISCV] Add seto pattern expansion Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and `fcmp ord` would be inefficient due to an unoptimized double negation. Differential Revision: https://reviews.llvm.org/D59699 llvm-svn: 357378	2019-04-01 09:54:14 +00:00
Craig Topper	2e1bf89e3a	[X86] Use ISD::INTRINSIC_VOID in getTgtMemIntrinsic for truncating stores and scatter intrinsics. This is the appropriate opcode for only having a chain output. Though I'm not sure it matters much. llvm-svn: 357375	2019-04-01 05:26:12 +00:00
Alex Bradbury	ca81a56f65	[RISCV] Don't evaluatePCRelLo if a relocation will be forced (e.g. due to linker relaxation) A pcrel_lo will point to the associated pcrel_hi fixup which in turn points to the real target. RISCVMCExpr::evaluatePCRelLo will work around this indirection in order to allow the fixup to be evaluated properly. However, if relocations are forced (e.g. because linker relaxation is enabled) then its evaluation is undesired and will result in a relocation with the wrong target. This patch modifies evaluatePCRelLo so it will not try to evaluate if the fixup will be forced as a relocation. A new helper method is added to RISCVAsmBackend to query this. Differential Revision: https://reviews.llvm.org/D59686 llvm-svn: 357374	2019-04-01 02:38:27 +00:00
Sanjay Patel	e1bc360fc6	[x86] allow movmsk with 2-element reductions One motivation for making this change is that the lack of using movmsk is likely a main source of perf difference between clang and gcc on the C-Ray benchmark as shown here: https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5 ...but this change alone isn't enough to solve that problem. The 'all-of' examples show what is likely the worst case trade-off: we end up with an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of' examples look clearly better using movmsk because we've traded 2 vector instructions for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'. If we examine the llvm-mca output for these cases, it appears that even though the 'all-of' movmsk variant looks worse on paper, it would perform better on both Haswell and Jaguar. $ llvm-mca -mcpu=haswell no_movmsk.s -timeline Iterations: 100 Instructions: 400 Total Cycles: 504 Total uOps: 400 Dispatch Width: 4 uOps Per Cycle: 0.79 IPC: 0.79 Block RThroughput: 1.0 $ llvm-mca -mcpu=haswell movmsk.s -timeline Iterations: 100 Instructions: 600 Total Cycles: 358 Total uOps: 600 Dispatch Width: 4 uOps Per Cycle: 1.68 IPC: 1.68 Block RThroughput: 1.5 $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline Iterations: 100 Instructions: 400 Total Cycles: 407 Total uOps: 400 Dispatch Width: 2 uOps Per Cycle: 0.98 IPC: 0.98 Block RThroughput: 2.0 $ llvm-mca -mcpu=btver2 movmsk.s -timeline Iterations: 100 Instructions: 600 Total Cycles: 311 Total uOps: 600 Dispatch Width: 2 uOps Per Cycle: 1.93 IPC: 1.93 Block RThroughput: 3.0 Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if that's true, then we're also almost certainly making the wrong transform already for reductions with >2 elements, so that should be fixed independently. Differential Revision: https://reviews.llvm.org/D59997 llvm-svn: 357367	2019-03-31 15:11:34 +00:00
Liang Zou	9f4a4d3974	fix typo: "\t" => " " Reviewers: llvm.org, Jim Reviewed By: Jim Subscribers: arsenm, jvesely, nhaehnle, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59983 llvm-svn: 357365	2019-03-31 14:49:00 +00:00
Craig Topper	e4a0fc7d75	[X86] Teach isel for RMW binops to handle negate Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A) Differential Revision: https://reviews.llvm.org/D60007 llvm-svn: 357353	2019-03-30 18:59:17 +00:00
Alex Bradbury	0b2803ee65	[RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs This patch adds support for the RISC-V hard float ABIs, building on top of rL355771, which added basic target-abi parsing and MC layer support. It also builds on some re-organisations and expansion of the upstream ABI and calling convention tests which were recently committed directly upstream. A number of aspects of the RISC-V hard float ABIs require frontend support (e.g. flattening of structs and passing int+fp for fp+fp structs in a pair of registers), and will be addressed in a Clang patch. As can be seen from the tests, it would be worthwhile extending RISCVMergeBaseOffsets to handle constant pool as well as global accesses. Differential Revision: https://reviews.llvm.org/D59357 llvm-svn: 357352	2019-03-30 17:59:30 +00:00
Simon Pilgrim	10c9032c02	[X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316) Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits. llvm-svn: 357351	2019-03-30 17:12:29 +00:00
Simon Pilgrim	3293455595	[X86][SSE] detectAVGPattern - begin generalizing ADD matches Move the ADD matching into a helper - first NFC stage towards supporting 'ADD like' cases such as in PR41316 llvm-svn: 357349	2019-03-30 15:31:53 +00:00
Heejin Ahn	c4ac74fb49	[WebAssembly] Fix unwind destination mismatches in CFG stackify Summary: Linearizing the control flow by placing `try`/`end_try` markers can create mismatches in unwind destinations. This patch resolves these mismatches by wrapping those instructions with an incorrect unwind destination with a nested `try`/`catch`/`end_try` and branching to the right destination within the new catch block. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D48345 llvm-svn: 357343	2019-03-30 11:04:48 +00:00
Heejin Ahn	e9fd9073e4	[WebAssembly] Run ExplicitLocals pass after CFGStackify Summary: While this does not change any final output, this will greatly simplify fixing unwind destination mismatches in CFGStackify (D48345), because we have to create some new registers there. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59652 llvm-svn: 357342	2019-03-30 09:29:57 +00:00
Alex Bradbury	9681b01c21	[RISCV] Add DAGCombine for (SplitF64 (ConstantFP x)) The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32 (necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP, this will result in a FP load from the constant pool followed by a store to the stack and two integer loads from the stack (necessary as there is no way to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper to just materialise integers for the lo and hi parts of the FP constant, so do that instead. llvm-svn: 357341	2019-03-30 09:15:47 +00:00
Heejin Ahn	7e7aad1510	[WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG Summary: Currently we create a routing block to the dispatch block for every predecessor of every entry. So the total number of routing blocks created will be (# of preds) * (# of entries). But we don't need to do this: we need at most 2 routing blocks per loop entry, one for when the predecessor is inside the loop and one for when it is outside the loop. (We can't merge these into one because this would create another loop cycle between blocks inside and blocks outside.) This patch fixes this and creates at most 2 routing blocks per entry. This also renames variable `Split` to `Routing`, which I think is a bit clearer. Reviewers: kripken Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59462 llvm-svn: 357337	2019-03-30 01:31:11 +00:00
Thomas Lively	5f0c4c67bb	[WebAssembly] Add mutable globals feature Summary: This feature is not actually used for anything in the WebAssembly backend, but adding it allows users to get it into the target features sections of their objects, which makes these objects future-compatible. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60013 llvm-svn: 357321	2019-03-29 22:00:18 +00:00
Jessica Paquette	d3ffd47df9	[GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s This adds support for v2s32 vector inserts, and updates the selection + regbankselect tests for G_INSERT_VECTOR_ELT. Differential Revision: https://reviews.llvm.org/D59910 llvm-svn: 357318	2019-03-29 21:39:36 +00:00
Amara Emerson	d413f41de6	[X86] When using Win64 ABI, exit with error if SSE is disabled for varargs We need XMM registers to handle varargs with the Win64 ABI. Before we would silently generate bad code resulting in an assertion failure elsewhere in the backend. llvm-svn: 357317	2019-03-29 21:30:51 +00:00
Heejin Ahn	67f74aceab	[WebAssembly] Handle END_LOOP in unreachable BB in CFGStackify Summary: This fixes crashes when a BB in which an END_LOOP is to be placed is unreachable and does not have any predecessors. Fixes PR41307. Reviewers: dschuff Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60004 llvm-svn: 357303	2019-03-29 19:36:51 +00:00
Matt Arsenault	055e4dce45	AMDGPU: Remove dx10-clamp from subtarget features Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without defining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302	2019-03-29 19:14:54 +00:00
Craig Topper	4ccb3b96b6	[X86] Use cached OptForSize in X86ISelDAGToDAG.cpp instead of pulling it from the function attribute. NFCI llvm-svn: 357297	2019-03-29 18:36:40 +00:00
Evandro Menezes	0f797b8732	[CodeGen] Refactor the option for the maximum jump table size Refactor the option `max-jump-table-size` to default to the maximum representable number. Essentially, NFC. llvm-svn: 357280	2019-03-29 17:28:11 +00:00
Simon Atanasyan	f26f56d6d3	[mips] Fix lowering a signed immediate for *.d MSA instructions The `lowerMSASplatImm` function zero-extends `i32` immediates while building the constant. If the target type is `i64`, a negative immediate loses its sign. As a result, for example, `__builtin_msa_ldi_d(-1)` is lowered to a series of instructions that load the incorrect value 0xffffffff into the `$w0` register instead of a single `ldi.d $w0, -1` instruction. The fix zero-extends unsigned immediates and sign-extends signed immediates. Differential Revision: http://reviews.llvm.org/D59884 llvm-svn: 357264	2019-03-29 15:15:22 +00:00
Dmitry Preobrazhensky	d6827ce3a3	[AMDGPU][MC] Corrected conversion rules for inlinable constants to match rules for literals See bug 40806: https://bugs.llvm.org/show_bug.cgi?id=40806 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59786 llvm-svn: 357262	2019-03-29 14:50:20 +00:00
Dmitry Preobrazhensky	7f33574be3	[AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59878 llvm-svn: 357249	2019-03-29 12:16:04 +00:00
Konstantin Zhuravlyov	2b766ed774	AMDGPU: Make sram-ecc off by default for Vega20 Differential Revision: https://reviews.llvm.org/D59718 llvm-svn: 357247	2019-03-29 12:04:18 +00:00
Simon Pilgrim	aeaf7fcdde	[X86] Add X86TargetLowering::isCommutativeBinOp override. We currently just have test coverage for PMULUDQ - will add more in the future. llvm-svn: 357244	2019-03-29 11:25:58 +00:00
Kang Zhang	05f78b35ae	[PowerPC] Add the support for __builtin_setrnd() Summary: PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode. double __builtin_setrnd(int mode); The effective values for mode are: 0 - round to nearest; 1 - round to zero; 2 - round to +infinity; 3 - round to -infinity. Note that the mode argument is taken modulo 4, so if the int argument is greater than 3, only the least significant two bits of the mode are used. For example, __builtin_setrnd(102) is equivalent to __builtin_setrnd(2). Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D59405 llvm-svn: 357241	2019-03-29 08:45:24 +00:00
Clement Courbet	b70355f0b4	[ScheduleDAG] Move `Topo` and `addEdge` to base class. Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`. There is nothing actually specific to `ScheduleDAGMI` in `Topo`. llvm-svn: 357239	2019-03-29 08:33:05 +00:00
Matt Arsenault	5fddf09187	AMDGPU/GlobalISel: Insert waterfall loop for vector indexing The register index can only really be an SGPR. Lie that a VGPR index is legal, and then rewrite the instruction in a waterfall loop to handle the index. llvm-svn: 357235	2019-03-29 03:54:56 +00:00
Zi Xuan Wu	1445b77e8c	[PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place A shift and add/sub sequence is faster than a multiply by a constant. Because the cycle count or latency of a multiply is not huge, we only consider the following worthwhile patterns. ``` (mul x, 2^N + 1) => (add (shl x, N), x) (mul x, -(2^N + 1)) => -(add (shl x, N), x) (mul x, 2^N - 1) => (sub (shl x, N), x) (mul x, -(2^N - 1)) => (sub x, (shl x, N)) ``` The cycle count or latency is subtarget-dependent, so we need to consider the subtarget to decide whether to do such a transformation. The data type is also considered, because the cycle count or latency of a multiply differs between types. Differential Revision: https://reviews.llvm.org/D58950 llvm-svn: 357233	2019-03-29 03:08:39 +00:00
Thomas Lively	3f34e1b883	[WebAssembly] Merge used feature sets, update atomics linkage policy Summary: It does not currently make sense to use WebAssembly features in some functions but not others, so this CL adds an IR pass that takes the union of all used feature sets and applies it to each function in the module. This allows us to prevent atomics from being lowered away if some function has opted in to using them. When atomics is not enabled anywhere, we detect whether there exists any atomic operations or thread local storage that would be stripped and disallow linking with objects that contain atomics if and only if atomics or tls are stripped. When atomics is enabled, mark it as used but do not require it of other objects in the link. These changes allow libraries that do not use atomics to be built once and linked into both single-threaded and multithreaded binaries. Reviewers: aheejin, sbc100, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59625 llvm-svn: 357226	2019-03-29 00:14:01 +00:00
Yonghong Song	360a4e2ca6	[BPF] add proper multi-dimensional array support For multi-dimensional array like below int a[2][3]; the previous implementation generates BTF_KIND_ARRAY type like below: . element_type: int . index_type: unsigned int . number of elements: 6 This is not the best way to represent arrays, esp., when converting BTF back to headers and users will see int a[6]; instead. This patch generates proper support for multi-dimensional arrays. For "int a[2][3]", the two BTF_KIND_ARRAY types will be generated: Type #n: . element_type: int . index_type: unsigned int . number of elements: 3 Type #(n+1): . element_type: #n . index_type: unsigned int . number of elements: 2 The linux kernel already supports such a multi-dimensional array representation properly. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59943 llvm-svn: 357215	2019-03-28 21:59:49 +00:00
Craig Topper	c25c9b4d16	[X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate For 64-bit operations we should consider if the immediate can be made to fit in an unsigned 32-bit immediate. For OR/XOR this allows us to load the immediate with MOV32ri instead of movabsq. For AND this allows us to fold the immediate. Differential Revision: https://reviews.llvm.org/D59867 llvm-svn: 357196	2019-03-28 18:05:37 +00:00
Reid Kleckner	85e2cdac73	Delay initialization of three static global maps, NFC This avoids allocating a few KB of heap memory on startup, and instead allocates these maps lazily. I noticed this while profiling LLD. llvm-svn: 357192	2019-03-28 17:33:41 +00:00
Petar Avramovic	1af05df3de	[MIPS GlobalISel] Select float constants Select 32 and 64 bit float constants for MIPS32. Differential Revision: https://reviews.llvm.org/D59933 llvm-svn: 357183	2019-03-28 16:58:12 +00:00
Sanjay Patel	5bbf6f0bd8	[x86] avoid cmov in movmsk reduction This is probably the least important of our movmsk problems, but I'm starting at the bottom to reduce distractions. We were creating a select_cc which bypasses the select and bitmask codegen optimizations that we have now. If we produce a compare+negate instead, we allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov. There's no partial register update danger in these sequences because we always produce the zero-register xor ahead of the 'set' if needed. There seems to be a missing fold for sext of a bool bit here: negl %ecx movslq %ecx, %rax ...but that's an independent transform. Differential Revision: https://reviews.llvm.org/D59818 llvm-svn: 357172	2019-03-28 14:16:13 +00:00
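
The NVPTX fix above (6c21ccd245) turns on the difference between rounding ties to the even integer, which is what the commit message says `cvt.rni` does, and rounding ties away from zero, which is what `llvm.round` requires. The following is a minimal Python sketch of that difference only. The helper name `round_half_away` is hypothetical, and this is not the custom lowering used in the patch. Python's built-in `round` happens to use ties-to-even, so it stands in for the old behavior here.

```python
import math

def round_half_away(x: float) -> float:
    """Round a finite float to the nearest integer, ties away from zero.

    Hypothetical helper for illustration; not the NVPTX lowering itself.
    """
    t = math.trunc(x)            # integer part, truncated toward zero
    if abs(x - t) >= 0.5:        # the fractional part is exact in binary FP
        t += math.copysign(1, x) # step one unit away from zero
    return float(t)

# Compare ties-to-even (built-in round) with ties-away-from-zero.
for v in (0.5, 1.5, 2.5, -0.5, -2.5):
    print(v, round(v), round_half_away(v))
```

The two functions disagree only on ties whose nearest even integer is the one closer to zero, such as 0.5 and 2.5.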

51455 Commits
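
The four rewrite patterns listed in the PowerPC strength-reduction commit (1445b77e8c) can be checked numerically. The sketch below is an illustration under stated assumptions, not the DAG combine from the patch: the function name `mul_by_shift` and the string tags for the four patterns are hypothetical, and Python integers are unbounded, whereas hardware arithmetic wraps modulo the register width (the identities also hold under that wrap-around).

```python
def mul_by_shift(x: int, n: int, kind: str) -> int:
    """Compute x * C using one shift plus one add/sub (hypothetical helper).

    kind selects the shape of the constant C:
      "2^N+1", "-(2^N+1)", "2^N-1", "-(2^N-1)"
    """
    shl = x << n                 # (shl x, N)
    if kind == "2^N+1":
        return shl + x           # (add (shl x, N), x)
    if kind == "-(2^N+1)":
        return -(shl + x)        # -(add (shl x, N), x)
    if kind == "2^N-1":
        return shl - x           # (sub (shl x, N), x)
    if kind == "-(2^N-1)":
        return x - shl           # (sub x, (shl x, N))
    raise ValueError(f"unknown pattern: {kind}")

# Check each pattern against a plain multiply over a small range.
for n in range(1, 8):
    for x in range(-50, 51):
        assert mul_by_shift(x, n, "2^N+1") == x * (2**n + 1)
        assert mul_by_shift(x, n, "-(2^N+1)") == x * -(2**n + 1)
        assert mul_by_shift(x, n, "2^N-1") == x * (2**n - 1)
        assert mul_by_shift(x, n, "-(2^N-1)") == x * -(2**n - 1)
```

Whether the shifted form is actually cheaper depends on the subtarget and data type, as the commit message notes; the sketch only shows that the rewrites preserve the result.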