llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Parker	a059efa885	[ARM] Remove ARMComputeBlockSize Forgot to remove file! llvm-svn: 363532	2019-06-17 09:13:10 +00:00
Sam Parker	f7c0b3aeb2	[ARM] Add ARMBasicBlockInfo.cpp Forgot to add file! llvm-svn: 363531	2019-06-17 09:05:43 +00:00
Sam Parker	966f4e874e	[ARM] Extract some code from ARMConstantIslandPass Create the ARMBasicBlockUtils class for tracking and querying basic blocks sizes so we can use them when generating low-overhead loops. Differential Revision: https://reviews.llvm.org/D63265 llvm-svn: 363530	2019-06-17 08:49:09 +00:00
Hans Wennborg	a9e5d2f35d	Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" Third time's the charm. This was reverted in r363220 due to being suspected of an internal benchmark regression and a test failure, none of which turned out to be caused by this. llvm-svn: 363529	2019-06-17 07:47:28 +00:00
Yevgeny Rouban	ee62c40eae	[SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata if unreachable switch cases are removed. This patch fixes this bug by making use of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122). A new test is created. Differential Revision: https://reviews.llvm.org/D62186 llvm-svn: 363527	2019-06-17 05:55:12 +00:00
Justin Hibbits	1d1cf30b73	PowerPC: Optimize SPE double parameter calling setup Summary: SPE passes doubles the same as soft-float, in register pairs as i32 types. This is all handled by the target-independent layer. However, this is not optimal when splitting or reforming the doubles, as it pushes to the stack and loads from, on either side. For instance, to pass a double argument to a function, assuming the double value is in r5, the sequence currently looks like this: evstdd 5, X(1) lwz 3, X(1) lwz 4, X+4(1) Likewise, to form a double into r5 from args in r3 and r4: stw 3, X(1) stw 4, X+4(1) evldd 5, X(1) This optimizes the fence to use SPE instructions. Now, to pass a double to a function: mr 4, 5 evmergehi 3, 5, 5 And to form a double into r5 from args in r3 and r4: evmergelo 5, 3, 4 This is comparable to the way that gcc generates the double splits. This also fixes a bug with expanding builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54583 llvm-svn: 363526	2019-06-17 03:15:23 +00:00
Seiya Nuta	4f15732067	[yaml2obj][MachO] Don't fill dummy data for virtual sections Summary: Currently, MachOWriter::writeSectionData writes dummy data (0xdeadbeef) to fill section data areas in the file even if the section is a virtual one. Since virtual sections don't occupy any space in the file, writing dummy data could results the "OS.tell() - fileStart <= Sec.offset" assertion failure. This patch fixes the bug by simply not writing any dummy data for virtual sections. Reviewers: beanz, jhenderson, rupprecht, alexshap Reviewed By: alexshap Subscribers: compnerd, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62991 llvm-svn: 363525	2019-06-17 02:07:20 +00:00
Seiya Nuta	13de174b4c	[llvm-objcopy] Add elf32-sparc and elf32-sparcel target Summary: The "sparc"/"sparcel" architectures appears in ArchMap (used by -B option) but not in OutputFormatMap (used by -I/-O option). Add their targets into OutputFormatMap for consistency. Note that AFAIK there're no targets for 32-bit little-endian SPARC ("elf32-sparcel") in GNU binutils. Reviewers: espindola, alexshap, rupprecht, jhenderson, compnerd, jakehehrlich Reviewed By: jhenderson, compnerd, jakehehrlich Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63238 llvm-svn: 363524	2019-06-17 02:03:45 +00:00
Craig Topper	9f2f127009	[X86] Add TB_NO_REVERSE to some folding table entries where the register from uses the REX prefix, but the memory form does not. It would not be safe to unfold the memory form the register form without checking that we are compiling for 64-bit mode. This probaby isn't a real functional issue since we are unlikely to unfold any of these instructions since they don't have any tied registers, aren't commutable, and don't have any inputs other than the address. llvm-svn: 363523	2019-06-16 22:33:09 +00:00
Roman Lebedev	5a663bd77a	[InstSimplify] Fix addo/subo undef folds (PR42209) Fix folds of addo and subo with an undef operand to be: `@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`, as per LLVM undef rules. Same for commuted variants. Based on the original version of the patch by @nikic. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 \| PR42209 ]] Differential Revision: https://reviews.llvm.org/D63065 llvm-svn: 363522	2019-06-16 20:39:45 +00:00
Nicolai Haehnle	2da0b89d92	[AsmPrinter] Make EmitLinkage and EmitVisibility public Summary: This allows target to implement custom emit of global variables if required. See subsequent patch for a use case. Change-Id: I9654197e3df24503104a54c41fff06845aed37fe Reviewers: arsenm, kzhuravl Subscribers: wdng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61650 llvm-svn: 363519	2019-06-16 18:30:42 +00:00
Nicolai Haehnle	41abf2766e	AMDGPU: Prepare for explicit absolute relocations in code generation Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517	2019-06-16 17:43:37 +00:00
Nicolai Haehnle	6d71be4e67	AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516	2019-06-16 17:32:01 +00:00
Nicolai Haehnle	490e83cd43	AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514	2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin	5250021672	[AMDGPU] gfx10 conditional registers handling This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513	2019-06-16 17:13:09 +00:00
Sanjay Patel	c8d88ad1a9	[CodeGenPrepare][x86] shift both sides of a vector select when profitable This is based on the example/discussion in PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that shift expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen. This extends CGP functionality introduced with: rL201655 Differential Revision: https://reviews.llvm.org/D63233 llvm-svn: 363511	2019-06-16 15:29:03 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Simon Pilgrim	fcffc2facc	[X86] CombineShuffleWithExtract - handle cases with different vector extract sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507	2019-06-16 08:00:41 +00:00
Nico Weber	f6db534224	gn build: Merge r363444 llvm-svn: 363505	2019-06-16 02:24:01 +00:00
Simon Pilgrim	456ca5d7f7	[X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI. llvm-svn: 363501	2019-06-15 19:12:44 +00:00
Simon Pilgrim	90e87af303	[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500	2019-06-15 18:30:43 +00:00
Simon Pilgrim	990f3ceb67	[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499	2019-06-15 17:05:24 +00:00
Roman Lebedev	5dd61974f9	[NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits llvm-svn: 363498	2019-06-15 16:12:13 +00:00
Roman Lebedev	680c43b73a	[NFC][MCA][X86] Add baseline test coverage for AMD Barcelona (aka K10, fam10h) Looking into sched model for that CPU ... llvm-svn: 363497	2019-06-15 16:12:05 +00:00
Aaron Puchert	e1dc495e63	[Clang] Harmonize Split DWARF options with llc Summary: With Split DWARF the resulting object file (then called skeleton CU) contains the file name of another ("DWO") file with the debug info. This can be a problem for remote compilation, as it will contain the name of the file on the compilation server, not on the client. To use Split DWARF with remote compilation, one needs to either * make sure only relative paths are used, and mirror the build directory structure of the client on the server, * inject the desired file name on the client directly. Since llc already supports the latter solution, we're just copying that over. We allow setting the actual output filename separately from the value of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU. Fixes PR40276. Reviewers: dblaikie, echristo, tejohnson Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D59673 llvm-svn: 363496	2019-06-15 15:38:51 +00:00
Kang Zhang	2d51adcb57	[PowerPC] Set the innermost hot loop to align 32 bytes Summary: If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that we can decrease cache misses and branch-prediction misses. Actual alignment of the loop will depend on the hotness check and other logic in alignBlocks. The old code will only align hot loop to 32 bytes when the LoopSize larger than 16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop to 32 bytes not only for the hot loop whose size is 16~32 bytes. Reviewed By: steven.zhang, jsji Differential Revision: https://reviews.llvm.org/D61228 llvm-svn: 363495	2019-06-15 15:10:24 +00:00
Gauthier Harnisch	83c7b61052	[clang] Add storage for APValue in ConstantExpr Summary: When using ConstantExpr we often need the result of the expression to be kept in the AST. Currently this is done on a by the node that needs the result and has been done multiple times for enumerator, for constexpr variables... . This patch adds to ConstantExpr the ability to store the result of evaluating the expression. no functional changes expected. Changes: - Add trailling object to ConstantExpr that can hold an APValue or an uint64_t. the uint64_t is here because most ConstantExpr yield integral values so there is an optimized layout for integral values. - Add basic* serialization support for the trailing result. - Move conversion functions from an enum to a fltSemantics from clang::FloatingLiteral to llvm::APFloatBase. this change is to make it usable for serializing APValues. - Add basic* Import support for the trailing result. - ConstantExpr created in CheckConvertedConstantExpression now stores the result in the ConstantExpr Node. - Adapt AST dump to print the result when present. basic* : None, Indeterminate, Int, Float, FixedPoint, ComplexInt, ComplexFloat, the result is not yet used anywhere but for -ast-dump. Reviewers: rsmith, martong, shafik Reviewed By: rsmith Subscribers: rnkovacs, hiraditya, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62399 llvm-svn: 363493	2019-06-15 10:24:47 +00:00
Fangrui Song	b6dc09e725	[BranchProbability] Delete a redundant overflow check llvm-svn: 363492	2019-06-15 10:09:59 +00:00
Nikita Popov	8550fb386a	[SCEV] Use unsigned/signed intersection type in SCEV Based on D59959, this switches SCEV to use unsigned/signed range intersection based on the sign hint. This will prefer non-wrapping ranges in the relevant domain. I've left the one intersection in getRangeForAffineAR() to use the smallest intersection heuristic, as there doesn't seem to be any obvious preference there. Differential Revision: https://reviews.llvm.org/D60035 llvm-svn: 363490	2019-06-15 09:15:52 +00:00
Nikita Popov	9145562b48	[SimplifyIndVar] Simplify non-overflowing saturating add/sub If we can detect that saturating math that depends on an IV cannot overflow, replace it with simple math. This is similar to the CVP optimization from D62703, just based on a different underlying analysis (SCEV vs LVI) that catches different cases. Differential Revision: https://reviews.llvm.org/D62792 llvm-svn: 363489	2019-06-15 08:48:52 +00:00
Fangrui Song	e1aa69f755	[RISCV] Regenerate remat.ll and atomic-rmw.ll after D43256 llvm-svn: 363487	2019-06-15 07:49:14 +00:00
Fangrui Song	44cc4e9351	[RISCV] Simplify RISCVAsmBackend::writeNopData(). NFC llvm-svn: 363486	2019-06-15 06:14:15 +00:00
Alex Brachet	899a3072f0	[objcopy] Error when --preserve-dates is specified with standard streams Summary: llvm-objcopy/strip now error when -p is specified when reading from stdin or writing to stdout Reviewers: jhenderson, rupprecht, espindola, alexshap Reviewed By: jhenderson, rupprecht Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63090 llvm-svn: 363485	2019-06-15 05:32:23 +00:00
Michael Berg	ad6bb86b2d	adding more fmf propagation for selects plus updated tests llvm-svn: 363484	2019-06-15 04:53:51 +00:00
Fangrui Song	968b5f84af	Revert "adding more fmf propagation for selects plus tests" This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482	2019-06-15 03:51:08 +00:00
Huihui Zhang	dc2fd6a14e	[InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc). Summary: For icmp pred (and (sh X, Y), C), 0 When C is signbit, expect to fold (X << Y) & signbit ==/!= 0 into (X << Y) >=/< 0, rather than (X & (signbit >> Y)) != 0. When C+1 is power of 2, expect to fold (X << Y) & ~C ==/!= 0 into (X << Y) </>= C+1, rather than (X & (~C >> Y)) == 0. For icmp pred (and X, (sh signbit, Y)), 0 Expect to fold (X & (signbit l>> Y)) ==/!= 0 into (X << Y) >=/< 0 Expect to fold (X & (signbit << Y)) ==/!= 0 into (X l>> Y) >=/< 0 Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63025 llvm-svn: 363479	2019-06-15 00:33:41 +00:00
Matt Arsenault	9487278010	Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478	2019-06-15 00:33:26 +00:00
Richard Smith	dda3597288	Add a map_range function for applying map_iterator to a range. In preparation for use in Clang. llvm-svn: 363477	2019-06-14 23:56:40 +00:00
Mitch Phillips	0d44f129bb	Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit `c2864c0de0`. This reverts rL363410. llvm-svn: 363476	2019-06-14 23:45:34 +00:00
Michael Berg	69394bedc5	adding more fmf propagation for selects plus tests llvm-svn: 363474	2019-06-14 23:30:52 +00:00
Guozhi Wei	d2210af332	[MBP] Move a latch block with conditional exit and multi predecessors to top of loop Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is: * a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessors If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch. Differential Revision: https://reviews.llvm.org/D43256 llvm-svn: 363471	2019-06-14 23:08:59 +00:00
Akira Hatanaka	a704a8f28c	[ObjC][ARC] Delete ObjC runtime calls on global variables annotated with 'objc_arc_inert' Those calls are no-ops, so they can be safely deleted. rdar://problem/49839633 Differential Revision: https://reviews.llvm.org/D62433 llvm-svn: 363468	2019-06-14 22:06:32 +00:00
Matt Arsenault	aa41e92e17	AMDGPU: Avoid most waitcnts before calls Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything. Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead. llvm-svn: 363465	2019-06-14 21:52:26 +00:00
Ziang Wan	af857b93df	Add --print-supported-cpus flag for clang. This patch allows clang users to print out a list of supported CPU models using clang [--target=<target triple>] --print-supported-cpus Then, users can select the CPU model to compile to using clang --target=<triple> -mcpu=<model> a.c It is a handy feature to help cross compilation. llvm-svn: 363464	2019-06-14 21:42:21 +00:00
Francis Visoiu Mistrih	5501dda247	[Remarks][NFC] Improve testing and documentation of -foptimization-record-passes This adds: * documentation to the user manual * nicer error message * test for the error case * test for the gold plugin llvm-svn: 363463	2019-06-14 21:38:57 +00:00
Matt Arsenault	282dac717e	SROA: Allow eliminating addrspacecasted allocas There is a circular dependency between SROA and InferAddressSpaces today that requires running both multiple times in order to be able to eliminate all simple allocas and addrspacecasts. InferAddressSpaces can't remove addrspacecasts when written to memory, and SROA helps move pointers out of memory. This should avoid inserting new commuting addrspacecasts with GEPs, since there are unresolved questions about pointer wrapping between different address spaces. For now, don't replace volatile operations that don't match the alloca addrspace, as it would change the address space of the access. It may be still OK to insert an addrspacecast from the new alloca, but be more conservative for now. llvm-svn: 363462	2019-06-14 21:38:31 +00:00
Jinsong Ji	bbab7acedf	[PowerPC][NFC] Comments update and remove some unused def llvm-svn: 363461	2019-06-14 21:33:51 +00:00
Matt Arsenault	e6efb6433f	SROA: Add baseline test for addrspacecast changes llvm-svn: 363460	2019-06-14 21:22:26 +00:00
Matt Arsenault	bb0a610599	AMDGPU: Fix capitalized register names in asm constraints This was a workaround a long time ago, but the canonical lower case names work now. llvm-svn: 363459	2019-06-14 21:16:06 +00:00
Matt Arsenault	9e5fa33378	AMDGPU: Fix dropping memref for ds append/consume The way SelectionDAG treats memory operands is very frustrating, and by default drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them. llvm-svn: 363455	2019-06-14 21:01:24 +00:00

1 2 3 4 5 ...

180341 Commits