llvm-project

Commit Graph

Author	SHA1	Message	Date
Mirko Brkusanin	d7357c52a4	[Mips] Add support for min/max/umin/umax atomics To implement these atomics properly, we need one more register than for other binary atomics. It stores the result of comparing the values, in addition to the register used for the actual result of the operation. https://reviews.llvm.org/D71028	2019-12-12 11:32:37 +01:00
Nicola Zaghen	f798eb21ec	Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." This reverts commit `5f6208778f`. This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.	2019-12-12 10:29:54 +00:00
Nicola Zaghen	5f6208778f	[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with InstCombine's visitPtrToInt, but the unit test was incorrect, so this remained undiscovered. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-12 10:07:01 +00:00
Georgii Rymar	fff9f049b2	[llvm-readobj][test] - Cleanup and split tests in tools/llvm-readobj folder. tools/llvm-readobj currently contains tests that are either general for all file types or that mix file types inside. This patch refactors these tests and leaves only general tests in that folder. All other tests were moved to ELF/COFF/MachO and wasm accordingly. I tried to minimize the number of changes, so most of the test parts remained unchanged. Any further refactorings and improvements for particular tests should be done independently from this patch. Differential revision: https://reviews.llvm.org/D71269	2019-12-12 12:21:58 +03:00
Alexey Lapshin	71aaebc824	[DWARF5][DWARFVerifier] Check that Skeleton compilation unit does not have children. This patch adds a check to DWARFVerifier that the Skeleton compilation unit does not have children. Differential Revision: https://reviews.llvm.org/D71244	2019-12-12 10:59:10 +03:00
Sam Parker	f8ff3bf55b	Revert "[ARM][MVE] Sink vector shift operand" This reverts commit `e0b966643f`. Instruction selection is failing with expensive checks.	2019-12-12 07:52:57 +00:00
Sam Parker	e0b966643f	[ARM][MVE] Sink vector shift operand The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841	2019-12-12 07:35:21 +00:00
Wenlei He	d275a06487	[AutoFDO] Statistic for context sensitive profile guided inlining Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70584	2019-12-11 21:37:21 -08:00
Puyan Lotfi	f5b7a46837	[llvm][MIRVRegNamerUtils] Adding hashing on memoperands. No more hash collisions for memoperands. Now the MIRCanonicalization pass shouldn't hit hash collisions when dealing with nearly identical memory accessing instructions when their memoperands are in fact different. Differential Revision: https://reviews.llvm.org/D71328	2019-12-11 22:11:49 -05:00
Cameron McInally	7aa5c16088	[AArch64][SVE] Add patterns for scalable vselect This patch matches scalable vector selects to predicated move instructions. Differential Revision: https://reviews.llvm.org/D71298	2019-12-11 20:15:44 -06:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Sanjay Patel	83e1bd36be	[AArch64][x86] add tests for possible infinite loops in DAGCombiner; NFC This is a reduction of a test that failed (infinite looped) with rGd1f0bdf2d2df (subsequently reverted). I've duplicated it for 2 targets to increase coverage - everything down here is wobbly.	2019-12-11 19:41:42 -05:00
Vedant Kumar	56232f950d	Revert "[DWARF] Allow cross-CU references of subprogram definitions" This reverts commit `30038da15b`. It causes the stage2 thinLTO bot to fail with: Assertion failed: (CU.getDIE(CalleeSP) && "Expected declaration subprogram DIE for callee") rdar://57840415	2019-12-11 15:55:48 -08:00
Sanjay Patel	cdf5cfea8e	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `d1f0bdf2d2`. The patch can cause infinite loops in DAGCombiner.	2019-12-11 16:56:58 -05:00
Sam Clegg	881d877846	[WebAssembly] Add new `export_name` clang attribute for controlling wasm export names This is equivalent to the existing `import_name` and `import_module` attributes which control the import names in the final wasm binary produced by lld. This maps the existing This attribute currently requires a string rather than using the symbol name for a couple of reasons: 1. Avoid confusion with static and dynamic linking which is based on symbol name. Exporting a function from a wasm module using this directive is orthogonal to both static and dynamic linking. 2. Avoids name mangling. Differential Revision: https://reviews.llvm.org/D70520	2019-12-11 11:54:57 -08:00
Nikita Popov	8db5143b1a	[InstCombine] Optimize overflow check based on uadd.with.overflow result Fix for https://bugs.llvm.org/show_bug.cgi?id=40846. This adds a combine for cases where a (a + b) < a style overflow check is performed, but with a + b being the result of uadd.with.overflow, so the overflow result is also already available and we can just use it. Subsequently GVN/CSE will deduplicate the extracts. We can run into this situation if you have both a uadd.with.overflow and a manual add + overflow check in the same function (on the same operands), in which case GVN will rewrite the add to the with.overflow result and leave you with this pattern. The implementation is a bit ugly because I'm handling the various canonicalization edge cases. This does not yet handle the negated version of this pattern. Differential Revision: https://reviews.llvm.org/D58644	2019-12-11 20:52:04 +01:00
Danila Kutenin	19e83a9b4c	[ValueTracking] Pointer is known nonnull after load/store If the pointer was loaded/stored before the null check, the check is redundant and can be removed. For now the optimizers do not remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG. The patch allows more nonnull constraints to be used. It also found one more optimization in a PowerPC test. This is my first llvm review; I am open to any comments. Differential Revision: https://reviews.llvm.org/D71177	2019-12-11 20:32:29 +01:00
Danila Kutenin	fc765698e0	[ValueTracking] Add tests for non-null check after load/store; NFC Tests for D71177.	2019-12-11 20:26:31 +01:00
Nikita Popov	b361d3bbcd	[MergeFuncs] Remove incorrect attribute copying Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1. However, the attribute copying was done in the wrong place (in general call replacement, not thunk generation) and a proper fix was implemented in D12581. Previously this code was just unnecessary but harmless (because FunctionComparator ensured that the attributes of the two functions are exactly the same), but since byval was changed to accept a type this copying is actively wrong and may result in malformed IR. Differential Revision: https://reviews.llvm.org/D71173	2019-12-11 20:09:54 +01:00
Andrzej Warzynski	a75463c471	Add intrinsics for unary narrowing operations Summary: The following intrinsics for unary narrowing operations are added: * @llvm.aarch64.sve.sqxtnb * @llvm.aarch64.sve.uqxtnb * @llvm.aarch64.sve.sqxtunb * @llvm.aarch64.sve.sqxtnt * @llvm.aarch64.sve.uqxtnt * @llvm.aarch64.sve.sqxtunt Reviewers: sdesmalen, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71270	2019-12-11 18:55:51 +00:00
Florian Hahn	2675a3c880	[AArch64] Be more careful to skip debug operands in LdSt Optimizer. This fixes crashes with $noreg operands.	2019-12-11 18:47:45 +00:00
Sanjay Patel	d1f0bdf2d2	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression(). This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-11 13:30:39 -05:00
Bardia Mahjour	916d37a2bc	[DA] Improve dump to show source and sink of the dependence Summary: The current da printer shows the dependence without indicating which instructions are being considered as the src vs dst. It also silently ignores call instructions, despite the fact that they create confused dependence edges to other memory instructions. This patch addresses these two issues plus a couple of minor non-functional improvements. Authored By: bmahjour Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc Reviewed By: dmgreen, fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D71088	2019-12-11 11:48:16 -05:00
Florian Hahn	4fe92abceb	[AArch64] Skip debug ops with regsOverlap in AArch64 LD/ST opt. This fixes a crash when debug instructions are in between 2 stores.	2019-12-11 16:26:31 +00:00
Ulrich Weigand	5ad67df988	[SystemZ] Add llvm.minimum / llvm.maximum tests The backend already supports the @llvm.minimum and @llvm.maximum intrinsics, but we had no test cases for those. Add tests.	2019-12-11 17:01:13 +01:00
Craig Topper	3adc819b7a	[X86] Erase dead LEA instruction after converting it to MOV in FixupLEAPass::processInstrForSlow3OpLEA.	2019-12-11 07:51:23 -08:00
Ulrich Weigand	ac473394ff	[SystemZ] Fix 128-bit strict FMA expansion pre-z14 Before z14, we did not have any FMA instruction for 128-bit floating-point, so the @llvm.fma.f128 intrinsic needs to be expanded to a libcall on those platforms. This worked correctly for regular FMA, but was implemented incorrectly for the strict version. This was not noticed because we did not have test coverage for this case. This patch fixes that incorrect expansion and adds the missing test cases.	2019-12-11 16:32:08 +01:00
Diogo Sampaio	ee21934588	[ARM][NFC] Change test to use CHECK-NEXT	2019-12-11 14:25:36 +00:00
Matt Arsenault	49d731b5e0	Verifier: Check frame-pointer attribute values There are a few places that check specific string attributes have particular values, and assert if they are something else. The verifier should catch these kinds of cases.	2019-12-11 19:53:49 +05:30
Matt Arsenault	32137699f7	AMDGPU: Fix copy-pasted test name error	2019-12-11 19:44:47 +05:30
Kerry McLaughlin	c0a3ab3655	Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" This reverts commit `3f5bf35f86` as it was causing build failures in llvm-clang-x86_64-expensive-checks: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045	2019-12-11 13:58:39 +00:00
Florian Hahn	17554b8961	[AArch64] Teach Load/Store optimizer to rename store operands for pairing. In some cases, we can rename a store operand in order to enable pairing of stores. For store pairs that cannot be merged because the first stored register is defined in between the second store, we try to find a suitable rename register. First, we check if we can rename the given register: 1. The first store register must be killed at the store, which means we do not have to rename instructions after the first store. 2. We scan backwards from the first store to find the definition of the stored register and check that all uses in between are renamable. Along the way, we collect the minimal register classes of the uses for overlapping (sub/super)registers. Second, we try to find an available register from the minimal physical register class of the original register. A suitable register must not be 1. defined before FirstMI 2. between the previous definition of the register to rename 3. a callee saved register. We use KILL flags to clear defined registers while scanning from the beginning to the end of the block.
This triggers quite often, here are the top changes for MultiSource, SPEC2000, SPEC2006 compiled with -O3 for iOS:
Metric: aarch64-ldst-opt.NumPairCreated
Program                                         base    patch     diff
test-suite...nch/fourinarow/fourinarow.test     2.00    39.00  1850.0%
test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test    46.00    80.00    73.9%
test-suite...chmarks/Olden/power/power.test    70.00    96.00    37.1%
test-suite...cations/hexxagon/hexxagon.test    29.00    39.00    34.5%
test-suite...nchmarks/McCat/05-eks/eks.test   100.00   132.00    32.0%
test-suite.../Trimaran/enc-rc4/enc-rc4.test    46.00    59.00    28.3%
test-suite...T2006/473.astar/473.astar.test   160.00   200.00    25.0%
test-suite.../Trimaran/enc-md5/enc-md5.test     8.00    10.00    25.0%
test-suite...telecomm-gsm/telecomm-gsm.test   113.00   139.00    23.0%
test-suite...ediabench/gsm/toast/toast.test   113.00   139.00    23.0%
test-suite...Source/Benchmarks/sim/sim.test    91.00   111.00    22.0%
test-suite...C/CFP2000/179.art/179.art.test    41.00    49.00    19.5%
test-suite...peg2/mpeg2dec/mpeg2decode.test   245.00   279.00    13.9%
test-suite...marks/Olden/health/health.test    16.00    18.00    12.5%
test-suite...ks/Prolangs-C/cdecl/cdecl.test    90.00   101.00    12.2%
test-suite...fice-ispell/office-ispell.test    91.00   100.00     9.9%
test-suite...oxyApps-C/miniGMG/miniGMG.test   430.00   465.00     8.1%
test-suite...lowfish/security-blowfish.test    39.00    42.00     7.7%
test-suite.../Applications/spiff/spiff.test    42.00    45.00     7.1%
test-suite...arks/mafft/pairlocalalign.test  2473.00  2646.00     7.0%
test-suite.../VersaBench/ecbdes/ecbdes.test    29.00    31.00     6.9%
test-suite...nch/beamformer/beamformer.test   220.00   235.00     6.8%
test-suite...CFP2000/177.mesa/177.mesa.test  2110.00  2252.00     6.7%
test-suite...ve-susan/automotive-susan.test   109.00   116.00     6.4%
test-suite...s-C/unix-smail/unix-smail.test    65.00    69.00     6.2%
test-suite...CI_Purple/SMG2000/smg2000.test  1194.00  1265.00     5.9%
test-suite.../Benchmarks/nbench/nbench.test   472.00   500.00     5.9%
test-suite...oxyApps-C/miniAMR/miniAMR.test   248.00   262.00     5.6%
test-suite...quoia/CrystalMk/CrystalMk.test    18.00    19.00     5.6%
test-suite...rks/tramp3d-v4/tramp3d-v4.test  7331.00  7710.00     5.2%
test-suite.../Benchmarks/Bullet/bullet.test  5651.00  5938.00     5.1%
test-suite...ternal/HMMER/hmmcalibrate.test   750.00   788.00     5.1%
test-suite...T2006/456.hmmer/456.hmmer.test   764.00   802.00     5.0%
test-suite...ications/JM/ldecod/ldecod.test  1028.00  1079.00     5.0%
test-suite...CFP2006/444.namd/444.namd.test  1368.00  1434.00     4.8%
test-suite...marks/7zip/7zip-benchmark.test  4471.00  4685.00     4.8%
test-suite...6/464.h264ref/464.h264ref.test  3122.00  3271.00     4.8%
test-suite...pplications/oggenc/oggenc.test  1497.00  1565.00     4.5%
test-suite...T2000/300.twolf/300.twolf.test   742.00   774.00     4.3%
test-suite.../Prolangs-C/loader/loader.test    24.00    25.00     4.2%
test-suite...0.perlbench/400.perlbench.test  1983.00  2058.00     3.8%
test-suite...ications/JM/lencod/lencod.test  4612.00  4785.00     3.8%
test-suite...yApps-C++/PENNANT/PENNANT.test   995.00  1032.00     3.7%
test-suite...arks/VersaBench/dbms/dbms.test    54.00    56.00     3.7%
Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D70450	2019-12-11 13:50:11 +00:00
James Henderson	5224feb7ca	[test][llvm-dwarfdump] Add missing testing for some --debug-* options A number of the --debug-* options in llvm-dwarfdump are not particularly well tested. In some cases, the option is only tested as part of testing another feature, or a specific part of the section that the options dump. This change adds four new tests to address some of these holes. It is not aiming to address every hole however. I kept the --debug-line switch test separate to X86/brief.s because the latter only considers the parts of the line table that are affected by verbose printing, thus missing out things like the header and different values for things like the Line, Column etc registers. Reviewed by: JDevlieghere Differential Revision: https://reviews.llvm.org/D71276	2019-12-11 13:42:54 +00:00
Andrzej Warzynski	65651f197a	[AArch64][SVE] Add DAG combine rules for gather loads and sext/zext Summary: These changes allow us to support sign-extending gather loads with the existing intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*). Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential revision: https://reviews.llvm.org/D70812	2019-12-11 12:56:18 +00:00
Georgii Rymar	9a5c849991	[llvm-readobj][llvm-readelf] - Remove excessive empty lines when reporting errors and warnings. After recent changes it now seems possible to get rid of printing '\n' before each error and warning. This makes the output cleaner. Differential revision: https://reviews.llvm.org/D71246	2019-12-11 15:06:33 +03:00
Oliver Stannard	6ae3d310bd	Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions" This reverts commit `cec2d5c174`. Reverting because this is still creating outlined functions with return address signing instructions with mismatched SP values. For example:
  int *volatile v;
  void foo(int x) {
    int a[x];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }
  void bar(int x) {
    int a[x];
    v = 0;
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }
This generates these two outlined functions, both of which modify SP between the paciasp and retaa instructions:
  $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf
  ...
  OUTLINED_FUNCTION_0:                    // @OUTLINED_FUNCTION_0
          .cfi_sections .debug_frame
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          mov     w8, w0
          lsl     x8, x8, #2
          add     x8, x8, #15             // =15
          mov     x9, sp
          and     x8, x8, #0x7fffffff0
          sub     x8, x9, x8
          mov     x29, sp
          mov     sp, x8
          adrp    x9, v
          retaa
  ...
  OUTLINED_FUNCTION_1:                    // @OUTLINED_FUNCTION_1
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          mov     sp, x29
          retaa	2019-12-11 12:06:20 +00:00
Simon Tatham	1fed9a0c0c	[TableGen] Add bang-operators !getop and !setop. Summary: These allow you to get and set the operator of a dag node, without affecting its list of arguments. `!getop` is slightly fiddly because in many contexts you need its return value to have a static type more specific than 'any record'. It works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so I made `!getop` take an optional type suffix itself, so that can be written as the shorter `!getop<BaseClass>(...)`. Reviewers: hfinkel, nhaehnle Reviewed By: nhaehnle Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71191	2019-12-11 12:05:22 +00:00
Kerry McLaughlin	3f5bf35f86	[AArch64][SVE] Implement intrinsics for non-temporal loads & stores Summary: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above. Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71000	2019-12-11 11:13:51 +00:00
czhengsz	bf4580b7e7	[PowerPC][NFC] add test case for lwa - loop ds form prep	2019-12-11 06:10:11 -05:00
Sjoerd Meijer	d97cf1f889	[ARM][LowOverheadLoops] Remove dead loop update instructions. After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions. To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively. This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which show that there is room for improvement. For example, we can't handle this case yet:
    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov       r3, r2
    subs      r2, #4
    vldrh.u32 q2, [r1], #8
    vmov      q1, q0
    vmla.u32  q0, q2, r0
    letp      lr, .LBB0_1
  @ %bb.2:
    vctp.32   r3
    ..
which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this. Differential Revision: https://reviews.llvm.org/D71007	2019-12-11 10:20:19 +00:00
Simon Tatham	bd0f271c9e	[ARM][MVE] Add intrinsics for immediate shifts. (reland) This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen, MarkMurrayARM Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-11 10:10:09 +00:00
Sam Parker	ee7579409b	[ARM][TypePromotion] Enable by default Enable the TypePromotion pass by default (again). This patch was originally committed in `393dacacf7`. This patch was reverted in `a38396939c`. Differential Revision: https://reviews.llvm.org/D70998	2019-12-11 10:00:16 +00:00
Georgii Rymar	445c3fdd2a	[llvm-readelf] - Do not print an empty symbol version as "<corrupt>" It is discussed here https://reviews.llvm.org/D71118#inline-643172 Currently when a version is empty, llvm-readelf prints: "000: 0 (local) 2 (<corrupt>)" But GNU readelf does not treat empty section as corrupt. There is no sense in having empty versions anyways it seems, but this change is for consistency with GNU. Differential revision: https://reviews.llvm.org/D71243	2019-12-11 12:24:37 +03:00
Martin Storsjö	af39708c2d	[llvm-readobj] Fix/improve printing WinEH unwind info for linked PE images ARMWinEHPrinter was already designed to handle linked PE images (since `d2941b43f4`), but resolving symbols didn't consistently take the image base into account (as linked images seldom have a symbol table, except for in MinGW setups). Win64EHDumper wasn't really designed to handle linked images (it would crash if executed on such a file), but a few concepts (getSymbol, taking a virtual address instead of a relocation, and getSectionContaining for finding the section containing a certain virtual address) can be borrowed from ARMWinEHPrinter. Adjust ARMWinEHPrinter to print the address of the exception handler routine as a VA instead of an RVA, consistently with other addresses in the same printout, and make Win64EHDumper print addresses similarly for image cases. Differential Revision: https://reviews.llvm.org/D71303	2019-12-11 10:20:34 +02:00
QingShan Zhang	f99297176c	[PowerPC] Exploit the Vector Integer Average Instructions PowerPC has instructions that implement the semantics of this piece of code: vector int foo(vector int m, vector int n) { return (m + n + 1) >> 1; } This patch adds the match rule to select them. Differential Revision: https://reviews.llvm.org/D71002	2019-12-11 07:25:57 +00:00
Nico Weber	caa4120906	Revert "[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission." This reverts commit `307f60a1a3`. DebugInfo/X86/debug-macinfo-split-dwarf.ll fails on Windows: Command Output (stdout): -- $ ":" "RUN: at line 1" $ "c:\src\llvm-project\out\gn\bin\llc.exe" "-mtriple=x86_64-pc-windows-gnu" "-O0" "-split-dwarf-file=foo.dwo" "-filetype=obj" Assertion failed: Section && "Cannot switch to a null section!", file ../../llvm/lib/MC/MCStreamer.cpp, line 1103 Stack dump: 0. Program arguments: c:\src\llvm-project\out\gn\bin\llc.exe -mtriple=x86_64-pc-windows-gnu -O0 -split-dwarf-file=foo.dwo -filetype=obj	2019-12-10 21:32:30 -05:00
Fangrui Song	4d53b99c5d	[llvm-ar] Improve tool selection heuristic If llvm-ar is installed at arm-pokymllib32-linux-gnueabi-llvm-ar, it may think it is llvm-lib due to the "lib" substring. Improve the heuristic to make all the following work as intended: llvm-ar-9 (llvm-9 package on Debian) llvm-ranlib.exe Lib.exe (reported by D44808) arm-pokymllib32-linux-gnueabi-llvm-ar (reported by D71030) Reviewed By: raj.khem, rupprecht Differential Revision: https://reviews.llvm.org/D71302	2019-12-10 17:32:50 -08:00
Craig Topper	935d41e4bd	[X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256 This is an improvement to `88dacbd436`	2019-12-10 17:29:02 -08:00
Puyan Lotfi	f364686f34	[llvm][MIRVRegNamerUtil] Adding hashing against MachineInstr flags. Now, flags will result in differing hashes for a given MI. In effect, if you have two instructions with everything identical except for their flags then you should get two different hashes and fewer collisions. Differential Revision: https://reviews.llvm.org/D70479	2019-12-10 20:16:14 -05:00
Wang, Pengfei	21bc8631fe	[FPEnv][X86] Constrained FCmp intrinsics enabling on X86 Summary: This is a follow-up to D69281; it enables X86 backend support for FP comparison. Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70582	2019-12-11 08:23:09 +08:00
Vlad Tsyrklevich	636c93ed11	Revert "Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty..." This reverts commit `f2ba93971c`, it was causing build timeouts on sanitizer-x86_64-linux-autoconf such as http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/44917	2019-12-10 16:03:17 -08:00
Craig Topper	88dacbd436	[X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256. This reverts `3e1aee2ba7` in favor of a different approach. Scalarizing isn't great codegen, but making the type illegal was interfering with k constraint in inline assembly.	2019-12-10 15:07:55 -08:00
Sanjay Patel	252d3b9805	[InstSimplify] add tests for insert constant + splat; NFC	2019-12-10 17:16:58 -05:00
Vedant Kumar	30038da15b	[DWARF] Allow cross-CU references of subprogram definitions This allows a call site tag in CU A to reference a callee DIE in CU B without resorting to creating an incomplete duplicate DIE for the callee inside of CU A. We already allow cross-CU references of subprogram declarations, so it doesn't seem like definitions ought to be special. This improves entry value evaluation and tail call frame synthesis in the LTO setting. During LTO, it's common for cross-module inlining to produce a call in some CU A where the callee resides in a different CU, and there is no declaration subprogram for the callee anywhere. In this case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram in order to fill in the call site tag. That empty 'definition' defeats entry value evaluation etc., because the debugger can't figure out what it means. As a follow-up, maybe we could add a DWARF verifier check that a DW_TAG_subprogram at least has a DW_AT_name attribute. rdar://46577651 Differential Revision: https://reviews.llvm.org/D70350	2019-12-10 14:00:57 -08:00
Sourabh Singh Tomar	307f60a1a3	[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission. Reviewers: dblaikie, aprantl, jini.susan.george Tags: #debug-info #llvm Differential Revision: https://reviews.llvm.org/D71008	2019-12-11 02:19:27 +05:30
Sourabh Singh Tomar	fb4d8fe1a8	Recommit "[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified." Reviewers: dblaikie, aprantl, probinson Tags: #debug-info #llvm Differential Revision: https://reviews.llvm.org/D71185	2019-12-11 01:24:50 +05:30
Sourabh Singh Tomar	d82b6ba21b	Revert "[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified." This reverts commit `6ef01588f4`. Missing Differential revision.	2019-12-11 01:20:40 +05:30
Sourabh Singh Tomar	6ef01588f4	[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified.	2019-12-11 01:18:02 +05:30
Yonghong Song	7d0e8930ed	[BPF] put not-section-attribute externs into BTF ".extern" data section Currently for extern variables with section attribute, those BTF_KIND_VARs will not be placed in any DataSec. This is inconvenient as any other generated BTF_KIND_VAR belongs to one DataSec. This patch puts these extern variables into the ".extern" section so the bpf loader can have a consistent processing mechanism for all data sections and variables.	2019-12-10 11:45:17 -08:00
Hans Wennborg	49da20ddb4	Revert `30e8f80fd5` "[DebugInfo] Don't create multiple DBG_VALUEs when sinking" This caused non-determinism in the compiler, see comment on the Phabricator code review. > This patch addresses a performance problem reported in PR43855, and > present in the reapplication in 001574938e5. It turns out that > MachineSink will (often) move instructions to the first block that > post-dominates the current block, and then try to sink further. This > means if we have a lot of conditionals, we can needlessly create large > numbers of DBG_VALUEs, one in each block the sunk instruction passes > through. > > To fix this, rather than immediately sinking DBG_VALUEs, record them in > a pass structure. When sinking is complete and instructions won't be > sunk any further, new DBG_VALUEs are added, avoiding lots of > intermediate DBG_VALUE $noregs being created. > > Differential revision: https://reviews.llvm.org/D70676	2019-12-10 19:20:11 +01:00
Simon Cook	a6e50e40e6	[RISCV] Improve assembler missing feature warnings This adds support for printing improved missing feature error messages from the assembler, which now indicate which feature caused the parse to fail. Differential Revision: https://reviews.llvm.org/D69899	2019-12-10 16:44:48 +00:00
Francesco Petrogalli	0be81968a2	[VectorUtils] Introduce the Vector Function Database (VFDatabase). This patch introduces the VFDatabase, the framework proposed in http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. In this patch the VFDatabase is used to bridge the TargetLibraryInfo (TLI) calls that were previously used to query for the availability of vector counterparts of scalar functions. The VFISAKind field `ISA` of VFShape has been moved into VFInfo, under the assumption that different vector ISAs may provide the same vector signature. At the moment, the vectorizer accepts any of the available ISAs as long as the signature provided by the VFDatabase matches the one expected in the vectorization process. For example, when targeting AVX or AVX2, which both have 256-bit registers, the IR signature of the two vector functions associated with the two ISAs is the same. The `getVectorizedFunction` method at the moment returns the first available match. We will need to add more heuristics to the search system to decide which of the available versions (TLI, AVX, AVX2, ...) the system should prefer when multiple versions with the same VFShape are present. Some of the code in this patch is based on the work done by Sumedh Arani in https://reviews.llvm.org/D66025. Notice that in the proposal the VFDatabase was called SVFS. The name VFDatabase is more in line with LLVM recommendations for naming classes and variables. Differential Revision: https://reviews.llvm.org/D67572	2019-12-10 16:36:44 +00:00
Mikhail Maltsev	e6d3261c67	[ARM][MVE] Refactor complex vector intrinsics [NFCI] Summary: This patch refactors instruction selection of the complex vector addition, multiplication and multiply-add intrinsics, so that it is now based on TableGen patterns rather than C++ code. It also changes the first parameter (halving vs non-halving) of the arm_mve_vcaddq IR intrinsic to match the corresponding instruction encoding, hence it requires some changes in the tests. The patch addresses David's comment in https://reviews.llvm.org/D71190 Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM Reviewed By: dmgreen Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71245	2019-12-10 16:21:52 +00:00
diggerlin	98f5f022f0	[BUG-FIX][XCOFF] fixed a bug of XCOFFObjectFile.cpp when there is padding at the last csect of a section SUMMARY: Fixed a bug in XCOFFObjectFile.cpp when there is padding at the last csect of a section. When a section has tail padding, the value of CurrentAddressLocation is not increased by the padding size, so the following assert is hit: assert(CurrentAddressLocation == Section->Address && "We should have no padding between sections."); Reviewers: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D70859	2019-12-10 11:14:49 -05:00
James Henderson	9614a7c939	[test][llvm-cxxfilt] Improve comment for clarity Differential Revision: https://reviews.llvm.org/D71202	2019-12-10 16:06:36 +00:00
Sanjay Patel	396d18aeb6	[InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded This pattern is noted as a regression from: D70246 ...where we removed an over-aggressive shuffle simplification. SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses, so I'm proposing to pattern match the minimal sequence directly. This fold does not conflict with any of our current shuffle undef/poison semantics. Differential Revision: https://reviews.llvm.org/D71220	2019-12-10 10:10:05 -05:00
Luís Marques	707e970781	[DWARF][RISCV] Test resolving of RISC-V relocations Summary: This patch adds an object file (in yaml format) with a synthetic .debug_info section which we use to test that the supported RISC-V relocations are properly resolved. Reviewers: asb, lenary, MaskRay Reviewed By: MaskRay Tags: #llvm Differential Revision: https://reviews.llvm.org/D70541	2019-12-10 14:02:07 +00:00
stozer	f2ba93971c	Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty... basic blocks Originally applied in `72ce759928`. Fixed a build failure caused by incorrect use of cast instead of dyn_cast. This reverts commit `8b0780f795`.	2019-12-10 13:33:32 +00:00
Sam Parker	06b0228e80	add test for previous commit	2019-12-10 13:24:01 +00:00
Kiran Chandramohan	965ed1e974	[AArch64] Fix issues with large arrays on stack Summary: This patch fixes a few issues when large arrays are allocated on the stack. Currently, clang has inconsistent behaviour, for debug builds there is an assertion failure when the array size on stack is around 2GB but there is no assertion when the stack is around 8GB. For release builds there is no assertion, the compilation succeeds but generates incorrect code. The incorrect code generated is due to using int/unsigned int instead of their 64-bit counterparts. This patch, 1) Removes the assertion in frame legality check. 2) Converts int/unsigned int in some places to the 64-bit variants. This helps in generating correct code and removes the inconsistent behaviour. 3) Adds a test which runs without optimisations. Reviewers: sdesmalen, efriedma, fhahn, aemerson Reviewed By: efriedma Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70496	2019-12-10 11:44:41 +00:00
Simon Tatham	0e894edee1	[TableGen] Permit dag operators to be unset. This is not a new semantic feature. The syntax `(? 1, 2, 3)` was disallowed by the parser in a dag //expression//, but there were already ways to sneak a `?` into the operator field of a dag //value//, e.g. by initializing it from a class template parameter which is then set to `?` by the instantiating `def`. This patch makes `?` in the operator slot syntactically legal, so it's now easy to construct dags with an unset operator. Also, the semantics of `!con` are relaxed so that it will allow a combination of set and unset operator fields in the dag nodes it's concatenating, with the restriction that all the operators that are //not// unset still have to agree with each other. Reviewers: hfinkel, nhaehnle Reviewed By: hfinkel, nhaehnle Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71195	2019-12-10 11:09:40 +00:00
Cullen Rhodes	1b9a608c84	[AArch64][SVE] Add wide compare immediate patterns Summary: Recognize wide compares where the wide operand is a splat of a scalar value in the appropriate range and convert to the immediate variant of the instruction. Patch by Graham Hunter Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71009	2019-12-10 10:41:22 +00:00
Mikael Holmen	4763267eee	[LegalizeTypes] Bugfixes for big-endian targets when handling BITCASTs Summary: This fixes PR44135. The special case when we promote a bitcast from a vector to an int needs special handling when we are on a big-endian target. Prior to this fix, for the added vec_to_int we see the following in the SelectionDAG printouts Type-legalized selection DAG: %bb.1 'foo:bb.1' SelectionDAG has 9 nodes: t0: ch = EntryToken t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0 t17: v4i32 = bitcast t2 t23: i32 = extract_vector_elt t17, Constant:i32<3> t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23 t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1 and I think here the extract_vector_elt is wrong and extracts the value from the wrong index. The program should return the 32 bits made up of the elements at index 4 and 5 in the vec6 array, but with t23: i32 = extract_vector_elt t17, Constant:i32<3> as far as I can tell, we will extract values that originally didn't even exist in the vec6 vector. If we would instead extract the element at index 2 we would get the wanted values. With this fix we insert a right shift after the bitcast in DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1' SelectionDAG has 9 nodes: t0: ch = EntryToken t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0 t23: v4i32 = bitcast t2 t27: i32 = extract_vector_elt t23, Constant:i32<2> t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27 t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1 So now we get t27: i32 = extract_vector_elt t23, Constant:i32<2> which is what we want. Similarly, the new int_to_vec testcase exposes a bug where we cast the other direction. Then we instead need to add a left shift before the bitcast on big-endian targets for the bits in the input integer to end up at the expected place in the vector. 
Reviewers: bogner, spatel, craig.topper, t.p.northover, dmgreen, efriedma, SjoerdMeijer, samparker Reviewed By: efriedma Subscribers: eli.friedman, bjope, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70942	2019-12-10 11:22:35 +01:00
Mikael Holmen	4d280d3ac0	Add testcases exposing PR44135	2019-12-10 11:22:35 +01:00
Georgii Rymar	dac5ddb482	[llvm-readelf/llvm-readobj] - Improved the error reporting in a few methods related to versioning. I was investigating a change previously discussed that eliminates excessive empty lines from the output when we report warnings and errors (https://reviews.llvm.org/D70826#inline-639055) and found that we need this refactoring or alike to achieve that. The problem is that some of our functions that find symbol versions just fail instead of returning errors or printing warnings. Another problem is that they might print a warning on the same line as the regular output. In this patch I've split the getting of the version information from the dumping of it for GNU printVersionSymbolSection(). I had to change a few methods to return Error or Expected<> to do that properly. Differential revision: https://reviews.llvm.org/D71118	2019-12-10 13:08:18 +03:00
Hans Wennborg	bfb53c55b8	Add more diff -b to roundtrip-compress.test It was missing on the first test invocation. The flag is necessary to ignore line-ending differences on Windows.	2019-12-10 10:32:16 +01:00
Georgii Rymar	dbf520f617	[llvm-readobj][test] - Move platform specific test cases and their inputs to separate folders. This creates the next subfolders in the test directory: "COFF", "ELF", "MachO", "wasm". I've also removed platform specific prefixes, like "coff-*". One unused binary was removed as well: `Inputs/relocs.obj.elf-mips` Differential revision: https://reviews.llvm.org/D71203	2019-12-10 11:36:23 +03:00
Yonghong Song	4448125007	[BPF] Support to emit debugInfo for extern variables extern variable usage in BPF is different from that in traditional pure user space applications. Recent discussion on the linux bpf mailing list has two use cases where debug info types are required to use extern variables: - extern types are required to have a suitable interface in libbpf (bpf loader) to provide kernel config parameters to bpf programs. https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t - extern types are required so the kernel bpf verifier can more precisely verify a program which uses external functions. This means that later linking with the actual external function does not require reverification. https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed This patch adds bpf support to consume such info into BTF, which can then be used by the bpf loader. Function processFuncPrototypes() only adds extern function definitions into BTF. The functions with actual definitions have been added to BTF in some other places. Differential Revision: https://reviews.llvm.org/D70697	2019-12-09 21:53:29 -08:00
Jonas Devlieghere	d9466653e4	[llvm/dwarfdump] Use the architecture string to filter. Currently dwarfdump uses the ArchType to filter out architectures, which is problematic for architectures like arm64e and x86_64h that map back to arm64 and x86_64 respectively. The result is that the filter doesn't work for these architectures because it matches all the variants. This is especially bad because usually these architectures are the reason to use the filter in the first place. Instead, we should match the architecture based on the string name. This means the filter works for the values printed by dwarfdump. It has the unfortunate side effect of not working for aliases, like AArch64, but I think that's worth the trade-off. rdar://53653014 Differential revision: https://reviews.llvm.org/D71230	2019-12-09 17:17:01 -08:00
Liu, Chen3	bbf7860b93	add support for strict operation fpextend/fpround/fsqrt on X86 backend Differential Revision: https://reviews.llvm.org/D71184	2019-12-10 09:04:28 +08:00
Eric Christopher	9c6b7f68b8	Revert "[ARM][MVE] Add intrinsics for immediate shifts." and two follow-on commits: one warning fix and one functionality. As it's breaking at least the lto bot: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio This reverts commits: `8d70f3c933` `ff4dceef92` `d97b3e3e65`	2019-12-09 16:47:38 -08:00
Eli Friedman	7c69a03c56	[ConstantFold][SVE] Fix constant folding for shufflevector. Don't try to fold away shuffles which can't be folded. Fix creation of shufflevector constant expressions. Differential Revision: https://reviews.llvm.org/D71147	2019-12-09 15:31:50 -08:00
Dávid Bolvanský	584ed88226	[Codegen][X86] Modernize/regenerate old tests. NFCI. Summary: Switch to FileCheck where possible. Adjust tests so they can be easily regenerated by update scripts. Reviewers: craig.topper, spatel, RKSimon Reviewed By: spatel Subscribers: MatzeB, qcolombet, arphaman, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71211	2019-12-10 00:27:46 +01:00
Eli Friedman	f1ddef34f1	[AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors. The generated sequence with whilelo is unintuitive, but it's the best I could come up with given the limited number of SVE instructions that interact with scalar registers. The other sequence I was considering was something like dup+cmpne, but an extra scalar instruction seems better than an extra vector instruction. Differential Revision: https://reviews.llvm.org/D71160	2019-12-09 15:09:33 -08:00
Johannes Doerfert	af52d5a04c	[IPConstantProp][NFCI] Improve and modernize tests Summary: This change is in preparation to reuse these tests for the Attributor. It is mainly to remove UB, make it clear what is tested, and use "modern" run lines. Reviewers: fhahn, efriedma, mssimpso, davide Subscribers: bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69747	2019-12-09 15:24:08 -06:00
Johannes Doerfert	a7d992c0f2	[ValueTracking] Allow context-sensitive nullness check for non-pointers Summary: Same as D60846 and D69571 but with a fix for the problem encountered after them. Both times it was a missing context adjustment in the handling of PHI nodes. The reproducers created from the bugs that caused the old commits to be reverted are included. Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans Subscribers: hiraditya, bollu, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71181	2019-12-09 15:15:52 -06:00
Hiroshi Yamauchi	d9ae493937	[PGO][PGSO] Instrument the code gen / target passes. Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). A second try after reverted D71072. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71149	2019-12-09 12:42:59 -08:00
Sanjay Patel	92f94b762a	[InstCombine] add tests for shuffle with insertelement operand; NFC	2019-12-09 14:27:03 -05:00
Jinsong Ji	3d41a58eac	[PowerPC][NFC] Rename ANDI(S)o8 to ANDI(S)8o Summary: This was found during https://reviews.llvm.org/D70758. All the other record forms have the suffix o at the end. ANDIo8 and ANDISo8 are the only two that put o before 8. This patch renames them to be consistent with the others. Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg Reviewed By: jhibbits Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70928	2019-12-09 19:21:34 +00:00
Mark Murray	fc3417cb5a	[ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics. Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen, miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71198	2019-12-09 17:41:47 +00:00
Mark Murray	2eb61fa5d6	[ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT|POLY) intrinsics. Summary: Add VMULL[BT]Q_(INT|POLY) intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71066	2019-12-09 17:41:47 +00:00
Simon Tatham	d97b3e3e65	[ARM][MVE] Add intrinsics for immediate shifts. Summary: This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-09 15:44:09 +00:00
James Henderson	01d8bb4939	[test][llvm-cxxfilt] Add missing '-n' See also `e84468c1f`.	2019-12-09 15:06:41 +00:00
James Henderson	2815390532	[test][llvm-cxxfilt] Fix darwin build bot When committing `dba420bc05`, I missed that a darwin-specific change had been recently introduced into llvm-cxxfilt, which my change ignored and consequently broke the darwin build bot. This change fixes this issue as well as improving naming/commenting of things related to this point so that people are less likely to run into the same issue as I did.	2019-12-09 14:01:14 +00:00
Sam Elliott	cb664baf50	[RISCV] Fix mir-target-flags.ll	2019-12-09 13:51:08 +00:00
Sam Elliott	c20930a724	[RISCV] Machine Operand Flag Serialization Summary: These hooks ensure that the RISC-V backend can serialize and parse MIR correctly. Reviewers: jrtc27, luismarques Reviewed By: luismarques Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70666	2019-12-09 13:18:32 +00:00
Jeremy Morse	00e238896c	[DebugInfo] Nerf placeDbgValues, with prejudice CodeGenPrepare::placeDebugValues moves variable location intrinsics to be immediately after the Value they refer to. This makes tracking of locations very easy; but it changes the order in which assignments appear to the debugger, from the source program's order to the order in which the optimised program computes values. This then leads to PR43986 and PR38754, where variable locations that were in a conditional block are made unconditional, which is highly misleading. This patch adjusts placeDbgValues to only re-order variable location intrinsics if they use a Value before it is defined, significantly reducing the damage that it does. This is still not 100% safe, but the rest of CodeGenPrepare needs polishing to correctly update debug info when optimisations are performed to fully fix this. This will probably break downstream debuginfo tests -- if the instruction-stream position of variable location changes isn't the focus of the test, an easy fix should be to manually apply placeDbgValues' behaviour to the failing tests, moving dbg.value intrinsics next to SSA variable definitions thus: %foo = inst1 %bar = ... %baz = ... void call @llvm.dbg.value(metadata i32 %foo, ... to %foo = inst1 void call @llvm.dbg.value(metadata i32 %foo, ... %bar = ... %baz = ... This should return your test to exercising whatever it was testing before. Differential Revision: https://reviews.llvm.org/D58453	2019-12-09 12:52:10 +00:00
James Henderson	dba420bc05	[test][tools] Add missing and improve testing Mostly this adds testing for certain aliases in more explicit ways. There are also a few tidy-ups, and additions of missing testing, where the feature was either not tested at all, or not tested explicitly and sufficiently. Reviewed by: MaskRay, rupprecht, grimar Differential Revision: https://reviews.llvm.org/D71116	2019-12-09 12:24:23 +00:00
Mikhail Maltsev	0d1490bf6a	[ARM][MVE] Add complex vector intrinsics Summary: This patch adds intrinsics for the following MVE instructions: * VCADD, VHCADD * VCMUL * VCMLA Each of the above 3 groups has a corresponding new LLVM IR intrinsic. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71190	2019-12-09 12:05:59 +00:00
David Green	d6642ed1c8	[ARM] Add missing REQUIRES: asserts to test. NFC	2019-12-09 11:43:43 +00:00
David Green	b1aba0378e	[ARM] Enable MVE masked loads and stores With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does. Differential Revision: https://reviews.llvm.org/D70968	2019-12-09 11:37:34 +00:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
David Green	f008b5b8ce	[ARM] Additional tests and minor formatting. NFC This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.	2019-12-09 10:24:33 +00:00
David Stenberg	6965f835b4	[DebugInfo] Make describeLoadedValue() reg aware Summary: Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes situations in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers. 
Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: ormris, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D70431	2019-12-09 10:47:49 +01:00
David Stenberg	f3696533f2	Revert "[DebugInfo] Make describeLoadedValue() reg aware" This reverts commit `3cd93a4efc`. I'll recommit with a well-formatted arcanist commit message.	2019-12-09 10:45:13 +01:00
David Stenberg	3cd93a4efc	[DebugInfo] Make describeLoadedValue() reg aware Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes a case in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers.	2019-12-09 10:44:17 +01:00
Hans Wennborg	a38396939c	Revert `393dacacf7` "[ARM] Enable TypePromotion by default" This caused "Too many bits for uint64_t" asserts when building Chromium. See https://crbug.com/1031978#c2 for a reproducer. I'll follow up on the llvm-commits thread with a creduced version. > ARMCodeGenPrepare has already been generalized and renamed to > TypePromotion. We've had it enabled and tested downstream for a > while, so enable it by default. > > Differential Revision: https://reviews.llvm.org/D70998	2019-12-09 09:39:31 +01:00
Amaury Séchet	d7aded3937	[PowerPC] Automatically generate store-constant.ll . NFC	2019-12-09 01:08:18 +01:00
Sanjay Patel	1c4dd3ae2f	[InstSimplify] fold copysign with negated operand, part 2 This is another transform suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 Unlike rG12f39e0fede9, it doesn't look like the backend matches this variant.	2019-12-08 10:16:29 -05:00
Sanjay Patel	12f39e0fed	[InstSimplify] fold copysign with negated operand This is another transform suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 The backend for some targets already manages to get this if it converts copysign to bitwise logic.	2019-12-08 10:08:02 -05:00
Kristina Bessonova	68f464ac2e	[llvm-dwarfdump][Statistics] Unify coverage statistic computation Summary: The patch removes OffsetToFirstDefinition in the 'scope bytes total' statistic computation. Thus it unifies the way the scope and the coverage buckets are computed. The rationale behind that is the following: 1. OffsetToFirstDefinition was used to calculate the variable's life range. However, there is no simple way to do it accurately, so the scope calculated this way might be misleading. See D69027 for more details on the subject. 2. Both 'scope bytes total' and coverage buckets seem to be intended to represent the same data in different ways. Otherwise, the statistics might be controversial and confusing. Note that the approach gives up a thorough evaluation of debug information completeness (i.e. coverage buckets by themselves don't tell how good the debug information is). Only changes in coverage over time make a 'physical' sense. Reviewers: djtodoro, aprantl, vsk, dblaikie, avl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70548	2019-12-08 15:46:49 +03:00
David Green	792fab343b	[ARM] Attempt to use whole register vmovs for MVE shuffles. MVE doesn't have the range of shuffle instructions available in Neon. We also cannot use the trick of cutting a difficult vector shuffle in half to simplify things. Instead we need to be more careful about how we lower shuffles. This patch adds an extra combine that attempts to find "whole lane" vmovs when lowering shuffles of smaller types. This helps us make some shuffles a lot simpler, generating single lane movs for the parts that can make use of it, falling back to the original shuffle for the rest. Differential Revision: https://reviews.llvm.org/D69509	2019-12-08 10:53:54 +00:00
David Green	3a6eb5f160	[ARM] Disable VLD4 under MVE Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109	2019-12-08 10:37:29 +00:00
Florian Hahn	c491949694	[LV] Pick correct BB as insert point when fixing PHI for FORs. Currently we fail to pick the right insertion point when PreviousLastPart of a first-order-recurrence is a PHI node not in the LoopVectorBody. This can happen when PreviousLastPart is produced in a predicated block. In that case, we should pick the insertion point in the BB the PHI is in. Fixes PR44020. Reviewers: hsaito, fhahn, Ayal, dorit Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D71071	2019-12-07 19:32:00 +00:00
Yonghong Song	5ea611daf9	[BPF] Support weak global variables for BTF Generate types for global variables with "weak" attribute. Keep allocation scope the same for both weak and non-weak globals as ELF symbol table can determine whether a global symbol is weak or not. Differential Revision: https://reviews.llvm.org/D71162	2019-12-07 08:58:19 -08:00
Ulrich Weigand	9db13b5a7d	[FPEnv] Constrained FCmp intrinsics This adds support for constrained floating-point comparison intrinsics. Specifically, we add: declare <ty2> @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) declare <ty2> @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) The first variant implements an IEEE "quiet" comparison (i.e. we only get an invalid FP exception if either argument is a SNaN), while the second variant implements an IEEE "signaling" comparison (i.e. we get an invalid FP exception if either argument is any NaN). The condition code is implemented as a metadata string. The same set of predicates as for the fcmp instruction is supported (except for the "true" and "false" predicates). These new intrinsics are mapped by SelectionDAG codegen onto two new ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again representing quiet vs. signaling comparison operations. Otherwise those nodes look like SETCC nodes, with an additional chain argument and result as usual for strict FP nodes. The patch includes support for the common legalization operations for those nodes. The patch also includes full SystemZ back-end support for the new ISD nodes, mapping them to all available SystemZ instructions to fully implement strict semantics (scalar and vector). Differential Revision: https://reviews.llvm.org/D69281	2019-12-07 11:28:39 +01:00
Kai Luo	884351547d	[PowerPC] Fix MI peephole optimization for splats Summary: This patch fixes an issue where the PPC MI peephole optimization pass incorrectly removes a vector swap. Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately fed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered. This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register. Here is an example in pseudo-MIR of what happens in the test case added in this patch: Before PPC MI peephole optimization: ``` %arg = XVADDDP %0, %1 $f1 = COPY %arg.sub_64 call double rint(double) %res.first = COPY $f1 %vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64 %arg.swapped = XXPERMDI %arg, %arg, 2 $f1 = COPY %arg.swapped.sub_64 call double rint(double) %res.second = COPY $f1 %vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64 %vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0 %vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2 ; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ] ``` After optimization: ``` ; ... %vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0 ; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1 ; so the pass replaces the swap with a copy: %vec.res = COPY %vec.res.splat ; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ] ``` As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat. Committed for vddvss (Colin Samples). Thanks Colin for the patch!
Differential Revision: https://reviews.llvm.org/D69497	2019-12-07 14:51:20 +08:00
Amara Emerson	c77b441140	[AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates. Only implemented for the type combinations already supported for G_SHL. Differential Revision: https://reviews.llvm.org/D71153	2019-12-06 16:24:57 -08:00
Sam Clegg	b4f4e370b5	[WebAssembly][MC] Support .import_name/.import_field asm directives Convert the MC test to use asm rather than bitcode. This is a precursor to https://reviews.llvm.org/D70520. Differential Revision: https://reviews.llvm.org/D70877	2019-12-06 15:09:56 -08:00
Amara Emerson	84fdd9d7a5	[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho. The xor'ing behaviour is only used for msvc/crt environments, when we're targeting macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent. Differential Revision: https://reviews.llvm.org/D71095	2019-12-06 14:44:56 -08:00
Craig Topper	28b573d249	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence: // Sel = Src < 0x8000000000000000 // Val = select Sel, Src, Src - 0x8000000000000000 // Ofs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val) ^ Ofs The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest to use the following sequence: // Sel = Src < 0x8000000000000000 // FltOfs = select Sel, 0, 0x8000000000000000 // IntOfs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val - FltOfs) ^ IntOfs In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.) 
Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper Differential Revision: https://reviews.llvm.org/D67105	2019-12-06 14:11:04 -08:00
Sanjay Patel	d5abaaf140	[InstSimplify] add tests for copysign with fneg operand; NFC	2019-12-06 16:23:44 -05:00
Reid Kleckner	c089f02898	[X86] Don't setup and teardown memory for a musttail call Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue. This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired. If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change. In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71097	2019-12-06 12:58:54 -08:00
Hiroshi Yamauchi	2eb30fafa5	Revert "[PGO][PGSO] Instrument the code gen / target passes." This reverts commit `9a0b5e1407`. This seems to break buildbots.	2019-12-06 12:17:32 -08:00
Wenlei He	7b61ae68ec	[AutoFDO] Inline replay for cold/small callees from sample profile loader Summary: Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CGSCC inliner later. The oscillation between sample profile loader's inlining and regular CGSCC inlining causes unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner has to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout. This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70750	2019-12-06 11:44:45 -08:00
Alina Sbirlea	c7faa68142	Revert "ARM-Darwin: keep the frame register reserved even if not updated." This reverts commit `a7d90af1be`. This revision came back as the root-cause for crashes in internal ARM-IOS apps. Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.	2019-12-06 10:59:26 -08:00
Sanjay Patel	7ff0fcb53f	[x86] add cost model special-case for insert/extract from element 0 This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023	2019-12-06 13:50:25 -05:00
Hiroshi Yamauchi	9a0b5e1407	[PGO][PGSO] Instrument the code gen / target passes. Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71072	2019-12-06 10:43:39 -08:00
Guozhi Wei	72942459d0	[MBP] Avoid tail duplication if it can't bring benefit Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases where duplication doesn't increase fallthrough, or it brings too much size overhead. To overcome these two issues, I add two checks in function canTailDuplicateUnplacedPreds: 1) make sure there is at least one duplication in the current work set; 2) the number of duplications should not exceed the number of successors. The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain. Differential Revision: https://reviews.llvm.org/D64376	2019-12-06 09:53:53 -08:00
diggerlin	4a7e00df34	[AIX][XCOFF] created a test case to verify the raw text section of xcoffobject file Summary: The patch https://reviews.llvm.org/D66969 needs a test case to verify that the text section written to the XCOFF object file is correct, but at that time there was no LLVM disassembly tool that could dump an XCOFF object file. Since the patch https://reviews.llvm.org/D70255 was committed, we have a tool for it, so this patch creates the test case. Reviewers: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D70719	2019-12-06 10:12:09 -05:00
Cullen Rhodes	2c63e8e36d	[AArch64] Fix a bug with jump table generation Summary: When trying to calculate the offsets for the jump table entries we fail to take into account the block alignment, which could be greater than 4 bytes. This led to cases where the jump table offset was too big to fit in a byte. Reviewers: t.p.northover, sdesmalen, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits Committed on behalf of David Sherwood (david-arm) Tags: #llvm Differential Revision: https://reviews.llvm.org/D70533	2019-12-06 14:31:53 +00:00
Jeremy Morse	4650b2f369	Attempt to fix a debuginfo test that wasn't as generic as I thought An ARM buildbot croaks when this test doesn't have a triple specified: http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/12021/ Move the test to the X86 directory and put an x86_64 triple on the llc command line.	2019-12-06 12:51:58 +00:00
Georgii Rymar	18cf93a6ed	[llvm-readobj][llvm-readelf] - Refactor parsing of the SHT_GNU_versym section. This introduces a new helper which is used to parse the SHT_GNU_versym section. The LLVM and GNU style implementations now use it to share the logic. Differential revision: https://reviews.llvm.org/D71054	2019-12-06 15:35:05 +03:00
Cullen Rhodes	b31a531f9b	[AArch64][SVE2] Implement while comparison intrinsics Summary: Adds the following intrinsics: * whilege, whilegt, whilehi, whilehs Reviewers: sdesmalen, rovka, dancgr, efriedma, rengolin, huntergr Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70909	2019-12-06 11:29:34 +00:00
Georgii Rymar	cd2c409ceb	[llvm-readobj] - Implement --dependent-libraries flag. There is no way to dump SHT_LLVM_DEPENDENT_LIBRARIES sections currently. This patch implements this. The section is described here: https://llvm.org/docs/Extensions.html#sht-llvm-dependent-libraries-section-dependent-libraries Differential revision: https://reviews.llvm.org/D70665	2019-12-06 14:28:29 +03:00
Jeremy Morse	c93a9b15ce	[DebugInfo][CGP] Update dbg.values when sinking address computations One of CodeGenPrepare's optimizations is to duplicate address calculations into basic blocks, so that as much information as possible can be folded into memory addressing operands. This is great -- but the dbg.value variable location intrinsics are not updated in the same way. This can lead to dbg.values referring to address computations in other blocks that will never be encoded into the DAG, while duplicate address computations are performed locally that could be used by the dbg.value. Some of these (such as non-constant-offset GEPs) can't be salvaged past. Fix this by, whenever we duplicate an address computation into a block, looking for dbg.value users of the original memory address in the same block, and redirecting those to the local computation. Differential Revision: https://reviews.llvm.org/D58403	2019-12-06 11:27:19 +00:00
Ulrich Weigand	b3009edcf3	[X86] Regenerate test to fix build bot failures After my recent commit `daee549` the following test case is failing: CodeGen/X86/vector-constrained-fp-intrinsics.ll Not sure why I didn't catch this earlier, seems to be affected by other changes that came in recently. Fixed by regenerating the test again. Sorry for the disruption!	2019-12-06 12:11:56 +01:00
Cullen Rhodes	bb8c679f4b	[AArch64][SVE] Implement integer compare intrinsics Summary: Adds intrinsics for the following: * cmphs, cmphi * cmpge, cmpgt * cmpeq, cmpne * cmplt, cmple * cmplo, cmpls Includes a minor change to `TLI.getMemValueType` that fixes a crash due to the scalable flag being dropped. Reviewers: sdesmalen, efriedma, rengolin, rovka, dancgr, huntergr Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70889	2019-12-06 10:39:06 +00:00
Ulrich Weigand	daee549b17	[FPEnv][SelectionDAG] Relax chain requirements This patch implements the following changes: 1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats each constrained intrinsic like a global barrier (e.g. a function call) and fully serializes all pending chains. This is actually not required; it is allowed for constrained intrinsics to be reordered w.r.t one another or (nonvolatile) memory accesses. The MI-level scheduler already allows for that flexibility, so it makes sense to allow it at the DAG level as well. This patch therefore changes the way chains for constrained intrinsics are created, and handles them basically like load operations are handled. This has the effect that constrained intrinsics are no longer serialized against one another or (nonvolatile) loads. They are still serialized against stores, but that seems hard to change with the current DAG chain setup, and it also doesn't seem to be a big problem preventing DAG 2) The OPC_CheckFoldableChainNode check requires that each of the intermediate nodes in a multi-node pattern match only has a single use. This check tends to fail if those intermediate nodes are strict operations as those have a chain output that typically indeed has another use. However, we don't really need to consider chains here at all, since they will all be rewritten anyway by UpdateChains later. Other parts of the matcher therefore already ignore chains, but this hasOneUse check doesn't. This patch replaces hasOneUse by a custom test that verifies there is no more than one use of any non-chain output value. In theory, this change could affect code unrelated to strict FP nodes, but at least on SystemZ I could not find any single instance of that happening 3) The SystemZ back-end currently does not allow matching multiply-and- extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for strict FP operations.
This was not possible in the past due to the problems described under 1) and 2) above. With those issues fixed, it is now possible to fully support those instructions in strict mode as well, and this patch does so. Differential Revision: https://reviews.llvm.org/D70913	2019-12-06 11:02:11 +01:00
Daniil Suchkov	c4d8c6319f	[LCSSA] Don't use VH callbacks to invalidate SCEV when creating LCSSA phis In general ValueHandleBase::ValueIsRAUWd shouldn't be called when not all uses of the value were actually replaced, though, currently formLCSSAForInstructions calls it when it inserts LCSSA-phis. Calls of ValueHandleBase::ValueIsRAUWd were added to LCSSA specifically to update/invalidate SCEV. In the best case these calls duplicate some of the work already done by SE->forgetValue, though in case when SCEV of the value is SCEVUnknown, SCEV replaces the underlying value of SCEVUnknown with the new value (i.e. acts like LCSSA-phi actually fully replaces the value it is created for), which leads to SCEV being corrupted because LCSSA-phi rarely dominates all uses of its inputs. Fixes bug https://bugs.llvm.org/show_bug.cgi?id=44058. Reviewers: fhahn, efriedma, reames, sanjoy.google Reviewed By: fhahn Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70593	2019-12-06 13:21:49 +07:00
Huihui Zhang	381d3c5c45	[ConstantFold][SVE] Skip scalable vectors in ConstantFoldInsertElementInstruction. Summary: Should not constant fold insertelement instruction for scalable vector type. Reviewers: huntergr, sdesmalen, spatel, levedev.ri, apazos, efriedma, willlovett Reviewed By: efriedma, spatel Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70985	2019-12-05 19:43:19 -08:00
Liu, Chen3	3041434450	Add strict fp support for instructions fadd/fsub/fmul/fdiv Differential Revision: https://reviews.llvm.org/D68757	2019-12-06 09:44:33 +08:00
Teresa Johnson	54a3c2a81e	[ThinLTO] Add option to disable readonly/writeonly attribute propagation Summary: Add an option to allow the attribute propagation on the index to be disabled, to allow a workaround for issues (such as that fixed by D70977). Also move the setting of the WithAttributePropagation flag on the index into propagateAttributes(), and remove some old stale code that predated this flag and cleared the maybe read/write only bits when we need to disable the propagation (previously only when importing disabled, now also when the new option disables it). Reviewers: evgeny777, steven_wu Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70984	2019-12-05 16:33:54 -08:00
Quentin Colombet	2ec71ea7c7	[RegisterCoalescer] Fix the creation of subranges when rematerialization is used * Context * During register coalescing, we use rematerialization when coalescing is not possible. That means we may rematerialize a super register when only a smaller register is actually used. E.g., 0B v1 = ldimm 0xFF 1B v2 = COPY v1.low8bits 2B = v2 => 0B v1 = ldimm 0xFF 1B v2 = ldimm 0xFF 2B = v2.low8bits Where xB are the slot indexes. Here v2 grew from an 8-bit register to a 16-bit register. When that happens and subregister liveness is enabled, we create subranges for the newly created value. E.g., before remat, the live range of v2 looked like: main range: [1r, 2r) (Reads v2 is defined at index 1 slot register and used before the slot register of index 2) After remat, it should look like: main range: [1r, 2r) low 8 bits: [1r, 2r) high 8 bits: [1r, 1d) <-- dead def I.e., the unused lanes of v2 should be marked as dead definition. * The Problem * Prior to this patch, the live-ranges from the previous example would have the full live-range for all subranges: main range: [1r, 2r) low 8 bits: [1r, 2r) high 8 bits: [1r, 2r) <-- too long * The Fix * Technically, the code that this patch changes is not wrong: When we create the subranges for the newly rematerialized value, we create only one subrange for the whole bit mask. In other words, at this point v2 live-range looks like this: main range: [1r, 2r) low & high: [1r, 2r) Then, it gets wrong when we call LiveInterval::refineSubRanges on low 8 bits: main range: [1r, 2r) low 8 bits: [1r, 2r) high 8 bits: [1r, 2r) <-- too long Ideally, we would like LiveInterval::refineSubRanges to be able to do the right thing and mark the dead lanes as such. However, this is not possible, because by the time we update / refine the live ranges, the IR hasn't been updated yet, therefore we actually don't have enough information to do the right thing.
Another option to fix the problem would have been to call LiveIntervals::shrinkToUses after the IR is updated. This is not desirable as this may have a noticeable impact on compile time. Instead, what this patch does is when we create the subranges for the rematerialized value, we explicitly create one subrange for the lanes that were used before rematerialization and one for the lanes that were not used. The used one inherits the live range of the main range and the unused one is just created empty. The existing rematerialization code then detects that the unused ones are not live and it correctly sets dead def intervals for them. https://llvm.org/PR41372	2019-12-05 16:32:30 -08:00
Wenlei He	532196d811	[AutoFDO] Top-down Inlining for specialization with context-sensitive profile Summary: AutoFDO's sample profile loader processes functions in arbitrary source code order, so if I change the order of two functions in source code, the inline decision can change. This also prevented the use of context-sensitive profile to do specialization while inlining. This commit enforces SCC top-down order for sample profile loader. With this change, we can now do specialization, as illustrated by the added test case: Say if we have A->B->C and D->B->C call path, we want to inline C into B when root inliner is B, but not when root inliner is A or D, this is not possible without enforcing top-down order. E.g. Once C is inlined into B, A and D can only choose to inline (B->C) as a whole or nothing, but what we want is only inline B into A and D, not its recursive callee C. If we process functions in top-down order, this is no longer a problem, which is what this commit is doing. This change is guarded with a new switch "-sample-profile-top-down-load" for tuning, and it depends on D70653. Eventually, top-down can be the default order for sample profile loader. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits, tejohnson Tags: #llvm Differential Revision: https://reviews.llvm.org/D70655	2019-12-05 16:07:01 -08:00
Wenlei He	e503fd85d3	[AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function Summary: When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit tries to keep context-sensitive profile for such cases: - Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile. - Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay. A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70653	2019-12-05 15:57:55 -08:00
David Blaikie	decee04e63	DebugInfo: Fix LTO+DWARFv5 loclists The loclists_table_base was being overwritten for each CU even though only one loclists contribution is made so everything but the last CU would have a label that was never defined and fail to assemble.	2019-12-05 12:47:54 -08:00
David Tenty	1ea1e053f6	[AIX] Make sure to use QualNames for external global objects Summary: Previously we only handled the case where the csect hadn't been set up yet, so we'd hit an assert later on. Reviewers: jasonliu, DiggerLin, stevewan Reviewed By: jasonliu Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71032	2019-12-05 15:22:53 -05:00
Amy Huang	23e63a906d	Use diff -b on zlib tests so they pass on Windows Reviewers: hubert.reinterpretcast Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71036	2019-12-05 11:32:58 -08:00
Florian Hahn	19071173fc	Revert "[DSE] Fix for a dangling point bug in DeadStoreElimination." The commit causes a failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20911 This reverts commit `1847fd9d85`.	2019-12-05 19:29:21 +00:00
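
Note on commit 28b573d249 ([TargetLowering] Fix another potential FPE in expandFP_TO_UINT): the difference between the two sequences quoted in that message can be checked numerically. The following is only a rough Python model for the 64-bit, double-precision case, not the SelectionDAG code. The helper names (fp_to_sint, sub_is_exact, fp_to_uint_old, fp_to_uint_new) are made up for illustration, inputs are assumed finite and in range for FP_TO_UINT, and "would raise Inexact" is approximated by asking whether the subtraction needs rounding.

```python
from fractions import Fraction

SIGN_BIT = 1 << 63              # 0x8000000000000000 as an integer
SIGN_BIT_F = float(SIGN_BIT)    # 2**63 is exactly representable as a double
MASK64 = (1 << 64) - 1


def fp_to_sint(x: float) -> int:
    """Model a signed 64-bit conversion: truncate toward zero and return
    the result as a 64-bit two's-complement bit pattern."""
    if not (-SIGN_BIT_F <= x < SIGN_BIT_F):
        raise OverflowError("value out of range for a signed 64-bit integer")
    return int(x) & MASK64


def sub_is_exact(a: float, b: float) -> bool:
    """True if the double-precision result of a - b needs no rounding."""
    return Fraction(a) - Fraction(b) == Fraction(a - b)


def fp_to_uint_old(src: float):
    """Old sequence: the subtraction is evaluated even when it is not selected."""
    sel = src < SIGN_BIT_F
    diff = src - SIGN_BIT_F     # computed regardless of sel, as with a select
    val = src if sel else diff
    ofs = 0 if sel else SIGN_BIT
    return fp_to_sint(val) ^ ofs, sub_is_exact(src, SIGN_BIT_F)


def fp_to_uint_new(src: float):
    """New sequence: select the offset first, then subtract once."""
    sel = src < SIGN_BIT_F
    flt_ofs = 0.0 if sel else SIGN_BIT_F
    int_ofs = 0 if sel else SIGN_BIT
    return fp_to_sint(src - flt_ofs) ^ int_ofs, sub_is_exact(src, flt_ofs)


if __name__ == "__main__":
    for value in (1e-30, 3.75, 2.0**63, 2.0**63 + 2048.0):
        print(value, fp_to_uint_old(value), fp_to_uint_new(value))
```

Each function returns the converted integer together with a flag saying whether its subtraction was exact. Both sequences agree on the integer result, but for a very small positive input such as 1e-30 only the old sequence performs an inexact subtraction, which is the case the commit message describes.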

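Note on commit 2ec71ea7c7 ([RegisterCoalescer] Fix the creation of subranges when rematerialization is used): the v2 example from that message can be replayed with a toy model. This is a sketch under simplifying assumptions, not LLVM's LiveInterval API. SubRange, refine, before_fix and after_fix are hypothetical names, lane masks are plain integers, and segments are written as slot-index strings in the message's notation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Segment = Tuple[str, str]   # half-open [start, end) in slot-index notation

LOW8, HIGH8 = 0x00FF, 0xFF00
FULL16 = LOW8 | HIGH8


@dataclass
class SubRange:
    lane_mask: int
    segments: List[Segment] = field(default_factory=list)


def refine(subranges: List[SubRange], lane_mask: int) -> List[SubRange]:
    """Split each subrange into the lanes inside lane_mask and the rest.
    Both pieces keep a copy of the original segments, which is why the
    unused lanes end up with a live range that is too long."""
    out = []
    for sr in subranges:
        inside = sr.lane_mask & lane_mask
        outside = sr.lane_mask & ~lane_mask
        for mask in (inside, outside):
            if mask:
                out.append(SubRange(mask, list(sr.segments)))
    return out


def before_fix(main: List[Segment], used_mask: int) -> List[SubRange]:
    """One subrange for the whole mask, refined afterwards."""
    return refine([SubRange(FULL16, list(main))], used_mask)


def after_fix(main: List[Segment], used_mask: int, def_index: int) -> List[SubRange]:
    """Used lanes inherit the main range; unused lanes start empty."""
    used = SubRange(used_mask, list(main))
    unused = SubRange(FULL16 & ~used_mask, [])
    # Stand-in for the existing remat code: lanes that are not live
    # receive a dead def at the defining instruction.
    if not unused.segments:
        unused.segments = [(f"{def_index}r", f"{def_index}d")]
    return [used, unused]


if __name__ == "__main__":
    main_range = [("1r", "2r")]
    print(before_fix(main_range, LOW8))
    print(after_fix(main_range, LOW8, 1))
```

With the main range [1r, 2r) and only the low 8 bits used, before_fix leaves the high 8 bits live over [1r, 2r), while after_fix gives them the dead def [1r, 1d), matching the two layouts shown in the commit message.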
67148 Commits