llvm-project

Commit Graph

Author	SHA1	Message	Date
Reid Kleckner	85ba5f637a	Rename TTI::getIntImmCost for instructions and intrinsics Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure. Split off from D71320 Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D71381	2019-12-11 18:00:20 -08:00
Nico Weber	d5a43ce688	gn build: (manually) merge `d23c61490c`	2019-12-11 20:41:18 -05:00
Sanjay Patel	83e1bd36be	[AArch64][x86] add tests for possible infinite loops in DAGCombiner; NFC This is a reduction of a test that failed (infinite looped) with rGd1f0bdf2d2df (subsequently reverted). I've duplicated it for 2 targets to increase coverage - everything down here is wobbly.	2019-12-11 19:41:42 -05:00
Vedant Kumar	56232f950d	Revert "[DWARF] Allow cross-CU references of subprogram definitions" This reverts commit `30038da15b`. It causes the stage2 thinLTO bot to fail with: Assertion failed: (CU.getDIE(CalleeSP) && "Expected declaration subprogram DIE for callee") rdar://57840415	2019-12-11 15:55:48 -08:00
Julian Lettner	f38b543b97	[lit] Improve formatting of error messages. NFC	2019-12-11 14:39:39 -08:00
David Tenty	70d14255df	Don't call export_symbols.py with duplicate libs Summary: export_symbols.py discards duplicate symbols, assuming they have public definitions, so if we end up calling it with duplicate libraries we will end up with an inaccurate export list. Reviewers: jasonliu, stevewan, john.brawn Reviewed By: john.brawn Subscribers: mgorny, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70918	2019-12-11 17:23:31 -05:00
Sanjay Patel	cdf5cfea8e	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `d1f0bdf2d2`. The patch can cause infinite loops in DAGCombiner.	2019-12-11 16:56:58 -05:00
Craig Topper	4b452952fe	[LegalizeTypes] In SoftenFloatRes_FP_EXTEND, move the check for input already being promoted above the check for fp16 converting to something other than fp32. The fp16 to larger than fp32 inserts an extend that need to re-legalized if fp16 is promoted. But if we check for fp16 promotion first, then we can avoid emiting the fp_extend all together.	2019-12-11 12:48:08 -08:00
Nikita Popov	fe593fe15f	[ADT] Fix SmallDenseMap assertion with large InlineBuckets Fixes issue encountered in D56362, where I tried to use a SmallSetVector<Instruction*, 128> with an excessively large number of inline elements. This triggers an "Must allocate more buckets than are inline" assertion inside allocateBuckets() under certain usage patterns. The issue is as follows: The grow() method is used either to grow the map, or to rehash it and remove tombstones. The latter is done if the fraction of empty (non-used, non-tombstone) elements is below 1/8. In this case grow() is invoked with the current number of buckets. This is currently incorrectly handled for dense maps using the small rep. The current implementation will switch them over to the large rep, which violates the invariant that the large rep is only used if there are more than InlineBuckets buckets. This patch fixes the issue by staying in the small rep and only moving the buckets. An alternative, if we do want to switch to the large rep in this case, would be to relax the assertion in allocateBuckets(). Differential Revision: https://reviews.llvm.org/D56455	2019-12-11 21:41:14 +01:00
Johannes Doerfert	d23c61490c	[OpenMP] Introduce the OpenMP-IR-Builder This is the initial patch for the OpenMP-IR-Builder, as discussed on the mailing list ([1] and later) and at the US Dev Meeting'19. The design is similar to D61953 but: - in a non-WIP status, with proper documentation and working. - using a OpenMPKinds.def file to manage lists of directives, runtime functions, types, ..., similar to the current Clang implementation. - restricted to handle only (simple) barriers, to implement most `#pragma omp barrier` directives and most implicit barriers. - properly hooked into Clang to be used if possible (D69922). - compatible with the remaining code generation. Parts have been extracted into D69853. The plan is to have multiple people working on moving logic from Clang here once the initial scaffolding (=this patch) landed. [1] http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim Subscribers: mgorny, hiraditya, bollu, guansong, jfb, cfe-commits, llvm-commits, penzn, ppenzin Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69785	2019-12-11 14:38:49 -06:00
Sam Clegg	881d877846	[WebAssembly] Add new `export_name` clang attribute for controlling wasm export names This is equivalent to the existing `import_name` and `import_module` attributes which control the import names in the final wasm binary produced by lld. This maps the existing This attribute currently requires a string rather than using the symbol name for a couple of reasons: 1. Avoid confusion with static and dynamic linking which is based on symbol name. Exporting a function from a wasm module using this directive is orthogonal to both static and dynamic linking. 2. Avoids name mangling. Differential Revision: https://reviews.llvm.org/D70520	2019-12-11 11:54:57 -08:00
Nikita Popov	8db5143b1a	[InstCombine] Optimize overflow check base on uadd.with.overflow result Fix for https://bugs.llvm.org/show_bug.cgi?id=40846. This adds a combine for cases where a (a + b) < a style overflow check is performed, but with a + b being the result of uadd.with.overflow, so the overflow result is also already available and we can just use it. Subsequently GVN/CSE will deduplicate the extracts. We can run into this situation if you have both a uadd.with.overflow and a manual add + overflow check in the same function (on the same operands), in which case GVN will rewrite the add to the with.overflow result and leave you with this pattern. The implementation is a bit ugly because I'm handling the various canonicalization edge cases. This does not yet handle the negated version of this pattern. Differential Revision: https://reviews.llvm.org/D58644	2019-12-11 20:52:04 +01:00
Danila Kutenin	19e83a9b4c	[ValueTracking] Pointer is known nonnull after load/store If the pointer was loaded/stored before the null check, the check is redundant and can be removed. For now the optimizers do not remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG. The patch allows to use more nonnull constraints. Also, it found one more optimization in some PowerPC test. This is my first llvm review, I am free to any comments. Differential Revision: https://reviews.llvm.org/D71177	2019-12-11 20:32:29 +01:00
Danila Kutenin	fc765698e0	[ValueTracking] Add tests for non-null check after load/store; NFC Tests for D71177.	2019-12-11 20:26:31 +01:00
Nikita Popov	b361d3bbcd	[MergeFuncs] Remove incorrect attribute copying Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1. However, the attribute copying was done in the wrong place (in general call replacement, not thunk generation) and a proper fix was implemented in D12581. Previously this code was just unnecessary but harmless (because FunctionComparator ensured that the attributes of the two functions are exactly the same), but since byval was changed to accept a type this copying is actively wrong and may result in malformed IR. Differential Revision: https://reviews.llvm.org/D71173	2019-12-11 20:09:54 +01:00
Fangrui Song	25e21a09b3	Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D65958 and D70450	2019-12-11 11:04:03 -08:00
Andrzej Warzynski	a75463c471	Add intrinsics for unary narrowing operations Summary: The following intrinsics for unary narrowing operations are added: * @llvm.aarch64.sve.sqxtnb * @llvm.aarch64.sve.uqxtnb * @llvm.aarch64.sve.sqxtunb * @llvm.aarch64.sve.sqxtnt * @llvm.aarch64.sve.uqxtnt * @llvm.aarch64.sve.sqxtunt Reviewers: sdesmalen, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71270	2019-12-11 18:55:51 +00:00
Jonas Devlieghere	e59f0af7d5	[VFS] Disable check for ../foo on Windows I'm not sure how .. is resolved on Windows. Disable it for now to make the bots happy again.	2019-12-11 10:53:35 -08:00
Florian Hahn	2675a3c880	[AArch64] Be more careful to skip debug operands in LdSt Optimizier. This fixes crashes with $noreg operands.	2019-12-11 18:47:45 +00:00
Jonas Devlieghere	db76588964	[StringRef] Test all default characters in unit test The default characters for trim, ltrim and rtrim are " \t\n\v\f\r" but only spaces were tested. Test that the others are trimmed as well.	2019-12-11 10:46:07 -08:00
Sanjay Patel	d1f0bdf2d2	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression(). This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-11 13:30:39 -05:00
Jonas Devlieghere	f2f7749973	[VFS] Extend virtual working directory test Extend the virtual working directory test with a few edge cases that are not currently tested.	2019-12-11 09:50:41 -08:00
Bardia Mahjour	916d37a2bc	[DA] Improve dump to show source and sink of the dependence Summary: The current da printer shows the dependence without indicating which instructions are being considered as the src vs dst. It also silently ignores call instructions, despite the fact that they create confused dependence edges to other memory instructions. This patch addresses these two issues plus a couple of minor non-functional improvements. Authored By: bmahjour Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc Reviewed By: dmgreen, fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D71088	2019-12-11 11:48:16 -05:00
Florian Hahn	4fe92abceb	[AArch64] Skip debug ops with regsOverlap in AArch64 LD/ST opt. This fixes a crash when debug instructions are in between 2 stores.	2019-12-11 16:26:31 +00:00
Ulrich Weigand	5ad67df988	[SystemZ] Add llvm.minimum / llvm.maximum tests The backend already supports the @llvm.minimum and @llvm.maximum intrinsics, but we had no test cases for those. Add tests.	2019-12-11 17:01:13 +01:00
Craig Topper	3adc819b7a	[X86] Erase dead LEA instruction after converting it to MOV in FixupLEAPass::processInstrForSlow3OpLEA.	2019-12-11 07:51:23 -08:00
Reid Kleckner	72c68f1352	[TableGen] Remove unused target intrinsic generation logic AMDGPU was the last in tree target to use this tablegen mode. I plan to split up the global intrinsic enum similar to the way that clang diagnostics are split up today. I don't plan to build on this mode. Reviewers: arsenm, echristo, efriedma Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D71318	2019-12-11 07:38:45 -08:00
Ulrich Weigand	ac473394ff	[SystemZ] Fix 128-bit strict FMA expansion pre-z14 Before z14, we did not have any FMA instruction for 128-bit floating-point, so the @llvm.fma.f128 intrinsic needs to be expanded to a libcall on those platforms. This worked correctly for regular FMA, but was implemented incorrectly for the strict version. This was not noticed because we did not have test coverage for this case. This patch fixes that incorrect expansion and adds the missing test cases.	2019-12-11 16:32:08 +01:00
Kit Barton	942c9946cc	[Loop] Add isRotated method to Loop class. Summary: This patch adds a method to determine if a loop is in rotated form (the latch is an exiting block). It also modifies the getLoopGuardBranch method to use this new method. This method can also be used in Loopfusion. Once this patch lands I will make the corresponding changes there. Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel Reviewed By: Meinersbur Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65958	2019-12-11 09:43:10 -05:00
Russell Gallop	df494f7512	[Support] Add TimeTraceScope constructor without detail arg This simplifies code where no extra details are required Also don't write out detail when it is empty. Differential Revision: https://reviews.llvm.org/D71347	2019-12-11 14:32:21 +00:00
Diogo Sampaio	ee21934588	[ARM][NFC] Change test to use CHECK-NEXT	2019-12-11 14:25:36 +00:00
Matt Arsenault	49d731b5e0	Verifier: Check frame-pointer attribute values There are a few places that check specific string attributes have particular values, and assert if they are something else. The verifier should catch these kinds of cases.	2019-12-11 19:53:49 +05:30
Matt Arsenault	32137699f7	AMDGPU: Fix copy-pasted test name error	2019-12-11 19:44:47 +05:30
Kerry McLaughlin	c0a3ab3655	Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" This reverts commit `3f5bf35f86` as it was causing build failures in llvm-clang-x86_64-expensive-checks: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045	2019-12-11 13:58:39 +00:00
Florian Hahn	17554b8961	[AArch64] Teach Load/Store optimizier to rename store operands for pairing. In some cases, we can rename a store operand, in order to enable pairing of stores. For store pairs, that cannot be merged because the first tored register is defined in between the second store, we try to find suitable rename register. First, we check if we can rename the given register: 1. The first store register must be killed at the store, which means we do not have to rename instructions after the first store. 2. We scan backwards from the first store, to find the definition of the stored register and check all uses in between are renamable. Along they way, we collect the minimal register classes of the uses for overlapping (sub/super)registers. Second, we try to find an available register from the minimal physical register class of the original register. A suitable register must not be 1. defined before FirstMI 2. between the previous definition of the register to rename 3. a callee saved register. We use KILL flags to clear defined registers while scanning from the beginning to the end of the block. This triggers quite often, here are the top changes for MultiSource, SPEC2000, SPEC2006 compiled with -O3 for iOS: Metric: aarch64-ldst-opt.NumPairCreated Program base patch diff test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0% test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9% test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1% test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5% test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0% test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3% test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0% test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0% test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0% test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0% test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0% test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5% test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9% test-suite...marks/Olden/health/health.test 16.00 18.00 12.5% test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2% test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9% test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1% test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7% test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1% test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0% test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9% test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8% test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7% test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4% test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2% test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9% test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9% test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6% test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6% test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2% test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1% test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1% test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0% test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0% test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8% test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8% test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8% test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5% test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3% test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2% test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8% test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8% test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7% test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7% Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D70450	2019-12-11 13:50:11 +00:00
James Henderson	5224feb7ca	[test][llvm-dwarfdump] Add missing testing for some --debug-* options A number of the --debug-* options in llvm-dwarfdump are not particularly well tested. In some cases, the option is only tested as part of testing another feature, or a specific part of the section that the options dump. This change adds four new tests to address some of these holes. It is not aiming to address every hole however. I kept the --debug-line switch test separate to X86/brief.s because the latter only considers the parts of the line table that are affected by verbose printing, thus missing out things like the header and different values for things like the Line, Column etc registers. Reviewed by: JDevlieghere Differential Revision: https://reviews.llvm.org/D71276	2019-12-11 13:42:54 +00:00
Guillaume Chatelet	0a0d54b357	[Alignment][NFC] Introduce Align in IRBuilder Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71343	2019-12-11 14:41:23 +01:00
James Henderson	2f8155023a	[DebugInfo] Fix printing of DW_LNS_set_isa The Isa register is a uint8_t, but at least on Windows this is internally an unsigned char, which meant that prior to this patch it got formatted as an ASCII character, rather than a decimal number. This patch fixes this by casting it to a uint64_t before printing. I did it this way instead of using a uint8_t formatter because a) it is simpler, and b) it allows us to change the internal type of Isa in the future without this code breaking. I also took the opportunity to test the printing of the other standard opcodes. Reviewed by: probinson Differential Revision: https://reviews.llvm.org/D71274	2019-12-11 13:38:41 +00:00
Guillaume Chatelet	3491109587	Rollback assumeAligned in MemorySanitizer Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing). Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71332	2019-12-11 14:25:21 +01:00
Andrzej Warzynski	65651f197a	[AArch64][SVE] Add DAG combine rules for gather loads and sext/zext Summary: These changes allow us to support sign-extending gather loads with the exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*). Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential revision: https://reviews.llvm.org/D70812	2019-12-11 12:56:18 +00:00
LLVM GN Syncbot	5ceb36b212	gn build: Merge `afb13afcf2`	2019-12-11 12:07:57 +00:00
Georgii Rymar	9a5c849991	[llvm-readobj][llvm-readelf] - Remove excessive empty lines when reporting errors and warnings. After recent changes it is now seems possible to get rid of printing '\n' before each error and warning. This makes the output cleaner. Differential revision: https://reviews.llvm.org/D71246	2019-12-11 15:06:33 +03:00
Oliver Stannard	6ae3d310bd	Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions" This reverts commit `cec2d5c174`. Reverting because this is still creating outlined functions with return address signing instructions with mismatches SP values. For example: int *volatile v; void foo(int x) { int a[x]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; } void bar(int x) { int a[x]; v = 0; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; } This generates these two outlined functions, both of which modify SP between the paciasp and retaa instructions: $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf ... OUTLINED_FUNCTION_0: // @OUTLINED_FUNCTION_0 .cfi_sections .debug_frame .cfi_startproc // %bb.0: paciasp .cfi_negate_ra_state mov w8, w0 lsl x8, x8, #2 add x8, x8, #15 // =15 mov x9, sp and x8, x8, #0x7fffffff0 sub x8, x9, x8 mov x29, sp mov sp, x8 adrp x9, v retaa ... OUTLINED_FUNCTION_1: // @OUTLINED_FUNCTION_1 .cfi_startproc // %bb.0: paciasp .cfi_negate_ra_state str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] mov sp, x29 retaa	2019-12-11 12:06:20 +00:00
Simon Tatham	1fed9a0c0c	[TableGen] Add bang-operators !getop and !setop. Summary: These allow you to get and set the operator of a dag node, without affecting its list of arguments. `!getop` is slightly fiddly because in many contexts you need its return value to have a static type more specific than 'any record'. It works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so I made `!getop` take an optional type suffix itself, so that can be written as the shorter `!getop<BaseClass>(...)`. Reviewers: hfinkel, nhaehnle Reviewed By: nhaehnle Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71191	2019-12-11 12:05:22 +00:00
Kerry McLaughlin	3f5bf35f86	[AArch64][SVE] Implement intrinsics for non-temporal loads & stores Summary: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above. Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71000	2019-12-11 11:13:51 +00:00
czhengsz	bf4580b7e7	[PowerPC][NFC] add test case for lwa - loop ds form prep	2019-12-11 06:10:11 -05:00
Sjoerd Meijer	d97cf1f889	[ARM][LowOverheadLoops] Remove dead loop update instructions. After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions. To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively. This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which shows that there is room for improvement. For example, we can't handle this case yet: .. dlstp.32 lr, r2 .LBB0_1: mov r3, r2 subs r2, #4 vldrh.u32 q2, [r1], #8 vmov q1, q0 vmla.u32 q0, q2, r0 letp lr, .LBB0_1 @ %bb.2: vctp.32 r3 .. which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this. Differential Revision: https://reviews.llvm.org/D71007	2019-12-11 10:20:19 +00:00
Simon Tatham	bd0f271c9e	[ARM][MVE] Add intrinsics for immediate shifts. (reland) This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen, MarkMurrayARM Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-11 10:10:09 +00:00
QingShan Zhang	46822083ef	[NFC] Correct the example in the comments of JSON.h to avoid mislead user	2019-12-11 10:03:34 +00:00
Sam Parker	ee7579409b	[ARM][TypePromotion] Enable by default Enable the TypePromotion pass my default (again). This patch was originally committed in `393dacacf7`. This patch was reverted in `a38396939c`. Differential Revision: https://reviews.llvm.org/D70998	2019-12-11 10:00:16 +00:00

1 2 3 4 5 ...

188783 Commits